Vibe coding is shorthand for producing software by handing natural-language prompts to an AI model and letting it generate the code. Andrej Karpathy popularized the term to describe a posture more than a tool: the developer states intent, the model writes code, and review replaces line-by-line authoring. Replit, TechTarget, and several academic commentators have since adopted the label to describe how small teams are now shipping software. For a longer introduction, see Replit’s explainer at https://blog.replit.com/what-is-vibe-coding.
The clearest effect is on who participates. When syntax stops being the bottleneck, the conversation moves earlier in the cycle, toward requirements, edge cases, and product decisions. Karen Brennan, of the Harvard Graduate School of Education, has argued that this widens the pool of meaningful contributors, because the AI absorbs the translation step that previously gated participation to formally trained developers (Harvard Gazette, 2025).
Output volume is the other measurable effect. Garry Tan, CEO of Y Combinator, reported in 2025 that some YC companies are now generating between one and ten million dollars in annual revenue with engineering teams in the single digits, a ratio that was rare in YC cohorts a decade earlier. TechTarget describes the same pattern from the enterprise side: shorter feedback loops, cheaper prototypes, lower cost to walk away from a direction that is not working.
The access question is where the debate gets interesting. Vibe coding lowers the entry threshold for people outside traditional engineering, including domain experts, designers, and founders without a technical co-founder, and the Harvard Gazette piece frames this as democratization. The counter-argument, also raised in the TechTarget coverage, is that the speed/quality trade-off does not vanish; it migrates. Code that compiles on the first prompt still has to be read, tested, and maintained, and prompt-generated patches tend to grow the surface area of a codebase faster than human review can keep up.
Open-source maintainers have raised a connected concern. If most code is generated rather than written, the apprenticeship path (read other people’s code, file an issue, send a small PR) narrows. There is no obvious replacement for it yet.
Vibe coding is a label for habits that are still settling, not a methodology in any formal sense. The harder questions sit downstream: specification, testing, and the judgment call about when a passing build is actually correct.
What makes the above still read as AI-generated?
- “Vibe coding is a label for habits that are still settling, not a methodology in any formal sense” — textbook negative parallelism.
- “the speed/quality trade-off does not vanish; it migrates” — slogan-shaped, the kind of clever-sounding pivot LLMs love.
- “The clearest effect…”, “The other measurable effect…”, “The access question is where the debate gets interesting” — the paragraphs are mechanically sequenced, each opening with a topic-sentence stamp.
- Tight rule-of-three enumerations (“requirements, edge cases, and product decisions”; “shorter feedback loops, cheaper prototypes, lower cost…”) repeat across paragraphs.
- The closing — “The harder questions sit downstream: specification, testing, and the judgment call…” — has the tidy “the real challenge is X” cadence of a model summing up.
Vibe Coding Is Reshaping Real-Time Collaboration in Software Development: An Evidence-Based Assessment
In February 2025, AI researcher Andrej Karpathy published a post describing a programming approach he called “vibe coding”: a practice in which the developer communicates intent to a large language model in natural language, watches the AI generate code, and accepts its suggestions without reviewing or understanding every line. In Karpathy’s words, the developer can “fully give in to the vibes” and “forget that the code even exists.” The description crystallised something that had been forming in practice for several years as AI programming assistants became more capable: a different relationship between programmer and codebase, in which generation replaces composition and steering by intent replaces line-by-line authorship.
The term is new, but the underlying interaction patterns have been documented empirically since GitHub Copilot’s technical preview in 2021 and the controlled studies that followed. Peer-reviewed research in software engineering, human-computer interaction, and cybersecurity has been mapping these patterns, measuring productivity effects, and cataloguing the risks that come with reduced human scrutiny of AI-generated code. This article reviews that evidence and connects it to the dynamics that vibe coding introduces into team-level software development, where code that no individual author fully understands must nevertheless be collectively maintained, reviewed, and extended.
From Pair Programming to AI Pairing: A Structural Shift
Real-time collaboration in software development has its most studied precedent in pair programming, the practice in which two developers share a single workstation, one writing code (the driver) while the other reviews and navigates. Hannay, Dybå, Arisholm, and Sjøberg (2009) conducted a meta-analysis of experiments on pair programming, spanning 18 manuscripts and 28 independent effect sizes. Their findings are mixed: pair programming has a small positive effect on code quality, a medium positive effect on completion time (pairs finish sooner than solo programmers), and a medium negative effect on effort, meaning a pair spends more total person-hours on a task than a single developer working alone. The benefits also depend on task complexity. Pairs are faster on simpler tasks, but at the price of noticeably lower quality; they produce higher-quality solutions on complex tasks, but at the price of considerably greater effort. Between-study variance is high, and there are signs of publication bias toward positive results.
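Because a pair can be faster and still cost more, it may help to separate the two measures with hypothetical numbers (these are illustrative, not figures from Hannay et al.). Suppose a task takes a solo developer T hours and a pair finishes it in 0.6T hours of wall-clock time:

```latex
\begin{aligned}
\text{duration}_{\text{pair}} &= 0.6\,T \;<\; T = \text{duration}_{\text{solo}} \\
\text{effort}_{\text{pair}}   &= 2 \times 0.6\,T = 1.2\,T \;>\; T = \text{effort}_{\text{solo}}
\end{aligned}
```

Under these assumed numbers the pair finishes 40% sooner yet spends 20% more person-hours. Since a pair’s effort is always twice its duration, it matches solo effort only if it finishes in half the solo time or less.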
GitHub Copilot was marketed as an “AI pair programmer” from its launch, and the language was not accidental. An AI assistant that monitors code context and offers inline completions takes over part of the second seat in the pair, the partner who watches, suggests, and redirects. This structural parallel made the move from human-human to human-AI pairing feel incremental, which may be part of why adoption was rapid and often uncritical. Vibe coding extends the arrangement one step further: the AI now drives as well, and the developer functions as a prompt engineer who specifies objectives and accepts or rejects the AI’s output in bulk rather than line by line.
The implications for real-time collaboration are not trivial. In traditional pair programming, both participants understand the code being written, because the authorship process is transparent and shared. In vibe coding, the developer-as-prompt-engineer may not understand every function the AI generates; the system produces coherent-looking code at a speed that outpaces comprehension. When that code enters a shared repository and is subject to review by teammates, the usual baseline assumption of collaborative software development, that the author can explain the code they submitted, may no longer hold.
Bimodal Interaction and the Vibe Coding Paradigm
One of the most detailed observational studies of how developers actually interact with AI programming assistants found evidence for the interaction pattern that vibe coding later formalised. Barke, James, and Polikarpova (2023), in a grounded theory study of twenty participants solving programming tasks with Copilot, identified two distinct modes of interaction. In “acceleration mode,” the programmer knows what they want to write and uses the AI to write it faster. In “exploration mode,” the programmer is uncertain how to proceed and uses the AI to explore options, accepting and modifying suggestions without a fully formed prior plan.
The vibe coding paradigm is exploration mode taken to its structural extreme. Karpathy’s description maps almost exactly onto the exploration-mode profile in Barke et al. (2023): the developer has a high-level intent, prompts for code without knowing exactly what will be generated, and proceeds by accepting outputs that seem to work rather than by constructing each component from first principles. Barke et al. (2023) found that participants reached acceleration mode specifically when they could decompose a task into clearly defined micro-tasks with well-understood specifications. Vibe coding largely removes that decomposition step, offloading it to the AI, which means the developer maintains less of the structural map of what has been built.
In a single-developer context, this is a workflow choice with knowable tradeoffs. In a real-time collaborative setting, it creates an information asymmetry problem. When two developers are working simultaneously on a shared codebase, both in exploration mode with AI assistance, the shared understanding that normally emerges from collaboration, through discussion of approach, rejection of bad ideas, and negotiation of structure, may never form. Code is written and merged, but the mental model of the system that normally distributes across a team does not. This is a distinct problem from individually writing unclear code; it is a collective knowledge fragmentation that emerges specifically from the vibe coding workflow.
Productivity Measurement: What the Evidence Actually Shows
The productivity case for AI-assisted coding and, by extension, vibe coding is typically presented with unreserved optimism. The empirical record is more nuanced. Ziegler, Kalliamvakou, Li, Rice, Rifkin, Simister, Sittampalam, and Aftandilian (2024), in a study published in Communications of the ACM, examined whether developer perceptions of productivity gains from GitHub Copilot were reflected in objective usage patterns. Their case study methodology compared self-reported productivity against measurable indicators including acceptance rates, persistence of accepted suggestions, and code completion behaviour.
The central finding was that the acceptance rate of shown suggestions predicted perceived productivity better than the more detailed measures, including how much accepted code persisted unedited. That result is easy to misread. The authors deliberately looked beyond raw acceptance counts, because a suggestion accepted and then immediately rewritten tells a different story from one accepted and kept as-is; yet the simpler signal, how often developers say yes, was the one that tracked how productive they felt. Feeling productive and producing durable, high-quality output are therefore related but not interchangeable: a suggestion that is accepted and then heavily edited speeds up typing without necessarily improving the thinking behind the code.
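The gap between accepting a suggestion and keeping it can be made concrete with two simple ratios. The sketch below assumes a hypothetical log of shown suggestions; the `Suggestion` fields, the character-level persistence measure, and the sample data are all invented for illustration and are not the telemetry Ziegler et al. collected. It computes both ratios from the same events.

```python
from dataclasses import dataclass

# Hypothetical per-suggestion record. The field names are illustrative
# and do not reproduce the telemetry schema used by Ziegler et al.
@dataclass
class Suggestion:
    accepted: bool       # developer accepted the shown suggestion
    chars_accepted: int  # characters inserted on acceptance
    chars_kept: int      # of those, characters still unedited later

def acceptance_rate(shown: list[Suggestion]) -> float:
    """Fraction of shown suggestions that were accepted."""
    if not shown:
        return 0.0
    return sum(s.accepted for s in shown) / len(shown)

def persistence_rate(shown: list[Suggestion]) -> float:
    """Fraction of accepted characters that survived without edits."""
    inserted = sum(s.chars_accepted for s in shown if s.accepted)
    if inserted == 0:
        return 0.0
    return sum(s.chars_kept for s in shown if s.accepted) / inserted

# Three suggestions shown: one kept intact, one accepted then mostly
# rewritten, one rejected.
shown = [
    Suggestion(accepted=True, chars_accepted=100, chars_kept=100),
    Suggestion(accepted=True, chars_accepted=100, chars_kept=10),
    Suggestion(accepted=False, chars_accepted=0, chars_kept=0),
]
print(f"{acceptance_rate(shown):.2f}")   # 0.67
print(f"{persistence_rate(shown):.2f}")  # 0.55
```

Two of three suggestions were accepted, but only 55% of the accepted characters survived, so the acceptance rate flatters this session relative to what actually stayed in the code. A real measurement would also need a time window for “later” and a rule for partial edits, both of which this sketch leaves out.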
Vaithilingam, Zhang, and Glassman (2022), in a controlled usability study of 24 participants, found that Copilot did not consistently improve task completion rates or times, despite strong participant preference for using it. Participants reported that Copilot often provided a useful starting point and reduced the need to search documentation, but they experienced significant difficulties in understanding, editing, and debugging generated code snippets, which materially undermined task-solving effectiveness. This is the productivity paradox of vibe coding: the generation phase is fast, the debugging phase is expensive, and when neither the developer nor their teammates fully understand the generated code, the debugging phase can consume more time than was saved.
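The paradox can be written as a simple accounting identity (an illustration, not a model proposed by Vaithilingam et al.). Let W and D be the writing and debugging time for hand-written code, and W' and D' the corresponding times when the code is generated. The net time saved is:

```latex
\Delta = \underbrace{(W - W')}_{\text{saved while generating}} \;-\; \underbrace{(D' - D)}_{\text{extra debugging}}
```

Generation makes W - W' large, but when unfamiliar code pushes debugging time up so that D' - D exceeds it, Δ is negative and the assisted task takes longer overall.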
In collaborative workflows, this asymmetry has downstream costs that individual productivity metrics miss. When a developer opens a pull request containing AI-generated code they do not fully understand, reviewers face a different burden than when they review human-authored code. The reviewer must assess correctness, security, and architecture for code whose rationale they cannot query from the author, because the author does not know it either. The efficiency gained from faster generation can be partly or entirely cancelled by the added burden at review, and in teams with strong code review cultures this friction is a real organisational cost.
Security Risks in Low-Scrutiny Code Generation
The defining feature of vibe coding, reduced scrutiny of generated code, interacts badly with a well-documented property of AI coding assistants: they produce insecure code at a non-trivial rate. Pearce, Ahmad, Tan, Dolan-Gavitt, and Karri (2022), in a systematic evaluation published at the IEEE Symposium on Security and Privacy, examined the conditions under which GitHub Copilot recommends code containing security vulnerabilities. Working across a range of Common Weakness Enumeration (CWE) categories, prompt formulations, and programming domains, they built 89 scenarios for Copilot to complete, which yielded 1,689 programs; approximately 40% of those programs were vulnerable. The vulnerabilities spanned injection flaws, insecure deserialization, weak cryptographic implementations, and buffer handling errors.
The mechanism Pearce et al. (2022) identify is structural: Copilot is trained on open-source code, which is heterogeneous in quality and includes code with exploitable patterns. When the model generates a suggestion, it draws on the statistical distribution of training examples, not on a principled understanding of security. This means that in domains or patterns where insecure implementations are common in the training corpus, the model will tend to reproduce them. The probability of insecure output is not random noise; it is correlated with context.
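To make “injection flaw” concrete, here is a hypothetical example of a pattern that is common in public code and therefore easy for a model to reproduce: building an SQL query by string interpolation. It uses Python’s standard `sqlite3` module; the table, data, and function names are illustrative, and the snippet is not taken from Pearce et al.’s scenarios.

```python
import sqlite3

def setup() -> sqlite3.Connection:
    """Create an in-memory database with two users (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-token"), ("bob", "b-token")],
    )
    return conn

def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Vulnerable (CWE-89): user input is spliced into the SQL text,
    # so a crafted name can change the structure of the query.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Parameterized: the driver binds `name` as a value, never as SQL.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()

conn = setup()
payload = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: every row leaks
print(len(find_user_safe(conn, payload)))    # 0: no user has that name
```

Both functions return the right row for an ordinary name such as `alice`, so a quick happy-path check would accept either one; only an adversarial input exposes the difference. That is why low-scrutiny acceptance is risky for this class of bug.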
In vibe coding sessions, the characteristic behaviour of accepting suggestions without detailed review means that this insecure code has a higher probability of reaching the codebase than it would if every suggestion were scrutinised against security criteria. In a collaborative context, this is compounded: code that enters the shared repository from one developer’s vibe coding session is now in the codebase that other developers extend, build on, and deploy. The security debt accrues collectively, and the distributed nature of modern software development, with multiple developers making simultaneous AI-assisted commits, makes systematic vulnerability introduction a realistic concern rather than a theoretical one.
Usability Challenges and the Skill Development Question
Liang, Yang, and Myers (2024), in a survey of 410 developers published at the International Conference on Software Engineering (ICSE), found that the main motivations for using AI programming assistants were reducing keystrokes, finishing tasks faster, and recalling syntax. These are reasonable, instrumental motivations. Uses that resonated less with respondents included brainstorming solutions, exploring design alternatives, and understanding unfamiliar code. The tools were valued most for execution, not for the cognitive work that precedes it.
This profile closely matches the vibe coding mode, and it raises a question about skill development that the literature has not yet resolved empirically. If developers consistently hand the generative phases of programming to AI assistants and reserve their own effort for high-level specification and review, the skills that were previously built through composition (debugging, choosing algorithms, handling edge cases, understanding language semantics) may be exercised less often. None of the studies reviewed here provides longitudinal evidence on whether this amounts to meaningful skill atrophy, but the mechanism is plausible enough to warrant that research.
Barke et al. (2023) found that novice users of Copilot showed different interaction patterns than experienced developers: less effective at evaluating AI output, more likely to accept suggestions that worked on the happy path but failed in edge cases, and less able to decompose tasks into the micro-tasks that enable acceleration mode. If novice developers enter the profession primarily through vibe coding workflows, their trajectory of skill development may diverge from that of developers who learned the craft through manual composition. This has implications not just for individual capability but for the collective cognitive capacity of software teams.
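A hypothetical illustration of code that “works on the happy path”: the function below is the kind of plausible one-liner an assistant might suggest for averaging a list of response times. The function names and scenario are invented for this example. It passes a typical test and fails on an input the prompt never mentioned.

```python
from typing import Optional

def mean_latency(samples: list[float]) -> float:
    # Plausible generated one-liner: correct only for non-empty input.
    return sum(samples) / len(samples)

def mean_latency_checked(samples: list[float]) -> Optional[float]:
    # Reviewed version: the empty case is handled explicitly.
    if not samples:
        return None
    return sum(samples) / len(samples)

print(mean_latency([120.0, 80.0, 100.0]))  # 100.0: the happy path passes
try:
    mean_latency([])  # an input the prompt never mentioned
except ZeroDivisionError as exc:
    print("edge case fails:", exc)
print(mean_latency_checked([]))  # None
```

Whether the empty case should return `None`, raise a clearer error, or return 0 is a requirements question, and it is exactly the kind of decision that a developer who cannot yet evaluate AI output is likely to leave implicit.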
Real-Time Collaboration: New Patterns and Their Pressures
The effects documented above (bimodal interaction patterns, the asymmetry between fast generation and slow debugging, concentrated security risk, and possible changes in how skills are distributed across a team) converge into a set of pressures on real-time collaborative development that practitioners are navigating before researchers have fully characterised them.
Code review is the collaborative institution most directly affected. Code review exists to distribute understanding, catch bugs, and align architecture across a team. When the submitting developer wrote the code, reviewers can ask the author to explain unclear sections. When an AI generated it in vibe coding mode, the author may not be able to explain it, which either defeats the purpose of the review conversation or shifts the reviewer’s burden from evaluation to original analysis. Teams that maintain high review standards may find that AI-generated code takes longer to review than human-written code of equivalent length, because reviewers end up doing the analysis that the author would normally have done before submitting.
Pair programming itself, the most studied form of real-time collaboration (Hannay et al., 2009), is changing under vibe coding conditions. The driver-navigator model breaks down when the driver’s main activity is formulating prompts rather than typing code: the navigator, whose traditional role is to watch and query the code as it is written, has less to work with. Some teams are experimenting with a modified model in which one developer operates the AI interface while the other evaluates outputs against requirements, effectively reinserting the quality-assurance function that vibe coding removes. The practice has not yet been formalised in the literature, and the Barke et al. (2023) framework points to a limitation: it would keep both parties in exploration mode, the less well understood of the two modes, with all the associated uncertainty about whether the outputs are correct.
Asynchronous collaboration tools built around pull requests and continuous integration also require rethinking. Functional test suites catch behavioural regressions, but they are not designed to detect most of the vulnerability classes that Pearce et al. (2022) document. Static analysis tools can flag some vulnerability patterns, but they run on code after it has been generated; under the vibe coding habit of accepting first and reviewing later, flawed code has to be actively removed once found rather than simply rejected at suggestion time. Removal after the fact costs more than rejection at suggestion time, because other code may already depend on what is being removed. This suggests that teams doing vibe-coded development need more aggressive automated security scanning at the pre-merge stage than teams using traditional authoring practices.
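A pre-merge check does not have to be elaborate to catch the most mechanical cases. As a minimal sketch, and not a substitute for a maintained static analyzer, the hypothetical helper below uses Python’s standard `ast` module to flag database `execute` calls whose query string is assembled at runtime (f-strings, `+`, `%`, or `.format`), a pattern behind many SQL injection flaws. The function names are invented for illustration.

```python
import ast

def _is_dynamic_string(node: ast.AST) -> bool:
    """True if `node` builds a string at runtime rather than using a literal."""
    if isinstance(node, ast.JoinedStr):  # f"...{x}..."
        return True
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mod)):
        return True  # "..." + x  or  "..." % x
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "format"
    ):
        return True  # "...".format(x)
    return False

def find_dynamic_sql(source: str) -> list[int]:
    """Return line numbers of execute() calls whose query is built dynamically."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in {"execute", "executemany"}
            and node.args
            and _is_dynamic_string(node.args[0])
        ):
            flagged.append(node.lineno)
    return sorted(flagged)

sample = '''
cur.execute(f"SELECT * FROM users WHERE name = '{name}'")
cur.execute("SELECT * FROM users WHERE name = ?", (name,))
cur.execute("SELECT * FROM users WHERE id = " + user_id)
'''
print(find_dynamic_sql(sample))  # [2, 4]
```

A CI job could run a check like this on changed files and block the merge when the list is non-empty. It is deliberately crude: it flags some safe code (an f-string with no interpolated values, for instance) and misses queries assembled in a variable before the call, so in practice it only illustrates where in the pipeline such a gate belongs.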
Conclusion
Vibe coding names a practice that the empirical research on AI-assisted programming has been documenting in fragments for several years. Barke et al.’s (2023) exploration mode, Vaithilingam et al.’s (2022) observation that developers struggled to understand, edit, and debug the code Copilot generated, Liang et al.’s (2024) finding that developers value AI assistance mainly for execution rather than design, and Pearce et al.’s (2022) documentation of vulnerable code at a substantial rate are all dimensions of the same practice, now named.
The productivity gains appear real under the right conditions. Ziegler et al. (2024) show that perceived productivity is measurable and tracks how often developers accept suggestions. Hannay et al.’s (2009) meta-analysis of pair programming raises the possibility that human-AI pairs, like human-human pairs, will be faster on low-complexity tasks and produce higher quality on complex ones, with similar costs in quality and effort, though experimental data specific to AI pairs is still accumulating. The risks are also real: reduced scrutiny of generated code amplifies the security vulnerabilities that Pearce et al. (2022) document, and the knowledge fragmentation that vibe coding creates in team contexts has no easy remedy in current tooling.
The critical unknown is whether teams will develop the collaborative norms and automated safeguards needed to realise the productivity benefits without incurring the security and knowledge-transfer costs. That question will be answered empirically over the next several years, and the research base needed to frame it already exists.
Final rewrite
Vibe coding is shorthand for producing software by handing natural-language prompts to an AI model and letting it generate the code. Andrej Karpathy popularized the term to describe a posture rather than a specific tool: the developer states intent, the model writes code, and review replaces line-by-line authoring. Replit, TechTarget, and several academic commentators have since adopted the label to describe how small teams are now shipping software.
What changes first is who participates. Once syntax stops being the bottleneck, the conversation moves earlier in the cycle, toward requirements and product decisions. Karen Brennan, of the Harvard Graduate School of Education, has argued that this widens the pool of meaningful contributors, because the model absorbs the translation step that previously gated participation to formally trained developers (Harvard Gazette, 2025).
Output volume shifts in parallel. Garry Tan, CEO of Y Combinator, reported in 2025 that some YC companies are generating between one and ten million dollars in annual revenue with engineering teams in the single digits, a ratio that was rare in YC cohorts a decade earlier. TechTarget has documented the same pattern in larger organizations: feedback loops shorten, prototypes get cheaper, and the cost of abandoning a direction drops.
Access is the contested part. The Harvard Gazette piece frames vibe coding as democratization, because the entry threshold falls for domain experts, designers, and founders who never had a technical co-founder. TechTarget’s counter is that the speed/quality trade-off has not disappeared; it has moved. Code that compiles on the first prompt still has to be read, tested, and maintained, and prompt-generated patches tend to expand the surface area of a codebase faster than reviewers can keep up.
Open-source maintainers raise a connected worry. The traditional apprenticeship path (read other people’s code, file an issue, send a small PR) narrows when most of the code is being generated rather than authored, and no credible replacement has emerged yet.
The label is younger than the practice. Most of what is currently called vibe coding is a particular way of working with a model, and the open questions are about everything the model cannot do reliably yet: specifying what is wanted, writing tests that actually fail in the right places, and recognizing when a passing build is still incorrect.
References
Barke, S., James, M. B., & Polikarpova, N. (2023). Grounded Copilot: How programmers interact with code-generating models. Proceedings of the ACM on Programming Languages, 7(OOPSLA1), Article 78. https://doi.org/10.1145/3586030
Hannay, J. E., Dybå, T., Arisholm, E., & Sjøberg, D. I. K. (2009). The effectiveness of pair programming: A meta-analysis. Information and Software Technology, 51(7), 1110–1122. https://doi.org/10.1016/j.infsof.2009.02.001
Liang, J. T., Yang, C., & Myers, B. A. (2024). A large-scale survey on the usability of AI programming assistants: Successes and challenges. ICSE 2024: 46th IEEE/ACM International Conference on Software Engineering, 616–628. https://doi.org/10.1145/3597503.3608128
Pearce, H., Ahmad, B., Tan, B., Dolan-Gavitt, B., & Karri, R. (2022). Asleep at the keyboard? Assessing the security of GitHub Copilot’s code contributions. 2022 IEEE Symposium on Security and Privacy (SP), 754–768. https://doi.org/10.1109/SP46214.2022.9833571
Vaithilingam, P., Zhang, T., & Glassman, E. L. (2022). Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models. CHI Conference on Human Factors in Computing Systems Extended Abstracts, Article 332. https://doi.org/10.1145/3491101.3519665
Ziegler, A., Kalliamvakou, E., Li, X. A., Rice, A., Rifkin, D., Simister, S., Sittampalam, G., & Aftandilian, E. (2024). Measuring GitHub Copilot’s impact on productivity. Communications of the ACM, 67(3), 54–63. https://doi.org/10.1145/3633453

