Vibe coding is shorthand for producing software by handing natural-language prompts to an AI model and letting it generate the code. Andrej Karpathy popularized the term to describe a posture more than a tool: the developer states intent, the model writes code, and review replaces line-by-line authoring. Replit, TechTarget, and several academic commentators have since adopted the label to describe how small teams now ship software. For a fuller introduction to vibe coding and how it is shaping the field, see https://blog.replit.com/what-is-vibe-coding.

What changes first is who participates. Once syntax stops being the bottleneck, the conversation moves earlier in the cycle, toward requirements, edge cases, and product decisions. Karen Brennan, of the Harvard Graduate School of Education, has argued that this widens the pool of meaningful contributors, because the AI absorbs the translation step that previously gated participation to formally trained developers (Harvard Gazette, 2025).

Output volume shifts in parallel. Garry Tan, CEO of Y Combinator, reported in 2025 that some YC companies are generating between one and ten million dollars in annual revenue with engineering teams in the single digits, a ratio that was rare in YC cohorts a decade earlier. TechTarget describes the same pattern from the enterprise side: feedback loops shorten, prototypes get cheaper, and the cost of walking away from a direction that is not working drops.

Access is the contested part. Vibe coding lowers the entry threshold for people outside traditional engineering, including domain experts, designers, and founders without a technical co-founder, and the Harvard Gazette piece frames this as democratization. The counter-argument, also raised in the TechTarget coverage, is that the speed/quality trade-off has not disappeared; it has moved. Code that compiles on the first prompt still has to be read, tested, and maintained, and prompt-generated patches tend to expand the surface area of a codebase faster than human review can keep up.

Open-source maintainers have raised a connected concern. When most code is generated rather than written, the apprenticeship path (read other people’s code, file an issue, send a small PR) narrows. Nobody has proposed a credible replacement for it yet.

The label is younger than the practice it describes, so treat it as a set of habits that are still settling rather than a formal methodology. Most of the open questions are about everything the model cannot do reliably yet: specifying what is wanted, testing it, and judging when a passing build is actually correct.

What makes the above still read as AI-generated?

“Vibe coding is a label for habits that are still settling, not a methodology in any formal sense”: textbook negative parallelism.
“the speed/quality trade-off does not vanish; it migrates”: slogan-shaped, the kind of clever-sounding pivot LLMs love.
“The clearest effect…”, “The other measurable effect…”, “The access question is where the debate gets interesting”: the paragraphs are mechanically sequenced, each opening with a topic-sentence stamp.
Tight rule-of-three enumerations (“requirements, edge cases, and product decisions”; “shorter feedback loops, cheaper prototypes, lower cost…”) repeat across paragraphs.
The closing, “The harder questions sit downstream: specification, testing, and the judgment call…”, has the tidy “the real challenge is X” cadence of a model summing up.

Vibe coding is reshaping real-time collaboration in software development: an evidence-based assessment

In February 2025, AI researcher Andrej Karpathy published a post describing a programming approach he called “vibe coding,” a practice in which the developer communicates intent to a large language model in natural language, watches the AI generate code, and accepts suggestions without deeply reviewing or understanding every line. The developer vibes with the model, as Karpathy put it, rather than writing the code themselves. His description captured something that had been forming in practice for some years as AI programming assistants grew more capable: a different relationship between programmer and codebase, in which generation replaces composition and intent steering replaces line-by-line authorship.

The term is new, but the underlying interaction patterns have been documented empirically since GitHub Copilot’s technical preview in 2021 and the controlled studies that followed. Peer-reviewed research in software engineering, human-computer interaction, and cybersecurity has been mapping these patterns, measuring productivity effects, and cataloguing the risks that come with reduced human scrutiny of AI-generated code. This article reviews that evidence and connects it to the specific dynamics vibe coding introduces into team-level software development, where code that no individual author fully understands must still be collectively maintained, reviewed, and extended.

From pair programming to AI pairing: a structural shift

Real-time collaboration in software development has its most studied precedent in pair programming, the practice in which two developers share a single workstation, one writing code while the other reviews and navigates. Hannay, Dybå, Arisholm, and Sjøberg (2009) conducted a meta-analysis of experiments on pair programming, spanning 18 manuscripts and 28 independent effect sizes. Their findings describe a nuanced picture: pair programming produces a small positive effect on code quality and a medium positive effect on completion time (it is faster than solo programming), but a medium negative effect on overall effort, meaning a pair spends more total person-hours on a task than a single developer working alone. The quality benefit is most pronounced for complex tasks; the speed benefit concentrates on simpler, well-defined tasks. Between-study variance is high, and there are signs of publication bias toward positive results.
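The difference between completion time and effort is easy to blur, so a small worked example may help. The sketch below uses invented hours (they are not figures from Hannay et al.) to show how a pair can finish sooner on the clock while still costing more person-hours; the function name `elapsed_and_effort` is hypothetical.

```python
def elapsed_and_effort(hours_on_clock: float, developers: int) -> tuple[float, float]:
    """Return (elapsed hours, total person-hours) for one task."""
    return hours_on_clock, hours_on_clock * developers


# Hypothetical numbers, chosen only to illustrate the two measures.
solo_elapsed, solo_effort = elapsed_and_effort(10.0, developers=1)
pair_elapsed, pair_effort = elapsed_and_effort(7.0, developers=2)

print(f"solo: {solo_elapsed} h elapsed, {solo_effort} person-hours")
print(f"pair: {pair_elapsed} h elapsed, {pair_effort} person-hours")
```

With these assumed inputs the pair finishes three hours earlier but consumes four more person-hours, which is the shape of the trade-off the meta-analysis reports.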

GitHub Copilot was marketed as an “AI pair programmer” from its launch, and the language was not accidental. The navigator role in pair programming, traditionally occupied by the second human developer who reviews, queries, and redirects the driver, is roughly what an AI assistant performs when it monitors code context and offers inline completions. This structural parallel made the move from human-human pair programming to human-AI pairing feel incremental, which is part of why adoption was rapid and often uncritical. Vibe coding extends the arrangement further: the developer steps back from the driver role too, working instead as a prompt engineer who specifies objectives and accepts or rejects the AI’s output in bulk rather than line by line.

The implications for real-time collaboration are not trivial. In traditional pair programming, both participants understand the code being written, because the authorship process is transparent and shared. In vibe coding, the developer-as-prompt-engineer may not understand every function the AI generates; the system produces coherent-looking code at a speed that outpaces comprehension. When that code enters a shared repository and is reviewed by teammates, the usual baseline of collaborative development, that the author can explain the code they submitted, may no longer hold.

Bimodal interaction and the vibe coding paradigm

The most detailed observational study of how developers actually interact with AI programming assistants found exactly the interaction pattern that vibe coding formalises. Barke, James, and Polikarpova (2023), in a grounded theory analysis of twenty participants solving programming tasks with Copilot, identified two distinct modes of interaction. In “acceleration mode,” the programmer knows what they want to write and uses the AI to write it faster. In “exploration mode,” the programmer is uncertain how to proceed and uses the AI to explore options, accepting and modifying suggestions without a fully formed prior plan.

The vibe coding paradigm is exploration mode taken to its structural extreme. Karpathy’s description maps almost exactly onto the exploration-mode profile in Barke et al. (2023): the developer has a high-level intent, prompts for code without knowing exactly what will be generated, and proceeds by accepting outputs that seem to work rather than by constructing each component from first principles. Barke et al. (2023) found that participants reached acceleration mode specifically when they could decompose a task into clearly defined micro-tasks with well-understood specifications. Vibe coding largely removes that decomposition step, offloading it to the AI, so the developer keeps less of the structural map of what has been built.

In a single-developer context, this is a workflow choice with knowable tradeoffs. In a real-time collaborative setting, it creates an information asymmetry problem. When two developers are working simultaneously on a shared codebase, both in exploration mode with AI assistance, the shared understanding that normally emerges from collaboration, through discussion of approach, rejection of bad ideas, and negotiation of structure, may never form. Code is written and merged, but the mental model of the system, which normally spreads across a team as it works, does not. This is a different problem from one person writing unclear code; it is a collective fragmentation of knowledge that emerges specifically from the vibe coding workflow.

Productivity measurement: what the evidence actually shows

The productivity case for AI-assisted coding and, by extension, vibe coding is usually presented with unreserved optimism. The empirical record is more mixed. Ziegler, Kalliamvakou, Li, Rice, Rifkin, Simister, Sittampalam, and Aftandilian (2024), in a study published in Communications of the ACM, examined whether developer perceptions of productivity gains from GitHub Copilot were reflected in objective usage patterns. Their case study methodology compared self-reported productivity against measurable indicators including acceptance rates, persistence of accepted suggestions, and code completion behaviour.

The key finding was that the rate at which shown suggestions are accepted predicted developers’ perceived productivity better than more detailed measures of how long accepted completions persist in the code. That result also shows why the relationship between “feeling productive” and producing more durable, higher-quality output is complicated. Suggestions that are accepted but then heavily edited represent gains in the speed of typing, not necessarily in the quality of thinking. The authors’ measures go beyond acceptance rates to include persistence, because a suggestion accepted but immediately rewritten tells a different story than one accepted and kept as-is.
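The gap between accepting a suggestion and keeping it can be made concrete with two simple ratios. The following is a minimal sketch under stated assumptions: the `SuggestionEvent` record and its fields are hypothetical and do not reflect Copilot's telemetry schema or the exact metric definitions used by Ziegler et al.

```python
from dataclasses import dataclass


@dataclass
class SuggestionEvent:
    # Hypothetical log record for one suggestion shown to a developer.
    accepted: bool
    chars_accepted: int = 0   # characters inserted when the suggestion was accepted
    chars_surviving: int = 0  # characters still unchanged at a later checkpoint


def acceptance_rate(events: list[SuggestionEvent]) -> float:
    """Fraction of shown suggestions that the developer accepted."""
    if not events:
        return 0.0
    return sum(e.accepted for e in events) / len(events)


def persistence_rate(events: list[SuggestionEvent]) -> float:
    """Fraction of accepted characters that survived later editing."""
    accepted = [e for e in events if e.accepted]
    total = sum(e.chars_accepted for e in accepted)
    if total == 0:
        return 0.0
    return sum(e.chars_surviving for e in accepted) / total


# Invented session: four suggestions shown, three accepted, two heavily rewritten.
session = [
    SuggestionEvent(accepted=True, chars_accepted=100, chars_surviving=100),
    SuggestionEvent(accepted=True, chars_accepted=50, chars_surviving=10),
    SuggestionEvent(accepted=True, chars_accepted=50, chars_surviving=10),
    SuggestionEvent(accepted=False),
]

print(acceptance_rate(session), persistence_rate(session))
```

In this invented session three of four suggestions are accepted, yet only 120 of the 200 accepted characters survive, so a session can look productive by one measure and much less so by the other.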

Vaithilingam, Zhang, and Glassman (2022), in a controlled usability study of 24 participants, found that Copilot did not consistently improve task completion rates or times, despite strong participant preference for using it. Participants reported that Copilot often provided a useful starting point and reduced the need to search documentation, but they struggled to understand, edit, and debug generated code snippets, which materially undermined how effectively they solved tasks. This is the productivity paradox of vibe coding: the generation phase is fast, the debugging phase is expensive, and when neither the developer nor their teammates fully understand the generated code, debugging can consume more time than was saved.

In collaborative workflows, this asymmetry has downstream costs that individual productivity metrics miss. When a developer submits AI-generated code they do not fully understand to a pull request, reviewers face a different burden than when they review human-authored code. The reviewer has to assess correctness, security, and architecture for code whose rationale they cannot query from the author, because the author does not know it either. The efficiency gain from faster generation is potentially cancelled by the added burden at review, and in teams with strong code review cultures, that friction is a real organisational cost.

Security risks in low-scrutiny code generation

The defining feature of vibe coding, reduced scrutiny of generated code, interacts badly with a well-documented property of AI coding assistants: they produce insecure code at a non-trivial rate. Pearce, Ahmad, Tan, Dolan-Gavitt, and Karri (2022), in a systematic evaluation published at the IEEE Symposium on Security and Privacy, examined the conditions under which GitHub Copilot recommends code containing security vulnerabilities. Working across a range of Common Weakness Enumeration (CWE) categories, prompt formulations, and programming domains, they found that roughly 40% of the programs Copilot generated for the tested scenarios contained a security-relevant vulnerability. The vulnerabilities spanned injection flaws, insecure deserialization, weak cryptographic implementations, and buffer handling errors.

The mechanism Pearce et al. (2022) identify is structural: Copilot is trained on open-source code, which is heterogeneous in quality and includes code with exploitable patterns. When the model generates a suggestion, it draws on the statistical distribution of training examples, not on a principled understanding of security. So in domains or patterns where insecure implementations are common in the training corpus, the model tends to reproduce them. The probability of insecure output is not random noise; it correlates with context.
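As one illustration of the injection category, the sketch below contrasts a query assembled by string formatting with a parameterized query, using Python's standard `sqlite3` module. The table, the function names, and the malicious input are invented for this example and are not taken from Pearce et al.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable pattern (CWE-89): user input is spliced into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized query: the value is bound separately from the SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

malicious = "nobody' OR '1'='1"
unsafe_rows = find_user_unsafe(conn, malicious)  # the injected OR clause matches every row
safe_rows = find_user_safe(conn, malicious)      # treated as a literal name, matches nothing

print(len(unsafe_rows), len(safe_rows))
```

Both functions look plausible at a glance and both run without error on ordinary input, which is why the string-formatted version can pass a low-scrutiny review.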

In vibe coding sessions, the habit of accepting suggestions without detailed review means insecure code has a higher chance of reaching the codebase than it would if every suggestion were checked against security criteria. In a collaborative context, this compounds: code that enters the shared repository from one developer’s vibe coding session is now in the codebase that other developers extend, build on, and deploy. The security debt accrues collectively, and the distributed nature of modern software development, with multiple developers making simultaneous AI-assisted commits, makes systematic vulnerability introduction a realistic concern rather than a theoretical one.

Usability challenges and the skill development question

Liang, Yang, and Myers (2024), in a survey of 410 developers at the International Conference on Software Engineering, found that the main motivations for using AI programming assistants were keystroke reduction, faster task completion, and syntax recall. These are instrumentally appropriate motivations. What developers found less compelling was using AI to brainstorm solutions, explore design alternatives, or understand unfamiliar code. The tools were most valued for execution, not for the cognitive work that precedes it.

This profile matches the vibe coding mode closely, and it raises a real question about skill development that the literature has not yet resolved empirically. If developers consistently use AI assistants for the generative phases of programming while reserving human effort for high-level specification and review, the practitioner skills that were previously built through composition (debugging, choosing algorithms, handling edge cases, understanding language semantics) may be exercised less often. There is no peer-reviewed longitudinal evidence yet on whether this amounts to meaningful atrophy, but the mechanism is plausible and the concern is not hypothetical.

Barke et al. (2023) found that novice users of Copilot showed different interaction patterns than experienced developers: less effective at evaluating AI output, more likely to accept suggestions that worked on the happy path but failed in edge cases, and less able to decompose tasks into the micro-tasks that enable acceleration mode. If novice developers enter the profession mainly through vibe coding workflows, their trajectory of skill development may diverge from that of developers who learned the craft through manual composition. That matters not just for individual capability but for the collective cognitive capacity of software teams.

Real-time collaboration: new patterns and their pressures

The effects documented above (bimodal interaction patterns, productivity asymmetry between generation and debugging, security risk concentration, and possible shifts in how skill is distributed) converge into a set of pressures on real-time collaborative development that practitioners are working through before researchers have fully characterised them.

Code review is the collaborative institution most directly affected. Code review exists to distribute understanding, catch bugs, and align architecture across a team. When submitted code was written by the submitting developer, reviewers can ask the author to explain unclear sections. When code was generated by an AI in vibe coding mode, the author may not be able to explain it, which either defeats the purpose of the review conversation or shifts the reviewer’s burden from evaluation to original analysis. Teams that keep high review standards may find AI-generated code takes longer to review than human-written code of equivalent length, because the reviewers are absorbing information that should have been carried by the author.

Pair programming itself, the most studied form of real-time collaboration (Hannay et al., 2009), is changing under vibe coding conditions. The driver-navigator model breaks down when the driver’s main activity is prompt formulation rather than code typing: the navigator’s traditional role of watching and querying the code as it is written has less to work with. Some teams are experimenting with a modified model in which one developer operates the AI interface while the other evaluates outputs against requirements, effectively reinserting the quality-assurance function that vibe coding removes. This is not a formalised practice in the literature yet, but the Barke et al. (2023) framework suggests it would put both parties in the less-well-understood exploration mode, with all the associated uncertainty about whether the outputs are correct.

Asynchronous collaboration tools built around pull requests and continuous integration also need rethinking. Standard automated testing catches functional regressions but not the security categories that Pearce et al. (2022) document. Static analysis tools can flag some vulnerability patterns, but they operate on code after generation; the vibe coding approach of accepting first and reviewing later means the code has to be actively removed if problems are found, rather than rejected at suggestion time. The friction of after-the-fact removal is systematically higher than rejection at the suggestion stage, which suggests that teams doing vibe-coded development need more aggressive automated security scanning before merge than teams using traditional authoring practices.
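A pre-merge gate of the kind described here can be sketched in a few lines. The example below is a toy under stated assumptions: the rule set, the `Finding` type, and the functions `scan_added_lines` and `merge_allowed` are all hypothetical, the input is assumed to be the lines a change adds per file, and simple regular expressions are no substitute for a dedicated static analyzer.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    path: str
    line_no: int  # position within the added lines, not the file's line number
    rule: str


# Hypothetical, deliberately tiny rule set for illustration only.
RULES = {
    "eval-call": re.compile(r"\beval\s*\("),
    "pickle-loads": re.compile(r"\bpickle\.loads\s*\("),
    "shell-true": re.compile(r"\bshell\s*=\s*True\b"),
}


def scan_added_lines(added: dict[str, list[str]]) -> list[Finding]:
    """Check every added line against every rule and collect the matches."""
    findings = []
    for path, lines in added.items():
        for line_no, text in enumerate(lines, start=1):
            for rule, pattern in RULES.items():
                if pattern.search(text):
                    findings.append(Finding(path, line_no, rule))
    return findings


def merge_allowed(added: dict[str, list[str]]) -> bool:
    """Block the merge if any rule fires on the added lines."""
    return not scan_added_lines(added)


# Invented change set: one risky subprocess call, one harmless assignment.
change = {
    "app.py": ["import subprocess", "subprocess.run(cmd, shell=True)"],
    "util.py": ["x = 1"],
}

print(scan_added_lines(change), merge_allowed(change))
```

The design choice worth noting is where the check runs: it inspects the change before merge, so a flagged pattern is rejected at the gate instead of having to be removed from the shared codebase afterwards.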

Conclusion

Vibe coding names a practice that empirical research on AI-assisted programming has been documenting in fragments for several years. Barke et al.’s (2023) exploration mode, Vaithilingam et al.’s (2022) observation that generated code is often accepted without full comprehension, Liang et al.’s (2024) finding that developers primarily value AI assistance for execution rather than design, and Pearce et al.’s (2022) documentation of systematic security vulnerability generation: these are all dimensions of the same practice, now named.

The productivity gains are real under the right conditions. Ziegler et al. (2024) show that suggestion acceptance is measurable and correlates with perceived productivity. Hannay et al.’s (2009) meta-analysis of pair programming suggests that human-AI pairs, like human-human pairs, should be faster on low-complexity tasks and higher-quality on complex ones, though the experimental data specific to AI pairs is still accumulating. The risks are also real: reduced scrutiny of generated code amplifies the security vulnerabilities that Pearce et al. (2022) document, and the knowledge fragmentation that vibe coding creates in team contexts has no easy remedy in current tooling.

Whether teams develop the collaborative norms and automated safeguards needed to capture the productivity benefits without absorbing the security and knowledge transfer costs is still open. That question will be answered empirically over the next several years, and the research base for answering it already exists.

Final rewrite

Vibe coding is shorthand for producing software by handing natural-language prompts to an AI model and letting it generate the code. Andrej Karpathy popularized the term to describe a posture rather than a specific tool: the developer states intent, the model writes code, and review replaces line-by-line authoring. Replit, TechTarget, and several academic commentators have since adopted the label to describe how small teams now ship software.

What changes first is who participates. Once syntax stops being the bottleneck, the conversation moves earlier in the cycle, toward requirements and product decisions. Karen Brennan, of the Harvard Graduate School of Education, has argued that this widens the pool of meaningful contributors, because the model absorbs the translation step that previously gated participation to formally trained developers (Harvard Gazette, 2025).

Output volume shifts in parallel. Garry Tan, CEO of Y Combinator, reported in 2025 that some YC companies are generating between one and ten million dollars in annual revenue with engineering teams in the single digits, a ratio that was rare in YC cohorts a decade earlier. TechTarget has documented the same pattern in larger organizations: feedback loops shorten, prototypes get cheaper, and the cost of abandoning a direction drops.

Access is the contested part. The Harvard Gazette piece frames vibe coding as democratization, because the entry threshold falls for domain experts, designers, and founders who never had a technical co-founder. TechTarget’s counter is that the speed/quality trade-off has not disappeared; it has moved. Code that compiles on the first prompt still has to be read, tested, and maintained, and prompt-generated patches tend to expand the surface area of a codebase faster than reviewers can keep up.

Open-source maintainers raise a connected worry. The traditional apprenticeship path (read other people’s code, file an issue, send a small PR) narrows when most of the code is being generated rather than authored. Nobody has proposed a credible replacement.

The label is younger than the practice. Most of what is currently called vibe coding is a particular way of working with a model, and the open questions are about everything the model cannot do reliably yet: specifying what is wanted, writing tests that actually fail in the right places, and recognizing when a passing build is still incorrect.

References

Barke, S., James, M. B., & Polikarpova, N. (2023). Grounded Copilot: How programmers interact with code-generating models. Proceedings of the ACM on Programming Languages, 7(OOPSLA1), Article 78. https://doi.org/10.1145/3586030

Hannay, J. E., Dybå, T., Arisholm, E., & Sjøberg, D. I. K. (2009). The effectiveness of pair programming: A meta-analysis. Information and Software Technology, 51(7), 1110-1122. https://doi.org/10.1016/j.infsof.2009.02.001

Liang, J. T., Yang, C., & Myers, B. A. (2024). A large-scale survey on the usability of AI programming assistants: Successes and challenges. ICSE 2024: 46th IEEE/ACM International Conference on Software Engineering, 616-628. https://doi.org/10.1145/3597503.3608128

Pearce, H., Ahmad, B., Tan, B., Dolan-Gavitt, B., & Karri, R. (2022). Asleep at the keyboard? Assessing the security of GitHub Copilot’s code contributions. 2022 IEEE Symposium on Security and Privacy (SP), 754-768. https://doi.org/10.1109/SP46214.2022.9833571

Vaithilingam, P., Zhang, T., & Glassman, E. L. (2022). Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models. CHI Conference on Human Factors in Computing Systems Extended Abstracts, Article 332. https://doi.org/10.1145/3491101.3519665

Ziegler, A., Kalliamvakou, E., Li, X. A., Rice, A., Rifkin, D., Simister, S., Sittampalam, G., & Aftandilian, E. (2024). Measuring GitHub Copilot’s impact on productivity. Communications of the ACM, 67(3), 54-63. https://doi.org/10.1145/3633453
