Vibe Coding's Real Limits
A technology's advocates tend to describe what it can do; a technology's honest educators describe both what it can do and where it fails. Vibe coding has genuine, significant limits that are not just 'current AI is imperfect — wait for the next model.' Some of these limits are fundamental to what language models are. Others are practical limits of the approach that persist regardless of model quality. Understanding them is not pessimism; it is the foundation of using the tool appropriately.
Where the Approach Breaks Down
Novelty without analogy. AI code generation draws on patterns learned from its training data. When a problem closely resembles problems that have been solved publicly many times before, the model has rich analogues to draw from and performs well. When a problem is genuinely novel (a new algorithm, a domain-specific optimization, a scientific computation with no established implementation pattern), the model has few analogues. It will produce something, but that something is an extrapolation rather than a close match to known solutions, and extrapolations are far less reliable. A graduate student implementing a new signal-processing algorithm for their research cannot simply describe it in English and receive a correct implementation. The algorithm is new; no correct version exists in the training data. The model may generate plausible-looking code that is mathematically wrong in subtle ways. In this territory, you need traditional programming knowledge to verify correctness, and vibe coding without that knowledge produces code you cannot evaluate.
Security-critical systems. Secure software requires not just that the happy path works, but that every possible input, including malicious ones, is handled safely. This requires a threat model: a systematic analysis of how an adversary might attack the system and what they would gain. AI models do not produce threat models; they produce code. They can be prompted to check for specific vulnerabilities, but they will miss attack vectors that were not in the prompt and may not be in their training data at all. For systems handling financial transactions, medical records, critical infrastructure, or any data whose compromise could harm real people, vibe coding without professional security review is a serious risk. The problem is not that AI generates insecure code more often than humans; it sometimes generates more secure defaults than novice programmers. The problem is that the vibe coder cannot reliably evaluate whether the generated code is secure, and gaps in that evaluation can have severe consequences.
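To make the happy-path point concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table, the data, and the function names are hypothetical, invented for illustration. Both lookup functions behave identically on ordinary input; only a deliberately malicious input reveals that one of them leaks every record.

```python
import sqlite3

def make_db():
    """Build a throwaway in-memory database with hypothetical records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (name TEXT, diagnosis TEXT)")
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?)",
        [("alice", "asthma"), ("bob", "diabetes")],
    )
    return conn

def lookup_unsafe(conn, name):
    """Builds the SQL by string formatting: works on the happy path only."""
    query = f"SELECT diagnosis FROM patients WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]

def lookup_safe(conn, name):
    """Uses a parameterized query, so the input is treated purely as data."""
    query = "SELECT diagnosis FROM patients WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]

conn = make_db()
malicious = "x' OR '1'='1"  # classic injection string

# On ordinary input the two functions are indistinguishable.
print(lookup_unsafe(conn, "alice"), lookup_safe(conn, "alice"))

# On malicious input the unsafe version returns every diagnosis in the table.
print(sorted(lookup_unsafe(conn, malicious)))  # ['asthma', 'diabetes']
print(lookup_safe(conn, malicious))            # []
```

A builder who only tests with names like "alice" would see no difference between the two versions, which is why a test suite written without a threat model can pass while the system remains exploitable.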
The deepest limit of vibe coding is the evaluation gap: when a problem exceeds the builder's ability to evaluate whether the AI's output is correct, the entire quality-assurance loop breaks down. The builder cannot test for errors they do not know to look for. The evaluation gap is not fixed by a better AI model — it is fixed by the human developing deeper expertise in the problem domain.
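One way to see what closing the evaluation gap takes in practice is differential testing: comparing a candidate implementation against a slower reference you already trust. The sketch below is illustrative only. The moving-average functions and their names are hypothetical stand-ins, with the fast version playing the role of generated code. Writing the trusted reference is itself the step that requires understanding the problem.

```python
import random

def moving_average_reference(xs, window):
    """Slow but easy to audit: average each window directly."""
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window + 1)]

def moving_average_fast(xs, window):
    """Candidate implementation using a running sum (stand-in for generated code)."""
    if window <= 0 or window > len(xs):
        return []
    total = sum(xs[:window])
    out = [total / window]
    for i in range(window, len(xs)):
        # Slide the window: add the new element, drop the oldest one.
        total += xs[i] - xs[i - window]
        out.append(total / window)
    return out

def agree_on_random_inputs(trials=200, seed=0):
    """Return True if both implementations match on random integer inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(1, 30))]
        window = rng.randint(1, len(xs))
        if moving_average_fast(xs, window) != moving_average_reference(xs, window):
            return False
    return True

print(agree_on_random_inputs())
```

Integer inputs are used so that both versions compute exactly the same sums and the comparison can be exact; with floating-point inputs a tolerance would be needed. The check is only as good as the reference: if the builder cannot write or recognize a correct reference, the comparison has nothing to stand on.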
Performance-critical and scale-sensitive systems. Unless prompted otherwise, an AI model tends to produce the most common, readable solution to a problem, not the most computationally efficient one. For most applications (internal tools, moderate-scale web apps, scripts processing small datasets) this is fine. For systems where performance is the constraint (real-time systems, high-frequency trading, large-scale data pipelines, game engines), the generated code may be orders of magnitude slower than an expert implementation. More subtly, vibe-coded systems often accumulate what engineers call technical debt: design choices that work at small scale but become increasingly painful as the system grows. A data model that seemed reasonable at 100 users may require a complete rewrite at 100,000 users. The AI generates code that works; it does not design for the scale you will need in two years. That architectural foresight requires human judgment and experience.
Long-horizon consistency. Most AI code generation models work with a context window, a limit on how much code and prior conversation they can consider at once. A complex project may span hundreds of thousands of lines of code across dozens of sessions. The AI cannot hold all of that context simultaneously. It may generate code in session 15 that contradicts decisions made in session 3, using different naming conventions, different error-handling patterns, different data formats. Over a long project, the burden of maintaining consistency falls entirely on the human.
Legal, regulatory, and compliance requirements. Software operating in regulated domains, such as healthcare (HIPAA), finance (SOX, PCI-DSS), education (FERPA), and European markets (GDPR), must meet specific legal requirements. AI models trained on general code do not have reliable, current knowledge of all compliance requirements, and they do not update as regulations change. They may generate code that violates regulations without flagging this. Compliance is a human responsibility that requires consulting primary sources and legal expertise, not a task that can be safely delegated to an AI.
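The performance point above can be illustrated with a small sketch. Both functions below are hypothetical examples written for this lesson. They return the same answer, but the first scans a list on every membership check and so slows down quadratically as the input grows, while the second uses sets and stays roughly linear.

```python
def find_duplicates_quadratic(items):
    """Correct and readable, but O(n^2): 'in' on a list scans the whole list."""
    seen, dupes = [], []
    for item in items:
        if item in seen and item not in dupes:
            dupes.append(item)
        seen.append(item)
    return dupes

def find_duplicates_linear(items):
    """Same result in roughly O(n) time: 'in' on a set is a hash lookup."""
    seen, reported, dupes = set(), set(), []
    for item in items:
        if item in seen and item not in reported:
            dupes.append(item)
            reported.add(item)
        seen.add(item)
    return dupes

sample = [3, 1, 3, 2, 1, 3]
print(find_duplicates_quadratic(sample))  # [3, 1]
print(find_duplicates_linear(sample))     # [3, 1]
```

On a six-element list the difference is invisible, which is the point: a functional test at small scale gives no signal about behavior at 100,000 items. Spotting the difference requires reading the code with some knowledge of how the underlying data structures behave.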
The Honesty About Hallucination
AI language models hallucinate: they generate plausible-sounding content that is factually incorrect. In code generation, hallucination takes several specific forms:
- Non-existent library functions. The model generates code calling a function that does not exist in the library it named. The code looks correct syntactically; it fails at runtime. This is especially common with less-documented libraries and with recent releases that the model may not have seen in training.
- Invented API behavior. The model generates code that calls a real external API but with incorrect parameters, incorrect authentication, or incorrect response handling, based on a version of the API that does not match the current one.
- Confident but wrong logic. For problems that look similar to common problems but differ in important ways, the model may generate a confidently wrong solution that passes casual inspection.
Hallucination frequency varies significantly by model, by domain, and by how similar the task is to training data. What does not vary: the model's output looks the same whether it is correct or hallucinated. Fluency is not reliability. The only defense is verification: running the code, checking the documentation, testing against known-correct outputs.
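As a rough illustration of the first form of verification, the sketch below checks whether a function named in generated code actually exists before anything depends on it. The helper `function_exists` is a hypothetical name invented here. The example contrasts `json.loads`, which is part of Python's standard library, with `json.parse`, a plausible-looking name borrowed from JavaScript that Python's `json` module does not provide.

```python
import importlib
import json

def function_exists(module_name, attr_name):
    """Return True if the module imports and exposes a callable with that name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return callable(getattr(module, attr_name, None))

print(function_exists("json", "loads"))  # True: real standard-library function
print(function_exists("json", "parse"))  # False: plausible but not in Python's json

# Existence is not correctness: also test against a known-correct output.
print(json.loads('{"a": 1}') == {"a": 1})  # True
```

An existence check catches only the crudest hallucinations. Invented parameters and wrong logic pass it without complaint, so it complements reading the documentation and testing against known-correct outputs rather than replacing them.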
Documenting limits is not an argument against using the approach — it is an argument for using it where it is appropriate and for understanding what additional work is required where it is insufficient. A hammer's limit is not that it fails to drive screws; the limit tells you when to reach for a different tool. Knowing vibe coding's limits is what allows you to deploy it confidently in its appropriate domain.
Why does vibe coding perform less reliably on genuinely novel algorithms than on common programming tasks?
A team vibe-codes a healthcare app that stores patient diagnoses and sends them to a doctor's portal. They rely on AI-generated code without a compliance review. What specific risk does this create?
Fit-for-Purpose Analysis
- Step 1: Consider each of the following software projects. For each, decide whether vibe coding is: (A) well-suited with standard care, (B) possible but requires additional expertise or review beyond typical vibe coding practice, or (C) inappropriate without substantial traditional programming knowledge.
  - Project 1: A personal habit tracker that logs your daily exercise.
  - Project 2: A web form that collects patient medical history at a doctor's office.
  - Project 3: A script that renames files on your computer according to a date pattern.
  - Project 4: An algorithm that detects novel fraud patterns in banking transactions in real time.
  - Project 5: A school club website with a public event calendar.
  - Project 6: Software that controls the braking system in an autonomous vehicle.
- Step 2: For each project you rated B or C, write specifically what additional expertise, review, or process would be required before deployment.
- Step 3: Discuss: what is the pattern that distinguishes A projects from B and C projects? Articulate a general principle.