When the AI Is Confidently Wrong
AI language models do not know when they are wrong. This is not a limitation that will soon be patched away — it is a fundamental property of how these systems work. They produce text that is statistically probable given their training data, presented with uniform confidence regardless of whether it is correct. Understanding this at a mechanical level is what separates a careful builder from a naive one.
Why Confident Errors Happen
A language model generates text by predicting, token by token, what comes next given everything that came before. It does not have a separate verification module that checks its output against ground truth. It cannot run code internally to see if it executes. It cannot look up current library documentation unless given a tool to do so. This produces several distinct failure modes:
- Hallucinated APIs: the model produces a function call to an API method that does not exist. The method name sounds plausible because it matches the naming conventions of the real library, but it was never implemented. The code looks correct until you run it.
- Stale information: the model's training data has a cutoff date. A library or framework it learned about may have had a breaking change after that cutoff. The model confidently uses the old API, which no longer works.
- Patching the wrong thing: the model remembers that a certain kind of problem is solved by a certain pattern. It applies that pattern to your situation even when the problem is subtly different, producing something that resembles a solution without being one.
- Context blindness: the model only knows what is in its context window. If your codebase has a specific architecture, a custom data model, or a constraint you did not mention, the model will make assumptions that conflict with your actual situation, and those assumptions will be presented as confidently as correct ones.
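The hallucinated-API failure mode can be reproduced with a small sketch. It uses Python's standard `json` module as a stand-in for any library: `json.parse` sounds plausible because JavaScript spells it that way, but Python's module only provides `json.loads`. The helper name `has_callable` is hypothetical, chosen for illustration.

```python
import json


def has_callable(module, name):
    """Return True if `module` exposes a callable attribute called `name`."""
    return callable(getattr(module, name, None))


# `json.parse` is plausible (JavaScript has JSON.parse) but Python's module lacks it.
print(has_callable(json, "parse"))  # False
print(has_callable(json, "loads"))  # True

# Nothing flags the bad call until the line actually executes.
try:
    json.parse('{"ok": true}')
except AttributeError as exc:
    error_message = str(exc)
    print("Failed only at run time:", error_message)
```

The point of the sketch is that the invalid call is syntactically fine and reads naturally; only execution (or a deliberate existence check) reveals the problem.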
A language model is optimized to produce plausible text. Plausible code is code that looks like it could work — it uses real function names, follows language syntax, and matches common patterns. Plausibility is a property of appearance. Correctness is a property of behavior. These are independent. A piece of code can be completely plausible and completely wrong at the same time.
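A minimal sketch of the gap between plausibility and correctness, using the leap-year rule as an assumed example (it is not from the lesson itself). Both function names are hypothetical. The first version looks reasonable and passes casual checks; only a behavioral test on a century year exposes it.

```python
def is_leap_year_plausible(year):
    # Looks right and agrees with the correct rule for years such as 2023 and 2024.
    return year % 4 == 0


def is_leap_year_correct(year):
    # Gregorian rule: divisible by 4, except century years not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


for year in (2023, 2024, 1900, 2000):
    print(year, is_leap_year_plausible(year), is_leap_year_correct(year))
```

The two functions agree on 2023, 2024, and 2000, and disagree on 1900. Reading the first function would not reveal the defect; running it on the right input does.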
Real categories of confident AI errors in coding:
- Off-by-one in logic: a loop that should run from 0 to n-1 runs from 1 to n, silently skipping the first element or throwing an index error on the last.
- Assumed data contract: the AI assumes the function will receive a non-null list, but in your system it sometimes receives null. No null check is included. The code crashes on a legitimate input.
- Wrong library version: the AI uses a method from version 3 of a library; you are using version 2. The method does not exist, or has a different signature.
- Logically inverted condition: a conditional check is subtly inverted (less-than instead of less-than-or-equal, or the wrong variable being compared). The code works for most inputs but fails at a specific boundary.
- Incomplete error handling: the AI handles the success case fully and the error case with a placeholder or not at all, such as a catch block that catches an exception and does nothing, leaving the calling code to assume success.
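Three of these categories can be sketched in a few lines. All function names here are hypothetical, and the "fixed" versions embody assumptions (for example, that a null list should total 0) that would need to match your own system's contract.

```python
def total_off_by_one(values):
    # Bug: starts at index 1, so the first element is silently skipped.
    total = 0
    for i in range(1, len(values)):
        total += values[i]
    return total


def total_fixed(values):
    # Handles the None input that the buggy version never considered.
    # Returning 0 for None is an assumption about the data contract.
    if values is None:
        return 0
    return sum(values)


def load_port_swallowing(text):
    # Bug: the except block hides the failure, so the caller receives None
    # and may assume the value was simply absent.
    try:
        return int(text)
    except ValueError:
        pass


def load_port_fixed(text):
    # Surfaces the failure with context instead of hiding it.
    try:
        return int(text)
    except ValueError as exc:
        raise ValueError(f"invalid port: {text!r}") from exc


print(total_off_by_one([10, 20, 30]))  # 50, not 60
print(total_fixed([10, 20, 30]))       # 60
print(total_fixed(None))               # 0
print(load_port_swallowing("80a"))     # None
```

None of the buggy versions raises an error on the inputs shown; each returns a wrong or misleading value, which is why these defects tend to survive a quick read.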
Mandatory Verification Practices
Given that the AI cannot verify its own output, the verification burden falls entirely on you. 'Mandatory' means that no situation justifies skipping verification: not deadline pressure, not overconfidence in the AI, not the apparent simplicity of the output. For every piece of AI-generated code:
1. Run it: do not trust that it is correct because it looks correct. Execute it with real or realistic inputs.
2. Test boundaries: run it on empty input, null input, maximum-size input, and off-by-one input.
3. Check library calls: for every function call to an external library, verify that function exists in the version you are using. Check the current documentation, not your memory.
4. Read the logic: step through the control flow manually for one complete execution. Does each decision point behave as described?
5. Verify against the actual requirement: does the code solve your problem, or a similar-sounding problem the AI imagined?
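Steps 1 to 3 can be partly automated. The sketch below is one possible shape, not a prescribed tool: `describe_call`, `run_boundary_checks`, and `average` are hypothetical names, `average` stands in for AI-generated code under review, and the standard `json` module stands in for an external library.

```python
import importlib
import inspect


def describe_call(module_name, attr_name):
    """Report whether module_name.attr_name exists in the installed version."""
    module = importlib.import_module(module_name)
    target = getattr(module, attr_name, None)
    if not callable(target):
        return f"{module_name}.{attr_name}: NOT FOUND in installed version"
    try:
        signature = str(inspect.signature(target))
    except (TypeError, ValueError):
        # Some built-in callables do not expose a signature.
        signature = "(signature unavailable)"
    return f"{module_name}.{attr_name}{signature}"


def average(values):
    # Hypothetical stand-in for AI-generated code under review.
    return sum(values) / len(values)


def run_boundary_checks(func, cases):
    """Call func on each input and record the result or the exception type."""
    report = {}
    for label, value in cases.items():
        try:
            report[label] = func(value)
        except Exception as exc:  # broad on purpose: every failure should be visible
            report[label] = type(exc).__name__
    return report


print(describe_call("json", "loads"))
print(describe_call("json", "parse"))

cases = {"typical": [2, 4, 6], "single": [5], "empty": [], "null": None}
print(run_boundary_checks(average, cases))
```

Here the empty and null cases surface as `ZeroDivisionError` and `TypeError`, which is the kind of result step 2 is meant to reveal. For third-party packages, `importlib.metadata.version("package-name")` reports the installed version to compare against the documentation you are reading. Steps 4 and 5 remain manual: no script can confirm that the code solves the problem you actually have.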
Builders most often skip verification when they feel time pressure. This is the riskiest moment to skip it. Shipping broken code under deadline pressure creates a larger time debt: the defect surfaces in production, where fixing it is harder and the consequences are higher. Verification is never optional. It is a cost that must be paid, and paying it earlier is almost always cheaper.
An AI generates a function that calls library.parseResponse(data, 'strict'). You have never seen this method before. What is the correct next step?
A developer says: 'I use AI for all my code, and I've never had a bug.' What best explains why this claim is almost certainly wrong, even if the developer believes it?
Catch the Confident Error
- Step 1: Ask an AI to write a function that does something moderately specific — for example, 'a function that returns the second-largest unique value in a list of integers, or null if no such value exists.'
- Step 2: Before running the code, write down three inputs you predict might expose a bug: one that exercises the normal case, one edge case (e.g., a list with only one element), and one degenerate case (e.g., an empty list, or a list where all values are the same).
- Step 3: Run all three. Record the actual output vs. your expected output.
- Step 4: If you find a bug, identify which of the AI failure categories from this lesson it falls into (hallucinated API, wrong logic, missing null check, etc.).
- Step 5: Fix the bug yourself without asking the AI. Write one sentence explaining the root cause of the error and why the AI produced it confidently despite being wrong.
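One way to organize Steps 2 and 3 is a small comparison harness, sketched below. Everything in it is an assumption for illustration: `second_largest_candidate` is a deliberately flawed placeholder for whatever the AI returns (replace it with the real output), and the prediction list includes two more cases than the exercise requires.

```python
def second_largest_candidate(values):
    # Hypothetical stand-in for the AI's answer; paste the real output here.
    # This placeholder is deliberately flawed: it ignores duplicates.
    if len(values) < 2:
        return None
    return sorted(values)[-2]


# (label, input, expected) tuples, written down BEFORE running anything.
predictions = [
    ("normal", [3, 1, 4, 1, 5], 4),
    ("single element", [7], None),
    ("empty", [], None),
    ("all equal", [2, 2, 2], None),
    ("duplicate maximum", [5, 5, 3], 3),
]


def compare(func, predictions):
    """Return (label, expected, actual, passed) rows for each prediction."""
    rows = []
    for label, value, expected in predictions:
        try:
            actual = func(list(value))  # copy so the candidate cannot mutate the case
        except Exception as exc:
            actual = f"raised {type(exc).__name__}"
        rows.append((label, expected, actual, actual == expected))
    return rows


results = compare(second_largest_candidate, predictions)
for label, expected, actual, passed in results:
    status = "ok" if passed else "MISMATCH"
    print(f"{label}: expected={expected!r} actual={actual!r} {status}")
```

With the placeholder, the first three cases pass and the two duplicate cases mismatch, which would fall under the wrong-logic category in Step 4. Writing the expected values first matters: it keeps you from rationalizing whatever the code happens to return.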