🧪 Testing, Quality & Craft · Lesson 12 of 13

Working With AI Like a Senior Engineer

Specs as prompts, reviewing AI output like a staff engineer, and when NOT to trust it.


Stage 3 · Testing, Quality & Craft · B.U.I.L.D. letter: U

You already know how to build with AI. That's real. Vibe coding is the on-ramp that got you here — and it's a genuine skill. This lesson is about the upgrade: from "it works on my machine" to "I'd stake my name on every line." Senior engineers use AI too. The difference is they never forget who is accountable.


⚠️ The vibe trap

AI output looks authoritative. It's formatted, confident, and often runs on the first try, which is exactly why it's dangerous. "It looks right" is not the same as "it is right." You can ship a subtle security hole, a function that fails on empty input, or a race condition that only appears under load, and every line will look perfectly reasonable in a quick scroll. The trap isn't laziness. It's the false confidence that fluent output creates: polished code reads as correct code, the same way polished prose reads as true.


🧠 The senior mindset: AI is a fast junior, not an oracle

A junior developer is not useless — they write a lot of code quickly and can follow a clear spec. But you wouldn't merge a junior's PR without reading it. You wouldn't let them design the auth system solo. You'd give them context, review their output, and stay accountable for what ships. That is exactly the relationship to have with AI.

Mental model: You are the tech lead. AI is the fastest junior you've ever hired. Your job is not to prompt harder — it's to supervise better.

Why it matters: AI models are trained to produce plausible output. Plausible is not correct. The model has no idea what your existing codebase looks like, what your security requirements are, or what "done" means for your team unless you tell it. It will confidently fill in the gaps with whatever pattern it has seen most often — which may have nothing to do with your situation.

Common mistake: Treating a long, generated answer as proof the answer is right. Length and fluency are not quality signals.


📐 The spec IS the skill

The quality of AI output is largely determined by the quality of your input. A vague prompt produces plausible garbage. A precise spec produces reviewable code. Writing a tight spec is the highest-leverage thing you do when working with AI, and it's the same skill as writing a good ticket, a good PR description, or a good design doc.

--- VAGUE (produces plausible garbage) ---
"Write a function that validates a user."
--- PRECISE (produces reviewable code) ---
"Write a TypeScript function validateUser(user: unknown): user is User.
 - Accepts an unknown value from an untrusted API response.
 - Returns true only if: user is a non-null object, user.id is a
   non-empty string, user.email matches /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
   and user.role is one of: 'student' | 'instructor' | 'admin'.
 - Returns false for any other shape — never throws.
 - No external dependencies.
 - Include a JSDoc comment with one passing and one failing example."

The vague prompt will give you something that compiles. The precise prompt gives you something you can reason about, test against, and confidently ship.
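To see why the precise version is reviewable, here is one way an implementation of that spec could look: a minimal TypeScript sketch, not the only correct answer. The `Role` alias and the `ROLES` and `EMAIL_RE` constants are illustrative names that the spec does not define. Note the `try/catch`: reading a property can run a getter or Proxy trap that throws, so honoring "never throws" takes more than type checks.

```typescript
// Role and User follow the spec; the names Role, ROLES, and EMAIL_RE are illustrative.
type Role = "student" | "instructor" | "admin";

interface User {
  id: string;
  email: string;
  role: Role;
}

const ROLES: readonly string[] = ["student", "instructor", "admin"];
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

/**
 * Returns true only if `user` matches the User shape. Never throws.
 *
 * @example validateUser({ id: "u1", email: "a@b.co", role: "admin" }) // true
 * @example validateUser({ id: "", email: "a@b.co", role: "admin" })   // false (empty id)
 */
function validateUser(user: unknown): user is User {
  // typeof null === "object", so null must be excluded explicitly.
  if (typeof user !== "object" || user === null) return false;
  try {
    // Destructuring can invoke getters or Proxy traps that throw.
    const { id, email, role } = user as Record<string, unknown>;
    return (
      typeof id === "string" &&
      id.length > 0 &&
      typeof email === "string" &&
      EMAIL_RE.test(email) &&
      typeof role === "string" &&
      ROLES.includes(role)
    );
  } catch {
    // Any exception means the value is not a trustworthy User.
    return false;
  }
}
```

Every condition maps back to one clause of the spec, which is exactly what makes the output checkable line by line.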

Mental model: Your spec is a contract. If the spec is ambiguous, the implementation will be too — and you won't know which interpretation you got.

Why it matters: When you write the spec first, you are doing the hard thinking before the code exists. That's exactly what senior engineers do: they don't start coding until they understand what "correct" means.

Common mistake: Starting with code and reverse-engineering a spec afterward. At that point you're validating what the AI did, not what you actually needed.


🔍 Review AI output like a PR

Every piece of AI-generated code is an unreviewed pull request from someone you've never met. Apply the full D5 toolkit — the same way you would for a human colleague.

// AI-generated: "a safe divide function"
function safeDivideDraft(a, b) {
  if (b === 0) return null;
  return a / b;
}

// Looks fine. But review it like a PR:
//
// CORRECTNESS: What does null mean to the caller?
//   safeDivideDraft(10, 0) returns null, and null is falsy in JS.
//   So is 0. A caller doing `if (safeDivideDraft(x, y)) doThing()`
//   skips doThing() on division by zero AND on a legitimate result
//   of 0 (e.g., 0 / 5). A truthiness check cannot tell
//   "division by zero" apart from "the result is zero."
//
// FIX: Make the failure impossible to mistake for a value:
// throw (below) or return a discriminated result like { ok: false, error }.
function safeDivide(a, b) {
  if (b === 0) throw new RangeError('Division by zero');
  return a / b;
}
// Now the caller must handle the error explicitly.
// The ambiguity is gone, and the behavior is testable.

// TESTS the AI should also have written (Vitest/Jest style):
// expect(safeDivide(10, 2)).toBe(5);
// expect(safeDivide(0, 5)).toBe(0);       // result IS zero, not an error
// expect(() => safeDivide(10, 0)).toThrow(RangeError);
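The FIX comment mentions a discriminated result as the alternative to throwing. A minimal TypeScript sketch of that option follows; `DivideResult`, `safeDivideResult`, and `formatDivision` are hypothetical names chosen for illustration.

```typescript
// The `ok` tag tells the caller which branch it has, so an error
// can never be mistaken for a legitimate value such as 0.
type DivideResult =
  | { ok: true; value: number }
  | { ok: false; error: "division-by-zero" };

function safeDivideResult(a: number, b: number): DivideResult {
  // b === 0 is also true for -0, so both signed zeros are rejected.
  if (b === 0) return { ok: false, error: "division-by-zero" };
  return { ok: true, value: a / b };
}

function formatDivision(a: number, b: number): string {
  const result = safeDivideResult(a, b);
  // Until `ok` is checked, TypeScript rejects any access to `result.value`.
  if (!result.ok) return `cannot divide: ${result.error}`;
  return `result: ${result.value}`;
}
```

The design trade-off: throwing forces the caller to deal with the error at runtime, while a result type makes the compiler enforce it, because `value` only exists on the `ok: true` branch.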

Walk through each lens from the track:

  • Correctness — does it handle every case in your spec, including edge cases and empty inputs?
  • Security — does it trust user input? Does it expose data it shouldn't? Does it use a deprecated or unsafe API?
  • Tests — are there tests? Do the tests actually cover the behavior, or just the happy path?
  • Edge cases — null, empty string, zero, negative numbers, concurrent calls, network failure.

Mental model: AI wrote the first draft. You write the second draft — in your head, by reading it as if someone else's career depended on you catching the bugs.

Why it matters: AI's wrong answers look just as confident as its right ones. The only defense is a systematic review, not a vibe.

Common mistake: Running the code once, seeing it work for the obvious input, and shipping it.


🗂️ Decompose the task; give AI context

AI does its best work on well-scoped, self-contained problems. A prompt like "build me a checkout flow" produces a sketch. A prompt like "write the validateCartItems function that checks stock levels against this existing inventory type" produces something reviewable. Decomposition is the skill of breaking a large problem into AI-sized pieces — each one small enough that you can verify the output fully before wiring it in.

--- BIG TASK (too large to verify) ---
"Build a user authentication system."

--- DECOMPOSED (each piece is verifiable) ---
Step 1: "Write a hashPassword(plain: string): Promise<string> function
         using bcrypt with 12 rounds. No other logic."

Step 2: "Write verifyPassword(plain: string, hash: string): Promise<boolean>
         using the same bcrypt library. Input: one plain-text string,
         one existing hash from hashPassword."

Step 3: "Write createSession(userId: string): Session using this Session
         type: { id: string; userId: string; expiresAt: Date }
         where expiresAt is 24 hours from now. No DB calls — pure function."

Step 4: "Write the tests for each of the three functions above,
         using Vitest. Mock nothing — these are unit tests of pure logic."
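One thing a review of Step 3 should catch: a function that reads the current time and generates a random id is not strictly pure, so its output can't be asserted exactly in a test. A minimal sketch under that observation, assuming Node.js (for `randomUUID` from `node:crypto`). The optional `now` and `newId` parameters are an addition to the prompt's signature, not part of it.

```typescript
import { randomUUID } from "node:crypto";

// Session type copied from the Step 3 prompt.
type Session = { id: string; userId: string; expiresAt: Date };

const SESSION_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

// Hypothetical injectable dependencies: tests pass a fixed clock and id;
// the defaults read the real clock and use a cryptographically random UUID.
function createSession(
  userId: string,
  now: () => Date = () => new Date(),
  newId: () => string = randomUUID,
): Session {
  const issuedAt = now();
  return {
    id: newId(),
    userId,
    expiresAt: new Date(issuedAt.getTime() + SESSION_TTL_MS),
  };
}
```

Callers still write `createSession(userId)` as the prompt specifies, while tests can pin the clock and id and assert the 24-hour expiry exactly.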

At each step you review and approve before moving to the next. If Step 1 is wrong, you fix it before Step 2 depends on it. This is how senior engineers prevent bugs from compounding.

Mental model: Think of each decomposed step as a commit. Small commits are reviewable. One giant commit is not.

Why it matters: The larger the AI's task, the larger the surface area for subtle errors, and the harder it is to test and reason about the output.

Common mistake: Handing AI the whole task because it feels faster. It often is faster — until the integration-level bug surfaces at 11pm the night before launch.


🚫 Know when NOT to use AI

There is one situation where AI assistance is actively harmful: when you don't understand the domain well enough to verify the output. If you can't recognize a wrong answer, you have no defense against a confident wrong answer. Using AI in this state doesn't speed you up — it produces technical debt you can't audit.

Mental model: AI amplifies your ability to execute a known solution. It cannot replace your ability to recognize a correct one. If you couldn't write this code yourself in twice the time, you probably can't review it either.

Why it matters: You are accountable for every line you ship, no matter who or what wrote it. "The AI did it" is not a defense in a security audit, a production incident review, or a code review with your team.

Common mistake: Using AI to bootstrap code in a domain you're actively learning, then treating the output as authoritative. The right move: learn the domain first (or pair with a human who knows it), then use AI to accelerate once you can verify.


🛠️ Your mission

Take a real task from something you're building — or use this one: write a function that sanitizes a user-supplied filename before saving it to disk (strip path traversal characters, limit length, allow only alphanumeric plus hyphens and dots).

  1. Write the spec first. Before you open a chat, write out exactly what the function must do, must not do, and what edge cases you expect. Use the precise-spec format from the section above.

  2. Prompt AI with your spec. Ask it to implement the function AND write the tests in the same response.

  3. Review the output against the Code Review Rubric. Go through every lens: correctness, security, tests, edge cases, readability. Does it handle ../../etc/passwd? An empty string? A filename that is only dots?

  4. Find at least one thing to fix. There will be something — a missing edge case, a test that only tests the happy path, a return type that could be cleaner. Fix it yourself or write a follow-up prompt that is precise enough to get a correct fix.

  5. Run the tests. All of them. If any fail, that's information — not failure. Investigate, fix, re-run.

  6. Check off the Pre-Ship Checklist before you call it done.
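After you have done the mission yourself, you can compare against the sketch below. It is one possible reading of the spec, and each choice is an assumption worth challenging in your review: disallowed characters are removed rather than replaced, leading dots are stripped, the length cap is 255 (a common filesystem limit, assumed here), and an input with nothing safe left throws a `RangeError` instead of returning a sentinel value.

```typescript
// Assumption: 255 characters, a common maximum filename length.
const MAX_FILENAME_LENGTH = 255;

function sanitizeFilename(input: string): string {
  const cleaned = input
    // 1. Allow-list: keep only ASCII letters, digits, hyphens, and dots.
    //    Slashes and backslashes are removed, so no path separators survive.
    .replace(/[^A-Za-z0-9.-]/g, "")
    // 2. Strip leading dots: rules out ".", "..", and hidden dotfiles.
    .replace(/^\.+/, "")
    // 3. Enforce the length cap (this can cut off an extension).
    .slice(0, MAX_FILENAME_LENGTH);

  // Nothing safe left (e.g. "", "...", "///"): fail loudly, not with a sentinel.
  if (cleaned.length === 0) {
    throw new RangeError(`Unsafe filename: ${JSON.stringify(input)}`);
  }
  return cleaned;
}
```

Notice what a review should still flag: truncating to 255 characters can cut off a file extension, and names like `CON` remain reserved on Windows. Whether either matters is a spec decision, not something the code can decide for you.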


✅ You're done when…

  • You have written a tight spec before generating any code, and you can explain why every constraint in it matters
  • You have reviewed the AI output against the Code Review Rubric and documented at least one finding
  • You have checked every item on the Pre-Ship Checklist before declaring the function shippable
  • The tests cover the happy path, at least two edge cases, and one security-relevant input
  • You can explain, in plain language, what every generated line does — no "I think this is fine"

➡️ Next: the Capstone — Messy App → Tested, Clean & Documented. Build It Right, Or Don't Build It At All. 🏛️
