Performance & Load Testing
Stage 3 · DevOps, Deployment & Operations · B.U.I.L.D. letter: D
You shipped. Users came. Then at 9:47 AM on a Tuesday your server turned into a paperweight. Not because of a bug — because of ten users clicking at the same time. Performance problems don't knock. They arrive all at once, at the worst possible moment. This lesson is how you find them first.
⚠️ The vibe trap
Vibe coding gets you to a working product fast — and that's genuinely worth celebrating. The trap is that "works on my machine with one tab open" is not the same as "works for 500 people who all hit submit at once." Developers guess at bottlenecks all the time: they add caching to the wrong query, rewrite a function that wasn't slow, and ignore the one database call that takes 800 ms every single time. Guessing wastes days and ships regressions. The professional move is to measure first, then fix exactly the thing the numbers point to — nothing else.
🔬 Section 1 — Measure Before You Touch Anything
Before you change a single line, you need to know where the time goes. Every Node/Express app can get basic timing with almost zero code.
// middleware/timing.js — drop this near the top of your Express app
export function timingMiddleware(req, res, next) {
  const start = process.hrtime.bigint();
  res.on('finish', () => {
    const durationMs = Number(process.hrtime.bigint() - start) / 1_000_000;
    const slow = durationMs > 500;
    console[slow ? 'warn' : 'info'](
      `[${req.method}] ${req.path} → ${res.statusCode} — ${durationMs.toFixed(2)} ms${slow ? ' ⚠️ SLOW' : ''}`
    );
  });
  next();
}
Mental model: Every request is a pipeline. The timing middleware wraps the whole pipeline and tells you the total cost. Once you see a route taking 600 ms you have a target. Without this you are flying blind.
Why it matters: Logging at the route level costs essentially nothing at runtime and gives you a permanent production record of which endpoints are slow. You can query your logs (D7) later and see the worst offenders ranked.
Common mistake: Only measuring in development. Dev boxes have no concurrent users, warm caches, and fast disks. Always check timing in your staging environment under realistic conditions, not localhost.
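Once the middleware has been logging for a while, you can rank routes yourself. The sketch below assumes log lines in exactly the format timingMiddleware prints, and rankRoutesByMedian is a made-up helper name, not part of any library. A real log aggregator has its own query language, so treat this as the idea rather than the tool.

```javascript
// Matches "[METHOD] /path <arrow> STATUS <dash> DURATION ms".
// \S+ skips the arrow and dash separators without hard-coding them.
const LINE_PATTERN = /\[(\w+)\] (\S+) \S+ (\d{3}) \S+ ([\d.]+) ms/;

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical helper: returns [{ route, count, medianMs }], slowest route first.
function rankRoutesByMedian(logLines) {
  const durationsByRoute = new Map();
  for (const line of logLines) {
    const match = LINE_PATTERN.exec(line);
    if (!match) continue; // skip lines written by other loggers
    const route = `${match[1]} ${match[2]}`;
    const durations = durationsByRoute.get(route) ?? [];
    durations.push(Number(match[4]));
    durationsByRoute.set(route, durations);
  }
  return [...durationsByRoute.entries()]
    .map(([route, durations]) => ({
      route,
      count: durations.length,
      medianMs: median(durations),
    }))
    .sort((a, b) => b.medianMs - a.medianMs);
}

// Made-up sample lines in the middleware's format.
const sampleLines = [
  '[GET] /api/lessons \u2192 200 \u2014 612.40 ms',
  '[GET] /api/lessons \u2192 200 \u2014 95.10 ms',
  '[GET] / \u2192 200 \u2014 21.75 ms',
];
console.log(rankRoutesByMedian(sampleLines).slice(0, 3));
```

Sorting by median rather than by the single slowest request keeps one freak outlier from sending you after the wrong route.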
🐢 Section 2 — Common Backend Bottlenecks (and How to Spot Them)
Once timing tells you which endpoint is slow, you need to know why. Almost every backend slowdown falls into one of four buckets.
1. Slow or N+1 queries (see D2 — Databases)
An N+1 happens when you fetch a list of N records and then run one extra query per record to get related data. Fetching 100 posts and then querying the author for each one costs 101 queries. It feels fast with 5 rows in dev; it crawls with 10,000 in prod.
// Both snippets assume db.query resolves to an array of rows
// (with node-postgres, read result.rows instead).

// BAD: N+1, one extra query per post to get its author
const posts = await db.query('SELECT * FROM posts');
for (const post of posts) {
  const [author] = await db.query(
    'SELECT name FROM users WHERE id = $1',
    [post.author_id] // runs once per post 🐌
  );
  post.author = author;
}

// GOOD: single JOIN, one round-trip regardless of post count
const postsWithAuthors = await db.query(`
  SELECT p.*, u.name AS author_name
  FROM posts p
  JOIN users u ON u.id = p.author_id
`);
2. Missing cache for expensive reads (see D2 — Caching)
If a query is slow but its result is the same for every user for the next 60 seconds, compute it once and serve the cached copy. A Redis GET/SET costs ~0.5 ms; a complex aggregation query can cost 400 ms.
3. Blocking work on the request thread (see D1 — Jobs & Queues)
Sending an email, resizing an image, or calling a slow third-party API inside a request handler holds that connection open until the work finishes. CPU-heavy work such as image resizing also blocks Node's single-threaded event loop, which stalls every other user's request. Push slow work to a background job queue and return 202 Accepted immediately.
4. Oversized payloads
Returning SELECT * when the client needs three columns. Sending 500 records when the UI shows 20. Every extra byte travels the wire and gets parsed. Add LIMIT, OFFSET, and field selection to your queries, and enable gzip compression in your HTTP layer.
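Bucket 2 comes down to "compute once, reuse until it goes stale". Here is a minimal in-memory sketch of that idea, standing in for Redis GET/SET with an expiry. createTtlCache and getOrCompute are hypothetical names, and a single-process Map is an assumption that stops working once you run more than one server instance.

```javascript
// Minimal TTL cache. `now` is injectable so the expiry logic can be tested.
function createTtlCache(ttlMs, now = () => Date.now()) {
  const entries = new Map(); // key -> { value, expiresAt }

  return {
    // Returns the cached value while it is fresh; otherwise calls compute(),
    // stores whatever it returns (a value or a Promise) for ttlMs, and returns it.
    getOrCompute(key, compute) {
      const hit = entries.get(key);
      if (hit && hit.expiresAt > now()) return hit.value;

      const value = compute();
      const entry = { value, expiresAt: now() + ttlMs };
      entries.set(key, entry);

      // If compute() returned a Promise that rejects, evict the entry so the
      // failure is not served from cache for the whole TTL.
      Promise.resolve(value).catch(() => {
        if (entries.get(key) === entry) entries.delete(key);
      });
      return value;
    },
  };
}

// Usage inside a route handler (db and the SQL are placeholders):
//   const statsCache = createTtlCache(60_000);
//   const stats = await statsCache.getOrCompute('lesson-stats', () => db.query('SELECT ...'));
```

Because the Promise itself is cached, concurrent requests for the same key share one in-flight query instead of each firing their own.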
Common mistake: Fixing bucket #3 when the real killer is bucket #1. Profile first. The query that runs 200 times per request is almost always the culprit.
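One cheap way to profile for an N+1 is to count queries per request. The wrapper below is a sketch: countQueries is a hypothetical helper, and it assumes your database client exposes a query(sql, params) method like the one in the examples above. The fake client exists only so the sketch runs without a database.

```javascript
// Wraps any client exposing query(sql, params) and counts how often it is called.
function countQueries(db) {
  let count = 0;
  return {
    query(sql, params) {
      count += 1;
      return db.query(sql, params); // pass the call through unchanged
    },
    get count() {
      return count;
    },
  };
}

// Fake in-memory client: 100 posts, and a one-row answer for everything else.
const fakeDb = {
  async query(sql) {
    if (sql.includes('FROM posts')) {
      return Array.from({ length: 100 }, (_, i) => ({ id: i, author_id: i % 7 }));
    }
    return [{ name: 'someone' }];
  },
};

// The N+1 pattern from bucket 1, run against the counting wrapper.
async function loadPostsTheSlowWay(db) {
  const posts = await db.query('SELECT * FROM posts');
  for (const post of posts) {
    const rows = await db.query('SELECT name FROM users WHERE id = $1', [post.author_id]);
    post.author = rows[0];
  }
  return posts;
}

const counted = countQueries(fakeDb);
loadPostsTheSlowWay(counted).then(() => {
  console.log(`queries for one request: ${counted.count}`);
});
```

If one request reports dozens or hundreds of queries, you have found bucket 1 without guessing.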
🧪 Section 3 — Load Testing with k6
Once you know your endpoints are individually fast, you need to know what happens when 200 people use them at the same time. That is load testing. k6 is an open-source tool that lets you script realistic user flows in JavaScript and hammer your server from the command line.
// load-tests/homepage-and-login.js
import http from 'k6/http';
import { check, sleep } from 'k6';

// Test configuration: ramp up to 50 virtual users over 30 s,
// hold for 1 minute, then ramp back down.
export const options = {
  stages: [
    { duration: '30s', target: 50 }, // ramp-up
    { duration: '1m', target: 50 },  // sustained load
    { duration: '15s', target: 0 },  // ramp-down
  ],
  thresholds: {
    // FAIL the test if median response > 500 ms, p(95) > 1500 ms,
    // or error rate > 1 %
    http_req_duration: ['p(50)<500', 'p(95)<1500'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  // Simulate a user loading the home page
  const home = http.get('https://staging.hyvecares.org/');
  check(home, { 'home 200': (r) => r.status === 200 });
  sleep(1); // 1-second think time between requests (realistic!)

  // Simulate hitting the most-used API endpoint
  const lessons = http.get('https://staging.hyvecares.org/api/lessons?track=stage-1');
  check(lessons, {
    'lessons 200': (r) => r.status === 200,
    'lessons < 800ms': (r) => r.timings.duration < 800,
  });
  sleep(2);
}
Run it:
# Install k6 once
brew install k6 # macOS (Homebrew)
# Ubuntu/Debian/WSL: add the k6 apt repository first (see the official install docs),
# then: sudo apt install k6
# Point at staging, never production, for your first real test
k6 run load-tests/homepage-and-login.js
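Mental model: Each "virtual user" (VU) runs your default function in a loop, independently of the others (k6 is written in Go and gives each VU its own goroutine). One loop of this script sleeps for 3 seconds and sends two requests, so 50 VUs complete roughly 16 iterations per second, which is at most about 33 requests per second. Response times and the ramp stages pull the real average lower. That is a real small-to-medium load for a web app.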
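The VU arithmetic is easy to get wrong, so here it is as a small calculation. estimateLoad is a hypothetical helper, and it assumes every VU spends its whole loop either sleeping or waiting on a response, with no ramp stages.

```javascript
// Rough steady-state load from a looping k6 script.
// All parameter names are illustrative, not k6 options.
function estimateLoad({ vus, requestsPerIteration, sleepSeconds, avgResponseSeconds = 0 }) {
  // One loop takes the think time plus the time spent waiting on each response.
  const iterationSeconds = sleepSeconds + requestsPerIteration * avgResponseSeconds;
  const iterationsPerSecond = vus / iterationSeconds;
  return {
    iterationsPerSecond,
    requestsPerSecond: iterationsPerSecond * requestsPerIteration,
  };
}

// The script above: 50 VUs, two requests and 3 s of sleep per loop.
const upperBound = estimateLoad({ vus: 50, requestsPerIteration: 2, sleepSeconds: 3 });
console.log(upperBound.iterationsPerSecond.toFixed(1)); // 16.7 loops per second
console.log(upperBound.requestsPerSecond.toFixed(1));   // 33.3 requests per second

// Same script if each response takes about 300 ms.
const withLatency = estimateLoad({
  vus: 50,
  requestsPerIteration: 2,
  sleepSeconds: 3,
  avgResponseSeconds: 0.3,
});
console.log(withLatency.requestsPerSecond.toFixed(1)); // 27.8 requests per second
```

The takeaway: slower responses lower the request rate a fixed number of VUs can generate, so a drop in throughput during a test can itself be a symptom of a struggling server.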
Why it matters: A single request passing in ~200 ms tells you nothing about what happens under concurrency. Database connection pools saturate, memory climbs, garbage collection pauses accumulate. Load testing is how you discover your actual breaking point before users do.
Common mistake: Running load tests against production. Always use a staging environment (D1 — Environments) that mirrors prod. A load test is intentionally abusive; you do not want to abuse real users.
📊 Section 4 — Reading Your Load Test Results
k6 prints a summary after every run. Here is how to read it without guessing.
          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io
execution: local
script: load-tests/homepage-and-login.js
output: -
scenarios: (100.00%) 1 scenario, 50 max VUs, 2m15s max duration
✓ home 200
✓ lessons 200
✗ lessons < 800ms
↳ 72% — ✓ 1188 / ✗ 462
checks.........................: 88.52% ✓ 3564 ✗ 462
data_received..................: 18 MB 133 kB/s
data_sent......................: 2.1 MB 16 kB/s
http_req_blocked...............: avg=1.2ms p(95)=3.1ms
http_req_duration..............: avg=312ms p(95)=1.22s p(99)=2.41s
{ expected_response:true }...: avg=298ms p(95)=1.17s
http_req_failed................: 0.00% ✓ 0 ✗ 1650
http_reqs......................: 1650 12.5/s
vus............................: 50 min=50 max=50
vus_max........................: 50
What each number means:
| Metric | What it tells you |
|---|---|
| http_req_duration p(95) | 95 % of requests finished in ≤ this time. The most important single number, because outliers hurt real users. |
| http_req_duration p(99) | 99 % of requests finished in ≤ this time, so the slowest 1 in 100 waited at least this long. If this is 5 s, someone is waiting 5 s. |
| http_reqs (12.5/s) | Throughput: how many requests per second your server handled. |
| http_req_failed | Any non-2xx/3xx response. Even 0.5 % failure under load is a production incident waiting to happen. |
| checks ✗ 462 | Your custom assertions that failed. Here, 28 % of /api/lessons calls took over 800 ms under 50 VUs. That is your bottleneck. |
Common mistake: Only looking at average duration. Averages hide pain. A p(95) of 1.2 s while the average is 312 ms means a large chunk of users are experiencing a slow app even though the average looks fine. Always look at percentiles.
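To see how an average hides the slow tail, here is a tiny worked example. The sample durations are made up, and percentile uses the simple nearest-rank method; k6 may interpolate between samples, so expect small differences from its output.

```javascript
// Nearest-rank percentile: the smallest sample with at least p % of samples at or below it.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

function mean(values) {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Made-up sample: 90 fast requests at 200 ms and 10 slow ones at 1300 ms.
const durationsMs = [...Array(90).fill(200), ...Array(10).fill(1300)];

console.log(mean(durationsMs));           // 310
console.log(percentile(durationsMs, 50)); // 200
console.log(percentile(durationsMs, 95)); // 1300
```

The average of 310 ms looks healthy and the median says most users are fine, yet one request in ten takes 1.3 s. Only the p(95) shows it.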
📐 Section 5 — Capacity Planning Basics
After a load test you know your breaking point. Capacity planning is the practice of making sure you have enough headroom above that point for normal traffic spikes — and knowing what to do when you need more.
The three questions:
- What is my current max throughput? (From load test: requests/second before errors climb above 1 %.)
- What is my expected peak traffic? (Check your analytics — D8 Metrics. Black Friday? After a tweet goes viral? Double your worst real day and add 50 %.)
- How do I scale if I hit the ceiling? (Horizontal scaling: add more server instances. Vertical scaling: bigger instance. Connection pool size: more DB connections. CDN: offload static assets entirely.)
A simple rule of thumb: Your load test should push to at least 2× your expected peak before you call it production-ready. If you expect 100 concurrent users at peak, your system should handle 200 without error rates above 1 %.
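The three questions and the 2× rule can be turned into a quick calculation. This is a sketch under one assumption: "double your worst real day and add 50 %" is read as worst day × 2 × 1.5. planCapacity is a hypothetical helper, and every input must use the same unit (requests/second or concurrent users).

```javascript
// Turns the rules of thumb above into numbers.
function planCapacity({ worstObservedPeak, measuredMax }) {
  const expectedPeak = worstObservedPeak * 2 * 1.5; // double it, then add 50 %
  const loadTestTarget = expectedPeak * 2;          // test to 2x the expected peak
  return {
    expectedPeak,
    loadTestTarget,
    hasHeadroom: measuredMax >= loadTestTarget,
  };
}

// Example with made-up numbers: worst real day peaked at 40 req/s,
// and the load test held up to 200 req/s before errors passed 1 %.
console.log(planCapacity({ worstObservedPeak: 40, measuredMax: 200 }));
// expectedPeak: 120, loadTestTarget: 240, hasHeadroom: false
```

In this example the system survives the expected peak but not the 2× target, which is the signal to look at the scaling options in question three.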
# Quick check: how many Node processes are you running?
# If it's 1, you're leaving most of your CPU idle on multi-core hardware.
# PM2 cluster mode uses all cores automatically.
pm2 start dist/server.js -i max # spawns one process per CPU core
pm2 status # confirm all instances are online
Common mistake: Treating capacity planning as a one-time activity. Traffic patterns change. Run a load test every time you make a significant backend change or before any anticipated traffic spike (a launch, a marketing campaign, a Product Hunt post).
🖼️ A Quick Word on Frontend Performance
Frontend performance is real and important: bundle size, image weight, and render-blocking scripts directly affect how fast users see your page. Tools like Lighthouse, a bundle analyzer (for Next.js, the @next/bundle-analyzer plugin), and WebP image formats belong in your toolkit. But they live in a different problem space than backend/ops performance. A frontend that loads in 1.2 s instead of 3 s is a win. A backend that falls over at 50 concurrent users is a service outage. Both matter; ops is just more binary. Keep your images in WebP, lazy-load below-the-fold content, and put static assets behind a CDN. Then spend the rest of your performance budget on the backend, where the load testing numbers actually live.
🛠️ Your Mission
1. Add the timingMiddleware to your app and deploy it to staging.
2. Watch the logs for 5 minutes. Identify the three slowest endpoints by median duration.
3. For your slowest endpoint, trace why it is slow: is it a query, a missing cache, a blocking call, or a fat payload?
4. Write a k6 load test that targets your two most-used API routes.
5. Run the test at 25 VUs for 1 minute. Record your p(95) latency and your max throughput before errors appear.
6. Fix the bottleneck you found in step 3, re-run the load test, and compare the before/after numbers.
Document your findings in a short PERF.md in the project root: what was slow, why, what you changed, and the before/after p(95).
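For the before/after numbers, k6 can hand you the summary data at the end of a run through a handleSummary function. The sketch below pulls out the headline figures. buildPerfSummary and the perf-summary.json filename are made up, and the metric field names follow k6's end-of-test summary data as I understand it, so verify them against your k6 version. Note that defining handleSummary replaces the default printed summary unless you also return a stdout entry.

```javascript
// Hypothetical helper: picks the headline numbers out of k6's end-of-test data.
function buildPerfSummary(data) {
  const duration = data.metrics.http_req_duration.values;
  const summary = {
    p95Ms: duration['p(95)'],
    medianMs: duration.med,
    errorRate: data.metrics.http_req_failed.values.rate,
    requestsPerSecond: data.metrics.http_reqs.values.rate,
  };
  // k6 writes each key of the returned object to a file with that name.
  return { 'perf-summary.json': JSON.stringify(summary, null, 2) };
}

// In the k6 script itself (runs under k6, not Node):
//   export function handleSummary(data) {
//     return buildPerfSummary(data);
//   }
```

Run the test once before your fix and once after, keep both JSON files, and paste the two p(95) values into PERF.md.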
✅ You're done when…
- Your app passes the Production-Readiness Checklist for performance: p(95) latency under 1 s at 2× expected peak load, error rate under 1 % at that load, and at least one slow endpoint traced to its root cause and fixed
- You have a k6 load test script committed to your repo that can be re-run against staging on demand
- Your timing middleware is deployed and producing logs that show per-route durations in your log aggregator (D7)
- You can explain the difference between p(50), p(95), and p(99) latency to a teammate without looking it up
- You have identified and documented at least one N+1 query, missing cache, or blocking-call bottleneck in your own codebase
➡️ Next: Incidents & On-Call.
Build It Right, Or Don't Build It At All. 🏛️