Indexes & Query Performance
Stage 3 · Data & Databases · B.U.I.L.D. letter: D
You shipped a beautiful app. Your demo flew. Three months later a user clicks "Load Posts" and waits eight seconds. The code didn't change — the data did. This lesson teaches you to see what the database actually does with your queries, and how to make it fast before your users leave.
⚠️ The vibe trap
With 10 rows every query feels instant — the database is just scanning a tiny list. Hit 10 000 rows and that same scan reads every single one before returning one result. Worse, ORMs make it trivially easy to write a loop that fires a separate query for every row you just fetched: the N+1 problem. The code looks clean, the tests pass, and production quietly dies. Vibe coding got you a working product; this lesson gets it production-ready.
📖 Section 1 — What a Full Table Scan Actually Costs
When you run SELECT * FROM posts WHERE user_id = 42 with no index, the database engine reads every row in the table from disk, checks the user_id column, and discards the ones that don't match. This is called a full table scan (or sequential scan).
Mental model: Imagine a library with no catalog. To find every book by a specific author you pull every book off every shelf, read the spine, and put it back. Ten books — fine. Ten million books — you need a new job.
Why it matters at scale:
| Rows in table | Rows scanned | Rows returned | Work done |
|---|---|---|---|
| 100 | 100 | 3 | trivial |
| 10 000 | 10 000 | 3 | slow |
| 1 000 000 | 1 000 000 | 3 | painful |
The work grows linearly with table size — O(n). An index makes it O(log n) or O(1) depending on type.
Worked example — seeing the scan:
-- A simple query that looks innocent
SELECT id, title, created_at
FROM posts
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 20;
Before you add any index, run EXPLAIN to see what the planner will do:
EXPLAIN
SELECT id, title, created_at
FROM posts
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 20;
Output (PostgreSQL):
Seq Scan on posts (cost=0.00..2541.00 rows=3 width=56)
Filter: (user_id = 42)
Seq Scan = full table scan. cost=0.00..2541.00 is the planner's estimate of work units. The higher the second number, the more expensive.
Common mistake: Assuming the query is fast because it returns few rows. The cost is in the scan, not the result size.
🗂️ Section 2 — What an Index Is (and What It Costs)
A database index is a separate data structure — almost always a B-tree — that the engine maintains alongside your table. It stores the indexed column values in sorted order, each pointing to the row's physical location. To find all rows where user_id = 42, the engine walks the tree in O(log n) steps instead of scanning every row.
Mental model: The index at the back of a textbook. Instead of reading every page to find every mention of "normalization", you look up the word alphabetically and jump straight to the page numbers listed. The index takes up extra pages (write cost), but lookup is instant.
Creating an index:
-- Basic single-column index
CREATE INDEX idx_posts_user_id ON posts (user_id);
-- Composite index for the WHERE + ORDER BY combination above
-- Column order matters: put the equality filter first, the sort column second
CREATE INDEX idx_posts_user_id_created ON posts (user_id, created_at DESC);
EXPLAIN after the index:
EXPLAIN
SELECT id, title, created_at
FROM posts
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 20;
Output:
Index Scan using idx_posts_user_id_created on posts (cost=0.42..18.63 rows=3 width=56)
Index Cond: (user_id = 42)
Cost dropped from 2541 to 18. That is a ~140x improvement — and it only gets better as the table grows.
The write cost: Every INSERT, UPDATE, or DELETE now has to update the index tree as well as the table. For a table with 10 indexes, writes are roughly 10x more expensive than a table with no indexes. Don't index every column — index the columns your reads need.
Common mistake: Adding an index to a column you filter with LIKE '%term%' (leading wildcard). The B-tree can't use prefix wildcards. You need a full-text index (CREATE INDEX ... USING gin) for that.
🎯 Section 3 — Which Columns to Index
The rule of thumb: index any column that appears in a WHERE, JOIN ON, or ORDER BY clause — especially on large tables and especially on foreign keys.
Foreign keys are the most commonly forgotten index. ORMs create the constraint but not the index. Every JOIN posts ON posts.user_id = users.id scans the entire posts table unless posts.user_id is indexed.
Decision checklist:
| Column usage | Index? |
|---|---|
WHERE col = ? (high cardinality, e.g. user_id) | Yes |
WHERE col = ? (low cardinality, e.g. boolean) | Usually no — full scan may be faster |
JOIN ON col (foreign key) | Yes, always |
ORDER BY col | Yes if combined with a filtered query |
SELECT col only, never filtered | No |
Worked example — foreign key index:
-- posts.user_id is a foreign key to users.id
-- Without this index, every join scans the entire posts table
CREATE INDEX idx_posts_user_id ON posts (user_id);
-- comments.post_id is a foreign key to posts.id
CREATE INDEX idx_comments_post_id ON comments (post_id);
EXPLAIN ANALYZE (runs the query and shows actual times, not just estimates):
EXPLAIN ANALYZE
SELECT users.username, COUNT(posts.id) AS post_count
FROM users
JOIN posts ON posts.user_id = users.id
GROUP BY users.id;
Without idx_posts_user_id:
Hash Join (cost=... actual time=412.3..819.7 rows=500 ...)
-> Seq Scan on posts (actual time=0.02..310.4 ...)
After idx_posts_user_id:
Hash Join (cost=... actual time=18.1..24.9 rows=500 ...)
-> Index Scan using idx_posts_user_id on posts (actual time=0.01..9.2 ...)
Common mistake: Creating a composite index in the wrong column order. CREATE INDEX ON posts (created_at, user_id) will NOT efficiently serve WHERE user_id = 42 ORDER BY created_at. The equality-filter column must come first.
🔁 Section 4 — The N+1 Problem
This is the most common performance killer in apps built with ORMs or written with a "query per record" loop. It looks like this:
The broken pattern (N+1 queries):
// Fetch 100 posts — 1 query
const posts = await db.query('SELECT id, title, user_id FROM posts LIMIT 100');
// Now get the author for each post — 100 MORE queries!
for (const post of posts) {
const author = await db.query(
'SELECT username FROM users WHERE id = $1',
[post.user_id]
);
post.author = author.rows[0];
}
// Total: 101 queries. With 1000 posts: 1001 queries.
Every iteration fires a round trip to the database. With 100 posts that's 101 queries. With 10 000 posts it's 10 001 queries. Each query has network latency + parse cost + execution cost. This is the N+1 problem: 1 query to get N rows, then N more queries to hydrate them.
The fix — a single JOIN:
-- One query does the work of 101
SELECT
posts.id,
posts.title,
users.username AS author_username
FROM posts
JOIN users ON users.id = posts.user_id
LIMIT 100;
// In code
const posts = await db.query(`
SELECT
posts.id,
posts.title,
users.username AS author_username
FROM posts
JOIN users ON users.id = posts.user_id
LIMIT 100
`);
// posts.rows already has author_username on every object.
// Total: 1 query. Always.
The alternative fix — batched IN query (when a JOIN isn't clean):
// Fetch posts
const posts = await db.query('SELECT id, title, user_id FROM posts LIMIT 100');
// Collect all the user IDs we need — no duplicates
const userIds = [...new Set(posts.rows.map(p => p.user_id))];
// One query for ALL authors
const authors = await db.query(
'SELECT id, username FROM users WHERE id = ANY($1)',
[userIds]
);
// Build a lookup map — O(1) access
const authorMap = Object.fromEntries(authors.rows.map(u => [u.id, u]));
// Attach authors in memory — no DB round trip
for (const post of posts.rows) {
post.author = authorMap[post.user_id];
}
// Total: 2 queries, regardless of how many posts.
When to use which:
- JOIN: when you need a flat result set and the relationship is simple.
- Batched IN: when you've already fetched data and need to hydrate it, or when your ORM makes joins awkward (but fix the ORM, long-term).
Mental model: Think of the database as a restaurant kitchen. N+1 is running back to the kitchen 101 times for one plate at a time. The JOIN is handing the kitchen one order that comes back complete. Even if the kitchen is fast, 101 trips is always slower than 1.
Common mistake: Using .map() with async/await and thinking it's parallel:
// This still fires 100 separate queries — they just run concurrently
// which is better than sequential but still not as good as a single JOIN
const posts = await Promise.all(
postIds.map(id => db.query('SELECT ... WHERE id = $1', [id]))
);
// Use a JOIN or IN query instead
📊 Section 5 — Reading EXPLAIN Like a Pro
EXPLAIN is your X-ray vision into query execution. EXPLAIN ANALYZE actually runs the query and adds real timing. Learn to read them fluently.
Key terms to look for:
| Term | What it means | Good or bad? |
|---|---|---|
Seq Scan | Full table scan | Bad on large tables |
Index Scan | Used an index, fetches matching rows | Good |
Index Only Scan | All needed columns are in the index itself | Best |
Bitmap Heap Scan | Combines index lookup + selective row fetch | Good |
Hash Join | Joins two sets by building a hash table | Good for large joins |
Nested Loop | For each row in outer, scan inner | Bad if inner is large |
cost=X..Y | Estimated work: X = startup, Y = total | Lower Y is better |
actual time=X..Y | Real milliseconds (ANALYZE only) | Lower is better |
rows=N | Planner estimate vs actual rows=N | Big mismatch = run ANALYZE |
Full worked example — before and after:
-- Table: comments (500 000 rows)
-- No index on post_id yet
EXPLAIN ANALYZE
SELECT body, created_at
FROM comments
WHERE post_id = 1234
ORDER BY created_at DESC
LIMIT 10;
Before index:
Limit (cost=8321.45..8321.47 rows=10) (actual time=284.3..284.3 rows=10)
-> Sort (cost=8321.45..8324.12) (actual time=284.2..284.3 rows=10)
Sort Key: created_at DESC
-> Seq Scan on comments (actual time=0.02..271.8 rows=412)
Filter: (post_id = 1234)
Rows Removed by Filter: 499588
284ms. Nearly half a million rows removed by filter — wasted work.
CREATE INDEX idx_comments_post_id_created ON comments (post_id, created_at DESC);
After index:
Limit (cost=0.56..18.73 rows=10) (actual time=0.09..0.14 rows=10)
-> Index Scan using idx_comments_post_id_created on comments
Index Cond: (post_id = 1234)
0.14ms. That's a 2000x speedup from one CREATE INDEX statement.
When the planner ignores your index: If the planner chooses a Seq Scan even though an index exists, it usually means:
- The table is small enough that the full scan is actually faster.
- Statistics are stale — run
ANALYZE comments;to refresh them. - The index doesn't match the query (wrong column order, wrong type cast).
Common mistake: Forgetting to run EXPLAIN ANALYZE on realistic data volume. An index looks unnecessary on 100 rows and essential on 100 000. Always test with production-scale data, or at least seed enough rows to see the difference.
🛠️ Your Mission
Find one slow query or one N+1 pattern in your own app and fix it. Here's how:
-
Identify the hot path. What's the page that loads the most data? A feed, a dashboard, a list view? That's your target.
-
Log your queries. In Node/Express with
pg, you can log every query temporarily:// Add to your db client setup — remove before deploying const { Pool } = require('pg'); const pool = new Pool({ /* config */ }); const originalQuery = pool.query.bind(pool); let queryCount = 0; pool.query = (...args) => { queryCount++; console.log(`[Query #${queryCount}]`, typeof args[0] === 'string' ? args[0].slice(0, 80) : args[0].text?.slice(0, 80)); return originalQuery(...args); };If loading one page logs 50+ queries, you have an N+1.
-
Run EXPLAIN ANALYZE on the slowest query you find. Look for
Seq Scanon any table with more than a few thousand rows. -
Create the right index. Match the column(s) in your
WHEREandORDER BY. Add it, re-runEXPLAIN ANALYZE, confirm the cost dropped. -
If it's N+1: Rewrite the loop as a JOIN or a batched IN query. Confirm the query count drops to 1 or 2.
✅ You're done when…
- You've run
EXPLAIN ANALYZEon at least one real query in your app and can read the output (Production-Readiness Checklist: "query plan reviewed before shipping a new data-fetching route") - You've added at least one index that reduced a
Seq Scanto anIndex Scanand measured the before/after cost - Every foreign key column in your main tables has a corresponding index (
CREATE INDEX idx_<table>_<col> ON <table> (<col>)) - You've identified whether any page in your app has an N+1 and either confirmed it doesn't or fixed it with a JOIN or batched query
➡️ Next: Transactions, Consistency & Race Conditions.
Build It Right, Or Don't Build It At All. 🏛️