Skip to main content
Data & Databases
🗄️ Data & DatabasesLesson 5 of 13

Indexes & Query Performance

Why your app gets slow — indexes, query plans, and the infamous N+1 problem.

Indexes & Query Performance

Stage 3 · Data & Databases · B.U.I.L.D. letter: D

You shipped a beautiful app. Your demo flew. Three months later a user clicks "Load Posts" and waits eight seconds. The code didn't change — the data did. This lesson teaches you to see what the database actually does with your queries, and how to make it fast before your users leave.


⚠️ The vibe trap

With 10 rows every query feels instant — the database is just scanning a tiny list. Hit 10 000 rows and that same scan reads every single one before returning one result. Worse, ORMs make it trivially easy to write a loop that fires a separate query for every row you just fetched: the N+1 problem. The code looks clean, the tests pass, and production quietly dies. Vibe coding got you a working product; this lesson gets it production-ready.


📖 Section 1 — What a Full Table Scan Actually Costs

When you run SELECT * FROM posts WHERE user_id = 42 with no index, the database engine reads every row in the table from disk, checks the user_id column, and discards the ones that don't match. This is called a full table scan (or sequential scan).

Mental model: Imagine a library with no catalog. To find every book by a specific author you pull every book off every shelf, read the spine, and put it back. Ten books — fine. Ten million books — you need a new job.

Why it matters at scale:

Rows in tableRows scannedRows returnedWork done
1001003trivial
10 00010 0003slow
1 000 0001 000 0003painful

The work grows linearly with table size — O(n). An index makes it O(log n) or O(1) depending on type.

Worked example — seeing the scan:

-- A simple query that looks innocent
SELECT id, title, created_at
FROM posts
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 20;

Before you add any index, run EXPLAIN to see what the planner will do:

EXPLAIN
SELECT id, title, created_at
FROM posts
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 20;

Output (PostgreSQL):

Seq Scan on posts  (cost=0.00..2541.00 rows=3 width=56)
  Filter: (user_id = 42)

Seq Scan = full table scan. cost=0.00..2541.00 is the planner's estimate of work units. The higher the second number, the more expensive.

Common mistake: Assuming the query is fast because it returns few rows. The cost is in the scan, not the result size.


🗂️ Section 2 — What an Index Is (and What It Costs)

A database index is a separate data structure — almost always a B-tree — that the engine maintains alongside your table. It stores the indexed column values in sorted order, each pointing to the row's physical location. To find all rows where user_id = 42, the engine walks the tree in O(log n) steps instead of scanning every row.

Mental model: The index at the back of a textbook. Instead of reading every page to find every mention of "normalization", you look up the word alphabetically and jump straight to the page numbers listed. The index takes up extra pages (write cost), but lookup is instant.

Creating an index:

-- Basic single-column index
CREATE INDEX idx_posts_user_id ON posts (user_id);

-- Composite index for the WHERE + ORDER BY combination above
-- Column order matters: put the equality filter first, the sort column second
CREATE INDEX idx_posts_user_id_created ON posts (user_id, created_at DESC);

EXPLAIN after the index:

EXPLAIN
SELECT id, title, created_at
FROM posts
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 20;

Output:

Index Scan using idx_posts_user_id_created on posts  (cost=0.42..18.63 rows=3 width=56)
  Index Cond: (user_id = 42)

Cost dropped from 2541 to 18. That is a ~140x improvement — and it only gets better as the table grows.

The write cost: Every INSERT, UPDATE, or DELETE now has to update the index tree as well as the table. For a table with 10 indexes, writes are roughly 10x more expensive than a table with no indexes. Don't index every column — index the columns your reads need.

Common mistake: Adding an index to a column you filter with LIKE '%term%' (leading wildcard). The B-tree can't use prefix wildcards. You need a full-text index (CREATE INDEX ... USING gin) for that.


🎯 Section 3 — Which Columns to Index

The rule of thumb: index any column that appears in a WHERE, JOIN ON, or ORDER BY clause — especially on large tables and especially on foreign keys.

Foreign keys are the most commonly forgotten index. ORMs create the constraint but not the index. Every JOIN posts ON posts.user_id = users.id scans the entire posts table unless posts.user_id is indexed.

Decision checklist:

Column usageIndex?
WHERE col = ? (high cardinality, e.g. user_id)Yes
WHERE col = ? (low cardinality, e.g. boolean)Usually no — full scan may be faster
JOIN ON col (foreign key)Yes, always
ORDER BY colYes if combined with a filtered query
SELECT col only, never filteredNo

Worked example — foreign key index:

-- posts.user_id is a foreign key to users.id
-- Without this index, every join scans the entire posts table
CREATE INDEX idx_posts_user_id ON posts (user_id);

-- comments.post_id is a foreign key to posts.id
CREATE INDEX idx_comments_post_id ON comments (post_id);

EXPLAIN ANALYZE (runs the query and shows actual times, not just estimates):

EXPLAIN ANALYZE
SELECT users.username, COUNT(posts.id) AS post_count
FROM users
JOIN posts ON posts.user_id = users.id
GROUP BY users.id;

Without idx_posts_user_id:

Hash Join  (cost=... actual time=412.3..819.7 rows=500 ...)
  ->  Seq Scan on posts  (actual time=0.02..310.4 ...)

After idx_posts_user_id:

Hash Join  (cost=... actual time=18.1..24.9 rows=500 ...)
  ->  Index Scan using idx_posts_user_id on posts  (actual time=0.01..9.2 ...)

Common mistake: Creating a composite index in the wrong column order. CREATE INDEX ON posts (created_at, user_id) will NOT efficiently serve WHERE user_id = 42 ORDER BY created_at. The equality-filter column must come first.


🔁 Section 4 — The N+1 Problem

This is the most common performance killer in apps built with ORMs or written with a "query per record" loop. It looks like this:

The broken pattern (N+1 queries):

// Fetch 100 posts — 1 query
const posts = await db.query('SELECT id, title, user_id FROM posts LIMIT 100');

// Now get the author for each post — 100 MORE queries!
for (const post of posts) {
  const author = await db.query(
    'SELECT username FROM users WHERE id = $1',
    [post.user_id]
  );
  post.author = author.rows[0];
}
// Total: 101 queries. With 1000 posts: 1001 queries.

Every iteration fires a round trip to the database. With 100 posts that's 101 queries. With 10 000 posts it's 10 001 queries. Each query has network latency + parse cost + execution cost. This is the N+1 problem: 1 query to get N rows, then N more queries to hydrate them.

The fix — a single JOIN:

-- One query does the work of 101
SELECT
  posts.id,
  posts.title,
  users.username AS author_username
FROM posts
JOIN users ON users.id = posts.user_id
LIMIT 100;
// In code
const posts = await db.query(`
  SELECT
    posts.id,
    posts.title,
    users.username AS author_username
  FROM posts
  JOIN users ON users.id = posts.user_id
  LIMIT 100
`);
// posts.rows already has author_username on every object.
// Total: 1 query. Always.

The alternative fix — batched IN query (when a JOIN isn't clean):

// Fetch posts
const posts = await db.query('SELECT id, title, user_id FROM posts LIMIT 100');

// Collect all the user IDs we need — no duplicates
const userIds = [...new Set(posts.rows.map(p => p.user_id))];

// One query for ALL authors
const authors = await db.query(
  'SELECT id, username FROM users WHERE id = ANY($1)',
  [userIds]
);

// Build a lookup map — O(1) access
const authorMap = Object.fromEntries(authors.rows.map(u => [u.id, u]));

// Attach authors in memory — no DB round trip
for (const post of posts.rows) {
  post.author = authorMap[post.user_id];
}
// Total: 2 queries, regardless of how many posts.

When to use which:

  • JOIN: when you need a flat result set and the relationship is simple.
  • Batched IN: when you've already fetched data and need to hydrate it, or when your ORM makes joins awkward (but fix the ORM, long-term).

Mental model: Think of the database as a restaurant kitchen. N+1 is running back to the kitchen 101 times for one plate at a time. The JOIN is handing the kitchen one order that comes back complete. Even if the kitchen is fast, 101 trips is always slower than 1.

Common mistake: Using .map() with async/await and thinking it's parallel:

// This still fires 100 separate queries — they just run concurrently
// which is better than sequential but still not as good as a single JOIN
const posts = await Promise.all(
  postIds.map(id => db.query('SELECT ... WHERE id = $1', [id]))
);
// Use a JOIN or IN query instead

📊 Section 5 — Reading EXPLAIN Like a Pro

EXPLAIN is your X-ray vision into query execution. EXPLAIN ANALYZE actually runs the query and adds real timing. Learn to read them fluently.

Key terms to look for:

TermWhat it meansGood or bad?
Seq ScanFull table scanBad on large tables
Index ScanUsed an index, fetches matching rowsGood
Index Only ScanAll needed columns are in the index itselfBest
Bitmap Heap ScanCombines index lookup + selective row fetchGood
Hash JoinJoins two sets by building a hash tableGood for large joins
Nested LoopFor each row in outer, scan innerBad if inner is large
cost=X..YEstimated work: X = startup, Y = totalLower Y is better
actual time=X..YReal milliseconds (ANALYZE only)Lower is better
rows=NPlanner estimate vs actual rows=NBig mismatch = run ANALYZE

Full worked example — before and after:

-- Table: comments (500 000 rows)
-- No index on post_id yet

EXPLAIN ANALYZE
SELECT body, created_at
FROM comments
WHERE post_id = 1234
ORDER BY created_at DESC
LIMIT 10;

Before index:

Limit  (cost=8321.45..8321.47 rows=10) (actual time=284.3..284.3 rows=10)
  ->  Sort  (cost=8321.45..8324.12) (actual time=284.2..284.3 rows=10)
        Sort Key: created_at DESC
        ->  Seq Scan on comments  (actual time=0.02..271.8 rows=412)
              Filter: (post_id = 1234)
              Rows Removed by Filter: 499588

284ms. Nearly half a million rows removed by filter — wasted work.

CREATE INDEX idx_comments_post_id_created ON comments (post_id, created_at DESC);

After index:

Limit  (cost=0.56..18.73 rows=10) (actual time=0.09..0.14 rows=10)
  ->  Index Scan using idx_comments_post_id_created on comments
        Index Cond: (post_id = 1234)

0.14ms. That's a 2000x speedup from one CREATE INDEX statement.

When the planner ignores your index: If the planner chooses a Seq Scan even though an index exists, it usually means:

  1. The table is small enough that the full scan is actually faster.
  2. Statistics are stale — run ANALYZE comments; to refresh them.
  3. The index doesn't match the query (wrong column order, wrong type cast).

Common mistake: Forgetting to run EXPLAIN ANALYZE on realistic data volume. An index looks unnecessary on 100 rows and essential on 100 000. Always test with production-scale data, or at least seed enough rows to see the difference.


🛠️ Your Mission

Find one slow query or one N+1 pattern in your own app and fix it. Here's how:

  1. Identify the hot path. What's the page that loads the most data? A feed, a dashboard, a list view? That's your target.

  2. Log your queries. In Node/Express with pg, you can log every query temporarily:

    // Add to your db client setup — remove before deploying
    const { Pool } = require('pg');
    const pool = new Pool({ /* config */ });
    const originalQuery = pool.query.bind(pool);
    let queryCount = 0;
    pool.query = (...args) => {
      queryCount++;
      console.log(`[Query #${queryCount}]`, typeof args[0] === 'string' ? args[0].slice(0, 80) : args[0].text?.slice(0, 80));
      return originalQuery(...args);
    };
    

    If loading one page logs 50+ queries, you have an N+1.

  3. Run EXPLAIN ANALYZE on the slowest query you find. Look for Seq Scan on any table with more than a few thousand rows.

  4. Create the right index. Match the column(s) in your WHERE and ORDER BY. Add it, re-run EXPLAIN ANALYZE, confirm the cost dropped.

  5. If it's N+1: Rewrite the loop as a JOIN or a batched IN query. Confirm the query count drops to 1 or 2.


✅ You're done when…

  • You've run EXPLAIN ANALYZE on at least one real query in your app and can read the output (Production-Readiness Checklist: "query plan reviewed before shipping a new data-fetching route")
  • You've added at least one index that reduced a Seq Scan to an Index Scan and measured the before/after cost
  • Every foreign key column in your main tables has a corresponding index (CREATE INDEX idx_<table>_<col> ON <table> (<col>))
  • You've identified whether any page in your app has an N+1 and either confirmed it doesn't or fixed it with a JOIN or batched query

➡️ Next: Transactions, Consistency & Race Conditions.

Build It Right, Or Don't Build It At All. 🏛️

Always-on rigor toolkit

🏛️ Build It Right, Or Don't Build It At All.