🗄️ Data & Databases · Lesson 10 of 13

Caching & Cache Invalidation

Make reads fast with caching — and survive the hardest problem: invalidation.

Stage 3 · Data & Databases · B.U.I.L.D. letter: D

You vibe-coded a beautiful app. It works. Then it gets popular, and suddenly your database is drowning in identical queries — the same product page fetched 4,000 times a minute, the same leaderboard recalculated on every request, the same user profile read for every API call in a session. A cache is the fix. Invalidation is the part that will humble you.


⚠️ The vibe trap

The naive move is to add a cache everywhere you notice slowness — fetch the data, store it, serve it fast, feel like a genius. The trap is that the cache keeps serving its snapshot of the world while the actual data moves on. You cache a product's price at $29.99, the seller updates it to $49.99, and your cache cheerfully charges the old price for the next hour. Worse: you cache the response for /api/user/42/profile and then a bug in your key logic serves User 42's private data to User 99. Caching wrong is worse than not caching at all — it adds a whole new category of bugs that only show up in production, under load, in front of real people.


🧠 Section 1 — What a Cache Actually Is

Mental model

A cache is a fast, temporary storage layer that sits between your application and something slow — a database query, an external API call, a heavy computation. The contract is simple: if the answer is already in the cache, return it immediately without touching the slow thing. If it isn't, go get it the slow way, save the result, and return it.

The trade-off: caches store a copy of data. The original can change. Your cache doesn't know. That gap between what the cache says and what the DB says is called stale data — and managing it is 80% of what caching is actually about.

// The most basic cache: a plain in-memory object.
// Real apps typically use a shared store such as Redis: same idea, but reachable
// over the network and shared by every instance of your app.
const cache = {};

async function getProduct(productId) {
  const key = `product:${productId}`;

  // 1. Check the cache first
  if (cache[key]) {
    console.log('cache hit');
    return cache[key];            // fast path: no DB touch
  }

  // 2. Cache miss — go to the DB
  console.log('cache miss — querying DB');
  const product = await db.query(
    'SELECT * FROM products WHERE id = $1',
    [productId]
  );

  // 3. Store the result so the next call is fast
  cache[key] = product;
  return product;
}

Why this matters

A single product page hit can trigger 10–30 DB queries. If that page gets 500 req/s, that's 5,000–15,000 DB queries per second, all asking for the same rows. With a cache that is refreshed about once per second, roughly 499 of every 500 requests never touch the DB at all.
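
The arithmetic above can be sketched as a tiny estimator. The function name and its parameters are illustrative assumptions, and the inputs are the round numbers from the paragraph, not measurements.

```javascript
// Rough estimate of DB queries per second, with and without a cache.
// hitsPerSecond is the number of requests answered from the cache.
function dbQueriesPerSecond(requestsPerSecond, queriesPerRequest, hitsPerSecond = 0) {
  const missesPerSecond = requestsPerSecond - hitsPerSecond;
  return missesPerSecond * queriesPerRequest;
}

const uncachedLow = dbQueriesPerSecond(500, 10);      // 5000
const uncachedHigh = dbQueriesPerSecond(500, 30);     // 15000
const cachedHigh = dbQueriesPerSecond(500, 30, 499);  // 30

console.log({ uncachedLow, uncachedHigh, cachedHigh });
```

Even in the worst case of 30 queries per page, one miss per second leaves the DB with about 30 queries per second instead of 15,000.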

Common mistake

Using cache[key] as the existence check when the value might legitimately be null, 0, or false. A deleted product might come back from the DB as null: you store null, if (cache[key]) is falsy, and you re-query on every request. Check for presence instead: key in cache or cache[key] !== undefined for a plain object, or cache.has(key) if the cache is a Map.

// Wrong: if (cache[key]) — misses null/0/false values
// Right:
if (key in cache) {
  return cache[key];
}
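
To see the difference in action, here is a minimal sketch that caches a "not found" result. It assumes a Map as the cache, and loadProductFromDb is a hypothetical stand-in for a real query that counts how often it runs.

```javascript
// Hypothetical stand-in for a DB lookup; counts how often it is called.
let dbCalls = 0;
function loadProductFromDb(productId) {
  dbCalls += 1;
  return productId === 1 ? { id: 1, name: 'Mug' } : null; // null = no such row
}

const productCache = new Map();

function getProductCached(productId) {
  const key = `product:${productId}`;
  if (productCache.has(key)) {
    return productCache.get(key); // a hit, even when the stored value is null
  }
  const product = loadProductFromDb(productId);
  productCache.set(key, product);
  return product;
}

getProductCached(999); // miss: queries the "DB" and caches null
getProductCached(999); // hit: returns the cached null, no second query
console.log(dbCalls);  // 1
```

Because has() checks for the key rather than the truthiness of the value, the missing product costs one query instead of one query per request.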

⏱️ Section 2 — TTLs: Caches That Forget on Purpose

Mental model

A TTL (Time-To-Live) is an expiration timer attached to each cache entry. After the TTL elapses, the entry is considered stale and evicted — the next request gets a fresh miss, re-queries the DB, and resets the clock. TTLs are the simplest form of cache invalidation: you don't have to think about when data changes, you just accept that your cache is at most N seconds behind reality.

The art is picking the right TTL:

  • Too short → constant cache misses, you might as well have no cache
  • Too long → stale data causes real problems (wrong prices, outdated user data)
  • "It depends" → yes, it always does; the right value comes from how much staleness each kind of data can tolerate

// TTL with a plain in-memory cache (milliseconds)
const cache = new Map();
const TTL_MS = 60 * 1000; // 60 seconds

async function getProduct(productId) {
  const key = `product:${productId}`;
  const entry = cache.get(key);

  if (entry && Date.now() < entry.expiresAt) {
    return entry.data;           // still fresh
  }

  // Stale or missing — fetch from DB
  const product = await db.query(
    'SELECT * FROM products WHERE id = $1',
    [productId]
  );

  cache.set(key, {
    data: product,
    expiresAt: Date.now() + TTL_MS,
  });

  return product;
}
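
One gap in the Map version above: an expired entry stays in memory until the same key is written again. The sketch below is one possible variation, not a drop-in library. TtlCache and its injectable now clock are assumptions made here so that expiry can be checked without waiting.

```javascript
// Minimal TTL cache. `now` is injectable so expiry can be tested without sleeping.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined; // never cached
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: free the memory and report a miss
      return undefined;
    }
    return entry.data;
  }

  set(key, data) {
    this.entries.set(key, { data, expiresAt: this.now() + this.ttlMs });
  }
}

let fakeTime = 0;
const ttlCache = new TtlCache(60_000, () => fakeTime);
ttlCache.set('product:42', { price: 29.99 });

fakeTime = 59_999;
const beforeExpiry = ttlCache.get('product:42'); // still fresh

fakeTime = 60_000;
const afterExpiry = ttlCache.get('product:42');  // expired, entry removed
```

Deleting on read only cleans up keys that are requested again. A long-running process would still need a periodic sweep or a size cap, which is one reason a store like Redis, which expires keys itself, is attractive.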

Redis version (what you'd use in production)

import { createClient } from 'redis';
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

async function getProduct(productId) {
  const key = `product:${productId}`;

  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const product = await db.query(
    'SELECT * FROM products WHERE id = $1',
    [productId]
  );

  // EX = expire in N seconds
  await redis.set(key, JSON.stringify(product), { EX: 60 });
  return product;
}

Why TTL alone is often enough

For data that's mostly read, rarely written, and where staleness of up to a few minutes is acceptable (public product listings, leaderboards, config values), a TTL of 30–300 seconds is usually fine. The simplicity is the point: you don't need to wire up invalidation logic.

Common mistake

Setting the same TTL for everything. If you cache session data at all (Section 5 covers the risks), its TTL can run for hours but must never outlive the session's own 24-hour expiry. A real-time stock ticker should not be cached at all, or for 1–2 seconds at most. A product description can be cached for 5 minutes. Treat TTL as a domain decision, not a default.
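
One way to make that decision explicit is a small policy table keyed by the entity prefix of the cache key. The table, the default, and the ttlFor helper below are illustrative assumptions; the numbers are the examples from this section, not recommendations for your data.

```javascript
// Hypothetical TTL policy, in seconds, keyed by the entity prefix of a cache key.
const TTL_SECONDS = new Map([
  ['product', 300],     // descriptions change rarely
  ['leaderboard', 60],  // slightly stale is acceptable
  ['ticker', 1],        // close to real time
]);
const DEFAULT_TTL_SECONDS = 30;

function ttlFor(key) {
  const entity = key.split(':')[0];
  return TTL_SECONDS.get(entity) ?? DEFAULT_TTL_SECONDS;
}

console.log(ttlFor('product:42'));              // 300
console.log(ttlFor('leaderboard:weekly:top100')); // 60
console.log(ttlFor('weather:berlin'));          // 30 (falls back to the default)
```

With the Redis client from this section, the result would be passed as the EX option, for example redis.set(key, value, { EX: ttlFor(key) }).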


🔄 Section 3 — Cache-Aside + Invalidation on Write

Mental model

TTL handles "eventually consistent" use cases. For data that must be accurate quickly after a write — a user updating their own profile and expecting to see the change immediately — you need explicit invalidation: when a write happens, actively delete or replace the relevant cache entry so the next read gets fresh data.

The cache-aside (also called "lazy loading") pattern is:

  1. Read: check cache → miss → load from DB → store → return
  2. Write: update DB → bust the cache key

// --- READ (cache-aside) ---
async function getUser(userId) {
  const key = `user:${userId}`;

  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const user = await db.query(
    'SELECT id, name, email, avatar_url FROM users WHERE id = $1',
    [userId]
  );

  await redis.set(key, JSON.stringify(user), { EX: 300 }); // 5 min TTL as safety net
  return user;
}

// --- WRITE (invalidate on update) ---
async function updateUser(userId, updates) {
  // 1. Write the source of truth first
  await db.query(
    'UPDATE users SET name = $1, avatar_url = $2 WHERE id = $3',
    [updates.name, updates.avatarUrl, userId]
  );

  // 2. Bust the cache — next read will re-query the DB
  await redis.del(`user:${userId}`);
}

Why write-to-DB-first matters

Always update the DB before busting (or writing) the cache. If you update the cache first and the DB write fails, your cache says one thing and your DB says another, and it's the cache that is wrong: it now serves a value that was never saved. Update the source of truth first. The cache is a derivative.
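
The ordering can be checked with in-memory stand-ins. Everything here (userRows, userCache, writeUserToDb, the fail switch) is hypothetical scaffolding; real DB and Redis calls would be asynchronous and awaited, but the order of the two steps is the same.

```javascript
// In-memory stand-ins for the DB table and the cache.
const userRows = new Map([[42, { id: 42, name: 'Ada' }]]);
const userCache = new Map([['user:42', { id: 42, name: 'Ada' }]]);

// Simulated DB write; `fail: true` imitates a failed UPDATE.
function writeUserToDb(userId, updates, { fail = false } = {}) {
  if (fail) throw new Error('DB write failed');
  userRows.set(userId, { ...userRows.get(userId), ...updates });
}

function updateUserName(userId, name, options) {
  writeUserToDb(userId, { name }, options); // 1. source of truth first; throws on failure
  userCache.delete(`user:${userId}`);       // 2. only reached if the write succeeded
}

let failedWriteError = null;
try {
  updateUserName(42, 'Grace', { fail: true });
} catch (err) {
  failedWriteError = err;
}
// After the failed write, cache and DB still agree on 'Ada'.
const cacheSurvivedFailure = userCache.has('user:42');

updateUserName(42, 'Grace'); // succeeds: row updated, cache entry removed
```

Because the cache is touched only after the write returns, a failed write leaves both sides unchanged rather than leaving the cache ahead of the DB.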

Write-through variant

Some teams prefer write-through: on every write, update the DB and then refresh the cache as part of the same write path. You never serve a miss after a write, but you pay the overhead of cache writes even for data that might never be read again.

async function updateUser(userId, updates) {
  await db.query(
    'UPDATE users SET name = $1, avatar_url = $2 WHERE id = $3',
    [updates.name, updates.avatarUrl, userId]
  );

  // Write-through: update the cache with the new value
  const fresh = await db.query(
    'SELECT id, name, email, avatar_url FROM users WHERE id = $1',
    [userId]
  );
  await redis.set(`user:${userId}`, JSON.stringify(fresh), { EX: 300 });
}

Write-through is heavier but means the cache is always warm after a write. Cache-aside (delete on write) is simpler and equally correct — it just means the next read pays a miss.

Common mistake

Forgetting to invalidate related keys. You update user:42 in the DB and bust user:42 from the cache — but you also have a cached key user:42:dashboard and a list key users:all that contains User 42's data. If you don't bust those too, they serve stale data. The solution is disciplined key naming and a list of all keys that depend on a given record (covered in Section 4).
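
A lightweight way to keep that list honest is to put it in code next to the cache layer. The sketch below assumes the three keys named above; cacheKeysForUser and bustUserCache are hypothetical helpers, and a Map stands in for the cache.

```javascript
// Hypothetical registry: every cache key that embeds data from one user record.
function cacheKeysForUser(userId) {
  return [
    `user:${userId}`,
    `user:${userId}:dashboard`,
    'users:all',
  ];
}

// `cache` is anything with a delete(key) method; a Map is used for illustration.
function bustUserCache(cache, userId) {
  for (const key of cacheKeysForUser(userId)) {
    cache.delete(key);
  }
}

const demoCache = new Map([
  ['user:42', {}],
  ['user:42:dashboard', {}],
  ['users:all', []],
  ['user:99', {}], // unrelated entry that must survive
]);
bustUserCache(demoCache, 42);
console.log([...demoCache.keys()]); // [ 'user:99' ]
```

With the Redis client used in this lesson, the same list could be passed to a single call such as redis.del(cacheKeysForUser(userId)).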


🗝️ Section 4 — Cache Keys: The Design That Makes or Breaks Everything

Mental model

A cache key is the address of a cached value. Keys must be:

  • Unique — two different pieces of data must never share a key
  • Deterministic — the same request always produces the same key
  • Namespaced — grouped by entity so you can bust related keys together
  • User-scoped when needed — private data must include the user ID

The most common format: entity:id:variant

// Good key patterns
`product:${productId}`                     // single product
`product:${productId}:related`             // related products list
`user:${userId}:profile`                   // user profile (private)
`user:${userId}:orders:page:${page}`       // paginated orders
`leaderboard:weekly:top100`                // shared, public list
`search:${encodeURIComponent(query)}`      // search results

// Dangerous — never do these
`product`                       // no ID: all products share one key
`${userId}`                     // no namespace: collides with product IDs
`user_profile`                  // no user ID: User A gets User B's profile
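
A small builder can make the dangerous patterns harder to write by accident. This is a sketch under assumptions: cacheKey is a hypothetical helper, and rejecting empty parts is one possible policy, not a standard.

```javascript
// Builds entity:id:variant keys and refuses parts that are missing.
function cacheKey(entity, id, ...variants) {
  const parts = [entity, id, ...variants];
  for (const part of parts) {
    if (part === undefined || part === null || part === '') {
      throw new Error(`cache key part missing in: ${parts.join(':')}`);
    }
  }
  return parts.map(String).join(':');
}

console.log(cacheKey('product', 42));                     // product:42
console.log(cacheKey('user', 42, 'orders', 'page', 2));   // user:42:orders:page:2

let missingIdError = null;
try {
  cacheKey('user', undefined, 'profile'); // forgot the user ID
} catch (err) {
  missingIdError = err;
}
```

Failing loudly on a missing ID turns the "User A gets User B's profile" bug into an exception at the call site.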

Busting a group of related keys

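When a product is updated, you may need to bust product:42, product:42:related, and any search cache entries that included it. Redis has no built-in delete-by-pattern command; you first find the matching keys, then delete them. The KEYS command used below is O(N) over the entire keyspace and blocks the server while it runs, so treat it with caution:

// Bust all keys for a product (use sparingly on large caches)
async function bustProductCache(productId) {
  const base = `product:${productId}`;
  // Match only this product's variants. A bare `${base}*` pattern would
  // also match product:420, product:421, ... when productId is 42.
  const variantKeys = await redis.keys(`${base}:*`);
  await redis.del([base, ...variantKeys]);
}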

For large caches, prefer tracking which keys to bust in a separate structure (a Redis Set per entity) rather than pattern-scanning.
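
The tracking idea can be sketched with in-memory stand-ins: store plays the cache and keysByEntity plays one Redis Set per entity. All names are hypothetical; the comments note the Redis commands each step would correspond to.

```javascript
// In-memory stand-ins for the cache and for one "Redis Set" per entity.
const store = new Map();
const keysByEntity = new Map();

// Cache a value and remember that it depends on the given entity.
function setTracked(entityTag, key, value) {
  store.set(key, value);
  if (!keysByEntity.has(entityTag)) keysByEntity.set(entityTag, new Set());
  keysByEntity.get(entityTag).add(key); // Redis equivalent: SADD
}

// Delete every cached key that was registered for the entity.
function bustEntity(entityTag) {
  const keys = keysByEntity.get(entityTag) ?? new Set(); // Redis equivalent: SMEMBERS
  for (const key of keys) store.delete(key);             // Redis equivalent: DEL
  keysByEntity.delete(entityTag);
  return keys.size;
}

setTracked('product:42', 'product:42', { price: 29.99 });
setTracked('product:42', 'product:42:related', [7, 9]);
setTracked('product:42', 'search:mug', [42, 7]); // a search result that included product 42
setTracked('product:420', 'product:420', { price: 5 });

const removed = bustEntity('product:42');
console.log(removed); // 3
```

Unlike a pattern scan, this also reaches keys such as search:mug whose names give no hint that they contain product 42, and it never touches product:420.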

Why this is the hardest part

Cache invalidation is famously one of the two hard problems in computer science ("cache invalidation, naming things, and off-by-one errors"). The difficulty isn't the mechanics — it's knowing all the places a piece of data appears in your cache. A user's name might be in user:42:profile, post:88:author, comment:200:author, search:john, and leaderboard:weekly:top100. Update the name and forget one → users see inconsistent data.

The discipline: every time you add a new cached view of an entity, document it. Keep a comment near your cache layer listing all keys for each entity type.

Common mistake

Building cache keys with user-controlled input without sanitizing it. If your key is search:${req.query.q} and you also cache paginated results under search:${q}:page:${n}, a user who searches for shoes:page:2 produces the same key as page 2 of a search for shoes. Raw input also means " Shoes" and "shoes" are cached separately. Always normalize and encode user input before it goes into a key.
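
A minimal sketch of that normalization, assuming trimming, lowercasing, and collapsing whitespace are acceptable for your search semantics (searchKey is a hypothetical helper):

```javascript
// Normalizes and encodes a raw search query before using it in a cache key.
function searchKey(rawQuery) {
  const normalized = String(rawQuery).trim().toLowerCase().replace(/\s+/g, ' ');
  // encodeURIComponent escapes ':' so input can't add extra key segments.
  return `search:${encodeURIComponent(normalized)}`;
}

console.log(searchKey('  Red   Shoes ')); // search:red%20shoes
console.log(searchKey('shoes:page:2'));   // search:shoes%3Apage%3A2
```

Since the colon in user input is escaped, the only unescaped colons in the key are the ones your own code put there.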


🚫 Section 5 — What NOT to Cache

Mental model

Not all data belongs in a cache. Caching the wrong things creates security holes, correctness bugs, and wasted memory. Here's the explicit list of what to keep out of your cache:

| Do not cache | Why |
| --- | --- |
| Passwords or hashed passwords | Security: never put credentials in a shared store |
| Session tokens (in a shared cache) | An indexed token lookup in the DB is already cheap; a shared cache adds exposure |
| Private user data without user-scoped keys | Leaks User A's data to User B |
| Data that changes on every request | Cache miss rate is 100%: you get the cost with none of the benefit |
| Real-time balances or inventory counts | A stale "3 items left" is a real business error |
| Partially-written data | Never cache inside a transaction before it commits |

What IS worth caching

// Great caching candidates — high read, low write, public or user-scoped
const GOOD_CACHE_TARGETS = [
  'Public product catalog',          // changes rarely, read constantly
  'Blog post / article content',     // mostly static after publish
  'Config / feature flags',          // read on every request, changes rarely
  'Aggregated stats (leaderboards)', // expensive to compute, slightly stale is OK
  'External API responses',          // rate-limited, slow, idempotent
  'Compiled templates or assets',    // pure computation, same input = same output
];

// Poor caching candidates — avoid or use very short TTLs
const BAD_CACHE_TARGETS = [
  'Live inventory counts',           // oversell risk
  'Payment or order status',         // must be real-time accurate
  'Authentication state',            // security-critical
  'User-specific recommendations',   // personalized, changes with behavior
  'Anything with side effects',      // caching a POST response makes no sense
];

Why caching external API responses is underrated

If your app calls a weather API, a geocoding API, or a currency conversion API, those responses are often rate-limited and slow. A TTL cache for those responses — even 60 seconds — slashes latency and protects you from hitting rate limits under traffic spikes.

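async function getWeather(city) {
  const key = `weather:${city.trim().toLowerCase()}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  // Encode the city so spaces and symbols can't break the query string
  const res = await fetch(
    `https://api.weather.com/v1/current?city=${encodeURIComponent(city)}`
  );
  // Don't cache failures: an error body would otherwise be served for 5 minutes
  if (!res.ok) throw new Error(`Weather API responded with ${res.status}`);
  const json = await res.json();

  await redis.set(key, JSON.stringify(json), { EX: 300 }); // 5 min: weather doesn't change per second
  return json;
}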

Common mistake

Adding a cache "just in case" to every endpoint before profiling. Caches add complexity, memory usage, and a whole class of correctness bugs. Profile first, cache the proven bottlenecks, keep the rest simple.


🛠️ Your Mission

Find one expensive read in your app — a query that runs on every page load, or that scans a large table to compute a number. Add cache-aside caching to it with a sensible TTL, and wire up correct invalidation on the write path.

Concretely:

  1. Identify the slow read (look for queries without indexes, aggregations, or external API calls).
  2. Pick a Redis key name using the entity:id:variant pattern.
  3. Implement the cache-aside read: check cache → miss → query → store with TTL → return.
  4. Find the write(s) that change this data (UPDATE, DELETE, POST).
  5. Add redis.del(key) after each successful write (invalidate on write pattern).
  6. Test the cold path (delete the key, confirm it re-queries), the warm path (confirm it hits the cache), and the invalidation (update the data, confirm the next read returns fresh data).
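
Step 6 can be rehearsed without Redis or a database. The sketch below uses hypothetical in-memory stand-ins (rows, cache, dbReads) and leaves out the TTL for brevity; it only shows what the cold path, the warm path, and the invalidation check are expected to observe.

```javascript
// In-memory stand-ins so the three paths can be checked in isolation.
const rows = new Map([[1, { id: 1, price: 29.99 }]]);
const cache = new Map();
let dbReads = 0;

function getProduct(id) {
  const key = `product:${id}`;
  if (cache.has(key)) return cache.get(key); // warm path
  dbReads += 1;                              // cold path: go to the "DB"
  const row = rows.get(id) ?? null;
  cache.set(key, row);
  return row;
}

function updatePrice(id, price) {
  rows.set(id, { ...rows.get(id), price }); // write the source of truth first
  cache.delete(`product:${id}`);            // then invalidate
}

const cold = getProduct(1);  // cold path: dbReads becomes 1
const warm = getProduct(1);  // warm path: dbReads stays 1
updatePrice(1, 49.99);
const fresh = getProduct(1); // after invalidation: dbReads becomes 2

console.log(cold.price, warm.price, fresh.price, dbReads); // 29.99 29.99 49.99 2
```

Against a real setup, the same three observations apply: one query on the cold read, none on the warm read, and the new price on the first read after the write.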

If you don't have a real project, implement this for a products table: cache a product read for 60 seconds, and bust the cache when the product's price is updated.


✅ You're done when…

  • Your cached read returns data in under 5ms on a warm cache hit — confirm it with console.time or your profiler (Production-Readiness: p95 latency in range).
  • Your cache key includes the entity type and ID, and private data keys include the user ID.
  • A write to the underlying data triggers redis.del() on the relevant key(s) before the response returns.
  • You can name at least one piece of data in your app that you deliberately chose NOT to cache and explain why.
  • You've set a TTL on every cached entry — no entry lives forever.

➡️ Next: Search. Build It Right, Or Don't Build It At All. 🏛️
