Caching & Cache Invalidation
Stage 3 · Data & Databases · B.U.I.L.D. letter: D
You vibe-coded a beautiful app. It works. Then it gets popular, and suddenly your database is drowning in identical queries — the same product page fetched 4,000 times a minute, the same leaderboard recalculated on every request, the same user profile read for every API call in a session. A cache is the fix. Invalidation is the part that will humble you.
⚠️ The vibe trap
The naive move is to add a cache everywhere you notice slowness — fetch the data, store it, serve it fast, feel like a genius. The trap is that the cache keeps serving its snapshot of the world while the actual data moves on. You cache a product's price at $29.99, the seller updates it to $49.99, and your cache cheerfully charges the old price for the next hour. Worse: you cache the response for /api/user/42/profile and then a bug in your key logic serves User 42's private data to User 99. Caching wrong is worse than not caching at all — it adds a whole new category of bugs that only show up in production, under load, in front of real people.
🧠 Section 1 — What a Cache Actually Is
Mental model
A cache is a fast, temporary storage layer that sits between your application and something slow — a database query, an external API call, a heavy computation. The contract is simple: if the answer is already in the cache, return it immediately without touching the slow thing. If it isn't, go get it the slow way, save the result, and return it.
The trade-off: caches store a copy of data. The original can change. Your cache doesn't know. That gap between what the cache says and what the DB says is called stale data — and managing it is 80% of what caching is actually about.
// The most basic cache: a plain in-memory object.
// Real apps use Redis: same concept, network-accessible and persistent.
const cache = {};
async function getProduct(productId) {
  const key = `product:${productId}`;
  // 1. Check the cache first
  if (cache[key]) {
    console.log('cache hit');
    return cache[key]; // fast path: no DB touch
  }
  // 2. Cache miss: go to the DB
  console.log('cache miss: querying DB');
  const product = await db.query(
    'SELECT * FROM products WHERE id = $1',
    [productId]
  );
  // 3. Store the result so the next call is fast
  cache[key] = product;
  return product;
}
Why this matters
A single product page hit can trigger 10–30 DB queries. If that page gets 500 req/s, that's 5,000–15,000 DB queries per second — all asking for the same rows. With a cache, 499 of those 500 requests never touch the DB at all.
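The arithmetic above can be sketched as a tiny estimator. This is a back-of-the-envelope model, not a measurement: `dbQueriesPerSecond` is a hypothetical helper, and the hit ratio of 0.998 is simply "499 of 500" written as a fraction.

```javascript
// Hypothetical helper: estimates DB load for a given cache hit ratio.
// All three inputs are assumptions you would measure in your own app.
function dbQueriesPerSecond(requestsPerSecond, queriesPerRequest, hitRatio) {
  const missesPerSecond = requestsPerSecond * (1 - hitRatio);
  return missesPerSecond * queriesPerRequest;
}

// 500 req/s, 10 queries per page, no cache: every request reaches the DB.
const noCache = dbQueriesPerSecond(500, 10, 0);

// Same traffic with 499 of 500 requests served from the cache.
const warmCache = dbQueriesPerSecond(500, 10, 0.998);

console.log(noCache);               // 5000
console.log(Math.round(warmCache)); // 10
```

The real hit ratio depends on your TTL and on how many distinct keys the traffic spreads across, so treat the output as an order-of-magnitude guide.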
Common mistake
Using cache[key] as the existence check when the value might legitimately be null or 0 or false. A product that was deleted might return null from the DB — you store null, and then if (cache[key]) is falsy, so you re-query forever. Use key in cache or cache[key] !== undefined as your check.
// Wrong: if (cache[key]) — misses null/0/false values
// Right:
if (key in cache) {
  return cache[key];
}
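To see the difference in practice, here is a minimal sketch that caches a `null` result and counts how often the "database" is touched. It swaps the plain object for a `Map` (whose `has` method plays the role of `key in cache`); `fakeDbLookup` and `dbCalls` are stand-ins invented for this example, not part of any real DB client.

```javascript
// In-memory stand-ins (assumptions for this sketch, not a real DB client).
const cache = new Map();
let dbCalls = 0;

async function fakeDbLookup(productId) {
  dbCalls += 1;
  return null; // pretend the product was deleted
}

async function getProduct(productId) {
  const key = `product:${productId}`;

  // Existence check, not truthiness check: a cached null still counts as a hit.
  if (cache.has(key)) {
    return cache.get(key);
  }

  const product = await fakeDbLookup(productId);
  cache.set(key, product);
  return product;
}
```

Calling `getProduct(7)` twice should leave `dbCalls` at 1. With a truthiness check it would climb on every call.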
⏱️ Section 2 — TTLs: Caches That Forget on Purpose
Mental model
A TTL (Time-To-Live) is an expiration timer attached to each cache entry. After the TTL elapses, the entry is considered stale and evicted — the next request gets a fresh miss, re-queries the DB, and resets the clock. TTLs are the simplest form of cache invalidation: you don't have to think about when data changes, you just accept that your cache is at most N seconds behind reality.
The art is picking the right TTL:
- Too short → constant cache misses, you might as well have no cache
- Too long → stale data causes real problems (wrong prices, outdated user data)
- "It depends" → yes, it always does; here's how to think about it
// TTL with a plain in-memory cache (milliseconds)
const cache = new Map();
const TTL_MS = 60 * 1000; // 60 seconds
async function getProduct(productId) {
  const key = `product:${productId}`;
  const entry = cache.get(key);
  if (entry && Date.now() < entry.expiresAt) {
    return entry.data; // still fresh
  }
  // Stale or missing: fetch from DB
  const product = await db.query(
    'SELECT * FROM products WHERE id = $1',
    [productId]
  );
  cache.set(key, {
    data: product,
    expiresAt: Date.now() + TTL_MS,
  });
  return product;
}
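One detail the snippet above leaves open: expired entries stay in the `Map` until something overwrites them. A minimal sketch of a wrapper that drops them on read follows. `TtlCache` is a hypothetical class for illustration, and the injectable `now` function is an assumption added so the expiry logic can be tested without waiting.

```javascript
// Hypothetical TTL cache wrapper. `now` is injectable so tests can control time.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  // Returns the cached value, or undefined on a miss or an expired entry.
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: free the memory
      return undefined;
    }
    return entry.data;
  }

  set(key, data) {
    this.entries.set(key, { data, expiresAt: this.now() + this.ttlMs });
  }
}
```

Because a miss is signalled with `undefined`, a legitimately cached `null` can still be told apart from "not cached".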
Redis version (what you'd use in production)
import { createClient } from 'redis';
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();
async function getProduct(productId) {
  const key = `product:${productId}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);
  const product = await db.query(
    'SELECT * FROM products WHERE id = $1',
    [productId]
  );
  // EX = expire in N seconds
  await redis.set(key, JSON.stringify(product), { EX: 60 });
  return product;
}
Why TTL alone is often enough
For data that's mostly read, rarely written, and where a few seconds of staleness is acceptable (public product listings, leaderboards, config values), a TTL of 30–300 seconds is usually fine. The simplicity is the point — you don't need to wire up invalidation logic.
Common mistake
Setting the same TTL for everything. A session token that expires in 24 hours can be cached for hours, but never past its own expiry. A real-time stock ticker should not be cached at all, or for 1–2 seconds at most. A product description can be cached for 5 minutes. Treat TTL as a domain decision, not a default.
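One way to keep TTLs a deliberate choice is a small lookup table that refuses to guess. The sketch below is an assumption about how you might organize it: `TTL_SECONDS` and `ttlFor` are hypothetical names, and the numbers simply echo the examples in this section.

```javascript
// Hypothetical per-domain TTL table, in seconds. The values echo the
// examples in the text and are starting points, not recommendations.
const TTL_SECONDS = new Map([
  ['stockTicker', 1],
  ['publicLeaderboard', 60],
  ['productDescription', 5 * 60],
]);

// Throws for unknown kinds so nobody silently inherits a default TTL.
function ttlFor(kind) {
  if (!TTL_SECONDS.has(kind)) {
    throw new Error(`No TTL defined for cache kind "${kind}"`);
  }
  return TTL_SECONDS.get(kind);
}

console.log(ttlFor('productDescription')); // 300
```

Throwing on an unknown kind is a design choice: it forces whoever adds a new cached value to decide how stale it is allowed to be.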
🔄 Section 3 — Cache-Aside + Invalidation on Write
Mental model
TTL handles "eventually consistent" use cases. For data that must be accurate quickly after a write — a user updating their own profile and expecting to see the change immediately — you need explicit invalidation: when a write happens, actively delete or replace the relevant cache entry so the next read gets fresh data.
The cache-aside (also called "lazy loading") pattern is:
- Read: check cache → miss → load from DB → store → return
- Write: update DB → bust the cache key
// --- READ (cache-aside) ---
async function getUser(userId) {
  const key = `user:${userId}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);
  const user = await db.query(
    'SELECT id, name, email, avatar_url FROM users WHERE id = $1',
    [userId]
  );
  await redis.set(key, JSON.stringify(user), { EX: 300 }); // 5 min TTL as safety net
  return user;
}
// --- WRITE (invalidate on update) ---
async function updateUser(userId, updates) {
  // 1. Write the source of truth first
  await db.query(
    'UPDATE users SET name = $1, avatar_url = $2 WHERE id = $3',
    [updates.name, updates.avatarUrl, userId]
  );
  // 2. Bust the cache: next read will re-query the DB
  await redis.del(`user:${userId}`);
}
Why write-to-DB-first matters
Always update the DB before busting (or writing) the cache. If you update the cache first and the DB write fails, your cache says one thing and your DB says another, and it is the cache that is wrong: it now holds a value that was never committed. Update the source of truth first. The cache is a derivative.
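The ordering can be checked with in-memory stand-ins. In this sketch, `db` is a fake object with a `shouldFail` switch and `cache` is a `Map` standing in for Redis. Both are assumptions made so the example runs on its own.

```javascript
// In-memory stand-ins (assumptions for this sketch).
const cache = new Map(); // plays the role of Redis
const db = {
  shouldFail: false,
  rows: new Map([[42, { id: 42, name: 'Ada' }]]),
  async updateName(id, name) {
    if (this.shouldFail) throw new Error('DB write failed');
    this.rows.set(id, { id, name });
  },
};

async function updateUserName(userId, name) {
  // 1. Source of truth first. If this throws, we never reach step 2.
  await db.updateName(userId, name);

  // 2. Only bust the cache once the DB write has succeeded.
  cache.delete(`user:${userId}`);
}
```

If the DB write fails, the cached entry is left alone and still matches the DB. If it succeeds, the entry is gone and the next read repopulates it.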
Write-through variant
Some teams prefer write-through: on every write, update both the DB and the cache in the same code path. You never serve a miss after a write, but you pay the overhead of cache writes even for data that might never be read again.
async function updateUser(userId, updates) {
  await db.query(
    'UPDATE users SET name = $1, avatar_url = $2 WHERE id = $3',
    [updates.name, updates.avatarUrl, userId]
  );
  // Write-through: update the cache with the new value
  const fresh = await db.query(
    'SELECT id, name, email, avatar_url FROM users WHERE id = $1',
    [userId]
  );
  await redis.set(`user:${userId}`, JSON.stringify(fresh), { EX: 300 });
}
Write-through is heavier but means the cache is always warm after a write. Cache-aside (delete on write) is simpler and equally correct — it just means the next read pays a miss.
Common mistake
Forgetting to invalidate related keys. You update user:42 in the DB and bust user:42 from the cache — but you also have a cached key user:42:dashboard and a list key users:all that contains User 42's data. If you don't bust those too, they serve stale data. The solution is disciplined key naming and a list of all keys that depend on a given record (covered in Section 4).
🗝️ Section 4 — Cache Keys: The Design That Makes or Breaks Everything
Mental model
A cache key is the address of a cached value. Keys must be:
- Unique — two different pieces of data must never share a key
- Deterministic — the same request always produces the same key
- Namespaced — grouped by entity so you can bust related keys together
- User-scoped when needed — private data must include the user ID
The most common format: entity:id:variant
// Good key patterns
`product:${productId}` // single product
`product:${productId}:related` // related products list
`user:${userId}:profile` // user profile (private)
`user:${userId}:orders:page:${page}` // paginated orders
`leaderboard:weekly:top100` // shared, public list
`search:${encodeURIComponent(query)}` // search results
// Dangerous — never do these
`product` // no ID: all products share one key
`${userId}` // no namespace: collides with product IDs
`user_profile` // no user ID: User A gets User B's profile
Busting a group of related keys
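When a product is updated, you may need to bust product:42, product:42:related, and any search cache entries that included it. Redis has no delete-by-pattern command, but you can list matching keys with KEYS and then delete them. Use this with caution: KEYS scans the entire keyspace and blocks the server while it runs.
// Bust all keys for a product (use sparingly on large caches)
async function bustProductCache(productId) {
  // Match "product:42:..." explicitly; a bare `product:42*` would also match product:420
  const variantKeys = await redis.keys(`product:${productId}:*`);
  await redis.del([`product:${productId}`, ...variantKeys]);
}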
For large caches, prefer tracking which keys to bust in a separate structure (a Redis Set per entity) rather than pattern-scanning.
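A minimal sketch of that tracking idea, using in-memory `Map` and `Set` objects as stand-ins so it runs without a server. `cacheSet` and `bustEntity` are hypothetical helpers; in Redis the same bookkeeping would roughly correspond to SADD when storing, then SMEMBERS and DEL when busting.

```javascript
// In-memory stand-ins (assumptions for this sketch).
const store = new Map();      // cached values, plays the role of Redis strings
const dependents = new Map(); // entity tag -> Set of cache keys that depend on it

// Store a value and record which entities it was built from.
function cacheSet(key, value, entityTags) {
  store.set(key, value);
  for (const tag of entityTags) {
    if (!dependents.has(tag)) dependents.set(tag, new Set());
    dependents.get(tag).add(key); // roughly: SADD tag key
  }
}

// Delete every cached key that depends on the given entity.
function bustEntity(tag) {
  const keys = dependents.get(tag) ?? new Set(); // roughly: SMEMBERS tag
  for (const key of keys) store.delete(key);     // roughly: DEL key
  dependents.delete(tag);
  return keys.size; // number of keys busted
}

cacheSet('product:42', '{"price":29.99}', ['product:42']);
cacheSet('product:42:related', '[7, 9]', ['product:42']);
cacheSet('search:shoes', '[42, 7]', ['product:42', 'product:7']);
cacheSet('product:420', '{"price":5}', ['product:420']);
```

Busting `product:42` here removes three keys, including the search result that contained it, and leaves `product:420` alone with no keyspace scan.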
Why this is the hardest part
Cache invalidation is famously one of the two hard problems in computer science ("cache invalidation, naming things, and off-by-one errors"). The difficulty isn't the mechanics — it's knowing all the places a piece of data appears in your cache. A user's name might be in user:42:profile, post:88:author, comment:200:author, search:john, and leaderboard:weekly:top100. Update the name and forget one → users see inconsistent data.
The discipline: every time you add a new cached view of an entity, document it. Keep a comment near your cache layer listing all keys for each entity type.
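If a comment feels too easy to forget, the same list can live in code. The sketch below assumes a hypothetical `cacheKeysFor` registry with one function per entity type; the key names are taken from the examples in this lesson.

```javascript
// Hypothetical registry: one function per entity type that returns every
// cache key derived directly from that entity's ID. New cached views of an
// entity get added here, next to the others.
const cacheKeysFor = {
  user: (userId) => [
    `user:${userId}:profile`,
    `user:${userId}:dashboard`,
  ],
  product: (productId) => [
    `product:${productId}`,
    `product:${productId}:related`,
  ],
};

function keysToBust(entityType, id) {
  if (!Object.prototype.hasOwnProperty.call(cacheKeysFor, entityType)) {
    throw new Error(`Unknown entity type: ${entityType}`);
  }
  return cacheKeysFor[entityType](id);
}

console.log(keysToBust('user', 42));
// [ 'user:42:profile', 'user:42:dashboard' ]
```

This only covers keys that can be computed from the entity's own ID. Keys such as post:88:author cannot be derived from a user ID alone, so they still need a reverse lookup or a tracking structure.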
Common mistake
Building cache keys with user-controlled input without sanitizing it. If your key is search:${req.query.q} and a user sends q=shoes:page:2, the resulting key search:shoes:page:2 looks exactly like a key your own code might build for page 2 of a "shoes" search. The raw colon lets input masquerade as key structure and collide with other entries. Always normalize and encode user input in keys.
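A minimal sketch of normalizing and encoding a search query before it becomes part of a key. `searchKey` is a hypothetical helper, and the normalization rules (trim, lowercase, collapse whitespace) are assumptions you would adapt to your own search semantics.

```javascript
// Hypothetical helper: builds a cache key from untrusted search input.
function searchKey(rawQuery) {
  const normalized = String(rawQuery)
    .trim()
    .toLowerCase()
    .replace(/\s+/g, ' '); // collapse runs of whitespace

  // encodeURIComponent escapes ":" (as %3A), so user input cannot
  // introduce extra key segments.
  return `search:${encodeURIComponent(normalized)}`;
}

console.log(searchKey('  Admin:Secret ')); // search:admin%3Asecret
console.log(searchKey('Red   Shoes'));     // search:red%20shoes
```

Normalizing also improves the hit rate, since "Red Shoes" and "red   shoes" now share one cache entry.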
🚫 Section 5 — What NOT to Cache
Mental model
Not all data belongs in a cache. Caching the wrong things creates security holes, correctness bugs, and wasted memory. Here's the explicit list of what to keep out of your cache:
| Do not cache | Why |
|---|---|
| Passwords or hashed passwords | Security — never put credentials in a shared store |
| Session tokens (in a shared cache) | Each token is already a fast indexed lookup in the DB; a cache adds exposure |
| Private user data without user-scoped keys | Leaks User A's data to User B |
| Data that changes on every request | Cache miss rate is 100% — you get the cost with none of the benefit |
| Real-time balances or inventory counts | A stale "3 items left" is a real business error |
| Partially-written data | Never cache inside a transaction before it commits |
What IS worth caching
// Great caching candidates: high read, low write, public or user-scoped
const GOOD_CACHE_TARGETS = [
  'Public product catalog', // changes rarely, read constantly
  'Blog post / article content', // mostly static after publish
  'Config / feature flags', // read on every request, changes rarely
  'Aggregated stats (leaderboards)', // expensive to compute, slightly stale is OK
  'External API responses', // rate-limited, slow, idempotent
  'Compiled templates or assets', // pure computation, same input = same output
];
// Poor caching candidates: avoid or use very short TTLs
const BAD_CACHE_TARGETS = [
  'Live inventory counts', // oversell risk
  'Payment or order status', // must be real-time accurate
  'Authentication state', // security-critical
  'User-specific recommendations', // personalized, changes with behavior
  'Anything with side effects', // caching a POST response makes no sense
];
Why caching external API responses is underrated
If your app calls a weather API, a geocoding API, or a currency conversion API, those responses are often rate-limited and slow. A TTL cache for those responses — even 60 seconds — slashes latency and protects you from hitting rate limits under traffic spikes.
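async function getWeather(city) {
  const key = `weather:${encodeURIComponent(city.toLowerCase())}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);
  // Encode the city so spaces and special characters produce a valid URL
  const response = await fetch(
    `https://api.weather.com/v1/current?city=${encodeURIComponent(city)}`
  );
  // Never cache an error response
  if (!response.ok) throw new Error(`Weather API returned ${response.status}`);
  const json = await response.json();
  await redis.set(key, JSON.stringify(json), { EX: 300 }); // 5 min: weather doesn't change per second
  return json;
}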
Common mistake
Adding a cache "just in case" to every endpoint before profiling. Caches add complexity, memory usage, and a whole class of correctness bugs. Profile first, cache the proven bottlenecks, keep the rest simple.
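"Profile first" can start very small. The sketch below assumes a hypothetical `timed` wrapper that records call counts and total time per label; it relies on the global `performance.now()` available in modern Node.js and browsers.

```javascript
// Hypothetical profiling helper: label -> { calls, totalMs }
const stats = new Map();

// Runs fn, records how long it took under the given label, and
// returns (or rethrows) whatever fn produced.
async function timed(label, fn) {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    const entry = stats.get(label) ?? { calls: 0, totalMs: 0 };
    entry.calls += 1;
    entry.totalMs += performance.now() - start;
    stats.set(label, entry);
  }
}
```

Wrap the reads you suspect, for example `timed('getProduct', () => getProduct(42))`, let real traffic run, and cache only the labels that dominate `totalMs`.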
🛠️ Your Mission
Find one expensive read in your app — a query that runs on every page load, or that scans a large table to compute a number. Add cache-aside caching to it with a sensible TTL, and wire up correct invalidation on the write path.
Concretely:
- Identify the slow read (look for queries without indexes, aggregations, or external API calls).
- Pick a Redis key name using the entity:id:variant pattern.
- Implement the cache-aside read: check cache → miss → query → store with TTL → return.
- Find the write(s) that change this data (UPDATE, DELETE, POST).
- Add redis.del(key) after each successful write (invalidate on write pattern).
- Test the cold path (delete the key, confirm it re-queries), the warm path (confirm it hits the cache), and the invalidation (update the data, confirm the next read returns fresh data).
If you don't have a real project, implement this for a products table: cache a product read for 60 seconds, and bust the cache when the product's price is updated.
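A rough sketch of that products exercise, with everything swapped for in-memory stand-ins so the cold path, warm path, and invalidation can be exercised without Redis or a database. The fake clock `now`, the `dbReads` counter, and `productsTable` are all assumptions for illustration; a real version would use `Date.now()`, Redis, and your DB client.

```javascript
// In-memory stand-ins (assumptions for this sketch).
const TTL_MS = 60 * 1000;
let now = 0;     // fake clock in ms; real code would call Date.now()
let dbReads = 0; // counts how often the "DB" is touched
const productsTable = new Map([[1, { id: 1, name: 'Mug', price: 29.99 }]]);
const cache = new Map();

async function getProduct(id) {
  const key = `product:${id}`;
  const entry = cache.get(key);
  if (entry && now < entry.expiresAt) return entry.data; // warm path

  dbReads += 1; // cold path: missing or expired
  const product = productsTable.get(id) ?? null;
  cache.set(key, { data: product, expiresAt: now + TTL_MS });
  return product;
}

async function updatePrice(id, price) {
  const row = productsTable.get(id);
  if (!row) throw new Error(`No product ${id}`);
  productsTable.set(id, { ...row, price }); // 1. write the source of truth
  cache.delete(`product:${id}`);            // 2. then bust the cache
}
```

Two reads in a row should cost one DB read. A price update followed by a read should return the new price and cost one more. Advancing `now` by 60 seconds should force another.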
✅ You're done when…
- Your cached read returns data in under 5ms on a warm cache hit; confirm it with console.time or your profiler (Production-Readiness: p95 latency in range).
- Your cache key includes the entity type and ID, and private data keys include the user ID.
- A write to the underlying data triggers redis.del() on the relevant key(s) before the response returns.
- You can name at least one piece of data in your app that you deliberately chose NOT to cache and explain why.
- You've set a TTL on every cached entry; no entry lives forever.
➡️ Next: Search.