📐 Architecture · Lesson 11 of 13

Caching Layers & CDNs

Make big things fast with caching layers and CDNs — at every level of the stack.

Stage 3 · Architecture & System Design · B.U.I.L.D. letter: I

You shipped the app. Users are complaining it's slow. Your first instinct is to upgrade the server, but the real fix can cost $0 and take an afternoon. Most production systems are slower than they need to be because requests travel further than they have to. Caching is the art of meeting the request closer to where it originated. Master the layers and you can make your app feel instant at almost any scale.


⚠️ The vibe trap

You vibe-coded a working app and every asset — your logo, your bundled JS, your hero image — is fetched fresh from your origin server on every single page load. Works fine with ten users. With ten thousand, your server is melting and your users in São Paulo are waiting 800ms for a 200 KB image that never changes. The second trap is the mirror image: you shoved a CDN in front of everything and now user account pages are being cached and served to the wrong person. Both problems have the same root cause — no intentional caching strategy, just vibes all the way down.


🗺️ The Cache Hierarchy — Every Layer Absorbs Load From the Next

Think of caching as a series of tollbooths placed between the user and your database. Each booth tries to answer the request itself. Only when it can't does the request travel further.

REQUEST JOURNEY (outermost → innermost)
────────────────────────────────────────────────────────────
[Browser Cache]        → served from user's own disk
        ↓ miss
[CDN / Edge Node]      → served from a PoP 20ms away
        ↓ miss
[Reverse Proxy Cache]  → served from Nginx/Varnish at your origin
        ↓ miss
[App / In-Memory Cache]→ served from RAM inside your process
        ↓ miss
[Distributed Cache]    → served from Redis / Memcached
        ↓ miss
[DB Query Cache]       → served from database's own buffer/result cache
        ↓ miss
[Storage / Disk]       ← actual data read, result travels back up
────────────────────────────────────────────────────────────
Each layer that answers a request is a layer that never touches the one below it.

Mental model: The cache hierarchy is a pyramid. The further out you cache, the cheaper and faster the answer, and the more users you can serve with a single cached copy. One CDN-cached image can serve 100,000 users with only a handful of origin hits (roughly one per PoP each time the copy expires). One in-process memoized value serves only your single server's traffic.

Why it matters: Without understanding layers, engineers add Redis for everything and wonder why static files still hammer origin bandwidth. With the model, you add caching at the right layer for each type of content.

Common mistake: Treating "caching" as one thing. Redis is not the same decision as a CDN. An in-memory LRU cache inside your Node process is not Redis. Each layer has different trade-offs on staleness, cost, and who shares the cache.
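
To make the fall-through concrete, here is a minimal sketch of the hierarchy as a chain of lookups. It is a toy model, not how any real browser or CDN is implemented: each layer is a plain Map, and createHierarchy, lookup, and the layer names are made up for illustration.

```javascript
// Toy model of the cache hierarchy. Each layer is a Map; a real system
// would have a browser cache, a CDN, Redis, and so on in these positions.
function createHierarchy(layerNames, loadFromStorage) {
  const layers = layerNames.map((name) => ({ name, store: new Map() }));

  return function lookup(key) {
    for (let i = 0; i < layers.length; i++) {
      if (layers[i].store.has(key)) {
        const value = layers[i].store.get(key);
        // Backfill the outer layers that missed so the next request stops sooner.
        for (let j = 0; j < i; j++) layers[j].store.set(key, value);
        return { value, servedBy: layers[i].name };
      }
    }
    // Every layer missed: read from storage and populate all layers on the way back.
    const value = loadFromStorage(key);
    for (const layer of layers) layer.store.set(key, value);
    return { value, servedBy: 'storage' };
  };
}

let storageReads = 0;
const lookup = createHierarchy(['browser', 'cdn', 'redis'], (key) => {
  storageReads += 1;
  return `value-for-${key}`;
});

const first = lookup('logo.png');
const second = lookup('logo.png');
console.log(first.servedBy, second.servedBy, storageReads); // storage browser 1
```

The second request never gets past the outermost layer, which is the whole point of the diagram above.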


🌐 Browser Cache & HTTP Headers — the Outermost Layer

The browser cache is the most powerful cache you're not using. It's free, it's fast (literally disk reads on the user's machine), and it's controlled entirely by HTTP headers your server sends.

# For static assets that never change (hashed filenames like main.abc123.js)
HTTP/1.1 200 OK
Cache-Control: public, max-age=31536000, immutable
Content-Type: application/javascript

# For an HTML page that should revalidate every visit
HTTP/1.1 200 OK
Cache-Control: no-cache
ETag: "33a64df551425fcc55e4d42a148795d9f25f89d"

# For a user's dashboard — must NOT be cached by shared layers
HTTP/1.1 200 OK
Cache-Control: private, no-store

Key directives explained:

Directive    Meaning
public       Any cache (CDN, proxy, browser) may store this
private      Only the end-user's browser; no shared cache
max-age=N    Serve from cache for N seconds without revalidating
immutable    Don't even bother revalidating; this URL will never change
no-cache     Always revalidate with the server (uses ETag / Last-Modified)
no-store     Never cache, not even briefly. Use for sensitive data.

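ETag flow: On the next request the browser sends If-None-Match: "33a64df5...". The server compares it with the resource's current ETag; if the resource hasn't changed, it replies 304 Not Modified with no body. The cost is one round trip and a few hundred bytes of headers instead of the full payload.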
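
The revalidation step can be sketched in a few lines. This is a simplified illustration, not a complete implementation: etagFor and respondWithEtag are hypothetical helper names, hashing the body is only one way to derive an ETag, and a real server would also handle comma-separated lists, weak validators (W/"..."), and * in If-None-Match.

```javascript
const { createHash } = require('node:crypto');

// Derive an ETag from the response body. Hashing the body is one common
// choice; a version number or mtime+size would also work.
function etagFor(body) {
  const hash = createHash('sha256').update(body).digest('hex');
  return `"${hash.slice(0, 32)}"`;
}

// Decide between 200 (full body) and 304 (no body) for a GET request.
// `requestHeaders` uses lower-case names, as Node's http module provides them.
// Simplification: only an exact single-value If-None-Match match is handled.
function respondWithEtag(requestHeaders, body) {
  const etag = etagFor(body);
  const headers = { 'cache-control': 'no-cache', etag };
  if (requestHeaders['if-none-match'] === etag) {
    return { status: 304, headers, body: null };
  }
  return { status: 200, headers, body };
}

const firstVisit = respondWithEtag({}, '<h1>Hello</h1>');
const revisit = respondWithEtag({ 'if-none-match': firstVisit.headers.etag }, '<h1>Hello</h1>');
const afterDeploy = respondWithEtag({ 'if-none-match': firstVisit.headers.etag }, '<h1>Hello v2</h1>');
console.log(firstVisit.status, revisit.status, afterDeploy.status);
```

The revisit carries no body at all, while a changed page produces a new fingerprint and a normal 200.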

Mental model: max-age is an expiry date. ETag is a fingerprint. Use both: long max-age for assets with hashed filenames (they never get stale because the filename changes on deploy), short max-age + ETag for pages that might change.

Why it matters: A correctly cached asset is not downloaded again while it stays fresh in the browser's cache. For a 500 KB JavaScript bundle, that is 500 KB saved on every subsequent page load, per user, until the next deploy changes the filename.

Common mistake: Setting Cache-Control: no-cache on static JS/CSS/images "to be safe." This forces a server round-trip on every page load for assets that never change. Use hashed filenames and immutable instead.
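
One way to keep these choices intentional is to name the content classes once and look the header up, rather than typing header strings in every route. The sketch below is an assumption about how you might organize it: CACHE_POLICIES, cacheControlFor, and the three class names are hypothetical, and the header values are the ones from the examples above.

```javascript
// Hypothetical content classes mapped to the Cache-Control values shown above.
const CACHE_POLICIES = new Map([
  ['hashedAsset', 'public, max-age=31536000, immutable'], // main.abc123.js and friends
  ['htmlPage', 'no-cache'],                               // revalidate on every visit
  ['userSpecific', 'private, no-store'],                  // never stored by shared caches
]);

// Throw on unknown classes instead of falling back to a default, so a typo
// cannot silently make a private route public.
function cacheControlFor(kind) {
  const value = CACHE_POLICIES.get(kind);
  if (value === undefined) {
    throw new Error(`No cache policy defined for "${kind}"`);
  }
  return value;
}

console.log(cacheControlFor('hashedAsset'));
console.log(cacheControlFor('userSpecific'));
```

Failing loudly on an unknown class is a deliberate design choice here: a missing policy should be a bug you see in development, not a header you discover in production.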


🔵 CDN & Edge Caching — Static Assets Belong at the Edge

A CDN (Content Delivery Network) is a globally distributed fleet of servers — called Points of Presence (PoPs) — that cache your content close to your users. When a user in Lagos requests your logo, they get it from a PoP in Lagos, not your server in us-east-1.

What to put at the edge:

  • All static assets (images, fonts, JS bundles, CSS)
  • Publicly cacheable API responses (e.g., a product catalog)
  • Rendered HTML for pages that are the same for all users

What must NOT be at the edge (shared CDN layer):

  • Authenticated API responses (user profile, orders, anything with a session)
  • Any response that differs per user
  • Anything containing tokens, PII, or payment data

CDN behavior in practice:

First request to CDN (cache MISS):
User → CDN PoP (cache miss) → Origin Server → CDN stores copy → User

Subsequent requests (cache HIT):
User → CDN PoP (cache hit) → User
         ↑ Origin never contacted

Edge caching headers: CDNs respect Cache-Control. Set public, max-age=3600, s-maxage=86400. s-maxage overrides max-age for shared caches (like CDNs), while the browser still uses max-age. This lets you cache something for 1 hour in the browser but 24 hours at the CDN.

HTTP/1.1 200 OK
Cache-Control: public, max-age=3600, s-maxage=86400
Vary: Accept-Encoding
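
The split between max-age and s-maxage can be illustrated with a small parser. This is a rough sketch under simplifying assumptions, not a full HTTP caching implementation: parseCacheControl and freshnessSeconds are hypothetical names, and heuristic freshness (what caches do when no lifetime is given) is ignored and treated as 0.

```javascript
// Parse a Cache-Control header into a map of directive -> value (or true).
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(',')) {
    const [rawName, rawValue] = part.trim().split('=');
    if (!rawName) continue;
    directives[rawName.toLowerCase()] =
      rawValue === undefined ? true : rawValue.replace(/"/g, '');
  }
  return directives;
}

// Seconds a response may be served without revalidation by a given kind of cache.
// Shared caches (CDN, proxy) prefer s-maxage; browsers ignore it.
function freshnessSeconds(header, { shared }) {
  const d = parseCacheControl(header);
  if (d['no-store'] || d['no-cache']) return 0; // must not reuse without asking
  if (shared && d.private) return 0;            // shared caches must not store it
  if (shared && d['s-maxage'] !== undefined) return Number(d['s-maxage']);
  return d['max-age'] !== undefined ? Number(d['max-age']) : 0;
}

const header = 'public, max-age=3600, s-maxage=86400';
console.log(freshnessSeconds(header, { shared: false })); // 3600
console.log(freshnessSeconds(header, { shared: true }));  // 86400
```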

Mental model: Your CDN is a read-through cache sitting in front of your origin. It eats the majority of your global traffic. Your origin only sees cache misses — which, for static assets after the first request, should be almost nothing.

Why it matters: A user 12,000 km from your server pays ~120ms of round-trip propagation delay in fiber alone (light travels at roughly 200,000 km/s in glass), before any processing. A CDN PoP 50 km from that user costs ~0.5ms. This is not a micro-optimization; it is the difference between a snappy app and a frustrating one.
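
The latency figures can be checked with back-of-the-envelope arithmetic. The sketch assumes light travels at about 200,000 km/s in fiber (roughly two thirds of its speed in a vacuum) and that the cable runs in a straight line, which real routes never do, so treat the results as lower bounds. roundTripMs is a hypothetical helper name.

```javascript
// Approximate speed of light in optical fiber: 200,000 km/s = 200 km per millisecond.
const FIBER_KM_PER_MS = 200;

// Round-trip propagation delay for a straight-line fiber path.
// Ignores routing detours, queuing, TLS handshakes, and server processing.
function roundTripMs(distanceKm) {
  return (2 * distanceKm) / FIBER_KM_PER_MS;
}

console.log(roundTripMs(12000)); // 120
console.log(roundTripMs(50));    // 0.5
```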

Common mistake: Only putting images on the CDN and still serving JS/CSS from origin. Your JavaScript bundle is often the largest, most-requested asset you have. It belongs at the edge most of all.


🔴 Reverse Proxy Cache — Nginx/Varnish at Your Origin

Between the CDN and your application sits a reverse proxy (Nginx, Varnish, Caddy). It can cache responses too, which is useful for content that changes too often to cache at the CDN for long but is still identical for many users.

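# Nginx micro-cache: cache successful GET/HEAD responses for 1 second.
# proxy_cache_path belongs in the http {} context; the location goes inside a server {} block.
proxy_cache_path /tmp/nginx-cache levels=1:2 keys_zone=micro:10m inactive=60s;

location /api/public/ {
    proxy_pass http://app_backend;       # assumption: an upstream named app_backend
    proxy_cache micro;
    proxy_cache_valid 200 1s;
    proxy_cache_lock on;                 # collapse concurrent misses into one upstream request
    proxy_cache_use_stale updating;      # serve the stale copy while a refresh is in flight
    add_header X-Cache-Status $upstream_cache_status;
}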

A 1-second micro-cache sounds trivial, but at 500 requests/second for the same URL it means roughly one request per second reaches your app server. The other 499 are served from cache: a 500x reduction in load during a traffic spike.
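
Here is the micro-cache arithmetic as a small calculation. It is an idealized model that assumes all requests hit the same cached URL and that concurrent misses are collapsed into a single upstream request; microCacheEffect is a hypothetical name.

```javascript
// Idealized effect of a TTL-based micro-cache on one hot URL.
function microCacheEffect(requestsPerSecond, ttlSeconds) {
  // At most one upstream request per TTL window, and never more than the
  // incoming rate itself.
  const originPerSecond = Math.min(requestsPerSecond, 1 / ttlSeconds);
  return {
    originPerSecond,
    hitRatio: 1 - originPerSecond / requestsPerSecond,
    reductionFactor: requestsPerSecond / originPerSecond,
  };
}

const spike = microCacheEffect(500, 1);
console.log(spike.originPerSecond, spike.reductionFactor); // 1 500
console.log(spike.hitRatio.toFixed(3));                    // 0.998
```

At 500 requests/second, 499 of every 500 requests are absorbed, which is a 500-fold drop in what the app server sees. At low traffic (less than one request per TTL window) the cache does nothing, which is why micro-caching only pays off on hot routes.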

Mental model: The reverse proxy cache is your last line of defense before compute costs money. The CDN filters global traffic; the reverse proxy cache filters the traffic that actually reaches your datacenter.

Common mistake: Bypassing the reverse proxy entirely in development, shipping to production with no proxy cache config, and wondering why the app server falls over under load.


🟡 App & Distributed Cache — Redis and In-Process Memory

You built this in D2 (database caching, cache-aside pattern, Redis invalidation — go back to that lesson if you need the mechanics). The architectural point here is where Redis sits in the hierarchy and what it's for.

// Two-level read: in-process Map (L1), then Redis (L2), then the database.
// Assumes `redis` is an ioredis-style client and `db` is your query client.
// Note: this Map is unbounded and never expires; use an LRU with a TTL in production.
const memo = new Map();

async function getProductCategory(id) {
  if (memo.has(id)) return memo.get(id);             // ~0ms, local RAM

  const cached = await redis.get(`cat:${id}`);       // ~1ms, network
  if (cached !== null) {
    const value = JSON.parse(cached);                // parse once, reuse
    memo.set(id, value);
    return value;
  }

  const row = await db.query('SELECT ...', [id]);    // ~5-50ms, DB round-trip
  await redis.setex(`cat:${id}`, 300, JSON.stringify(row)); // 300s TTL in Redis
  memo.set(id, row);
  return row;
}

In-process cache (Map/LRU): Fastest. Lives in your server's RAM. Invisible to other server instances. Great for small, hot, rarely-changing lookup tables (config, feature flags, category trees).

Redis / distributed cache: Slightly slower (network hop), but shared across all instances. The right layer for: computed aggregates, rendered page fragments, session data, rate-limit counters, anything that multiple server instances need to agree on.

DB query cache: Databases keep recently read pages in memory (PostgreSQL's shared buffers, InnoDB's buffer pool) and may cache query plans; some also cache result sets (MySQL did until its query cache was removed in 8.0). Don't rely on it; it's the last resort, not the strategy.
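
A bare Map used as an in-process cache grows without bound. A minimal sketch of a size-capped LRU follows, relying on the fact that a JavaScript Map iterates in insertion order. LruCache is a hypothetical class written for illustration; it has no TTL, so a production version (or an existing library) would add expiry.

```javascript
// Minimal least-recently-used cache built on Map's insertion ordering.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }
}

const categories = new LruCache(2);
categories.set('a', 'Books');
categories.set('b', 'Games');
categories.get('a');          // touch 'a' so 'b' becomes the oldest
categories.set('c', 'Music'); // evicts 'b'
console.log([...categories.entries.keys()]); // [ 'a', 'c' ]
```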

Cache stampede: When a cached key expires and hundreds of concurrent requests all miss simultaneously, they all hit the database at once, overwhelming it, and then all write the same value back to cache. Two common mitigations: probabilistic early expiration (each request has a small, growing chance of refreshing the value shortly before it expires) or a distributed lock that lets only one process regenerate the value while the rest wait.
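
Probabilistic early expiration can be sketched as a single decision function. This follows the commonly described approach of adding exponentially distributed jitter scaled by the recompute time; shouldRefreshEarly and its parameter names are hypothetical, and beta = 1 is an assumed default you would tune.

```javascript
// Returns true when this request should recompute the value now, even though
// the cached entry may not have expired yet.
// `random` is injectable so the behaviour can be tested deterministically.
function shouldRefreshEarly({ nowMs, expiresAtMs, recomputeMs, beta = 1, random = Math.random }) {
  // -log(random()) is a positive, exponentially distributed number. The closer
  // the entry is to expiry (and the slower the recompute), the likelier a refresh.
  const jitterMs = recomputeMs * beta * -Math.log(random());
  return nowMs + jitterMs >= expiresAtMs;
}

// Far from expiry with no jitter: keep serving the cached value.
console.log(shouldRefreshEarly({ nowMs: 1000, expiresAtMs: 2000, recomputeMs: 100, random: () => 1 }));
// Already expired: always refresh.
console.log(shouldRefreshEarly({ nowMs: 2000, expiresAtMs: 2000, recomputeMs: 100, random: () => 1 }));
```

Because each request rolls its own dice, refreshes spread out over the window before expiry instead of arriving together at the moment the key disappears.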

Mental model: Redis is the shared L2 cache for your entire fleet. In-process memory is the L1 cache for a single instance. Use both for hot data: L1 absorbs repeated reads within a single request burst, L2 absorbs reads across instances.

Common mistake: Using Redis for everything, including user-specific and sensitive data, without scoping the keys. Redis is shared by every request your fleet handles; if a cache key omits the user ID (for example profile instead of profile:<userId>), one user's data can be returned to another. Include the user or session ID in every key that holds per-user data, give it a TTL, and keep the HTTP response itself marked Cache-Control: private.
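
A small helper can make unscoped keys hard to write by accident. The key format and the name userScopedKey are assumptions for illustration; the only point is that the user ID is mandatory.

```javascript
// Build a cache key for per-user data. Refuses to produce a key without a user ID,
// so a missing ID fails loudly instead of creating a key shared by everyone.
function userScopedKey(userId, resource) {
  if (userId === undefined || userId === null || userId === '') {
    throw new Error(`Refusing to build an unscoped cache key for "${resource}"`);
  }
  return `user:${userId}:${resource}`;
}

console.log(userScopedKey(42, 'profile')); // user:42:profile
```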


🔒 What Must Never Be Cached at Shared Layers

This deserves its own section because the consequences are severe.

Never cache at the CDN or reverse proxy, or in shared Redis under a key that is not scoped to the user:

✗  /api/user/profile             — personal data, different per user
✗  /api/orders/123               — transaction data
✗  /dashboard                    — page content shaped by auth state
✗  Any response with Set-Cookie  — session cookies must not be shared
✗  Anything containing a JWT or API key in the body
✗  Payment confirmation pages
✗  Anything behind an auth check, since a cached 200 would be served to unauthenticated requests

The CDN cache-leak bug (sometimes loosely called cache poisoning): A developer caches /api/user at the CDN. User A logs in and the CDN caches the response. User B requests /api/user and the CDN serves User A's profile. This is a data breach, and it has happened to real companies.

The rule: If the response would be different for two different users, it must carry Cache-Control: private (for browser-only caching) or Cache-Control: no-store (no caching at all). Never public.
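
The rule can be turned into a check you run in tests against your own responses. The sketch below is intentionally stricter than what HTTP caches actually do (they may store some responses that have no explicit public directive): it only allows shared caching when the response opts in and carries no cookie. mayStoreInSharedCache is a hypothetical name, and header names are assumed to be lower-cased.

```javascript
// Conservative allow-list: a response may go to a shared cache only if it
// explicitly opts in and carries nothing user-specific.
function mayStoreInSharedCache(responseHeaders) {
  const cacheControl = (responseHeaders['cache-control'] || '').toLowerCase();
  const directives = cacheControl.split(',').map((d) => d.trim().split('=')[0]);

  if ('set-cookie' in responseHeaders) return false;        // session cookies must not be shared
  if (directives.includes('private')) return false;
  if (directives.includes('no-store')) return false;
  return directives.includes('public') || directives.includes('s-maxage');
}

console.log(mayStoreInSharedCache({ 'cache-control': 'public, max-age=60, s-maxage=300' })); // true
console.log(mayStoreInSharedCache({ 'cache-control': 'private, no-store' }));                // false
```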


🛠️ Your Mission

Pick one real project — an app you've vibe-coded or are building now.

  1. Audit your static assets. Open DevTools → Network, hard reload, and look at the response headers for your JS, CSS, and images. Do any of them have Cache-Control? Most vibe-coded apps ship with none at all.

  2. Put your static assets behind a CDN. If you're on Vercel or Netlify, your /public folder is already CDN-distributed; confirm by checking the platform's cache-status response header (x-vercel-cache on Vercel, or cf-cache-status if Cloudflare is in front). If you're self-hosting, configure your CDN (Cloudflare has a free tier) to cache *.js, *.css, *.webp, *.woff2.

  3. Set correct Cache-Control on one route. Choose one public API route that returns the same data for all users (a list of products, a public post, a config endpoint). Add Cache-Control: public, max-age=60, s-maxage=300 and confirm the CDN is caching it.

  4. Mark one private route explicitly. Find a route that returns user-specific data. Add Cache-Control: private, no-store and verify in DevTools that no CDN or proxy is caching it.
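
For the audit in step 1, a small helper can flag the two most common problems once you have copied a URL and its Cache-Control value out of DevTools. This is a rough sketch: auditStaticAsset is a hypothetical name, and the regular expression is only an assumed heuristic for fingerprinted filenames such as main.abc123.js, so adjust it to your bundler's naming scheme.

```javascript
// Heuristic: a dot or dash, at least six hex characters, then a static file extension.
const FINGERPRINTED = /[.-][0-9a-f]{6,}\.(js|css|png|jpg|webp|svg|woff2)$/i;

// Returns a list of human-readable warnings for one static asset.
function auditStaticAsset({ url, cacheControl }) {
  const warnings = [];
  const value = (cacheControl || '').toLowerCase();
  // The base URL is a placeholder so relative paths can be parsed too.
  const path = new URL(url, 'https://example.invalid').pathname;
  const fingerprinted = FINGERPRINTED.test(path);

  if (!value) warnings.push('no Cache-Control header');
  if (fingerprinted && value && !value.includes('immutable')) {
    warnings.push('fingerprinted asset without a long-lived immutable policy');
  }
  if (!fingerprinted && value.includes('immutable')) {
    warnings.push('immutable on a URL without a content hash');
  }
  return warnings;
}

console.log(auditStaticAsset({ url: '/assets/main.abc123.js', cacheControl: 'public, max-age=31536000, immutable' }));
console.log(auditStaticAsset({ url: '/logo.png', cacheControl: '' }));
```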


✅ You're done when…

  • You can describe all six cache layers from browser to database in the correct order without looking at notes (reference: the Production-Readiness Checklist)
  • Your fingerprinted static assets (JS/CSS/images with hashed filenames) have Cache-Control: public, max-age=31536000, immutable and are served from an edge/CDN node, verified by a cf-cache-status: HIT or equivalent header in DevTools
  • Every authenticated or user-specific route in your app sends Cache-Control: private or no-store
  • You can explain cache stampede and state one mitigation strategy in plain English
  • You have reviewed your project against the rule: "if it differs per user, it must never be public"

➡️ Next: Idempotency, Retries & Designing for Failure.

Build It Right, Or Don't Build It At All. 🏛️
