⚙️ Backend & APIs · Lesson 12 of 13

Rate Limiting, Idempotency & Resilience

Rate limiting, idempotency keys, and retries — protect your API from abuse and chaos.


Stage 3 · Backend & APIs · B.U.I.L.D. letter: L

Every API you ship will be hit by a bot, retried by a mobile app on flaky Wi-Fi, and called by a billing service that cannot afford to charge a user twice — design for that reality before it finds you.


⚠️ The vibe trap

You built a login endpoint with no rate limit. A script tries 100,000 passwords in four minutes: your DB melts, your cloud bill spikes, and a user's account gets cracked. You built a "place order" endpoint with no idempotency key. The mobile app retries on a timeout and the customer is charged twice. Both incidents were completely preventable with patterns you'll learn in this lesson. Rate limiting and idempotency aren't polish; they're load-bearing walls.


🪣 Rate Limiting — Token Bucket & Fixed Window

A rate limit tells a caller: "you're allowed N requests per time window; beyond that you wait."

Two algorithms dominate production code:

| Algorithm | How it works | Best for |
|---|---|---|
| Fixed window | Count requests in each 60-second slot; reset the counter at the boundary | Simple; easy to reason about |
| Token bucket | A bucket refills at a steady rate; each request costs one token; bursts are allowed up to the bucket's capacity | Smoother; allows short bursts |
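The token bucket is only described in words above. Here is a minimal in-memory sketch for a single process, with one bucket per key in a Map. The names `TokenBucket` and `tryRemove` and the injectable `now` clock are illustrative, not a library API:

```javascript
// Token bucket sketch: each key gets a bucket holding up to `capacity` tokens.
// Tokens refill continuously at `refillPerSec`; each request spends one token.
class TokenBucket {
  constructor({ capacity = 10, refillPerSec = 1, now = () => Date.now() } = {}) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;           // injectable clock, handy for tests
    this.buckets = new Map(); // key → { tokens, lastRefill }
  }

  // Returns true if the request is allowed, false if the caller should get a 429.
  tryRemove(key) {
    const t = this.now();
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, lastRefill: t };

    // Refill in proportion to the time since the last request, never above capacity.
    const elapsedSec = (t - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSec);
    bucket.lastRefill = t;
    this.buckets.set(key, bucket);

    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Burst of 3, then one request per second.
let fakeNow = 0;
const bucket = new TokenBucket({ capacity: 3, refillPerSec: 1, now: () => fakeNow });
console.log([1, 2, 3, 4].map(() => bucket.tryRemove('ip:1.2.3.4'))); // [ true, true, true, false ]
fakeNow += 1000; // one second later, one token has refilled
console.log(bucket.tryRemove('ip:1.2.3.4')); // true
```

The design trade-off against the fixed window is that the bucket never resets all at once, so there is no spike at a boundary. The cost is storing a fractional token count and a timestamp per key.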

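For most Node/Express APIs, a fixed-window counter (in memory for one process, in Redis for several) is a sensible first move. It has one quirk worth knowing: a client can squeeze up to twice the limit into a short span that straddles a window boundary (N requests at 0:59, N more at 1:00).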

// middleware/rateLimiter.js
// Fixed-window rate limiter: no external deps, good for a single-process server.
// For multi-process / multi-instance, swap the Map for a Redis INCR + EXPIRE.
// Behind a proxy or load balancer, set app.set('trust proxy', ...) so req.ip is the client's IP.

/**
 * createRateLimiter(options)
 *   limit      : max requests per window  (default 100)
 *   windowMs   : window length in ms       (default 60_000)
 *   keyFn      : (req) => string key       (default: IP address)
 */
export function createRateLimiter({
  limit = 100,
  windowMs = 60_000,
  keyFn = (req) => req.ip,
} = {}) {
  // One store per limiter, so a strict login limiter and a looser global
  // limiter don't share (and corrupt) each other's counters.
  const windows = new Map(); // key → { count, resetAt }

  // Periodically drop expired windows so the Map doesn't grow forever.
  // unref() keeps this timer from holding the process open.
  setInterval(() => {
    const now = Date.now();
    for (const [key, entry] of windows) {
      if (now >= entry.resetAt) windows.delete(key);
    }
  }, windowMs).unref();

  return function rateLimitMiddleware(req, res, next) {
    const key = keyFn(req);
    const now = Date.now();

    let entry = windows.get(key);

    // Start a fresh window if none exists or the old one expired.
    if (!entry || now >= entry.resetAt) {
      entry = { count: 0, resetAt: now + windowMs };
      windows.set(key, entry);
    }

    entry.count += 1;

    // Tell the client about its quota no matter what.
    res.set('X-RateLimit-Limit', limit);
    res.set('X-RateLimit-Remaining', Math.max(0, limit - entry.count));
    res.set('X-RateLimit-Reset', Math.ceil(entry.resetAt / 1000)); // Unix seconds

    if (entry.count > limit) {
      const retryAfter = Math.ceil((entry.resetAt - now) / 1000);
      res.set('Retry-After', retryAfter);
      return res.status(429).json({
        error: 'Too Many Requests',
        retryAfter,
      });
    }

    next();
  };
}
// routes/auth.js: protect login with a strict per-IP limiter
import express from 'express';
import { createRateLimiter } from '../middleware/rateLimiter.js';

const router = express.Router();

const loginLimiter = createRateLimiter({
  limit: 10,             // 10 attempts …
  windowMs: 15 * 60_000, // … per 15 minutes per IP
});

router.post('/login', loginLimiter, async (req, res) => {
  // your existing login handler
});

export default router;
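The header comment suggests swapping the Map for Redis INCR + EXPIRE once you run more than one instance. Below is one way that could look, as a sketch. It assumes a client object with promise-returning `incr(key)` and `expire(key, seconds)` methods, which is the shape node-redis v4 and ioredis both expose. `hitFixedWindow` is a hypothetical helper name. Each counter is keyed by the window index, so every instance agrees on where a window starts:

```javascript
// Hypothetical shared-store variant of the fixed-window limiter.
// `client` is assumed to expose promise-returning incr(key) and expire(key, seconds).
async function hitFixedWindow(client, key, { limit = 100, windowMs = 60_000, now = Date.now() } = {}) {
  // Align windows to the clock so every server instance uses the same boundaries.
  const windowIndex = Math.floor(now / windowMs);
  const counterKey = `ratelimit:${key}:${windowIndex}`;

  // INCR is atomic in Redis, so concurrent hits from many instances are all counted.
  const count = await client.incr(counterKey);
  if (count === 1) {
    // First hit in this window: let the counter expire once the window is over.
    await client.expire(counterKey, Math.ceil(windowMs / 1000));
  }

  const resetAt = (windowIndex + 1) * windowMs; // ms since epoch
  return { allowed: count <= limit, remaining: Math.max(0, limit - count), resetAt };
}
```

Inside the middleware you would call `await hitFixedWindow(redisClient, keyFn(req), { limit, windowMs })` and set the same headers from the returned object. If the process dies between INCR and EXPIRE, that counter never expires. Wrapping both commands in a MULTI transaction or a short Lua script closes that gap.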

Mental model: your server is a bouncer at a velvet rope. Regular users never notice it. Bots and runaway clients hit the rope and get a polite "come back in 30 seconds."

Why 429 + Retry-After? RFC 6585 defines 429 Too Many Requests for exactly this case. Including Retry-After lets well-behaved clients back off automatically instead of hammering you harder trying to get through.

Common mistake: rate-limiting only by IP. A determined attacker rotates IPs. Layer your limits: IP for anonymous endpoints, user ID for authenticated ones, API key for third-party integrations.
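The layering can be expressed as key functions for the `keyFn` option of `createRateLimiter`. This is a sketch under two assumptions: your auth middleware sets `req.user`, and integrations send an `X-API-Key` header. Both function names are hypothetical:

```javascript
// Hypothetical key functions for createRateLimiter's keyFn option.
// Assumes auth middleware sets req.user and integrations send X-API-Key.
function layeredRateLimitKey(req) {
  const apiKey = req.headers['x-api-key'];
  if (apiKey) return `apikey:${apiKey}`;          // third-party integrations
  if (req.user?.id) return `user:${req.user.id}`; // authenticated users
  return `ip:${req.ip}`;                          // anonymous callers
}

// For login, also count attempts per target account, so rotating IPs
// doesn't buy an attacker more guesses against one user.
function loginAccountKey(req) {
  const email = String(req.body?.email ?? '').trim().toLowerCase();
  return `login:${email}`;
}
```

On the login route you could stack both limiters: `router.post('/login', loginLimiter, createRateLimiter({ limit: 20, windowMs: 15 * 60_000, keyFn: loginAccountKey }), handler)`. Body parsing must run before the account limiter. A per-account limit also lets someone slow down a victim's own logins, which is one reason to set it looser than the per-IP limit.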


🔑 Idempotency Keys — Retried Requests That Can't Double-Fire

An idempotent operation can be applied multiple times with the same effect as applying it once. HTTP defines GET, PUT, and DELETE as idempotent. POST is not: calling POST /payments twice creates two charges.
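A toy illustration of the difference follows. Two hypothetical functions stand in for route handlers, and a Map stands in for the database:

```javascript
// Toy in-memory "server" contrasting an idempotent PUT with a non-idempotent POST.
const orders = new Map();
let nextId = 1;

// PUT /orders/:id replaces the resource at a known id: repeating it changes nothing.
function putOrder(id, data) {
  orders.set(id, data);
  return orders.size;
}

// POST /orders creates a new resource on every call: repeating it duplicates.
function postOrder(data) {
  orders.set(`auto-${nextId++}`, data);
  return orders.size;
}

putOrder('A', { item: 'mug' });
putOrder('A', { item: 'mug' }); // retry: still one order
console.log(orders.size); // 1

postOrder({ item: 'mug' });
postOrder({ item: 'mug' }); // retry: a duplicate
console.log(orders.size); // 3
```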

The fix: require the client to send a unique Idempotency-Key header. The server remembers the first response for that key and replays it on any retry — no second charge, no duplicate order, no double email.

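// middleware/idempotency.js
// Stores responses in memory. In production, use Redis with a 24-hour TTL.

const store = new Map(); // scopedKey → { state: 'pending' } | { state: 'done', status, body }
const TTL_MS = 24 * 60 * 60_000; // 24 hours

export function idempotencyMiddleware(req, res, next) {
  // Only apply to state-mutating methods.
  if (!['POST', 'PATCH', 'PUT'].includes(req.method)) return next();

  const key = req.headers['idempotency-key'];
  if (!key) return next(); // key is optional unless your route enforces it

  // Scope the key to the caller and route so one client can never replay
  // another's response. req.user is assumed to be set by your auth middleware.
  const scopedKey = `${req.user?.id ?? 'anon'}:${req.method}:${req.originalUrl}:${key}`;
  const cached = store.get(scopedKey);

  if (cached?.state === 'pending') {
    // The first request with this key is still running (e.g. the client retried
    // after a timeout). Don't run the handler a second time.
    return res.status(409).json({
      error: 'A request with this Idempotency-Key is still in progress.',
    });
  }

  if (cached) {
    // Replay the original response: do NOT re-run the handler.
    res.set('Idempotent-Replayed', 'true');
    return res.status(cached.status).json(cached.body);
  }

  // Claim the key before running the handler so concurrent duplicates see it.
  store.set(scopedKey, { state: 'pending' });

  // Intercept res.json so we can cache the response before sending it.
  const originalJson = res.json.bind(res);
  res.json = (body) => {
    store.set(scopedKey, { state: 'done', status: res.statusCode, body });
    // Evict after TTL (in production, Redis expiry handles this).
    // unref() stops the timer from keeping the process alive.
    setTimeout(() => store.delete(scopedKey), TTL_MS).unref();
    return originalJson(body);
  };

  // If the response ends without going through res.json (a crash, res.send,
  // a dropped connection), release the claim so a retry can run.
  res.on('close', () => {
    if (store.get(scopedKey)?.state === 'pending') store.delete(scopedKey);
  });

  next();
}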
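// routes/payments.js: enforcing the key on a sensitive POST
import express from 'express';
import { idempotencyMiddleware } from '../middleware/idempotency.js';

const router = express.Router();

router.post('/payments', idempotencyMiddleware, async (req, res) => {
  const key = req.headers['idempotency-key'];
  if (!key) {
    return res.status(400).json({
      error: 'Idempotency-Key header is required for payment requests.',
    });
  }
  // … charge the card, create DB record, etc.
  // chargeAndRecord is a hypothetical stand-in for that work.
  const newPayment = await chargeAndRecord(req.body);
  res.status(201).json({ id: newPayment.id, status: 'succeeded' });
});

export default router;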

Mental model: an idempotency key is like a transaction receipt number. If you hand the cashier the same receipt twice, they look it up and say "already processed" rather than swiping your card again.

Common mistake: generating the idempotency key server-side. The client generates it (a UUID v4 works perfectly) so that the same retry carries the same key. If the server generates it, a retry arrives with a fresh key and the whole mechanism breaks.
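On the client, the key must be created once per logical operation and then reused for every retry. Here is a minimal sketch of that pattern. `postWithIdempotency` is a hypothetical helper, and `send` is whatever actually performs the HTTP call. The default key generator assumes the global `crypto.randomUUID()` that browsers and Node 19+ provide; on older Node, pass `randomUUID` from `node:crypto` instead:

```javascript
// Hypothetical client helper: one Idempotency-Key per logical operation,
// reused on every retry. `send` performs the actual HTTP call.
async function postWithIdempotency(send, body, {
  maxAttempts = 3,
  newKey = () => globalThis.crypto.randomUUID(), // UUID v4
} = {}) {
  const key = newKey(); // generated ONCE, before the first attempt
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send({ headers: { 'Idempotency-Key': key }, body });
    } catch (err) {
      // A timeout doesn't tell us whether the server already charged the card,
      // so retrying with the SAME key is the only safe option.
      lastError = err;
    }
  }
  throw lastError;
}
```

With `fetch`, `send` would set `method: 'POST'`, merge the headers, and throw on a non-2xx status if you want those retried as well.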


⏱️ Timeouts & Retries with Exponential Backoff + Jitter

Every outbound HTTP call your server makes (to a payment gateway, a third-party API, another microservice) must have a timeout. Without one, a slow downstream service keeps each waiting request, along with its socket and memory, open for as long as the other side stalls. Requests pile up behind it until your own server backs up, queues requests, and eventually falls over.
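`fetch` accepts an AbortSignal, which fetchWithRetry uses, but some client libraries and SDK calls accept no signal at all. Here is a generic sketch for bounding those using only standard Promise APIs (`withTimeout` is a hypothetical name). One important limit: it stops *waiting* for the operation, but it cannot cancel it:

```javascript
// Bound any promise-returning operation by a deadline.
// Caveat: Promise.race stops waiting; the underlying operation keeps running.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`${label} timed out after ${ms} ms`);
      err.name = 'TimeoutError';
      reject(err);
    }, ms);
  });
  // Clear the timer either way so it doesn't keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Typical use looks like `await withTimeout(db.query(sql), 2_000, 'orders query')`, where `db.query` stands in for any hypothetical client call without its own timeout option.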

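// lib/fetchWithRetry.js
// fetch() with a per-attempt timeout, exponential backoff, and jitter.
// Only retry idempotent requests (GET, PUT, DELETE) or POSTs that carry an
// Idempotency-Key; otherwise a retry after a timeout can double-fire.

/**
 * fetchWithRetry(url, options)
 *   timeoutMs   : abort each attempt after N ms   (default 5 000)
 *   retries     : max retry attempts              (default 3)
 *   baseDelayMs : starting backoff delay in ms    (default 200)
 *   maxDelayMs  : longest single wait we accept   (default 10 000)
 */
export async function fetchWithRetry(url, {
  timeoutMs = 5_000,
  retries = 3,
  baseDelayMs = 200,
  maxDelayMs = 10_000,
  ...fetchOptions
} = {}) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);

    try {
      const response = await fetch(url, {
        ...fetchOptions,
        signal: controller.signal,
      });
      clearTimeout(timer); // the timeout covers receiving headers, not reading the body

      // Retry on 429 (respect Retry-After) or 5xx server errors.
      if (response.status === 429 || response.status >= 500) {
        const retryAfterMs = parseRetryAfter(response.headers.get('Retry-After'));
        // Out of retries, or asked to wait longer than we accept: let the caller decide.
        if (attempt === retries || (retryAfterMs ?? 0) > maxDelayMs) return response;
        await response.body?.cancel(); // discard the body so the connection can be reused
        await sleep(retryAfterMs ?? backoffDelay(attempt, baseDelayMs, maxDelayMs));
        continue;
      }

      return response; // success or non-retryable error (4xx)
    } catch (err) {
      // Timeouts (AbortError) and network failures land here.
      clearTimeout(timer);
      if (attempt === retries) throw err;
      await sleep(backoffDelay(attempt, baseDelayMs, maxDelayMs));
    }
  }
}

// Exponential backoff with full jitter:
// delay = random value in [0, min(maxDelay, baseDelay * 2^attempt))
function backoffDelay(attempt, baseDelayMs, maxDelayMs) {
  const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
  return Math.random() * ceiling;
}

// Retry-After is either delta-seconds ("120") or an HTTP-date.
function parseRetryAfter(value) {
  if (!value) return null;
  const seconds = Number(value);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(value);
  return Number.isNaN(date) ? null : Math.max(0, date - Date.now());
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}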

Why jitter? If ten servers all back off and retry at the exact same moment (no jitter), they create a synchronized thundering herd that overwhelms the downstream service all over again. Adding randomness spreads the load.

Why exponential? Each failure suggests the downstream service is struggling. Doubling the wait each time gives it breathing room to recover instead of being hammered with a steady stream of retries.
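The arithmetic below puts numbers on fetchWithRetry's defaults (baseDelayMs = 200, retries = 3, timeoutMs = 5 000). The jitter ceiling doubles per attempt, and the worst case for a single call is easy to bound:

```javascript
// Backoff ceilings for fetchWithRetry's defaults. With full jitter the actual
// wait after attempt n is a random value in [0, ceilings[n]).
const baseDelayMs = 200;
const retries = 3;
const timeoutMs = 5_000;

// Sleeps only happen between attempts: after attempts 0, 1 and 2, not after the last one.
const ceilings = Array.from({ length: retries }, (_, attempt) => baseDelayMs * 2 ** attempt);
console.log(ceilings); // [ 200, 400, 800 ]

// Worst case: all 4 attempts time out and every sleep hits its ceiling
// (ignoring any Retry-After the server sends).
const worstCaseMs = (retries + 1) * timeoutMs + ceilings.reduce((a, b) => a + b, 0);
console.log(worstCaseMs); // 21400

// Ten clients that failed together spread their third retry over 0 to 800 ms.
const spread = Array.from({ length: 10 }, () => Math.round(Math.random() * ceilings[2]));
console.log(spread); // random; changes every run
```

Roughly 21 seconds is bounded, but it is still longer than most users will wait. If a request sits on a user-facing path, lower `timeoutMs` or `retries` for that call.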

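Common mistake: retrying on every error. Only retry transient errors: timeouts, network failures, 429, and 5xx. Never retry on 400 or 401; those errors won't resolve by waiting. Also never auto-retry a non-idempotent request, such as a POST without an Idempotency-Key. A timeout doesn't tell you whether the server already did the work.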
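This rule can be combined with the idempotency section into one small decision function. The sketch below is one possible policy, not a standard. `shouldRetry` and its option names are hypothetical:

```javascript
// Hypothetical retry policy: retry only transient failures, and only when
// repeating the request is safe.
const IDEMPOTENT_METHODS = new Set(['GET', 'HEAD', 'OPTIONS', 'PUT', 'DELETE']);

function shouldRetry({ method = 'GET', status = null, networkError = false, hasIdempotencyKey = false }) {
  // Repeating a non-idempotent request is only safe if the server can deduplicate it.
  const safeToRepeat = IDEMPOTENT_METHODS.has(method.toUpperCase()) || hasIdempotencyKey;
  if (!safeToRepeat) return false;

  if (networkError) return true;                     // timeout, connection reset, DNS hiccup
  if (status === 429) return true;                   // rate limited: back off and try again
  if (status !== null && status >= 500) return true; // server-side trouble
  return false;                                      // 2xx success or 4xx client error
}
```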


🔌 Circuit Breakers & Graceful Degradation

A circuit breaker is a switch that "trips open" after too many consecutive failures and stops sending requests to a broken service entirely for a cooldown period. It prevents your healthy code from being dragged down by a downstream outage.

// lib/circuitBreaker.js
// Minimal three-state circuit breaker: CLOSED → OPEN → HALF_OPEN → CLOSED (or back to OPEN)

const STATE = { CLOSED: 'CLOSED', OPEN: 'OPEN', HALF_OPEN: 'HALF_OPEN' };

export class CircuitBreaker {
  constructor({
    failureThreshold = 5,   // trips open after this many consecutive failures
    cooldownMs = 30_000,    // stays open for this long before trying again
  } = {}) {
    this.state = STATE.CLOSED;
    this.failures = 0;
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.openedAt = null;
    this.probeInFlight = false;
  }

  async call(fn) {
    if (this.state === STATE.OPEN) {
      const elapsed = Date.now() - this.openedAt;
      if (elapsed < this.cooldownMs) {
        throw new Error('Circuit is OPEN: downstream service unavailable');
      }
      this.state = STATE.HALF_OPEN; // cooldown over: allow a probe
    }

    // In HALF_OPEN, let exactly one probe through and fail everything else fast.
    const isProbe = this.state === STATE.HALF_OPEN;
    if (isProbe) {
      if (this.probeInFlight) {
        throw new Error('Circuit is HALF_OPEN: probe already in flight');
      }
      this.probeInFlight = true;
    }

    try {
      const result = await fn();
      this._onSuccess();
      return result;
    } catch (err) {
      this._onFailure(isProbe);
      throw err;
    } finally {
      if (isProbe) this.probeInFlight = false;
    }
  }

  _onSuccess() {
    this.failures = 0;
    this.state = STATE.CLOSED;
  }

  _onFailure(isProbe) {
    this.failures += 1;
    // A failed probe reopens the circuit immediately; otherwise trip at the threshold.
    if (isProbe || this.failures >= this.failureThreshold) {
      this.state = STATE.OPEN;
      this.openedAt = Date.now();
    }
  }
}
// Usage: wrap a fragile third-party call
import { CircuitBreaker } from '../lib/circuitBreaker.js';
import { fetchWithRetry } from '../lib/fetchWithRetry.js';

const breaker = new CircuitBreaker({ failureThreshold: 5, cooldownMs: 30_000 });

async function getShippingQuote(orderId) {
  try {
    return await breaker.call(async () => {
      const r = await fetchWithRetry(`https://shipping.partner.io/quote/${encodeURIComponent(orderId)}`);
      // fetchWithRetry returns the last 429/5xx instead of throwing, so turn any
      // non-2xx into an error; otherwise the breaker would count it as a success.
      if (!r.ok) throw new Error(`Shipping API responded ${r.status}`);
      return r.json();
    });
  } catch {
    // Graceful degradation: return a safe fallback instead of crashing.
    return { provider: 'standard', estimatedDays: 5, quote: null, fallback: true };
  }
}

Mental model: a circuit breaker is the breaker panel in your house. When a circuit overloads, the breaker trips to protect everything else. After a cooldown you reset it: if the fault is gone, everything comes back online; if not, it trips again immediately.

Graceful degradation is what you do when the circuit is open: return a cached result, a sensible default, or a "service temporarily unavailable" message rather than a 500 that crashes the whole request. The shipping-quote example above returns estimated delivery with a fallback: true flag — the user still gets a response, and the caller knows to treat the quote as approximate.
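The "return a cached result" option can be sketched as a wrapper that remembers the last successful value per key. It serves that value, flagged as stale, when the live call fails, and falls back to a static default only when nothing good is cached. `withLastKnownGood` is a hypothetical name, and the Map is unbounded. In production, bound it with an LRU or TTL, or keep it in Redis:

```javascript
// Hypothetical wrapper: serve the last good value (flagged stale) when the live
// call fails, and a static default if there is no good value for that key yet.
function withLastKnownGood(liveFn, defaultValue, keyFn = (...args) => JSON.stringify(args)) {
  const lastGood = new Map(); // cache key → last successful result

  return async function (...args) {
    const key = keyFn(...args);
    try {
      const value = await liveFn(...args);
      lastGood.set(key, value);
      return { ...value, stale: false };
    } catch {
      if (lastGood.has(key)) return { ...lastGood.get(key), stale: true };
      return { ...defaultValue, fallback: true };
    }
  };
}
```

You could pass `(id) => breaker.call(...)` as `liveFn`, so an open circuit serves recent quotes and never throws to the caller.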

Common mistake: swallowing the error silently without returning a fallback. The caller then receives undefined and crashes in a different, harder-to-debug place. Always pair a caught circuit-breaker error with an explicit fallback return value.


🛠️ Your Mission

Take an existing API project (your Stage 3 capstone app, or any Node/Express server you have locally):

  1. Add a rate limiter to your login or signup endpoint using the createRateLimiter middleware above. Set a tight limit (10 requests / 15 minutes) and verify in your HTTP client (Insomnia, curl, etc.) that the 11th request returns 429 with a Retry-After header.

  2. Add idempotency to any POST endpoint that creates a record (payment, order, booking). Require the Idempotency-Key header, cache the first response, and confirm that sending the exact same request twice returns the cached response with Idempotent-Replayed: true and does not create a duplicate record in your database.

  3. Wrap one outbound fetch (to Stripe, a weather API, or any third-party URL) with fetchWithRetry. Simulate a timeout by pointing the URL at a slow endpoint and confirm the function retries and eventually throws cleanly rather than hanging forever.


✅ You're Done When…

  • A curl loop to your login endpoint trips the 429 within 15 requests, and the Retry-After header matches the time remaining in the window (Production-Readiness checklist: brute-force protection on auth endpoints).
  • Sending the same POST /payments (or equivalent) twice with the same Idempotency-Key creates exactly one database record — not two — and the second response carries Idempotent-Replayed: true (Security checklist: no duplicate charges / double-creates on retry).
  • Any outbound fetch in your codebase has a timeout and a retry limit; pointing the URL at httpbin.org/delay/30 makes the call time out and fail cleanly instead of hanging your server indefinitely (Production-Readiness checklist: all outbound I/O bounded by a timeout).
  • At least one service call is wrapped in a circuit breaker with a graceful fallback — a downstream outage returns a degraded-but-valid response rather than a 500 (Production-Readiness checklist: partial failure does not cascade).

➡️ Next: the Capstone — Build a Production-Grade API. Build It Right, Or Don't Build It At All. 🏛️
