Architecture & System Design
📐 Architecture · Lesson 8 of 13

Designing a System (the Whiteboard Skill)

The whiteboard skill: take 'design Twitter' from blank page to real architecture.

Stage 3 · Architecture & System Design · B.U.I.L.D. letter: I

You can ship a front end in an afternoon with AI. Designing the system behind it, one that holds up under real traffic, real failures, and real teammates, is a different skill. It is a learnable, repeatable process, and this lesson hands it to you in six steps, finishing with the trade-offs you should state out loud.


⚠️ The vibe trap

The most expensive mistake in software is starting to build before you understand what you are building. When a prompt like "design a URL shortener" lands in your lap it is tempting to open a code editor and start wiring routes together — but without requirements you will build the wrong thing at the wrong scale with the wrong data model. The framework below is the antidote: spend the first twenty minutes asking questions and sketching, and you will write better code faster with far fewer dead ends.


🔎 Step 1 — Clarify Requirements & Scope

Before touching a keyboard, interrogate the prompt. Every system design has two layers of requirements.

Functional requirements — what the system must do:

  • Users can submit a long URL and receive a short code (e.g. hyv.cc/xK3m)
  • Anyone with the short code is redirected to the original URL
  • (Optional) Link owners can see a click count

Non-functional requirements — how the system must behave:

  • Read-heavy (redirects) vs. write-heavy? → overwhelmingly reads
  • Availability vs. consistency? → availability wins; a stale redirect beats a 503
  • Latency target? → < 50 ms p99 for redirects
  • Durability? → links should never silently vanish

Scope decisions (write these down explicitly):

  • Are analytics in scope? → no for v1
  • Custom slugs? → no for v1
  • Link expiry? → no for v1

Scoping out features is as important as scoping them in. An interviewer — or a stakeholder — respects a candidate who defines boundaries early.


📐 Step 2 — Estimate Scale (Back-of-Envelope)

Rough numbers unlock the entire rest of the design. You do not need precision; you need an order of magnitude.

URL Shortener — back-of-envelope estimate
==========================================
Daily active users:          10 million
New links created/day:       10 million / 100  =  100 K writes/day
Redirects/day (100:1 r/w):   10 million reads/day

Writes/sec (peak 2×avg):     100 K / 86 400 ≈ 1.2  → ~3 writes/sec peak
Reads/sec  (peak 2×avg):     10 M  / 86 400 ≈ 116  → ~250 reads/sec peak

Storage per link row:        ~500 bytes (URL + code + metadata)
Storage/year:                100 K × 365 × 500 B ≈ 18 GB/year  (tiny)

Bandwidth (redirect response ≈ 0.5 KB):
  250 reads/sec × 0.5 KB = 125 KB/sec outbound  (trivial)

Conclusion: read-heavy, low write volume, storage is not the constraint.
Cache the hot links; a single DB can handle writes for years.

This estimate tells you three things that directly shape your design:

  1. Cache is the highest-value investment (250 reads/sec, same few popular links hit repeatedly).
  2. Write throughput is low — no need for write sharding on day one.
  3. Storage is tiny — no special object store needed for link data.

Mental model: Every number you put on the whiteboard is a lever. When you ask "what if traffic grows 10×?" you can re-run the math in seconds and say "we'd need to distribute the cache." That is the answer the interviewer wants.
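
To make "re-run the math in seconds" concrete, here is a minimal Python sketch of the same estimate. The function name and parameter names are made up for illustration; the default values are the lesson's own assumptions (1 new link per 100 daily users, 100:1 reads to writes, peak = 2× average, 500 bytes per row, 0.5 KB per redirect response).

```python
from __future__ import annotations

SECONDS_PER_DAY = 86_400


def estimate(
    daily_users: int,
    users_per_new_link: int = 100,   # assumption: 1 link per 100 daily users
    reads_per_write: int = 100,      # assumption: 100:1 read/write ratio
    peak_factor: float = 2.0,        # assumption: peak is 2x the average
    bytes_per_row: int = 500,
    response_kb: float = 0.5,
) -> dict[str, float]:
    """Return order-of-magnitude numbers for the URL shortener."""
    writes_per_day = daily_users / users_per_new_link
    reads_per_day = writes_per_day * reads_per_write
    peak_writes = writes_per_day / SECONDS_PER_DAY * peak_factor
    peak_reads = reads_per_day / SECONDS_PER_DAY * peak_factor
    return {
        "peak_writes_per_sec": peak_writes,
        "peak_reads_per_sec": peak_reads,
        "storage_gb_per_year": writes_per_day * 365 * bytes_per_row / 1e9,
        "peak_bandwidth_kb_per_sec": peak_reads * response_kb,
    }


if __name__ == "__main__":
    # Re-run the same math for today's traffic and for a 10x scenario.
    for label, users in (("today", 10_000_000), ("10x", 100_000_000)):
        numbers = estimate(users)
        print(label, {name: round(value, 1) for name, value in numbers.items()})
```

The exact arithmetic gives about 231 peak reads/sec, which the whiteboard version rounds up to ~250. Rounding up is the safer direction for capacity planning. Changing one argument gives the 10× picture (about 2,300 peak reads/sec) without redoing anything by hand.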

Common mistake: Skipping this step and jumping to "we'll use microservices" with no idea whether that is warranted. Estimates justify (or reject) every architectural choice that follows.


🔌 Step 3 — Define the API

The API is the contract between your system and the outside world. Sketch it before you design internals.

URL Shortener — API sketch
==========================

POST /links
  Body:    { "long_url": "https://example.com/very/long/path" }
  Returns: { "short_code": "xK3m", "short_url": "https://hyv.cc/xK3m" }
  Status:  201 Created

GET /{short_code}
  Returns: 301 Redirect → Location: <long_url>
  Status:  301 (cached by browsers) or 302 (not cached, better for analytics)

(Out of scope for v1)
GET /links/{short_code}/stats
  Returns: { "clicks": 42 }

Two endpoints. That is it. Defining the API first forces you to think about what data flows in and out, which in turn shapes your data model.
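
The contract above can also be written down as two pure functions that map inputs to (status, payload) pairs. This is only a sketch of the response shapes, not a web server. The function names are hypothetical, and the 400 and 404 branches are assumptions: the API sketch above only specifies the success cases.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urlparse

BASE_URL = "https://hyv.cc"  # short domain used in the lesson's examples


def is_valid_long_url(url: str) -> bool:
    """Accept only absolute http(s) URLs with a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def create_link_response(body: dict, short_code: str) -> tuple[int, dict]:
    """Shape of the POST /links response for an already-generated code."""
    long_url = body.get("long_url")
    if not isinstance(long_url, str) or not is_valid_long_url(long_url):
        # Assumption: invalid input is rejected with 400.
        return 400, {"error": "long_url must be an absolute http(s) URL"}
    return 201, {"short_code": short_code, "short_url": f"{BASE_URL}/{short_code}"}


def redirect_response(long_url: Optional[str], permanent: bool = True) -> tuple[int, dict]:
    """Shape of the GET /{short_code} response: status plus headers."""
    if long_url is None:
        # Assumption: unknown codes return 404.
        return 404, {}
    return (301 if permanent else 302), {"Location": long_url}
```

Keeping the status-code choice behind a single `permanent` flag makes the 301 vs. 302 decision easy to revisit later.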

Common mistake: Designing the database first and then bolting on an API that reflects the DB schema rather than the caller's needs. API-first keeps you honest about what actually matters to users.


🗄️ Step 4 — Data Model

With the API defined, the data model almost writes itself.

URL Shortener — data model
==========================

Table: links
┌───────────────┬──────────────────┬──────────────────────────────┐
│ Column        │ Type             │ Notes                        │
├───────────────┼──────────────────┼──────────────────────────────┤
│ short_code    │ VARCHAR(8)  PK   │ e.g. "xK3m"  — base62 chars │
│ long_url      │ TEXT        NN   │ original URL, up to ~2 KB    │
│ created_at    │ TIMESTAMPTZ NN   │ default now()                │
│ click_count   │ BIGINT      NN   │ default 0 (v1 out of scope,  │
│               │                  │ but cheap to add now)        │
└───────────────┴──────────────────┴──────────────────────────────┘

Index: none needed beyond the PRIMARY KEY on short_code, which already indexes the lookup column.
No FK constraints needed; table is self-contained.

short_code generation strategy:
  Option A: Hash long_url (MD5), take first 6 chars base62 → collision risk
  Option B: Random 6–8 base62 chars, check uniqueness before insert → hard to guess, but needs a retry on collision
  Option C: Auto-increment ID, encode to base62 ← simplest, no collision risk (codes are sequential, so guessable)
  Pick C for v1: take the next ID from a DB sequence → encode it to base62 → INSERT the row with that short_code.
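
Option C relies on a base62 encoder, which is only a few lines. The sketch below is one possible implementation; the alphabet order (digits, then lowercase, then uppercase) is an arbitrary choice made here, not something the lesson prescribes.

```python
import string

# 62 symbols: 0-9, a-z, A-Z. The ordering is an assumption of this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, remainder = divmod(n, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62; raises ValueError on unknown characters."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Because the encoding is reversible, the short code and the numeric ID carry the same information. That is also why sequential IDs produce guessable codes.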

The data model is small because the problem is small. Resist the urge to add tables for features you scoped out. You can always migrate later; you cannot un-complicate an over-engineered schema easily.


🏗️ Step 5 — High-Level Architecture

Now draw the boxes. Every component should earn its place — if you cannot explain why a box exists, remove it.

URL Shortener — high-level architecture
========================================

Browser / Client
      │
      │  POST /links  (create)
      │  GET /{code}  (redirect)
      ▼
┌─────────────────────┐
│   API Gateway /     │  ← TLS termination, rate limiting, auth headers
│   Load Balancer     │
└──────────┬──────────┘
           │
    ┌──────▼──────┐
    │  App Server │  ← stateless; 2–4 replicas behind LB
    │  (Node /    │
    │   Go / etc) │
    └──┬──────┬───┘
       │      │
       │      │  read path (lookup short_code)
       │      ▼
       │  ┌──────────────┐
       │  │  Redis Cache │  ← short_code → long_url; TTL 24 h
       │  │  (hot links) │    ~95% of reads never hit the DB
       │  └──────────────┘
       │
       │  write path (create link) + cache miss reads
       ▼
┌─────────────────────┐
│  Postgres (primary) │  ← single node handles our write volume easily
│                     │    read replica optional for analytics later
└─────────────────────┘

Read path (redirect):
  1. App receives GET /xK3m
  2. Check Redis for key "xK3m"
  3a. Cache HIT  → 301 to long_url  (< 5 ms)
  3b. Cache MISS → query Postgres, populate cache, 301 to long_url

Write path (shorten):
  1. App receives POST /links
  2. Take the next ID from the DB sequence and encode it to base62
  3. INSERT the row (short_code, long_url) into Postgres → 201 response
  4. Optionally pre-warm cache

No queue needed for v1 — write volume is ~3/sec.
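
The read path above is the cache-aside pattern. Here is a minimal sketch of it, with in-memory stand-ins for Redis and Postgres so it runs without any services. `TTLCache`, `LinkStore`, and `resolve` are hypothetical names, and the 404 for unknown codes is an assumption.

```python
from __future__ import annotations

import time
from typing import Callable, Optional

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 h, as in the diagram


class TTLCache:
    """In-memory stand-in for Redis: key -> (value, expiry time)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._items: dict[str, tuple[str, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key: str, value: str, ttl: float) -> None:
        self._items[key] = (value, self._clock() + ttl)


class LinkStore:
    """In-memory stand-in for the Postgres links table."""

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}
        self.reads = 0  # counts how often the "database" is queried

    def find(self, short_code: str) -> Optional[str]:
        self.reads += 1
        return self.rows.get(short_code)


def resolve(short_code: str, cache: TTLCache, db: LinkStore) -> tuple[int, Optional[str]]:
    """Cache-aside lookup: try the cache, fall through to the DB on a miss."""
    long_url = cache.get(short_code)
    if long_url is not None:
        return 301, long_url                      # 3a. cache hit
    long_url = db.find(short_code)                # 3b. cache miss
    if long_url is None:
        return 404, None                          # assumption: unknown code
    cache.set(short_code, long_url, CACHE_TTL_SECONDS)
    return 301, long_url
```

Passing the clock in as a parameter is a testing convenience: it lets you simulate the 24 h expiry without waiting.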

Mental model: Draw the client at one end, the database at the other (top to bottom in the diagram above), and fill in the middle. Each hop is a place where latency, failure, or cost can hide. Name every hop.

Common mistake: Drawing microservices for a system that has two endpoints and 3 writes/second. Microservices are a solution to organisational and scaling problems that this system does not yet have. Ship a single well-structured app first (see Lesson 7: Monolith vs. Services).


📈 Step 6 — Drill Into Bottlenecks & Scale Them

Your back-of-envelope told you this system is read-heavy. Walk through each component and ask: "what breaks first if traffic grows 10× or 100×?"

Cache — already the answer to read load. At 2 500 reads/sec (10×) Redis still handles it trivially; Redis can sustain hundreds of thousands of ops/sec on a single node.

Database writes — at 30 writes/sec (10×) a single Postgres instance is comfortable. At 300 writes/sec you would look at connection pooling (PgBouncer) before considering sharding.

Short-code generation — encoding the auto-increment PK is single-threaded through the primary DB. If write volume explodes, switch to a distributed ID generator (Twitter Snowflake pattern) and you remove the DB round-trip from the generation path.
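
For reference, here is a minimal sketch of a Snowflake-style generator. The bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence) follows the commonly described Snowflake format. The custom epoch, the class name, and the injectable clock are assumptions made for illustration.

```python
from __future__ import annotations

import threading
import time
from typing import Callable, Optional

EPOCH_MS = 1_700_000_000_000  # hypothetical custom epoch, in Unix milliseconds
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Mints unique, roughly time-ordered 64-bit IDs without a DB round-trip."""

    def __init__(self, worker_id: int, clock_ms: Optional[Callable[[], int]] = None) -> None:
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._clock_ms = clock_ms or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (
                (timestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (self._worker_id << SEQUENCE_BITS)
                | self._sequence
            )
```

One consequence to raise on the whiteboard: IDs this wide encode to roughly 10 to 11 base62 characters, so short codes get longer and a VARCHAR(8) column would need widening.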

App servers — stateless, so horizontal scaling is trivial. Add replicas behind the load balancer.

Cache invalidation — if a link is deleted or updated, you must evict the Redis key. Set a sensible TTL (24 h) so stale entries self-heal even if an explicit eviction fails.

Trade-offs to discuss explicitly:

  • 301 (permanent redirect) vs. 302 (temporary): 301 is cached by browsers → faster for repeat visitors, but you lose analytics visibility. 302 lets your server see every click.
  • Availability vs. consistency: if Redis is down, fall back to Postgres (availability). If Postgres is down during a write, return 503 and tell the user to retry (no silent data loss).
  • Short-code length: 6 base62 chars = 56 billion combinations. 7 chars = 3.5 trillion. Choose based on how many links you expect over the service lifetime.
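
The short-code length trade-off is easy to check numerically. This sketch assumes codes are handed out sequentially (Option C), so the keyspace is used up exactly once. With random codes, collisions become a practical problem well before the space is full. Function names are hypothetical.

```python
def keyspace(length: int) -> int:
    """Number of distinct base62 codes of exactly `length` characters."""
    return 62 ** length


def years_until_exhausted(length: int, links_per_day: int) -> float:
    """Years of sequential allocation before codes of this length run out."""
    return keyspace(length) / (links_per_day * 365)


def min_length_for(total_links: int) -> int:
    """Smallest code length whose keyspace covers `total_links` links."""
    length = 1
    while keyspace(length) < total_links:
        length += 1
    return length
```

At the estimated 100 K new links per day, 6 characters already last for well over a thousand years of sequential allocation, so code length is not a pressing constraint at this scale.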

🛠️ Your mission

Open the System Design Template resource (linked in your Institute dashboard). Work through it end-to-end for a system of your choice — a pastebin, a ride-share dispatch service, a notification delivery system, or anything else you find interesting. Fill in every section:

  1. Functional & non-functional requirements, with explicit scope decisions
  2. Back-of-envelope estimate with at least three derived numbers
  3. API sketch (at minimum two endpoints)
  4. Data model table with column types and a note on the primary access pattern
  5. Architecture diagram in plain text showing at least: client, API layer, app server, cache, database
  6. Bottleneck analysis — what breaks first and how you would scale it
  7. At least two named trade-offs with your reasoning

If you already have a side project in mind, use that. Real motivation makes the exercise ten times more valuable.


✅ You're done when…

  • You opened and used the System Design Template to structure your design
  • You wrote a back-of-envelope estimate and used the numbers to justify at least one architectural decision
  • Your architecture diagram shows the full request path from client to database and back
  • You named at least two trade-offs and explained which side you chose and why

➡️ Next: Scaling Fundamentals. Build It Right, Or Don't Build It At All. 🏛️
