Designing a System (the Whiteboard Skill)
Stage 3 · Architecture & System Design · B.U.I.L.D. letter: I
You can ship a front end in an afternoon with AI. Designing the system behind it — one that holds up under real traffic, real failures, and real teammates — is a different skill. It is a learnable, repeatable process, and this lesson hands it to you in seven steps.
⚠️ The vibe trap
The most expensive mistake in software is starting to build before you understand what you are building. When a prompt like "design a URL shortener" lands in your lap it is tempting to open a code editor and start wiring routes together — but without requirements you will build the wrong thing at the wrong scale with the wrong data model. The framework below is the antidote: spend the first twenty minutes asking questions and sketching, and you will write better code faster with far fewer dead ends.
🔎 Step 1 — Clarify Requirements & Scope
Before touching a keyboard, interrogate the prompt. Every system design has two layers of requirements.
Functional requirements — what the system must do:
- Users can submit a long URL and receive a short code (e.g. hyv.cc/xK3m)
- Anyone with the short code is redirected to the original URL
- (Optional) Link owners can see a click count
Non-functional requirements — how the system must behave:
- Read-heavy (redirects) vs. write-heavy? → overwhelmingly reads
- Availability vs. consistency? → availability wins; a stale redirect beats a 503
- Latency target? → < 50 ms p99 for redirects
- Durability? → links should never silently vanish
Scope decisions (write these down explicitly):
- Are analytics in scope? → no for v1
- Custom slugs? → no for v1
- Link expiry? → no for v1
Scoping out features is as important as scoping them in. An interviewer, or a stakeholder, respects someone who defines boundaries early.
📐 Step 2 — Estimate Scale (Back-of-Envelope)
Rough numbers unlock the entire rest of the design. You do not need precision; you need an order of magnitude.
URL Shortener — back-of-envelope estimate
==========================================
Daily active users: 10 million
New links created/day: 10 million / 100 = 100 K writes/day (assumes 1 in 100 users creates a link per day)
Redirects/day (100:1 r/w): 10 million reads/day
Writes/sec (peak 2×avg): 100 K / 86 400 ≈ 1.2 → ~3 writes/sec peak
Reads/sec (peak 2×avg): 10 M / 86 400 ≈ 116 → ~250 reads/sec peak
Storage per link row: ~500 bytes (URL + code + metadata)
Storage/year: 100 K × 365 × 500 B ≈ 18 GB/year (tiny)
Bandwidth (redirect response ≈ 0.5 KB):
250 reads/sec × 0.5 KB = 125 KB/sec outbound (trivial)
Conclusion: read-heavy, low write volume, storage is not the constraint.
Cache the hot links; a single DB can handle writes for years.
This estimate tells you three things that directly shape your design:
- Cache is the highest-value investment (250 reads/sec, same few popular links hit repeatedly).
- Write throughput is low — no need for write sharding on day one.
- Storage is tiny — no special object store needed for link data.
Mental model: Every number you put on the whiteboard is a lever. When you ask "what if traffic grows 10×?" you can re-run the math in seconds and say "we'd need to distribute the cache". That is the answer the interviewer wants.
Common mistake: Skipping this step and jumping to "we'll use microservices" with no idea whether that is warranted. Estimates justify (or reject) every architectural choice that follows.
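The estimate above can be re-run for any traffic multiplier with a few lines of Python. This is a minimal sketch: the `Workload` fields and the `estimate` function are hypothetical names, and every default simply mirrors an assumption from the whiteboard numbers (10 million daily users, 1 link per 100 users, 100:1 reads to writes, peak at 2× average).

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Workload:
    """Inputs copied from the back-of-envelope block; all are assumptions."""
    daily_active_users: int = 10_000_000
    users_per_new_link: int = 100   # 1 in 100 users creates a link per day
    reads_per_write: int = 100      # 100:1 read/write ratio
    peak_factor: float = 2.0        # peak assumed to be 2x the daily average
    bytes_per_row: int = 500        # URL + code + metadata
    response_bytes: int = 500       # ~0.5 KB redirect response


def estimate(w: Workload, traffic_multiplier: float = 1.0) -> dict[str, float]:
    """Return order-of-magnitude figures for the workload at a given multiplier."""
    writes_per_day = w.daily_active_users / w.users_per_new_link * traffic_multiplier
    reads_per_day = writes_per_day * w.reads_per_write
    peak_writes = writes_per_day / SECONDS_PER_DAY * w.peak_factor
    peak_reads = reads_per_day / SECONDS_PER_DAY * w.peak_factor
    return {
        "writes_per_day": writes_per_day,
        "reads_per_day": reads_per_day,
        "peak_writes_per_sec": peak_writes,
        "peak_reads_per_sec": peak_reads,
        "storage_gb_per_year": writes_per_day * 365 * w.bytes_per_row / 1e9,
        "peak_bandwidth_kb_per_sec": peak_reads * w.response_bytes / 1e3,
    }


# Compare today's load with the "what if traffic grows 10x?" question.
for multiplier in (1, 10):
    print(f"--- traffic x{multiplier} ---")
    for name, value in estimate(Workload(), multiplier).items():
        print(f"{name:>28}: {value:,.1f}")
```

The unrounded results come out slightly below the whiteboard figures (about 231 peak reads/sec rather than ~250), which is fine: the whiteboard rounds up on purpose, and only the order of magnitude matters.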
🔌 Step 3 — Define the API
The API is the contract between your system and the outside world. Sketch it before you design internals.
URL Shortener — API sketch
==========================
POST /links
Body: { "long_url": "https://example.com/very/long/path" }
Returns: { "short_code": "xK3m", "short_url": "https://hyv.cc/xK3m" }
Status: 201 Created
GET /{short_code}
Returns: 301 Redirect → Location: <long_url>
Status: 301 (cached by browsers) or 302 (not cached, better for analytics)
(Out of scope for v1)
GET /links/{short_code}/stats
Returns: { "clicks": 42 }
Two endpoints. That is it. Defining the API first forces you to think about what data flows in and out, which in turn shapes your data model.
Common mistake: Designing the database first and then bolting on an API that reflects the DB schema rather than the caller's needs. API-first keeps you honest about what actually matters to users.
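To make the contract concrete, the two endpoints can be sketched as one framework-free dispatch function. Treat this as an illustration only: `handle`, the in-memory `_links` dict, the placeholder code format, and the 400/404 responses are assumptions that the API sketch above does not specify.

```python
import json
from typing import Optional

BASE_URL = "https://hyv.cc"      # short domain from the lesson's example
_links: dict[str, str] = {}       # in-memory stand-in for the real data store

Response = tuple[int, dict[str, str], str]   # (status, headers, body)


def handle(method: str, path: str, body: Optional[str] = None) -> Response:
    """Dispatch one request against the two v1 endpoints."""
    if method == "POST" and path == "/links":
        try:
            long_url = json.loads(body or "")["long_url"]
        except (ValueError, KeyError, TypeError):
            long_url = None
        if not isinstance(long_url, str) or not long_url.startswith(("http://", "https://")):
            # Assumption: malformed input is rejected with 400.
            error = json.dumps({"error": "long_url must be an http(s) URL"})
            return 400, {"Content-Type": "application/json"}, error
        short_code = f"c{len(_links) + 1}"   # placeholder; real generation comes later
        _links[short_code] = long_url
        payload = {"short_code": short_code, "short_url": f"{BASE_URL}/{short_code}"}
        return 201, {"Content-Type": "application/json"}, json.dumps(payload)

    if method == "GET" and len(path) > 1 and path.startswith("/") and "/" not in path[1:]:
        long_url = _links.get(path[1:])
        if long_url is None:
            return 404, {}, ""               # assumption: unknown codes return 404
        return 301, {"Location": long_url}, ""

    return 404, {}, ""


status, headers, body = handle("POST", "/links", '{"long_url": "https://example.com/very/long/path"}')
print(status, body)
print(handle("GET", "/" + json.loads(body)["short_code"]))
```

Writing the handler against the contract first, with a dict behind it, lets you exercise the request and response shapes before any storage decision is made.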
🗄️ Step 4 — Data Model
With the API defined, the data model almost writes itself.
URL Shortener — data model
==========================
Table: links
┌───────────────┬──────────────────┬──────────────────────────────┐
│ Column        │ Type             │ Notes                        │
├───────────────┼──────────────────┼──────────────────────────────┤
│ short_code    │ VARCHAR(8) PK    │ e.g. "xK3m", base62 chars    │
│ long_url      │ TEXT NN          │ original URL, up to ~2 KB    │
│ created_at    │ TIMESTAMPTZ NN   │ default now()                │
│ click_count   │ BIGINT NN        │ default 0 (v1 out of scope,  │
│               │                  │ but cheap to add now)        │
└───────────────┴──────────────────┴──────────────────────────────┘
Index: none beyond the PRIMARY KEY, which already indexes short_code (no separate links_short_code_idx needed)
No FK constraints needed; table is self-contained.
short_code generation strategy:
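Option A: Hash long_url (MD5), take first 6 chars base62 → collision risk
Option B: Random 6–8 base62 chars, check uniqueness before insert ← preferred if codes must be hard to guess
Option C: Auto-increment ID, encode to base62 ← simplest, no collision risk, but codes are sequential
Pick C for v1: reserve the next ID (a sequence or an auto-increment id column, not shown in the table above) → encode it to base62 → store the result as short_code.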
The data model is small because the problem is small. Resist the urge to add tables for features you scoped out. You can always migrate later; you cannot un-complicate an over-engineered schema easily.
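Options B and C from the generation strategies can be sketched in a few lines. The alphabet ordering, the function names, and the retry limit below are assumptions for illustration; any fixed ordering of the 62 characters works as long as encoding and decoding agree on it.

```python
import secrets
import string

# Assumed ordering: digits, then lowercase, then uppercase.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Option C: turn a non-negative integer ID into a base62 short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, remainder = divmod(n, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


def random_code(existing: set[str], length: int = 7, max_attempts: int = 5) -> str:
    """Option B: draw random codes until one is unused.

    The `existing` set stands in for the database; in a real table a unique
    constraint plus retry-on-conflict would do this check atomically.
    """
    for _ in range(max_attempts):
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if code not in existing:
            return code
    raise RuntimeError("no free code found; consider a longer code length")


print(encode_base62(62))                 # smallest two-character code
print(decode_base62(encode_base62(123456789)))
print(random_code(existing=set()))
```

Note that Option C yields short codes for small IDs (ID 62 encodes to two characters), so you may want to start the sequence at a higher value if a minimum code length matters.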
🏗️ Step 5 — High-Level Architecture
Now draw the boxes. Every component should earn its place — if you cannot explain why a box exists, remove it.
URL Shortener — high-level architecture
========================================
Browser / Client
           │
           │  POST /links    (create)
           │  GET /{code}    (redirect)
           ▼
┌─────────────────────┐
│ API Gateway /       │  ← TLS termination, rate limiting, auth headers
│ Load Balancer       │
└──────────┬──────────┘
           │
    ┌──────▼──────┐
    │ App Server  │  ← stateless; 2–4 replicas behind LB
    │ (Node /     │
    │ Go / etc)   │
    └──┬──────┬───┘
       │      │
       │      │  read path (lookup short_code)
       │      ▼
       │  ┌──────────────┐
       │  │ Redis Cache  │  ← short_code → long_url; TTL 24 h
       │  │ (hot links)  │    ~95% of reads never hit the DB
       │  └──────────────┘
       │
       │  write path (create link) + cache miss reads
       ▼
┌─────────────────────┐
│ Postgres (primary)  │  ← single node handles our write volume easily
│                     │    read replica optional for analytics later
└─────────────────────┘
Read path (redirect):
1. App receives GET /xK3m
2. Check Redis for key "xK3m"
3a. Cache HIT → 301 to long_url (< 5 ms)
3b. Cache MISS → query Postgres, populate cache, 301 to long_url
Write path (shorten):
1. App receives POST /links
2. Reserve the next auto-increment ID in Postgres and encode it to base62 → short_code
3. Store the row (short_code, long_url) in Postgres → 201 response
4. Optionally pre-warm cache
No queue needed for v1 — write volume is ~3/sec.
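The read and write paths above follow the cache-aside pattern, which can be sketched with two dicts standing in for Redis and Postgres. Everything here is hypothetical scaffolding: the `LinkService` class, its method names, the `db_reads` counter, and the use of hex instead of base62 to keep the example short.

```python
from typing import Optional


class LinkService:
    """Cache-aside lookups over dict-backed stand-ins for Redis and Postgres."""

    def __init__(self) -> None:
        self.cache: dict[str, str] = {}   # stand-in for Redis
        self.db: dict[str, str] = {}      # stand-in for the links table
        self.db_reads = 0                 # how often the read path reached the DB
        self._next_id = 1                 # stand-in for the auto-increment ID

    def create(self, long_url: str, prewarm: bool = False) -> str:
        """Write path: reserve an ID, derive the code, store the row."""
        link_id = self._next_id
        self._next_id += 1
        short_code = format(link_id, "x")   # hex stands in for base62 here
        self.db[short_code] = long_url
        if prewarm:
            self.cache[short_code] = long_url   # optional step 4
        return short_code

    def resolve(self, short_code: str) -> Optional[str]:
        """Read path: try the cache first, fall through to the DB on a miss."""
        cached = self.cache.get(short_code)
        if cached is not None:
            return cached                        # 3a: cache hit
        self.db_reads += 1
        long_url = self.db.get(short_code)       # 3b: cache miss
        if long_url is not None:
            self.cache[short_code] = long_url    # populate for the next reader
        return long_url


service = LinkService()
code = service.create("https://example.com/very/long/path")
service.resolve(code)   # miss: goes to the DB and fills the cache
service.resolve(code)   # hit: served from the cache
print(f"DB reads after two lookups: {service.db_reads}")
```

Unknown codes are not cached in this sketch, so repeated lookups of a missing code reach the database every time; whether to cache those misses is a separate design decision.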
Mental model: Draw the client on the left, the database on the right, and fill in the middle. Each hop is a place where latency, failure, or cost can hide. Name every hop.
Common mistake: Drawing microservices for a system that has two endpoints and 3 writes/second. Microservices are a solution to organisational and scaling problems that this system does not yet have. Ship a single well-structured app first (see Lesson 7: Monolith vs. Services).
📈 Step 6 — Drill Into Bottlenecks & Scale Them
Your back-of-envelope told you this system is read-heavy. Walk through each component and ask: "what breaks first if traffic grows 10× or 100×?"
Cache — already the answer to read load. At 2 500 reads/sec (10×) Redis still handles it trivially; Redis can sustain hundreds of thousands of ops/sec on a single node.
Database writes — at 30 writes/sec (10×) a single Postgres instance is comfortable. At 300 writes/sec you would look at connection pooling (PgBouncer) before considering sharding.
Short-code generation — encoding the auto-increment PK is single-threaded through the primary DB. If write volume explodes, switch to a distributed ID generator (Twitter Snowflake pattern) and you remove the DB round-trip from the generation path.
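A Snowflake-style generator packs a timestamp, a worker ID, and a per-millisecond sequence into one integer, so each app server can mint IDs without asking the database. The sketch below assumes the commonly cited 41/10/12 bit layout and a custom epoch of 2020-01-01; the class name, the epoch, and the clock-rollback handling are illustrative choices, not a reference implementation.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000   # 2020-01-01T00:00:00Z; the custom epoch is an assumption
WORKER_BITS = 10               # up to 1024 workers
SEQUENCE_BITS = 12             # up to 4096 IDs per worker per millisecond
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Generates unique, roughly time-ordered integer IDs for one worker."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000)
            if now < self._last_ms:
                now = self._last_ms          # clock went backwards: reuse last timestamp
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = int(time.time() * 1000)
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (
                (timestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (self._worker_id << SEQUENCE_BITS)
                | self._sequence
            )


generator = SnowflakeGenerator(worker_id=1)
print(generator.next_id())
```

One consequence worth raising on the whiteboard: IDs of this size encode to as many as 11 base62 characters, so the VARCHAR(8) column in the data model would need widening before this switch.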
App servers — stateless, so horizontal scaling is trivial. Add replicas behind the load balancer.
Cache invalidation — if a link is deleted or updated, you must evict the Redis key. Set a sensible TTL (24 h) so stale entries self-heal even if an explicit eviction fails.
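The pairing of explicit eviction with a TTL safety net can be sketched with a small in-process cache. Redis provides key expiry natively, so `TTLCache`, `delete_link`, and the injectable clock below are hypothetical stand-ins that only illustrate the behaviour.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Tiny cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float = 24 * 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock                                  # injectable for testing
        self._entries: dict[str, tuple[str, float]] = {}     # key -> (value, expires_at)

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]       # expired: the stale entry self-heals
            return None
        return value

    def evict(self, key: str) -> None:
        self._entries.pop(key, None)     # no error if the key is already gone


def delete_link(db: dict[str, str], cache: TTLCache, short_code: str) -> None:
    """Remove the row first, then evict; if eviction fails, the TTL bounds staleness."""
    db.pop(short_code, None)
    cache.evict(short_code)


cache = TTLCache(ttl_seconds=24 * 3600)
db = {"xK3m": "https://example.com/very/long/path"}
cache.set("xK3m", db["xK3m"])
delete_link(db, cache, "xK3m")
print(cache.get("xK3m"))   # the evicted key is no longer served
```

The TTL is the upper bound on how long a deleted link can keep redirecting if the explicit eviction is lost, so pick it with that worst case in mind.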
Trade-offs to discuss explicitly:
- 301 (permanent redirect) vs. 302 (temporary): 301 is cached by browsers → faster for repeat visitors, but you lose analytics visibility. 302 lets your server see every click.
- Availability vs. consistency: if Redis is down, fall back to Postgres (availability). If Postgres is down during a write, return 503 and tell the user to retry (no silent data loss).
- Short-code length: 6 base62 chars give 62^6 ≈ 56.8 billion combinations; 7 chars give 62^7 ≈ 3.5 trillion. Choose based on how many links you expect over the service lifetime.
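The short-code length trade-off is quick to check numerically. This is a rough sketch with hypothetical helper names; it assumes codes are handed out sequentially, as in Option C, and uses the 100 K links/day figure from the estimate.

```python
BASE = 62


def code_space(length: int) -> int:
    """Number of distinct base62 codes of exactly this length."""
    return BASE ** length


def years_until_exhausted(length: int, links_per_day: int) -> float:
    """Years before a sequential allocator runs out of codes of this length."""
    return code_space(length) / (links_per_day * 365)


def min_length(total_links: int) -> int:
    """Shortest code length whose code space covers total_links."""
    length = 1
    while code_space(length) < total_links:
        length += 1
    return length


for length in (6, 7):
    print(f"{length} chars: {code_space(length):,} codes, "
          f"{years_until_exhausted(length, 100_000):,.0f} years at 100 K links/day")
print(f"1 billion links need at least {min_length(10**9)} chars")
```

With random codes (Option B) collisions become likely well before the space is full, so a random scheme generally wants more headroom than these figures suggest.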
🛠️ Your mission
Open the System Design Template resource (linked in your Institute dashboard). Work through it end-to-end for a system of your choice — a pastebin, a ride-share dispatch service, a notification delivery system, or anything else you find interesting. Fill in every section:
- Functional & non-functional requirements, with explicit scope decisions
- Back-of-envelope estimate with at least three derived numbers
- API sketch (at minimum two endpoints)
- Data model table with column types and a note on the primary access pattern
- Architecture diagram in plain text showing at least: client, API layer, app server, cache, database
- Bottleneck analysis — what breaks first and how you would scale it
- At least two named trade-offs with your reasoning
If you already have a side project in mind, use that. Real motivation makes the exercise ten times more valuable.
✅ You're done when…
- You opened and used the System Design Template to structure your design
- You wrote a back-of-envelope estimate and used the numbers to justify at least one architectural decision
- Your architecture diagram shows the full request path from client to database and back
- You named at least two trade-offs and explained which side you chose and why
➡️ Next: Scaling Fundamentals. Build It Right, Or Don't Build It At All. 🏛️