Queues & Event-Driven Architecture
Stage 3 · Architecture & System Design · B.U.I.L.D. letter: D
A user clicks "Place Order." Your server charges their card, saves the order, sends a confirmation email, notifies the warehouse, updates inventory, and logs the event — all in a single function, synchronously, before it replies. One day the email provider goes down for four seconds. Every order during those four seconds fails. You just made email a load-bearing wall in your checkout flow.
⚠️ The vibe trap
When you're moving fast, chaining async/await calls feels natural: do step 1, then step 2, then step 3, then respond. The trap is that the user's request is now hostage to every single step. One slow downstream service (email, analytics, inventory sync) adds its full latency to the response time. One failure anywhere in the chain fails the entire request. You've accidentally made a dozen optional side-effects into hard blockers on your critical path — and your users feel every millisecond.
🔄 Synchronous vs. asynchronous — what's actually different
Synchronous means the caller waits for the result before moving on. The request-response cycle is frozen until every step finishes.
Asynchronous means the caller fires an action and moves on immediately. The work happens somewhere else, on its own schedule.
SYNCHRONOUS: everything blocks the HTTP response
─────────────────────────────────────────────────
Browser ──POST /checkout──► Server
                              │
                              ├─ charge card       ← MUST wait (needs result)
                              ├─ save order        ← MUST wait (needs order ID)
                              ├─ send email        ← optional side-effect, yet it blocks
                              ├─ notify warehouse  ← optional, yet it blocks
                              └─ log analytics     ← optional, yet it blocks
                              │
Browser ◄──────── 200 OK ─────┘  (only after all 5 steps complete)

ASYNCHRONOUS: critical path stays thin
────────────────────────────────────────
Browser ──POST /checkout──► Server
                              │
                              ├─ charge card  ← MUST wait
                              ├─ save order   ← MUST wait
                              └─ emit "OrderPlaced" event ──► queue
                              │
Browser ◄──────── 200 OK ─────┘  (fast; side-effects run independently)

                    queue ──► email worker
                    queue ──► warehouse worker
                    queue ──► analytics worker
Mental model: Your HTTP handler should only do the minimum work required to give the user an honest answer. Everything else — emails, webhooks, notifications, syncs — is a reaction to something that already happened, and it can happen asynchronously.
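Why it matters: Response time can drop sharply (say, from 800 ms to 80 ms) because the response no longer includes side-effect latency. A warehouse outage no longer fails checkouts. You can scale each worker independently. With a durable queue, you can also replay failed jobs.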
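Common mistake: Wrapping the entire chain in a try/catch and thinking that makes it safe. It doesn't. The slow steps still add their full latency, and a failure in an optional step still decides the outcome: either you return a 500 for an order that was already charged and saved, or you swallow the error and silently lose the side-effect.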
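To make the difference concrete, here is a minimal sketch that simulates both shapes with timers. Every function name and delay in it (`chargeCard`, `sendEmail`, 200 ms, and so on) is invented for illustration. The deferred version is an in-process fire-and-forget, not a durable queue: if the process dies, the pending side-effects die with it.

```javascript
// Simulated downstream calls; names and delays are made up for illustration.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const chargeCard = () => sleep(30);
const saveOrder = async () => { await sleep(20); return { id: 'order-1' }; };
const sendEmail = () => sleep(200);        // slow optional side-effect
const notifyWarehouse = () => sleep(150);  // slow optional side-effect

// Synchronous style: the response waits for every step.
async function checkoutBlocking() {
  await chargeCard();
  const order = await saveOrder();
  await sendEmail();
  await notifyWarehouse();
  return order;
}

// Deferred style: respond after the critical path; reactions finish later.
// `sideEffects` is returned only so a caller (or a test) can observe completion.
async function checkoutDeferred() {
  await chargeCard();
  const order = await saveOrder();
  // allSettled never rejects, so a failing side-effect cannot fail the request.
  const sideEffects = Promise.allSettled([sendEmail(), notifyWarehouse()]);
  return { order, sideEffects };
}

// Small helper that measures how long the "response" takes.
async function timeIt(label, fn) {
  const start = Date.now();
  const result = await fn();
  const elapsed = Date.now() - start;
  console.log(`${label}: responded after ~${elapsed} ms`);
  return { result, elapsed };
}
```

Calling `timeIt('blocking', checkoutBlocking)` and then `timeIt('deferred', checkoutDeferred)` should show the deferred handler responding after roughly the critical-path time alone, while the blocking one pays for every step.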
📬 Message queues — the buffer that decouples everything
A message queue is a durable, ordered list of work items (messages) that sits between producers and consumers. The producer pushes a message and moves on. One or more consumers pull messages and process them at their own pace.
PRODUCER            QUEUE (durable storage)            CONSUMERS
─────────           ───────────────────────            ─────────────────────
OrderService ───►   [msg1][msg2][msg3][msg4]   ───►    EmailWorker (1 consumer)
                    [msg5][msg6][msg7]         ───►    WarehouseWorker (1 consumer)
                                               ───►    AnalyticsWorker (1 consumer)
Key properties:
• Buffer: queue absorbs spikes — producer doesn't wait for consumers
• Durability: messages survive consumer crashes (stored on disk/DB)
• Backpressure: if consumers fall behind, the backlog grows in the queue instead of slowing the producer; queue depth is the signal to add consumers
• Independent scale: add more WarehouseWorkers during peak without touching OrderService
Mental model: A queue is like a restaurant's ticket rail. The kitchen (consumer) works through tickets at its own pace. The server (producer) doesn't stand at the pass waiting — they take the next table. If a big party orders, the rail backs up briefly; nobody's blocked.
Why it matters: Without a queue, your producer and all consumers must be healthy and fast at exactly the same time. With a queue, they only need to agree on the message format — not their uptime schedule or throughput.
Common mistake: Using the database as a queue (a table with a processed boolean column that a cron job polls). This works at low volume and becomes a bottleneck and a source of lock contention at scale. It is also the pattern that predates proper queues in most vibe-coded apps — if you have one, it's a signal to graduate to a real queue.
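The buffering behaviour described above can be sketched with a toy in-memory queue. This is an illustration only: `InMemoryQueue` and `runWorker` are hypothetical names, and unlike a real broker nothing here is durable, so a crash would lose the backlog.

```javascript
// Minimal in-memory queue: a sketch only. It is NOT durable; a real broker
// persists messages so they survive a process crash.
class InMemoryQueue {
  constructor() {
    this.messages = [];
  }
  // Producer side: enqueue and return immediately.
  push(message) {
    this.messages.push(message);
  }
  // Consumer side: take the oldest message, or undefined if the queue is empty.
  pull() {
    return this.messages.shift();
  }
  // Current backlog size; in production this is the number you would alert on.
  get depth() {
    return this.messages.length;
  }
}

// Hypothetical consumer: drains the queue at its own pace, one message at a time.
async function runWorker(queue, handle) {
  let message;
  while ((message = queue.pull()) !== undefined) {
    await handle(message);
  }
}

const orderQueue = new InMemoryQueue();
// A burst of 5 orders arrives; the producer never waits on the consumer.
for (let i = 1; i <= 5; i++) orderQueue.push({ orderId: i });
console.log('backlog after burst:', orderQueue.depth);
```

A worker started later with `runWorker(orderQueue, handler)` works through the backlog in arrival order. The producer loop finished long before, which is the decoupling the ticket-rail analogy describes.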
📡 Publish/Subscribe — many listeners, one event
In pub/sub, a producer publishes an event to a topic or channel. Any number of subscribers have registered interest in that topic, and each one receives its own copy of the event independently. The publisher doesn't know who's listening — or how many.
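// ── event-bus.js ─────────────────────────────────────────────────────────────
// Simple in-process pub/sub (works for a monolith; swap for Redis Pub/Sub,
// AWS SNS, or a message broker for distributed systems)
const listeners = {};
export const eventBus = {
  publish(event, payload) {
    // Run each handler in its own promise chain so that a synchronous throw or
    // an async rejection in one subscriber cannot fail the publisher, crash the
    // process as an unhandled rejection, or stop the other subscribers.
    (listeners[event] || []).forEach(fn => {
      Promise.resolve()
        .then(() => fn(payload))
        .catch(err => console.error(`[eventBus] handler for ${event} failed:`, err));
    });
  },
  subscribe(event, fn) {
    if (!listeners[event]) listeners[event] = [];
    listeners[event].push(fn);
  },
};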
// ── order-service.js ──────────────────────────────────────────────────────────
// `db` and `payments` are assumed to come from your own data and payment modules.
import { eventBus } from './event-bus.js';
export async function placeOrder(cart) {
  const order = await db.orders.create(cart);       // critical path
  await payments.charge(cart.userId, cart.total);   // critical path
  // Fire the event: OrderService does NOT know who reacts or how many
  eventBus.publish('OrderPlaced', { orderId: order.id, userId: cart.userId });
  return order; // respond immediately; side-effects happen in subscribers
}
// ── email-handler.js ─────────────────────────────────────────────────────────
import { eventBus } from './event-bus.js';
eventBus.subscribe('OrderPlaced', async ({ orderId, userId }) => {
  const user = await db.users.findById(userId);
  await email.send(user.email, 'order-confirmation', { orderId });
});
// ── inventory-handler.js ─────────────────────────────────────────────────────
import { eventBus } from './event-bus.js';
eventBus.subscribe('OrderPlaced', async ({ orderId }) => {
  await inventory.decrementForOrder(orderId);
});
// Adding a third handler (warehouse notification) = zero changes to OrderService.
Mental model: Publishing an event is like ringing the school bell. Every class that has "bell = start of day" on their schedule starts their thing. You don't ring a different bell for each class — you ring one bell and whoever cares, reacts.
Why it matters: Adding a new behavior (loyalty-points handler, fraud-detection hook, audit log) is additive. You write a new subscriber, subscribe it to the existing event, and deploy. You touch zero existing code. This is the practical definition of the Open/Closed principle from Lesson 4 (SOLID): open for extension, closed for modification.
Common mistake: Making event names vague ('updated', 'changed') instead of descriptive past-tense domain actions ('OrderPlaced', 'UserEmailVerified', 'SubscriptionCancelled'). Vague event names attract unrelated subscribers over time and turn the event bus into a mystery dependency.
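One way to keep event names honest is to register them in a single place and reject anything else. The sketch below is an assumption-heavy illustration, not part of the event bus shown earlier: `EVENTS`, `withKnownEvents`, and the stand-in `plainBus` are all hypothetical names.

```javascript
// Hypothetical registry of allowed event names: descriptive, past-tense domain actions.
const EVENTS = Object.freeze({
  ORDER_PLACED: 'OrderPlaced',
  USER_EMAIL_VERIFIED: 'UserEmailVerified',
  SUBSCRIPTION_CANCELLED: 'SubscriptionCancelled',
});
const KNOWN_EVENTS = new Set(Object.values(EVENTS));

// Wrap any object with publish/subscribe so unknown names fail loudly.
function withKnownEvents(bus) {
  const check = (event) => {
    if (!KNOWN_EVENTS.has(event)) {
      throw new Error(`Unknown event name: "${event}"`);
    }
  };
  return {
    publish(event, payload) {
      check(event);
      return bus.publish(event, payload);
    },
    subscribe(event, fn) {
      check(event);
      return bus.subscribe(event, fn);
    },
  };
}

// Tiny stand-in bus so the sketch runs on its own.
const handlersByEvent = {};
const plainBus = {
  publish: (event, payload) => (handlersByEvent[event] || []).forEach((fn) => fn(payload)),
  subscribe: (event, fn) => {
    (handlersByEvent[event] = handlersByEvent[event] || []).push(fn);
  },
};
const strictBus = withKnownEvents(plainBus);
```

With this wrapper, `strictBus.subscribe(EVENTS.ORDER_PLACED, handler)` works as before, while `strictBus.subscribe('updated', handler)` throws immediately, so a vague name is caught the first time someone tries to use it.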
🔁 At-least-once delivery — why consumers must be idempotent
Most production message queues provide at-least-once delivery: a message will be delivered at least once, but under failure conditions (consumer crash, network blip, timeout) it may be delivered more than once. The queue can't tell whether your consumer finished processing the message or disappeared midway, so it redelivers to be safe.
This means your consumers must be idempotent: processing the same message twice must produce the same result as processing it once.
// ── email-worker.js ───────────────────────────────────────────────────────────
// ❌ NOT idempotent: crashes halfway through? Message re-delivered → duplicate email
async function processOrderConfirmation(message) {
  await email.send(message.userEmail, 'order-confirmation', message);
  // If the process dies after send() but before ack, the message is re-queued.
  await queue.ack(message.id);
}
// ✅ Idempotent: check before acting, then record that the work was done
async function processOrderConfirmation(message) {
  // Guard: did we already send this email for this exact order?
  const alreadySent = await db.emailLog.findOne({
    orderId: message.orderId,
    type: 'order-confirmation',
  });
  if (!alreadySent) {
    await email.send(message.userEmail, 'order-confirmation', message);
    // Not atomic with send(): a crash between these two lines can still cause one
    // duplicate on redelivery. An idempotency key passed to the email provider,
    // if it supports one, closes that gap.
    await db.emailLog.create({ orderId: message.orderId, type: 'order-confirmation' });
  }
  // Safe to ack whether we sent or skipped; the result is the same either way
  await queue.ack(message.id);
}
Mental model: Idempotency means "doing it twice is the same as doing it once." Your consumer should ask: "Has this already happened?" before acting. If yes, skip and ack. The deduplication key is usually the message's own ID or a business-level ID (orderId + event type).
Why it matters: Without idempotency, network retries and queue redeliveries cause double-charges, duplicate emails, and double-decremented inventory. These are exactly the bugs that are hardest to reproduce and most catastrophic for user trust.
Common mistake: Relying on the queue to guarantee exactly-once delivery. Almost no production queue system guarantees this (it's extremely expensive to implement correctly). Making each message take effect exactly once is your job, not the queue's, and idempotent consumers are how you do it.
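When the deduplication key is the message's own ID, the guard can be factored into a small wrapper. The following is a sketch under stated assumptions: `makeIdempotent` is a hypothetical helper, and the in-memory `Set` stands in for a durable store. It does not protect against two workers handling the same message at the same instant, and a crash between the handler finishing and the ID being recorded still allows one repeat.

```javascript
// Wrap a handler so each message ID takes effect at most once per store.
// `processedIds` is an in-memory Set for illustration; a real consumer would
// use a durable store (for example, a table with a unique key on message ID).
function makeIdempotent(handler, processedIds = new Set()) {
  return async function handleOnce(message) {
    if (processedIds.has(message.id)) {
      return 'skipped'; // redelivery: the work already happened
    }
    await handler(message);
    processedIds.add(message.id); // record only after the work succeeds
    return 'processed';
  };
}

// Hypothetical side-effect we must not repeat.
const sentEmails = [];
const sendConfirmation = async (message) => {
  sentEmails.push(message.orderId);
};

const handleOrderPlaced = makeIdempotent(sendConfirmation);
```

Delivering `{ id: 'msg-1', orderId: 42 }` to `handleOrderPlaced` twice runs `sendConfirmation` once; the second call returns `'skipped'`. Because the ID is recorded only after success, a handler that throws leaves the message eligible for retry.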
💀 Dead-letter queues — when messages just won't process
A dead-letter queue (DLQ) is a separate queue where messages land after they've failed processing more than N times. Instead of losing the message or blocking the main queue, failed messages are parked in the DLQ for investigation and replay.
MAIN QUEUE ──► Consumer Worker
                     │
              (fails 3 times)
                     │
                     ▼
            DEAD-LETTER QUEUE ──► Alert (PagerDuty/Slack)
                                  Manual inspection
                                  Replay after fix
Mental model: The DLQ is the "needs attention" tray on your desk. Work in progress goes in one inbox; things that are stuck or broken go in another. Nothing is lost, but the main pipeline keeps moving.
Why it matters: Without a DLQ, a single malformed message (a "poison pill") can block every consumer forever as they each retry it in a loop. A DLQ quarantines the bad message so healthy messages continue processing.
Common mistake: Setting up a DLQ but never monitoring it. The DLQ filling up silently is itself a production incident — you're losing data, you just don't know it yet. Wire an alert to any DLQ message count above zero.
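The retry-then-park flow can be sketched with two plain arrays. This is a toy model, assuming a threshold of 3 attempts as in the diagram; `drainWithDlq` and `handleOrder` are hypothetical names, and a real broker would track attempt counts and dead-letter routing for you.

```javascript
const MAX_ATTEMPTS = 3; // assumed threshold, matching the "fails 3 times" diagram

// Process every entry in `mainQueue`; park repeat failures in `deadLetters`.
// Both are plain arrays here, standing in for broker-managed queues.
async function drainWithDlq(mainQueue, deadLetters, handle) {
  while (mainQueue.length > 0) {
    const entry = mainQueue.shift();
    try {
      await handle(entry.message);
    } catch (err) {
      entry.attempts += 1;
      entry.lastError = String(err.message || err);
      if (entry.attempts >= MAX_ATTEMPTS) {
        deadLetters.push(entry); // quarantine the poison message
      } else {
        mainQueue.push(entry);   // retry later, behind the healthy messages
      }
    }
  }
}

// Hypothetical handler that rejects malformed messages.
const handledOrders = [];
async function handleOrder(message) {
  if (typeof message.orderId !== 'number') {
    throw new Error('malformed message: missing orderId');
  }
  handledOrders.push(message.orderId);
}

const mainQueue = [
  { message: { orderId: 1 }, attempts: 0 },
  { message: { oops: true }, attempts: 0 }, // the poison pill
  { message: { orderId: 2 }, attempts: 0 },
];
const deadLetters = [];
```

After `await drainWithDlq(mainQueue, deadLetters, handleOrder)`, both healthy orders have been handled and the malformed message sits in `deadLetters` with its attempt count and last error attached, ready for inspection and replay.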
🚫 When NOT to add a queue
Queues solve real problems, but they add real complexity. Don't reach for one when:
- You need the result immediately. If the next step in your function literally requires the output of this step, it must be synchronous. Queues are for fire-and-forget side-effects, not request-response flows.
- Your traffic is tiny and predictable. A queue for a tool with 50 users and no spikes is over-engineering. The background-jobs pattern from D1 (a simple cron or in-process async function) is often enough.
- You can't afford eventual consistency. If the user needs to see the inventory update right now on the confirmation page, you need synchronous inventory decrement — or a smart UI that shows "pending."
- Your team is small and unfamiliar with queue semantics. Idempotency, at-least-once delivery, DLQs, and consumer group management are real operational overhead. For a solo or two-person project, weigh the cost honestly.
The test: can the user get an honest, complete answer to their request without waiting for this step? If yes, it's a queue candidate. If no, it must stay synchronous.
🛠️ Your mission
Find one place in your app where your HTTP handler or main function chains multiple side-effects synchronously after the "real work" is done — sending a notification, logging an event, syncing to an external service, updating a secondary record.
- Identify the critical path: the minimum work needed to give the user a correct response.
- Identify the reactions: the side-effects that could happen after the response without the user noticing the difference.
- Move the reactions behind an eventBus.publish() call (or a job queue push if you want durability).
- Write one subscriber handler for each reaction.
- Make at least one subscriber idempotent: add a guard that checks whether the work was already done before doing it again.
You don't need an external broker (Redis, RabbitMQ, AWS SQS) to do this exercise. An in-process event bus as shown above is enough to learn the pattern. When you hit real scale, you swap the bus and your handler logic stays largely the same, although a durable queue adds acknowledgements and redelivery to think about.
✅ You're done when…
- You've identified a synchronous chain in your app and redrawn it as a critical path + event(s) + handlers — the handler count matches the number of side-effects you extracted.
- Your HTTP handler returns before any of the extracted side-effects finish (verify in your network tab: response time drops, side-effects log after the response).
- At least one handler is idempotent — it checks a log or a record before acting, and you can call it twice with the same message without doubling the side-effect.
- You've reviewed your design against the Production-Readiness Checklist and confirmed: no unbounded synchronous chains on the critical path, side-effects are decoupled from the response, and failure in a handler cannot bubble up to fail the original request.
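- You can answer: "What happens if the email worker crashes while processing a message?" With a durable queue, the answer is "the message is retried; nothing is lost; the user's order is unaffected." With the in-process bus from this lesson, the order is still unaffected, but the in-flight event is lost, which is the reason to move to a durable queue later.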
➡️ Next: Caching Layers & CDNs.
Build It Right, Or Don't Build It At All. 🏛️