Observability: Logging
Stage 3 · DevOps, Deployment & Operations · B.U.I.L.D. letter: D
You shipped it. It's running. Something just went wrong and a user is telling you about it on Twitter. You open your terminal and stare at a blank screen — because production has no logs. This is the moment every engineer learns that observability isn't optional. Let's make sure you never have that moment.
⚠️ The vibe trap
When you're building fast, console.log("here") and console.log(data) feel like enough, and honestly, in dev they are. The trap springs in production: those logs either disappear (a serverless instance's local files vanish with it), can't be searched (plain text is grep-hostile at scale), or contain passwords and tokens you accidentally printed. The other trap is the opposite: you log everything, bury the signal in noise, and still can't find the bug. Good logging is a discipline, not an afterthought, and it costs almost nothing to get right from the start.
📐 Structured Logging: JSON, Not Console Soup
A console.log("user signed in: " + email) is a string. You can read it with your eyes. You cannot query it, aggregate it, alert on it, or join it to another log line from the same request. A structured log is a JSON object with consistent fields. Mainstream log platforms (Datadog, Better Stack's Logtail, Grafana Loki, AWS CloudWatch Logs Insights) can parse those fields so you can filter, aggregate, and graph on them.
// ❌ The vibe-code way — readable by humans, hostile to machines
console.log("POST /api/orders 201 took 142ms user=" + userId);
console.log("Stripe charge created", chargeId);
console.log("ERROR:", err.message);
// ✅ Structured logging with pino (fast JSON output, minimal config)
// npm install pino pino-pretty   (pino-pretty is only loaded outside production)
import pino from "pino";
const log = pino({
// In dev: pretty-print; in prod: pure JSON (fast, indexable)
transport:
process.env.NODE_ENV !== "production"
? { target: "pino-pretty", options: { colorize: true } }
: undefined,
// Redact sensitive fields before they ever reach the log sink
redact: {
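// Bare names match top-level keys; "*.name" matches one level down
paths: ["req.headers.authorization", "password", "token", "cardNumber", "*.password", "*.token", "*.cardNumber"],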
censor: "[REDACTED]",
},
// Base fields present on every single log line
base: { service: "api", env: process.env.APP_ENV ?? "unknown" },
});
export default log;
Mental model: A structured log line is a database row, not a sentence. Every field has a name. You query fields, not text. When you have 10,000 log lines and need to find all errors for user abc123, you filter userId = "abc123" AND level = "error" — not grep "abc123" across a flat file.
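To make the "database row" idea concrete, here is a small sketch that treats newline-delimited JSON log output as rows and filters on named fields. The sample lines and the `parseLines` / `queryLogs` helpers are hypothetical; in practice your log platform runs this kind of filter for you. The sample uses pino's default numeric levels (30 = info, 50 = error).

```javascript
// Hypothetical NDJSON output: one JSON object per line, as pino writes by default
const rawLogs = [
  '{"level":30,"time":1717440000000,"userId":"abc123","msg":"Order created"}',
  '{"level":50,"time":1717440001000,"userId":"abc123","msg":"Payment charge failed"}',
  '{"level":50,"time":1717440002000,"userId":"zzz999","msg":"Payment charge failed"}',
  "not json: a stray console.log line",
].join("\n");

// Parse each line; lines that are not valid JSON are skipped
function parseLines(text) {
  const rows = [];
  for (const line of text.split("\n")) {
    try {
      rows.push(JSON.parse(line));
    } catch {
      // Unstructured output is invisible to field queries (the mixed-output blind spot)
    }
  }
  return rows;
}

// Return rows where every field in `where` matches exactly
function queryLogs(rows, where) {
  return rows.filter((row) =>
    Object.entries(where).every(([field, value]) => row[field] === value)
  );
}

const rows = parseLines(rawLogs);
const errorsForUser = queryLogs(rows, { userId: "abc123", level: 50 });
console.log(errorsForUser.map((r) => r.msg)); // [ 'Payment charge failed' ]
```

The stray plain-text line simply drops out of every query, which is exactly why mixing console.log output into a structured stream creates blind spots.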
Why it matters: Structured logs are the foundation the rest of your observability stack builds on. Log-based metrics, alerting rules, and links between logs and traces all depend on consistent field names. Switching from console.log to pino or winston takes about 20 minutes and pays dividends forever.
Common mistake: Mixing structured and unstructured logging in the same service. Once you adopt a logger, remove all naked console.log calls. Mixed output breaks parsers and creates blind spots.
🎚️ Log Levels: The Volume Knob You Actually Need
Not every event deserves equal urgency. Log levels let you turn the volume up in dev and down in prod — or crank it up again during an incident — without changing your code.
// The four levels you'll use every day, in ascending severity:
// DEBUG — fine-grained detail; only useful when you're actively debugging
// Never leave debug logs on in production; they drown the signal
log.debug({ query: sql, params }, "DB query executed");
// INFO — normal, expected events worth recording; the "heartbeat" of your app
log.info({ userId, orderId }, "Order created successfully");
// WARN — something unexpected happened but the app kept going;
// investigate soon, not right now
log.warn({ userId, retryCount: 3 }, "Payment retry threshold approaching");
// ERROR — something broke; a user got an error; a request failed
// Always pass the error under the err key; pino's default serializer records its type, message, and stack
log.error({ err, userId, orderId }, "Payment charge failed");
// In production: set the minimum level to "info"
// During an incident: temporarily lower it to "debug" to see more detail
// In dev: "debug" or "trace" is fine
// The threshold is set where the logger is created (e.g. in lib/logger.js):
const log = pino({ level: process.env.LOG_LEVEL ?? "info" });
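Under the hood, a minimum level is just a numeric comparison. The sketch below mirrors pino's default numeric levels (trace 10, debug 20, info 30, warn 40, error 50, fatal 60); the `makeLevelFilter` helper is hypothetical and only illustrates why changing `LOG_LEVEL` lets more or fewer lines through without touching any log call.

```javascript
// pino's default numeric levels
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 };

// Hypothetical helper: returns a predicate that keeps events at or above the threshold
function makeLevelFilter(minLevel) {
  const threshold = LEVELS[minLevel];
  if (threshold === undefined) {
    throw new Error(`Unknown log level: ${minLevel}`);
  }
  return (eventLevel) => LEVELS[eventLevel] >= threshold;
}

// Same code, different env var value: only the threshold changes
const inProd = makeLevelFilter("info");
const duringIncident = makeLevelFilter("debug");

console.log(inProd("debug"), inProd("warn")); // false true
console.log(duringIncident("debug")); // true
```

Failing loudly on an unknown level name is a deliberate choice: a typo like `LOG_LEVEL=verbose` should break at startup, not silently log nothing.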
LOG LEVEL CHEAT SHEET
─────────────────────────────────────────────────────────────
Level │ Use for │ On in prod?
─────────┼──────────────────────────────────┼───────────────
debug │ SQL queries, loop iterations, │ No (too noisy)
│ internal state during dev │
─────────┼──────────────────────────────────┼───────────────
info │ User actions, successful ops, │ Yes — default
│ startup events, key milestones │
─────────┼──────────────────────────────────┼───────────────
warn │ Retries, deprecated usage, │ Yes
│ unexpected-but-handled states │
─────────┼──────────────────────────────────┼───────────────
error │ Thrown exceptions, failed │ Yes — always
│ external calls, broken contracts │
─────────────────────────────────────────────────────────────
Mental model: Think of log levels like a hospital triage system. A debug log is a routine vital-signs check; an error is the crash cart. If everything sounds the alarm, nothing gets attention.
Why it matters: A busy service logging at debug level can produce gigabytes per hour. That costs real money to store and turns finding real errors into searching for one word in a novel. Level discipline keeps costs sane and signal clear.
Common mistake: Logging everything at error level "just to be safe." When every line is an error, on-call engineers stop trusting the alerts. Reserve error for events that actually require human attention.
🔍 What to Log — and What You Must Never Log
The question isn't "should I log this?" — it's "will I need this information to debug a production issue, and is it safe to keep?" Answer both halves.
// ✅ GOOD: Log a request lifecycle with all the context you'll need in a crisis
import { randomUUID } from "node:crypto";
app.use((req, res, next) => {
const start = Date.now();
// Attach a unique request ID to every log in this request's scope
const reqId = req.headers["x-request-id"] ?? randomUUID();
req.log = log.child({ reqId, method: req.method, path: req.path });
req.log.info({ ip: req.ip, userAgent: req.headers["user-agent"] }, "Request received");
res.on("finish", () => {
const durationMs = Date.now() - start;
const level = res.statusCode >= 500 ? "error" : res.statusCode >= 400 ? "warn" : "info";
req.log[level](
{
statusCode: res.statusCode,
durationMs,
userId: req.user?.id ?? null, // internal ID only — never email/name
},
"Request completed"
);
});
next();
});
// ✅ GOOD: Log a business-critical event with enough context to investigate
log.info(
{ userId, planId, trialEndDate, reqId },
"Subscription upgraded"
);
// ✅ GOOD: Log an error with its full context
try {
await chargeCard(paymentMethodId, amountCents);
} catch (err) {
log.error(
{ err, userId, amountCents, reqId },
"Card charge failed — payment not collected"
);
throw err; // re-throw; don't swallow errors silently
}
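The reason to pass errors under the `err` key instead of stringifying them yourself: an `Error`'s `message` and `stack` are non-enumerable own properties, so `JSON.stringify(err)` produces `{}`. The sketch below is a hand-rolled serializer in the spirit of pino's default `err` serializer; `serializeError` is a hypothetical, simplified helper, not pino's implementation.

```javascript
// JSON.stringify skips non-enumerable properties, and Error's message/stack are non-enumerable
const err = new Error("Card declined");
console.log(JSON.stringify(err)); // {}

// Hypothetical, simplified serializer: copy the fields you need into a plain object
function serializeError(e) {
  if (!(e instanceof Error)) return e;
  const out = { type: e.name, message: e.message, stack: e.stack };
  // Keep enumerable custom fields, such as a provider error code
  for (const [key, value] of Object.entries(e)) {
    out[key] = value;
  }
  // Follow the ES2022 `cause` chain so wrapped errors are not lost
  if (e.cause !== undefined) out.cause = serializeError(e.cause);
  return out;
}

const wrapped = new Error("Payment charge failed", { cause: err });
wrapped.code = "CARD_DECLINED";
const line = JSON.stringify({ level: 50, err: serializeError(wrapped), msg: "Card charge failed" });
console.log(JSON.parse(line).err.cause.message); // Card declined
```

With pino you get this for free as long as the error sits under `err`; the sketch just shows what would be lost if you logged `JSON.stringify(err)` or `err.toString()` instead.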
WHAT TO LOG vs. WHAT NEVER TO LOG
──────────────────────────────────────────────────────────────────────
✅ LOG THIS │ 🚫 NEVER LOG THIS
────────────────────────────────────┼─────────────────────────────────
Request method, path, status, ms │ Passwords (plaintext or hashed)
Internal user ID │ Session tokens / JWT values
Request ID / correlation ID │ Full credit card numbers (PAN)
Error message + stack trace │ CVV / CVC codes
Key business events (order, signup) │ Social security / tax IDs
External API response codes │ Full OAuth tokens or API keys
Config values at startup            │ Raw request bodies with PII
  (non-secret: LOG_LEVEL, APP_ENV)  │ Health/medical data
──────────────────────────────────────────────────────────────────────
RULE: If a string can be used to impersonate a user, move money, or
violate HIPAA/GDPR, it must never appear in a log line. Ever.
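pino's `redact` option is the tool to use in practice. To show the idea behind it, here is a hypothetical key-based redactor that walks an object and censors any field whose name is on a deny list, at any depth. Unlike pino's path-based `redact`, this matches by key name anywhere; the `SENSITIVE_KEYS` list is an assumption you would tailor to your own payloads.

```javascript
// Assumed deny list (lowercase); tailor to your own field names
const SENSITIVE_KEYS = new Set(["password", "token", "authorization", "cardnumber", "cvv", "ssn"]);

// Return a redacted deep copy; the input object is left untouched
function redact(value, censor = "[REDACTED]") {
  if (Array.isArray(value)) return value.map((item) => redact(item, censor));
  if (value === null || typeof value !== "object" || value instanceof Date) return value;
  const out = {};
  for (const [key, inner] of Object.entries(value)) {
    // Compare case-insensitively so "cardNumber" and "Authorization" are caught
    out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? censor : redact(inner, censor);
  }
  return out;
}

const event = {
  userId: "usr_4829ab",
  body: { password: "hunter2", card: { cardNumber: "4242424242424242", cvv: "123" } },
  headers: { Authorization: "Bearer eyJ..." },
};
console.log(JSON.stringify(redact(event)));
```

A deny list only catches what you thought of; the allowlist approach for request bodies fails closed instead, which is why the two are often combined.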
Mental model: Logs are written to disk, shipped to a vendor, indexed, and retained for 30–90 days. Treat every log field as if it will be read by an auditor, a breach investigator, and your user's lawyer. Because one day, it might be.
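Why it matters: GDPR's upper fine tier, which covers breaches of core principles such as storage limitation, reaches €20 million or 4% of worldwide annual turnover, whichever is higher. Under PCI DSS, storing a full card number (PAN) in readable form, in logs or anywhere else, is a violation, and keeping the CVV after authorization is prohibited outright. Beyond regulation, a breach that exposes your log storage is far worse if those logs contain passwords and tokens.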
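Common mistake: Logging JSON.stringify(req.body) without realizing the body contains password, cardNumber, or ssn. Once the body is flattened into a string, pino's redact option can no longer see those fields. Pass objects rather than pre-stringified text, and list the sensitive paths in redact (as in the logger setup at the top of this lesson) so they are stripped before the data ever touches disk.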
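An alternative to redacting after the fact is to never hand the logger the raw body at all: pick a short allowlist of fields you know are safe. This is a sketch with a hypothetical `pickSafe` helper and an assumed order payload. The allowlist fails closed: a sensitive field added to the body later is ignored by default rather than logged.

```javascript
// Hypothetical helper: copy only allowlisted top-level fields
function pickSafe(source, allowed) {
  const out = {};
  for (const key of allowed) {
    if (source != null && Object.prototype.hasOwnProperty.call(source, key)) {
      out[key] = source[key];
    }
  }
  return out;
}

// Assumed request body for an order endpoint
const body = {
  planId: "pro_monthly",
  quantity: 2,
  cardNumber: "4242424242424242",
  email: "a@example.com",
};

const ORDER_LOG_FIELDS = ["planId", "quantity"];
const logFields = { body: pickSafe(body, ORDER_LOG_FIELDS) };
console.log(JSON.stringify(logFields)); // {"body":{"planId":"pro_monthly","quantity":2}}
```

In a route handler this would look like `req.log.info({ body: pickSafe(req.body, ORDER_LOG_FIELDS) }, "Creating order")`, keeping redact as a second line of defense.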
🧵 Correlation IDs: Following One Request Through the Maze
Your app handles thousands of requests per minute. When one fails, you need to find every log line from that single request — not lines from other requests that happened to run at the same millisecond. A request ID (also called a correlation ID or trace ID) is a unique string you attach to every log line for the life of one request.
// middleware/requestId.js: attach an ID at the very start of every request
import { randomUUID } from "node:crypto";
import log from "../lib/logger.js"; // shared pino logger (path assumed)
export function requestIdMiddleware(req, res, next) {
// Accept an ID from an upstream service (API gateway, load balancer)
// or generate one if this is the origin request
const reqId = req.headers["x-request-id"] ?? randomUUID();
// Echo it back in the response so clients can reference it in support tickets
res.setHeader("x-request-id", reqId);
// Attach a child logger scoped to this request — every log.info/warn/error
// on req.log will automatically include reqId in the JSON output
req.log = log.child({ reqId });
req.reqId = reqId;
next();
}
// In your route handlers: use req.log instead of the global log
app.post("/api/orders", async (req, res) => {
req.log.info({ userId: req.user.id }, "Creating order");
try {
const order = await db.createOrder(req.body);
req.log.info({ orderId: order.id }, "Order persisted");
await notifyWarehouse(order.id, req.reqId); // propagate the ID downstream
req.log.info({ orderId: order.id }, "Warehouse notified");
res.status(201).json(order);
} catch (err) {
req.log.error({ err }, "Order creation failed");
res.status(500).json({ error: "Order failed", reqId: req.reqId }); // surface to client
}
});
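`log.child({ reqId })` works by merging a set of bindings into every line the child writes. The hypothetical `createLogger` below is a stripped-down sketch of that behavior (not pino's implementation) that writes to an in-memory array so the output is easy to inspect.

```javascript
// Minimal sketch of a logger with child bindings; `sink` collects JSON lines
function createLogger(bindings = {}, sink = []) {
  const write = (level, fields, msg) => {
    // Standard keys, then bindings, then per-call fields, then the message
    sink.push(JSON.stringify({ level, time: Date.now(), ...bindings, ...fields, msg }));
  };
  return {
    sink,
    info: (fields, msg) => write("info", fields, msg),
    error: (fields, msg) => write("error", fields, msg),
    // A child shares the parent's sink but carries extra bindings on every line
    child: (extra) => createLogger({ ...bindings, ...extra }, sink),
  };
}

const root = createLogger({ service: "api" });
const reqLog = root.child({ reqId: "req-1" });
reqLog.info({ orderId: "ord_1" }, "Order persisted");
root.info({}, "Cache warmed"); // no reqId: this line is not request-scoped

console.log(root.sink.join("\n"));
```

The design point is that the route handler never repeats `reqId`; it is bound once in middleware and rides along on every line the child writes.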
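And here is what a healthy JSON log line looks like in your log aggregator. This assumes the logger is configured with formatters.level (to write level labels) and timestamp: pino.stdTimeFunctions.isoTime; with pino's defaults, level is a number such as 30 and time is epoch milliseconds: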
{
"level": "info",
"time": "2026-06-03T18:42:11.034Z",
"service": "api",
"env": "production",
"reqId": "019012f4-b2a3-7c00-af1a-3f8e1b2c4d5e",
"userId": "usr_4829ab",
"orderId": "ord_88821c",
"method": "POST",
"path": "/api/orders",
"statusCode": 201,
"durationMs": 87,
"msg": "Request completed"
}
Mental model: A request ID is like a wristband at a festival. Every vendor (log line, downstream service, error tracker) scans that wristband. At the end of the night you can reconstruct exactly where that wristband went — without having to track the person's face in 10,000 photos.
Why it matters: Without a correlation ID, debugging a distributed failure means manually cross-referencing timestamps across multiple services, hoping nothing else happened at the exact same millisecond. With it, you run one query — reqId = "019012f4..." — and every related log line appears instantly.
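The single query described here is really a filter plus a sort. As a hypothetical sketch (your aggregator does this for you), here is how one request's story falls out of interleaved lines from several requests and services; the sample entries and the `timelineFor` helper are made up for illustration.

```javascript
// Interleaved lines from two services and two concurrent requests (made-up sample data)
const lines = [
  { time: 1000, service: "api", reqId: "r-1", msg: "Request received" },
  { time: 1001, service: "api", reqId: "r-2", msg: "Request received" },
  { time: 1003, service: "warehouse", reqId: "r-1", msg: "Pick list created" },
  { time: 1002, service: "api", reqId: "r-1", msg: "Order persisted" },
  { time: 1004, service: "api", reqId: "r-2", msg: "Request completed" },
];

// Keep one request's lines and order them by timestamp
function timelineFor(reqId, entries) {
  return entries
    .filter((entry) => entry.reqId === reqId)
    .sort((a, b) => a.time - b.time)
    .map((entry) => `${entry.time} ${entry.service}: ${entry.msg}`);
}

console.log(timelineFor("r-1", lines).join("\n"));
// 1000 api: Request received
// 1002 api: Order persisted
// 1003 warehouse: Pick list created
```

The warehouse line only shows up because the ID was propagated downstream; without that, the timeline would stop at the API boundary.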
Common mistake: Generating the request ID inside a route handler instead of the earliest possible middleware. Any log emitted by middleware before your handler (auth, rate limiting) won't carry the ID, and you'll lose the first half of the request's story.
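A related hardening step belongs in that same early middleware: the x-request-id header comes from the client or an upstream hop, so it is untrusted input that will be copied into every log line and echoed in the response. A minimal sketch, assuming you only accept short IDs made of letters, digits, dashes, and underscores (the pattern and the 128-character cap are assumptions, not a standard). The hypothetical `resolveRequestId` takes the generator as a parameter so the example stays dependency-free.

```javascript
// Assumed policy: 1 to 128 characters from a conservative alphabet
const REQUEST_ID_PATTERN = /^[A-Za-z0-9_-]{1,128}$/;

// Reuse a well-formed upstream ID, otherwise mint a fresh one.
// In a real middleware, pass crypto's randomUUID as `generate`.
function resolveRequestId(headerValue, generate) {
  const candidate = Array.isArray(headerValue) ? headerValue[0] : headerValue;
  if (typeof candidate === "string" && REQUEST_ID_PATTERN.test(candidate)) {
    return candidate;
  }
  return generate();
}

let counter = 0;
const fakeGenerate = () => `generated-${++counter}`;

console.log(resolveRequestId("019012f4-b2a3-7c00-af1a-3f8e1b2c4d5e", fakeGenerate)); // prints the same UUID
console.log(resolveRequestId('abc\n{"level":"error"}', fakeGenerate)); // generated-1
console.log(resolveRequestId(undefined, fakeGenerate)); // generated-2
```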
🚢 Shipping Logs Off the Box: Centralized Aggregation
In production, logs must be collected somewhere outside the process that created them. If your server crashes, its local logs are gone. If you have two instances, your logs are split across two machines. Centralized log aggregation solves both.
// pino transports for popular log platforms (several offer free tiers)
// Better Stack (Logtail): npm install @logtail/pino
import pino from "pino";
// In production: ship to Better Stack via its pino transport; in dev: pretty-print to stdout
// (check Better Stack's docs for any extra options, such as an ingesting endpoint)
const transport =
process.env.NODE_ENV === "production"
? pino.transport({
target: "@logtail/pino",
options: { sourceToken: process.env.LOGTAIL_SOURCE_TOKEN },
})
: pino.transport({ target: "pino-pretty", options: { colorize: true } });
const log = pino(
{
level: process.env.LOG_LEVEL ?? "info",
redact: {
paths: ["req.headers.authorization", "*.password", "*.token"],
censor: "[REDACTED]",
},
base: { service: "api", env: process.env.APP_ENV },
},
transport
);
export default log;
// Alternative platforms (same pattern, different transport):
// Axiom: npm install @axiomhq/pino
// Datadog: npm install pino-datadog-transport
// Grafana Loki: use Promtail sidecar or Alloy agent — no code changes needed
// AWS CloudWatch: pino-cloudwatch or Lambda's built-in stdout capture
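Transports for hosted platforms typically buffer lines and send them in batches rather than making one network call per log line. Here is a hypothetical sketch of that idea; `createBatcher`, its `maxBatch` size, and the injected `send` function are assumptions, not any vendor's API. A real transport also needs a flush timer, retries, and a flush on shutdown.

```javascript
// Hypothetical batcher: collects lines and hands them to `send` in groups
function createBatcher({ maxBatch, send }) {
  let buffer = [];
  return {
    write(line) {
      buffer.push(line);
      if (buffer.length >= maxBatch) this.flush();
    },
    flush() {
      if (buffer.length === 0) return;
      const batch = buffer;
      buffer = []; // swap before sending so new writes start a fresh batch
      send(batch);
    },
  };
}

const sent = [];
const batcher = createBatcher({ maxBatch: 3, send: (batch) => sent.push(batch) });
for (let i = 1; i <= 7; i++) {
  batcher.write(JSON.stringify({ level: 30, msg: `event ${i}` }));
}
batcher.flush(); // e.g. on shutdown, so the last partial batch is not lost
console.log(sent.map((batch) => batch.length)); // [ 3, 3, 1 ]
```

The final explicit flush is the part people forget: without it, whatever is still buffered when the process exits never reaches the aggregator.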
Mental model: Think of your log aggregator as a flight recorder. The plane (your server) can crash and burn, but the black box (the aggregator) survives and holds every event in the minutes before impact. Logs on disk are not a black box — they go down with the plane.
Why it matters: Serverless functions (Vercel Functions, AWS Lambda) have no persistent disk, so anything written to a local file disappears when the instance is recycled. Platforms do capture stdout (Lambda sends it to CloudWatch Logs, Vercel shows runtime logs), but those built-in views often have short retention and limited search. In a serverless world, shipping logs to a dedicated sink is how you keep searchable history.
Common mistake: Writing to a log file on the local filesystem in production and calling it done. Files fill up, servers get replaced, and log rotation silently discards history. Stdout → log platform is the modern, correct pattern.
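The stdout pattern asks very little of your code: write one JSON object per line to standard output and let the platform or an agent (such as the Promtail or Alloy options listed earlier) forward it. Here is a dependency-free sketch of that contract, with a hypothetical `makeStdoutLogger` name and an injectable `write` function so it can be tested. In a real app you would keep using pino, which already writes NDJSON to stdout by default.

```javascript
// Hypothetical minimal logger: one JSON object per line, newline-terminated
function makeStdoutLogger(base, write = (chunk) => process.stdout.write(chunk)) {
  const emit = (level) => (fields, msg) => {
    write(JSON.stringify({ level, time: new Date().toISOString(), ...base, ...fields, msg }) + "\n");
  };
  return { info: emit("info"), warn: emit("warn"), error: emit("error") };
}

// Default: real stdout
const log = makeStdoutLogger({ service: "api", env: "production" });
log.info({ reqId: "req-1" }, "Request completed");

// Captured: same contract, easy to assert on in tests
const captured = [];
const testLog = makeStdoutLogger({ service: "api" }, (chunk) => captured.push(chunk));
testLog.warn({ retryCount: 3 }, "Retrying");
```

Because the app never opens a file, there is nothing to rotate and nothing to lose when the instance is replaced; durability becomes the platform's job.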
🛠️ Your Mission
Add production-grade structured logging to your app. Here's the exact sequence:
- Install pino (npm install pino pino-pretty) and create a lib/logger.js (or logger.ts) module that exports a configured logger with redact, base fields (service, env), and an environment-aware transport.
- Replace every console.log in your codebase with the appropriate log-level call. Use log.debug for dev noise, log.info for events you'd want to see in prod, log.warn for recoverable problems, and log.error for failures. Delete all raw console.log calls.
- Add a requestIdMiddleware that generates or forwards an x-request-id header, creates a child logger bound to that ID, and attaches it to the request object. Use req.log in every route handler from this point on.
- Audit your log calls for sensitive data. Search your codebase for password, token, secret, card, ssn, and key. If any of those strings appear near a log call, add the matching field paths to pino's redact.paths. Confirm in dev that the field shows [REDACTED] in the output.
- Connect a log aggregator. Sign up for a free-tier account on Better Stack (Logtail), Axiom, or Grafana Cloud. Add the transport. Deploy. Verify that a real request appears in the aggregator's UI within 60 seconds.
- Test failure visibility. Throw a deliberate error in a non-critical route, trigger it, and confirm you can find the full stack trace and the reqId in your log platform within 30 seconds.
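For the sensitive-data audit in this mission, a small check run in your tests against captured log output helps catch regressions. This is a hypothetical sketch: `findLeaks` flags sensitive-looking keys whose values are not the censor string, plus any 13 to 19 digit run that passes the Luhn checksum (a common heuristic for card numbers). The key list and the heuristic are assumptions; tune them to your data.

```javascript
// Assumed sensitive key names (compared lowercase)
const SENSITIVE = ["password", "token", "secret", "cardnumber", "ssn", "authorization"];
const CENSOR = "[REDACTED]";

// Luhn checksum: true for digit strings that could be a valid card number
function passesLuhn(digits) {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = Number(digits[i]);
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// Walk one parsed log line and collect problems as human-readable strings
function findLeaks(value, path = "") {
  const leaks = [];
  if (typeof value === "string") {
    for (const match of value.match(/\d{13,19}/g) ?? []) {
      if (passesLuhn(match)) leaks.push(`${path}: card-like number`);
    }
  } else if (value && typeof value === "object") {
    for (const [key, inner] of Object.entries(value)) {
      const childPath = path ? `${path}.${key}` : key;
      if (SENSITIVE.includes(key.toLowerCase()) && inner !== CENSOR) {
        leaks.push(`${childPath}: unredacted sensitive key`);
      } else {
        leaks.push(...findLeaks(inner, childPath));
      }
    }
  }
  return leaks;
}

const line = '{"level":30,"user":{"password":"[REDACTED]"},"note":"paid with 4242424242424242","token":"abc"}';
console.log(findLeaks(JSON.parse(line)));
// [ 'note: card-like number', 'token: unredacted sensitive key' ]
```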
✅ You're done when…
- You have verified your logging setup against the Production-Readiness Checklist and confirmed that every observability item passes
- Every log line in production is valid JSON with at least level, time, service, env, and msg fields, and every line emitted while handling a request also carries reqId
- A search for password OR token OR cardNumber across your log platform returns no real values (only [REDACTED], or no results at all)
- You can paste a reqId from a failed request into your log aggregator and see every log line from that request: nothing more, nothing less
- Your app is shipping logs to a centralized aggregator (not local disk, not only stdout) and the aggregator shows live data
- Log level is set to info in production, and you can lower it to debug by changing an environment variable rather than your code (depending on your platform, the new value may only take effect after a restart or redeploy)
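The JSON-shape item in the checklist above can be automated. A hypothetical sketch, assuming pino-style NDJSON output and the field names used in this lesson: `checkLogLine` parses one line and reports which required fields are missing, requiring `reqId` only when the caller marks the line as request-scoped (startup and background lines have no request).

```javascript
// Fields every line must carry; reqId is added for request-scoped lines
const REQUIRED = ["level", "time", "service", "env", "msg"];

// Returns a list of problems for one NDJSON line (an empty list means it passes)
function checkLogLine(text, { requestScoped = true } = {}) {
  let entry;
  try {
    entry = JSON.parse(text);
  } catch {
    return ["not valid JSON"];
  }
  if (entry === null || typeof entry !== "object" || Array.isArray(entry)) {
    return ["not a JSON object"];
  }
  const required = requestScoped ? [...REQUIRED, "reqId"] : REQUIRED;
  return required.filter((field) => !(field in entry)).map((field) => `missing ${field}`);
}

const good = '{"level":"info","time":"2026-06-03T18:42:11.034Z","service":"api","env":"production","reqId":"r-1","msg":"Request completed"}';
const startup = '{"level":"info","time":"2026-06-03T18:40:00.000Z","service":"api","env":"production","msg":"Server listening"}';

console.log(checkLogLine(good)); // []
console.log(checkLogLine(startup, { requestScoped: false })); // []
console.log(checkLogLine(startup)); // [ 'missing reqId' ]
console.log(checkLogLine("POST /api/orders 201")); // [ 'not valid JSON' ]
```

Note the object check: a stray line such as `201` is valid JSON but not a log record, so it should still fail.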
➡️ Next: Observability: Metrics & Tracing.
Build It Right, Or Don't Build It At All. 🏛️