Migrations: Evolving a Schema Safely
Stage 3 · Data & Databases · B.U.I.L.D. letter: D
Your app is live. Real users. Real data. And you just realized the
userstable needs adisplay_namecolumn. You open TablePlus, click "Add Column," and — boom — done. Right? Wrong. What you just did will haunt you at 2 a.m. when your teammate deploys and nothing matches, production silently breaks, and you have no record of what you changed or when. Migrations are how real engineers change a live database without chaos.
⚠️ The vibe trap
Vibe coding gets you a working app fast — and that's genuinely awesome. But "just editing the schema in the console" is the database equivalent of pushing straight to main with no commit message. Nobody else on your team knows what changed. Your staging environment doesn't match prod. Rolling back is impossible. Next deploy, the columns your code expects simply don't exist on another machine. No record means no recovery.
📁 Section 1 — Migrations Are Just Numbered SQL Files
A migration is a plain SQL file with a number in its name and a single job: describe one schema change. That's it. The file lives in your repo. Every developer checks it out. Every environment runs it in order.
# A typical migrations folder
supabase/migrations/
001_create_users.sql
002_add_email_verified.sql
003_create_posts.sql
004_add_post_status.sql
-- 004_add_post_status.sql
ALTER TABLE posts
ADD COLUMN status TEXT NOT NULL DEFAULT 'draft';
Mental model: Think of migrations as a commit history for your schema. Just like git log shows you every code change in order, your migrations folder shows every database change in order. The number prefix is the commit message's sequence number.
Why it matters: Any engineer, any CI server, any fresh environment can run all migrations in order and arrive at exactly the same schema as production. No tribal knowledge. No "oh I forgot to tell you I added a column last week."
Common mistake: Naming files with timestamps only (20260603_add_column.sql) without zero-padded sequences. Two developers creating files at the same millisecond get a conflict. Use NNN_description.sql — simple, sortable, readable.
🔒 Section 2 — The Golden Rule: Never Edit an Applied Migration
Once a migration has been run — on anyone's machine, on staging, on production — it is frozen. Sealed. Done. You do not go back and edit it.
-- ❌ WRONG: You applied 004_add_post_status.sql yesterday.
-- Today you realize you also need an index. DO NOT DO THIS:
-- (editing 004_add_post_status.sql — BAD)
ALTER TABLE posts
ADD COLUMN status TEXT NOT NULL DEFAULT 'draft';
CREATE INDEX idx_posts_status ON posts (status); -- ← added after the fact
-- ✅ RIGHT: Create a new migration file instead.
-- 005_index_post_status.sql
CREATE INDEX idx_posts_status ON posts (status);
Mental model: Applied migrations are like published git commits. You don't rewrite history that other people have already pulled. A new change always means a new file.
Why it matters: Your database keeps a table (usually called schema_migrations or supabase_migrations.schema_migrations) listing which files have already run. If you edit a file that's already in that table, migration tooling will skip it — your change silently never happens on the next deploy — or worse, a checksum mismatch will crash your deployment.
Common mistake: Fixing a typo in an old migration "because it's small." Even a comment edit can break checksum-based tools. Add a new migration that ALTERs the thing you got wrong.
🔄 Section 3 — Up Migrations and Reversibility
Every migration has a forward direction (up): the change you are making. Many teams also write a rollback direction (down): the reverse. Not every migration is safely reversible (you can't un-delete data), but thinking about it forces clarity.
-- 006_add_user_avatar_url.sql (up)
ALTER TABLE users
ADD COLUMN avatar_url TEXT;
-- To roll back this specific migration (down):
-- ALTER TABLE users DROP COLUMN avatar_url;
Some tools (like Flyway, Liquibase, or Supabase's own CLI) support separate .up.sql and .down.sql files. Others keep both directions in one file, separated by a comment marker.
-- Using a two-section pattern (some tools parse this automatically):
-- migrate:up
ALTER TABLE users ADD COLUMN avatar_url TEXT;
-- migrate:down
ALTER TABLE users DROP COLUMN avatar_url;
Mental model: "Up" is forward in time. "Down" is a time machine — but only if you planned for it. A destructive migration (dropping a column, deleting rows) is a one-way door. Label it clearly so future-you knows.
Why it matters: During an incident, being able to run migrate down to the previous version buys you breathing room while you fix the real problem. Without a down migration, your only option is to write a new "undo" migration under pressure — the worst time to write careful SQL.
Common mistake: Writing a down migration that drops a column you seeded with data during the up migration. Now rolling back destroys real content. If the up migration inserts rows, the down migration needs to delete them first — or accept it's irreversible and document that.
🏗️ Section 4 — The Expand/Contract Pattern for Zero-Downtime Changes
This is the most important pattern in production database engineering. The problem: you want to rename a column (or change its type). If you just ALTER TABLE ... RENAME COLUMN, every query in your running app that references the old name breaks instantly.
The solution: three separate deployments, three separate migrations.
Phase 1 — Expand: Add the new thing alongside the old thing.
-- 007_expand_add_full_name.sql
-- Add the new column. Keep the old one. Nothing breaks.
ALTER TABLE users
ADD COLUMN full_name TEXT;
Phase 2 — Backfill: Populate the new column from existing data.
-- 008_backfill_full_name.sql
-- Copy existing data into the new column.
-- Run this on a slow table in batches to avoid locking.
UPDATE users
SET full_name = first_name || ' ' || last_name
WHERE full_name IS NULL;
Now update your application code to write to BOTH columns, and read from full_name. Deploy that code change. Verify it works.
Phase 3 — Contract: Drop the old thing once nothing uses it.
-- 009_contract_drop_old_name_columns.sql
-- Only run this after verifying no code reads first_name/last_name.
ALTER TABLE users
DROP COLUMN first_name,
DROP COLUMN last_name;
Mental model: Think of it as building a new bridge next to the old one. Traffic flows over the old bridge while the new one is under construction. Once the new bridge is ready and everyone has crossed, you tear down the old one. You never dynamite the old bridge while cars are still on it.
Why it matters: A direct rename on a busy production table can lock the table for seconds or minutes, timing out every request. The expand/contract pattern means your schema and code evolve together with zero downtime and a clean rollback at each phase.
Common mistake: Skipping the backfill and jumping straight to contract. You drop the old columns before copying the data. All your users' first_name and last_name values are gone. There is no undo. Backfill first, always verify counts, then contract.
📊 Section 5 — A Complete Real Migration Walkthrough
Your app currently has this table:
CREATE TABLE articles (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
body TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
The product team wants: a published_at timestamp (nullable, so drafts work), and a slug for SEO URLs (unique, derived from title for existing rows).
Step 1 — Expand: add the columns.
-- 010_expand_add_published_at_and_slug.sql
ALTER TABLE articles
ADD COLUMN published_at TIMESTAMPTZ,
ADD COLUMN slug TEXT;
No NOT NULL yet on slug — existing rows have no slug, so adding it as NOT NULL would fail immediately. We'll tighten constraints after the backfill.
Step 2 — Backfill: populate slugs for existing rows.
-- 011_backfill_article_slugs.sql
-- Build a slug from the title: lowercase, replace spaces with hyphens,
-- strip non-alphanumeric, append id to guarantee uniqueness.
UPDATE articles
SET slug = lower(regexp_replace(title, '[^a-zA-Z0-9 ]', '', 'g'))
|| '-' || id::text
WHERE slug IS NULL;
Step 3 — Contract: enforce constraints now that data is clean.
-- 012_contract_enforce_slug_constraints.sql
-- Safe to add NOT NULL and UNIQUE because every row has a value.
ALTER TABLE articles
ALTER COLUMN slug SET NOT NULL;
CREATE UNIQUE INDEX uq_articles_slug ON articles (slug);
Mental model for the whole sequence:
Migration 010: Schema opens the door (adds nullable columns)
Migration 011: Data walks through the door (backfill)
Migration 012: Schema locks the door (constraints + index)
Never try to do all three in one migration. If your backfill script takes 90 seconds on a million-row table, splitting it into its own migration means the expand migration commits fast, the backfill can be monitored and even re-run if it fails, and the contract only runs after you've confirmed every row has a valid value.
Common mistake: Running the backfill and constraint in the same transaction. Postgres holds a lock on the table for the entire transaction. A 90-second UPDATE + ADD CONSTRAINT = 90 seconds of locked writes. Split them.
🛠️ Your Mission
Your app has a products table:
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
price_cents INTEGER NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
You need to add an is_archived boolean so you can soft-delete products without losing their history.
Write three migration files using the expand/contract pattern:
013_expand_add_is_archived.sql— Add the column as nullable with no default yet.014_backfill_is_archived.sql— Setis_archived = falsefor all existing rows where it is NULL.015_contract_is_archived_not_null.sql— AddNOT NULL DEFAULT falseconstraint now that every row has a value. Add a partial index on(id) WHERE is_archived = falseso queries for active products stay fast.
Bonus: Write the down migration for each file as a comment block at the bottom. Identify which of the three down migrations is truly irreversible (hint: it's not the one you'd expect).
✅ You're Done When…
- You can explain why editing a migration that has already been applied in production is more dangerous than writing a new one, and what the checksum failure actually looks like.
- Your three mission migration files are written, numbered in sequence, and each does exactly one thing (expand, backfill, or contract).
- You've identified the backfill as the step most likely to need a
WHEREclause to be idempotent (safe to run twice without doubling the effect). - You can describe what "zero-downtime" means in the context of the expand/contract pattern and why a single
RENAME COLUMNmigration is not zero-downtime. - Your migration files are committed to git alongside your application code — not applied silently through a GUI and forgotten (review each file against the Pre-Ship Checklist and Production-Readiness Checklist before deploying to prod).
➡️ Next: Relationships in Practice. Build It Right, Or Don't Build It At All. 🏛️