Capstone: Design & Build a Real Data Layer
Stage 3 · Data & Databases · Capstone
The prettiest front end in the world falls apart the first time its data layer loses a record, returns stale cache, or locks up under a JOIN it never anticipated. This capstone is where you earn the right to say your app is real — because the data underneath it is trustworthy, fast, and recoverable.
🎯 The mission
Pick an app you have already vibe-coded (a marketplace, a learning tracker, a community forum — anything with users, content, and relationships) and build its complete data layer from scratch, engineered to survive production. You will model the domain correctly, write versioned migration files, lock data integrity down with constraints and transactions, prove you can answer tough queries fast, and verify that a backup can actually come back to life. When you are finished, the flashy React shell you shipped earlier will finally have a foundation it can stand on.
🧱 What to build
Schema
- At least four tables covering the core domain of your app
- At least one 1-to-many relationship (e.g., a user has many posts)
- At least one many-to-many relationship resolved through a proper junction table with its own meaningful columns (e.g., a
project_collaboratorstable that also storesroleandjoined_at) - Every table has a primary key; every foreign key column is declared with an explicit
REFERENCESconstraint and anON DELETErule (CASCADE, SET NULL, or RESTRICT — your choice, but justify it in a comment) - Schema is in Third Normal Form (3NF) — no repeating groups, no partial dependencies, no transitive dependencies. Annotate any place you deliberately denormalize and explain why
Migrations
- All schema changes live in numbered migration files (
001_init.sql,002_add_indexes.sql, …) committed to git — never modified after they have been applied - A
migrationstable in the DB records which files have run and when - You can blow away the database and replay all migrations to get back to the current state in under two minutes
Constraints
-
NOT NULLon every column that must never be empty -
UNIQUEconstraints where business logic demands uniqueness (usernames, email addresses, slug values) - At least one
CHECKconstraint enforcing a domain rule (e.g.,CHECK (price >= 0),CHECK (status IN ('draft','published','archived')))
Seed script
- A single
seed.sql(or equivalent script) that populates realistic sample data across all tables — enough rows that queries return non-trivial result sets - Seed is idempotent: running it twice produces the same state, not double the data
Indexes
- Identify every column that appears in a
WHERE,JOIN ON, orORDER BYclause in your app's most common queries - Add a B-tree index on each; add a composite index wherever two columns are almost always filtered together
- Run
EXPLAIN ANALYZE(Postgres) orEXPLAIN(MySQL/SQLite) on your slowest query before and after adding the index; paste both plans into aperformance-notes.mdfile
Transactions
- At least one multi-step write is wrapped in an explicit transaction. A good candidate: enrolling a user in a course (insert into
enrollments, decrementseats_availableoncourses, insert anotificationrow — all three happen together or not at all) - Demonstrate that if the second step throws an error, the entire transaction rolls back cleanly
-- Example: atomic course enrollment
BEGIN;
INSERT INTO enrollments (user_id, course_id, status)
VALUES ($1, $2, 'active');
UPDATE courses
SET seats_available = seats_available - 1
WHERE id = $2 AND seats_available > 0;
INSERT INTO notifications (user_id, message)
VALUES ($1, 'You have been enrolled.');
COMMIT;
-- If any step above fails, issue ROLLBACK; and no row is written.
Soft deletes
- Core content tables have a
deleted_at TIMESTAMPTZcolumn (nullable, default NULL) - Your query layer filters
WHERE deleted_at IS NULLby default - A record can be undeleted by nulling that column — no data is ever destroyed
Backup and restore
- Run a full database dump (
pg_dump,mysqldump,.dumpin SQLite, or the equivalent for your engine) - Drop the database (or a test copy of it), restore from the dump, and verify row counts match
- Document the exact commands in
backup-runbook.md
🗺️ Run it through B.U.I.L.D.
Understand the domain first. Before you write a single line of SQL, draw an Entity-Relationship diagram on paper or in a tool like dbdiagram.io. Name every entity, every attribute, every relationship cardinality. Ask: what are the nouns in my app? What does each one own, and what does it belong to? Only move to code once the diagram stops surprising you.
Interface — a clean, queryable schema.
Your schema is an API. Name tables as plural nouns (users, projects, enrollments). Name foreign keys as <singular_table>_id. Keep column names consistent across tables — if the timestamp for creation is created_at on one table, it is created_at on every table. Write views for the three or four queries your app runs on every page load so application code never touches raw JOINs.
-- Example: a view that collapses the N+1 risk on the project list page
CREATE VIEW project_summary AS
SELECT
p.id,
p.title,
p.status,
u.display_name AS owner_name,
COUNT(DISTINCT c.id) AS collaborator_count,
COUNT(DISTINCT t.id) AS open_task_count
FROM projects p
JOIN users u ON u.id = p.owner_id
LEFT JOIN project_collaborators c ON c.project_id = p.id
LEFT JOIN tasks t ON t.project_id = p.id AND t.completed_at IS NULL
GROUP BY p.id, u.display_name;
Lock it down — constraints, transactions, and parameterized access. Every write path that touches more than one table belongs in a transaction. Every query that accepts user input must use parameterized statements — never string-interpolate a user value into SQL. Layer your constraints so the database itself rejects bad data even if application code has a bug:
-- The DB enforces the rule even if your API forgets to
ALTER TABLE enrollments
ADD CONSTRAINT enrollment_unique UNIQUE (user_id, course_id),
ADD CONSTRAINT enrollment_status_check
CHECK (status IN ('pending', 'active', 'completed', 'cancelled'));
Document, test, deploy.
Your migration files are your deployment story — they are the single source of truth for what the schema is and how it got there. Tag the git commit that contains your final migration set. Run your seed script on a fresh database and confirm every page of your app still loads correctly. Take your backup, restore it to a throwaway database instance, and screenshot the row counts. Then commit backup-runbook.md so future-you (or a teammate) can execute a restore without digging through Slack history.
🧪 Deliverables
- Migration files — all schema DDL in numbered
.sqlfiles, committed to git, replayable from zero to current state seed.sql— idempotent, realistic sample data across all tablesperformance-notes.md— side-by-sideEXPLAIN ANALYZEoutput for at least one query, before and after your index or query rewrite, with a one-paragraph explanation of what changed and whybackup-runbook.md— the exact commands to dump the database, the exact commands to restore it, and a screenshot or terminal output showing row counts matching after restore- Transaction demo — a code snippet or SQL script (in
examples/transactional-write.sql) showing your multi-step write, a forced error mid-transaction, and the rollback confirmation
🏆 Stretch goals
Full-text search.
Add a tsvector column (Postgres) or a dedicated full-text index (MySQL, SQLite FTS5) to your primary content table. Write a search query that ranks results by relevance. Compare it to a naive LIKE '%term%' scan on the same data.
A denormalized read model with cache invalidation.
Create a summaries table that pre-computes the expensive aggregates your dashboard uses. Write a trigger or application-level hook that updates the summary row whenever the underlying data changes. Add a Redis (or in-memory) cache in front of it and demonstrate that a write to the source table correctly evicts the stale cache entry.
Point-in-time recovery. Enable WAL archiving (Postgres) or binary logging (MySQL). Simulate data loss at a specific timestamp. Restore to 30 seconds before that timestamp and confirm the recovered data is correct.
✅ You're done when…
- Every entity and relationship in your ER diagram maps one-to-one to a table and FK constraint in your schema — open the Data Modeling Cheat Sheet from Lesson D2-01 and tick off each normalization rule against your actual table definitions
- You have walked through the Production-Readiness Checklist from Lesson D2-12 and every item is either done or explicitly documented as out of scope with a reason
- Every relationship in the schema is enforced by a foreign key constraint, every frequently-queried column has an index, and your
EXPLAIN ANALYZEoutput confirms those indexes are actually being used by the query planner - A backup has actually been restored to a separate database instance and the row counts match — not just taken and left sitting in a file you hope works
➡️ Next: Stage 3 continues with Auth, Identity & Security — lock down the data you just designed.
Build It Right, Or Don't Build It At All. 🏛️