Skip to main content
Data & Databases
🗄️ Data & DatabasesLesson 13 of 13

Capstone: Design & Build a Real Data Layer

Design and build a real data layer: schema, migrations, indexes, seed, and a tuned query.

Capstone: Design & Build a Real Data Layer

Stage 3 · Data & Databases · Capstone

The prettiest front end in the world falls apart the first time its data layer loses a record, returns stale cache, or locks up under a JOIN it never anticipated. This capstone is where you earn the right to say your app is real — because the data underneath it is trustworthy, fast, and recoverable.


🎯 The mission

Pick an app you have already vibe-coded (a marketplace, a learning tracker, a community forum — anything with users, content, and relationships) and build its complete data layer from scratch, engineered to survive production. You will model the domain correctly, write versioned migration files, lock data integrity down with constraints and transactions, prove you can answer tough queries fast, and verify that a backup can actually come back to life. When you are finished, the flashy React shell you shipped earlier will finally have a foundation it can stand on.


🧱 What to build

Schema

  • At least four tables covering the core domain of your app
  • At least one 1-to-many relationship (e.g., a user has many posts)
  • At least one many-to-many relationship resolved through a proper junction table with its own meaningful columns (e.g., a project_collaborators table that also stores role and joined_at)
  • Every table has a primary key; every foreign key column is declared with an explicit REFERENCES constraint and an ON DELETE rule (CASCADE, SET NULL, or RESTRICT — your choice, but justify it in a comment)
  • Schema is in Third Normal Form (3NF) — no repeating groups, no partial dependencies, no transitive dependencies. Annotate any place you deliberately denormalize and explain why

Migrations

  • All schema changes live in numbered migration files (001_init.sql, 002_add_indexes.sql, …) committed to git — never modified after they have been applied
  • A migrations table in the DB records which files have run and when
  • You can blow away the database and replay all migrations to get back to the current state in under two minutes

Constraints

  • NOT NULL on every column that must never be empty
  • UNIQUE constraints where business logic demands uniqueness (usernames, email addresses, slug values)
  • At least one CHECK constraint enforcing a domain rule (e.g., CHECK (price >= 0), CHECK (status IN ('draft','published','archived')))

Seed script

  • A single seed.sql (or equivalent script) that populates realistic sample data across all tables — enough rows that queries return non-trivial result sets
  • Seed is idempotent: running it twice produces the same state, not double the data

Indexes

  • Identify every column that appears in a WHERE, JOIN ON, or ORDER BY clause in your app's most common queries
  • Add a B-tree index on each; add a composite index wherever two columns are almost always filtered together
  • Run EXPLAIN ANALYZE (Postgres) or EXPLAIN (MySQL/SQLite) on your slowest query before and after adding the index; paste both plans into a performance-notes.md file

Transactions

  • At least one multi-step write is wrapped in an explicit transaction. A good candidate: enrolling a user in a course (insert into enrollments, decrement seats_available on courses, insert a notification row — all three happen together or not at all)
  • Demonstrate that if the second step throws an error, the entire transaction rolls back cleanly
-- Example: atomic course enrollment
BEGIN;

INSERT INTO enrollments (user_id, course_id, status)
  VALUES ($1, $2, 'active');

UPDATE courses
  SET seats_available = seats_available - 1
  WHERE id = $2 AND seats_available > 0;

INSERT INTO notifications (user_id, message)
  VALUES ($1, 'You have been enrolled.');

COMMIT;
-- If any step above fails, issue ROLLBACK; and no row is written.

Soft deletes

  • Core content tables have a deleted_at TIMESTAMPTZ column (nullable, default NULL)
  • Your query layer filters WHERE deleted_at IS NULL by default
  • A record can be undeleted by nulling that column — no data is ever destroyed

Backup and restore

  • Run a full database dump (pg_dump, mysqldump, .dump in SQLite, or the equivalent for your engine)
  • Drop the database (or a test copy of it), restore from the dump, and verify row counts match
  • Document the exact commands in backup-runbook.md

🗺️ Run it through B.U.I.L.D.

Understand the domain first. Before you write a single line of SQL, draw an Entity-Relationship diagram on paper or in a tool like dbdiagram.io. Name every entity, every attribute, every relationship cardinality. Ask: what are the nouns in my app? What does each one own, and what does it belong to? Only move to code once the diagram stops surprising you.

Interface — a clean, queryable schema. Your schema is an API. Name tables as plural nouns (users, projects, enrollments). Name foreign keys as <singular_table>_id. Keep column names consistent across tables — if the timestamp for creation is created_at on one table, it is created_at on every table. Write views for the three or four queries your app runs on every page load so application code never touches raw JOINs.

-- Example: a view that collapses the N+1 risk on the project list page
CREATE VIEW project_summary AS
SELECT
  p.id,
  p.title,
  p.status,
  u.display_name AS owner_name,
  COUNT(DISTINCT c.id) AS collaborator_count,
  COUNT(DISTINCT t.id) AS open_task_count
FROM projects p
JOIN users u ON u.id = p.owner_id
LEFT JOIN project_collaborators c ON c.project_id = p.id
LEFT JOIN tasks t ON t.project_id = p.id AND t.completed_at IS NULL
GROUP BY p.id, u.display_name;

Lock it down — constraints, transactions, and parameterized access. Every write path that touches more than one table belongs in a transaction. Every query that accepts user input must use parameterized statements — never string-interpolate a user value into SQL. Layer your constraints so the database itself rejects bad data even if application code has a bug:

-- The DB enforces the rule even if your API forgets to
ALTER TABLE enrollments
  ADD CONSTRAINT enrollment_unique UNIQUE (user_id, course_id),
  ADD CONSTRAINT enrollment_status_check
    CHECK (status IN ('pending', 'active', 'completed', 'cancelled'));

Document, test, deploy. Your migration files are your deployment story — they are the single source of truth for what the schema is and how it got there. Tag the git commit that contains your final migration set. Run your seed script on a fresh database and confirm every page of your app still loads correctly. Take your backup, restore it to a throwaway database instance, and screenshot the row counts. Then commit backup-runbook.md so future-you (or a teammate) can execute a restore without digging through Slack history.


🧪 Deliverables

  1. Migration files — all schema DDL in numbered .sql files, committed to git, replayable from zero to current state
  2. seed.sql — idempotent, realistic sample data across all tables
  3. performance-notes.md — side-by-side EXPLAIN ANALYZE output for at least one query, before and after your index or query rewrite, with a one-paragraph explanation of what changed and why
  4. backup-runbook.md — the exact commands to dump the database, the exact commands to restore it, and a screenshot or terminal output showing row counts matching after restore
  5. Transaction demo — a code snippet or SQL script (in examples/transactional-write.sql) showing your multi-step write, a forced error mid-transaction, and the rollback confirmation

🏆 Stretch goals

Full-text search. Add a tsvector column (Postgres) or a dedicated full-text index (MySQL, SQLite FTS5) to your primary content table. Write a search query that ranks results by relevance. Compare it to a naive LIKE '%term%' scan on the same data.

A denormalized read model with cache invalidation. Create a summaries table that pre-computes the expensive aggregates your dashboard uses. Write a trigger or application-level hook that updates the summary row whenever the underlying data changes. Add a Redis (or in-memory) cache in front of it and demonstrate that a write to the source table correctly evicts the stale cache entry.

Point-in-time recovery. Enable WAL archiving (Postgres) or binary logging (MySQL). Simulate data loss at a specific timestamp. Restore to 30 seconds before that timestamp and confirm the recovered data is correct.


✅ You're done when…

  • Every entity and relationship in your ER diagram maps one-to-one to a table and FK constraint in your schema — open the Data Modeling Cheat Sheet from Lesson D2-01 and tick off each normalization rule against your actual table definitions
  • You have walked through the Production-Readiness Checklist from Lesson D2-12 and every item is either done or explicitly documented as out of scope with a reason
  • Every relationship in the schema is enforced by a foreign key constraint, every frequently-queried column has an index, and your EXPLAIN ANALYZE output confirms those indexes are actually being used by the query planner
  • A backup has actually been restored to a separate database instance and the row counts match — not just taken and left sitting in a file you hope works

➡️ Next: Stage 3 continues with Auth, Identity & Security — lock down the data you just designed.

Build It Right, Or Don't Build It At All. 🏛️

Always-on rigor toolkit

🏛️ Build It Right, Or Don't Build It At All.