Skip to main content
← The Owens Vibe Coding & Development Institute
Rigor toolkit

Data Modeling Cheat Sheet

Relationships, normalization, and indexing rules of thumb — the data-design quick reference.

Data Modeling Cheat Sheet

Model the data right first — everything else is downstream of this.

A well-designed schema is invisible. A poorly designed one poisons every query, every feature, and every migration forever. This sheet is your quick-reference for the decisions that matter most.


1. Modeling Steps (in order)

  1. Find entities — read the requirements, circle every noun: User, Order, Product, Invoice.
  2. Add attributes — what facts does each entity own? User.email, Order.placed_at.
  3. Map relationships — how do entities connect? One-to-one, one-to-many, many-to-many?
  4. Assign keys — every table needs a PK; FKs enforce the relationships.
  5. Normalize — remove redundancy (see §4).
  6. Add constraints — NOT NULL, UNIQUE, CHECK, FK rules.
  7. Choose types — pick the right column type from the start (see §7).

2. Keys

Key TypeWhat It IsWhen to Use
Primary Key (PK)Uniquely identifies every rowEvery table — no exceptions
Natural keyA real-world unique value (email, isbn)When it is truly stable and unique
Surrogate keySystem-generated (id SERIAL, UUID)Preferred default — safe from real-world changes
Foreign Key (FK)Points to a PK in another tableWhenever two tables are related
-- Surrogate PK (preferred)
CREATE TABLE users (
  id   SERIAL PRIMARY KEY,
  email TEXT NOT NULL UNIQUE
);

-- Natural key (only when it truly never changes)
CREATE TABLE countries (
  iso_code CHAR(2) PRIMARY KEY,   -- 'US', 'GB' — stable enough
  name     TEXT NOT NULL
);

3. Relationships Quick Reference

One-to-One (1:1)

Rare. Use when you need to split a wide table or restrict access to sensitive columns.

CREATE TABLE user_profiles (
  user_id INT PRIMARY KEY REFERENCES users(id),
  bio     TEXT
);

One-to-Many (1:N)

Most common. FK goes on the "many" side.

CREATE TABLE orders (
  id         SERIAL PRIMARY KEY,
  user_id    INT NOT NULL REFERENCES users(id),  -- FK on orders (the many)
  placed_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

Many-to-Many (M:N)

Requires a join table. The join table holds two FKs and usually its own PK plus any relationship attributes.

CREATE TABLE order_items (
  id         SERIAL PRIMARY KEY,
  order_id   INT NOT NULL REFERENCES orders(id),
  product_id INT NOT NULL REFERENCES products(id),
  quantity   INT NOT NULL DEFAULT 1
);

4. Normalization in One Screen

FormPlain-English RuleViolation Example
1NFEvery cell holds one atomic value; no repeating groupstags = 'sql,python,data' in one column
2NFEvery non-key column depends on the whole PK (matters for composite PKs)order_items(order_id, product_id, product_name) — product_name depends only on product_id
3NFNo column depends on another non-key column (no transitive deps)orders(id, customer_id, customer_email) — email depends on customer, not order

Fix for each violation:

  • 1NF → separate user_tags table or use an array/JSON column deliberately
  • 2NF → move product_name to products
  • 3NF → move customer_email to users

When to Denormalize Deliberately

  • Read-heavy reporting tables — pre-join for speed, accept write overhead
  • Event logs / audit trails — snapshot the value at the time, not a FK to mutable data
  • Search indexes / materialized views — denormalized copies built for queries
  • Always document why and protect with a comment or migration note

5. Indexing Rules of Thumb

  • Always index foreign keys — every FK column gets an index, or JOINs will scan.
  • Index columns in WHERE, JOIN ON, and ORDER BY — if a query filters or sorts by it, consider an index.
  • Composite indexes — column order matters; put the most selective column first.
  • Indexes cost writes — every INSERT/UPDATE/DELETE must update all indexes on the table. Don't over-index.
  • Partial indexes — index only the rows you query frequently.
-- Index the FK
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Partial index for active-only queries
CREATE INDEX idx_orders_open ON orders(placed_at)
  WHERE status = 'open';

N+1 reminder: An index doesn't fix N+1 query patterns. If your ORM fires one query per row, you have a code problem, not an index problem — use JOIN or eager loading.


6. Constraints

ConstraintPurposeExample
NOT NULLColumn must always have a valueemail TEXT NOT NULL
UNIQUENo two rows share this value (allows one NULL per SQL standard)UNIQUE(email)
CHECKArbitrary boolean conditionCHECK (quantity > 0)
REFERENCES (FK)Value must exist in the parent tableREFERENCES users(id)

FK ON DELETE Behaviors

BehaviorWhat HappensUse When
CASCADEDeleting parent deletes childrenOrder → order_items (items are meaningless without the order)
RESTRICTBlock delete if children existUser → orders (don't silently destroy order history)
SET NULLChildren's FK becomes NULLOptional parent (post author deleted, post stays)
SET DEFAULTChildren's FK gets the column defaultRare — be explicit about what "default" means

Gotcha: SET NULL requires the FK column to allow NULL. Combining NOT NULL with ON DELETE SET NULL is a constraint error.

-- Cascade: clean up children automatically
CREATE TABLE order_items (
  id       SERIAL PRIMARY KEY,
  order_id INT NOT NULL REFERENCES orders(id) ON DELETE CASCADE
);

-- Restrict: protect parent record
CREATE TABLE orders (
  id      SERIAL PRIMARY KEY,
  user_id INT NOT NULL REFERENCES users(id) ON DELETE RESTRICT
);

7. Column Type Sanity

Use CaseRight TypeNever Use
Money / currencyNUMERIC(12, 2) or integer centsFLOAT / REAL — floating-point rounding errors will bite you
TimestampsTIMESTAMPTZ (with time zone)TIMESTAMP without tz unless you're 100% UTC-only
IDs (sequential)SERIAL / BIGSERIAL / IDENTITYVARCHAR for numeric IDs
IDs (distributed)UUIDSequential IDs when ordering across systems matters
Boolean flagsBOOLEANTINYINT(1) or CHAR(1) — use the real type
Enumerated valuesTEXT + CHECK or a proper ENUM typeMagic integers (1=active, 2=inactive) undocumented
Large textTEXTVARCHAR(255) as a default — only add length limits you actually enforce
JSON blobsJSONB (Postgres)TEXT storing JSON — you lose operators and indexing
-- Money: integer cents, never float
CREATE TABLE invoices (
  id           SERIAL PRIMARY KEY,
  amount_cents INT NOT NULL CHECK (amount_cents >= 0),
  currency     CHAR(3) NOT NULL DEFAULT 'USD'
);

-- Always store timestamps with timezone
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()

8. Migrations & Data Safety

  • Version every migration — numbered files, one direction per file (001_create_users.sql).
  • Never edit an applied migration — write a new one that corrects it.
  • Backup before destructive changesDROP COLUMN, DROP TABLE, type changes.
  • Prefer ADD COLUMN with a default over multi-step backfills when possible.
  • Soft deletes — add deleted_at TIMESTAMPTZ instead of hard-deleting rows when you need audit history.
  • Lock-safe migrations on live tables — avoid ADD COLUMN NOT NULL without a default on large tables (full table rewrite in older Postgres).
  • Test migrations on a copy of prod data before running on production.
-- Safe: add nullable column first, backfill, then add NOT NULL
ALTER TABLE users ADD COLUMN display_name TEXT;
UPDATE users SET display_name = email WHERE display_name IS NULL;
ALTER TABLE users ALTER COLUMN display_name SET NOT NULL;

-- Soft delete pattern
ALTER TABLE posts ADD COLUMN deleted_at TIMESTAMPTZ;
-- Query: WHERE deleted_at IS NULL

Build It Right, Or Don't Build It At All. 🏛️

Other rigor resources

🏛️ Build It Right, Or Don't Build It At All.