Data Modeling Cheat Sheet
Model the data right first — everything else is downstream of this.
A well-designed schema is invisible. A poorly designed one poisons every query, every feature, and every migration forever. This sheet is your quick-reference for the decisions that matter most.
1. Modeling Steps (in order)
- Find entities — read the requirements, circle every noun:
User,Order,Product,Invoice. - Add attributes — what facts does each entity own?
User.email,Order.placed_at. - Map relationships — how do entities connect? One-to-one, one-to-many, many-to-many?
- Assign keys — every table needs a PK; FKs enforce the relationships.
- Normalize — remove redundancy (see §4).
- Add constraints — NOT NULL, UNIQUE, CHECK, FK rules.
- Choose types — pick the right column type from the start (see §7).
2. Keys
| Key Type | What It Is | When to Use |
|---|---|---|
| Primary Key (PK) | Uniquely identifies every row | Every table — no exceptions |
| Natural key | A real-world unique value (email, isbn) | When it is truly stable and unique |
| Surrogate key | System-generated (id SERIAL, UUID) | Preferred default — safe from real-world changes |
| Foreign Key (FK) | Points to a PK in another table | Whenever two tables are related |
-- Surrogate PK (preferred)
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email TEXT NOT NULL UNIQUE
);
-- Natural key (only when it truly never changes)
CREATE TABLE countries (
iso_code CHAR(2) PRIMARY KEY, -- 'US', 'GB' — stable enough
name TEXT NOT NULL
);
3. Relationships Quick Reference
One-to-One (1:1)
Rare. Use when you need to split a wide table or restrict access to sensitive columns.
CREATE TABLE user_profiles (
user_id INT PRIMARY KEY REFERENCES users(id),
bio TEXT
);
One-to-Many (1:N)
Most common. FK goes on the "many" side.
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id), -- FK on orders (the many)
placed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Many-to-Many (M:N)
Requires a join table. The join table holds two FKs and usually its own PK plus any relationship attributes.
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INT NOT NULL REFERENCES orders(id),
product_id INT NOT NULL REFERENCES products(id),
quantity INT NOT NULL DEFAULT 1
);
4. Normalization in One Screen
| Form | Plain-English Rule | Violation Example |
|---|---|---|
| 1NF | Every cell holds one atomic value; no repeating groups | tags = 'sql,python,data' in one column |
| 2NF | Every non-key column depends on the whole PK (matters for composite PKs) | order_items(order_id, product_id, product_name) — product_name depends only on product_id |
| 3NF | No column depends on another non-key column (no transitive deps) | orders(id, customer_id, customer_email) — email depends on customer, not order |
Fix for each violation:
- 1NF → separate
user_tagstable or use an array/JSON column deliberately - 2NF → move
product_nametoproducts - 3NF → move
customer_emailtousers
When to Denormalize Deliberately
- Read-heavy reporting tables — pre-join for speed, accept write overhead
- Event logs / audit trails — snapshot the value at the time, not a FK to mutable data
- Search indexes / materialized views — denormalized copies built for queries
- Always document why and protect with a comment or migration note
5. Indexing Rules of Thumb
- Always index foreign keys — every FK column gets an index, or JOINs will scan.
- Index columns in WHERE, JOIN ON, and ORDER BY — if a query filters or sorts by it, consider an index.
- Composite indexes — column order matters; put the most selective column first.
- Indexes cost writes — every INSERT/UPDATE/DELETE must update all indexes on the table. Don't over-index.
- Partial indexes — index only the rows you query frequently.
-- Index the FK
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Partial index for active-only queries
CREATE INDEX idx_orders_open ON orders(placed_at)
WHERE status = 'open';
N+1 reminder: An index doesn't fix N+1 query patterns. If your ORM fires one query per row, you have a code problem, not an index problem — use JOIN or eager loading.
6. Constraints
| Constraint | Purpose | Example |
|---|---|---|
NOT NULL | Column must always have a value | email TEXT NOT NULL |
UNIQUE | No two rows share this value (allows one NULL per SQL standard) | UNIQUE(email) |
CHECK | Arbitrary boolean condition | CHECK (quantity > 0) |
REFERENCES (FK) | Value must exist in the parent table | REFERENCES users(id) |
FK ON DELETE Behaviors
| Behavior | What Happens | Use When |
|---|---|---|
CASCADE | Deleting parent deletes children | Order → order_items (items are meaningless without the order) |
RESTRICT | Block delete if children exist | User → orders (don't silently destroy order history) |
SET NULL | Children's FK becomes NULL | Optional parent (post author deleted, post stays) |
SET DEFAULT | Children's FK gets the column default | Rare — be explicit about what "default" means |
Gotcha: SET NULL requires the FK column to allow NULL. Combining NOT NULL with ON DELETE SET NULL is a constraint error.
-- Cascade: clean up children automatically
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INT NOT NULL REFERENCES orders(id) ON DELETE CASCADE
);
-- Restrict: protect parent record
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id) ON DELETE RESTRICT
);
7. Column Type Sanity
| Use Case | Right Type | Never Use |
|---|---|---|
| Money / currency | NUMERIC(12, 2) or integer cents | FLOAT / REAL — floating-point rounding errors will bite you |
| Timestamps | TIMESTAMPTZ (with time zone) | TIMESTAMP without tz unless you're 100% UTC-only |
| IDs (sequential) | SERIAL / BIGSERIAL / IDENTITY | VARCHAR for numeric IDs |
| IDs (distributed) | UUID | Sequential IDs when ordering across systems matters |
| Boolean flags | BOOLEAN | TINYINT(1) or CHAR(1) — use the real type |
| Enumerated values | TEXT + CHECK or a proper ENUM type | Magic integers (1=active, 2=inactive) undocumented |
| Large text | TEXT | VARCHAR(255) as a default — only add length limits you actually enforce |
| JSON blobs | JSONB (Postgres) | TEXT storing JSON — you lose operators and indexing |
-- Money: integer cents, never float
CREATE TABLE invoices (
id SERIAL PRIMARY KEY,
amount_cents INT NOT NULL CHECK (amount_cents >= 0),
currency CHAR(3) NOT NULL DEFAULT 'USD'
);
-- Always store timestamps with timezone
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
8. Migrations & Data Safety
- Version every migration — numbered files, one direction per file (
001_create_users.sql). - Never edit an applied migration — write a new one that corrects it.
- Backup before destructive changes —
DROP COLUMN,DROP TABLE, type changes. - Prefer
ADD COLUMNwith a default over multi-step backfills when possible. - Soft deletes — add
deleted_at TIMESTAMPTZinstead of hard-deleting rows when you need audit history. - Lock-safe migrations on live tables — avoid
ADD COLUMN NOT NULLwithout a default on large tables (full table rewrite in older Postgres). - Test migrations on a copy of prod data before running on production.
-- Safe: add nullable column first, backfill, then add NOT NULL
ALTER TABLE users ADD COLUMN display_name TEXT;
UPDATE users SET display_name = email WHERE display_name IS NULL;
ALTER TABLE users ALTER COLUMN display_name SET NOT NULL;
-- Soft delete pattern
ALTER TABLE posts ADD COLUMN deleted_at TIMESTAMPTZ;
-- Query: WHERE deleted_at IS NULL
Build It Right, Or Don't Build It At All. 🏛️