🗄️ Data & Databases · Lesson 9 of 13

NoSQL & When to Use It

Document and key-value stores: what they're good at, and when SQL still wins.

Stage 3 · Data & Databases · B.U.I.L.D. letter: U

You shipped a product in a weekend using Firestore. Congrats, that is genuinely impressive. Now your users collection has 14 different shapes, two features are broken because a query returned undefined instead of an array, and your security rules still say allow read, write: if true;. NoSQL did not do that to you. You did that to you. This lesson teaches you when NoSQL is the right call, when it is not, and how to use it without burning your own house down.


⚠️ The vibe trap

You heard "NoSQL scales to billions of users" and "no schema means move fast," so you grabbed Firestore or MongoDB for your portfolio CRUD app. Now you need a report that joins three collections, and you are writing three separate queries plus a reduce loop in JavaScript to fake a join. The tool was not wrong; the choice was. NoSQL trades relational power for specific wins. If your data is mostly relational and you do not need those wins, you just paid the cost without getting the benefit.


🗂️ The Four NoSQL Families (and what each is actually for)

NoSQL is not one thing. It is a bucket label for "not a relational database," and four distinct families live inside it with very different trade-offs.

FAMILY          EXAMPLE PRODUCTS          PRIMARY SUPERPOWER
─────────────────────────────────────────────────────────────
Document        MongoDB, Firestore        Flexible nested objects (JSON)
Key-Value       Redis, DynamoDB           Blazing read/write by a single key
Wide-Column     Cassandra, Bigtable       Massive append-heavy time-series
Graph           Neo4j, Amazon Neptune     Relationship traversal (friends of friends)

Mental model: think of it as the shape of your data access pattern.

  • If you always fetch a whole "thing" at once (a user profile, a blog post with its tags) → document.
  • If you need a cache or a leaderboard → key-value.
  • If you are recording sensor events from 50,000 IoT devices → wide-column.
  • If the edges between nodes matter more than the nodes themselves (social graph, fraud detection) → graph.

Why it matters: picking the wrong family is like using a spreadsheet as a chat app. Each family was designed around one query pattern. Everything else is painful.

Common mistake: treating "NoSQL" as a single choice. Developers often say "we'll use NoSQL for scale" and then pick MongoDB because they saw it on YouTube — when their actual problem (a live leaderboard) is a perfect Redis use case and MongoDB would be slower and more expensive.


📄 Document Modeling: Embedding vs. Referencing

Document databases store JSON-like objects called documents inside collections. The big decision is always: do you nest related data inside one document, or do you split it into separate documents and reference them by ID?

Same data, two ways — a blog post with comments

Relational version (SQL):

-- Three separate tables, joined at query time
SELECT posts.title, comments.body, users.name
FROM posts
JOIN comments ON comments.post_id = posts.id
JOIN users   ON users.id = comments.user_id
WHERE posts.id = 42;

Document version — embedded (everything in one document):

{
  "id": "post_42",
  "title": "Why I love Supabase",
  "author_id": "user_7",
  "body": "...",
  "comments": [
    { "user_id": "user_9", "body": "Great post!", "created_at": "2026-05-01" },
    { "user_id": "user_3", "body": "Disagree on point 2.", "created_at": "2026-05-02" }
  ]
}

Document version — referenced (comments are their own collection):

// posts/post_42
{ "id": "post_42", "title": "Why I love Supabase", "author_id": "user_7" }

// comments/cmt_1
{ "id": "cmt_1", "post_id": "post_42", "user_id": "user_9", "body": "Great post!" }

// comments/cmt_2
{ "id": "cmt_2", "post_id": "post_42", "user_id": "user_3", "body": "Disagree on point 2." }

Mental model — the embedding rule of thumb:

Embed when you always read the child data with the parent and the list is bounded in size.
Reference when the child is queried independently, updated frequently, or could grow without limit.

For a blog post: embed comments if there are typically fewer than 50 and you always show them together. Reference them if comments can number in the thousands or if you need an "all comments by user_9" page, which with embedded comments would require scanning every post.

Why it matters: over-embedding bloats documents and makes writes expensive (every new comment rewrites the whole post document). Over-referencing forces you to do multi-round-trip fetches in application code — which is just a manual JOIN, and slower than the SQL planner's version.
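A minimal sketch of what that manual join looks like, using plain in-memory arrays in place of real collections. The sample data, the user names, and the joinPostWithComments helper are hypothetical; with a real document store, each of the three lookups would be a separate network round trip.

```javascript
// Hypothetical in-memory stand-ins for three collections.
const posts = [{ id: "post_42", title: "Why I love Supabase", author_id: "user_7" }];
const comments = [
  { id: "cmt_1", post_id: "post_42", user_id: "user_9", body: "Great post!" },
  { id: "cmt_2", post_id: "post_42", user_id: "user_3", body: "Disagree on point 2." },
];
const users = [
  { id: "user_3", name: "Sam" },
  { id: "user_9", name: "Ana" },
];

// Application-side equivalent of the three-table SQL join.
function joinPostWithComments(postId) {
  // Lookup 1: the post itself.
  const post = posts.find((p) => p.id === postId);
  if (!post) return [];

  // Lookup 2: every comment that references the post.
  const postComments = comments.filter((c) => c.post_id === postId);

  // Lookup 3: the commenters, indexed by id for constant-time access.
  const wantedIds = new Set(postComments.map((c) => c.user_id));
  const usersById = new Map(
    users.filter((u) => wantedIds.has(u.id)).map((u) => [u.id, u])
  );

  // Stitch the rows together; a missing user becomes null instead of a crash.
  return postComments.map((c) => ({
    title: post.title,
    body: c.body,
    name: usersById.get(c.user_id)?.name ?? null,
  }));
}

console.log(joinPostWithComments("post_42"));
```

Every line of stitching logic here is code you now own and must keep correct, which the SQL version gets from the database for free.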

Common mistake: embedding an array that grows forever. Firestore documents have a hard 1 MiB limit. MongoDB caps a document at 16 MB, and reads and rewrites get slower as documents grow. A post with 10,000 embedded comments will eventually fail or crawl.
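One way to catch this early is to check the serialized size before embedding another item. The sketch below uses the UTF-8 length of the JSON string as a rough proxy; Firestore computes document size by its own rules, so treat the number as an estimate only. The canEmbedComment helper and the 80% safety margin are assumptions for illustration.

```javascript
const FIRESTORE_MAX_BYTES = 1024 * 1024; // 1 MiB per-document limit
const SAFETY_MARGIN = 0.8;               // assumption: stop embedding at 80% of the limit

// Rough size estimate: UTF-8 byte length of the JSON serialization.
function approxDocBytes(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

// Returns true if the post can take one more embedded comment
// while staying under the safety threshold.
function canEmbedComment(post, comment) {
  const candidate = { ...post, comments: [...(post.comments ?? []), comment] };
  return approxDocBytes(candidate) <= FIRESTORE_MAX_BYTES * SAFETY_MARGIN;
}

const smallPost = { id: "post_42", title: "Why I love Supabase", comments: [] };
const newComment = { user_id: "user_9", body: "Great post!", created_at: "2026-05-01" };
console.log(canEmbedComment(smallPost, newComment));
```

When the check fails, that is the signal to move comments into their own collection and reference them by post ID.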


⚡ Key-Value with Redis: Fast by Design

Redis stores data as a plain key → value mapping entirely in memory. A read or write is typically under 1 millisecond. It supports strings, lists, sets, sorted sets, and hashes — all with atomic operations.

Three real use cases with code

1. Session cache — store a user session so your API does not hit the database on every request:

// Store session on login (expires in 30 minutes)
await redis.set(`session:${sessionId}`, JSON.stringify(userData), { EX: 1800 });

// Retrieve session on each request
const raw = await redis.get(`session:${sessionId}`);
const user = raw ? JSON.parse(raw) : null;

2. Rate limiter — block a user who hits your API too many times:

const key = `ratelimit:${userId}`;
const count = await redis.incr(key);          // atomic increment
if (count === 1) await redis.expire(key, 60); // first hit: start 60-second window
if (count > 100) throw new Error("Rate limit exceeded");

3. Sorted-set leaderboard — real-time ranking with O(log N) updates:

// Add or update a player's score (method names as in node-redis v4,
// matching the { EX } option style used in the session example)
await redis.zAdd("leaderboard:daily", { score: 4200, value: "user_7" });

// Get top 10 (highest score first)
const top10 = await redis.zRangeWithScores("leaderboard:daily", 0, 9, { REV: true });

Mental model: Redis is a supercharged global variable. It is perfect for data that is derived, temporary, or read far more than it is written. It is wrong for data that is the system of record — if Redis loses power, that data is gone unless you have configured persistence (RDB/AOF).

Why it matters: adding Redis in front of a slow database query can cut response time from 200 ms to 2 ms. That is the difference between a snappy app and one that feels broken.
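The usual way to put Redis in front of a query is the cache-aside pattern: check the cache, fall back to the database on a miss, then store the result with a TTL. The sketch below swaps Redis for a tiny in-memory stand-in so it runs anywhere; FakeCache, cached, and slowQuery are hypothetical names, and with a real client the set call would carry the expiry option shown in the session example.

```javascript
// Hypothetical in-memory stand-in for a Redis client (string values with a TTL).
class FakeCache {
  constructor() {
    this.store = new Map();
  }
  async get(key) {
    const hit = this.store.get(key);
    if (!hit || hit.expiresAt <= Date.now()) return null; // miss or expired
    return hit.value;
  }
  async set(key, value, ttlSeconds) {
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

// Cache-aside: try the cache, fall back to the loader, then populate the cache.
async function cached(cache, key, ttlSeconds, loader) {
  const raw = await cache.get(key);
  if (raw !== null) return JSON.parse(raw);
  const fresh = await loader();
  await cache.set(key, JSON.stringify(fresh), ttlSeconds);
  return fresh;
}

// Usage: the second call is served from the cache, so the "database" runs once.
let dbCalls = 0;
async function slowQuery() {
  dbCalls += 1;
  return { total: 1234 };
}

async function demo() {
  const cache = new FakeCache();
  await cached(cache, "report:daily", 60, slowQuery);
  await cached(cache, "report:daily", 60, slowQuery);
  console.log(dbCalls); // 1
}
demo();
```

The TTL is the simplest invalidation strategy: stale data lives for at most ttlSeconds, which is acceptable for derived data and not for a system of record.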

Common mistake: using Redis as your primary database because it is fast. Redis is a cache and a data structure server. Your user's account balance should not live only in Redis. Use it as a layer in front of your real database.


⚖️ What NoSQL Trades Away (and Why SQL Is Still the Default)

Every NoSQL family gives up something that relational databases provide out of the box. You need to know what you are signing up for before you commit.

WHAT SQL GIVES YOU                 WHAT YOU LOSE WITH MOST NOSQL
──────────────────────────────────────────────────────────────────
JOINs (one query, any shape)       You write joins in app code (slow, fragile)
ACID transactions                  Multi-document atomicity is limited or absent
Schema enforcement (DDL)           Any document can have any shape — silent bugs
Referential integrity (FKs)        Orphaned references have no guard rails
Ad-hoc queries (any WHERE clause)  You must predict your queries and index for them

A concrete example of the transaction problem:

// MongoDB (Node.js driver): transferring money between two documents
// This is NOT atomic without a multi-document transaction (available since v4.0,
// but adds significant complexity and latency)
const accounts = db.collection("accounts");
await accounts.updateOne({ _id: senderId }, { $inc: { balance: -100 } });
// Server crashes HERE
await accounts.updateOne({ _id: receiverId }, { $inc: { balance: 100 } });
// $100 has vanished. The sender lost it; the receiver never got it.

In PostgreSQL, wrapping those two updates in BEGIN; ...; COMMIT; makes them atomic — either both happen or neither does, regardless of crashes.
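A minimal sketch of that transfer from Node, written against any client object that exposes an async query(text, values) method, as node-postgres (pg) clients do. The accounts table, its id and balance columns, and the transferFunds helper are assumptions for illustration.

```javascript
// Moves `amount` from one account to another inside a single transaction.
// `client` is any object with an async query(text, values) method.
async function transferFunds(client, senderId, receiverId, amount) {
  await client.query("BEGIN");
  try {
    await client.query(
      "UPDATE accounts SET balance = balance - $1 WHERE id = $2",
      [amount, senderId]
    );
    await client.query(
      "UPDATE accounts SET balance = balance + $1 WHERE id = $2",
      [amount, receiverId]
    );
    await client.query("COMMIT");
  } catch (err) {
    // Undo any partial work, then let the caller see the original error.
    await client.query("ROLLBACK");
    throw err;
  }
}
```

With pg, all statements of a transaction must run on the same connection, so this would be called with a client checked out from the pool rather than with the pool itself. If the process dies before COMMIT, the database discards the uncommitted debit, which is exactly the guarantee the two-update MongoDB version lacks.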

Mental model — the honest default rule:

Start with SQL (Postgres or SQLite). Reach for a document store when you genuinely have schema-per-document heterogeneity or massive horizontal write scale. Add Redis for caching, sessions, and real-time features. Do not replace SQL; augment it.

For most apps — SaaS tools, marketplaces, dashboards, e-commerce — the data is inherently relational. Users have orders. Orders have line items. Line items reference products. That is a join graph, and relational databases were literally invented for it.

Why it matters: engineers who default to NoSQL for new projects often spend months rewriting back to SQL once the query complexity grows. The "no schema" benefit becomes a liability when your data has six undocumented shapes and a bug only appears when shape 4 meets shape 6.
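If you do use a schemaless store, one way to compensate is to validate every document against a single expected shape before writing it (and, ideally, after reading it). The sketch below is hand-rolled and deliberately small; the validateUserProfile helper, the field names, and the tier values are hypothetical.

```javascript
const TIERS = ["free", "pro", "team"]; // hypothetical subscription tiers

// Returns a list of problems; an empty list means the document matches the expected shape.
function validateUserProfile(doc) {
  if (doc === null || typeof doc !== "object" || Array.isArray(doc)) {
    return ["profile must be an object"];
  }
  const errors = [];
  if (typeof doc.name !== "string" || doc.name.length === 0) {
    errors.push("name must be a non-empty string");
  }
  if (typeof doc.email !== "string" || !doc.email.includes("@")) {
    errors.push("email must be a string containing @");
  }
  if (!TIERS.includes(doc.tier)) {
    errors.push("tier must be one of: " + TIERS.join(", "));
  }
  // An absent list is treated as empty; anything else must be a real array.
  if (doc.tags !== undefined && !Array.isArray(doc.tags)) {
    errors.push("tags must be an array when present");
  }
  return errors;
}

console.log(validateUserProfile({ name: "Ana", email: "ana@example.com", tier: "pro" }));
console.log(validateUserProfile({ name: "", email: "nope", tier: "gold", tags: "a,b" }));
```

Rejecting a bad write at the boundary is what a SQL schema does for you automatically; here it only works if every code path that writes goes through the validator.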

Common mistake: choosing based on ecosystem hype or what the tutorial used, rather than the shape of your data and your query patterns.


🔥 The "Shipped in Test Mode" Cautionary Note

Firestore's test mode opens reads and writes to everyone so you can move fast in development, and MongoDB Atlas will let you allow connections from any IP address (0.0.0.0/0). Both settings have a habit of surviving into production.

// Firestore security rule that is shipping in production right now
// at more companies than you would believe:
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /{document=**} {
      allow read, write: if true;   // ← ANYONE on the internet can read and
    }                               //   delete every document in your database
  }
}

The "vibe trap" here is not using Firestore — it is shipping without locking it down. Bots scan for open Firestore databases. The first time one finds yours, you may lose all your data, have it ransomed, or have it published publicly.

The fix before you share any URL publicly:

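// Minimum: require authentication for all reads and writes
allow read, write: if request.auth != null;

// Better: users can only touch their own documents.
// resource is the stored document; on create it does not exist yet,
// so the create rule checks the incoming document (request.resource) instead.
allow read, update, delete: if request.auth != null && request.auth.uid == resource.data.userId;
allow create: if request.auth != null && request.auth.uid == request.resource.data.userId;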

Mental model: "test mode" is for localhost, full stop. The moment a URL leaves your machine, your security rules are your data's only bodyguard.


🛠️ Your Mission

You are building a new feature: your app needs to store user activity events (every time someone clicks a button, views a page, or completes a task) and user profiles (name, email, preferences, subscription tier).

  1. Decide which storage engine fits each one — SQL table, document collection, or Redis key. Write one paragraph justifying each choice. Think about: query patterns, schema flexibility, write volume, and whether you need joins.

  2. Model the user profile both ways — write it as a SQL CREATE TABLE statement and as a JSON document. Note one thing each version makes easy and one thing each version makes hard.

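  3. Spot the bug: the following Firestore query has no limit, so get() returns every matching document in one call. For a user with tens of thousands of activity events, that means a slow response, a large in-memory result, and one billed read per document. Explain why this gets worse over time, and write the corrected version concept (you do not need to run it):

// BUG: fetching all events for a user in one unbounded read
const snap = await db.collection("events").where("userId", "==", uid).get();

(Hint: Firestore does not apply a default limit to get(). Bound the query with orderBy() and limit(), and page through large result sets with a cursor such as startAfter().)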


✅ You're done when…

  • You can fill in row 3 of the Data Modeling Cheat Sheet ("NoSQL families and their primary access pattern") without looking at your notes
  • You can explain embedding vs. referencing to someone else using a real example from your own project (or invent one), and correctly identify which rule of thumb applies
  • You can name two things SQL gives you that your chosen NoSQL database does not, and explain how you would compensate for each in application code
  • Your Firestore (or MongoDB) dev database has actual security rules — not allow read, write: if true — before any external URL is shared

➡️ Next: Caching & Cache Invalidation.

Build It Right, Or Don't Build It At All. 🏛️
