Skip to main content
Backend & APIs
⚙️ Backend & APIsLesson 8 of 13

Pagination, Filtering & Sorting

Real APIs return a LOT of data. Page, filter, and sort it without melting your server.

Pagination, Filtering & Sorting

Stage 3 · Backend & APIs · B.U.I.L.D. letter: L

The moment your list endpoint returns every row in the table, you've handed your database a time bomb — pagination, filtering, and sorting are what separates a real API from a memory-hungry prototype.


⚠️ The vibe trap

You write SELECT * FROM orders and it works great in development with 50 rows. Six months later, production has 2 million orders, a user hits that endpoint, Node tries to serialize 2 million objects into one JSON response, the process runs out of memory, and your whole server crashes. The second trap: you let users sort by passing ?sort=column directly into your query string — and now an attacker types ?sort=1; DROP TABLE orders-- and you have a very bad day. "Return everything and let the client filter" sounds harmless until it isn't.


📏 Offset/Limit Pagination — The Workhorse (With a Flaw)

Offset/limit is the most familiar approach. You tell the database "skip N rows, give me the next M." Every SQL database supports it natively.

-- Page 3, 20 rows per page
SELECT id, title, created_at
FROM orders
ORDER BY created_at DESC
LIMIT 20 OFFSET 40;

In your Express route this looks like:

// GET /api/orders?page=2&pageSize=20
router.get('/orders', async (req, res) => {
  const MAX_PAGE_SIZE = 100;
  const page     = Math.max(1, parseInt(req.query.page, 10)     || 1);
  const pageSize = Math.min(MAX_PAGE_SIZE,
                    Math.max(1, parseInt(req.query.pageSize, 10) || 20));
  const offset   = (page - 1) * pageSize;

  const [rows, countResult] = await Promise.all([
    db.query(
      'SELECT id, title, created_at FROM orders ORDER BY created_at DESC LIMIT $1 OFFSET $2',
      [pageSize, offset]
    ),
    db.query('SELECT COUNT(*) AS total FROM orders'),
  ]);

  const total = parseInt(countResult.rows[0].total, 10);

  res.json({
    data:     rows.rows,
    page,
    pageSize,
    total,
    totalPages: Math.ceil(total / pageSize),
    nextCursor: null, // filled in the cursor section below
  });
});

Mental model: Offset is a bookmark by position ("row 40 in line"). The problem is that at row 1,000,000 the database still has to scan and discard 1,000,000 rows to find your starting point — it just hides the work from you. For low-volume data or admin dashboards where users need "jump to page 47," offset is fine. For high-volume feeds, you need cursors.

Why it matters: Always enforce a MAX_PAGE_SIZE. A client that requests pageSize=999999 should get back pageSize=100 (or a 400 error), not a server explosion.

Common mistake: Forgetting to clamp page and pageSize to positive integers. parseInt(undefined, 10) returns NaN, and LIMIT NaN will either throw or return every row, depending on your driver.


🔑 Cursor/Keyset Pagination — The Scale Winner

Instead of "skip N rows," cursor pagination says "give me all rows where created_at < X (and break ties on id < Y)." The database uses an index, never scans skipped rows, and stays fast whether you're on page 1 or page 100,000.

-- First page (no cursor)
SELECT id, title, created_at
FROM orders
WHERE created_at IS NOT NULL
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Next page (cursor = last row of previous page)
SELECT id, title, created_at
FROM orders
WHERE (created_at, id) < ('2025-11-01T14:22:00Z', 'ord_8812')
ORDER BY created_at DESC, id DESC
LIMIT 20;

In Express, encode the cursor as a base-64 JSON blob so clients treat it as opaque:

function encodeCursor(row) {
  return Buffer.from(JSON.stringify({
    created_at: row.created_at,
    id: row.id,
  })).toString('base64url');
}

function decodeCursor(token) {
  try {
    return JSON.parse(Buffer.from(token, 'base64url').toString('utf8'));
  } catch {
    return null;
  }
}

// GET /api/orders?pageSize=20&cursor=<token>
router.get('/orders', async (req, res) => {
  const MAX_PAGE_SIZE = 100;
  const pageSize = Math.min(MAX_PAGE_SIZE,
                    Math.max(1, parseInt(req.query.pageSize, 10) || 20));
  const cursor   = req.query.cursor ? decodeCursor(req.query.cursor) : null;

  let rows;
  if (cursor) {
    rows = await db.query(
      `SELECT id, title, created_at FROM orders
       WHERE (created_at, id) < ($1, $2)
       ORDER BY created_at DESC, id DESC
       LIMIT $3`,
      [cursor.created_at, cursor.id, pageSize]
    );
  } else {
    rows = await db.query(
      `SELECT id, title, created_at FROM orders
       ORDER BY created_at DESC, id DESC
       LIMIT $1`,
      [pageSize]
    );
  }

  const lastRow    = rows.rows[rows.rows.length - 1];
  const nextCursor = rows.rows.length === pageSize
    ? encodeCursor(lastRow)
    : null; // no more pages

  res.json({
    data:       rows.rows,
    pageSize,
    nextCursor, // null means "you're at the end"
  });
});

Mental model: A cursor is a bookmark by value, not position. "Give me everything older than this timestamp + id" hits the index directly. The tradeoff: you can't jump to "page 47" — you can only go forward (or backward with a prevCursor). For infinite-scroll feeds and APIs consumed by machines, that's perfectly fine.

Why it matters: Your (created_at, id) pair must be unique and stable. Using only created_at causes rows inserted at the same millisecond to vanish or duplicate across pages. Always add id as a tiebreaker.

Common mistake: Cursors that embed business data clients can decode and manipulate to bypass row-level security. Keep cursor fields opaque (base-64 encode) and re-validate permissions inside the query, not just at the cursor level.


🔒 Safe Filtering — The Allowlist Pattern

Filtering is where SQL injection and logic bugs love to hide. The rule is simple: your code decides which columns are filterable; user input only provides the values.

// Allowed filter fields → their SQL column names
const ALLOWED_FILTERS = {
  status:      'orders.status',
  customer_id: 'orders.customer_id',
  product_id:  'order_items.product_id',
};

// GET /api/orders?status=pending&customer_id=cust_42
router.get('/orders', async (req, res) => {
  const conditions = ['1=1']; // always-true anchor
  const values     = [];

  for (const [key, column] of Object.entries(ALLOWED_FILTERS)) {
    if (req.query[key] !== undefined) {
      values.push(req.query[key]);
      conditions.push(`${column} = $${values.length}`);
    }
  }

  const sql = `
    SELECT orders.id, orders.status, orders.created_at
    FROM orders
    LEFT JOIN order_items ON order_items.order_id = orders.id
    WHERE ${conditions.join(' AND ')}
    ORDER BY orders.created_at DESC
    LIMIT $${values.length + 1}
    OFFSET $${values.length + 2}
  `;

  const pageSize = 20;
  const page     = Math.max(1, parseInt(req.query.page, 10) || 1);
  values.push(pageSize, (page - 1) * pageSize);

  const rows = await db.query(sql, values);
  res.json({ data: rows.rows, page, pageSize });
});

Notice that column comes from your ALLOWED_FILTERS map — it is never the raw user string. The user's value is always passed as a parameterized bind variable ($1, $2, …), which the database driver escapes automatically.

Mental model: Your allowlist is the bouncer. User input never names the column — it only fills the value slot in a parameterized query. SQL injection requires controlling structure (column names, operators, keywords), not just values.

Why it matters: A query like WHERE ${req.query.field} = '${req.query.value}' is one curl request away from full table exfiltration. Parameterized queries protect values; allowlists protect structure.

Common mistake: Building the allowlist from the database schema at runtime (e.g., SELECT column_name FROM information_schema.columns). That's clever but it exposes internal column names and still doesn't enforce business rules like "customers can only filter by their own customer_id." Hard-code the allowlist; it documents intent.


🗂️ Safe Sorting — Allowlisting Columns and Directions

Sorting has the same injection risk as filtering, plus a logic risk: letting users sort by a non-indexed column on a large table will lock up your database. Both problems share the same fix — an allowlist.

const ALLOWED_SORT_COLUMNS = new Set([
  'created_at',
  'total_amount',
  'status',
]);

const ALLOWED_DIRECTIONS = new Set(['ASC', 'DESC']);

function buildOrderClause(sortParam, directionParam) {
  const column    = ALLOWED_SORT_COLUMNS.has(sortParam)    ? sortParam    : 'created_at';
  const direction = ALLOWED_DIRECTIONS.has((directionParam || '').toUpperCase())
    ? directionParam.toUpperCase()
    : 'DESC';

  // Column name is validated from an allowlist — safe to interpolate.
  // Never interpolate a raw user string here, only the validated variable.
  return `ORDER BY ${column} ${direction}, id DESC`;
}

// GET /api/orders?sort=total_amount&dir=ASC
router.get('/orders', async (req, res) => {
  const orderClause = buildOrderClause(req.query.sort, req.query.dir);
  const pageSize    = 20;
  const page        = Math.max(1, parseInt(req.query.page, 10) || 1);

  const rows = await db.query(
    `SELECT id, status, total_amount, created_at
     FROM orders
     ${orderClause}
     LIMIT $1 OFFSET $2`,
    [pageSize, (page - 1) * pageSize]
  );

  res.json({ data: rows.rows, page, pageSize });
});

The only string interpolated into the SQL is column and direction — both validated against closed sets before they ever touch the query. The user's raw req.query.sort string never enters the SQL template.

Mental model: Think of your sort allowlist as a menu. The kitchen only makes what's on the menu. If a customer orders something not listed, you give them the house special (created_at DESC) rather than making something up.

Why it matters: Even without injection intent, letting users sort by an unindexed column (SELECT ... ORDER BY notes DESC on a 10M-row table with no index on notes) triggers a full-table sort in memory. Your allowlist should only include indexed columns.

Common mistake: Allowing direction as a raw string without uppercasing and checking. ORDER BY created_at desc; DROP TABLE orders-- is a real attack vector if you only checked that direction is truthy.


📦 The Standard List-Response Envelope

Every list endpoint across your API should return the same shape. Clients — and future-you — will thank you.

// The contract every list endpoint fulfills
{
  "data":        [...],          // the actual records, always an array
  "page":        2,              // current page (offset-based only)
  "pageSize":    20,             // records per page
  "total":       843,            // total matching records (offset-based only)
  "totalPages":  43,             // Math.ceil(total / pageSize) (offset-based only)
  "nextCursor":  "eyJpZCI6Ijg...", // opaque token; null when no more pages
  "filters":     { "status": "pending" }, // echo back applied filters
  "sort":        { "column": "created_at", "direction": "DESC" }
}

A few principles behind this shape:

  • data is always an array, even if empty. Clients should never branch on if (response.data) — they should always iterate.
  • nextCursor: null is an unambiguous end-of-list signal. An empty data array can mean "this filter matches nothing" or "you're at the last page" — the cursor resolves that ambiguity.
  • total is expensive on cursor-based pagination (it requires a separate COUNT(*) with the same filters, bypassing the index benefit). Omit it for cursor endpoints and document that explicitly. Clients should use nextCursor === null to detect the end.
  • Echoing filters and sort back in the response saves clients from having to track what they asked for — especially useful in logs and debugging.

Mental model: A list endpoint is a contract, not a firehose. The contract says: "I will return at most pageSize records, I will tell you how to get more, and I will only filter and sort by columns I've deemed safe." Every other behavior is a bug.


🛠️ Your Mission

Take one of your app's existing list endpoints — orders, products, users, posts, whatever your app has most of — and upgrade it to production-grade:

  1. Add page and pageSize query params with a hard MAX_PAGE_SIZE = 100 ceiling and a DEFAULT_PAGE_SIZE = 20. Clamp and parse defensively; never trust raw query strings as numbers.
  2. Create an ALLOWED_FILTERS map for at least two fields. Wire them into a parameterized WHERE clause using the pattern above. Confirm that requesting a field not in the map is silently ignored (or returns a 400 with a helpful message listing valid fields).
  3. Create an ALLOWED_SORT_COLUMNS set and a direction allowlist. Default to your primary timestamp column descending.
  4. Return the standard envelope (data, page, pageSize, total, nextCursor, filters, sort).
  5. Bonus: Re-implement the endpoint using cursor-based pagination and compare query plans with EXPLAIN ANALYZE on a table with at least 10,000 rows.

✅ You're done when…

  • Requesting ?pageSize=99999 returns at most 100 records and does not crash the server (API Design Checklist: bounded resource responses).
  • Requesting ?sort=password&dir=; DROP TABLE users-- is silently rejected and defaults to the safe sort — no SQL error, no 500, no data leak (Production-Readiness Checklist: input validation on all query parameters).
  • Requesting ?filter[arbitrary_column]=x has no effect — the response is identical to the unfiltered request, confirming the allowlist is working (API Design Checklist: no information leakage via error messages or unexpected data).
  • All three concerns — pagination, filtering, sorting — are handled in a single shared helper or middleware so that adding a new list endpoint takes under 5 minutes (Production-Readiness Checklist: code reuse and consistency across endpoints).
  • The response envelope shape is identical across at least two different list endpoints in your app (API Design Checklist: consistent response format).

➡️ Next: Background Jobs & Async Work. Build It Right, Or Don't Build It At All. 🏛️

Always-on rigor toolkit

🏛️ Build It Right, Or Don't Build It At All.