Pagination, Filtering & Sorting
Stage 3 · Backend & APIs · B.U.I.L.D. letter: L
The moment your list endpoint returns every row in the table, you've handed your database a time bomb — pagination, filtering, and sorting are what separates a real API from a memory-hungry prototype.
⚠️ The vibe trap
You write SELECT * FROM orders and it works great in development with 50 rows. Six months later, production has 2 million orders, a user hits that endpoint, Node tries to serialize 2 million objects into one JSON response, the process runs out of memory, and your whole server crashes. The second trap: you let users sort by passing ?sort=column directly into your query string — and now an attacker types ?sort=1; DROP TABLE orders-- and you have a very bad day. "Return everything and let the client filter" sounds harmless until it isn't.
📏 Offset/Limit Pagination — The Workhorse (With a Flaw)
Offset/limit is the most familiar approach. You tell the database "skip N rows, give me the next M." Every SQL database supports it natively.
-- Page 3, 20 rows per page
SELECT id, title, created_at
FROM orders
ORDER BY created_at DESC
LIMIT 20 OFFSET 40;
In your Express route this looks like:
// GET /api/orders?page=2&pageSize=20
router.get('/orders', async (req, res) => {
const MAX_PAGE_SIZE = 100;
const page = Math.max(1, parseInt(req.query.page, 10) || 1);
const pageSize = Math.min(MAX_PAGE_SIZE,
Math.max(1, parseInt(req.query.pageSize, 10) || 20));
const offset = (page - 1) * pageSize;
const [rows, countResult] = await Promise.all([
db.query(
'SELECT id, title, created_at FROM orders ORDER BY created_at DESC LIMIT $1 OFFSET $2',
[pageSize, offset]
),
db.query('SELECT COUNT(*) AS total FROM orders'),
]);
const total = parseInt(countResult.rows[0].total, 10);
res.json({
data: rows.rows,
page,
pageSize,
total,
totalPages: Math.ceil(total / pageSize),
nextCursor: null, // filled in the cursor section below
});
});
Mental model: Offset is a bookmark by position ("row 40 in line"). The problem is that at row 1,000,000 the database still has to scan and discard 1,000,000 rows to find your starting point — it just hides the work from you. For low-volume data or admin dashboards where users need "jump to page 47," offset is fine. For high-volume feeds, you need cursors.
Why it matters: Always enforce a MAX_PAGE_SIZE. A client that requests pageSize=999999 should get back pageSize=100 (or a 400 error), not a server explosion.
Common mistake: Forgetting to clamp page and pageSize to positive integers. parseInt(undefined, 10) returns NaN, and LIMIT NaN will either throw or return every row, depending on your driver.
🔑 Cursor/Keyset Pagination — The Scale Winner
Instead of "skip N rows," cursor pagination says "give me all rows where created_at < X (and break ties on id < Y)." The database uses an index, never scans skipped rows, and stays fast whether you're on page 1 or page 100,000.
-- First page (no cursor)
SELECT id, title, created_at
FROM orders
WHERE created_at IS NOT NULL
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- Next page (cursor = last row of previous page)
SELECT id, title, created_at
FROM orders
WHERE (created_at, id) < ('2025-11-01T14:22:00Z', 'ord_8812')
ORDER BY created_at DESC, id DESC
LIMIT 20;
In Express, encode the cursor as a base-64 JSON blob so clients treat it as opaque:
function encodeCursor(row) {
return Buffer.from(JSON.stringify({
created_at: row.created_at,
id: row.id,
})).toString('base64url');
}
function decodeCursor(token) {
try {
return JSON.parse(Buffer.from(token, 'base64url').toString('utf8'));
} catch {
return null;
}
}
// GET /api/orders?pageSize=20&cursor=<token>
router.get('/orders', async (req, res) => {
const MAX_PAGE_SIZE = 100;
const pageSize = Math.min(MAX_PAGE_SIZE,
Math.max(1, parseInt(req.query.pageSize, 10) || 20));
const cursor = req.query.cursor ? decodeCursor(req.query.cursor) : null;
let rows;
if (cursor) {
rows = await db.query(
`SELECT id, title, created_at FROM orders
WHERE (created_at, id) < ($1, $2)
ORDER BY created_at DESC, id DESC
LIMIT $3`,
[cursor.created_at, cursor.id, pageSize]
);
} else {
rows = await db.query(
`SELECT id, title, created_at FROM orders
ORDER BY created_at DESC, id DESC
LIMIT $1`,
[pageSize]
);
}
const lastRow = rows.rows[rows.rows.length - 1];
const nextCursor = rows.rows.length === pageSize
? encodeCursor(lastRow)
: null; // no more pages
res.json({
data: rows.rows,
pageSize,
nextCursor, // null means "you're at the end"
});
});
Mental model: A cursor is a bookmark by value, not position. "Give me everything older than this timestamp + id" hits the index directly. The tradeoff: you can't jump to "page 47" — you can only go forward (or backward with a prevCursor). For infinite-scroll feeds and APIs consumed by machines, that's perfectly fine.
Why it matters: Your (created_at, id) pair must be unique and stable. Using only created_at causes rows inserted at the same millisecond to vanish or duplicate across pages. Always add id as a tiebreaker.
Common mistake: Cursors that embed business data clients can decode and manipulate to bypass row-level security. Keep cursor fields opaque (base-64 encode) and re-validate permissions inside the query, not just at the cursor level.
🔒 Safe Filtering — The Allowlist Pattern
Filtering is where SQL injection and logic bugs love to hide. The rule is simple: your code decides which columns are filterable; user input only provides the values.
// Allowed filter fields → their SQL column names
const ALLOWED_FILTERS = {
status: 'orders.status',
customer_id: 'orders.customer_id',
product_id: 'order_items.product_id',
};
// GET /api/orders?status=pending&customer_id=cust_42
router.get('/orders', async (req, res) => {
const conditions = ['1=1']; // always-true anchor
const values = [];
for (const [key, column] of Object.entries(ALLOWED_FILTERS)) {
if (req.query[key] !== undefined) {
values.push(req.query[key]);
conditions.push(`${column} = $${values.length}`);
}
}
const sql = `
SELECT orders.id, orders.status, orders.created_at
FROM orders
LEFT JOIN order_items ON order_items.order_id = orders.id
WHERE ${conditions.join(' AND ')}
ORDER BY orders.created_at DESC
LIMIT $${values.length + 1}
OFFSET $${values.length + 2}
`;
const pageSize = 20;
const page = Math.max(1, parseInt(req.query.page, 10) || 1);
values.push(pageSize, (page - 1) * pageSize);
const rows = await db.query(sql, values);
res.json({ data: rows.rows, page, pageSize });
});
Notice that column comes from your ALLOWED_FILTERS map — it is never the raw user string. The user's value is always passed as a parameterized bind variable ($1, $2, …), which the database driver escapes automatically.
Mental model: Your allowlist is the bouncer. User input never names the column — it only fills the value slot in a parameterized query. SQL injection requires controlling structure (column names, operators, keywords), not just values.
Why it matters: A query like WHERE ${req.query.field} = '${req.query.value}' is one curl request away from full table exfiltration. Parameterized queries protect values; allowlists protect structure.
Common mistake: Building the allowlist from the database schema at runtime (e.g., SELECT column_name FROM information_schema.columns). That's clever but it exposes internal column names and still doesn't enforce business rules like "customers can only filter by their own customer_id." Hard-code the allowlist; it documents intent.
🗂️ Safe Sorting — Allowlisting Columns and Directions
Sorting has the same injection risk as filtering, plus a logic risk: letting users sort by a non-indexed column on a large table will lock up your database. Both problems share the same fix — an allowlist.
const ALLOWED_SORT_COLUMNS = new Set([
'created_at',
'total_amount',
'status',
]);
const ALLOWED_DIRECTIONS = new Set(['ASC', 'DESC']);
function buildOrderClause(sortParam, directionParam) {
const column = ALLOWED_SORT_COLUMNS.has(sortParam) ? sortParam : 'created_at';
const direction = ALLOWED_DIRECTIONS.has((directionParam || '').toUpperCase())
? directionParam.toUpperCase()
: 'DESC';
// Column name is validated from an allowlist — safe to interpolate.
// Never interpolate a raw user string here, only the validated variable.
return `ORDER BY ${column} ${direction}, id DESC`;
}
// GET /api/orders?sort=total_amount&dir=ASC
router.get('/orders', async (req, res) => {
const orderClause = buildOrderClause(req.query.sort, req.query.dir);
const pageSize = 20;
const page = Math.max(1, parseInt(req.query.page, 10) || 1);
const rows = await db.query(
`SELECT id, status, total_amount, created_at
FROM orders
${orderClause}
LIMIT $1 OFFSET $2`,
[pageSize, (page - 1) * pageSize]
);
res.json({ data: rows.rows, page, pageSize });
});
The only string interpolated into the SQL is column and direction — both validated against closed sets before they ever touch the query. The user's raw req.query.sort string never enters the SQL template.
Mental model: Think of your sort allowlist as a menu. The kitchen only makes what's on the menu. If a customer orders something not listed, you give them the house special (created_at DESC) rather than making something up.
Why it matters: Even without injection intent, letting users sort by an unindexed column (SELECT ... ORDER BY notes DESC on a 10M-row table with no index on notes) triggers a full-table sort in memory. Your allowlist should only include indexed columns.
Common mistake: Allowing direction as a raw string without uppercasing and checking. ORDER BY created_at desc; DROP TABLE orders-- is a real attack vector if you only checked that direction is truthy.
📦 The Standard List-Response Envelope
Every list endpoint across your API should return the same shape. Clients — and future-you — will thank you.
// The contract every list endpoint fulfills
{
"data": [...], // the actual records, always an array
"page": 2, // current page (offset-based only)
"pageSize": 20, // records per page
"total": 843, // total matching records (offset-based only)
"totalPages": 43, // Math.ceil(total / pageSize) (offset-based only)
"nextCursor": "eyJpZCI6Ijg...", // opaque token; null when no more pages
"filters": { "status": "pending" }, // echo back applied filters
"sort": { "column": "created_at", "direction": "DESC" }
}
A few principles behind this shape:
datais always an array, even if empty. Clients should never branch onif (response.data)— they should always iterate.nextCursor: nullis an unambiguous end-of-list signal. An emptydataarray can mean "this filter matches nothing" or "you're at the last page" — the cursor resolves that ambiguity.totalis expensive on cursor-based pagination (it requires a separateCOUNT(*)with the same filters, bypassing the index benefit). Omit it for cursor endpoints and document that explicitly. Clients should usenextCursor === nullto detect the end.- Echoing
filtersandsortback in the response saves clients from having to track what they asked for — especially useful in logs and debugging.
Mental model: A list endpoint is a contract, not a firehose. The contract says: "I will return at most pageSize records, I will tell you how to get more, and I will only filter and sort by columns I've deemed safe." Every other behavior is a bug.
🛠️ Your Mission
Take one of your app's existing list endpoints — orders, products, users, posts, whatever your app has most of — and upgrade it to production-grade:
- Add
pageandpageSizequery params with a hardMAX_PAGE_SIZE = 100ceiling and aDEFAULT_PAGE_SIZE = 20. Clamp and parse defensively; never trust raw query strings as numbers. - Create an
ALLOWED_FILTERSmap for at least two fields. Wire them into a parameterizedWHEREclause using the pattern above. Confirm that requesting a field not in the map is silently ignored (or returns a 400 with a helpful message listing valid fields). - Create an
ALLOWED_SORT_COLUMNSset and a direction allowlist. Default to your primary timestamp column descending. - Return the standard envelope (
data,page,pageSize,total,nextCursor,filters,sort). - Bonus: Re-implement the endpoint using cursor-based pagination and compare query plans with
EXPLAIN ANALYZEon a table with at least 10,000 rows.
✅ You're done when…
- Requesting
?pageSize=99999returns at most 100 records and does not crash the server (API Design Checklist: bounded resource responses). - Requesting
?sort=password&dir=; DROP TABLE users--is silently rejected and defaults to the safe sort — no SQL error, no 500, no data leak (Production-Readiness Checklist: input validation on all query parameters). - Requesting
?filter[arbitrary_column]=xhas no effect — the response is identical to the unfiltered request, confirming the allowlist is working (API Design Checklist: no information leakage via error messages or unexpected data). - All three concerns — pagination, filtering, sorting — are handled in a single shared helper or middleware so that adding a new list endpoint takes under 5 minutes (Production-Readiness Checklist: code reuse and consistency across endpoints).
- The response envelope shape is identical across at least two different list endpoints in your app (API Design Checklist: consistent response format).
➡️ Next: Background Jobs & Async Work. Build It Right, Or Don't Build It At All. 🏛️