HomeBlogDatabase
Database

Database Indexing Strategies for 10M+ Records

Practical indexing patterns for PostgreSQL and MongoDB at scale — composite indexes, partial indexes, covering indexes, and when NOT to index anything at all.

Jan 12, 2025 10 min read 22.1k views
PostgreSQL MongoDB Performance SQL

At small scale, indexing mistakes are invisible. At 10 million records, a missing index turns a 2ms query into a 4-second table scan. This guide covers the indexing patterns I've applied across multiple high-traffic production systems.

The Cost of Over-Indexing

Every index speeds up reads but slows down writes. On a write-heavy table, maintaining five unnecessary indexes can cut your INSERT throughput by 40%. Always profile write workloads alongside read workloads — don't index by instinct.

⚠️ Warning

Never add an index to fix a slow query in production without first testing the write impact in staging. I've seen 'performance fixes' that halved overall throughput.

Composite Indexes: Column Order Matters

In a composite index (a, b, c), PostgreSQL can use the index for queries filtering on a, on a+b, or on a+b+c — but not on b alone. Put the highest-cardinality column first, and the column most commonly used in equality checks before range conditions.

sql
-- This index supports:
-- WHERE user_id = $1
-- WHERE user_id = $1 AND status = $2
-- WHERE user_id = $1 AND status = $2 AND created_at > $3
CREATE INDEX idx_orders_user_status_date
  ON orders(user_id, status, created_at DESC);

Partial Indexes: Index Only What You Query

If 95% of your queries filter on status = 'active' and only 3% of rows are active, a partial index on active rows is dramatically smaller and faster than a full index. It fits in memory and keeps your buffer pool clean.

sql
-- Only indexes the 3% of rows you actually query
CREATE INDEX idx_active_users
  ON users(email)
  WHERE status = 'active';

Covering Indexes to Eliminate Table Fetches

A covering index includes all the columns a query needs, so PostgreSQL can answer the query entirely from the index without touching the main table (heap). For high-frequency read queries, this can cut I/O by 60–80%.

sql
-- Query needs: user_id, email, created_at
-- This index covers all three — no heap fetch needed
CREATE INDEX idx_covering_users
  ON users(user_id) INCLUDE (email, created_at);

Found this useful?

Share it with your network