PostgreSQL

Indexes I Forget to Create

Partial index WHERE active = true—80% smaller when most rows inactive; suggest for status columns
Expression index ON lower(email)—must match query exactly; without it, WHERE lower(email) scans
Covering index INCLUDE (name, email)—enables index-only scan; check EXPLAIN for "Heap Fetches"
Foreign key columns—not auto-indexed in PG; JOINs and ON DELETE CASCADE need them
Composite index order matters—(a, b) helps WHERE a = ? but not WHERE b = ?

Unused indexes hurt every INSERT/UPDATE—query pg_stat_user_indexes for idx_scan = 0, drop them
Too many indexes on write-heavy tables—balance carefully
Index on low-cardinality column (boolean, status) often useless—PG prefers seq scan
LIKE '%suffix' can't use B-tree—need pg_trgm GIN index or reverse() expression index

SELECT FOR UPDATE SKIP LOCKED—job queue without external tools; skip rows being processed
pg_advisory_lock(key)—application-level mutex without table; unlock explicitly or on disconnect
IS NOT DISTINCT FROM—NULL-safe equality; cleaner than (a = b OR (a IS NULL AND b IS NULL))
DISTINCT ON (x) ORDER BY x, y—first row per group without subquery; PG-specific but powerful

PgBouncer essential with >50 connections—each PG connection uses ~10MB; pool at transaction level
statement_timeout = '30s' per role—prevents runaway queries from killing database
idle_in_transaction_session_timeout = '5min'—kills abandoned transactions holding locks
Default 100 max_connections too low for production, too high wastes memory—tune based on RAM

SERIAL deprecated—use GENERATED ALWAYS AS IDENTITY
TIMESTAMP without timezone—almost always wrong; use TIMESTAMPTZ, PG stores as UTC
Float for money—use NUMERIC(12,2) or integer cents; float math breaks: 0.1 + 0.2 ≠ 0.3
VARCHAR(n) vs TEXT—no performance difference in PG; use TEXT unless constraint needed

High-UPDATE tables bloat—dead tuples accumulate; pg_repack reclaims without locks
VACUUM ANALYZE after bulk insert—updates statistics; query planner needs current data
Autovacuum lag on big tables—tune autovacuum_vacuum_cost_delay or manual vacuum
Transaction wraparound: if xid exhausted, DB stops—autovacuum prevents but monitor

Always EXPLAIN (ANALYZE, BUFFERS)—actual times + I/O; estimate-only misleads
"Heap Fetches: 1000" with index—missing columns, add INCLUDE to index
Seq scan not always bad—faster than index for >10-20% of table; check row estimates
"Rows" estimate way off—run ANALYZE or check if stats target too low

Creating tsvector on the fly—precompute as stored generated column with GIN index
plainto_tsquery for user input—handles spaces without syntax errors; not to_tsquery
Missing language parameter—'english' stems words; 'simple' exact match
FTS is word-based—LIKE '%exact phrase%' still needed for substring match

Default READ COMMITTED—phantom reads in reports; use REPEATABLE READ for consistency
SERIALIZABLE catches conflicts—but must handle 40001 error with retry loop
Long transactions block vacuum and hold locks—keep under seconds, not minutes