PostgreSQL
Data Type Defaults
| Need | Use | Avoid |
|---|---|---|
| Primary key | BIGINT GENERATED ALWAYS AS IDENTITY | SERIAL, BIGSERIAL |
| Timestamps | TIMESTAMPTZ | TIMESTAMP (loses timezone) |
| Text | TEXT | VARCHAR(n) unless constraint needed |
| Money | NUMERIC(precision, scale) | MONEY, FLOAT |
| Boolean | BOOLEAN with NOT NULL DEFAULT | nullable booleans |
| JSON | JSONB | JSON (no indexing), text JSON |
| UUID | gen_random_uuid() (PG13+) | uuid-ossp extension |
| IP addresses | INET / CIDR | text |
| Ranges | TSTZRANGE, INT4RANGE, etc. | pair of columns |
Schema Rules
- Every FK column gets an index (PG does NOT auto-create these)
NOT NULLon every column unless NULL has business meaningCHECKconstraints for domain rules at DB levelEXCLUDEconstraints for range overlaps:EXCLUDE USING gist (room WITH =, during WITH &&)- Default
created_at TIMESTAMPTZ NOT NULL DEFAULT now() - Separate
updated_atwith trigger, never trust app layer alone - Use
BIGINTPKs -- cheaper JOINs than UUID, better index locality - Safe migrations:
CREATE INDEX CONCURRENTLY, add columns withDEFAULT(instant PG11+). NeverALTER TYPEon large tables in-place. NULLS NOT DISTINCTon unique indexes (PG15+) — treats NULLs as equal for uniqueness- Revoke default public schema access:
REVOKE ALL ON SCHEMA public FROM public
Index Strategy
| Type | Use When |
|---|---|
| B-tree (default) | Equality, range, sorting, LIKE 'prefix%' |
| GIN | JSONB (@>, ?, ?&), arrays, full-text (tsvector) |
| GiST | Geometry, ranges, full-text (smaller but slower than GIN) |
| BRIN | Large tables with natural ordering (timestamps, serial IDs) |
Index rules:
- Composite: most selective column first, max 3-4 columns
- Partial:
WHERE status = 'active'-- smaller, faster - Covering:
INCLUDE (col)-- avoids heap lookup - Expression:
ON (lower(email))-- for function-based WHERE fillfactor = 70-90on write-heavy tables — reserves space for HOT updates, reducing index bloat- Drop unused indexes:
SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0
Detect unindexed foreign keys:
SELECT conrelid::regclass, a.attname
FROM pg_constraint c
JOIN pg_attribute a ON a.attrelid = c.conrelid AND a.attnum = ANY(c.conkey)
WHERE c.contype = 'f'
AND NOT EXISTS (
SELECT 1 FROM pg_index i
WHERE i.indrelid = c.conrelid AND a.attnum = ANY(i.indkey)
);
JSONB Patterns
-- GIN index for containment queries
CREATE INDEX ON items USING gin (metadata);
SELECT * FROM items WHERE metadata @> '{"status": "active"}';
-- Expression index for specific key access
CREATE INDEX ON items ((metadata->>'category'));
SELECT * FROM items WHERE metadata->>'category' = 'electronics';
Prefer typed columns over JSONB for frequently queried, well-structured data. Use JSONB for truly dynamic/variable attributes.
Use jsonb_path_ops operator class for containment-only (@>) queries — 2-3x smaller index. Use default jsonb_ops when key-existence (?, ?|) is needed.
Row-Level Security (RLS)
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
ALTER TABLE orders FORCE ROW LEVEL SECURITY; -- applies to table owner too
-- Set session context (generic, no extensions needed)
SET app.current_user_id = '123';
CREATE POLICY orders_user_policy ON orders
FOR ALL
USING (user_id = current_setting('app.current_user_id')::bigint);
Performance: Policy expressions evaluate per row. Wrap function calls in a scalar subquery so PG evaluates once and caches:
-- BAD: called per row
USING (get_current_user() = user_id)
-- GOOD: evaluated once, cached
USING ((SELECT get_current_user()) = user_id)
Always index columns referenced in RLS policies. For complex multi-table checks, use SECURITY DEFINER helper functions.
Query Optimization
- Always
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)before optimizing - Sequential scan on large table -> add index or check
WHEREfor function wrapping - High
rows removed by filter-> index doesn't match predicate CTEis an optimization fence before PG12; useMATERIALIZED/NOT MATERIALIZEDhints (PG12+)- Prefer
EXISTSoverINfor correlated subqueries - Use
LATERAL JOINwhen subquery needs outer row reference - Cursor pagination (
WHERE id > $last ORDER BY id LIMIT $n) overOFFSET - Approximate row counts:
SELECT reltuples FROM pg_class WHERE relname = 'table'— avoids fullcount(*)on large tables - Materialized views for expensive aggregations:
REFRESH MATERIALIZED VIEW CONCURRENTLY(needs unique index). Schedule refresh, not per-query.
Concurrency Patterns
See concurrency-patterns.md for UPSERT, deadlock prevention, N+1 elimination, batch inserts, and queue processing with SKIP LOCKED.
Partitioning
Use when table exceeds ~100M rows or needs TTL purge:
RANGE-- time-series (by month/year), most commonLIST-- categorical (by region, tenant)HASH-- even distribution when no natural key
Partition key must be in every unique/PK constraint. Create indexes on partitions, not parent.
Transactions & Locking
- Keep transactions short -- long txns block vacuum and bloat tables
- Advisory locks for application-level mutual exclusion:
pg_advisory_xact_lock(key) - Non-blocking alternative:
pg_try_advisory_lock(key)— returns false instead of waiting - Check blocked queries:
SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock' - Monitor deadlocks:
SELECT deadlocks FROM pg_stat_database WHERE datname = current_database()
Full-Text Search
See full-text-search.md for weighted tsvector setup, query syntax, highlighting, and when to use PG full-text vs external search.
Connection Pooling
Always pool in production. Direct connections cost ~10MB each.
- PgBouncer in
transactionmode for most workloads statementmode if no session-level features (prepared statements, temp tables, advisory locks)
Prepared statement caveat: Named prepared statements are bound to a specific connection. In transaction-mode pooling, the next request may hit a different connection. Use unnamed/extended-query-protocol statements (most ORMs default to this), or deallocate immediately after use.
Operations
See operations.md for performance tuning, maintenance/monitoring, WAL, replication, and backup/recovery.
Vector Search (pgvector)
CREATE EXTENSION vector;
ALTER TABLE items ADD COLUMN embedding vector(1536); -- match your model's output dimensions
-- HNSW: better recall, higher memory. Default choice.
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);
-- IVFFlat: lower memory for large datasets. Set lists = sqrt(row_count).
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 1000);
Always filter BEFORE vector search (use partial indexes or CTEs with pre-filtered rows). Distance operators: <=> cosine, <-> L2, <#> inner product.
Anti-Patterns
| Anti-Pattern | Fix |
|---|---|
SERIAL / BIGSERIAL for PKs | BIGINT GENERATED ALWAYS AS IDENTITY |
| No FK indexes | Add index on every FK column |
OFFSET pagination | Cursor-based: WHERE id > $last |
SELECT * | List needed columns |
TIMESTAMP without timezone | TIMESTAMPTZ |
Functions in WHERE (lower(col)) | Expression index or citext extension |
| Storing structured data as text | JSONB with GIN index |
| Long-running transactions | Keep txns short, use idle_in_transaction_session_timeout |
| N+1 query loops | Batch with = ANY($1::bigint[]) or JOIN |
| SELECT-then-INSERT for upsert | ON CONFLICT DO UPDATE |
| Multi-tenant without RLS | Enable RLS with per-tenant policies |
Verify
Run EXPLAIN (ANALYZE, BUFFERS) on changed queries. Confirm no sequential scans on large tables and no unindexed FK columns before declaring done.