SQL Guide
Applies to: PostgreSQL 15+, MySQL 8+, Database Migrations, Query Optimization
Core Principles
-
Parameterized Queries Always: Never concatenate user input into SQL strings -- use bind parameters ($1 , ? , :name ) without exception
-
Explicit Over Implicit: Name all constraints, specify column lists in INSERT, avoid SELECT * in production code
-
Migrations Are Immutable: Once applied to a shared environment, never modify a migration -- create a new one
-
Indexes Are Not Free: Every index speeds reads but slows writes; justify each index with a query plan
-
Transactions Are Boundaries: Keep transactions short, choose the correct isolation level, and always handle rollback
Guardrails
Naming Conventions
-
Tables: snake_case , plural (users , order_items , audit_logs )
-
Columns: snake_case , singular (email , created_at , is_active )
-
Primary keys: id (integer or UUID depending on project convention)
-
Foreign keys: <singular_table>_id (user_id , order_id )
-
Indexes: idx_<table>_<columns> (idx_users_email , idx_orders_user_id_created_at )
-
Unique constraints: uq_<table>_<columns> (uq_users_email )
-
Check constraints: ck_<table>_<description> (ck_orders_positive_total )
-
Boolean columns: is_ or has_ prefix (is_active , has_verified_email )
-
Timestamps: created_at , updated_at , deleted_at (always TIMESTAMPTZ )
Query Safety
-
ALWAYS use parameterized queries -- no string interpolation of user input
-
ALWAYS specify column lists in INSERT statements
-
NEVER use SELECT * in application code (acceptable in ad-hoc queries only)
-
ALWAYS add LIMIT to queries that return lists (prevent unbounded result sets)
-
NEVER use TRUNCATE or DROP in application code without explicit safeguards
-
ALWAYS use EXISTS instead of COUNT(*) > 0 for existence checks
-
ALWAYS qualify column names with table aliases in JOINs
-- GOOD: parameterized, explicit columns, bounded SELECT u.id, u.email, u.created_at FROM users u WHERE u.email = $1 AND u.is_active = true LIMIT 1;
-- BAD: string concatenation, SELECT *, no LIMIT SELECT * FROM users WHERE email = '" + userInput + "';
Indexing
-
Create indexes to support WHERE , JOIN , and ORDER BY clauses
-
Composite index column order: equality columns first, then range, then sort
-
Use UNIQUE indexes to enforce business rules at the database level
-
Prefer partial indexes when filtering on a known subset (WHERE is_active = true )
-
Use CONCURRENTLY for index creation on live tables (PostgreSQL)
-
Review index usage periodically -- drop unused indexes
-- Composite index: equality (status) first, then range (created_at) CREATE INDEX CONCURRENTLY idx_orders_status_created ON orders (status, created_at DESC);
-- Partial index: only index active users CREATE INDEX idx_users_email_active ON users (email) WHERE is_active = true;
Transactions
-
Keep transactions as short as possible (no I/O or external calls inside)
-
Use READ COMMITTED for most operations (PostgreSQL default)
-
Use REPEATABLE READ when a transaction reads the same data multiple times
-
Use SERIALIZABLE for financial or inventory operations requiring strict consistency
-
Always explicitly COMMIT or ROLLBACK -- never leave transactions hanging
-
Use SAVEPOINT for partial rollback within complex transactions
BEGIN; SAVEPOINT before_update;
UPDATE accounts SET balance = balance - 100.00 WHERE id = $1; UPDATE accounts SET balance = balance + 100.00 WHERE id = $2;
-- If second update fails, roll back to savepoint -- ROLLBACK TO SAVEPOINT before_update; COMMIT;
Migrations
-
Every migration MUST have both up and down functions
-
Migrations MUST be idempotent (IF NOT EXISTS , IF EXISTS guards)
-
Never rename or drop columns in a single step -- use a multi-step process
-
Add columns as NULL first, backfill, then add NOT NULL constraint
-
Never modify a migration that has been applied to a shared environment
-
Migration filenames: YYYYMMDDHHMMSS_description.sql or sequential numbering
-
Test both up and down in development before committing
-- UP: 20250115120000_add_users_phone.sql ALTER TABLE users ADD COLUMN IF NOT EXISTS phone VARCHAR(20); CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_phone ON users (phone) WHERE phone IS NOT NULL;
-- DOWN: 20250115120000_add_users_phone.sql DROP INDEX CONCURRENTLY IF EXISTS idx_users_phone; ALTER TABLE users DROP COLUMN IF EXISTS phone;
Schema Design
Standard Table Template
CREATE TABLE IF NOT EXISTS orders ( id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE, status VARCHAR(20) NOT NULL DEFAULT 'pending' CONSTRAINT ck_orders_valid_status CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')), total_cents INTEGER NOT NULL CONSTRAINT ck_orders_positive_total CHECK (total_cents >= 0), notes TEXT, created_at TIMESTAMPTZ NOT NULL DEFAULT now(), updated_at TIMESTAMPTZ NOT NULL DEFAULT now() );
-- Auto-update updated_at (PostgreSQL) CREATE OR REPLACE FUNCTION update_updated_at() RETURNS TRIGGER AS $$ BEGIN NEW.updated_at = now(); RETURN NEW; END; $$ LANGUAGE plpgsql;
CREATE TRIGGER trg_orders_updated_at BEFORE UPDATE ON orders FOR EACH ROW EXECUTE FUNCTION update_updated_at();
Design Rules
-
Use BIGINT GENERATED ALWAYS AS IDENTITY for auto-increment PKs (PostgreSQL 10+)
-
Use UUID PKs when IDs are exposed externally or distributed systems require it
-
Use TIMESTAMPTZ (not TIMESTAMP ) for all time columns -- always store UTC
-
Add NOT NULL constraints by default; allow NULL only when semantically meaningful
-
Use CHECK constraints to enforce domain rules at the database level
-
Use ENUM types sparingly -- prefer VARCHAR with CHECK for easier migration
-
Add ON DELETE CASCADE or ON DELETE SET NULL explicitly to every foreign key
-
Use soft deletes (deleted_at TIMESTAMPTZ ) when audit trail is required
Key Patterns
Common Table Expressions (CTEs)
Prefer CTEs over subqueries for readability and maintainability.
-- Readable: each step has a name WITH active_orders AS ( SELECT user_id, COUNT(*) AS order_count, SUM(total_cents) AS total_spent FROM orders WHERE status != 'cancelled' AND created_at >= now() - INTERVAL '30 days' GROUP BY user_id ), high_value_users AS ( SELECT user_id FROM active_orders WHERE total_spent > 50000 -- over $500 ) SELECT u.id, u.email, ao.order_count, ao.total_spent FROM users u JOIN active_orders ao ON ao.user_id = u.id WHERE u.id IN (SELECT user_id FROM high_value_users) ORDER BY ao.total_spent DESC;
Window Functions
Use window functions for ranking, running totals, and row comparisons without self-joins.
-- Rank users by spending within each region SELECT u.id, u.region, SUM(o.total_cents) AS total_spent, RANK() OVER (PARTITION BY u.region ORDER BY SUM(o.total_cents) DESC) AS rank FROM users u JOIN orders o ON o.user_id = u.id GROUP BY u.id, u.region;
-- Running total of daily revenue SELECT date_trunc('day', created_at) AS day, SUM(total_cents) AS daily_revenue, SUM(SUM(total_cents)) OVER (ORDER BY date_trunc('day', created_at)) AS running_total FROM orders WHERE status = 'delivered' GROUP BY date_trunc('day', created_at) ORDER BY day;
UPSERT (INSERT ... ON CONFLICT)
-- PostgreSQL: insert or update on conflict INSERT INTO user_preferences (user_id, theme, language) VALUES ($1, $2, $3) ON CONFLICT (user_id) DO UPDATE SET theme = EXCLUDED.theme, language = EXCLUDED.language, updated_at = now();
Batch Operations
-- Batch insert with unnest (PostgreSQL) INSERT INTO tags (name, category) SELECT unnest($1::text[]), unnest($2::text[]) ON CONFLICT (name) DO NOTHING;
-- Batch update with VALUES list UPDATE products AS p SET price_cents = v.new_price FROM (VALUES (1, 2999), (2, 4999), (3, 1499) ) AS v(id, new_price) WHERE p.id = v.id;
Parameterized Queries (Application Code)
Python (psycopg2/asyncpg) -- ALWAYS use parameterized queries
cursor.execute( "SELECT id, email FROM users WHERE email = %s AND is_active = %s", (email, True), )
Node.js (pg)
const result = await pool.query( 'SELECT id, email FROM users WHERE email = $1 AND is_active = $2', [email, true] );
Go (database/sql)
row := db.QueryRowContext(ctx, "SELECT id, email FROM users WHERE email = $1 AND is_active = $2", email, true, )
Performance
EXPLAIN ANALYZE
Always use EXPLAIN ANALYZE to validate query plans before deploying.
-- Check execution plan and actual timing EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT u.id, u.email, COUNT(o.id) AS order_count FROM users u LEFT JOIN orders o ON o.user_id = u.id WHERE u.created_at >= '2025-01-01' GROUP BY u.id, u.email ORDER BY order_count DESC LIMIT 20;
What to look for:
-
Seq Scan on large tables -- usually needs an index
-
Nested Loop with large outer sets -- consider Hash Join via index or rewrite
-
Sort operations with high cost -- add index matching ORDER BY
-
Rows Removed by Filter much larger than Actual Rows -- index the filter column
-
Buffers: shared hit vs shared read -- high reads indicate cold cache
Index Strategy
Access Pattern Index Type Example
Equality lookup B-tree (default) WHERE email = $1
Range scan B-tree WHERE created_at > $1
Full-text search GIN + tsvector WHERE search_vector @@ to_tsquery($1)
JSONB field access GIN WHERE metadata @> '{"key": "val"}'
Array containment GIN WHERE tags @> ARRAY['sql']
Geometric/spatial GiST WHERE location <-> point($1,$2) < $3
Pattern matching (LIKE 'abc%' ) B-tree with text_pattern_ops
WHERE name LIKE 'abc%'
N+1 Prevention
-- BAD: N+1 (one query per user to fetch orders) -- Application loops: for each user, SELECT orders WHERE user_id = ?
-- GOOD: single query with JOIN SELECT u.id, u.email, o.id AS order_id, o.total_cents FROM users u LEFT JOIN orders o ON o.user_id = u.id WHERE u.is_active = true ORDER BY u.id, o.created_at DESC;
-- GOOD: batch with ANY (when you have a list of IDs) SELECT id, email, created_at FROM users WHERE id = ANY($1::bigint[]);
Pagination
-- Cursor-based pagination (preferred for large datasets) SELECT id, email, created_at FROM users WHERE created_at < $1 -- cursor: last seen created_at AND is_active = true ORDER BY created_at DESC LIMIT 25;
-- Offset-based pagination (simpler but slower for deep pages) SELECT id, email, created_at FROM users WHERE is_active = true ORDER BY created_at DESC LIMIT 25 OFFSET $1;
Tooling
Essential Commands
PostgreSQL CLI
psql -h localhost -U myuser -d mydb # Connect psql -f migration.sql mydb # Run migration file pg_dump -Fc mydb > backup.dump # Backup (custom format) pg_restore -d mydb backup.dump # Restore
pgcli (enhanced CLI with autocomplete)
pgcli -h localhost -U myuser -d mydb
Schema inspection
\dt # List tables \d+ table_name # Describe table with details \di # List indexes \df # List functions
Migration Tools
golang-migrate
migrate create -ext sql -dir db/migrations -seq add_users_phone migrate -path db/migrations -database "$DATABASE_URL" up migrate -path db/migrations -database "$DATABASE_URL" down 1
Alembic (Python/SQLAlchemy)
alembic revision -m "add users phone" alembic upgrade head alembic downgrade -1
Prisma (Node.js)
npx prisma migrate dev --name add_users_phone npx prisma migrate deploy
dbmate
dbmate new add_users_phone dbmate up dbmate rollback
Useful Diagnostic Queries (PostgreSQL)
-- Find slow queries (requires pg_stat_statements extension) SELECT query, calls, mean_exec_time, total_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;
-- Find unused indexes SELECT schemaname, tablename, indexname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0 AND indexname NOT LIKE '%pkey%' ORDER BY pg_relation_size(indexrelid) DESC;
-- Table sizes SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;
-- Active connections and queries SELECT pid, state, query, query_start, now() - query_start AS duration FROM pg_stat_activity WHERE state != 'idle' ORDER BY duration DESC;
References
For detailed patterns and examples, see:
- references/patterns.md -- Window functions, CTE patterns, indexing strategies, advanced query techniques
External References
-
PostgreSQL Documentation
-
Use The Index, Luke -- SQL indexing and tuning
-
Modern SQL -- Modern SQL features across databases
-
pgMustard EXPLAIN Guide -- Reading query plans
-
SQL Style Guide (Holywell) -- Formatting conventions
-
Postgres Wiki: Don't Do This -- Common PostgreSQL anti-patterns