SQL Guide

Applies to: PostgreSQL 15+, MySQL 8+, Database Migrations, Query Optimization

Core Principles

Parameterized Queries Always: Never concatenate user input into SQL strings -- use bind parameters ($1 , ? , :name ) without exception
Explicit Over Implicit: Name all constraints, specify column lists in INSERT, avoid SELECT * in production code
Migrations Are Immutable: Once applied to a shared environment, never modify a migration -- create a new one
Indexes Are Not Free: Every index speeds reads but slows writes; justify each index with a query plan
Transactions Are Boundaries: Keep transactions short, choose the correct isolation level, and always handle rollback

Guardrails

Naming Conventions

Tables: snake_case , plural (users , order_items , audit_logs )
Columns: snake_case , singular (email , created_at , is_active )
Primary keys: id (integer or UUID depending on project convention)
Foreign keys: <singular_table>_id (user_id , order_id )
Indexes: idx_<table>_<columns> (idx_users_email , idx_orders_user_id_created_at )
Unique constraints: uq_<table>_<columns> (uq_users_email )
Check constraints: ck_<table>_<description> (ck_orders_positive_total )
Boolean columns: is_ or has_ prefix (is_active , has_verified_email )
Timestamps: created_at , updated_at , deleted_at (always TIMESTAMPTZ )

Query Safety

ALWAYS use parameterized queries -- no string interpolation of user input
ALWAYS specify column lists in INSERT statements
NEVER use SELECT * in application code (acceptable in ad-hoc queries only)
ALWAYS add LIMIT to queries that return lists (prevent unbounded result sets)
NEVER use TRUNCATE or DROP in application code without explicit safeguards
ALWAYS use EXISTS instead of COUNT(*) > 0 for existence checks
ALWAYS qualify column names with table aliases in JOINs

-- GOOD: parameterized, explicit columns, bounded SELECT u.id, u.email, u.created_at FROM users u WHERE u.email = $1 AND u.is_active = true LIMIT 1;

-- BAD: string concatenation, SELECT *, no LIMIT SELECT * FROM users WHERE email = '" + userInput + "';

Indexing

Create indexes to support WHERE , JOIN , and ORDER BY clauses
Composite index column order: equality columns first, then range, then sort
Use UNIQUE indexes to enforce business rules at the database level
Prefer partial indexes when filtering on a known subset (WHERE is_active = true )
Use CONCURRENTLY for index creation on live tables (PostgreSQL)
Review index usage periodically -- drop unused indexes

-- Composite index: equality (status) first, then range (created_at) CREATE INDEX CONCURRENTLY idx_orders_status_created ON orders (status, created_at DESC);

-- Partial index: only index active users CREATE INDEX idx_users_email_active ON users (email) WHERE is_active = true;

Transactions

Keep transactions as short as possible (no I/O or external calls inside)
Use READ COMMITTED for most operations (PostgreSQL default)
Use REPEATABLE READ when a transaction reads the same data multiple times
Use SERIALIZABLE for financial or inventory operations requiring strict consistency
Always explicitly COMMIT or ROLLBACK -- never leave transactions hanging
Use SAVEPOINT for partial rollback within complex transactions

BEGIN; SAVEPOINT before_update;

UPDATE accounts SET balance = balance - 100.00 WHERE id = $1; UPDATE accounts SET balance = balance + 100.00 WHERE id = $2;

-- If second update fails, roll back to savepoint -- ROLLBACK TO SAVEPOINT before_update; COMMIT;

Migrations

Every migration MUST have both up and down functions
Migrations MUST be idempotent (IF NOT EXISTS , IF EXISTS guards)
Never rename or drop columns in a single step -- use a multi-step process
Add columns as NULL first, backfill, then add NOT NULL constraint
Never modify a migration that has been applied to a shared environment
Migration filenames: YYYYMMDDHHMMSS_description.sql or sequential numbering
Test both up and down in development before committing

-- UP: 20250115120000_add_users_phone.sql ALTER TABLE users ADD COLUMN IF NOT EXISTS phone VARCHAR(20); CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_phone ON users (phone) WHERE phone IS NOT NULL;

-- DOWN: 20250115120000_add_users_phone.sql DROP INDEX CONCURRENTLY IF EXISTS idx_users_phone; ALTER TABLE users DROP COLUMN IF EXISTS phone;

Schema Design

Standard Table Template

CREATE TABLE IF NOT EXISTS orders ( id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE, status VARCHAR(20) NOT NULL DEFAULT 'pending' CONSTRAINT ck_orders_valid_status CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')), total_cents INTEGER NOT NULL CONSTRAINT ck_orders_positive_total CHECK (total_cents >= 0), notes TEXT, created_at TIMESTAMPTZ NOT NULL DEFAULT now(), updated_at TIMESTAMPTZ NOT NULL DEFAULT now() );

-- Auto-update updated_at (PostgreSQL) CREATE OR REPLACE FUNCTION update_updated_at() RETURNS TRIGGER AS $$ BEGIN NEW.updated_at = now(); RETURN NEW; END; $$ LANGUAGE plpgsql;

CREATE TRIGGER trg_orders_updated_at BEFORE UPDATE ON orders FOR EACH ROW EXECUTE FUNCTION update_updated_at();

Design Rules

Use BIGINT GENERATED ALWAYS AS IDENTITY for auto-increment PKs (PostgreSQL 10+)
Use UUID PKs when IDs are exposed externally or distributed systems require it
Use TIMESTAMPTZ (not TIMESTAMP ) for all time columns -- always store UTC
Add NOT NULL constraints by default; allow NULL only when semantically meaningful
Use CHECK constraints to enforce domain rules at the database level
Use ENUM types sparingly -- prefer VARCHAR with CHECK for easier migration
Add ON DELETE CASCADE or ON DELETE SET NULL explicitly to every foreign key
Use soft deletes (deleted_at TIMESTAMPTZ ) when audit trail is required

Key Patterns

Common Table Expressions (CTEs)

Prefer CTEs over subqueries for readability and maintainability.

-- Readable: each step has a name WITH active_orders AS ( SELECT user_id, COUNT(*) AS order_count, SUM(total_cents) AS total_spent FROM orders WHERE status != 'cancelled' AND created_at >= now() - INTERVAL '30 days' GROUP BY user_id ), high_value_users AS ( SELECT user_id FROM active_orders WHERE total_spent > 50000 -- over $500 ) SELECT u.id, u.email, ao.order_count, ao.total_spent FROM users u JOIN active_orders ao ON ao.user_id = u.id WHERE u.id IN (SELECT user_id FROM high_value_users) ORDER BY ao.total_spent DESC;

Window Functions

Use window functions for ranking, running totals, and row comparisons without self-joins.

-- Rank users by spending within each region SELECT u.id, u.region, SUM(o.total_cents) AS total_spent, RANK() OVER (PARTITION BY u.region ORDER BY SUM(o.total_cents) DESC) AS rank FROM users u JOIN orders o ON o.user_id = u.id GROUP BY u.id, u.region;

-- Running total of daily revenue SELECT date_trunc('day', created_at) AS day, SUM(total_cents) AS daily_revenue, SUM(SUM(total_cents)) OVER (ORDER BY date_trunc('day', created_at)) AS running_total FROM orders WHERE status = 'delivered' GROUP BY date_trunc('day', created_at) ORDER BY day;

UPSERT (INSERT ... ON CONFLICT)

-- PostgreSQL: insert or update on conflict INSERT INTO user_preferences (user_id, theme, language) VALUES ($1, $2, $3) ON CONFLICT (user_id) DO UPDATE SET theme = EXCLUDED.theme, language = EXCLUDED.language, updated_at = now();

Batch Operations

-- Batch insert with unnest (PostgreSQL) INSERT INTO tags (name, category) SELECT unnest($1::text[]), unnest($2::text[]) ON CONFLICT (name) DO NOTHING;

-- Batch update with VALUES list UPDATE products AS p SET price_cents = v.new_price FROM (VALUES (1, 2999), (2, 4999), (3, 1499) ) AS v(id, new_price) WHERE p.id = v.id;

Parameterized Queries (Application Code)

Python (psycopg2/asyncpg) -- ALWAYS use parameterized queries

cursor.execute( "SELECT id, email FROM users WHERE email = %s AND is_active = %s", (email, True), )

Node.js (pg)

const result = await pool.query( 'SELECT id, email FROM users WHERE email = $1 AND is_active = $2', [email, true] );

Go (database/sql)

row := db.QueryRowContext(ctx, "SELECT id, email FROM users WHERE email = $1 AND is_active = $2", email, true, )

Performance

EXPLAIN ANALYZE

Always use EXPLAIN ANALYZE to validate query plans before deploying.

-- Check execution plan and actual timing EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT u.id, u.email, COUNT(o.id) AS order_count FROM users u LEFT JOIN orders o ON o.user_id = u.id WHERE u.created_at >= '2025-01-01' GROUP BY u.id, u.email ORDER BY order_count DESC LIMIT 20;

What to look for:

Seq Scan on large tables -- usually needs an index
Nested Loop with large outer sets -- consider Hash Join via index or rewrite
Sort operations with high cost -- add index matching ORDER BY
Rows Removed by Filter much larger than Actual Rows -- index the filter column
Buffers: shared hit vs shared read -- high reads indicate cold cache

Index Strategy

Access Pattern Index Type Example

Equality lookup B-tree (default) WHERE email = $1

Range scan B-tree WHERE created_at > $1

Full-text search GIN + tsvector WHERE search_vector @@ to_tsquery($1)

JSONB field access GIN WHERE metadata @> '{"key": "val"}'

Array containment GIN WHERE tags @> ARRAY['sql']

Geometric/spatial GiST WHERE location <-> point($1,$2) < $3

Pattern matching (LIKE 'abc%' ) B-tree with text_pattern_ops

WHERE name LIKE 'abc%'

N+1 Prevention

-- BAD: N+1 (one query per user to fetch orders) -- Application loops: for each user, SELECT orders WHERE user_id = ?

-- GOOD: single query with JOIN SELECT u.id, u.email, o.id AS order_id, o.total_cents FROM users u LEFT JOIN orders o ON o.user_id = u.id WHERE u.is_active = true ORDER BY u.id, o.created_at DESC;

-- GOOD: batch with ANY (when you have a list of IDs) SELECT id, email, created_at FROM users WHERE id = ANY($1::bigint[]);

Pagination

-- Cursor-based pagination (preferred for large datasets) SELECT id, email, created_at FROM users WHERE created_at < $1 -- cursor: last seen created_at AND is_active = true ORDER BY created_at DESC LIMIT 25;

-- Offset-based pagination (simpler but slower for deep pages) SELECT id, email, created_at FROM users WHERE is_active = true ORDER BY created_at DESC LIMIT 25 OFFSET $1;

Tooling

Essential Commands

PostgreSQL CLI

psql -h localhost -U myuser -d mydb # Connect psql -f migration.sql mydb # Run migration file pg_dump -Fc mydb > backup.dump # Backup (custom format) pg_restore -d mydb backup.dump # Restore

pgcli (enhanced CLI with autocomplete)

pgcli -h localhost -U myuser -d mydb

Schema inspection

\dt # List tables \d+ table_name # Describe table with details \di # List indexes \df # List functions

Migration Tools

golang-migrate

migrate create -ext sql -dir db/migrations -seq add_users_phone migrate -path db/migrations -database "$DATABASE_URL" up migrate -path db/migrations -database "$DATABASE_URL" down 1

Alembic (Python/SQLAlchemy)

alembic revision -m "add users phone" alembic upgrade head alembic downgrade -1

Prisma (Node.js)

npx prisma migrate dev --name add_users_phone npx prisma migrate deploy

dbmate

dbmate new add_users_phone dbmate up dbmate rollback

Useful Diagnostic Queries (PostgreSQL)

-- Find slow queries (requires pg_stat_statements extension) SELECT query, calls, mean_exec_time, total_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;

-- Find unused indexes SELECT schemaname, tablename, indexname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0 AND indexname NOT LIKE '%pkey%' ORDER BY pg_relation_size(indexrelid) DESC;

-- Table sizes SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;

-- Active connections and queries SELECT pid, state, query, query_start, now() - query_start AS duration FROM pg_stat_activity WHERE state != 'idle' ORDER BY duration DESC;

References

For detailed patterns and examples, see:

references/patterns.md -- Window functions, CTE patterns, indexing strategies, advanced query techniques

External References

PostgreSQL Documentation
Use The Index, Luke -- SQL indexing and tuning
Modern SQL -- Modern SQL features across databases
pgMustard EXPLAIN Guide -- Reading query plans
SQL Style Guide (Holywell) -- Formatting conventions
Postgres Wiki: Don't Do This -- Common PostgreSQL anti-patterns

sql-guide

Safety Notice

Copy this and send it to your AI assistant to learn

Python (psycopg2/asyncpg) -- ALWAYS use parameterized queries

Node.js (pg)

Go (database/sql)

PostgreSQL CLI

pgcli (enhanced CLI with autocomplete)

Schema inspection

golang-migrate

Alembic (Python/SQLAlchemy)

Prisma (Node.js)

dbmate

Source Transparency

Related Skills

actix-web

frontend-design

blazor