postgresql-optimization

PostgreSQL Optimization

Overview

Expert in PostgreSQL performance tuning, query optimization, and database administration. Specializes in EXPLAIN analysis, indexing strategies, connection pooling, partitioning, and production-grade PostgreSQL operations.

When to Use

Diagnosing slow queries with EXPLAIN ANALYZE
Creating optimal indexes for query patterns
Designing database schemas for performance
Configuring PostgreSQL for production workloads
Implementing connection pooling (PgBouncer, Supavisor)
Setting up partitioning for large tables
Analyzing and reducing lock contention
Migrating or upgrading PostgreSQL versions

Capabilities

Query Optimization

EXPLAIN / EXPLAIN ANALYZE interpretation
Query plan analysis and optimization
Identifying sequential scans vs index scans
Join optimization and query rewriting
CTE vs subquery performance trade-offs
Window function optimization

Indexing Strategies

B-tree, GIN, GiST, BRIN index selection
Partial indexes for filtered queries
Expression indexes for computed values
Covering indexes (INCLUDE clause)
Index-only scans optimization
Concurrent index creation

Schema Design

Normalization vs denormalization trade-offs
JSONB column design and indexing
Array columns and operations
Enum types vs lookup tables
Foreign key cascade strategies
Table inheritance and partitioning

Configuration Tuning

Memory settings (shared_buffers, work_mem, effective_cache_size)
Connection limits and pooling
WAL and checkpoint tuning
Autovacuum configuration
Statistics collection settings

Advanced Features

Partitioning (range, list, hash)
Materialized views with refresh strategies
Full-text search with tsvector/tsquery
PostGIS geospatial queries
Logical replication setup
pg_stat_statements analysis

Dependencies

Works well with:

database-modeler
Schema design and ERD creation
data-pipeline-engineer
ETL and data processing
site-reliability-engineer
Database monitoring and alerting
nextjs-app-router-expert
Full-stack data fetching

Examples

Reading EXPLAIN ANALYZE Output

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT u.*, COUNT(o.id) as order_count FROM users u LEFT JOIN orders o ON o.user_id = u.id WHERE u.created_at > '2024-01-01' GROUP BY u.id;

-- Key metrics to look for: -- - "Seq Scan" on large tables → needs index -- - "Rows Removed by Filter" high → filter before join -- - "Sort Method: external merge" → increase work_mem -- - "Buffers: shared hit" vs "shared read" → cache efficiency

Creating Effective Indexes

-- Basic B-tree for equality and range queries CREATE INDEX CONCURRENTLY idx_orders_user_created ON orders (user_id, created_at DESC);

-- Partial index for common filter CREATE INDEX CONCURRENTLY idx_orders_pending ON orders (created_at) WHERE status = 'pending';

-- GIN index for JSONB containment queries CREATE INDEX CONCURRENTLY idx_products_metadata ON products USING GIN (metadata jsonb_path_ops);

-- Covering index to enable index-only scans CREATE INDEX CONCURRENTLY idx_users_email_covering ON users (email) INCLUDE (name, created_at);

-- Expression index for case-insensitive search CREATE INDEX CONCURRENTLY idx_users_email_lower ON users (LOWER(email));

Optimizing N+1 Queries

-- BAD: N+1 pattern (1 + N queries) SELECT * FROM posts WHERE user_id = $1; -- Then for each post: SELECT * FROM comments WHERE post_id = $1;

-- GOOD: Single query with lateral join SELECT p., c.comments FROM posts p LEFT JOIN LATERAL ( SELECT json_agg(c.) as comments FROM comments c WHERE c.post_id = p.id ) c ON true WHERE p.user_id = $1;

-- GOOD: Window function for aggregates SELECT p., COUNT() OVER (PARTITION BY p.user_id) as user_post_count FROM posts p WHERE p.user_id = $1;

Table Partitioning

-- Create partitioned table by date range CREATE TABLE events ( id BIGSERIAL, event_type TEXT NOT NULL, payload JSONB, created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), PRIMARY KEY (id, created_at) ) PARTITION BY RANGE (created_at);

-- Create monthly partitions CREATE TABLE events_2024_01 PARTITION OF events FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE events_2024_02 PARTITION OF events FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Automate partition creation with pg_partman CREATE EXTENSION pg_partman; SELECT partman.create_parent('public.events', 'created_at', 'native', 'monthly');

Connection Pooling Config (PgBouncer)

; pgbouncer.ini

[databases] myapp = host=localhost dbname=myapp

[pgbouncer] listen_addr = 0.0.0.0 listen_port = 6432 auth_type = md5 auth_file = /etc/pgbouncer/userlist.txt

; Pool settings pool_mode = transaction ; Recommended for most apps max_client_conn = 1000 default_pool_size = 20 reserve_pool_size = 5

; Timeouts server_idle_timeout = 600 client_idle_timeout = 0

Performance Configuration

-- Check current settings SHOW shared_buffers; -- ~25% of RAM SHOW effective_cache_size; -- ~75% of RAM SHOW work_mem; -- Per-operation, start small (64MB) SHOW maintenance_work_mem; -- For VACUUM, CREATE INDEX (512MB-1GB)

-- Recommended production settings (for 32GB RAM server) ALTER SYSTEM SET shared_buffers = '8GB'; ALTER SYSTEM SET effective_cache_size = '24GB'; ALTER SYSTEM SET work_mem = '64MB'; ALTER SYSTEM SET maintenance_work_mem = '1GB'; ALTER SYSTEM SET random_page_cost = 1.1; -- For SSD storage ALTER SYSTEM SET effective_io_concurrency = 200; -- For SSD

-- Reload configuration SELECT pg_reload_conf();

Finding Slow Queries

-- Enable pg_stat_statements CREATE EXTENSION pg_stat_statements;

-- Top 10 slowest queries by total time SELECT round(total_exec_time::numeric, 2) as total_ms, calls, round(mean_exec_time::numeric, 2) as avg_ms, round((100 * total_exec_time / sum(total_exec_time) OVER())::numeric, 2) as pct, query FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;

-- Queries with most I/O SELECT round(shared_blks_read::numeric, 2) as disk_reads, round(shared_blks_hit::numeric, 2) as cache_hits, round(100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0), 2) as cache_hit_ratio, query FROM pg_stat_statements ORDER BY shared_blks_read DESC LIMIT 10;

Analyzing Table Bloat

-- Check table bloat SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as total_size, pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) as table_size, n_dead_tup, n_live_tup, round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) as dead_pct FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;

-- Manual VACUUM for critical tables VACUUM (VERBOSE, ANALYZE) orders;

-- Reclaim space (requires exclusive lock) VACUUM FULL orders; -- Use during maintenance window

Best Practices

Always use EXPLAIN ANALYZE - Don't guess, measure actual query performance
Create indexes CONCURRENTLY - Avoid blocking writes during index creation
Partial indexes for hot paths - Index only the rows you query frequently
Use connection pooling - PgBouncer or Supavisor for production
Monitor pg_stat_statements - Track query performance over time
Regular ANALYZE - Keep statistics current for query planner
**Avoid SELECT *** - Only fetch columns you need
Batch large updates - Process in chunks to avoid lock contention
Use prepared statements - Reduce parsing overhead for repeated queries

Common Pitfalls

Missing indexes - Check for sequential scans on large tables
Over-indexing - Too many indexes slow down writes
work_mem too low - Causes disk-based sorts and hash joins
Connection exhaustion - Not using connection pooling
Stale statistics - Autovacuum not running frequently enough
Bloated tables - Not vacuuming after large deletes/updates
N+1 queries - Fetching related data in loops instead of joins
SELECT * everywhere - Fetching unnecessary columns

postgresql-optimization

Safety Notice

Copy this and send it to your AI assistant to learn

Source Transparency

Related Skills

video-processing-editing

cv-creator

mobile-ux-optimizer