# Performance Profiling

## When to Use
- Establishing performance baselines before optimization
- Diagnosing slow response times, high CPU, or memory issues
- Identifying bottlenecks in application, database, or infrastructure
- Planning capacity for expected load increases
- Validating performance improvements after optimization
- Creating performance budgets for new features
## Core Methodology

### The Golden Rule: Measure First

Never optimize based on assumptions. Follow this order:
1. **Measure** - Establish baseline metrics
2. **Identify** - Find the actual bottleneck
3. **Hypothesize** - Form a theory about the cause
4. **Fix** - Implement targeted optimization
5. **Validate** - Measure again to confirm improvement
6. **Document** - Record findings and decisions
### Profiling Hierarchy

Profile at the right level to find the actual bottleneck:

```
Application Level
|-- Request/Response timing
|-- Function/Method profiling
|-- Memory allocation tracking
|
System Level
|-- CPU utilization per process
|-- Memory usage patterns
|-- I/O wait times
|-- Network latency
|
Infrastructure Level
|-- Database query performance
|-- Cache hit rates
|-- External service latency
|-- Resource saturation
```
## Profiling Patterns

### CPU Profiling

Identify what code consumes CPU time:

- **Sampling profilers** - Low overhead, statistical accuracy
- **Instrumentation profilers** - Exact counts, higher overhead
- **Flame graphs** - Visual representation of call stacks
Key metrics:

- Self time (time spent in the function itself)
- Total time (self time + time in called functions)
- Call count and frequency
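The self/total distinction can be seen with Python's built-in `cProfile` (an instrumentation profiler): in `pstats` output, `tottime` is self time and `cumtime` is total time. A minimal sketch with hypothetical functions:

```python
import cProfile
import io
import pstats

def slow_inner(n):
    # Tight loop: accumulates "self time" (tottime in pstats)
    total = 0
    for i in range(n):
        total += i * i
    return total

def outer(n):
    # Mostly delegates: its "total time" (cumtime) includes slow_inner's time
    return slow_inner(n) + slow_inner(n)

profiler = cProfile.Profile()
profiler.enable()
outer(200_000)
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
print(buf.getvalue())
```

For production workloads, a sampling profiler (e.g., py-spy for Python) keeps overhead lower at the cost of statistical rather than exact counts.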
### Memory Profiling

Track allocation patterns and detect leaks:

- **Heap snapshots** - Point-in-time memory state
- **Allocation tracking** - What allocates memory and when
- **Garbage collection analysis** - GC frequency and duration
Key metrics:

- Heap size over time
- Object retention
- Allocation rate
- GC pause times
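As a sketch, Python's built-in `tracemalloc` covers the heap-snapshot and allocation-tracking patterns above; the allocation-heavy workload here is hypothetical:

```python
import tracemalloc

tracemalloc.start()

# Hypothetical allocation-heavy code path: ~1 MB retained on purpose
retained = [bytes(1024) for _ in range(1000)]

snapshot = tracemalloc.take_snapshot()           # heap snapshot: point-in-time state
top = snapshot.statistics("lineno")              # allocation sites, largest first
current, peak = tracemalloc.get_traced_memory()  # heap size now and at its maximum

print(f"top allocator: {top[0]}")
print(f"current={current} bytes, peak={peak} bytes")
tracemalloc.stop()
```

Comparing two snapshots taken minutes apart (`snapshot.compare_to`) is the usual way to spot a leak: retained objects that only ever grow.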
### I/O Profiling

Measure disk and network operations:

- **Disk I/O** - Read/write latency, throughput, IOPS
- **Network I/O** - Latency, bandwidth, connection count
- **Database I/O** - Query time, connection pool usage
Key metrics:

- Latency percentiles (p50, p95, p99)
- Throughput (ops/sec, MB/sec)
- Queue depth and wait times
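Latency percentiles can be computed with the nearest-rank method; the samples below are hypothetical, chosen to show why percentiles tell a different story than the mean:

```python
import math

def percentile(samples, p):
    # Nearest-rank method: smallest value with at least p% of samples at or below it
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical latency samples in milliseconds, with two slow outliers
latencies_ms = [12, 15, 11, 13, 250, 14, 12, 16, 13, 900]

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean={mean:.1f}ms p50={percentile(latencies_ms, 50)}ms "
      f"p95={percentile(latencies_ms, 95)}ms")
```

Here the mean (125.6 ms) suggests everything is slow, while p50 (13 ms) and p95 (900 ms) reveal a fast typical path with heavy tail latency.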
## Bottleneck Identification

### The USE Method

For each resource, check:

- **Utilization** - Percentage of time the resource is busy
- **Saturation** - Degree of queued work
- **Errors** - Error count for the resource
### The RED Method

For services, measure:

- **Rate** - Requests per second
- **Errors** - Failed requests per second
- **Duration** - Distribution of request latencies
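A sketch of computing the three RED metrics from one observation window of request records; the `Request` type and the sample data are hypothetical:

```python
import math
from dataclasses import dataclass

@dataclass
class Request:
    ok: bool
    duration_ms: float

# Hypothetical one-second observation window: 98 successes, 2 slow failures
window = [Request(True, 20.0)] * 98 + [Request(False, 500.0)] * 2
window_seconds = 1.0

rate = len(window) / window_seconds                            # Rate: requests/sec
errors = sum(1 for r in window if not r.ok) / window_seconds   # Errors: failures/sec
durations = sorted(r.duration_ms for r in window)              # Duration: distribution
p99 = durations[math.ceil(0.99 * len(durations)) - 1]          # nearest-rank p99

print(f"rate={rate:.0f}/s errors={errors:.0f}/s p99={p99}ms")
```

Note that Duration is kept as a distribution, not a single number: the 2% of failed requests dominate the p99 even though the average duration looks modest.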
### Common Bottleneck Patterns

| Pattern | Symptoms | Typical Causes |
| --- | --- | --- |
| CPU-bound | High CPU, low I/O wait | Inefficient algorithms, tight loops |
| Memory-bound | High memory, GC pressure | Memory leaks, large allocations |
| I/O-bound | Low CPU, high I/O wait | Slow queries, network latency |
| Lock contention | Low CPU, high wait time | Synchronization, connection pools |
| N+1 queries | Many small DB queries | Missing joins, lazy loading |
### Amdahl's Law

Optimization impact is limited by the fraction of time affected. If 90% of time is spent in function A and 10% in function B:

- Optimizing A by 50% = 45% total improvement
- Optimizing B by 50% = 5% total improvement

Focus on the biggest contributors first.
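The arithmetic above follows directly from Amdahl's Law; a small sketch:

```python
def overall_speedup(fraction, local_speedup):
    # Amdahl's Law: total speedup when `fraction` of the time gets `local_speedup`
    return 1 / ((1 - fraction) + fraction / local_speedup)

# Optimizing A (90% of time) by 50%: total time 0.10 + 0.45 = 0.55 (45% better)
print(overall_speedup(0.90, 2.0))
# Optimizing B (10% of time) by 50%: total time 0.90 + 0.05 = 0.95 (5% better)
print(overall_speedup(0.10, 2.0))
```

Note the asymptote: even an infinite speedup of B caps the overall improvement at 10%, which is why profiling to find the dominant fraction comes first.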
## Capacity Planning

### Baseline Establishment

Measure current capacity under production load:

- **Peak load metrics** - Maximum concurrent users, requests/sec
- **Resource headroom** - How close to limits at peak
- **Scaling patterns** - Linear, sub-linear, or super-linear
Load Testing Approach
-
Establish baseline - Current performance at normal load
-
Ramp testing - Gradually increase load to find limits
-
Stress testing - Push beyond limits to understand failure modes
-
Soak testing - Sustained load to find memory leaks, degradation
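A toy model of ramp testing (the capacity-limited server below is entirely hypothetical) shows the pattern the ramp is looking for: throughput flattens and errors appear once the system saturates:

```python
def simulated_server(offered_load, capacity=1000):
    # Toy model: throughput is capped at capacity; excess load turns into errors
    throughput = min(offered_load, capacity)
    errors = max(0, offered_load - capacity)
    return throughput, errors

# Ramp test: step load upward and watch for the knee where throughput flattens
for load in range(200, 1601, 200):
    throughput, errors = simulated_server(load)
    print(f"offered={load:4d}/s throughput={throughput:4d}/s errors={errors:3d}/s")
```

Real systems degrade less cleanly (latency climbs before errors do), which is why latency at each step matters as much as throughput.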
### Capacity Metrics

| Metric | What It Tells You |
| --- | --- |
| Throughput at saturation | Maximum system capacity |
| Latency at 80% load | Performance before degradation |
| Error rate under stress | Failure patterns |
| Recovery time | How quickly the system returns to normal |
### Growth Planning

Required Capacity = (Current Load x Growth Factor) x (1 + Safety Margin)

Example:

- Current: 1000 req/sec
- Expected growth: 50% per year
- Safety margin: 30%

Year 1 need = (1000 x 1.5) x 1.3 = 1950 req/sec
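The same calculation as a small helper (the function name and parameters are illustrative):

```python
def required_capacity(current_load, growth_factor, safety_margin):
    # (Current Load x Growth Factor), then a fractional safety margin on top
    return current_load * growth_factor * (1 + safety_margin)

# 1000 req/sec, 50% yearly growth, 30% safety margin
print(f"{required_capacity(1000, 1.5, 0.30):.0f} req/sec")
```

For multi-year planning, growth compounds: year N applies `growth_factor ** N` before the safety margin.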
## Optimization Patterns

### Quick Wins

- **Enable caching** - Application, CDN, database query cache
- **Add indexes** - For slow queries identified in profiling
- **Compression** - Gzip/Brotli for responses
- **Connection pooling** - Reduce connection overhead
- **Batch operations** - Reduce round-trips
### Algorithmic Improvements

- **Reduce complexity** - O(n^2) to O(n log n)
- **Lazy evaluation** - Defer work until needed
- **Memoization** - Cache computed results
- **Pagination** - Limit data processed at once
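In Python, memoization is often a one-line change via the standard library's `functools.lru_cache`; the classic sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once; repeats are served from the cache,
    # turning the naive O(2^n) recursion into O(n)
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(80))  # returns instantly; the uncached version would not finish
```

As with any cache, memoization trades memory for time, so a bounded `maxsize` is safer when the argument space is large.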
### Architectural Changes

- **Horizontal scaling** - Add more instances
- **Async processing** - Queue background work
- **Read replicas** - Distribute read load
- **Caching layers** - Redis, Memcached
- **CDN** - Edge caching for static content
## Best Practices

- Profile in production-like environments; development environments can have very different performance characteristics
- Use percentiles (p95, p99), not averages, for latency
- Monitor continuously, not just during incidents
- Set performance budgets and enforce them in CI
- Document baseline metrics before making changes
- Keep profiling overhead low in production
- Correlate metrics across layers (application, database, infrastructure)
- Understand the difference between latency and throughput
## Anti-Patterns

- Optimizing without measurement
- Using averages for latency metrics
- Profiling only in development
- Ignoring tail latencies (p99, p999)
- Premature optimization of non-bottleneck code
- Over-engineering for hypothetical scale
- Caching without an invalidation strategy
## References

- Profiling Tools Reference - Tools by language and platform