# Performance Profiling

## When to Use
- Establishing performance baselines before optimization
- Diagnosing slow response times, high CPU, or memory issues
- Identifying bottlenecks in application, database, or infrastructure
- Planning capacity for expected load increases
- Validating performance improvements after optimization
- Creating performance budgets for new features
## Core Methodology

### The Golden Rule: Measure First

Never optimize based on assumptions. Follow this order:
1. **Measure** - Establish baseline metrics
2. **Identify** - Find the actual bottleneck
3. **Hypothesize** - Form a theory about the cause
4. **Fix** - Implement targeted optimization
5. **Validate** - Measure again to confirm improvement
6. **Document** - Record findings and decisions
### Profiling Hierarchy

Profile at the right level to find the actual bottleneck:

```
Application Level
|-- Request/Response timing
|-- Function/Method profiling
|-- Memory allocation tracking
|
System Level
|-- CPU utilization per process
|-- Memory usage patterns
|-- I/O wait times
|-- Network latency
|
Infrastructure Level
|-- Database query performance
|-- Cache hit rates
|-- External service latency
|-- Resource saturation
```
## Profiling Patterns

### CPU Profiling

Identify what code consumes CPU time:

- **Sampling profilers** - Low overhead, statistical accuracy
- **Instrumentation profilers** - Exact counts, higher overhead
- **Flame graphs** - Visual representation of call stacks
Key metrics:

- Self time (time spent in the function itself)
- Total time (self time + time in called functions)
- Call count and frequency
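The self/total distinction can be seen with Python's built-in `cProfile` (an instrumentation profiler): in `pstats` output, `tottime` is self time and `cumtime` is total time. A minimal sketch with hypothetical functions:

```python
import cProfile
import io
import pstats

def slow_inner(n):
    # Tight loop: accumulates "self time" (tottime in pstats)
    total = 0
    for i in range(n):
        total += i * i
    return total

def outer(n):
    # Mostly delegates: its "total time" (cumtime) includes slow_inner's time
    return slow_inner(n) + slow_inner(n)

profiler = cProfile.Profile()
profiler.enable()
outer(200_000)
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
print(buf.getvalue())
```

For production workloads, a sampling profiler (e.g., py-spy for Python) keeps overhead lower at the cost of statistical rather than exact counts.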
### Memory Profiling

Track allocation patterns and detect leaks:

- **Heap snapshots** - Point-in-time memory state
- **Allocation tracking** - What allocates memory and when
- **Garbage collection analysis** - GC frequency and duration
Key metrics:

- Heap size over time
- Object retention
- Allocation rate
- GC pause times
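As a sketch, Python's built-in `tracemalloc` covers the heap-snapshot and allocation-tracking patterns above; the allocation-heavy workload here is hypothetical:

```python
import tracemalloc

tracemalloc.start()

# Hypothetical allocation-heavy code path: ~1 MB retained on purpose
retained = [bytes(1024) for _ in range(1000)]

snapshot = tracemalloc.take_snapshot()           # heap snapshot: point-in-time state
top = snapshot.statistics("lineno")              # allocation sites, largest first
current, peak = tracemalloc.get_traced_memory()  # heap size now and at its maximum

print(f"top allocator: {top[0]}")
print(f"current={current} bytes, peak={peak} bytes")
tracemalloc.stop()
```

Comparing two snapshots taken minutes apart (`snapshot.compare_to`) is the usual way to spot a leak: retained objects that only ever grow.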
### I/O Profiling

Measure disk and network operations:

- **Disk I/O** - Read/write latency, throughput, IOPS
- **Network I/O** - Latency, bandwidth, connection count
- **Database I/O** - Query time, connection pool usage
Key metrics:

- Latency percentiles (p50, p95, p99)
- Throughput (ops/sec, MB/sec)
- Queue depth and wait times
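Latency percentiles can be computed with the nearest-rank method; the samples below are hypothetical, chosen to show why percentiles tell a different story than the mean:

```python
import math

def percentile(samples, p):
    # Nearest-rank method: smallest value with at least p% of samples at or below it
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical latency samples in milliseconds, with two slow outliers
latencies_ms = [12, 15, 11, 13, 250, 14, 12, 16, 13, 900]

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean={mean:.1f}ms p50={percentile(latencies_ms, 50)}ms "
      f"p95={percentile(latencies_ms, 95)}ms")
```

Here the mean (125.6 ms) suggests everything is slow, while p50 (13 ms) and p95 (900 ms) reveal a fast typical path with heavy tail latency.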
## Bottleneck Identification

### The USE Method

For each resource, check:

- **Utilization** - Percentage of time the resource is busy
- **Saturation** - Degree of queued work
- **Errors** - Error count for the resource
### The RED Method

For services, measure:

- **Rate** - Requests per second
- **Errors** - Failed requests per second
- **Duration** - Distribution of request latencies
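A sketch of computing the three RED metrics from one observation window of request records; the `Request` type and the sample data are hypothetical:

```python
import math
from dataclasses import dataclass

@dataclass
class Request:
    ok: bool
    duration_ms: float

# Hypothetical one-second observation window: 98 successes, 2 slow failures
window = [Request(True, 20.0)] * 98 + [Request(False, 500.0)] * 2
window_seconds = 1.0

rate = len(window) / window_seconds                            # Rate: requests/sec
errors = sum(1 for r in window if not r.ok) / window_seconds   # Errors: failures/sec
durations = sorted(r.duration_ms for r in window)              # Duration: distribution
p99 = durations[math.ceil(0.99 * len(durations)) - 1]          # nearest-rank p99

print(f"rate={rate:.0f}/s errors={errors:.0f}/s p99={p99}ms")
```

Note that Duration is kept as a distribution, not a single number: the 2% of failed requests dominate the p99 even though the average duration looks modest.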
### Common Bottleneck Patterns

| Pattern | Symptoms | Typical Causes |
| --- | --- | --- |
| CPU-bound | High CPU, low I/O wait | Inefficient algorithms, tight loops |
| Memory-bound | High memory, GC pressure | Memory leaks, large allocations |
| I/O-bound | Low CPU, high I/O wait | Slow queries, network latency |
| Lock contention | Low CPU, high wait time | Synchronization, connection pools |
| N+1 queries | Many small DB queries | Missing joins, lazy loading |
### Amdahl's Law

Optimization impact is limited by the fraction of time affected. If 90% of time is spent in function A and 10% in function B:

- Optimizing A by 50% = 45% total improvement
- Optimizing B by 50% = 5% total improvement

Focus on the biggest contributors first.
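The arithmetic above follows directly from Amdahl's Law; a small sketch:

```python
def overall_speedup(fraction, local_speedup):
    # Amdahl's Law: total speedup when `fraction` of the time gets `local_speedup`
    return 1 / ((1 - fraction) + fraction / local_speedup)

# Optimizing A (90% of time) by 50%: total time 0.10 + 0.45 = 0.55 (45% better)
print(overall_speedup(0.90, 2.0))
# Optimizing B (10% of time) by 50%: total time 0.90 + 0.05 = 0.95 (5% better)
print(overall_speedup(0.10, 2.0))
```

Note the asymptote: even an infinite speedup of B caps the overall improvement at 10%, which is why profiling to find the dominant fraction comes first.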
## Capacity Planning

### Baseline Establishment

Measure current capacity under production load:

- **Peak load metrics** - Maximum concurrent users, requests/sec
- **Resource headroom** - How close to limits at peak
- **Scaling patterns** - Linear, sub-linear, or super-linear
Load Testing Approach
-
Establish baseline - Current performance at normal load
-
Ramp testing - Gradually increase load to find limits
-
Stress testing - Push beyond limits to understand failure modes
-
Soak testing - Sustained load to find memory leaks, degradation
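A toy model of ramp testing (the capacity-limited server below is entirely hypothetical) shows the pattern the ramp is looking for: throughput flattens and errors appear once the system saturates:

```python
def simulated_server(offered_load, capacity=1000):
    # Toy model: throughput is capped at capacity; excess load turns into errors
    throughput = min(offered_load, capacity)
    errors = max(0, offered_load - capacity)
    return throughput, errors

# Ramp test: step load upward and watch for the knee where throughput flattens
for load in range(200, 1601, 200):
    throughput, errors = simulated_server(load)
    print(f"offered={load:4d}/s throughput={throughput:4d}/s errors={errors:3d}/s")
```

Real systems degrade less cleanly (latency climbs before errors do), which is why latency at each step matters as much as throughput.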
### Capacity Metrics

| Metric | What It Tells You |
| --- | --- |
| Throughput at saturation | Maximum system capacity |
| Latency at 80% load | Performance before degradation |
| Error rate under stress | Failure patterns |
| Recovery time | How quickly the system returns to normal |
### Growth Planning

Required Capacity = (Current Load x Growth Factor) x (1 + Safety Margin)

Example:

- Current: 1000 req/sec
- Expected growth: 50% per year
- Safety margin: 30%

Year 1 need = (1000 x 1.5) x 1.3 = 1950 req/sec
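The same calculation as a small helper (the function name and parameters are illustrative):

```python
def required_capacity(current_load, growth_factor, safety_margin):
    # (Current Load x Growth Factor), then a fractional safety margin on top
    return current_load * growth_factor * (1 + safety_margin)

# 1000 req/sec, 50% yearly growth, 30% safety margin
print(f"{required_capacity(1000, 1.5, 0.30):.0f} req/sec")
```

For multi-year planning, growth compounds: year N applies `growth_factor ** N` before the safety margin.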
## Optimization Patterns

### Quick Wins

- **Enable caching** - Application, CDN, database query cache
- **Add indexes** - For slow queries identified in profiling
- **Compression** - Gzip/Brotli for responses
- **Connection pooling** - Reduce connection overhead
- **Batch operations** - Reduce round-trips
### Algorithmic Improvements

- **Reduce complexity** - O(n^2) to O(n log n)
- **Lazy evaluation** - Defer work until needed
- **Memoization** - Cache computed results
- **Pagination** - Limit data processed at once
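In Python, memoization is often a one-line change via the standard library's `functools.lru_cache`; the classic sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once; repeats are served from the cache,
    # turning the naive O(2^n) recursion into O(n)
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(80))  # returns instantly; the uncached version would not finish
```

As with any cache, memoization trades memory for time, so a bounded `maxsize` is safer when the argument space is large.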
### Architectural Changes

- **Horizontal scaling** - Add more instances
- **Async processing** - Queue background work
- **Read replicas** - Distribute read load
- **Caching layers** - Redis, Memcached
- **CDN** - Edge caching for static content
## Best Practices

- Profile in production-like environments; development environments can have very different performance characteristics
- Use percentiles (p95, p99), not averages, for latency
- Monitor continuously, not just during incidents
- Set performance budgets and enforce them in CI
- Document baseline metrics before making changes
- Keep profiling overhead low in production
- Correlate metrics across layers (application, database, infrastructure)
- Understand the difference between latency and throughput
## Anti-Patterns

- Optimizing without measurement
- Using averages for latency metrics
- Profiling only in development
- Ignoring tail latencies (p99, p999)
- Premature optimization of non-bottleneck code
- Over-engineering for hypothetical scale
- Caching without an invalidation strategy
## References

- Profiling Tools Reference - Tools by language and platform