Load Balancing Patterns
Distribute traffic across infrastructure using the appropriate load balancing approach, from simple round-robin to global multi-region failover.
When to Use This Skill
Use load-balancing-patterns when:
- Distributing traffic across multiple application servers
- Implementing high availability and failover
- Routing traffic based on URLs, headers, or geographic location
- Managing session persistence across stateless backends
- Deploying applications to Kubernetes clusters
- Configuring global traffic management across regions
- Implementing zero-downtime deployments (blue-green, canary)
- Selecting between cloud-managed and self-managed load balancers
Core Load Balancing Concepts
Layer 4 vs Layer 7
Layer 4 (L4) - Transport Layer:
- Routes based on IP address and port (TCP/UDP packets)
- No application data inspection, lower latency, higher throughput
- Protocol agnostic, preserves client IP addresses
- Use for: Database connections, video streaming, gaming, financial transactions, non-HTTP protocols
Layer 7 (L7) - Application Layer:
- Routes based on HTTP URLs, headers, cookies, request body
- Full application data visibility, SSL/TLS termination, caching, WAF integration
- Content-based routing capabilities
- Use for: Web applications, REST APIs, microservices, GraphQL endpoints, complex routing logic
For detailed comparison including performance benchmarks and hybrid approaches, see references/l4-vs-l7-comparison.md.
Load Balancing Algorithms
| Algorithm | Distribution Method | Use Case |
|---|---|---|
| Round Robin | Sequential | Stateless, similar servers |
| Weighted Round Robin | Capacity-based | Different server specs |
| Least Connections | Fewest active connections | Long-lived connections |
| Least Response Time | Fastest server | Performance-sensitive |
| IP Hash | Client IP-based | Session persistence |
| Resource-Based | CPU/memory metrics | Varying workloads |
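Three of the algorithms in the table can be sketched in a few lines of Python. This is a minimal illustration with hypothetical server names, not a production implementation:

```python
import itertools

# Hypothetical backend pool.
servers = ["app1", "app2", "app3"]

# Round Robin: cycle through servers sequentially.
rr = itertools.cycle(servers)

# Weighted Round Robin: repeat each server in proportion to its weight.
weights = {"app1": 3, "app2": 1}
wrr = itertools.cycle([s for s, w in weights.items() for _ in range(w)])

# Least Connections: pick the server with the fewest active connections.
def least_connections(active_conns: dict) -> str:
    return min(active_conns, key=active_conns.get)
```

With `active_conns = {"app1": 12, "app2": 4, "app3": 9}`, `least_connections` picks `app2`; real load balancers track these counts per backend as connections open and close.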
Health Check Types
Shallow (Liveness): Is the process alive?
- Endpoint: /health/live or /live
- Returns: 200 if the process is running
- Use for: Process monitoring, container health
Deep (Readiness): Can the service handle requests?
- Endpoint: /health/ready or /ready
- Validates: database, cache, and external API connectivity
- Use for: Load balancer routing decisions
Health Check Hysteresis: Different thresholds for marking up vs down to prevent flapping
- Example: 3 failures to mark down, 2 successes to mark up
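The example above (3 failures to mark down, 2 successes to mark up) can be sketched as a small state machine. This is an illustrative Python sketch, not taken from any particular load balancer:

```python
class HysteresisHealthCheck:
    """Mark a backend down after `fall` consecutive failures and
    up again after `rise` consecutive successes, preventing flapping."""

    def __init__(self, fall: int = 3, rise: int = 2):
        self.fall, self.rise = fall, rise
        self.healthy = True
        self._streak = 0  # consecutive results contradicting the current state

    def record(self, success: bool) -> bool:
        if success == self.healthy:
            self._streak = 0  # result agrees with current state: reset
        else:
            self._streak += 1
            threshold = self.fall if self.healthy else self.rise
            if self._streak >= threshold:
                self.healthy = success  # flip state only after the full streak
                self._streak = 0
        return self.healthy
```

A single failed probe never takes a backend out of rotation, and a single lucky success never puts a broken one back in.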
For complete health check implementation patterns, see references/health-check-strategies.md.
Cloud Load Balancers
AWS Load Balancing
Application Load Balancer (ALB) - Layer 7:
- Use for: HTTP/HTTPS applications, microservices, WebSocket
- Features: Path/host/header routing, AWS WAF integration, Lambda targets
- Choose when: Content-based routing needed
Network Load Balancer (NLB) - Layer 4:
- Use for: Ultra-low latency (<1ms), TCP/UDP traffic, static IPs, millions of requests per second
- Features: Preserves source IP, TLS termination
- Choose when: Non-HTTP protocols, performance critical
Global Accelerator - Layer 4 Global:
- Use for: Multi-region applications, global users, DDoS protection
- Features: Anycast IPs, automatic regional failover
GCP Load Balancing
Application LB (L7): Global HTTPS LB, Cloud CDN integration, Cloud Armor (WAF/DDoS)
Network LB (L4): Regional TCP/UDP, pass-through balancing, session affinity
Cloud Load Balancing: Single anycast IP, global distribution, backend buckets
Azure Load Balancing
Application Gateway (L7): WAF integration, URL-based routing, SSL termination, autoscaling
Load Balancer (L4): Basic and Standard SKUs, health probes, HA ports
Traffic Manager (Global): DNS-based routing (priority, weighted, performance, geographic)
For complete cloud provider configurations and Terraform examples, see references/cloud-load-balancers.md.
Self-Managed Load Balancers
NGINX
Best for: General-purpose HTTP/HTTPS load balancing, web application stacks
Capabilities:
- HTTP reverse proxy with multiple algorithms
- TCP/UDP stream load balancing
- SSL/TLS termination
- Passive health checks (open source), active health checks (NGINX Plus)
- Cookie-based sticky sessions (NGINX Plus)
Basic configuration:
```nginx
upstream backend {
    least_conn;
    server backend1.example.com:8080 weight=3;
    server backend2.example.com:8080 weight=2;
    keepalive 32;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        # Required for upstream keepalive to take effect:
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```
For complete NGINX patterns and advanced configurations, see references/nginx-patterns.md.
HAProxy
Best for: Maximum performance, database load balancing, resource efficiency
Capabilities:
- Highest raw throughput, lowest memory footprint
- 10+ load balancing algorithms
- Sophisticated health checks (HTTP, TCP, Redis, MySQL, etc.)
- Cookie or IP-based persistence
Basic configuration:
```haproxy
frontend http_front
    bind *:80
    default_backend web_servers

backend web_servers
    balance roundrobin
    option httpchk GET /health
    server web1 192.168.1.101:8080 check
    server web2 192.168.1.102:8080 check
```
For complete HAProxy patterns, see references/haproxy-patterns.md.
Envoy
Best for: Microservices, Kubernetes, service mesh integration
Capabilities:
- Cloud-native design with dynamic configuration (xDS APIs)
- Circuit breakers, retries, timeouts
- Advanced health checks (TCP, HTTP, gRPC)
- Excellent observability
For complete Envoy patterns, see references/envoy-patterns.md.
Traefik
Best for: Docker/Kubernetes environments, dynamic configuration, ease of use
Capabilities:
- Automatic service discovery
- Native Kubernetes integration
- Built-in Let's Encrypt support
- Middleware system (auth, rate limiting)
For complete Traefik patterns, see references/traefik-patterns.md.
Kubernetes Ingress Controllers
Selection Guide
| Controller | Best For | Strengths |
|---|---|---|
| NGINX Ingress (F5) | General purpose | Stability, wide adoption, mature features |
| Traefik | Dynamic environments | Easy configuration, service discovery |
| HAProxy Ingress | High performance | Advanced L7 routing, reliability |
| Envoy (Contour/Gateway) | Service mesh | Rich L7 features, extensibility |
| Kong | API-heavy apps | JWT auth, rate limiting, plugins |
| Cloud Provider | Single-cloud | Native cloud integration |
Basic Ingress Example
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/affinity: "cookie"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - app.example.com
      secretName: app-tls
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-service
                port:
                  number: 80
```
For complete Kubernetes ingress examples and Gateway API patterns, see references/kubernetes-ingress.md.
Session Persistence
Sticky Sessions (Use Sparingly)
Cookie-Based: Load balancer sets cookie to track server affinity
- Accurate routing, works with NAT/proxies
- HTTP only, adds cookie overhead
IP Hash: Hash client IP to select backend server
- No cookie required, works for non-HTTP
- Poor distribution with NAT/proxies
Drawbacks: Uneven load distribution, session lost on server failure, complicates scaling
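The IP hash approach can be sketched in a few lines. Backend addresses here are hypothetical; real load balancers often use consistent hashing so that adding or removing a backend remaps fewer clients:

```python
import hashlib

backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def pick_backend(client_ip: str) -> str:
    # Stable hash of the client IP maps each client to one backend,
    # giving persistence without cookies.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return backends[int(digest, 16) % len(backends)]
```

The same IP always lands on the same backend, which is the point, but it also illustrates the NAT drawback: thousands of clients behind one corporate NAT share a single source IP and therefore a single backend.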
Shared Session Store (Recommended)
Architecture: Stateless application servers + centralized session storage (Redis, Memcached)
Benefits:
- No sticky sessions needed
- True load balancing
- Server failures don't lose sessions
- Horizontal scaling trivial
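The architecture above can be sketched as follows; a plain dict stands in for Redis or Memcached, and the class name is illustrative. In production, every app server would connect to the same external store, so any backend can serve any request:

```python
import uuid

class SessionStore:
    def __init__(self):
        self._store = {}  # stand-in for a shared Redis/Memcached instance

    def create(self, data: dict) -> str:
        sid = uuid.uuid4().hex      # session ID handed back to the client
        self._store[sid] = data
        return sid

    def get(self, sid: str):
        # Any backend server can look the session up; a failed server
        # loses nothing, because session state lives outside the server.
        return self._store.get(sid)
```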
Client-Side Tokens (Best for APIs)
JWT (JSON Web Tokens): Server generates signed token, client stores and sends with requests
Benefits:
- Fully stateless servers
- Perfect load balancing
- No session storage needed
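A minimal JWT-like sketch using only the standard library (HMAC-SHA256 signing; not a full JWT implementation — use a proper library such as PyJWT in practice, and note the signing key here is a hypothetical placeholder):

```python
import base64
import hashlib
import hmac
import json

SECRET = b"shared-signing-key"  # placeholder; all servers share this key

def issue_token(claims: dict) -> str:
    # Server signs the claims; the client stores and returns the token.
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig

def verify_token(token: str):
    # Any server holding the key can verify: no session storage needed.
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or forged token
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because verification needs only the shared key, every backend is fully stateless and the load balancer can route each request anywhere.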
For complete session management patterns and code examples, see references/session-persistence.md.
Global Load Balancing
GeoDNS Routing
Route users to nearest server based on geographic location:
- DNS returns different IPs based on client location
- Reduces latency, supports compliance and regional content
- Implementation: AWS Route 53, GCP Cloud DNS, Azure Traffic Manager
Multi-Region Failover
Primary/secondary region configuration:
- Health checks determine primary region health
- Automatic DNS failover to secondary
- Transparent to clients
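The primary/secondary decision reduces to returning the highest-priority healthy record, which is how DNS failover policies such as Route 53's behave at a high level. A sketch with hypothetical region data:

```python
regions = [
    {"name": "us-east-1", "ip": "192.0.2.10", "healthy": True},  # primary
    {"name": "eu-west-1", "ip": "192.0.2.20", "healthy": True},  # secondary
]

def resolve(records: list) -> str:
    # Records are listed in priority order; return the first healthy IP.
    for r in records:
        if r["healthy"]:
            return r["ip"]
    return records[-1]["ip"]  # all unhealthy: fail open to the last record
```

When health checks mark the primary down, the same query transparently resolves to the secondary region's IP.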
CDN Integration
Combine load balancing with CDN:
- GeoDNS routes to closest CDN PoP
- CDN caches content globally
- Origin load balancing for cache misses
For complete global load balancing examples with Terraform, see references/global-load-balancing.md.
Decision Frameworks
L4 vs L7 Selection
Choose L4 when:
- Protocol is TCP/UDP (not HTTP)
- Ultra-low latency critical (<1ms)
- High throughput required (millions of requests per second)
- Client source IP preservation needed
Choose L7 when:
- Protocol is HTTP/HTTPS
- Content-based routing needed (URL, headers)
- SSL termination required
- WAF integration needed
- Microservices architecture
Cloud vs Self-Managed
Choose Cloud-Managed when:
- Single cloud deployment
- Auto-scaling required
- Team lacks load balancer expertise
- Managed service preferred
Choose Self-Managed when:
- Multi-cloud or hybrid deployment
- Advanced routing requirements
- Cost optimization important
- Full control needed
- Vendor lock-in avoidance
Self-Managed Selection
- NGINX: General-purpose, web stacks, HTTP/3 support
- HAProxy: Maximum performance, database LB, lowest resource usage
- Envoy: Microservices, service mesh, dynamic configuration
- Traefik: Docker/Kubernetes, automatic discovery, easy configuration
Configuration Examples
Complete working examples available in examples/ directory:
Cloud Providers:
- examples/aws/alb-terraform.tf - AWS ALB with path-based routing
- examples/aws/nlb-terraform.tf - AWS NLB for TCP load balancing
Self-Managed:
- examples/nginx/http-load-balancing.conf - NGINX HTTP reverse proxy
- examples/haproxy/http-lb.cfg - HAProxy configuration
- examples/envoy/basic-lb.yaml - Envoy cluster configuration
- examples/traefik/kubernetes-ingress.yaml - Traefik IngressRoute
Kubernetes:
- examples/kubernetes/nginx-ingress.yaml - NGINX Ingress with TLS
- examples/kubernetes/traefik-ingress.yaml - Traefik IngressRoute
- examples/kubernetes/gateway-api.yaml - Gateway API configuration
Monitoring and Observability
Key Metrics
- Throughput: Requests per second, bytes transferred, connection rate
- Latency: Request duration (p50, p95, p99), backend response time, SSL handshake time
- Errors: HTTP error rates (4xx, 5xx), backend connection failures, health check failures
- Resource Utilization: CPU, memory, active connections, connection queue depth
- Health: Healthy/unhealthy backend count, health check success rate
Load Balancer Logs
Enable access logs for request/response details, client IPs, response times, error tracking
- AWS ALB: Store in S3, analyze with Athena
- NGINX: Custom log format, ship to centralized logging
- HAProxy: Syslog integration, structured logging
Troubleshooting
Uneven Load Distribution
Symptoms: One server receives disproportionate traffic
Causes: Sticky sessions with few clients, IP hash with NAT concentration, long-lived connections
Solutions: Switch to least connections, disable sticky sessions, implement connection draining
Health Check Flapping
Symptoms: Servers rapidly transition between healthy and unhealthy
Causes: Health check timeout too short, threshold too low, network instability
Solutions: Increase interval and timeout, implement hysteresis, use deep health checks
Session Loss After Failover
Symptoms: Users are logged out when a server fails
Causes: Sticky sessions without replication, in-memory sessions
Solutions: Implement a shared session store (Redis), use client-side tokens (JWT)
Integration Points
Related Skills:
- infrastructure-as-code - Deploy load balancers via Terraform/Pulumi
- kubernetes-operations - Ingress controllers for K8s traffic management
- network-architecture - Network design and topology for load balancing
- deploying-applications - Blue-green and canary deployments via load balancers
- observability - Load balancer metrics, access logs, distributed tracing
- security-hardening - WAF integration, rate limiting, DDoS protection
- service-mesh - Envoy as both ingress and service mesh proxy
- implementing-tls - TLS termination and certificate management
Quick Reference
Selection Matrix
| Use Case | Recommended Solution |
|---|---|
| HTTP web app (AWS) | ALB |
| Non-HTTP protocol (AWS) | NLB |
| Kubernetes HTTP ingress | NGINX Ingress or Traefik |
| Maximum performance | HAProxy |
| Service mesh | Envoy |
| Docker Swarm | Traefik |
| Multi-cloud portable | NGINX or HAProxy |
| Global distribution | CloudFlare, AWS Global Accelerator |
Algorithm Selection
| Traffic Pattern | Algorithm |
|---|---|
| Stateless, similar servers | Round Robin |
| Stateless, different capacity | Weighted Round Robin |
| Long-lived connections | Least Connections |
| Performance-sensitive | Least Response Time |
| Session persistence needed | IP Hash or Cookie |
| Varying server load | Resource-Based |
Health Check Configuration
| Service Type | Check Type | Interval | Timeout |
|---|---|---|---|
| Web app | HTTP /health | 10s | 3s |
| API | HTTP /health/ready | 10s | 5s |
| Database | TCP connect | 5s | 2s |
| Critical service | HTTP deep check | 5s | 3s |
| Background worker | HTTP /live | 30s | 5s |
Summary
Load balancing is essential for distributing traffic, ensuring high availability, and enabling horizontal scaling. Choose L4 for raw performance and non-HTTP protocols, L7 for intelligent content-based routing. Prefer cloud-managed load balancers for simplicity and auto-scaling, self-managed for multi-cloud portability and advanced features. Implement proper health checks with hysteresis, avoid sticky sessions when possible, and monitor key metrics continuously.
For deployment patterns, see examples in examples/aws/, examples/nginx/, examples/kubernetes/, and other provider directories.