System Design
Overview
Principles for designing systems that handle scale, remain available, and perform well under load.
Scalability Fundamentals
Vertical vs Horizontal Scaling
Vertical Scaling (Scale Up): ┌─────────────────────┐ │ Bigger Server │ │ - More CPU │ │ - More RAM │ │ - Faster disk │ └─────────────────────┘ Limit: Hardware ceiling
Horizontal Scaling (Scale Out): ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │Server│ │Server│ │Server│ │Server│ └──────┘ └──────┘ └──────┘ └──────┘ ↑ Load Balancer Limit: Coordination complexity
Stateless Services
// ❌ Stateful - stores session in memory class BadService { private sessions = new Map();
login(userId: string) { this.sessions.set(userId, { loggedIn: true }); } }
// ✅ Stateless - external session store class GoodService { constructor(private sessionStore: Redis) {}
async login(userId: string) {
await this.sessionStore.set(session:${userId}, { loggedIn: true });
}
}
Load Balancing
Strategies
Strategy Description Use Case
Round Robin Cycle through servers Equal capacity servers
Weighted RR Based on server capacity Mixed capacity
Least Connections Route to least busy Long-lived connections
IP Hash Same IP → same server Session stickiness
URL Hash Same URL → same server Cache optimization
Health Checks
Kubernetes-style health checks
livenessProbe: httpGet: path: /health/live port: 8080 initialDelaySeconds: 3 periodSeconds: 10
readinessProbe: httpGet: path: /health/ready port: 8080 initialDelaySeconds: 5 periodSeconds: 5
// Health check endpoints app.get('/health/live', (req, res) => { // Am I running? res.status(200).json({ status: 'alive' }); });
app.get('/health/ready', async (req, res) => { // Can I serve traffic? const dbOk = await checkDatabase(); const cacheOk = await checkCache();
if (dbOk && cacheOk) { res.status(200).json({ status: 'ready' }); } else { res.status(503).json({ status: 'not ready', db: dbOk, cache: cacheOk }); } });
Caching Strategies
Cache Patterns
Cache-Aside (Lazy Loading): ┌────────┐ miss ┌────────┐ │ App │ ──────────→ │ Cache │ │ │ ←────────── │ │ └────────┘ null └────────┘ │ │ read ↓ ┌────────┐ │ DB │ ──── write ──→ Cache └────────┘
Write-Through: App → Cache → DB (synchronous)
Write-Behind (Write-Back): App → Cache → (async) → DB
// Cache-aside implementation class CachedUserService { constructor( private cache: Redis, private db: Database ) {}
async getUser(id: string): Promise<User> {
// Try cache first
const cached = await this.cache.get(user:${id});
if (cached) return JSON.parse(cached);
// Cache miss - read from DB
const user = await this.db.users.findById(id);
if (user) {
// Store in cache with TTL
await this.cache.set(`user:${id}`, JSON.stringify(user), 'EX', 3600);
}
return user;
}
async updateUser(id: string, data: Partial<User>): Promise<User> {
const user = await this.db.users.update(id, data);
// Invalidate cache
await this.cache.del(user:${id});
return user;
}
}
Cache Invalidation
Strategy Description Complexity
TTL Expire after time Simple
Event-based Invalidate on write Medium
Version-based Key includes version Medium
Tag-based Group related keys Complex
Database Scaling
Read Replicas
┌─────────────────┐
Writes ──────→ │ Primary DB │
└────────┬────────┘
│ replication
┌─────────────────┼─────────────────┐
↓ ↓ ↓
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Replica 1│ │ Replica 2│ │ Replica 3│
└──────────┘ └──────────┘ └──────────┘
↑ ↑ ↑
└─────── Reads ───┴────────────────┘
Sharding (Partitioning)
// Hash-based sharding function getShard(userId: string, numShards: number): number { const hash = crypto.createHash('md5').update(userId).digest('hex'); return parseInt(hash.slice(0, 8), 16) % numShards; }
// Range-based sharding
function getShardByDate(date: Date): string {
const year = date.getFullYear();
const month = date.getMonth() + 1;
return orders_${year}_${month.toString().padStart(2, '0')};
}
// Consistent hashing for dynamic shards class ConsistentHash { private ring: Map<number, string> = new Map();
addNode(node: string) {
for (let i = 0; i < 150; i++) { // Virtual nodes
const hash = this.hash(${node}:${i});
this.ring.set(hash, node);
}
}
getNode(key: string): string { const hash = this.hash(key); // Find next node on ring for (const [nodeHash, node] of [...this.ring.entries()].sort()) { if (nodeHash >= hash) return node; } return this.ring.values().next().value; } }
Message Queues
Patterns
Point-to-Point: Producer → Queue → Consumer
Pub/Sub: ┌─→ Subscriber 1 Publisher → Topic ─→ Subscriber 2 └─→ Subscriber 3
Work Queue: ┌─→ Worker 1 Producer → Queue ─→ Worker 2 (competing consumers) └─→ Worker 3
Delivery Guarantees
Guarantee Description Implementation
At-most-once May lose messages Fire and forget
At-least-once May duplicate Ack after process
Exactly-once No loss, no dupe Idempotency + dedup
// Idempotent processing
async function processOrder(event: OrderEvent) {
// Check if already processed
const processed = await redis.get(processed:${event.id});
if (processed) {
console.log(Already processed ${event.id});
return;
}
// Process the order await db.orders.create(event.order);
// Mark as processed (with TTL for cleanup)
await redis.set(processed:${event.id}, '1', 'EX', 86400 * 7);
}
High Availability
Redundancy Patterns
Active-Active: ┌────────┐ ┌────────┐ │Server A│ ←─→ │Server B│ Both handle traffic └────────┘ └────────┘
Active-Passive: ┌────────┐ ┌────────┐ │ Active │ ──→ │Standby │ Failover on failure └────────┘ └────────┘
Circuit Breaker
class CircuitBreaker { private failures = 0; private lastFailure: Date | null = null; private state: 'closed' | 'open' | 'half-open' = 'closed';
constructor( private threshold: number = 5, private timeout: number = 30000 ) {}
async execute<T>(fn: () => Promise<T>): Promise<T> { if (this.state === 'open') { if (Date.now() - this.lastFailure!.getTime() > this.timeout) { this.state = 'half-open'; } else { throw new Error('Circuit is open'); } }
try {
const result = await fn();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
private onSuccess() { this.failures = 0; this.state = 'closed'; }
private onFailure() { this.failures++; this.lastFailure = new Date(); if (this.failures >= this.threshold) { this.state = 'open'; } } }
CAP Theorem
Consistency
/\
/ \
/ \
/ \
/ CA \
/──────────\
/ \
/ CP AP \
/________________\
Partition Availability Tolerance
CA: Single node (RDBMS) CP: MongoDB, HBase (may reject writes during partition) AP: Cassandra, DynamoDB (eventual consistency)
Consistency Models
Model Description Example
Strong Read sees latest write RDBMS
Eventual Eventually consistent DNS, Cassandra
Causal Respects causality Chat apps
Read-your-writes See your own writes Social feeds
Rate Limiting
Algorithms
// Token Bucket class TokenBucket { private tokens: number; private lastRefill: number;
constructor( private capacity: number, private refillRate: number // tokens per second ) { this.tokens = capacity; this.lastRefill = Date.now(); }
consume(tokens: number = 1): boolean { this.refill(); if (this.tokens >= tokens) { this.tokens -= tokens; return true; } return false; }
private refill() { const now = Date.now(); const elapsed = (now - this.lastRefill) / 1000; this.tokens = Math.min( this.capacity, this.tokens + elapsed * this.refillRate ); this.lastRefill = now; } }
// Sliding Window class SlidingWindowRateLimiter { constructor( private redis: Redis, private limit: number, private window: number // seconds ) {}
async isAllowed(key: string): Promise<boolean> { const now = Date.now(); const windowStart = now - this.window * 1000;
const pipe = this.redis.pipeline();
pipe.zremrangebyscore(key, 0, windowStart);
pipe.zadd(key, now, `${now}`);
pipe.zcard(key);
pipe.expire(key, this.window);
const results = await pipe.exec();
const count = results[2][1] as number;
return count <= this.limit;
} }
Common System Designs
System Key Components
URL Shortener Hash function, Redis cache, DB
Twitter Feed Fan-out, Redis timeline, Kafka
Chat App WebSocket, Presence, Message queue
E-commerce Cart service, Inventory, Payment
Video Streaming CDN, Chunking, Adaptive bitrate
Related Skills
-
[[architecture-patterns]] - Microservices, event-driven
-
[[database]] - Database optimization
-
[[caching-implementation]] - Cache strategies
-
[[reliability-engineering]] - SRE practices