System Design

Overview

Principles for designing systems that handle scale, remain available, and perform well under load.

Scalability Fundamentals

Vertical vs Horizontal Scaling

Vertical Scaling (Scale Up): ┌─────────────────────┐ │ Bigger Server │ │ - More CPU │ │ - More RAM │ │ - Faster disk │ └─────────────────────┘ Limit: Hardware ceiling

Horizontal Scaling (Scale Out): ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │Server│ │Server│ │Server│ │Server│ └──────┘ └──────┘ └──────┘ └──────┘ ↑ Load Balancer Limit: Coordination complexity

Stateless Services

// ❌ Stateful - stores session in memory class BadService { private sessions = new Map();

// ✅ Stateless - external session store class GoodService { constructor(private sessionStore: Redis) {}

async login(userId: string) { await this.sessionStore.set(session:${userId}, { loggedIn: true }); } }

Load Balancing

Strategies

Strategy Description Use Case

Round Robin Cycle through servers Equal capacity servers

Weighted RR Based on server capacity Mixed capacity

Least Connections Route to least busy Long-lived connections

IP Hash Same IP → same server Session stickiness

URL Hash Same URL → same server Cache optimization

Health Checks

Kubernetes-style health checks

livenessProbe: httpGet: path: /health/live port: 8080 initialDelaySeconds: 3 periodSeconds: 10

readinessProbe: httpGet: path: /health/ready port: 8080 initialDelaySeconds: 5 periodSeconds: 5

// Health check endpoints app.get('/health/live', (req, res) => { // Am I running? res.status(200).json({ status: 'alive' }); });

app.get('/health/ready', async (req, res) => { // Can I serve traffic? const dbOk = await checkDatabase(); const cacheOk = await checkCache();

if (dbOk && cacheOk) { res.status(200).json({ status: 'ready' }); } else { res.status(503).json({ status: 'not ready', db: dbOk, cache: cacheOk }); } });

Caching Strategies

Cache Patterns

Cache-Aside (Lazy Loading): ┌────────┐ miss ┌────────┐ │ App │ ──────────→ │ Cache │ │ │ ←────────── │ │ └────────┘ null └────────┘ │ │ read ↓ ┌────────┐ │ DB │ ──── write ──→ Cache └────────┘

Write-Through: App → Cache → DB (synchronous)

Write-Behind (Write-Back): App → Cache → (async) → DB

// Cache-aside implementation class CachedUserService { constructor( private cache: Redis, private db: Database ) {}

async getUser(id: string): Promise<User> { // Try cache first const cached = await this.cache.get(user:${id}); if (cached) return JSON.parse(cached);

// Cache miss - read from DB
const user = await this.db.users.findById(id);
if (user) {
  // Store in cache with TTL
  await this.cache.set(`user:${id}`, JSON.stringify(user), 'EX', 3600);
}
return user;

}

async updateUser(id: string, data: Partial<User>): Promise<User> { const user = await this.db.users.update(id, data); // Invalidate cache await this.cache.del(user:${id}); return user; } }

Cache Invalidation

Strategy Description Complexity

TTL Expire after time Simple

Event-based Invalidate on write Medium

Version-based Key includes version Medium

Tag-based Group related keys Complex

Database Scaling

Read Replicas

                ┌─────────────────┐
 Writes ──────→ │   Primary DB    │
                └────────┬────────┘
                         │ replication
       ┌─────────────────┼─────────────────┐
       ↓                 ↓                 ↓
┌──────────┐      ┌──────────┐      ┌──────────┐
│ Replica 1│      │ Replica 2│      │ Replica 3│
└──────────┘      └──────────┘      └──────────┘
       ↑                 ↑                 ↑
       └─────── Reads ───┴────────────────┘

Sharding (Partitioning)

// Hash-based sharding function getShard(userId: string, numShards: number): number { const hash = crypto.createHash('md5').update(userId).digest('hex'); return parseInt(hash.slice(0, 8), 16) % numShards; }

// Range-based sharding function getShardByDate(date: Date): string { const year = date.getFullYear(); const month = date.getMonth() + 1; return orders_${year}_${month.toString().padStart(2, '0')}; }

// Consistent hashing for dynamic shards class ConsistentHash { private ring: Map<number, string> = new Map();

addNode(node: string) { for (let i = 0; i < 150; i++) { // Virtual nodes const hash = this.hash(${node}:${i}); this.ring.set(hash, node); } }

getNode(key: string): string { const hash = this.hash(key); // Find next node on ring for (const [nodeHash, node] of [...this.ring.entries()].sort()) { if (nodeHash >= hash) return node; } return this.ring.values().next().value; } }

Message Queues

Patterns

Point-to-Point: Producer → Queue → Consumer

Pub/Sub: ┌─→ Subscriber 1 Publisher → Topic ─→ Subscriber 2 └─→ Subscriber 3

Work Queue: ┌─→ Worker 1 Producer → Queue ─→ Worker 2 (competing consumers) └─→ Worker 3

Delivery Guarantees

Guarantee Description Implementation

At-most-once May lose messages Fire and forget

At-least-once May duplicate Ack after process

Exactly-once No loss, no dupe Idempotency + dedup

// Idempotent processing async function processOrder(event: OrderEvent) { // Check if already processed const processed = await redis.get(processed:${event.id}); if (processed) { console.log(Already processed ${event.id}); return; }

// Process the order await db.orders.create(event.order);

// Mark as processed (with TTL for cleanup) await redis.set(processed:${event.id}, '1', 'EX', 86400 * 7); }

High Availability

Redundancy Patterns

Active-Active: ┌────────┐ ┌────────┐ │Server A│ ←─→ │Server B│ Both handle traffic └────────┘ └────────┘

Active-Passive: ┌────────┐ ┌────────┐ │ Active │ ──→ │Standby │ Failover on failure └────────┘ └────────┘

Circuit Breaker

class CircuitBreaker { private failures = 0; private lastFailure: Date | null = null; private state: 'closed' | 'open' | 'half-open' = 'closed';

constructor( private threshold: number = 5, private timeout: number = 30000 ) {}

async execute<T>(fn: () => Promise<T>): Promise<T> { if (this.state === 'open') { if (Date.now() - this.lastFailure!.getTime() > this.timeout) { this.state = 'half-open'; } else { throw new Error('Circuit is open'); } }

try {
  const result = await fn();
  this.onSuccess();
  return result;
} catch (error) {
  this.onFailure();
  throw error;
}

}

private onSuccess() { this.failures = 0; this.state = 'closed'; }

private onFailure() { this.failures++; this.lastFailure = new Date(); if (this.failures >= this.threshold) { this.state = 'open'; } } }

CAP Theorem

     Consistency
        /\
       /  \
      /    \
     /      \
    /   CA   \
   /──────────\
  /            \
 / CP        AP \
/________________\

Partition Availability Tolerance

CA: Single node (RDBMS) CP: MongoDB, HBase (may reject writes during partition) AP: Cassandra, DynamoDB (eventual consistency)

Consistency Models

Model Description Example

Strong Read sees latest write RDBMS

Eventual Eventually consistent DNS, Cassandra

Causal Respects causality Chat apps

Read-your-writes See your own writes Social feeds

Rate Limiting

Algorithms

// Token Bucket class TokenBucket { private tokens: number; private lastRefill: number;

constructor( private capacity: number, private refillRate: number // tokens per second ) { this.tokens = capacity; this.lastRefill = Date.now(); }

consume(tokens: number = 1): boolean { this.refill(); if (this.tokens >= tokens) { this.tokens -= tokens; return true; } return false; }

private refill() { const now = Date.now(); const elapsed = (now - this.lastRefill) / 1000; this.tokens = Math.min( this.capacity, this.tokens + elapsed * this.refillRate ); this.lastRefill = now; } }

// Sliding Window class SlidingWindowRateLimiter { constructor( private redis: Redis, private limit: number, private window: number // seconds ) {}

async isAllowed(key: string): Promise<boolean> { const now = Date.now(); const windowStart = now - this.window * 1000;

const pipe = this.redis.pipeline();
pipe.zremrangebyscore(key, 0, windowStart);
pipe.zadd(key, now, `${now}`);
pipe.zcard(key);
pipe.expire(key, this.window);

const results = await pipe.exec();
const count = results[2][1] as number;

return count &#x3C;= this.limit;

} }

Common System Designs

System Key Components

URL Shortener Hash function, Redis cache, DB

Twitter Feed Fan-out, Redis timeline, Kafka

Chat App WebSocket, Presence, Message queue

E-commerce Cart service, Inventory, Payment

Video Streaming CDN, Chunking, Adaptive bitrate

Related Skills

[[architecture-patterns]] - Microservices, event-driven
[[database]] - Database optimization
[[caching-implementation]] - Cache strategies
[[reliability-engineering]] - SRE practices

system-design

Safety Notice

Copy this and send it to your AI assistant to learn

Kubernetes-style health checks

Source Transparency

Related Skills

saas-platforms

architecture-patterns

frontend

project-management