system-design

Principles for designing systems that handle scale, remain available, and perform well under load.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "system-design" with this command: npx skills add miles990/claude-software-skills/miles990-claude-software-skills-system-design

System Design

Overview

Principles for designing systems that handle scale, remain available, and perform well under load.

Scalability Fundamentals

Vertical vs Horizontal Scaling

Vertical Scaling (Scale Up):
┌─────────────────────┐
│   Bigger Server     │
│   - More CPU        │
│   - More RAM        │
│   - Faster disk     │
└─────────────────────┘
Limit: Hardware ceiling

Horizontal Scaling (Scale Out):
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│Server│ │Server│ │Server│ │Server│
└──────┘ └──────┘ └──────┘ └──────┘
                ↑
          Load Balancer
Limit: Coordination complexity

Stateless Services

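// ❌ Stateful - stores session in process memory, so it is lost on restart
// and invisible to other instances behind the load balancer
class BadService {
  private sessions = new Map<string, { loggedIn: boolean }>();

  login(userId: string) {
    this.sessions.set(userId, { loggedIn: true });
  }
}

// ✅ Stateless - session lives in an external store shared by all instances
class GoodService {
  constructor(private sessionStore: Redis) {}

  async login(userId: string) {
    // Redis stores strings, so serialize the session object
    await this.sessionStore.set(`session:${userId}`, JSON.stringify({ loggedIn: true }));
  }
}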

Load Balancing

Strategies

| Strategy | Description | Use Case |
|----------|-------------|----------|
| Round Robin | Cycle through servers | Equal capacity servers |
| Weighted RR | Based on server capacity | Mixed capacity |
| Least Connections | Route to least busy | Long-lived connections |
| IP Hash | Same IP → same server | Session stickiness |
| URL Hash | Same URL → same server | Cache optimization |
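
As a rough illustration of how two of the strategies in the table differ, the sketch below picks a backend by round robin and by least connections. The `Backend` shape, `RoundRobinBalancer`, and `pickLeastConnections` are hypothetical names chosen for this example, not part of any particular load balancer.

```typescript
// Hypothetical backend descriptor for this sketch.
interface Backend {
  name: string;
  activeConnections: number;
}

// Round robin: cycle through backends in order, ignoring their load.
class RoundRobinBalancer {
  private next = 0;

  constructor(private backends: Backend[]) {
    if (backends.length === 0) throw new Error("at least one backend required");
  }

  pick(): Backend {
    const backend = this.backends[this.next];
    this.next = (this.next + 1) % this.backends.length;
    return backend;
  }
}

// Least connections: choose the backend with the fewest active connections.
function pickLeastConnections(backends: Backend[]): Backend {
  if (backends.length === 0) throw new Error("at least one backend required");
  return backends.reduce((best, candidate) =>
    candidate.activeConnections < best.activeConnections ? candidate : best
  );
}
```

Round robin needs no knowledge of backend state, which is why it suits equal-capacity servers with short requests; least connections needs a live connection count but copes better when some requests hold a connection for a long time.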

Health Checks

# Kubernetes-style health checks
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 3
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

// Health check endpoints
app.get('/health/live', (req, res) => {
  // Am I running?
  res.status(200).json({ status: 'alive' });
});

app.get('/health/ready', async (req, res) => {
  // Can I serve traffic?
  const dbOk = await checkDatabase();
  const cacheOk = await checkCache();

  if (dbOk && cacheOk) {
    res.status(200).json({ status: 'ready' });
  } else {
    res.status(503).json({ status: 'not ready', db: dbOk, cache: cacheOk });
  }
});

Caching Strategies

Cache Patterns

Cache-Aside (Lazy Loading):
┌────────┐    miss     ┌────────┐
│  App   │ ──────────→ │ Cache  │
│        │ ←────────── │        │
└────────┘    null     └────────┘
     │
     │ read
     ↓
┌────────┐
│   DB   │ ──── write ──→ Cache
└────────┘

Write-Through: App → Cache → DB (synchronous)

Write-Behind (Write-Back): App → Cache → (async) → DB

// Cache-aside implementation
class CachedUserService {
  constructor(
    private cache: Redis,
    private db: Database
  ) {}

  // Returns null when the user exists in neither the cache nor the DB
  async getUser(id: string): Promise<User | null> {
    // Try cache first
    const cached = await this.cache.get(`user:${id}`);
    if (cached) return JSON.parse(cached);

    // Cache miss - read from DB
    const user = await this.db.users.findById(id);
    if (user) {
      // Store in cache with TTL
      await this.cache.set(`user:${id}`, JSON.stringify(user), 'EX', 3600);
    }
    return user;
  }

  async updateUser(id: string, data: Partial<User>): Promise<User> {
    const user = await this.db.users.update(id, data);
    // Invalidate cache so the next read reloads the fresh row
    await this.cache.del(`user:${id}`);
    return user;
  }
}
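
The Write-Through and Write-Behind flows listed earlier can be contrasted with a small in-memory sketch. Plain `Map` objects stand in for the cache and the database, and the class names and the explicit `flush()` call are assumptions made for illustration; a real write-behind cache would flush on a timer or a queue.

```typescript
// Write-through: every write updates the cache and the DB in the same call.
class WriteThroughStore {
  constructor(
    private cache: Map<string, string>,
    private db: Map<string, string>
  ) {}

  write(key: string, value: string): void {
    this.cache.set(key, value);
    this.db.set(key, value); // synchronous: caller waits for the DB write
  }
}

// Write-behind: writes land in the cache and are persisted later in a batch.
class WriteBehindStore {
  private dirty = new Set<string>();

  constructor(
    private cache: Map<string, string>,
    private db: Map<string, string>
  ) {}

  write(key: string, value: string): void {
    this.cache.set(key, value);
    this.dirty.add(key); // remember what still has to reach the DB
  }

  // Persist all pending writes; returns how many keys were flushed.
  flush(): number {
    let flushed = 0;
    this.dirty.forEach((key) => {
      const value = this.cache.get(key);
      if (value !== undefined) {
        this.db.set(key, value);
        flushed += 1;
      }
    });
    this.dirty.clear();
    return flushed;
  }
}
```

The trade-off shows up in the window between `write()` and `flush()`: write-behind acknowledges faster, but anything still marked dirty is lost if the cache node fails before the flush.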

Cache Invalidation

| Strategy | Description | Complexity |
|----------|-------------|------------|
| TTL | Expire after time | Simple |
| Event-based | Invalidate on write | Medium |
| Version-based | Key includes version | Medium |
| Tag-based | Group related keys | Complex |
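
Version-based invalidation is the least obvious row in the table, so here is a minimal sketch of one way it could work: the version number is part of every key, and bumping it makes all older entries unreachable without deleting them one by one. `VersionedCache` and its methods are hypothetical names, and a `Map` stands in for the real cache.

```typescript
class VersionedCache {
  private versions = new Map<string, number>(); // current version per entity type
  private store = new Map<string, string>();    // stand-in for Redis or similar

  // Build a key such as "user:v3:42" from the entity's current version.
  private keyFor(entity: string, id: string): string {
    const version = this.versions.get(entity) ?? 0;
    return `${entity}:v${version}:${id}`;
  }

  get(entity: string, id: string): string | undefined {
    return this.store.get(this.keyFor(entity, id));
  }

  set(entity: string, id: string, value: string): void {
    this.store.set(this.keyFor(entity, id), value);
  }

  // Invalidate every cached item of this entity type in O(1):
  // old keys are never read again and can be left to expire via TTL.
  invalidateAll(entity: string): void {
    this.versions.set(entity, (this.versions.get(entity) ?? 0) + 1);
  }
}
```

In practice the stale entries still occupy memory until they expire, so this approach is usually paired with a TTL.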

Database Scaling

Read Replicas

                ┌─────────────────┐
 Writes ──────→ │   Primary DB    │
                └────────┬────────┘
                         │ replication
       ┌─────────────────┼─────────────────┐
       ↓                 ↓                 ↓
┌──────────┐      ┌──────────┐      ┌──────────┐
│ Replica 1│      │ Replica 2│      │ Replica 3│
└──────────┘      └──────────┘      └──────────┘
       ↑                 ↑                 ↑
       └─────── Reads ───┴────────────────┘
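
The routing rule in the diagram (writes to the primary, reads spread across replicas) can be sketched as follows. `ReplicaRouter` and the connection names are hypothetical; a real router would return connection pools rather than strings.

```typescript
type QueryKind = "read" | "write";

class ReplicaRouter {
  private next = 0;

  constructor(
    private primary: string,
    private replicas: string[]
  ) {}

  // Writes always go to the primary; reads rotate across replicas.
  // With no replicas configured, reads fall back to the primary.
  route(kind: QueryKind): string {
    if (kind === "write" || this.replicas.length === 0) {
      return this.primary;
    }
    const target = this.replicas[this.next % this.replicas.length];
    this.next += 1;
    return target;
  }
}
```

Because replication is typically asynchronous, a read routed to a replica may briefly miss a write that the primary has already accepted.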

Sharding (Partitioning)

import * as crypto from 'crypto';

// Hash-based sharding
function getShard(userId: string, numShards: number): number {
  const hash = crypto.createHash('md5').update(userId).digest('hex');
  return parseInt(hash.slice(0, 8), 16) % numShards;
}

// Range-based sharding
function getShardByDate(date: Date): string {
  const year = date.getFullYear();
  const month = date.getMonth() + 1;
  return `orders_${year}_${month.toString().padStart(2, '0')}`;
}

// Consistent hashing for dynamic shards
class ConsistentHash {
  private ring: Map<number, string> = new Map();

  addNode(node: string) {
    for (let i = 0; i < 150; i++) { // Virtual nodes
      const hash = this.hash(`${node}:${i}`);
      this.ring.set(hash, node);
    }
  }

  getNode(key: string): string {
    const hash = this.hash(key);
    // Sort ring positions numerically (the default sort compares as strings)
    const positions = [...this.ring.keys()].sort((a, b) => a - b);
    if (positions.length === 0) throw new Error('No nodes in ring');
    // Find next node on ring, wrapping around to the smallest position
    const position = positions.find((p) => p >= hash) ?? positions[0];
    return this.ring.get(position)!;
  }

  // Map a string to a 32-bit position on the ring
  private hash(value: string): number {
    const hex = crypto.createHash('md5').update(value).digest('hex');
    return parseInt(hex.slice(0, 8), 16);
  }
}
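
To see why consistent hashing is worth the extra code when shards are added dynamically, the sketch below measures how many keys change shard under plain modulo hashing when the shard count changes. The FNV-1a hash, the `user-N` key format, and the function names are assumptions made to keep the example dependency-free; they are not taken from the code above.

```typescript
// FNV-1a: a small non-cryptographic 32-bit hash, used here only for illustration.
function fnv1a(value: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < value.length; i++) {
    hash ^= value.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Fraction of keys whose shard changes when resizing with `hash % numShards`.
function movedFraction(keyCount: number, fromShards: number, toShards: number): number {
  let moved = 0;
  for (let i = 0; i < keyCount; i++) {
    const h = fnv1a(`user-${i}`);
    if (h % fromShards !== h % toShards) moved += 1;
  }
  return moved / keyCount;
}

// Example: growing from 4 to 5 shards.
const fractionMoved = movedFraction(10000, 4, 5);
console.log(`moved: ${(fractionMoved * 100).toFixed(1)}%`);
```

For evenly distributed hashes, a key keeps its shard in the 4 → 5 case only when `h % 20` is 0, 1, 2, or 3, so about 80% of keys are expected to move. With consistent hashing, adding one node to four is expected to move roughly one fifth of the keys.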

Message Queues

Patterns

Point-to-Point:
Producer → Queue → Consumer

Pub/Sub:
                  ┌─→ Subscriber 1
Publisher → Topic ─→ Subscriber 2
                  └─→ Subscriber 3

Work Queue:
                 ┌─→ Worker 1
Producer → Queue ─→ Worker 2  (competing consumers)
                 └─→ Worker 3
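
The difference between Pub/Sub and a Work Queue is who receives each message: every subscriber, or exactly one worker. A minimal in-memory sketch, with hypothetical `Topic` and `WorkQueue` classes standing in for a real broker:

```typescript
type Handler = (message: string) => void;

// Pub/Sub: each published message is delivered to every subscriber.
class Topic {
  private subscribers: Handler[] = [];

  subscribe(handler: Handler): void {
    this.subscribers.push(handler);
  }

  publish(message: string): void {
    this.subscribers.forEach((handler) => handler(message));
  }
}

// Work queue: each message is delivered to exactly one of the competing workers.
class WorkQueue {
  private workers: Handler[] = [];
  private next = 0;

  addWorker(handler: Handler): void {
    this.workers.push(handler);
  }

  enqueue(message: string): void {
    if (this.workers.length === 0) throw new Error("no workers registered");
    const worker = this.workers[this.next % this.workers.length];
    this.next += 1;
    worker(message);
  }
}
```

This sketch dispatches to workers in rotation for simplicity; real brokers usually hand a message to whichever consumer asks for work next.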

Delivery Guarantees

| Guarantee | Description | Implementation |
|-----------|-------------|----------------|
| At-most-once | May lose messages | Fire and forget |
| At-least-once | May duplicate | Ack after process |
| Exactly-once | No loss, no duplicates | Idempotency + dedup |

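// Idempotent processing
async function processOrder(event: OrderEvent) {
  const key = `processed:${event.id}`;

  // Atomically claim the event. SET ... NX succeeds only for the first caller,
  // so two concurrent deliveries cannot both pass the check.
  // The TTL (7 days) keeps the dedup keys from growing forever.
  const claimed = await redis.set(key, '1', 'EX', 86400 * 7, 'NX');
  if (claimed !== 'OK') {
    console.log(`Already processed ${event.id}`);
    return;
  }

  try {
    // Process the order
    await db.orders.create(event.order);
  } catch (error) {
    // Release the claim so a redelivery can retry
    await redis.del(key);
    throw error;
  }
}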
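
The "ack after process" and "idempotency + dedup" entries work together: acknowledging late means a crash causes a redelivery, and deduplication makes that redelivery harmless. The in-memory sketch below illustrates the interaction; `AtLeastOnceQueue` and `IdempotentCounter` are hypothetical names, not a real broker API.

```typescript
interface Message {
  id: string;
  body: string;
}

class AtLeastOnceQueue {
  private pending: Message[] = [];

  send(message: Message): void {
    this.pending.push(message);
  }

  // Deliver the oldest message. It is removed (acked) only after the handler
  // returns; if the handler throws, it stays queued and will be redelivered.
  deliverNext(handler: (message: Message) => void): boolean {
    if (this.pending.length === 0) return false;
    const message = this.pending[0];
    try {
      handler(message);
    } catch {
      return false; // no ack: message stays at the head of the queue
    }
    this.pending.shift(); // ack after successful processing
    return true;
  }

  get size(): number {
    return this.pending.length;
  }
}

// Idempotent consumer: remembers processed ids so a redelivery has no second effect.
class IdempotentCounter {
  private seen = new Set<string>();
  total = 0;

  handle(message: Message): void {
    if (this.seen.has(message.id)) return; // duplicate delivery, ignore
    this.seen.add(message.id);
    this.total += 1;
  }
}
```

If a handler applies its effect and then fails before the ack, the queue delivers the same message again, and the counter's `seen` set keeps `total` at 1 instead of 2.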

High Availability

Redundancy Patterns

Active-Active:
┌────────┐     ┌────────┐
│Server A│ ←─→ │Server B│   Both handle traffic
└────────┘     └────────┘

Active-Passive:
┌────────┐     ┌────────┐
│ Active │ ──→ │Standby │   Failover on failure
└────────┘     └────────┘

Circuit Breaker

class CircuitBreaker {
  private failures = 0;
  private lastFailure: Date | null = null;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  constructor(
    private threshold: number = 5,   // consecutive failures before opening
    private timeout: number = 30000  // ms to stay open before a trial call
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure!.getTime() > this.timeout) {
        this.state = 'half-open';
      } else {
        throw new Error('Circuit is open');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    this.failures = 0;
    this.state = 'closed';
  }

  private onFailure() {
    this.failures++;
    this.lastFailure = new Date();
    if (this.failures >= this.threshold) {
      this.state = 'open';
    }
  }
}

CAP Theorem

     Consistency
        /\
       /  \
      /    \
     /      \
    /   CA   \
   /──────────\
  /            \
 / CP        AP \
/________________\

Partition        Availability
Tolerance

CA: Single node (RDBMS)
CP: MongoDB, HBase (may reject writes during partition)
AP: Cassandra, DynamoDB (eventual consistency)

Consistency Models

| Model | Description | Example |
|-------|-------------|---------|
| Strong | Read sees latest write | RDBMS |
| Eventual | Eventually consistent | DNS, Cassandra |
| Causal | Respects causality | Chat apps |
| Read-your-writes | See your own writes | Social feeds |
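
Read-your-writes is often layered on top of an eventually consistent replica. One possible approach, sketched here under simplifying assumptions, is for each session to remember the version of its last write and to read from the replica only once the replica has caught up. `DataNode`, `Session`, and `replicate` are hypothetical names, and a single version counter stands in for a real replication log position.

```typescript
class DataNode {
  version = 0;
  data = new Map<string, string>();
}

// Copy the primary's state to the replica (stands in for asynchronous replication).
function replicate(primary: DataNode, replica: DataNode): void {
  primary.data.forEach((value, key) => replica.data.set(key, value));
  replica.version = primary.version;
}

class Session {
  private lastWrittenVersion = 0;

  constructor(
    private primary: DataNode,
    private replica: DataNode
  ) {}

  write(key: string, value: string): void {
    this.primary.data.set(key, value);
    this.primary.version += 1;
    this.lastWrittenVersion = this.primary.version;
  }

  // Read from the replica unless it is behind this session's own last write.
  read(key: string): string | undefined {
    const replicaIsFresh = this.replica.version >= this.lastWrittenVersion;
    const source = replicaIsFresh ? this.replica : this.primary;
    return source.data.get(key);
  }
}
```

A session that has just written sees its own update immediately, while other sessions may still read the older replica state until replication catches up, which matches the Eventual row in the table.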

Rate Limiting

Algorithms

// Token Bucket
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillRate: number // tokens per second
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  consume(tokens: number = 1): boolean {
    this.refill();
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true;
    }
    return false;
  }

  private refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefill = now;
  }
}

// Sliding Window
class SlidingWindowRateLimiter {
  constructor(
    private redis: Redis,
    private limit: number,
    private window: number // seconds
  ) {}

  async isAllowed(key: string): Promise<boolean> {
    const now = Date.now();
    const windowStart = now - this.window * 1000;

    const pipe = this.redis.pipeline();
    pipe.zremrangebyscore(key, 0, windowStart);
    // Member must be unique, or requests in the same millisecond overwrite each other
    pipe.zadd(key, now, `${now}-${Math.random()}`);
    pipe.zcard(key);
    pipe.expire(key, this.window);

    const results = await pipe.exec();
    if (!results) throw new Error('Redis pipeline failed');
    const count = results[2][1] as number;

    return count <= this.limit;
  }
}
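
The sliding-window logic is easier to follow without the Redis pipeline. The sketch below is an in-memory variant with an injectable clock so its behavior can be checked deterministically; `InMemorySlidingWindow` and its parameters are hypothetical. Unlike the Redis version above, it records a request only when that request is allowed.

```typescript
class InMemorySlidingWindow {
  private hits = new Map<string, number[]>(); // timestamps (ms) of allowed requests per key

  constructor(
    private limit: number,
    private windowMs: number,
    private now: () => number = () => Date.now()
  ) {}

  isAllowed(key: string): boolean {
    const current = this.now();
    // Keep only timestamps that are still inside the window.
    const recent = (this.hits.get(key) ?? []).filter(
      (timestamp) => timestamp > current - this.windowMs
    );
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(current);
    this.hits.set(key, recent);
    return allowed;
  }
}

// Usage with a fake clock: 2 requests allowed per 1000 ms.
let fakeNow = 0;
const limiter = new InMemorySlidingWindow(2, 1000, () => fakeNow);
limiter.isAllowed("client-1"); // allowed at t=0
```

Storing one timestamp per request gives exact counts at the cost of memory proportional to the limit, which is the same trade-off the sorted-set approach makes.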

Common System Designs

| System | Key Components |
|--------|----------------|
| URL Shortener | Hash function, Redis cache, DB |
| Twitter Feed | Fan-out, Redis timeline, Kafka |
| Chat App | WebSocket, Presence, Message queue |
| E-commerce | Cart service, Inventory, Payment |
| Video Streaming | CDN, Chunking, Adaptive bitrate |
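
As one concrete example from the table, a URL shortener needs a way to turn a record into a short code. A common alternative to hashing the URL, sketched here as an assumption rather than as the design the table prescribes, is to base62-encode an auto-incrementing database ID. The alphabet order and function names are illustrative choices.

```typescript
const ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Encode a non-negative integer ID as a short base62 string.
function encodeBase62(id: number): string {
  if (!Number.isInteger(id) || id < 0) {
    throw new Error("id must be a non-negative integer");
  }
  if (id === 0) return ALPHABET[0];
  let code = "";
  let remaining = id;
  while (remaining > 0) {
    code = ALPHABET[remaining % 62] + code;
    remaining = Math.floor(remaining / 62);
  }
  return code;
}

// Decode a base62 string back to the numeric ID.
function decodeBase62(code: string): number {
  let id = 0;
  for (let i = 0; i < code.length; i++) {
    const digit = ALPHABET.indexOf(code[i]);
    if (digit < 0) throw new Error(`invalid character: ${code[i]}`);
    id = id * 62 + digit;
  }
  return id;
}

console.log(encodeBase62(125)); // "21" because 125 = 2 * 62 + 1
```

Sequential IDs never collide, but the resulting codes are guessable; hashing the URL avoids that at the cost of needing collision handling.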

Related Skills

  • [[architecture-patterns]] - Microservices, event-driven

  • [[database]] - Database optimization

  • [[caching-implementation]] - Cache strategies

  • [[reliability-engineering]] - SRE practices

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
