system-design-thinking

Frameworks for architectural decision-making and system design. Use when designing new systems, evaluating architecture options, breaking down complex problems, making build-vs-buy decisions, or when asked about "system design", "architecture", "how should I structure this", or "what's the best approach".

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "system-design-thinking" with this command: npx skills add tawanorg/skills/tawanorg-skills-system-design-thinking

System Design Thinking

Frameworks for thinking about architecture before writing code. Design decisions are expensive to change—think first, code second.

Core Philosophy

"The goal of software architecture is to minimize the human resources required to build and maintain the required system." — Robert C. Martin

Architecture is about:

  1. Managing complexity - Breaking big problems into smaller ones
  2. Enabling change - Making the system adaptable
  3. Deferring decisions - Keeping options open as long as possible
  4. Communicating intent - Making the system understandable

The Design Thinking Process

1. Understand Before Designing

See rules/start-with-requirements.md

Never design without understanding:

| Understand | Questions to Ask |
| --- | --- |
| Users | Who uses this? What do they need? How many? |
| Data | What data? How much? How fast does it grow? |
| Operations | Who runs it? How is it deployed? Monitored? |
| Constraints | Budget? Timeline? Team skills? Existing systems? |
| Quality attributes | Latency? Availability? Consistency? Security? |

2. Identify the Core Problem

Before jumping to solutions, articulate:

We need to [capability]
For [users/systems]
So that [business value]
Constrained by [limitations]

Example:

We need to process payment transactions
For 10,000 concurrent users
So that customers can complete purchases
Constrained by:
- 99.99% availability requirement
- PCI compliance
- <200ms response time
- Existing PostgreSQL infrastructure

3. Explore the Solution Space

Don't commit to the first idea. Generate options:

| Option | Pros | Cons | Risk |
| --- | --- | --- | --- |
| Option A | ... | ... | ... |
| Option B | ... | ... | ... |
| Option C | ... | ... | ... |
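One way to compare the options once the table is filled in is a weighted decision matrix. The sketch below is illustrative only: the criteria names, the integer weights, and the 1-5 ratings are all assumptions, not values from this skill.

```python
def score_options(weights: dict[str, int], ratings: dict[str, dict[str, int]]):
    """Return (option, weighted score) pairs, best first.

    Higher ratings are better for every criterion, so "low_risk" rates
    how safe an option is rather than how risky it is.
    """
    totals = {
        option: sum(weights[c] * per_criterion[c] for c in weights)
        for option, per_criterion in ratings.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical criteria and ratings, for illustration only.
weights = {"fit": 5, "cost": 3, "low_risk": 2}
ratings = {
    "Option A": {"fit": 4, "cost": 2, "low_risk": 3},
    "Option B": {"fit": 3, "cost": 5, "low_risk": 4},
    "Option C": {"fit": 5, "cost": 1, "low_risk": 1},
}

ranked = score_options(weights, ratings)
for option, score in ranked:
    print(option, score)
```

The score is a conversation aid rather than a verdict: if the ranking surprises the team, that usually means a weight or a criterion is missing.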

4. Make and Document Decisions

See references/adr-template.md

Every significant decision needs:

  • Context - Why are we making this decision?
  • Options - What did we consider?
  • Decision - What did we choose?
  • Consequences - What are the trade-offs?
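The four parts above can be captured as a small record so that no decision is written down with a part missing. This is a minimal sketch; `DecisionRecord` and its example values are hypothetical and are not the layout of references/adr-template.md.

```python
from dataclasses import dataclass


@dataclass
class DecisionRecord:
    """Hypothetical in-code shape of a decision record."""

    title: str
    context: str
    options: list[str]
    decision: str
    consequences: list[str]

    def render(self) -> str:
        """Render the record as plain text, one section per part."""
        lines = [
            self.title,
            "",
            "Context",
            self.context,
            "",
            "Options",
            *[f"- {option}" for option in self.options],
            "",
            "Decision",
            self.decision,
            "",
            "Consequences",
            *[f"- {item}" for item in self.consequences],
        ]
        return "\n".join(lines)


# Illustrative values only.
record = DecisionRecord(
    title="Use the existing PostgreSQL cluster for payments",
    context="Payments need ACID transactions and the team already runs PostgreSQL.",
    options=["Existing PostgreSQL", "New document store"],
    decision="Existing PostgreSQL",
    consequences=["No new infrastructure to operate", "Write scaling stays vertical"],
)
text = record.render()
print(text)
```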

Trade-off Analysis

See references/trade-off-analysis.md for detailed frameworks.

The Iron Triangle

You can optimize for two, not all three:

        Speed
         /\
        /  \
       /    \
      /______\
   Scope    Quality

CAP Theorem

Distributed systems can guarantee only two of three:

| Property | Meaning |
| --- | --- |
| Consistency | All nodes see the same data |
| Availability | Every request gets a response |
| Partition tolerance | System works despite network splits |

In practice: network partitions will happen, so the real choice is what to give up while a partition lasts. Choose CP or AP:

  • CP (Consistency + Partition tolerance): Banking, inventory
  • AP (Availability + Partition tolerance): Social feeds, caching
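The difference between CP and AP shows up only while a partition is in effect. The toy model below is a sketch under heavy assumptions: `Node`, its `mode` flag, and the two-node setup are invented for illustration, and a real CP system would typically keep the majority side writable rather than refusing everywhere.

```python
class Node:
    """Toy replica; mode is "CP" or "AP" (labels assumed for this sketch)."""

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.value = "v0"
        self.can_reach_peer = True

    def write(self, value: str) -> bool:
        # A CP node refuses a write it cannot replicate: it gives up availability.
        if self.mode == "CP" and not self.can_reach_peer:
            return False
        # An AP node accepts anyway: it gives up consistency until the partition heals.
        self.value = value
        return True


def partitioned_pair(mode: str) -> tuple[Node, Node]:
    """Create two replicas that cannot reach each other."""
    a, b = Node(mode), Node(mode)
    a.can_reach_peer = False
    b.can_reach_peer = False
    return a, b


cp_a, cp_b = partitioned_pair("CP")
cp_accepted = cp_a.write("v1")  # rejected, replicas stay identical

ap_a, ap_b = partitioned_pair("AP")
ap_accepted = ap_a.write("v1")  # accepted, replicas now disagree
```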

Common Trade-offs

| Trade-off (A vs B) | When to favor A | When to favor B |
| --- | --- | --- |
| Consistency vs Availability | Financial data, inventory | Social features, analytics |
| Latency vs Throughput | User-facing APIs | Batch processing |
| Simplicity vs Flexibility | MVP, small team | Platform, multiple use cases |
| Build vs Buy | Core differentiator | Commodity functionality |
| Monolith vs Microservices | Small team, new product | Large team, proven domain |

Architecture Patterns Quick Reference

See references/architecture-styles.md for detailed comparisons.

When to Use What

| Pattern | Best For | Avoid When |
| --- | --- | --- |
| Monolith | New products, small teams, unclear domains | Team > 10, independent scaling needed |
| Modular Monolith | Growing product, preparing for split | Already distributed |
| Microservices | Large teams, independent deployments | Small team, unclear boundaries |
| Serverless | Sporadic traffic, event-driven | Consistent high load, complex state |
| Event-Driven | Loose coupling, async workflows | Simple CRUD, strong consistency |

Start Simple

See rules/prefer-simple-architecture.md

Start here ──► Monolith ──► Modular Monolith ──► Microservices
                  │
                  └── Most projects never need to leave

The Monolith First rule: Unless you have proven scale requirements, start with a well-structured monolith.


System Decomposition

Finding Boundaries

See rules/define-boundaries-first.md

Good boundaries have:

  • High cohesion - Related things together
  • Low coupling - Minimal dependencies across boundaries
  • Clear contracts - Well-defined interfaces
  • Independent deployability - Can be changed and released without coordinating with other components
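A clear contract can be as small as one interface that the caller depends on instead of a concrete module. The sketch below assumes a hypothetical `PaymentGateway` contract between an orders module and a payments module; all names are illustrative.

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """Contract the orders side depends on (hypothetical)."""

    def charge(self, order_id: str, amount_cents: int) -> bool: ...


class OrderService:
    """Knows only the contract, not which payments implementation is behind it."""

    def __init__(self, payments: PaymentGateway) -> None:
        self._payments = payments

    def place_order(self, order_id: str, amount_cents: int) -> str:
        if self._payments.charge(order_id, amount_cents):
            return "confirmed"
        return "payment_failed"


class FakeGateway:
    """In-memory stand-in that satisfies the contract structurally."""

    def __init__(self) -> None:
        self.charged: list[tuple[str, int]] = []

    def charge(self, order_id: str, amount_cents: int) -> bool:
        self.charged.append((order_id, amount_cents))
        return True


gateway = FakeGateway()
service = OrderService(gateway)
status = service.place_order("o-1", 1999)
```

Because `OrderService` only sees the contract, the payments side can be rewritten, or moved behind a network call, without touching the orders code.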

Decomposition Strategies

| Strategy | How It Works | Best For |
| --- | --- | --- |
| By domain | Business capabilities (Orders, Users, Payments) | Business-aligned teams |
| By layer | Presentation, Business, Data | Small teams, simple domains |
| By feature | End-to-end slices (Checkout, Search) | Feature teams |
| By volatility | Separate stable from changing parts | Mixed stability requirements |

Domain-Driven Design (Quick Reference)

| Concept | Meaning | Example |
| --- | --- | --- |
| Bounded Context | Clear boundary with its own model | "Orders" vs "Inventory" |
| Ubiquitous Language | Shared vocabulary within a context | "Order" means the same thing to everyone |
| Aggregate | Cluster of entities with one root | Order with OrderItems |
| Domain Event | Something that happened | OrderPlaced, PaymentReceived |
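The aggregate and domain event rows can be made concrete with the table's own example. This is a minimal sketch: the field names, the invariants, and the cents-based pricing are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OrderPlaced:
    """Domain event: an immutable fact about something that happened."""

    order_id: str
    total_cents: int


@dataclass
class OrderItem:
    sku: str
    quantity: int
    unit_price_cents: int


@dataclass
class Order:
    """Aggregate root: every change to its items goes through this class."""

    order_id: str
    items: list[OrderItem] = field(default_factory=list)
    placed: bool = False

    def add_item(self, sku: str, quantity: int, unit_price_cents: int) -> None:
        if self.placed:
            raise ValueError("cannot modify a placed order")
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.items.append(OrderItem(sku, quantity, unit_price_cents))

    def total_cents(self) -> int:
        return sum(i.quantity * i.unit_price_cents for i in self.items)

    def place(self) -> OrderPlaced:
        if not self.items:
            raise ValueError("cannot place an empty order")
        self.placed = True
        return OrderPlaced(self.order_id, self.total_cents())


order = Order("o-1")
order.add_item("sku-a", 2, 500)
order.add_item("sku-b", 1, 250)
event = order.place()
```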

Data Architecture

See references/data-architecture.md

Data Flow First

Before designing components, understand:

  1. What data exists? - Entities, relationships, volumes
  2. How does it flow? - Sources, transformations, destinations
  3. Who owns it? - Single source of truth for each entity
  4. How fresh must it be? - Real-time, near-real-time, batch

Storage Decisions

| Need | Consider |
| --- | --- |
| Structured data, ACID | PostgreSQL, MySQL |
| Document/flexible schema | MongoDB, DynamoDB |
| Key-value, caching | Redis, Memcached |
| Time series | InfluxDB, TimescaleDB |
| Search | Elasticsearch, Algolia |
| Analytics/OLAP | BigQuery, Snowflake, ClickHouse |
| File/blob storage | S3, GCS, Azure Blob |

Data Consistency Patterns

| Pattern | Consistency | Use When |
| --- | --- | --- |
| Single database | Strong | Simple apps, single service |
| Database per service | Eventual | Microservices, independent scaling |
| Saga pattern | Eventual | Distributed transactions |
| CQRS | Eventual | Read/write scaling differs |
| Event sourcing | Eventual | Audit trail, temporal queries |
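As an illustration of the saga row, the sketch below runs a sequence of local steps and, when one fails, runs the compensations of the completed steps in reverse order. It is an in-process approximation: `run_saga`, the step names, and the `state` dictionary are hypothetical, and a real saga would coordinate separate services through messages or an orchestrator.

```python
from typing import Callable

# (name, action, compensation)
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order; on failure, undo completed steps in reverse."""
    log: list[str] = []
    completed: list[Step] = []
    for step in steps:
        name, action, _compensate = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, _action, compensate in reversed(completed):
                compensate()
                log.append(f"undone:{done_name}")
            return False, log
        completed.append(step)
        log.append(f"done:{name}")
    return True, log


# In-memory state standing in for two separate services.
state = {"reserved": 0, "charged": 0}


def reserve_inventory() -> None:
    state["reserved"] += 1


def release_inventory() -> None:
    state["reserved"] -= 1


def charge_payment() -> None:
    raise RuntimeError("card declined")  # simulated failure


def refund_payment() -> None:
    state["charged"] -= 1


ok, log = run_saga([
    ("reserve_inventory", reserve_inventory, release_inventory),
    ("charge_payment", charge_payment, refund_payment),
])
```

Between the failure and the compensation, other readers could observe the reserved inventory, which is why the table lists the pattern as eventually consistent.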

Integration Patterns

See references/integration-patterns.md

Sync vs Async

| Synchronous | Asynchronous |
| --- | --- |
| Request/Response | Events/Messages |
| Simple, immediate | Resilient, decoupled |
| Tight coupling | Loose coupling |
| Cascading failures | Isolated failures |
| REST, gRPC | Queues, Pub/Sub |

When to Use What

Need immediate response? ──► Sync (REST/gRPC)
                │
                No
                │
                ▼
Can caller continue without result? ──► Async (Queue/Event)
                │
                No
                │
                ▼
Use async with callback/webhook
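The asynchronous branch of the flow above can be sketched with an in-process queue: the caller gets an immediate acknowledgement and a worker does the work later. This uses only the standard library as a stand-in for a real message broker; `submit`, `worker`, and the job payload are hypothetical names.

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()
results: dict[str, str] = {}


def worker() -> None:
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job_id, payload = job
        results[job_id] = payload.upper()  # placeholder for the slow work
        jobs.task_done()


def submit(job_id: str, payload: str) -> str:
    """Enqueue the job and return at once; the caller does not wait for the result."""
    jobs.put((job_id, payload))
    return "accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

status = submit("job-1", "resize image")  # returns before the work is done

jobs.join()     # for the demo only: wait until the queue is drained
jobs.put(None)  # stop the worker
thread.join()
```

If the caller eventually needs the result, the worker would notify it when done, which corresponds to the callback/webhook branch of the flow.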

Scalability Thinking

See references/scalability-patterns.md

Scaling Strategies

| Strategy | How | When |
| --- | --- | --- |
| Vertical | Bigger machine | Quick fix, stateful services |
| Horizontal | More machines | Stateless services, proven bottleneck |
| Caching | Store computed results | Read-heavy, expensive computations |
| Partitioning | Split data by key | Large datasets, geographic distribution |
| Async processing | Queue work for later | Spiky traffic, heavy operations |
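The partitioning row ("split data by key") is often implemented by hashing the key to a shard number. The sketch below assumes a fixed shard count and a hypothetical `shard_for` helper; it uses a stable hash from the standard library because Python's built-in `hash()` for strings varies between processes.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index in [0, shard_count) using a stable hash."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# The same key always lands on the same shard.
first = shard_for("user-42", 4)
second = shard_for("user-42", 4)

# Group a few hypothetical keys by shard.
placement: dict[int, list[str]] = {}
for user in ("user-1", "user-2", "user-3", "user-42"):
    placement.setdefault(shard_for(user, 4), []).append(user)
```

Plain modulo hashing moves most keys when the shard count changes; consistent hashing is a common alternative when shards are added or removed often.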

Performance Budget

Before optimizing, set targets:

Page load: < 2s
API response: < 200ms (p99)
Background job: < 30s
Availability: 99.9% (8.76 hours downtime/year)
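The targets above can be checked with a little arithmetic. The sketch below converts an availability percentage into yearly downtime and tests a latency sample against a p99 budget; the function names and the sample latencies are made up for illustration, and the percentile uses the nearest-rank method on a non-empty sample.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Yearly downtime allowed by an availability target."""
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100 (samples non-empty)."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # ceiling of pct * n / 100
    return ordered[rank - 1]


def within_budget(latencies_ms: list[float], budget_ms: float = 200, pct: int = 99) -> bool:
    """True when the chosen percentile is under the budget."""
    return percentile(latencies_ms, pct) < budget_ms


# 99.9% leaves about 8.76 hours per year; 99.99% leaves about 53 minutes.
three_nines = downtime_hours_per_year(99.9)
four_nines_minutes = downtime_hours_per_year(99.99) * 60

# Hypothetical sample: 98 fast requests, one slower, one outlier.
latencies = [50.0] * 98 + [150.0, 900.0]
p99 = percentile(latencies, 99)
meets_target = within_budget(latencies)
```

A single 900 ms outlier in 100 requests does not break a p99 target, which is why the percentile needs to be stated alongside the number.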

Design for Operations

See rules/consider-operations.md

Operability Checklist

Every system needs:

  • Health checks - Is it running? Is it healthy?
  • Logging - What happened? (Structured, searchable)
  • Metrics - How is it performing? (Latency, errors, throughput)
  • Alerting - When does it need attention?
  • Deployment - How do we release changes safely?
  • Rollback - How do we undo bad releases?
  • Disaster recovery - What if everything fails?
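The first two checklist items can be sketched in a few lines: a health function that aggregates dependency checks, and a log formatter that emits one JSON object per event so logs stay searchable. The check names and event fields below are hypothetical.

```python
import json
from typing import Callable


def health(checks: dict[str, Callable[[], bool]]) -> dict:
    """Run each dependency check and summarize the result."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a crashing check counts as unhealthy
    status = "ok" if all(results.values()) else "degraded"
    return {"status": status, "checks": results}


def format_event(event: str, **fields) -> str:
    """Serialize a log event as one JSON line with stable key order."""
    return json.dumps({"event": event, **fields}, sort_keys=True)


# Hypothetical checks: the database answers, the queue does not.
report = health({"database": lambda: True, "queue": lambda: False})
line = format_event("order_placed", order_id="o-1", latency_ms=42)
print(report)
print(line)
```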

The "3 AM Test"

Ask: "If this breaks at 3 AM, can someone fix it?"

  • Are errors clear and actionable?
  • Are runbooks documented?
  • Can issues be diagnosed from logs/metrics?
  • Can it be rolled back quickly?

Quick Reference Checklist

Before Designing

  • Requirements understood (functional + non-functional)
  • Constraints identified (budget, timeline, skills, compliance)
  • Success criteria defined (measurable)
  • Stakeholders aligned

During Design

  • Multiple options explored
  • Trade-offs explicitly stated
  • Simplest solution that works chosen
  • Boundaries clearly defined
  • Data flow mapped
  • Failure modes considered

After Design

  • Decision documented (ADR)
  • Operability addressed
  • Migration path identified (if replacing existing)
  • Team understands and agrees

Rules

Atomic decision heuristics. Each rule follows: Context → Heuristic → Why.

| Rule | File |
| --- | --- |
| Start with Requirements | rules/start-with-requirements.md |
| Prefer Simple Architecture | rules/prefer-simple-architecture.md |
| Define Boundaries First | rules/define-boundaries-first.md |
| Design for Failure | rules/design-for-failure.md |
| Document Decisions | rules/document-decisions.md |
| Consider Operations | rules/consider-operations.md |
| Data Flow First | rules/data-flow-first.md |
| Async Over Sync | rules/async-over-sync.md |

Reference Files

Deep dives on specific topics.

| Topic | File |
| --- | --- |
| Trade-off Analysis | references/trade-off-analysis.md |
| Architecture Styles | references/architecture-styles.md |
| Scalability Patterns | references/scalability-patterns.md |
| Integration Patterns | references/integration-patterns.md |
| Data Architecture | references/data-architecture.md |
| ADR Template | references/adr-template.md |
