Software Architecture
Complete framework for designing software systems that are scalable, maintainable, and aligned with business requirements.
When to Use
-
Starting a new project or greenfield development
-
Refactoring a monolith
-
System is growing beyond current architecture
-
Making technology stack decisions
-
Designing for scale (10x users expected)
-
Multiple teams working on same codebase
-
Performance or reliability issues
-
Planning microservices migration
Core Principles
Architecture Serves Business:
-
Technology choices follow business needs
-
Trade-offs are intentional
-
Over-engineering is waste
-
Simplest solution that works
SOLID Principles:
S - Single Responsibility Principle O - Open/Closed Principle L - Liskov Substitution Principle I - Interface Segregation Principle D - Dependency Inversion Principle
Other Key Principles:
-
DRY (Don't Repeat Yourself)
-
KISS (Keep It Simple, Stupid)
-
YAGNI (You Aren't Gonna Need It)
-
Separation of Concerns
-
Principle of Least Surprise
Workflow
Step 1: Understand Requirements
Functional Requirements:
What the System Must Do
User Stories:
- As a [user], I want to [action] so that [benefit]
Features:
- User authentication
- Product catalog
- Shopping cart
- Payment processing
- Order tracking
Business Rules:
- Discount codes can only be used once per user
- Orders over $50 get free shipping
- Inventory decrements on successful payment
Non-Functional Requirements (The "ilities"):
How the System Must Perform
Scalability:
- Support 10K concurrent users
- Handle 100K products in catalog
- Process 1K orders per hour
Performance:
- Page load <2 seconds
- API response <100ms (p95)
- Search results <500ms
Reliability:
- 99.9% uptime (8.7 hours downtime/year)
- Zero data loss
- Graceful degradation under load
Security:
- PCI DSS compliant for payments
- GDPR compliant for EU users
- Data encrypted at rest and in transit
Maintainability:
- New developers productive in 1 week
- Deploy multiple times per day
- Rollback within 5 minutes
Observability:
- Full request tracing
- Error rate monitoring
- Performance metrics
Step 2: Choose Architectural Pattern
Monolith:
Best for:
- Small teams (<10 people)
- Simple domains
- Early-stage startups
- Rapid iteration
Architecture: ┌─────────────────────────┐ │ Web Application │ │ ┌──────┬──────┬──────┐ │ │ │ UI │Logic │ Data │ │ │ └──────┴──────┴──────┘ │ └─────────────────────────┘ ↓ Single Database
Pros: ✅ Simple to develop ✅ Simple to deploy ✅ Simple to test ✅ Low latency between components
Cons: ❌ Scaling requires scaling everything ❌ Tight coupling ❌ One failure affects all ❌ Hard to work on independently
Microservices:
Best for:
- Large teams (multiple squads)
- Complex domains
- Independent scaling needs
- Polyglot requirements
Architecture: ┌──────────┐ ┌──────────┐ ┌──────────┐ │ User │ │ Order │ │ Payment │ │ Service │ │ Service │ │ Service │ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ │ │ ↓ ↓ ↓ User DB Order DB Payment DB
Pros: ✅ Independent deployment ✅ Technology flexibility ✅ Team autonomy ✅ Fault isolation
Cons: ❌ Network complexity ❌ Distributed transactions hard ❌ More operational overhead ❌ Debugging across services
Event-Driven:
Best for:
- Async workflows
- Real-time data processing
- Audit trails
- Decoupled systems
Architecture: ┌─────────┐ ┌────────────┐ │Producer │──────>│Event Queue │ └─────────┘ └─────┬──────┘ │ ┌──────────────┼──────────────┐ ↓ ↓ ↓ Consumer 1 Consumer 2 Consumer 3
Pros: ✅ Loose coupling ✅ Easy to add consumers ✅ Natural audit log ✅ Handles spikes well
Cons: ❌ Eventual consistency ❌ Harder to debug ❌ Message ordering challenges ❌ More moving parts
Layered Architecture (N-Tier):
Best for:
- Traditional enterprise apps
- Clear separation of concerns
- Team specialization (frontend/backend/data)
Architecture: ┌─────────────────────────┐ │ Presentation Layer │ (UI, API) ├─────────────────────────┤ │ Business Logic Layer │ (Domain, Services) ├─────────────────────────┤ │ Data Access Layer │ (Repositories, ORM) ├─────────────────────────┤ │ Database Layer │ (PostgreSQL, etc.) └─────────────────────────┘
Rules:
- Upper layers can call lower layers
- Lower layers cannot call upper layers
- Each layer has clear responsibility
Pros: ✅ Clear separation ✅ Testable layers ✅ Familiar pattern
Cons: ❌ Can become rigid ❌ Changes ripple across layers ❌ Performance overhead
Hexagonal Architecture (Ports & Adapters):
Best for:
- Domain-driven design
- Testing-heavy environments
- Swappable infrastructure
Architecture: ┌─────────────┐ │ Domain │ │ (Core) │ └──────┬──────┘ │ ┌─────────┼─────────┐ ↓ ↓ ↓ HTTP API Database Queue (Adapter) (Adapter) (Adapter)
Core never depends on adapters Adapters depend on core
Pros: ✅ Highly testable ✅ Infrastructure-agnostic ✅ DDD-friendly
Cons: ❌ More abstraction ❌ Steeper learning curve ❌ Can be over-engineered
Step 3: Design System Components
Component Design Template:
[Component Name]
Purpose: What does this component do?
Responsibilities:
- Responsibility 1
- Responsibility 2
Dependencies:
- Component A (for X)
- Component B (for Y)
Interfaces:
interface ComponentAPI {
operation1(input: Type): Promise<Result>;
operation2(input: Type): Result;
}
Data:
What data does it own/manage?
Events:
What events does it emit/consume?
Error Handling:
How does it handle failures?
**Example - Order Service:**
```markdown
## Order Service
**Purpose:**
Manage order lifecycle from creation to fulfillment
**Responsibilities:**
- Create orders
- Update order status
- Calculate totals with discounts
- Validate inventory availability
**Dependencies:**
- User Service (get user details)
- Inventory Service (check/reserve stock)
- Payment Service (process payment)
**Interfaces:**
```typescript
interface OrderService {
createOrder(cart: Cart, userId: string): Promise<Order>;
getOrder(orderId: string): Promise<Order>;
updateStatus(orderId: string, status: OrderStatus): Promise<void>;
}
Events Emitted:
- OrderCreated
- OrderPaid
- OrderShipped
- OrderCancelled
Events Consumed:
- PaymentSucceeded
- PaymentFailed
Error Handling:
- Invalid cart → 400 Bad Request
- Out of stock → 409 Conflict
- Payment fails → Reverse inventory reservation
### Step 4: Make Technology Choices
**Decision Framework:**
```markdown
## Technology Decision: [Name]
**Problem:**
What are we trying to solve?
**Options:**
1. Option A
2. Option B
3. Option C
**Criteria:**
- Performance requirements
- Team expertise
- Community support
- Cost
- Scalability
- Security
**Evaluation:**
| Criteria | Option A | Option B | Option C |
|----------|----------|----------|----------|
| Performance | 8/10 | 9/10 | 7/10 |
| Expertise | 9/10 | 5/10 | 8/10 |
| Community | 10/10 | 7/10 | 9/10 |
| Cost | Free | $X/mo | Free |
| Scalability | 7/10 | 10/10 | 8/10 |
**Decision:** Option A
**Rationale:**
Why we chose this option.
**Trade-offs:**
What we're giving up.
**Review Date:**
When we'll reconsider this decision.
Example - Database Choice:
## Database for Order Service
**Problem:**
Need persistent storage for orders with ACID guarantees
**Options:**
1. PostgreSQL (Relational)
2. MongoDB (Document)
3. DynamoDB (NoSQL)
**Criteria:**
- ACID compliance (critical)
- Complex queries (important)
- Scalability (important)
- Team expertise (important)
**Evaluation:**
| Criteria | PostgreSQL | MongoDB | DynamoDB |
|----------|------------|---------|----------|
| ACID | ✅ Full | ⚠️ Limited | ⚠️ Eventual |
| Queries | ✅ Excellent | ⚠️ Good | ❌ Limited |
| Scale | ✅ Vertical+ | ✅ Horizontal | ✅ Managed |
| Expertise | ✅ High | ⚠️ Medium | ❌ Low |
**Decision:** PostgreSQL
**Rationale:**
- ACID compliance is non-negotiable for financial transactions
- Team has 5 years PostgreSQL experience
- Can scale vertically to meet current needs
- Complex reporting queries needed
**Trade-offs:**
- Harder to horizontally scale than MongoDB
- More expensive at large scale than DynamoDB
- Self-managed vs fully managed
**Review Date:** When we hit 100K orders/day
Step 5: Plan for Scale
Scaling Strategies:
## Vertical Scaling (Scale Up)
Add more resources to single machine
**When:**
- Quick fix needed
- Simple deployment
- Under 10K users
**How:**
- Bigger CPU
- More RAM
- Faster disk
**Limits:**
- Hardware ceiling
- Single point of failure
- Expensive at scale
---
## Horizontal Scaling (Scale Out)
Add more machines
**When:**
- Growth expected
- High availability needed
- Cost-effective at scale
**How:**
- Load balancer
- Stateless services
- Shared database or sharding
**Challenges:**
- Session management
- Distributed state
- Data consistency
---
## Caching Strategy
Reduce load on database/services
**Layers:**
Browser Cache → CDN → App Cache → Database Cache
Patterns:
- Cache-Aside (lazy loading)
- Write-Through (sync write)
- Write-Behind (async write)
- Refresh-Ahead (proactive)
Example:
async function getUser(id: string): Promise<User> {
// 1. Check cache
const cached = await cache.get(`user:${id}`);
if (cached) return cached;
// 2. Cache miss: fetch from DB
const user = await db.users.findById(id);
// 3. Store in cache (TTL: 1 hour)
await cache.set(`user:${id}`, user, 3600);
return user;
}
Database Scaling
Read Replicas:
┌────────┐
│Primary │ (writes)
└───┬────┘
│
├──────────┬──────────┐
↓ ↓ ↓
Replica Replica Replica
(reads) (reads) (reads)
Sharding:
User IDs 0-999 → Shard 1
User IDs 1000-1999 → Shard 2
User IDs 2000-2999 → Shard 3
Challenges:
- Rebalancing
- Cross-shard queries
- Transactions across shards
Partitioning:
Orders by date:
├── 2024-Q1 → Partition 1
├── 2024-Q2 → Partition 2
├── 2024-Q3 → Partition 3
└── 2024-Q4 → Partition 4
Benefits:
- Query performance
- Easier archival
- Smaller indexes
Step 6: Document Decisions (ADRs)
Architecture Decision Record Template:
# ADR [Number]: [Title]
**Status:** [Proposed | Accepted | Deprecated | Superseded]
**Date:** YYYY-MM-DD
**Deciders:** [Names]
---
## Context
What is the issue we're trying to solve?
**Current Situation:**
[Describe current state]
**Problem:**
[What needs to change and why]
**Constraints:**
- Technical constraints
- Business constraints
- Time constraints
---
## Decision
We will [decision].
**Details:**
[Explain the decision in detail]
---
## Options Considered
### Option 1: [Name]
**Pros:**
- Pro 1
- Pro 2
**Cons:**
- Con 1
- Con 2
### Option 2: [Name]
**Pros:**
- Pro 1
- Pro 2
**Cons:**
- Con 1
- Con 2
---
## Consequences
**Positive:**
- What improves
- What becomes easier
**Negative:**
- What becomes harder
- What we give up
**Risks:**
- What could go wrong
- Mitigation strategies
**Technical Debt:**
- What shortcuts are we taking
- When will we revisit
---
## Follow-up Actions
- [ ] Action 1 (Owner, Due Date)
- [ ] Action 2 (Owner, Due Date)
---
## References
- Link to design doc
- Link to RFC
- Related ADRs
Example ADR:
# ADR 001: Migrate from Monolith to Microservices
**Status:** Accepted
**Date:** 2026-01-15
**Deciders:** Architecture Team, Engineering Leads
---
## Context
**Current Situation:**
Single Rails monolith serving all traffic. 50K daily active users.
**Problem:**
- Deployment takes 30 minutes, blocks all teams
- Database at 80% capacity
- Cannot scale teams independently
- Different services have different scaling needs (API vs background jobs)
**Constraints:**
- Must maintain 99.9% uptime during migration
- Complete within 6 months
- Team of 15 engineers
---
## Decision
We will migrate to microservices using the Strangler Fig pattern.
**Approach:**
1. Start with highest-value, lowest-risk services (User Service, Notifications)
2. Extract one service per month
3. API Gateway routes to new services
4. Monolith remains for remaining functionality
5. Gradual data migration
**Tech Stack:**
- Services: Node.js/TypeScript
- Communication: REST + Message Queue (RabbitMQ)
- Deployment: Kubernetes
- Data: PostgreSQL per service
---
## Options Considered
### Option 1: Continue Scaling Monolith
**Pros:**
- Simplest
- Team already knows it
- No migration risk
**Cons:**
- Doesn't solve team scaling
- Database still bottleneck
- Deployment still blocking
### Option 2: Big Bang Rewrite
**Pros:**
- Fresh start
- Modern architecture
**Cons:**
- High risk
- 6+ months no features
- Likely to fail
### Option 3: Strangler Fig Migration (CHOSEN)
**Pros:**
- Low risk (gradual)
- Continuous value delivery
- Reversible
- Learn as we go
**Cons:**
- Longer timeline
- Temporary complexity
- Some duplication
---
## Consequences
**Positive:**
- Teams can deploy independently
- Services scale independently
- Technology flexibility
- Fault isolation
**Negative:**
- Operational complexity (15+ services)
- Distributed debugging harder
- Network latency between services
- More infrastructure cost
**Risks:**
- Data consistency across services
- Authentication/authorization complexity
- Monitoring/observability gaps
**Mitigation:**
- Event sourcing for data sync
- Shared auth service
- OpenTelemetry from day 1
**Technical Debt:**
- Monolith will coexist for 12-18 months
- Some duplication during migration
- Revisit architecture Q3 2026
---
## Follow-up Actions
- [x] Create migration roadmap (Sarah, 2026-01-20)
- [x] Set up Kubernetes cluster (DevOps, 2026-01-25)
- [ ] Extract User Service (Team A, 2026-02-15)
- [ ] Implement API Gateway (Team B, 2026-02-01)
- [ ] Set up observability (DevOps, 2026-01-30)
---
## References
- [Migration Roadmap](link)
- [Microservices RFC](link)
- Related: ADR 002 (Service Communication Pattern)
Common Patterns & Practices
API Gateway Pattern:
Client
↓
API Gateway (routes, auth, rate limiting)
├──→ User Service
├──→ Order Service
└──→ Payment Service
Benefits:
- Single entry point
- Handles cross-cutting concerns
- Backend for frontend
Circuit Breaker Pattern:
class CircuitBreaker {
state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
failures = 0;
threshold = 5;
async call(fn: Function) {
if (this.state === 'OPEN') {
throw new Error('Circuit breaker OPEN');
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
onFailure() {
this.failures++;
if (this.failures >= this.threshold) {
this.state = 'OPEN';
setTimeout(() => this.state = 'HALF_OPEN', 60000);
}
}
onSuccess() {
this.failures = 0;
this.state = 'CLOSED';
}
}
Saga Pattern (Distributed Transactions):
Order Saga:
1. Create Order → Success
2. Reserve Inventory → Success
3. Charge Payment → FAILS
Compensation (rollback):
3. Refund Payment ← (skipped, never charged)
2. Release Inventory ← Execute
1. Cancel Order ← Execute
Result: Consistent state, no partial orders
CQRS (Command Query Responsibility Segregation):
Commands (Writes): Queries (Reads):
Create Order Get Order
Update User List Orders
Delete Product Search Products
↓ ↑
Write DB ──────→ Read DB
(normalized) (denormalized)
Benefits:
- Optimize read/write separately
- Scale independently
- Complex queries without impacting writes
Architecture Checklist
## Pre-Development
- [ ] Functional requirements documented
- [ ] Non-functional requirements defined
- [ ] Architecture pattern chosen
- [ ] Technology stack decided
- [ ] Data model designed
- [ ] API contracts defined
- [ ] Security reviewed
- [ ] Scalability plan created
## During Development
- [ ] Code organized by domain/feature
- [ ] Dependencies point inward (clean architecture)
- [ ] Interfaces define contracts
- [ ] Error handling consistent
- [ ] Logging and monitoring instrumented
- [ ] Tests cover critical paths
- [ ] Documentation up to date
## Pre-Production
- [ ] Load testing completed
- [ ] Security audit passed
- [ ] Monitoring dashboards ready
- [ ] Alerts configured
- [ ] Runbooks written
- [ ] Rollback plan tested
- [ ] DR plan documented
- [ ] Team trained
Common Mistakes
Don't
Do
Microservices for everything
Start monolith, extract when needed
Premature optimization
Optimize when you have data
Architecture astronaut
Solve today's problems, not future maybes
Copy Big Tech architecture
Your scale != their scale
Ignore non-functional requirements
Performance/security/reliability matter
Big Bang rewrites
Incremental refactoring
One size fits all
Different components, different patterns
Skip documentation
ADRs, diagrams, runbooks
Tools & Resources
Diagramming:
- draw.io (free, versatile)
- Lucidchart (collaborative)
- Mermaid (code-based)
- C4 Model (structured approach)
Books:
- "Clean Architecture" by Robert Martin
- "Designing Data-Intensive Applications" by Martin Kleppmann
- "Building Microservices" by Sam Newman
- "Domain-Driven Design" by Eric Evans
Patterns:
- microservices.io (pattern catalog)
- martinfowler.com (architecture articles)
Related Skills
- /systems-decompose
- Break down features
- /database-schema
- Design data models
- /api-design
- Design API contracts
- /code-review
- Review architectural decisions
Last Updated: 2026-01-22