Architecture Design
Design system architecture and make strategic technical decisions.
Core Principle
Good architecture enables change while maintaining simplicity.
Name
han-core:architect - Design system architecture and high-level technical strategy
Synopsis
/architect [arguments]
Architecture vs Planning
Architecture Design (this skill):
-
Strategic: "How should the system be structured?"
-
Component interactions and boundaries
-
Technology and pattern choices
-
Long-term implications
-
System-level decisions
Technical Planning:
-
Tactical: "How do I implement feature X?"
-
Specific implementation tasks
-
Execution details
-
Short-term focus
Use /architect when:
-
Designing new systems or subsystems
-
Significant system change (new subsystem, major refactor)
-
Affects multiple components or teams
-
Major refactors affecting multiple components
-
Technology selection decisions
-
Defining system boundaries and interfaces
-
Long-term technical strategy needed
-
Need to evaluate multiple approaches
-
Decisions have broad impact
Use /plan when:
-
Implementing within existing architecture
-
Implementing specific feature within existing architecture
-
Tactical execution planning
-
Breaking down known work
-
Architecture is already decided
-
Task sequencing and execution
Architecture Process
- Understand Context
Business context:
-
What problem are we solving?
-
Who are the users?
-
What are the business goals?
-
What are the success metrics?
Technical context:
-
What exists today?
-
What constraints exist?
-
What must we integrate with?
-
What scale must we support?
Team context:
-
What's our expertise?
-
What can we maintain?
-
What's our velocity?
- Gather Requirements
Functional requirements:
-
What must the system do?
-
What are the features?
-
What are the user scenarios?
Non-functional requirements:
-
Performance: Response time, throughput
-
Scalability: Expected load, growth
-
Availability: Uptime requirements
-
Security: Compliance, data protection
-
Maintainability: Team size, skills
-
Cost: Budget constraints
Example:
Requirements
Functional
- Users can search products by name/category
- Users can add items to cart
- Users can checkout and pay
Non-Functional
- Search response time < 200ms (p95)
- Support 10,000 concurrent users
- 99.9% uptime
- PCI DSS compliant for payments
- Team of 5 developers can maintain
- Identify Constraints
Technical constraints:
-
Must use existing authentication system
-
Must integrate with legacy inventory system
-
Database must be PostgreSQL (existing infrastructure)
Business constraints:
-
Budget for infrastructure
-
Must support EU data residency
Team constraints:
-
Team experienced in Python, less in Go
-
No DevOps specialist on team
-
Remote team across timezones
- Consider Alternatives
Never design in a vacuum - consider options:
Example: Data storage choice
Option 1: PostgreSQL
-
Pros: Team knows it, ACID guarantees, rich query support
-
Cons: Vertical scaling limits, setup complexity
Option 2: MongoDB
-
Pros: Flexible schema, horizontal scaling
-
Cons: Team unfamiliar, eventual consistency
Option 3: DynamoDB
-
Pros: Fully managed, auto-scaling
-
Cons: Vendor lock-in, query limitations, cost at scale
Decision: PostgreSQL
-
Team expertise outweighs scaling concerns
-
Can re-evaluate if scale becomes issue
-
Faster initial development
- Design System Structure
Define components and their responsibilities:
┌─────────────────────────────────────────────┐ │ Client Apps │ │ (Web, iOS, Android) │ └────────────────┬────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────┐ │ API Gateway / Load Balancer │ └────────────────┬────────────────────────────┘ │ ┌────────┴────────┐ ▼ ▼ ┌───────────────┐ ┌───────────────┐ │ Auth │ │ Core API │ │ Service │ │ Service │ └───────┬───────┘ └───────┬───────┘ │ │ │ ┌────────┴────────┐ │ ▼ ▼ │ ┌──────────────┐ ┌──────────────┐ │ │ PostgreSQL │ │ Redis │ │ │ (Primary) │ │ (Cache) │ │ └──────────────┘ └──────────────┘ │ ▼ ┌───────────────┐ │ User DB │ └───────────────┘
Component descriptions:
Components
API Gateway
Responsibility: Route requests, rate limiting, authentication Technology: Nginx Dependencies: Auth Service, Core API Service Scale: 2-3 instances behind load balancer
Auth Service
Responsibility: User authentication, session management, JWT issuing Technology: Python (Flask), PostgreSQL API: REST Scale: Stateless, 2-N instances
Core API Service
Responsibility: Business logic, data access, external integrations Technology: Python (FastAPI), PostgreSQL, Redis API: REST Scale: Stateless, 2-N instances
PostgreSQL
Responsibility: Primary data store Scale: Primary with read replica
Redis
Responsibility: Session storage, caching, rate limiting Scale: Cluster mode (3 nodes)
- Define Interfaces
API contracts:
API Design
POST /api/auth/login
Purpose: Authenticate user, issue JWT
Request:
{
"email": "user@example.com",
"password": "secure_password"
}
Response (200):
{
"token": "eyJ...",
"user": {
"id": "123",
"email": "user@example.com",
"name": "John Doe"
}
}
Errors:
- 400: Invalid request
- 401: Invalid credentials
- 429: Rate limit exceeded
### 7. Plan for Failure
**What can go wrong?**
- Database unavailable
- External API down
- Network partition
- High load
- Data corruption
**Mitigation strategies:**
- Retry with exponential backoff
- Circuit breakers for external services
- Graceful degradation
- Health checks and monitoring
- Database backups
**Example:**
```markdown
## Failure Scenarios
### Database Unavailable
**Impact:** Cannot read/write data
**Mitigation:**
- Read replica failover (automated)
- Circuit breaker after 3 failures
- Cache serves stale data for 5 minutes
- User sees degraded experience message
**Recovery:** Manual failover to replica, fix primary
### External Payment API Down
**Impact:** Cannot process payments
**Mitigation:**
- Retry 3 times with exponential backoff
- Queue payments for later processing
- User notified of delay
- Alert on-call engineer
**Recovery:** Process queued payments once API recovers
8. Document Decisions
Architecture Decision Record (ADR):
# ADR-001: Use PostgreSQL for Primary Database
**Status:** Accepted
**Date:** 2024-01-15
**Deciders:** Tech Lead, Backend Team
## Context
We need to choose a primary database for user data, products, and orders.
Requirements:
- Strong consistency (ACID)
- Complex queries (joins, aggregations)
- < 200ms query time for 90% of queries
- Support 100k users initially
## Decision
Use PostgreSQL as primary database.
## Alternatives Considered
### MongoDB
- **Pros:** Flexible schema, horizontal scaling
- **Cons:** Team unfamiliar, eventual consistency issues
- **Why not:** Team expertise more valuable than flexibility
### DynamoDB
- **Pros:** Managed service, auto-scaling
- **Cons:** Vendor lock-in, limited query capability, cost
- **Why not:** Query limitations would hurt development velocity
### MySQL
- **Pros:** Similar to PostgreSQL, team knows it
- **Cons:** Less feature-rich than PostgreSQL
- **Why not:** PostgreSQL offers JSON support, better full-text search
## Consequences
**Positive:**
- Team can be productive immediately
- Strong consistency guarantees
- Rich query capabilities
- JSON support for flexible data
**Negative:**
- Vertical scaling limits (mitigated: can add read replicas)
- More complex than managed services (mitigated: use RDS)
- Higher operational overhead
**Trade-offs:**
- Chose familiarity over horizontal scaling
- Chose rich queries over eventual consistency
- Can re-evaluate if scale requirements change
## Validation
- Team confirmed expertise in PostgreSQL
- Load testing shows meets performance requirements
- Cost analysis shows acceptable for first year
Architecture Document Structure
# Architecture Design: [System/Feature Name]
## Context
### Problem Statement
[What business problem are we solving?]
### Goals
[What are we trying to achieve?]
### Non-Goals
[What are we explicitly NOT trying to achieve?]
### Requirements
**Functional:**
- [Requirement 1]
- [Requirement 2]
**Non-Functional:**
- Performance: [e.g., < 200ms response time]
- Scalability: [e.g., handle 10k concurrent users]
- Security: [e.g., PCI compliance]
- Maintainability: [e.g., easy to modify]
### Constraints
[Technical, time, resource, or business constraints]
## Current Architecture
[What exists today? What needs to change?]
## Proposed Architecture
### High-Level Design
[Diagram or description of system components]
┌─────────────┐ ┌─────────────┐ ┌──────────────┐
│ Client │────▶│ API │────▶│ Database │
└─────────────┘ └─────────────┘ └──────────────┘
│
▼
┌─────────────┐
│ Cache │
└─────────────┘
### Components
#### Component A
**Responsibility:** [What it does]
**Interface:** [How others interact with it]
**Dependencies:** [What it depends on]
**Technology:** [Implementation stack]
#### Component B
[Similar structure...]
### Data Flow
[How data moves through the system]
### API Design
[Key endpoints, schemas, contracts]
### Data Model
[Database schema, key entities]
### Security Model
[Authentication, authorization, data protection]
## Alternative Approaches Considered
### Alternative 1: [Approach name]
**Pros:**
- [Advantage 1]
- [Advantage 2]
**Cons:**
- [Disadvantage 1]
- [Disadvantage 2]
**Why not chosen:** [Reasoning]
### Alternative 2: [Another approach]
[Similar structure...]
## Decision Rationale
### Why This Architecture?
[Explain the key decisions and trade-offs]
### Trade-offs Accepted
[What we gave up for what benefits]
### Assumptions
[What we're assuming to be true]
## Implementation Strategy
### Phase 1: [Foundation]
[What to build first]
### Phase 2: [Core Features]
[Next phase]
### Phase 3: [Enhancement]
[Final phase]
### Migration Strategy
[If replacing existing system, how to transition?]
## Risks & Mitigation
| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| [Risk 1] | High | Medium | [How to mitigate] |
| [Risk 2] | Medium | Low | [How to mitigate] |
## Testing Strategy
[How will we validate this architecture?]
## Monitoring & Observability
[How will we know if it's working?]
## Success Metrics
[How will we measure success?]
## Open Questions
[What still needs to be resolved?]
## References
[Links to research, related docs, RFCs]
Architecture Principles
1. Simplicity
Start simple, add complexity only when needed.
BAD: Microservices from day 1 with 20 services
GOOD: Start with monolith, split when needed
Apply YAGNI: You Aren't Gonna Need It
- Don't build for hypothetical future
- Add when actually needed
- Simpler is easier to maintain
2. Separation of Concerns
Each component has one clear responsibility.
GOOD:
- Auth Service: Authentication only
- User Service: User profile management
- Order Service: Order processing
BAD:
- God Service: Does everything
Apply SOLID principles:
- Single Responsibility
- Open/Closed
- Liskov Substitution
- Interface Segregation
- Dependency Inversion
3. Loose Coupling
Components depend on interfaces, not implementations.
// BAD: Tight coupling
class OrderService {
constructor(private db: PostgresDatabase) {}
}
// GOOD: Loose coupling
class OrderService {
constructor(private db: Database) {} // Interface
}
Benefits:
- Easier to test (mock interface)
- Easier to swap implementations
- Components can evolve independently
4. High Cohesion
Related functionality stays together.
GOOD:
user/
- create_user.ts
- update_user.ts
- delete_user.ts
- user_repository.ts
BAD:
create/
- create_user.ts
- create_order.ts
update/
- update_user.ts
- update_order.ts
5. Explicit Over Implicit
Make dependencies and contracts clear.
// BAD: Implicit dependency
function processOrder(orderId: string) {
const db = global.database // Where does this come from?
// ...
}
// GOOD: Explicit dependency
function processOrder(
orderId: string,
db: Database,
logger: Logger
) {
// Dependencies are clear
}
6. Fail Fast
Detect and report errors early.
// BAD: Silent failure
function divide(a: number, b: number) {
if (b === 0) return 0 // Wrong!
return a / b
}
// GOOD: Fail fast
function divide(a: number, b: number) {
if (b === 0) {
throw new Error('Division by zero')
}
return a / b
}
7. Design for Testability
Make it easy to test.
// BAD: Hard to test
class OrderService {
processOrder(orderId: string) {
const db = new PostgresDatabase() // Can't mock
const api = new PaymentAPI() // Can't mock
// ...
}
}
// GOOD: Easy to test
class OrderService {
constructor(
private db: Database, // Can inject mock
private api: PaymentAPI // Can inject mock
) {}
processOrder(orderId: string) {
// ...
}
}
Common Architecture Patterns
Layered Architecture
┌─────────────────────┐
│ Presentation │ (UI, API controllers)
├─────────────────────┤
│ Business Logic │ (Domain, services)
├─────────────────────┤
│ Data Access │ (Repositories, ORMs)
├─────────────────────┤
│ Database │ (Storage)
└─────────────────────┘
When to use: Simple to moderate complexity
Hexagonal Architecture (Ports & Adapters)
┌───────────────────────┐
│ External Systems │
│ (UI, DB, APIs) │
└──────────┬────────────┘
│
┌──────────▼────────────┐
│ Adapters │ (Implementation)
│ (REST, PostgreSQL) │
└──────────┬────────────┘
│
┌──────────▼────────────┐
│ Ports │ (Interfaces)
│ (IUserRepo, IAuth) │
└──────────┬────────────┘
│
┌──────────▼────────────┐
│ Core Domain │ (Business logic)
│ (Pure logic) │
└───────────────────────┘
When to use: Want to isolate business logic, multiple frontends
Microservices
┌─────────┐ ┌─────────┐ ┌─────────┐
│ User │ │ Order │ │ Payment │
│ Service │ │ Service │ │ Service │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
└────────────┴────────────┘
│
┌───────▼────────┐
│ Message Bus │
│ (Event-driven)│
└────────────────┘
When to use: Large team, need independent deploy, clear boundaries
Avoid when: Small team, unclear boundaries, early stage
Event-Driven Architecture
┌─────────┐ ┌─────────────┐ ┌─────────┐
│Producer │──────▶│ Event Bus │──────▶│Consumer │
└─────────┘ └─────────────┘ └─────────┘
When to use: Async processing, decoupled systems, audit trails
Architectural Patterns to Consider
System Patterns:
- Layered architecture
- Microservices vs monolith
- Event-driven architecture
- CQRS (Command Query Responsibility Segregation)
- Hexagonal architecture
Data Patterns:
- Database per service
- Shared database
- Event sourcing
- CQRS
- Cache-aside
Integration Patterns:
- API Gateway
- Service mesh
- Message queue
- Pub/sub
- GraphQL federation
Anti-Patterns
Premature Optimization
Don't optimize for scale you don't have.
BAD: Build microservices for 100 users
GOOD: Start with monolith, split when needed
Resume-Driven Architecture
Don't choose technology to pad resume.
BAD: "I want to learn Kubernetes, let's use it"
GOOD: "Kubernetes fits our scale needs"
Distributed Monolith
Microservices that are tightly coupled.
BAD: Service A can't deploy without Service B
GOOD: Services are independently deployable
Big Ball of Mud
No structure, everything depends on everything.
BAD: Any code can call any other code
GOOD: Clear layers and boundaries
Analysis Paralysis
Over-analyzing, never shipping.
BAD: Spend months on perfect architecture
GOOD: Design enough to start, iterate
Architecture Review Checklist
- Business goals clearly understood
- Functional requirements documented
- Non-functional requirements defined
- Constraints identified
- Multiple alternatives considered
- Trade-offs explicitly stated
- Component responsibilities clear
- Interfaces well-defined
- Data flow documented
- Failure scenarios planned for
- Security model defined
- Scalability addressed
- Testability designed in
- Decisions documented (ADRs)
- Implementation phases outlined
- Team can implement and maintain
- Success metrics defined
Examples
When the user says:
- "Design the architecture for our multi-tenant system"
- "How should we structure our microservices?"
- "Plan the technical approach for real-time notifications"
- "Design the data model for our marketplace"
- "Create architecture for migrating from monolith to services"
Integration with Other Skills
- Apply solid-principles - Guide component design
- Apply simplicity-principles - KISS, YAGNI
- Apply orthogonality-principle - Independent components
- Apply structural-design-principles - Composition patterns
- Use technical-planning - For implementation after design
Notes
- Use TaskCreate to track architecture design steps
- Apply all relevant design principle skills
- Create diagrams (ASCII art or reference drawing tools)
- Architecture evolves - document decisions and changes
- Consider using /plan for detailed implementation after architecture is approved
- Archive decisions as ADRs for future reference
Remember
- Simplicity first - Start simple, add complexity when needed
- Document decisions - Future you will thank you
- Consider alternatives - Never the first idea only
- State trade-offs - Every decision has consequences
- Design for change - Systems evolve
The best architecture is the one that's simple enough to ship and flexible enough to evolve.