DDIA Principles

Apply the principles from Martin Kleppmann's Designing Data-Intensive Applications (DDIA) to make informed decisions about data systems.

Core Framework: The Three Concerns

Every data system decision maps to these concerns:

Reliability - System works correctly despite faults
Scalability - System handles growth in data, traffic, or complexity
Maintainability - System remains operable and evolvable over time

When reviewing designs or making technology choices, evaluate against all three.

Quick Decision Patterns

Database Selection

Need	Start With	Graduate To
General CRUD, transactions	PostgreSQL	PostgreSQL (it scales further than you think)
Document-shaped data, flexible schema	PostgreSQL JSONB	MongoDB if document model is primary
Key-value, caching	Redis	Redis Cluster
Full-text search	PostgreSQL FTS	Elasticsearch when FTS is primary workload
Time-series, metrics	TimescaleDB	ClickHouse at scale
Graph relationships	PostgreSQL + recursive CTEs	Neo4j when traversals dominate

Default recommendation: PostgreSQL. It handles more use cases than people realize. Only move to specialized databases when PostgreSQL becomes the bottleneck for a specific workload.
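As an illustration of the "Graph relationships" row, here is a minimal sketch of a recursive CTE that walks a relationship graph. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, which accepts the same WITH RECURSIVE query; with a PostgreSQL driver such as psycopg the placeholder would be %s rather than ?. The follows table and its contents are hypothetical.

```python
import sqlite3

def reachable_from(conn, start):
    """Return every node reachable from `start` by following edges.

    UNION (rather than UNION ALL) discards rows already seen, so the
    recursion terminates even when the graph contains cycles.
    """
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT dst FROM follows WHERE src = ?
            UNION
            SELECT f.dst FROM follows f JOIN reach r ON f.src = r.node
        )
        SELECT node FROM reach ORDER BY node
        """,
        (start,),
    ).fetchall()
    return [row[0] for row in rows]

# Hypothetical sample data: alice -> bob -> carol -> alice (a cycle), dave -> alice
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("dave", "alice")],
)
conn.commit()

print(reachable_from(conn, "dave"))
```

If queries like this start to dominate the workload or need many hops over large graphs, that is the signal the table describes for evaluating a graph database.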

Consistency vs Availability

Use this when designing distributed features:

Strong consistency needed?
├── Yes (money, inventory, unique constraints)
│   ├── Use transactions, accept higher latency
│   └── Single leader or consensus protocol
└── No (feeds, caches, analytics)
    ├── Eventual consistency acceptable
    └── Can use multi-leader or leaderless

When to Add a Message Queue

Add a queue when:

Tasks take >100ms and user doesn't need immediate result
Need to decouple producers from consumers
Need retry logic with backoff
Processing can be delayed during load spikes

Don't add a queue just because "microservices." A direct function call or HTTP request is simpler when synchronous processing works.
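For the "retry logic with backoff" item in the list above, the following is a minimal in-process sketch, not a definitive implementation. The function name, parameters, and default values are assumptions chosen for illustration. A queue would additionally persist the job between attempts, which this sketch does not do.

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=2.0,
                       retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call fn(); on a retryable error, wait and try again.

    The wait before retry number `attempt` is a random value between 0 and
    min(max_delay, base_delay * 2**attempt), so the ceiling doubles each
    time and the randomness keeps many clients from retrying in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error instead of hiding it
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
```

The sleep parameter exists so tests can inject a fake and run without real delays. Retrying is only safe when the wrapped operation is idempotent, since a timed-out call may still have taken effect.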

Architecture Review Checklist

When reviewing a data system design, ask:

Data Model

Does the data model match how data is queried? (Not just how it's structured logically)
Are relationships handled appropriately? (Normalize for integrity, denormalize for read performance)
Is there a clear schema evolution strategy?

Reliability

What happens when the database is unavailable?
What happens when a downstream service times out?
Are writes idempotent where possible?
Is there a backup/restore strategy?
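One way to make a write idempotent is to have the client attach an idempotency key and let a uniqueness constraint reject repeats. A minimal sketch, using sqlite3 as a stand-in database; the payments table, its columns, and the function name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments ("
    " idempotency_key TEXT PRIMARY KEY,"
    " amount_cents INTEGER NOT NULL)"
)
conn.commit()

def record_payment(conn, idempotency_key, amount_cents):
    """Insert the payment once; repeating the call with the same key is a no-op.

    Returns True if this call created the row, False if it already existed.
    """
    try:
        with conn:  # commits on success, rolls back if the insert raises
            conn.execute(
                "INSERT INTO payments (idempotency_key, amount_cents) VALUES (?, ?)",
                (idempotency_key, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate key: an earlier attempt already recorded this payment.
        return False
```

With this in place, a client that times out and resends the same request cannot charge twice, because the second insert hits the primary-key constraint.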

Consistency

What consistency guarantees does the system actually need?
Where are the transaction boundaries?
What happens during partial failures?
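To make the transaction-boundary and partial-failure questions concrete, here is a minimal sketch of a transfer where both updates apply or neither does. It uses sqlite3 as a stand-in database; the accounts table and the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst inside one transaction."""
    with conn:  # transaction boundary: commit on success, rollback on any exception
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        # If this debit violates the CHECK constraint, the credit above is rolled back too.
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )

def balances(conn):
    """Return {account_id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

The credit is deliberately issued before the debit so that an overdraft fails on the second statement, which shows the first statement being undone rather than left half-applied.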

Scalability

What's the expected data growth rate?
Which operations will become slow first?
Are there natural partition keys if sharding becomes needed?
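If a natural partition key exists (a tenant or user ID, for example), routing can be sketched as a stable hash of that key. This is a minimal illustration under assumed names (shard_for, num_shards), not a production scheme.

```python
import hashlib

def shard_for(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard index in [0, num_shards).

    hashlib is used instead of the built-in hash(), which is randomized
    per process for strings and so would route the same key differently
    after a restart.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo hashing reassigns most keys whenever num_shards changes, so schemes that fix the number of partitions up front and move whole partitions between nodes are generally easier to rebalance.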

Code Review Lens

When reviewing code that handles data:

Red Flags

Read-modify-write without transactions or optimistic locking
Assuming network calls will succeed
Silent data loss on errors
Unbounded queries without pagination
N+1 query patterns
Mixing business logic with data access in ways that prevent batching
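The first red flag, read-modify-write without protection, can be addressed with optimistic locking: store a version number with the row and make the write conditional on it. A minimal sketch using sqlite3 as a stand-in database; the counters table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE counters ("
    " id TEXT PRIMARY KEY,"
    " value INTEGER NOT NULL,"
    " version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO counters VALUES ('page_views', 0, 0)")
conn.commit()

def read_counter(conn, counter_id):
    """Return (value, version) for the counter."""
    return conn.execute(
        "SELECT value, version FROM counters WHERE id = ?", (counter_id,)
    ).fetchone()

def write_if_unchanged(conn, counter_id, new_value, expected_version):
    """Compare-and-set: update only if the version is still the one we read."""
    with conn:
        cur = conn.execute(
            "UPDATE counters SET value = ?, version = version + 1"
            " WHERE id = ? AND version = ?",
            (new_value, counter_id, expected_version),
        )
    return cur.rowcount == 1  # 0 rows means another writer got there first

def increment(conn, counter_id, max_attempts=3):
    """Read-modify-write with retry when a concurrent writer wins the race."""
    for _ in range(max_attempts):
        value, version = read_counter(conn, counter_id)
        if write_if_unchanged(conn, counter_id, value + 1, version):
            return value + 1
    raise RuntimeError("too much contention; giving up")
```

A writer holding a stale version updates zero rows instead of silently overwriting the newer value, which turns a lost update into a detectable conflict.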

Patterns to Encourage

Explicit transaction boundaries
Retry logic with exponential backoff
Idempotency keys for mutations
Cursor-based pagination for large datasets
Bulk operations where applicable
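Cursor-based pagination from the list above can be sketched as follows: instead of OFFSET, each page asks for rows after the last ID the client has seen. This uses sqlite3 as a stand-in database; the events table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO events (id, body) VALUES (?, ?)",
    [(i, f"event {i}") for i in range(1, 8)],
)
conn.commit()

def fetch_page(conn, after_id=0, limit=3):
    """Return (rows, next_cursor); next_cursor is None on the last page."""
    rows = conn.execute(
        "SELECT id, body FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit + 1),  # one extra row tells us whether another page exists
    ).fetchall()
    has_more = len(rows) > limit
    rows = rows[:limit]
    next_cursor = rows[-1][0] if has_more else None
    return rows, next_cursor

def iter_all(conn, limit=3):
    """Yield every row, one bounded page at a time."""
    cursor = 0
    while cursor is not None:
        rows, cursor = fetch_page(conn, cursor, limit)
        yield from rows
```

Because the query seeks on an indexed column, the cost of a page does not grow with how deep into the result set it is, and rows inserted or deleted between requests do not shift later pages.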

Technology Trade-offs Reference

For detailed analysis of specific technology choices, see:

references/storage-engines.md - LSM vs B-tree, when each shines
references/replication.md - Leader-based vs leaderless, consistency models
references/encoding.md - JSON vs binary formats, schema evolution

Common Anti-Patterns

"We need microservices"

Before splitting into services, ask: Is the complexity of distributed transactions worth it? Monoliths with good module boundaries often serve startups better.

"Let's use Kafka"

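Kafka is powerful but operationally complex. For most startups, Redis Streams or a managed queue (SQS, Cloud Pub/Sub) is a simpler starting point. PostgreSQL LISTEN/NOTIFY also works when a lost notification is acceptable, since notifications are delivered only to sessions that are listening at the time.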

"We'll just cache everything"

Caching adds complexity: invalidation, consistency, cold starts. First optimize queries, add indexes, denormalize read models. Cache as a last resort.

"NoSQL for scale"

Modern PostgreSQL with proper indexing handles more than most startups will ever need. Choose NoSQL for data model fit, not scale anxiety.

Practical Guidance

For New Systems

Start with PostgreSQL
Use transactions for data integrity
Add caching/queues only when measured need arises
Design for the data access patterns you have, not ones you might have

For Growing Systems

Profile before optimizing
Vertical scaling is simpler than horizontal; use it first
Partition by natural boundaries when needed
Consider read replicas before complex architectures

For System Rewrites

Strangler fig pattern over big bang
Keep data in sync during migration
Verify with shadow reads/writes
Rollback capability is essential
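Shadow reads from the list above can be sketched as a wrapper that keeps serving from the old system while comparing the new system's answer on the side. All names here (shadow_read, read_old, read_new, report_mismatch) are hypothetical; the callables stand in for whatever clients the two systems expose.

```python
def shadow_read(key, read_old, read_new, report_mismatch):
    """Return the old system's value; report any disagreement with the new one.

    The old system stays the source of truth, so a bug or outage in the
    new path can be observed without affecting what callers receive.
    """
    old_value = read_old(key)
    try:
        new_value = read_new(key)
        if new_value != old_value:
            report_mismatch(key, old_value, new_value)
    except Exception as exc:
        # The new path must never break the user-facing read.
        report_mismatch(key, old_value, exc)
    return old_value
```

Once the mismatch rate stays at zero for long enough, reads can be switched to the new system, with the old one kept available as the rollback path.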
