Data Modeling Mistakes
- Design tables around queries, not entities—denormalization is mandatory, not optional
- One table per query pattern—Cassandra has no JOINs; duplicate data across tables
- Partition key determines data distribution: all rows with the same partition key live on the same replica nodes
- Wide partitions kill performance—keep under 100MB; add time bucket to partition key if growing
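Here is a minimal sketch of the time-bucket idea. It assumes a hypothetical sensor-readings table declared as PRIMARY KEY ((sensor_id, day), ts), so each sensor gets one partition per UTC day. The helper names, the day-sized bucket and the workload numbers are illustrative assumptions, not a Cassandra API. The arithmetic checks that one bucket stays under the 100MB guideline.

```python
from datetime import datetime, timezone
from typing import Tuple

# Guideline from the notes above: keep each partition under ~100MB.
MAX_PARTITION_BYTES = 100 * 1024 * 1024


def day_bucket(ts: datetime) -> str:
    """UTC day string used as the time-bucket part of the partition key."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def partition_key(sensor_id: str, ts: datetime) -> Tuple[str, str]:
    """Hypothetical compound key for PRIMARY KEY ((sensor_id, day), ts)."""
    return (sensor_id, day_bucket(ts))


def bucket_bytes(rows_per_second: float, row_bytes: int, bucket_seconds: int) -> float:
    """Rough partition size: rows written during one bucket times average row size."""
    return rows_per_second * bucket_seconds * row_bytes


# Assumed workload: one reading per second, ~200 bytes per row, one bucket per day.
size = bucket_bytes(rows_per_second=1, row_bytes=200, bucket_seconds=86_400)
print(partition_key("sensor-42", datetime(2024, 1, 15, 13, 30, tzinfo=timezone.utc)))  # ('sensor-42', '2024-01-15')
print(f"{size / 1024 / 1024:.1f} MiB per partition, under limit: {size < MAX_PARTITION_BYTES}")
```

With monthly buckets, the same estimate would be about 30 times larger (roughly 494 MiB), well past the guideline. This is why the bucket width should follow the write rate.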
Primary Key Traps
- PRIMARY KEY (a, b, c): a is the partition key; b and c are clustering columns
- PRIMARY KEY ((a, b), c): (a, b) together form a compound partition key
- Clustering columns define the sort order within a partition; queries must respect this order
- Can't query by clustering column without partition key—unlike SQL indexes
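The two key forms can be made concrete with a small sketch that splits a PRIMARY KEY clause into partition-key and clustering columns. The function split_primary_key is a hypothetical toy parser written for illustration. It handles only plain column names, not full CQL syntax.

```python
from typing import List, Tuple


def split_primary_key(clause: str) -> Tuple[List[str], List[str]]:
    """Hypothetical toy parser: 'PRIMARY KEY (...)' -> (partition key, clustering columns)."""
    # Everything between the outermost parentheses, e.g. "(a, b), c".
    body = clause[clause.index("(") + 1 : clause.rindex(")")].strip()
    if body.startswith("("):
        # Compound partition key: the inner parentheses hold all partition key columns.
        close = body.index(")")
        partition = [c.strip() for c in body[1:close].split(",")]
        rest = body[close + 1 :]
    else:
        # Simple partition key: only the first column is the partition key.
        first, _, rest = body.partition(",")
        partition = [first.strip()]
    clustering = [c.strip() for c in rest.split(",") if c.strip()]
    return partition, clustering


print(split_primary_key("PRIMARY KEY (a, b, c)"))    # (['a'], ['b', 'c'])
print(split_primary_key("PRIMARY KEY ((a, b), c)"))  # (['a', 'b'], ['c'])
```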
Query Restrictions
- WHERE must include the full partition key; a partial partition key fails unless ALLOW FILTERING is added
- ALLOW FILTERING scans all nodes: never use it in production; redesign the table instead
- Clustering columns must be restricted in order, and a range is allowed only on the last one restricted: with PRIMARY KEY (a, b, c), WHERE a = ? AND b > ? works, WHERE a = ? AND c > ? doesn't
- IN on the partition key hits multiple nodes: expensive; prefer single-partition queries
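The restriction rules above can be expressed as a small checker. This is a simplified model that supports only "=" and range operators. Real CQL validation also covers IN, token(), secondary indexes and more. Treat check_where as a hypothetical illustration, using PRIMARY KEY (a, b, c) from the examples above.

```python
from typing import Dict, List, Optional

RANGE_OPS = {"<", "<=", ">", ">="}


def check_where(partition: List[str], clustering: List[str],
                restrictions: Dict[str, str]) -> Optional[str]:
    """Return None if the WHERE clause needs no ALLOW FILTERING, else the reason it does.

    Hypothetical helper: restrictions maps column -> operator, e.g. {"a": "=", "b": ">"}.
    """
    # The full partition key must be restricted with equality.
    for col in partition:
        if restrictions.get(col) != "=":
            return f"partition key column {col!r} must be restricted with ="
    # Non-key columns would need a secondary index or ALLOW FILTERING.
    extra = sorted(set(restrictions) - set(partition) - set(clustering))
    if extra:
        return f"non-key columns {extra} need an index or ALLOW FILTERING"
    # Restricted clustering columns must form a prefix of the declared order (no gaps)...
    restricted = [c for c in clustering if c in restrictions]
    if restricted != clustering[: len(restricted)]:
        return "clustering columns must be restricted in declaration order"
    # ...and only the last restricted one may use a range operator.
    for col in restricted[:-1]:
        if restrictions[col] in RANGE_OPS:
            return f"range on {col!r} is only allowed on the last restricted clustering column"
    return None


# PRIMARY KEY (a, b, c): partition key a, clustering columns b and c.
print(check_where(["a"], ["b", "c"], {"a": "=", "b": ">"}))  # None
print(check_where(["a"], ["b", "c"], {"a": "=", "c": ">"}))  # clustering columns must be restricted in declaration order
```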
Consistency Levels
- QUORUM for most operations: a majority of replicas; balances consistency and availability
- LOCAL_QUORUM for multi-datacenter clusters: avoids cross-DC latency
- ONE for pure availability: may read stale data; fine for caches, bad for critical reads
- Write and read consistency must overlap for strong consistency (R + W > RF): QUORUM + QUORUM is safe
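The overlap rule is simple arithmetic. With replication factor RF, a read answered by R replicas and a write acknowledged by W replicas must share at least one replica, which holds exactly when R + W > RF. The minimal sketch below assumes a single datacenter; for LOCAL_QUORUM, RF would be the local DC's replication factor. quorum() mirrors Cassandra's floor(RF / 2) + 1 majority, and the other function names are illustrative.

```python
def quorum(rf: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return rf // 2 + 1


def replicas_for(level: str, rf: int) -> int:
    """Replicas that must respond at a consistency level (single-DC illustration)."""
    return {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": quorum(rf), "ALL": rf}[level]


def strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read set overlaps every write set: R + W > RF."""
    return replicas_for(read_level, rf) + replicas_for(write_level, rf) > rf


for read, write in [("QUORUM", "QUORUM"), ("ONE", "ONE"), ("ONE", "ALL")]:
    print(read, write, strongly_consistent(read, write, rf=3))
# QUORUM QUORUM True
# ONE ONE False
# ONE ALL True
```

ONE + ALL also overlaps, but any single node being down then blocks writes. QUORUM + QUORUM keeps the overlap while tolerating one failed replica at RF = 3.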
Tombstones (Silent Performance Killer)
- DELETE creates a tombstone, not actual deletion—tombstones persist until compaction
- Mass deletes destroy read performance—thousands of tombstones scanned per query
- TTL also creates tombstones—don't use short TTLs with high write volume
- Check with nodetool cfstats -H keyspace.table (tablestats in newer versions): the tombstones-per-slice lines show the problem
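The following is a toy model of why mass deletes hurt reads. A read must step over every tombstone before it reaches live data. Compaction may drop a tombstone only after the table's gc_grace_seconds has elapsed since the deletion; the default is 864000 s, or 10 days. Cell, read_first_live() and purgeable() are hypothetical names. This is a sketch of the cost, not of Cassandra's storage engine.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Cassandra's default table option gc_grace_seconds: 864000 s (10 days).
GC_GRACE_SECONDS = 864_000


@dataclass
class Cell:
    """Hypothetical toy row in a partition; value=None marks a tombstone."""
    clustering: int
    value: Optional[str]
    deleted_at: int = 0  # deletion time in seconds, used only for tombstones


def read_first_live(partition: List[Cell]) -> Tuple[Optional[Cell], int]:
    """Scan in clustering order; return the first live row and the tombstones skipped."""
    skipped = 0
    for cell in sorted(partition, key=lambda c: c.clustering):
        if cell.value is None:
            skipped += 1  # tombstones still have to be read and discarded
            continue
        return cell, skipped
    return None, skipped


def purgeable(cell: Cell, now: int, gc_grace: int = GC_GRACE_SECONDS) -> bool:
    """A tombstone becomes eligible for removal by compaction once gc_grace has passed."""
    return cell.value is None and now >= cell.deleted_at + gc_grace


# 9,999 deleted rows in front of one live row, all deleted at t=1000.
partition = [Cell(i, None, deleted_at=1_000) for i in range(9_999)] + [Cell(9_999, "live")]
row, skipped = read_first_live(partition)
print(row.clustering, skipped)  # 9999 9999
print(purgeable(partition[0], now=1_000 + 3_600))  # False: still inside gc_grace
```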
Batch Misuse
- UNLOGGED BATCH is not faster—use only for atomic writes to same partition
- LOGGED BATCH for multi-partition atomicity—adds coordination overhead
- Don't batch unrelated writes—hurts coordinator; send individual async writes
- Batch size limit ~50KB—larger batches fail or timeout
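Here is a sketch of the "batch only within a partition" advice. A partition is identified by its table plus its partition key. Pending writes are grouped on that pair. Each group becomes one or more size-capped batches, and a lone write is sent individually. Write and plan_writes() are hypothetical helpers, and the 5KB budget is an assumption based on Cassandra's default warn threshold. With the DataStax Python driver, a "batch" entry would map to BatchStatement(batch_type=BatchType.UNLOGGED) and a "single" entry to session.execute_async.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

# Assumed per-batch budget: Cassandra's default warn threshold is 5KB (fail: 50KB).
MAX_BATCH_BYTES = 5 * 1024


class Write(NamedTuple):
    """Hypothetical pending write: target table, partition key values, payload size."""
    table: str
    partition_key: tuple
    size_bytes: int


def plan_writes(writes: List[Write]) -> List[Tuple[str, List[Write]]]:
    """Group writes by (table, partition key); groups become size-capped batches."""
    by_partition: Dict[tuple, List[Write]] = defaultdict(list)
    for w in writes:
        by_partition[(w.table, w.partition_key)].append(w)

    chunks: List[List[Write]] = []
    for group in by_partition.values():
        chunk: List[Write] = []
        size = 0
        for w in group:
            # Close the current batch before this write would exceed the budget.
            if chunk and size + w.size_bytes > MAX_BATCH_BYTES:
                chunks.append(chunk)
                chunk, size = [], 0
            chunk.append(w)
            size += w.size_bytes
        chunks.append(chunk)
    # A lone write is sent individually (e.g. async); larger chunks become UNLOGGED batches.
    return [("batch" if len(c) > 1 else "single", c) for c in chunks]


writes = [
    Write("user_events", (1,), 300),
    Write("user_events", (1,), 200),
    Write("user_events", (2,), 300),
]
print([(kind, len(group)) for kind, group in plan_writes(writes)])  # [('batch', 2), ('single', 1)]
```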
Anti-Patterns
- Secondary indexes on high-cardinality columns—scatter-gather query, slow
- Secondary indexes on frequently updated columns—creates tombstones
- SELECT *: always list columns; schema changes break queries
- UUID as partition key without time component: random distribution, hot spots during bulk loads
Lightweight Transactions
- IF NOT EXISTS / IF column = ?: uses Paxos, roughly 4x slower than a normal write
- Serial consistency for LWTs: SERIAL or LOCAL_SERIAL
- Don't use for counters or high-frequency updates: contention kills throughput
- Returns an [applied] boolean: always check whether the operation succeeded
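Checking [applied] matters because a failed condition is not an error. The statement returns normally with [applied] set to false, together with the current values of the existing row. The sketch below models that contract with an in-memory dictionary standing in for a table. The users store, insert_if_not_exists() and register() are hypothetical, shaped like the result row of INSERT ... IF NOT EXISTS.

```python
from typing import Any, Dict

# Hypothetical stand-in for a table keyed by username.
users: Dict[str, Dict[str, Any]] = {}


def insert_if_not_exists(username: str, email: str) -> Dict[str, Any]:
    """Model of INSERT ... IF NOT EXISTS: returns a row with an '[applied]' flag."""
    if username in users:
        # Condition failed: the result carries [applied] = False plus the existing row.
        return {"[applied]": False, **users[username]}
    users[username] = {"username": username, "email": email}
    return {"[applied]": True}


def register(username: str, email: str) -> bool:
    result = insert_if_not_exists(username, email)
    if not result["[applied]"]:
        # No exception is raised on a failed condition, so the caller must check.
        print(f"{username!r} already taken by {result['email']}")
        return False
    return True


print(register("alice", "alice@example.com"))  # True
print(register("alice", "other@example.com"))  # prints the conflict, then False
```

With the DataStax Python driver, the same flag is exposed on the result set as ResultSet.was_applied.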
Collections and Counters
- Sets/Lists/Maps stored with row—can't exceed 64KB, no pagination
- List prepend is anti-pattern—creates tombstones; use append or Set
- Counters require dedicated table—can't mix with regular columns
- Counter increment is not idempotent—retry may double-count
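A counter retry can double-count when the write is applied but the acknowledgement is lost. The client sees a timeout and cannot tell whether the increment happened. The toy simulation below injects that failure deterministically and shows a naive retry turning one +1 into +2. FlakyCounter, WriteTimeout and increment_with_retry() are hypothetical names. By contrast, retrying a plain SET of the same value leaves the same final state.

```python
class WriteTimeout(Exception):
    """Hypothetical client-side timeout: the server may or may not have applied the write."""


class FlakyCounter:
    """Hypothetical toy replica whose first acknowledgement is lost after the write is applied."""

    def __init__(self) -> None:
        self.value = 0
        self.drop_next_ack = True

    def increment(self, delta: int) -> None:
        self.value += delta  # applied on the server side...
        if self.drop_next_ack:
            self.drop_next_ack = False
            raise WriteTimeout()  # ...but the client never hears back


def increment_with_retry(counter: FlakyCounter, delta: int, attempts: int = 2) -> None:
    """Naive retry loop: re-sends the increment after a timeout."""
    for attempt in range(attempts):
        try:
            counter.increment(delta)
            return
        except WriteTimeout:
            if attempt == attempts - 1:
                raise


page_views = FlakyCounter()
increment_with_retry(page_views, 1)
print(page_views.value)  # 2, although only one view happened
```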
Compaction Strategies
- SizeTieredCompactionStrategy (default): good for write-heavy workloads, uses more disk space
- LeveledCompactionStrategy: better read latency, higher write amplification
- TimeWindowCompactionStrategy: for time-series data with TTL; reduces tombstone overhead
- Wrong strategy for the workload = degraded performance over time
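Switching strategy is a per-table ALTER TABLE. The helper below only builds the CQL string and executes nothing. The option names 'class', 'compaction_window_unit' and 'compaction_window_size' are TimeWindowCompactionStrategy options. The function name twcs_alter, the table name and the one-day default window are assumptions for a time-series table whose rows expire via TTL.

```python
def twcs_alter(table: str, window_unit: str = "DAYS", window_size: int = 1) -> str:
    """Hypothetical helper: build an ALTER TABLE statement switching a table to TWCS."""
    if window_unit not in {"MINUTES", "HOURS", "DAYS"}:
        raise ValueError(f"unsupported window unit: {window_unit}")
    if window_size < 1:
        raise ValueError("window size must be a positive integer")
    options = {
        "class": "TimeWindowCompactionStrategy",
        "compaction_window_unit": window_unit,
        "compaction_window_size": str(window_size),
    }
    # CQL map literal with single-quoted keys and values.
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {table} WITH compaction = {{{body}}};"


print(twcs_alter("metrics.sensor_readings"))
```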
Operations
- Run nodetool repair regularly: inconsistencies accumulate without repair
- nodetool status shows cluster health: UN (Up/Normal) is good, DN means the node is down
- Schema changes propagate eventually: wait for nodetool describecluster to show schema agreement
- Rolling restarts: one node at a time; wait for UN status before moving to the next
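For rolling restarts, the "wait for UN" step can be scripted by parsing nodetool status. This minimal sketch reads only the two-letter status/state code and the address at the start of each node line. The sample output is made up for illustration; real output has more columns, and their layout can vary by version. parse_status() and safe_to_restart_next() are hypothetical names.

```python
import re
from typing import Dict

# Status U(p)/D(own) followed by state N(ormal)/L(eaving)/J(oining)/M(oving).
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def parse_status(output: str) -> Dict[str, str]:
    """Map node address -> two-letter code from `nodetool status` text."""
    nodes: Dict[str, str] = {}
    for line in output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            code, address = match.groups()
            nodes[address] = code
    return nodes


def safe_to_restart_next(output: str) -> bool:
    """Move on only when at least one node is listed and every node reports UN."""
    nodes = parse_status(output)
    return bool(nodes) and all(code == "UN" for code in nodes.values())


# Made-up sample output for illustration only.
SAMPLE = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns  Host ID  Rack
UN  10.0.0.1   1.1 GiB    16      ?     aaaa     rack1
UN  10.0.0.2   1.0 GiB    16      ?     bbbb     rack1
DN  10.0.0.3   1.2 GiB    16      ?     cccc     rack1
"""

print(parse_status(SAMPLE))          # {'10.0.0.1': 'UN', '10.0.0.2': 'UN', '10.0.0.3': 'DN'}
print(safe_to_restart_next(SAMPLE))  # False
```

In a real script, the text would come from subprocess.run(["nodetool", "status"], capture_output=True, text=True).stdout, polled until safe_to_restart_next() returns True.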