MooseStack ClickHouse Best Practices
Comprehensive guidance for ClickHouse in MooseStack applications covering data model design, query optimization, and data ingestion. Contains 28 rules across 3 main categories (schema, query, insert), prioritized by impact. Each rule includes MooseStack TypeScript and Python examples.
MooseStack docs: MooseStack Documentation ClickHouse docs: ClickHouse Best Practices
IMPORTANT: How to Apply This Skill
Before answering MooseStack/ClickHouse questions, follow this priority order:
-
Check for applicable rules in the rules/ directory
-
If rules exist: Apply them and cite them in your response using "Per rule-name ..."
-
Apply MooseStack patterns: Each rule includes TypeScript and Python examples for data models, OlapTables, and APIs
-
If no rule exists: Use MooseStack or ClickHouse documentation
-
If uncertain: Use web search for current best practices
-
Always cite your source: rule name, "general MooseStack guidance", or URL
Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated guidance with MooseStack-specific patterns for TypeScript and Python.
For Formal Reviews
When performing a formal review of MooseStack data models, schemas, queries, or data ingestion:
Review Procedures
For MooseStack Data Model Reviews (TypeScript/Python)
Read these rule files in order:
-
rules/schema-types-native-types.md
-
Type annotation mapping
-
rules/schema-types-minimize-bitwidth.md
-
Use UInt8 , UInt16 etc.
-
rules/schema-types-lowcardinality.md
-
string & LowCardinality annotation
-
rules/schema-types-avoid-nullable.md
-
ClickHouseDefault instead of ?
-
rules/schema-pk-cardinality-order.md
-
orderByFields ordering
-
rules/schema-partition-lifecycle.md
-
partitionByField and TTL
Check for:
-
Proper type annotations (Key<T> , UInt64 , LowCardinality , etc.)
-
OlapTable has orderByFields in low-to-high cardinality order
-
Optional fields use ClickHouseDefault instead of ? where possible
-
engine specified for update/delete patterns (ReplacingMergeTree, etc.)
-
indexes for non-ORDER BY filter columns
For Schema Reviews (CREATE TABLE, ALTER TABLE)
Read these rule files in order:
-
rules/schema-pk-plan-before-creation.md
-
ORDER BY is immutable
-
rules/schema-pk-cardinality-order.md
-
Column ordering in keys
-
rules/schema-pk-prioritize-filters.md
-
Filter column inclusion
-
rules/schema-types-native-types.md
-
Proper type selection
-
rules/schema-types-minimize-bitwidth.md
-
Numeric type sizing
-
rules/schema-types-lowcardinality.md
-
LowCardinality usage
-
rules/schema-types-avoid-nullable.md
-
Nullable vs DEFAULT
-
rules/schema-partition-low-cardinality.md
-
Partition count limits
-
rules/schema-partition-lifecycle.md
-
Partitioning purpose
Check for:
-
PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
-
Data types match actual data ranges
-
LowCardinality applied to appropriate string columns
-
Partition key cardinality bounded (100-1,000 values)
-
ReplacingMergeTree has version column if used
For Query Reviews (SELECT, JOIN, aggregations)
Read these rule files:
-
rules/query-join-choose-algorithm.md
-
Algorithm selection
-
rules/query-join-filter-before.md
-
Pre-join filtering
-
rules/query-join-use-any.md
-
ANY vs regular JOIN
-
rules/query-index-skipping-indices.md
-
Secondary index usage
-
rules/schema-pk-filter-on-orderby.md
-
Filter alignment with ORDER BY
Check for:
-
Filters use ORDER BY prefix columns
-
JOINs filter tables before joining (not after)
-
Correct JOIN algorithm for table sizes
-
Skipping indices for non-ORDER BY filter columns
For Insert Strategy Reviews (data ingestion, updates, deletes)
Read these rule files:
-
rules/insert-batch-size.md
-
Batch sizing requirements
-
rules/insert-mutation-avoid-update.md
-
UPDATE alternatives
-
rules/insert-mutation-avoid-delete.md
-
DELETE alternatives
-
rules/insert-async-small-batches.md
-
Async insert usage
-
rules/insert-optimize-avoid-final.md
-
OPTIMIZE TABLE risks
Check for:
-
Batch size 10K-100K rows per INSERT
-
No ALTER TABLE UPDATE for frequent changes
-
ReplacingMergeTree or CollapsingMergeTree for update patterns
-
Async inserts enabled for high-frequency small batches
Output Format
Structure your response as follows:
Rules Checked
rule-name-1- Compliant / Violation foundrule-name-2- Compliant / Violation found ...
Findings
Violations
rule-name: Description of the issue- Current: [what the code does]
- Required: [what it should do]
- Fix: [specific correction]
Compliant
rule-name: Brief note on why it's correct
Recommendations
[Prioritized list of changes, citing rules]
Rule Categories by Priority
Priority Category Impact Prefix Rule Count
1 Primary Key Selection CRITICAL schema-pk-
4
2 Data Type Selection CRITICAL schema-types-
5
3 JOIN Optimization CRITICAL query-join-
5
4 Insert Batching CRITICAL insert-batch-
1
5 Mutation Avoidance CRITICAL insert-mutation-
2
6 Partitioning Strategy HIGH schema-partition-
4
7 Skipping Indices HIGH query-index-
1
8 Materialized Views HIGH query-mv-
2
9 Async Inserts HIGH insert-async-
2
10 OPTIMIZE Avoidance HIGH insert-optimize-
1
11 JSON Usage MEDIUM schema-json-
1
Quick Reference
Schema Design - Primary Key (CRITICAL)
-
schema-pk-plan-before-creation
-
Plan ORDER BY before table creation (immutable)
-
schema-pk-cardinality-order
-
Order columns low-to-high cardinality
-
schema-pk-prioritize-filters
-
Include frequently filtered columns
-
schema-pk-filter-on-orderby
-
Query filters must use ORDER BY prefix
Schema Design - Data Types (CRITICAL)
-
schema-types-native-types
-
Use native types, not String for everything
-
schema-types-minimize-bitwidth
-
Use smallest numeric type that fits
-
schema-types-lowcardinality
-
LowCardinality for <10K unique strings
-
schema-types-enum
-
Enum for finite value sets with validation
-
schema-types-avoid-nullable
-
Avoid Nullable; use DEFAULT instead
Schema Design - Partitioning (HIGH)
-
schema-partition-low-cardinality
-
Keep partition count 100-1,000
-
schema-partition-lifecycle
-
Use partitioning for data lifecycle, not queries
-
schema-partition-query-tradeoffs
-
Understand partition pruning trade-offs
-
schema-partition-start-without
-
Consider starting without partitioning
Schema Design - JSON (MEDIUM)
- schema-json-when-to-use
- JSON for dynamic schemas; typed columns for known
Query Optimization - JOINs (CRITICAL)
-
query-join-choose-algorithm
-
Select algorithm based on table sizes
-
query-join-use-any
-
ANY JOIN when only one match needed
-
query-join-filter-before
-
Filter tables before joining
-
query-join-consider-alternatives
-
Dictionaries/denormalization vs JOIN
-
query-join-null-handling
-
join_use_nulls=0 for default values
Query Optimization - Indices (HIGH)
- query-index-skipping-indices
- Skipping indices for non-ORDER BY filters
Query Optimization - Materialized Views (HIGH)
-
query-mv-incremental
-
Incremental MVs for real-time aggregations
-
query-mv-refreshable
-
Refreshable MVs for complex joins
Insert Strategy - Batching (CRITICAL)
- insert-batch-size
- Batch 10K-100K rows per INSERT
Insert Strategy - Async (HIGH)
-
insert-async-small-batches
-
Async inserts for high-frequency small batches
-
insert-format-native
-
Native format for best performance
Insert Strategy - Mutations (CRITICAL)
-
insert-mutation-avoid-update
-
ReplacingMergeTree instead of ALTER UPDATE
-
insert-mutation-avoid-delete
-
Lightweight DELETE or DROP PARTITION
Insert Strategy - Optimization (HIGH)
- insert-optimize-avoid-final
- Let background merges work
When to Apply
This skill activates when you encounter:
MooseStack Triggers:
-
MooseStack data model definitions (TypeScript interfaces or Python Pydantic models)
-
OlapTable configuration
-
IngestPipeline setup
-
MaterializedView definitions
-
Api query handlers with ClickHouse queries
-
Type annotation questions (Key , LowCardinality , UInt64 , etc.)
ClickHouse Triggers:
-
CREATE TABLE statements
-
ALTER TABLE modifications
-
ORDER BY or PRIMARY KEY discussions
-
Data type selection questions
-
Slow query troubleshooting
-
JOIN optimization requests
-
Data ingestion pipeline design
-
Update/delete strategy questions
-
ReplacingMergeTree or other specialized engine usage
-
Partitioning strategy decisions
Rule File Structure
Each rule file in rules/ contains:
-
YAML frontmatter: title, impact level, tags
-
Brief explanation: Why this rule matters
-
Incorrect example: Anti-pattern with SQL explanation
-
Correct example: Best practice with SQL explanation
-
MooseStack examples: TypeScript and Python code showing how to implement the pattern
-
Data model type annotations
-
OlapTable configuration
-
API query patterns where relevant
-
Additional context: Trade-offs, when to apply, references
Full Compiled Document
For the complete guide with all rules expanded inline: AGENTS.md
Use AGENTS.md when you need to check multiple rules quickly without reading individual files.