clickhouse-best-practices-ts-py

ClickHouse Best Practices (TypeScript, Python)

ClickHouse data modeling best practices extended with MooseStack TypeScript and Python examples. 28 rules across 3 categories (schema, query, insert), prioritized by impact.

MooseStack docs: MooseStack Documentation ClickHouse docs: ClickHouse Best Practices

IMPORTANT: How to Apply This Skill

Before answering MooseStack/ClickHouse questions, follow this priority order:

Check for applicable rules in the rules/ directory
If rules exist: Apply them and cite them in your response using "Per rule-name ..."
Apply MooseStack patterns: Each rule includes TypeScript and Python examples for data models, OlapTables, and APIs
If no rule exists: Use MooseStack or ClickHouse documentation
If uncertain: Use web search for current best practices
Always cite your source: rule name, "general MooseStack guidance", or URL

Why rules take priority: ClickHouse has specific behaviors (columnar storage, sparse indexes, merge tree mechanics) where general database intuition can be misleading. The rules encode validated guidance with MooseStack-specific patterns for TypeScript and Python.

For Formal Reviews

When performing a formal review of MooseStack data models, schemas, queries, or data ingestion:

Review Procedures

For MooseStack Data Model Reviews (TypeScript/Python)

Read these rule files in order:

rules/schema-types-native-types.md
Type annotation mapping
rules/schema-types-minimize-bitwidth.md
Use UInt8 , UInt16 etc.
rules/schema-types-lowcardinality.md
string & LowCardinality annotation
rules/schema-types-avoid-nullable.md
ClickHouseDefault instead of ?
rules/schema-pk-cardinality-order.md
orderByFields ordering
rules/schema-partition-lifecycle.md
partitionByField and TTL

Check for:

Proper type annotations (Key<T> , UInt64 , LowCardinality , etc.)
OlapTable has orderByFields in low-to-high cardinality order
Optional fields use ClickHouseDefault instead of ? where possible
engine specified for update/delete patterns (ReplacingMergeTree, etc.)
indexes for non-ORDER BY filter columns

For Schema Reviews (CREATE TABLE, ALTER TABLE)

Read these rule files in order:

rules/schema-pk-plan-before-creation.md
ORDER BY is immutable
rules/schema-pk-cardinality-order.md
Column ordering in keys
rules/schema-pk-prioritize-filters.md
Filter column inclusion
rules/schema-types-native-types.md
Proper type selection
rules/schema-types-minimize-bitwidth.md
Numeric type sizing
rules/schema-types-lowcardinality.md
LowCardinality usage
rules/schema-types-avoid-nullable.md
Nullable vs DEFAULT
rules/schema-partition-low-cardinality.md
Partition count limits
rules/schema-partition-lifecycle.md
Partitioning purpose

Check for:

PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
Data types match actual data ranges
LowCardinality applied to appropriate string columns
Partition key cardinality bounded (100-1,000 values)
ReplacingMergeTree has version column if used

For Query Reviews (SELECT, JOIN, aggregations)

Read these rule files:

rules/query-join-choose-algorithm.md
Algorithm selection
rules/query-join-filter-before.md
Pre-join filtering
rules/query-join-use-any.md
ANY vs regular JOIN
rules/query-index-skipping-indices.md
Secondary index usage
rules/schema-pk-filter-on-orderby.md
Filter alignment with ORDER BY

Check for:

Filters use ORDER BY prefix columns
JOINs filter tables before joining (not after)
Correct JOIN algorithm for table sizes
Skipping indices for non-ORDER BY filter columns

For Insert Strategy Reviews (data ingestion, updates, deletes)

Read these rule files:

rules/insert-batch-size.md
Batch sizing requirements
rules/insert-mutation-avoid-update.md
UPDATE alternatives
rules/insert-mutation-avoid-delete.md
DELETE alternatives
rules/insert-async-small-batches.md
Async insert usage
rules/insert-optimize-avoid-final.md
OPTIMIZE TABLE risks

Check for:

Batch size 10K-100K rows per INSERT
No ALTER TABLE UPDATE for frequent changes
ReplacingMergeTree or CollapsingMergeTree for update patterns
Async inserts enabled for high-frequency small batches

Output Format

Structure your response as follows:

Rules Checked

rule-name-1 - Compliant / Violation found
rule-name-2 - Compliant / Violation found ...

Findings

Violations

rule-name: Description of the issue
- Current: [what the code does]
- Required: [what it should do]
- Fix: [specific correction]

Compliant

rule-name: Brief note on why it's correct

Recommendations

[Prioritized list of changes, citing rules]

Rule Categories by Priority

Priority Category Impact Prefix Rule Count

1 Primary Key Selection CRITICAL schema-pk-

2 Data Type Selection CRITICAL schema-types-

3 JOIN Optimization CRITICAL query-join-

4 Insert Batching CRITICAL insert-batch-

5 Mutation Avoidance CRITICAL insert-mutation-

6 Partitioning Strategy HIGH schema-partition-

7 Skipping Indices HIGH query-index-

8 Materialized Views HIGH query-mv-

9 Async Inserts HIGH insert-async-

10 OPTIMIZE Avoidance HIGH insert-optimize-

11 JSON Usage MEDIUM schema-json-

Quick Reference

Schema Design - Primary Key (CRITICAL)

schema-pk-plan-before-creation
Plan ORDER BY before table creation (immutable)
schema-pk-cardinality-order
Order columns low-to-high cardinality
schema-pk-prioritize-filters
Include frequently filtered columns
schema-pk-filter-on-orderby
Query filters must use ORDER BY prefix

Schema Design - Data Types (CRITICAL)

schema-types-native-types
Use native types, not String for everything
schema-types-minimize-bitwidth
Use smallest numeric type that fits
schema-types-lowcardinality
LowCardinality for <10K unique strings
schema-types-enum
Enum for finite value sets with validation
schema-types-avoid-nullable
Avoid Nullable; use DEFAULT instead

Schema Design - Partitioning (HIGH)

schema-partition-low-cardinality
Keep partition count 100-1,000
schema-partition-lifecycle
Use partitioning for data lifecycle, not queries
schema-partition-query-tradeoffs
Understand partition pruning trade-offs
schema-partition-start-without
Consider starting without partitioning

Schema Design - JSON (MEDIUM)

schema-json-when-to-use
JSON for dynamic schemas; typed columns for known

Query Optimization - JOINs (CRITICAL)

query-join-choose-algorithm
Select algorithm based on table sizes
query-join-use-any
ANY JOIN when only one match needed
query-join-filter-before
Filter tables before joining
query-join-consider-alternatives
Dictionaries/denormalization vs JOIN
query-join-null-handling
join_use_nulls=0 for default values

Query Optimization - Indices (HIGH)

query-index-skipping-indices
Skipping indices for non-ORDER BY filters

Query Optimization - Materialized Views (HIGH)

query-mv-incremental
Incremental MVs for real-time aggregations
query-mv-refreshable
Refreshable MVs for complex joins

Insert Strategy - Batching (CRITICAL)

insert-batch-size
Batch 10K-100K rows per INSERT

Insert Strategy - Async (HIGH)

insert-async-small-batches
Async inserts for high-frequency small batches
insert-format-native
Native format for best performance

Insert Strategy - Mutations (CRITICAL)

insert-mutation-avoid-update
ReplacingMergeTree instead of ALTER UPDATE
insert-mutation-avoid-delete
Lightweight DELETE or DROP PARTITION

Insert Strategy - Optimization (HIGH)

insert-optimize-avoid-final
Let background merges work

When to Apply

This skill activates when you encounter:

MooseStack Triggers:

MooseStack data model definitions (TypeScript interfaces or Python Pydantic models)
OlapTable configuration
IngestPipeline setup
MaterializedView definitions
Api query handlers with ClickHouse queries
Type annotation questions (Key , LowCardinality , UInt64 , etc.)

ClickHouse Triggers:

CREATE TABLE statements
ALTER TABLE modifications
ORDER BY or PRIMARY KEY discussions
Data type selection questions
Slow query troubleshooting
JOIN optimization requests
Data ingestion pipeline design
Update/delete strategy questions
ReplacingMergeTree or other specialized engine usage
Partitioning strategy decisions

Rule File Structure

Each rule file in rules/ contains:

YAML frontmatter: title, impact level, tags
Brief explanation: Why this rule matters
Incorrect example: Anti-pattern with SQL explanation
Correct example: Best practice with SQL explanation
MooseStack examples: TypeScript and Python code showing how to implement the pattern
Data model type annotations
OlapTable configuration
API query patterns where relevant
Additional context: Trade-offs, when to apply, references

Full Compiled Document

For the complete guide with all rules expanded inline: AGENTS.md

Use AGENTS.md when you need to check multiple rules quickly without reading individual files.