altinity-expert-clickhouse-merges

Run all queries from checks.sql (cluster-wide) one by one and produce a decision-ready report.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "altinity-expert-clickhouse-merges" with this command: npx skills add altinity/skills/altinity-skills-altinity-expert-clickhouse-merges

Diagnostics

Run all queries from checks.sql (cluster-wide) one by one and produce a decision-ready report.

Triage Order (Mandatory)

  • Check current active merges from system.merges .

  • Check merge success/failure trend from system.part_log .

  • Check table-level status (merge_ok , merge_failed , last success/failure).

  • Check merge reason + algorithm (merge_reason , merge_algorithm ).

  • Check merge RAM now + historical peak RAM (system.merges.memory_usage , system.part_log.peak_memory_usage ).

  • Check part-count offenders with split:

  • database = 'system'

  • database != 'system'

  • Check relevant settings, including TTL merge concurrency.

Decision Rules

Use one of these final verdicts explicitly:

  • PROVED : cluster-wide merge stop (no successful merges in the selected window).

  • DECLINED : cluster-wide stop is false.

  • PARTIAL : merges are blocked for specific table(s) while others still merge.

Additional rules:

  • If some tables still have successful merges in same timeframe, do not report global merge stop.

  • If a target table has merge_ok = 0 with repeated MEMORY_LIMIT_EXCEEDED , report table-level block.

  • If merges are 100% Horizontal , state that planner selected horizontal merges (do not say vertical is disabled unless settings prove it).

  • If max part count is driven by system.* tables, call out alert-source mismatch to avoid misattribution to business tables.

Problem-Specific Investigation

"Too Many Parts" Investigation

For a specific table, run ad-hoc checks (time-bound and limited):

select toStartOfMinute(event_time) as minute, countIf(event_type = 'NewPart') as new_parts, countIf(event_type = 'MergeParts') as merges, countIf(event_type = 'MergeParts') - countIf(event_type = 'NewPart') as net_reduction from system.part_log where database = '{database}' and table = '{table}' and event_time > now() - interval 1 hour group by minute order by minute desc limit 60

If net_reduction is negative consistently, inserts outpace merges.

TTL merge pressure and merge-size settings snapshot

Check system.merge_tree_settings if modified Suggest changing (reducing or increasing) in case of a problem as remediation.

  • max_parts_to_merge_at_once

  • max_bytes_to_merge_at_max_space_in_pool

  • max_bytes_to_merge_at_min_space_in_pool

  • enable_vertical_merge_algorithm

  • vertical_merge_algorithm_min_rows_to_activate

  • vertical_merge_algorithm_min_columns_to_activate

  • max_number_of_merges_with_ttl_in_pool

  • max_replicated_merges_with_ttl_in_queue

  • parts_to_delay_insert

  • parts_to_throw_insert

Structural Fix Guidance (When Settings Are Not Enough)

Call out anti-patterns explicitly:

  • Single hot partition (partition_id='all' )

  • Heavy TTL ... GROUP BY ... SET ... on a hot ingestion table

  • Persistent large horizontal merge attempts with OOM failures

Recommended long-term direction:

  • Add time-based partitioning

  • Move heavy rollup logic from TTL path to MV/batch table

  • Keep base-table TTL simple (delete-oriented)

Ad-Hoc Query Guidelines

Required Safeguards

-- Always include LIMIT limit 100

-- Always time-bound historical queries where event_time >= now() - interval 24 hour

-- For part_log, always filter event_type where event_type in ('NewPart', 'MergeParts', 'MutatePart')

Avoid

  • select * from system.part_log

  • Unbounded scans on *_log tables

  • Large joins in-context (aggregate in SQL)

Cross-Module Triggers

Finding Load Module Reason

High memory during merges / OOM altinity-expert-clickhouse-memory

Memory limits and pressure

Slow merges + normal disk altinity-expert-clickhouse-schema

ORDER BY/partitioning anti-patterns

Slow merges + high disk IO altinity-expert-clickhouse-storage

Storage bottleneck

Merges blocked by mutations altinity-expert-clickhouse-mutations

Mutation backlog

Replication lag + merge issues altinity-expert-clickhouse-replication

Queue/replica bottlenecks

Final Report Sections (Mandatory)

  • Environment header

  • Global vs table-specific merge status

  • Current and peak merge RAM

  • Merge reason and merge algorithm findings

  • Max-part offenders (system vs non-system )

  • Current settings and recommended deltas

  • Immediate mitigation and structural remediation

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

altinity-expert-clickhouse-schema

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

altinity-expert-clickhouse-logs

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

altinity-expert-clickhouse-ingestion

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

altinity-expert-clickhouse-dictionaries

No summary provided by upstream source.

Repository SourceNeeds Review