cluster

Perform data clustering analysis using k-means and hierarchical algorithms. Use when you need to group, classify, or segment datasets.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Installation

Install the skill from the ClawHub registry:

npx skills add ckchzh/cluster

Cluster — Data Clustering Analysis Tool

Cluster is a command-line data clustering analysis tool that supports k-means and hierarchical clustering algorithms. It reads numerical data from CSV/JSONL sources, performs clustering, evaluates cluster quality, and exports results.

Data is stored in ~/.cluster/data.jsonl as JSONL records. Each record represents a clustering run with its parameters, assignments, centroids, and evaluation metrics.

Prerequisites

  • Python 3.8+ with standard library (no external packages required for basic operations)
  • bash shell

Commands

run

Run a clustering algorithm on input data.

Environment Variables:

  • INPUT (required) — Path to input CSV/JSONL file with numerical data
  • K — Number of clusters (default: 3)
  • ALGORITHM — Algorithm to use: kmeans or hierarchical (default: kmeans)
  • MAX_ITER — Maximum iterations for k-means (default: 100)
  • SEED — Random seed for reproducibility

Example:

INPUT=/path/to/data.csv K=5 ALGORITHM=kmeans bash scripts/script.sh run
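
As background, the k-means loop (Lloyd's algorithm) can be sketched with the standard library alone, in line with the tool's stated prerequisites. This is an illustrative sketch only, not the script's actual implementation:

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=42):
    """Minimal Lloyd's-algorithm sketch: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # seed initial centroids from the data
    labels = []
    for _ in range(max_iter):
        # assignment step: nearest centroid by Euclidean distance
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # update step: move each centroid to the mean of its members
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                       if members else centroids[c])
        if new == centroids:               # converged: assignments are stable
            break
        centroids = new
    return centroids, labels
```

MAX_ITER and SEED above play the same roles as max_iter and seed here: a cap on refinement passes and a reproducible initialization.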

assign

Assign new data points to existing clusters from a previous run.

Environment Variables:

  • RUN_ID (required) — ID of the clustering run to use
  • INPUT (required) — Path to new data points (CSV/JSONL)

Example:

RUN_ID=abc123 INPUT=/path/to/new_data.csv bash scripts/script.sh assign
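
Conceptually, assignment maps each new point to its nearest stored centroid. A minimal standard-library sketch (illustrative only, not the script's code):

```python
import math

def assign_points(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean)."""
    return [min(range(len(centroids)),
                key=lambda c: math.dist(p, centroids[c]))
            for p in points]
```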

centroids

Display or export centroid coordinates for a clustering run.

Environment Variables:

  • RUN_ID (required) — ID of the clustering run
  • FORMAT — Output format: table, json, csv (default: table)
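
Example (run ID is illustrative):

RUN_ID=abc123 FORMAT=json bash scripts/script.sh centroids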

evaluate

Evaluate clustering quality with silhouette score, inertia, and Davies-Bouldin index.

Environment Variables:

  • RUN_ID (required) — ID of the clustering run to evaluate
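
Example (run ID is illustrative):

RUN_ID=abc123 bash scripts/script.sh evaluate

For intuition, the silhouette score compares each point's mean intra-cluster distance a with its mean distance b to the nearest other cluster, scoring s = (b - a) / max(a, b). A standard-library sketch (illustrative only; assumes every cluster has at least two members, and is not the script's actual implementation):

```python
import math

def silhouette(points, labels):
    """Mean silhouette score over all points."""
    n = len(points)
    total = 0.0
    for i in range(n):
        # a: mean distance to the other members of the same cluster
        same = [j for j in range(n) if labels[j] == labels[i] and j != i]
        a = sum(math.dist(points[i], points[j]) for j in same) / len(same)
        # b: smallest mean distance to any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for lab in set(labels) if lab != labels[i]
            for members in [[j for j in range(n) if labels[j] == lab]]
        )
        total += (b - a) / max(a, b)
    return total / n
```

Scores near 1 indicate tight, well-separated clusters; scores near 0 indicate overlapping clusters.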

visualize

Generate a text-based (ASCII) visualization of cluster assignments.

Environment Variables:

  • RUN_ID (required) — ID of the clustering run
  • DIMS — Dimensions to plot, comma-separated (default: first two)
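
Example (run ID and dimension indices are illustrative):

RUN_ID=abc123 DIMS=0,1 bash scripts/script.sh visualize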

export

Export clustering results to a file.

Environment Variables:

  • RUN_ID (required) — ID of the run to export
  • OUTPUT — Output file path (default: stdout)
  • FORMAT — Export format: json, csv, jsonl (default: json)
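
Example (run ID and output path are illustrative):

RUN_ID=abc123 FORMAT=csv OUTPUT=/path/to/results.csv bash scripts/script.sh export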

import

Import a previously exported clustering run.

Environment Variables:

  • INPUT (required) — Path to the file to import
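
Example (path is illustrative):

INPUT=/path/to/exported_run.json bash scripts/script.sh import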

config

View or update configuration settings.

Environment Variables:

  • KEY — Configuration key to set
  • VALUE — Configuration value
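
Example (the first form views current settings; the key and value shown are illustrative):

bash scripts/script.sh config
KEY=default_k VALUE=5 bash scripts/script.sh config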

list

List all stored clustering runs with summary info.

Environment Variables:

  • LIMIT — Maximum runs to display (default: 20)
  • SORT — Sort field: date, k, score (default: date)
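
Example (values are illustrative):

LIMIT=10 SORT=score bash scripts/script.sh list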

stats

Show aggregate statistics across all clustering runs.

help

Display usage information and available commands.

version

Display the current version of the cluster tool.

Data Storage

All clustering runs are stored in ~/.cluster/data.jsonl. Each line is a JSON object with fields:

  • id — Unique run identifier
  • timestamp — ISO 8601 creation time
  • algorithm — Algorithm used
  • k — Number of clusters
  • centroids — List of centroid coordinates
  • assignments — Mapping of data point indices to cluster IDs
  • metrics — Evaluation metrics (silhouette, inertia, etc.)
  • input_file — Source data file path
  • num_points — Number of data points clustered
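
An illustrative record, as a single JSONL line (all field values are hypothetical):

{"id": "abc123", "timestamp": "2024-05-01T12:00:00Z", "algorithm": "kmeans", "k": 3, "centroids": [[1.2, 0.8], [4.5, 3.1], [9.0, 7.6]], "assignments": {"0": 0, "1": 2, "2": 1}, "metrics": {"silhouette": 0.71, "inertia": 42.5}, "input_file": "/path/to/data.csv", "num_points": 150}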

Configuration

Config is stored in ~/.cluster/config.json. Available keys:

  • default_k — Default number of clusters (default: 3)
  • default_algorithm — Default algorithm (default: kmeans)
  • max_iterations — Default max iterations (default: 100)
  • random_seed — Default random seed (default: 42)
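
An illustrative ~/.cluster/config.json using the documented defaults:

{"default_k": 3, "default_algorithm": "kmeans", "max_iterations": 100, "random_seed": 42}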

Powered by BytesAgain | bytesagain.com | hello@bytesagain.com

