Dataset Splitter

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "Dataset Splitter" with this command: npx skills add Mingo-318/dataset-splitter

Split image datasets into train/val/test sets. Supports random splits, stratified splits, and custom ratios. Use it when a user needs to split a dataset for machine learning training.

Features

  • Random Split: Randomly shuffle and split
  • Stratified Split: Maintain class distribution
  • Custom Ratios: Configurable train/val/test ratios
  • Annotation Support: Split images and corresponding annotations together
  • YOLO Format: Generate YOLO format dataset structure
  • Reproducible: Set random seed for reproducibility
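The difference between the random and stratified modes can be sketched in a few lines of standard-library Python. This is only an illustration, not the code in scripts/splitter.py: the function names `random_split` and `stratified_split` are hypothetical, and sending the rounding remainder to the test set is an assumed choice.

```python
import random
from collections import defaultdict


def random_split(items, ratios=(80, 10, 10), seed=None):
    """Shuffle items and cut them into train/val/test lists (hypothetical helper)."""
    rng = random.Random(seed)      # private RNG: the same seed gives the same split
    shuffled = sorted(items)       # sort first so the input order does not matter
    rng.shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # assumed: test absorbs the remainder
    }


def stratified_split(labels, ratios=(80, 10, 10), seed=None):
    """Split every class on its own so each subset keeps the class mix.

    labels maps an item (for example a file name) to its class name.
    """
    by_class = defaultdict(list)
    for item, cls in labels.items():
        by_class[cls].append(item)
    result = {"train": [], "val": [], "test": []}
    for cls in sorted(by_class):
        part = random_split(by_class[cls], ratios, seed)
        for name in result:
            result[name].extend(part[name])
    return result
```

With 100 "cat" images and 50 "dog" images, the stratified version puts 80 cats and 40 dogs in train, so the 2:1 class ratio survives in every subset. A plain random split only approximates that ratio.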

Usage

# Simple split (80/10/10)
python scripts/splitter.py split /path/to/images/ --ratios 80 10 10

# With annotations
python scripts/splitter.py split /path/to/images/ --annotations /path/to/labels/

# YOLO format output
python scripts/splitter.py split /path/to/images/ --output /path/to/dataset/ --yolo

# Stratified by class
python scripts/splitter.py split /path/to/images/ --annotations labels/ --stratify
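
Splitting images and annotations together means each label file has to follow its image into the same subset. The sketch below shows one way to do that. It assumes labels are `.txt` files sharing the image's file stem and that the output follows the common `images/<split>` and `labels/<split>` YOLO layout. The helper name `place_split` is hypothetical, and the layout that `--yolo` actually writes may differ.

```python
import shutil
from pathlib import Path


def place_split(split, images_dir, labels_dir, output_dir, copy=True):
    """Place each image and its same-stem .txt label under
    images/<name> and labels/<name> (hypothetical helper)."""
    images_dir, labels_dir, output_dir = map(Path, (images_dir, labels_dir, output_dir))
    transfer = shutil.copy2 if copy else shutil.move
    for name, files in split.items():
        img_out = output_dir / "images" / name
        lbl_out = output_dir / "labels" / name
        img_out.mkdir(parents=True, exist_ok=True)
        lbl_out.mkdir(parents=True, exist_ok=True)
        for fname in files:
            transfer(str(images_dir / fname), str(img_out / fname))
            label = labels_dir / (Path(fname).stem + ".txt")
            if label.exists():  # an image without a label is still placed
                transfer(str(label), str(lbl_out / label.name))
```

Here `split` is a dictionary such as `{"train": ["a.jpg"], "val": ["b.jpg"], "test": []}`. Passing `copy=False` moves the files instead of copying them.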

Examples

$ python scripts/splitter.py split ./images --ratios 80 10 10

Splitting dataset...
Total images: 1000
Train: 800 (80%)
Val: 100 (10%)
Test: 100 (10%)

✓ Created train/ (800 images)
✓ Created val/ (100 images)
✓ Created test/ (100 images)
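
The counts above divide evenly, but most dataset sizes do not. How splitter.py rounds in that case is not documented here. One reasonable approach, sketched below with a hypothetical `split_counts` helper, is largest-remainder rounding: take the floor of each share, then give the leftover images to the subsets with the largest fractional parts.

```python
def split_counts(total, ratios=(80, 10, 10)):
    """Turn ratios into integer counts that sum exactly to total."""
    ratio_sum = sum(ratios)
    counts = [total * r // ratio_sum for r in ratios]
    remainders = [total * r % ratio_sum for r in ratios]
    leftover = total - sum(counts)
    # Hand the leftover items to the subsets with the largest fractional share.
    order = sorted(range(len(ratios)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


print(split_counts(1000))  # [800, 100, 100]
print(split_counts(1003))  # [803, 100, 100]
```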

Installation

pip install pillow

Options

  • --ratios: Split ratios (train val test), default: 80 10 10
  • --seed: Random seed for reproducibility
  • --annotations: Path to annotations (will be split together)
  • --output: Output directory
  • --yolo: Output in YOLO dataset format
  • --stratify: Maintain class distribution
  • --copy: Copy files instead of moving
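
For readers wiring these options into their own tooling, the flags above map naturally onto `argparse`. The parser below is a hypothetical mirror of the documented interface, not the actual code in scripts/splitter.py. Only the `80 10 10` default comes from this page. The argument types and the other defaults are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented `split` options (illustrative)."""
    parser = argparse.ArgumentParser(prog="splitter.py")
    sub = parser.add_subparsers(dest="command", required=True)
    split = sub.add_parser("split", help="split an image directory")
    split.add_argument("images", help="directory containing the images")
    split.add_argument("--ratios", nargs=3, type=int, default=[80, 10, 10],
                       metavar=("TRAIN", "VAL", "TEST"))
    split.add_argument("--seed", type=int, default=None)   # assumed: integer seed
    split.add_argument("--annotations", default=None)
    split.add_argument("--output", default=None)
    split.add_argument("--yolo", action="store_true")
    split.add_argument("--stratify", action="store_true")
    split.add_argument("--copy", action="store_true")
    return parser


args = build_parser().parse_args(
    ["split", "./images", "--ratios", "70", "20", "10", "--seed", "42", "--copy"]
)
print(args.ratios, args.seed, args.copy)  # [70, 20, 10] 42 True
```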
