pilot-data-labeling-pipeline-setup

Deploy a data labeling pipeline with 4 agents for ingestion, auto-labeling, quality review, and dataset export. Use this skill when: 1. User wants to set up a data labeling or annotation pipeline 2. User is configuring an agent as part of a labeling workflow 3. User asks about ML data preparation, annotation, or training dataset generation Do NOT use this skill when: - User wants to share a single dataset (use pilot-dataset instead) - User wants to stream raw data without labeling (use pilot-stream-data instead)

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "pilot-data-labeling-pipeline-setup" with this command: npx skills add vulture-labs/pilot-data-labeling-pipeline-setup

Data Labeling Pipeline Setup

Deploy 4 agents that ingest raw data, apply ML labels, review quality, and export training-ready datasets.

Roles

RoleHostnameSkillsPurpose
ingester<prefix>-ingesterpilot-s3-bridge, pilot-stream-data, pilot-task-parallelAccepts raw data batches, splits into work items
labeler<prefix>-labelerpilot-task-router, pilot-dataset, pilot-metricsApplies ML-based labels to work items
reviewer<prefix>-reviewerpilot-review, pilot-event-filter, pilot-alertSamples labeled items, checks accuracy, flags disagreements
exporter<prefix>-exporterpilot-dataset, pilot-share, pilot-webhook-bridgePackages approved labels into training-ready datasets

Setup Procedure

Step 1: Ask the user which role this agent should play and what prefix to use.

Step 2: Install the skills for the chosen role:

# ingester:
clawhub install pilot-s3-bridge pilot-stream-data pilot-task-parallel
# labeler:
clawhub install pilot-task-router pilot-dataset pilot-metrics
# reviewer:
clawhub install pilot-review pilot-event-filter pilot-alert
# exporter:
clawhub install pilot-dataset pilot-share pilot-webhook-bridge

Step 3: Set the hostname:

pilotctl --json set-hostname <prefix>-<role>

Step 4: Write the setup manifest:

mkdir -p ~/.pilot/setups
cat > ~/.pilot/setups/data-labeling-pipeline.json << 'MANIFEST'
{
  "setup": "data-labeling-pipeline",
  "setup_name": "Data Labeling Pipeline",
  "role": "<ROLE_ID>",
  "role_name": "<ROLE_NAME>",
  "hostname": "<prefix>-<role>",
  "description": "<ROLE_DESCRIPTION>",
  "skills": { "<skill>": "<contextual description>" },
  "peers": [ { "role": "...", "hostname": "...", "description": "..." } ],
  "data_flows": [ { "direction": "send|receive", "peer": "...", "port": 1002, "topic": "...", "description": "..." } ],
  "handshakes_needed": [ "<peer-hostname>" ]
}
MANIFEST

Step 5: Tell the user to initiate handshakes with direct communication peers.

Manifest Templates Per Role

ingester

{"setup":"data-labeling-pipeline","setup_name":"Data Labeling Pipeline","role":"ingester","role_name":"Data Ingester","hostname":"<prefix>-ingester","description":"Accepts raw data batches from S3 or webhooks. Splits into work items and distributes.","skills":{"pilot-s3-bridge":"Pull raw data batches from S3 buckets on schedule or webhook trigger.","pilot-stream-data":"Stream work items to labeler as they are split from batches.","pilot-task-parallel":"Parallelize batch splitting across available workers."},"peers":[{"role":"labeler","hostname":"<prefix>-labeler","description":"Receives work items for labeling"}],"data_flows":[{"direction":"send","peer":"<prefix>-labeler","port":1002,"topic":"work-item","description":"Work items with raw data references"}],"handshakes_needed":["<prefix>-labeler"]}

labeler

{"setup":"data-labeling-pipeline","setup_name":"Data Labeling Pipeline","role":"labeler","role_name":"Auto Labeler","hostname":"<prefix>-labeler","description":"Applies ML-based labels, classifications, bounding boxes, or entity tags to work items.","skills":{"pilot-task-router":"Route work items to appropriate ML models by data type.","pilot-dataset":"Store and retrieve labeled data records.","pilot-metrics":"Track labeling throughput, model confidence distributions."},"peers":[{"role":"ingester","hostname":"<prefix>-ingester","description":"Sends work items for labeling"},{"role":"reviewer","hostname":"<prefix>-reviewer","description":"Receives labeled items for quality review"}],"data_flows":[{"direction":"receive","peer":"<prefix>-ingester","port":1002,"topic":"work-item","description":"Work items with raw data references"},{"direction":"send","peer":"<prefix>-reviewer","port":1002,"topic":"labeled-item","description":"Labeled items for quality review"},{"direction":"receive","peer":"<prefix>-reviewer","port":1002,"topic":"review-feedback","description":"Feedback on rejected labels for re-labeling"}],"handshakes_needed":["<prefix>-ingester","<prefix>-reviewer"]}

reviewer

{"setup":"data-labeling-pipeline","setup_name":"Data Labeling Pipeline","role":"reviewer","role_name":"Quality Reviewer","hostname":"<prefix>-reviewer","description":"Samples labeled items, checks accuracy, flags disagreements, computes inter-annotator agreement.","skills":{"pilot-review":"Score labeled items against quality criteria and flag disagreements.","pilot-event-filter":"Filter low-confidence labels for priority review.","pilot-alert":"Alert on quality drops or inter-annotator agreement below threshold."},"peers":[{"role":"labeler","hostname":"<prefix>-labeler","description":"Sends labeled items for review"},{"role":"exporter","hostname":"<prefix>-exporter","description":"Receives approved labels for export"}],"data_flows":[{"direction":"receive","peer":"<prefix>-labeler","port":1002,"topic":"labeled-item","description":"Labeled items for quality review"},{"direction":"send","peer":"<prefix>-labeler","port":1002,"topic":"review-feedback","description":"Feedback for re-labeling rejected items"},{"direction":"send","peer":"<prefix>-exporter","port":1002,"topic":"approved-label","description":"Approved labels ready for packaging"}],"handshakes_needed":["<prefix>-labeler","<prefix>-exporter"]}

exporter

{"setup":"data-labeling-pipeline","setup_name":"Data Labeling Pipeline","role":"exporter","role_name":"Dataset Exporter","hostname":"<prefix>-exporter","description":"Packages reviewed labels into training-ready datasets (COCO, VOC, JSONL). Publishes to storage.","skills":{"pilot-dataset":"Assemble labeled items into structured dataset formats.","pilot-share":"Upload packaged datasets to S3 or shared storage.","pilot-webhook-bridge":"Notify downstream consumers when datasets are published."},"peers":[{"role":"reviewer","hostname":"<prefix>-reviewer","description":"Sends approved labels for packaging"}],"data_flows":[{"direction":"receive","peer":"<prefix>-reviewer","port":1002,"topic":"approved-label","description":"Approved labels ready for packaging"},{"direction":"send","peer":"external","port":443,"topic":"dataset-published","description":"Notification that a new dataset is available"}],"handshakes_needed":["<prefix>-reviewer"]}

Data Flows

  • ingester -> labeler : work-item events (port 1002)
  • labeler -> reviewer : labeled-item events (port 1002)
  • reviewer -> labeler : review-feedback events (port 1002)
  • reviewer -> exporter : approved-label events (port 1002)
  • exporter -> external : dataset-published notifications (port 443)

Handshakes

# ingester <-> labeler:
pilotctl --json handshake <prefix>-labeler "setup: data-labeling-pipeline"
pilotctl --json handshake <prefix>-ingester "setup: data-labeling-pipeline"

# labeler <-> reviewer:
pilotctl --json handshake <prefix>-reviewer "setup: data-labeling-pipeline"
pilotctl --json handshake <prefix>-labeler "setup: data-labeling-pipeline"

# reviewer <-> exporter:
pilotctl --json handshake <prefix>-exporter "setup: data-labeling-pipeline"
pilotctl --json handshake <prefix>-reviewer "setup: data-labeling-pipeline"

Workflow Example

# On labeler — subscribe to work items:
pilotctl --json subscribe <prefix>-ingester work-item

# On ingester — publish a work item:
pilotctl --json publish <prefix>-labeler work-item '{"batch_id":"batch-042","item_id":"img-0017","type":"image","s3_uri":"s3://raw-data/batch-042/img-0017.jpg"}'

# On reviewer — subscribe to labeled items:
pilotctl --json subscribe <prefix>-labeler labeled-item

# On exporter — subscribe to approved labels:
pilotctl --json subscribe <prefix>-reviewer approved-label

Dependencies

Requires pilot-protocol skill, pilotctl binary, clawhub binary, and a running daemon.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

Local Self-Healing Machine Learning

A fully local machine learning engine that makes your OpenClaw agent smart over time — without ever calling home, revealing your machine ID, or exposing any...

Registry SourceRecently Updated
3350Profile unavailable
Automation

Scaffold

Your agent. Configured to you. Remembers everything. Includes setup wizard, 30-day roadmap, 25 ready-to-use prompts, and pre-built memory files.

Registry SourceRecently Updated
2851Profile unavailable
Automation

Openclaw Startup

First-run setup wizard for new OpenClaw agents. Interviews the user to generate a tailored SOUL.md, scaffolds memory architecture, installs recommended crons...

Registry Source
1710Profile unavailable
Automation

Openclaw Discord Setup

Configure a Discord Bot as an OpenClaw message channel with token setup, server authorization, and slash command support in 10 minutes.

Registry Source
1740Profile unavailable