data-pipelines

Deep data pipeline workflow—ingestion, orchestration, idempotency, data quality, SLAs, observability, and lineage. Use when building batch/stream pipelines, debugging job failures, or hardening ETL/ELT.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "data-pipelines" with this command: npx skills add mike47512/data-pipelines

Data Pipelines

Pipelines fail on silent schema drift, partial writes, and unclear ownership. Design for at-least-once delivery, idempotent sinks, and observable stages.

When to Offer This Workflow

Trigger conditions:

  • Batch or streaming ingestion (Kafka, Fivetran, Airflow, Dagster, Spark, etc.)
  • Late data, backfills, or schema changes breaking jobs
  • SLA misses on freshness or row counts

Initial offer:

Use six stages: (1) requirements & SLAs, (2) source contracts, (3) transforms & idempotency, (4) orchestration & dependencies, (5) quality & monitoring, (6) lineage & operations. Confirm batch vs stream and the cloud stack.


Stage 1: Requirements & SLAs

Goal: Pin down freshness (latency), completeness expectations, a cost ceiling, and failure tolerance (quarantine bad records vs stop the line).

Exit condition: an SLA table mapping pipeline → metric → threshold.
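The SLA table can be as simple as a keyed map plus a check function. A minimal sketch, with hypothetical pipeline names, metrics, and thresholds:

```python
# Hypothetical SLA table: pipeline -> metric -> threshold.
# Names and numbers are illustrative, not prescriptive.
SLAS = {
    "orders_daily":  {"freshness_minutes": 90, "min_row_count": 10_000},
    "clicks_stream": {"freshness_minutes": 5,  "max_error_rate": 0.01},
}

def sla_violations(pipeline: str, observed: dict) -> list[str]:
    """Return the SLA metrics this pipeline currently violates."""
    thresholds = SLAS[pipeline]
    violations = []
    if observed.get("freshness_minutes", 0) > thresholds["freshness_minutes"]:
        violations.append("freshness_minutes")
    if "min_row_count" in thresholds and observed.get("row_count", 0) < thresholds["min_row_count"]:
        violations.append("min_row_count")
    if "max_error_rate" in thresholds and observed.get("error_rate", 0) > thresholds["max_error_rate"]:
        violations.append("max_error_rate")
    return violations
```

Keeping thresholds in data rather than scattered through job code makes the failure policy explicit and reviewable.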


Stage 2: Source Contracts

Goal: Version schemas explicitly; choose CDC vs snapshot pulls; account for API rate limits.

Practices

  • Keep the raw landing zone immutable; build curated layers downstream
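A source contract check can run at the landing-zone boundary before anything is promoted downstream. A sketch, assuming records arrive as dicts and the contract is a simple name→type map (both hypothetical):

```python
# Hypothetical contract for one source; real contracts would be versioned.
EXPECTED_SCHEMA = {"order_id": str, "amount_cents": int, "created_at": str}

def schema_drift(record: dict) -> dict:
    """Report missing, unexpected, and mistyped fields against the contract."""
    missing = [k for k in EXPECTED_SCHEMA if k not in record]
    unexpected = [k for k in record if k not in EXPECTED_SCHEMA]
    mistyped = [k for k, t in EXPECTED_SCHEMA.items()
                if k in record and not isinstance(record[k], t)]
    return {"missing": missing, "unexpected": unexpected, "mistyped": mistyped}
```

Routing any record with a non-empty report to quarantine (rather than failing mid-write) keeps drift from becoming a partial-write incident.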

Stage 3: Transforms & Idempotency

Goal: Deterministic transforms; upsert keys; partition strategy for rewinds.

Practices

  • Watermark progress for incremental loads
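The watermark + upsert pattern above can be sketched in pure Python; the sink dict stands in for a real keyed table, and the field names are illustrative:

```python
def incremental_load(sink: dict, rows: list[dict], watermark: str) -> str:
    """Upsert rows newer than the watermark, keyed by 'id'.

    Re-running with the same inputs leaves the sink unchanged (idempotent),
    which is what makes at-least-once delivery safe.
    """
    new_watermark = watermark
    for row in sorted(rows, key=lambda r: r["updated_at"]):
        if row["updated_at"] <= watermark:
            continue  # already loaded in a previous run
        sink[row["id"]] = row  # upsert: overwrite, never append duplicates
        new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark
```

Persist the returned watermark only after the write commits; persisting it first turns a crash into silent data loss.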

Stage 4: Orchestration & Dependencies

Goal: Clear DAG; retry policy; backfill without double counting; SLA miss alerts.
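"Backfill without double counting" usually means overwriting whole partitions rather than appending. A sketch under that assumption, with a dict standing in for a date-partitioned table:

```python
from collections import defaultdict

def backfill_partition(table: dict, date: str, rows: list[dict]) -> None:
    """Replace one date partition wholesale; re-running never double counts."""
    table[date] = list(rows)  # overwrite the partition, don't append to it

def daily_totals(table: dict) -> dict:
    """Aggregate amounts per partition (a stand-in for a downstream rollup)."""
    totals = defaultdict(int)
    for date, rows in table.items():
        for row in rows:
            totals[date] += row["amount"]
    return dict(totals)
```

Because each run replaces its partition atomically, retries and re-runs of a failed backfill are safe by construction.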


Stage 5: Quality & Monitoring

Goal: Data quality checks (null spikes, row bounds, referential checks); metrics on lag, duration, error rate.
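Checks like these reduce to small assertions over a batch. A sketch with placeholder thresholds and a hypothetical `customer_id` column:

```python
def quality_report(rows: list[dict], min_rows: int, max_null_rate: float) -> list[str]:
    """Return the names of failed checks: row-count bounds and null spikes."""
    failures = []
    if len(rows) < min_rows:
        failures.append("row_count_below_minimum")
    if rows:
        null_rate = sum(1 for r in rows if r.get("customer_id") is None) / len(rows)
        if null_rate > max_null_rate:
            failures.append("customer_id_null_spike")
    return failures
```

Emit the report as metrics (lag, duration, error rate alongside it) so alerts fire on trends, not just hard failures.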


Stage 6: Lineage & Operations

Goal: Column-level lineage where valuable; on-call runbook; ownership per pipeline.
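Even a hand-maintained lineage map speeds up on-call triage when a downstream number looks wrong. A minimal sketch with hypothetical table and column names:

```python
# downstream column -> direct upstream sources ("table.column").
# Names are illustrative only.
LINEAGE = {
    "marts.revenue.amount_usd": ["staging.orders.amount_cents", "staging.fx.rate"],
    "marts.revenue.order_day":  ["staging.orders.created_at"],
}

def upstream_of(column: str) -> list[str]:
    """List the direct upstream columns feeding a downstream column."""
    return LINEAGE.get(column, [])
```

Pairing each lineage entry with an owner gives the runbook a first responder per pipeline.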


Final Review Checklist

  • SLAs and failure policy explicit
  • Source contracts and schema evolution path
  • Idempotent writes and checkpointing
  • Orchestration with retries and safe backfill
  • Data quality checks and alerts
  • Lineage and ownership documented

Tips for Effective Guidance

  • Track compute and storage costs separately; large shuffles dominate compute, not storage.
  • Pair with etl-design for batch patterns and message-queues for streaming handoffs.

Handling Deviations

  • Single-script pipelines: still document inputs, outputs, and schedule.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
