data-pipelines

Deep data pipeline workflow—ingestion, orchestration, idempotency, data quality, SLAs, observability, and lineage. Use when building batch/stream pipelines, debugging job failures, or hardening ETL/ELT.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "data-pipelines" with this command: npx skills add mike47512/data-pipelines

Data Pipelines

Pipelines fail on silent schema drift, partial writes, and unclear ownership. Design for at-least-once delivery, idempotent sinks, and observable stages.

When to Offer This Workflow

Trigger conditions:

  • Batch or streaming ingestion (Kafka, Fivetran, Airflow, Dagster, Spark, etc.)
  • Late data, backfills, or schema changes breaking jobs
  • SLA misses on freshness or row counts

Initial offer:

Use six stages: (1) requirements & SLAs, (2) source contracts, (3) transforms & idempotency, (4) orchestration & dependencies, (5) quality & monitoring, (6) lineage & operations. Confirm batch vs stream and the cloud stack.


Stage 1: Requirements & SLAs

Goal: Pin down freshness (latency), completeness expectations, a cost ceiling, and failure tolerance (quarantine bad records vs stop the line).

Exit condition: an SLA table mapping pipeline → metric → threshold.
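The SLA table can be as simple as a keyed map plus a check function. A minimal sketch, with hypothetical pipeline names, metrics, and thresholds:

```python
# Hypothetical SLA table: pipeline -> metric -> threshold.
# Names and numbers are illustrative, not prescriptive.
SLAS = {
    "orders_daily":  {"freshness_minutes": 90, "min_row_count": 10_000},
    "clicks_stream": {"freshness_minutes": 5,  "max_error_rate": 0.01},
}

def sla_violations(pipeline: str, observed: dict) -> list[str]:
    """Return the SLA metrics this pipeline currently violates."""
    thresholds = SLAS[pipeline]
    violations = []
    if observed.get("freshness_minutes", 0) > thresholds["freshness_minutes"]:
        violations.append("freshness_minutes")
    if "min_row_count" in thresholds and observed.get("row_count", 0) < thresholds["min_row_count"]:
        violations.append("min_row_count")
    if "max_error_rate" in thresholds and observed.get("error_rate", 0) > thresholds["max_error_rate"]:
        violations.append("max_error_rate")
    return violations
```

Keeping thresholds in data rather than scattered through job code makes the failure policy explicit and reviewable.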


Stage 2: Source Contracts

Goal: Version schemas explicitly; choose CDC vs snapshot pulls; account for API rate limits.

Practices

  • Keep the raw landing zone immutable; build curated layers downstream
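A source contract check can run at the landing-zone boundary before anything is promoted downstream. A sketch, assuming records arrive as dicts and the contract is a simple name→type map (both hypothetical):

```python
# Hypothetical contract for one source; real contracts would be versioned.
EXPECTED_SCHEMA = {"order_id": str, "amount_cents": int, "created_at": str}

def schema_drift(record: dict) -> dict:
    """Report missing, unexpected, and mistyped fields against the contract."""
    missing = [k for k in EXPECTED_SCHEMA if k not in record]
    unexpected = [k for k in record if k not in EXPECTED_SCHEMA]
    mistyped = [k for k, t in EXPECTED_SCHEMA.items()
                if k in record and not isinstance(record[k], t)]
    return {"missing": missing, "unexpected": unexpected, "mistyped": mistyped}
```

Routing any record with a non-empty report to quarantine (rather than failing mid-write) keeps drift from becoming a partial-write incident.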

Stage 3: Transforms & Idempotency

Goal: Deterministic transforms; upsert keys; partition strategy for rewinds.

Practices

  • Watermark progress for incremental loads
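The watermark + upsert pattern above can be sketched in pure Python; the sink dict stands in for a real keyed table, and the field names are illustrative:

```python
def incremental_load(sink: dict, rows: list[dict], watermark: str) -> str:
    """Upsert rows newer than the watermark, keyed by 'id'.

    Re-running with the same inputs leaves the sink unchanged (idempotent),
    which is what makes at-least-once delivery safe.
    """
    new_watermark = watermark
    for row in sorted(rows, key=lambda r: r["updated_at"]):
        if row["updated_at"] <= watermark:
            continue  # already loaded in a previous run
        sink[row["id"]] = row  # upsert: overwrite, never append duplicates
        new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark
```

Persist the returned watermark only after the write commits; persisting it first turns a crash into silent data loss.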

Stage 4: Orchestration & Dependencies

Goal: Clear DAG; retry policy; backfill without double counting; SLA miss alerts.
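"Backfill without double counting" usually means overwriting whole partitions rather than appending. A sketch under that assumption, with a dict standing in for a date-partitioned table:

```python
from collections import defaultdict

def backfill_partition(table: dict, date: str, rows: list[dict]) -> None:
    """Replace one date partition wholesale; re-running never double counts."""
    table[date] = list(rows)  # overwrite the partition, don't append to it

def daily_totals(table: dict) -> dict:
    """Aggregate amounts per partition (a stand-in for a downstream rollup)."""
    totals = defaultdict(int)
    for date, rows in table.items():
        for row in rows:
            totals[date] += row["amount"]
    return dict(totals)
```

Because each run replaces its partition atomically, retries and re-runs of a failed backfill are safe by construction.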


Stage 5: Quality & Monitoring

Goal: Data quality checks (null spikes, row bounds, referential checks); metrics on lag, duration, error rate.
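Checks like these reduce to small assertions over a batch. A sketch with placeholder thresholds and a hypothetical `customer_id` column:

```python
def quality_report(rows: list[dict], min_rows: int, max_null_rate: float) -> list[str]:
    """Return the names of failed checks: row-count bounds and null spikes."""
    failures = []
    if len(rows) < min_rows:
        failures.append("row_count_below_minimum")
    if rows:
        null_rate = sum(1 for r in rows if r.get("customer_id") is None) / len(rows)
        if null_rate > max_null_rate:
            failures.append("customer_id_null_spike")
    return failures
```

Emit the report as metrics (lag, duration, error rate alongside it) so alerts fire on trends, not just hard failures.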


Stage 6: Lineage & Operations

Goal: Column-level lineage where valuable; on-call runbook; ownership per pipeline.
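Even a hand-maintained lineage map speeds up on-call triage when a downstream number looks wrong. A minimal sketch with hypothetical table and column names:

```python
# downstream column -> direct upstream sources ("table.column").
# Names are illustrative only.
LINEAGE = {
    "marts.revenue.amount_usd": ["staging.orders.amount_cents", "staging.fx.rate"],
    "marts.revenue.order_day":  ["staging.orders.created_at"],
}

def upstream_of(column: str) -> list[str]:
    """List the direct upstream columns feeding a downstream column."""
    return LINEAGE.get(column, [])
```

Pairing each lineage entry with an owner gives the runbook a first responder per pipeline.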


Final Review Checklist

  • SLAs and failure policy explicit
  • Source contracts and schema evolution path
  • Idempotent writes and checkpointing
  • Orchestration with retries and safe backfill
  • Data quality checks and alerts
  • Lineage and ownership documented

Tips for Effective Guidance

  • Track compute and storage costs separately; large shuffles dominate compute, not storage.
  • Pair with etl-design for batch patterns and message-queues for streaming handoffs.

Handling Deviations

  • Single-script pipelines: still document inputs, outputs, and schedule.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
