etl-design

Deep ETL/ELT design workflow—extract patterns, transforms, loading strategies, idempotency, validation, and reconciliation. Use when designing batch data flows between systems or hardening pipelines for correctness.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "etl-design" with this command: npx skills add mike47512/etl-design

ETL Design

ETL is correctness under change: schema drift, partial loads, retries, and reconciliation with upstream systems.

When to Offer This Workflow

Trigger conditions:

  • Batch loads into warehouse or data lake
  • Choosing between CDC, snapshots, and incremental watermarks
  • Missing rows, duplicates, or inconsistent aggregates downstream

Initial offer:

Use six stages: (1) source contract, (2) extract strategy, (3) transform rules, (4) load & dedupe, (5) validation, (6) operations & backfill). Confirm batch window and SLA.


Stage 1: Source Contract

Goal: Document schema, primary keys, change indicators (updated_at, CDC log position), and access constraints (rate limits, read replicas).


Stage 2: Extract Strategy

Goal: Full dump vs incremental watermark vs CDC—trade freshness, source load, and complexity.

Practices

  • CDC for large sources; snapshots for small or infrequent tables

Stage 3: Transform Rules

Goal: Deterministic transforms; surrogate keys; business rules versioned; handling of deletes (tombstones vs hard deletes).


Stage 4: Load & Dedupe

Goal: Upsert keys; partitions; rerunnable jobs with same batch id producing the same outcome (idempotent load).


Stage 5: Validation

Goal: Row counts, checksums, key uniqueness, referential checks; alert on threshold breaches.


Stage 6: Operations & Backfill

Goal: Replay by date range; monitor lag; dead-letter or quarantine bad rows with reason codes.


Final Review Checklist

  • Source contract and keys documented
  • Extract mode matches SLA and source constraints
  • Transforms deterministic and versioned
  • Idempotent load strategy
  • Validation and reconciliation defined

Tips for Effective Guidance

  • Plan for late-arriving facts and slowly changing dimensions in analytics paths.
  • Pair with data-pipelines for orchestration and monitoring.

Handling Deviations

  • Near-real-time: document micro-batch or streaming semantics separately.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

Canonry Setup

Agent-first AEO operating platform.

Registry SourceRecently Updated
4151arberx
Automation

Pilot Service Agents Entertainment

Games, manga/anime, trivia, and fandom APIs — PokeAPI, Jikan, CheapShark, misc. Use this skill when: 1. Pokémon / PokeAPI lookups 2. Anime or manga metadata...

Registry SourceRecently Updated
Automation

Pilot Service Agents Economics

Macroeconomic indicators — IMF DataMapper, World Bank, Eurostat SDMX, Coinbase reference prices. Use this skill when: 1. Country-level GDP, inflation, or une...

Registry SourceRecently Updated
Automation

Pilot Service Agents Flights

Aircraft tracking and aviation weather — ADS-B feeds (ICAO + bbox), airport directory, METAR/TAF/SIGMET. Use this skill when: 1. Live aircraft positions by I...

Registry SourceRecently Updated