ETL Patterns

Orchestrator for production-grade Extract-Transform-Load patterns.

Skill Routing

Need Skill Content

Reliability patterns etl-core-patterns

Idempotency, checkpointing, error handling, chunking, retry, logging

Load strategies etl-incremental-patterns

Backfill, timestamp-based, CDC, pipeline orchestration

Pattern Selection Guide

By Reliability Need

Need Pattern Skill

Repeatable runs Idempotency etl-core-patterns

Resume after failure Checkpointing etl-core-patterns

Handle bad records Error handling + DLQ etl-core-patterns

Memory management Chunked processing etl-core-patterns

Network resilience Retry with backoff etl-core-patterns

Observability Structured logging etl-core-patterns

By Load Strategy

Scenario Pattern Skill

Small tables (<100K) Full refresh etl-incremental-patterns

Large tables Timestamp incremental etl-incremental-patterns

Real-time sync CDC events etl-incremental-patterns

Historical migration Parallel backfill etl-incremental-patterns

Zero-downtime refresh Swap pattern etl-incremental-patterns

Multi-step pipelines Pipeline orchestration etl-incremental-patterns

Quick Reference

Idempotency Options

Small datasets: Delete-then-insert

Large datasets: UPSERT on conflict

Change detection: Row hash comparison

Load Strategy Decision

Is table < 100K rows? → Full refresh

Has reliable timestamp column? → Timestamp incremental

Source supports CDC? → CDC event processing

Need zero downtime? → Swap pattern (temp table → rename)

One-time historical load? → Parallel backfill with date ranges

Common Pipeline Structure

1. Setup

checkpoint = Checkpoint('.etl_checkpoint.json') processor = ETLProcessor()

2. Extract (with incremental)

df = incremental_by_timestamp(source_table, 'updated_at')

3. Transform (with error handling)

transformed = processor.process_batch(df.to_dict('records'))

4. Load (with idempotency)

upsert_records(pd.DataFrame(transformed))

5. Checkpoint

checkpoint.set_last_processed('sync', df['updated_at'].max())

6. Handle failures

processor.save_failures('failures/')

Related Skills

data-validation
Validate data quality during ETL
data-quality
Monitor data quality metrics
pandas-coder
DataFrame transformations

etl-patterns

Safety Notice

Copy this and send it to your AI assistant to learn

Small datasets: Delete-then-insert

Large datasets: UPSERT on conflict

Change detection: Row hash comparison

1. Setup

2. Extract (with incremental)

3. Transform (with error handling)

4. Load (with idempotency)

5. Checkpoint

6. Handle failures

Source Transparency

Related Skills

google-ads-strategy

viral-content

market-research

free-tool-arsenal