dbt Refactoring

Find ALL downstream dependencies before changing. Refactor in small steps. Verify output after each change.

Workflow

Analyze Current Model

cat models/<path>/<model_name>.sql

Identify refactoring opportunities:

CTEs longer than 50 lines → extract to intermediate model
Logic repeated across models → extract to macro
Multiple joins in sequence → split into steps
Complex WHERE clauses → extract to staging filter

Find All Downstream Dependencies

CRITICAL: Never refactor without knowing impact.

Get full dependency tree (model and all its children)

dbt ls --select model_name+ --output list

Find all models referencing this one

grep -r "ref('model_name')" models/ --include="*.sql"

Report to user: "Found X downstream models: [list]. These will be affected by changes."

Check What Columns Downstream Models Use

BEFORE changing any columns, check what downstream models reference:

For each downstream model, check what columns it uses

cat models/<path>/<downstream_model>.sql | grep -E "model_name.\w+|alias.\w+"

If downstream models reference specific columns, you MUST ensure those columns remain available after refactoring.

Plan Refactoring Strategy

Opportunity Strategy

Long CTE Extract to intermediate model

Repeated logic Create macro in macros/

Complex join Split into intermediate models

Multiple concerns Separate into focused models

Execute Refactoring

Pattern: Extract CTE to Model

Before:

-- orders.sql (200 lines) with customer_metrics as ( -- 50 lines of complex logic ), order_enriched as ( select ... from orders join customer_metrics on ... ) select * from order_enriched

After:

-- customer_metrics.sql (new file) select customer_id, -- complex logic here from {{ ref('customers') }}

-- orders.sql (simplified) with order_enriched as ( select ... from {{ ref('raw_orders') }} orders join {{ ref('customer_metrics') }} cm on ... ) select * from order_enriched

Pattern: Extract to Macro

Before (repeated in multiple models):

case when amount < 0 then 'refund' when amount = 0 then 'zero' else 'positive' end as amount_category

After:

-- macros/categorize_amount.sql {% macro categorize_amount(column_name) %} case when {{ column_name }} < 0 then 'refund' when {{ column_name }} = 0 then 'zero' else 'positive' end {% endmacro %}

-- In models: {{ categorize_amount('amount') }} as amount_category

Validate Changes

Compile to check syntax

dbt compile --select +model_name+

Build entire lineage

dbt build --select +model_name+

Check row counts (manual)

Before: Record expected counts

After: Verify counts match

Verify Output Matches Original

CRITICAL: Refactoring should not change output.

Compare row counts before and after

dbt show --inline "select count(*) from {{ ref('model_name') }}"

Spot check key values

dbt show --select <model_name> --limit 10

Update Downstream Models

If changing output columns:

Update all downstream refs
Update schema.yml documentation
Re-run downstream tests

Refactoring Checklist

All downstream dependencies identified
User informed of impact scope
One change at a time
Compile passes after each change
Build passes after each change
Output validated (row counts match)
Documentation updated
Tests still pass

Common Refactoring Triggers

Symptom Refactoring

Model > 200 lines Extract CTEs to models

Same logic in 3+ models Extract to macro

5+ joins in one model Create intermediate models

Hard to understand Add CTEs with clear names

Slow performance Split to allow parallelization

Anti-Patterns

Refactoring without checking downstream impact
Making multiple changes at once
Not validating output matches after refactoring
Extracting prematurely (wait for 3+ uses)
Breaking existing tests without updating them

refactoring-dbt-models

Safety Notice

Copy this and send it to your AI assistant to learn

Get full dependency tree (model and all its children)

Find all models referencing this one

For each downstream model, check what columns it uses

Compile to check syntax

Build entire lineage

Check row counts (manual)

Before: Record expected counts

After: Verify counts match

Compare row counts before and after

Spot check key values

Source Transparency

Related Skills

documenting-dbt-models

creating-dbt-models

optimizing-query-text