# dbt Incremental Model Development
Choose the right strategy. Design the unique_key carefully. Handle edge cases.
## When to Use Incremental

| Scenario | Recommendation |
|---|---|
| Source data < 10M rows | Use `table` (simpler, full refresh is fast) |
| Source data > 10M rows | Consider incremental |
| Source data updated in place | Use incremental with `merge` strategy |
| Append-only source (logs, events) | Use incremental with `append` strategy |
| Partitioned warehouse data | Use `insert_overwrite` if supported |

Default to `table` unless you have a clear performance reason for incremental.
## Critical Rules

- ALWAYS test with `--full-refresh` first before relying on incremental logic
- ALWAYS verify `unique_key` is truly unique in both source and target
- If a merge fails 3+ times, check the `unique_key` for duplicates
- Run a full refresh periodically to prevent data drift
## Workflow

### 1. Confirm Incremental is Needed

Check the source table size:

```bash
dbt show --inline "select count(*) from {{ source('schema', 'table') }}"
```

If the count is under 10 million, consider using `table` instead. Incremental adds complexity.
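For small sources, the simpler alternative is just a plain table rebuild on every run — a minimal sketch (the source name is a placeholder):

```sql
-- Rebuild the whole table each run; no incremental logic to maintain
{{ config(materialized='table') }}

select *
from {{ source('schema', 'table') }}
```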
### 2. Understand the Source Data Pattern

Before choosing a strategy, answer:

- Is data append-only? (new rows added, never updated)
- Are existing rows updated? (need merge/upsert)
- Is there a reliable timestamp? (for filtering new data)
- What's the unique identifier? (for merge matching)

Check for a timestamp column:

```bash
dbt show --inline "
  select
    min(updated_at) as earliest,
    max(updated_at) as latest,
    count(distinct date(updated_at)) as days_of_data
  from {{ source('schema', 'table') }}
"
```
### 3. Choose the Right Strategy

| Strategy | Use When | How It Works |
|---|---|---|
| `append` | Data is append-only, no updates | INSERT only, no deduplication |
| `merge` | Data can be updated | MERGE/UPSERT by unique_key |
| `delete+insert` | Data updated in batches | DELETE matching rows, then INSERT |
| `insert_overwrite` | Partitioned tables (BigQuery, Spark) | Replace entire partitions |

Default: `merge` is safest for most use cases.

Note: Strategy availability varies by adapter. Check the dbt incremental strategy docs for your specific warehouse.
### 4. Design the Unique Key

CRITICAL: `unique_key` must be truly unique in your data.

Verify uniqueness BEFORE creating the model:

```bash
dbt show --inline "
  select {{ unique_key_column }}, count(*)
  from {{ source('schema', 'table') }}
  group by 1
  having count(*) > 1
  limit 10
"
```

If duplicates exist:

- Add more columns to make a composite key (see the sketch below)
- Add deduplication logic in the model
- Use `delete+insert` instead of `merge`
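For example, a composite key can be passed as a list on most adapters — a sketch, with placeholder column names:

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    -- hypothetical composite key: the pair of columns together identifies a row
    unique_key=['order_id', 'line_item_id']
) }}

select order_id, line_item_id, quantity, updated_at
from {{ source('schema', 'table') }}

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```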
### 5. Write the Incremental Model

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',           -- or append, delete+insert
    unique_key='id',                        -- MUST be unique
    on_schema_change='append_new_columns'   -- handle new columns
) }}

select
    id,
    column_a,
    column_b,
    updated_at
from {{ source('schema', 'table') }}

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```
### 6. Build with Full Refresh First

ALWAYS verify with a full refresh before trusting incremental logic.

First run: full refresh to establish a baseline:

```bash
dbt build --select <model_name> --full-refresh
```

Verify the output:

```bash
dbt show --select <model_name> --limit 10
dbt show --inline "select count(*) from {{ ref('model_name') }}"
```
### 7. Test Incremental Logic

Run incrementally (no `--full-refresh`):

```bash
dbt build --select <model_name>
```

Verify the row count changed appropriately:

```bash
dbt show --inline "select count(*) from {{ ref('model_name') }}"
```
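It also helps to confirm the incremental run introduced no duplicate keys in the target — a sketch, assuming `id` is the unique_key and `model_name` is your model:

```bash
dbt show --inline "
  select id, count(*)
  from {{ ref('model_name') }}
  group by 1
  having count(*) > 1
  limit 10
"
```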
### 8. Handle Schema Changes

Set `on_schema_change` based on your needs:

| Setting | Behavior |
|---|---|
| `ignore` (default) | New columns in source are ignored |
| `append_new_columns` | New columns added to target |
| `sync_all_columns` | Target schema matches source exactly |
| `fail` | Error if schema changes |
## Common Incremental Problems

### Problem: Merge Fails with Duplicate Key

Symptom: "Cannot MERGE with duplicate values"

Cause: Multiple rows with the same unique_key in source or target.

Fix:

```sql
-- Add deduplication using a CTE (cross-database compatible)
with deduplicated as (

    select
        *,
        row_number() over (partition by id order by updated_at desc) as rn
    from {{ source('schema', 'table') }}

    {% if is_incremental() %}
    where updated_at > (select max(updated_at) from {{ this }})
    {% endif %}

)

select * from deduplicated
where rn = 1
```
### Problem: No Partition Pruning (Full Table Scan)

Symptom: Incremental runs take as long as a full refresh.

Cause: A dynamic date filter (subquery against the target) prevents partition pruning.

Fix:

```sql
{% if is_incremental() %}
-- Add a static date bound alongside the subquery so the warehouse can prune partitions
where updated_at >= {{ dbt.dateadd('day', -3, dbt.current_timestamp()) }}
  and updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```
### Problem: Late-Arriving Data is Missed

Symptom: Some records never appear in the incremental model.

Cause: Filtering by max(updated_at) misses late arrivals.

Fix: Use a lookback window with a fixed offset from the current date:

```sql
{% if is_incremental() %}
-- Look back 3 days to catch late-arriving data
where updated_at >= {{ dbt.dateadd('day', -3, dbt.current_timestamp()) }}
{% endif %}
```

Alternatively, use a variable for the lookback period:

```sql
{% set lookback_days = 3 %}

{% if is_incremental() %}
where updated_at >= {{ dbt.dateadd('day', -lookback_days, dbt.current_timestamp()) }}
{% endif %}
```
### Problem: Schema Drift Causes Errors

Symptom: "Column X not found" after the source adds a column.

Fix: Set `on_schema_change='append_new_columns'` in the config.
### Problem: Data Drift Over Time

Symptom: Counts diverge between incremental and full-refresh builds.

Fix: Schedule a periodic full refresh:

```bash
# Weekly full refresh
dbt build --select <model_name> --full-refresh
```
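One way to operationalize this — a sketch, assuming an external scheduler and a hypothetical tag name — is to tag the models that need the periodic rebuild and target the tag in a weekly job:

```sql
-- In the model config: tag it so the weekly job can select it
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='id',
    tags=['weekly_full_refresh']
) }}
```

```bash
# Weekly job (scheduled in cron or your orchestrator)
dbt build --select tag:weekly_full_refresh --full-refresh
```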
## Incremental Strategy Reference

### Append (Simplest)

```sql
{{ config(materialized='incremental', incremental_strategy='append') }}

select * from {{ source('events', 'raw') }}

{% if is_incremental() %}
where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```

- No unique_key needed
- Fastest performance
- Only use for append-only data (logs, events, immutable records)
### Merge (Default)

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='id'
) }}

select * from {{ source('crm', 'contacts') }}

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

- Requires unique_key
- Handles updates and inserts
- Most common strategy
### Delete+Insert (Batch Updates)

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='id'
) }}

select * from {{ source('orders', 'raw') }}

{% if is_incremental() %}
where order_date >= {{ dbt.dateadd('day', -7, dbt.current_timestamp()) }}
{% endif %}
```

- Deletes all matching rows first
- Good for reprocessing batches
- Use when merge has duplicate key issues
### Insert Overwrite (Partitioned)

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'}
) }}

select * from {{ source('events', 'raw') }}

{% if is_incremental() %}
where event_date >= {{ dbt.dateadd('day', -3, dbt.current_timestamp()) }}
{% endif %}
```

- Replaces entire partitions
- Best for partitioned tables in BigQuery/Spark
- No unique_key needed (operates on partitions)
## Anti-Patterns

- Using incremental for small tables (< 10M rows)
- Not testing with `--full-refresh` first
- Using the append strategy when data can be updated
- Not verifying unique_key uniqueness
- Relying on an exact timestamp match without a lookback window
- Never running a full refresh (causes data drift)
- Using merge with non-unique keys
## Testing Checklist

- Model runs with `--full-refresh`
- Model runs incrementally (without the flag)
- unique_key verified as truly unique
- Row counts reasonable after an incremental run
- Late-arriving data handled (lookback window)
- Schema changes handled (on_schema_change set)
- Periodic full refresh scheduled
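A minimal verification sequence covering the first items — a sketch, where `my_incremental_model` is a placeholder and `dbt test` assumes you have defined tests (e.g. unique / not_null) on the unique_key column:

```bash
dbt build --select my_incremental_model --full-refresh   # establish the baseline
dbt build --select my_incremental_model                  # incremental run
dbt test --select my_incremental_model                   # run the model's tests
```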