Analytics Engineer

The agent operates as a senior analytics engineer, building scalable dbt transformation layers, designing dimensional models, writing tested SQL, and managing semantic-layer metric definitions.

Workflow

Understand the data request -- Identify the business question, required grain, and downstream consumers (dashboard, notebook, reverse-ETL). Confirm source tables exist and check freshness.
Design the dimensional model -- Choose star or snowflake schema. Map source entities to dimension and fact tables at the correct grain. Document grain, primary keys, and foreign keys.
Build staging models -- One stg_ model per source table. Rename columns, cast types, filter soft-deletes, and add metadata columns. Validate: dbt build --select stg_* .
Build intermediate models -- Encapsulate reusable business logic in int_ models (e.g., int_orders_enriched ). Keep each CTE single-purpose.
Build mart models -- Create dim_ and fct_ models for consumption. Configure materialization (view for staging, incremental for large facts, table for small marts).
Add tests and documentation -- Every primary key gets unique

not_null . Foreign keys get relationships . Add accepted_values for enums. Write model descriptions in YAML.

Define semantic-layer metrics -- Register metrics (sum, average, count_distinct) with time grains and dimension slices so BI consumers get a single source of truth.
Validate end-to-end -- Run dbt build , confirm test pass rate = 100%, check row counts against source, and verify dashboard numbers match.

dbt Project Structure

analytics/ dbt_project.yml models/ staging/ # stg_<source>__<table>.sql (one per source table) intermediate/ # int_<entity><verb>.sql (reusable logic) marts/ core/ # dim.sql, fct_.sql (consumption-ready) marketing/ finance/ macros/ # Reusable Jinja helpers tests/ # Custom generic + singular tests seeds/ # Static CSV lookups snapshots/ # SCD Type 2 captures

Concrete Example: Customer Dimension

Staging model (models/staging/crm/stg_crm__customers.sql ):

WITH source AS ( SELECT * FROM {{ source('crm', 'customers') }} ),

renamed AS ( SELECT id AS customer_id, TRIM(LOWER(name)) AS customer_name, TRIM(LOWER(email)) AS email, created_at::timestamp AS created_at, updated_at::timestamp AS updated_at, is_active::boolean AS is_active, _fivetran_synced AS _loaded_at FROM source WHERE _fivetran_deleted = false )

SELECT * FROM renamed

Mart model (models/marts/core/dim_customer.sql ):

WITH customers AS ( SELECT * FROM {{ ref('stg_crm__customers') }} ),

customer_orders AS ( SELECT customer_id, MIN(order_date) AS first_order_date, MAX(order_date) AS most_recent_order_date, COUNT(*) AS lifetime_orders, SUM(order_amount) AS lifetime_value FROM {{ ref('stg_orders__orders') }} GROUP BY customer_id ),

final AS ( SELECT c.customer_id, c.customer_name, c.email, c.created_at, co.first_order_date, co.most_recent_order_date, co.lifetime_orders, co.lifetime_value, CASE WHEN co.lifetime_value >= 10000 THEN 'platinum' WHEN co.lifetime_value >= 5000 THEN 'gold' WHEN co.lifetime_value >= 1000 THEN 'silver' ELSE 'bronze' END AS customer_tier FROM customers c LEFT JOIN customer_orders co ON c.customer_id = co.customer_id )

SELECT * FROM final

Test configuration (models/marts/core/_core__models.yml ):

version: 2 models:

name: dim_customer description: Customer dimension with lifetime order metrics and tier classification. columns:
- name: customer_id tests: [unique, not_null]
- name: email tests: [unique, not_null]
- name: customer_tier tests:
  - accepted_values: values: ['platinum', 'gold', 'silver', 'bronze']
- name: lifetime_value tests:
  - dbt_utils.expression_is_true: expression: ">= 0"

Incremental Fact Table Pattern

-- models/marts/core/fct_orders.sql {{ config( materialized='incremental', unique_key='order_id', partition_by={'field': 'order_date', 'data_type': 'date'}, cluster_by=['customer_id', 'product_id'] ) }}

WITH orders AS ( SELECT * FROM {{ ref('stg_orders__orders') }} {% if is_incremental() %} WHERE order_date >= (SELECT MAX(order_date) FROM {{ this }}) {% endif %} ),

order_items AS ( SELECT * FROM {{ ref('stg_orders__order_items') }} ),

final AS ( SELECT o.order_id, o.order_date, o.customer_id, oi.product_id, o.store_id, oi.quantity, oi.unit_price, oi.quantity * oi.unit_price AS line_total, o.discount_amount, o.tax_amount, o.total_amount FROM orders o INNER JOIN order_items oi ON o.order_id = oi.order_id )

SELECT * FROM final

Materialization Strategy

Layer Materialization Rationale

Staging View Thin wrappers; no storage cost

Intermediate Ephemeral / View Business logic; referenced multiple times

Marts (small) Table Query performance for BI tools

Marts (large) Incremental Efficient appends for large fact tables

Semantic-Layer Metric Definition

models/marts/core/_core__metrics.yml

metrics:

name: revenue label: Total Revenue model: ref('fct_orders') calculation_method: sum expression: total_amount timestamp: order_date time_grains: [day, week, month, quarter, year] dimensions: [customer_tier, product_category, store_region] filters:
- field: is_cancelled operator: '=' value: 'false'
name: average_order_value label: Average Order Value model: ref('fct_orders') calculation_method: average expression: total_amount timestamp: order_date time_grains: [day, week, month]

Useful Macros

-- macros/cents_to_dollars.sql {% macro cents_to_dollars(column_name) %} ({{ column_name }} / 100.0)::decimal(18,2) {% endmacro %}

-- macros/get_incremental_filter.sql {% macro get_incremental_filter(column_name, lookback_days=3) %} {% if is_incremental() %} WHERE {{ column_name }} >= ( SELECT DATEADD(day, -{{ lookback_days }}, MAX({{ column_name }})) FROM {{ this }} ) {% endif %} {% endmacro %}

CI/CD: Slim CI for Pull Requests

Only run modified models and their downstream dependents

dbt run --select state:modified+ --defer --state ./target-base dbt test --select state:modified+ --defer --state ./target-base

For full CI/CD pipeline configuration, see REFERENCE.md .

Reference Materials

REFERENCE.md -- Extended patterns: source config, custom tests, CI/CD workflows, exposures, documentation templates
references/modeling_patterns.md -- Data modeling best practices
references/dbt_style_guide.md -- SQL and dbt conventions
references/testing_guide.md -- Testing strategies
references/optimization.md -- Performance tuning

Scripts

python scripts/impact_analyzer.py --model dim_customer python scripts/schema_diff.py --source prod --target dev python scripts/doc_generator.py --format markdown python scripts/quality_scorer.py --model fct_orders

analytics-engineer

Safety Notice

Copy this and send it to your AI assistant to learn

models/marts/core/_core__metrics.yml

Only run modified models and their downstream dependents

Source Transparency

Related Skills

product-designer

business-intelligence

brand-strategist