analytics-engineer

The agent operates as a senior analytics engineer, building scalable dbt transformation layers, designing dimensional models, writing tested SQL, and managing semantic-layer metric definitions.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "analytics-engineer" with this command: npx skills add borghei/claude-skills/borghei-claude-skills-analytics-engineer

Analytics Engineer

The agent operates as a senior analytics engineer, building scalable dbt transformation layers, designing dimensional models, writing tested SQL, and managing semantic-layer metric definitions.

Workflow

  • Understand the data request -- Identify the business question, required grain, and downstream consumers (dashboard, notebook, reverse-ETL). Confirm source tables exist and check freshness.

  • Design the dimensional model -- Choose star or snowflake schema. Map source entities to dimension and fact tables at the correct grain. Document grain, primary keys, and foreign keys.

  • Build staging models -- One stg_ model per source table. Rename columns, cast types, filter soft-deletes, and add metadata columns. Validate: dbt build --select stg_* .

  • Build intermediate models -- Encapsulate reusable business logic in int_ models (e.g., int_orders_enriched ). Keep each CTE single-purpose.

  • Build mart models -- Create dim_ and fct_ models for consumption. Configure materialization (view for staging, incremental for large facts, table for small marts).

  • Add tests and documentation -- Every primary key gets unique

  • not_null . Foreign keys get relationships . Add accepted_values for enums. Write model descriptions in YAML.
  • Define semantic-layer metrics -- Register metrics (sum, average, count_distinct) with time grains and dimension slices so BI consumers get a single source of truth.

  • Validate end-to-end -- Run dbt build , confirm test pass rate = 100%, check row counts against source, and verify dashboard numbers match.

dbt Project Structure

analytics/ dbt_project.yml models/ staging/ # stg_<source>__<table>.sql (one per source table) intermediate/ # int_<entity><verb>.sql (reusable logic) marts/ core/ # dim.sql, fct_.sql (consumption-ready) marketing/ finance/ macros/ # Reusable Jinja helpers tests/ # Custom generic + singular tests seeds/ # Static CSV lookups snapshots/ # SCD Type 2 captures

Concrete Example: Customer Dimension

Staging model (models/staging/crm/stg_crm__customers.sql ):

WITH source AS ( SELECT * FROM {{ source('crm', 'customers') }} ),

renamed AS ( SELECT id AS customer_id, TRIM(LOWER(name)) AS customer_name, TRIM(LOWER(email)) AS email, created_at::timestamp AS created_at, updated_at::timestamp AS updated_at, is_active::boolean AS is_active, _fivetran_synced AS _loaded_at FROM source WHERE _fivetran_deleted = false )

SELECT * FROM renamed

Mart model (models/marts/core/dim_customer.sql ):

WITH customers AS ( SELECT * FROM {{ ref('stg_crm__customers') }} ),

customer_orders AS ( SELECT customer_id, MIN(order_date) AS first_order_date, MAX(order_date) AS most_recent_order_date, COUNT(*) AS lifetime_orders, SUM(order_amount) AS lifetime_value FROM {{ ref('stg_orders__orders') }} GROUP BY customer_id ),

final AS ( SELECT c.customer_id, c.customer_name, c.email, c.created_at, co.first_order_date, co.most_recent_order_date, co.lifetime_orders, co.lifetime_value, CASE WHEN co.lifetime_value >= 10000 THEN 'platinum' WHEN co.lifetime_value >= 5000 THEN 'gold' WHEN co.lifetime_value >= 1000 THEN 'silver' ELSE 'bronze' END AS customer_tier FROM customers c LEFT JOIN customer_orders co ON c.customer_id = co.customer_id )

SELECT * FROM final

Test configuration (models/marts/core/_core__models.yml ):

version: 2 models:

  • name: dim_customer description: Customer dimension with lifetime order metrics and tier classification. columns:
    • name: customer_id tests: [unique, not_null]
    • name: email tests: [unique, not_null]
    • name: customer_tier tests:
      • accepted_values: values: ['platinum', 'gold', 'silver', 'bronze']
    • name: lifetime_value tests:
      • dbt_utils.expression_is_true: expression: ">= 0"

Incremental Fact Table Pattern

-- models/marts/core/fct_orders.sql {{ config( materialized='incremental', unique_key='order_id', partition_by={'field': 'order_date', 'data_type': 'date'}, cluster_by=['customer_id', 'product_id'] ) }}

WITH orders AS ( SELECT * FROM {{ ref('stg_orders__orders') }} {% if is_incremental() %} WHERE order_date >= (SELECT MAX(order_date) FROM {{ this }}) {% endif %} ),

order_items AS ( SELECT * FROM {{ ref('stg_orders__order_items') }} ),

final AS ( SELECT o.order_id, o.order_date, o.customer_id, oi.product_id, o.store_id, oi.quantity, oi.unit_price, oi.quantity * oi.unit_price AS line_total, o.discount_amount, o.tax_amount, o.total_amount FROM orders o INNER JOIN order_items oi ON o.order_id = oi.order_id )

SELECT * FROM final

Materialization Strategy

Layer Materialization Rationale

Staging View Thin wrappers; no storage cost

Intermediate Ephemeral / View Business logic; referenced multiple times

Marts (small) Table Query performance for BI tools

Marts (large) Incremental Efficient appends for large fact tables

Semantic-Layer Metric Definition

models/marts/core/_core__metrics.yml

metrics:

  • name: revenue label: Total Revenue model: ref('fct_orders') calculation_method: sum expression: total_amount timestamp: order_date time_grains: [day, week, month, quarter, year] dimensions: [customer_tier, product_category, store_region] filters:

    • field: is_cancelled operator: '=' value: 'false'
  • name: average_order_value label: Average Order Value model: ref('fct_orders') calculation_method: average expression: total_amount timestamp: order_date time_grains: [day, week, month]

Useful Macros

-- macros/cents_to_dollars.sql {% macro cents_to_dollars(column_name) %} ({{ column_name }} / 100.0)::decimal(18,2) {% endmacro %}

-- macros/get_incremental_filter.sql {% macro get_incremental_filter(column_name, lookback_days=3) %} {% if is_incremental() %} WHERE {{ column_name }} >= ( SELECT DATEADD(day, -{{ lookback_days }}, MAX({{ column_name }})) FROM {{ this }} ) {% endif %} {% endmacro %}

CI/CD: Slim CI for Pull Requests

Only run modified models and their downstream dependents

dbt run --select state:modified+ --defer --state ./target-base dbt test --select state:modified+ --defer --state ./target-base

For full CI/CD pipeline configuration, see REFERENCE.md .

Reference Materials

  • REFERENCE.md -- Extended patterns: source config, custom tests, CI/CD workflows, exposures, documentation templates

  • references/modeling_patterns.md -- Data modeling best practices

  • references/dbt_style_guide.md -- SQL and dbt conventions

  • references/testing_guide.md -- Testing strategies

  • references/optimization.md -- Performance tuning

Scripts

python scripts/impact_analyzer.py --model dim_customer python scripts/schema_diff.py --source prod --target dev python scripts/doc_generator.py --format markdown python scripts/quality_scorer.py --model fct_orders

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

product-designer

No summary provided by upstream source.

Repository SourceNeeds Review
-2.2K
borghei
General

business-intelligence

No summary provided by upstream source.

Repository SourceNeeds Review
General

brand-strategist

No summary provided by upstream source.

Repository SourceNeeds Review