# Dagster Integrations Index
Navigate 82+ Dagster integrations organized by Dagster's official taxonomy. Find AI/ML tools, ETL platforms, data storage, compute services, BI tools, and monitoring integrations.
## When to Use This Skill vs. Others

| If User Says... | Use This Skill/Command | Why |
| --- | --- | --- |
| "which integration for X" | /dagster-integrations | Need to discover appropriate integration |
| "does dagster support X" | /dagster-integrations | Check integration availability |
| "snowflake vs bigquery" | /dagster-integrations | Compare integrations in same category |
| "best practices for X" | /dagster-conventions | Implementation patterns needed |
| "implement X integration" | /dg:prototype | Ready to build with specific integration |
| "how do I use dbt" | /dagster-conventions (dbt section) | dbt-specific implementation patterns |
| "make this code better" | /dignified-python | Python code review needed |
| "create new project" | /dg:create-project | Project initialization needed |
## Quick Reference by Category

| Category | Count | Common Tools | Reference |
| --- | --- | --- | --- |
| AI & ML | 6 | OpenAI, Anthropic, MLflow, W&B | references/ai.md |
| ETL/ELT | 9 | dbt, Fivetran, Airbyte, PySpark | references/etl.md |
| Storage | 35+ | Snowflake, BigQuery, Postgres, DuckDB | references/storage.md |
| Compute | 15+ | AWS, Databricks, Spark, Docker, K8s | references/compute.md |
| BI & Visualization | 7 | Looker, Tableau, PowerBI, Sigma | references/bi.md |
| Monitoring | 3 | Datadog, Prometheus, Papertrail | references/monitoring.md |
| Alerting | 6 | Slack, PagerDuty, MS Teams, Twilio | references/alerting.md |
| Testing | 2 | Great Expectations, Pandera | references/testing.md |
| Other | 2+ | Pandas, Polars | references/other.md |
## Category Taxonomy

This index aligns with Dagster's official documentation taxonomy from tags.yml:

- ai: Artificial intelligence and machine learning integrations (LLM APIs, experiment tracking)
- etl: Extract, transform, and load tools, including data replication and transformation frameworks
- storage: Databases, data warehouses, object storage, and table formats
- compute: Cloud platforms, container orchestration, and distributed processing frameworks
- bi: Business intelligence and visualization platforms
- monitoring: Observability platforms and metrics systems for tracking performance
- alerting: Notification and incident management systems for pipeline alerts
- testing: Data quality validation and testing frameworks
- other: Miscellaneous integrations, including DataFrame libraries

Note: Support levels (dagster-supported, community-supported) are shown inline in each integration entry.

Last verified: 2026-01-27
## Finding the Right Integration

I need to...

**Load data from external sources**

- SaaS applications → ETL (Fivetran, Airbyte)
- Files/databases → ETL (dlt, Sling, Meltano)
- Cloud storage → Storage (S3, GCS, Azure Blob)

**Transform data**

- SQL transformations → ETL (dbt)
- Distributed transformations → ETL (PySpark)
- DataFrame operations → Other (Pandas, Polars)
- Large-scale processing → Compute (Spark, Dask, Ray)

**Store data**

- Cloud data warehouse → Storage (Snowflake, BigQuery, Redshift)
- Relational database → Storage (Postgres, MySQL)
- File/object storage → Storage (S3, GCS, Azure, LakeFS)
- Analytics database → Storage (DuckDB)
- Vector embeddings → Storage (Weaviate, Chroma, Qdrant)

**Validate data quality**

- Schema validation → Testing (Pandera)
- Quality checks → Testing (Great Expectations)

**Run ML workloads**

- LLM integration → AI (OpenAI, Anthropic, Gemini)
- Experiment tracking → AI (MLflow, W&B)
- Distributed training → Compute (Ray, Spark)

**Execute computation**

- Cloud compute → Compute (AWS, Azure, GCP, Databricks)
- Containers → Compute (Docker, Kubernetes)
- Distributed processing → Compute (Spark, Dask, Ray)

**Monitor pipelines**

- Team notifications → Alerting (Slack, MS Teams, PagerDuty)
- Metrics tracking → Monitoring (Datadog, Prometheus)
- Log aggregation → Monitoring (Papertrail)

**Visualize data**

- BI dashboards → BI (Looker, Tableau, PowerBI)
- Analytics platform → BI (Sigma, Hex, Evidence)
## Integration Categories

### AI & ML

Artificial intelligence and machine learning platforms, including LLM APIs and experiment tracking.

Key integrations:

- OpenAI - GPT models and embeddings API
- Anthropic - Claude AI models
- Gemini - Google's multimodal AI
- MLflow - Experiment tracking and model registry
- Weights & Biases - ML experiment tracking
- NotDiamond - LLM routing and optimization

See references/ai.md for all AI/ML integrations.
### ETL/ELT

Extract, transform, and load tools for data ingestion, transformation, and replication.

Key integrations:

- dbt - SQL-based transformation with automatic dependencies
- Fivetran - Automated SaaS data ingestion (component-based)
- Airbyte - Open-source ELT platform
- dlt - Python-based data loading (component-based)
- Sling - High-performance data replication (component-based)
- PySpark - Distributed data transformation
- Meltano - ELT for the modern data stack

See references/etl.md for all ETL/ELT integrations.
### Storage

Data warehouses, databases, object storage, vector databases, and table formats.

Key integrations:

- Snowflake - Cloud data warehouse with IO managers
- BigQuery - Google's serverless data warehouse
- DuckDB - In-process SQL analytics
- Postgres - Open-source relational database
- Weaviate - Vector database for AI search
- Delta Lake - ACID transactions for data lakes
- DataHub - Metadata catalog and lineage

See references/storage.md for all storage integrations.
### Compute

Cloud platforms, container orchestration, and distributed processing frameworks.

Key integrations:

- AWS - Cloud compute services (Glue, EMR, Lambda)
- Databricks - Unified analytics platform
- GCP - Google Cloud compute (Dataproc, Cloud Run)
- Spark - Distributed data processing engine
- Dask - Parallel computing framework
- Docker - Container execution with Pipes
- Kubernetes - Cloud-native orchestration
- Ray - Distributed computing for ML

See references/compute.md for all compute integrations.
### BI & Visualization

Business intelligence and visualization platforms for analytics and reporting.

Key integrations:

- Looker - Google's BI platform
- Tableau - Interactive dashboards
- PowerBI - Microsoft's BI tool
- Sigma - Cloud analytics platform
- Hex - Collaborative notebooks
- Evidence - Markdown-based BI
- Cube - Semantic layer platform

See references/bi.md for all BI integrations.
### Monitoring

Observability platforms and metrics systems for tracking pipeline performance.

Key integrations:

- Datadog - Comprehensive observability platform
- Prometheus - Time-series metrics collection
- Papertrail - Centralized log management

See references/monitoring.md for all monitoring integrations.
### Alerting

Notification and incident management systems for pipeline alerts.

Key integrations:

- Slack - Team messaging and alerts
- PagerDuty - Incident management for on-call
- MS Teams - Microsoft Teams notifications
- Twilio - SMS and voice notifications
- Apprise - Universal notification platform
- DingTalk - Team communication for Asian markets

See references/alerting.md for all alerting integrations.
### Testing

Data quality validation and testing frameworks for ensuring data reliability.

Key integrations:

- Great Expectations - Data validation with expectations
- Pandera - Statistical data validation for DataFrames

See references/testing.md for all testing integrations.
### Other

Miscellaneous integrations, including DataFrame libraries and utility tools.

Key integrations:

- Pandas - In-memory DataFrame library
- Polars - Fast DataFrame library with columnar storage

See references/other.md for other integrations.
## References

Integration details are organized in the following files:

- AI & ML: references/ai.md (AI and ML platforms, LLM APIs, experiment tracking)
- ETL/ELT: references/etl.md (data ingestion, transformation, and replication tools)
- Storage: references/storage.md (warehouses, databases, object storage, vector DBs)
- Compute: references/compute.md (cloud platforms, containers, distributed processing)
- BI & Visualization: references/bi.md (business intelligence and analytics platforms)
- Monitoring: references/monitoring.md (observability and metrics systems)
- Alerting: references/alerting.md (notifications and incident management)
- Testing: references/testing.md (data quality and validation frameworks)
- Other: references/other.md (DataFrame libraries and miscellaneous tools)
## Using Integrations

Most Dagster integrations follow a common pattern:

1. Install the package:

   ```bash
   pip install dagster-<integration>
   ```

2. Import and configure a resource:

   ```python
   import dagster as dg
   from dagster_<integration> import <Integration>Resource

   resource = <Integration>Resource(config_param=dg.EnvVar("ENV_VAR"))
   ```

3. Use it in your assets:

   ```python
   @dg.asset
   def my_asset(integration: <Integration>Resource):
       # Use the integration
       pass
   ```

For component-based integrations (dbt, Fivetran, dlt, Sling), see the specific reference files for scaffolding and configuration patterns.