Upstream Lineage: Sources
Trace the origins of data and answer "Where does this data come from?"
Lineage Investigation
Step 1: Identify the Target Type
Determine what we are tracing:
- Table
- Column
- DAG
Step 2: Find the Producing DAG
- List DAGs: use list_active_dags and list_paused_dags
- Read DAG source: use get_dag_source_code
- If a run exists, use analyse_dag_latest_run to see tasks and logs
Step 3: Trace Data Sources
From the DAG code, identify source tables and systems:
- SQL sources in FROM or JOIN clauses
- External sources via operator hooks or connection IDs
- Files in object storage
Use go_to_connections_view to inspect connection metadata.
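Pulling SQL sources out of DAG code can be sketched as a small pattern match. This is a heuristic under stated assumptions, not a SQL parser: the function name extract_source_tables and the sample query are hypothetical, and the pattern ignores subqueries, CTE names, and quoted identifiers.

```python
import re

# Matches the identifier that follows FROM or JOIN, e.g. "raw.orders".
# Heuristic only: subqueries, CTEs, and quoted identifiers are not handled.
_TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def extract_source_tables(sql: str) -> list[str]:
    """Return table names referenced after FROM or JOIN, in first-seen order."""
    seen: list[str] = []
    for name in _TABLE_REF.findall(sql):
        if name not in seen:
            seen.append(name)
    return seen


# Illustrative query only; it is not taken from a real DAG.
sample_sql = """
INSERT INTO analytics.orders_daily
SELECT o.order_date, c.region, SUM(o.amount)
FROM raw.orders AS o
JOIN dim.customers AS c ON c.id = o.customer_id
GROUP BY o.order_date, c.region
"""

print(extract_source_tables(sample_sql))
```

Treat the result as a list of candidates and confirm each one against the DAG code, since templated SQL or dynamically built table names will not match.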
Step 4: Build the Lineage Chain
Example:
TARGET: analytics.orders_daily
  ^
  +-- DAG: etl_daily_orders
        ^
        +-- SOURCE: raw.orders
        |
        +-- SOURCE: dim.customers
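A chain like the example can be generated from its parts once the producing DAG and its sources are known. The sketch below assumes a single producing DAG per target; render_lineage is a hypothetical helper, and the indentation widths are a presentation choice.

```python
def render_lineage(target: str, dag: str, sources: list[str]) -> str:
    """Render a target, its producing DAG, and the DAG's sources as a text tree."""
    lines = [
        f"TARGET: {target}",
        "  ^",
        f"  +-- DAG: {dag}",
        "        ^",
    ]
    for index, source in enumerate(sources):
        if index > 0:
            # Connector between sibling sources.
            lines.append("        |")
        lines.append(f"        +-- SOURCE: {source}")
    return "\n".join(lines)


print(render_lineage(
    "analytics.orders_daily",
    "etl_daily_orders",
    ["raw.orders", "dim.customers"],
))
```

If a source is itself produced by another DAG, repeat the investigation for that source and nest the result one level deeper.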
Step 5: Check Source Health
- Use get_dag_runs or get_dag_history on upstream DAGs
- For logs, use go_to_dag_log_view
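One way to turn run history into a health verdict is to compare the latest successful run against a freshness threshold. The record shape below (a "state" string and an ISO 8601 "end_date") is an assumption about what the run tools return, and latest_success and is_fresh are hypothetical helpers; adjust the keys to the actual output.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def latest_success(runs: list[dict]) -> Optional[datetime]:
    """Return the end time of the most recent successful run, if any."""
    ends = [
        datetime.fromisoformat(run["end_date"])
        for run in runs
        if run.get("state") == "success" and run.get("end_date")
    ]
    return max(ends, default=None)


def is_fresh(runs: list[dict], max_age: timedelta, now: datetime) -> bool:
    """True if a successful run finished within max_age of now."""
    last = latest_success(runs)
    return last is not None and now - last <= max_age


# Illustrative records only; the keys are assumed, not confirmed.
sample_runs = [
    {"state": "success", "end_date": "2024-01-02T06:00:00+00:00"},
    {"state": "failed", "end_date": "2024-01-02T11:00:00+00:00"},
]
check_time = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(is_fresh(sample_runs, timedelta(hours=24), check_time))
```

A failed run that is newer than the last success, as in the sample, is worth calling out in the report even when the source still counts as fresh.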
Lineage for Columns
- Find the column in the target table schema
- Search DAG source for references
- Trace transformations and mappings
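Searching DAG source for a column can be sketched as a whole-word scan that reports line numbers. This is a minimal sketch: find_column_references is a hypothetical helper, the sample source is illustrative, and a text match cannot tell a column from an unrelated identifier with the same name.

```python
import re


def find_column_references(source_code: str, column: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line mentioning the column."""
    # Word boundaries keep "amount" from matching "amount_usd".
    pattern = re.compile(rf"\b{re.escape(column)}\b")
    return [
        (number, line.strip())
        for number, line in enumerate(source_code.splitlines(), start=1)
        if pattern.search(line)
    ]


# Illustrative DAG fragment only; it is not taken from a real DAG.
dag_source = "\n".join([
    "SELECT o.amount * fx.rate AS amount_usd",
    "FROM raw.orders AS o",
    "JOIN raw.fx_rates AS fx ON fx.currency = o.currency",
])

for number, line in find_column_references(dag_source, "amount_usd"):
    print(number, line)
```

Each hit is a starting point for tracing: the expression in front of the alias names the upstream columns to follow next.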
Output: Lineage Report
Include:
- Summary of sources
- Lineage diagram
- Source details (connections, freshness)
- Transformation chain
- Data quality implications
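The report sections can be held in a small structure so that none is skipped. The LineageReport class, its field names, and the sample values are hypothetical, shown only to illustrate one possible layout.

```python
from dataclasses import dataclass, field


@dataclass
class LineageReport:
    """One field per report section listed above."""
    summary: str
    diagram: str
    source_details: dict[str, str] = field(default_factory=dict)
    transformations: list[str] = field(default_factory=list)
    quality_notes: list[str] = field(default_factory=list)

    def to_text(self) -> str:
        parts = ["Summary of sources", self.summary, ""]
        parts += ["Lineage diagram", self.diagram, ""]
        parts += ["Source details"]
        parts += [f"- {name}: {detail}" for name, detail in self.source_details.items()]
        parts += ["", "Transformation chain"]
        parts += [f"{i}. {step}" for i, step in enumerate(self.transformations, start=1)]
        parts += ["", "Data quality implications"]
        parts += [f"- {note}" for note in self.quality_notes]
        return "\n".join(parts)


# Placeholder values only; fill these in from the investigation.
report = LineageReport(
    summary="analytics.orders_daily is built from raw.orders and dim.customers.",
    diagram="TARGET: analytics.orders_daily",
    source_details={"raw.orders": "connection and freshness go here"},
    transformations=["join orders to customers", "aggregate by day"],
    quality_notes=["a stale source delays the target"],
)
print(report.to_text())
```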
Related Skills
- checking-freshness
- debugging-dags
- tracing-downstream-lineage
- annotating-task-lineage
- creating-openlineage-extractors