Area: Product Issues
Sub-Area: Lineage Display
Issue
Users may notice that expected upstream or downstream lineage relationships are not appearing in the DataHub UI, even when they know data relationships exist between tables. This can occur due to several factors including time-based filtering, missing lineage ingestion, or limitations with intermediate table handling.
You Might Be Asking
- Why don't I see upstream tables for my dataset even though I know they exist?
- What does the time range filter in the lineage view do?
- How do I troubleshoot missing cross-platform lineage relationships?
Solution
-
Check the Time Range Filter
DataHub's lineage view includes a time range filter (typically 7 days by default) that hides lineage relationships not updated within the selected window. This helps identify stale relationships, but it can also hide valid lineage that has not been re-ingested recently.
- Expand the time range filter to 14 days, 28 days, or "All time" to see if lineage appears
- The filter shows lineage based on when it was last ingested, not when the data relationship was created
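The effect of the filter can be illustrated with a small sketch. This is not DataHub's implementation; the edge records, the `last_updated` field, and the `visible_edges` helper are hypothetical names used only to show why an edge disappears when its last ingestion falls outside the window.

```python
from datetime import datetime, timedelta, timezone

def visible_edges(edges, now, window_days=None):
    """Return the edges last updated inside the window; None means "All time"."""
    if window_days is None:
        return list(edges)
    cutoff = now - timedelta(days=window_days)
    return [e for e in edges if e["last_updated"] >= cutoff]

# Hypothetical lineage edges with the time they were last ingested.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
edges = [
    {"upstream": "raw.orders", "last_updated": now - timedelta(days=2)},
    {"upstream": "raw.customers", "last_updated": now - timedelta(days=20)},
]

print([e["upstream"] for e in visible_edges(edges, now, 7)])   # ['raw.orders']
print([e["upstream"] for e in visible_edges(edges, now, 28)])  # ['raw.orders', 'raw.customers']
```

In this sketch the `raw.customers` edge still exists; it is only hidden at 7 days because it was last ingested 20 days ago.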
-
Verify Both Entities Exist in DataHub
Search for both the upstream and downstream tables in DataHub to confirm they have been ingested:
- Use the search functionality to find each table by name
- Check that the upstream table exists as a recognized entity
- Note the exact URNs and platform identifications
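When noting URNs, it can help to split them into platform, name, and environment so the two sides are easy to compare. The sketch below assumes the standard dataset URN layout `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`; the helper name and the example table are hypothetical.

```python
def parse_dataset_urn(urn):
    """Split a dataset URN into (platform, name, env)."""
    prefix = "urn:li:dataset:("
    if not (urn.startswith(prefix) and urn.endswith(")")):
        raise ValueError(f"not a dataset URN: {urn}")
    inner = urn[len(prefix):-1]
    # The first comma ends the platform URN; the last comma starts the env.
    platform_urn, rest = inner.split(",", 1)
    name, env = rest.rsplit(",", 1)
    platform = platform_urn.rsplit(":", 1)[-1]
    return platform, name, env

# Hypothetical URN copied from an entity page.
urn = "urn:li:dataset:(urn:li:dataPlatform:snowflake,ANALYTICS.PUBLIC.ORDERS,PROD)"
print(parse_dataset_urn(urn))  # ('snowflake', 'ANALYTICS.PUBLIC.ORDERS', 'PROD')
```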
-
Review Ingestion Configuration
For lineage to appear, it must be explicitly extracted during ingestion:
- Verify your ingestion recipe includes lineage extraction (e.g., include_table_lineage: true)
- Check ingestion logs for any errors or warnings related to lineage parsing
- For dbt sources, ensure emit_upstream_lineage: true is configured
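A quick way to review a recipe is to load it and check the lineage flags programmatically. The sketch below assumes the recipe has already been parsed into a dictionary with the usual `source` / `config` layout; `missing_lineage_flags` is a hypothetical helper, and the flag to check depends on your source type.

```python
def missing_lineage_flags(recipe, expected_flags):
    """Return the expected flags that are not explicitly set to true."""
    config = recipe.get("source", {}).get("config", {})
    return [flag for flag in expected_flags if config.get(flag) is not True]

# Hypothetical recipe, as it would look after parsing the YAML file.
recipe = {
    "source": {
        "type": "snowflake",
        "config": {"include_table_lineage": False},
    }
}

print(missing_lineage_flags(recipe, ["include_table_lineage"]))  # ['include_table_lineage']
```

Flags that are simply absent are reported too. A source may enable a flag by default, so treat an absent flag as something to confirm rather than as a definite misconfiguration.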
-
Check for Cross-Platform Lineage Issues
Cross-platform lineage (e.g., BigQuery to Snowflake) requires special attention:
- Ensure URN formatting is consistent across platforms
- Verify case sensitivity matching (Snowflake often uses uppercase while other platforms may use lowercase)
- Confirm that lineage ingestion includes references to both platforms
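Case mismatches are easy to miss by eye. As a rough sketch, the following compares the URNs referenced by lineage against the URNs that actually exist and reports pairs that differ only in letter case. The function name and the sample URNs are hypothetical.

```python
def find_case_mismatches(referenced_urns, ingested_urns):
    """Return (referenced, ingested) pairs that match only when case is ignored."""
    by_lower = {urn.lower(): urn for urn in ingested_urns}
    mismatches = []
    for ref in referenced_urns:
        match = by_lower.get(ref.lower())
        if match is not None and match != ref:
            mismatches.append((ref, match))
    return mismatches

# Hypothetical example: lineage points at an uppercase Snowflake name,
# while the table was ingested with a lowercase name.
referenced = ["urn:li:dataset:(urn:li:dataPlatform:snowflake,ANALYTICS.PUBLIC.ORDERS,PROD)"]
ingested = ["urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.public.orders,PROD)"]

for ref, actual in find_case_mismatches(referenced, ingested):
    print(f"lineage references {ref}\n  but entity exists as {actual}")
```

A non-empty result suggests the lineage edge points at a URN that does not correspond to the ingested entity, so the relationship does not show up on the entity you are looking at.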
-
Investigate API-Level Lineage Data
Use the DataHub GMS API to check if lineage exists at the metadata level:
curl -X GET "https://<your-datahub-host>/api/gms/entities/<dataset-urn>/aspects/upstreamLineage"
The response shows whether the lineage relationship has been ingested, even if it is not being displayed due to other factors.
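Dataset URNs contain characters such as `:`, `(`, and `,` that generally need URL-encoding when placed in a path. The sketch below builds the request URL with the URN encoded; the path is copied from the curl command above, while the host, the helper names, and the use of a bearer token are assumptions to adapt to your deployment.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

def upstream_lineage_url(base_url, dataset_urn):
    """Build the aspect URL with the dataset URN percent-encoded."""
    encoded = quote(dataset_urn, safe="")
    return f"{base_url.rstrip('/')}/api/gms/entities/{encoded}/aspects/upstreamLineage"

def fetch_upstream_lineage(base_url, dataset_urn, token=None):
    """Fetch the aspect as JSON; the bearer token header is an assumption."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = Request(upstream_lineage_url(base_url, dataset_urn), headers=headers)
    with urlopen(request) as response:
        return json.load(response)

# Hypothetical host and dataset.
print(upstream_lineage_url(
    "https://datahub.example.com",
    "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)",
))
```

If the response lists the expected upstream dataset but the UI does not show it, the relationship was ingested and the cause is more likely on the display side.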
-
Address Known Limitations
Be aware of scenarios where lineage may not be captured:
- When intermediate tables are created and dropped within the same pipeline in separate sessions
- When data transformations occur outside of tracked systems
- When custom ETL processes don't emit lineage metadata
Additional Notes
The time range filter is designed to help users identify stale metadata and focus on recent data relationships. If lineage appears when expanding the time range, consider whether your ingestion schedule needs to be adjusted to refresh lineage more frequently. For complex data pipelines with intermediate tables that are created and dropped, you may need to implement custom lineage ingestion or modify your pipeline to better support lineage tracking.
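One possible approach to custom lineage for pipelines with dropped intermediate tables is to compute direct edges between the persistent tables and emit those instead. The sketch below covers only that computation, under the assumption that you already know the pipeline's table-to-table edges and which tables are temporary. All names are hypothetical, and emitting the result to DataHub is left out.

```python
def collapse_intermediates(edges, intermediates):
    """Rewrite (upstream, downstream) edges so they skip intermediate tables."""
    upstreams_of = {}
    for up, down in edges:
        upstreams_of.setdefault(down, set()).add(up)

    def resolve(table, seen):
        # A persistent table resolves to itself; an intermediate table
        # resolves to the persistent tables that feed it.
        if table not in intermediates:
            return {table}
        if table in seen:  # guard against cycles
            return set()
        sources = set()
        for up in upstreams_of.get(table, ()):
            sources |= resolve(up, seen | {table})
        return sources

    collapsed = set()
    for down, ups in upstreams_of.items():
        if down in intermediates:
            continue
        for up in ups:
            for source in resolve(up, frozenset()):
                collapsed.add((source, down))
    return collapsed

# Hypothetical pipeline: raw.orders -> tmp.stage1 -> tmp.stage2 -> mart.orders
edges = [
    ("raw.orders", "tmp.stage1"),
    ("tmp.stage1", "tmp.stage2"),
    ("tmp.stage2", "mart.orders"),
    ("raw.customers", "mart.orders"),
]
print(sorted(collapse_intermediates(edges, {"tmp.stage1", "tmp.stage2"})))
# [('raw.customers', 'mart.orders'), ('raw.orders', 'mart.orders')]
```

The collapsed edges describe lineage between tables that persist, which is what a custom lineage ingestion step would need to record.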
Tags: lineage, missing-upstream, time-filter, cross-platform, ingestion, metadata, troubleshooting, api, stale-data