Area: Product Issues
Sub-Area: Browse Path Organization
Issue
On the Discover page, duplicate folders appear with different case variations (e.g., both "RAW" and "raw" folders for the same Snowflake database). Assets that should be grouped together end up split across separate folders, and some of them may appear to be missing lineage relationships.
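To confirm which folders are affected, it can help to list browse paths that differ only by letter case. The following is a minimal sketch, assuming you have already exported browse paths as lists of folder names; the function name and the sample data are hypothetical and not part of DataHub.

```python
from collections import defaultdict


def find_case_variant_folders(browse_paths):
    """Return folders that appear with more than one letter-case spelling.

    `browse_paths` is a hypothetical export: each entry is a list of folder
    names, for example ["snowflake", "RAW", "PUBLIC"].
    """
    # Key each folder by its case-folded path prefix so that "RAW" and
    # "raw" under the same parent land in the same bucket.
    variants = defaultdict(set)
    for path in browse_paths:
        for depth, folder in enumerate(path):
            key = tuple(part.casefold() for part in path[: depth + 1])
            variants[key].add(folder)

    # Keep only the positions where more than one spelling was observed.
    return {
        key: sorted(names)
        for key, names in variants.items()
        if len(names) > 1
    }


# Illustrative sample data (an assumption, not real output).
sample_paths = [
    ["snowflake", "RAW", "PUBLIC"],
    ["snowflake", "raw", "public"],
    ["snowflake", "ANALYTICS", "CORE"],
]
duplicates = find_case_variant_folders(sample_paths)
```

With the sample data, `duplicates` contains an entry for the `raw` database and one for its `public` schema, while `ANALYTICS` is not reported because it appears with a single spelling.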
You Might Be Asking
- Why do I see both uppercase and lowercase versions of the same database/schema folders?
- Why do assets in the lowercase folders appear to have no lineage?
- Why does this issue appear in production but not staging environments?
Solution
This issue occurs due to inconsistent browse path formats stored in the search index, often resulting from historical ingestion runs or configuration changes. Follow these steps to resolve:
- Verify Current Configuration
  Check your Snowflake ingestion recipe for the convert_urns_to_lowercase setting:

      source:
        type: snowflake
        config:
          # Ensure this is consistently set
          convert_urns_to_lowercase: true  # Recommended default

- Run Search Index Restoration
  The primary fix involves updating the search index to use consistent browse paths. Contact DataHub support to run a restore indices job scoped to the affected URNs, or run it yourself if you have admin access:

      # This will rewrite search index entries from the database
      # Scope to affected dataset URNs to avoid full reindex

- Perform Fresh Ingestion
  After the index restoration, run a fresh ingestion of your Snowflake source to ensure any remaining inconsistencies are resolved:

      datahub ingest -c your_snowflake_recipe.yml

- Enable Stateful Ingestion (Optional)
  To prevent future inconsistencies and clean up stale metadata, enable stateful ingestion:

      source:
        type: snowflake
        config:
          # Your existing config
          stateful_ingestion:
            enabled: true
            remove_stale_metadata: true
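The configuration checks in the steps above can be sketched as a small script. This is an illustration under stated assumptions: each recipe is already parsed into a Python dict (for example by a YAML parser), and the helper names `check_recipe` and `find_drift` are hypothetical, not DataHub APIs.

```python
def _source_config(recipe):
    """Return the source.config mapping of a parsed recipe, or {}."""
    return recipe.get("source", {}).get("config", {})


def check_recipe(recipe):
    """Return warnings for the settings discussed in the steps above."""
    warnings = []
    config = _source_config(recipe)

    # Setting the key explicitly avoids relying on connector defaults.
    lowercase = config.get("convert_urns_to_lowercase")
    if lowercase is None:
        warnings.append("convert_urns_to_lowercase is not set explicitly")
    elif lowercase is not True:
        warnings.append("convert_urns_to_lowercase should be true")

    stateful = config.get("stateful_ingestion", {})
    if not stateful.get("enabled", False):
        warnings.append("stateful_ingestion is not enabled")
    elif not stateful.get("remove_stale_metadata", False):
        warnings.append("remove_stale_metadata is not enabled")

    return warnings


def find_drift(recipes, key="convert_urns_to_lowercase"):
    """Return per-environment values of `key` if they disagree, else {}."""
    values = {env: _source_config(r).get(key) for env, r in recipes.items()}
    return values if len(set(values.values())) > 1 else {}


# Hypothetical recipes for two environments.
staging = {
    "source": {
        "type": "snowflake",
        "config": {
            "convert_urns_to_lowercase": True,
            "stateful_ingestion": {
                "enabled": True,
                "remove_stale_metadata": True,
            },
        },
    }
}
production = {
    "source": {
        "type": "snowflake",
        "config": {"convert_urns_to_lowercase": False},
    }
}

staging_warnings = check_recipe(staging)
production_warnings = check_recipe(production)
drift = find_drift({"staging": staging, "production": production})
```

Here `drift` reports that the two environments disagree on `convert_urns_to_lowercase`, which is the kind of configuration drift that can make the duplicate folders show up in one environment only.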
Additional Notes
The missing lineage on assets in lowercase folders is typically coincidental rather than causal: lineage is keyed off dataset URNs, not browse paths. Assets that appear without lineage are often backup tables, quickstart entities, or other assets that genuinely lack upstream or downstream relationships. The duplicate folders themselves commonly result from configuration drift between environments or from historical changes to case normalization settings. Setting convert_urns_to_lowercase: false is explicitly discouraged because it breaks lineage to other sources.
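To see why URN casing matters for lineage while browse paths do not, consider the sketch below. It uses a hand-rolled helper that mirrors the dataset URN layout rather than the DataHub SDK, and it assumes that another source emits lineage against lowercased Snowflake names; the table names and the edge map are hypothetical.

```python
def make_dataset_urn(platform, name, env="PROD", lowercase=True):
    """Build a dataset URN string, optionally lowercasing the name.

    Illustrative helper only; it imitates the URN layout and is not the
    DataHub SDK function.
    """
    if lowercase:
        name = name.lower()
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


table = "RAW.PUBLIC.ORDERS"
urn_lowercased = make_dataset_urn("snowflake", table, lowercase=True)
urn_preserved = make_dataset_urn("snowflake", table, lowercase=False)

# Assumption: a downstream edge was emitted by another source that
# refers to the table by its lowercased URN.
lineage_edges = {
    urn_lowercased: [
        make_dataset_urn("snowflake", "analytics.core.orders_daily"),
    ],
}

# Lineage lookups match on the exact URN string, so only the lowercased
# URN finds the edge; the case-preserved URN is a different entity.
has_lineage_lowercased = urn_lowercased in lineage_edges
has_lineage_preserved = urn_preserved in lineage_edges
```

The two URNs identify different entities even though they describe the same table, which is why mixing case settings can leave one copy without lineage regardless of the folder it is shown in.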
Tags: snowflake, case-sensitivity, browse-path, discover-page, search-index, duplicate-folders, lineage, ingestion-configuration, stateful-ingestion, asset-organization