Area: Ingestion Issues
Sub-Area: Schema Field URN Management
Issue
The Microsoft SQL Server connector's convert_column_urns_to_lowercase configuration setting improves cross-platform lineage extraction by normalizing column URN casing, but creates potential URN conflicts when tables contain columns whose names differ only by case (e.g., customer_name and Customer_Name in the same table).
You Might Be Asking
- What happens when two columns in the same table differ only by case after enabling lowercase conversion?
- Will schema ingestion fail or succeed with column name collisions?
- How does column-level lineage behave when URN conflicts occur?
- Should I enable this setting despite the potential for conflicts?
Solution
When convert_column_urns_to_lowercase is enabled and column URN collisions occur, DataHub's validation system prevents data corruption through the following behavior:
- Schema Ingestion Failure: DataHub's FieldPathValidator detects duplicate field paths and raises an error, rejecting the entire schema metadata aspect. The ingestion fails with: Cannot perform UPSERT action on proposal. SchemaMetadata aspect has duplicated field paths
- Dataset Page Display:
  - For first-time ingestion: No schema fields are shown (empty schema tab)
  - For re-ingestion: The schema tab shows the last successfully ingested schema before the collision was introduced
  - No visible error appears on the dataset page itself
- Lineage Emission: If a schema existed from prior successful ingestion, both physical columns map to the same lowercased URN: urn:li:schemaField:(urn:li:dataset:(...mssql...),customer_name). Lineage is attributed to this single URN, making it semantically ambiguous as to which physical column is the true source or target.
- Lineage Graph Exploration: The collision manifests as a single node representing the lowercased column name. All upstream and downstream lineage edges for either physical column merge into this one node, preventing distinction between the two case-variant columns.
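The collision described above can be illustrated with a minimal Python sketch. It is not DataHub's implementation: schema_field_urn and validate_field_paths are hypothetical helpers, the dataset URN is a made-up example, and the check only mimics the general idea of rejecting duplicated field paths.

```python
from collections import Counter
from typing import List


def schema_field_urn(dataset_urn: str, column: str, lowercase: bool) -> str:
    """Build a schemaField URN string (illustrative helper, not DataHub code)."""
    field_path = column.lower() if lowercase else column
    return f"urn:li:schemaField:({dataset_urn},{field_path})"


def validate_field_paths(field_paths: List[str]) -> None:
    """Raise if any field path occurs more than once (stand-in for the real validator)."""
    counts = Counter(field_paths)
    duplicates = sorted(path for path, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicated field paths: {duplicates}")


# Hypothetical dataset URN and column names, used only for this example.
DATASET = "urn:li:dataset:(urn:li:dataPlatform:mssql,sales.dbo.customers,PROD)"
COLUMNS = ["customer_name", "Customer_Name"]

urns_as_is = [schema_field_urn(DATASET, c, lowercase=False) for c in COLUMNS]
urns_lowered = [schema_field_urn(DATASET, c, lowercase=True) for c in COLUMNS]

if __name__ == "__main__":
    validate_field_paths(urns_as_is)  # passes: the two URNs are distinct
    try:
        validate_field_paths(urns_lowered)  # both columns collapse to one URN
    except ValueError as err:
        print(f"schema rejected: {err}")
```

With the original casing the two columns yield two distinct URNs; after lowercasing they yield one, which is why lineage for either physical column ends up on the same node.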
Recommended Implementation Approach
- Enable convert_column_urns_to_lowercase: true in your SQL Server connector configuration; the cross-platform lineage benefits typically outweigh the risks
- Monitor ingestion run reports for schema validation errors to detect when collisions occur
- If collisions are detected, resolve them at the source database level by renaming columns to eliminate case-only differences
- Set up alerting on ingestion failures to catch schema update freezes caused by URN collisions
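To find case-only collisions before they reach ingestion, one option is a small audit over the column list of the source database. The sketch below assumes the rows come from SQL Server's INFORMATION_SCHEMA.COLUMNS view, fetched with whatever database driver you already use. The function name find_case_only_collisions and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Query one might run to obtain the input rows (standard INFORMATION_SCHEMA view).
COLUMNS_QUERY = """
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
"""


def find_case_only_collisions(
    rows: Iterable[Tuple[str, str, str]],
) -> Dict[str, List[List[str]]]:
    """Map 'schema.table' to groups of column names that differ only by case."""
    groups: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for schema, table, column in rows:
        groups[(schema, table, column.lower())].append(column)

    collisions: Dict[str, List[List[str]]] = {}
    for (schema, table, _), names in groups.items():
        if len(set(names)) > 1:  # more than one distinct spelling
            collisions.setdefault(f"{schema}.{table}", []).append(sorted(names))
    return collisions


# Hypothetical sample rows standing in for the query result.
SAMPLE_ROWS = [
    ("dbo", "customers", "customer_name"),
    ("dbo", "customers", "Customer_Name"),
    ("dbo", "customers", "id"),
    ("dbo", "orders", "id"),
]

if __name__ == "__main__":
    for table, column_groups in find_case_only_collisions(SAMPLE_ROWS).items():
        print(table, column_groups)
```

Any table reported by such a check is a candidate for renaming columns at the source before the lowercase setting is enabled.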
Additional Notes
This behavior is by design, not a bug. The FieldPathValidator intentionally prevents schema corruption by failing the ingestion rather than silently overwriting metadata. SQL Server column names follow the database collation, and the default collations are case-insensitive, so columns that differ only by case can exist only in databases that use a case-sensitive collation; this is extremely rare in well-governed environments. The primary risk is that the schema shown on the dataset page silently stops updating once a collision occurs, which can be mitigated by monitoring ingestion reports.
Related Documentation
- Microsoft SQL Server Connector Configuration
- SQL Parsing and Lineage
- Monitoring Ingestion and Lineage
Tags: mssql, sql-server, column-urns, case-sensitivity, schema-validation, lineage, ingestion-failure, urn-collision, field-path-validator