Area: Ingestion Issues
Sub-Area: Stateful Ingestion
Issue
Stateful ingestion fails with a "fail_safe_threshold" error when the current ingestion run produces significantly fewer entities than the previous run. The drop triggers DataHub's safety mechanism, which skips stale entity removal to prevent accidental mass deletion of metadata entities.
Error Messages
Skipping stateful ingestion / stale entity removal. The previous run produced X entities, whereas this run produced Y entities. Comparing the entities produced this run vs the previous run, we would be deleting Z% of the entities produced by the previous run. This percentage is above the threshold (currently 75.0), so we will skip soft-deleting stale entities.
You Might Be Asking
- Why is my ingestion failing even though I didn't change anything in my source system?
- How do I increase the fail_safe_threshold to allow the ingestion to proceed?
- What causes a dramatic drop in entity count between ingestion runs?
Solution
The fail_safe_threshold is a protective mechanism that prevents accidental mass deletion of entities. Before adjusting the threshold, you should investigate why the entity count dropped significantly.
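To make the check concrete, here is a minimal sketch of the comparison the error message describes. It assumes the percentage is the share of the previous run's entities that are missing from the current run, and that removal is skipped when that share is strictly above the threshold. The function name and the sample URNs are hypothetical, and this is not DataHub's actual implementation.

```python
def stale_removal_check(previous_urns, current_urns, fail_safe_threshold=75.0):
    """Return (percent_to_delete, would_skip) for one pair of runs.

    Assumption: an entity is "stale" if it was produced by the previous
    run but not by the current one.
    """
    previous = set(previous_urns)
    current = set(current_urns)
    if not previous:
        # Nothing was produced last time, so nothing can be deleted.
        return 0.0, False
    stale = previous - current
    percent = 100.0 * len(stale) / len(previous)
    # "Above the threshold" is read here as a strict comparison.
    return percent, percent > fail_safe_threshold


# Hypothetical example: 10 entities last run, only 2 of them this run.
previous_run = [f"urn:li:dataset:example_{i}" for i in range(10)]
current_run = previous_run[:2]

percent, would_skip = stale_removal_check(previous_run, current_run)
print(f"{percent:.1f}% would be deleted, skip removal: {would_skip}")
```

With these sample inputs, 8 of the 10 previous entities are missing, so the default threshold of 75.0 blocks the removal while a threshold of 90.0 would allow it.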
Step 1: Investigate the Root Cause
- Check the full ingestion run logs for errors, timeouts, or connection issues that occurred before the fail_safe_threshold message
- Verify your source system configuration hasn't changed (database access, schema permissions, API credentials)
- For dbt sources, confirm you're using the correct manifest and that all expected models are present
- Review your ingestion recipe for any changes to filter patterns, node selection, or discovery settings
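The log check in the first bullet can be partly automated. The sketch below is only an illustration: it assumes the log is plain text, that the fail-safe message begins with the wording shown under Error Messages, and that the listed keywords are a reasonable first filter. The function name, keyword list, and sample lines are all hypothetical.

```python
def suspicious_lines_before_marker(
    log_lines,
    marker="Skipping stateful ingestion",
    keywords=("error", "timeout", "timed out", "connection"),
):
    """Collect (line_number, text) for keyword matches before the marker.

    If the marker never appears, every matching line in the log is returned.
    """
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if marker in line:
            break  # Only lines before the fail-safe message are of interest.
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical log excerpt; real log lines will look different.
sample_log = [
    "INFO starting run\n",
    "ERROR connection refused by warehouse\n",
    "WARNING request timed out after 30s\n",
    "Skipping stateful ingestion / stale entity removal.\n",
    "ERROR reported after the fail-safe message\n",
]

for number, text in suspicious_lines_before_marker(sample_log):
    print(number, text)
```

For a real run, the same function can be called on an open log file object, since iterating over a text file yields its lines.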
Step 2: Compare Configuration Changes
Check if your ingestion recipe switched between auto-discovery and explicit mode:
# Auto-discovery mode (discovers multiple jobs/sources)
source:
  type: dbt-cloud
  config:
    account_id:
    project_id:
    auto_discovery:
      enabled: true

# vs. Explicit mode (single job only)
source:
  type: dbt-cloud
  config:
    account_id:
    project_id:
    job_id:
    auto_discovery:
      enabled: false
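If both the old and new versions of a recipe are available, a quick diff of their settings can show whether a switch like this happened. The sketch below assumes the recipes have already been loaded into plain Python dictionaries (for example with a YAML parser of your choice). The helper names and the placeholder job_id value are hypothetical.

```python
def flatten(config, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} form."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def recipe_diff(old, new):
    """Return {path: (old_value, new_value)} for settings that differ.

    A setting missing from one side is reported as None on that side.
    """
    old_flat, new_flat = flatten(old), flatten(new)
    missing = object()  # Sentinel so an absent key differs from a None value.
    changed = {}
    for path in sorted(set(old_flat) | set(new_flat)):
        if old_flat.get(path, missing) != new_flat.get(path, missing):
            changed[path] = (old_flat.get(path), new_flat.get(path))
    return changed


# Hypothetical recipes mirroring the two modes shown above.
old_recipe = {
    "source": {
        "type": "dbt-cloud",
        "config": {"auto_discovery": {"enabled": True}},
    }
}
new_recipe = {
    "source": {
        "type": "dbt-cloud",
        "config": {"job_id": "<job-id>", "auto_discovery": {"enabled": False}},
    }
}

for path, (before, after) in recipe_diff(old_recipe, new_recipe).items():
    print(f"{path}: {before!r} -> {after!r}")
```

A changed auto_discovery.enabled value or a newly added job_id is a likely explanation for a sudden drop in entity count.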
Step 3: Temporary Resolution (If Entity Drop is Intentional)
If you've confirmed the entity count drop is intentional and you want to proceed with soft-deleting the stale entities:
- Temporarily increase the fail_safe_threshold above the deletion percentage:
stateful_ingestion:
  enabled: true
  remove_stale_metadata: true
  fail_safe_threshold: 90.0  # Set above the reported percentage
- Run the ingestion to allow the stale entity removal
- After the successful run, optionally reduce the threshold back to 75.0 for future runs
Step 4: Permanent Resolution
Address the underlying configuration issue:
- If you want to continue ingesting from multiple sources, revert to auto-discovery mode
- If you want to ingest from a single source only, the temporary threshold increase in Step 3 is the correct approach
- Fix any source system connectivity or permission issues identified in Step 1
Additional Notes
The default fail_safe_threshold was changed from 95% to 75% in recent DataHub versions to better protect against accidental metadata loss. This safety mechanism is working as designed: increasing the threshold without investigating the root cause may result in unintended deletion of valid metadata entities. Increase the threshold only after confirming that the entity count drop is intentional and expected.
Tags: stateful-ingestion, fail-safe-threshold, entity-count, dbt-cloud, ingestion-failure, metadata-deletion, auto-discovery, stale-entities, safety-mechanism, troubleshooting