Area: Product Issues
Sub-Area: Search and Index Synchronization
Issue
Soft-deleted datasets and users continue to appear in search results and pagination counts despite being properly deleted. This occurs when the search index (Elasticsearch/OpenSearch) becomes out of sync with the primary database after bulk deletion operations, causing discrepancies between displayed counts and actual available entities.
Error Messages
- "No users found" displayed on pagination pages despite showing inflated total counts
- "This entity is not discoverable via search or lineage graph" banner on soft-deleted entity pages
You Might Be Asking
- Why do soft-deleted datasets still appear when I search for them?
- Why does the user management page show incorrect pagination counts?
- How can I force DataHub to update the search index after bulk deletions?
Solution
The root cause is an inconsistency between the primary database (MySQL) and the search index (Elasticsearch). In DataHub Cloud, resolving it requires support intervention:
- Contact DataHub Support to request a restore indices operation for affected entity types (datasets, users, etc.)
- Specify the affected instances (development, production) and entity types experiencing the issue
- Allow for processing time during the reindexing operation:
  - Some Kafka lag may occur during reindexing
  - Writes will be buffered but not lost
  - UI edits may not persist immediately during the operation
- Verify resolution after reindexing completes by checking search results and pagination counts
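The verification step can be sketched as a small check that walks every page implied by the reported total and counts the entities that actually come back. This is a minimal illustration, not DataHub's own tooling: `fetch_page` is a hypothetical callable standing in for whatever paginated listing or search call you use, and the function names and the stub data below are assumptions made for the example.

```python
from typing import Callable, List


def count_reachable(fetch_page: Callable[[int, int], List[str]],
                    reported_total: int,
                    page_size: int = 25) -> int:
    """Walk every page implied by reported_total and count distinct URNs returned."""
    seen = set()
    for start in range(0, reported_total, page_size):
        seen.update(fetch_page(start, page_size))
    return len(seen)


def index_in_sync(fetch_page: Callable[[int, int], List[str]],
                  reported_total: int,
                  page_size: int = 25) -> bool:
    """True when the reported total matches the number of entities actually listed."""
    return count_reachable(fetch_page, reported_total, page_size) == reported_total


def make_stub_fetcher(live_urns: List[str]) -> Callable[[int, int], List[str]]:
    """Hypothetical stand-in for a real paginated API call (assumption for the demo)."""
    def fetch_page(start: int, count: int) -> List[str]:
        return live_urns[start:start + count]
    return fetch_page


# Demo: 40 live users, but a stale index still reports 60
# (for example, 20 soft-deleted users are still counted).
live_users = [f"urn:li:corpuser:user{i}" for i in range(40)]
fetch = make_stub_fetcher(live_users)

print(count_reachable(fetch, 60))           # 40
print(index_in_sync(fetch, 60))             # False: the last page comes back empty
print(index_in_sync(fetch, len(live_users)))  # True once the total matches
```

A mismatch between the reported total and the reachable count corresponds to the symptom described above: inflated totals with empty trailing pages. Once reindexing completes, the two numbers should agree.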
For self-hosted deployments, you can trigger reindexing manually:
# Reindex specific entity type
curl -X POST "http://:9002/operations?action=restoreIndices" \
  -H "Content-Type: application/json" \
  -d '{"urn": "urn:li:dataset:"}'

# Or use the DataHub CLI
datahub put --urn "" --aspect
Additional Notes
This issue is more likely to occur after bulk operations involving thousands of entities. The search index update process can fail due to connection timeouts or bulk request failures. DataHub versions 0.3.15+ have improved handling of these synchronization issues, but reindexing may still be required to resolve existing inconsistencies. In DataHub Cloud, all search infrastructure management is handled by the DataHub team.
Tags: soft-delete, search-index, elasticsearch, pagination, bulk-operations, index-synchronization, datahub-cloud, reindexing, cache-issues