Area: Ingestion Issues
Sub-Area: Profiling Configuration
Issue
The Stats tab in DataHub displays "No Data" for row count, storage size, and top users even when profiling is enabled in the ingestion configuration. This typically happens because of configuration gaps (for example, usage statistics not being ingested), restrictive filter patterns, or stateful ingestion limiting which tables are processed.
You Might Be Asking
- Why are my dataset statistics empty when profiling is enabled?
- What's the difference between profiling stats and usage stats?
- How do I get row counts and top users to display in the Stats tab?
- Why do some tables show stats while others don't after ingestion?
Solution
1. Understand the distinction between profiling and usage statistics:
- Profiling populates structural statistics (row count, column count, storage size)
- Usage ingestion populates query activity and top users
- Both are required for complete Stats tab coverage
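The split above can be expressed as a quick self-check on a recipe's config section. This is a hypothetical helper, not part of DataHub: it reads only the `profiling.enabled` and `include_usage_stats` keys and, as a simplifying assumption, treats a missing key as disabled, which may not match your source's real defaults.

```python
from typing import Any


def stats_tab_coverage(config: dict[str, Any]) -> dict[str, bool]:
    """Map each Stats tab field to whether this source config would populate it.

    Hypothetical helper: missing keys are treated as disabled, which may differ
    from the actual defaults of your ingestion source.
    """
    profiling_on = bool(config.get("profiling", {}).get("enabled", False))
    usage_on = bool(config.get("include_usage_stats", False))
    return {
        # Structural statistics come from profiling
        "row_count": profiling_on,
        "column_count": profiling_on,
        "storage_size": profiling_on,
        # Query activity and top users come from usage ingestion
        "query_activity": usage_on,
        "top_users": usage_on,
    }
```

For example, a config containing only `{"profiling": {"enabled": True}}` reports row count and storage size as covered but top users as not, which is exactly the "profiling is on but top users are empty" symptom.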
2. Check your profiling patterns and limits:
```yaml
profiling:
  enabled: true
  profile_table_size_limit: null  # Remove the 5 GB default limit
  profile_table_row_limit: null   # Remove the 5M-row default limit
profile_pattern:  # Set at the source config level, alongside profiling
  deny:
    - "schema_to_exclude.*"  # Remove or adjust exclusions
```
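To predict which tables the pattern and limit settings above would skip, the sketch below approximates the filtering locally. It is an assumption-laden model, not DataHub's code: deny patterns are matched with `re.match` (anchored at the start of the name, case-insensitive), the size limit is compared in GiB with a strict `>` (the exact boundary check is assumed), and `None` means "no limit". The returned strings reuse the skip-reason names that appear in ingestion logs.

```python
import re
from typing import Optional

GIB = 2**30


def profiling_skip_reason(
    table: str,
    size_bytes: int,
    row_count: int,
    deny_patterns: list[str],
    size_limit_gb: Optional[int] = 5,
    row_limit: Optional[int] = 5_000_000,
) -> Optional[str]:
    """Return the reason a table would be skipped by profiling, or None.

    Approximation: deny patterns match from the start of the name,
    case-insensitively; a limit of None disables that check.
    """
    if any(re.match(p, table, re.IGNORECASE) for p in deny_patterns):
        return "profiling_skipped_table_profile_pattern"
    if size_limit_gb is not None and size_bytes / GIB > size_limit_gb:
        return "profiling_skipped_size_limit"
    if row_limit is not None and row_count > row_limit:
        return "profiling_skipped_row_limit"
    return None  # The table would be profiled
```

Under the start-anchored assumption, "schema_to_exclude.*" matches "schema_to_exclude.orders" but not "db.schema_to_exclude.orders", so compare your patterns against the fully qualified names your source actually reports.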
3. Verify your usage statistics configuration:
```yaml
include_usage_stats: true
user_email_pattern:
  allow:
    - ".*"  # Allow all users (adjust the regex as needed)
  # Remove or adjust any deny patterns that filter out all users
```
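A deny pattern that is too broad silently empties the top users list. The sketch below shows the effect using assumed allow/deny semantics (a deny match always wins, an email must match at least one allow pattern, matching is start-anchored and case-insensitive); it models the behavior for illustration and is not taken from DataHub's source.

```python
import re


def visible_users(emails: list[str], allow: list[str], deny: list[str]) -> list[str]:
    """Return the emails that survive an allow/deny regex filter (assumed semantics)."""

    def matches(patterns: list[str], value: str) -> bool:
        # re.match anchors at the start of the string
        return any(re.match(p, value, re.IGNORECASE) for p in patterns)

    return [e for e in emails if matches(allow, e) and not matches(deny, e)]
```

With `allow=[".*"]` and `deny=["svc_.*"]`, service accounts drop out while human users remain; with `deny=[".*"]`, every user is removed and the Stats tab has no top users to show.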
4. Handle stateful ingestion limitations:
- Stateful ingestion only processes changed tables.
- To refresh all statistics, temporarily disable stateful ingestion:
```yaml
stateful_ingestion:
  enabled: false  # Temporarily disable for a full refresh
```
- Run a complete ingestion, then re-enable stateful ingestion for incremental updates.
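The reason a full refresh helps can be modeled in a few lines. This is a hypothetical simplification of the behavior described above, not DataHub's implementation: when stateful ingestion is on and a previous run is recorded, only tables altered since that run are selected; otherwise every table is selected, so tables that were never profiled get picked up.

```python
from datetime import datetime
from typing import Optional


def tables_to_process(
    last_altered: dict[str, datetime],
    last_run: Optional[datetime],
    stateful: bool,
) -> list[str]:
    """Hypothetical model of which tables a run would (re)process."""
    if not stateful or last_run is None:
        # Full refresh: every known table is processed
        return sorted(last_altered)
    # Incremental run: only tables altered after the previous run
    return sorted(t for t, ts in last_altered.items() if ts > last_run)
```

In this model, a table that has not changed since the last stateful run never gets a new profile, which matches the symptom of some tables showing stats while others stay empty.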
5. Review ingestion logs for profiling skips:
- Look for messages such as profiling_skipped_table_profile_pattern.
- Check for profiling_skipped_row_limit or profiling_skipped_size_limit.
- Verify that no permission errors occur during profiling.
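Scanning a long log by eye is error-prone, so a small counter over the log lines can help. The skip-reason strings are the ones listed above; the permission check is a simple keyword heuristic ("permission", "not authorized", "insufficient privileges") and is an assumption, not a documented log format. You could pass it an open file handle for a saved run log.

```python
from collections import Counter
from typing import Iterable

SKIP_MARKERS = (
    "profiling_skipped_table_profile_pattern",
    "profiling_skipped_row_limit",
    "profiling_skipped_size_limit",
)
# Heuristic keywords for permission problems (assumed, adjust for your warehouse)
PERMISSION_KEYWORDS = ("permission", "not authorized", "insufficient privileges")


def summarize_profiling_skips(log_lines: Iterable[str]) -> Counter:
    """Count lines mentioning each skip marker and likely permission errors."""
    counts: Counter = Counter()
    for line in log_lines:
        for marker in SKIP_MARKERS:
            if marker in line:
                counts[marker] += 1
        lowered = line.lower()
        if any(keyword in lowered for keyword in PERMISSION_KEYWORDS):
            counts["possible_permission_error"] += 1
    return counts
```

The function counts matching lines rather than parsing values, so a report line such as "profiling_skipped_size_limit: 12" counts once; read the number from the report itself when you need exact totals.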
6. Tune profiling for performance and diagnostics:
```yaml
profiling:
  enabled: true
  profile_table_level_only: true  # Table-level statistics only, for performance
  turn_off_expensive_profiling_metrics: true
  report_dropped_profiles: true  # Report tables that were not profiled
```
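If you generate recipes from Python before writing them out as YAML, a helper like the hypothetical one below can merge the performance-oriented settings above into an existing source config. It copies the input and only overwrites these four keys, so any size or row limits you already set under profiling are preserved.

```python
import copy
from typing import Any

PERFORMANCE_PROFILING = {
    "enabled": True,
    "profile_table_level_only": True,
    "turn_off_expensive_profiling_metrics": True,
    "report_dropped_profiles": True,
}


def with_performance_profiling(source_config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of source_config with the settings above merged into profiling."""
    merged = copy.deepcopy(source_config)  # Leave the caller's dict untouched
    merged.setdefault("profiling", {}).update(PERFORMANCE_PROFILING)
    return merged
```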
Additional Notes
- Snowflake views are not profiled by default and will not show row count or storage size statistics through standard ingestion. For views that need row count metrics, you'll need a custom solution that uses the DataHub Python SDK to emit DatasetProfile aspects containing the computed statistics.
- The "Last 30 Days" filter in the Stats tab appears only when multiple historical profiles exist for a dataset.
- Removing the profiling size and row limits significantly increases Snowflake credit consumption, so adjust the limits carefully rather than removing them entirely.
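One possible first half of the custom solution for views is computing the statistics yourself. The sketch below runs against any DB-API 2.0 connection; sqlite3 stands in for a Snowflake connection so it runs anywhere. The dictionary keys are chosen to mirror the timestampMillis, rowCount, and columnCount fields of DataHub's DatasetProfile aspect; the emission step (for example, wrapping the values in DatasetProfileClass and sending them with MetadataChangeProposalWrapper through an SDK emitter) is deliberately left out, so check the SDK documentation for the exact calls. Storage size is not computed here.

```python
import sqlite3
import time
from typing import Any


def compute_view_profile(conn: Any, view_name: str) -> dict[str, int]:
    """Compute table-level statistics for a view through a DB-API connection.

    view_name is interpolated into SQL, so pass only trusted, validated identifiers.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {view_name}")
    (row_count,) = cur.fetchone()
    # LIMIT 0 returns no rows but still exposes column metadata in cursor.description
    cur.execute(f"SELECT * FROM {view_name} LIMIT 0")
    column_count = len(cur.description)
    cur.close()
    return {
        "timestampMillis": int(time.time() * 1000),
        "rowCount": row_count,
        "columnCount": column_count,
    }


if __name__ == "__main__":
    # In-memory sqlite3 database used only as a stand-in for Snowflake
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 9.5, "EU"), (2, 3.0, "US")])
    conn.execute("CREATE VIEW eu_orders AS SELECT id, amount FROM orders WHERE region = 'EU'")
    print(compute_view_profile(conn, "eu_orders"))
```

Emitting a profile on every scheduled run also builds up the history that the "Last 30 Days" filter depends on.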
Related Documentation
Tags: profiling, stats-tab, snowflake, usage-statistics, stateful-ingestion, configuration, no-data, row-count, top-users