Area: Deployment Issues
Sub-Area: Remote Executor Configuration
Issue
Remote executor pods experience out-of-memory (OOM) errors during large-scale ingestion jobs, particularly when using BigQuery sources with multiple projects or after upgrading to newer DataHub versions. The default 8GB memory allocation becomes insufficient when ingesting from expanded data sources or when memory overhead increases due to version changes.
Error Messages
Pod failed with OOM exception
Memory limit exceeded during ingestion
You Might Be Asking
- Why are my previously working ingestion jobs now failing with OOM errors?
- How can I reduce memory consumption in the remote executor?
- What memory allocation is recommended for large multi-project BigQuery ingestion?
Solution
To resolve memory issues in remote executor deployments, implement the following configuration optimizations:
-
Reduce Ingestion Parallelism
Lower the number of concurrent workers in your BigQuery ingestion recipe:
    source:
      type: bigquery
      config:
        max_workers: 2
        profiling:
          enabled: true
          max_workers: 2
        classification:
          enabled: true
          max_workers: 2
-
Scope Your Ingestion
Explicitly limit the scope of your ingestion to prevent loading excessive metadata:
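    source:
      type: bigquery
      config:
        project_ids:
          - "project-1"
          - "project-2"
        # Patterns are regular expressions, not shell globs.
        dataset_pattern:
          allow:
            - "prod_.*"
            - "analytics_.*"
        # Table names are matched in project.dataset.table form.
        table_pattern:
          deny:
            - '.*\.temp_.*'
            - '.*\.staging_.*'
-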
Disable Resource-Heavy Features
Turn off optional features if not actively used:
    source:
      type: bigquery
      config:
        profiling:
          enabled: false
        classification:
          enabled: false
        lineage:
          use_v2_lineage_api: true
-
Configure Executor Task Limits
Set memory limits and task weights using environment variables:
    {
      "EXECUTOR_TASK_MEMORY_LIMIT": "6000000",
      "EXECUTOR_TASK_WEIGHT": "1.0"
    }
-
Increase Pod Memory Allocation
For large-scale ingestion with multiple projects, increase the pod memory limit:
    resources:
      limits:
        memory: "12Gi"  # or "16Gi" for very large deployments
      requests:
        memory: "8Gi"
-
Update to Latest CLI Version
If using bundled images, ensure you're using the latest DataHub CLI version. For custom builds, replace the bundled venv creation with direct installation:
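    # Create one virtual environment per connector and install it directly.
    # Append the acryl-datahub version to pin after "==".
    for x in bigquery looker lookml databricks; do
      uv venv "/opt/datahub/venvs/${x}-bundled"
      uv pip install --python "/opt/datahub/venvs/${x}-bundled/" "acryl-datahub[${x}]=="
    done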
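The scoping step above depends on allow and deny patterns matching the names you expect. The sketch below assumes those patterns are evaluated as Python regular expressions anchored at the start of the name (the behaviour of `re.match`). The `is_allowed` helper is a hypothetical stand-in for illustration, not DataHub's own API, and the names are made up. It shows why a glob-style pattern can match more, or less, than intended.

```python
import re


def is_allowed(name, allow=(".*",), deny=()):
    """Hypothetical allow/deny filter: a deny match wins, then any allow match passes."""
    if any(re.match(pattern, name) for pattern in deny):
        return False
    return any(re.match(pattern, name) for pattern in allow)


# Read as a regex, "_*" means "zero or more underscores", so the pattern is too broad.
print(is_allowed("production_copy", allow=["prod_*"]))   # True
print(is_allowed("production_copy", allow=["prod_.*"]))  # False

# Without a leading ".*" a deny pattern never reaches the table part of a qualified name.
qualified = "project-1.sales.temp_orders"
print(is_allowed(qualified, deny=["temp_*"]))        # True (not excluded)
print(is_allowed(qualified, deny=[r".*\.temp_.*"]))  # False (excluded)
```

Running a handful of real dataset and table names through a check like this before a full ingestion run is a cheap way to confirm the scope is as narrow as intended.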
Additional Notes
Memory usage in BigQuery ingestion grows linearly with the number of projects and tables due to stateful profiling and checkpoint data stored in memory. Newer DataHub versions may have higher baseline memory requirements due to architectural changes in credential management. The 8GB recommendation may be insufficient for multi-project BigQuery deployments and should be adjusted based on your specific ingestion scope. Always test memory configuration changes in a non-production environment first.
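If memory does grow roughly linearly with table count, two observed runs are enough for a first estimate of the pod limit. The sketch below is a back-of-the-envelope calculation, not a DataHub tool: the function name, the 25% headroom, and the sample measurements are all hypothetical and should be replaced with figures from your own runs.

```python
import math


def estimate_memory_gib(run_a, run_b, target_tables, headroom=0.25):
    """Extrapolate peak memory linearly from two observed (tables, peak_gib) runs."""
    (tables_a, gib_a), (tables_b, gib_b) = run_a, run_b
    if tables_a == tables_b:
        raise ValueError("the two runs must cover different table counts")
    slope = (gib_b - gib_a) / (tables_b - tables_a)  # GiB per additional table
    baseline = gib_a - slope * tables_a              # fixed overhead at zero tables
    projected = baseline + slope * target_tables
    # Add headroom and round up to a whole GiB for the pod limit.
    return math.ceil(projected * (1 + headroom))


# Hypothetical measurements for illustration only, not benchmark data.
limit = estimate_memory_gib((10_000, 5.0), (20_000, 7.0), target_tables=40_000)
print(f'memory: "{limit}Gi"')
```

Treat the result as a starting point for testing in a non-production environment, since profiling and classification settings change the slope considerably.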
Tags: remote-executor, memory, oom, bigquery, ingestion, deployment, kubernetes, performance, scaling, troubleshooting