Area: Ingestion Issues
Sub-Area: Virtual Environment and Dependency Management
Issue
DataHub ingestion jobs fail during virtual environment setup when using remote executors, typically manifesting as dependency conflicts in auto-generated requirements.txt files or uv pip install commands returning non-zero exit status. These failures commonly occur after executor version upgrades or when specific package versions have conflicting dependency requirements that cannot be resolved during venv creation.
Error Messages
acryl.executor.execution.task.TaskError: Failed to set up virtual environment: Command '['/usr/bin/uv', 'pip', 'install', '-r', '/tmp/datahub/ingest/<execution-id>/venv-<source>-<hash>/requirements.txt']' returned non-zero exit status 2.
Both acryl-datahub[<source>]==<version> and <package><<version> are pinned, but acryl-datahub[<source>] <version> transitively requires <package>>=<version>,<<version>
subprocess.CalledProcessError: Command '['/usr/bin/uv', 'pip', 'install', '-r', '<requirements-file>']' returned non-zero exit status 2.
You Might Be Asking
- Why did my ingestion jobs start failing after upgrading the remote executor?
- How do I resolve conflicting package requirements in DataHub ingestion?
- Why does downgrading the executor version fix the issue?
- Can I modify the auto-generated requirements.txt to fix dependency conflicts?
Solution
The resolution depends on the specific type of dependency conflict:
For Package Dependency Conflicts
- Check your ingestion source configuration for pinned dependencies in the "Extra Pip Libraries" or "Advanced Configuration" section
- Remove or update conflicting version pins that are no longer compatible with the current DataHub package versions
- For example, if you see conflicts with python-liquid, remove any pins like python-liquid<2 from your source configuration
- Save the configuration and retry the ingestion job
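When the Extra Pip Libraries list is long, it can help to search it for every line that pins the package named in the error. The following is a minimal sketch, not part of DataHub or the executor: find_pins and normalize are hypothetical helper names, and the parsing only handles simple one-requirement-per-line entries.

```python
import re

# Hypothetical helper for illustration; not a DataHub or uv API.
# Captures the leading project name, stopping at extras ("[") or version operators.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalize a project name so that '-', '_' and '.' compare as equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_pins(extra_pip_libraries: str, package: str) -> list[str]:
    """Return the requirement lines whose project name matches `package`."""
    target = normalize(package)
    hits = []
    for line in extra_pip_libraries.splitlines():
        # Drop trailing comments and surrounding whitespace (simple entries only).
        stripped = line.split("#", 1)[0].strip()
        if not stripped:
            continue
        match = _NAME_RE.match(stripped)
        if match and normalize(match.group(1)) == target:
            hits.append(stripped)
    return hits


extra = """
python-liquid<2
acryl-datahub[looker]==1.5.0.8
"""
print(find_pins(extra, "python_liquid"))
```

Any line the helper reports is a candidate to remove or relax if the error message names that package as conflicting with the pinned acryl-datahub version.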
For File System Permission Issues
- Check if your deployment uses readOnlyRootFilesystem: true in the security context
- If so, either:
  - Set readOnlyRootFilesystem: false in your deployment configuration, or
  - Mount an empty directory at /home/datahub/.cache/uv with write permissions, or
  - Set UV_CACHE_DIR: /tmp/uv_cache in each recipe's advanced configuration
- Redeploy the remote executor with the updated configuration
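To confirm which of the locations above is actually writable, a quick check can be run inside the executor container. This is a minimal sketch under assumptions: is_writable is a hypothetical helper, it creates the directory if it does not exist, and the candidate paths are simply the ones named in the options above.

```python
import os
import tempfile


def is_writable(directory: str) -> bool:
    """Return True if a file can be created in `directory`.

    The directory is created first if it is missing, so this has a side effect.
    """
    try:
        os.makedirs(directory, exist_ok=True)
        # Creating (and automatically deleting) a temp file proves write access.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        return False


# Candidate cache locations; UV_CACHE_DIR is checked first when it is set.
candidates = [
    os.environ.get("UV_CACHE_DIR", ""),
    "/home/datahub/.cache/uv",
    "/tmp/uv_cache",
]
for path in filter(None, candidates):
    status = "writable" if is_writable(path) else "NOT writable"
    print(f"{path}: {status}")
```

If the default cache path is reported as not writable while /tmp/uv_cache is, setting UV_CACHE_DIR as described above is likely the least invasive option.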
For Executor Version Issues
- Ensure you're using the correct executor version tag (e.g., v0.3.15.4-acryl not v0.3.15-acryl)
- Consider upgrading to the latest stable executor version, which may have dependency fixes
- Check release notes for any breaking changes in dependency management between versions
Example Configuration Fix
Remove conflicting pins from your source's Extra Pip Libraries:
# Before (causes conflict)
python-liquid<2
acryl-datahub[looker]==1.5.0.8
# After (resolves conflict)
acryl-datahub[looker]==1.5.0.8
Or set UV_CACHE_DIR in recipe advanced configuration:
pipeline_name: your-pipeline-name
source:
  type: <source-type>
  config:
    # your source config
sink:
  # your sink config
datahub_api:
  # your api config
advanced:
  UV_CACHE_DIR: /tmp/uv_cache
Additional Notes
- Manual edits to the auto-generated requirements.txt file do not persist, because the file is recreated for each execution. Fix conflicts in the source configuration instead.
- Dependency resolution in newer executor versions is stricter than in previous versions, which may expose conflicts that were previously hidden.
- When using readOnlyRootFilesystem: true, ensure write access to the cache directories or configure an alternative writable location.
- Downgrading the executor may temporarily resolve the issue but should not be considered a permanent solution.
Tags: ingestion, dependency-conflict, virtual-environment, remote-executor, pip-install, uv, requirements, package-conflict, venv-setup