Official Support for NVIDIA GPU Acceleration in Bacalhau's Docker-in-Docker (dind) Image #4851

Open
chris-gputrader opened this issue Feb 13, 2025 · 1 comment
Labels
request/new (Request: Indicates a new request that has been submitted and awaits initial triage)
type/enhancement (Type: New features or enhancements to existing features)

Comments

@chris-gputrader

The problem:
Bacalhau doesn't support GPU workloads when deployed using the bacalhau-dind image.

Description:
Bacalhau's official dind image is currently based on docker:dind, which uses Alpine Linux. This limits its compatibility with the NVIDIA Container Toolkit (nvidia-container-toolkit), because the toolkit does not officially support Alpine. Users who need GPU acceleration must either modify the Bacalhau image by hand or switch to a different base image, which leads to compatibility problems, missing dependencies (dockerd-entrypoint.sh), and added complexity.

This feature request proposes adding official support for NVIDIA GPUs in Bacalhau's dind image by:

Providing a variant of the Bacalhau dind image built on an Ubuntu/Debian base (possibly https://github.com/prasad89/dind-ubuntu-nvidia, which already includes the necessary NVIDIA toolkit).
Ensuring the image includes the NVIDIA Container Toolkit for GPU passthrough.
Retaining Bacalhau’s existing functionality while making it GPU-capable.
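
To illustrate what "including the NVIDIA Container Toolkit" involves on the Docker side: the inner Docker daemon needs an `nvidia` runtime registered in its `daemon.json`. The sketch below shows that merge as a pure function. It assumes the `nvidia-container-runtime` binary is installed and on the PATH inside the image. The function name `with_nvidia_runtime` is hypothetical and not part of Bacalhau or Docker. In practice `nvidia-ctk runtime configure --runtime=docker` performs this step, so this is only meant to show what ends up in the config.

```python
import json


def with_nvidia_runtime(daemon_config: dict, set_default: bool = False) -> dict:
    """Return a copy of a Docker daemon.json dict with an 'nvidia' runtime added.

    Hypothetical helper for illustration; assumes nvidia-container-runtime
    is installed and resolvable on PATH inside the image.
    """
    # Deep-copy through a JSON round-trip so the caller's dict is not mutated.
    merged = json.loads(json.dumps(daemon_config))

    # Keep any runtimes that are already configured; only add 'nvidia' if absent.
    runtimes = merged.setdefault("runtimes", {})
    runtimes.setdefault("nvidia", {"path": "nvidia-container-runtime"})

    # Optionally make 'nvidia' the default runtime for all containers.
    if set_default:
        merged["default-runtime"] = "nvidia"
    return merged


if __name__ == "__main__":
    existing = {"storage-driver": "overlay2"}
    print(json.dumps(with_nvidia_runtime(existing), indent=2))
```

Leaving `set_default` off keeps the stock runtime for ordinary jobs, so GPU access stays opt-in per container.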

Why This is Needed:
Lack of GPU Support in Alpine-based dind
The current docker:dind base is built on Alpine, which nvidia-container-toolkit does not officially support, making it difficult to run Bacalhau workloads on GPUs.

Workarounds Are Complex & Unstable
Users must manually modify Bacalhau’s Dockerfile to base it on an Ubuntu/Debian dind image, copy missing dependencies (dockerd-entrypoint.sh), and install NVIDIA-related packages. This process is error-prone and requires ongoing maintenance.
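
Because a hand-built image can fail quietly, it helps to check whether the inner Docker daemon actually registered the NVIDIA runtime. The following is a minimal sketch of such a check, assuming the `docker` CLI is available inside the container and that the runtime is registered under the conventional name `nvidia`. Both function names are hypothetical.

```python
import json
import subprocess


def has_nvidia_runtime(runtimes_json: str) -> bool:
    """Return True if a JSON map of Docker runtimes contains an 'nvidia' entry."""
    runtimes = json.loads(runtimes_json)
    return isinstance(runtimes, dict) and "nvidia" in runtimes


def query_docker_runtimes() -> str:
    """Ask the local Docker daemon for its registered runtimes as a JSON string.

    Requires a reachable Docker daemon; raises CalledProcessError otherwise.
    """
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Example use inside a running dind container (not executed here):
#   if not has_nvidia_runtime(query_docker_runtimes()):
#       raise SystemExit("nvidia runtime is not registered with dockerd")
```

This only confirms that the runtime is registered. It does not prove that a GPU is visible to jobs, which still requires the host driver and device passthrough into the dind container.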

Expanding Bacalhau’s Use Cases
Official GPU support in dind would let Bacalhau be used more effectively for machine learning, AI training, and other GPU-intensive workloads.

@chris-gputrader added the request/new and type/enhancement labels on Feb 13, 2025