Request to provide Dockerfile source code for Nvidia driver installation on COS #204

Open
Loquats opened this issue Sep 2, 2021 · 2 comments

Loquats commented Sep 2, 2021

Would it be possible for the repo maintainers to provide the Dockerfile and any scripts used to generate the image used by this daemonset? https://github.com/GoogleCloudPlatform/container-engine-accelerators/blob/master/nvidia-driver-installer/cos/daemonset-nvidia-v450.yaml

The reason for this request is that I'd like to install a specific version (470.57.02) of the Nvidia drivers on a GKE cluster running Container-Optimized OS with containerd. The official GKE documentation provides this daemonset, which installs an older driver version. I assume daemonset-nvidia-v450.yaml in this repo can be modified to install a specific driver version by changing this line to point to an appropriate image:

      - image: gcr.io/cos-cloud/cos-gpu-installer@sha256:93f1abf0d6a27e14bebf43ffb00b8d819b20f6027012ad73306ba670bcac6c83

However, I cannot find the source code for this image, so it is not clear how I can install a different Nvidia driver version.

For example, for GKE Ubuntu images, this repo provides the Dockerfile and entrypoint.sh source code. Would it be possible to share the COS equivalent?


jtrouth commented Dec 3, 2021

If you set the image to gcr.io/cos-cloud/cos-gpu-installer:latest and set an NVIDIA_DRIVER_VERSION environment variable to the driver version you want, it should work. It works for me with 470.82.01.
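For anyone following along, the change described above might look roughly like the fragment below. This is only a sketch: the image tag, the NVIDIA_DRIVER_VERSION variable, and the version number come from this comment, while the container name and the surrounding field layout are assumptions and should be matched against the actual daemonset-nvidia-v450.yaml rather than copied as-is.

```yaml
# Sketch only: adapt to the real manifest. The container name and the
# initContainers placement are assumptions, not taken from the repo file.
spec:
  template:
    spec:
      initContainers:
      - name: nvidia-driver-installer   # hypothetical name
        # Tag suggested in the comment above, replacing the pinned sha256 digest
        image: gcr.io/cos-cloud/cos-gpu-installer:latest
        env:
        # Driver version to install; quoted so YAML keeps it as a string
        - name: NVIDIA_DRIVER_VERSION
          value: "470.82.01"
```

Any env entries already present on that container would stay in place; only the image line and the extra variable change.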

Also, I don't think the entrypoint and Dockerfile for Ubuntu are valid anymore. I've attempted the install steps in the script manually on an Ubuntu node and it doesn't work.

DavraYoung commented

Any updates on this?

Our cluster is running on Ubuntu, and we need to know how to install a specific version of CUDA on Ubuntu nodes. I tried setting the env variable, but it still fails.
