
nvidia-docker v2? #1

Open
thomas-riccardi opened this issue Nov 13, 2017 · 8 comments

Comments

@thomas-riccardi

thomas-riccardi commented Nov 13, 2017

Hi,
Are there plans to use nvidia-docker v2 (now merged into master as the new official version)?

It is simpler to use: https://github.com/NVIDIA/nvidia-docker/wiki/About-version-2.0

@rporres

rporres commented Nov 24, 2017

The above links are broken. I guess it's because the 2.0 branch was recently merged into master via NVIDIA/nvidia-docker@fe18749.

@thomas-riccardi
Author

@rporres indeed, I updated my comment.

@mcuadros
Contributor

Does it require any changes? The current version was built for bare Docker, not even nvidia-docker 1.0.

@thomas-riccardi
Author

thomas-riccardi commented Dec 22, 2017

Using nvidia-docker v2 would simplify the docker run part: there would be no need to add:

--volumes-from nvidia-driver \
    --env PATH=$PATH:/opt/nvidia/bin/ \
    --env LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nvidia/lib \
    $(for d in /dev/nvidia*; do echo -n "--device $d "; done) \

So the required change is in fact installing nvidia-docker v2 on CoreOS and removing the nvidia-driver container.
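For comparison, with nvidia-docker v2 all of the flags above collapse into a single runtime option. A minimal sketch (the image name and device index are only examples; this assumes the nvidia runtime is already registered with the Docker daemon):

```shell
# nvidia-docker v2: the runtime injects the driver files and device nodes
# itself, so no --volumes-from/--env/--device plumbing is needed.
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

# Optionally restrict which GPUs are visible inside the container:
docker run --runtime=nvidia --rm -e NVIDIA_VISIBLE_DEVICES=0 nvidia/cuda nvidia-smi
```

These commands obviously require a GPU host with the driver and runtime installed, so treat them as illustrative only.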

@trevex
Contributor

trevex commented Jan 29, 2018

I used the following steps to install nvidia-docker v2 (very hacky though):

  1. install the nvidia driver
  2. instead of the volume I simply copy the files to the host, e.g.
/usr/bin/docker run --rm --volume /opt/nvidia/current:/output srcd/coreos-nvidia:${VERSION} cp -a /opt/nvidia/. /output/
  3. install libnvidia-container
  4. (build and) install nvidia-container-runtime
  5. create small bash scripts in /run/torcx/bin for nvidia-container{-runtime,-runtime-hook,-cli} to make sure they are accessible by docker and libraries are in LD_LIBRARY_PATH
  6. create /etc/docker/daemon.json and set the default runtime to nvidia
  7. restart docker
  8. add the nvidia-docker bash scripts

There is currently only one issue: the nvidia-container-runtime somehow has a regression (even though it is built from the same commit as the installed runc) and fails to run containers with docker run --security-opt=no-new-privileges (coreos/bugs#1796).
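Steps 5–7 above can be sketched roughly as follows. This is a hypothetical reconstruction, not trevex's actual scripts: the wrapper path /run/torcx/bin and the /opt/nvidia/current layout come from the steps above, but the exact contents are assumptions.

```shell
#!/bin/bash
# Step 5 (sketch): a wrapper so docker can find the runtime binary and
# its libraries. Repeat for -runtime-hook and -cli as needed.
cat > /run/torcx/bin/nvidia-container-runtime <<'EOF'
#!/bin/bash
export LD_LIBRARY_PATH=/opt/nvidia/current/lib:${LD_LIBRARY_PATH}
exec /opt/nvidia/current/bin/nvidia-container-runtime "$@"
EOF
chmod +x /run/torcx/bin/nvidia-container-runtime

# Step 6 (sketch): register nvidia as the default runtime.
cat > /etc/docker/daemon.json <<'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/run/torcx/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF

# Step 7: restart docker so the new runtime takes effect.
systemctl restart docker
```

The daemon.json shape (default-runtime plus a runtimes map) is the standard Docker mechanism for registering an OCI runtime; only the paths here are guesses.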

@lsjostro

lsjostro commented Feb 2, 2018

We have it working as well (nvidia-docker v2 + coreos + k8s device plugin). We will try to clean it up and hopefully be able to share it soonish.

@lsjostro

lsjostro commented Feb 9, 2018

We went for this instead: GoogleCloudPlatform/container-engine-accelerators#54

@thomas-riccardi
Author

@lsjostro I would be interested in your previous "nvidia-docker v2 + coreos" version, even if it is not cleaned up and production-ready: nvidia-docker v2 enables sharing GPUs between containers (at the cost of losing k8s scheduling), which device plugin solutions don't support (and won't for the foreseeable future).

In any case, GoogleCloudPlatform/container-engine-accelerators#54 is useful too, thanks for that!
