GPU support on event worker #740

Open
chrisbward opened this issue Nov 3, 2022 · 1 comment

Comments

chrisbward commented Nov 3, 2022

Hi,

As discussed with Fahad on Discord, the kairon worker is using the CPU rather than the GPU inside the Docker container.

I ran some tests to make sure it wasn't a problem on my side:

version: "3"
services:
  test:
    image: tensorflow/tensorflow:latest-gpu
    command: python -c "import tensorflow as tf;print(tf.test.gpu_device_name())"
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

and

services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Both worked fine and detected my GPU.

I added the same config to kairon-worker:

    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

but this did not seem to make a difference.

I asked to check the Dockerfile for the worker and noticed that the image has no GPU packages or drivers installed.

I found some steps here that someone could implement:
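For anyone who wants to confirm this from inside the running worker container, here is a rough sketch of a check script. It is not part of kairon; the function name `gpu_report` and the report keys are made up for illustration, and it assumes TensorFlow is the framework of interest, as in the first test above. It only reports whether `nvidia-smi` is reachable and whether TensorFlow lists any GPU devices.

```python
import shutil
import subprocess


def gpu_report():
    """Collect basic facts about GPU visibility in the current environment.

    Hypothetical helper for debugging; the keys below are illustrative.
    """
    report = {"nvidia_smi_found": False, "nvidia_smi_ok": False, "tf_gpus": None}

    # nvidia-smi is usually made available by the NVIDIA container runtime
    # when the container is started with a GPU reservation.
    smi = shutil.which("nvidia-smi")
    if smi:
        report["nvidia_smi_found"] = True
        try:
            # "-L" lists the GPUs the driver can see.
            result = subprocess.run(
                [smi, "-L"], capture_output=True, text=True, timeout=30
            )
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            pass

    # TensorFlow only lists GPUs if the CUDA libraries it needs are in the image.
    try:
        import tensorflow as tf

        report["tf_gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    except ImportError:
        pass  # tf_gpus stays None: TensorFlow is not installed here

    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_ok` is true but `tf_gpus` is an empty list, the GPU is probably being passed through and the image is missing the CUDA libraries, which would match what I saw in the Dockerfile.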
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start-nccl-base.html
