k8s-device-plugin not deployed #2151

Open
j0hnL opened this issue Sep 6, 2023 · 2 comments
Labels
bug Something isn't working

j0hnL (Collaborator) commented Sep 6, 2023

Describe the bug
When the k8s-manager does not have a GPU, Omnia does not deploy the k8s-device-plugin. We need to inspect the entire inventory for GPUs before deciding whether to deploy the plugin. I also suggest that we taint or label any compute nodes that do not have GPUs, because NVIDIA's plugin does not check whether a node actually has a GPU. The AMD plugin seems to deploy fine whether or not AMD accelerators are present.
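A minimal sketch of the inventory-wide check, assuming each host's role and NVIDIA GPU count have already been collected. The Host fields and the plan_device_plugin name are hypothetical, not Omnia's actual inventory schema or code. The plugin is deployed if any host has a GPU, regardless of the manager, and compute nodes without GPUs are returned as candidates for labeling or tainting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """One inventory entry (hypothetical fields, not Omnia's schema)."""
    name: str
    role: str         # e.g. "manager" or "compute"
    nvidia_gpus: int  # GPU count discovered on the host, however it was gathered


def plan_device_plugin(hosts: list[Host]) -> tuple[bool, list[str]]:
    """Decide from the whole inventory, not just the manager, whether the
    NVIDIA device plugin is needed, and list compute nodes to label/taint."""
    deploy = any(h.nvidia_gpus > 0 for h in hosts)
    no_gpu_compute = sorted(
        h.name for h in hosts if h.role == "compute" and h.nvidia_gpus == 0
    )
    return deploy, no_gpu_compute


if __name__ == "__main__":
    inventory = [
        Host("manager-0", "manager", 0),  # manager without a GPU
        Host("compute-0", "compute", 4),
        Host("compute-1", "compute", 0),
    ]
    print(plan_device_plugin(inventory))  # → (True, ['compute-1'])
```

The point of the design is that the manager's own hardware no longer decides the outcome; a single GPU anywhere in the inventory is enough to deploy the plugin.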

j0hnL added the bug (Something isn't working) label Sep 6, 2023
naresh3774 (Collaborator) commented:

This is what I think:

Identify Nodes without GPUs:
You need a mechanism to determine which compute nodes in your Kubernetes cluster do not have GPUs available. This can be done through manual inspection or automated scripts that query node specifications.
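One automated option is to parse lspci output gathered from each node. A minimal sketch, assuming the default lspci line format "<slot> <class>: <description>"; it counts only VGA and 3D controllers whose description starts with NVIDIA, so other functions on the same card (such as the audio device) are skipped. count_nvidia_gpus is a hypothetical helper name.

```python
# PCI class names that a discrete NVIDIA GPU reports in lspci output
GPU_CLASSES = {"VGA compatible controller", "3D controller"}


def count_nvidia_gpus(lspci_output: str) -> int:
    """Count NVIDIA display/3D controllers in `lspci` text output.

    A numeric class suffix such as " [0302]" (printed by `lspci -nn`)
    is stripped before comparing class names.
    """
    count = 0
    for line in lspci_output.splitlines():
        parts = line.strip().split(" ", 1)  # slot, then "<class>: <description>"
        if len(parts) != 2:
            continue
        device_class, sep, description = parts[1].partition(": ")
        if not sep:
            continue
        device_class = device_class.split(" [", 1)[0]
        if device_class in GPU_CLASSES and description.startswith("NVIDIA"):
            count += 1
    return count


if __name__ == "__main__":
    sample = (
        "00:02.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)\n"
        "3b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] (rev a1)\n"
        "3b:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)\n"
    )
    print(count_nvidia_gpus(sample))  # → 1
```

A node where this returns 0 would be a candidate for the gpu-enabled=false label.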

Node Labeling:
Once you identify nodes without GPUs, apply labels to them using kubectl label nodes <node-name> <label-key>=<label-value>.
For example, you can label nodes without GPUs as gpu-enabled=false.
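The labeling step can be scripted by building kubectl label argument lists and passing them to subprocess.run. A minimal sketch: label_commands and apply are hypothetical helpers, gpu-enabled=false is the example label from this comment, and --overwrite is a real kubectl label flag that replaces an existing value for the same key.

```python
import shlex
import subprocess


def label_commands(nodes: list[str], key: str = "gpu-enabled",
                   value: str = "false") -> list[list[str]]:
    """Build one `kubectl label nodes` argv list per node (hypothetical helper)."""
    # --overwrite lets re-runs succeed if the key already has a different value
    return [["kubectl", "label", "nodes", node, f"{key}={value}", "--overwrite"]
            for node in nodes]


def apply(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them with kubectl."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)  # needs kubectl and cluster access


if __name__ == "__main__":
    apply(label_commands(["compute-1", "compute-2"]))
    # kubectl label nodes compute-1 gpu-enabled=false --overwrite
    # kubectl label nodes compute-2 gpu-enabled=false --overwrite
```

Passing argv lists rather than a shell string avoids quoting problems with node names.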

Node Tainting:
Apply taints to nodes without GPUs. A taint repels every pod that lacks a matching toleration, so GPU workloads will not be scheduled on these nodes, and any workload that should still run there needs a toleration.
Use kubectl taint nodes <node-name> <key>=<value>:<effect> to apply taints.
For instance, you can use a taint like gpu-accelerator=false:NoSchedule.
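A minimal sketch of building the taint command, and the matching removal command, as argv lists. taint_command and untaint_command are hypothetical helpers; the key, value, and effect defaults are the example from this comment. The three effects checked below are the ones Kubernetes supports, and a trailing "-" is kubectl's syntax for removing a taint.

```python
import shlex

# The taint effects Kubernetes supports
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def taint_command(node: str, key: str = "gpu-accelerator", value: str = "false",
                  effect: str = "NoSchedule") -> list[str]:
    """Build a `kubectl taint nodes` argv list (hypothetical helper)."""
    if effect not in VALID_EFFECTS:
        raise ValueError(f"unsupported taint effect: {effect!r}")
    return ["kubectl", "taint", "nodes", node, f"{key}={value}:{effect}"]


def untaint_command(node: str, key: str = "gpu-accelerator",
                    effect: str = "NoSchedule") -> list[str]:
    """The trailing '-' tells kubectl to remove the taint with this key and effect."""
    if effect not in VALID_EFFECTS:
        raise ValueError(f"unsupported taint effect: {effect!r}")
    return ["kubectl", "taint", "nodes", node, f"{key}:{effect}-"]


if __name__ == "__main__":
    print(shlex.join(taint_command("compute-1")))
    # → kubectl taint nodes compute-1 gpu-accelerator=false:NoSchedule
    print(shlex.join(untaint_command("compute-1")))
    # → kubectl taint nodes compute-1 gpu-accelerator:NoSchedule-
```

Keeping the removal command alongside makes it easy to clear the taint if a GPU is later added to the node.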

Configure Workloads:
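Ensure workloads are configured to match: GPU-dependent workloads should use node selectors or node affinity that account for GPU availability, and workloads that may run on nodes without GPUs need tolerations for the taints applied to those nodes.
For example, in the Pod specification of a CPU-only workload, you might add a toleration for gpu-accelerator=false:NoSchedule.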
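A minimal sketch of the two Pod shapes this scheme implies, written as Python dicts that can be dumped to JSON for kubectl apply -f. The label gpu-enabled=false and the taint gpu-accelerator=false:NoSchedule are the examples from this comment; the helper names, pod names, and images are hypothetical. The GPU pod requests nvidia.com/gpu and uses node affinity with NotIn, which also matches nodes that carry no gpu-enabled label at all, so GPU nodes need no label of their own. The CPU-only pod tolerates the taint so it may still land on non-GPU nodes.

```python
import json


def gpu_pod(name: str, image: str, gpus: int = 1) -> dict:
    """Pod that requests NVIDIA GPUs and avoids nodes labeled gpu-enabled=false."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "affinity": {"nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{"matchExpressions": [{
                        "key": "gpu-enabled",
                        "operator": "NotIn",  # also matches nodes without the label
                        "values": ["false"],
                    }]}]
                }
            }},
            "containers": [{
                "name": name,
                "image": image,
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
        },
    }


def cpu_pod(name: str, image: str) -> dict:
    """CPU-only pod allowed (not forced) onto nodes tainted gpu-accelerator=false:NoSchedule."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "tolerations": [{
                "key": "gpu-accelerator",
                "operator": "Equal",
                "value": "false",
                "effect": "NoSchedule",
            }],
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # Hypothetical names/images; pipe the output to `kubectl apply -f -`
    print(json.dumps(gpu_pod("train-job", "example/train:latest"), indent=2))
    print(json.dumps(cpu_pod("preprocess", "example/prep:latest"), indent=2))
```

A toleration only permits scheduling on a tainted node; it does not steer the pod there, so CPU-only pods can still run on GPU nodes unless something else prevents it.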

abhishek-sa1 (Contributor) commented May 8, 2024

This issue is fixed by PR #2238.

@sujit-jadhav @j0hnL can we close this issue?
