
microsoft-dsvm:ubuntu-hpc:2204:latest doesn't work with Standard_NV72ads_A10_v5 #398

Closed
alyssa1303 opened this issue Mar 13, 2025 · 6 comments

Comments

@alyssa1303

I'm trying to create a VM for a GPU workload using microsoft-dsvm:ubuntu-hpc:2204:latest and VM size Standard_NV72ads_A10_v5. This is the command used to create the VM:

VM_SIZE="Standard_NV72ads_A10_v5"
az vm create --resource-group $resource_group \
        --name chatbot-server --size $VM_SIZE \
        --nics chatbot-server-nic --zone 2 \
        --os-disk-caching ReadWrite --storage-sku Premium_LRS --os-disk-size-gb 256 \
        --image microsoft-dsvm:ubuntu-hpc:2204:latest \
        --admin-username ubuntu --ssh-key-value @${ssh_public_key_path} --tags $tags

However, when I SSH into the VM and run nvidia-smi, I get this error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Isn't the image supposed to have the NVIDIA driver installed already? The same image works fine with VM size Standard_NC96ads_A100_v4.

@darkwhite29
Contributor

The latest HPC image doesn't support A10 series due to GPU driver mismatch. Could you please try this image instead?

microsoft-dsvm:ubuntu-hpc:2204:22.04.2024091701

@alyssa1303
Author

alyssa1303 commented Mar 13, 2025

@darkwhite29 I tried with that image but still got the same error :(

Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-1071-azure x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/pro

 System information as of Thu Mar 13 20:57:18 UTC 2025

  System load:  0.04                Processes:             728
  Usage of /:   12.6% of 247.92GB   Users logged in:       0
  Memory usage: 0%                  IPv4 address for eth0: 10.0.1.4
  Swap usage:   0%


Expanded Security Maintenance for Applications is not enabled.

163 updates can be applied immediately.
88 of these updates are standard security updates.
To see these additional updates run: apt list --upgradable

6 additional security updates can be applied with ESM Apps.
Learn more about enabling ESM Apps service at https://ubuntu.com/esm

New release '24.04.2 LTS' available.
Run 'do-release-upgrade' to upgrade to it.


To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@chatbot-server:~$ uname -a
Linux chatbot-server 5.15.0-1071-azure #80-Ubuntu SMP Tue Aug 6 19:27:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@chatbot-server:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
ubuntu@chatbot-server:~$ lspci | grep -i nvidia
0002:00:00.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)
0003:00:00.0 3D controller: NVIDIA Corporation GA102GL [A10] (rev a1)

@alyssa1303
Author

This is the installed version

ubuntu@chatbot-server:~$ dkms status | grep nvidia
nvidia/550.90.07, 5.15.0-1071-azure, x86_64: installed
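
Note that `installed` in the dkms output only says the module was built and installed for this kernel, not that it loaded against the A10. A minimal sketch of an extra check, assuming the standard `lsmod` tool is available; the helper name `nvidia_module_status` is hypothetical and not part of the image:

```shell
# Hypothetical helper: reads lsmod-style text on stdin and reports whether
# the core "nvidia" kernel module is listed (first column is the module name).
nvidia_module_status() {
    if awk '$1 == "nvidia" { found = 1 } END { exit !found }'; then
        echo "loaded"
    else
        echo "not loaded"
    fi
}
```

On the VM this could be run as `lsmod | nvidia_module_status`. If it reports `not loaded`, `sudo dmesg | grep -i nvrm` may show why the driver refused to bind to the GPU.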

@darkwhite29
Contributor

darkwhite29 commented Mar 13, 2025

In our HPC images, we install the public GPU drivers (for CUDA) from the NVIDIA website to support datacenter GPU SKUs (NC and ND series). Those drivers don't work for A10 SKUs. You need to manually install the NVIDIA GRID driver, a unified driver for both graphics and CUDA.

https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup#nvidia-cuda-drivers

NVIDIA GRID drivers can be found here: https://docs.nvidia.com/vgpu/
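
The split described above can be sketched as a small lookup, in case it helps when scripting VM creation. This is only an illustration of the rule stated in this thread: the function name `driver_family_for_size` is hypothetical, it is not part of any Azure tooling, and the patterns cover only the size families mentioned here.

```shell
# Hypothetical helper: maps an Azure VM size name to the driver family
# discussed in this thread. Anything not covered here is reported as "unknown".
driver_family_for_size() {
    case "$1" in
        Standard_NV*_A10_v5)        echo "grid" ;;    # NV A10 sizes need the GRID driver
        Standard_NC*|Standard_ND*)  echo "cuda" ;;    # NC/ND sizes use the public CUDA driver
        *)                          echo "unknown" ;;
    esac
}
```

For example, `driver_family_for_size "$VM_SIZE"` could be called before `az vm create` to decide whether the preinstalled driver in the HPC image is expected to work.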

@alyssa1303
Author

So that means NV A10 SKUs are currently not supported? Would I need to file a feature request for that, or is it completely out of scope?

@darkwhite29
Contributor

darkwhite29 commented Mar 13, 2025

It's out of scope -- we don't support NV series GPUs. We only own support for NC/ND.

NV series owners: PM (Vijay Kanchanahalli), dev (Ray Jui-Hao Chiang)

2 participants