
Running CUDA-Q on multiple GPUs across nodes via MPI #3641

@ctminh

Description


Hi *,

I have an issue when running CUDA-Q on multiple GPUs across compute nodes via MPI. I installed CUDA-Q from the Python wheels, simply following https://nvidia.github.io/cuda-quantum/latest/using/quick_start.html. The CUDA version on my system is 12.9, and the GPUs are A100 80 GB. On a single GPU, the GHZ example with CUDA-Q works up to 33 qubits.
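For context, here is a minimal sketch of what exp_ghz_cudaq.py could look like; the exact script isn't reproduced here, but the cudaq.mpi and cudaq.sample calls are the standard CUDA-Q Python API and are consistent with the output below:

# Hypothetical reconstruction of exp_ghz_cudaq.py (the actual script may differ).
import sys
import cudaq

cudaq.mpi.initialize()
print(f"Total ranks {cudaq.mpi.num_ranks()}: Current rank is {cudaq.mpi.rank()}")

# Plain single-GPU statevector backend; each MPI rank simulates independently.
cudaq.set_target("nvidia")
print(f"Running on target {cudaq.get_target().name}")

@cudaq.kernel
def ghz(n: int):
    # GHZ state: Hadamard on qubit 0, then a CNOT chain.
    q = cudaq.qvector(n)
    h(q[0])
    for i in range(n - 1):
        x.ctrl(q[i], q[i + 1])

# Qubit count from the command line, defaulting to 34 as in the failing run.
n = int(sys.argv[1]) if len(sys.argv) > 1 else 34
counts = cudaq.sample(ghz, n, shots_count=1000)
print(f"Results: {counts}")

cudaq.mpi.finalize()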

When I run the CUDA-Q GHZ example with 34 qubits via MPI across 2 nodes (each with 1 GPU), I get RuntimeError: requested size is too big. For example,

~/conda/x86_64/envs/cudaq-custom-mpi-env/bin/mpirun -np 2 --map-by ppr:1:node --hostfile ./my_hosts.txt ~/conda/x86_64/envs/cudaq-custom-mpi-env/bin/python ./exp_ghz_cudaq.py

Total ranks 2: Current rank is 0
Running on target nvidia
Total ranks 2: Current rank is 1
Running on target nvidia
RuntimeError: requested size is too big
RuntimeError: requested size is too big
--------------------------------------------------------------------------

When I run GHZ with 33 qubits on 2 nodes, it seems to work OK, e.g.,

~/conda/x86_64/envs/cudaq-custom-mpi-env/bin/mpirun -np 2 --map-by ppr:1:node --hostfile ./my_hosts.txt ~/conda/x86_64/envs/cudaq-custom-mpi-env/bin/python ./exp_ghz_cudaq.py 33
Total ranks 2: Current rank is 0
Running on target nvidia
Total ranks 2: Current rank is 1
Running on target nvidia
Results: { 000000000000000000000000000000000:493 111111111111111111111111111111111:507 }
Results: { 000000000000000000000000000000000:493 111111111111111111111111111111111:507 }

Do you have any suggestions, e.g., how to check that MPI is actually distributing the CUDA-Q state vector across the GPUs?
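For scale: 2^34 single-precision complex amplitudes is about 128 GB, which exceeds a single 80 GB A100, so two GPUs can only help if the state vector is partitioned across ranks rather than replicated on each one. A minimal sketch of how one might request and verify the distributed backend, assuming the nvidia target's mgpu option from the CUDA-Q docs and the default single-precision statevector:

# Sketch: request the distributed multi-GPU backend and print each rank's view.
import cudaq

cudaq.mpi.initialize()

# With option="mgpu", the state vector is partitioned across MPI ranks;
# without it, each rank allocates the full 2^n state locally, so 34 qubits
# (~128 GB in fp32) cannot fit on one 80 GB A100.
cudaq.set_target("nvidia", option="mgpu")

print(f"rank {cudaq.mpi.rank()} of {cudaq.mpi.num_ranks()}, "
      f"target = {cudaq.get_target().name}")

cudaq.mpi.finalize()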

Thank you!
