
CUDA-aware MPI build on fresh Ubuntu 24.04 LTS, MPIX_Query_cuda_support() returns zero #13130

Open
niklebedenko opened this issue Mar 6, 2025 · 12 comments


@niklebedenko

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.7

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Obtained from https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.7.tar.gz

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

N/A

Please describe the system on which you are running

  • Operating system/version: Ubuntu 24.04 LTS
  • Computer hardware: x86_64
  • Network type: single node, single GPU

Details of the problem

I'm really struggling to run CUDA-aware MPI on just one node. I want to do this so that I can test my code locally before deploying to a cluster. I've reproduced this on a fresh install of Ubuntu 24.04 on two different machines.

Here are my install steps:

tar xf openmpi-5.0.7.tar.gz
cd openmpi-5.0.7
mkdir build
cd build
../configure --with-cuda=/usr/local/cuda --prefix=/opt/openmpi | tee config.out
make -j$(nproc) all | tee make.out
sudo make install

Now, I build a very simple test program:

// mpi_check.c
#include "mpi.h"
#include <stdio.h>

#if !defined(OPEN_MPI) || !OPEN_MPI
#error This source code uses an Open MPI-specific extension
#endif

/* Needed for MPIX_Query_cuda_support(), below */
#include "mpi-ext.h"

int main(int argc, char* argv[]) {
        MPI_Init(&argc, &argv);

        printf("Compile time check:\n");
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
        printf("This MPI library has CUDA-aware support.\n");
#else
        printf("This MPI library does not have CUDA-aware support.\n");
#endif /* MPIX_CUDA_AWARE_SUPPORT */

        printf("Run time check:\n");
#if defined(MPIX_CUDA_AWARE_SUPPORT)
        if (1 == MPIX_Query_cuda_support()) {
                printf("This MPI library has CUDA-aware support.\n");
        }
        else {
                printf("This MPI library does not have CUDA-aware support.\n");
        }
#endif /* MPIX_CUDA_AWARE_SUPPORT */

        MPI_Finalize();

        return 0;
}

This was compiled and run with:

/opt/openmpi/bin/mpicc mpi_check.c -o mpi_check

/opt/openmpi/bin/mpirun -n 1 ./mpi_check

Then, we get this output:

Compile time check:
This MPI library has CUDA-aware support.
Run time check:
This MPI library does not have CUDA-aware support.

However, if I just run ./mpi_check, i.e. without mpirun, I get this output:

Authorization required, but no authorization protocol specified

Compile time check:
This MPI library has CUDA-aware support.
Run time check:
This MPI library has CUDA-aware support.

There are no other MPI installations, and this was reproduced on two independent machines.

Perhaps I'm missing a step, or missing some configuration, but I've tried lots of variations of each of the above commands to no avail, and (I think?) I've followed the install instructions in the documentation correctly. So I believe it is a bug.

If I'm missing something, please let me know. Also please let me know if you'd like the config.out and make.out log files.
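
In case it helps to see the end goal: once this works, the test I ultimately want to run is a plain device-buffer send/receive between two ranks, roughly like the sketch below (hypothetical file name and build line, not part of my original report; it assumes the CUDA runtime under /usr/local/cuda and a CUDA-aware build):

// cuda_sendrecv_check.c -- hypothetical sketch, not part of my original report.
// Build (sketch): /opt/openmpi/bin/mpicc cuda_sendrecv_check.c \
//     -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcudart -o cuda_sendrecv_check
// Run:            /opt/openmpi/bin/mpirun -np 2 ./cuda_sendrecv_check
#include "mpi.h"
#include <stdio.h>
#include <cuda_runtime.h>

int main(int argc, char* argv[]) {
        MPI_Init(&argc, &argv);

        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n = 16;
        double host[16];
        double* dbuf = NULL;
        cudaMalloc((void**)&dbuf, n * sizeof(double));   /* device buffer */

        if (rank == 0) {
                for (int i = 0; i < n; i++) host[i] = (double)i;
                cudaMemcpy(dbuf, host, n * sizeof(double), cudaMemcpyHostToDevice);
                /* Passing a device pointer directly to MPI_Send only works
                   with a CUDA-aware build. */
                MPI_Send(dbuf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
                MPI_Recv(dbuf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                cudaMemcpy(host, dbuf, n * sizeof(double), cudaMemcpyDeviceToHost);
                printf("rank 1 received host[5] = %g\n", host[5]);
        }

        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
}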

@bosilca
Member

bosilca commented Mar 7, 2025

I am unable to replicate: as long as I run your example on a machine with CUDA devices attached, it works just as expected. If I run on a machine without GPUs, then it fails at runtime.

Can you run mpirun -np 1 --mca accelerator_base_verbose 100 ./mpi_check to see if there is anything interesting in the output of the accelerator module?

@niklebedenko
Author

Thanks for responding so quickly --- here's the output you asked for:

$ /opt/openmpi/bin/mpirun -np 1 --mca accelerator_base_verbose 100 ./mpi_check
[<hostname>:670389] mca: base: components_register: registering framework accelerator components
[<hostname>:670389] mca: base: components_register: found loaded component null
[<hostname>:670389] mca: base: components_register: component null register function successful
[<hostname>:670389] mca: base: components_open: opening accelerator components
[<hostname>:670389] mca: base: components_open: found loaded component null
[<hostname>:670389] mca: base: components_open: component null open function successful
[<hostname>:670389] select: initializing accelerator component null
[<hostname>:670389] selected null
Compile time check:
This MPI library has CUDA-aware support.
Run time check:
This MPI library does not have CUDA-aware support.

I've also tried to add --mca accelerator cuda, but that gives:

$ /opt/openmpi/bin/mpirun -np 1 --mca accelerator_base_verbose 100 --mca accelerator cuda ./mpi_check
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      <hostname>
Framework: accelerator
Component: cuda
--------------------------------------------------------------------------
[<hostname>:671182] *** Process received signal ***
[<hostname>:671182] Signal: Segmentation fault (11)
[<hostname>:671182] Signal code: Address not mapped (1)
[<hostname>:671182] Failing at address: (nil)
[<hostname>:671182] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x45330)[0x791767e45330]
[<hostname>:671182] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 671182 on node <hostname> exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------

I think these lines from config.out might also be helpful:

$ grep "cuda" config.out
configure: running /bin/bash ../../../3rd-party/openpmix/configure --disable-option-checking '--prefix=/opt/openmpi' --without-tests-examples --enable-pmix-binaries --disable-pmix-backward-compatibility --disable-visibility --disable-devel-check '--with-cuda=/usr/local/cuda' --cache-file=/dev/null --srcdir=../../../3rd-party/openpmix
checking for subdir args...  '--disable-option-checking' '--prefix=/opt/openmpi' '--without-tests-examples' '--enable-pmix-binaries' '--disable-pmix-backward-compatibility' '--disable-visibility' '--disable-devel-check' '--with-cuda=/usr/local/cuda'
configure: running /bin/bash ../../../3rd-party/prrte/configure --disable-option-checking '--prefix=/opt/openmpi' --enable-prte-ft --with-proxy-version-string=5.0.7 --with-proxy-package-name="Open MPI" --with-proxy-bugreport="https://www.open-mpi.org/community/help/" --disable-devel-check --enable-prte-prefix-by-default '--with-cuda=/usr/local/cuda' --cache-file=/dev/null --srcdir=../../../3rd-party/prrte
checking for subdir args...  '--disable-option-checking' '--prefix=/opt/openmpi' '--enable-prte-ft' '--with-proxy-version-string=5.0.7' '--with-proxy-package-name=Open MPI' '--with-proxy-bugreport=https://www.open-mpi.org/community/help/' '--disable-devel-check' '--enable-prte-prefix-by-default' '--with-cuda=/usr/local/cuda'
checking for subdir args...  '--with-cuda=/usr/local/cuda' '--prefix=/opt/openmpi'
checking which components should be run-time loadable... rcache-rgpusm rcache-gpusm btl-smcuda accelerator-ze accelerator-rocm accelerator-cuda (default)
checking for m4 configure components in framework accelerator... cuda, rocm
--- MCA component accelerator:cuda (m4 configuration macro)
checking for MCA component accelerator:cuda compile mode... dso
checking if --with-cuda is set... found (/usr/local/cuda/include/cuda.h)
checking for cuda pkg-config name... /usr/local/cuda/targets/x86_64-linux/lib/stubs/pkgconfig/cuda.pc
checking if cuda pkg-config module exists... no
checking for cuda header at /usr/local/cuda/include... found
checking for cuda library (cuda) in /usr/local/cuda/targets/x86_64-linux/lib/stubs... found
checking for cuda cppflags... -I/usr/local/cuda/include
checking for cuda ldflags... -L/usr/local/cuda/targets/x86_64-linux/lib/stubs
checking for cuda libs... -lcuda
checking for cuda static libs... -lcuda
checking for cuda.h... yes
checking if cuda requires libnl v1 or v3... none
checking if have cuda support... yes (-I/usr/local/cuda/include)
checking if MCA component accelerator:cuda can compile... yes
checking for m4 configure components in framework btl... ofi, portals4, sm, smcuda, tcp, uct, ugni, usnic
--- MCA component btl:smcuda (m4 configuration macro)
checking for MCA component btl:smcuda compile mode... dso
checking if --with-cuda is set... found (/usr/local/cuda/include/cuda.h)
checking for cuda pkg-config name... (cached) /usr/local/cuda/targets/x86_64-linux/lib/stubs/pkgconfig/cuda.pc
checking if cuda pkg-config module exists... (cached) no
checking for cuda header at /usr/local/cuda/include... found
checking for cuda library (cuda) in /usr/local/cuda/targets/x86_64-linux/lib/stubs... found
checking for cuda cppflags... -I/usr/local/cuda/include
checking for cuda ldflags... -L/usr/local/cuda/targets/x86_64-linux/lib/stubs
checking for cuda libs... -lcuda
checking for cuda static libs... -lcuda
checking for cuda.h... (cached) yes
checking if cuda requires libnl v1 or v3... (cached) none
checking if have cuda support... yes (-I/usr/local/cuda/include)
checking if MCA component btl:smcuda can compile... yes
checking if --with-cuda is set... found (/usr/local/cuda/include/cuda.h)
checking for cuda pkg-config name... (cached) /usr/local/cuda/targets/x86_64-linux/lib/stubs/pkgconfig/cuda.pc
checking if cuda pkg-config module exists... (cached) no
checking for cuda header at /usr/local/cuda/include... found
checking for cuda library (cuda) in /usr/local/cuda/targets/x86_64-linux/lib/stubs... found
checking for cuda cppflags... -I/usr/local/cuda/include
checking for cuda ldflags... -L/usr/local/cuda/targets/x86_64-linux/lib/stubs
checking for cuda libs... -lcuda
checking for cuda static libs... -lcuda
checking for cuda.h... (cached) yes
checking if cuda requires libnl v1 or v3... (cached) none
checking if have cuda support... yes (-I/usr/local/cuda/include)
checking if --with-cuda is set... found (/usr/local/cuda/include/cuda.h)
checking for cuda pkg-config name... (cached) /usr/local/cuda/targets/x86_64-linux/lib/stubs/pkgconfig/cuda.pc
checking if cuda pkg-config module exists... (cached) no
checking for cuda header at /usr/local/cuda/include... found
checking for cuda library (cuda) in /usr/local/cuda/targets/x86_64-linux/lib/stubs... found
checking for cuda cppflags... -I/usr/local/cuda/include
checking for cuda ldflags... -L/usr/local/cuda/targets/x86_64-linux/lib/stubs
checking for cuda libs... -lcuda
checking for cuda static libs... -lcuda
checking for cuda.h... (cached) yes
checking if cuda requires libnl v1 or v3... (cached) none
checking if have cuda support... yes (-I/usr/local/cuda/include)
checking for m4 configure components in framework coll... cuda, ftagree, hcoll, monitoring, portals4, sm, ucc
--- MCA component coll:cuda (m4 configuration macro)
checking for MCA component coll:cuda compile mode... static
checking if MCA component coll:cuda can compile... yes
configure: running /bin/bash '../../../3rd-party/romio341/configure'  FROM_OMPI=yes CC="gcc" CFLAGS="-O3 -DNDEBUG  -finline-functions -mcx16 -D__EXTENSIONS__" CPPFLAGS="" FFLAGS="" LDFLAGS="" --enable-shared --disable-static  --prefix=/opt/openmpi --disable-aio --disable-weak-symbols --enable-strict --disable-f77 --disable-f90 ac_cv_lib_cuda_cuMemGetAddressRange=no ac_cv_lib_cudart_cudaStreamSynchronize=no --cache-file=/dev/null --srcdir=../../../3rd-party/romio341 --disable-option-checking
configure: running /bin/bash ../../../../3rd-party/romio341/mpl/configure --disable-option-checking '--prefix=/opt/openmpi' --disable-versioning --enable-embedded 'FROM_OMPI=yes' 'CC=gcc' 'CFLAGS=-O3 -DNDEBUG  -finline-functions -mcx16 -D__EXTENSIONS__' 'CPPFLAGS=' 'FFLAGS=' 'LDFLAGS=' '--enable-shared' '--disable-static' '--disable-aio' '--disable-weak-symbols' '--enable-strict' '--disable-f77' '--disable-f90' 'ac_cv_lib_cuda_cuMemGetAddressRange=no' 'ac_cv_lib_cudart_cudaStreamSynchronize=no' --cache-file=/dev/null --srcdir=../../../../3rd-party/romio341/mpl
checking for cuda_runtime_api.h... no
checking for cudaStreamSynchronize in -lcudart... (cached) no
checking for cuda.h... no
checking for cuMemGetAddressRange in -lcuda... (cached) no
checking for available MPI Extensions... affinity, cuda, ftmpi, rocm, shortfloat
--- MPI Extension cuda
checking if MPI Extension cuda can compile... yes
checking if MPI Extension cuda has C bindings... yes (required)
checking if MPI Extension cuda has mpif.h bindings... no
checking if MPI Extension cuda has "use mpi" bindings... no
checking if MPI Extension cuda has "use mpi_f08" bindings... no
config.status: creating opal/mca/accelerator/cuda/Makefile
config.status: creating opal/mca/btl/smcuda/Makefile
config.status: creating ompi/mca/coll/cuda/Makefile
config.status: creating ompi/mpiext/cuda/Makefile
config.status: creating ompi/mpiext/cuda/c/Makefile
config.status: creating ompi/mpiext/cuda/c/mpiext_cuda_c.h

It seems that most of the components can find cuda.h, apart from one: the (recently-removed) romio341 component.

Finally, both machines have CUDA GPUs installed. Here's some relevant output:

$ nvidia-smi
Fri Mar  7 08:25:24 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:02:00.0 Off |                  Off |
|  0%   32C    P8              12W / 450W |     67MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1914      G   /usr/lib/xorg/Xorg                           39MiB |
|    0   N/A  N/A      2442      G   /usr/bin/gnome-shell                         15MiB |
+---------------------------------------------------------------------------------------+

$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Jan_15_19:20:09_PST_2025
Cuda compilation tools, release 12.8, V12.8.61
Build cuda_12.8.r12.8/compiler.35404655_0

Admittedly, there's a mismatch in CUDA versions between the driver (12.2) and the compiler (12.8). Not sure if that's the issue.
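
If it's useful, here's a minimal check of what the installed driver actually supports versus the toolkit headers I'm compiling against (hypothetical file name and build line, not part of my original report; it links against the stubs directory that configure found):

// cuda_version_check.c -- hypothetical helper, not part of my original report.
// Build (sketch): gcc cuda_version_check.c -I/usr/local/cuda/include \
//     -L/usr/local/cuda/targets/x86_64-linux/lib/stubs -lcuda -o cuda_version_check
#include <stdio.h>
#include <cuda.h>

int main(void) {
        int driver_version = 0;
        /* cuDriverGetVersion reports the CUDA version supported by the
           installed driver, e.g. 12020 for CUDA 12.2. */
        if (cuDriverGetVersion(&driver_version) != CUDA_SUCCESS) {
                printf("cuDriverGetVersion failed\n");
                return 1;
        }
        printf("Driver supports CUDA %d.%d\n",
               driver_version / 1000, (driver_version % 1000) / 10);
        /* CUDA_VERSION comes from cuda.h, i.e. the toolkit used at compile time. */
        printf("Toolkit headers are CUDA %d.%d\n",
               CUDA_VERSION / 1000, (CUDA_VERSION % 1000) / 10);
        return 0;
}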

@bosilca
Member

bosilca commented Mar 7, 2025

You are correct: most of the components building as DSOs found CUDA. However, coll:cuda decided to be built statically and failed.

Something is weird with your build, because in the end the CUDA accelerator module has not been built; that's why you don't see it in the output I asked for, and why it fails when you try to force-load it. If you go into the build directory, then into opal/mca/accelerator/cuda, do you see a Makefile? If yes, what's the output of make clean && make V=1?

@niklebedenko
Author

niklebedenko commented Mar 7, 2025

There is a Makefile. Here's the output:

$ make clean
test -z "*~ .#*" || rm -f *~ .#*
rm -rf .libs _libs
test -z "mca_accelerator_cuda.la" || rm -f mca_accelerator_cuda.la
rm -f ./so_locations
test -z "" || rm -f 
rm -f *.o
rm -f *.lo
$ make V=1
depbase=`echo accelerator_cuda_component.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/bash ../../../../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I../../../../../opal/mca/accelerator/cuda -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../ompi/mpiext/cuda/c -I../../../../ompi/mpiext/rocm/c  -I/usr/local/cuda/include -iquote../../../../.. -iquote../../../.. -iquote../../../../../opal/include -iquote../../../../../ompi/include -iquote../../../../../oshmem/include  -I/usr/lib/x86_64-linux-gnu/pmix2/include -I/usr/lib/x86_64-linux-gnu/pmix2/include/pmix  -O3 -DNDEBUG  -finline-functions -mcx16 -MT accelerator_cuda_component.lo -MD -MP -MF $depbase.Tpo -c -o accelerator_cuda_component.lo ../../../../../opal/mca/accelerator/cuda/accelerator_cuda_component.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../../../opal/mca/accelerator/cuda -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../ompi/mpiext/cuda/c -I../../../../ompi/mpiext/rocm/c -I/usr/local/cuda/include -iquote../../../../.. -iquote../../../.. -iquote../../../../../opal/include -iquote../../../../../ompi/include -iquote../../../../../oshmem/include -I/usr/lib/x86_64-linux-gnu/pmix2/include -I/usr/lib/x86_64-linux-gnu/pmix2/include/pmix -O3 -DNDEBUG -finline-functions -mcx16 -MT accelerator_cuda_component.lo -MD -MP -MF .deps/accelerator_cuda_component.Tpo -c ../../../../../opal/mca/accelerator/cuda/accelerator_cuda_component.c  -fPIC -DPIC -o .libs/accelerator_cuda_component.o
depbase=`echo accelerator_cuda.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/bash ../../../../libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I../../../../../opal/mca/accelerator/cuda -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../ompi/mpiext/cuda/c -I../../../../ompi/mpiext/rocm/c  -I/usr/local/cuda/include -iquote../../../../.. -iquote../../../.. -iquote../../../../../opal/include -iquote../../../../../ompi/include -iquote../../../../../oshmem/include  -I/usr/lib/x86_64-linux-gnu/pmix2/include -I/usr/lib/x86_64-linux-gnu/pmix2/include/pmix  -O3 -DNDEBUG  -finline-functions -mcx16 -MT accelerator_cuda.lo -MD -MP -MF $depbase.Tpo -c -o accelerator_cuda.lo ../../../../../opal/mca/accelerator/cuda/accelerator_cuda.c &&\
mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../../../opal/mca/accelerator/cuda -I../../../../opal/include -I../../../../ompi/include -I../../../../oshmem/include -I../../../../ompi/mpiext/cuda/c -I../../../../ompi/mpiext/rocm/c -I/usr/local/cuda/include -iquote../../../../.. -iquote../../../.. -iquote../../../../../opal/include -iquote../../../../../ompi/include -iquote../../../../../oshmem/include -I/usr/lib/x86_64-linux-gnu/pmix2/include -I/usr/lib/x86_64-linux-gnu/pmix2/include/pmix -O3 -DNDEBUG -finline-functions -mcx16 -MT accelerator_cuda.lo -MD -MP -MF .deps/accelerator_cuda.Tpo -c ../../../../../opal/mca/accelerator/cuda/accelerator_cuda.c  -fPIC -DPIC -o .libs/accelerator_cuda.o
/bin/bash ../../../../libtool  --tag=CC   --mode=link gcc  -O3 -DNDEBUG  -finline-functions -mcx16 -module -avoid-version -L/usr/local/cuda/targets/x86_64-linux/lib/stubs -L/usr/lib/x86_64-linux-gnu/pmix2/lib -Wl,-rpath -Wl,/usr/lib/x86_64-linux-gnu/pmix2/lib -Wl,--enable-new-dtags -o mca_accelerator_cuda.la -rpath /opt/openmpi/lib/openmpi accelerator_cuda_component.lo accelerator_cuda.lo ../../../../opal/libopen-pal.la -lcuda -lm -levent_core -levent_pthreads -lhwloc -lpmix
libtool: link: gcc -shared  -fPIC -DPIC  .libs/accelerator_cuda_component.o .libs/accelerator_cuda.o   -Wl,-rpath -Wl,/home/<user>/Downloads/openmpi-5.0.7/build/opal/.libs -Wl,-rpath -Wl,/opt/openmpi/lib -L/usr/local/cuda/targets/x86_64-linux/lib/stubs -L/usr/lib/x86_64-linux-gnu/pmix2/lib ../../../../opal/.libs/libopen-pal.so -ldl -lcuda -lm -levent_core -levent_pthreads -lhwloc -lpmix  -O3 -mcx16 -Wl,-rpath -Wl,/usr/lib/x86_64-linux-gnu/pmix2/lib -Wl,--enable-new-dtags   -Wl,-soname -Wl,mca_accelerator_cuda.so -o .libs/mca_accelerator_cuda.so
libtool: link: ( cd ".libs" && rm -f "mca_accelerator_cuda.la" && ln -s "../mca_accelerator_cuda.la" "mca_accelerator_cuda.la" )

@bosilca
Member

bosilca commented Mar 7, 2025

The CUDA accelerator component is built, but not loaded.

  1. Let's check that it is installed properly. What's the output of make install V=1 in the same directory as above? Do you see the MCA module in ${INSTALLDIR}/lib/openmpi/?
  2. What is ompi_info reporting? Do you see the CUDA accelerator component in the output? If yes, what is the output of ompi_info --param accelerator cuda -l 9?

@niklebedenko
Author

niklebedenko commented Mar 7, 2025

$ sudo make install V=1
make[1]: Entering directory '/home/<user>/Downloads/openmpi-5.0.7/build/opal/mca/accelerator/cuda'
make[1]: Nothing to be done for 'install-exec-am'.
 /usr/bin/mkdir -p '/opt/openmpi/share/openmpi'
 /usr/bin/install -c -m 644 ../../../../../opal/mca/accelerator/cuda/help-accelerator-cuda.txt '/opt/openmpi/share/openmpi'
 /usr/bin/mkdir -p '/opt/openmpi/lib/openmpi'
 /bin/bash ../../../../libtool   --mode=install /usr/bin/install -c   mca_accelerator_cuda.la '/opt/openmpi/lib/openmpi'
libtool: warning: relinking 'mca_accelerator_cuda.la'
libtool: install: (cd /home/<user>/Downloads/openmpi-5.0.7/build/opal/mca/accelerator/cuda; /bin/bash "/home/<user>/Downloads/openmpi-5.0.7/build/libtool"  --tag CC --mode=relink gcc -O3 -DNDEBUG -finline-functions -mcx16 -module -avoid-version -L/usr/local/cuda/targets/x86_64-linux/lib/stubs -L/usr/lib/x86_64-linux-gnu/pmix2/lib -Wl,-rpath -Wl,/usr/lib/x86_64-linux-gnu/pmix2/lib -Wl,--enable-new-dtags -o mca_accelerator_cuda.la -rpath /opt/openmpi/lib/openmpi accelerator_cuda_component.lo accelerator_cuda.lo ../../../../opal/libopen-pal.la -lcuda -lm -levent_core -levent_pthreads -lhwloc -lpmix )
libtool: relink: gcc -shared  -fPIC -DPIC  .libs/accelerator_cuda_component.o .libs/accelerator_cuda.o   -Wl,-rpath -Wl,/opt/openmpi/lib -L/usr/local/cuda/targets/x86_64-linux/lib/stubs -L/usr/lib/x86_64-linux-gnu/pmix2/lib -L/opt/openmpi/lib -lopen-pal -ldl -lcuda -lm -levent_core -levent_pthreads -lhwloc -lpmix  -O3 -mcx16 -Wl,-rpath -Wl,/usr/lib/x86_64-linux-gnu/pmix2/lib -Wl,--enable-new-dtags   -Wl,-soname -Wl,mca_accelerator_cuda.so -o .libs/mca_accelerator_cuda.so
libtool: install: /usr/bin/install -c .libs/mca_accelerator_cuda.soT /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so
libtool: install: /usr/bin/install -c .libs/mca_accelerator_cuda.lai /opt/openmpi/lib/openmpi/mca_accelerator_cuda.la
libtool: finish: PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/sbin" ldconfig -n /opt/openmpi/lib/openmpi
----------------------------------------------------------------------
Libraries have been installed in:
   /opt/openmpi/lib/openmpi

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the '-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the 'LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the 'LD_RUN_PATH' environment variable
     during linking
   - use the '-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to '/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
make[1]: Leaving directory '/home/<user>/Downloads/openmpi-5.0.7/build/opal/mca/accelerator/cuda'
$ cd /opt/openmpi/lib/openmpi/
$ ls
libompi_dbg_msgq.la  mca_accelerator_cuda.la  mca_btl_smcuda.la  mca_rcache_gpusm.la  mca_rcache_rgpusm.la
libompi_dbg_msgq.so  mca_accelerator_cuda.so  mca_btl_smcuda.so  mca_rcache_gpusm.so  mca_rcache_rgpusm.so
$ /opt/openmpi/bin/ompi_info | grep "cuda"
  Configure command line: '--with-cuda=/usr/local/cuda' '--prefix=/opt/openmpi'
          MPI extensions: affinity, cuda, ftmpi, rocm, shortfloat
         MCA accelerator: cuda (MCA v2.1.0, API v1.0.0, Component v5.0.7)
                 MCA btl: smcuda (MCA v2.1.0, API v3.3.0, Component v5.0.7)
                MCA coll: cuda (MCA v2.1.0, API v2.4.0, Component v5.0.7)
$ /opt/openmpi/bin/ompi_info --param accelerator cuda -l 9
         MCA accelerator: cuda (MCA v2.1.0, API v1.0.0, Component v5.0.7)

FYI, I'm still getting the same behaviour from the compile/run commands for mpi_check.c.

@bosilca
Member

bosilca commented Mar 7, 2025

Everything seems to be in place, but the CUDA accelerator component is not loaded. Please send the output of ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so or readelf -d /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so.

@niklebedenko
Author

$ ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so
        linux-vdso.so.1 (0x0000720a8b478000)
        libopen-pal.so.80 => /opt/openmpi/lib/libopen-pal.so.80 (0x0000720a8b355000)
        libcuda.so.1 => /lib/x86_64-linux-gnu/libcuda.so.1 (0x0000720a89600000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000720a89200000)
        libevent_core-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_core-2.1.so.7 (0x0000720a8b307000)
        libevent_pthreads-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x0000720a8b302000)
        libhwloc.so.15 => /lib/x86_64-linux-gnu/libhwloc.so.15 (0x0000720a8b29f000)
        libpmix.so.2 => /usr/lib/x86_64-linux-gnu/pmix2/lib/libpmix.so.2 (0x0000720a88e00000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000720a89517000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x0000720a8b29a000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x0000720a8b295000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x0000720a8b28e000)
        /lib64/ld-linux-x86-64.so.2 (0x0000720a8b47a000)
        libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x0000720a894e4000)
        libmunge.so.2 => /lib/x86_64-linux-gnu/libmunge.so.2 (0x0000720a8b286000)
        libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x0000720a8b279000)
$ readelf -d /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so

Dynamic section at offset 0x4d60 contains 28 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libopen-pal.so.80]
 0x0000000000000001 (NEEDED)             Shared library: [libcuda.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000e (SONAME)             Library soname: [mca_accelerator_cuda.so]
 0x000000000000001d (RUNPATH)            Library runpath: [/opt/openmpi/lib:/usr/lib/x86_64-linux-gnu/pmix2/lib]
 0x000000000000000c (INIT)               0x2000
 0x000000000000000d (FINI)               0x3f7c
 0x0000000000000019 (INIT_ARRAY)         0x5d50
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x5d58
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x2f0
 0x0000000000000005 (STRTAB)             0x980
 0x0000000000000006 (SYMTAB)             0x338
 0x000000000000000a (STRSZ)              1570 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000003 (PLTGOT)             0x5fe8
 0x0000000000000002 (PLTRELSZ)           1080 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x1500
 0x0000000000000007 (RELA)               0x1068
 0x0000000000000008 (RELASZ)             1176 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x1028
 0x000000006fffffff (VERNEEDNUM)         1
 0x000000006ffffff0 (VERSYM)             0xfa2
 0x000000006ffffff9 (RELACOUNT)          30
 0x0000000000000000 (NULL)               0x0

@bosilca
Member

bosilca commented Mar 7, 2025

Everything looks normal. Let's make sure launching an app does not screw up the environment: mpirun -np 1 ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so

@niklebedenko
Author

niklebedenko commented Mar 7, 2025

$ /opt/openmpi/bin/mpirun -np 1 ldd /opt/openmpi/lib/openmpi/mca_accelerator_cuda.so
        linux-vdso.so.1 (0x0000711305890000)
        libopen-pal.so.80 => /opt/openmpi/lib/libopen-pal.so.80 (0x000071130576d000)
        libcuda.so.1 => /lib/x86_64-linux-gnu/libcuda.so.1 (0x0000711303a00000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000711303600000)
        libevent_core-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_core-2.1.so.7 (0x000071130571f000)
        libevent_pthreads-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x000071130571a000)
        libhwloc.so.15 => /lib/x86_64-linux-gnu/libhwloc.so.15 (0x00007113056b7000)
        libpmix.so.2 => /usr/lib/x86_64-linux-gnu/pmix2/lib/libpmix.so.2 (0x0000711303200000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000711303917000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007113056b2000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007113056ad000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007113056a6000)
        /lib64/ld-linux-x86-64.so.2 (0x0000711305892000)
        libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x0000711305673000)
        libmunge.so.2 => /lib/x86_64-linux-gnu/libmunge.so.2 (0x000071130390f000)
        libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x0000711303902000)

Btw, I've not added /opt/openmpi/bin to PATH, in case that matters.

@bosilca
Member

bosilca commented Mar 7, 2025

That might be a reason, though the culprit would be LD_LIBRARY_PATH rather than PATH. Most of the components are built statically into libmpi.so, with a few exceptions, and the CUDA-based components are among those exceptions. But I'm slightly skeptical, as ompi_info managed to find the CUDA shared library, and all the processes are local, so they should inherit the mpirun environment.

But just in case, you can try:

export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
/opt/openmpi/bin/mpirun -np 1 -x LD_LIBRARY_PATH ./mpi_check

I'm running out of ideas unfortunately.

@niklebedenko
Author

niklebedenko commented Mar 7, 2025

Unfortunately still the same behaviour :(

Thank you so much for taking the time.

You said you were unable to reproduce this error --- could you tell me what setup you used on your end to get a working CUDA-aware Open MPI build on Ubuntu 24.04 LTS? If there's a Docker container with a working installation that I could run my code in, that would work too.

I'm also really puzzled that simply running ./mpi_check (without mpirun) reports CUDA awareness:

$ ./mpi_check 
Authorization required, but no authorization protocol specified

Compile time check:
This MPI library has CUDA-aware support.
Run time check:
This MPI library has CUDA-aware support.
