Please submit all the information below so that we can understand the working environment that is the context for your question.
Background information
I am unable to verify that my MPI one-sided applications can use TCP sockets/Ethernet instead of Infiniband on a cluster with both Ethernet and Infiniband adapters/switches. Either I get Infiniband-level performance no matter how I try to disable Infiniband, or I get hangs when I compile Open MPI with Infiniband disabled. These issues do not occur for codes using two-sided MPI.
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
The IB-enabled Open MPI: the pre-packaged openmpi-4.1.7a from the MLNX_OFED package on the NVIDIA website
The IB-disabled Open MPI: openmpi-4.1.4 built from source via ../configure --without-ucx
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
Please describe the system on which you are running
Operating system/version:
Ubuntu 22.04.4 LTS
Computer hardware:
ARM nodes with Ethernet cards and Mellanox Infiniband cards (>4 years old)
Network type:
Both Ethernet and Mellanox Infiniband
Details of the problem
I am unable to run MPI one-sided applications over sockets / the Ethernet card instead of Infiniband.
My attempt to disable Infiniband is as follows: mpirun --mca pml ^ucx --mca btl ^vader ./my-app
The application runs with an actual throughput of 40-50 Gbps, while our network has 25 Gigabit Ethernet and 100 Gb Infiniband. This convinces me that my settings are being ignored and Infiniband is still being used.
In fact, the actual throughput is identical (40-50 Gbps) to that of the following line: mpirun --mca pml ucx ./my-app
However, when I use e.g. NetPIPE, which relies on point-to-point calls, I do get exactly the expected throughput: ~90 Gbps with --mca pml ucx and ~16 Gbps with --mca pml ^ucx, which fits the underlying hardware perfectly.
So it seems that using one-sided communication is simply not compatible with the Ethernet card / sockets.
I tried to get more output via the verbose flags, but I am struggling to find a definitive answer in it.
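For example, my understanding is that something along these lines should force the non-UCX PML and the TCP BTL explicitly, and the verbose flags should show which components actually get selected (eno1 is just a placeholder for our Ethernet interface):
mpirun --mca pml ob1 --mca btl tcp,self,vader --mca btl_tcp_if_include eno1 \
       --mca pml_base_verbose 10 --mca btl_base_verbose 10 --mca osc_base_verbose 10 ./my-app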
On the other hand, I tried compiling Open MPI with UCX disabled. In this case, my application hangs completely, apparently in MPI_Win_flush:
#0 0x000040000832be3c in __GI___poll (fds=0xaaaaf30549f0, nfds=4, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:41
#1 0x0000400008994b44 in ?? () from /lib/aarch64-linux-gnu/libevent_core-2.1.so.7
#2 0x0000400008990140 in event_base_loop () from /lib/aarch64-linux-gnu/libevent_core-2.1.so.7
#3 0x000040000873dcfc in opal_progress_events.isra () from /home/kdichev/openmpi-4.1.4/build-arm/lib/libopen-pal.so.40
#4 0x000040000873de54 in opal_progress () from /home/kdichev/openmpi-4.1.4/build-arm/lib/libopen-pal.so.40
#5 0x000040000a205c2c in ompi_osc_pt2pt_flush_lock () from /home/kdichev/openmpi-4.1.4/build-arm/lib/openmpi/mca_osc_pt2pt.so
#6 0x0000400007f252b0 in PMPI_Win_flush () from /home/kdichev/openmpi-4.1.4/build-arm/lib/libmpi.so.40
I know that asking for Ethernet with MPI one-sided communication on an Infiniband cluster is unusual; I just want some reference numbers. Still, isn't Open MPI supposed to work in this scenario too, converting one-sided calls into some sort of active messages that issue point-to-point calls on the remote side and ultimately perform two-sided communication? In any case, I can't get this to work.
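For reference, the communication pattern my application uses boils down to the following (a simplified, self-contained sketch, not my actual code); the MPI_Win_flush in the passive-target epoch is where the hang appears:

/* Simplified sketch of my one-sided pattern (placeholder code): each rank
 * exposes a buffer via MPI_Win_allocate, then puts into its neighbour's
 * window inside a passive-target epoch and flushes. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const MPI_Aint n = 1 << 20;                 /* elements per window */
    double *base = NULL;
    MPI_Win win;
    MPI_Win_allocate(n * (MPI_Aint)sizeof(double), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &base, &win);

    double *src = malloc(n * sizeof(double));
    for (MPI_Aint i = 0; i < n; i++) src[i] = (double)rank;
    int target = (rank + 1) % size;

    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
    MPI_Put(src, n, MPI_DOUBLE, target, 0, n, MPI_DOUBLE, win);
    MPI_Win_flush(target, win);   /* MPI_Win_flush is where my application hangs without UCX */
    MPI_Win_unlock(target, win);

    free(src);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}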
Any clarification would be much appreciated!
Thanks for the report. We removed the osc/pt2pt component (which seems to be the one used without UCX in your case) in 5.0 because it was unmaintained. Could you try running the non-UCX version with --mca osc ^pt2pt?
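For example, with your non-UCX build, something along the lines of: mpirun --mca osc ^pt2pt ./my-app (keeping whatever other MCA parameters you already pass).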
Thanks for the quick response. Indeed, using --mca osc ^pt2pt allowed me to progress further in the application. Sadly, very soon afterwards I got an error from the very first MPI_Win_allocate call:
[srv02:1441844] *** An error occurred in MPI_Win_allocate
[srv02:1441844] *** reported by process [1707540481,70368744177664]
[srv02:1441844] *** on communicator MPI_COMM_WORLD
[srv02:1441844] *** MPI_ERR_WIN: invalid window
[srv02:1441844] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[srv02:1441844] *** and potentially your MPI job)
[srv02:1441834] 1 more process has sent help message help-mpi-btl-openib.txt / ib port not selected
[srv02:1441834] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[srv02:1441834] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init
[srv02:1441834] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
I reviewed the code for quite some time and, after finding no errors, I ran the following public example using MPI_Win_allocate as a standalone test, again with mpirun --mca osc ^pt2pt.
The exact same error was thrown! So the issue most likely lies in UCX-disabled one-sided communication in general. I am still unable to run any code successfully in this way.
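For completeness, what fails boils down to nothing more than this (a minimal sketch of my own, not the exact public example referenced above):

/* Minimal sketch: a plain MPI_Win_allocate on MPI_COMM_WORLD. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    double *base = NULL;
    MPI_Win win;
    /* This is the call that fails with MPI_ERR_WIN in both my application
     * and the public example, with the UCX-disabled build and --mca osc ^pt2pt. */
    MPI_Win_allocate(1024 * (MPI_Aint)sizeof(double), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &base, &win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}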
(NB: I also noticed that earlier, with the NVIDIA-packaged UCX-enabled version, I was only toggling --mca pml ^ucx or --mca pml ucx, so it makes sense that this only affected the two-sided NetPIPE runs and not any one-sided code. Sorry about that. But let's focus on the issues with the UCX-disabled Open MPI 4.1.4 here.)