Description
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v5.0.7
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Installed by hand via:
ml clang-gfortran/14
mkdir build-clang-gfortran-14 && cd build-clang-gfortran-14
../configure --disable-wrapper-rpath --disable-wrapper-runpath \
CC=clang CXX=clang++ FC=gfortran-14 \
--with-hwloc=internal --with-libevent=internal --with-pmix=internal \
--prefix=$HOME/installed/Compiler/clang-gfortran-14/openmpi/5.0.7 |& tee configure.clang-gfortran-14.log
mv config.log config.clang-gfortran-14.log
make -j6 |& tee make.clang-gfortran-14.log
make install |& tee makeinstall.clang-gfortran-14.log
Please describe the system on which you are running
- Operating system/version: macOS 15.3.1
- Computer hardware: Mac Studio M1 Max
- Network type: Ethernet (though only running locally)
Details of the problem
Recently, I updated my Mac Studio from macOS 14 to macOS 15.3.1. The upgrade seemed fine until I tried to run some MPI code and found it failing. I then tried a hello world program, and that fails too. But that was with an Open MPI 5.0.5 build from back in the Sonoma days, so I grabbed the 5.0.7 tarball and built it as shown above (exactly how I built 5.0.5), but no joy. When I try to run the hello world program (sketched at the end of this issue) I get this:
❯ mpirun --version
mpirun (Open MPI) 5.0.7
Report bugs to https://www.open-mpi.org/community/help/
❯ mpifort -o helloWorld.mpi2.exe helloWorld.mpi2.F90
❯ mpirun -np 2 ./helloWorld.mpi2.exe
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
PML add procs failed
--> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_mpi_instance_init failed
--> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[gs6101-alderaan-198120226026:00000] *** An error occurred in MPI_Init
[gs6101-alderaan-198120226026:00000] *** reported by process [3839623169,1]
[gs6101-alderaan-198120226026:00000] *** on a NULL communicator
[gs6101-alderaan-198120226026:00000] *** Unknown error
[gs6101-alderaan-198120226026:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[gs6101-alderaan-198120226026:00000] *** and MPI will try to terminate your MPI job as well)
--------------------------------------------------------------------------
prterun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:
Process name: [prterun-gs6101-alderaan-198120226026-45102@1,1]
Exit code: 14
--------------------------------------------------------------------------
Now, I looked around the internet (and these issues) and found things like #12273, which suggested --pmixmca ptl_tcp_if_include lo0, but:
❯ mpirun --pmixmca ptl_tcp_if_include lo0 -np 2 ./helloWorld.mpi2.exe
--------------------------------------------------------------------------
The PMIx server's listener thread failed to start. We cannot
continue.
--------------------------------------------------------------------------
So not that. But threads also said to try --mca btl_tcp_if_include lo0, so:
❯ mpirun --mca btl_tcp_if_include lo0 -np 2 ./helloWorld.mpi2.exe
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
PML add procs failed
--> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_mpi_instance_init failed
--> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[gs6101-alderaan-198120226026:00000] *** An error occurred in MPI_Init
[gs6101-alderaan-198120226026:00000] *** reported by process [4258004993,1]
[gs6101-alderaan-198120226026:00000] *** on a NULL communicator
[gs6101-alderaan-198120226026:00000] *** Unknown error
[gs6101-alderaan-198120226026:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[gs6101-alderaan-198120226026:00000] *** and MPI will try to terminate your MPI job as well)
--------------------------------------------------------------------------
prterun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:
Process name: [prterun-gs6101-alderaan-198120226026-45932@1,1]
Exit code: 14
--------------------------------------------------------------------------
I also tried, in various combinations, a few other settings that I had commented out in a modulefile:
-- setenv("OMPI_MCA_btl_tcp_if_include","lo0")
-- setenv("OMPI_MCA_io","ompio")
-- setenv("OMPI_MCA_btl","^tcp")
but nothing helped.
But on a supercomputer cluster I work on, Open MPI 5 has just never worked for us, so I thought, "Let me try Open MPI 4.1." I grabbed 4.1.8 and built it as:
../configure --disable-wrapper-rpath --disable-wrapper-runpath \
CC=clang CXX=clang++ FC=gfortran-14 \
--with-hwloc=internal --with-libevent=internal --with-pmix=internal \
--prefix=$HOME/installed/Compiler/clang-gfortran-14/openmpi/4.1.8 |& tee configure.clang-gfortran-14.log
(exactly the same as for 5.0.7, save for the install prefix), and then:
❯ mpirun --version
mpirun (Open MPI) 4.1.8
Report bugs to http://www.open-mpi.org/community/help/
❯ mpifort -o helloWorld.mpi2.exe helloWorld.mpi2.F90
❯ mpirun -np 2 ./helloWorld.mpi2.exe
Compiler Version: GCC version 14.2.0
MPI Version: 3.1
MPI Library Version: Open MPI v4.1.8, package: Open MPI [email protected] Distribution, ident: 4.1.8, repo rev: v4.1.8, Feb 04, 2025
Process 0 of 2 is on gs6101-alderaan-198120226026.ndc.nasa.gov
Process 1 of 2 is on gs6101-alderaan-198120226026.ndc.nasa.gov
So, good news: I can keep working with Open MPI 4.1. Bad news: I'm unsure why Open MPI 5 stopped working all of a sudden. It did work before the OS update.
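For reference, helloWorld.mpi2.F90 is nothing exotic; it is essentially the standard MPI hello world. A minimal sketch along these lines (not necessarily the exact source, but it prints the same information as the 4.1.8 output above):

program helloWorld
   use mpi
   use, intrinsic :: iso_fortran_env, only: compiler_version
   implicit none

   ! Sketch of the test program (reconstructed, not the exact source)
   integer :: ierr, rank, nprocs, namelen, verlen, major, minor
   character(len=MPI_MAX_PROCESSOR_NAME) :: procname
   character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: libver

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
   call MPI_Get_processor_name(procname, namelen, ierr)

   if (rank == 0) then
      call MPI_Get_version(major, minor, ierr)
      call MPI_Get_library_version(libver, verlen, ierr)
      write(*,'(2a)')        'Compiler Version: ', compiler_version()
      write(*,'(a,i0,a,i0)') 'MPI Version: ', major, '.', minor
      write(*,'(2a)')        'MPI Library Version: ', trim(libver)
   end if

   write(*,'(a,i0,a,i0,2a)') 'Process ', rank, ' of ', nprocs, ' is on ', trim(procname)

   call MPI_Finalize(ierr)
end program helloWorld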