
Open MPI 5 doesn't work after update to macOS 15.3.1, 4.1 does #13129

Closed

@mathomp4

Description

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.7

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Installed by hand via:

ml clang-gfortran/14

mkdir build-clang-gfortran-14 && cd build-clang-gfortran-14

../configure --disable-wrapper-rpath --disable-wrapper-runpath \
  CC=clang CXX=clang++ FC=gfortran-14 \
  --with-hwloc=internal --with-libevent=internal --with-pmix=internal \
  --prefix=$HOME/installed/Compiler/clang-gfortran-14/openmpi/5.0.7 |& tee configure.clang-gfortran-14.log

mv config.log config.clang-gfortran-14.log
make -j6 |& tee make.clang-gfortran-14.log
make install |& tee makeinstall.clang-gfortran-14.log
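
A quick check along these lines (not something I captured above; the ompi_info path just comes from the --prefix) would show which PML/BTL components the build actually produced:

# Hypothetical sanity check: list the PML and BTL components this install built.
$HOME/installed/Compiler/clang-gfortran-14/openmpi/5.0.7/bin/ompi_info | grep -E "MCA (pml|btl):"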

Please describe the system on which you are running

  • Operating system/version: macOS 15.3.1
  • Computer hardware: Mac Studio M1 Max
  • Network type: Ethernet (though only running locally)

Details of the problem

Recently, I updated my Mac Studio from macOS 14 to macOS 15.3.1. Everything seemed fine after the upgrade until I tried to run some MPI code and found it failing. I then tried HelloWorld, and that fails too. But that was with an Open MPI 5.0.5 built back in the Sonoma days, so I grabbed the 5.0.7 tarball and built it as shown above (exactly how I built 5.0.5), but no joy. When I try to run HelloWorld I get this:

❯ mpirun --version
mpirun (Open MPI) 5.0.7

Report bugs to https://www.open-mpi.org/community/help/
❯ mpifort -o helloWorld.mpi2.exe helloWorld.mpi2.F90
❯ mpirun -np 2 ./helloWorld.mpi2.exe
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_mpi_instance_init failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[gs6101-alderaan-198120226026:00000] *** An error occurred in MPI_Init
[gs6101-alderaan-198120226026:00000] *** reported by process [3839623169,1]
[gs6101-alderaan-198120226026:00000] *** on a NULL communicator
[gs6101-alderaan-198120226026:00000] *** Unknown error
[gs6101-alderaan-198120226026:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[gs6101-alderaan-198120226026:00000] ***    and MPI will try to terminate your MPI job as well)
--------------------------------------------------------------------------
prterun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

   Process name: [prterun-gs6101-alderaan-198120226026-45102@1,1]
   Exit code:    14
--------------------------------------------------------------------------
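
For reference, a re-run with PML selection verbosity turned up might show which components are being considered and rejected; a sketch, not something from the run above, assuming pml_base_verbose is still the right knob in 5.0.x:

# Hypothetical debug invocation: log the PML framework's component selection.
mpirun --mca pml_base_verbose 100 -np 2 ./helloWorld.mpi2.exe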

Now, I looked around the internet (and these issues) and found things like #12273, which suggested --pmixmca ptl_tcp_if_include lo0, but:

❯ mpirun --pmixmca ptl_tcp_if_include lo0 -np 2 ./helloWorld.mpi2.exe
--------------------------------------------------------------------------
The PMIx server's listener thread failed to start. We cannot
continue.
--------------------------------------------------------------------------

So not that. Other threads said to try --mca btl_tcp_if_include lo0, so:

❯ mpirun --mca btl_tcp_if_include lo0 -np 2 ./helloWorld.mpi2.exe
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_mpi_instance_init failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[gs6101-alderaan-198120226026:00000] *** An error occurred in MPI_Init
[gs6101-alderaan-198120226026:00000] *** reported by process [4258004993,1]
[gs6101-alderaan-198120226026:00000] *** on a NULL communicator
[gs6101-alderaan-198120226026:00000] *** Unknown error
[gs6101-alderaan-198120226026:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[gs6101-alderaan-198120226026:00000] ***    and MPI will try to terminate your MPI job as well)
--------------------------------------------------------------------------
prterun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

   Process name: [prterun-gs6101-alderaan-198120226026-45932@1,1]
   Exit code:    14
--------------------------------------------------------------------------

I also tried, in various combinations, a few other settings I had commented out in a modulefile:

-- setenv("OMPI_MCA_btl_tcp_if_include","lo0")
-- setenv("OMPI_MCA_io","ompio")
-- setenv("OMPI_MCA_btl","^tcp")

but nothing helped.
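
(In shell terms, those Lua setenv lines amount to exports roughly like the following; the quoting around ^tcp is just defensive.)

# Rough shell equivalents of the (commented-out) modulefile lines above.
export OMPI_MCA_btl_tcp_if_include=lo0
export OMPI_MCA_io=ompio
export OMPI_MCA_btl="^tcp"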

On a supercomputer cluster I work on, Open MPI 5 has just never worked for us, so I thought, "Let me try Open MPI 4.1." I grabbed 4.1.8 and built it with:

../configure --disable-wrapper-rpath --disable-wrapper-runpath \
  CC=clang CXX=clang++ FC=gfortran-14 \
  --with-hwloc=internal --with-libevent=internal --with-pmix=internal \
  --prefix=$HOME/installed/Compiler/clang-gfortran-14/openmpi/4.1.8 |& tee configure.clang-gfortran-14.log

(exactly the same as for 5.0.7, apart from the install prefix) and then:

❯ mpirun --version
mpirun (Open MPI) 4.1.8

Report bugs to http://www.open-mpi.org/community/help/
❯ mpifort -o helloWorld.mpi2.exe helloWorld.mpi2.F90
❯ mpirun -np 2 ./helloWorld.mpi2.exe
Compiler Version: GCC version 14.2.0
MPI Version: 3.1
MPI Library Version: Open MPI v4.1.8, package: Open MPI [email protected] Distribution, ident: 4.1.8, repo rev: v4.1.8, Feb 04, 2025
Process    0 of    2 is on gs6101-alderaan-198120226026.ndc.nasa.gov
Process    1 of    2 is on gs6101-alderaan-198120226026.ndc.nasa.gov

So, the good news is I can keep working with Open MPI 4.1. The bad news is I have no idea why Open MPI 5 stopped working all of a sudden; it did work before the OS update.
