Provide helpful error message when double-initializing MPI. #824

Closed
joaander opened this issue Feb 22, 2024 · 2 comments

Comments

@joaander
Member

Feature description

Check whether MPI is already initialized when run is about to fork and launch an MPI process.

Proposed solution

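import hoomd  # hoomd initializes MPI on import

import ctypes
import platform

# Pick the shared library suffix for the current platform.
system = platform.system()
extension = ''
if system == 'Darwin':
    extension = 'dylib'
elif system == 'Linux':
    extension = 'so'
elif system == 'Windows':
    extension = 'dll'

try:
    libmpi = ctypes.CDLL('libmpi.' + extension, ctypes.RTLD_GLOBAL)

    # MPI_Initialized sets flag to a nonzero value when MPI has been initialized.
    flag = ctypes.c_int()
    libmpi.MPI_Initialized(ctypes.byref(flag))

    if flag.value:
        print('MPI is initialized')  # Replace with an exception and a helpful message
except (OSError, AttributeError):
    # No MPI library could be loaded, or it lacks MPI_Initialized: nothing to check.
    pass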

Additional context

By using ctypes to call MPI_Initialized, we add no new dependencies.
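
As a rough sketch of how the print statement could become an exception, the check might be wrapped in two small functions. The names `mpi_is_initialized` and `check_mpi_before_fork`, the library name table, and the wording of the message are all hypothetical and not part of signac-flow. The sketch assumes the MPI library is discoverable as `libmpi.<extension>` and treats a missing library as "MPI is not initialized".

```python
import ctypes
import platform

# Hypothetical table of library names; assumes the MPI library is named libmpi.
_LIBMPI_NAMES = {
    'Darwin': 'libmpi.dylib',
    'Linux': 'libmpi.so',
    'Windows': 'libmpi.dll',
}


def mpi_is_initialized(library_name=None):
    """Return True only if MPI_Initialized reports that MPI is initialized."""
    if library_name is None:
        library_name = _LIBMPI_NAMES.get(platform.system())
    if library_name is None:
        return False  # Unknown platform: skip the check.
    try:
        libmpi = ctypes.CDLL(library_name, ctypes.RTLD_GLOBAL)
        mpi_initialized = libmpi.MPI_Initialized
    except (OSError, AttributeError):
        # The library is not installed or does not export MPI_Initialized.
        return False
    flag = ctypes.c_int(0)
    mpi_initialized(ctypes.byref(flag))
    return flag.value != 0


def check_mpi_before_fork(is_initialized=mpi_is_initialized):
    """Raise a descriptive error if MPI is already initialized in this process."""
    if is_initialized():
        raise RuntimeError(
            'MPI is already initialized in this process. Launching another MPI '
            'process from it is likely to fail. Avoid importing packages that '
            'initialize MPI on import (such as hoomd or mpi4py) at the top level '
            'of the project module.'
        )
```

Passing the checker as an argument keeps the fork-time check easy to exercise on machines without MPI installed.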

Packages like mpi4py and hoomd automatically initialize MPI on import. signac-flow then forks to execute the operation with srun python project.py exec ...., which imports hoomd or mpi4py again. This causes an error similar to:

[gl3081.arc-ts.umich.edu:2369448] OPAL ERROR: Unreachable in file ext3x_client.c at line 111
srun: error: gl3081: task 0: Exited with exit code 1
--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--------------------------------------------------------------------------

A descriptive error message would help users recognize these cases.

There is no reliable way to prevent the double initialization other than asking users not to import these packages at the top level.
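
To illustrate what "not at the top level" means in practice, here is a minimal sketch of deferring an import into the body of an operation. The function `run_operation` is hypothetical, and the standard library module `colorsys` stands in for hoomd or mpi4py so that the example runs without MPI installed.

```python
import importlib
import sys


def run_operation(module_name='hoomd'):
    """Import the MPI-initializing package only when the operation executes."""
    # Deferred import: nothing is loaded (and MPI is not initialized) until
    # this function is called in the process that actually runs the operation.
    module = importlib.import_module(module_name)
    return module.__name__


# Stand-in demonstration: 'colorsys' plays the role of hoomd or mpi4py here.
sys.modules.pop('colorsys', None)
loaded_before = 'colorsys' in sys.modules
name = run_operation('colorsys')
loaded_after = 'colorsys' in sys.modules
```

With this layout, the process that submits or forks never loads the package itself; only the launched operation does.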

@joaander
Member Author

To avoid conflicts, we should implement this in or after #819.

@joaander
Member Author

I have no plans to implement this check.

@joaander closed this as not planned on May 28, 2024