Description
Thank you for taking the time to submit an issue!
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
from ompi_info:
Open MPI: 4.1.9a1
Open MPI repo revision: v4.1.5-232-gad48c462ff
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Open MPI was obtained from Nvidia's HPC-X download site
https://content.mellanox.com/hpc/hpc-x/v2.25.1_cuda12/hpcx-v2.25.1-gcc-inbox-redhat9-cuda12-x86_64.tbz
I also pulled these two kits to isolate which shared object introduced the problem:
https://content.mellanox.com/hpc/hpc-x/v2.23/hpcx-v2.23-gcc-inbox-redhat9-cuda12-x86_64.tbz
https://content.mellanox.com/hpc/hpc-x/v2.24_cuda12/hpcx-v2.24-gcc-inbox-redhat9-cuda12-x86_64.tbz
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
Please describe the system on which you are running
- Operating system/version: Rocky Linux 9.6
- Computer hardware: AMD EPYC 9xxx processors
- Network type: Nvidia Mellanox ConnectX-7
Details of the problem
Please describe, in detail, the problem that you are having, including the behavior you expect to see, the actual behavior that you are seeing, steps to reproduce the problem, etc. It is most helpful if you can attach a small program that a developer can use to reproduce your problem.
Note: If you include verbatim output (or a code block), please use a GitHub Markdown code block
We see occasional failures with SHMEM code with this error (minor nit -- note the spelling error of "addres"):
Error base/memheap_base_select.c:211 - memheap_base_segment_setup() Failed to setup base segment address (error -1)
Error base/memheap_base_select.c:262 - _memheap_create() Failed to negotiate base segment addres
--------------------------------------------------------------------------
It looks like SHMEM_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during SHMEM_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open SHMEM
developer):
mca_memheap_base_select() failed
--> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to initialize - aborting
I am able to reproduce it on a single server with 3 ranks, although that may require 100 or more attempts.
Running at higher scale (more servers and higher PPN) reduces the number of attempts to reproduce.
The reproducer binary links against these libraries from hpcx, which I obtain from Nvidia's site:
• liboshmem.so.40
• libmpi.so.40
• libopen-rte.so.40
• libopen-pal.so.40
I have isolated the problem to hpcx's ompi/lib/liboshmem.so.40 shared object, starting with hpcx version 2.24.
When using liboshmem.so.40 from hpcx version 2.23 I have not experienced the problem.
I launch with this SLURM command:
salloc -N1 --tasks-per-node=3 /path/to/script
The script preloads the four libraries of interest. One of them is liboshmem; if I select it from hpcx 2.23 then I do not see a failure (after 2,000 attempts), and if I select it from hpcx 2.24 or later then I get the aforementioned memheap error. It does not seem to matter which package I get the other three shared objects from.
Here is the source to my reproducer:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <shmem.h>
#include <mpp/shmem.h>

extern char oshmem_version_string[];

int main(){
    int my_pe, num_pe; // declare variables for both PE id of processor and the number of PEs
    shmem_init();
    num_pe = shmem_n_pes(); // obtain the number of PEs that can be used
    my_pe = shmem_my_pe(); // obtain the PE id number
    if ( *(volatile int *)(&my_pe) == 0 ) { // attempt to preclude having the compiler know which PE this is
        long sleepTime;
        char *sleepTimeE = getenv("sleepTime");
        int major, minor;
        printf("Hello from %d of %d\n", my_pe, num_pe);
        shmem_info_get_version( &major, &minor );
        printf( "shmem_info_get_version reports this is version %d.%d\n", major, minor );
        printf( "The string oshmem_version_string is %s\n", oshmem_version_string);
        if ( sleepTimeE != NULL ) {
            sleepTime = strtol( sleepTimeE, NULL, 10 );
            sleep(sleepTime);
        }
    }
    shmem_finalize();
    return 0;
}
I build it with this command:
cc -loshmem -o helloWorldShmem helloWorldShmem.c