LAMMPS Seg faulting after installing it from the MACE repo #819

@ShubhangG

Describe the bug
Hello. After following the MACE with LAMMPS installation guide step by step (https://mace-docs.readthedocs.io/en/latest/guide/lammps.html),
I tried running LAMMPS on my current cluster, but it fails with a segmentation fault:

[ccc0420:2262807:0:2262807] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x440000e0)
==== backtrace (tid:2262807) ====
 0  /lib64/libucs.so.0(ucs_handle_error+0x2e4) [0x7f0e9cbc4e44]
 1  /lib64/libucs.so.0(+0x2a4cd) [0x7f0e9cbc64cd]
 2  /lib64/libucs.so.0(+0x2a6aa) [0x7f0e9cbc66aa]
 3  /lib64/libc.so.6(+0x3e6f0) [0x7f0e9ce046f0]
 4  /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40(PMPI_Comm_rank+0x33) [0x7f0eb797efa3]
 5  /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x59a51d]
 6  /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x4953f0]
 7  /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x44312f]
 8  /lib64/libc.so.6(+0x29590) [0x7f0e9cdef590]
 9  /lib64/libc.so.6(__libc_start_main+0x80) [0x7f0e9cdef640]
10  /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x444935]
=================================

The gdb output with backtrace is:

Thread 1 "lmp" received signal SIGSEGV, Segmentation fault.
0x00007ffff7c94fa3 in PMPI_Comm_rank () from /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40
(gdb) backtrace
#0  0x00007ffff7c94fa3 in PMPI_Comm_rank () from /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40
#1  0x00000000005f4301 in LAMMPS_NS::Universe::Universe (this=0x33c6930, lmp=0x346e160, communicator=1140850688)
    at /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/src/universe.cpp:33
#2  0x0000000000436e7d in LAMMPS_NS::LAMMPS::LAMMPS (this=0x346e160, narg=1, arg=0x7fffffffad18, communicator=1140850688)
    at /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/src/lammps.cpp:140
#3  0x0000000000412a16 in main (argc=1, argv=0x7fffffffad18) at /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/src/main.cpp:77
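
A possible reading of the backtrace, offered as a hypothesis and not a confirmed diagnosis: the `communicator` argument is printed as a plain integer, and the faulting address sits a small offset above it. The short sketch below only does the arithmetic on the two numbers taken from the output above. The constant `MPICH_COMM_WORLD` is an assumption based on the fact that MPICH-derived `mpi.h` headers (Intel MPI among them) define `MPI_COMM_WORLD` as the integer handle `0x44000000`, whereas Open MPI represents communicators as pointers.

```python
# Values copied from the gdb backtrace and the signal report above.
communicator = 1140850688    # 'communicator=' argument in frames #1 and #2
fault_address = 0x440000e0   # address reported with signal 11

# Assumption: MPICH-derived headers (Intel MPI included) define
# MPI_COMM_WORLD as the integer handle 0x44000000.
MPICH_COMM_WORLD = 0x44000000

# Show the communicator value in hex so it can be compared by eye.
print(hex(communicator))

# True if the value passed to LAMMPS matches the MPICH-style handle.
print(communicator == MPICH_COMM_WORLD)

# Offset between the faulting address and the communicator value.
print(hex(fault_address - communicator))
```

If the comparison holds, one explanation worth checking is that `lmp` was compiled against one MPI's headers and linked against another MPI's library, so that `PMPI_Comm_rank` in Open MPI dereferences an integer handle as if it were a pointer.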

The valgrind output is:

==1542529== Memcheck, a memory error detector
==1542529== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==1542529== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==1542529== Command: /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp
==1542529== 
==1542529== Warning: set address range perms: large range [0x4dbc000, 0x1f11c000) (defined)
hwloc x86 backend cannot work under Valgrind, disabling.
May be reenabled by dumping CPUIDs with hwloc-gather-cpuid
and reloading them under Valgrind with HWLOC_CPUID_PATH.
hwloc x86 backend cannot work under Valgrind, disabling.
May be reenabled by dumping CPUIDs with hwloc-gather-cpuid
and reloading them under Valgrind with HWLOC_CPUID_PATH.
==1542529== Invalid read of size 1
==1542529==    at 0x4955FA3: PMPI_Comm_rank (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x59A51C: LAMMPS_NS::Universe::Universe(LAMMPS_NS::LAMMPS*, int) (universe.cpp:33)
==1542529==    by 0x4953EF: LAMMPS_NS::LAMMPS::LAMMPS(int, char**, int) (lammps.cpp:140)
==1542529==    by 0x44312E: main (main.cpp:77)
==1542529==  Address 0x440000e0 is not stack'd, malloc'd or (recently) free'd
==1542529== 
[cc-login3:1542529:0:1542529] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x440000e0)
==== backtrace (tid:1542529) ====
 0  /lib64/libucs.so.0(ucs_handle_error+0x2e4) [0x1f936e44]
 1  /lib64/libucs.so.0(+0x2a4cd) [0x1f9384cd]
 2  /lib64/libucs.so.0(+0x2a6aa) [0x1f9386aa]
 3  /lib64/libc.so.6(+0x3e6f0) [0x1f5776f0]
 4  /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40(PMPI_Comm_rank+0x33) [0x4955fa3]
 5  /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x59a51d]
 6  /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x4953f0]
 7  /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x44312f]
 8  /lib64/libc.so.6(+0x29590) [0x1f562590]
 9  /lib64/libc.so.6(__libc_start_main+0x80) [0x1f562640]
10  /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x444935]
=================================
==1542529== 
==1542529== Process terminating with default action of signal 11 (SIGSEGV)
==1542529==    at 0x1F5C494C: __pthread_kill_implementation (in /usr/lib64/libc.so.6)
==1542529==    by 0x1F577645: raise (in /usr/lib64/libc.so.6)
==1542529==    by 0x1F5776EF: ??? (in /usr/lib64/libc.so.6)
==1542529==    by 0x4955FA2: PMPI_Comm_rank (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== 
==1542529== HEAP SUMMARY:
==1542529==     in use at exit: 38,684,005 bytes in 310,595 blocks
==1542529==   total heap usage: 1,067,851 allocs, 757,256 frees, 105,080,647 bytes allocated
==1542529== 
==1542529== 5 bytes in 1 blocks are definitely lost in loss record 1,855 of 226,598
==1542529==    at 0x484480F: malloc (vg_replace_malloc.c:442)
==1542529==    by 0x1F5D512E: strdup (in /usr/lib64/libc.so.6)
==1542529==    by 0x1F7E2C0C: opal_common_ucx_mca_var_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x4AEF231: mca_pml_ucx_component_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x1F77B9A1: mca_base_framework_components_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F77C11B: mca_base_framework_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F77C1CF: mca_base_framework_open (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x4943A02: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x496B74D: PMPI_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x44310C: main (main.cpp:48)
==1542529== 
==1542529== 5 bytes in 1 blocks are definitely lost in loss record 1,856 of 226,598
==1542529==    at 0x484480F: malloc (vg_replace_malloc.c:442)
==1542529==    by 0x1F5D512E: strdup (in /usr/lib64/libc.so.6)
==1542529==    by 0x1F780E38: register_variable (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F78218C: mca_base_var_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F7E2B02: opal_common_ucx_mca_var_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x4AEF231: mca_pml_ucx_component_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x1F77B9A1: mca_base_framework_components_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F77C11B: mca_base_framework_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F77C1CF: mca_base_framework_open (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x4943A02: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== 
==1542529== 10 bytes in 1 blocks are definitely lost in loss record 2,966 of 226,598
==1542529==    at 0x484480F: malloc (vg_replace_malloc.c:442)
==1542529==    by 0x1F5D512E: strdup (in /usr/lib64/libc.so.6)
==1542529==    by 0x1FA78C14: pmix_rte_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libpmix.so.2.9.4)
==1542529==    by 0x1FA20518: PMIx_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libpmix.so.2.9.4)
==1542529==    by 0x493AA23: ompi_rte_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x49439C9: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x496B74D: PMPI_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x44310C: main (main.cpp:48)
==1542529== 
==1542529== 60 bytes in 1 blocks are definitely lost in loss record 114,568 of 226,598
==1542529==    at 0x484480F: malloc (vg_replace_malloc.c:442)
==1542529==    by 0x1F5D512E: strdup (in /usr/lib64/libc.so.6)
==1542529==    by 0x1F7E2BDC: opal_common_ucx_mca_var_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x4AEF231: mca_pml_ucx_component_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x1F77B9A1: mca_base_framework_components_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F77C11B: mca_base_framework_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F77C1CF: mca_base_framework_open (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x4943A02: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x496B74D: PMPI_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x44310C: main (main.cpp:48)
==1542529== 
==1542529== 60 bytes in 1 blocks are definitely lost in loss record 114,569 of 226,598
==1542529==    at 0x484480F: malloc (vg_replace_malloc.c:442)
==1542529==    by 0x1F5D512E: strdup (in /usr/lib64/libc.so.6)
==1542529==    by 0x1F780E38: register_variable (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F78218C: mca_base_var_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F7E2ABC: opal_common_ucx_mca_var_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x4AEF231: mca_pml_ucx_component_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x1F77B9A1: mca_base_framework_components_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F77C11B: mca_base_framework_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F77C1CF: mca_base_framework_open (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x4943A02: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== 
==1542529== 75 bytes in 1 blocks are definitely lost in loss record 151,679 of 226,598
==1542529==    at 0x484C184: realloc (vg_replace_malloc.c:1690)
==1542529==    by 0x1F5B795F: __vasprintf_internal (in /usr/lib64/libc.so.6)
==1542529==    by 0x1F794E08: opal_vasprintf (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F794EA6: opal_asprintf (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x4AB7BDB: component_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x1F77B9A1: mca_base_framework_components_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F77C11B: mca_base_framework_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F77C1CF: mca_base_framework_open (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x4943A02: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x496B74D: PMPI_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== 
==1542529== 156 bytes in 1 blocks are definitely lost in loss record 198,172 of 226,598
==1542529==    at 0x484C184: realloc (vg_replace_malloc.c:1690)
==1542529==    by 0x1F5B795F: __vasprintf_internal (in /usr/lib64/libc.so.6)
==1542529==    by 0x1F794E08: opal_vasprintf (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F794EA6: opal_asprintf (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x4AB7B8B: component_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x1F77B9A1: mca_base_framework_components_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F77C11B: mca_base_framework_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F77C1CF: mca_base_framework_open (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x4943A02: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x496B74D: PMPI_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== 
==1542529== 159 bytes in 1 blocks are definitely lost in loss record 198,200 of 226,598
==1542529==    at 0x484C184: realloc (vg_replace_malloc.c:1690)
==1542529==    by 0x1F5B795F: __vasprintf_internal (in /usr/lib64/libc.so.6)
==1542529==    by 0x1F794E08: opal_vasprintf (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F794EA6: opal_asprintf (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x4AB7C28: component_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x1F77B9A1: mca_base_framework_components_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F77C11B: mca_base_framework_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x1F77C1CF: mca_base_framework_open (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529==    by 0x4943A02: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x496B74D: PMPI_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== 
==1542529== 464 bytes in 1 blocks are possibly lost in loss record 217,763 of 226,598
==1542529==    at 0x484BF70: calloc (vg_replace_malloc.c:1595)
==1542529==    by 0x4011652: UnknownInlinedFun (rtld-malloc.h:44)
==1542529==    by 0x4011652: allocate_dtv (dl-tls.c:401)
==1542529==    by 0x4012111: _dl_allocate_tls (dl-tls.c:679)
==1542529==    by 0x1F5C38C4: pthread_create@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==1542529==    by 0x1F946603: ucs_pthread_create (in /usr/lib64/libucs.so.0.0.0)
==1542529==    by 0x1F92CAF8: ??? (in /usr/lib64/libucs.so.0.0.0)
==1542529==    by 0x1F92CB49: ??? (in /usr/lib64/libucs.so.0.0.0)
==1542529==    by 0x1F92ADF9: ucs_async_set_event_handler (in /usr/lib64/libucs.so.0.0.0)
==1542529==    by 0x1F93D0FE: ??? (in /usr/lib64/libucs.so.0.0.0)
==1542529==    by 0x1F93D287: ucs_rcache_create (in /usr/lib64/libucs.so.0.0.0)
==1542529==    by 0x1F882BB2: ??? (in /usr/lib64/libucp.so.0.0.0)
==1542529==    by 0x1F882C00: ucp_mem_rcache_init (in /usr/lib64/libucp.so.0.0.0)
==1542529== 
==1542529== 464 bytes in 1 blocks are possibly lost in loss record 217,764 of 226,598
==1542529==    at 0x484BF70: calloc (vg_replace_malloc.c:1595)
==1542529==    by 0x4011652: UnknownInlinedFun (rtld-malloc.h:44)
==1542529==    by 0x4011652: allocate_dtv (dl-tls.c:401)
==1542529==    by 0x4012111: _dl_allocate_tls (dl-tls.c:679)
==1542529==    by 0x1F5C38C4: pthread_create@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==1542529==    by 0x1FA0CF38: pmix_thread_start (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libpmix.so.2.9.4)
==1542529==    by 0x1FA79B3F: pmix_progress_thread_start (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libpmix.so.2.9.4)
==1542529==    by 0x1FA78BB6: pmix_rte_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libpmix.so.2.9.4)
==1542529==    by 0x1FA20518: PMIx_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libpmix.so.2.9.4)
==1542529==    by 0x493AA23: ompi_rte_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x49439C9: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==    by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== 
==1542529== LEAK SUMMARY:
==1542529==    definitely lost: 530 bytes in 8 blocks
==1542529==    indirectly lost: 0 bytes in 0 blocks
==1542529==      possibly lost: 928 bytes in 2 blocks
==1542529==    still reachable: 38,682,547 bytes in 310,585 blocks
==1542529==                       of which reachable via heuristic:
==1542529==                         stdstring          : 6,226,941 bytes in 150,239 blocks
==1542529==         suppressed: 0 bytes in 0 blocks
==1542529== Reachable blocks (those to which a pointer was found) are not shown.
==1542529== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1542529== 
==1542529== For lists of detected and suppressed errors, rerun with: -s
==1542529== ERROR SUMMARY: 11 errors from 11 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

I am on the University of Illinois campus cluster. I have the following modules loaded:

Currently Loaded Modules:
  1) lmod                6) intel/compiler-rt/2025.0.4
  2) os_paths            7) intel/mkl/2025.0
  3) StdEnv              8) gcc/13.3.0
  4) intel/mpi/2021.14   9) openmpi/5.0.1-gcc-13.3.0
  5) intel/tbb/2022.0   10) anaconda3/2024.10
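
Since the module list is easy to misread in two-column form, here is a minimal sketch that flattens it and picks out the entries that look like MPI implementations. The helper name `find_mpi_modules` and the prefix list are hypothetical and chosen only for illustration; the module text is copied from the listing above.

```python
import re

# Module listing copied verbatim from the output above.
MODULE_LIST = """\
  1) lmod                6) intel/compiler-rt/2025.0.4
  2) os_paths            7) intel/mkl/2025.0
  3) StdEnv              8) gcc/13.3.0
  4) intel/mpi/2021.14   9) openmpi/5.0.1-gcc-13.3.0
  5) intel/tbb/2022.0   10) anaconda3/2024.10
"""

# Hypothetical list of name prefixes that usually indicate an MPI module.
MPI_PREFIXES = ("intel/mpi", "openmpi", "mpich", "mvapich")


def find_mpi_modules(text):
    """Return the module names in `text` that start with an MPI prefix."""
    # Drop the "N)" index markers, then split the remainder on whitespace.
    names = re.sub(r"\d+\)", " ", text).split()
    return [name for name in names if name.startswith(MPI_PREFIXES)]


mpi_modules = find_mpi_modules(MODULE_LIST)
print(mpi_modules)
if len(mpi_modules) > 1:
    print("More than one MPI module is loaded at the same time.")
```

With two MPI modules loaded at once, it may be worth checking which `mpi.h` the compiler picked up at build time and which `libmpi` the binary resolves at run time.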

The same build ran on another supercomputer we use, Delta, but it has been failing on this campus HPC cluster and I am not sure why. I have also opened a ticket with the campus HPC support, and I am opening this issue in case you have any insight.
