Describe the bug
Hello, after following the MACE with LAMMPS installation guide step by step (https://mace-docs.readthedocs.io/en/latest/guide/lammps.html),
I tried running LAMMPS on my current cluster, but it fails with a segmentation fault:
[ccc0420:2262807:0:2262807] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x440000e0)
==== backtrace (tid:2262807) ====
0 /lib64/libucs.so.0(ucs_handle_error+0x2e4) [0x7f0e9cbc4e44]
1 /lib64/libucs.so.0(+0x2a4cd) [0x7f0e9cbc64cd]
2 /lib64/libucs.so.0(+0x2a6aa) [0x7f0e9cbc66aa]
3 /lib64/libc.so.6(+0x3e6f0) [0x7f0e9ce046f0]
4 /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40(PMPI_Comm_rank+0x33) [0x7f0eb797efa3]
5 /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x59a51d]
6 /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x4953f0]
7 /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x44312f]
8 /lib64/libc.so.6(+0x29590) [0x7f0e9cdef590]
9 /lib64/libc.so.6(__libc_start_main+0x80) [0x7f0e9cdef640]
10 /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x444935]
=================================
The gdb output with backtrace is:
Thread 1 "lmp" received signal SIGSEGV, Segmentation fault.
0x00007ffff7c94fa3 in PMPI_Comm_rank () from /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40
(gdb) backtrace
#0 0x00007ffff7c94fa3 in PMPI_Comm_rank () from /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40
#1 0x00000000005f4301 in LAMMPS_NS::Universe::Universe (this=0x33c6930, lmp=0x346e160, communicator=1140850688)
at /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/src/universe.cpp:33
#2 0x0000000000436e7d in LAMMPS_NS::LAMMPS::LAMMPS (this=0x346e160, narg=1, arg=0x7fffffffad18, communicator=1140850688)
at /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/src/lammps.cpp:140
#3 0x0000000000412a16 in main (argc=1, argv=0x7fffffffad18) at /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/src/main.cpp:77
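One detail in this backtrace may be worth a closer look: the `communicator` argument and the faulting address are numerically very close. The sketch below only does the arithmetic on values copied from the output above. The interpretation is an assumption, not something the logs prove: MPICH-derived libraries (Intel MPI among them) define `MPI_COMM_WORLD` as the integer handle `0x44000000`, whereas Open MPI handles are pointers, so a small offset from that value would be consistent with a binary compiled against one implementation's `mpi.h` and run against the other's `libmpi`.

```shell
# Values copied from the gdb and UCX/valgrind output above.
communicator=1140850688        # 'communicator=' argument in gdb frames #1 and #2
fault_addr=$((0x440000e0))     # "address not mapped to object at address ..."

# Print the communicator handle in hex and its distance from the faulting address.
comm_hex=$(printf '0x%x' "$communicator")
offset_hex=$(printf '0x%x' $((fault_addr - communicator)))

echo "communicator = $comm_hex"
echo "fault offset = $offset_hex"
```

If the handle prints as `0x44000000`, that points toward checking which `mpi.h` the LAMMPS build actually used.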
The valgrind output is:
==1542529== Memcheck, a memory error detector
==1542529== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==1542529== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==1542529== Command: /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp
==1542529==
==1542529== Warning: set address range perms: large range [0x4dbc000, 0x1f11c000) (defined)
hwloc x86 backend cannot work under Valgrind, disabling.
May be reenabled by dumping CPUIDs with hwloc-gather-cpuid
and reloading them under Valgrind with HWLOC_CPUID_PATH.
hwloc x86 backend cannot work under Valgrind, disabling.
May be reenabled by dumping CPUIDs with hwloc-gather-cpuid
and reloading them under Valgrind with HWLOC_CPUID_PATH.
==1542529== Invalid read of size 1
==1542529== at 0x4955FA3: PMPI_Comm_rank (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x59A51C: LAMMPS_NS::Universe::Universe(LAMMPS_NS::LAMMPS*, int) (universe.cpp:33)
==1542529== by 0x4953EF: LAMMPS_NS::LAMMPS::LAMMPS(int, char**, int) (lammps.cpp:140)
==1542529== by 0x44312E: main (main.cpp:77)
==1542529== Address 0x440000e0 is not stack'd, malloc'd or (recently) free'd
==1542529==
[cc-login3:1542529:0:1542529] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x440000e0)
==== backtrace (tid:1542529) ====
0 /lib64/libucs.so.0(ucs_handle_error+0x2e4) [0x1f936e44]
1 /lib64/libucs.so.0(+0x2a4cd) [0x1f9384cd]
2 /lib64/libucs.so.0(+0x2a6aa) [0x1f9386aa]
3 /lib64/libc.so.6(+0x3e6f0) [0x1f5776f0]
4 /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40(PMPI_Comm_rank+0x33) [0x4955fa3]
5 /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x59a51d]
6 /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x4953f0]
7 /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x44312f]
8 /lib64/libc.so.6(+0x29590) [0x1f562590]
9 /lib64/libc.so.6(__libc_start_main+0x80) [0x1f562640]
10 /projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp() [0x444935]
=================================
==1542529==
==1542529== Process terminating with default action of signal 11 (SIGSEGV)
==1542529== at 0x1F5C494C: __pthread_kill_implementation (in /usr/lib64/libc.so.6)
==1542529== by 0x1F577645: raise (in /usr/lib64/libc.so.6)
==1542529== by 0x1F5776EF: ??? (in /usr/lib64/libc.so.6)
==1542529== by 0x4955FA2: PMPI_Comm_rank (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==
==1542529== HEAP SUMMARY:
==1542529== in use at exit: 38,684,005 bytes in 310,595 blocks
==1542529== total heap usage: 1,067,851 allocs, 757,256 frees, 105,080,647 bytes allocated
==1542529==
==1542529== 5 bytes in 1 blocks are definitely lost in loss record 1,855 of 226,598
==1542529== at 0x484480F: malloc (vg_replace_malloc.c:442)
==1542529== by 0x1F5D512E: strdup (in /usr/lib64/libc.so.6)
==1542529== by 0x1F7E2C0C: opal_common_ucx_mca_var_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x4AEF231: mca_pml_ucx_component_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x1F77B9A1: mca_base_framework_components_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F77C11B: mca_base_framework_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F77C1CF: mca_base_framework_open (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x4943A02: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x496B74D: PMPI_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x44310C: main (main.cpp:48)
==1542529==
==1542529== 5 bytes in 1 blocks are definitely lost in loss record 1,856 of 226,598
==1542529== at 0x484480F: malloc (vg_replace_malloc.c:442)
==1542529== by 0x1F5D512E: strdup (in /usr/lib64/libc.so.6)
==1542529== by 0x1F780E38: register_variable (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F78218C: mca_base_var_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F7E2B02: opal_common_ucx_mca_var_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x4AEF231: mca_pml_ucx_component_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x1F77B9A1: mca_base_framework_components_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F77C11B: mca_base_framework_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F77C1CF: mca_base_framework_open (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x4943A02: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==
==1542529== 10 bytes in 1 blocks are definitely lost in loss record 2,966 of 226,598
==1542529== at 0x484480F: malloc (vg_replace_malloc.c:442)
==1542529== by 0x1F5D512E: strdup (in /usr/lib64/libc.so.6)
==1542529== by 0x1FA78C14: pmix_rte_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libpmix.so.2.9.4)
==1542529== by 0x1FA20518: PMIx_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libpmix.so.2.9.4)
==1542529== by 0x493AA23: ompi_rte_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x49439C9: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x496B74D: PMPI_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x44310C: main (main.cpp:48)
==1542529==
==1542529== 60 bytes in 1 blocks are definitely lost in loss record 114,568 of 226,598
==1542529== at 0x484480F: malloc (vg_replace_malloc.c:442)
==1542529== by 0x1F5D512E: strdup (in /usr/lib64/libc.so.6)
==1542529== by 0x1F7E2BDC: opal_common_ucx_mca_var_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x4AEF231: mca_pml_ucx_component_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x1F77B9A1: mca_base_framework_components_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F77C11B: mca_base_framework_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F77C1CF: mca_base_framework_open (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x4943A02: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x496B74D: PMPI_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x44310C: main (main.cpp:48)
==1542529==
==1542529== 60 bytes in 1 blocks are definitely lost in loss record 114,569 of 226,598
==1542529== at 0x484480F: malloc (vg_replace_malloc.c:442)
==1542529== by 0x1F5D512E: strdup (in /usr/lib64/libc.so.6)
==1542529== by 0x1F780E38: register_variable (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F78218C: mca_base_var_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F7E2ABC: opal_common_ucx_mca_var_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x4AEF231: mca_pml_ucx_component_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x1F77B9A1: mca_base_framework_components_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F77C11B: mca_base_framework_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F77C1CF: mca_base_framework_open (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x4943A02: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==
==1542529== 75 bytes in 1 blocks are definitely lost in loss record 151,679 of 226,598
==1542529== at 0x484C184: realloc (vg_replace_malloc.c:1690)
==1542529== by 0x1F5B795F: __vasprintf_internal (in /usr/lib64/libc.so.6)
==1542529== by 0x1F794E08: opal_vasprintf (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F794EA6: opal_asprintf (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x4AB7BDB: component_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x1F77B9A1: mca_base_framework_components_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F77C11B: mca_base_framework_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F77C1CF: mca_base_framework_open (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x4943A02: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x496B74D: PMPI_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==
==1542529== 156 bytes in 1 blocks are definitely lost in loss record 198,172 of 226,598
==1542529== at 0x484C184: realloc (vg_replace_malloc.c:1690)
==1542529== by 0x1F5B795F: __vasprintf_internal (in /usr/lib64/libc.so.6)
==1542529== by 0x1F794E08: opal_vasprintf (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F794EA6: opal_asprintf (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x4AB7B8B: component_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x1F77B9A1: mca_base_framework_components_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F77C11B: mca_base_framework_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F77C1CF: mca_base_framework_open (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x4943A02: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x496B74D: PMPI_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==
==1542529== 159 bytes in 1 blocks are definitely lost in loss record 198,200 of 226,598
==1542529== at 0x484C184: realloc (vg_replace_malloc.c:1690)
==1542529== by 0x1F5B795F: __vasprintf_internal (in /usr/lib64/libc.so.6)
==1542529== by 0x1F794E08: opal_vasprintf (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F794EA6: opal_asprintf (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x4AB7C28: component_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x1F77B9A1: mca_base_framework_components_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F77C11B: mca_base_framework_register (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x1F77C1CF: mca_base_framework_open (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libopen-pal.so.80.0.1)
==1542529== by 0x4943A02: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x496B74D: PMPI_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==
==1542529== 464 bytes in 1 blocks are possibly lost in loss record 217,763 of 226,598
==1542529== at 0x484BF70: calloc (vg_replace_malloc.c:1595)
==1542529== by 0x4011652: UnknownInlinedFun (rtld-malloc.h:44)
==1542529== by 0x4011652: allocate_dtv (dl-tls.c:401)
==1542529== by 0x4012111: _dl_allocate_tls (dl-tls.c:679)
==1542529== by 0x1F5C38C4: pthread_create@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==1542529== by 0x1F946603: ucs_pthread_create (in /usr/lib64/libucs.so.0.0.0)
==1542529== by 0x1F92CAF8: ??? (in /usr/lib64/libucs.so.0.0.0)
==1542529== by 0x1F92CB49: ??? (in /usr/lib64/libucs.so.0.0.0)
==1542529== by 0x1F92ADF9: ucs_async_set_event_handler (in /usr/lib64/libucs.so.0.0.0)
==1542529== by 0x1F93D0FE: ??? (in /usr/lib64/libucs.so.0.0.0)
==1542529== by 0x1F93D287: ucs_rcache_create (in /usr/lib64/libucs.so.0.0.0)
==1542529== by 0x1F882BB2: ??? (in /usr/lib64/libucp.so.0.0.0)
==1542529== by 0x1F882C00: ucp_mem_rcache_init (in /usr/lib64/libucp.so.0.0.0)
==1542529==
==1542529== 464 bytes in 1 blocks are possibly lost in loss record 217,764 of 226,598
==1542529== at 0x484BF70: calloc (vg_replace_malloc.c:1595)
==1542529== by 0x4011652: UnknownInlinedFun (rtld-malloc.h:44)
==1542529== by 0x4011652: allocate_dtv (dl-tls.c:401)
==1542529== by 0x4012111: _dl_allocate_tls (dl-tls.c:679)
==1542529== by 0x1F5C38C4: pthread_create@@GLIBC_2.34 (in /usr/lib64/libc.so.6)
==1542529== by 0x1FA0CF38: pmix_thread_start (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libpmix.so.2.9.4)
==1542529== by 0x1FA79B3F: pmix_progress_thread_start (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libpmix.so.2.9.4)
==1542529== by 0x1FA78BB6: pmix_rte_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libpmix.so.2.9.4)
==1542529== by 0x1FA20518: PMIx_Init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libpmix.so.2.9.4)
==1542529== by 0x493AA23: ompi_rte_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x49439C9: ompi_mpi_instance_init_common (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4944733: ompi_mpi_instance_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529== by 0x4937017: ompi_mpi_init (in /sw/apps/mpi/openmpi/5.0.1/gcc/13.3.0/lib/libmpi.so.40.40.1)
==1542529==
==1542529== LEAK SUMMARY:
==1542529== definitely lost: 530 bytes in 8 blocks
==1542529== indirectly lost: 0 bytes in 0 blocks
==1542529== possibly lost: 928 bytes in 2 blocks
==1542529== still reachable: 38,682,547 bytes in 310,585 blocks
==1542529== of which reachable via heuristic:
==1542529== stdstring : 6,226,941 bytes in 150,239 blocks
==1542529== suppressed: 0 bytes in 0 blocks
==1542529== Reachable blocks (those to which a pointer was found) are not shown.
==1542529== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1542529==
==1542529== For lists of detected and suppressed errors, rerun with: -s
==1542529== ERROR SUMMARY: 11 errors from 11 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
I am on the University of Illinois campus cluster. I have the following modules loaded:
Currently Loaded Modules:
1) lmod 6) intel/compiler-rt/2025.0.4
2) os_paths 7) intel/mkl/2025.0
3) StdEnv 8) gcc/13.3.0
4) intel/mpi/2021.14 9) openmpi/5.0.1-gcc-13.3.0
5) intel/tbb/2022.0 10) anaconda3/2024.10
The same installation procedure worked on Delta, another supercomputer we use, but it keeps failing on this campus HPC cluster and I am not sure why. I have also opened a ticket with the campus HPC support, and I am opening this issue in case you have any insight.
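Since both `intel/mpi/2021.14` and `openmpi/5.0.1-gcc-13.3.0` appear in the module list, it may help to confirm which MPI the `lmp` binary and the compiler wrapper actually resolve to. The following is a minimal sketch rather than a definitive procedure. The binary path is taken from the backtrace, and the `LMP` variable is a name introduced here for illustration only.

```shell
# Path to the LAMMPS binary, taken from the backtrace above; override with LMP=...
LMP="${LMP:-/projects/illinois/grants/qmchamm/shared/shubhang/shubhang_builds/lammps/build/bin/lmp}"

# 1) Which MPI shared libraries does the binary resolve to at run time?
if [ -x "$LMP" ]; then
    ldd "$LMP" | grep -i mpi || echo "no MPI library in ldd output"
else
    echo "lmp not found at $LMP"
fi

# 2) Which compiler wrapper and launcher come first on PATH?
for tool in mpicxx mpirun; do
    command -v "$tool" || echo "$tool not on PATH"
done

# 3) What does the wrapper expand to? MPICH-style wrappers (including Intel MPI)
#    accept -show; Open MPI wrappers accept --showme.
if command -v mpicxx >/dev/null 2>&1; then
    mpicxx -show 2>/dev/null || mpicxx --showme 2>/dev/null \
        || echo "wrapper did not report its command line"
fi
```

If the include paths reported in step 3 point at Intel MPI while step 1 shows Open MPI's `libmpi.so.40`, rebuilding in a clean build directory with only one MPI module loaded would be a reasonable next test.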