MPI hanging and no program termination #10

@V-Rang

Description

commit: 1c390ab

In PDEProblems.py, a __del__(self) destructor destroys the 3 solver objects and 6 matrices when an instance of PDEVariationalProblem goes out of scope. The three solvers are destroyed with:

self.solver.destroy()
self.solver_fwd_inc.destroy()
self.solver_adj_inc.destroy()
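For illustration only, here is a minimal sketch of deterministic cleanup as an alternative to __del__. It assumes, as with petsc4py's KSP and Mat objects, that each solver exposes a destroy() method that is collective over its communicator. A __del__ method runs whenever Python's reference counting or cyclic garbage collector gets to the object, which need not be the same point in the program on every MPI rank. An explicit destroy() method, here driven by a context manager, runs at the same place on all ranks. The classes Problem and RecordingObject are hypothetical stand-ins, not the actual PDEVariationalProblem code, and RecordingObject only records calls so the sketch runs without PETSc.

```python
from typing import Any, List


class RecordingObject:
    """Hypothetical stand-in for a PETSc KSP/Mat that records destroy() calls."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self._log = log
        self.destroyed = False

    def destroy(self) -> None:
        # For real parallel PETSc objects, destroy() is collective over the
        # object's communicator, so every rank must reach this call.
        if not self.destroyed:  # idempotent: a second call is a no-op
            self._log.append(self.name)
            self.destroyed = True


class Problem:
    """Hypothetical problem class with explicit, deterministic cleanup."""

    def __init__(self, log: List[str]) -> None:
        self.solver = RecordingObject("solver", log)
        self.solver_fwd_inc = RecordingObject("solver_fwd_inc", log)
        self.solver_adj_inc = RecordingObject("solver_adj_inc", log)

    def _petsc_objects(self) -> List[Any]:
        # The six matrices would be appended to this list as well.
        return [self.solver, self.solver_fwd_inc, self.solver_adj_inc]

    def destroy(self) -> None:
        # Called explicitly at the same point on every rank,
        # instead of from __del__ whenever the garbage collector runs.
        for obj in self._petsc_objects():
            obj.destroy()

    def __enter__(self) -> "Problem":
        return self

    def __exit__(self, *exc: Any) -> None:
        self.destroy()  # returns None, so exceptions still propagate


log: List[str] = []
with Problem(log) as problem:
    pass  # forward/adjoint solves would go here
print(log)  # ['solver', 'solver_fwd_inc', 'solver_adj_inc']
```

Making destroy() idempotent means a leftover __del__ (or a second explicit call) would not destroy the same object twice.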

and then running examples/sfsi_toy_gaussian.py with multiple MPI processes, e.g. mpirun -n 2 python3 sfsi_toy_gaussian.py, makes the program hang after all expected computations have completed. Pressing Ctrl+C during the hang of this two-process run produces the following stack traces (apparently one from each process):

Stack trace:
28      0x55fb83907ba5 _start + 37
27      0x7f3282d72e40 __libc_start_main + 128
26      0x7f3282d72d90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f3282d72d90]
25      0x55fb83907cad Py_BytesMain + 45
24      0x55fb839312d3 Py_RunMain + 371
23      0x55fb8393faaf Py_FinalizeEx + 95
22      0x55fb83940738 python3(+0x265738) [0x55fb83940738]
21      0x55fb8383864e python3(+0x15d64e) [0x55fb8383864e]
20      0x7f327e629256 /usr/local/lib/python3.10/dist-packages/petsc4py/lib/linux-gnu-real64-32/PETSc.cpython-310-x86_64-linux-gnu.so(+0x123256) [0x7f327e629256]
19      0x55fb8382e50b python3(+0x15350b) [0x55fb8382e50b]
18      0x7f327e638873 /usr/local/lib/python3.10/dist-packages/petsc4py/lib/linux-gnu-real64-32/PETSc.cpython-310-x86_64-linux-gnu.so(+0x132873) [0x7f327e638873]
17      0x7f327d2dfaf2 PetscGarbageCleanup + 322
16      0x7f327d2df4fa GarbageKeyAllReduceIntersect_Private + 186
15      0x7f327a3071ea PMPI_Allreduce + 2026
14      0x7f327a5e2b9f /usr/local/lib/libmpi.so.12(+0x32db9f) [0x7f327a5e2b9f]
13      0x7f327a5e175e /usr/local/lib/libmpi.so.12(+0x32c75e) [0x7f327a5e175e]
12      0x7f327a5e0f7d /usr/local/lib/libmpi.so.12(+0x32bf7d) [0x7f327a5e0f7d]
11      0x7f327a5e0757 /usr/local/lib/libmpi.so.12(+0x32b757) [0x7f327a5e0757]
10      0x7f327a5e063f /usr/local/lib/libmpi.so.12(+0x32b63f) [0x7f327a5e063f]
9       0x7f327a54acb6 /usr/local/lib/libmpi.so.12(+0x295cb6) [0x7f327a54acb6]
8       0x7f327a60161e /usr/local/lib/libmpi.so.12(+0x34c61e) [0x7f327a60161e]
7       0x7f327a600ab3 /usr/local/lib/libmpi.so.12(+0x34bab3) [0x7f327a600ab3]
6       0x7f327a5eff6f /usr/local/lib/libmpi.so.12(+0x33af6f) [0x7f327a5eff6f]
5       0x7f327a66989b /usr/local/lib/libmpi.so.12(+0x3b489b) [0x7f327a66989b]
4       0x7f327a665880 /usr/local/lib/libmpi.so.12(+0x3b0880) [0x7f327a665880]
3       0x7f327a663c9b /usr/local/lib/libmpi.so.12(+0x3aec9b) [0x7f327a663c9b]
2       0x7f327c5e7659 /usr/local/lib/libmpi.so.12(+0x2332659) [0x7f327c5e7659]
1       0x7f327c5ba5f5 /usr/local/lib/libmpi.so.12(+0x23055f5) [0x7f327c5ba5f5]
0       0x7f3282d8b520 /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f3282d8b520]
2024-03-18 22:38:19.458 (  84.549s) [main            ]                       :0     FATL| Signal: SIGINT
Stack trace:
[truncated]
123     0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
122     0x561a1e4c570c _PyFunction_Vectorcall + 124
121     0x561a1e4b38a2 _PyEval_EvalFrameDefault + 24914
120     0x561a1e4d34e1 python3(+0x16e4e1) [0x561a1e4d34e1]
119     0x561a1e4ade0d _PyEval_EvalFrameDefault + 1725
118     0x561a1e4d34e1 python3(+0x16e4e1) [0x561a1e4d34e1]
117     0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
116     0x561a1e4c570c _PyFunction_Vectorcall + 124
115     0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
114     0x561a1e4c570c _PyFunction_Vectorcall + 124
113     0x561a1e4b38a2 _PyEval_EvalFrameDefault + 24914
112     0x561a1e4d34e1 python3(+0x16e4e1) [0x561a1e4d34e1]
111     0x561a1e4af0d1 _PyEval_EvalFrameDefault + 6529
110     0x561a1e4d34e1 python3(+0x16e4e1) [0x561a1e4d34e1]
109     0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
108     0x561a1e4c570c _PyFunction_Vectorcall + 124
107     0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
106     0x561a1e4c570c _PyFunction_Vectorcall + 124
105     0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
104     0x561a1e4c570c _PyFunction_Vectorcall + 124
103     0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
102     0x561a1e4c570c _PyFunction_Vectorcall + 124
101     0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
100     0x561a1e4c570c _PyFunction_Vectorcall + 124
99      0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
98      0x561a1e4c570c _PyFunction_Vectorcall + 124
97      0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
96      0x561a1e4c570c _PyFunction_Vectorcall + 124
95      0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
94      0x561a1e4c570c _PyFunction_Vectorcall + 124
93      0x561a1e4af0d1 _PyEval_EvalFrameDefault + 6529
92      0x561a1e4d34e1 python3(+0x16e4e1) [0x561a1e4d34e1]
91      0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
90      0x561a1e4c570c _PyFunction_Vectorcall + 124
89      0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
88      0x561a1e4c570c _PyFunction_Vectorcall + 124
87      0x561a1e4af0d1 _PyEval_EvalFrameDefault + 6529
86      0x561a1e4d34e1 python3(+0x16e4e1) [0x561a1e4d34e1]
85      0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
84      0x561a1e4c570c _PyFunction_Vectorcall + 124
83      0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
82      0x561a1e4c570c _PyFunction_Vectorcall + 124
81      0x561a1e4b38a2 _PyEval_EvalFrameDefault + 24914
80      0x561a1e4d34e1 python3(+0x16e4e1) [0x561a1e4d34e1]
79      0x561a1e4af0d1 _PyEval_EvalFrameDefault + 6529
78      0x561a1e4d34e1 python3(+0x16e4e1) [0x561a1e4d34e1]
77      0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
76      0x561a1e4c570c _PyFunction_Vectorcall + 124
75      0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
74      0x561a1e4c570c _PyFunction_Vectorcall + 124
73      0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
72      0x561a1e4c570c _PyFunction_Vectorcall + 124
71      0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
70      0x561a1e4c570c _PyFunction_Vectorcall + 124
69      0x561a1e4ade0d _PyEval_EvalFrameDefault + 1725
68      0x561a1e4c570c _PyFunction_Vectorcall + 124
67      0x561a1e4b02c1 _PyEval_EvalFrameDefault + 11121
66      0x561a1e4d362e python3(+0x16e62e) [0x561a1e4d362e]
65      0x561a1e4b3c66 _PyEval_EvalFrameDefault + 25878
64      0x561a1e4bb58c _PyObject_MakeTpCall + 508
63      0x561a1e4cf744 python3(+0x16a744) [0x561a1e4cf744]
62      0x561a1e4ba784 _PyObject_FastCallDictTstate + 196
61      0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
60      0x561a1e4c570c _PyFunction_Vectorcall + 124
59      0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
58      0x561a1e4c570c _PyFunction_Vectorcall + 124
57      0x561a1e4b41f1 _PyEval_EvalFrameDefault + 27297
56      0x561a1e4bb5eb _PyObject_MakeTpCall + 603
55      0x561a1e432683 python3(+0xcd683) [0x561a1e432683]
54      0x561a1e4d362e python3(+0x16e62e) [0x561a1e4d362e]
53      0x561a1e4b4908 _PyEval_EvalFrameDefault + 29112
52      0x561a1e4bb5eb _PyObject_MakeTpCall + 603
51      0x561a1e4c4e0e python3(+0x15fe0e) [0x561a1e4c4e0e]
50      0x7f635ab4e300 /usr/local/lib/python3.10/dist-packages/matplotlib/ft2font.cpython-310-x86_64-linux-gnu.so(+0x1a300) [0x7f635ab4e300]
49      0x7f635abc6efe /usr/local/lib/python3.10/dist-packages/matplotlib/ft2font.cpython-310-x86_64-linux-gnu.so(+0x92efe) [0x7f635abc6efe]
48      0x7f635ab56dc1 /usr/local/lib/python3.10/dist-packages/matplotlib/ft2font.cpython-310-x86_64-linux-gnu.so(+0x22dc1) [0x7f635ab56dc1]
47      0x7f635ababd75 /usr/local/lib/python3.10/dist-packages/matplotlib/ft2font.cpython-310-x86_64-linux-gnu.so(+0x77d75) [0x7f635ababd75]
46      0x7f635ab56b20 /usr/local/lib/python3.10/dist-packages/matplotlib/ft2font.cpython-310-x86_64-linux-gnu.so(+0x22b20) [0x7f635ab56b20]
45      0x7f635ab659e7 /usr/local/lib/python3.10/dist-packages/matplotlib/ft2font.cpython-310-x86_64-linux-gnu.so(+0x319e7) [0x7f635ab659e7]
44      0x7f635ab5a2f3 /usr/local/lib/python3.10/dist-packages/matplotlib/ft2font.cpython-310-x86_64-linux-gnu.so(+0x262f3) [0x7f635ab5a2f3]
43      0x7f635ab4fd73 /usr/local/lib/python3.10/dist-packages/matplotlib/ft2font.cpython-310-x86_64-linux-gnu.so(+0x1bd73) [0x7f635ab4fd73]
42      0x7f635ab49e31 /usr/local/lib/python3.10/dist-packages/matplotlib/ft2font.cpython-310-x86_64-linux-gnu.so(+0x15e31) [0x7f635ab49e31]
41      0x561a1e5f70eb _PyObject_CallMethod_SizeT + 203
40      0x561a1e4c9159 python3(+0x164159) [0x561a1e4c9159]
39      0x561a1e4c5969 python3(+0x160969) [0x561a1e4c5969]
38      0x561a1e5ac5fa python3(+0x2475fa) [0x561a1e5ac5fa]
37      0x561a1e5c8ea6 python3(+0x263ea6) [0x561a1e5c8ea6]
36      0x561a1e5c9637 python3(+0x264637) [0x561a1e5c9637]
35      0x561a1e529cbd PyMemoryView_FromBuffer + 429
34      0x561a1e4f9421 python3(+0x194421) [0x561a1e4f9421]
33      0x561a1e59afa0 python3(+0x235fa0) [0x561a1e59afa0]
32      0x561a1e4950aa python3(+0x1300aa) [0x561a1e4950aa]
31      0x561a1e5ef984 python3(+0x28a984) [0x561a1e5ef984]
30      0x561a1e50f973 python3(+0x1aa973) [0x561a1e50f973]
29      0x561a1e4adf52 _PyEval_EvalFrameDefault + 2050
28      0x561a1e4e5dde python3(+0x180dde) [0x561a1e4e5dde]
27      0x7f6364e29a38 /usr/local/lib/python3.10/dist-packages/petsc4py/lib/linux-gnu-real64-32/PETSc.cpython-310-x86_64-linux-gnu.so(+0x133a38) [0x7f6364e29a38]
26      0x7f63641539b6 KSPDestroy + 262
25      0x7f636425fec5 PCDestroy + 53
24      0x7f636425fe26 PCReset + 22
23      0x7f636416f014 /usr/local/petsc/linux-gnu-real64-32/lib/libpetsc.so.3.20(+0xd67014) [0x7f636416f014]
22      0x7f6363e9abe0 MatDestroy + 64
21      0x7f6363d4d51e /usr/local/petsc/linux-gnu-real64-32/lib/libpetsc.so.3.20(+0x94551e) [0x7f6363d4d51e]
20      0x7f6364385317 dmumps_c + 2375
19      0x7f6364386b44 dmumps_f77_ + 4548
18      0x7f63643ef4d2 dmumps_ + 146
17      0x7f635f42f969 pmpi_comm_dup_ + 41
16      0x7f6360b59507 MPI_Comm_dup + 215
15      0x7f6360df4e6a /usr/local/lib/libmpi.so.12(+0x34fe6a) [0x7f6360df4e6a]
14      0x7f6360dff024 /usr/local/lib/libmpi.so.12(+0x35a024) [0x7f6360dff024]
13      0x7f6360dfec64 /usr/local/lib/libmpi.so.12(+0x359c64) [0x7f6360dfec64]
12      0x7f6360e09331 /usr/local/lib/libmpi.so.12(+0x364331) [0x7f6360e09331]
11      0x7f6360dd0757 /usr/local/lib/libmpi.so.12(+0x32b757) [0x7f6360dd0757]
10      0x7f6360dd05e7 /usr/local/lib/libmpi.so.12(+0x32b5e7) [0x7f6360dd05e7]
9       0x7f6360d3b80e /usr/local/lib/libmpi.so.12(+0x29680e) [0x7f6360d3b80e]
8       0x7f6360df161e /usr/local/lib/libmpi.so.12(+0x34c61e) [0x7f6360df161e]
7       0x7f6360df0ab3 /usr/local/lib/libmpi.so.12(+0x34bab3) [0x7f6360df0ab3]
6       0x7f6360ddff6f /usr/local/lib/libmpi.so.12(+0x33af6f) [0x7f6360ddff6f]
5       0x7f6360e5989b /usr/local/lib/libmpi.so.12(+0x3b489b) [0x7f6360e5989b]
4       0x7f6360e55d2a /usr/local/lib/libmpi.so.12(+0x3b0d2a) [0x7f6360e55d2a]
3       0x7f6360e53170 /usr/local/lib/libmpi.so.12(+0x3ae170) [0x7f6360e53170]
2       0x7f6360f2e18b /usr/local/lib/libmpi.so.12(+0x48918b) [0x7f6360f2e18b]
1       0x7f6360f2e036 /usr/local/lib/libmpi.so.12(+0x489036) [0x7f6360f2e036]
0       0x7f6369578520 /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f6369578520]
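Reading the two traces side by side: one process is blocked in PMPI_Allreduce, reached from PetscGarbageCleanup during Py_FinalizeEx, while the other is blocked in MPI_Comm_dup, reached from KSPDestroy, PCDestroy, MatDestroy and dmumps_. In the second trace the destroy is apparently triggered from Python while matplotlib's ft2font code is running, which is consistent with a garbage-collection pass invoking __del__. Both calls are MPI collectives, and MPI requires every process to issue collectives in the same order, so a plausible reading is that the two ranks entered different collectives and each waits for the other. The pure-Python toy model below (not MPI or PETSc code) only illustrates that ordering rule; the function name first_collective_mismatch and the call sequences are hypothetical, loosely modelled on the traces, and all collectives are treated as if they were on one communicator.

```python
from typing import List, Optional, Tuple


def first_collective_mismatch(
    rank_calls: List[List[str]],
) -> Optional[Tuple[int, List[str]]]:
    """Return (index, calls at that index) for the first position where ranks disagree.

    Toy model: each inner list is the ordered sequence of collective
    operations one rank issues. MPI requires these sequences to agree;
    the first disagreement is where a real run could block.
    """
    longest = max(len(calls) for calls in rank_calls)
    for i in range(longest):
        at_i = [calls[i] if i < len(calls) else "<none>" for calls in rank_calls]
        if len(set(at_i)) > 1:
            return i, at_i
    return None  # every rank issues the same collectives in the same order


# Hypothetical: rank 1's garbage collector runs __del__ earlier than rank 0's.
hanging = [
    ["KSPSolve", "KSPSolve", "Allreduce (PetscGarbageCleanup at exit)"],
    ["KSPSolve", "KSPSolve", "Comm_dup (KSPDestroy via __del__)",
     "Allreduce (PetscGarbageCleanup at exit)"],
]
print(first_collective_mismatch(hanging))
# (2, ['Allreduce (PetscGarbageCleanup at exit)', 'Comm_dup (KSPDestroy via __del__)'])

# Hypothetical: both ranks destroy explicitly at the same point.
same = ["KSPSolve", "KSPSolve", "Comm_dup (explicit destroy)",
        "Allreduce (PetscGarbageCleanup at exit)"]
print(first_collective_mismatch([same, same]))  # None
```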

Labels: bug (Something isn't working)