Potku incorrectly assumes that the entire simulation has stopped when just one MCERD process crashes #84

Open
jussiks opened this issue Oct 3, 2020 · 3 comments
Labels: bug (Something isn't working)

jussiks (Member) commented Oct 3, 2020

When running multiple simulation processes, Potku assumes that the entire simulation has stopped with an error when only one of the processes has crashed. The remaining processes are left running in the background, unobserved by Potku: they can no longer be stopped from the GUI, and the observed atom count does not increase until the simulation is continued.

Reproduction

This bug can be reproduced by forcibly killing one of the MCERD processes. In the picture below, I started four processes from the GUI and killed one from the command line. ps -a | grep mcerd shows that three processes are still running even though the GUI shows them all as stopped.

[screenshot: mcerd_error]

The same thing can be done on Windows using the Task Manager.
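
For a scriptable reproduction, here is a minimal sketch (assuming the third-party psutil package, which is not a Potku dependency) that kills a single running MCERD process the same way as above; it works on both Linux and Windows:

```python
# Hypothetical helper for reproducing the bug: kill exactly one MCERD
# process and leave the rest running. Assumes the psutil package is
# installed; this is not part of Potku itself.
import psutil

def kill_one_mcerd():
    """Kill the first process whose name contains 'mcerd'."""
    for proc in psutil.process_iter(["name"]):
        name = proc.info["name"] or ""
        if "mcerd" in name.lower():
            print(f"Killing PID {proc.pid} ({name})")
            proc.kill()  # SIGKILL on POSIX, TerminateProcess on Windows
            return True
    print("No MCERD process found")
    return False

if __name__ == "__main__":
    kill_one_mcerd()
```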

How to fix

Either:

  • make the rx pipeline more flexible so that the remaining processes can still be observed, or
  • kill all processes when one of them crashes (see the sketch at the end of this comment).

Generally speaking, if one process crashes, the others are likely to crash too, as the only difference between them is the random seed. In that case Potku's current behaviour isn't much of a problem, since the runaway processes won't stay alive for long.
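
As a rough illustration of the second option, here is a minimal sketch that polls a group of processes and kills the rest as soon as one of them exits with a non-zero return code. It uses plain subprocess polling instead of Potku's actual rx pipeline, and the command line and seed handling are placeholders rather than Potku's real MCERD invocation:

```python
# A minimal sketch of the "kill all processes when one of them crashes"
# option. Not Potku's actual MCERD runner; command and seeds are placeholders.
import subprocess
import time

def run_mcerd_group(command, seeds, poll_interval=1.0):
    """Start one process per seed and kill the whole group if any of
    them exits with a non-zero return code."""
    procs = [subprocess.Popen(command + [str(seed)]) for seed in seeds]
    try:
        while any(p.poll() is None for p in procs):
            if any(p.poll() not in (None, 0) for p in procs):
                # One process crashed: stop the rest instead of leaving
                # them running unobserved in the background.
                for p in procs:
                    if p.poll() is None:
                        p.kill()
                raise RuntimeError("An MCERD process crashed; "
                                   "the remaining processes were killed.")
            time.sleep(poll_interval)
    finally:
        # Safety net: never leave stray processes behind.
        for p in procs:
            if p.poll() is None:
                p.kill()
    return [p.returncode for p in procs]
```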

jussiks added the bug label on Oct 3, 2020
tpitkanen (Member) commented Oct 4, 2020

Are MCERD processes prone to crashing on their own, or does this only happen when a process is killed manually?

In either case, killing all processes is probably the best option. If a process crashes, there are likely bigger issues than worrying about finishing the rest.

jussiks (Member, Author) commented Oct 4, 2020

In my experience, MCERD most likely crashes because some Jibal data file is missing or there is a problem with the simulation settings, such as missing target layers. These problems would cause all processes to crash immediately. It seems unlikely that a certain random seed would cause a crash while other random seeds work.

So if only one process drops, it could be an indication of some outside shenanigans.

jaakkojulin (Member) commented

@jussiks is correct: a crash is typically due to some outside influence, and killing everything in sight is probably acceptable behaviour. MCERD should not crash on its own.

That being said, due to the pseudorandom nature of MC simulations (and the laziness of the programmers), MCERD can crash with some (small) probability that depends on the random seed. Typically an assertion is missing and an out-of-bounds array index leads to a segmentation fault. I have both introduced and fixed these kinds of bugs in MCERD. Obviously, in that case the only solution is to fix the particular bug and learn some defensive programming.
