Potku incorrectly assumes that the entire simulation has stopped when just one MCERD process crashes #84

Open
jussiks opened this issue Oct 3, 2020 · 3 comments
Labels: bug (Something isn't working)

jussiks (Member) commented Oct 3, 2020

When running multiple simulation processes, Potku assumes that the entire simulation has stopped with an error when only one of the processes has crashed. The remaining processes are left running in the background, unobserved by Potku: they can no longer be stopped from the GUI, and the observed atom count does not increase until the simulation is continued.

Reproduction

This bug can be reproduced by forcibly killing one of the MCERD processes. In the picture below, I started four processes from the GUI and killed one from the command line. ps -a | grep mcerd shows that three processes are still running even though the GUI shows them all as stopped.

[screenshot: mcerd_error]

The same thing can be done on Windows using the Task Manager.
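
For a scriptable reproduction, here is a minimal sketch (assuming the third-party psutil package, which is not a Potku dependency) that kills a single running MCERD process the same way as above; it works on both Linux and Windows:

```python
# Hypothetical helper for reproducing the bug: kill exactly one MCERD
# process and leave the rest running. Assumes the psutil package is
# installed; this is not part of Potku itself.
import psutil

def kill_one_mcerd():
    """Kill the first process whose name contains 'mcerd'."""
    for proc in psutil.process_iter(["name"]):
        name = proc.info["name"] or ""
        if "mcerd" in name.lower():
            print(f"Killing PID {proc.pid} ({name})")
            proc.kill()  # SIGKILL on POSIX, TerminateProcess on Windows
            return True
    print("No MCERD process found")
    return False

if __name__ == "__main__":
    kill_one_mcerd()
```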

How to fix

Either:

  • make the rx pipeline more flexible so that the remaining processes can still be observed, or
  • kill all processes when one of them crashes (see the sketch at the end of this comment).

Generally speaking, if one process crashes, the others are likely to crash too, as the only difference between them is the random seed. In that case Potku's current behaviour isn't much of a problem, since the runaway processes won't stay alive for long.
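
As a rough illustration of the second option, here is a minimal sketch that polls a group of processes and kills the rest as soon as one of them exits with a non-zero return code. It uses plain subprocess polling instead of Potku's actual rx pipeline, and the command line and seed handling are placeholders rather than Potku's real MCERD invocation:

```python
# A minimal sketch of the "kill all processes when one of them crashes"
# option. Not Potku's actual MCERD runner; command and seeds are placeholders.
import subprocess
import time

def run_mcerd_group(command, seeds, poll_interval=1.0):
    """Start one process per seed and kill the whole group if any of
    them exits with a non-zero return code."""
    procs = [subprocess.Popen(command + [str(seed)]) for seed in seeds]
    try:
        while any(p.poll() is None for p in procs):
            if any(p.poll() not in (None, 0) for p in procs):
                # One process crashed: stop the rest instead of leaving
                # them running unobserved in the background.
                for p in procs:
                    if p.poll() is None:
                        p.kill()
                raise RuntimeError("An MCERD process crashed; "
                                   "the remaining processes were killed.")
            time.sleep(poll_interval)
    finally:
        # Safety net: never leave stray processes behind.
        for p in procs:
            if p.poll() is None:
                p.kill()
    return [p.returncode for p in procs]
```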

jussiks added the bug label on Oct 3, 2020
tpitkanen (Member) commented Oct 4, 2020

Are MCERD processes prone to crashing on their own, or does this only happen when a process is killed manually?

In either case, killing all processes is probably the best option. If a process crashes, there are likely bigger issues than worrying about finishing the rest.

jussiks (Member, Author) commented Oct 4, 2020

In my experience, MCERD most likely crashes because some Jibal data file is missing or there is a problem with the simulation settings, such as missing target layers. These problems would cause all processes to crash immediately. It seems unlikely that a certain random seed would cause a crash while other random seeds work.

So if only one process drops, it could be an indication of some outside shenanigans.

jaakkojulin (Member) commented

@jussiks is correct: a crash is typically due to some outside influence, and killing everything in sight is probably acceptable behaviour. MCERD should not crash on its own.

That being said, due to the pseudorandom nature of MC simulations (and the laziness of the programmers), MCERD can crash with some (small) probability that depends on the random seed. Typically an assertion is missing and an out-of-bounds array index leads to a segmentation fault. I have both introduced and fixed these kinds of bugs in MCERD. Obviously, in that case the only solution is to fix the particular bug and learn some defensive programming.
