Description
I noticed that the parallelization option introduced with #543 not only slows down the pipeline runners, as described in the PR, but also my local Linux machine, so this might be a general Linux problem.
Neither machine type (our benchmark runners and my computer) uses swap, and in my case, memory usage for one of the benchmarks (Hartmann_3d) was at 533 KB when calling pmap. I think that makes heavy I/O within the process caused by excessive RAM usage pretty unlikely.
I think there are two possibilities, and one or maybe both of them cause the issue:
- Fork poisoning: a dependency is loaded in the original process and then copied into the newly created processes, so that only one process can really use the resource while it blocks the others. This seems to be a known problem with Torch as a dependency (Multiprocessing with Torch). It may also explain why the behavior is different on Mac, since the multiprocessing library uses spawn there (starting a fresh interpreter process instead of forking), while the Linux library uses the original fork and essentially copies the process that is itself calling the simulation module of BayBE (Fork vs. Spawn in Python). See the spawn sketch after this list.
- Too many processes and threads are created, so the OS needs extra CPU time to schedule all of them as they continuously wake each other. This can also be aggravated by point 1. I found that, besides multiple worker processes, several processes for maintaining the work queue are created by xyzpy's dependencies. Each of these processes in turn creates a number of threads, of which only a few are active (see the diagnostic sketch after this list). That is a bit strange to me since, AFAIK, best practice in Python is one process/thread per physical core, because sharing a core logically is not efficient given the language's heavy runtime environment (Cores in Python). Maybe the parallelization from xyzpy collides with that of other dependencies such as BoTorch or Torch; capping their thread counts, as sketched below, would be one way to test that.
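
To test the first hypothesis, one could force the spawn start method so that workers no longer inherit the parent's Torch state. A minimal sketch, not based on the actual benchmark code; `run_simulation` and its argument are hypothetical placeholders:

```python
import multiprocessing as mp

def run_simulation(seed: int) -> int:
    # Hypothetical stand-in for the actual benchmark work; with spawn,
    # heavy imports like torch happen freshly inside each worker.
    return seed * 2

if __name__ == "__main__":
    # "spawn" starts each worker from a fresh interpreter instead of
    # forking (copying) the parent process and its loaded libraries.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=4) as pool:
        print(pool.map(run_simulation, range(8)))
```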
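
To reproduce the process/thread observation, a small diagnostic along these lines (using the third-party psutil package, which is an assumption on my side, not part of the benchmark setup) prints the child processes and their thread counts while the benchmark runs:

```python
import psutil

# Inspect the current process tree: each child process created by the
# worker pool is listed together with the number of threads it runs.
me = psutil.Process()
for child in me.children(recursive=True):
    print(child.pid, child.name(), "threads:", child.num_threads())
print("threads in main process:", me.num_threads())
```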
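
And to test the oversubscription hypothesis, the implicit thread pools of the numerical libraries could be capped before the workers start; a rough sketch, assuming the usual OpenMP/MKL environment variables and Torch's intra-op setting are what matters here:

```python
import os

# These must be set before numpy/torch are imported, since the
# BLAS/OpenMP thread pools are sized at import time.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import torch

# Restrict Torch to one intra-op thread per process, so the number of
# worker processes (not hidden threads) matches the physical cores.
torch.set_num_threads(1)
```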