Description
Hello,
I am not quite sure whether this counts as a bug, but I thought I would share. Feel free to close if it is too much of an edge case.
I am running a simulation that creates and deletes a pymc model (with a BART component) in each iteration. I noticed that, as the iterations went on, I accumulated Python processes that were no longer using CPU but still held memory (~50-100 MB each).
After many iterations I started hitting OOM errors as these processes gradually consumed memory. They die only when the main process exits.
These were not the multi-chain/multi-process workers used during training/inference (those were spun up and torn down correctly).
I believe the issue comes from the `multiprocessing.Manager()` used to create the `all_trees` list.
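For context, here is a minimal sketch of the mechanism (the `all_trees` list here is just a stand-in, not pymc-bart's actual object): calling `multiprocessing.Manager()` spawns a dedicated server process that stays alive, idle but holding memory, for as long as the manager does.

```python
import multiprocessing as mp

def count_children_with_manager():
    """Show that creating a Manager starts a long-lived server process."""
    before = len(mp.active_children())
    manager = mp.Manager()       # spawns a dedicated server process
    all_trees = manager.list()   # proxy; stand-in for pymc-bart's 'all_trees'
    during = len(mp.active_children())
    manager.shutdown()           # stop the server process when done
    manager.join()               # wait for it to exit
    return during - before

if __name__ == "__main__":
    # One extra child process exists for as long as the Manager is alive.
    print(count_children_with_manager())
```

If the manager object stays reachable (e.g. via references held by the model), that server process is never reaped, which matches the lingering processes described above.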
To work around the issue, I ran the following code block after each iteration completed:
```python
import multiprocessing as mp

# Kill any child processes still alive (here, the Manager's server process)
for child in mp.active_children():
    child.kill()
```
This resolves the issue of lingering processes.
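One caveat worth noting: the kill loop takes down *every* live child, and SIGKILL gives them no chance to clean up. If a reference to the Manager itself were available (in pymc-bart it is internal, so the manager below is a hypothetical stand-in), calling `shutdown()` would stop only that manager's server process, cleanly:

```python
import multiprocessing as mp

def shutdown_manager_cleanly():
    manager = mp.Manager()            # stand-in for the internal Manager
    all_trees = manager.list()        # stand-in for the shared tree list
    # ... use the shared list while the model is alive ...
    manager.shutdown()                # stop only this manager's server process
    manager.join()                    # wait for the server process to exit
    return len(mp.active_children())  # no lingering children afterwards

if __name__ == "__main__":
    print(shutdown_manager_cleanly())
```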
I am not sure whether this should be considered a bug, since it only becomes a problem when a large number of BART models are created in a single Python script. I also don't know whether there is a good general solution: if you kill the child process created by the Manager too early, I would expect further use of the model to break.
That said, I could see other users hitting this in highly iterative workflows, and a process that does not die when the model is deleted is unexpected behavior. So I just wanted to share my experience for future users' reference.
Feel free to close or remove this submission if it is unhelpful.
Thanks!