Error in unserialize(node$con) : MultisessionFuture (future_lapply-4) failed to receive results from cluster RichSOCKnode #4 (PID 436932 on localhost ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: No process exists with this PID, i.e. the localhost worker is no longer alive. #685
Replies: 3 comments
-
Hello. It's clear that something causes the parallel workers to crash completely, regardless of whether you run 'multicore' or 'multisession' workers. Since it happens with both backends, the cause is likely to be independent of the parallel backend. If you tried with

Depending on your Linux setup, it could be that you're running out of memory and Linux decides to terminate your workers. If you've seen an error from Linux mentioning the "Out of Memory (OOM) killer", then that's the reason.

Regardless, I'd suggest that you try with fewer workers to see if you can reproduce the crash. Try with

If it still crashes with
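The "try with fewer workers" suggestion could look roughly like the sketch below. It is not the code from the original reply: `slow_square()` is a hypothetical stand-in for the real per-item function, and the worker count of 2 is an arbitrary assumption. It runs the same job on a small multisession cluster and then sequentially, so that a crash which only shows up with many workers can be told apart from one that happens regardless.

```r
library(future)
library(future.apply)

# Hypothetical stand-in for the real, memory-hungry per-item function.
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

# Step 1: the same backend, but with far fewer workers (2 is an assumption).
plan(multisession, workers = 2L)
res_parallel <- future_lapply(1:4, slow_square)

# Step 2: no parallel workers at all; everything runs in the main R session.
plan(sequential)
res_sequential <- future_lapply(1:4, slow_square)

# Both runs should give the same answer if nothing crashed.
stopifnot(identical(res_parallel, res_sequential))
```

If the job survives with two workers but not with ten, total memory use across workers is a plausible suspect. If it fails even with `plan(sequential)`, the problem is more likely in the function or its input than in the parallel backend.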
-
Hi, [ Sorry, I didn't know where else to post this. I didn't consider opening another bug report after seeing that you converted @Yunuuuu's one to this discussion ] Error and
Here is the session information after the error:
-
I dug more into the data and found the guilty file which returns the following error:
Looks like the
As a crosscheck, I then tried [with no positive results]:
You were right: it seems that a particular file caused a function to increase its memory usage considerably, driving a session/worker to crash, although I'm only guessing from its behaviour on a data subset. I believe this might also explain why 'paths' was not found (?). [Nevertheless, I still can't understand why the same process worked fine a couple of weeks ago with the same
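One way to hunt for a "guilty" input like this is to process the inputs one at a time in the main session and record which ones fail. The sketch below is only an illustration under stated assumptions: `process_file()` and the names in `input_files` are hypothetical and are not the function or data from this thread.

```r
# Hypothetical per-file worker; it fails on purpose for one input.
process_file <- function(path) {
  if (grepl("bad", path, fixed = TRUE)) stop("cannot process ", path)
  nchar(path)
}

# Hypothetical input names, not the files discussed above.
input_files <- c("a.csv", "bad.csv", "c.csv")

# Process each input in turn and record errors instead of stopping at the first.
results <- lapply(input_files, function(p) {
  message("Processing ", p)  # the last message printed shows where a hard crash occurred
  tryCatch(
    list(path = p, ok = TRUE, value = process_file(p)),
    error = function(e) list(path = p, ok = FALSE, value = conditionMessage(e))
  )
})

# Collect the inputs that raised an error.
failed <- vapply(results, function(r) !r$ok, logical(1))
guilty <- input_files[failed]
print(guilty)
```

Note that `tryCatch()` only catches ordinary R errors. If the operating system kills the process for using too much memory, nothing is caught, and the last "Processing ..." message is the only hint as to which input was being handled.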
-
(Please use https://github.com/HenrikBengtsson/future/discussions for Q&A)
Hi, thanks for your great R package future, which is really convenient for running long tasks.
Describe the bug
A clear and concise description of what the bug is.
I first ran this function with future::plan("multicore", workers = 10L); it gave a similar error message to the one below, so I tried multisession as above, as indicated in #474. rrho_correct_pval is a long function hosted at https://github.com/Yunuuuu/biomisc/blob/81948d2e5e2bab5a4cf76fd76e8ab4a096192efd/R/run_rrho.R#L787
I put the main future function here:
Reproduce example
Actually, biomisc::run_rrho also uses future_lapply, but it rarely gives an error (it fails randomly, and I cannot reproduce the biomisc::run_rrho error message). However, running biomisc::rrho_correct_pval after biomisc::run_rrho often reproduces the above error (using my own data: gene expression array data).
When I use artificial data, though, I cannot make the error occur, so I can't give robust example code to reproduce it.
Expected behavior
Run without error
Session information
Please share your session information after the error has occurred so that we also see which packages and versions are involved;