Fail to receive from connection Slurm RStudio Server #634
-
Hi, I've been struggling to find relevant answers that might help solve this problem, but after a lot of googling I feel I'm out of luck. I'm wondering if someone could point out where the problem might be. I'm running code in parallel with future + doParallel (doFuture) and simple data.table code. I'm using one node with 122 cores on the Slurm server, using
This launches RStudio Server (open-source version), and I connect to it over ssh, using the connection info written to the rserver.log file:
Below are the settings in my R session:
The process is basically reading lots of csv files and filtering them in parallel. Here's my code:
It seems to start up multiple subprocesses but crashes within a couple of minutes.
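For readers unfamiliar with the pattern described above, reading many csv files and filtering them in parallel with doFuture and data.table might look roughly like the sketch below. This is not the code from this post: the file names, the `score` column, the filter threshold, and the use of two `multisession` workers are all assumptions made for illustration.

```r
library(data.table)
library(foreach)
library(future)
library(doFuture)

# Hypothetical input: four small csv files in a temporary directory
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
for (i in 1:4) {
  fwrite(data.table(id = 1:10, score = (1:10) * i),
         file.path(csv_dir, sprintf("part_%02d.csv", i)))
}
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Route foreach's %dopar% through the future framework
registerDoFuture()
plan(multisession, workers = 2)  # assumption: 2 local background R sessions

# Each worker reads one file and keeps only the rows passing the filter
filtered <- foreach(f = files, .combine = rbind) %dopar% {
  dt <- data.table::fread(f)
  dt[score > 20]
}

plan(sequential)  # shut the workers down again
print(nrow(filtered))
```

Starting with a small worker count like this and scaling up can help tell apart a problem in the code from a problem in the cluster or session setup.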
It doesn't seem to be related to the size, but I made sure with
Here's the sessionInfo:
and here's the number of cores:
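As a hedged illustration of how the core count can be checked, the sketch below compares what the hardware reports with what the R session is allowed to use. It assumes the parallelly package is installed. On a Slurm allocation the two numbers can differ, because `parallelly::availableCores()` takes scheduler settings such as `SLURM_CPUS_PER_TASK` into account, while `parallel::detectCores()` does not.

```r
# Cores present on the node, regardless of what the scheduler granted
n_detected <- parallel::detectCores()

# Cores this R session should use, honoring scheduler and R option limits
n_available <- parallelly::availableCores()

cat("detected: ", n_detected, "\n")
cat("available:", n_available, "\n")

# Show every source parallelly consulted and the value each one reported
print(parallelly::availableCores(which = "all"))
```

If `available` is much smaller than `detected`, requesting as many workers as `detected` would oversubscribe the allocation.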
I'd greatly appreciate it if you could point out where the problem might be. I've been using future (doParallel) for a while in the same cluster setting, and it's been working great, but somehow it started to give this error message recently.
Replies: 6 comments 4 replies
-
Hi. Can you please install the develop version of parallelly: remotes::install_github("HenrikBengtsson/parallelly", ref="develop"). Then rerun, and share the error message you get. This updated parallelly version will make the error message include a tad more info on what the problem is.
-
@HenrikBengtsson Here's the code and the error messages I received after installing the dev version of parallelly.
-
@HenrikBengtsson I appreciate your comments. I'll examine it step by step and share the results here soon. There's one thing I'd like to share. I guessed that the above crash could be coming from using a burst request on the server (which lets me use more resources than are allocated to my subscription, by using idle cores at low priority), so I switched to a small cluster setting with priority (the one I'm allocated, which is
Well, surprisingly, without changing much of the code, it ran great until 73% progress, and then R (RStudio) crashed without any prior error messages. I used
-
I'm not entirely sure if this is related, but I found something strange during debugging: when
Strangely, the argument
Compare the results: the subprocess's fread truncated the data.
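A comparison of this kind can be sketched as follows: read the same file once in the main session and once in a future worker, then compare the two results. The file and its columns are made up for illustration, and the sketch does not reproduce the specific `fread` arguments used in this thread.

```r
library(data.table)
library(future)

# Hypothetical test file
path <- tempfile(fileext = ".csv")
fwrite(data.table(id = 1:1000, x = rnorm(1000)), path)

plan(multisession, workers = 2)  # assumption: 2 local background R sessions

dt_main   <- fread(path)                              # read in the main session
dt_worker <- value(future(data.table::fread(path)))   # read in a subprocess

plan(sequential)

cat("rows in main session:", nrow(dt_main), "\n")
cat("rows in worker:      ", nrow(dt_worker), "\n")

# Compare as plain data frames so data.table's internal attributes are ignored
same <- identical(dim(dt_main), dim(dt_worker)) &&
  isTRUE(all.equal(as.data.frame(dt_main), as.data.frame(dt_worker)))
print(same)
```

If `same` is `FALSE`, comparing `nrow()` and `names()` of the two tables is a quick way to see whether rows or columns were dropped in the worker.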
-
Good news - it seems to be related to the specific RStudio Server version I was running.
-
@HenrikBengtsson Also, I found the reason for the messages from
Good news - it seems to be related to the specific RStudio Server version I was running.
This morning I used an updated version of RStudio Server, and the above code started to work seamlessly.
The RStudio Server version I used previously was build 352 (September 2019); the updated one is build 548.