Allow arbitrary transformations cmd in makeNodePSOCK #12

myoung3 · 2020-08-01T06:33:13Z

The remote server I'm working with (the head node of a cluster) requires me to call newgrp then load a module for R before I open R. e.g.

newgrp groupname
module purge
module load R/R-4.0.0
R

The same goes for calling an Rscript, so currently it's not possible to use future(remote) without altering this behavior on the remote server.

There's an easy fix for this situation: allow an arbitrary transformation of the Rscript command that gets created internally by makeNodePSOCK, via an added argument to that function.

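library(future)

# Proposed transformation: wrap the generated Rscript command so that it is
# piped into the shell started by newgrp
tx <- function(x) {
  paste0('echo "module purge; module load R/R-4.0.0;', x, ' "|newgrp groupname')
}

future::plan(list(
  future::tweak(future::remote,
                workers = future::makeClusterPSOCK(
                  workers = "servername",
                  homogeneous = FALSE,
                  rscript_transform = tx  # argument proposed in the PR
                )
  )))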

See PR for what this would look like.

HenrikBengtsson · 2020-08-10T18:22:37Z

Hi, for me to better understand, in your first example you're mentioning commands you want to call prior to launching Rscript whereas in the latter example you're suggesting that you also want to be able to pipe the whole Rscript call to another command.

Is the first one sufficient for you?

myoung3 · 2020-08-10T18:39:14Z

Hi Henrik,
I'm not an expert in Linux, but my understanding is that the "newgrp" command opens a completely new shell. So the following code will work interactively:

newgrp mygroup
module purge
module load R/R-4.0.0
Rscript

But if I were to put those lines into a shell script and execute it, it wouldn't work, because all the lines after "newgrp mygroup" never reach the new shell.

The workaround for this is to execute the following:
echo "module purge; module load R/R-4.0.0; Rscript" | newgrp mygroup

which pipes "module purge; module load R/R-4.0.0; Rscript" into the new shell created by newgrp.
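The piping behaviour can be tried without a cluster. The following is a minimal sketch in which plain `sh` stands in for `newgrp mygroup` (an assumption for illustration only): like the shell that newgrp starts, it reads its commands from standard input, so whatever is piped into it is executed there.

```shell
#!/bin/sh
# `sh` stands in for `newgrp mygroup` here (an assumption for illustration):
# both start a new shell that reads its commands from standard input.
new_shell() {
  sh
}

# Commands piped to the new shell are read and executed by it.
piped_output=$(echo 'echo "module commands would run here"; echo "then Rscript"' | new_shell)
printf '%s\n' "$piped_output"
```

In a script, lines written after the call to the new shell are not part of its standard input, which is why they are not executed inside it.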

Since this is a very edge-case situation, I don't think there's a need to specifically implement, in makeNodePSOCK, piping the Rscript call and preceding commands into another system command. In my pull request, I just introduce a new argument to makeNodePSOCK that takes an arbitrary function whose input is the finalized Rscript command (generated internally by makeNodePSOCK). This should cover a lot of people's system-specific eccentricities, because it allows the simpler case of executing commands prior to calling Rscript (users could write a function that prepends commands to the Rscript call, separated by ";") as well as my more complicated situation of needing to pipe Rscript.

Does that make sense?

HenrikBengtsson · 2020-08-10T19:02:20Z

Thanks for the clarification. I wanna stay away from making it possible to tweak the internal call because it introduces a risk of breaking people's pipelines in the future when other things need to be added to makeClusterPSOCK(). What I'm instead thinking about is support for something like:

cl <- future::makeClusterPSOCK("remote.org", rscript = c("eval", "module purge; module load r; Rscript"))
...
which currently attempts to run:
```sh
'eval' 'module purge; module load r; Rscript' --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'workRSOCK <- tryCatch(parallel:::.slaveRSOCK, error=function(e) parallel:::.workRSOCK); workRSOCK()' MASTER=localhost PORT=11365 OUT=/dev/null TIMEOUT=2592000 XDR=TRUE
```

on the remote.org machine. This almost works, except for the fact that the Rscript -e ... option needs to be protected with an extra layer of shell quotes. If called as is, we get:

bash: syntax error near unexpected token `(
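The quoting problem can be reproduced in isolation. The sketch below is not the actual makeClusterPSOCK call; `show` is a hypothetical stand-in for Rscript and `f(x)` for the R expression. It illustrates that `eval` joins its arguments into one string and parses that string a second time, so an argument containing parentheses needs one more layer of quotes than usual.

```shell
#!/bin/sh
# Hypothetical stand-in for Rscript: prints its first argument.
show() { printf '%s\n' "$1"; }

# Unprotected: the quotes around 'f(x)' are consumed before eval runs, so
# eval re-parses the string `show f(x)` and hits a bare "(".
# The subshell keeps the syntax error from aborting this script.
if (eval 'show' 'f(x)') 2>/dev/null; then
  unprotected=ok
else
  unprotected=failed
fi

# Protected: a literal pair of single quotes survives the first round of
# parsing and is consumed by eval's second round.
protected=$(eval 'show' "'f(x)'")

printf 'unprotected: %s\nprotected: %s\n' "$unprotected" "$protected"
```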

Using rscript is more in line with what's already supported, e.g.

cl <- future::makeClusterPSOCK(..., rscript = c("LD_LIBRARY_PATH=/path/to", "Rscript"))

Question though, at some point it's just easier to add a helper script on the remote machine, e.g. Rscript400

#!/usr/bin/env bash

newgrp mygroup
module purge
module load R/R-4.0.0

Rscript "$@"

set the executable flag (chmod ugo+x Rscript400) and make use of that as:

cl <- future::makeClusterPSOCK("remote.org", rscript = "/path/to/Rscript400")

Have you considered that?

myoung3 · 2020-08-10T19:10:04Z

There's a slight benefit to putting the system calls in R, because it gives easier access to selecting which R module (i.e. version) to load, whereas creating a script requires a new script for each R version that gets installed on the server. But given that this avoids many other complications, you're probably right that it's easier to just have a separate script for each R version on the remote.
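One way to soften the one-script-per-version drawback might be a single helper that takes the module name from an environment variable. The following is a minimal, untested-on-a-cluster sketch: the script name and the variables `R_MODULE` and `R_GROUP` are hypothetical, and the functions `shell_quote` and `build_command` exist only in this example. It pipes the commands into newgrp rather than calling newgrp on its own line because, as noted earlier in the thread, lines placed after newgrp in a script do not reach the shell it starts.

```shell
#!/bin/sh
# Hypothetical helper script (e.g. saved as /path/to/Rscript_mod).
# R_MODULE and R_GROUP are illustrative names, not part of any package.
R_MODULE="${R_MODULE:-R/R-4.0.0}"
R_GROUP="${R_GROUP:-mygroup}"

# Wrap one argument in single quotes (escaping embedded single quotes) so
# that it survives being parsed again by the shell that newgrp starts.
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Build the command line the new shell should execute:
# first argument is the module name, the rest are passed on to Rscript.
build_command() {
  cmd="module purge; module load $1; Rscript"
  shift
  for arg in "$@"; do
    cmd="$cmd $(shell_quote "$arg")"
  done
  printf '%s\n' "$cmd"
}

# Only launch when arguments were passed, as they are for a worker.
if [ "$#" -gt 0 ]; then
  build_command "$R_MODULE" "$@" | newgrp "$R_GROUP"
fi
```

With such a helper, the R version could presumably be chosen from the R side by setting `R_MODULE` in the helper's environment, in the same way `LD_LIBRARY_PATH` is set through the rscript argument earlier in the thread; whether that combination works in practice is an assumption.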

HenrikBengtsson transferred this issue from futureverse/future Oct 20, 2020
