Allow arbitrary transformations cmd in makeNodePSOCK #12

Open
myoung3 opened this issue Aug 1, 2020 · 4 comments

@myoung3

myoung3 commented Aug 1, 2020

The remote server I'm working with (the head node of a cluster) requires me to call newgrp then load a module for R before I open R. e.g.

newgrp groupname
module purge
module load R/R-4.0.0
R

The same goes for calling an Rscript, so currently it's not possible to use future(remote) without altering this behavior on the remote server.

There's an easy fix for this situation: allow an arbitrary transformation of the Rscript command that gets created inside makeNodePSOCK, via an added argument to that function.

library(future)
tx <- function(x){
  paste0('echo "module purge; module load R/R-4.0.0;',x,' "|newgrp groupname')
}


future::plan(list(
  future::tweak(future::remote,
                workers = future::makeClusterPSOCK(
                  workers = "servername",
                  homogeneous = FALSE,
                  rscript_transform = tx  # argument proposed in this issue
                )
  )))

See PR for what this would look like.

@HenrikBengtsson
Collaborator

Hi, for me to better understand, in your first example you're mentioning commands you want to call prior to launching Rscript whereas in the latter example you're suggesting that you also want to be able to pipe the whole Rscript call to another command.

Is the first one sufficient for you?

@myoung3
Author

myoung3 commented Aug 10, 2020

Hi Henrik,
I'm not an expert in Linux, but my understanding is that the "newgrp" command opens a completely new shell. So the following commands will work interactively:

newgrp mygroup
module purge
module load R/R-4.0.0
Rscript

But if I were to put those lines into a shell script and execute it, it wouldn't work, because the lines after "newgrp mygroup" go nowhere: they are not run inside the new shell.

The workaround for this is to execute the following:
echo "module purge; module load R/R-4.0.0; Rscript" | newgrp mygroup

which pipes "module purge; module load R/R-4.0.0; Rscript" into the new shell created by newgrp.
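The mechanism can be tried without newgrp, since a shell that is given no script file reads its commands from standard input. A minimal sketch, with sh standing in for the shell that newgrp would start (a stand-in chosen for illustration; no group change or module loading happens here):

```sh
# sh stands in for the shell started by newgrp; it reads commands from stdin.
out=$(echo 'echo "first command"; echo "second command"' | sh)

# Both commands ran inside the piped-into shell, in order.
echo "$out"
```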

Since this is very much an edge case, I don't think there's a need for makeNodePSOCK to specifically support piping the Rscript call and its preceding commands into another system command. In my pull request, I just introduce a new argument to makeNodePSOCK that takes an arbitrary function whose input is the finalized Rscript command (generated internally by makeNodePSOCK). This should cover a lot of people's system-specific eccentricities: it allows the simpler case of executing commands prior to calling Rscript (users could write a function that prepends commands to the Rscript call, separated by ";") as well as my more complicated situation of needing to pipe the Rscript call into another command.
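The two string transformations described here can be sketched as shell functions, purely to show what the resulting command lines would look like. The function names prepend_setup and pipe_to_newgrp are hypothetical, and 'Rscript --version' is only a placeholder for the call that makeNodePSOCK generates internally:

```sh
# Simple case: prepend setup commands to the generated Rscript call.
prepend_setup() {
  printf '%s; %s\n' 'module purge; module load R/R-4.0.0' "$1"
}

# Piped case: hand the same command line to the shell started by newgrp.
# Caveat: a command containing double quotes, $ or backticks would need
# extra escaping before being wrapped in double quotes like this.
pipe_to_newgrp() {
  printf 'echo "%s" | newgrp groupname\n' "$(prepend_setup "$1")"
}

cmd='Rscript --version'   # placeholder for the internally generated call
prepend_setup "$cmd"
pipe_to_newgrp "$cmd"
```

Both functions only print the transformed command; nothing is executed.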

Does that make sense?

@HenrikBengtsson
Collaborator

Thanks for the clarification. I wanna stay away from making it possible to tweak the internal call because it introduces a risk of breaking people's pipelines in the future when other things need to be added to makeClusterPSOCK(). What I'm instead thinking about is support for something like:

cl <- future::makeClusterPSOCK("remote.org", rscript = c("eval", "module purge; module load r; Rscript"))
...
which currently attempts to run:
```sh
'eval' 'module purge; module load r; Rscript' --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'workRSOCK <- tryCatch(parallel:::.slaveRSOCK, error=function(e) parallel:::.workRSOCK); workRSOCK()' MASTER=localhost PORT=11365 OUT=/dev/null TIMEOUT=2592000 XDR=TRUE
```

on the remote.org machine. This almost works, except for the fact that the Rscript -e ... option needs to be protected with an extra layer of shell quotes. If called as is, we get:

bash: syntax error near unexpected token `(
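The quoting problem can be reproduced in isolation. In this minimal sketch, printf stands in for Rscript and 'f(x)' stands in for the -e expression (both are stand-ins chosen for illustration). eval joins its arguments and parses the result a second time, so quotes consumed by the first parse no longer protect the parentheses:

```sh
# One layer of quotes: eval re-parses `printf "%s\n" f(x)` and trips on "(".
# The subshell keeps the syntax error from aborting the calling shell.
if (eval 'printf "%s\n"' 'f(x)') 2>/dev/null; then
  unprotected=ok
else
  unprotected=failed
fi

# Extra layer of quotes: the inner single quotes survive until eval re-parses.
protected=$(eval 'printf "%s\n"' "'f(x)'")

echo "unprotected: $unprotected"   # prints: unprotected: failed
echo "protected: $protected"       # prints: protected: f(x)
```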

Using rscript is more in line with what's already supported, e.g.

cl <- future::makeClusterPSOCK(..., rscript = c("LD_LIBRARY_PATH=/path/to", "Rscript"))

Question though: at some point it's just easier to add a helper script on the remote machine, e.g. Rscript400:

#!/usr/bin/env bash

newgrp mygroup
module purge
module load R/R-4.0.0

Rscript "$@"

set the executable flag (chmod ugo+x Rscript400) and make use of that as:

cl <- future::makeClusterPSOCK("remote.org", rscript = "/path/to/Rscript400")

Have you considered that?
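Given the earlier point in this thread that lines following newgrp in a script are not run inside the new shell, a helper script may need to combine the two ideas and pipe its command line into newgrp. The following is a sketch under that assumption, not tested against a real newgrp or module setup. The helper names quote_arg and build_command are hypothetical. Each argument is single-quoted so that it is still one word when the second shell parses it:

```sh
# Wrap one argument in single quotes, escaping embedded single quotes,
# so that a second shell parses it back as a single word.
quote_arg() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Build the command line that the shell started by newgrp should run.
build_command() {
  printf 'module purge; module load R/R-4.0.0; Rscript'
  for arg in "$@"; do
    printf ' '
    quote_arg "$arg"
  done
  printf '\n'
}

# In an actual helper script, the last line would be something like:
#   build_command "$@" | newgrp mygroup
# Here the command is only printed for inspection.
build_command -e 'cat("hi")'
```

This assumes the shell started by newgrp accepts POSIX single-quoting and that the module command is available in it.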

@myoung3
Author

myoung3 commented Aug 10, 2020

There's a slight benefit to putting the system calls in R: it gives easier access to selecting which R module (i.e. version) to load, whereas the script approach requires a new script for each R version that gets installed on the server. But given that this avoids many other complications, you're probably right that it's easier to just have a separate script for each R version on the remote.
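One possible middle ground is a single helper script that takes the module name from an environment variable and falls back to a default. This is a sketch only: R_MODULE is a hypothetical variable name, R/R-4.1.0 is a made-up second version, and whether the variable can be set through the rscript argument (following the LD_LIBRARY_PATH pattern shown earlier in the thread) is untested:

```sh
# Pick the module to load: R_MODULE if set, otherwise the default version.
select_module() {
  printf '%s\n' "${R_MODULE:-R/R-4.0.0}"
}

default_choice=$(unset R_MODULE; select_module)
override_choice=$(R_MODULE=R/R-4.1.0 select_module)

# The helper script would then run these before calling Rscript.
echo "module purge; module load $default_choice"
echo "module purge; module load $override_choice"
```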

@HenrikBengtsson HenrikBengtsson transferred this issue from futureverse/future Oct 20, 2020