makeClusterPSOCK(..., rscript_envs = ...) - more clever #8
Comments
This will work for the local machine. But what about remote sessions over, say, SSH?

Ideally, R should support this, cf. HenrikBengtsson/Wishlist-for-R#110.

Per futureverse/future#392, we now support: `cl <- makeClusterPSOCK(..., rscript = c("LD_LIBRARY_PATH=/path/to", "Rscript"))`. EDIT: Note that this does not work on MS Windows.
Regarding not being able to pass environment variables sooner in the [...] So, [...]

Example: Local worker

Instead of:

```r
> cl <- parallelly::makeClusterPSOCK(1L, rscript_envs = c(PI="3.14"), dryrun = TRUE)
----------------------------------------------------------------------
Manually, start worker #1 on local machine 'localhost' with:
"C:/PROGRA~1/R/R-41~1.0/bin/x64/Rscript" --default-packages=datasets,utils,grDevices,graphics,stats,methods -e "options(socketOptions = \"no-delay\")" -e "Sys.setenv(\"PI\"=\"3.14\")" -e "workRSOCK <- tryCatch(parallel:::.workRSOCK, error=function(e) parallel:::.slaveRSOCK); workRSOCK()" MASTER=localhost PORT=11876 OUT=/dev/null TIMEOUT=2592000 XDR=FALSE SETUPTIMEOUT=120 SETUPSTRATEGY=sequential
```

we could have it do:

```r
> cl <- parallelly::makeClusterPSOCK(1L, rscript_envs = c(PI="3.14"), dryrun = TRUE)
----------------------------------------------------------------------
Manually, start worker #1 on local machine 'localhost' with:
"C:/PROGRA~1/R/R-41~1.0/bin/x64/R" --no-echo --no-restore R_DEFAULT_PACKAGES="datasets,utils,grDevices,graphics,stats,methods" PI="3.14" -e "options(socketOptions = \"no-delay\")" -e "workRSOCK <- tryCatch(parallel:::.workRSOCK, error=function(e) parallel:::.slaveRSOCK); workRSOCK()" --args MASTER=localhost PORT=11876 OUT=/dev/null TIMEOUT=2592000 XDR=FALSE SETUPTIMEOUT=120 SETUPSTRATEGY=sequential
```

Example: Remote worker

Instead of:

```r
> cl <- parallelly::makeClusterPSOCK("remote.example.org", rscript_envs = c(PI="3.14"), dryrun = TRUE)
----------------------------------------------------------------------
Manually, (i) login into external machine 'remote.example.org':
'/usr/bin/ssh' -R 11121:localhost:11121 remote.example.org
and (ii) start worker #1 from there:
'Rscript' --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'options(socketOptions = "no-delay")' -e 'Sys.setenv("PI"="3.14")' -e 'workRSOCK <- tryCatch(parallel:::.workRSOCK, error=function(e) parallel:::.slaveRSOCK); workRSOCK()' MASTER=localhost PORT=11121 OUT=/dev/null TIMEOUT=2592000 XDR=FALSE SETUPTIMEOUT=120 SETUPSTRATEGY=sequential
Alternatively, start worker #1 from the local machine by combining both step in a single call:
'/usr/bin/ssh' -R 11121:localhost:11121 remote.example.org "'Rscript' --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'options(socketOptions = \"no-delay\")' -e 'Sys.setenv(\"PI\"=\"3.14\")' -e 'workRSOCK <- tryCatch(parallel:::.workRSOCK, error=function(e) parallel:::.slaveRSOCK); workRSOCK()' MASTER=localhost PORT=11121 OUT=/dev/null TIMEOUT=2592000 XDR=FALSE SETUPTIMEOUT=120 SETUPSTRATEGY=sequential"
```

we could do:

```r
> cl <- parallelly::makeClusterPSOCK("remote.example.org", rscript_envs = c(PI="3.14"), dryrun = TRUE)
----------------------------------------------------------------------
Manually, (i) login into external machine 'remote.example.org':
'/usr/bin/ssh' -R 11121:localhost:11121 remote.example.org
and (ii) start worker #1 from there:
'R' --no-echo --no-restore R_DEFAULT_PACKAGES='datasets,utils,grDevices,graphics,stats,methods' PI='3.14' -e 'options(socketOptions = "no-delay")' -e 'workRSOCK <- tryCatch(parallel:::.workRSOCK, error=function(e) parallel:::.slaveRSOCK); workRSOCK()' --args MASTER=localhost PORT=11121 OUT=/dev/null TIMEOUT=2592000 XDR=FALSE SETUPTIMEOUT=120 SETUPSTRATEGY=sequential
Alternatively, start worker #1 from the local machine by combining both step in a single call:
'/usr/bin/ssh' -R 11121:localhost:11121 remote.example.org "'R' --no-echo --no-restore R_DEFAULT_PACKAGES='datasets,utils,grDevices,graphics,stats,methods' PI='3.14' -e 'options(socketOptions = \"no-delay\")' -e 'workRSOCK <- tryCatch(parallel:::.workRSOCK, error=function(e) parallel:::.slaveRSOCK); workRSOCK()' --args MASTER=localhost PORT=11121 OUT=/dev/null TIMEOUT=2592000 XDR=FALSE SETUPTIMEOUT=120 SETUPSTRATEGY=sequential"
```

Note that the above [...]
In parallelly (>= 1.29.0-9003), we can now do (Issue #75):

> cl <- parallelly::makeClusterPSOCK(1L, rscript = file.path(R.home("bin"), "R"), rscript_args = c("--no-echo", "--no-restore", "*", "--args"), dryrun = TRUE)
----------------------------------------------------------------------
Manually, start worker #1 on local machine 'localhost' with:
'/home/hb/software/R-devel/R-4-1-branch/lib/R/bin/R' --no-echo --no-restore --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'options(socketOptions = "no-delay")' -e 'workRSOCK <- tryCatch(parallel:::.workRSOCK, error=function(e) parallel:::.slaveRSOCK); workRSOCK()' --args MASTER=localhost PORT=11920 OUT=/dev/null TIMEOUT=2592000 XDR=FALSE SETUPTIMEOUT=120 SETUPSTRATEGY=sequential

Now, contrary to [...]

> cl <- parallelly::makeClusterPSOCK(1L, rscript = file.path(R.home("bin"), "R"), rscript_args = c("--no-echo", "--no-restore", "*", "--args"))
WARNING: unknown option '--default-packages=datasets,utils,grDevices,graphics,stats,methods'
> cl
Socket cluster with 1 nodes where 1 node is on host 'localhost' (R version 4.1.2 Patched (2021-11-01 r81123), platform x86_64-pc-linux-gnu)
In the develop version (commit 2299389), default packages are now set via the `R_DEFAULT_PACKAGES` environment variable:

> cl <- parallelly::makeClusterPSOCK(1L, rscript = file.path(R.home("bin"), "R"), rscript_args = c("--no-echo", "--no-restore", "*", "--args"), dryrun = TRUE)
----------------------------------------------------------------------
Manually, start worker #1 on local machine 'localhost' with:
R_DEFAULT_PACKAGES=datasets,utils,grDevices,graphics,stats,methods '/home/hb/software/R-devel/R-4-1-branch/lib/R/bin/R' --no-echo --no-restore -e 'options(socketOptions = "no-delay")' -e 'workRSOCK <- tryCatch(parallel:::.workRSOCK, error=function(e) parallel:::.slaveRSOCK); workRSOCK()' --args MASTER=localhost PORT=11606 OUT=/dev/null TIMEOUT=2592000 XDR=FALSE SETUPTIMEOUT=120 SETUPSTRATEGY=sequential

This avoids the above warning. Currently, this `R_DEFAULT_PACKAGES` workaround is only applied to locally launched cluster nodes. For remote workers, we'll get a warning that it's not supported.
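
To see the effect of `R_DEFAULT_PACKAGES` in isolation, here is a minimal base-R sketch. It does not use parallelly's launcher; it assumes that `system2()` with its `env` argument is an acceptable stand-in for prefixing the variable to the command, and the package list `stats,methods` is an arbitrary choice for illustration. It needs R (>= 4.0.0) for `--no-echo`.

```r
# Path to the 'R' front end of the currently running R installation
r_bin <- file.path(R.home("bin"), "R")

# Expression for the child process: report the default packages it was
# configured with (illustrative only)
expr <- shQuote("cat(getOption('defaultPackages'), sep = ',')")

# Launch a child R process with R_DEFAULT_PACKAGES set in its environment.
# On Unix-alikes, system2() prepends 'env' to the command line, much like
# the dry-run output above.
out <- system2(
  r_bin,
  args = c("--vanilla", "--no-echo", "-e", expr),
  env = "R_DEFAULT_PACKAGES=stats,methods",
  stdout = TRUE
)
print(out)
```

Because the variable is in place before the child R process starts, no `--default-packages` option has to be passed to `R`, which is what triggered the warning.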
Update: New argument [...]

Argh... so, on MS Windows, [...] So, on MS Windows, above [...]
`makeClusterPSOCK()` gained argument `rscript_envs` for setting environment variables in workers on startup, e.g. `rscript_envs = c(FOO = "3.14", "BAR")`.

Instead of doing this via `-e "Sys.setenv('<name>'='<value>')"` options, can't we do: [...]

This way we can set env vars that need to be set very early on in the R startup process in order to take effect, e.g. `TMPDIR`.

I've verified that the above works on Linux and Windows. Maybe worth adding an internal `with_env()` to make sure things are properly undone for the main R session.
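
A minimal sketch of what such a `with_env()` helper could look like follows. The post only names the helper, so everything else here is an assumption: the signature, the restore logic, and the usage, which assumes the idea is to set the variables in the main session just long enough for a launched child process to inherit them. The child is started with base R's `system2()` rather than parallelly's launcher.

```r
# Hypothetical helper (not parallelly's implementation): temporarily set
# environment variables while 'expr' is evaluated, then restore the old state.
with_env <- function(envs, expr) {
  stopifnot(is.character(envs), !is.null(names(envs)), all(nzchar(names(envs))))

  # Record current values; NA marks variables that are currently unset
  old <- Sys.getenv(names(envs), unset = NA_character_, names = TRUE)

  # Undo the changes when the function exits, even if 'expr' fails
  on.exit({
    was_unset <- is.na(old)
    if (any(was_unset)) Sys.unsetenv(names(old)[was_unset])
    if (any(!was_unset)) do.call(Sys.setenv, as.list(old[!was_unset]))
  }, add = TRUE)

  do.call(Sys.setenv, as.list(envs))
  expr  # lazily evaluated, so it runs only now, with the variables in place
}

# Usage: a child R process inherits TMPDIR at launch, early enough for it
# to decide where the child's tempdir() is created.
tmp_root <- tempfile("worker-tmp-")
dir.create(tmp_root)
rscript <- file.path(R.home("bin"), "Rscript")
child_tmp_parent <- with_env(c(TMPDIR = tmp_root), {
  system2(rscript,
          args = c("--vanilla", "-e", shQuote("cat(dirname(tempdir()))")),
          stdout = TRUE)
})
print(child_tmp_parent)   # parent folder of the child's temporary directory
print(Sys.getenv("TMPDIR", unset = NA))  # main session's value is restored
```

By contrast, calling `Sys.setenv(TMPDIR = ...)` inside an already running worker is too late, since R picks its temporary directory at startup. The withr package offers `with_envvar()` for the same purpose if an extra dependency is acceptable.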