Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R options: options() are not passed down to futures - should they? (document either way) #134

Open
HenrikBengtsson opened this issue Mar 16, 2017 · 23 comments

Comments

@HenrikBengtsson
Copy link
Collaborator

I'm adding this as a reminder to myself to document this (next release), but also to consider more automated solutions (a future release):

R's global options() are not passed down to futures. This has to be done manually right now, e.g.

fopts <- options()[c("digits", "repos")]
f <- future({
  options(fopts)
  [...]
})

This works, but it would be nice to have an automated version of this as well (cf. globals and packages) to better capture the state of the calling R process. Doing this might be a bit tricky since some options are read only / shouldn't be changed and others are specific the local machine, which may not work on a remote machine. Maybe it's possible to use static-code inspection to identify what options are used (similar to how globals are identified).

@HenrikBengtsson
Copy link
Collaborator Author

Some ideas:

Ignoring any "automatic" passing down of options, this could be done by adding an argument options analogue to globals, e.g.

f <- future(..., options = c("digits", "repos"))

That construct provides the developer control over options. For the end user to control options, there would need to be a similar argument that can be specified when setting the plan(), e.g.

plan(batchtools_sge, workers = 100L, options = c("digits", "repos"))

Options specified via plan() will always be set/passed down and additional ones specified via future() would be done in addition to what's in plan().

@rubenarslan
Copy link

So, future.global.maxSize is also not passed down. That leaves me in a pickle.
I send jobs to a Torque cluster from my local. The jobs call brms, which has a future = TRUE argument, so that each of its chains is run according to plan. But these chains are bigger than 500Mb so this fails and I cannot change it, because I cannot alter options in the futures that brms creates.
I think all future options should always be passed down.

@HenrikBengtsson
Copy link
Collaborator Author

Good point, I agree that future.globals.maxSize should be passed down - this is an oversight. As a workaround for now, you can override the default in the ~/.Rprofile on the machine in the chain that complains, e.g.

options(future.globals.maxSize = 1.0 * 1024^3)

@HenrikBengtsson
Copy link
Collaborator Author

Moving the discussion of future.* options to Issue #223 because it's a much more narrow problem than considering any/all of R's options.

@HenrikBengtsson HenrikBengtsson removed this from the Next release milestone May 7, 2018
@HenrikBengtsson
Copy link
Collaborator Author

FYI/@rubenarslan, in the next release (develop branch right now), we're also passing down:

  • future.globals.maxSize
  • future.globals.method
  • future.globals.onMissing
  • future.globals.onReference
  • future.globals.resolve
  • future.resolve.recursive

@HenrikBengtsson
Copy link
Collaborator Author

FYI, future 1.9.0, which passes down several more future.globals.* options, is now on CRAN.

@Rloveeee
Copy link

I don't see any difference, did I do something wrong ?

sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 LC_NUMERIC=C LC_TIME=French_France.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] future.apply_1.0.0 future_1.9.0

loaded via a namespace (and not attached):
[1] compiler_3.5.0 parallel_3.5.0 tools_3.5.0 listenv_0.7.0 codetools_0.2-15 digest_0.6.15 globals_0.12.1

options("test"=10)
FUN=function(x) return(getOption("test"))
xx=future.apply::future_lapply(X=list(0),FUN=FUN)
xx

[[1]]
NULL

@HenrikBengtsson
Copy link
Collaborator Author

Nah, please read the above comments again. future 1.9.0 only pass down some specific future.globals.* options. The task on passing done options automagically is still an open question.

@Rloveeee
Copy link

Ok, I have read the comment but I just understand it now.
And I couldn't understand the manual fix:
fopts <- options()[c("digits", "repos")]
f <- future({
options(fopts)
[...]
}).
I tried
f <- future({
options(fopts)
},FUN=FUN,X=list(0))
and several others stuff, but I can't make it.

@HenrikBengtsson
Copy link
Collaborator Author

Here's an example:

library(future)
plan(multisession) # Use background R sessions

## My personal option
options(test = 5)

## Use only 3 digits
options(digits = 3L)

## Options that I want to pass down to the futures/workers
fopts <- options()[c("digits", "test")]

str(fopts)
# List of 2
#  $ digits: int 3
#  $ test  : num 5

# Create a future using those "frozen" 'fopts' options
> f <- future({
+   ## Set 'fopts' options (here fopts is a global variable)
+   options(fopts)
+   ## Show that those options are set
+   cat(sprintf("test = %s\n", getOption("test")))
+   print(pi)
+   2 * getOption("test")
+ })

> v <- value(f)  ## requires future 1.9.0 to see output
test = 5
[1] 3.14

> v
[1] 10

I tried
f <- future({
options(fopts)
},FUN=FUN,X=list(0))
and several others stuff, but I can't make it.

Are you asking how to do the same with future.apply::future_lapply()?

@Rloveeee
Copy link

Are you asking how to do the same with future.apply::future_lapply()?
YES !
I understand your example but I don't see how I could transfer it to future_lapply.

Thanks

@HenrikBengtsson
Copy link
Collaborator Author

HenrikBengtsson commented Jul 23, 2018

The same way, via global fopts, e.g.

> library(future.apply)
> plan(multiprocess)

> options(test = 5)
> options(digits = 3L)
> fopts <- options()[c("digits", "test")]
> y <- future_lapply(1:3, FUN = function(x) {
+   ## Set 'fopts' options (here fopts is a global variable)
+   options(fopts)
+   ## Show that those options are set
+   cat(sprintf("test = %s\n", getOption("test")))
+   print(pi)
+   getOption("test") * x
+ })
test = 5
[1] 3.14
test = 5
[1] 3.14
test = 5
[1] 3.14

> unlist(y)
[1]  5 10 15
> 

@Rloveeee
Copy link

Thanks a lot.
I got another one:

library("future.apply")
plan(multiprocess, workers = 12)
test=function(...){
UseMethod("test")
}
test.numeric=function(x){
return(x)
}
test.character=function(x){
return(x)
}
FUN=function(x) {
test(x)
}
xx=future.apply::future_lapply(FUN=FUN,X=c(2,3))

I also have another one but I can't (yet) reproduce it on small code.
(It can't find roll_sd from RcppRoll).

@HenrikBengtsson
Copy link
Collaborator Author

Please create a separate issue if you're changing topics

@adamaltmejd
Copy link

adamaltmejd commented Oct 30, 2019

Hello! Just stumbled upon this when my code started behaving strangely and realized after a few hours of debugging that the "memory_saving_mode" option had defaulted to TRUE because future wasn't passing options on. Maybe could be suitable with a warning about this?

I'm using future version 1.11.1.1 through Drake, calling future:plan(future::multiprocess). What is the recommended way to pass options to workers?

@Rloveeee
Copy link

Rloveeee commented Oct 30, 2019 via email

@HenrikBengtsson
Copy link
Collaborator Author

... the "memory_saving_mode" option had defaulted to TRUE because future wasn't passing options on. Maybe could be suitable with a warning about this?

So, I'm objecting that a warning could be useful but it's unfortunately not feasible. How would future know what options to warn about and not? Feel free to clarify how you want it to work.

I'm using future version 1.11.1.1 ...

I highly recommend that you update - you are way behind and you're missing out on bug fixes and features. Is there a reason why you're not updating?

... What is the recommended way to pass options to workers?

You can do it via a global variable, e.g.

my_opts <- options(c("memory_saving_mode", "digits"))
f <- future({
  options(my_opts)
  ...
})
v <- value(f)

@HenrikBengtsson HenrikBengtsson added this to the Next release milestone Oct 30, 2019
@adamaltmejd
Copy link

Tack :).

You are probably right. But also package options could create problems if they are changed from their defaults. Same for base R options that are changed. One possibility is to issue a warning when the package is loaded, but that might be too extreme. I guess the best would be to pass options on to works by default and then allowing the user to shut that down in the function call.

As I was saying I'm calling future via drake so am not explicitly creating any "future" objects. Only setting plan(multiprocess) and then run drake::make(parallelism = "future") so not sure how to add global variables like you suggested - but I guess thats a question for Drake.

@tdeenes
Copy link

tdeenes commented May 1, 2020

Maybe the problem of environment variables could be added to this thread.

@tdeenes
Copy link

tdeenes commented May 1, 2020

Identifying automagically which options and environment variables should be passed down to workers is hard, indeed.

I would like to add a further idea as well - not a silver bullet, but sometimes useful:

  1. In .onLoad(), try to figure out what the default options/environment variables in the host and in the worker are, and store them in .options_at_startup. Flag those options which a) do not exist in the worker, but exist in the host, and b) those which do differ between the worker and the host.

  2. When a future is created, check the actual options and environment variables, compare with .options_at_startup, and inject the relevant options and env variables to the future.

This is of course a rude approximation, can result in both false positives and false negatives.

@bakaburg1
Copy link

Probably silly question for @HenrikBengtsson: why not just pass all the options down at worker creation? it should not be much of overhead isn't it?

@scottkosty
Copy link

side note (not related to previous comment's question): I wonder if a useful comparison is to Bash options and sub-shells. For example, in bash if you run set -x (or set -e), that option is not respected in the subshells unless the user explicitly does something like export SHELLOPTS, which somehow causes the subshells to all have the same options.

Reference.

@HenrikBengtsson
Copy link
Collaborator Author

Probably silly question for @HenrikBengtsson: ...

You broke Rule No 1; there are no silly questions - just silly answers.

why not just pass all the options down at worker creation? it should not be much of overhead isn't it?

For a few different reasons, but one issue is "discussed" in futureverse/parallelly#70.

Until then, you can use argument rscript_startup to launch PSOCK workers using any R code you'd like, e.g.

> cl <- parallelly::makeClusterPSOCK(2, rscript_startup = quote(options(hello = "world")))
> str(parallel::clusterEvalQ(cl, getOption("hello")))
List of 2
 $ : chr "world"
 $ : chr "world"

Or similarly,

> library(future)
> plan(multisession, workers = 2, rscript_startup = quote(options(hello = "world")))
> f <- future(getOption("hello"))
> value(f)
[1] "world"

A more important design reason is: If your code depends on getOption("hello") being available on parallel workers, then it would fail on backends that don't support setting options on the workers. It's not clear to me how to handle this, so I take a very conservative approach in Futureverse and say "try to write your code such that you assume fresh, vanilla R workers".

A safer design is probably better to pass down R options via the future(...) - that is the background behind this issue. One reason for not yet implementing that is that I want to introduce "resource specifications", where passing down options would be more natural (I think). OTH, that is a complex problem, so maybe I should just add support for an future(..., options = ...) argument until that is in place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants