Allow additional globals to be supplied #227

Closed
bkornfeld opened this issue May 13, 2018 · 11 comments

@bkornfeld

bkornfeld commented May 13, 2018

Hi,

Thanks in advance for taking the time to read/consider.

I was wondering if it might be possible to allow futures to be provided with a list of "additional globals", as opposed to either detecting globals (globals = TRUE), not detecting globals (globals = FALSE) or providing globals (globals = c('a', 'b', 'c')).

i.e. one might provide additional.globals = c('a', 'b', 'c') and globals = TRUE, so that all globals in the expression will be passed, as well as the additional globals.

This is useful as there are some instances where future does not detect globals, and one may wish to add globals manually without having to detect globals manually for the entire expression (happy to subsequently provide such an instance, however it seems to have more to do with the globals package and is somewhat beside the point).

Right now the workaround is:

future.addl.globals <- function(expr, envir = parent.frame(),
    addl.expr = list(), packages = NULL, lazy = FALSE, seed = NULL,
    evaluator = plan("next"), ...) {
    
    # Capture the unevaluated expression once, so that future() does not
    # substitute the symbol 'expr' itself
    expr = substitute(expr)

    # Find globals of expressions
    expr.globals = get.future.globals(expr, envir)
    addl.globals = lapply(addl.expr, get.future.globals, envir)
    globals = unlist(c(expr.globals, addl.globals))
    
    # Get future
    expr.fut = future(expr = expr, envir = envir, substitute = FALSE,
        globals = globals, packages = packages, lazy = lazy, seed = seed,
        evaluator = evaluator, ...)
    
    # Return
    return(expr.fut)
}

get.future.globals <- function(x, envir = parent.frame()) {
    
    # Get expression from x
    if (is.character(x) && file.exists(x)) expr = parse(file = x)
    else if (is.character(x)) expr = parse(text = x)
    else if (is.function(x)) expr = as.expression(body(x))
    else if (is.call(x)) expr = x
    
    # Get globals
    req.globals = lapply(expr, globalsOf, mustExist = FALSE, envir = envir) %>%
        lapply(function(x) names(x[!vapply(x, is.null, TRUE)])) %>%
        unlist() %>% unique()
    
    # Return
    return(req.globals)
}

Any thoughts are appreciated

Ben

@HenrikBengtsson
Collaborator

HenrikBengtsson commented May 13, 2018

Yes, this might be a useful feature to support. If added, it could probably be done by extending the syntax of the globals argument. The same would go for false-positive globals, where one might wanna exclude such globals.

Yes, please share an example where globals = TRUE fails to identify one or more of the globals. It could be that it's something that needs to be fixed.

@HenrikBengtsson
Collaborator

HenrikBengtsson commented May 13, 2018

BTW, you can always guide the algorithm by listing the "missing" global variables at the top of the future expression, e.g.

f <- future ({
  a; b; c ## additional globals

  ... expression here ...

})

I thought this was documented, but it looks like I might have dropped it.
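
One way to see why this works: the automatic search inspects the code statically, so any symbol that appears in the expression gets picked up. Below is a minimal sketch that uses `codetools::findGlobals()` as a stand-in for that search. This is an assumption for illustration only: future relies on the globals package, whose search may differ in detail, and the function names `without_hint` and `with_hint` are made up.

```r
# Stand-in for the automatic search; codetools ships with R.
# The function wrappers are only a way to hand an expression to findGlobals().
without_hint <- function() {
  get("a") + 1            # 'a' is only named inside a string
}

with_hint <- function() {
  a                       # mentioning the symbol makes it visible
  get("a") + 1
}

vars_without <- codetools::findGlobals(without_hint, merge = FALSE)$variables
vars_with    <- codetools::findGlobals(with_hint, merge = FALSE)$variables

print(vars_without)       # 'a' is not listed
print(vars_with)          # 'a' is listed
```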

@bkornfeld
Author

bkornfeld commented May 13, 2018

Hi Henrik,

Appreciate the quick response, and I agree a change in globals syntax could fix it.

Case #1: One simple instance (not a fault of the globals package at all) is where one knows that the future is going to have to load a .rds file in which there is, among other things, a saved function. Sometimes it is not practical to save the entire environment of the function (for speed reasons) and so one relies on resetting the function's environment to the local environment on the assumption that an additional variable exists in the local environment.

Case #2: There are also other instances where functions have distinct bodies, values of the sub-function same variable in their parent environments, but this seems like an uncommon use case, so I have not demonstrated below. If it would help, I could do so.

Simplified Case 1 (and obviously something someone would never do):

file.loc = tempfile()
k = 2
fn = function(x) x + k
saveRDS(fn, file.loc)

future({
    fn = readRDS(file.loc)
    parent.env(environment(fn)) = environment()

    fn(1)
})

And I have in the past added the missing variables at the top of the expression, but this is difficult when what those variables are changes dynamically.
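
For what it is worth, here is a rough sketch of why `k` goes unnoticed in Case 1. It again uses `codetools::findGlobals()` as a stand-in for the search done by the globals package (an assumption, the real search may differ), and `case1` is just a hypothetical wrapper around the future expression. `k` is only referenced inside the serialized function, so it never shows up in the code that is inspected.

```r
# The core of the future expression from Case 1, wrapped in a function
# so that it can be inspected without being evaluated.
case1 <- function() {
  fn <- readRDS(file.loc)
  fn(1)
}

vars <- codetools::findGlobals(case1, merge = FALSE)$variables
print(vars)               # includes "file.loc"; "k" does not appear
```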

Best,
Ben

@HenrikBengtsson
Collaborator

And I have in the past added the missing variables at the top of the expression, but this is difficult when what those variables are changes dynamically.

I suspected this could be your use case. I'll add this to the to-do list.

@HenrikBengtsson
Collaborator

HenrikBengtsson commented Aug 29, 2018

Another request for support for specifying additional globals appeared in Issue #248. Just to get going on this... as a starter, it's very easy to implement this - the problem is more of a design decision.

The globals argument currently supports (see also Section 'Globals used by future expressions' in ?future::future):

  • globals = TRUE (default) - automatically identify globals from the future expression
  • globals = FALSE - don't record/export any globals
  • globals = <character vector> - (manually specified) names of globals to be retrieved and used
  • globals = <named list> - (manually specified) named list of globals (and their values) to be used
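
As a small illustration of the two manual forms listed above, here is a sketch that assumes the future package is installed and uses `plan(sequential)` so no workers are needed. The variable names are made up for the example.

```r
library(future)
plan(sequential)          # evaluate in the current R session

a <- 1

# Character vector: the named globals are looked up in the calling environment
f1 <- future({ a + 1 }, globals = c("a"))
v1 <- value(f1)

# Named list: the values are supplied directly instead of being looked up
f2 <- future({ a + 1 }, globals = list(a = 41))
v2 <- value(f2)

print(c(v1, v2))
```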

The question is, how can we extend this to support specifying additional globals (either by their names or as a name-value list)?

In some (internal) old notes of mine, I found the following sketch:

f <- future(..., globals = structure(TRUE, add = "ind", ignore = c("x", "y")))

That would set up globals by:

  1. Automatically find all globals in the future expression, cf. globals = TRUE
  2. Append the global ind manually, cf. globals = "ind" (unless already found)
  3. Drop globals x and y if they were automatically found in Step 1.
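
The three steps above can be sketched in base R as plain set operations on the names. `resolve_globals()` is a hypothetical helper written for illustration, not part of the future or globals API, and `found` stands in for whatever the automatic search in Step 1 returns.

```r
# Hypothetical helper: combine automatically found names with the
# 'add' and 'ignore' attributes carried by the globals specification.
resolve_globals <- function(spec, found) {
  add    <- attr(spec, "add")      # NULL if not specified
  ignore <- attr(spec, "ignore")   # NULL if not specified
  kept <- setdiff(found, ignore)   # Step 3: drop ignored names found in Step 1
  union(kept, add)                 # Step 2: append extras unless already present
}

spec  <- structure(TRUE, add = "ind", ignore = c("x", "y"))
found <- c("x", "y", "z")          # pretend result of the automatic search

result <- resolve_globals(spec, found)
print(result)
# [1] "z"   "ind"
```

Because the extra information travels as attributes on `TRUE`, a specification without `add` or `ignore` simply leaves the automatically found names unchanged.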

This would support the case for appending / ignoring globals programmatically (as needed in this Issue #227 and Issue #248), e.g.

extra_globals <- c("f1", "f2")
f <- future(..., globals = structure(TRUE, add = extra_globals))

Now, structure(TRUE, add = "ind", ignore = c("x", "y")) is quite an expression. Although it's more likely that developers rather than end users will use it, it's still ... "ugly".

There's also a high-priority feature request on adding support for hook functions to futures (Issue #172). Maybe there could be an "onGlobals" hook, e.g.

extra_globals <- c("f1", "f2")
f <- future(..., hooks = list(onGlobals = function(globals, ...) {
  addGlobals(globals, extra_globals)
}))

Whatever is returned by onGlobals() will be the final set of globals used. That would provide full, powerful, low-level, programmatic control of globals.

Feel free to elaborate on alternatives, to help get this feature added to the API.

@HenrikBengtsson
Collaborator

HenrikBengtsson commented Aug 30, 2018

I've added a prototype for adding globals (by manually specifying the additional ones) - in the feature/globals-add-ignore develop branch. Install it as:

> remotes::install_github("HenrikBengtsson/future@develop")

Please give it a try.

@bkornfeld, here's how to do it for your example code:

library(future)
plan(multisession, workers = 2L)

file.loc <- tempfile()
k <- 2
fn <- function(x) x + k
saveRDS(fn, file.loc)

f <- future({
    fn <- readRDS(file.loc)
    fn(1)
}, globals = structure(TRUE, add = "k"))
v <- value(f)
print(v)
# [1] 3

@burchill, here's your example from #248 (comment):

library(future)
plan(multisession, workers = 2L)

f2 <- function(x) "f2 function"

f1 <- function(x) paste0(f2(x), " ", x)

l <- list("first" = f1, "second" = f2)

output %<-% { l[["first"]]("a") } %globals% structure(TRUE, add = c("f1", "f2"))
print(output)
# [1] "f2 function a"

@bkornfeld
Author

@HenrikBengtsson thanks so much for this update and commit - I will give it a try as soon as I have the chance. I think the structure implementation is fine (certainly much better than the current implementation), given that this is likely only to be used by developers.

HenrikBengtsson added this to the Next release milestone Aug 31, 2018

@HenrikBengtsson
Collaborator

I've now merged this into the develop branch, so install via:

> remotes::install_github("HenrikBengtsson/future@develop")

@HenrikBengtsson
Collaborator

I've also added support for ignoring globals in the automatic search, that is, one can now do things such as:

globals <- structure(TRUE, ignore = "foo", add = c("f1", "f2"))

@HenrikBengtsson
Collaborator

FYI, future 1.10.0 implementing this is on CRAN as of today (2018-10-17)

@bkornfeld
Author

bkornfeld commented Oct 17, 2018 via email
