detect forking #224

Closed
goldingn opened this issue May 9, 2018 · 8 comments
Labels
"base R" (Possibly something for base R itself), "feature request"

Comments

@goldingn
goldingn commented May 9, 2018

I'm back to working on integrating future with greta.

Everything works swimmingly, except that I can't simultaneously execute tensorflow graphs in forked processes. Even if I re-import tensorflow, create a new graph etc., tensorflow just wigs out and the processes hang.

I think the best strategy for now is just to detect when the user is trying to use a forked plan, and raise an error suggesting that they use a multisession plan instead.

I can detect whether they've done plan(multicore) or plan(multiprocess), but I don't know how to detect whether they've done something like this:

cl <- parallel::makeForkCluster(n)
plan(cluster, workers = cl)

Is there a preferred way of detecting this, or some other mechanism by which I can restrict the allowable plans?
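
For reference, the plan(multicore) check mentioned above can be sketched roughly as follows. This is only an illustration: plan_is_multicore() is a hypothetical helper name, not part of the future API, and it assumes that plan() called without arguments returns the current strategy with "multicore" among its classes. It does not cover the plan(cluster, workers = cl) case asked about here.

```r
library(future)

# Hypothetical helper (not part of the future API): TRUE if the
# currently set plan is the forking 'multicore' strategy.
# Assumption: plan() with no arguments returns the current strategy,
# whose class vector includes the strategy name.
plan_is_multicore <- function() {
  inherits(plan(), "multicore")
}

plan(sequential)
plan_is_multicore()
```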

@goldingn
Author

goldingn commented May 9, 2018

It would also be nice, though less important, to only error if their multiprocess session is set up to fork.

@HenrikBengtsson
Collaborator

HenrikBengtsson commented May 11, 2018

Interesting use case. There's currently nothing in the API that supports this type of querying of details of the backend (to be) used. One hack I can think of that you could use internally is the following (disclaimer: I cannot guarantee that it'll be supported in the long term):

f <- future(NULL, lazy = TRUE)
workers <- f$workers
if (inherits(workers, "cluster")) {
  ## Worker is not yet assigned. Assume all are of the same kind; use first
  worker <- workers[[1]]
  if (inherits(worker, "forknode")) {
    stop("Parallel processing using forked processes is not supported")
  }
}
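
The same class check can also be applied directly to a cluster object before it is handed to plan(). A minimal sketch, assuming (as in the snippet above) that nodes created by parallel::makeForkCluster() carry the class "forknode"; cluster_uses_fork() is a hypothetical helper name:

```r
# Hypothetical helper: TRUE if any node of a 'parallel' cluster is a
# forked node. Unlike the snippet above, this checks every node rather
# than assuming all nodes are of the same kind as the first one.
cluster_uses_fork <- function(cl) {
  stopifnot(inherits(cl, "cluster"))
  any(vapply(cl, inherits, logical(1), what = "forknode"))
}
```

With this, something like `if (cluster_uses_fork(cl)) stop(...)` could guard a call to plan(cluster, workers = cl), though it only helps when the cluster object itself is available.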

A long-term solution would be to extend the Future API with a mechanism to specify this type of requirement, e.g.

f <- future(..., resources = list(disallow = "fork"))

This type of API extension falls under the general discussion in #172.

@goldingn
Author

Nice, that should work perfectly for now - thanks!

@HenrikBengtsson
Collaborator

Oops, it should have been lazy = TRUE to avoid triggering an actual fork, or is it ok to launch a dummy forked future?

@goldingn
Author

Should be fine either way, but using lazy is tidier. Thanks!

@HenrikBengtsson
Collaborator

More examples where forked processing with multi-threading fails badly are starting to show up.
I've created #355 to track whether the future framework can/should protect against this or not.

Maybe the problem of how to detect whether we're running in a forked process should be addressed by R itself, because the stability issue applies to the 'parallel' package too. Posting to R-devel might be a good start.

@HenrikBengtsson added the "base R" label (Possibly something for base R itself) on Jan 7, 2020
@HenrikBengtsson
Collaborator

@gaborcsardi just mentioned parallel:::isChild() in https://stat.ethz.ch/pipermail/r-devel/2020-January/078910.html and suggested having it exported from the 'parallel' package. This function will let you know whether an R process is a forked process or not:

> parallel:::isChild()
[1] FALSE

> f <- parallel::mcparallel(parallel:::isChild())
> parallel::mccollect(f)[[1]]
[1] TRUE

> cl <- parallel::makeForkCluster(1L)
> parallel::clusterEvalQ(cl, { parallel:::isChild() })[[1]]
[1] TRUE

So, a more generic approach to check if a future plan is set to use forked processing (via mc*...) or not is to launch a test future:

f <- future(parallel:::isChild())
is_forked <- value(f)

This should cover more cases out of the box.

Examples:

> library(future)
> f <- future(parallel:::isChild())
> value(f)
[1] FALSE

> plan(multisession, workers = 2L)
> f <- future(parallel:::isChild())
> value(f)
[1] FALSE

> plan(multicore, workers = 2L)
> f <- future(parallel:::isChild())
> value(f)
[1] TRUE

> cl <- parallel::makeForkCluster(1L)
> plan(cluster, workers = cl)
> f <- future(parallel:::isChild())
> value(f)
[1] TRUE
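
Tying this back to the original question, the test-future approach could be wrapped into a guard that errors when the current plan forks. This is only a sketch: assert_plan_not_forked() is a hypothetical name, and it relies on parallel:::isChild(), an unexported internal of 'parallel' that may change without notice.

```r
library(future)

# Hypothetical guard: launch a throwaway future under the current plan
# and ask the worker whether it is a forked child process.
# NOTE: parallel:::isChild() is an unexported internal function.
assert_plan_not_forked <- function() {
  f <- future(parallel:::isChild())
  if (isTRUE(value(f))) {
    stop("Parallel processing using forked processes is not supported; ",
         "consider plan(multisession) instead.", call. = FALSE)
  }
  invisible(TRUE)
}
```

Unlike the lazy-future hack, this does launch a real (if trivial) future, so under a forking plan it will trigger one fork before erroring.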

@HenrikBengtsson
Collaborator

I've added futureverse/parallelly#18 for the possibility of having parallelly export isChild().

Closing this one.
