Determine if calling future({}) will block #264
Sorry for the delay. No, correct, such a feature does not exist in the Future API. It's an interesting idea. I wonder, though, exactly what type of questions can be asked and what answers can be guaranteed. For simplicity, imagine we create a lazy future:

```r
f <- future(42, lazy = TRUE)
```

Then what would

```r
res <- will_it_block(f)
```

mean for different types of backends? We know that for a sequential future the answer is always "yes": it is resolved in, and therefore blocks, the main R process. Maybe the answer to your question is that the Future API is not meant to be used for implementing job schedulers. I don't know the answer to all this and I need to digest the idea much further before coming to a conclusion. But I'm open to further discussions/thoughts.
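A minimal sketch of that backend dependence (the sleep durations are placeholders): the same lazy future blocks at different points under different plans.

```r
library(future)

plan(sequential)
f <- future(Sys.sleep(1), lazy = TRUE)  # returns immediately (lazy)
v <- value(f)                           # blocks: evaluated in the main R process

plan(multisession, workers = 2)
f <- future(Sys.sleep(1), lazy = TRUE)  # returns immediately (lazy)
v <- value(f)                           # launched on a background worker; blocks
                                        # only until that worker finishes
```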
You are pretty close to the way I imagined it. However, my thinking with "will_it_block()" is more about whether or not it would block the process flow of the calling R script. For example, on a dual-core machine running something like the following sketch (the slow_task_*() calls stand in for the original expressions):
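```r
plan(multicore, workers = 2)  # assumed: a dual-core machine with forked workers
f1 <- future(slow_task_1())   # starts immediately on worker 1
f2 <- future(slow_task_2())   # starts immediately on worker 2
f3 <- future(slow_task_3())   # no free worker: this call itself blocks
```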
f3 would block until f1 or f2 finishes. For building a simple scheduler, I would want to know that calling f3 would block, and would therefore either keep the job queued for later or go and do something else first.
In my use case the view is: "I don't care when the future starts, I just want to queue it, and I will wait for a completion signal/poll to check if it has finished at a later time; until then I need to go and do something else first (e.g. tell the user the jobs are queued)". Obviously there still needs to be a check at a programmer level that understands whether queues are sensible (multicore, multisession, HPC scheduler, etc.) or are not (single core, sequential) and deals with those two cases, which they have to understand already. A sketch of that queue-and-poll pattern follows below.
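A minimal sketch of the queue-and-poll pattern using only the existing API (the plan, worker count, and sleep times are arbitrary; note that creating more futures than there are workers is exactly where future() itself would block):

```r
library(future)
plan(multisession, workers = 2)

# Queue as many jobs as there are workers; future() returns right away here.
jobs <- list(future(Sys.sleep(2)), future(Sys.sleep(3)))
message("jobs are queued")  # e.g. tell the user, then go do something else

# Poll later: resolved() returns immediately with TRUE/FALSE; it never blocks.
while (!all(vapply(jobs, resolved, logical(1)))) {
  Sys.sleep(0.1)  # ... do other work here ...
}
values <- lapply(jobs, value)
```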
PS. Note that I (= my extremely conservative inner self) am still trying to distill/identify exactly what the core Future API is, and I'm extremely careful about adding a new "feature" before it is well understood what that entails. One of the objectives of the core Future API is that it should handle all use cases and be supported by all backends (existing and future ones). Other features need to be "optional", since they cannot be supported everywhere - those will have to be part of the Extended/Optional Future API. This is what Issue #172 is about.
Forgot to say: you might have "blocking" dependencies due to communication. For example, launching a future on a remote system may block "for a substantial period of time" due to a poor internet connection - so what does it even mean that launching a future will "block"?
What about a method that answers the question "is there an idle worker?" or "will this block my main R session"? This could replace my hacky logic. Currently, I am accessing the number of possible workers using the code below, which does not seem robust:

```r
workers <- formals(future::plan("next"))$workers %||% parallelly::availableCores()
```

Is there a cleaner way to access how many workers are executing?

```r
# my current / hacky solution... (reaches into future's internal registry)
`%||%` <- function(x, y) if (is.null(x)) y else x
FutureRegistry <- future:::FutureRegistry  # internal, unexported API

worker_is_available <- function() {
  workers <- formals(future::plan("next"))$workers %||% parallelly::availableCores()
  # reach into the named `db` list and get the last entry
  reg <- tail(names(get("db", envir = environment(FutureRegistry))), 1)
  # Ask the registry for all futures currently occupying workers
  used <- length(FutureRegistry(reg, action = "list", earlySignal = FALSE))
  used < workers
}
```

My take on ...: this situation should always return ...
Unfortunately, this won't work if the number of workers is specified as a function, e.g.

```r
plan(multisession, workers = function(...) sample(2:4, size = 1))
```

Instead, just use:

```r
> nbrOfWorkers()
[1] 3
```

Which leads to ...
Maybe a first step toward this is to extend nbrOfWorkers() with a free argument, e.g.

```r
> nbrOfWorkers(free = TRUE)
[1] 2
```

Maybe that will be sufficient to get going here? Since the Future API is not really aware of the concept of a "worker", I'm not a big fan of having to program with such a function, but we might come up with a wrapper function in the future that does not mention "worker" - that can come later. Anyway, I'll try to explore whether it is possible to support this. I suspect that for the sequential backend we would get:

```r
> nbrOfWorkers(free = TRUE)
[1] 0
```
Great! (Sorry I missed the nbrOfWorkers() function 🤦)

This should work for me! In my situation, I could just check whether nbrOfWorkers(free = TRUE) > 0 before launching another future.

Thank you!
Hmmmm. I'm wondering if it should always return at least one, since the main R session itself can always take on a future.
I've just pushed a branch that implements this.
Yes, that might be better. In the NEWS draft, I wrote: ...

So, I've just pushed an update where an error is produced only if free = TRUE is asked for. I think the above is an example of why the "will-it-block" feature request is tricky. It probably depends on what you're after and why you're asking the question. I don't remember all the details, but we had discussions in the past on "asynchronousness" and whether that should be something one should be able to request/declare. I've got too little time and am too tired to revisit that right now, but it could be related to the use cases here.
I would disagree. If you look at threading as a model, the idea of a "worker" is something that can run concurrently (and often in a totally isolated context) with the parent/controller. Counting the controller as a worker can be counter-intuitive in this regard. Take the following:

```r
jobsToDo <- list(......)
jobsRun <- list()

while (TRUE) {
  if (nbrOfWorkers(free = TRUE) > 0) {
    job <- jobsToDo[[1]]
    jobsToDo <- jobsToDo[-1]   # pop the job off the queue
    jobsRun <- c(jobsRun, future(job))
  }
  processEvents() # may add more items to "jobsToDo" and may process the "jobsRun" list
  # maybe a small sleep here
}
```

Without looking at the module code, it is easy to grasp the concept: whilst there is a "worker" free, ask it to do a job, then let the main thread process any events (GUI, sockets, etc.). With a 1 instead of 0, there's a natural "why is there always a worker left free?". For my money, nbrOfWorkers(free = TRUE) should return 0 on a sequential plan (as you originally surmised), leaving the coder to decide if their code flow should allow for the main R thread to be blocked or not. In my eyes, it would be more acceptable for the main R process flow to continue and never do a background task than to become unresponsive processing background tasks.
I agree with Henrik on maybe needing different functions (or arguments) to answer different questions, as there can be multiple ways to interpret "what does blocking mean?". In your (@avsdev-cw) example, the code is used in a way that could be interpreted as "am I able to submit a new future job without waiting for a prior future to complete?", which is different from "when I submit this future job, will it be resolved in my main R session?". Using the same threading-model concept, shouldn't the sequential plan count as always having a free "worker"? If free means "I am able to submit a new future job (right now) without waiting for a prior future to complete", then the sequential plan should always be free.
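To make the two readings concrete, a minimal sketch under plan(sequential) (the sleep duration is arbitrary):

```r
library(future)
plan(sequential)

f <- future(Sys.sleep(5), lazy = TRUE)  # "free" reading: submitting never blocks
v <- value(f)                           # "background" reading: the work runs here,
                                        # in the main R session, and blocks it
```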
Another related discussion that revolves around whether the main process should do actual work or not is in good old Issue #7 (WISH: It should be possible to adjust the number of assigned cores).
FYI, I had to replace this approach; it turned out to be quite complicated to roll out nbrOfWorkers(free = TRUE) across all backends.
What about adding a nbrOfFreeWorkers() function instead, where

```r
> plan(sequential)
> nbrOfFreeWorkers(background = FALSE)
[1] 1
> nbrOfFreeWorkers(background = TRUE)
[1] 0
```

and

```r
> plan(cluster, workers = 1L)
> nbrOfFreeWorkers(background = FALSE)
[1] 1
> nbrOfFreeWorkers(background = TRUE)
[1] 1
```

?
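Applied to the earlier scheduler loop, a sketch of how this could be used (jobsToDo, jobsRun, and processEvents() are the same hypothetical pieces as above):

```r
while (TRUE) {
  # Hand work only to true background workers; never tie up the main session.
  if (length(jobsToDo) > 0 && nbrOfFreeWorkers(background = TRUE) > 0) {
    job <- jobsToDo[[1]]
    jobsToDo <- jobsToDo[-1]
    jobsRun <- c(jobsRun, future(job))
  }
  processEvents()  # may add more items to "jobsToDo", may process "jobsRun"
  Sys.sleep(0.05)  # small sleep to avoid busy-waiting
}
```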
This solves all the use cases I can think of. Kudos!
I've now merged this into the 'develop' branch, i.e. it will be part of the next release.
future 1.21.0 is now on CRAN, with nbrOfFreeWorkers().
In relation to issues #86 and #109, I believe it would be beneficial to have some mechanism for detecting whether calling future({}) (or value(f) on a lazy future) would block.

This would provide a number of high-level ways of building schedulers that wouldn't rely on the developer having to deal with the number of cores/cluster size/etc. to manage the queue.

Apologies if this already exists, but I didn't see anything documented/mentioned in the issues.