"Submission rate too high" with a large future_lapply #13
Comments
Apparently, there is a way to restrict the maximum number of jobs running at a time. It will probably be a SLURM environment variable.
What's missing: Internally, batchtools is supposed to apply an "exponential backoff between 5 and 120 seconds" between submissions, but I'm not sure what that really means. @mllg, does this mean that the sleep time grows exponentially from a minimum of 5 seconds to a maximum of 120 seconds between jobs? @wlandau-lilly, I have to think more about whether that is enough here.

Workaround for now: control via load balancing. The main rationale for the workaround is to submit fewer, larger jobs so that the scheduler's submission limit is never hit; see the sketch below.
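As an illustration of the load-balancing idea (a sketch, not code from this thread): assuming the future.apply package (where `future_lapply()` lives nowadays), the `batchtools_slurm` backend, and a made-up template file, the `future.chunk.size` argument makes one batchtools job handle many elements, which keeps the number of submissions low.

```r
## Sketch only: chunk the elements so that one batchtools job handles many
## of them, which keeps the number of submitted jobs small.
library(future.apply)        # future_lapply() has moved here from 'future'
library(future.batchtools)

plan(batchtools_slurm, template = "slurm.tmpl")  # template file name is assumed

xs <- seq_len(10000)
res <- future_lapply(
  xs,
  function(x) x^2,
  future.chunk.size = 100    # 10000 elements / 100 per chunk = 100 jobs in total
)
```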
Original comment: To update, I have been happily using …

Updated comment: The original version of this comment was plain wrong; the error just hadn't shown up yet. 500 jobs seem to fail, 300 seem to fail, and 200 seem to work fine. Even when sending more than 200, a bunch of jobs do start, and since drake is in charge, those resources aren't wasted.
@HenrikBengtsson, from drake's point of view, this so-called "workaround" is actually an ideal solution in its own right. Here, imports and targets are parallelized with different numbers of workers, which is the right approach for distributed parallelism.

```r
library(drake)
library(future.batchtools)
future::plan(batchtools_local(workers = 8))
# 4 jobs for imports, 8 jobs for targets:
make(my_plan, parallelism = "future_lapply", jobs = 4)
```

I will recommend this approach in the documentation shortly.
Yes. It was buried in the configuration, but you can also control it by setting the corresponding batchtools resource.
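A sketch of what that could look like, assuming the resource in question is batchtools' `max.concurrent.jobs` (documented for `batchtools::submitJobs`); the template file name and the limit of 200 are illustrative:

```r
## Sketch: cap the number of jobs batchtools keeps running or queued at once.
## `max.concurrent.jobs` is a resource documented in ?batchtools::submitJobs;
## it can also be set in the batchtools configuration file.
library(future)
library(future.batchtools)

plan(
  batchtools_slurm,
  template  = "slurm.tmpl",                      # assumed template file
  resources = list(max.concurrent.jobs = 200L)   # forwarded to submitJobs()
)
```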
Exactly. The sleep time for iteration i is `5 + 115 * pexp(i - 1, rate = 0.01)`. But note that I recently discovered a bug, so there was no sleeping at all 😞. There is currently no support for controlling the submission rate. I could, however, use the reported error message and treat the error as a temporary error, which then automatically leads to the described sleep mechanism.
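For intuition, a quick check of what that formula yields (not code from the thread): `pexp()` is the exponential CDF, so the sleep time starts at 5 seconds and levels off near 5 + 115 = 120 seconds.

```r
## Sleep time per submission attempt, as given by the formula above.
backoff <- function(i) 5 + 115 * pexp(i - 1, rate = 0.01)
round(backoff(c(1, 10, 50, 100, 500)))
#> roughly: 5 15 50 77 119  (seconds)
```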
This problem appears to be solved with the latest version of batchtools. Feel free to close.
Related to this issue: I've changed the default number of workers on HPC schedulers from …
My SLURM system got upset when submitting a large number of jobs ("Submission rate too high").

Perhaps one could solve this with an interface to the `sleep` option in `batchtools::submitJobs()`?
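A sketch of that suggested interface at the batchtools level; the registry location, the toy jobs, and the resource names are illustrative assumptions:

```r
## Sketch: pass a backoff function as the `sleep` argument of submitJobs().
## Registry location, the toy jobs, and the resource names are assumptions.
library(batchtools)

reg <- makeRegistry(file.dir = "registry", seed = 1)
batchMap(function(x) x^2, x = 1:1000, reg = reg)

submitJobs(
  reg       = reg,
  resources = list(walltime = 60, memory = 1024),
  sleep     = function(i) 5 + 115 * pexp(i - 1, rate = 0.01)  # ~5 s up to ~120 s
)
```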