Add resubmission function #203
Comments
As a simpler approach, clustermq could just keep a log of all jobs that have completed successfully (or just those that failed), and the user could then set a parameter in
It appears that clustermq cluster array jobs suffer from the issue that if one of the N parallel cluster jobs (e.g., In my case, this means that I currently only have 4 of 20 ( What do others do in cases where some of their hundreds or thousands of jobs fail? Do they always have to figure out which ones failed and then re-run just those jobs? That's potentially a lot of extra code just to identify the failed jobs and re-run only those (or just re-run everything).
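To make the "log the failures, re-run only those" idea concrete, here is a minimal, language-agnostic sketch written in Python. It is not clustermq code and does not use any clustermq API; `run_and_log` and the `indices` argument are hypothetical names chosen for illustration, and jobs run sequentially in-process rather than on a cluster.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


def run_and_log(
    fun: Callable[[Any], Any],
    inputs: List[Any],
    indices: Optional[Iterable[int]] = None,
) -> Tuple[Dict[int, Any], Dict[int, str]]:
    """Run `fun` on the selected inputs and log which job indices failed.

    Returns (results, failed), both keyed by job index. `failed` maps the
    index of each failed job to a short description of the error.
    """
    if indices is None:
        indices = range(len(inputs))
    results: Dict[int, Any] = {}
    failed: Dict[int, str] = {}
    for i in indices:
        try:
            results[i] = fun(inputs[i])
        except Exception as exc:  # record the failure instead of aborting the batch
            failed[i] = repr(exc)
    return results, failed


# Hypothetical flaky job: crashes the first time it sees an odd input.
_seen = set()


def flaky_square(x: int) -> int:
    if x % 2 == 1 and x not in _seen:
        _seen.add(x)
        raise RuntimeError(f"job for input {x} crashed")
    return x * x


inputs = list(range(6))
results, failed = run_and_log(flaky_square, inputs)

# Second pass: resubmit only the indices recorded in the failure log.
retry_results, still_failed = run_and_log(flaky_square, inputs, indices=sorted(failed))
results.update(retry_results)
```

After the second pass, `results` holds an entry for every job index and `still_failed` is empty, without re-running the jobs that already succeeded.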
originally posted in #153 by @nick-youngblut
I'm a big fan of snakemake, which allows for automatic resubmission of failed jobs with increased resources (e.g., `mem = lambda wildcards, threads, attempt: attempt * 8 # GB of memory, scaling linearly with the attempt number`). It would be really awesome to have that feature in clustermq. For example, one could provide a function instead of a value for the template:

One would also need a `max_attempts` parameter for `Q()`.
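As a rough illustration of the requested behaviour, here is a small Python sketch of retrying a job with resources computed from the attempt number. It is only a sketch under stated assumptions: `resolve_template` and `run_with_retries` are hypothetical helpers, not part of clustermq or snakemake, and the "job" is a plain function that raises `MemoryError` when it is given too little memory.

```python
from typing import Any, Callable, Dict, Tuple

Template = Dict[str, Any]  # values are either constants or functions of `attempt`


def resolve_template(template: Template, attempt: int) -> Dict[str, Any]:
    """Evaluate callable template values for the given attempt number."""
    return {
        key: (value(attempt) if callable(value) else value)
        for key, value in template.items()
    }


def run_with_retries(
    job: Callable[[Dict[str, Any]], Any],
    template: Template,
    max_attempts: int = 3,
) -> Tuple[Any, int]:
    """Run `job`, resubmitting with re-evaluated resources on MemoryError.

    Returns (result, attempt) for the first attempt that succeeds.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        resources = resolve_template(template, attempt)
        try:
            return job(resources), attempt
        except MemoryError as exc:  # treat as "killed for exceeding memory"
            last_error = exc
    raise RuntimeError(f"job failed after {max_attempts} attempts") from last_error


# Hypothetical job that needs at least 20 GB to finish.
def needs_20gb(resources: Dict[str, Any]) -> str:
    if resources["memory"] < 20:
        raise MemoryError(f"{resources['memory']} GB is not enough")
    return "done"


# `memory` is a function of the attempt number; `cores` stays constant.
template = {"memory": lambda attempt: attempt * 8, "cores": 2}
result, attempts_used = run_with_retries(needs_20gb, template, max_attempts=3)
```

With this template the job is tried with 8, 16, and then 24 GB, so it succeeds on the third attempt; with `max_attempts=2` it would give up and raise instead.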