Retry jobs which fail or have gone missing #169

Draft
wants to merge 2 commits into
base: 1.x
from

Conversation

fredden
Collaborator

@fredden fredden commented Mar 14, 2022

This is a feature suggestion. When jobs fail or are detected as 'gone away', it would be helpful to be able to retry them automatically. I've set the defaults to ignore 'failed' jobs but to retry 'missing' jobs automatically.
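
A minimal sketch of those defaults, for illustration only. It is written in Python rather than the project's own language, and the status names and the `should_retry` helper are hypothetical, not taken from this PR's code:

```python
from enum import Enum


class JobStatus(Enum):
    # Hypothetical status names; the real schedule records may use other values.
    SUCCESS = "success"
    FAILED = "failed"
    MISSING = "missing"


# Defaults described above: leave failed jobs alone, retry missing ones.
DEFAULT_RETRY_POLICY = {
    JobStatus.FAILED: False,
    JobStatus.MISSING: True,
}


def should_retry(status, policy=DEFAULT_RETRY_POLICY):
    """Return True when a job with this status should be rescheduled."""
    # Any status not listed in the policy is never retried.
    return policy.get(status, False)
```

Keeping the policy as a plain mapping would make it straightforward to override either default from configuration.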

I'm happy to discuss this / receive feedback about this feature.

@Ethan3600 Ethan3600 self-requested a review March 16, 2022 00:31
@Ethan3600
Owner

@fredden Love this idea!

My only concern is the scenario where someone has a custom job that's failing for some arbitrary reason (let's say some I/O operation fails, and it's very taxing on the system). Maybe there should be a max-retry limit as well? Not sure how we'd want to persist that (or even if we'd want to), as we'd now have to keep track of how many times an arbitrary job has run.

We just don't want to bring an instance to a halt because we keep retrying something that's a super heavy lift. I think that's my only concern.

What do you think? Other than that, I think this is a great addition.

@fredden
Collaborator Author

fredden commented Apr 5, 2022

I've thought about this some more. I think these probably need to be configuration options for each group (not global), and be off by default for all. It also makes sense to have a retry limit per job; working out when to reset this counter may be tricky.

I agree that we don't want to create a situation where a resource-intensive process crashes on repeat and this feature leads to an outage / denial of service.
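
The per-group options and per-job retry limit discussed above could be sketched roughly as follows. This is an illustration in Python, not the PR's implementation: `GroupRetryConfig`, `RetryTracker`, the default limit of 3, and the "reset on success" rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class GroupRetryConfig:
    # Hypothetical per-group settings; everything is off by default.
    retry_failed: bool = False
    retry_missing: bool = False
    max_retries: int = 3  # assumed limit, not taken from the PR


@dataclass
class RetryTracker:
    configs: dict = field(default_factory=dict)   # group name -> GroupRetryConfig
    attempts: dict = field(default_factory=dict)  # job code -> retries used so far

    def should_retry(self, group, job_code, status):
        # Groups without explicit configuration fall back to "never retry".
        config = self.configs.get(group, GroupRetryConfig())
        enabled = {"failed": config.retry_failed, "missing": config.retry_missing}
        if not enabled.get(status, False):
            return False
        used = self.attempts.get(job_code, 0)
        if used >= config.max_retries:
            # Limit reached: stop retrying so a heavy job cannot crash on repeat.
            return False
        self.attempts[job_code] = used + 1
        return True

    def record_success(self, job_code):
        # One possible reset rule: clear the counter after a successful run.
        self.attempts.pop(job_code, None)
```

For example, a group configured with `GroupRetryConfig(retry_missing=True, max_retries=2)` would allow two retries of a missing job and refuse the third until `record_success` clears the counter. When to reset the counter remains the open question; resetting on success is only one option.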

@fredden fredden marked this pull request as draft April 5, 2022 11:16