Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds (optional) automatic retries with exponential backoff to jobs #777

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jhilden
Copy link

@jhilden jhilden commented Feb 28, 2019

Define a job with your intended maximum number of retries and agenda will take care of automatically rerunning the job in case of a failure.

agenda.define('job with retries', { maxRetries: 2 },

The job is retried with an exponentially increasing delay to avoid too high load on your queue.

The formula for the backoff is copied from Sidekiq and includes a random element.

These would be some possible example values for the delay:

retry # delay in s
1 27
2 66
3 118
4 346
6 727
7 1366
8 2460
9 4379
10 6613
11 10288
12 14977
13 20811
14 28636
15 38554
16 50830
17 65803
18 83625

Fixes #123

Would you consider adding such a feature to agenda?

Define a job with your intended maximum number of retries and agenda will take care of automatically rerunning the job in case of a failure.

`agenda.define('job with retries', { maxRetries: 2 },`

The job is retried with an exponentially increasing delay to avoid too high load on your queue.

The formula for the backoff is copied from [Sidekiq](https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry) and includes a random element.

These would be some possible example values for the delay:

|retry #|delay in s|
|---| --- |
| 1 | 27 |
| 2 | 66 |
| 3 | 118 |
| 4 | 346 |
| 6 | 727 |
| 7 | 1366 |
| 8 | 2460 |
| 9 | 4379 |
| 10 | 6613 |
| 11 | 10288 |
| 12 | 14977 |
| 13 | 20811 |
| 14 | 28636 |
| 15 | 38554 |
| 16 | 50830 |
| 17 | 65803 |
| 18 | 83625 |

Fixes agenda#123
@santiq
Copy link

santiq commented Feb 28, 2019

Hey @jhilden interesting feature that you made it adds more stability and make jobs more robust, I'm relying on agenda.js for a couple of projects so this seems something that I would use :)

I hope that the maintainers can review it and possibly merge it. If they don't reply you, you may try to speak with them on the slack channel.
Beware that CI tests may be broken because of a small linter issue #701

@jhilden
Copy link
Author

jhilden commented Mar 15, 2019

It would be great if I could get some feedback on this draft PR, so I can decide whether to put more time into it.

If there should be no interest in merging such a feature, I will instead spend that time on simply building a custom retry solution in our private project.

@guillegette
Copy link

I think this is pretty solid. How can we get some attention from the maintainers ? @dandv @santiq

@alex-hall
Copy link
Contributor

@dandv @santiq We use Agenda heavily in production (30+ jobs that run hourly), and we're struggling with building a sane alerting policy that doesn't go off constantly in the middle of the night.

Having an intelligent retry would save our support teams a lot of headache.

@santiq
Copy link

santiq commented Apr 5, 2019

@alex-hall @guillegette

I use agenda in production too, and with more than 100+ jobs running each day, so this concerns me as well.

But I'm afraid that I don't have merge access, and I asked to be a maintainer several times but they didn't reply or they don't want me, I don't know.

What you can do is enter the slack channel and ask there to the maintainers if they could have a look at this.

Or just fork the repo and use that on your package.json.

@alex-hall
Copy link
Contributor

Thanks for the response @santiq. I actually posted into the slack at 2:40 PM EST today, so we'll see if I get a response.

Otherwise, I may have to jump ship to bull or something with this feature already :(

@dandv
Copy link
Contributor

dandv commented Apr 5, 2019

Before accepting any code PRs, we need to fix the CI. That would open the gates to resuming Agenda development.

@guillegette
Copy link

Thanks @dandv the fix for the CI seems to be a all lint problem, would you guys review a PR for that? Are you guys looking for sponsors ?

@alex-hall
Copy link
Contributor

alex-hall commented Apr 5, 2019

@gaboesquivel @dandv please see #798 which fixes CI linter errors

@alex-hall
Copy link
Contributor

Bumping this. The PR above is merged and CI is fixed.

@guillegette
Copy link

Any update about this?

@guillegette
Copy link

@dandv @gaboesquivel is there anything we can do guys to help you move this forward? are you guys accepting other contributors? I understand that you may not have time but there is enough people interested in this community to help :)

@ndraiman
Copy link

ndraiman commented Jul 6, 2019

Thank you @jhilden
Using code copied from the PR as a workaround while waiting for the PR to merge:

import * as agenda from 'agenda';
import * as moment from 'moment';

const maxRetries = 5;

const agenda = new Agenda();
agenda.define('MyJob', (job, done) => {...});
agenda.on('fail:MyJob', onJobFailure);

async function onJobFailure(error: any, job: Agenda.Job<any>) {
    const retryCount = job.attrs.failCount - 1;
    if (retryCount <= maxRetries) {
        job.attrs.nextRunAt = calcExponentialBackoff(retryCount);;
        await job.save();
    }
}

function calcExponentialBackoff(retryCount: number): Date {
    const waitInSeconds = Math.pow(retryCount, 4) + 15 + Math.random() * 30 * (retryCount + 1);
    return moment()
        .add(waitInSeconds, 'seconds')
        .toDate();
}

@hosnas
Copy link

hosnas commented Oct 27, 2019

Isnt this going to be merged? 🤔

@hosnas
Copy link

hosnas commented Nov 10, 2019

we need this pull request to be merged
@simison

@AgustinQuetto
Copy link

Hello guys. It will be merged? Because I want a feature like this. Temporary I'm making my own feature. Thanks.

@salmanahmad94
Copy link

Any updates on this merge?

@benheller
Copy link

👍 I'm also tracking this PR, and would love to see it integrated into Agenda. Thanks!

@ScreamZ
Copy link

ScreamZ commented May 1, 2020

+1

@simison @OmgImAlexis @rschmukler

Guys could you at least give some vision about such feature implementation or confirm the pattern ?

Thanks a lot and stay safe :)

Thank you @jhilden
Using code copied from the PR as a workaround while waiting for the PR to merge:

import * as agenda from 'agenda';
import * as moment from 'moment';

const maxRetries = 5;

const agenda = new Agenda();
agenda.define('MyJob', (job, done) => {...});
agenda.on('fail:MyJob', onJobFailure);

async function onJobFailure(error: any, job: Agenda.Job<any>) {
    const retryCount = job.attrs.failCount - 1;
    if (retryCount <= maxRetries) {
        job.attrs.nextRunAt = calcExponentialBackoff(retryCount);;
        await job.save();
    }
}

function calcExponentialBackoff(retryCount: number): Date {
    const waitInSeconds = Math.pow(retryCount, 4) + 15 + Math.random() * 30 * (retryCount + 1);
    return moment()
        .add(waitInSeconds, 'seconds')
        .toDate();
}

@jaschaio
Copy link

Any reason this PR isn't getting any attention despite lots of interest? @simison @OmgImAlexis @rschmukler

@@ -18,5 +19,15 @@ module.exports = function(reason) {
this.attrs.failedAt = now;
this.attrs.lastFinishedAt = now;
debug('[%s:%s] fail() called [%d] times so far', this.attrs.name, this.attrs._id, this.attrs.failCount);
const retryCount = this.attrs.failCount - 1;
if (retryCount <= this.attrs.maxRetries) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is wrong. it should be < and not <= . in my adaptation if you modify your test to maxRetry 0 it would still pass the test even though the test should fail.

@Uzlopak
Copy link
Contributor

Uzlopak commented May 26, 2020

imho: maxRetries should be part of the definition and not part of the task

@@ -3,3 +3,4 @@ coverage.html
.idea
.DS_Store
docs
package-lock.json

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package-lock.json should be kept in VCS as per https://docs.npmjs.com/configuring-npm/package-lock-json.html

@zizzfizzix
Copy link

zizzfizzix commented May 30, 2020

imho: maxRetries should be part of the definition and not part of the task

I think it's better to provide it when scheduling rather than as part of definition e.g. agenda.now('newTask', { maxRetries: 5 });

@salmanahmad94
Copy link

imho: maxRetries should be part of the definition and not part of the task

I think it's better to provide it when scheduling rather than as part of definition e.g. agenda.now('newTask', { maxRetries: 5 });

Yup. That's what I'm doing.

@CoolHandLuke88
Copy link

Hey all, first I would like to say thank you for all the hard work on this project!

Are there any updates regarding this feature? If not what would it take to get this across the finish line?

@code-xhyun
Copy link

It's a shame that these essential functions are missing.
Pulse, the fork project of agenda, provides that functionality.

Check out this docs page -> Docs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

How to restart job after failing?