Sudden reconcile failures/progress failure #3418
Comments
Did the controller restart around the time it was fixed? |
As far as I can tell, it didn't. Digging further, here is a bit more info, which is weird: These seem to be clustered together, and they also occur throughout: before, during, and after the bad notifs.
Also seeing these in the logs:
The first one lines up directly with the first 'wrong' notification received, and the last one in the logs lines up with the last 'wrong' notif received. And the same for
Which is weird, since this trigger is configured. However, looking at when the notifs start, this coincides with the first 'false alarm' notif:
So both of the triggers it says aren't configured are in fact configured. But beyond that, nothing stands out. |
I also stumbled upon this. Do you happen to have HPA? The only way I was able to fix this was to restart the Argo Rollouts controller.
|
@Laakso Yes, we had HPA on this service. |
I suspect it has something to do with this issue. It is hard to verify, though. |
So this is a fun one :D
and these (which I assume is why we started seeing a lot of these cancelled notifs)
This went on for hours. Here is a sample, with no real pattern timing-wise:
And then, 11 hours after the first message, it just stopped. It started at 16:55 and ended at 03:55.
To be clear, there was no rollout in progress. We have more than 100 rollouts in the cluster, but this was the only one where this happened.
Can't reproduce. It just 'happened'.
Checklist:
Version
1.6.6
Logs
In the body. It was just those, over and over.
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.