How to restore the Cortex operator when too many jobs are requested #2394
The first thing I would try is … If the … If that doesn't work, there are ways to "fix" that weird state, but they still require a lot of manual intervention, or eventually an automated script. When you create a BatchAPI job, this happens:
To fix that weird state, you have to:
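Whatever the exact cleanup steps are, issuing thousands of stop or delete requests at once may put the same kind of pressure on the operator as the original submissions. Below is a minimal Python sketch of throttled cleanup. It is not Cortex's own tooling. It assumes you already have the list of job IDs and a hypothetical `stop_job` callable that stops a single job. How you implement that callable depends on your Cortex version and setup, so it is not shown. Jobs are stopped in small batches with a pause between batches, and failures are collected for a later retry instead of aborting the run.

```python
import time
from typing import Callable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def stop_jobs_throttled(
    job_ids: List[str],
    stop_job: Callable[[str], None],  # hypothetical: stops one job
    batch_size: int = 20,
    pause_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Stop jobs a few at a time, pausing between batches.

    Returns the IDs whose stop call raised an exception, so they can be
    retried later instead of aborting the whole cleanup.
    """
    failed: List[str] = []
    batches = list(chunked(job_ids, batch_size))
    for index, batch in enumerate(batches):
        for job_id in batch:
            try:
                stop_job(job_id)
            except Exception:
                failed.append(job_id)
        # Give the operator time to catch up before the next batch,
        # but don't wait after the last one.
        if index < len(batches) - 1:
            sleep(pause_seconds)
    return failed


if __name__ == "__main__":
    stopped: List[str] = []
    ids = [f"job-{i}" for i in range(45)]
    failed = stop_jobs_throttled(ids, stopped.append, batch_size=20, pause_seconds=0.0)
    print(len(stopped), failed)  # 45 []
```

A smaller `batch_size` or a longer `pause_seconds` trades cleanup speed for less pressure on the operator. The `sleep` parameter only exists so the pacing can be swapped out in tests.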
Hello,
I'm currently using Cortex 0.40.0.
Occasionally I submit thousands of jobs to a Cortex API by mistake.
When that happens, I can't use the Cortex CLI properly (responses take a very long time or just hang), and I suspect the Cortex operator is overloaded by my requests.
(The operator-controller-manager pod keeps going from OOMKilled to CrashLoopBackOff.)
To resolve this, I have tried several things so far, but none of them worked well.
In the end, I just took the cluster down and brought it back up (and redeployed all of my APIs) to get Cortex working again.
If this happens again, what should I do to restore Cortex without tearing down and recreating the cluster?
I'd appreciate your support. Thank you so much.
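In this case the trigger was an accidental request for thousands of jobs, so a client-side limit could stop such a request before it ever reaches the operator. Here is a minimal Python sketch. It assumes a hypothetical `submit_job` callable that submits one BatchAPI job and returns its job ID. That callable is not part of the Cortex client. The `max_jobs=50` default is an arbitrary illustrative value, not a Cortex limit.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class TooManyJobsError(RuntimeError):
    """Raised when a submission would exceed the configured safety limit."""


def submit_jobs_guarded(
    job_payloads: Sequence[T],
    submit_job: Callable[[T], str],  # hypothetical: submits one job, returns its ID
    max_jobs: int = 50,  # illustrative limit, not a Cortex setting
    force: bool = False,
) -> List[str]:
    """Submit jobs one by one, refusing oversized submissions up front.

    The size check runs before anything is submitted, so a mistaken request
    for thousands of jobs never reaches the operator unless force=True.
    """
    if len(job_payloads) > max_jobs and not force:
        raise TooManyJobsError(
            f"refusing to submit {len(job_payloads)} jobs (limit is {max_jobs}); "
            "pass force=True if this is intentional"
        )
    return [submit_job(payload) for payload in job_payloads]
```

Passing `force=True` keeps intentional large runs possible, while the mistaken case fails loudly before any request is sent.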