cluster-autoscaler has a panic in NodeDeletionBatcher.AddNode #5891
Comments
Made a pass at fixing it. Still not 100% sure, but I think it should hopefully be as simple as this: #5892

Thanks for the issue and the PR. Can you reproduce the issue consistently? I don't think CA supports running multiple instances in the same cluster. Maybe things are working because you are using two different cloud providers? I haven't seen a case where someone is running multiple CA instances. If the issue is hard to reproduce, I wonder if it's because of a race condition (I would expect some degree of separation in execution between two instances, though).

Yep, our CA container is restarting pretty consistently with this panic. I'm not sure running 2 CA instances has much to do with it; if you look at my PR with the proposed fix, it just looks like we don't have a nil check.
Which component are you using?:
cluster-autoscaler
What version of the component are you using?:
Component version: v1.27.1
What k8s version are you using (`kubectl version`)?:
What environment is this in?:
AWS via cluster-api, cluster-api-provider-aws (MachinePool), and the aws cluster-autoscaler provider
What did you expect to happen?:
No panic
What happened instead?:
Panic:
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
We have 2 different cluster-autoscaler instances running: 1 to manage the `MachineDeployment` resources using the `cluster-api` provider, and 1 to manage the `MachinePool` resources using the `aws` provider. This is a workaround until kubernetes-sigs/cluster-api-provider-aws#4184 is complete and the `cluster-api` provider can be used for `MachinePool` resources.

It looks like it might be due to calling `nodeGroup.Id()` in autoscaler/cluster-autoscaler/core/scaledown/actuation/delete_in_batch.go (line 77 in 62d9c94). `nodeGroup` might be `nil`, but I haven't confirmed.
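To illustrate the suspected failure mode, here is a minimal, self-contained Go sketch of the kind of guard the linked PR proposes. The `NodeGroup` interface and `addNode` function below are simplified stand-ins for the real `cloudprovider.NodeGroup` and `NodeDeletionBatcher.AddNode`, not the actual cluster-autoscaler code: the point is that when the provider cannot map a node to a group (e.g. the node belongs to the other autoscaler instance's provider), the group is `nil`, and calling `Id()` on it panics unless it is checked first.

```go
package main

import "fmt"

// NodeGroup is a minimal stand-in for cloudprovider.NodeGroup;
// only Id() is needed for this sketch.
type NodeGroup interface {
	Id() string
}

// addNode sketches the guard: if the lookup for a node's group
// returned nil, report an error instead of calling Id() on a
// nil value and panicking.
func addNode(nodeGroup NodeGroup, nodeName string) error {
	if nodeGroup == nil {
		return fmt.Errorf("no node group found for node %s", nodeName)
	}
	fmt.Printf("batching %s for deletion in group %s\n", nodeName, nodeGroup.Id())
	return nil
}

func main() {
	// A nil node group no longer panics; it surfaces as an error.
	if err := addNode(nil, "ip-10-0-0-1"); err != nil {
		fmt.Println("error:", err)
	}
}
```

With the guard in place, a node that no configured provider claims is skipped with an error rather than crashing the container, which matches the restart behavior described above.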