
cluster-autoscaler has a panic in NodeDeletionBatcher.AddNode #5891

Description

@com6056

Which component are you using?:
cluster-autoscaler

What version of the component are you using?:

Component version: v1.27.1

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:41:01Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:32:02Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:
AWS via cluster-api, cluster-api-provider-aws (MachinePool), and the aws cluster-autoscaler provider

What did you expect to happen?:
No panic

What happened instead?:
Panic:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x46d3b08]

goroutine 3346 [running]:
k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation.(*NodeDeletionBatcher).AddNode(0xc0074eb900, 0xc0047c7340, 0xaf?)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation/delete_in_batch.go:77 +0xa8
k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation.(*Actuator).scheduleDeletion(0xc0061a4320, 0xc000a247d0?, {0xc007493a70, 0xf}, 0x88?)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation/actuator.go:363 +0x85
created by k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation.(*Actuator).deleteNodesAsync
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/scaledown/actuation/actuator.go:296 +0xc95

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
We have two different cluster-autoscaler instances running: one managing the MachineDeployment resources using the cluster-api provider, and one managing the MachinePool resources using the aws provider. This is a workaround until kubernetes-sigs/cluster-api-provider-aws#4184 is complete and the cluster-api provider can be used for MachinePool resources.

It looks like it might be due to calling nodeGroup.Id() in

CleanUpAndRecordFailedScaleDownEvent(d.ctx, node, nodeGroup.Id(), drain, d.nodeDeletionTracker, "", result)

at a point where nodeGroup might be nil, but I haven't confirmed that.
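For illustration, here is a minimal, self-contained sketch of the suspected failure mode and a possible guard. It assumes deleteNode can return a nil nodeGroup together with an error; the `NodeGroup` interface and `safeNodeGroupId` helper below are trimmed stand-ins for this issue, not the upstream types, and this is not the actual fix.

```go
package main

import "fmt"

// NodeGroup is a trimmed stand-in for cloudprovider.NodeGroup, reduced to the
// one method involved in the panic.
type NodeGroup interface {
	Id() string
}

// safeNodeGroupId is a hypothetical helper illustrating the missing guard:
// if the node's group cannot be resolved and nodeGroup comes back nil,
// calling nodeGroup.Id() directly dereferences nil and panics.
func safeNodeGroupId(ng NodeGroup) string {
	if ng == nil {
		return ""
	}
	return ng.Id()
}

func main() {
	var ng NodeGroup // nil, as when the node's group cannot be resolved
	// ng.Id() here would panic with "invalid memory address or nil pointer dereference".
	fmt.Println("node group id:", safeNodeGroupId(ng)) // prints an empty id instead of panicking
}
```

If that hypothesis is right, the quoted call site would need the same kind of guard, passing an empty or placeholder id to CleanUpAndRecordFailedScaleDownEvent when nodeGroup is nil instead of calling Id() unconditionally.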
