Cluster Autoscaler Does Not Scale Up Next Node Group After Backoff #340

Open
borg-z opened this issue Dec 3, 2024 · 0 comments

borg-z commented Dec 3, 2024

What happened:

In our AWS Kubernetes cluster, we run the cluster autoscaler with the priority expander and have multiple Node Groups (NGs) configured with the appropriate priorities and node selectors (a representative priority-expander ConfigMap is sketched after this list):

  • NG1: Uses Spot Instances with higher priority.
  • NG2: Uses On-Demand Instances with lower priority.
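
A sketch of the priority-expander ConfigMap referenced above (the node-group name patterns and priority values are illustrative, not our exact configuration; higher numbers are tried first):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-autoscaler-priority-expander
      namespace: kube-system
    data:
      priorities: |-
        100:
          - .*spot.*         # matches NG1 (Spot), tried first
        50:
          - .*on-demand.*    # matches NG2 (On-Demand), used as fallback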

When we deploy pods that require scaling up NG1 (Spot Instances) and AWS lacks sufficient Spot capacity, the autoscaler attempts to scale up NG1. After the scale-up times out (e.g., after 5 minutes), NG1 goes into backoff. However, the existing pods remain Pending, and the autoscaler does not attempt to scale up NG2, even though it is the next NG in the priority list.

Only when we create new pods does the autoscaler recognize that there are unschedulable pods, notice that NG1 is in backoff, and correctly scale up NG2 according to the priority expander. If no new pods are created, the autoscaler never tries to switch to NG2, and the pending pods remain unscheduled indefinitely.
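
For context, this behavior is with the priority expander enabled and a roughly 5-minute provision timeout. A sketch of the relevant cluster-autoscaler container arguments (values are illustrative; the rest of the Deployment spec is omitted):

    # excerpt from the cluster-autoscaler container spec (illustrative values)
    args:
    - --expander=priority              # choose node groups via the priority ConfigMap
    - --max-node-provision-time=5m     # scale-up attempts time out (and back off) after 5 minutes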

What you expected to happen:

We expected that after NG1 goes into backoff due to the scale-up timeout, the autoscaler would immediately attempt to scale up NG2 to schedule the pending pods. The autoscaler should recognize that the existing pods are still unschedulable and continue trying to schedule them on other available NGs, respecting the configured priorities and node selectors.

How to reproduce it (as minimally and precisely as possible):

  1. Set up the cluster:

    • Deploy a Kubernetes cluster on AWS.
    • Install the cluster autoscaler with the priority expander.
    • Ensure node group priorities and node selectors are configured correctly.
  2. Create Node Groups:

    • NG1 (High Priority): Uses Spot Instances.
    • NG2 (Low Priority): Uses On-Demand Instances.
  3. Deploy pods:

    • Deploy pods that require scaling up NG1 (an example manifest is sketched after this list).
  4. Simulate Spot Instance unavailability:

    • Ensure AWS cannot provide the Spot Instances (e.g., select an instance type that's unavailable as Spot).
  5. Observe autoscaler behavior:

    • Autoscaler attempts to scale up NG1.
    • After the timeout, NG1 goes into backoff.
    • Existing pods remain in Pending state.
    • Autoscaler does not attempt to scale up NG2.
  6. Create new pods:

    • Deploy additional pods.
  7. Observe autoscaler behavior again:

    • Autoscaler recognizes unschedulable pods.
    • Notices NG1 is in backoff.
    • Scales up NG2 as per priority configuration.
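
An example of the kind of pods used in step 3 (a minimal sketch; the nodeSelector label and the resource requests are illustrative, and the label is assumed to be present on the nodes of both NG1 and NG2 so either group can satisfy it):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: scale-test
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: scale-test
      template:
        metadata:
          labels:
            app: scale-test
        spec:
          nodeSelector:
            workload-class: batch          # illustrative label carried by nodes of both NG1 and NG2
          containers:
          - name: pause
            image: registry.k8s.io/pause:3.9
            resources:
              requests:
                cpu: "2"                   # large enough that existing nodes cannot fit the new pods
                memory: 4Gi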

Anything else we need to know:

  • Priority Expander Configuration:

    • Priorities of the node groups are set correctly.
    • Node selectors and labels are configured properly, allowing pods to be scheduled on either NG.
  • Issue Summary:

    • The autoscaler does not switch to the next NG after the first NG goes into backoff unless new pods are created.
    • This means pending pods remain unscheduled indefinitely if no new pods are added.
    • The mechanism of switching NGs works only when new pods are created.
  • Relevant Logs Demonstrating the Behavior:

    1. Autoscaler attempts to scale up NG1 (Spot Instances):

      I1203 07:12:44.200968       1 scale_up.go:531] Best option to resize: NG1 (Spot Instances)
      I1203 07:12:44.201066       1 scale_up.go:534] Estimated 1 node(s) needed in NG1
      I1203 07:12:44.201183       1 scale_up.go:658] Scaling group NG1 size from 0 to 1
      
    2. NG1 goes into backoff after timeout:

      W1203 07:17:44.625427       1 clusterstate.go:272] Scale-up timed out for node group NG1 after 5m0.429152893s
      W1203 07:17:44.625476       1 clusterstate.go:307] Disabling scale-up for node group NG1 until <future time>; errorClass=Other; errorCode=timeout
      
    3. Autoscaler reports "No unschedulable pods" despite Pending pods:

      I1203 07:18:54.855439       1 static_autoscaler.go:567] No unschedulable pods
      
    4. After creating new pods, autoscaler scales up NG2 (On-Demand Instances):

      I1203 08:03:57.257735       1 priority.go:166] priority expander: NG2 (On-Demand Instances) chosen as the highest available
      I1203 08:03:57.257745       1 scale_up.go:531] Best option to resize: NG2
      I1203 08:03:57.257753       1 scale_up.go:534] Estimated 1 node(s) needed in NG2
      I1203 08:03:57.257867       1 scale_up.go:658] Scaling group NG2 size from 0 to 1
      

Environment:

  • Autoscaler: 1.30.1
  • MCM: 0.36 (we cannot upgrade it for important reasons). Maybe that is the problem, but I was hoping that, even without early-backoff support, switching to the next NG would still work after the scale-up timeout.

Thank you for your help in resolving this issue.

borg-z added the kind/bug label Dec 3, 2024