Cluster Autoscaler Does Not Scale Up Next Node Group After Backoff #340

Open
borg-z opened this issue Dec 3, 2024 · 0 comments

borg-z commented Dec 3, 2024

What happened:

In our AWS Kubernetes cluster, we run the cluster autoscaler with the priority expander and have multiple Node Groups (NGs) configured with the appropriate priorities and node selectors (a representative priority-expander ConfigMap is sketched after this list):

  • NG1: Uses Spot Instances with higher priority.
  • NG2: Uses On-Demand Instances with lower priority.
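
A sketch of the priority-expander ConfigMap referenced above (the node-group name patterns and priority values are illustrative, not our exact configuration; higher numbers are tried first):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-autoscaler-priority-expander
      namespace: kube-system
    data:
      priorities: |-
        100:
          - .*spot.*         # matches NG1 (Spot), tried first
        50:
          - .*on-demand.*    # matches NG2 (On-Demand), used as fallback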

When we deploy pods that require scaling up NG1 (Spot Instances) and AWS lacks sufficient Spot capacity, the autoscaler attempts to scale up NG1. After the scale-up times out (e.g., after 5 minutes), NG1 goes into backoff. However, the existing pods remain Pending, and the autoscaler does not attempt to scale up NG2, even though it is the next NG in the priority list.

Only when we create new pods does the autoscaler recognize that there are unschedulable pods, notice that NG1 is in backoff, and correctly scale up NG2 according to the priority expander. If no new pods are created, the autoscaler never tries to switch to NG2, and the pending pods remain unscheduled indefinitely.
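
For context, this behavior is with the priority expander enabled and a roughly 5-minute provision timeout. A sketch of the relevant cluster-autoscaler container arguments (values are illustrative; the rest of the Deployment spec is omitted):

    # excerpt from the cluster-autoscaler container spec (illustrative values)
    args:
    - --expander=priority              # choose node groups via the priority ConfigMap
    - --max-node-provision-time=5m     # scale-up attempts time out (and back off) after 5 minutes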

What you expected to happen:

We expected that after NG1 goes into backoff due to the scale-up timeout, the autoscaler would immediately attempt to scale up NG2 to schedule the pending pods. The autoscaler should recognize that the existing pods are still unschedulable and continue trying to schedule them on other available NGs, respecting the configured priorities and node selectors.

How to reproduce it (as minimally and precisely as possible):

  1. Set up the cluster:

    • Deploy a Kubernetes cluster on AWS.
    • Install the cluster autoscaler with the priority expander.
    • Ensure node group priorities and node selectors are configured correctly.
  2. Create Node Groups:

    • NG1 (High Priority): Uses Spot Instances.
    • NG2 (Low Priority): Uses On-Demand Instances.
  3. Deploy pods:

    • Deploy pods that require scaling up NG1 (an example manifest is sketched after this list).
  4. Simulate Spot Instance unavailability:

    • Ensure AWS cannot provide the Spot Instances (e.g., select an instance type that's unavailable as Spot).
  5. Observe autoscaler behavior:

    • Autoscaler attempts to scale up NG1.
    • After the timeout, NG1 goes into backoff.
    • Existing pods remain in Pending state.
    • Autoscaler does not attempt to scale up NG2.
  6. Create new pods:

    • Deploy additional pods.
  7. Observe autoscaler behavior again:

    • Autoscaler recognizes unschedulable pods.
    • Notices NG1 is in backoff.
    • Scales up NG2 as per priority configuration.
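
An example of the kind of pods used in step 3 (a minimal sketch; the nodeSelector label and the resource requests are illustrative, and the label is assumed to be present on the nodes of both NG1 and NG2 so either group can satisfy it):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: scale-test
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: scale-test
      template:
        metadata:
          labels:
            app: scale-test
        spec:
          nodeSelector:
            workload-class: batch          # illustrative label carried by nodes of both NG1 and NG2
          containers:
          - name: pause
            image: registry.k8s.io/pause:3.9
            resources:
              requests:
                cpu: "2"                   # large enough that existing nodes cannot fit the new pods
                memory: 4Gi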

Anything else we need to know:

  • Priority Expander Configuration:

    • Priorities of the node groups are set correctly.
    • Node selectors and labels are configured properly, allowing pods to be scheduled on either NG.
  • Issue Summary:

    • The autoscaler does not switch to the next NG after the first NG goes into backoff unless new pods are created.
    • This means pending pods remain unscheduled indefinitely if no new pods are added.
    • The mechanism of switching NGs works only when new pods are created.
  • Relevant Logs Demonstrating the Behavior:

    1. Autoscaler attempts to scale up NG1 (Spot Instances):

      I1203 07:12:44.200968       1 scale_up.go:531] Best option to resize: NG1 (Spot Instances)
      I1203 07:12:44.201066       1 scale_up.go:534] Estimated 1 node(s) needed in NG1
      I1203 07:12:44.201183       1 scale_up.go:658] Scaling group NG1 size from 0 to 1
      
    2. NG1 goes into backoff after timeout:

      W1203 07:17:44.625427       1 clusterstate.go:272] Scale-up timed out for node group NG1 after 5m0.429152893s
      W1203 07:17:44.625476       1 clusterstate.go:307] Disabling scale-up for node group NG1 until <future time>; errorClass=Other; errorCode=timeout
      
    3. Autoscaler reports "No unschedulable pods" despite Pending pods:

      I1203 07:18:54.855439       1 static_autoscaler.go:567] No unschedulable pods
      
    4. After creating new pods, autoscaler scales up NG2 (On-Demand Instances):

      I1203 08:03:57.257735       1 priority.go:166] priority expander: NG2 (On-Demand Instances) chosen as the highest available
      I1203 08:03:57.257745       1 scale_up.go:531] Best option to resize: NG2
      I1203 08:03:57.257753       1 scale_up.go:534] Estimated 1 node(s) needed in NG2
      I1203 08:03:57.257867       1 scale_up.go:658] Scaling group NG2 size from 0 to 1
      

Environment:

  • Autoscaler: 1.30.1
  • MCM: 0.36 (we cannot upgrade it for important reasons). Maybe that is the problem, but I was hoping that, even without early-backoff support, switching to the next NG would still work after the scale-up timeout.

Thank you for your help in resolving this issue.

borg-z added the kind/bug label Dec 3, 2024