Fixed HPA rule to use more correct condition #408

Merged
merged 2 commits into from May 13, 2024

dharapvj
Contributor

Correct rule expression evaluation...
image

Current rule expression evaluation...
image

@samber
Owner

samber commented Apr 29, 2024

LGTM.

I wonder if we need to split it into 2 metrics:

  • ScalingLimited==false -> alert
  • ScalingLimited==true && AbleToScale==false -> alert

WDYT ?

@dharapvj
Contributor Author

Sorry I did not get back to you quickly.

ScalingLimited==false -> alert

I don't think ScalingLimited==false is a cause for concern. It just means that scaling is not being limited right now.

Having said that, I observed that if we only check for ScalingLimited==true, it alerts on both sides: when the HPA cannot scale beyond max replicas, and also when it cannot scale down below min replicas, e.g. minReplicas is 3 and the HPA thinks we only need 1 or 2 (basically the over-allocation case).

We did not want to get alerts for this over-allocation case, considering the criticality of the system. So I combined a couple of metrics to get alerted only when the HPA wants to create more pods but is restricted from doing so.

My new expression looks like this:

(
  kube_horizontalpodautoscaler_spec_max_replicas{namespace=~"YOUR_NS"} - 
  kube_horizontalpodautoscaler_status_desired_replicas{namespace=~"YOUR_NS"}
) * on (horizontalpodautoscaler,namespace)
(
  kube_horizontalpodautoscaler_status_condition{condition="ScalingLimited",namespace=~"YOUR_NS",status="true"} == 1
) == 0

Derivation of this alert expression:

a. The ScalingLimited condition is true both when the HPA hits its max limit and when it hits its min limit. We want to focus only on maxed-out HPAs.
b. A zero difference between max and desired replicas only tells us that we have currently used up all possible replicas. On its own it does not indicate pressure for another replica that cannot be fulfilled because we have reached max.

So we multiply the two, matched on horizontalpodautoscaler and namespace: the max-minus-desired difference and the ScalingLimited==true series (whose value is 1). The product is zero only for an HPA that is both scaling-limited and has zero difference between max and desired replicas, and that is exactly the HPA we want to alert on!
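
To make the matching concrete, here is a minimal Python sketch of the same logic on made-up data. The namespace, HPA names, and replica counts are hypothetical, and the dictionaries only stand in for the Prometheus series; it is meant to illustrate how the expression filters, not how Prometheus evaluates it internally.

```python
# Toy model of the alert expression; all names and numbers are illustrative.
# Samples are keyed by (namespace, horizontalpodautoscaler), the labels in on(...).
max_replicas = {("prod", "api"): 10, ("prod", "worker"): 5, ("prod", "cron"): 4}
desired_replicas = {("prod", "api"): 10, ("prod", "worker"): 3, ("prod", "cron"): 2}
# Stand-in for ...status_condition{condition="ScalingLimited",status="true"}.
# "worker" is limited at its minimum (over-allocation), "cron" is not limited.
scaling_limited_true = {("prod", "api"): 1, ("prod", "worker"): 1, ("prod", "cron"): 0}


def alerting_hpas(max_r, desired_r, limited):
    """Mimic (max - desired) * on(horizontalpodautoscaler, namespace) (limited == 1) == 0."""
    # Left operand: max - desired for keys present in both series.
    headroom = {k: max_r[k] - desired_r[k] for k in max_r.keys() & desired_r.keys()}
    # Right operand: the `== 1` filter keeps only series whose value is 1.
    limited_only = {k: v for k, v in limited.items() if v == 1}
    # Match on the shared keys and multiply; unmatched series are dropped.
    product = {k: headroom[k] * limited_only[k]
               for k in headroom.keys() & limited_only.keys()}
    # Final `== 0` filter: keep only series whose product is zero.
    return sorted(k for k, v in product.items() if v == 0)


print(alerting_hpas(max_replicas, desired_replicas, scaling_limited_true))
# prints [('prod', 'api')]
```

Only the maxed-out HPA is returned: "worker" is scaling-limited but still has headroom below max, so its product is non-zero, and "cron" is dropped by the `== 1` filter before the multiplication.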

Hope this helps..

@samber
Owner

samber commented May 13, 2024

Ok. I'm going to do the update and merge ✌️

@samber samber merged commit 870bbd4 into samber:master May 13, 2024
1 check passed