
resource requests/limits updates, mainly to ingress-nginx and prometheus #2324

Conversation

@consideRatio (Contributor) commented Mar 9, 2023

General motivation

By setting the memory request to match the limit for prometheus-server, we shouldn't get memory pressure on its node caused by prometheus-server any more, I think. That way, we won't get unrelated pods evicted on the node, which is what happened to a critical ingress-nginx-controller pod in #2322.

If we still experience memory pressure on a node for some other reason, ingress-nginx now has a memory request that may well be higher than its actual usage, which makes it far less likely to be picked for eviction.

Details about changes

  • Updating the cpu/memory requests and limits for the ingress-nginx controller pod, raising memory from the chart default of 90Mi to 250Mi.
  • Updating the cpu/memory requests and limits for the prometheus-server pod.
    • I made a comment in the support chart about cpu/memory limits:
        # Prometheus cpu/memory requests/limits need to be carefully considered based on:
        #
        # CPU:
        #
        #   During startup prometheus-server can compete for ~all of the node's
        #   CPU unless it's bounded by a limit. The key to avoiding this isn't to
        #   provide a low limit, as what counts as low is relative to the node's
        #   capacity, which could for example be 2, 4, or 8 CPU.
        #
        #   To avoid starving other pods, the key is to have a sufficiently low
        #   CPU request so that this pod doesn't get far higher priority to use
        #   the available CPU on the node when in competition with the other pods
        #   for CPU.
        #
        # Memory:
        #
        #   During startup prometheus-server will use up a lot of memory, up to X,
        #   where X grows as more metrics have been collected. Over time,
        #   prometheus-server may run into its memory limit and get OOMKilled, or
        #   the node it runs on may run out of memory before that happens, and
        #   prometheus-server or other pods exceeding their memory requests may get
        #   evicted.
        #
        #   To fail reliably in prometheus-server without influencing other pods,
        #   we should set the memory request to match the limit.
        #
    • The CPU request is lowered to 0.1 across all clusters to reduce the risk of starving colocated hub and proxy pods etc. that have no resource requests or requests as low as 0.01. This way, prometheus can only get 10x more CPU than a pod with a 10x lower request when they compete for it. I don't think prometheus itself risks being starved by some pod with a far higher CPU request competing for all CPU either.
    • The memory request is increased to match the limit systematically, as that ensures we avoid pod evictions and memory pressure on the node overall. This way, we will fail more reliably in the prometheus-server pod specifically if we get close to needing to increase the memory request/limit further (see the values sketch just below this list).
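
For reference, here is a minimal sketch of what such Helm values could look like in the support chart. The key paths follow the upstream prometheus and ingress-nginx charts, and the prometheus memory figure is an illustrative assumption rather than something copied from this PR's diff; only the 0.1 CPU request and the 250Mi ingress-nginx value come from the description above.

    # Illustrative support-chart values; the prometheus memory number is a
    # made-up example since the real values vary per cluster.
    prometheus:
      server:
        resources:
          requests:
            cpu: 0.1      # low request, so prometheus can't starve colocated pods
            memory: 2Gi   # request matches the limit, so we fail inside
          limits:         # prometheus-server (OOMKill) instead of causing
            memory: 2Gi   # node memory pressure and evictions of other pods

    ingress-nginx:
      controller:
        resources:
          requests:
            memory: 250Mi  # raised from the chart default of 90Mi
          limits:
            memory: 250Mi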

Related

@consideRatio force-pushed the pr/prometheus-ingress-nginx-resources-req-lim branch from 3640d4f to a2b8cc6 on March 9, 2023 11:40
@consideRatio requested a review from a team on March 9, 2023 11:47
@pnasrat (Contributor) left a comment

Overall this looks good. Correctly provisioning our needed support infrastructure is definitely a good thing, as under-provisioning can cause spurious issues like the one that prompted your investigation.

LGTM

Would this trigger core pool scale up in some places? There might be a slight risk of a flurry of pages if prometheus reschedules and then takes a while (11 mins) to come up because it has to move to a newly created node. Probably worth keeping an eye on pagerduty-notifications and posting to maintenance-notices before deploying.

@consideRatio self-assigned this Mar 9, 2023

@consideRatio (Contributor, Author) commented

Thank you for the review @pnasrat!!!

Would this trigger core pool scale up in some places?

I think possibly! But even if we don't wait for a scale-up, the prometheus-server pods will typically restart, which can trigger pagerduty notifications. When they restart they can take several minutes, I think maybe 1-10, to read through the write-ahead log (WAL). I think they may be unresponsive during that time as well!

Also, since we are now changing the resource requests on the ingress-nginx pod, we will briefly disrupt networking when it restarts.

@pnasrat (Contributor) commented Mar 9, 2023

Nod - my change in #2321 will increase the paging window to 31 min of prometheus being down, which will stop noisy pages for restarts.
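
As a hedged illustration of the mechanism only (not the actual content of #2321), a paging window like this is typically expressed as the `for:` duration on an alert rule, evaluated by something outside the prometheus instance being monitored:

    # Hypothetical alerting rule: only page if prometheus has been down for a
    # full 31 minutes, so short restarts and WAL replays don't trigger pages.
    groups:
      - name: prometheus-availability
        rules:
          - alert: PrometheusServerDown
            expr: up{job="prometheus"} == 0
            for: 31m
            labels:
              severity: page
            annotations:
              summary: prometheus-server has been unreachable for over 31 minutes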

The nginx reschedule worries me more. How might you release this in a way that doesn't impact hub users?

@consideRatio (Contributor, Author) commented Mar 9, 2023

The nginx reschedule worries me more. How might you release this in a way that doesn't impact hub users?

Now that I think about it, this change may not be as disruptive as I initially thought:

  1. The upgrade strategy is a rolling update, so another replica is added before the old one is removed (see the sketch after this list).
  2. I think that if ingress-nginx is asked to stop after the new replica is running, it may become unready and stop receiving new connections from the k8s Service, which routes only to ready pods.
  3. I don't understand the networking well enough to reason confidently about this, but I suspect most web interactions, apart from long-running ones like file downloads taking tens of seconds, won't experience any disruption.
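
For context, this is roughly how the rolling-update behavior is expressed in a Deployment spec. The numbers shown are the Kubernetes defaults rather than anything confirmed from the ingress-nginx chart:

    # Sketch of a Deployment rolling-update strategy (Kubernetes defaults shown).
    # The controller brings up new pods and only takes down old ones within the
    # maxUnavailable budget; with a single replica, maxUnavailable rounds down
    # to 0, so the old pod keeps serving until the replacement is Ready.
    spec:
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 25%        # extra pods allowed above the desired replica count
          maxUnavailable: 25%  # pods allowed to be unavailable during the update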

I'm not sure about the consequences, but I think we should accept the risk. I could delay merging this and do the work on Sunday evening / Monday morning Swedish time, as I've planned to perform maintenance on the LEAP cluster then and there may be less activity overall.

@pnasrat (Contributor) commented Mar 9, 2023

Note I don't think this merits holding off on our deployment; I'm merely thinking through the impact and understanding it.

I think you are right on the deployment upgrade strategy https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment

LGTM - go ahead and merge and monitor the deployment.

@consideRatio merged commit 22c87ec into 2i2c-org:master Mar 9, 2023

github-actions bot commented Mar 9, 2023

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/4374529751

@consideRatio (Contributor, Author) commented

Thank you for your thorough review of this @pnasrat!!

Successfully merging this pull request may close these issues.

ingress-controller pod that routes all incoming traffic to k8s pods got evicted?