Configure some CPU/memory requests for hub and proxy pods in basehub #2127
I think looking at observed usage metrics and setting appropriate requests and limits is a good idea! We don't want them to be too high (especially in shared clusters) as that might increase overall cost, but we already have data for this so I leave it to you to figure out a decent number and get it there! I agree that the current situation has to change.
Ref https://2i2c.freshdesk.com/a/tickets/414 We should figure out better defaults in 2i2c-org#2127, but as LEAP is getting close to publication on some stuff, this will help us with stabilizing the infrastructure.
It's a long running connection kept open, serving progressbar responses via [EventSource](https://developer.mozilla.org/en-US/docs/Web/API/EventSource). So it can't be treated as a regular HTTP request / response. Getting rid of this unmasks more real problems in hub response latency by removing this noise. Ref 2i2c-org/infrastructure#2127 (comment)
@consideRatio good catch, I opened jupyterhub/grafana-dashboards#59 as a 'fix' on grafana.
Currently, the hub and proxy pods request very little CPU/memory, while various other pods already request 100m by default. This could starve the hub/proxy pods of CPU. For the sake of stability, I think we should grant the hub pod 1 full CPU, and let these pods request enough memory that we can be confident they won't get evicted or OOMKilled.
It also seems that the hub/proxy pods' current 128 MB memory request isn't covering the actual need. It would be good to request more memory than we typically use so that we don't risk eviction or an OOMKill.
This could have been relevant to the incident in #2126; even if it wasn't, it would be good to rule it out by increasing these requests.
Config in basehub
If we provide a 10m request while other pods on the node have 100m requests and are going full throttle, they each get a ten times larger share of CPU than the hub pod, since contended CPU time is divided in proportion to requests. On core nodes with 4 CPUs, that means our hub pod could end up with only about 0.4 CPU.
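A rough sketch of that proportional-share arithmetic (my own numbers: assuming a single competing pod with a 100m request, both pods fully busy on a 4 CPU node, and contended CPU split in proportion to requests):

$$
\text{hub CPU} \approx \frac{10\,\mathrm{m}}{10\,\mathrm{m} + 100\,\mathrm{m}} \times 4 \approx 0.36\ \text{CPU}
$$

which is roughly the 0.4 CPU figure above; with more busy neighbours the hub's slice shrinks further.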
My understanding is that the hub pod can benefit from up to 1 full CPU from time to time, but I'm a bit unsure about that. I recall a Grafana dashboard I've seen in the past that presented metrics in a way that fails to capture the peaks properly unless you zoom in.
@yuvipanda I think we could put 50m or 100m in the requests here for the hub pod, to reduce the risk of it getting throttled well before 1 CPU when competing with other pods. What do you think?
Hub pod
infrastructure/helm-charts/basehub/values.yaml
Lines 433 to 439 in a9d8816
Proxy pod
infrastructure/helm-charts/basehub/values.yaml
Lines 139 to 146 in a9d8816
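For illustration only, bumping these requests in basehub's values.yaml could look roughly like the sketch below. The numbers are placeholders to be replaced with values derived from our Grafana usage data, and I'm assuming the usual z2jh keys (`hub.resources` and `proxy.chp.resources`) nested under the chart's `jupyterhub` block:

```yaml
# Illustrative sketch, not a tested change: pick the numbers from observed usage.
jupyterhub:
  hub:
    resources:
      requests:
        cpu: 100m      # enough scheduling weight to not be starved by 100m neighbours
        memory: 512Mi  # comfortably above typical usage to avoid eviction/OOMKill
      limits:
        memory: 1Gi
  proxy:
    chp:
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          memory: 512Mi
```

Leaving out CPU limits keeps the hub free to burst toward a full CPU when the node has headroom, while the requests still guarantee it a minimum share.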