Why does krr always recommend unsetting CPU limit? #153

Open
jdheyburn opened this issue Oct 6, 2023 · 4 comments

Comments

@jdheyburn

This issue doesn't necessarily relate to a feature request or a bug, so I did not use the template for these.

This might be a somewhat opinionated question, and I understand the reasoning behind the recommendation. However, when reporting on the percentage of CPU resources used by a container, we need to know the upper bound (the CPU limit) to tell whether the container is appropriately sized.

If we remove the limit, then we can't calculate a percentage of CPU used.
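
To make that dependency concrete, here is a minimal sketch of the percentage calculation. It is not krr code; the function name and the use of CPU cores as the unit are assumptions for illustration. Without a limit there is no denominator, so the result is undefined:

```python
from typing import Optional


def cpu_utilization_percent(usage_cores: float, limit_cores: Optional[float]) -> Optional[float]:
    """Return CPU usage as a percentage of the limit, or None when no limit is set.

    Hypothetical helper for illustration only.
    """
    if limit_cores is None or limit_cores <= 0:
        # No upper bound means there is nothing to express the usage as a share of.
        return None
    return 100.0 * usage_cores / limit_cores


print(cpu_utilization_percent(0.25, 0.5))   # 50.0
print(cpu_utilization_percent(0.25, None))  # None
```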

I agree that setting a CPU limit can lead to CPU throttling. However, this is where a tool like krr is useful: it can help us size the limit appropriately.

@LeaveMyYard
Contributor

I agree that setting a CPU limit can lead to CPU throttling. However, this is where a tool like krr is useful: it can help us size the limit appropriately.

This is based on this article

Still, while I agree that there are some use cases, I am not sure what should be recommended as a CPU limit (that is, what the formula should be). If you know what it should be, you could propose it (or any other solution).

Also, you can always fork the code and add your own strategy (KRR currently has only a simple strategy, but it is built to be easily extensible).

@sd-matt-b

sd-matt-b commented Nov 2, 2023

I agree that setting a CPU limit can lead to CPU throttling. However, this is where a tool like krr is useful: it can help us size the limit appropriately.

This is based on this article

Still, while I agree that there are some use cases, I am not sure what should be recommended as a CPU limit (that is, what the formula should be). If you know what it should be, you could propose it (or any other solution).

Dave Chiluk from Indeed suggests targeting between 0% and 10% throttling in his 2019 KubeCon talk: https://youtu.be/UE7QX98-kO0?t=2265

This is, presumably, the average throttling over the lifetime of the container. People with unusual usage patterns should still be able to use their own custom logic, but it would be possible to take the average usage of a container and calculate how much CPU would likely have gotten you to a target throttling percentage. A more robust formula that accounts for a wider range of reasonable usage patterns should be considered, but I think a recommendation based purely on average CPU usage, assuming a flat usage shape, is a good place to start.

I believe we should target 3%. If 0-10% throttling is the target for efficient applications that are good cluster neighbors, then we should err on the side of performance within that range and suggest a CPU limit that would theoretically result in an average throttling of 2%.

That'd be an interesting formula to come up with, but I think it's possible.
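
As a rough starting point, here is a minimal sketch of one way such a formula could work. It is not part of krr, and the function and parameter names are hypothetical. Instead of the flat-average approach, it uses a simple quantile: it assumes a usage sample counts as throttled whenever it exceeds the limit, and picks the smallest observed value that keeps the throttled share at or below the target. Real CFS throttling is decided per scheduler period, and usage measured under an existing limit is already capped, so this can only approximate actual behavior.

```python
import math
from typing import Sequence


def limit_for_target_throttling(usage_cores: Sequence[float], target_fraction: float) -> float:
    """Suggest a CPU limit (in cores) from historical usage samples.

    Assumption: a sample is "throttled" when its usage is strictly above the limit.
    Returns the smallest observed value such that at most `target_fraction`
    of the samples exceed it.
    """
    if not usage_cores:
        raise ValueError("need at least one usage sample")
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")

    ordered = sorted(usage_cores)
    # Number of samples allowed above the limit; the epsilon guards against
    # float rounding such as 0.07 * 100 evaluating to just over 7.
    allowed = math.floor(target_fraction * len(ordered) + 1e-9)
    return ordered[len(ordered) - 1 - allowed]


samples = [0.2, 0.2, 0.3, 0.2, 0.9, 0.2, 0.25, 0.2, 0.2, 0.3]
print(limit_for_target_throttling(samples, 0.10))  # 0.3 (only the 0.9 spike exceeds it)
```

With a target of 0 the function returns the peak sample, which amounts to never throttling any observed usage.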

Also, you can always fork the code and add your own strategy (KRR currently has only a simple strategy, but it is built to be easily extensible).

If it's not possible yet, I would love to be able to pass a strategy definition via command line so I can commit our project's strategy to the repo it's being used in! This would help us codify application expectations, and I think would allow a lot of added value.

@aantn
Contributor

aantn commented Nov 13, 2023

@sd-matt-b we're open to PRs to support anything you need. We'd rather have the strategy in the codebase (assuming it's something you can contribute) and read the settings from a YAML file in your project repo.

@fenio
Contributor

fenio commented Feb 12, 2024

Maybe, for a start, there could be an option to stop reporting CPU limits as something bad and to stop lowering the overall grade when one is detected?
I personally think that this whole "stop using CPU limits" trend is complete nonsense.
Various issues were found in schedulers in the past, but those were bugs. They needed to be fixed, and they have been fixed over time. Abandoning CPU limits altogether is unreasonable, so encouraging people to do so is also problematic.
