WIP: [nodeutilization]: allow to set/remove a soft taint from overutilized nodes #1625
base: master
Conversation
Welcome @tiraboschi!
Hi @tiraboschi. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from 3fdbda3 to fdf1a3a
/ok-to-test
Force-pushed from 61f2819 to 00d0c86
/cc
@googs1025: GitHub didn't allow me to request PR reviews from the following users: I, will, try, to, understand, this, pr. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@@ -291,19 +291,28 @@ like `kubectl top`) may differ from the calculated consumption, due to these com
actual usage metrics. Metrics-based descheduling can be enabled by setting `metricsUtilization.metricsServer` field.
In order to have the plugin consume the metrics the metric collector needs to be configured as well.
See `metricsCollector` field at [Top Level configuration](#top-level-configuration) for available options.
That gap can only be reduced using the SoftTainter to dynamically apply/remove soft taints (`effect: PreferNoSchedule`) to/from nodes.
I'm not sure whether setting taints on nodes goes beyond the scope of the descheduler itself 🤔
Here the key is on Soft (`effect: PreferNoSchedule`) taints only. I tend to think that having the descheduler set hard taints as well would be a bad idea.
In the last update I also added a `ValidatingAdmissionPolicy` (GA in k8s 1.30) with a few rules to enforce that the ServiceAccount used by the descheduler is only allowed to set/delete/amend its own taints (identified by a fixed key prefix) and only with `effect: PreferNoSchedule`.
What I am concerned about is whether this should really be a feature of the descheduler. I understand that after rescheduling, new pods may still be scheduled to this node again. But we need to consider whether adding taints from the descheduler is a good choice. Maybe we should use a custom component to accomplish this? 🤔 I don't seem to see other plugins with similar operations that modify node fields or taints (maybe I'm wrong, I haven't delved into all the plugins).
Similar issue: kubernetes/node-problem-detector#565
There are similar discussions in the NPD project. I understand that the two projects are different, but when we introduce a similar feature, shouldn't we think it through carefully and consider making it more general? 🤔
On the NPD front the discussion was more about hard taints (`"effect": "NoSchedule"`) to filter out problematic nodes, and yes, in that case all the concerns are valid since, for instance, hard-tainting a relevant amount of nodes could leave the cluster in a position where it does not have enough resources to run the expected pods.
Here instead we are only talking about soft taints as a hint for the scheduler to avoid overutilized nodes if possible.
Please notice that in the scheduler, (intolerable) soft taints are simply counted for the scoring step:
https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/framework/plugins/tainttoleration/taint_toleration.go#L198
but they are not going to affect the filtering step.
And since soft taints are already counted to weight a node, having two independent controllers setting them is not really an issue either, since a node with more than one intolerable soft taint will simply receive a worse score than a node with just a single one.
We could eventually also think about leveraging that counting mechanism here as well, applying a variable number of soft taints according to how much a node is overutilized, so that slightly overutilized nodes are preferred over considerably overutilized ones.
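For reference, the scoring step linked above essentially just counts the node's `PreferNoSchedule` taints that the incoming pod does not tolerate. A minimal Go sketch of that counting logic (a paraphrase for illustration, not the upstream code; the function name is made up):

```go
package softtaintsketch

import (
	v1 "k8s.io/api/core/v1"
	v1helper "k8s.io/component-helpers/scheduling/corev1"
)

// countIntolerablePreferNoSchedule returns how many PreferNoSchedule taints on a
// node are not tolerated by the given tolerations. The scheduler's TaintToleration
// plugin uses such a count only to lower the node's score; it never filters the
// node out, so each additional soft taint just makes the node a bit less preferred.
func countIntolerablePreferNoSchedule(taints []v1.Taint, tolerations []v1.Toleration) int {
	intolerable := 0
	for _, taint := range taints {
		// Hard taints (NoSchedule/NoExecute) are handled in the Filter phase, not here.
		if taint.Effect != v1.TaintEffectPreferNoSchedule {
			continue
		}
		if !v1helper.TolerationsTolerateTaint(tolerations, &taint) {
			intolerable++
		}
	}
	return intolerable
}
```

Under this reading, two independent controllers each adding one intolerable soft taint would just raise that count to 2 for the node, i.e. a lower score, which is why the duplication concern above is not blocking.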
What I mean is: Does this deviate from the original intention of the descheduler? But I understand that the feature is reasonable. 🤔
// IsNodeSoftTainted checks if the node is already tainted
// with the provided key and value and PreferNoSchedule effect.
// If a match is found, it returns true.
func IsNodeSoftTainted(node *v1.Node, taintKey, taintValue string) bool
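The body is not shown in this excerpt; a minimal sketch of what such a check could look like, assuming it only matches taints whose effect is `PreferNoSchedule` (the actual implementation in the PR may differ):

```go
package softtaintsketch

import v1 "k8s.io/api/core/v1"

// IsNodeSoftTainted reports whether the node already carries a taint with the
// given key and value and the PreferNoSchedule ("soft") effect.
func IsNodeSoftTainted(node *v1.Node, taintKey, taintValue string) bool {
	for _, taint := range node.Spec.Taints {
		if taint.Key == taintKey && taint.Value == taintValue && taint.Effect == v1.TaintEffectPreferNoSchedule {
			return true
		}
	}
	return false
}
```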
`IsNodeSoftTainted`, `AddOrUpdateSoftTaint`, `RemoveSoftTaint`: maybe we can remove `Soft` from the names.
It was just to emphasize that the descheduler is only supposed to add `Soft` taints, as a hint for the scheduler to avoid overutilized nodes if possible, without really interfering with the scheduling process.
Got it. I missed `TaintEffectPreferNoSchedule`.
Force-pushed from 00d0c86 to 1375b73
Force-pushed from 9cbd483 to f57c4e8
Force-pushed from f57c4e8 to 3bddb9d
Welcome, first-time contributor! Looking forward to seeing more and more PRs and contributions from you!
Dynamically set/remove a soft taint (`effect: PreferNoSchedule`) to/from nodes that the descheduler identified as overutilized according to the nodeutilization plugin.
After #1555 the nodeutilization plugin can consume utilization data from actual kubernetes metrics and not only from static reservations (requests). On the other side, the default kube-scheduler only ensures that a Pod's specific resource requests can be satisfied.
The soft taint simply gives the scheduler a hint to try to avoid nodes that look overutilized in the descheduler's eyes. Since it's just a soft taint (honor it only if possible), there is no need to restrict it to a certain amount of nodes in the cluster or to be strict on error handling.
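To make the intended flow concrete, here is a hypothetical sketch of how a per-node reconcile could look after the plugin classifies the node; the taint key, function name, and plain Update-based approach are assumptions for illustration, not the PR's actual code:

```go
package softtaintsketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// softTaintKey is an assumed example key; the PR makes the key prefix configurable.
const softTaintKey = "nodeutilization.descheduler.kubernetes.io/overutilized"

// syncSoftTaint adds the PreferNoSchedule taint when overutilized is true and
// removes it otherwise. Because the taint is only a scheduling hint, callers can
// treat failures as best-effort (log and continue).
func syncSoftTaint(ctx context.Context, client kubernetes.Interface, node *v1.Node, overutilized bool) error {
	kept := make([]v1.Taint, 0, len(node.Spec.Taints)+1)
	present := false
	for _, t := range node.Spec.Taints {
		if t.Key == softTaintKey && t.Effect == v1.TaintEffectPreferNoSchedule {
			present = true
			continue // drop it here; re-appended below if still wanted
		}
		kept = append(kept, t)
	}
	if present == overutilized {
		return nil // taint presence already matches the desired state, skip the update
	}
	if overutilized {
		kept = append(kept, v1.Taint{
			Key:    softTaintKey,
			Value:  "true",
			Effect: v1.TaintEffectPreferNoSchedule, // soft taint only, never NoSchedule/NoExecute
		})
	}
	updated := node.DeepCopy()
	updated.Spec.Taints = kept
	_, err := client.CoreV1().Nodes().Update(ctx, updated, metav1.UpdateOptions{})
	return err
}
```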
With a fine-grained `ValidatingAdmissionPolicy` (GA in k8s 1.30) we can enforce that the `descheduler-sa` is only allowed to apply/remove soft taints (enforcing the `PreferNoSchedule` effect) and only with a predefined key prefix (eventually customizable with a parameter).
TODO: add more unit and e2e tests
Fixes: #1626