
Do not consider preemption.borrowWithinCohort in FairSharing preemptions #4165

Open · gabesaba wants to merge 2 commits into main from fair_sharing_preemption_loop
Conversation

@gabesaba (Contributor) commented Feb 6, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

preemption.borrowWithinCohort is being used to override dominantResourceShare, causing preemptions that do not match the scheduling order and resulting in the bug described here. We do not document preemption.borrowWithinCohort as applying to FairSharing, only to the other two preemption policies - link.
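
To illustrate, here is a sketch of the combination in question; the names and values (team-a, org, the threshold) are examples, not taken from the linked issue:

```yaml
# ClusterQueue opting into borrowWithinCohort preemption (example values).
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a
spec:
  cohort: org
  preemption:
    reclaimWithinCohort: Any
    borrowWithinCohort:         # ignored by FairSharing after this PR
      policy: LowerPriority
      maxPriorityThreshold: 100
---
# Kueue Configuration fragment enabling FairSharing.
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
fairSharing:
  enable: true
```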

Which issue(s) this PR fixes:

Fixes #3779

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Update FairSharing to be incompatible with ClusterQueue.Preemption.BorrowWithinCohort. Using these parameters together is a no-op, and will be validated against in future releases. This change fixes an edge case which triggered an infinite preemption loop when these two parameters were combined.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 6, 2025
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 6, 2025

netlify bot commented Feb 6, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: 8390d77
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/67b48286196fd20008d961ba

@tenzen-y (Member) commented Feb 6, 2025

/cc

@mimowo (Contributor) commented Feb 6, 2025

/hold
For determining whether we want to support preemption.borrowWithinCohort along with FairSharing, see #3779 (comment), as this PR makes the preemption.borrowWithinCohort setting a no-op.

If we decide the combination is useful, we may need to rewrite the PR. If we decide the combination is not useful, I would like to follow up with validation that prevents combining the settings.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 6, 2025
@tenzen-y (Member) commented Feb 6, 2025

> /hold For determining whether we want to support preemption.borrowWithinCohort along with FairSharing […]

One more consideration: this PR changes the current FairSharing feature specification (before this PR, borrowWithinCohort could be involved in the FairSharing decision). Ideally, we want to look for another solution to resolve #3779. If we decide to go with this solution, we might want to prepare some mitigation path, since FairSharing has already been published as a Beta feature.

@tenzen-y (Member) left a comment

Changes look great to me. The next step is deciding whether we should take this approach.

@mimowo (Contributor) commented Feb 7, 2025

I gave it another round of thinking.

Actually, I think the root cause of the infinite loop is that we do belowThreshold || strategy, because then low-priority workloads can bypass the strategy.

So, I have another idea: turn this check into just strategy, but filter out workloads above the threshold in findCandidates when fairSharing is enabled. I believe this idea also has a relatively low implementation cost.
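
A minimal sketch of the difference, with illustrative names standing in for the actual preemption code:

```go
// Current check (sketch): a candidate below the borrowWithinCohort priority
// threshold is preemptable even when the fair-sharing strategy rejects it,
// which is how low-priority workloads bypass the strategy.
func canPreemptCurrent(belowThreshold, strategyAllows bool) bool {
	return belowThreshold || strategyAllows
}

// Proposed check (sketch): the strategy always decides; candidates above the
// threshold would instead be dropped earlier, in findCandidates.
func canPreemptProposed(strategyAllows bool) bool {
	return strategyAllows
}
```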

This will make sure no workloads bypass the strategy, yet we still require the target workloads to be below the priority threshold. I think it is likely a valid use case to safeguard high-priority workloads from fair sharing, and the field allows for that.

I think the idea of preemption.borrowWithinCohort.maxPriorityThreshold was to safeguard high-priority workloads from preemption.

EDIT: I would be happy to move forward with the approach in this comment (unless you see some counter-examples) and backport it, regardless of the long-term plans of combining the configurations.

@mimowo (Contributor) commented Feb 7, 2025

WDYT @gabesaba @tenzen-y ?

@tenzen-y (Member) commented Feb 7, 2025

> So, I have another idea: turn this check into just strategy […]

My concern is making fairSharing complicated, since the idea implies cooperation between borrowWithinCohort.maxPriorityThreshold and fairSharing.
As an alternative approach, we might be able to add a filtering mechanism similar to borrowWithinCohort.maxPriorityThreshold to fairSharing, and then make borrowWithinCohort.maxPriorityThreshold and fairSharing mutually exclusive.

But honestly, I'm not sure which solution would be more understandable for users. So it might be better to proceed with your (@mimowo) approach, and then consider the above approach (fairSharing with a filtering mechanism) based on user feedback.
That is the lower-risk approach, IMO.

@gabesaba (Contributor, Author) commented Feb 7, 2025

> I think it is likely a valid use case to safeguard high-priority workloads from fair sharing, and the field allows for that.

This protection is defined on the preempting CQ's side, which means that for these high-priority workloads to be safe, the setting needs to be applied to all CQs. I think a better interface would be preemption protection defined by the CQ that wants to protect its workloads, or for that CQ to simply not borrow in a flavor where it doesn't want to risk preemptions.

> So, I have another idea: turn this check into just strategy […]

This still leads to weird behavior when the workload wants to do reclaimWithinCohort. With this proposal, we would filter out valid reclaimWithinCohort targets, so the semantics no longer match non-fair-sharing reclamation. We could add a condition to do reclaim when not borrowing [1][2] - but I'm opposed to introducing more complexity into an already complex algorithm. I think we should leave these decisions to the result of the fair share value, to keep the complexity manageable.

notes:
[1] but this adds even more complexity, as the definition of borrowing is complex in the hierarchical case - #3948 (comment)
[2] plus the CQ could stop borrowing once it preempts some of its own workloads

@mimowo (Contributor) commented Feb 7, 2025

> As an alternative approach, we might be able to add a similar filtering mechanism […]

That was also my thought, but in a way FairSharing is already aware of other candidate-filtering configuration based on preemption.reclaimWithinCohort, so also respecting preemption.borrowWithinCohort.maxPriorityThreshold seems justified.

@mimowo (Contributor) commented Feb 7, 2025

> we would filter valid reclaimWithinCohort targets, so that the semantics no longer match non-fair-sharing reclamation

We already use preemption.reclaimWithinCohort to filter the candidates later considered by the FairSharing algorithm, based on the relative priority of the preemptor and the candidate. Considering preemption.borrowWithinCohort when fair sharing is enabled just adds another constraint on the set of candidates, based on absolute rather than relative priority.
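
A sketch of the relative vs. absolute distinction, with illustrative types and names rather than Kueue's actual implementation:

```go
type workload struct {
	priority int32
}

// isCandidate shows the two kinds of filtering side by side.
func isCandidate(preemptor, candidate workload, reclaimLowerPriorityOnly bool, maxPriorityThreshold *int32) bool {
	// preemption.reclaimWithinCohort: a relative constraint; with policy
	// LowerPriority, only candidates with lower priority than the preemptor
	// remain.
	if reclaimLowerPriorityOnly && candidate.priority >= preemptor.priority {
		return false
	}
	// preemption.borrowWithinCohort.maxPriorityThreshold: an absolute
	// constraint; the candidate's priority is compared to a fixed threshold,
	// independent of the preemptor.
	if maxPriorityThreshold != nil && candidate.priority > *maxPriorityThreshold {
		return false
	}
	return true
}
```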

> We could add a condition to do reclaim when not borrowing [1][2]

The notes are valid, and I agree this approach leads to some sub-optimal decisions given the dynamic nature of "borrowing" as we progress through candidates. However:

  • the algorithm as in master does not make the check dynamic [2] based on the current needs of preemption, so there would be no regression in this regard
  • it eliminates the infinite loop while allowing a cap to be put on the priority of the preempted workloads (if one has such a business use case)
  • it lowers the complexity of the algorithm anyway, as the filtering is moved to findCandidates, which is easier to understand

> but I'm opposed to introducing more complexity into an already complex algorithm

It depends how you look at it. IMO, the complexity would be lower anyway compared to the current "main" branch, because we would essentially commit your changes inside fairPreemptions but move the complexity to findCandidates. So the main part of the algorithm would get simplified anyway.

In the end, I'm fine with the approach proposed in this PR as long as we somehow confirm it does not break any business use cases, but I'm worried that confirming this might be difficult in practice given the OSS nature of the software. When we disable configuration, it is generally better to go through some deprecation (feature-gate) process, and in the short term have something that can be cherry-picked as a bugfix with marginal risk of breaking.

@gabesaba (Contributor, Author) commented Feb 7, 2025

Discussed and reached consensus with @mimowo offline

The infinite preemption issue (which this PR fixes), and whether we limit preemptions to a threshold priority (potentially how users expect this parameter to work with FairSharing), are separate issues. Since the latter didn't exist in FairSharing in 0.10.1 anyway, we will address it separately, ensuring that the new API makes sense. I opened #4173 with a proposal.

I disagree with the approach of using preemption.borrowWithinCohort with FairSharing, as it results in opposite semantics in Classical Preemption vs Fair Sharing. See argument in #4173

> As an alternative approach, we might be able to add a similar filtering mechanism […]

@tenzen-y Please take a look at #4173. I think we can add a threshold in reclaimWithinCohort to accomplish this

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 7, 2025
@mimowo (Contributor) commented Feb 10, 2025

Yes, I agree that supporting borrowWithinCohort with FairSharing would be confusing. For example, my proposal (of filtering the candidates based on the threshold regardless of whether the preemptor is borrowing) also leads to problematic behavior in some cases, when the preemptor no longer needs to borrow as the algorithm progresses.

So, I'm happy to just make it no-op.

However, I would like to update the documentation for the preemption.borrowWithinCohort field to say: "it is not supported when fair sharing is enabled", or "the behavior when fair sharing is enabled is unspecified; the current implementation ignores the field". We may also update some other relevant places in the documentation.

Ideally, we would also have some warnings in webhooks and on Kueue startup, but if this requires substantial ground work then I'm ok with just documentation update.

@tenzen-y (Member) commented:

> Ideally, we would also have some warnings in webhooks and on Kueue startup […]

I think that would be easy. I guess we can just pass the Kueue Configuration to the CQ webhook server and hold it in type ClusterQueueWebhook struct{}.

The validation could be performed when a CQ is created or updated.
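
A rough sketch of what that could look like; the configuration plumbing and the exact field checks are assumptions, not the code in this PR:

```go
package webhooks

import (
	"context"

	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/util/validation/field"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

// ClusterQueueWebhook holds the relevant piece of the Kueue Configuration,
// injected when the webhook server is set up (hypothetical field).
type ClusterQueueWebhook struct {
	fairSharingEnabled bool
}

// ValidateCreate rejects a ClusterQueue that enables borrowWithinCohort while
// fair sharing is on; ValidateUpdate would run the same check.
func (w *ClusterQueueWebhook) ValidateCreate(ctx context.Context, obj runtime.Object) (admission.Warnings, error) {
	cq := obj.(*kueue.ClusterQueue)
	p := cq.Spec.Preemption
	if w.fairSharingEnabled && p != nil && p.BorrowWithinCohort != nil &&
		p.BorrowWithinCohort.Policy != kueue.BorrowWithinCohortPolicyNever {
		return nil, field.Forbidden(
			field.NewPath("spec", "preemption", "borrowWithinCohort"),
			"borrowWithinCohort is not supported when fair sharing is enabled")
	}
	return nil, nil
}
```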

@gabesaba gabesaba force-pushed the fair_sharing_preemption_loop branch from a2a965f to 1d96117 Compare February 17, 2025 11:15
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 17, 2025
@gabesaba gabesaba force-pushed the fair_sharing_preemption_loop branch from 1d96117 to 28653b2 Compare February 17, 2025 11:18
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 17, 2025
@gabesaba (Contributor, Author) commented:

a2a965f to 1d96117 - rebase
1d96117 to 28653b2 - documentation + webhook validation

@mimowo @tenzen-y

@gabesaba gabesaba force-pushed the fair_sharing_preemption_loop branch 2 times, most recently from 40a8211 to 88c83fb Compare February 17, 2025 11:43
@gabesaba (Contributor, Author) commented:

> Ideally, we would also have some warnings in webhooks and on Kueue startup […]
>
> I think that would be easy. I guess we can just pass the Kueue Configuration to the CQ webhook server […]

Added this validation. I only tested it in unit tests, not in integration tests, as that would have required some refactoring of the webhook integration tests to pass this new configuration. I can update these tests if you think it is worth the effort.

@gabesaba gabesaba force-pushed the fair_sharing_preemption_loop branch from 88c83fb to 9c2c491 Compare February 17, 2025 11:58
@k8s-ci-robot (Contributor) commented Feb 17, 2025

@gabesaba: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kueue-test-customconfigs-e2e-main
Commit: a2a965f (link)
Required: true
Rerun command: /test pull-kueue-test-customconfigs-e2e-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@gabesaba gabesaba force-pushed the fair_sharing_preemption_loop branch from 9c2c491 to 9e733cf Compare February 17, 2025 12:46
@mimowo (Contributor) commented Feb 17, 2025

The strong validation makes me feel uncomfortable - I'm thinking about the migration from classical preemption to fair sharing.

Consider the two migration options for users who use borrowWithinCohort:

  1. enable fair sharing then adjust CQs; the requests to update CQ will continuously fail until the CQs are updated
  2. adjust CQs then enable fair sharing; there is no preemption while borrowing until fair sharing is enabled

Also, there is no easy way to experiment (enable / disable fair sharing to compare the impact). It will require scripting the enablement and disablement procedures to make it quick. I think ideally the validation would be "soft" warnings rather than "hard".

I would suggest splitting the PR into two:

  1. logic & docs changes
  2. validation

So that we can make the decision independently. I'm convinced we want to cherry-pick (1.), but (2.) makes me feel less comfortable. I think ideally this is not strong validation, but soft. We have two types of validation in the core k8s, see:

If you @tenzen-y and @gabesaba think strong validation is ok, I'm good to go, but even then I'd prefer two PRs to be able to easily bail out of (revert) just the validation.
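
For reference, a minimal sketch of the "soft" flavor mentioned above: the webhook admits the object but attaches a warning, whereas the hard flavor would return an error (names are illustrative):

```go
package webhooks

import "sigs.k8s.io/controller-runtime/pkg/webhook/admission"

// validateCombination warns (soft) instead of rejecting (hard) when the no-op
// combination is detected, so existing CQs keep working during a migration or
// an enable/disable experiment with fair sharing.
func validateCombination(fairSharingEnabled, borrowWithinCohortSet bool) (admission.Warnings, error) {
	if fairSharingEnabled && borrowWithinCohortSet {
		return admission.Warnings{
			"spec.preemption.borrowWithinCohort is a no-op when fair sharing is enabled",
		}, nil
	}
	return nil, nil
}
```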

@gabesaba (Contributor, Author) commented:

> The strong validation makes me feel uncomfortable […]

Split out to #4298

Let's discuss there whether we add soft or strong validation, and whether or not to cherry-pick that change

@tenzen-y (Member) commented:

> The strong validation makes me feel uncomfortable […]

I fully agree with you.

@gabesaba (Contributor, Author) commented:

> The strong validation makes me feel uncomfortable […]
>
> I fully agree with you.

Removed the validations from this PR, so it is only fixing the bug and updating documentation. I also updated the release note.

@mimowo (Contributor) commented Feb 18, 2025

/lgtm
/approve
/cherry-pick release-0.10 release-0.9

@k8s-infra-cherrypick-robot (Contributor) commented:

@mimowo: once the present PR merges, I will cherry-pick it on top of release-0.10 in a new PR and assign it to you.

In response to this:

/lgtm
/approve
/cherry-pick release-0.10 release-0.9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 18, 2025
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gabesaba, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot (Contributor) commented:

LGTM label has been added.

Git tree hash: f7ddced2404225cdff9d823bdb5ca949983d2ded

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 18, 2025
@mimowo (Contributor) commented Feb 18, 2025

Leaving the hold label for @gabesaba to release.

@tenzen-y (Member) left a comment

/hold
for API comment appearance.

@tenzen-y (Member) left a comment

@gabesaba Could you investigate the reason why the new API comments are not recorded in the CRD?

Successfully merging this pull request may close: Infinite preemption loop in fair sharing (#3779)