Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[multikueue] add support for custom multikueue controller #2405

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

mszadkow
Copy link
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

Provide a way to pass custom controller to multikueue admission check controller.

Which issue(s) this PR fixes:

Fixes #2349

Special notes for your reviewer:

Consider base branch: #2373

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. labels Jun 12, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mszadkow
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

Hi @mszadkow. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 12, 2024
Copy link

netlify bot commented Jun 12, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit c1e4244
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/66746086481fe700096f6b02

// +kubebuilder:validation:Required
// +kubebuilder:default="kueue.x-k8s.io/multikueue"
// +kubebuilder:validation:XValidation:rule="self == oldSelf", message="field is immutable"
ControllerName string `json:"controllerName"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We deliberately didn't make it configurable if the first iteration to limit the potential of misconfiguring the system. Could you first open the issue to discuss the need for it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tenzen-y is this change required in your understanding of #2349?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The status of a multikueue cluster follows the state of a contreller-runtime client in the kueue-controller-manager, if we want to be able to have multiple controller managers working with the clusters we need a way to mark who is managing each cluster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but I'm wondering if we could alternatively still use MultiKueue controller, but have a concept of external plugin?

To make it easier to track of an object is managed by MultiKueue or not. I feel this would help OnCall, but maybe I need to think about this more.

Since this required API changes should we not first go via KEP to discuss alternatives? WDYT @alculquicondor, @tenzen-y?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we want to be able to have multiple controller managers working with the clusters we need a way to mark who is managing each cluster.

There should only be one controller manager modifying these objects (Kueue).

Other controller managers should only be in charge of mirroring the Job CRDs, but otherwise they can use all the same objects.

I would like to see at least a description of what the plan is, how components should interact, what each of them is responsible of. Maybe a KEP is worth, but please start with some description before writing more code.

Copy link
Contributor

@trasc trasc Jun 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but I'm wondering if we could alternatively still use MultiKueue controller, but have a concept of external plugin?

That will be a lot more complicated and will most likely require some additional API object type(s) to communicate between that custom controller to multikueue admission check controller.

The target of this PR is to be able to instantiate multikueue admission check controllers having different sets of adapters that can work in the same cluster at the same time.

There should only be one controller manager modifying these objects (Kueue).

MultiKueueClusters should not be that different then AdmissionChecks, which by design are managed by an AdmissionCheckController running in or outside of Kueue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current approach will lose the consistency between the kueue-manager and external controllers since it seems that the current approach allows us to re-implement the multikueue admission controller.

Can we consider an alternative approach to prevent the losing specification consistency?

As I described in the #2349, nice to start in the synchronizing design and the actual implementations.

@mszadkow
Copy link
Contributor Author

/assign @trasc

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 12, 2024
@alculquicondor
Copy link
Contributor

alculquicondor commented Jun 19, 2024

With the refactor and documentation PRs merged, can you now propose an update to the KEP to add custom jobs support?

@mszadkow mszadkow force-pushed the feature/multikueue-support-for-custom-jobs branch from 6333bb8 to ee5a798 Compare June 19, 2024 20:17
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jun 19, 2024
@mszadkow
Copy link
Contributor Author

With the refactor and documentation PRs merged, can you now propose an update to the KEP to add custom jobs support?

Yes, I am working on KEP now.

@mszadkow
Copy link
Contributor Author

/retest

@k8s-ci-robot
Copy link
Contributor

@mszadkow: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@trasc
Copy link
Contributor

trasc commented Jun 20, 2024

/ok-to-test
/test-all

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 20, 2024
}
}

// WithAdapters - sets all the MultiKueue adaptors.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

}
}

// WithAdapters - sets or updates the adadpter of the MultiKueue adaptors.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix comment

@mszadkow
Copy link
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 29, 2024
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mszadkow
Copy link
Contributor Author

mszadkow commented Jul 9, 2024

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 9, 2024
@k8s-ci-robot
Copy link
Contributor

@mszadkow: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kueue-test-e2e-main-1-31 c1e4244 link true /test pull-kueue-test-e2e-main-1-31
pull-kueue-test-tas-e2e-main c1e4244 link true /test pull-kueue-test-tas-e2e-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide a way to implement a MultiKueue support for the custom Jobs
6 participants