-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide a way to implement a MultiKueue support for the custom Jobs #2349
Comments
+1 We've had requests to support kubeflow TFJobs |
I will try this issue after I finish #2175. |
We should design this to support external integrations as well. |
Yes, we can do it. But I think starting things as a minimum required is better. So, I want to take the external integrations as a separate issue. |
For other jobs with built-in support in kueue it is fairly easy to do. For fully external implementation it might be a bit harder, the multikueue admission check controller is composed of 3 object controllers (reconcilers for workload, admissionchecks and multikueue clusters) and some long lived clients with watchers connected to the worker clusters. It's likely that the MulikueueCluster API needs to be modified to ensure that one multikueue cluster is only managed by one admission check controller. For now I wold suggest to do the builtin jobs support and create a follow-up for the external usage. |
/assign |
@trasc This issue motivation is supporting in-house/external jobs. So, if you work only on built-in jobs like TFJob, could you open another issue? |
@trasc Thank you. I'm guessing that we can work on this after #2350 is done because we will implement a mechanism to support more bilt-in Jobs in #2350, right? |
I synced supporting MultiKueue for the custom Jobs with @trasc.
In #2350, @trasc tries the first phase, and then he will try the second phase in this issue. |
/assign |
That sounds odd. External controllers shouldn't have to re-implement the admission check. They should only implement how to sync jobs. If those two things are tightly coupled, they should be decoupled. @trasc can you actually write a short description of the components involved in MultiKueue and put it in https://github.com/kubernetes-sigs/kueue/tree/main/keps/693-multikueue |
@alculquicondor @trasc TBH, as an external contributor (non Googler), I'm feeling that the MultiKueue specifications is not clearing. So, before we apply the something enhancements into the MultiKueue, shouldn't we update the KEP to align with the actual implementations and behavior? After that, can we add new enhancements to the MultiKueue? |
Hi @tenzen-y , indeed the KEP does not contain all the implementation details, but I don't think it should, and I don't see any significant deviations from the design. To be specific kueue/apis/kueue/v1alpha1/multikueue_types.go Lines 112 to 118 in 8b217c6
and used:
|
I meant the MultiKueueCluster.
That is user documentation, so I still think updating the MultiKueue proposal would be worth it since clarifying the internal mechanism would be worth it. Even if some maintainers step down from this project, this Kueue project continue to be the journey. Actually, I have some experience in the unclear internal specifications brought OSS projects bugs and challenging maintenance. Additionally, the latest KEP (not outdated) allows us to clear the place where the rabbit hole is and could reduce the risks of generating traps by being reviewed by multiple contributors. |
Personally, I think this is a distinguished feature which justifies a dedicated KEP (even though I would be acceptive of MultiKueue KEP update). I would like to see the chosen approach outlined, and indicated alternatives, along with pros and cons. Additionally, the KEP process creates space for early feedback from users who are not code reviewers. My ask is motivated by difficulty understanding design choices taken in the pending PRs (#2373 and #2405). Getting the decisions discussed and agreed early on could avoid the confusion, and potentially dropping a significant amount of code in case we decide to change the approach. |
kueue/apis/kueue/v1alpha1/multikueue_types.go Lines 57 to 85 in 8b217c6
|
The KEP cannot cover all the implementation details, The details in all the KEPs are limited, take the provisioning admission check controller , a very similar component, there is no mentioning of it's internal components. |
I synced with @trasc offline. We agreed that there are quite a few mini-components in MultiKueue that it is a bit hard to keep track of as an outsider. Traian will update the KEP with those design elements, spending about 1 paragraph for each mini-component. I agree that most features don't require such level of detail, but the complexity in MultiKueue is a bit higher. |
#2373 looks like a simple refactoring overall to me. I'm ok merging it and have the documentation match those changes. |
Actually, I synced with @trasc regarding to MultiKueue previous KEP, In conclusion, I agree with the current proposal, basically. |
I'm sorry in advance if I ask stupid questions, but I still have some difficulty understanding the approach which I see continues to be implemented in #2458. Specifically:
Can we at least consider a design where we still have a single MultiKueue admission check, but the custom jobs adapters are registered to it. I think it will make life on OnCall easier to have a single MutliKueue admission check that can be quickly identified by its name rather than a zoo of AdmissionChecks. It would also make admin users life easier to have a global configuration for the MK AC plugins, rather than doing this per ClusterQueue. |
We are preparing a KEP for that. |
Ok great! |
That sounds great to me. I will review the submitted design soon. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
@mszadkow Is there any progressing on your KEP? |
We will probably postpone the KEP effort until after MK goes beta |
That makes sense. The KEP could move forward during the MK beta graduation if @mszadkow has time. |
This is blocked by
spec.managedBy
field. #2331What would you like to be added:
I would like to support the extension mechanism so that we can implement the MultiKueue controller for the custom / in-house Jobs. I'm thinking of exposing the
jobAdapter
interface and implementing a mechanism to add arbitrary objects to theadapters
similar to the jobframework integration manager:adapters
:kueue/pkg/controller/admissionchecks/multikueue/workload.go
Lines 48 to 51 in e461fe0
jobAdapters
interface:kueue/pkg/controller/admissionchecks/multikueue/workload.go
Lines 66 to 81 in e461fe0
jobframework integration manager
: https://github.com/kubernetes-sigs/kueue/blob/e461fe0827786e5cd6f45ff6739ebeed9a700b05/pkg/controller/jobframework/integrationmanager.goWhy is this needed:
In many company, they have company specific CustomResources, and we should give a possibility to manage such CustomResources across multiple clusters.
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.
The text was updated successfully, but these errors were encountered: