Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter manageJobsWithoutQueueNames using a namespaceSelector for all integrations #3589

Open
3 tasks
dgrove-oss opened this issue Nov 18, 2024 · 6 comments
Open
3 tasks
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@dgrove-oss
Copy link
Contributor

What would you like to be added:

Add a manageJobsWithoutQueueNameNamespaceSelector to the Kueue configuration API and use it to restrict the jobs to which manageJobsWithoutQueueName is applied across all integrations.

We can then deprecate podOptions.namepaceSelector and eventually remove it from the configuration API.

Why is this needed:

As discussed in the review of #3520, because the Deployment and StatefulSet integrations are built on top of the Pod integration, they implicitly restrict the scope of manageJobsWithoutQueueName by filtering it with podOptions.namepaceSelector. This results in an irregular API that may be confusing to users.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@dgrove-oss dgrove-oss added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 18, 2024
@dgrove-oss
Copy link
Contributor Author

/assign

I will work on drafting a KEP.

@mimowo
Copy link
Contributor

mimowo commented Nov 19, 2024

cc @mwielgus @tenzen-y @mwysokin

@tenzen-y
Copy link
Member

tenzen-y commented Nov 19, 2024

IIRC, we did not introduce this mechanism in the previous discussions: #2119

Instead of introducing this knob, shouldn't we decouple the Stateful and Deployment integration from Pod integration?
Actually, the StatefulSet integration has dedicated controller, and I'm not sure the reason why we keep depending on the Pod integrations.

@dgrove-oss
Copy link
Contributor Author

dgrove-oss commented Nov 19, 2024

This was discussed before (#2119), but what changed since then is the addition of the Deployment and StatefulSet integrations.

Deployment and StatefulSet have the same fundamental issue with manageJobsWithoutQueueNames as Pod (independent of how we choose to implement the integrations). If manageJobsWithoutQueueNames is true and you don't restrict its scope somehow things will break because system namespaces that use Deployments or StatefulSets will unexpectedly be suspended, breaking the system's functionality.

So, the proposal is to put together a KEP to see what it would take to have a uniform API to modulate manageJobsWithoutQueueNames.

An alternative that we can discuss in the KEP is to instead deprecate and then remove manageJobsWithoutQueueNames entirely in favor of a ValidatingAdmissionPolicy. I've implemented such a policy as part of our MLBatch project (see admissionPolicy.yaml) and it was subtler than was suggested in the discussion of #2119. In particular, dealing with child Jobs requires teaching the VAP about ownership links and (at least for me) that was tricky and I'm not convinced it is less complex than the namespaceSelector option. Note that my VAP only deals with one level of ownership (because that's all I needed for our restricted use case). A general one would have to deal with crawling several links to get to the "top-level" Job that was being managed by Kueue.

@mimowo
Copy link
Contributor

mimowo commented Nov 19, 2024

Regarding the Deployments and StatefulSet via pod-based integration.

The fundamental difference between Deployments, StatefulSet and JobCRDs is that there is no suspend field ( and it is unlikely to be ever added, or it will take a year at very least to get into beta).

So, using scheduling Gates for suspending seems like a good fit, and this is already provided by the pod-based integration, so we are reusing it. I'm open to alternatives, but I think they need to be presented more holistically, and in detail, because they seem vague at the moment to me at least.

@dgrove-oss
Copy link
Contributor Author

dgrove-oss commented Nov 20, 2024

Initial draft of the KEP in #3595

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

3 participants