-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GetMultiKueueAdapters() returns only registered and enabled adapters #3603
base: main
Are you sure you want to change the base?
GetMultiKueueAdapters() returns only registered and enabled adapters #3603
Conversation
Skipping CI for Draft Pull Request. |
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
/ok-to-test |
c80ad1d
to
0b07501
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mszadkow The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/assign @mbobrovskyi |
adapters, err := jobframework.GetMultiKueueAdapters(sets.New([]string{ | ||
"batch/job", "kubeflow.org/mpijob", "ray.io/rayjob", "ray.io/raycluster", | ||
"jobset.x-k8s.io/jobset", "kubeflow.org/mxjob", "kubeflow.org/paddlejob", | ||
"kubeflow.org/pytorchjob", "kubeflow.org/tfjob", "kubeflow.org/xgboostjob"}...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adapters, err := jobframework.GetMultiKueueAdapters(sets.New([]string{ | |
"batch/job", "kubeflow.org/mpijob", "ray.io/rayjob", "ray.io/raycluster", | |
"jobset.x-k8s.io/jobset", "kubeflow.org/mxjob", "kubeflow.org/paddlejob", | |
"kubeflow.org/pytorchjob", "kubeflow.org/tfjob", "kubeflow.org/xgboostjob"}...)) | |
adapters, err := jobframework.GetMultiKueueAdapters(sets.New[string]( | |
"batch/job", "kubeflow.org/mpijob", "ray.io/rayjob", "ray.io/raycluster", | |
"jobset.x-k8s.io/jobset", "kubeflow.org/mxjob", "kubeflow.org/paddlejob", | |
"kubeflow.org/pytorchjob", "kubeflow.org/tfjob", "kubeflow.org/xgboostjob")) |
Also the slice of enabled integrations can be an argument to this function, whit this you can move the tests from "./failedsetup" , in this suite (but sure in their own var _ = ginkgo.Describe(
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am taking the nit, but actually the difference comes with installed CRDs.
I know it's kinda administrator responsibility to make sure CRDs are there, but we would like to document the behaviour when the CRD is missing - that what I think is the reasoning here.
cmd/kueue/main.go
Outdated
@@ -264,10 +265,16 @@ func setupControllers(ctx context.Context, mgr ctrl.Manager, cCache *cache.Cache | |||
} | |||
|
|||
if features.Enabled(features.MultiKueue) { | |||
adapters, err := jobframework.GetMultiKueueAdapters(sets.New(cfg.Integrations.Frameworks...)) | |||
if err != nil { | |||
setupLog.Error(err, "Could not setup MultiKueue controller") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setupLog.Error(err, "Could not setup MultiKueue controller") | |
setupLog.Error(err, "Could not get the enabled multikueue adapters") |
enabledIntegrations := sets.New([]string{"batch/job"}...) | ||
adapters, _ := jobframework.GetMultiKueueAdapters(enabledIntegrations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enabledIntegrations := sets.New([]string{"batch/job"}...) | |
adapters, _ := jobframework.GetMultiKueueAdapters(enabledIntegrations) | |
adapters, _ := jobframework.GetMultiKueueAdapters(sets.New("batch/job")) |
The same goes in a couple of other places.
@@ -51,6 +52,7 @@ var ( | |||
) | |||
|
|||
func TestWlReconcile(t *testing.T) { | |||
t.Cleanup(jobframework.EnableIntegrationsForTest(t, "batch/job")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed indeed
util.ExpectObjectToBeDeleted(managerTestCluster.ctx, managerTestCluster.client, managerMultikueueSecret1, true) | ||
}) | ||
|
||
ginkgo.It("Should not create a MPIJob workload, because MPIJob CRD is not installed", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ginkgo.It("Should not create a MPIJob workload, because MPIJob CRD is not installed", func() { | |
ginkgo.It("Should not create a MPIJob workload, when MPIJob adapter is not enabled", func() { |
gomega.Expect(apimeta.IsNoMatchError(err)).To(gomega.BeTrue()) | ||
|
||
wlLookupKey := types.NamespacedName{Name: workloadmpijob.GetWorkloadNameForMPIJob(mpijob.Name, mpijob.UID), Namespace: managerNs.Name} | ||
ginkgo.By("workload not found", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MpiJob integration should be enabled, and we should check that it's workload get's rejected by multikueue die to missing adapter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another test imho, equally important
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The workload associated to an MPIJob is created by the MPIJob integration, agnostic to multikueue and multikueue configuration.
for MK behavior in case of disabled integration
0b07501
to
8d915b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall lgtm, but I would like yet to add the "positive" integration test to clearly demonstrate the issue is fixed
util.ExpectObjectToBeDeleted(managerTestCluster.ctx, managerTestCluster.client, managerMultikueueSecret1, true) | ||
}) | ||
|
||
ginkgo.It("Should not create a MPIJob workload, when MPIJob adapter is not enabled", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test looks ok, but I think we need to add a test case for using the positive scenario (which was broken), that some integrations can be used (e.g. batch/Job) while not all are enabled, maybe "should create batch/Job workload when some frameworks are disabled"
@@ -1638,6 +1644,157 @@ var _ = ginkgo.Describe("Multikueue no GC", ginkgo.Ordered, ginkgo.ContinueOnFai | |||
}) | |||
}) | |||
|
|||
var _ = ginkgo.Describe("Multikueue with disabled integrations", ginkgo.Ordered, ginkgo.ContinueOnFailure, func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var _ = ginkgo.Describe("Multikueue with disabled integrations", ginkgo.Ordered, ginkgo.ContinueOnFailure, func() { | |
var _ = ginkgo.Describe("Multikueue with some integration disabled", ginkgo.Ordered, ginkgo.ContinueOnFailure, func() { |
What type of PR is this?
/kind bug
What this PR does / why we need it:
MK adapter didn't contain the condition to only allow enabled integrations.
Thus while using multi cluster setup, all registered MK adapters were used to setup controllers.
In result we get
ClientConnectionFailed error with no matches for kind "XYZ" in "version v1" message
Which issue(s) this PR fixes:
Fixes #3582
Special notes for your reviewer:
Does this PR introduce a user-facing change?