Add option to disable strict pod spec validation #3540

Open · 2 of 3 tasks
LucasZanellaMBTI opened this issue Nov 15, 2024 · 6 comments

Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

LucasZanellaMBTI commented Nov 15, 2024

What would you like to be added:

We need a way to influence the strictness level of Kueue's Pod spec validation.

Why is this needed:

It turns out Kueue validates the Pod spec differently than plain K8s does.
In our case, workloads have been blocked because env vars were duplicated in the Pod spec.
Because many components are involved in creating and manipulating the Pod spec,
we cannot guarantee that env var keys are unique.

Plain K8s wouldn't complain about duplicated env vars. This is why we think it would be
helpful to offer more control over Kueue's validation mechanism.

This could also be relevant to resource types other than Pods.
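For illustration, a minimal reproduction sketch (namespace, image, and names are placeholders, not from our actual setup): plain K8s accepts a Pod like the following even though DUPLICATE_ENV appears twice, while the Workload that Kueue generates for it is rejected.

package main

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    cs := kubernetes.NewForConfigOrDie(cfg)

    pod := &corev1.Pod{
        ObjectMeta: metav1.ObjectMeta{Name: "reproduce-env-bug", Namespace: "default"},
        Spec: corev1.PodSpec{
            RestartPolicy: corev1.RestartPolicyNever,
            Containers: []corev1.Container{{
                Name:  "main",
                Image: "busybox",
                Env: []corev1.EnvVar{
                    {Name: "DUPLICATE_ENV", Value: "a"},
                    {Name: "DUPLICATE_ENV", Value: "b"}, // duplicate name; plain K8s accepts this
                },
            }},
        },
    }

    // Succeeds against a vanilla cluster; in practice the kubelet uses the
    // last value for a duplicated env var name.
    if _, err := cs.CoreV1().Pods("default").Create(context.Background(), pod, metav1.CreateOptions{}); err != nil {
        panic(err)
    }
}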

Log of kueue controller v0.8.1:

  {
    "level": "error",
    "ts": "2024-11-14T17:56:56.563529963Z",
    "caller": "controller/controller.go:329",
    "msg": "Reconciler error",
    "controller": "v1_pod",
    "namespace": "",
    "name": "",
    "reconcileID": "",
    "error": "Workload.kueue.x-k8s.io \"pod-reproduce-env-bug-o5ymx1-n0-0-24899\" is invalid: spec.podSets[0].template.spec.containers[0].env[2]: Duplicate value: map[string]interface {}{\"name\":\"DUPLICATE_ENV\"}",
    "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"
  }

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

Lucas Zanella [email protected], Mercedes-Benz Tech Innovation GmbH, imprint

@LucasZanellaMBTI LucasZanellaMBTI added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 15, 2024
mimowo (Contributor) commented Nov 15, 2024

I believe this is not about Kueue validation but an underlying bug in SSA (Server-Side Apply) that prevents updates when fields are duplicated. For reference: kubernetes/kubernetes#113482.

Given that the SSA bug has existed for 2 years now, I think it is unlikely to be resolved soon, so I would suggest considering withdrawing from SSA in Kueue. This is what we did for PodFailurePolicy in core k8s: kubernetes/kubernetes#121103.

I guess we could maintain two modes in Kueue behind a feature gate like UseSSAForWorkload, in case someone needs it. WDYT @tenzen-y?
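A rough sketch of the dual-mode idea (the gate wiring, function, and package placement are placeholders, not existing Kueue code):

package workload // illustrative placement only

import (
    "context"

    "sigs.k8s.io/controller-runtime/pkg/client"
    kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

// useSSAForWorkload stands in for the proposed UseSSAForWorkload gate; a real
// change would wire it through Kueue's feature-gate machinery.
var useSSAForWorkload = true

// writeWorkload sketches the two write modes. With SSA, duplicated keys in
// list-map fields (such as env) trip kubernetes/kubernetes#113482; a plain
// Update avoids the managed-fields merge entirely.
func writeWorkload(ctx context.Context, c client.Client, wl *kueue.Workload) error {
    if useSSAForWorkload {
        return c.Patch(ctx, wl, client.Apply,
            client.FieldOwner("kueue"), client.ForceOwnership)
    }
    return c.Update(ctx, wl)
}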

mbobrovskyi (Contributor) commented Nov 18, 2024

> I believe this is not about Kueue validation but an underlying bug in SSA (Server-Side Apply) that prevents updates when fields are duplicated. For reference: kubernetes/kubernetes#113482.
>
> Given that the SSA bug has existed for 2 years now, I think it is unlikely to be resolved soon, so I would suggest considering withdrawing from SSA in Kueue. This is what we did for PodFailurePolicy in core k8s: kubernetes/kubernetes#121103.
>
> I guess we could maintain two modes in Kueue behind a feature gate like UseSSAForWorkload, in case someone needs it. WDYT @tenzen-y?

The problem occurs when creating the workload, but not when patching it.

// Workload creation in the Kueue reconciler; this Create call is what fails
// when the Pod template contains a duplicated env var name.
if err = r.client.Create(ctx, wl); err != nil {
    return err
}

mimowo (Contributor) commented Nov 18, 2024

@mbobrovskyi can you check which layer rejects the creation? Is this validation in Kueue? If the validation is in Kueue, I suppose it might have been added to prevent later failures due to SSA for patches.

mbobrovskyi (Contributor) commented Nov 18, 2024

> @mbobrovskyi can you check which layer rejects the creation? Is this validation in Kueue?

No, it looks like k8s validation.
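Most likely this is the apiserver's structural-schema validation of the Workload CRD, where env is declared as a map-list keyed by name, so duplicates are rejected on write even though the native Pod API tolerates them. A minimal check sketch (queue and object names are placeholders) that creates a Workload directly, so no Kueue webhook or reconciler is involved:

package main

import (
    "context"
    "fmt"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/client/config"
    kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

func main() {
    scheme := runtime.NewScheme()
    _ = kueue.AddToScheme(scheme)

    c, err := client.New(config.GetConfigOrDie(), client.Options{Scheme: scheme})
    if err != nil {
        panic(err)
    }

    // Workload with a duplicated env var name in its PodSet template.
    wl := &kueue.Workload{
        ObjectMeta: metav1.ObjectMeta{Name: "dup-env-check", Namespace: "default"},
        Spec: kueue.WorkloadSpec{
            QueueName: "user-queue", // placeholder LocalQueue name
            PodSets: []kueue.PodSet{{
                Name:  "main",
                Count: 1,
                Template: corev1.PodTemplateSpec{
                    Spec: corev1.PodSpec{
                        RestartPolicy: corev1.RestartPolicyNever,
                        Containers: []corev1.Container{{
                            Name:  "c",
                            Image: "busybox",
                            Env: []corev1.EnvVar{
                                {Name: "DUPLICATE", Value: "a"},
                                {Name: "DUPLICATE", Value: "b"},
                            },
                        }},
                    },
                },
            }},
        },
    }

    // Expected rejection straight from the apiserver:
    //   spec.podSets[0].template.spec.containers[0].env[1]: Duplicate value: ...
    if err := c.Create(context.Background(), wl); err != nil {
        fmt.Println("create rejected:", err)
    }
}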

mimowo (Contributor) commented Nov 18, 2024

Interesting, the error message in the description is from controller/controller.go:329, and the message contains Workload.kueue.x-k8s.io "pod-reproduce-env-bug-o5ymx1-n0-0-24899", which suggests that the object already exists.

mbobrovskyi (Contributor) commented Nov 18, 2024

I tried it with creation, but I believe we have the same issue with update.

{"level":"Level(-2)","ts":"2024-11-18T05:35:30.888806634Z","caller":"jobframework/reconciler.go:332","msg":"Reconciling Job","controller":"v1_pod","namespace":"default","name":"kueue-sleep-bswbh","reconcileID":"541c2f22-019a-4285-a710-5a7f28c0c0ba","job":"default/kueue-sleep-bswbh","gvk":"/v1, Kind=Pod"}
{"level":"error","ts":"2024-11-18T05:35:30.902655051Z","caller":"jobframework/reconciler.go:410","msg":"Handling job with no workload","controller":"v1_pod","namespace":"default","name":"kueue-sleep-bswbh","reconcileID":"541c2f22-019a-4285-a710-5a7f28c0c0ba","job":"default/kueue-sleep-bswbh","gvk":"/v1, Kind=Pod","error":"Workload.kueue.x-k8s.io \"pod-kueue-sleep-bswbh-cb334\" is invalid: spec.podSets[0].template.spec.containers[0].env[1]: Duplicate value: map[string]interface {}{\"name\":\"DUPLICATE\"}","stacktrace":"sigs.k8s.io/kueue/pkg/controller/jobframework.(*JobReconciler).ReconcileGenericJob\n\t/workspace/pkg/controller/jobframework/reconciler.go:410\nsigs.k8s.io/kueue/pkg/controller/jobs/pod.(*Reconciler).Reconcile\n\t/workspace/pkg/controller/jobs/pod/pod_controller.go:123\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224"}
{"level":"error","ts":"2024-11-18T05:35:30.902756843Z","caller":"controller/controller.go:316","msg":"Reconciler error","controller":"v1_pod","namespace":"default","name":"kueue-sleep-bswbh","reconcileID":"541c2f22-019a-4285-a710-5a7f28c0c0ba","error":"Workload.kueue.x-k8s.io \"pod-kueue-sleep-bswbh-cb334\" is invalid: spec.podSets[0].template.spec.containers[0].env[1]: Duplicate value: map[string]interface {}{\"name\":\"DUPLICATE\"}","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224"}
