
Feature request: Support more Workload kinds: argo rollouts? #397

Closed

thomas-riccardi opened this issue Jul 24, 2023 · 7 comments

Labels: priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.) · type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design.)

@thomas-riccardi

Expected Behavior

Support custom workload kinds, such as Argo Rollouts.

Actual Behavior

The AuthProxyWorkload "xxx" is invalid: spec.workload.kind: Invalid value: "Rollout": Kind was "Rollout", must be one of CronJob, Job, StatefulSet, Deployment, DaemonSet or Pod

Steps to Reproduce the Problem

  1. spec.workload.kind: Rollout
  2. kubectl apply
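
For reference, a minimal manifest along these lines triggers the error (a sketch: the apiVersion and the instances block are placeholders; only the spec.workload path, which matches the validation message above, matters here):

```yaml
apiVersion: cloudsql.cloud.google.com/v1  # assumed API group/version
kind: AuthProxyWorkload
metadata:
  name: rollout-proxy          # hypothetical name
spec:
  workload:
    kind: Rollout              # rejected: not one of the supported kinds
    name: my-rollout           # hypothetical Argo Rollout name
  instances:
    - connectionString: "my-project:my-region:my-instance"  # placeholder
```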

Specifications

  • Version: 1.0.2
  • Platform: GKE 1.24

more

I also tried with kind: Pod & selector.matchLabels, but it doesn't seem to work: the Pod does get the cloudsql.cloud.google.com/vesta-web: 1,gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.3.0 annotation, but no extra container is added...
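
The AuthProxyWorkload I used looked roughly like this (a sketch: the label and name are placeholders, and the selector placement follows the spec.workload path from the validation error above):

```yaml
apiVersion: cloudsql.cloud.google.com/v1  # assumed API group/version
kind: AuthProxyWorkload
metadata:
  name: vesta-web-proxy        # hypothetical name
spec:
  workload:
    kind: Pod
    selector:
      matchLabels:
        app: vesta-web         # placeholder label on our pods
  instances:
    - connectionString: "my-project:my-region:my-instance"  # placeholder
```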

I also tried with kind: Deployment & name, because our argo Rollout uses a Deployment to define the pod spec, but I get the same result as with kind: Pod.

even more

When used on a standard Deployment, the pod spec is not changed except for the annotation.
I initially assumed the operator would patch the pod spec to inject the extra container there. But it seems to do it later, only on the Pod.
Is there a reason for that? It is not documented why; it was implemented in #41.
(It seems to theoretically increase the load on the operator, and raises High Availability questions.)

@enocom enocom added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Jul 25, 2023
@enocom
Member

enocom commented Jul 25, 2023

Thanks for the feature request. This seems like a high value / low cost feature. We'll work on getting it into our upcoming iterations.

@thomas-riccardi
Author

@enocom Should I open a second issue for kind: Pod + selector.matchLabels not working as I expected? (Should I expect something else? Is it a bug? Or did I configure something wrong?)

@thomas-riccardi
Author

If Argo Rollouts support is added, how will it interact with the rollout strategy?
https://github.com/GoogleCloudPlatform/cloud-sql-proxy-operator/blob/main/docs/api.md#authproxycontainerspec

The API doc says it will follow the strategy set on that workload, but when testing the operator I found that only the Pods are modified, not the workload's pod template spec, so it seems hard to guarantee the rollout strategy of unknown workload controllers that way.
Conversely, patching the workload's pod template spec would get us that for free.

(Side question: if we set rolloutStrategy to None, will only new, independently created pods get the new proxy?)

@enocom
Member

enocom commented Jul 25, 2023

Let me dig into some of the questions and get back to you. Big picture: Argo is widely used and should work seamlessly with the operator.

@hessjcg
Collaborator

hessjcg commented Jul 26, 2023

Hi @thomas-riccardi, thank you for taking the time to write. You have raised some interesting questions. I hope these answers help:

If we set rolloutStrategy to None, will only new (independently created) pods get the new proxy?

Yes. I wrote a longer explanation of rolloutStrategy below.

Are any additional workloads supported?

Not yet. The operator only supports these built-in workload types: Pod, ReplicaSet, Deployment, DaemonSet, Job, and CronJob. We would consider adding support for additional, widely used workload types like Argo Rollouts or Knative Functions. Please let us know what additional workload types the operator should support.

Is there a bug in kind: Pod + selector.matchLabels?

It is important to remember that once a pod is created, you cannot modify its PodSpec. When you create an AuthProxyWorkload using kind: Pod + selector.matchLabels, the operator should add the proxy container to pods created after the AuthProxyWorkload. But the operator cannot add a proxy container to pods that existed before the AuthProxyWorkload was created; it will only add the annotation to pre-existing pods. If you are seeing different behavior when you use Pod + selector.matchLabels, please file a bug.

Why modify the pod instead of the workload's PodSpec?

Thank you, I have been waiting for someone to ask this question. The operator adds the proxy container to Pods during pod creation, rather than modifying the PodSpec of a workload. We spent quite some time debating between these two ways of adding a container to a pod. Ultimately, we decided that the operator would add the container to the pod, not the PodSpec, for a few reasons:

  • Job does not allow modifications to its PodSpec. But when a job creates a new pod, the user expects that pod to get the latest proxy configuration. The only way to achieve this is to add the container to the pod.
  • This behavior is more compatible with CI/CD tools like ArgoCD, Helm, or Spinnaker. The operator should add the sidecar container, but it should not cause a change that triggers the CI/CD tool to detect the unexpected addition of the proxy container to a workload's PodSpec and then overwrite it.
  • This behavior is in line with how Istio and other operators that add sidecars behave. We took that as external confirmation that CI/CD tools play nice with this approach.

How does rolloutStrategy work?

Kubernetes only permits modification of the PodSpec on some of the built-in workload types: Deployment, DaemonSet, and CronJob. Here's what happens when you modify an AuthProxyWorkload when the rolloutStrategy is set to Workload:

  • The operator receives an event from the Kubernetes API.
  • The operator finds matching workloads, determines if the kind allows PodSpec modification, and then updates the annotation on that workload's pod template to include the updated "generation" of the AuthProxyWorkload (sketched below).
  • Then, the built-in Kubernetes controllers take over, stopping existing pods and creating new ones.
    • For Deployments and DaemonSets, the built-in controller will notice that the PodSpec has changed, and then follow the workload's replica count and minimum-availability settings.
    • For CronJobs, the next scheduled job will get a proxy container with the updated AuthProxyWorkload configuration.
  • Finally, the operator's webhook adds the proxy container to pods as they are created.
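
Illustratively, after the annotation update a Deployment's pod template carries something like the fragment below. This is a hypothetical sketch: the annotation key and the "generation,image" value format are inferred from the annotation quoted earlier in this thread, not from the operator's documentation.

```yaml
# Hypothetical Deployment fragment after the operator bumps the
# pod template annotation (key derived from the AuthProxyWorkload
# name; value format inferred from this thread).
spec:
  template:
    metadata:
      annotations:
        cloudsql.cloud.google.com/vesta-web: "2,gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.3.0"
```

Because the template changed, the Deployment controller starts a normal rolling update, and each replacement pod passes through the webhook and receives the proxy container.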

The other built-in workload types do not allow their PodSpec to be modified, so the operator cannot trigger a rollout when the AuthProxyWorkload resource changes.

When rolloutStrategy is set to "None", the operator will not attempt to trigger a rollout on a Deployment or DaemonSet when the AuthProxyWorkload changes. This is a good choice when you need to carefully orchestrate the replacement of workload pods.
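
For example, assuming the field sits under authProxyContainer as in the AuthProxyContainerSpec doc linked above, opting out of automatic rollouts is a one-line setting (a sketch, not a complete resource):

```yaml
spec:
  # ... workload selector and instances as usual ...
  authProxyContainer:          # placement assumed from the API doc
    rolloutStrategy: None      # never trigger a rollout when this resource changes
```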

@thomas-riccardi
Author

Thank you @hessjcg for the detailed explanation; I was looking for such documentation and didn't find it.

Kind: Pod

OK, it makes total sense for pre-existing Pods. I can confirm it works as expected: new Pods are patched.

Workload support

=> It would be useful to us to get support for Argo Rollouts workloads.

In the meantime, I think we will use Pod + selector + rolloutStrategy: None.

Why modify the pod instead of the workload's PodSpec? & rolloutStrategy

(For Jobs it makes sense: it's a workload kind that does not allow PodSpec edits; but for the other kinds it could make sense.
I would assume Server-Side Apply would handle conflicts with Argo CD & co.
The Istio inspiration is indeed a strong argument.)

I wrongly assumed that the operator would not use the workload PodSpec update features (since it didn't patch the containers); in fact it does, but minimally: via pod template annotations.

=> Everything makes sense to me now!
(IMO it would help if the actual behavior were documented somewhere; this issue is a start!)

@hessjcg
Collaborator

hessjcg commented Aug 2, 2023

Thomas, I'm glad you found a solution that works for now. Thanks for asking such good questions.

I created a new issue to track feedback for additional workload types, #399. I also created an issue to add the explanation to the docs, #400.
