Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Possible Regression] A rollout object that uses workloadRef does not transparently pass through template (Deployment reference) to the notification webhook templates #3536

Open
2 tasks done
Nexarian opened this issue Apr 22, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@Nexarian
Copy link

Nexarian commented Apr 22, 2024

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug

The bug is identical to the closed report here. In summary: For Rollout objects that use WorkloadRef rather than directly inline the deployment manifest, referencing the template object in argo-rollouts-notification-configmap

The above issue was supposed to fix this, so one of the following is therefore true:

  • This is a regression since then.
  • The original issue was never truly fixed (note that the last comment on it is to this effect).
  • I am doing something with my specific Rollout/Webhook/Notification configuration that uniquely breaks this scenario.

To Reproduce

Create rollout:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: test-deployment-service
  annotations:
    notifications.argoproj.io/subscribe.test-service.test-service: ""
spec:
  replicas: 2
  revisionHistoryLimit: 2
  selector:
    matchLabels:
        ...
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-deployment-service

Create webhook:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argo-rollouts-notification-configmap
  labels:
    app.kubernetes.io/instance: test-deployment-service
data:
  template.test-service: |
    webhook:
      test-service:
        method: POST
        body: |
          {
            "data":  "{{ .rollout.spec.template }}"
          }
  trigger.test-service: |
    - description: "When to trigger the test service"
      when: rollout.status.blueGreen.activeSelector == rollout.status.blueGreen.previewSelector
      send:
      - test-service
  service.webhook.test-service: |
    url: [URL of service to call]
    headers:
    - name: accept
      value: application/json
    - name: Content-Type
      value: application/json
    - name: X-API-KEY
      value: $service-token

Deploy rollout with a simple deployment. This was largely tested with a "hello world" style service built using FastAPI.

When the webhook triggers, the data in the template object is template:map[metadata:map[creationTimestamp:<nil>] or simple "no value" in the Argo Rollouts logs.

Expected behavior

For this not to be an empty map with no value. The real thing I'm after is "{{ .rollout.spec.template.spec.containers 0).image }}" but that can't work if .rollout.spec.template doesn't resolve.

Version

Argo Rollouts controller: v1.6.6
Helm Chart: https://github.com/argoproj/argo-helm/blob/argo-rollouts-2.35.1/charts/argo-rollouts

Incidental Information
A few more details may help reproduce this that I'm not 100% sure are relevant:

  • I have --self-service-notification-enabled turned on for my Argo Rollouts Deployment
  • As you may note above, I am using a service token that is stored in argo-rollouts-notification-secret

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@Nexarian Nexarian added the bug Something isn't working label Apr 22, 2024
@Nexarian
Copy link
Author

@meeech Can you share more about your use-case, and share manifests that will help them reproduce?

@meeech
Copy link
Contributor

meeech commented May 1, 2024

@Nexarian I will see what I can do re: simple repro. I agree with you looks like a regression, since what you describe should work according to:

https://github.com/argoproj/argo-rollouts/blob/v1.6.6/utils/record/record.go#L345-L350

	// The JSON marshalling above drops the `spec.template` and `spec.selectors` fields if the rollout
	// is using workload referencing. The following restores those fields in the returned object map
	// so that notification templates can refer to them (as if workload ref was not used).

Which implies to me you should be seeing the template.

@Nexarian
Copy link
Author

Nexarian commented May 1, 2024

@meeech So after chatting with @zachaller, it looks like this isn't so much a regression as something that "looks like" a regression. The original bug was fixed for built-in event triggers (such as on-rollout-complete), but not for customized when blocks, which is what I'm using here.

I could probably switch to using built in event triggers for my use-cases, but that doesn't change this is still a bug.

@zachaller Was thinking it might need to be something fixed here: https://github.com/argoproj/argo-rollouts/blob/master/utils/record/record.go#L313

Here's my investigation from the Slack thread, which I've @'ed you in:

I spent a couple hours digging into this already. My current dev workflow is to simply run "make image" and push it to our private repository, and then I have an experimental EKS cluster where I kill the Argo Rollouts pods (triggering a pull of the updated image), and it immediately tries to fire the webhook. I've been putting logging statements everywhere in order to see what's going on.
I've found the following:
The code from the original fix for this issue never fires, because if you look above, if len(destinations) == 0 { is triggering. The webhook that I've defined, for some reason, isn't following this code path... But that's weird because there ARE logging statements showing that it is trying to to trigger the webhook
Ok, so it might be because I have --self-service-notification-enabled turned on. Is it possible, nay, likely, that the combination of these two things was never tested? Probably. So let's dig into that code path:

notificationcontroller.WithToUnstructured(func(obj metav1.Object) (*unstructured.Unstructured, error) {

I've tried making toObjectMap (the function that's supposed to fix this) public and calling it from the function passed to WithToUnstructured, but several things:
This conversion object.(*v1alpha1.Rollout) is failing, and I'm not sure why. ok is false and ro is nil . This is a hard issue to debug, because a single error in any of the multitudes of fields could cause the conversion to fail, and golang doesn't report to you what went wrong. Only yes/no. Diving into this and figuring out what is mis-aligned could take a very long time!
toObjectMap re-adds ro.Spec.Template, but that is also an empty map in the pre-converted object. I was going to start trying to figure out where the template object is populated from WorkloadRef to see if that is in fact being imported correctly such that this can be used properly.
I think what does that conversion is called an Informer?
I'm also desperately trying to not get sucked into having to ALSO load and modify the core notifications codebase, partially because I don't know how to compile changes to Argo Rollouts AND the Notifications engine into a single image from my dev workstation.
This is where my investigation ends. I'm not sure the original fix ever worked?
I spent a couple hours digging into this already. My current dev workflow is to simply run "make image" and push it to our private repository, and then I have an experimental EKS cluster where I kill the Argo Rollouts pods (triggering a pull of the updated image), and it immediately tries to fire the webhook. I've been putting logging statements everywhere in order to see what's going on.
I've found the following:
The code from the original fix for this issue never fires, because if you look above, if len(destinations) == 0 { is triggering. The webhook that I've defined, for some reason, isn't following this code path... But that's weird because there ARE logging statements showing that it is trying to to trigger the webhook
Ok, so it might be because I have --self-service-notification-enabled turned on. Is it possible, nay, likely, that the combination of these two things was never tested? Probably. So let's dig into that code path:
notificationcontroller.WithToUnstructured(func(obj metav1.Object) (*unstructured.Unstructured, error) {

I've tried making toObjectMap (the function that's supposed to fix this) public and calling it from the function passed to WithToUnstructured, but several things:
This conversion object.(*v1alpha1.Rollout) is failing, and I'm not sure why. ok is false and ro is nil . This is a hard issue to debug, because a single error in any of the multitudes of fields could cause the conversion to fail, and golang doesn't report to you what went wrong. Only yes/no. Diving into this and figuring out what is mis-aligned could take a very long time!
toObjectMap re-adds ro.Spec.Template, but that is also an empty map in the pre-converted object. I was going to start trying to figure out where the template object is populated from WorkloadRef to see if that is in fact being imported correctly such that this can be used properly.
I think what does that conversion is called an Informer?
I'm also desperately trying to not get sucked into having to ALSO load and modify the core notifications codebase, partially because I don't know how to compile changes to Argo Rollouts AND the Notifications engine into a single image from my dev workstation.
This is where my investigation ends. I'm not sure the original fix ever worked?

@meeech
Copy link
Contributor

meeech commented May 1, 2024

@Nexarian thanks for that update. I feel your pain. I'm working on learning a bit better how the informers are working. i suspect those are playing a role in this other issue I'm trying to look into (how the rollout passed to the notif is the old, not latest/updated) so def feel your pain. // As an aside, I would be happy to pair to share how I'm running AR locally - so you don't have to deal with recompiling to trace this down (you can step into the notification code as well) - ping me in slack if you interested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants