Incompatibility between plugin and http templates #14211

Open
3 of 4 tasks
tico24 opened this issue Feb 20, 2025 · 8 comments
Labels
area/agent Argo Agent that runs for HTTP and Plugin templates type/bug

Comments

@tico24
Member

tico24 commented Feb 20, 2025

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened? What did you expect to happen?

It seems that users cannot have plugins in their cluster and expect http templates to work.
This is both undocumented and unexpected.

Setup

  • I installed Argo Workflows from its official release manifest. Versions are documented below.
  • I added the env var as directed by the docs:
            - name: ARGO_EXECUTOR_PLUGINS
              value: "true"
  • I installed the hello-world plugin from argoproj-labs into the same namespace as the controller.
  • I ran a basic workflow with a plugin template to call that plugin. I see the plugin as a sidecar to that Pod.
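
The plugin workflow itself is not included in this report. A minimal sketch of what such a workflow could look like, assuming the installed plugin responds to a `hello` key as in the upstream hello-world example; the `generateName` is hypothetical, and the `workflows` namespace and service account are carried over from the HTTP reproduction further down:

```yaml
# Sketch only: a workflow with a single plugin template.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-plugin-   # hypothetical name
  namespace: workflows
spec:
  serviceAccountName: workflows
  entrypoint: main
  templates:
    - name: main
      plugin:
        # Assumed key; it must match whatever the installed plugin looks for.
        hello: {}
```

A workflow of this shape contains only a plugin template, so it is the case where a plugin sidecar is expected.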

Testing on 3.6.4

  • If ARGO_EXECUTOR_PLUGINS == true and there IS a plugin installed, the plugin is added as a sidecar of the http template pod AND the http template fails with no clear reason. << This is unexpected and undocumented.
    • Even more fun: I have managed to make this pass on occasion, so the failure is intermittent.
  • If ARGO_EXECUTOR_PLUGINS == true but there is no plugin installed in the cluster, http template runs fine.
  • If ARGO_EXECUTOR_PLUGINS is not set and there is no plugin installed in the cluster, http template runs fine.

Testing on 3.5.14

Initially, nothing worked and then I saw extra logs in the controller that aren't on 3.6.x:

time="2025-02-20T13:29:59.644Z" level=info msg="Updated message  -> failed to check if secret argo-workflows-agent-ca-certificates exists: secrets \"argo-workflows-agent-ca-certificates\" is forbidden: User \"system:serviceaccount:argo:argo\" cannot get resource \"secrets\" in API group \"\" in the namespace \"workflows\"" namespace=workflows workflow=http-jnn24

(argo is the namespace and serviceaccount of the controller, not the running workflow)
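
For reference, the permission named in that log line could be granted with a Role and RoleBinding along these lines. This is a minimal sketch, not the exact RBAC applied in this report: the object name `argo-controller-read-secrets` is hypothetical, while the namespaces and the service account are taken from the log message.

```yaml
# Sketch only: lets the controller's service account (argo/argo) read
# Secrets in the workflow namespace, as the 3.5.14 log message asks for.
# The name "argo-controller-read-secrets" is hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: argo-controller-read-secrets
  namespace: workflows
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: argo-controller-read-secrets
  namespace: workflows
subjects:
  - kind: ServiceAccount
    name: argo
    namespace: argo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: argo-controller-read-secrets
```

Adding `resourceNames: ["argo-workflows-agent-ca-certificates"]` to the rule would narrow the grant to the single secret named in the log.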

After addressing this with extra RBAC, my findings were the same as 3.6.4:

  • If ARGO_EXECUTOR_PLUGINS == true and there IS a plugin installed, the plugin is added as a sidecar of the http template pod AND the http template fails with no clear reason. << This is unexpected and undocumented.
    • Even more fun: I have managed to make this pass on occasion, so the failure is intermittent.
  • If ARGO_EXECUTOR_PLUGINS == true but there is no plugin installed in the cluster, http template runs fine.
  • If ARGO_EXECUTOR_PLUGINS is not set and there is no plugin installed in the cluster, http template runs fine.

Additional information

I also repeated the 3.6.4 tests with that new secrets permission added, but the results did not change.

Summary of issues

  1. I can't run http templates if there are plugins in my cluster. At a minimum the documentation needs to be updated to reflect this, but it feels like a bug.
  2. This shouldn't be intermittent. This screams "bug" to me.
  3. 3.5 needed more permissions. It's not clear why these permissions are required. An explanation is required here.
  4. I see no reason why a plugin should run when an http template is called.

I have not tested on latest

Version(s)

v3.6.4, v3.5.14

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: http-
  namespace: workflows
  annotations:
    workflows.argoproj.io/description: >-
      Demonstrates the HTTP template that executes a HTTP request.
    workflows.argoproj.io/maintainer: 'Pipekit Inc'
spec:
  serviceAccountName: workflows
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: get-example-homepage
            template: http
    - name: http
      http:
        timeoutSeconds: 20
        url: https://example.com/
        method: "GET"

Logs from the workflow controller

This is from when plugins=true and there is a plugin in the cluster on 3.6.4:

time="2025-02-20T14:09:21.803Z" level=info msg="Processing workflow" Phase= ResourceVersion=810 namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.810Z" level=info msg="Task-result reconciliation" namespace=workflows numObjs=0 workflow=http-jv7f6
time="2025-02-20T14:09:21.810Z" level=info msg="Updated phase  -> Running" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.811Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.811Z" level=info msg="was unable to obtain node for , letting display name to be nodeName" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.811Z" level=info msg="Retry node http-jv7f6 initialized Running" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.811Z" level=info msg="was unable to obtain node for , letting display name to be nodeName" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.811Z" level=info msg="Steps node http-jv7f6-2014232316 initialized Running" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.811Z" level=info msg="StepGroup node http-jv7f6-2592250198 initialized Running" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.811Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.811Z" level=info msg="Retry node http-jv7f6-438562091 initialized Running" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.812Z" level=info msg="HTTP node http-jv7f6-679105702 initialized Pending" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.812Z" level=info msg="Workflow step group node http-jv7f6-2592250198 not yet completed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.812Z" level=info msg="TaskSet Reconciliation" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.812Z" level=info msg="Creating TaskSet" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.820Z" level=info msg=reconcileAgentPod namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.826Z" level=warning msg="couldn't retrieve node for nodeName , will get nil templateDeadline" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.826Z" level=warning msg="couldn't get boundaryTemplate through nodeName " namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.837Z" level=info msg="Created Agent pod" namespace=workflows podName=http-jv7f6-1340600742-agent workflow=http-jv7f6
time="2025-02-20T14:09:21.837Z" level=info msg=updateAgentPodStatus namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:21.837Z" level=info msg=assessAgentPodStatus namespace=workflows podName=http-jv7f6-1340600742-agent
time="2025-02-20T14:09:21.862Z" level=info msg="Workflow update successful" namespace=workflows phase=Running resourceVersion=817 workflow=http-jv7f6
time="2025-02-20T14:09:22.839Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=817 namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:22.840Z" level=info msg="Task-result reconciliation" namespace=workflows numObjs=0 workflow=http-jv7f6
time="2025-02-20T14:09:22.840Z" level=info msg=updateAgentPodStatus namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:22.840Z" level=info msg=assessAgentPodStatus namespace=workflows podName=http-jv7f6-1340600742-agent
time="2025-02-20T14:09:22.840Z" level=info msg="Workflow step group node http-jv7f6-2592250198 not yet completed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:22.840Z" level=info msg="TaskSet Reconciliation" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:22.840Z" level=info msg="Creating TaskSet" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:22.845Z" level=info msg=reconcileAgentPod namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:22.845Z" level=info msg=updateAgentPodStatus namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:22.845Z" level=info msg=assessAgentPodStatus namespace=workflows podName=http-jv7f6-1340600742-agent
time="2025-02-20T14:09:23.847Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=817 namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:23.847Z" level=info msg="Task-result reconciliation" namespace=workflows numObjs=0 workflow=http-jv7f6
time="2025-02-20T14:09:23.847Z" level=info msg=updateAgentPodStatus namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:23.847Z" level=info msg=assessAgentPodStatus namespace=workflows podName=http-jv7f6-1340600742-agent
time="2025-02-20T14:09:23.848Z" level=info msg="Workflow step group node http-jv7f6-2592250198 not yet completed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:23.848Z" level=info msg="TaskSet Reconciliation" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:23.848Z" level=info msg="Creating TaskSet" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:23.852Z" level=info msg=reconcileAgentPod namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:23.852Z" level=info msg=updateAgentPodStatus namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:23.852Z" level=info msg=assessAgentPodStatus namespace=workflows podName=http-jv7f6-1340600742-agent
time="2025-02-20T14:09:25.123Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=817 namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:25.123Z" level=info msg="Task-result reconciliation" namespace=workflows numObjs=0 workflow=http-jv7f6
time="2025-02-20T14:09:25.123Z" level=info msg=updateAgentPodStatus namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:25.123Z" level=info msg=assessAgentPodStatus namespace=workflows podName=http-jv7f6-1340600742-agent
time="2025-02-20T14:09:25.123Z" level=info msg="Workflow step group node http-jv7f6-2592250198 not yet completed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:25.123Z" level=info msg="TaskSet Reconciliation" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:25.123Z" level=info msg="Creating TaskSet" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:25.128Z" level=info msg=reconcileAgentPod namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:25.128Z" level=info msg=updateAgentPodStatus namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:25.128Z" level=info msg=assessAgentPodStatus namespace=workflows podName=http-jv7f6-1340600742-agent
time="2025-02-20T14:09:25.136Z" level=info msg="Workflow update successful" namespace=workflows phase=Running resourceVersion=840 workflow=http-jv7f6
time="2025-02-20T14:09:26.132Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=840 namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.133Z" level=info msg="Task-result reconciliation" namespace=workflows numObjs=0 workflow=http-jv7f6
time="2025-02-20T14:09:26.133Z" level=info msg=updateAgentPodStatus namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.133Z" level=info msg=assessAgentPodStatus namespace=workflows podName=http-jv7f6-1340600742-agent
time="2025-02-20T14:09:26.134Z" level=info msg="Retry Policy: OnError (onFailed: false, onError true)" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.134Z" level=info msg="Node not set to be retried after status: Failed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.134Z" level=info msg="node http-jv7f6-438562091 phase Running -> Failed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.134Z" level=info msg="node http-jv7f6-438562091 message: context canceled" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.134Z" level=info msg="node http-jv7f6-438562091 finished: 2025-02-20 14:09:26.134683921 +0000 UTC" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.134Z" level=info msg="Step group node http-jv7f6-2592250198 deemed failed: child 'http-jv7f6-438562091' failed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.134Z" level=info msg="node http-jv7f6-2592250198 phase Running -> Failed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.134Z" level=info msg="node http-jv7f6-2592250198 message: child 'http-jv7f6-438562091' failed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.134Z" level=info msg="node http-jv7f6-2592250198 finished: 2025-02-20 14:09:26.134759254 +0000 UTC" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.134Z" level=info msg="step group http-jv7f6-2592250198 was unsuccessful: child 'http-jv7f6-438562091' failed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.134Z" level=info msg="Outbound nodes of http-jv7f6-438562091 is [http-jv7f6-679105702]" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.134Z" level=info msg="Outbound nodes of http-jv7f6-2014232316 is [http-jv7f6-679105702]" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.134Z" level=info msg="node http-jv7f6-2014232316 phase Running -> Failed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.134Z" level=info msg="node http-jv7f6-2014232316 message: child 'http-jv7f6-438562091' failed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.134Z" level=info msg="node http-jv7f6-2014232316 finished: 2025-02-20 14:09:26.134836837 +0000 UTC" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.135Z" level=info msg="Retry Policy: OnError (onFailed: false, onError true)" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.135Z" level=info msg="Node not set to be retried after status: Failed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.135Z" level=info msg="node http-jv7f6 phase Running -> Failed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.135Z" level=info msg="node http-jv7f6 message: child 'http-jv7f6-438562091' failed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.135Z" level=info msg="node http-jv7f6 finished: 2025-02-20 14:09:26.135065962 +0000 UTC" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.135Z" level=info msg="TaskSet Reconciliation" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.135Z" level=info msg=reconcileAgentPod namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.135Z" level=info msg="Updated phase Running -> Failed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.135Z" level=info msg="Updated message  -> child 'http-jv7f6-438562091' failed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.135Z" level=info msg="Marking workflow completed" namespace=workflows workflow=http-jv7f6
time="2025-02-20T14:09:26.143Z" level=info msg="cleaning up pod" action=deletePod key=workflows/http-jv7f6-1340600742-agent/deletePod
time="2025-02-20T14:09:26.162Z" level=info msg="Queueing Failed workflow workflows/http-jv7f6 for delete in 1h0m0s due to TTL"
time="2025-02-20T14:09:26.162Z" level=info msg="Workflow update successful" namespace=workflows phase=Failed resourceVersion=846 workflow=http-jv7f6

Logs from your workflow's wait container

There is no `wait` container.
@jswxstw
Member

jswxstw commented Feb 21, 2025

This may be similar to #12708. Did you check the status of the agent pod? Please get the status and logs of the agent pod if you can (it is deleted once the workflow completes).

In addition, can you provide the workflow details, including spec and status? According to the controller logs, the workflow that actually ran is different from the example you provided. Are workflow defaults configured in workflow-controller-configmap?

jswxstw added the problem/more information needed and area/agent labels Feb 21, 2025
@tico24
Member Author

tico24 commented Feb 21, 2025

> did you check the status of agent pod?

I am yet to see an agent pod in any of my testing.

> According to the controller logs, the actual running workflow is different from the example you provided

No it isn't.

> workflow defaults configured in

Yes, but that's not relevant.

# This file describes the config settings available in the workflow controller configmap
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  executor: |
    resources:
      requests:
        cpu: 200m
        memory: 128Mi

  mainContainer: |
    resources:
      requests:
        cpu: 20m
        memory: 20Mi

  metricsConfig: |
    enabled: false
    secure: false

  # Default values that will apply to all Workflows from this controller, unless overridden on the Workflow-level
  # See more: docs/default-workflow-specs.md
  workflowDefaults: |
    spec:
      # Time out after 1h
      activeDeadlineSeconds: 3600
      # Delete (archive) workflows after 1h
      ttlStrategy:
        secondsAfterCompletion: 3600
      volumeClaimGC:
        strategy: OnWorkflowCompletion
      podGC:
        strategy: OnPodSuccess
        deleteDelayDuration: 120s
      retryStrategy:
        retryPolicy: OnError
        limit: 3

The whole setup is here if you want to recreate: https://github.com/pipekit/talk-demos/pull/35/files#diff-f9974f30e3e6bc5db30cbefbe6af24ba6a897842664374f76a0e1804bd1c1815

@tico24
Member Author

tico24 commented Feb 21, 2025

Your argument in #12078 seems to be that plugins should only live in the same namespace as the workflow.

The documentation on this is extremely light: https://argo-workflows.readthedocs.io/en/latest/executor_plugins/#discovery with nothing to suggest that the setup I'm using is wrong.

tico24 removed the problem/more information needed label Feb 21, 2025
@jswxstw
Member

jswxstw commented Feb 21, 2025

You are also running a workflow in a non-argo namespace, just like I did before. I believe this issue is partly due to the unreasonable plugin loading mechanism and partly due to RBAC problems.

> The whole setup is here if you want to recreate: https://github.com/pipekit/talk-demos/pull/35/files#diff-f9974f30e3e6bc5db30cbefbe6af24ba6a897842664374f76a0e1804bd1c1815

There is no ServiceAccount and Secret for hello-executor-plugin.

@tico24
Member Author

tico24 commented Feb 21, 2025

> You are also running a workflow in a non-argo namespace just like I did before

Again, yes, just like the docs say you can.

> There is no ServiceAccount and Secret for hello-executor-plugin.

I agree. Exactly like the docs advise to set it up.

They seem to use the argo service account, which I have patched according to the shouting in the logs here: https://github.com/pipekit/talk-demos/blob/2847b1d0e0858d3033dea5481569e75857f14081/argocon-demos/2025-argo-workflow-templates-a-practical-deep-dive/examples/bootstrap/argo-workflows/add-argo-sa-permissions.yaml

@tico24
Member Author

tico24 commented Feb 21, 2025

I will re-run all the tests again later with the plugin in the workflow's namespace and will report back.

From this conversation though, it's clear there is a mismatch between the docs and at least your expectation of how users should use plugin and http templates.

@jswxstw
Member

jswxstw commented Feb 21, 2025

> From this conversation though, it's clear there is a mismatch between the docs and at least your expectation of how users should use plugin and http templates.

When a workflow runs in a non-argo namespace, the agent pod also loads every plugin in the argo namespace, which requires additional RBAC configuration. I don't think that is reasonable.

In your case, you submitted a workflow that only includes HTTP templates, yet you still have to account for additional plugins being loaded, which is even less acceptable.

@tico24
Member Author

tico24 commented Feb 24, 2025

Confirmed your hunch. If there are only plugins in the workflow's namespace and not the controller's namespace, it works as expected.
