application-controller and argocd-server do not restart upon "server.secretkey is missing" failure #22231

Open
3 tasks done
jwtty opened this issue Mar 6, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@jwtty

jwtty commented Mar 6, 2025

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

Hi community, I'm trying to install Argo CD in a multi-cluster environment where the resources are first created on the hub cluster and then deployed to the member clusters. When the argocd-secret Secret is installed after the controllers/servers, the controllers log a "server.secretkey is missing" error but keep running: they don't crash, yet they don't function either (they do not reconcile Applications when those are created). I have no easy way to tell that they are not functioning, and I have to manually do a rollout restart of the application-controller and argocd-server to fix the issue. Only the Dex server crashes when this error happens; it is then restarted automatically and comes back to normal, because by the second start the Secret exists and the field has been added.

My observation is that even after server.secretkey has been generated automatically, the controllers/servers do not detect the change and do not restart. Any help is welcome. Thank you!

To Reproduce

Create the Deployments first, then create the argocd-secret Secret.

Expected behavior

The controller/server pods should either crash when this error happens (so that the kubelet restarts them), or detect the change when the Secret is updated and restart.

Screenshots

kubectl logs -f -n argocd argocd-application-controller-0
time="2025-03-06T19:15:59Z" level=info msg="maxprocs: Leaving GOMAXPROCS=2: CPU quota undefined"
time="2025-03-06T19:15:59Z" level=info msg="ArgoCD Application Controller is starting" built="2025-03-04T20:39:46Z" commit=3d901f2037888af302a85f518bea70b33ee8e1c7 namespace=argocd version=v2.14.4+3d901f2
time="2025-03-06T19:15:59Z" level=info msg="Processing all cluster shards"
time="2025-03-06T19:15:59Z" level=info msg="Processing all cluster shards"
time="2025-03-06T19:15:59Z" level=info msg="appResyncPeriod=3m0s, appHardResyncPeriod=0s, appResyncJitter=0s"
time="2025-03-06T19:15:59Z" level=info msg="Starting configmap/secret informers"
time="2025-03-06T19:15:59Z" level=info msg="Configmap/secret informer synced"
time="2025-03-06T19:15:59Z" level=warning msg="Cannot init sharding. Error while querying clusters list from database: server.secretkey is missing"
time="2025-03-06T19:15:59Z" level=warning msg="Failed to save clusters info: server.secretkey is missing"
time="2025-03-06T19:15:59Z" level=info msg="0xc0004ed260 subscribed to settings updates"
time="2025-03-06T19:15:59Z" level=info msg="Cluster https://kubernetes.default.svc has been assigned to shard 0"
time="2025-03-06T19:15:59Z" level=info msg="Starting secretInformer forcluster"
time="2025-03-06T19:16:00Z" level=warning msg="Unable to parse updated settings: server.secretkey is missing"
time="2025-03-06T19:16:00Z" level=warning msg="Unable to parse updated settings: server.secretkey is missing"
time="2025-03-06T19:16:01Z" level=info msg="Notifying 1 settings subscribers: [0xc0004ed260]"
time="2025-03-06T19:25:59Z" level=info msg="Alloc=12965 TotalAlloc=48945 Sys=34645 NumGC=8 Goroutines=109"
time="2025-03-06T19:35:59Z" level=info msg="Alloc=12917 TotalAlloc=74452 Sys=34645 NumGC=13 Goroutines=109"
time="2025-03-06T19:45:59Z" level=info msg="Alloc=12946 TotalAlloc=100006 Sys=34645 NumGC=18 Goroutines=109"
kubectl get secrets -n argocd argocd-secret -o yaml
apiVersion: v1
data:
  admin.password: JD....
  admin.passwordMtime: Mj....
  server.secretkey: NW...
  tls.crt: LS0t...
  tls.key: LS0t...
kind: Secret
metadata:
  creationTimestamp: "2025-03-06T19:15:56Z"
  labels:
    app.kubernetes.io/name: argocd-secret
    app.kubernetes.io/part-of: argocd
  name: argocd-secret
  namespace: argocd
  resourceVersion: "41282527"
  uid: ...
type: Opaque

Version

argocd: v2.12.6+4dab5bd
  BuildDate: 2024-10-18T18:51:56Z
  GitCommit: 4dab5bd6a60adea12e084ad23519e35b710060a2
  GitTreeState: clean
  GoVersion: go1.22.8
  Compiler: gc
  Platform: linux/amd64
argocd-server: v2.14.4+3d901f2

jwtty added the bug (Something isn't working) label on Mar 6, 2025
@sivchari
Contributor

sivchari commented Mar 7, 2025

Hi @jwtty

Indeed, the app-controller outputs a warning log instead of crashing (ref).
However, after that it receives the settings-update notification from the informer, which means all components become ready through the reconciliation loop.
I don't think this error needs to make the app-controller crash, since the app-controller waits for that notification from the informer.
Thanks.

@jwtty
Author

jwtty commented Mar 7, 2025

Hi @sivchari, thank you for the explanation. In my case, the controller stays in this state, and when I create Applications they are not reconciled. Basically, the pods are running but not functioning.

@sivchari
Contributor

sivchari commented Mar 7, 2025

Ah, I see. Maybe this happens because we have no way to retry, right?
I can think of two options:

  • Retry until we can fetch the cluster list
  • Crash using errors.CheckError

Currently, we make the app-controller crash when it can't prepare the live-state cache. That cache is constructed using the settings manager, and ListClusters also uses the settings manager. So it might be good to use CheckError here too when server.secretkey is missing.
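The second option, failing fast, can be sketched in Go as below. checkError here is a hypothetical stand-in for the CheckError idea (log the error, exit non-zero so the kubelet restarts the container); it does not reproduce Argo CD's actual errors.CheckError. The exit function is made swappable only so the failure path can be exercised without killing the process.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errSecretKeyMissing mirrors the "server.secretkey is missing" log message.
var errSecretKeyMissing = errors.New("server.secretkey is missing")

// exitFunc is os.Exit in a real binary; it is a variable so the
// fail-fast path can be observed without terminating the process.
var exitFunc = os.Exit

// checkError is a hypothetical stand-in for the CheckError idea:
// report the error and exit non-zero so the container gets restarted.
func checkError(err error) {
	if err != nil {
		fmt.Fprintln(os.Stderr, "fatal:", err)
		exitFunc(1)
	}
}

// initSharding fails fast instead of logging a warning and carrying on
// with an incomplete cluster list. listClusters is hypothetical.
func initSharding(listClusters func() ([]string, error)) []string {
	clusters, err := listClusters()
	checkError(err)
	return clusters
}

func main() {
	clusters := initSharding(func() ([]string, error) {
		return []string{"https://kubernetes.default.svc"}, nil
	})
	fmt.Println(clusters) // prints "[https://kubernetes.default.svc]"
}
```

Exiting on startup is the behavior the Dex server already shows in the report: the pod restarts, and by the next start the Secret and its server.secretkey field are in place.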

@jwtty
Author

jwtty commented Mar 8, 2025

Thanks @sivchari !
