werf helm upgrade fails with "error processing rollout phase stage: error tracking resources:" but normal helm doesn't #6048
Comments
When this happens, can you check the pod? By default we ignore the first error per pod, but fail the release if it happens again. This behavior can be configured with annotations:
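A minimal sketch of what that looks like on this StatefulSet (annotation names are taken from the werf 1.2 deploy docs and the values are examples only, so double-check them against your werf version):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jenkins
  annotations:
    # Number of tracked errors tolerated per replica before the release fails;
    # the default of 1 gives the "ignore the first error per pod" behavior described above.
    werf.io/failures-allowed-per-replica: "2"
    # What to do once that limit is exceeded; the default fails the whole deploy immediately.
    werf.io/fail-mode: FailWholeDeployProcessImmediately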
I just caught the issue again. Then:
❯ k get pods
NAME READY STATUS RESTARTS AGE
jenkins-0 1/1 Running 0 39s
❯ k describe pod jenkins-0
Name: jenkins-0
Namespace: default
Priority: 0
Service Account: jenkins
Node: k3d-jenkins-agent-dind-test-server-0/172.28.0.2
Start Time: Fri, 05 Apr 2024 17:42:25 -0300
Labels: app.kubernetes.io/component=jenkins-controller
app.kubernetes.io/instance=jenkins
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=jenkins
controller-revision-hash=jenkins-7dd6558c9
statefulset.kubernetes.io/pod-name=jenkins-0
Annotations: checksum/config: 9cda187dbf3e8e406c63fb2fa0f8b9be1282a524afe08f969aaa13634095fca5
Status: Running
IP: 10.42.0.14
IPs:
IP: 10.42.0.14
Controlled By: StatefulSet/jenkins
Init Containers:
init:
Container ID: containerd://2b16a7b993d311fa92aea7c6bfbee146191391e21a52caf53c10b53453f46d73
Image: jenkins-agent-dind-test-registry:5000/jenkins:latest
Image ID: jenkins-agent-dind-test-registry:5000/jenkins@sha256:87909327ff3bea4bcf6067f5be6fa7cfa0f714ef5d4fd75b68987dc17e284396
Port: <none>
Host Port: <none>
Command:
sh
/var/jenkins_config/apply_config.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 05 Apr 2024 17:42:26 -0300
Finished: Fri, 05 Apr 2024 17:42:26 -0300
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 50m
memory: 256Mi
Environment: <none>
Mounts:
/var/jenkins_config from jenkins-config (rw)
/var/jenkins_home from jenkins-home (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-m2nfn (ro)
Containers:
jenkins:
Container ID: containerd://8cd20a8929bf0b96285dd9db763367a677d8f14817271408881b3874db382108
Image: jenkins-agent-dind-test-registry:5000/jenkins:latest
Image ID: jenkins-agent-dind-test-registry:5000/jenkins@sha256:87909327ff3bea4bcf6067f5be6fa7cfa0f714ef5d4fd75b68987dc17e284396
Ports: 8080/TCP, 50000/TCP
Host Ports: 0/TCP, 0/TCP
Args:
--httpPort=8080
State: Running
Started: Fri, 05 Apr 2024 17:42:27 -0300
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 4Gi
Requests:
cpu: 50m
memory: 256Mi
Liveness: http-get http://:http/login delay=0s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:http/login delay=0s timeout=5s period=10s #success=1 #failure=3
Startup: http-get http://:http/login delay=0s timeout=5s period=10s #success=1 #failure=12
Environment:
SECRETS: /run/secrets/additional
POD_NAME: jenkins-0 (v1:metadata.name)
JAVA_OPTS:
JENKINS_OPTS: --webroot=/var/jenkins_cache/war
JENKINS_SLAVE_AGENT_PORT: 50000
CASC_JENKINS_CONFIG: /var/jenkins_home/casc_configs
Mounts:
/run/secrets/additional from jenkins-secrets (ro)
/tmp from tmp-volume (rw)
/var/jenkins_cache from jenkins-cache (rw)
/var/jenkins_config from jenkins-config (ro)
/var/jenkins_home from jenkins-home (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-m2nfn (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
jenkins-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: jenkins
Optional: false
jenkins-secrets:
Type: Projected (a volume that contains injected data from multiple sources)
SecretName: jenkins
SecretOptionalName: <nil>
jenkins-cache:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
jenkins-home:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: jenkins
ReadOnly: false
sc-config-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-m2nfn:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 54s default-scheduler Successfully assigned default/jenkins-0 to k3d-jenkins-agent-dind-test-server-0
Normal Pulling 53s kubelet Pulling image "jenkins-agent-dind-test-registry:5000/jenkins:latest"
Normal Pulled 53s kubelet Successfully pulled image "jenkins-agent-dind-test-registry:5000/jenkins:latest" in 73.017453ms (73.025639ms including waiting)
Normal Created 53s kubelet Created container init
Normal Started 53s kubelet Started container init
Normal Pulling 52s kubelet Pulling image "jenkins-agent-dind-test-registry:5000/jenkins:latest"
Normal Pulled 52s kubelet Successfully pulled image "jenkins-agent-dind-test-registry:5000/jenkins:latest" in 57.934933ms (57.959749ms including waiting)
Normal Created 52s kubelet Created container jenkins
Normal Started 52s kubelet Started container jenkins
Warning Unhealthy 44s kubelet Startup probe failed: HTTP probe failed with statuscode: 503
Looks like the pod was indeed unhealthy for some time, probably while it was being terminated (termination takes a few seconds, and the health check would fail during that window).
I just wonder if werf should really be this aggressive by default. My intention is simply to replace the helm calls in my pipelines with werf helm.
I see that this single failed startupProbe is not the reason:
But this likely is:
since
Well, that wasn't our intention; we should probably ignore pod errors while the pod is terminating. For now, as a workaround, add these annotations to your StatefulSet:
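The exact annotations from this comment are not preserved in this copy; a minimal sketch, assuming the werf 1.2 tracking annotations are what's meant (values are illustrative only):

kind: StatefulSet
metadata:
  annotations:
    # Tolerate a few tracked errors per replica instead of the default single error.
    werf.io/failures-allowed-per-replica: "5"
    # Or relax failure handling entirely until the end of the deploy:
    # werf.io/fail-mode: HopeUntilEndOfDeployProcess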
Before proceeding
Version
1.2.305
How to reproduce
This is a bit difficult to reproduce because it does not happen every time. However, I get this error with werf helm upgrade, whereas helm upgrade --wait never fails here.
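For reference, the two commands being compared look roughly like this (release name, chart path, and namespace are placeholders, not taken from the report):

❯ werf helm upgrade jenkins ./chart --install --namespace default
❯ helm upgrade jenkins ./chart --install --namespace default --wait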
Result
Expected result
I expected it to keep retrying until the timeout, which I believe is what helm upgrade --wait does.
Additional information
If you need me to build a reproducible environment, please let me know.