"failed to reserve container name" #7690

Closed
hedefalk opened this issue Jun 8, 2023 · 8 comments

hedefalk commented Jun 8, 2023

Environmental Info:
K3s Version:
k3s version v1.25.6+k3s1 (9176e03)
go version go1.19.5

Node(s) CPU architecture, OS, and Version:
Linux pi1 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux

Cluster Configuration:
Two RPI 4 8GB, 1 server, 1 agent

Describe the bug:
For one of my deployments, I repeatedly get the error "failed to reserve container name xxx: name xxx is reserved for yyy". The name seems to still be reserved by a previous create attempt that may have timed out?

Sometimes it actually deploys after a couple of hours, but this makes iterating on configs a nightmare.

Here are the events of the pod:

kubectl describe pod/homeassistant-58488865d6-sqfg9

Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  24m                 default-scheduler  Successfully assigned default/homeassistant-58488865d6-sqfg9 to pi1
  Warning  Failed     21m (x2 over 22m)   kubelet            Error: failed to reserve container name "homeassistant_homeassistant-58488865d6-sqfg9_default_bc1cc92e-0fce-43bf-91e9-91682563c3bf_0": name "homeassistant_homeassistant-58488865d6-sqfg9_default_bc1cc92e-0fce-43bf-91e9-91682563c3bf_0" is reserved for "6a37ff9b916561e21557c3f39e5b7b2c4f1a21c91c93cb8d86884ed9751f2f5b"
  Normal   Pulled     17m (x13 over 24m)  kubelet            Container image "homeassistant/home-assistant:2023.5" already present on machine
  Warning  Failed     17m (x9 over 19m)   kubelet            Error: failed to reserve container name "homeassistant_homeassistant-58488865d6-sqfg9_default_bc1cc92e-0fce-43bf-91e9-91682563c3bf_1": name "homeassistant_homeassistant-58488865d6-sqfg9_default_bc1cc92e-0fce-43bf-91e9-91682563c3bf_1" is reserved for "d33b950fda9495733511b17227f54a557b304f7d75720211e456a091603a46ad"
  Warning  Failed     2m9s (x8 over 22m)  kubelet            Error: context deadline exceeded

If I look at the host machine's containerd, I can see that there are actually two containers matching the IDs from the kubelet error messages:

pi@pi1:~ $ sudo ctr c list | grep homeassistant
6a37ff9b916561e21557c3f39e5b7b2c4f1a21c91c93cb8d86884ed9751f2f5b    docker.io/homeassistant/home-assistant:2023.5            io.containerd.runc.v2    
d33b950fda9495733511b17227f54a557b304f7d75720211e456a091603a46ad    docker.io/homeassistant/home-assistant:2023.5            io.containerd.runc.v2  

After reading other reports against containerd, my feeling is that there is some kind of timeout mismatch between the kubelet and containerd, so the kubelet retries too early and everything gets congested?
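
If the timeout theory holds, one thing I'm considering (untested, and the 10m value is just a guess) is raising the kubelet's runtime request timeout via k3s's kubelet-arg passthrough and restarting k3s on the node:

# /etc/rancher/k3s/config.yaml
kubelet-arg:
  - "runtime-request-timeout=10m"   # kubelet default is 2m

sudo systemctl restart k3s   # or k3s-agent on agent nodes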

Steps To Reproduce:
This is the deployment; nothing special, really:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: homeassistant
  labels:
    app: homeassistant
spec:
  replicas: 1
  selector:
    matchLabels:
      app: homeassistant
  template:
    metadata:
      labels:
        app: homeassistant
    spec:
      containers:
        - name: homeassistant
          image: homeassistant/home-assistant:2023.5
          ports:
            - containerPort: 8123
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: "/config"
              name: homeassistant-persistent-storage
            - mountPath: /dev/skyconnect
              name: skyconnect
            - mountPath: /run/dbus
              name: bluetooth
      restartPolicy: Always
      volumes:
        - name: homeassistant-persistent-storage
          persistentVolumeClaim:
            claimName: homeassistant-pvc-longhorn
        - name: bluetooth
          hostPath:
            path: /run/dbus
        # - name: sonoff-controller
        #   hostPath:
        #     # path: /dev/ttyUSB0
        #     path: /dev/serial/by-id/usb-ITead_Sonoff_Zigbee_3.0_USB_Dongle_Plus_944082cfb512ec118b3721c7bd930c07-if00-port0
        - name: skyconnect # temp solution for skyconnect
          hostPath:
            path: /dev/serial/by-id/usb-Nabu_Casa_SkyConnect_v1.0_4e4431665091ed11a63fc1d13b20a988-if00-port0
            # path: /dev/ttyUSB1
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: device
                    operator: In
                    values:
                      - skyconnect

The only thing that comes to mind is that I'm using Longhorn for persistence; looking at #2312, I got the feeling that people had problems with slow NFS.

Additional context / logs:
Similar but closed: #2312
Similar problems on GKE with containerd: containerd/containerd#4604

hedefalk (Author) commented Jun 8, 2023

Oh yeah, new containers are created over and over, but at any given time there are only ever two:

pi@pi1:~ $ sudo ctr c list | grep homeassistant
6a37ff9b916561e21557c3f39e5b7b2c4f1a21c91c93cb8d86884ed9751f2f5b    docker.io/homeassistant/home-assistant:2023.5            io.containerd.runc.v2    
d33b950fda9495733511b17227f54a557b304f7d75720211e456a091603a46ad    docker.io/homeassistant/home-assistant:2023.5            io.containerd.runc.v2   

…

pi@pi1:~ $ sudo ctr c list | grep homeassistant
1f31153a491f7d279fe5dfabaab2b9ad1890f80f6f34be7f90be72d7d25cc06b    docker.io/homeassistant/home-assistant:2023.5            io.containerd.runc.v2    
2147ee181e7d1d8763847a43acc9388396f7eae0d6d15b150bfb81b60e07f832    docker.io/homeassistant/home-assistant:2023.5            io.containerd.runc.v2    


pi@pi1:~ $ sudo ctr c list | grep homeassistant
1f31153a491f7d279fe5dfabaab2b9ad1890f80f6f34be7f90be72d7d25cc06b    docker.io/homeassistant/home-assistant:2023.5            io.containerd.runc.v2    
4efa4d673ad4005337ea11f6c94dbe6f6955405471b1c54516fded91ba05f59b    docker.io/homeassistant/home-assistant:2023.5            io.containerd.runc.v2  

  
pi@pi1:~ $ sudo ctr c list | grep homeassistant
4efa4d673ad4005337ea11f6c94dbe6f6955405471b1c54516fded91ba05f59b    docker.io/homeassistant/home-assistant:2023.5            io.containerd.runc.v2    
628eac8fd0de7ec006beea7b4bfe899b4d08b369225b8f3ed5b4d1bbeb4ab229    docker.io/homeassistant/home-assistant:2023.5            io.containerd.runc.v2  

caroline-suse-rancher moved this from New to To Triage in K3s Development Jun 21, 2023
dereknola self-assigned this Jul 25, 2023
dereknola (Member) commented:

I was unable to replicate this on a pi4 with v1.25.11+k3s1.

NAME                             READY   STATUS    RESTARTS   AGE
homeassistant-858bff4c98-mhklq   1/1     Running   0          78m
root@noder1:/home/pi# sudo ctr c list | grep homeassistant
51a45c2fd0deb030a743d59d209d489651f3f3b7ed6a9b8d918b3d5d546c8052    docker.io/homeassistant/home-assistant:2023.5                 io.containerd.runc.v2    

caroline-suse-rancher moved this from To Triage to In Triage in K3s Development Aug 28, 2023
caroline-suse-rancher (Contributor) commented:

I am going to close this since we're not able to reproduce - we can reopen if new details become available.

caroline-suse-rancher closed this as not planned Aug 28, 2023
github-project-automation bot moved this from In Triage to Done Issue in K3s Development Aug 28, 2023
hedefalk (Author) commented:

@dereknola Thank you so much for going through the trouble of trying to replicate on actual hardware.

I still have this problem with no improvement. To be concrete, it is my homeassistant deployment that is problematic: any time I try to change the deployment, it gets stuck like this. Before, it might have resolved within a couple of hours or a day with luck, but it seems worse now. I tried updating the image yesterday and the rollout hasn't completed in 35h:

🐟 k get pods | grep homeassistant
homeassistant-58488865d6-6rj6g   1/1     Running                65            75d
homeassistant-84bfbdf77b-gq85g   0/1     CreateContainerError   550           35h

The failing pod:

🐟 k get pod/homeassistant-84bfbdf77b-gq85g -o json
E1222 11:23:49.741526   79549 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1222 11:23:49.790793   79549 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E1222 11:23:49.797381   79549 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
{
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "creationTimestamp": "2023-12-20T22:34:26Z",
        "generateName": "homeassistant-84bfbdf77b-",
        "labels": {
            "app": "homeassistant",
            "pod-template-hash": "84bfbdf77b"
        },
        "name": "homeassistant-84bfbdf77b-gq85g",
        "namespace": "default",
        "ownerReferences": [
            {
                "apiVersion": "apps/v1",
                "blockOwnerDeletion": true,
                "controller": true,
                "kind": "ReplicaSet",
                "name": "homeassistant-84bfbdf77b",
                "uid": "d7cf0c16-fe47-44df-a9d0-9be872e691d7"
            }
        ],
        "resourceVersion": "34622397",
        "uid": "97a58610-7f94-4e3d-9dc9-5240274a0450"
    },
    "spec": {
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {
                                    "key": "device",
                                    "operator": "In",
                                    "values": [
                                        "skyconnect"
                                    ]
                                }
                            ]
                        }
                    ]
                }
            }
        },
        "containers": [
            {
                "image": "homeassistant/home-assistant:2023.12",
                "imagePullPolicy": "IfNotPresent",
                "name": "homeassistant",
                "ports": [
                    {
                        "containerPort": 8123,
                        "protocol": "TCP"
                    }
                ],
                "resources": {},
                "securityContext": {
                    "privileged": true
                },
                "terminationMessagePath": "/dev/termination-log",
                "terminationMessagePolicy": "File",
                "volumeMounts": [
                    {
                        "mountPath": "/config",
                        "name": "homeassistant-persistent-storage"
                    },
                    {
                        "mountPath": "/dev/skyconnect",
                        "name": "skyconnect"
                    },
                    {
                        "mountPath": "/run/dbus",
                        "name": "bluetooth"
                    },
                    {
                        "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount",
                        "name": "kube-api-access-65fr7",
                        "readOnly": true
                    }
                ]
            }
        ],
        "dnsPolicy": "ClusterFirst",
        "enableServiceLinks": true,
        "nodeName": "pi1",
        "preemptionPolicy": "PreemptLowerPriority",
        "priority": 0,
        "restartPolicy": "Always",
        "schedulerName": "default-scheduler",
        "securityContext": {},
        "serviceAccount": "default",
        "serviceAccountName": "default",
        "terminationGracePeriodSeconds": 30,
        "tolerations": [
            {
                "effect": "NoExecute",
                "key": "node.kubernetes.io/not-ready",
                "operator": "Exists",
                "tolerationSeconds": 300
            },
            {
                "effect": "NoExecute",
                "key": "node.kubernetes.io/unreachable",
                "operator": "Exists",
                "tolerationSeconds": 300
            }
        ],
        "volumes": [
            {
                "name": "homeassistant-persistent-storage",
                "persistentVolumeClaim": {
                    "claimName": "homeassistant-pvc-longhorn"
                }
            },
            {
                "hostPath": {
                    "path": "/run/dbus",
                    "type": ""
                },
                "name": "bluetooth"
            },
            {
                "hostPath": {
                    "path": "/dev/serial/by-id/usb-Nabu_Casa_SkyConnect_v1.0_4e4431665091ed11a63fc1d13b20a988-if00-port0",
                    "type": ""
                },
                "name": "skyconnect"
            },
            {
                "name": "kube-api-access-65fr7",
                "projected": {
                    "defaultMode": 420,
                    "sources": [
                        {
                            "serviceAccountToken": {
                                "expirationSeconds": 3607,
                                "path": "token"
                            }
                        },
                        {
                            "configMap": {
                                "items": [
                                    {
                                        "key": "ca.crt",
                                        "path": "ca.crt"
                                    }
                                ],
                                "name": "kube-root-ca.crt"
                            }
                        },
                        {
                            "downwardAPI": {
                                "items": [
                                    {
                                        "fieldRef": {
                                            "apiVersion": "v1",
                                            "fieldPath": "metadata.namespace"
                                        },
                                        "path": "namespace"
                                    }
                                ]
                            }
                        }
                    ]
                }
            }
        ]
    },
    "status": {
        "conditions": [
            {
                "lastProbeTime": null,
                "lastTransitionTime": "2023-12-20T22:34:26Z",
                "status": "True",
                "type": "Initialized"
            },
            {
                "lastProbeTime": null,
                "lastTransitionTime": "2023-12-20T22:34:26Z",
                "message": "containers with unready status: [homeassistant]",
                "reason": "ContainersNotReady",
                "status": "False",
                "type": "Ready"
            },
            {
                "lastProbeTime": null,
                "lastTransitionTime": "2023-12-20T22:34:26Z",
                "message": "containers with unready status: [homeassistant]",
                "reason": "ContainersNotReady",
                "status": "False",
                "type": "ContainersReady"
            },
            {
                "lastProbeTime": null,
                "lastTransitionTime": "2023-12-20T22:34:26Z",
                "status": "True",
                "type": "PodScheduled"
            }
        ],
        "containerStatuses": [
            {
                "containerID": "containerd://33c49e5b667a5fc65047d66b2b788fb2aa35bba726c4da70e3c2267e402aedf6",
                "image": "docker.io/homeassistant/home-assistant:2023.12",
                "imageID": "docker.io/homeassistant/home-assistant@sha256:128abdfe0b0a82df32a0a192032a86d113564ce2ce5ad470c47d551a53bf5db4",
                "lastState": {
                    "waiting": {}
                },
                "name": "homeassistant",
                "ready": false,
                "restartCount": 551,
                "started": false,
                "state": {
                    "waiting": {
                        "message": "failed to reserve container name \"homeassistant_homeassistant-84bfbdf77b-gq85g_default_97a58610-7f94-4e3d-9dc9-5240274a0450_551\": name \"homeassistant_homeassistant-84bfbdf77b-gq85g_default_97a58610-7f94-4e3d-9dc9-5240274a0450_551\" is reserved for \"33c49e5b667a5fc65047d66b2b788fb2aa35bba726c4da70e3c2267e402aedf6\"",
                        "reason": "CreateContainerError"
                    }
                }
            }
        ],
        "hostIP": "192.168.1.57",
        "phase": "Pending",
        "podIP": "10.42.0.26",
        "podIPs": [
            {
                "ip": "10.42.0.26"
            }
        ],
        "qosClass": "BestEffort",
        "startTime": "2023-12-20T22:34:26Z"
    }
}
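
One quick mitigation to try (it just kicks the pod and doesn't explain the stuck name reservation, so it may well end up in the same state) is force-deleting the stuck pod so the ReplicaSet creates a fresh one:

kubectl delete pod homeassistant-84bfbdf77b-gq85g --grace-period=0 --force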

I'm writing again to ask for any kind of pointer on what I could try to investigate. I have found these old issues on containerd, which are closed. In particular, the suggested mitigations don't work for me:

containerd/containerd#4604 (comment)

My containerd version seems new enough to also have the fix they are talking about, but I'm wondering about the "k3s1" suffix in my version here: is that a patched build for k3s?

🐟 kubectl get nodes -o wide
NAME   STATUS   ROLES                  AGE    VERSION        INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION   CONTAINER-RUNTIME
pi1    Ready    control-plane,master   297d   v1.25.6+k3s1   192.168.1.57   <none>        Debian GNU/Linux 11 (bullseye)   6.1.21-v8+       containerd://1.6.15-k3s1
pi4    Ready    <none>                 290d   v1.25.6+k3s1   192.168.1.60   <none>        Debian GNU/Linux 11 (bullseye)   6.1.21-v8+       containerd://1.6.15-k3s1
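
For reference, the runtime version can also be checked directly on the node with the tooling bundled into the k3s binary (it should report the same containerd 1.6.15-k3s1 that kubectl shows above):

sudo k3s crictl version   # CRI runtime name and version as the kubelet sees it
sudo k3s ctr version      # client/server version of the embedded containerd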

If I look at the host's containerd, I see that it's still retrying over and over with the new image, and there are still two leftover containers from the old deployment, which is weird?

pi@pi1:~ $ sudo ctr c list | grep homeassistant
3ea5e39efa47474ef4d3716b7be9243471e6b49410e89950e08f1f457ed2e0d3    docker.io/homeassistant/home-assistant:2023.5                  io.containerd.runc.v2    
632c13235fa2c9fef7934b4e3335c3f60b4ad860ed4ffb3e95fd6855f2398b22    docker.io/homeassistant/home-assistant:2023.5                  io.containerd.runc.v2    
781c9d64bf45f6f1afebcd4d41f4802e0de81bf6b525b02ed8368ce72d96073d    docker.io/homeassistant/home-assistant:2023.12                 io.containerd.runc.v2    
pi@pi1:~ $ sudo ctr c list | grep homeassistant
3ea5e39efa47474ef4d3716b7be9243471e6b49410e89950e08f1f457ed2e0d3    docker.io/homeassistant/home-assistant:2023.5                  io.containerd.runc.v2    
632c13235fa2c9fef7934b4e3335c3f60b4ad860ed4ffb3e95fd6855f2398b22    docker.io/homeassistant/home-assistant:2023.5                  io.containerd.runc.v2    
80f10983f1b1fe6750e6ca08ee025e3c34f359a41949f7e24f205ca7e022296c    docker.io/homeassistant/home-assistant:2023.12                 io.containerd.runc.v2   

k3s knows only about one:

✗ k get pods | grep homeassistant
homeassistant-58488865d6-6rj6g   1/1     Running                65            75d
homeassistant-84bfbdf77b-gq85g   0/1     CreateContainerError   556           36h
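
One thing I'm tempted to try next (purely a guess, and removing containers behind the kubelet's back may confuse it further) is deleting the stale, never-started containers directly with ctr and letting the kubelet recreate them:

sudo ctr c list | grep homeassistant   # same listing as above, to grab the IDs
sudo ctr c rm 3ea5e39efa47474ef4d3716b7be9243471e6b49410e89950e08f1f457ed2e0d3
sudo ctr c rm 632c13235fa2c9fef7934b4e3335c3f60b4ad860ed4ffb3e95fd6855f2398b22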

brandond (Member) commented:

@hedefalk v1.25.6+k3s1 is almost a year old, and the whole v1.25 minor version is end of life. Newer releases have long since been upgraded to containerd v1.7.x. Have you at any point tried upgrading to a version released in the last 12 months?

hedefalk (Author) commented Sep 3, 2024

I'm still hitting this on v1.30.4+k3s1.

I just used raspberry-pi-imager to put an entirely new Debian system on an SSD attached to another node, "pi2", attached it to my PoE switch, and joined the cluster with k3sup:

k3sup join --host pi2 --server-host pi1 --user pi --k3s-extra-args '--snapshotter=native'
kubectl get nodes -o wide
NAME   STATUS   ROLES                  AGE    VERSION        INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION   CONTAINER-RUNTIME
pi2    Ready    <none>                 53m    v1.30.4+k3s1   192.168.1.42   <none>        Debian GNU/Linux 11 (bullseye)   5.15.61-v8+      containerd://1.7.20-k3s1
pi1    Ready    control-plane,master   553d   v1.25.6+k3s1   192.168.1.57   <none>        Debian GNU/Linux 11 (bullseye)   6.1.21-v8+       containerd://1.6.15-k3s1
pi4    Ready    <none>                 546d   v1.25.6+k3s1   192.168.1.60   <none>        Debian GNU/Linux 11 (bullseye)   6.1.21-v8+       containerd://1.6.15-k3s1

Immediately, stuff gets scheduled but then sits stuck:

kubectl get pods --field-selector spec.nodeName=pi2 --all-namespaces
NAMESPACE         NAME                             READY   STATUS                 RESTARTS      AGE
longhorn-system   engine-image-ei-fc06c6fb-dbjzb   0/1     ContainerCreating      0             49m
longhorn-system   longhorn-manager-4n5np           0/1     ContainerCreating      0             49m
longhorn-system   engine-image-ei-ebe8de04-fmccg   0/1     ContainerCreating      0             49m
longhorn-system   longhorn-csi-plugin-l7c9s        0/3     ContainerCreating      0             49m
kube-system       helm-install-traefik-kz276       0/1     ContainerCreating      0             33m
kube-system       svclb-traefik-45d167d6-4sjpl     0/2     CreateContainerError   6 (42m ago)   49m

Looking at the traefik pod events:

kubectl describe pod/svclb-traefik-45d167d6-4sjpl -n kube-system

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  50m                  default-scheduler  Successfully assigned kube-system/svclb-traefik-45d167d6-4sjpl to pi2
  Normal   Pulling    50m                  kubelet            Pulling image "rancher/klipper-lb:v0.4.0"
  Normal   Pulled     47m                  kubelet            Successfully pulled image "rancher/klipper-lb:v0.4.0" in 2m25.362s (2m25.362s including waiting). Image size: 3664284 bytes.
  Normal   Created    43m (x2 over 46m)    kubelet            Created container lb-tcp-443
  Normal   Started    43m (x2 over 46m)    kubelet            Started container lb-tcp-443
  Normal   Created    41m (x2 over 47m)    kubelet            Created container lb-tcp-80
  Normal   Started    41m (x2 over 47m)    kubelet            Started container lb-tcp-80
  Warning  Failed     39m                  kubelet            Error: context deadline exceeded
  Warning  Failed     37m (x2 over 44m)    kubelet            Error: context deadline exceeded
  Normal   Pulled     37m (x5 over 47m)    kubelet            Container image "rancher/klipper-lb:v0.4.0" already present on machine
  Warning  Failed     37m (x2 over 37m)    kubelet            Error: failed to reserve container name "lb-tcp-443_svclb-traefik-45d167d6-4sjpl_kube-system_1ac7ee4f-35cb-47fa-aa2d-8aaef36ddc5f_2": name "lb-tcp-443_svclb-traefik-45d167d6-4sjpl_kube-system_1ac7ee4f-35cb-47fa-aa2d-8aaef36ddc5f_2" is reserved for "4727aa324502b02d95c68032c12438ddae8be39114173349481271cdf645b5f8"
  Warning  Failed     37m (x2 over 37m)    kubelet            Error: failed to reserve container name "lb-tcp-80_svclb-traefik-45d167d6-4sjpl_kube-system_1ac7ee4f-35cb-47fa-aa2d-8aaef36ddc5f_3": name "lb-tcp-80_svclb-traefik-45d167d6-4sjpl_kube-system_1ac7ee4f-35cb-47fa-aa2d-8aaef36ddc5f_3" is reserved for "e2577fe1f165b32d1cff32a6e23e76debbe6bfcce70bac285dbe8ed885581c09"
  Normal   Pulled     12s (x146 over 46m)  kubelet            Container image "rancher/klipper-lb:v0.4.0" already present on machine

I recognized the pattern and found this old ticket :)

On the node I see something similar to before:

sudo ctr c ls
CONTAINER                                                           IMAGE                                   RUNTIME                  
0a721ad09f4facfbf980ccf0e5eba566357f37f1a4720f4f32c437df8cb915c2    docker.io/rancher/mirrored-pause:3.6    io.containerd.runc.v2    
1139491ef6194499ec23a062a1afb0f7fae79f950be4cec61e8c4ae4310f322a    docker.io/rancher/mirrored-pause:3.6    io.containerd.runc.v2    
15f6fe54f7e3a828b80dc8c4e79edb60795bb29b89f0e6c830fa7595ce718db8    docker.io/rancher/mirrored-pause:3.6    io.containerd.runc.v2    
1b15928517b94561657885e9a9d78951ca64cf3792583b9b39f553fec8e204d4    docker.io/rancher/mirrored-pause:3.6    io.containerd.runc.v2    
4727aa324502b02d95c68032c12438ddae8be39114173349481271cdf645b5f8    docker.io/rancher/klipper-lb:v0.4.0     io.containerd.runc.v2    
5af8d749aaf71be70e666721bc6ba9f3043f6ad4b2b6d60c9d58e90a95c5e337    docker.io/rancher/klipper-lb:v0.4.0     io.containerd.runc.v2    
da738f0553d7c2042b8ef5dad0a42e0b9ea5015b089a19ea166b7dbe7b7e6a7f    docker.io/rancher/klipper-lb:v0.4.0     io.containerd.runc.v2    
db60c9586949bd7a10f45e7d1c1472447870786adf1f503196fb9d1ec1707fa9    docker.io/rancher/mirrored-pause:3.6    io.containerd.runc.v2    
df43af0d513cb38293d65220589a28cd65e6dd676bbd5e78342808167c86f7c0    docker.io/rancher/klipper-lb:v0.4.0     io.containerd.runc.v2 

I guess there's really nothing new here and I hate to re-open without any new info, but maybe I could get some direction on what info I could try to dig into? I'd love to get this working…
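
In the meantime, one thing I can dig into myself is the agent-side logs on pi2 while it retries (assuming the k3sup install created the usual k3s-agent systemd unit and the default data dir):

sudo journalctl -u k3s-agent -f | grep -iE "reserve|deadline"        # embedded kubelet messages
sudo tail -f /var/lib/rancher/k3s/agent/containerd/containerd.log    # embedded containerd's own log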

Update:
I guess the news here is that it has nothing to do with homeassistant or longhorn.

hedefalk (Author) commented Sep 3, 2024

Oh.

Never mind me, the reason was a really slow disk: I was accidentally booting from the network. I hadn't successfully fixed the boot order on this node, so it was running off NFS from another machine.
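
For anyone else landing here, a quick way to rule slow storage in or out (the path and size are just an example) is a direct-I/O write test against the k3s data directory on the affected node:

sudo dd if=/dev/zero of=/var/lib/rancher/k3s/ddtest bs=1M count=256 oflag=direct status=progress
sudo rm /var/lib/rancher/k3s/ddtest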

s7an-it commented Nov 27, 2024

I get this issue on EKS 1.29 with the latest AL2 or AL2023 AMIs, and it has started to dominate our clusters since the 1.27 to 1.29 migration. Basically, we use a single-node setup for many Spring Boot services; when more than a certain number of them start, this happens and I get containers stuck in a container runtime error. When I log on to the node, things look normal, no special messages. If I wait around 30 minutes, restarting the containers in this state works, but if I restart earlier it doesn't.
