
CI Currently broken - kubelet won't start in GitHub Actions #125

Open
geerlingguy opened this issue Sep 14, 2022 · 18 comments · Fixed by #126
Labels
bug Something isn't working

Comments

@geerlingguy

See:

TASK [geerlingguy.kubernetes : Configure Calico networking.] *******************
(https://github.com/geerlingguy/ansible-role-kubernetes/actions/runs/3050168539/jobs/4916977737#step:5:288)
  failed: [instance] (item=kubectl apply -f https://projectcalico.docs.tigera.io/manifests/calico.yaml) => {"ansible_loop_var": "item", "changed": true, "cmd": ["kubectl", "apply", "-f", "https://projectcalico.docs.tigera.io/manifests/calico.yaml"], "delta": "0:00:01.427550", "end": "2022-09-14 04:36:39.655371", "item": "kubectl apply -f https://projectcalico.docs.tigera.io/manifests/calico.yaml", "msg": "non-zero return code", "rc": 1, "start": "2022-09-14 04:36:38.227821", "stderr": "error: unable to recognize \"https://projectcalico.docs.tigera.io/manifests/calico.yaml\": no matches for kind \"PodDisruptionBudget\" in version \"policy/v1\"", "stderr_lines": ["error: unable to recognize \"https://projectcalico.docs.tigera.io/manifests/calico.yaml\": no matches for kind \"PodDisruptionBudget\" in version \"policy/v1\""], "stdout": "serviceaccount/calico-kube-controllers created\nserviceaccount/calico-node created\nconfigmap/calico-config

Running locally, I'm hitting:

TASK [geerlingguy.docker : Install Docker packages (with downgrade option).] ***
fatal: [instance]: FAILED! => {"changed": false, "msg": "No package matching 'docker-ce' is available"}

But I'm pretty sure that's because I'm testing under aarch64:

root@instance:/# uname -a
Linux instance 5.10.124-linuxkit #1 SMP PREEMPT Thu Jun 30 08:18:26 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

And I'm not sure if the jammy repo has ARM64 packages available. Trying to force AMD64 now...

@geerlingguy

New issue: I need to upgrade the kubeadm config:

your configuration file uses a deprecated API spec: "kubeadm.k8s.io/v1beta2"
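The fix should be bumping the apiVersion in the kubeadm config template; a rough sketch of the updated header (kubeadm.k8s.io/v1beta3 replaces the deprecated v1beta2, and the remaining fields mostly carry over unchanged):

# Rough sketch of the updated kubeadm config header only; v1beta3 replaces
# the deprecated v1beta2 API, other fields mostly carry over as-is.
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration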

@geerlingguy

Now getting errors on image pull...

[WARNING ImagePull]: failed to pull image registry.k8s.io/kube-apiserver:v1.25.0: output: E0914 20:11:48.464757    9501 remote_image.go:238] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"registry.k8s.io/kube-apiserver:v1.25.0\": failed to extract layer sha256:2adf2408b0c8334511b0fc0a5bae060e8035a781e9136acbd8818ef11feb6314: failed to convert whiteout file \"usr/local/.wh..wh..opq\": operation not supported: unknown" image="registry.k8s.io/kube-apiserver:v1.25.0"
time="2022-09-14T20:11:48Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = failed to pull and unpack image \"registry.k8s.io/kube-apiserver:v1.25.0\": failed to extract layer sha256:2adf2408b0c8334511b0fc0a5bae060e8035a781e9136acbd8818ef11feb6314: failed to convert whiteout file \"usr/local/.wh..wh..opq\": operation not supported: unknown"
, error: exit status 1

@geerlingguy

I set /var/lib/containerd as a volume, and that got past the image pull issues (I think I had originally mounted /var/lib/dockerd in the molecule config, so I switched that over). Now getting an error initializing the kubelet:

# journalctl -xeu kubelet

Sep 14 20:25:40 instance kubelet[8449]: E0914 20:25:40.482760    8449 kubelet.go:1397] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubepods] doesn't exist"
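For reference, roughly what the Molecule platform entry looks like with that volume added (this is just a sketch; the image name and other keys here are placeholders, not the exact committed molecule.yml):

# Sketch of a Molecule platform entry carrying the containerd volume.
# Image name and the other keys are placeholders, not the committed config.
platforms:
  - name: instance
    image: geerlingguy/docker-debian11-ansible:latest
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
      - /var/lib/containerd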

@geerlingguy

Had to add a couple of options for running inside a container (see kubernetes/kubernetes#43704):

--cgroups-per-qos=false --enforce-node-allocatable=""
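Those flags get passed through the role's kubernetes_kubelet_extra_args variable; roughly like this (a sketch, where exactly this lands in the playbook vars is an assumption):

# Sketch: container-friendly kubelet flags passed via the role's
# kubernetes_kubelet_extra_args variable (placement in your vars is an
# assumption).
kubernetes_kubelet_extra_args: >-
  --cgroups-per-qos=false
  --enforce-node-allocatable=""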

And then I also merged in work from #107, and it's passing locally. Running it through CI now.

@geerlingguy

geerlingguy commented Sep 14, 2022

Hmm... now on Debian 11 only, on GitHub Actions, I'm getting:

[WARNING SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs"
output: "modprobe: FATAL: Module configs not found in directory /lib/modules/5.15.0-1019-azure

That's just a warning though... maybe not a fatal issue with the install. It seems to run just fine locally...

@geerlingguy

Full error log:

fatal: [instance]: FAILED! => {"changed": true, "cmd": ["kubeadm", "init", "--config", "/etc/kubernetes/kubeadm-kubelet-config.yaml", "--ignore-preflight-errors=all"], "delta": "0:04:19.056679", "end": "2022-09-15 15:44:48.878558", "msg": "non-zero return code", "rc": 1, "start": "2022-09-15 15:40:29.821879", "stderr": "    [WARNING Swap]: swap is enabled; production deployments should disable swap unless testing the NodeSwap feature gate of the kubelet
    [WARNING SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs", output: "modprobe: FATAL: Module configs not found in directory /lib/modules/5.15.0-1019-azure\
", err: exit status 1
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["    [WARNING Swap]: swap is enabled; production deployments should disable swap unless testing the NodeSwap feature gate of the kubelet", "    [WARNING SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs", output: "modprobe: FATAL: Module configs not found in directory /lib/modules/5.15.0-1019-azure\
", err: exit status 1", "error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "[init] Using Kubernetes version: v1.25.1
[preflight] Running pre-flight checks
[preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 5.15.0-1019-azure
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
CGROUPS_PIDS: enabled
CGROUPS_HUGETLB: enabled
CGROUPS_BLKIO: enabled
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [instance kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.17.0.2]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [instance localhost] and IPs [172.17.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [instance localhost] and IPs [172.17.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
    timed out waiting for the condition

This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
    - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
    Once you have found the failing container, you can inspect its logs with:
    - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'

@geerlingguy

geerlingguy commented Sep 15, 2022

And output of systemctl status kubelet:

● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Thu 2022-09-15 17:25:35 UTC; 4min 0s ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 6443 (kubelet)
      Tasks: 14 (limit: 8322)
     Memory: 60.5M
     CGroup: /system.slice/containerd.service/system.slice/kubelet.service
             └─6443 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.8 --fail-swap-on=false --cgroup-driver=systemd --cgroups-per-qos=false --enforce-node-allocatable=

Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.631091    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.686704    6443 eviction_manager.go:256] "Eviction manager: failed to get summary stats" err="failed to get node info: node "instance" not found"
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.703980    6443 kubelet.go:2373] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.731903    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.806676    6443 remote_runtime.go:222] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format "slice:prefix:name" for systemd cgroups, got "/k8s.io/aa90a4ddb5aec74b7522a01fe1b4da47db1165f3b659959df0fde00ab0c81c8f" instead: unknown"
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.806723    6443 kuberuntime_sandbox.go:71] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format "slice:prefix:name" for systemd cgroups, got "/k8s.io/aa90a4ddb5aec74b7522a01fe1b4da47db1165f3b659959df0fde00ab0c81c8f" instead: unknown" pod="kube-system/etcd-instance"
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.806746    6443 kuberuntime_manager.go:772] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format "slice:prefix:name" for systemd cgroups, got "/k8s.io/aa90a4ddb5aec74b7522a01fe1b4da47db1165f3b659959df0fde00ab0c81c8f" instead: unknown" pod="kube-system/etcd-instance"
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.806789    6443 pod_workers.go:965] "Error syncing pod, skipping" err="failed to "CreatePodSandbox" for "etcd-instance_kube-system(0a73e99efc4a65e4141ea60e81c26a22)" with CreatePodSandboxError: "Failed to create sandbox for pod \"etcd-instance_kube-system(0a73e99efc4a65e4141ea60e81c26a22)\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/k8s.io/aa90a4ddb5aec74b7522a01fe1b4da47db1165f3b659959df0fde00ab0c81c8f\" instead: unknown"" pod="kube-system/etcd-instance" podUID=0a73e99efc4a65e4141ea60e81c26a22
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.833965    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.934599    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"

And journalctl -xeu kubelet:

-- Journal begins at Thu 2022-09-15 17:23:57 UTC, ends at Thu 2022-09-15 17:29:36 UTC. --
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.159588    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:28:15 instance kubelet[6443]: I0915 17:28:15.237463    6443 kubelet_node_status.go:70] "Attempting to register node" node="instance"
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.237746    6443 kubelet_node_status.go:92] "Unable to register node with API server" err="Post "https://172.17.0.2:6443/api/v1/nodes": dial tcp 172.17.0.2:6443: connect: connection refused" node="instance"
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.248415    6443 certificate_manager.go:471] kubernetes.io/kube-apiserver-client-kubelet: Failed while requesting a signed certificate from the control plane: cannot create certificate signing request: Post "https://172.17.0.2:6443/apis/certificates.k8s.io/v1/certificatesigningrequests": dial tcp 172.17.0.2:6443: connect: connection refused
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.260523    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.360929    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.461388    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.561630    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.662109    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.679296    6443 eviction_manager.go:256] "Eviction manager: failed to get summary stats" err="failed to get node info: node "instance" not found"
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.685164    6443 kubelet.go:2373] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.747839    6443 remote_runtime.go:222] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format "slice:prefix:name" for systemd cgroups, got "/k8s.io/5a1f58b79f8ea85a1288feb85800a0915a7b23b53915dbf0485b2775662a24e7" instead: unknown"
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.747898    6443 kuberuntime_sandbox.go:71] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format "slice:prefix:name" for systemd cgroups, got "/k8s.io/5a1f58b79f8ea85a1288feb85800a0915a7b23b53915dbf0485b2775662a24e7" instead: unknown" pod="kube-system/etcd-instance"
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.747923    6443 kuberuntime_manager.go:772] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format "slice:prefix:name" for systemd cgroups, got "/k8s.io/5a1f58b79f8ea85a1288feb85800a0915a7b23b53915dbf0485b2775662a24e7" instead: unknown" pod="kube-system/etcd-instance"
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.747972    6443 pod_workers.go:965] "Error syncing pod, skipping" err="failed to "CreatePodSandbox" for "etcd-instance_kube-system(0a73e99efc4a65e4141ea60e81c26a22)" with CreatePodSandboxError: "Failed to create sandbox for pod \"etcd-instance_kube-system(0a73e99efc4a65e4141ea60e81c26a22)\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/k8s.io/5a1f58b79f8ea85a1288feb85800a0915a7b23b53915dbf0485b2775662a24e7\" instead: unknown"" pod="kube-system/etcd-instance" podUID=0a73e99efc4a65e4141ea60e81c26a22
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.763093    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.863691    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:28:15 instance kubelet[6443]: E0915 17:28:15.964388    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:28:16 instance kubelet[6443]: E0915 17:28:16.064858    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:28:16 instance kubelet[6443]: E0915 17:28:16.164926    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:28:16 instance kubelet[6443]: E0915 17:28:16.265625    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:28:16 instance kubelet[6443]: E0915 17:28:16.366140    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:28:16 instance kubelet[6443]: E0915 17:28:16.466804    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
...
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.686704    6443 eviction_manager.go:256] "Eviction manager: failed to get summary stats" err="failed to get node info: node "instance" not found"
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.703980    6443 kubelet.go:2373] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.731903    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.806676    6443 remote_runtime.go:222] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format "slice:prefix:name" for systemd cgroups, got "/k8s.io/aa90a4ddb5aec74b7522a01fe1b4da47db1165f3b659959df0fde00ab0c81c8f" instead: unknown"
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.806723    6443 kuberuntime_sandbox.go:71] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format "slice:prefix:name" for systemd cgroups, got "/k8s.io/aa90a4ddb5aec74b7522a01fe1b4da47db1165f3b659959df0fde00ab0c81c8f" instead: unknown" pod="kube-system/etcd-instance"
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.806746    6443 kuberuntime_manager.go:772] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format "slice:prefix:name" for systemd cgroups, got "/k8s.io/aa90a4ddb5aec74b7522a01fe1b4da47db1165f3b659959df0fde00ab0c81c8f" instead: unknown" pod="kube-system/etcd-instance"
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.806789    6443 pod_workers.go:965] "Error syncing pod, skipping" err="failed to "CreatePodSandbox" for "etcd-instance_kube-system(0a73e99efc4a65e4141ea60e81c26a22)" with CreatePodSandboxError: "Failed to create sandbox for pod \"etcd-instance_kube-system(0a73e99efc4a65e4141ea60e81c26a22)\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/k8s.io/aa90a4ddb5aec74b7522a01fe1b4da47db1165f3b659959df0fde00ab0c81c8f\" instead: unknown"" pod="kube-system/etcd-instance" podUID=0a73e99efc4a65e4141ea60e81c26a22
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.833965    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:29:35 instance kubelet[6443]: E0915 17:29:35.934599    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:29:36 instance kubelet[6443]: E0915 17:29:36.035503    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:29:36 instance kubelet[6443]: E0915 17:29:36.136578    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:29:36 instance kubelet[6443]: E0915 17:29:36.237083    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:29:36 instance kubelet[6443]: E0915 17:29:36.337666    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:29:36 instance kubelet[6443]: E0915 17:29:36.438273    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:29:36 instance kubelet[6443]: E0915 17:29:36.540611    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:29:36 instance kubelet[6443]: E0915 17:29:36.640888    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found"
Sep 15 17:29:36 instance kubelet[6443]: E0915 17:29:36.741988    6443 kubelet.go:2448] "Error getting node" err="node "instance" not found""

@geerlingguy

Hmm... getting this on Rocky Linux 8, too:

[init] Using Kubernetes version: v1.25.1
[preflight] Running pre-flight checks
[preflight] The system verification failed. Printing the output from the verification:
KERNEL_VERSION: 5.15.0-1019-azure
OS: Linux
CGROUPS_CPU: enabled
CGROUPS_CPUACCT: enabled
CGROUPS_CPUSET: enabled
CGROUPS_DEVICES: enabled
CGROUPS_FREEZER: enabled
CGROUPS_MEMORY: enabled
CGROUPS_PIDS: enabled
CGROUPS_HUGETLB: enabled
CGROUPS_BLKIO: enabled
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [instance kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.17.0.2]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [instance localhost] and IPs [172.17.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [instance localhost] and IPs [172.17.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
    timed out waiting for the condition

This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.

[WARNING Swap]: swap is enabled; production deployments should disable swap unless testing the NodeSwap feature gate of the kubelet
[WARNING FileExisting-tc]: tc not found in system path
[WARNING SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs"
output: "", err: exec: "modprobe": executable file not found in $PATH
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

@geerlingguy

On my Mac, I'm seeing:

Sep 15 22:10:54 instance kubelet[10875]: I0915 22:10:54.300087   10875 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
Sep 15 22:10:54 instance kubelet[10875]: E0915 22:10:54.314932   10875 run.go:74] "command failed" err="failed to run Kubelet: could not detect clock speed from output: \"processor\\t: 0\\nBogoMIPS\\t: 48.00\\nFeatures\\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint\\nCPU implementer\\t: 0x00\\nCPU architecture: 8\\nCPU variant\\t: 0x0\\nCPU part\\t: 0x000\\nCPU revision\\t: 0\\n\\nprocessor\\t: 1\\nBogoMIPS\\t: 48.00\\nFeatures\\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint\\nCPU implementer\\t: 0x00\\nCPU architecture: 8\\nCPU variant\\t: 0x0\\nCPU part\\t: 0x000\\nCPU revision\\t: 0\\n\\nprocessor\\t: 2\\nBogoMIPS\\t: 48.00\\nFeatures\\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint\\nCPU implementer\\t: 0x00\\nCPU architecture: 8\\nCPU variant\\t: 0x0\\nCPU part\\t: 0x000\\nCPU revision\\t: 0\\n\\nprocessor\\t: 3\\nBogoMIPS\\t: 48.00\\nFeatures\\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint\\nCPU implementer\\t: 0x00\\nCPU architecture: 8\\nCPU variant\\t: 0x0\\nCPU part\\t: 0x000\\nCPU revision\\t: 0\\n\\nprocessor\\t: 4\\nBogoMIPS\\t: 48.00\\nFeatures\\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint\\nCPU implementer\\t: 0x00\\nCPU architecture: 8\\nCPU variant\\t: 0x0\\nCPU part\\t: 0x000\\nCPU revision\\t: 0\\n\\n\""
Sep 15 22:10:54 instance systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Sep 15 22:10:54 instance systemd[1]: kubelet.service: Failed with result 'exit-code'.

I already have the volume mount /sys/fs/cgroup:/sys/fs/cgroup:rw, so I'm not sure why this is not working.

@geerlingguy

geerlingguy commented Sep 15, 2022

Maybe I need to add --cgroupns=host; see ansible/molecule#3349

...but I wonder what version of Docker is running on Ubuntu in the GitHub Actions environments...
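If the installed molecule plugin supports it, that would look roughly like this in molecule.yml (a sketch; keys other than cgroupns_mode are placeholders):

# Sketch, assuming a molecule-docker version that supports cgroupns_mode
# (per the linked molecule issue); the other keys are placeholders.
platforms:
  - name: instance
    image: geerlingguy/docker-ubuntu2204-ansible:latest
    cgroupns_mode: host
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw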

@sasidatta

sasidatta commented Oct 9, 2022

@geerlingguy I have tested this role on Debian and Ubuntu VMs and hit the same issue. It looks like the error is related to this issue:
projectcalico/calico#4570
Error: no matches for kind "PodDisruptionBudget"

@geerlingguy

@sasidatta - It seems like the Calico problem was fixed by some of the changes in my PR (#126), but it's still not working for other reasons now.

@geerlingguy

I noticed I was still using kubernetes_kubelet_extra_args and it seemed the systemd cgroup driver was not being used as a result, causing a format mismatch:

runc create failed: expected cgroupsPath to be of format slice:prefix:name for systemd cgroups, got instead: unknown

@geerlingguy

Switching to use kubernetes_config_kubelet_configuration instead, and I also had to fix the systemd 'Failed to connect to bus' error on Docker for Mac so I could debug more quickly locally.
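Roughly what that looks like, with the cgroup driver set explicitly (a sketch; fields other than cgroupDriver are just examples, not necessarily what's committed):

# Sketch: kubernetes_config_kubelet_configuration maps onto a
# KubeletConfiguration document; cgroupDriver is the field that was getting
# lost with the old extra-args approach. failSwapOn is only an example of
# another field that could be carried over.
kubernetes_config_kubelet_configuration:
  cgroupDriver: systemd
  failSwapOn: false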

@geerlingguy

Now getting:

OCI runtime create failed: runc create failed

It looks like it's having issues with the /dev/null file, and there's a lot of unrelated content that comes up when searching Google for this error :(

@geerlingguy

Meh, having some other issues now; it seems like containers just aren't getting started.

@geerlingguy geerlingguy changed the title CI Currently broken on at least Calico install CI Currently broken - kubelet won't start in GitHub Actions Oct 26, 2022
@geerlingguy geerlingguy reopened this Oct 26, 2022
@geerlingguy

Not quite fixed; I just merged all the work in that PR since it's still useful for SOOO many different things (e.g. see #132 as well...).

@geerlingguy geerlingguy added the bug Something isn't working label Oct 26, 2022
@noranraskin

Running the kubelet init task with elevated privileges gets me past the errors, but everything is incredibly unstable. The cluster never leaves the 'NotReady' stage and crashes completely after a few minutes. The kubelet, Docker, and containerd services are running and seem to be healthy. I'm using Ubuntu 22 on a single-node cluster and don't quite know how to debug this further.
