Kubelet systemd watchdog broken - NOTIFY_SOCKET unset after READY=1 #13725
Description
Environmental Info:
K3s Version:
k3s version v1.35.1+k3s1 (50fa2d70)
go version go1.25.6
Node(s) CPU architecture, OS, and Version:
Linux fcos-dev 6.18.10-200.fc43.aarch64 #1 SMP PREEMPT_DYNAMIC Wed Feb 11 17:23:35 UTC 2026 aarch64 GNU/Linux
Cluster Configuration:
Single server
Describe the bug:
Kubelet's systemd watchdog integration (enabled by default since Kubernetes v1.35) does not work because k3s removes NOTIFY_SOCKET from the process environment after sending READY=1.
The kubelet watchdog goroutine cannot send WATCHDOG=1 keepalives, so systemd considers the service unresponsive and kills it.
Steps To Reproduce:
- Install k3s v1.35.1+k3s1
- Set Type=notify and WatchdogSec=60 in the k3s systemd service unit
- Start the service
Expected behavior:
Kubelet sends periodic WATCHDOG=1 keepalives to systemd (at WatchdogSec / 2 intervals) and the service stays running. This is the standard kubelet behavior documented at https://kubernetes.io/docs/reference/node/systemd-watchdog/.
Actual behavior:
Kubelet logs "Failed to notify watchdog" repeatedly (see logs below) and never sends keepalives. After the WatchdogSec timeout, systemd kills the service.
Additional context / logs:
Mar 04 00:37:53 fcos-dev k3s[3681]: E0304 00:37:53.204461 3681 watchdog_linux.go:163] "Failed to notify watchdog" err="failed to notify systemd watchdog, notification not supported - (i.e. NOTIFY_SOCKET is unset)"
Both the server (pkg/cli/server/server.go) and agent (pkg/agent/run.go) startup paths stash and unset NOTIFY_SOCKET early to prevent embedded components from sending a premature READY=1.
When the service is actually ready, the socket is temporarily restored and SdNotify(true, "READY=1\n") is called - but the true (unsetEnvironment) argument removes it from the environment again immediately after.
By the time the kubelet watchdog goroutine tries to send WATCHDOG=1, the env var is gone.
The fork/reexec path in pkg/cli/cmds/log_linux.go has the same problem - it sets NOTIFY_SOCKET= (empty) in the child environment.
A possible fix would be to change SdNotify(true, "READY=1\n") to SdNotify(false, "READY=1\n") in the server and agent paths so NOTIFY_SOCKET remains set after the ready notification.
The early os.Unsetenv already prevents premature notifications from embedded components - there's no need to strip it again after READY=1.
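The effect of the unsetEnvironment argument can be reproduced outside k3s. The sketch below is a standard-library stand-in that follows the semantics described above for SdNotify(unsetEnvironment, state). The names sdNotify and runSequence are hypothetical, and a temporary unixgram socket stands in for the one systemd provides. It is an illustration under those assumptions, not the k3s or go-systemd code.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// sdNotify is a stand-in for SdNotify: it sends state to the socket named by
// NOTIFY_SOCKET and, when unsetEnvironment is true, removes the variable from
// the process environment afterwards. It returns false with a nil error when
// NOTIFY_SOCKET is not set.
func sdNotify(unsetEnvironment bool, state string) (bool, error) {
	path := os.Getenv("NOTIFY_SOCKET")
	if unsetEnvironment {
		defer os.Unsetenv("NOTIFY_SOCKET")
	}
	if path == "" {
		return false, nil
	}
	conn, err := net.DialUnix("unixgram", nil, &net.UnixAddr{Name: path, Net: "unixgram"})
	if err != nil {
		return false, err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(state)); err != nil {
		return false, err
	}
	return true, nil
}

// runSequence sends READY=1 with the given unsetEnvironment value, then
// attempts a WATCHDOG=1 keepalive the way a later goroutine would.
func runSequence(unsetEnvironment bool) (readySent, watchdogSent bool, err error) {
	dir, err := os.MkdirTemp("", "notify")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	// Temporary socket standing in for the one systemd would provide.
	addr := &net.UnixAddr{Name: dir + "/notify.sock", Net: "unixgram"}
	listener, err := net.ListenUnixgram("unixgram", addr)
	if err != nil {
		return false, false, err
	}
	defer listener.Close()

	os.Setenv("NOTIFY_SOCKET", addr.Name)
	defer os.Unsetenv("NOTIFY_SOCKET")

	if readySent, err = sdNotify(unsetEnvironment, "READY=1\n"); err != nil {
		return readySent, false, err
	}
	watchdogSent, err = sdNotify(false, "WATCHDOG=1")
	return readySent, watchdogSent, err
}

func main() {
	for _, unset := range []bool{true, false} {
		ready, watchdog, err := runSequence(unset)
		fmt.Printf("unsetEnvironment=%v ready=%v watchdog=%v err=%v\n", unset, ready, watchdog, err)
	}
}
```

With unsetEnvironment=true the keepalive is not sent because NOTIFY_SOCKET is already gone, which matches the "NOTIFY_SOCKET is unset" error in the kubelet log. With unsetEnvironment=false both notifications go through.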