Kubelet systemd watchdog broken - NOTIFY_SOCKET unset after READY=1 #13725

@mikroskeem


Environmental Info:
K3s Version:

k3s version v1.35.1+k3s1 (50fa2d70)
go version go1.25.6

Node(s) CPU architecture, OS, and Version:

Linux fcos-dev 6.18.10-200.fc43.aarch64 #1 SMP PREEMPT_DYNAMIC Wed Feb 11 17:23:35 UTC 2026 aarch64 GNU/Linux

Cluster Configuration:

Single server

Describe the bug:

Kubelet's systemd watchdog integration (enabled by default since Kubernetes v1.35) does not work because k3s removes NOTIFY_SOCKET from the process environment after sending READY=1.

The kubelet watchdog goroutine cannot send WATCHDOG=1 keepalives, so systemd considers the service unresponsive and kills it.

Steps To Reproduce:

  • Install k3s v1.35.1+k3s1
  • Set Type=notify and WatchdogSec=60 in the k3s systemd service unit
  • Start the service

Expected behavior:

Kubelet sends periodic WATCHDOG=1 keepalives to systemd (at WatchdogSec / 2 intervals) and the service stays running. This is the standard kubelet behavior documented at https://kubernetes.io/docs/reference/node/systemd-watchdog/.
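For reference, the WatchdogSec / 2 cadence can be sketched as below. This is a minimal illustration, not the kubelet's code: it assumes the timeout reaches the service through the WATCHDOG_USEC environment variable (microseconds), which is how systemd exports WatchdogSec, and keepaliveInterval is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// keepaliveInterval converts a WATCHDOG_USEC value (microseconds, as a
// string) into the period between WATCHDOG=1 messages, using the
// WatchdogSec / 2 rule described above. Hypothetical helper for illustration.
func keepaliveInterval(watchdogUSec string) (time.Duration, error) {
	usec, err := strconv.ParseInt(watchdogUSec, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("parsing WATCHDOG_USEC %q: %w", watchdogUSec, err)
	}
	if usec <= 0 {
		return 0, fmt.Errorf("WATCHDOG_USEC must be positive, got %d", usec)
	}
	return time.Duration(usec) * time.Microsecond / 2, nil
}

func main() {
	// WatchdogSec=60 corresponds to WATCHDOG_USEC=60000000.
	d, err := keepaliveInterval("60000000")
	if err != nil {
		panic(err)
	}
	fmt.Println(d) // prints "30s"
}
```

With WatchdogSec=60 from the reproduction steps, a keepalive would be expected roughly every 30 seconds, so a single missed timeout window is enough for systemd to kill the service.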

Actual behavior:

Kubelet logs "Failed to notify watchdog" repeatedly (see logs below) and never sends keepalives. After the WatchdogSec timeout, systemd kills the service.

Additional context / logs:

Mar 04 00:37:53 fcos-dev k3s[3681]: E0304 00:37:53.204461    3681 watchdog_linux.go:163] "Failed to notify watchdog" err="failed to notify systemd watchdog, notification not supported - (i.e. NOTIFY_SOCKET is unset)"

Both the server (pkg/cli/server/server.go) and agent (pkg/agent/run.go) startup paths stash and unset NOTIFY_SOCKET early to prevent embedded components from sending a premature READY=1.

When the service is actually ready, the socket is temporarily restored and SdNotify(true, "READY=1\n") is called - but the first argument, true (unsetEnvironment), removes NOTIFY_SOCKET from the environment again as part of that call.

By the time the kubelet watchdog goroutine tries to send WATCHDOG=1, the env var is gone.

The fork/reexec path in pkg/cli/cmds/log_linux.go has the same problem - it sets NOTIFY_SOCKET= (empty) in the child environment.

A possible fix would be to change SdNotify(true, "READY=1\n") to SdNotify(false, "READY=1\n") in the server and agent paths so NOTIFY_SOCKET remains set after the ready notification.

The early os.Unsetenv already prevents premature notifications from embedded components - there's no need to strip it again after READY=1.
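The difference between the two argument values can be reproduced outside k3s with a small standalone program. This is only a sketch under stated assumptions: sdNotify is a stdlib-only stand-in modelled on go-systemd's daemon.SdNotify(unsetEnvironment, state), readyThenWatchdog is a hypothetical helper, and a temporary unixgram socket plays the role of systemd. None of it is the actual k3s or kubelet code.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// sdNotify is a stand-in modelled on go-systemd's daemon.SdNotify.
// It returns (false, nil) when NOTIFY_SOCKET is unset or empty.
func sdNotify(unsetEnvironment bool, state string) (bool, error) {
	path := os.Getenv("NOTIFY_SOCKET")
	if path == "" {
		return false, nil // notification not supported
	}
	if unsetEnvironment {
		if err := os.Unsetenv("NOTIFY_SOCKET"); err != nil {
			return false, err
		}
	}
	conn, err := net.DialUnix("unixgram", nil, &net.UnixAddr{Name: path, Net: "unixgram"})
	if err != nil {
		return false, err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(state)); err != nil {
		return false, err
	}
	return true, nil
}

// readyThenWatchdog sends READY=1 with the given unsetEnvironment value,
// then attempts a WATCHDOG=1 keepalive the way a later goroutine would.
func readyThenWatchdog(unsetEnvironment bool) (readySent, watchdogSent bool, err error) {
	dir, err := os.MkdirTemp("", "notify")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	// A throwaway datagram socket stands in for systemd's notify socket.
	sock := filepath.Join(dir, "notify.sock")
	ln, err := net.ListenUnixgram("unixgram", &net.UnixAddr{Name: sock, Net: "unixgram"})
	if err != nil {
		return false, false, err
	}
	defer ln.Close()

	os.Setenv("NOTIFY_SOCKET", sock)
	defer os.Unsetenv("NOTIFY_SOCKET")

	if readySent, err = sdNotify(unsetEnvironment, "READY=1\n"); err != nil {
		return readySent, false, err
	}
	watchdogSent, err = sdNotify(false, "WATCHDOG=1")
	return readySent, watchdogSent, err
}

func main() {
	for _, unset := range []bool{true, false} {
		ready, watchdog, err := readyThenWatchdog(unset)
		if err != nil {
			panic(err)
		}
		fmt.Printf("unsetEnvironment=%v ready=%v watchdog=%v\n", unset, ready, watchdog)
	}
}
```

In this sketch the true case delivers READY=1 but the later WATCHDOG=1 is reported as unsupported, which matches the "NOTIFY_SOCKET is unset" error in the log above. The false case delivers both messages.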
