Summary
Replace the reclaim-agent's /proc polling (rung 1, already shipped) with the Linux netlink proc connector (PROC_EVENT_EXEC via NETLINK_CONNECTOR / CN_IDX_PROC) so process matches are delivered to the agent by the kernel instead of discovered by a scan loop.
Tracked as rung 2 in the emergency-reclaim roadmap at ~/dev/roadmaps/5spot-emergency-reclaim-by-process-match.md (status row: "Phase 2 — Node-side agent (rung 2: netlink proc connector) ⏳ Not started").
Motivation
Rung 1 already meets the <5s JVM-launch-to-node-leaves-cluster SLA, so this is an optimization, not a correctness fix. Payoffs:
- Detection latency: worst-case ~1s (poll interval) → <10ms (kernel push).
- Idle CPU: rung 1 scans every PID every tick even when nothing matches; rung 2 sleeps until the kernel wakes it.
- Determinism: rung 1's worst case depends on poll-phase luck; rung 2's is bounded by kernel scheduling.
Explicit tradeoff to weigh before shipping: under heavy-exec workloads (make -j32, compilation farms) rung 2 sees every short-lived process while rung 1 only sees processes that survive to the next tick. Rung 1 can be cheaper under exec storms.
Scope
- New netlink_proc module behind #[cfg(target_os = "linux")] with a non-Linux compile stub.
- Keep scan_proc (rung 1) wired as a --poll fallback; select via CLI flag (e.g. --detector=netlink|poll, default netlink on Linux).
- Reuse the already-factored match_pid so both rungs share match logic.
- Subscribe via PROC_CN_MCAST_LISTEN control message; parse cn_msg + proc_event byte structures.
- Unit tests: byte-level parse of hand-crafted cn_msg / proc_event payloads (runnable on macOS).
- Runtime verification: Linux-only — needs a kind cluster or real node.
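The byte-level parse named in the scope could look roughly like the sketch below. It assumes the input is the netlink payload with the 16-byte nlmsghdr already stripped, native-endian fields, and the field offsets from linux/connector.h and linux/cn_proc.h. The names ExecEvent, parse_exec_event, and craft_exec_payload are hypothetical and not taken from the agent's code. It uses only std, so the same test can run on macOS.

```rust
// Constants from linux/connector.h and linux/cn_proc.h.
const CN_IDX_PROC: u32 = 1;
const CN_VAL_PROC: u32 = 1;
const PROC_EVENT_EXEC: u32 = 0x0000_0002;
// cn_msg header: cb_id { idx, val } + seq + ack + len (u16) + flags (u16).
const CN_MSG_LEN: usize = 20;

/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct ExecEvent {
    pub pid: i32,
    pub tgid: i32,
    pub timestamp_ns: u64,
}

fn read_u16(buf: &[u8], off: usize) -> Option<u16> {
    let b = buf.get(off..off + 2)?;
    Some(u16::from_ne_bytes([b[0], b[1]]))
}

fn read_u32(buf: &[u8], off: usize) -> Option<u32> {
    let b = buf.get(off..off + 4)?;
    Some(u32::from_ne_bytes([b[0], b[1], b[2], b[3]]))
}

fn read_u64(buf: &[u8], off: usize) -> Option<u64> {
    let b = buf.get(off..off + 8)?;
    let mut a = [0u8; 8];
    a.copy_from_slice(b);
    Some(u64::from_ne_bytes(a))
}

/// Parses cn_msg + proc_event and returns Some only for PROC_EVENT_EXEC.
/// Truncated or foreign messages yield None instead of panicking.
pub fn parse_exec_event(payload: &[u8]) -> Option<ExecEvent> {
    // Reject anything that did not come from the proc connector.
    if read_u32(payload, 0)? != CN_IDX_PROC || read_u32(payload, 4)? != CN_VAL_PROC {
        return None;
    }
    // cn_msg.len is the size of the data that follows the header.
    let data_len = read_u16(payload, 16)? as usize;
    let ev = payload.get(CN_MSG_LEN..CN_MSG_LEN + data_len)?;
    // proc_event: what (u32), cpu (u32), timestamp_ns (u64), then the union.
    if read_u32(ev, 0)? != PROC_EVENT_EXEC {
        return None;
    }
    Some(ExecEvent {
        timestamp_ns: read_u64(ev, 8)?,
        pid: read_u32(ev, 16)? as i32,  // exec.process_pid
        tgid: read_u32(ev, 20)? as i32, // exec.process_tgid
    })
}

/// Test helper: builds a minimal hand-crafted exec payload.
pub fn craft_exec_payload(pid: i32, tgid: i32, timestamp_ns: u64) -> Vec<u8> {
    let mut ev = Vec::new();
    ev.extend_from_slice(&PROC_EVENT_EXEC.to_ne_bytes()); // what
    ev.extend_from_slice(&0u32.to_ne_bytes()); // cpu
    ev.extend_from_slice(&timestamp_ns.to_ne_bytes());
    ev.extend_from_slice(&pid.to_ne_bytes());
    ev.extend_from_slice(&tgid.to_ne_bytes());

    let mut msg = Vec::new();
    msg.extend_from_slice(&CN_IDX_PROC.to_ne_bytes()); // id.idx
    msg.extend_from_slice(&CN_VAL_PROC.to_ne_bytes()); // id.val
    msg.extend_from_slice(&0u32.to_ne_bytes()); // seq
    msg.extend_from_slice(&0u32.to_ne_bytes()); // ack
    msg.extend_from_slice(&(ev.len() as u16).to_ne_bytes()); // len
    msg.extend_from_slice(&0u16.to_ne_bytes()); // flags
    msg.extend_from_slice(&ev);
    msg
}
```

Real kernel payloads are longer than the crafted one because the proc_event union is sized to its largest member; the parser only reads the fields the exec variant uses, so the extra bytes are ignored.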
Deployment changes
- Add CAP_NET_ADMIN to the 5spot-reclaim-agent container (rung 1 only needs UID 0 + host /proc).
- Update .trivyignore / Semgrep suppression rationale blocks to cover the added capability with the same architectural-necessity justification pattern used for KSV-0012 / KSV-0105 / KSV-0118.
- Document the rung-1 vs rung-2 selection knob in docs/src/concepts/emergency-reclaim.md.
Dependency choice (open)
Three candidates — pick before implementation:
- nix: minimal, raw socket + manual struct parsing; matches the roadmap's stated preference ("plain netlink socket + struct parsing — no libbpf/aya dependency").
- neli: full-featured, handles netlink framing; heaviest dep.
- netlink-proto + netlink-packet-core: async-native rust-netlink stack; middle weight.
Default recommendation: nix (smallest surface, aligns with roadmap phrasing).
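To give a sense of what "manual struct parsing" means for the nix option, here is a rough sketch of the framing the agent would have to build by hand for the PROC_CN_MCAST_LISTEN subscription: an nlmsghdr, then a cn_msg, then a 4-byte operation code. The function name build_listen_message is hypothetical, the constants are the values from the Linux UAPI headers, and socket creation, binding to the CN_IDX_PROC group, and the actual send are deliberately left out.

```rust
// Values from linux/netlink.h, linux/connector.h and linux/cn_proc.h.
const NLMSG_DONE: u16 = 3;
const CN_IDX_PROC: u32 = 1;
const CN_VAL_PROC: u32 = 1;
const PROC_CN_MCAST_LISTEN: u32 = 1;

const NLMSG_HDR_LEN: usize = 16; // len + type + flags + seq + pid
const CN_MSG_LEN: usize = 20; // cb_id + seq + ack + len + flags

/// Builds the bytes of a subscribe request (hypothetical helper).
/// `nl_pid` is the sender port id, commonly the process id.
pub fn build_listen_message(nl_pid: u32) -> Vec<u8> {
    let op = PROC_CN_MCAST_LISTEN.to_ne_bytes();
    let total = NLMSG_HDR_LEN + CN_MSG_LEN + op.len();
    let mut buf = Vec::with_capacity(total);

    // struct nlmsghdr
    buf.extend_from_slice(&(total as u32).to_ne_bytes()); // nlmsg_len
    buf.extend_from_slice(&NLMSG_DONE.to_ne_bytes()); // nlmsg_type
    buf.extend_from_slice(&0u16.to_ne_bytes()); // nlmsg_flags
    buf.extend_from_slice(&0u32.to_ne_bytes()); // nlmsg_seq
    buf.extend_from_slice(&nl_pid.to_ne_bytes()); // nlmsg_pid

    // struct cn_msg
    buf.extend_from_slice(&CN_IDX_PROC.to_ne_bytes()); // id.idx
    buf.extend_from_slice(&CN_VAL_PROC.to_ne_bytes()); // id.val
    buf.extend_from_slice(&0u32.to_ne_bytes()); // seq
    buf.extend_from_slice(&0u32.to_ne_bytes()); // ack
    buf.extend_from_slice(&(op.len() as u16).to_ne_bytes()); // len
    buf.extend_from_slice(&0u16.to_ne_bytes()); // flags

    // cn_msg data: the multicast operation
    buf.extend_from_slice(&op);
    buf
}
```

With neli or the rust-netlink crates most of this framing would be handled by the library, which is the weight-versus-control tradeoff the three candidates represent.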
Out of scope
- eBPF tracepoint on sched_process_exec via aya: rejected for MVP; revisit only if netlink proves inadequate.
- Cross-node signalling (explicitly non-goal per the parent roadmap).
Acceptance criteria
- netlink_proc module compiles on Linux; no-op stub on macOS.
- --detector=netlink|poll CLI flag selects between rungs; default netlink on Linux.
- Unit tests for parsing cn_msg and proc_event structures pass on macOS.
- Deployment adds CAP_NET_ADMIN with updated suppression rationale blocks.
- emergency-reclaim.md describes the rung-1 vs rung-2 choice and the exec-storm tradeoff.
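One possible shape for the --detector=netlink|poll selection is sketched below, using only std. The Detector enum, platform_default, and detector_from_args are hypothetical names, and falling back to poll on non-Linux targets is an assumption; this issue only specifies netlink as the default on Linux.

```rust
use std::str::FromStr;

/// Which rung performs detection (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Detector {
    Netlink, // rung 2: kernel-pushed exec events
    Poll,    // rung 1: /proc scan loop
}

impl Detector {
    /// Default is netlink on Linux. Using poll elsewhere is an assumption.
    pub fn platform_default() -> Self {
        if cfg!(target_os = "linux") {
            Detector::Netlink
        } else {
            Detector::Poll
        }
    }
}

impl FromStr for Detector {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "netlink" => Ok(Detector::Netlink),
            "poll" => Ok(Detector::Poll),
            other => Err(format!(
                "unknown detector '{}', expected netlink|poll",
                other
            )),
        }
    }
}

/// Picks the detector from the first --detector=<value> argument,
/// or the platform default when the flag is absent.
pub fn detector_from_args<I>(args: I) -> Result<Detector, String>
where
    I: IntoIterator<Item = String>,
{
    for arg in args {
        if let Some(value) = arg.strip_prefix("--detector=") {
            return value.parse();
        }
    }
    Ok(Detector::platform_default())
}
```

Keeping the choice in a single enum means both rungs can sit behind one match statement at startup while still sharing match_pid for the actual matching.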