Phase 2 rung 2: netlink proc connector for event-driven emergency reclaim #40

@ebourgeois

Summary

Replace the reclaim-agent's /proc polling (rung 1, already shipped) with the Linux netlink proc connector (PROC_EVENT_EXEC via NETLINK_CONNECTOR / CN_IDX_PROC) so process matches are delivered to the agent by the kernel instead of discovered by a scan loop.

Tracked as rung 2 in the emergency-reclaim roadmap at ~/dev/roadmaps/5spot-emergency-reclaim-by-process-match.md (status row: "Phase 2 — Node-side agent (rung 2: netlink proc connector) ⏳ Not started").

Motivation

Rung 1 already meets the <5s JVM-launch-to-node-leaves-cluster SLA, so this is an optimization, not a correctness fix. Payoffs:

  1. Detection latency: worst-case ~1s (poll interval) → <10ms (kernel push).
  2. Idle CPU: rung 1 scans every PID every tick even when nothing matches; rung 2 sleeps until the kernel wakes it.
  3. Determinism: rung 1's worst case depends on poll-phase luck; rung 2's is bounded by kernel scheduling.

Explicit tradeoff to weigh before shipping: under heavy-exec workloads (make -j32, compilation farms) rung 2 sees every short-lived process while rung 1 only sees processes that survive to the next tick. Rung 1 can be cheaper under exec storms.

Scope

  • New netlink_proc module behind #[cfg(target_os = "linux")] with a non-Linux compile stub.
  • Keep scan_proc (rung 1) wired in as the polling fallback; select the detector via a CLI flag (e.g. --detector=netlink|poll, default netlink on Linux).
  • Reuse the already-factored match_pid so both rungs share match logic.
  • Subscribe via PROC_CN_MCAST_LISTEN control message; parse cn_msg + proc_event byte structures.
  • Unit tests: byte-level parse of hand-crafted cn_msg / proc_event payloads (runnable on macOS).
  • Runtime verification: Linux-only — needs a kind cluster or real node.
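
A minimal sketch of the byte-level parse described above, under stated assumptions: the buffer starts at cn_msg (the nlmsghdr has already been stripped), fields are in native byte order, and the layout follows <linux/connector.h> and <linux/cn_proc.h> (20-byte cn_msg header, 16-byte proc_event header, then the exec payload with pid and tgid). ExecEvent, parse_exec_event and craft_exec_payload are hypothetical names for illustration, not the agent's actual API. Nothing here touches a socket, so it runs on macOS as well.

```rust
// Constants as defined in <linux/connector.h> and <linux/cn_proc.h>.
const CN_IDX_PROC: u32 = 1;
const CN_VAL_PROC: u32 = 1;
const PROC_EVENT_EXEC: u32 = 0x0000_0002;
// cn_msg header: idx, val, seq, ack (u32 each) + len, flags (u16 each).
const CN_MSG_LEN: usize = 20;
// proc_event header: what (u32), cpu (u32), timestamp_ns (u64).
const PROC_EVENT_HDR_LEN: usize = 16;

/// Hypothetical result type: the two fields of the exec payload.
#[derive(Debug, PartialEq, Eq)]
pub struct ExecEvent {
    pub pid: i32,
    pub tgid: i32,
}

/// Read a native-endian u32 at `off`, or None if the buffer is too short.
fn read_u32(buf: &[u8], off: usize) -> Option<u32> {
    let b = buf.get(off..off + 4)?;
    Some(u32::from_ne_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parse one cn_msg + proc_event buffer; returns Some only for exec events
/// addressed to the proc connector. Truncated input yields None.
pub fn parse_exec_event(buf: &[u8]) -> Option<ExecEvent> {
    if read_u32(buf, 0)? != CN_IDX_PROC || read_u32(buf, 4)? != CN_VAL_PROC {
        return None;
    }
    let ev = buf.get(CN_MSG_LEN..)?;
    if read_u32(ev, 0)? != PROC_EVENT_EXEC {
        return None;
    }
    let pid = read_u32(ev, PROC_EVENT_HDR_LEN)? as i32;
    let tgid = read_u32(ev, PROC_EVENT_HDR_LEN + 4)? as i32;
    Some(ExecEvent { pid, tgid })
}

/// Test fixture: hand-craft a minimal exec payload (not a kernel capture;
/// cn_msg.len is set to the bytes appended here, not sizeof(struct proc_event)).
pub fn craft_exec_payload(pid: i32, tgid: i32) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&CN_IDX_PROC.to_ne_bytes());
    buf.extend_from_slice(&CN_VAL_PROC.to_ne_bytes());
    buf.extend_from_slice(&0u32.to_ne_bytes()); // seq
    buf.extend_from_slice(&0u32.to_ne_bytes()); // ack
    buf.extend_from_slice(&24u16.to_ne_bytes()); // len of data that follows
    buf.extend_from_slice(&0u16.to_ne_bytes()); // flags
    buf.extend_from_slice(&PROC_EVENT_EXEC.to_ne_bytes()); // what
    buf.extend_from_slice(&0u32.to_ne_bytes()); // cpu
    buf.extend_from_slice(&0u64.to_ne_bytes()); // timestamp_ns
    buf.extend_from_slice(&pid.to_ne_bytes());
    buf.extend_from_slice(&tgid.to_ne_bytes());
    buf
}
```

In the real module the returned pid would presumably be handed to match_pid, so that both rungs keep sharing the same match logic.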

Deployment changes

  • Add CAP_NET_ADMIN to the 5spot-reclaim-agent container (rung 1 only needs UID 0 + host /proc).
  • Update .trivyignore / Semgrep suppression rationale blocks to cover the added capability with the same architectural-necessity justification pattern used for KSV-0012 / KSV-0105 / KSV-0118.
  • Document the rung-1 vs rung-2 selection knob in docs/src/concepts/emergency-reclaim.md.

Dependency choice (open)

Three candidates — pick before implementation:

  • nix — minimal, raw socket + manual struct parsing; matches the roadmap's stated preference ("plain netlink socket + struct parsing — no libbpf/aya dependency").
  • neli — full-featured, handles netlink framing; heaviest dep.
  • netlink-proto + netlink-packet-core — async-native rust-netlink stack; middle weight.

Default recommendation: nix (smallest surface, aligns with roadmap phrasing).
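
To give a sense of how much manual framing the nix option implies, here is a sketch of the subscribe message built by hand: an nlmsghdr (16 bytes), a cn_msg addressed to CN_IDX_PROC / CN_VAL_PROC (20 bytes), and a 4-byte PROC_CN_MCAST_LISTEN op, 40 bytes in total. build_listen_request is a hypothetical helper name; constants come from <linux/netlink.h>, <linux/connector.h> and <linux/cn_proc.h>, and native byte order is assumed. Opening, binding and sending on the NETLINK_CONNECTOR socket is deliberately left out, since the exact calls depend on whichever crate is chosen.

```rust
// Header sizes and constants from the Linux uapi headers.
const NLMSG_HDR_LEN: usize = 16; // len (u32), type (u16), flags (u16), seq (u32), pid (u32)
const CN_MSG_LEN: usize = 20; // idx, val, seq, ack (u32 each) + len, flags (u16 each)
const NLMSG_DONE: u16 = 3;
const CN_IDX_PROC: u32 = 1;
const CN_VAL_PROC: u32 = 1;
const PROC_CN_MCAST_LISTEN: u32 = 1;

/// Build the bytes of a PROC_CN_MCAST_LISTEN request.
/// `port_id` is the netlink port id the socket was bound with
/// (commonly the process id); it goes into nlmsg_pid.
pub fn build_listen_request(port_id: u32) -> Vec<u8> {
    let op = PROC_CN_MCAST_LISTEN.to_ne_bytes();
    let total = NLMSG_HDR_LEN + CN_MSG_LEN + op.len();
    let mut buf = Vec::with_capacity(total);

    // struct nlmsghdr
    buf.extend_from_slice(&(total as u32).to_ne_bytes()); // nlmsg_len
    buf.extend_from_slice(&NLMSG_DONE.to_ne_bytes()); // nlmsg_type
    buf.extend_from_slice(&0u16.to_ne_bytes()); // nlmsg_flags
    buf.extend_from_slice(&0u32.to_ne_bytes()); // nlmsg_seq
    buf.extend_from_slice(&port_id.to_ne_bytes()); // nlmsg_pid

    // struct cn_msg
    buf.extend_from_slice(&CN_IDX_PROC.to_ne_bytes()); // id.idx
    buf.extend_from_slice(&CN_VAL_PROC.to_ne_bytes()); // id.val
    buf.extend_from_slice(&0u32.to_ne_bytes()); // seq
    buf.extend_from_slice(&0u32.to_ne_bytes()); // ack
    buf.extend_from_slice(&(op.len() as u16).to_ne_bytes()); // len of data
    buf.extend_from_slice(&0u16.to_ne_bytes()); // flags

    // data: enum proc_cn_mcast_op
    buf.extend_from_slice(&op);
    buf
}
```

If this amount of framing code is acceptable, that supports the nix recommendation; if it looks error-prone, neli or the rust-netlink stack would absorb it at the cost of a heavier dependency.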

Out of scope

  • eBPF tracepoint on sched_process_exec via aya — rejected for MVP; revisit only if netlink proves inadequate.
  • Cross-node signalling (explicitly non-goal per the parent roadmap).

Acceptance criteria

  • New netlink_proc module compiles on Linux; no-op stub on macOS.
  • --detector=netlink|poll CLI flag selects between rungs; default netlink on Linux.
  • Byte-level parse tests for cn_msg and proc_event structures pass on macOS.
  • Manifests add CAP_NET_ADMIN with updated suppression rationale blocks.
  • Runtime verified on a real Linux node: JVM launch produces a match in under 100ms.
  • Documentation in emergency-reclaim.md describes the rung-1 vs rung-2 choice and the exec-storm tradeoff.
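
The --detector criterion could be sketched roughly as follows, using only the standard library. Detector, platform_default and detector_from_args are hypothetical names, and falling back to poll on non-Linux targets is an assumption: the issue only specifies the Linux default. A real implementation would likely hang this off whatever argument parser the agent already uses.

```rust
use std::str::FromStr;

/// Hypothetical enum for the two detection rungs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Detector {
    Netlink, // rung 2: proc connector
    Poll,    // rung 1: scan_proc
}

impl Detector {
    /// netlink on Linux (per the issue); poll elsewhere (assumption).
    pub fn platform_default() -> Self {
        if cfg!(target_os = "linux") {
            Detector::Netlink
        } else {
            Detector::Poll
        }
    }
}

impl FromStr for Detector {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "netlink" => Ok(Detector::Netlink),
            "poll" => Ok(Detector::Poll),
            other => Err(format!(
                "unknown detector '{}', expected netlink|poll",
                other
            )),
        }
    }
}

/// Pick the detector from `--detector=<value>`; the first occurrence wins,
/// and the platform default applies when the flag is absent.
pub fn detector_from_args<I>(args: I) -> Result<Detector, String>
where
    I: IntoIterator<Item = String>,
{
    for arg in args {
        if let Some(value) = arg.strip_prefix("--detector=") {
            return value.parse();
        }
    }
    Ok(Detector::platform_default())
}
```

Returning an error for unknown values, rather than silently falling back, keeps a typo in a manifest from quietly selecting the wrong rung.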

Labels

enhancement