Skip to content

feat: add ExecApprovalsStore write path and wire coordinator side effects#526

Open
AlexAlves87 wants to merge 5 commits into
openclaw:masterfrom
AlexAlves87:feat/exec-approvals-write-path
Open

feat: add ExecApprovalsStore write path and wire coordinator side effects#526
AlexAlves87 wants to merge 5 commits into
openclaw:masterfrom
AlexAlves87:feat/exec-approvals-write-path

Conversation

@AlexAlves87
Copy link
Copy Markdown
Contributor

@AlexAlves87 AlexAlves87 commented May 23, 2026

What

Adds the write path to ExecApprovalsStore and wires the two side-effect calls into ExecApprovalsCoordinator, closing the persistence loop for the exec approvals V2 pipeline.

Store — new public API:

  • AddAllowlistEntryAsync(agentId, pattern) — persists a new allowlist entry after an AllowAlways prompt decision. Deduplicates on write (OrdinalIgnoreCase). Returns true if the entry is present after the call (added or already there), false on empty pattern or I/O failure. New entries carry {id, pattern}; lastUsedAt is stamped later by RecordAllowlistUseAsync on first successful use (matches macOS parity).
  • RecordAllowlistUseAsync(agentId, pattern, command, resolvedPath) — updates lastUsedAt, lastUsedCommand, and lastResolvedPath for every matching entry after a final allow. Returns false if pattern not found or on I/O failure.

Store — private infrastructure:

  • UpdateFileAsync(mutate) — load → mutate → atomic save, serialized by the existing SemaphoreSlim. Never throws. Refuses to overwrite a malformed file. Handles transient IOException on the atomic move as a degraded path: logs Warn, no retry.

Coordinator — side effects wired:

  • RecordAllowlistUsageAsync fires on both allow exit points: the pass1 pre-approved branch and the post-pass2 branch. Both are required to cover the common allowlist-satisfied case.
  • PersistAllowlistEntriesAsync fires only after pass2 = Allow and followupDecision == AllowAlways, strictly outside the _promptLock block.
  • Both helpers are best-effort: a store I/O failure is absorbed and does not affect the final Allow result.

Not wired in production yet: the coordinator is still not referenced in any production src/ file. The ProductionWiring_CoordinatorNotReferencedInSrc test remains green.

Design notes

Side effects fire strictly after the final allow decision is confirmed — not before the second evaluator pass. This is a deliberate structural safety choice: the guarantee is structural rather than relying on proof that Evaluate(context, AllowAlways) always produces Allow.

Pattern validation in AddAllowlistEntryAsync is non-empty only, matching macOS parity. Basename-only patterns are inert at match time but not rejected at persist time.

Testing

145 tests passing, 0 failures. 24 new tests added in this change (17 store, 7 coordinator).

Store tests cover: success paths, dedup, not-found, malformed-file refusal, I/O failure degradation on both mutators, concurrency (5 concurrent writes produce a single entry), and round-trip JSON validation.

Coordinator tests cover: AllowAlways persistence, non-allowlist security guard, duplicate pattern dedup, allowlist usage recording, allowlist-not-satisfied guard, pass1 pre-approved path, and fallback path with allowlist satisfied.

Real behavior proof

End-to-end coordinator/store runtime proof using real filesystem I/O (test RuntimeProof_AllowAlways_PersistsAndRecordsLastUsed in tests/OpenClaw.Shared.Tests/ExecApprovalsCoordinatorTests.cs).

Scope clarification: this is slice runtime proof, not full production runtime proof. The coordinator is intentionally not wired in production yet. A follow-up production wiring slice (SetV2Handler(coordinator) + feature flag) connects the coordinator, and the WinUI prompt dialog is a separate slice after that. UI-driven proof against the live app is only meaningful once those land.

Reproduce locally:

dotnet test tests/OpenClaw.Shared.Tests/OpenClaw.Shared.Tests.csproj \
  --filter "FullyQualifiedName~RuntimeProof_AllowAlways_PersistsAndRecordsLastUsed" \
  --logger "console;verbosity=detailed" --nologo

Captured output (initial / post-AllowAlways / post-allowlist-hit):

=== Initial exec-approvals.json ===
{"version":1,"agents":{"main":{"security":"allowlist","ask":"always"}}}

=== After AllowAlways (correlationId=proof-1) ===
{
  "version": 1,
  "agents": {
    "main": {
      "security": "allowlist",
      "ask": "always",
      "allowlist": [
        {
          "id": "fc9ee349-abe6-4c08-9352-7ed397b7c491",
          "pattern": "C:\\WINDOWS\\system32\\cmd.EXE"
        }
      ]
    }
  }
}

=== After allowlist hit (correlationId=proof-2) ===
{
  "version": 1,
  "agents": {
    "main": {
      "security": "allowlist",
      "ask": "always",
      "allowlist": [
        {
          "id": "fc9ee349-abe6-4c08-9352-7ed397b7c491",
          "pattern": "C:\\WINDOWS\\system32\\cmd.EXE",
          "lastUsedAt": 1779521352175,
          "lastUsedCommand": "cmd",
          "lastResolvedPath": "C:\\WINDOWS\\system32\\cmd.EXE"
        }
      ]
    }
  }
}

The entry id is identical between the two invocations, proving on-disk dedup. The lastUsedAt/lastUsedCommand/lastResolvedPath fields appear only after the second invocation, proving RecordAllowlistUseAsync fires on the allowlist-hit path.

Co-Authored-By: Claude Sonnet 4.6 noreply@anthropic.com

AlexAlves87 and others added 2 commits May 23, 2026 08:15
…ects (PR8)

Implements the store write path (AddAllowlistEntryAsync, RecordAllowlistUseAsync)
and wires the side-effect calls into ExecApprovalsCoordinator.

Side effects fire strictly after the final allow decision is confirmed (both
pass1 pre-approved and post-pass2 branches). Best-effort: never throw; any
I/O failure degrades to a logged warning. UpdateFileAsync refuses to write
a malformed file. SemaphoreSlim serializes intra-process writes.

Pattern validation is non-empty only, matching macOS parity. New entries
carry {id, pattern} only — lastUsedAt is absent on creation and stamped by
RecordAllowlistUseAsync on first use (macOS addAllowlistEntry parity).

Rail 19 preserved: coordinator not referenced in any production src/ file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Strip rail codes, PR numbers, research doc references, D-labels, CVE/ADR
tags, and other planning labels from all ExecApprovals source files.
No logic changed — comments only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented May 23, 2026

Codex review: needs changes before merge.

Latest ClawSweeper review: 2026-05-23 07:49 UTC / May 23, 2026, 3:49 AM ET.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The branch adds ExecApprovalsStore allowlist write/update APIs, wires ExecApprovalsCoordinator side effects, cleans exec-approval comments, and adds store/coordinator tests including a runtime proof test.

Reproducibility: yes. for the review finding: source inspection shows wildcard allowlist entries are part of the resolved allowlist, while the new recording method only searches the concrete agent bucket. I did not run tests because the checkout must remain read-only.

PR rating
Overall: 🦐 gold shrimp
Proof: 🐚 platinum hermit
Patch quality: 🦐 gold shrimp
Summary: The proof is now useful for this staged slice, but patch quality is capped by the wildcard allowlist metadata bug.

Rank-up moves:

  • Fix wildcard allowlist usage recording and add a regression test for agents["*"].
  • Provide or rely on CI for the required build, shared-test, and tray-test validation after the repair.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (live_output): The PR body includes copied detailed runtime output from a real-filesystem coordinator/store proof showing the after-fix allowlist write and later lastUsed metadata update; full app proof is not applicable until production wiring exists.

Risk before merge

  • Wildcard allowlist entries can authorize execution today, but this PR would not record lastUsed metadata for those allowed hits.
  • This code writes exec approval state and metadata, so maintainers should treat the side-effect conditions as security-boundary behavior even though no authorization bypass was found.
  • Local validation was not run because this review is read-only; the PR body reports 145 passing tests, but full build/shared/tray validation logs were not independently verified.

Maintainer options:

  1. Record wildcard matches before merge (recommended)
    Update the write path so allowlist hits from agents["*"] receive lastUsed metadata, and add focused regression coverage for that case.
  2. Accept missing wildcard metadata
    Maintainers could intentionally merge without wildcard usage metadata, but that leaves shared allowlist audit fields incomplete for commands that were actually authorized.
  3. Pause until the next wiring slice
    If maintainers want the matched-entry ownership model settled with production wiring, pause this PR and fold the write-path repair into that follow-up.
Copy recommended automerge instruction
@clawsweeper automerge

Special instructions:
Update ExecApprovalsStore/ExecApprovalsCoordinator so RecordAllowlistUseAsync records lastUsed* metadata for allowlist entries matched from both the concrete agent and agents["*"], without changing production V2 wiring. Add regression coverage for a wildcard allowlist hit and rerun the required build/shared/tray validation.

Next step before merge
A narrow automated repair can target wildcard allowlist usage recording and tests; no product decision is needed.

Security
Cleared: No concrete supply-chain, secret-handling, or authorization bypass defect was found, but the touched allowlist persistence path is security-boundary code.

Review findings

  • [P2] Record wildcard allowlist hits — src/OpenClaw.Shared/ExecApprovals/ExecApprovalsStore.cs:108-109
Review details

Best possible solution:

Preserve the staged no-production-wiring approach, but update usage recording to persist metadata for the actual matched allowlist entry source, including wildcard entries, before merge.

Do we have a high-confidence way to reproduce the issue?

Yes for the review finding: source inspection shows wildcard allowlist entries are part of the resolved allowlist, while the new recording method only searches the concrete agent bucket. I did not run tests because the checkout must remain read-only.

Is this the best way to solve the issue?

No: the write path should record metadata against the actual matched allowlist entry source, not only the concrete agent bucket. The rest of the staged store/coordinator shape looks maintainable once that gap is fixed.

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes copied detailed runtime output from a real-filesystem coordinator/store proof showing the after-fix allowlist write and later lastUsed metadata update; full app proof is not applicable until production wiring exists.
  • add rating: 🦐 gold shrimp: Current PR rating is 🦐 gold shrimp because proof is 🐚 platinum hermit, patch quality is 🦐 gold shrimp, and The proof is now useful for this staged slice, but patch quality is capped by the wildcard allowlist metadata bug.
  • add status: 🔁 re-review loop: A fresh ClawSweeper review was explicitly requested after the latest review. Sufficient (live_output): The PR body includes copied detailed runtime output from a real-filesystem coordinator/store proof showing the after-fix allowlist write and later lastUsed metadata update; full app proof is not applicable until production wiring exists.
  • remove rating: 🧂 unranked krab: Current PR rating is rating: 🦐 gold shrimp, so this older rating label is no longer current.
  • remove status: 📣 needs proof: Current PR status label is status: 🔁 re-review loop.

Label justifications:

  • P2: This is a normal-priority feature PR with a focused correctness gap in exec approval metadata, not an active production regression.
  • merge-risk: 🚨 security-boundary: The diff writes exec allowlist authorization state and audit metadata, so incorrect side effects can affect command-execution boundary review.
  • rating: 🦐 gold shrimp: Current PR rating is 🦐 gold shrimp because proof is 🐚 platinum hermit, patch quality is 🦐 gold shrimp, and The proof is now useful for this staged slice, but patch quality is capped by the wildcard allowlist metadata bug.
  • status: 🔁 re-review loop: A fresh ClawSweeper review was explicitly requested after the latest review. Sufficient (live_output): The PR body includes copied detailed runtime output from a real-filesystem coordinator/store proof showing the after-fix allowlist write and later lastUsed metadata update; full app proof is not applicable until production wiring exists.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes copied detailed runtime output from a real-filesystem coordinator/store proof showing the after-fix allowlist write and later lastUsed metadata update; full app proof is not applicable until production wiring exists.

Full review comments:

  • [P2] Record wildcard allowlist hits — src/OpenClaw.Shared/ExecApprovals/ExecApprovalsStore.cs:108-109
    ResolveReadOnly includes entries from agents["*"] before the concrete agent, so a command can be allowed by a wildcard allowlist entry. This new write path only searches file.Agents[key].Allowlist, so wildcard-authorized executions return false here and never get lastUsed* metadata. Search the wildcard bucket too, or carry the matched entry's owner through the coordinator, and add a regression test for a wildcard allowlist hit.
    Confidence: 0.91

Overall correctness: patch is incorrect
Overall confidence: 0.86

Acceptance criteria:

  • ./build.ps1
  • dotnet test ./tests/OpenClaw.Shared.Tests/OpenClaw.Shared.Tests.csproj --no-restore
  • dotnet test ./tests/OpenClaw.Tray.Tests/OpenClaw.Tray.Tests.csproj --no-restore

What I checked:

Likely related people:

  • AlexAlves87: Authored the recent exec approvals evaluator, prompt-adapter, and coordinator slices already on main, and blame ties the current coordinator implementation to commit 12416d2. (role: feature owner and recent area contributor; confidence: high; commits: 6d58171c5847, d8c2ca5184ad, 12416d282a23; files: src/OpenClaw.Shared/ExecApprovals/ExecApprovalsCoordinator.cs, src/OpenClaw.Shared/ExecApprovals/ExecApprovalsStore.cs, tests/OpenClaw.Shared.Tests/ExecApprovalsCoordinatorTests.cs)
  • Scott Hanselman: Exec approvals history includes review feedback attributed to Shanselman, and the current store file's grafted blame points at a broad Scott Hanselman merge, though that broad import makes ownership less direct. (role: reviewer signal and adjacent merger; confidence: medium; commits: 488e2e9286ba, 3d16bac78fec; files: src/OpenClaw.Shared/ExecApprovals/ExecApprovalsStore.cs, src/OpenClaw.Shared/ExecApprovals/ExecApprovalEvaluator.cs, src/OpenClaw.Shared/ExecApprovals/ExecAllowlistMatcher.cs)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5a87c9921c31.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 23, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 security-boundary 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels May 23, 2026
@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented May 23, 2026

ClawSweeper PR egg

🔁 Wobbling: a re-review loop is active, so the shell is rattling.

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.
What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

…ce and lastUsed recording

End-to-end coordinator/store test using real filesystem I/O. Surfaces the
on-disk exec-approvals.json content at three points (initial, post-AllowAlways,
post-allowlist-hit) via ITestOutputHelper so the JSON is visible under
`dotnet test ... --logger "console;verbosity=detailed"`. Demonstrates both
side-effect paths: AllowAlways persists a new entry, and a later allowlist
hit records lastUsed* metadata against the same entry id (dedup).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@AlexAlves87
Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented May 23, 2026

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 🔁 re-review loop A fresh ClawSweeper review was explicitly requested after the latest review. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 23, 2026
@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented May 23, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

ResolveReadOnly merges entries from agents["*"] into the resolved
allowlist for any concrete agent, so a hit can be authorized by the
wildcard bucket. RecordAllowlistUseAsync was only searching the concrete
agent bucket, so wildcard-authorized executions never accumulated
lastUsed* metadata.

The method now iterates both the concrete agent bucket and "*", updating
metadata wherever the pattern matches. If agentId is already "*", only
the wildcard bucket is searched (no double-pass).

Tests added:
- Store: wildcard-only bucket hit updates metadata.
- Store: same pattern in both buckets updates both entries.
- Coordinator: end-to-end regression with allowlist living under "*".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@AlexAlves87
Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented May 23, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 security-boundary 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data. P2 Normal priority bug or improvement with limited blast radius. proof: sufficient Contributor real behavior proof is sufficient. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 🔁 re-review loop A fresh ClawSweeper review was explicitly requested after the latest review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant