feat(ops): add shadow CI system for dependency-aware test selection by smartcontracts · Pull Request #19440 · ethereum-optimism/optimism

smartcontracts · 2026-03-07T02:03:46Z

Summary

Adds a complete shadow CI system (ops/shadow-ci/) that runs alongside existing CI to prove equivalence before replacing it.

- Language adapters: Go (go list), Solidity (import parsing + remappings), and Rust (cargo metadata) build dependency graphs to compute affected targets from changed files
- Core engine: AffectedComputer, Planner, Executor (with retry + classification), Fingerprinter, and ComparisonEngine orchestrate dependency-aware test selection and measure the catch rate
- Platform adapter: CircleCI renderer (TestPlan → YAML) and results fetcher (artifacts → JUnit XML parsing)
- Event store: NDJSON-based unified event store with 20+ event types covering every decision point
- Aggregator: weekly reports (catch rate, false negatives, speedup, top flakes) and dashboard data
- Agents: FlakeInvestigator, GraphMaintainer, ConfigVerifier, ReportAnalyst for autonomous remediation
- 6 CLI tools: affected, planner, render, runner, compare, aggregate
- Config-driven activation: shadow → belt-and-suspenders → primary, controlled by scoping.yaml

Key design decisions

- Observability-first: every component emits structured events. Catch rate, false negatives, flake rate, skip rate, and compute reduction are all derived from events.
- Safety by default: starts in shadow mode (no gate on PRs). Belt-and-suspenders mode runs both systems and blocks if they disagree. Primary mode is enabled only after equivalence is proven.
- Self-healing: false negatives automatically trigger graph-gap detection and additions to the always-run list.
- 51 files, ~5,400 lines of Go with its own go.mod and zero dependencies on monorepo Go code.
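The three activation modes differ only in which result gates the PR. A sketch of that gating rule, assuming mainline CI no longer gates once shadow CI is primary; the `Mode` type and `shouldBlockPR` are hypothetical names:

```go
package main

import "fmt"

// Mode is a hypothetical encoding of the three activation phases.
type Mode int

const (
	Shadow Mode = iota
	BeltAndSuspenders
	Primary
)

// shouldBlockPR reports whether a PR is gated, given whether mainline CI
// and shadow CI passed. Assumption: in Primary mode only the shadow
// result matters.
func shouldBlockPR(mode Mode, mainlinePassed, shadowPassed bool) bool {
	switch mode {
	case Shadow:
		// Shadow results are recorded but never gate the PR.
		return !mainlinePassed
	case BeltAndSuspenders:
		// Both systems run; any failure blocks. A disagreement always means
		// one of the two failed, so it blocks too.
		return !mainlinePassed || !shadowPassed
	case Primary:
		return !shadowPassed
	default:
		// Unknown mode: fail closed.
		return true
	}
}

func main() {
	for _, m := range []Mode{Shadow, BeltAndSuspenders, Primary} {
		// Mainline passes, shadow fails: only shadow mode lets the PR through.
		fmt.Println(m, shouldBlockPR(m, true, false))
	}
}
```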

Test plan

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Complete implementation of the shadow CI system that runs alongside existing CI to prove equivalence before replacing it. The system computes affected targets via dependency graphs, runs only the tests that could be affected by a change, classifies failures as real/flake/infrastructure, and compares results against the main CI pipeline.

Architecture (5 layers):
- Layer 1: Language adapters (Go via go list, Solidity via import parsing, Rust via cargo metadata)
- Layer 2: Core engine (AffectedComputer, Planner, Executor, Classifier, Fingerprinter, ComparisonEngine)
- Layer 3: Platform adapter (CircleCI YAML rendering, result fetching)
- Layer 4: Data layer (NDJSON event store, aggregator, dashboard data)
- Layer 5: Agents (Flake Investigator, Graph Maintainer, Config Verifier, Report Analyst)

6 CLI entry points: affected, planner, render, runner, compare, aggregate
3 YAML config files controlling adapters, scoping, and platform
Full test suite covering BFS, classification, fingerprinting, comparison, event store, Solidity import parsing, and config loading

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

wiz-inc-a178a98b5d · 2026-03-07T02:04:22Z

Wiz Scan Summary

⚠️ Many findings detected

Many findings were detected, but only a subset of the findings are displayed inline due to API constraints.

Scanner	Findings
Vulnerabilities	-
Sensitive Data	-
Secrets	-
IaC Misconfigurations	-
SAST Findings	43 9
Software Management Findings	-

Total	43 9


- Add yamlSafe template function to escape YAML-special characters in rendered CircleCI config (defense-in-depth for text/template)
- Validate the --language and --config CLI flags, rejecting path separators
- Validate git ref arguments, rejecting leading dashes (flag injection)
- Add validatePathComponent to the local platform adapter to reject path traversal in pipeline IDs and artifact names
- Add tests for yamlSafe and validatePathComponent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
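The commit names `validatePathComponent` but not its rules. A minimal sketch of what such input checks might look like, assuming "safe" means non-empty, not `.` or `..`, and free of separators and NUL bytes; the function body and the companion `validateGitRef` are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validatePathComponent rejects values that could escape a base directory
// once joined into a filesystem path (pipeline IDs, artifact names).
func validatePathComponent(s string) error {
	switch {
	case s == "" || s == "." || s == "..":
		return fmt.Errorf("invalid path component %q", s)
	case strings.ContainsAny(s, `/\`):
		return fmt.Errorf("path component %q contains a path separator", s)
	case strings.ContainsRune(s, 0):
		return errors.New("path component contains a NUL byte")
	}
	return nil
}

// validateGitRef rejects refs that git would parse as a command-line option
// (e.g. "--upload-pack=..."), i.e. the flag-injection case.
func validateGitRef(ref string) error {
	if ref == "" {
		return errors.New("empty git ref")
	}
	if strings.HasPrefix(ref, "-") {
		return fmt.Errorf("git ref %q must not start with '-'", ref)
	}
	return nil
}

func main() {
	for _, id := range []string{"12345", "../../etc", "a/b"} {
		fmt.Printf("%-10s %v\n", id, validatePathComponent(id))
	}
	fmt.Println(validateGitRef("--output=/tmp/x"))
}
```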

Add pipeline decision engine that evaluates every CI job category:
- Go tests, fuzz, lint, generated mocks
- Solidity tests (6-feature matrix), upgrade tests, checks, heavy fuzz
- Cannon tests, acceptance tests
- Builds, publishes, docker, rust CI/e2e

The `affected` binary now produces a PipelineDecision alongside the AffectedResult. Each job category is evaluated as RUN/SKIP based on:
- Dependency graph analysis (Go, Sol, Rust)
- Path-based trigger matching
- Branch context (develop-only, schedule-only, tag-only)
- Force-all triggers (.circleci/, mise.toml)
- Fuzz package routing (fuzz only affected packages)

CircleCI integration via .circleci/continue/shadow-ci.yml:
- Merged with main.yml via path-filtering
- Shadow jobs reference main.yml jobs (contracts-bedrock-build, etc.)
- Every shadow job checks the decision file and halts if not needed
- Shadow mode: test failures are captured but don't gate PRs
- Comparison job aggregates results and compares against main CI

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
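A sketch of how one category might be evaluated against branch context, force-all triggers, and path triggers. It assumes branch-restricted types are checked first (following the schedule-only > tag-only > develop-only > always ordering listed in the test commit), so a force-all change cannot make a schedule-only job run on a PR. The `Context`/`Category` types and `decide` are hypothetical, and graph-based and fuzz routing are omitted:

```go
package main

import (
	"fmt"
	"strings"
)

// Context is a hypothetical subset of the pipeline context.
type Context struct {
	Branch     string
	IsTag      bool
	IsSchedule bool
	Changed    []string // changed file paths relative to the repo root
}

// Category is a hypothetical, simplified job category.
type Category struct {
	Type         string   // "schedule-only", "tag-only", "develop-only", "always", or "path"
	TriggerPaths []string // path prefixes, used when Type == "path"
}

// Force-all triggers named in the commit message.
var forceAllPaths = []string{".circleci/", "mise.toml"}

func anyPrefix(files, prefixes []string) bool {
	for _, f := range files {
		for _, p := range prefixes {
			if strings.HasPrefix(f, p) {
				return true
			}
		}
	}
	return false
}

func runIf(b bool) string {
	if b {
		return "RUN"
	}
	return "SKIP"
}

// decide evaluates one category; branch-restricted types win over force-all.
func decide(c Category, ctx Context) string {
	switch c.Type {
	case "schedule-only":
		return runIf(ctx.IsSchedule)
	case "tag-only":
		return runIf(ctx.IsTag)
	case "develop-only":
		return runIf(ctx.Branch == "develop")
	case "always":
		return "RUN"
	}
	if anyPrefix(ctx.Changed, forceAllPaths) {
		return "RUN"
	}
	return runIf(anyPrefix(ctx.Changed, c.TriggerPaths))
}

func main() {
	pr := Context{Branch: "feat/x", Changed: []string{"op-node/rollup/derive.go"}}
	fmt.Println(decide(Category{Type: "path", TriggerPaths: []string{"op-node/"}}, pr))
	fmt.Println(decide(Category{Type: "develop-only"}, pr))
}
```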

- Remove cross-workflow requires (contracts-bedrock-build and cannon-prestate from the main workflow can't be referenced); build artifacts inline instead
- Consolidate dual workspace roots into a single /tmp root
- Replace comma-containing matrix values with explicit job entries to avoid CircleCI parameter parsing issues
- Remove PyYAML import (not available on cimg/base); use a JSON-only fallback
- Add Rust binary builds (kona, op-rbuilder, rollup-boost) for acceptance tests via submodule checkout + cargo build

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add comprehensive tests for the pipeline decision engine covering:
- All category types: path-based, graph-based, fuzz, always, tag-only, schedule-only, develop-only, always-on-develop
- Feature matrix and config propagation through both path and graph paths
- Force-all behavior and its interaction with skip-type categories
- Priority ordering (schedule-only > tag-only > develop-only > always > ...)
- matchPaths prefix, suffix, and glob-prefix matching
- mergeTargets deduplication and empty-set edge cases
- resolveAlwaysRun exact and prefix matching
- applyConfidence threshold promotion
- checkForceAllPaths
- Comparison engine: partial catch rate, flake dedup, performance metrics
- Full realistic pipeline integration test

Engine coverage: 44.1% -> 60.7%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
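The commit lists prefix, suffix, and glob-prefix matching for `matchPaths` without defining the pattern syntax. A sketch under the assumption that `dir/` means prefix, `*.ext` means suffix, and `dir/**` means glob-prefix; the real matchPaths may use different forms:

```go
package main

import (
	"fmt"
	"strings"
)

// matchPath reports whether file matches pattern under assumed rules:
//
//	"dir/"    prefix:      anything under dir/
//	"*.ext"   suffix:      any file ending in .ext
//	"dir/**"  glob-prefix: same as "dir/", written glob-style
//	other     exact match
func matchPath(pattern, file string) bool {
	switch {
	case strings.HasSuffix(pattern, "/**"):
		return strings.HasPrefix(file, strings.TrimSuffix(pattern, "**"))
	case strings.HasSuffix(pattern, "/"):
		return strings.HasPrefix(file, pattern)
	case strings.HasPrefix(pattern, "*"):
		return strings.HasSuffix(file, strings.TrimPrefix(pattern, "*"))
	default:
		return file == pattern
	}
}

// matchPaths reports whether any changed file matches any pattern.
func matchPaths(patterns, files []string) bool {
	for _, f := range files {
		for _, p := range patterns {
			if matchPath(p, f) {
				return true
			}
		}
	}
	return false
}

func main() {
	changed := []string{"README.md", "packages/contracts-bedrock/src/L1/Example.sol"}
	fmt.Println(matchPaths([]string{"*.sol"}, changed))
	fmt.Println(matchPaths([]string{"cannon/**"}, changed))
}
```

Keeping the trailing `/` in the prefix comparison avoids matching sibling directories such as `cannonball/`.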

Go tests, cannon tests, fuzz, and acceptance tests only need src/ and scripts/ artifacts — not test contracts. Switch from `just forge-build` (compiles everything) to `just build-no-tests` (skips **/test/**) to cut contract build time in shadow CI jobs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds a `validate` command that checks CircleCI pipeline YAML for the class of bugs we just fixed:
- Dual workspace roots (persist_to_workspace/attach_workspace at multiple paths in the same job)
- Cross-workflow requires (referencing jobs from another workflow)
- Commas in matrix parameter values
- Non-stdlib Python imports (yaml, requests) in inline scripts
- Requires referencing non-existent job names (with matrix expansion)

The validator runs as the first step in shadow-ci-setup, before computing affected targets. Includes an integration test that validates the actual shadow-ci.yml.

Usage: `ops/shadow-ci/bin/validate --pipeline .circleci/continue/shadow-ci.yml --strict`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

codecov · 2026-03-07T03:10:48Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.3%. Comparing base (c14cd1e) to head (ff282b1).
⚠️ Report is 30 commits behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop   #19440      +/-   ##
===========================================
+ Coverage     75.3%    76.3%    +1.0%     
===========================================
  Files          193      729     +536     
  Lines        11256    81441   +70185     
===========================================
+ Hits          8476    62196   +53720     
- Misses        2636    19101   +16465     
  Partials       144      144

Flag	Coverage Δ
cannon-go-tests-64	`66.4% <ø> (ø)`
contracts-bedrock-tests	`80.2% <ø> (ø)`
unit	`76.5% <ø> (?)`

Flags with carried forward coverage won't be shown.
see 536 files with indirect coverage changes


Implement the full adaptive test placement system:
- 4-stage CI model (PR, merge queue, post-merge, nightly) with automatic stage detection from branch patterns
- Stage-aware decision engine that filters categories by placement, with pinned constraints that cannot be overridden
- Dynamic pipeline generation via the CircleCI continuation API, replacing the 834-line static shadow-ci.yml with a ~80-line setup shell
- Flake lifecycle management (healthy → suspected → quarantined → shaking → diagnosed → fixed/accepted) with severity escalation
- Flake reactor for hot-path detection and auto-recovery
- Co-failure correlation engine with Wilson score confidence intervals
- Placement optimizer (cold path) with 4 optimization rules: merge queue flake demotion, correlation-based deferral, slow-test nightly promotion, false negative feedback promotion
- False negative feedback loop via correlation decay detection
- Auto-revert system for post-merge failures with flake filtering
- New CLI tools: flake-reactor, optimize, auto-revert

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Introduce a state.Store interface for persisting blobs (flake DB, correlation matrix) across CI pipeline runs. Two backends:
- LocalStore: filesystem (dev/testing)
- CircleCIStore: fetches state from the previous pipeline's artifacts via the CircleCI API, saves locally for upload via store_artifacts

The CircleCI flow:
1. Pipeline starts; CircleCIStore.Load() fetches flake-db.json from the last successful pipeline's artifacts
2. Flake reactor and decision engine work with the loaded state
3. Updated state is saved to the artifacts dir
4. store_artifacts step uploads it for the next run

This replaces the assumption that /tmp persists across pipelines. FlakeDB gains LoadFlakeDBFromStore/SaveToStore methods. All CLIs fall back to direct file paths (--flake-db) when no state store is configured.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The RenderFromDecision method was generating flat test jobs without build prerequisites. When test categories (go_tests, acceptance_tests) depend on build categories (contracts_build, cannon_prestate), the build jobs must exist in the rendered YAML and the test jobs need requires clauses pointing to them.

Changes:
- Add DependsOn, Command, WorkspacePaths, RunnerClass to JobCategoryConfig
- Wire depends_on in scoping.yaml for all test→build relationships
- RenderFromDecision now resolves the dependency graph: it auto-includes build prerequisite jobs, renders them with persist_to_workspace, and wires requires clauses on dependent test jobs
- Separate build and test job templates (build: checkout+build+persist; test: checkout+attach+runner+artifacts)
- Add transitive dependency resolution (e.g. cannon_prestate→contracts_build)
- Add render tests, including a pipeline validator integration test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The circleci state backend requires CIRCLE_TOKEN which may not be set in all CI environments. Fall back to local filesystem store with a warning instead of hard-failing. State won't persist across pipelines but everything else works. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ence Upstash free tier (10K commands/day, 256MB) provides a zero-cost key-value store for shadow CI state. REST API only — no client library. Set UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN in CircleCI project env vars. Falls back to local store if not set. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

shadow-ci.yml is already a continued pipeline (via path-filtering in config.yml). CircleCI only supports one level of setup→continuation, so CIRCLE_CONTINUATION_KEY is not available here. Replace with a self-contained setup job that computes the pipeline decision, prints the summary, and stores artifacts. This proves the engine works end-to-end in CI. Dynamic job execution will use a different approach (runtime skip or pre-computed config). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- shadow-ci-setup: compute decision, persist to workspace
- shadow-ci-verify: validate decision coherence against mainline CI (checks all categories, stage filtering, develop-only, schedule-only)
- shadow-ci-tests: run the shadow CI Go test suite in CI

The coherence checker compares the decision engine's output against a ground-truth mapping of mainline CI jobs, ensuring no categories are missing or incorrectly classified.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…lidation The coherence checker now validates all stages bidirectionally:
- PR jobs verified at ALL stages (not just PR/MQ)
- Schedule-only jobs verified to RUN on schedule (not just skip off-schedule)
- Develop-only jobs validated on merge queue (not just PR)
- Deferred jobs flagged as errors if still deferred at post_merge/nightly

Verified locally with 0 errors across all 4 stages:
- PR: 20 run, 9 skip
- Merge Queue: 20 run, 9 skip
- Post-Merge: 24 run, 5 skip
- Nightly: 28 run, 1 skip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Shadow CI now actually runs tests instead of only computing decisions.

Architecture:
- `cmd/execute`: reads the decision + scoping config and runs all needed categories for a given group (build/go/sol/rust/misc). One binary dispatches all test execution dynamically; there is no static job-per-category in YAML.
- `cmd/generate-ci`: renders shadow-ci.yml from scoping.yaml. Users configure shadow CI in its own language, and this tool generates the CircleCI config. The shadow-ci-tests job checks for staleness: if you change scoping.yaml and forget to regenerate, CI fails with instructions.
- `group` field on job categories: determines which CI executor runs it. Adding a new category to a group takes one line in scoping.yaml.
- `command` field on all executable categories: the actual test command.

CI pipeline (8 jobs):
shadow-ci-setup → shadow-ci-verify (coherence check) → shadow-ci-build → shadow-ci-go (Go tests, lint, fuzz) → shadow-ci-sol (Sol tests, checks) → shadow-ci-rust (Rust CI) → shadow-ci-misc (shellcheck, semgrep, etc.)
shadow-ci-tests (self-tests + staleness check, independent)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Set SHELL=/bin/bash before foundryup install (fixes "could not detect shell")
- Add || true to the foundry curl to handle a non-zero exit on the PATH message
- Install semgrep via pip3 in the misc group

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace manual curl/install toolchain setup with mise, matching how mainline CI works. mise.toml already defines all tool versions (Go, Rust, Foundry, golangci-lint, gotestsum, shellcheck, semgrep, just). Also fix cannon_prestate command (cannon-prestates, plural). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mise install with all tools hits GitHub API rate limits. Each group now installs only the specific tools it needs:
- build: go, rust, forge, cast, anvil, just, make
- go: go, gotestsum, golangci-lint, make
- sol: forge, cast, anvil, just
- rust: rust
- misc: shellcheck, python, uv, semgrep

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

reth-mdbx-sys uses bindgen which needs libclang. Install libclang-dev in build and rust groups. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mise install hits GitHub API rate limits when multiple CI jobs run in parallel. The misc group only needs shellcheck and semgrep, both available via apt/pip without touching GitHub API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Persist build artifacts (forge-artifacts, cannon bins, rust binaries) from the build group to the workspace for downstream groups
- Fix rust_ci command to use the justfile (install-nightly, lint, test)
- Add just to the rust group and mockery to the go group
- Downstream groups (go, sol, rust) restore build artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…-nextest for rust
- Add forge to the go group's mise install (needed by cannon_tests, which runs `forge build` via make)
- Add the circleci-repo-readonly-authenticated-github-token context to all group jobs (go tests need RPC access for op-validator tests)
- Add cargo-binstall + cargo-nextest to the rust group (the rust justfile uses `cargo nextest run`)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

rust_build and rust_submodule_build both trigger rustup to install the toolchain on first cargo invocation. Running them in parallel causes filesystem conflicts in ~/.rustup/toolchains/. Fix by making rust_submodule_build depend on rust_build so they run sequentially. The parallel DAG executor still runs contracts_build and rust_build concurrently — the critical path is now max(contracts+cannon, rust+submodule) ≈ max(3.5m, 13m) ≈ 13m vs 17m sequential. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
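The wall-clock estimate above is the DAG's critical path: the longest duration-weighted chain of dependent steps. A sketch of that computation, assuming a cycle-free graph and unlimited parallelism; `criticalPath` is a hypothetical helper, and the example uses only the two aggregate chain durations quoted in the commit:

```go
package main

import "fmt"

// criticalPath returns the length of the longest duration-weighted path
// through an acyclic dependency graph, i.e. the best-case wall-clock time
// when independent steps run in parallel.
func criticalPath(dur map[string]float64, deps map[string][]string) float64 {
	memo := map[string]float64{}
	var finish func(string) float64
	finish = func(n string) float64 {
		if v, ok := memo[n]; ok {
			return v
		}
		start := 0.0 // a step starts when its slowest dependency finishes
		for _, d := range deps[n] {
			if f := finish(d); f > start {
				start = f
			}
		}
		memo[n] = start + dur[n]
		return memo[n]
	}
	best := 0.0
	for n := range dur {
		if f := finish(n); f > best {
			best = f
		}
	}
	return best
}

func main() {
	// Aggregate chain durations (minutes) quoted in the commit message.
	dur := map[string]float64{"contracts+cannon": 3.5, "rust+submodule": 13}
	fmt.Println("parallel:", criticalPath(dur, nil))
	fmt.Println("sequential:", dur["contracts+cannon"]+dur["rust+submodule"])
}
```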

…workspace Switch from persisting mise tools via workspace (1.6GB upload every run, 5+ minutes) to CircleCI save_cache/restore_cache keyed on the mise.toml checksum. Tools are pre-installed as flat binaries in the setup job and cached permanently. Group jobs restore the cache in seconds.
- Setup job: installs mise, builds flat tool binaries to /tmp/shadow-ci-tools/
- Go/Sol groups: restore cache + PATH export (one-liner, ~5s)
- Build/Rust groups: restore cache + install Rust via mise directly
- Workspace now only carries lightweight artifacts (~50MB vs 1.6GB)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The mise shims in ~/.local/share/mise/shims/ are wrapper scripts, not the actual binaries. Copying them to /tmp/shadow-ci-tools/ caused "Argument list too long" errors when they tried to exec. Use `mise which` to resolve the real underlying binary path before copying. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The v1 cache contained mise shims instead of real binaries. Bump to v2 to force a cache miss and rebuild with the corrected mise which logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Flat binary copying doesn't work for Go (needs GOROOT/SDK tree) or mise-managed tools. Switch to caching ~/.local/share/mise/installs and ~/.local/bin (mise binary). Group jobs restore the cache, install mise if needed, and run mise reshim to regenerate working shims. Skip mise download on cache hit (mise binary cached in ~/.local/bin). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add a content-addressed build cache that guarantees cached artifacts are never stale. Each build category declares cache_inputs (for key computation), workspace_paths (for caching), and verify_command (for post-restore validation). The executor computes cache keys from git tree hashes of the declared inputs plus the mise.toml checksum, restores on a hit, then always runs verify_command to confirm validity. On verify failure, it rebuilds and emits a warning event.
- New pkg/cache with Resolver (ComputeKey, Resolve, Restore, Save)
- Executor integration with --cache-dir flag and fail-open semantics
- CircleCI cache for /tmp/shadow-ci-cache across pipeline runs
- 5 build categories configured with cache_inputs and verify_command
- 12 tests covering key computation, cache hit/miss, verify, round-trip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
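Content addressing means the key is a hash of exactly the declared inputs, hashed in a fixed order. A sketch of such a key function, assuming the caller has already collected one git object ID per input path; `computeKey` and the key format are hypothetical and not the signature of the PR's `Resolver.ComputeKey`:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// computeKey derives a content-addressed cache key for one build category.
// inputHashes maps each declared cache input path to its git object ID;
// miseChecksum pins the toolchain versions.
func computeKey(category string, inputHashes map[string]string, miseChecksum string) string {
	paths := make([]string, 0, len(inputHashes))
	for p := range inputHashes {
		paths = append(paths, p)
	}
	sort.Strings(paths) // map order is random; hash inputs in a fixed order

	h := sha256.New()
	fmt.Fprintf(h, "category=%s\n", category)
	for _, p := range paths {
		fmt.Fprintf(h, "input %s=%s\n", p, inputHashes[p])
	}
	fmt.Fprintf(h, "mise.toml=%s\n", miseChecksum)
	return category + "-" + hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Placeholder object IDs; a real caller might obtain them with
	// `git rev-parse HEAD:<path>`.
	inputs := map[string]string{
		"cannon":     "placeholder-tree-id-1",
		"op-program": "placeholder-tree-id-2",
	}
	fmt.Println(computeKey("cannon_prestate", inputs, "placeholder-mise-sum"))
}
```

Because the key changes whenever any declared input or the toolchain changes, a hit implies the inputs match; what it cannot catch is an input that was never declared, which is why post-restore verification still matters.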

- Remove packages/contracts-bedrock/remappings.txt (doesn't exist)
- Remove Cargo.lock from rust_build (it's at rust/Cargo.lock, already covered by the rust/ directory input)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The verify commands for contracts_build (forge build --sizes, 20min) and rust_build (cargo check --release, 3min) were too expensive, defeating the purpose of caching. Switch to simple existence checks — the content-addressed hash already guarantees input correctness. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Verify commands check for artifacts in the repo (e.g. test -f cannon/bin/cannon), but those artifacts only exist after cache restore. The previous flow ran verify before restore, so verify always failed on fresh checkouts. New flow: check key → restore → verify → use or rebuild. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Rewrite to reflect the group-based execution model (generate-ci + execute) instead of the original per-target model (planner → render → runner). Document the build cache, CI pipeline topology, activation phases, and explain why the architecture changed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Honest accounting of what works end-to-end vs what's computed but unused. Prioritized list of gaps to close, with the activation sequence from shadow mode through to primary. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When a category has use_graph: true and the decision contains affected targets, the executor now uses target_command instead of command. This runs only the affected packages/files instead of the full test suite.
- Add target_command field to JobCategoryConfig with generic placeholders: {{targets}} (space-separated), {{targets_csv}} (comma-separated), {{targets_glob}} ({a,b})
- Add resolveCommand(): language-agnostic, with all formatting in config
- Falls back to the full command on force-all, always-on-develop, or empty targets
- Configure target_command for go_tests and sol_tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
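Placeholder substitution plus the fallback rules fit in a few lines with `strings.NewReplacer`. A sketch of a `resolveCommand`-style helper; the signature is hypothetical, the always-on-develop case is folded into the `forceAll` flag, and wrapping a single glob target without braces is an assumption (a one-element `{a}` is not expanded by bash):

```go
package main

import (
	"fmt"
	"strings"
)

// resolveCommand returns targetCommand with target placeholders filled in,
// or the full command when targeting does not apply.
func resolveCommand(command, targetCommand string, targets []string, forceAll bool) string {
	if targetCommand == "" || forceAll || len(targets) == 0 {
		return command // fall back to the full suite
	}
	glob := targets[0]
	if len(targets) > 1 {
		glob = "{" + strings.Join(targets, ",") + "}"
	}
	r := strings.NewReplacer(
		"{{targets}}", strings.Join(targets, " "),
		"{{targets_csv}}", strings.Join(targets, ","),
		"{{targets_glob}}", glob,
	)
	return r.Replace(targetCommand)
}

func main() {
	targets := []string{"./op-node/...", "./op-service/..."}
	fmt.Println(resolveCommand("make go-tests-short-ci", "go test {{targets}}", targets, false))
	fmt.Println(resolveCommand("make go-tests-short-ci", "go test {{targets}}", targets, true))
}
```

Keeping all formatting in the placeholder strings is what lets one function serve Go packages and Solidity file globs alike.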

Three fixes:
- rust_submodule_build: deinit before init to handle existing directories
- go_binaries_for_sysgo: remove expensive go build from verify command
- save_cache/stage artifacts: run with when: always so cache updates survive individual category failures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sol_tests and sol_upgrade both run `just build-go-ffi` which does git submodule init. Running them in parallel causes lock file conflicts. Make sol_upgrade and sol_coverage wait for sol_tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The git diff at the end of generated_mocks picks up submodule pointer drift that's unrelated to Go mock generation. Use --ignore-submodules to only check Go file changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…rift
- Add debug output when cache verify fails, showing workspace path state
- Fix generated_mocks to ignore submodule drift (--ignore-submodules)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add debug output for:
- Cache directory contents after restore (CircleCI step)
- Workspace path state after Restore() call (executor)
- Verify command output on failure

Temporary commit for debugging cannon_prestate cache staleness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cannon_prestate and go_binaries_for_sysgo both declared op-program/bin and cannon/bin as workspace_paths. Their parallel Restore operations raced — one's os.RemoveAll deleted the other's just-restored files, causing verify to always fail ("CACHE STALE"). Fix: remove overlapping workspace_paths from go_binaries_for_sysgo (keeping only .devnet) and add cannon_prestate to its depends_on. Also remove the debug CI step that was added for investigation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove verify_command from config: the framework now auto-generates verification by checking that each workspace_path exists after restore. This eliminates a class of bugs where verify_command drifts from the actual cached paths (e.g., rust_submodule_build checking a path the command never creates).

Changes:
- Delete VerifyCommand field from JobCategoryConfig struct
- Delete Verified field from Resolution struct (dead code)
- Add cache.Verify() function that checks workspace_paths exist
- Add validateConfig() in LoadConfig to detect workspace_path overlaps between categories in the same group at load time
- Remove redundant cache_inputs from rust_build and rust_submodule_build
- Update README.md Build Cache section
- Update tests: new Verify tests, overlap validation tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…kage The executor's DAG scheduling, cache resolve/verify/rebuild, and parallel execution logic was embedded in cmd/execute/main.go with no seams for testing. Every bug required a full CI push-and-wait cycle to diagnose. Extract orchestration into pkg/executor with a Runner interface for command execution and CacheResolver interface for cache operations. The cmd/execute binary becomes a thin CLI wrapper. 19 local tests now cover the full flow: DAG ordering, diamond deps, parallel execution, cache hit/miss/stale/restore failure, dry run, targeted commands — all running in <100ms. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Three issues found from CI output:
1. rust_build listed trigger_path "Cargo.lock", but no Cargo.lock exists at the repo root (it's rust/Cargo.lock). The "rust/" trigger_path already covers it, so the non-existent path was removed.
2. go_binaries_for_sysgo had workspace_paths: [".devnet"], but `make op-program cannon` produces op-program/bin and cannon/bin, not .devnet. Since cannon_prestate already caches those exact paths and go_binaries_for_sysgo depends on it, workspace_paths was removed entirely (no separate caching needed).
3. The "Stage build artifacts" step in shadow-ci.yml had a hardcoded path list that included .devnet and could drift from config. It is now auto-derived from the workspace_paths of build-group categories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cannon_prestate already runs `make cannon-prestates` which builds op-program/bin and cannon/bin. go_binaries_for_sysgo depended on cannon_prestate then ran `make op-program cannon` — a 3.5min no-op since those binaries already existed. Removed the category and updated acceptance_tests to depend directly on cannon_prestate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Follow-up to removing the redundant category — the coherence checker still expected it in its mainline mapping. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…oration Replace the monolithic workspace-based build artifact passing with per-category CircleCI caches. Each build category (contracts_build, cannon_prestate, rust_build, rust_submodule_build) gets its own cache key, so unchanged categories don't need re-uploading. Downstream groups (go, sol, rust) restore build artifacts from per-category caches via the executor's restoreCrossGroupDeps() method, eliminating the 7-minute persist_to_workspace step.

Key changes:
- Add restoreCrossGroupDeps() to executor for cross-group cache restore
- Auto-derive BuildWorkspacePaths and BuildCategories from config
- Generate per-category save_cache/restore_cache in CI template
- Remove workspace-based build artifact staging and restoration
- Add tests for cross-group dependency restoration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…line Categories with a `language` field (go_tests, sol_tests) now dispatch to the adapter-based TestExecutor instead of shelling out to opaque commands. This produces per-test results with status, duration, and flake classification. Categories without `language` (builds, lints) continue using ShellRunner unchanged.

Key changes:
- Runner interface takes RunContext instead of (category, command, logPath)
- New AdapterRunner wraps engine.TestExecutor for adapter dispatch
- Executor dispatches based on language + !isFuzzCategory guard
- cmd/execute builds adapter registry and events emitter
- cmd/compare handles both JobResult and []TestResult JSON formats
- cmd/runner deleted (functionality absorbed into cmd/execute)
- render.go templates updated to use bin/execute instead of bin/runner
- 7 new tests for dispatch logic and PlannedJob construction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Implements the 8-phase adaptive test placement plan:

Phase 1 (Stats Aggregation): StatsAggregator computes per-test and per-category stats (duration, flake rate, failure rate, false negatives) from the event store. Wired into cmd/optimize to replace nil stubs.

Phase 2 (Per-Test Placement): TestPlacer uses a marginal coverage algorithm to assign tests to stages within miss rate budgets (PR 5%, MQ 0.1%, post-merge 0.01%, nightly 0%). Shadow mode is the default: it records WouldDefer without actually skipping. Integrated into DecisionEngine via a new testPlacer parameter.

Phase 3 (Test Filtering): Executor builds TestFilter from per-test placements, which flows through RunContext → AdapterRunner → PlannedJob → RunOptions.TestFilter. Shadow deferral annotations added to TestResult.

Phase 4 (Shadow Deferral Reporting): ShadowDeferralAnalyzer produces deferral reports tracking would-have-deferred tests, estimated savings, and actual misses (real failures that would have been missed).

Phase 5 (Demand-Driven Builds): BuildResolver resolves test selections to required build categories via DependsOn chains. Unrequired build categories are automatically skipped.

Phase 6 (LLM Placement Advisor): LLMAdvisor provides per-PR override suggestions via the Anthropic API. Disabled by default. LLM overrides cannot override pinned constraints.

Phase 7 (Auto-Revert Notification): Notifier interface with SlackNotifier and LogNotifier implementations. Wired into AutoReverter.

Phase 8 (Historical Import): cmd/import-history pulls historical test results from the CircleCI API. Parsers for gotestsum JSON, JUnit XML, and forge JSON output.

New test files: 30+ tests across stats_aggregator, test_placer, shadow_deferral, build_resolver, llm_advisor, and circleci parser.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Three fixes for CI failures:
1. adapters.yaml had a spurious top-level `adapters:` wrapper key that caused YAML unmarshaling to produce nil adapter configs. Remove the wrapper so fields parse correctly into AdaptersConfig.
2. Executor dispatched to the adapter runner for all categories with a language, even when the category has a shell command. The adapter runner bypasses the shell command and runs gotestsum directly, which is wrong for categories like go_tests that use `make go-tests-short-ci`. It now dispatches to the adapter only when the registry has the language AND the category has no shell command.
3. The rust group's cargo-nextest install failed because cargo-binstall fell back to source compilation without --locked. Add --no-symlinks flag and --locked fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The setup job's "Install tools" step only wrote the mise PATH to $BASH_ENV on cache miss (exited early on hit). This meant the "Compute pipeline decision" step couldn't find `go` in $PATH when running `go list` for the dependency graph. Move the $BASH_ENV setup before the cache check so it persists regardless of hit/miss. On cache hit, reshim to regenerate tool shims. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The affected command failed hard when cargo wasn't in PATH because the Rust adapter's graph builder calls `cargo metadata`. The setup job only installs Go toolchain, not Rust. Instead of failing the entire pipeline, log a warning and skip graph-based analysis for that language. Categories fall back to path-based matching via the decision engine. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

smartcontracts marked this pull request as ready for review March 7, 2026 02:05

smartcontracts requested a review from a team as a code owner March 7, 2026 02:05

smartcontracts marked this pull request as draft March 7, 2026 02:06

smartcontracts and others added 6 commits March 7, 2026 02:08

smartcontracts and others added 18 commits March 7, 2026 04:12

fix(ops): wire shadow-ci-test-context to setup job for Upstash creds

92fee6e

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chore(ops): retrigger CI after fixing Upstash context env var values

da311af

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(shadow-ci): add libclang-dev for Rust builds requiring bindgen

8e5ec13


smartcontracts and others added 12 commits March 9, 2026 20:19

fix(shadow-ci): bump cache key to v2 to invalidate stale shim cache

5759651


smartcontracts force-pushed the feat/shadow-ci branch from 4e3e4af to d353d35 Compare March 10, 2026 15:44

smartcontracts and others added 17 commits March 10, 2026 15:50

fix(shadow-ci): add debug logging for cache stale and fix submodule d…

d9b5bab


fix(shadow-ci): remove go_binaries_for_sysgo from coherence map

a82b129


smartcontracts wants to merge 64 commits into develop from feat/shadow-ci
