feat(ops): add shadow CI system for dependency-aware test selection #19440

Draft
smartcontracts wants to merge 64 commits into develop from feat/shadow-ci
@smartcontracts

Summary

Adds a complete shadow CI system (ops/shadow-ci/) that runs alongside existing CI to prove equivalence before replacing it.

  • Language adapters: Go (go list), Solidity (import parsing + remappings), Rust (cargo metadata) — build dependency graphs to compute affected targets from changed files
  • Core engine: AffectedComputer, Planner, Executor (with retry + classification), Fingerprinter, ComparisonEngine — orchestrates dependency-aware test selection and proves catch rate
  • Platform adapter: CircleCI renderer (TestPlan → YAML), results fetcher (artifacts → junit XML parsing)
  • Event store: NDJSON-based unified event store with 20+ event types covering every decision point
  • Aggregator: Weekly reports (catch rate, false negatives, speedup, top flakes) and dashboard data
  • Agents: FlakeInvestigator, GraphMaintainer, ConfigVerifier, ReportAnalyst — autonomous remediation
  • 6 CLI tools: affected, planner, render, runner, compare, aggregate
  • Config-driven activation: shadow → belt-and-suspenders → primary, controlled by scoping.yaml
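
The affected-target computation described above can be pictured as a breadth-first walk over reversed dependency edges. The following is a minimal sketch, not the PR's actual `AffectedComputer`: the `Graph` type and the `Affected` function are hypothetical names, and the graph is assumed to be already built by a language adapter.

```go
package main

import (
	"fmt"
	"sort"
)

// Graph maps a target to the targets it depends on.
// Hypothetical shape; the real adapters may store richer data.
type Graph map[string][]string

// Affected returns the changed targets plus every target that
// transitively depends on one of them, in sorted order.
func Affected(deps Graph, changed []string) []string {
	// Invert the edges: dependency -> direct dependents.
	rev := make(map[string][]string)
	for target, ds := range deps {
		for _, d := range ds {
			rev[d] = append(rev[d], target)
		}
	}

	seen := make(map[string]bool)
	queue := append([]string(nil), changed...)
	for _, c := range changed {
		seen[c] = true
	}

	// Standard BFS over the reversed graph.
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, dependent := range rev[cur] {
			if !seen[dependent] {
				seen[dependent] = true
				queue = append(queue, dependent)
			}
		}
	}

	out := make([]string, 0, len(seen))
	for t := range seen {
		out = append(out, t)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Diamond: app depends on left and right, both depend on base.
	// "lonely" is isolated and should never be pulled in.
	g := Graph{
		"app":    {"left", "right"},
		"left":   {"base"},
		"right":  {"base"},
		"lonely": nil,
	}
	fmt.Println(Affected(g, []string{"base"}))
}
```

Inverting the edges once keeps each query linear in the size of the graph, which matters when the walk runs on every PR.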

Key design decisions

  • Observability-first: Every component emits structured events. Catch rate, false negatives, flake rate, skip rate, compute reduction — all derived from events.
  • Safety by default: Starts in shadow mode (no gate on PRs). Belt-and-suspenders mode runs both systems and blocks if they disagree. Primary mode only after proven equivalence.
  • Self-healing: False negatives automatically trigger graph gap detection and always-run list additions.
  • 51 files, ~5,400 lines of Go with own go.mod — zero dependencies on monorepo Go code.

Test plan

  • go test ./... — 24 tests pass across 7 packages
  • go vet ./... — clean
  • go build ./... — clean
  • BFS graph traversal tests (diamond, linear, isolated)
  • Classifier tests (all classification combinations)
  • Fingerprinter stability tests (timestamps, IPs, ports normalize correctly)
  • Comparison engine tests (catch rate, false negatives, flake detection)
  • Event store tests (emit/query, type filtering, time filtering, persistence)
  • Solidity import parser tests (remappings, file collection)
  • Config loader tests (full YAML round-trip)
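
For readers unfamiliar with failure fingerprinting, here is a rough sketch of what "timestamps, IPs, ports normalize correctly" could mean in practice. The regular expressions, placeholder strings, and the `Fingerprint` function are assumptions for illustration; the PR's `Fingerprinter` may normalize different things.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
)

var (
	// ISO-like timestamps such as 2026-03-07T02:05:00Z (assumed format).
	reTimestamp = regexp.MustCompile(`\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?`)
	// IPv4 addresses with an optional port.
	reAddr = regexp.MustCompile(`\b\d{1,3}(\.\d{1,3}){3}(:\d+)?\b`)
	// Ports attached to localhost.
	reLocalPort = regexp.MustCompile(`\blocalhost:\d+\b`)
)

// Normalize replaces volatile fragments with stable placeholders.
func Normalize(msg string) string {
	msg = reTimestamp.ReplaceAllString(msg, "<TS>")
	msg = reAddr.ReplaceAllString(msg, "<ADDR>")
	msg = reLocalPort.ReplaceAllString(msg, "localhost:<PORT>")
	return msg
}

// Fingerprint hashes the normalized message so that two failures which
// differ only in volatile details map to the same short identifier.
func Fingerprint(msg string) string {
	sum := sha256.Sum256([]byte(Normalize(msg)))
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	a := "dial tcp 127.0.0.1:8545: connection refused at 2026-03-07T02:05:00Z"
	b := "dial tcp 10.0.0.2:9000: connection refused at 2026-03-09T20:19:31Z"
	fmt.Println(Normalize(a))
	fmt.Println(Fingerprint(a) == Fingerprint(b))
}
```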

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Complete implementation of the shadow CI system that runs alongside
existing CI to prove equivalence before replacing it. The system
computes affected targets via dependency graphs, runs only the tests
that could be affected by a change, classifies failures as real/flake/
infrastructure, and compares results against the main CI pipeline.

Architecture (5 layers):
- Layer 1: Language adapters (Go via go list, Solidity via import
  parsing, Rust via cargo metadata)
- Layer 2: Core engine (AffectedComputer, Planner, Executor,
  Classifier, Fingerprinter, ComparisonEngine)
- Layer 3: Platform adapter (CircleCI YAML rendering, result fetching)
- Layer 4: Data layer (NDJSON event store, aggregator, dashboard data)
- Layer 5: Agents (Flake Investigator, Graph Maintainer, Config
  Verifier, Report Analyst)

6 CLI entry points: affected, planner, render, runner, compare, aggregate
3 YAML config files controlling adapters, scoping, and platform
Full test suite covering BFS, classification, fingerprinting, comparison,
event store, Solidity import parsing, and config loading

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wiz-inc-a178a98b5d

wiz-inc-a178a98b5d bot commented Mar 7, 2026

Wiz Scan Summary

⚠️ Many findings detected
Many findings were detected, but only a subset of the findings are displayed inline due to API constraints.
| Scanner | Findings |
|---|---|
| Vulnerabilities | - |
| Sensitive Data | - |
| Secrets | - |
| IaC Misconfigurations | - |
| SAST Findings | 43 Medium, 9 Low |
| Software Management Findings | - |
| Total | 43 Medium, 9 Low |


@smartcontracts smartcontracts marked this pull request as ready for review March 7, 2026 02:05
@smartcontracts smartcontracts requested a review from a team as a code owner March 7, 2026 02:05
@smartcontracts smartcontracts marked this pull request as draft March 7, 2026 02:06
smartcontracts and others added 6 commits March 7, 2026 02:08
- Add yamlSafe template function to escape YAML-special characters in
  rendered CircleCI config (defense-in-depth for text/template)
- Validate --language and --config CLI flags reject path separators
- Validate git ref arguments reject leading dashes (flag injection)
- Add validatePathComponent to local platform adapter to reject
  path traversal in pipeline IDs and artifact names
- Add tests for yamlSafe and validatePathComponent
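
As an illustration of the input checks listed above, the sketch below shows one plausible shape for them. `validatePathComponent` is named in this commit, but its body here is a guess; `validateGitRef` is a hypothetical name for the leading-dash check.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validatePathComponent rejects values that could escape a base directory
// when joined into a path (assumed rules, not the PR's exact ones).
func validatePathComponent(s string) error {
	if s == "" || s == "." || s == ".." {
		return errors.New("path component is empty or a dot segment")
	}
	if strings.ContainsAny(s, `/\`) || strings.ContainsRune(s, 0) {
		return fmt.Errorf("path component %q contains a separator or NUL", s)
	}
	return nil
}

// validateGitRef rejects refs that git would parse as a command-line flag.
func validateGitRef(ref string) error {
	if ref == "" {
		return errors.New("git ref is empty")
	}
	if strings.HasPrefix(ref, "-") {
		return fmt.Errorf("git ref %q starts with a dash", ref)
	}
	return nil
}

func main() {
	for _, c := range []string{"pipeline-123", "../etc", "a/b"} {
		fmt.Printf("%-14q %v\n", c, validatePathComponent(c))
	}
	for _, r := range []string{"origin/develop", "--upload-pack=evil"} {
		fmt.Printf("%-22q %v\n", r, validateGitRef(r))
	}
}
```

Rejecting separators outright is stricter than cleaning the path and checking the prefix afterwards, but it is easier to reason about for identifiers such as pipeline IDs that should never contain them.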

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add pipeline decision engine that evaluates every CI job category:
- Go tests, fuzz, lint, generated mocks
- Solidity tests (6-feature matrix), upgrade tests, checks, heavy fuzz
- Cannon tests, acceptance tests
- Builds, publishes, docker, rust CI/e2e

The `affected` binary now produces a PipelineDecision alongside the
AffectedResult. Each job category is evaluated as RUN/SKIP based on:
- Dependency graph analysis (Go, Sol, Rust)
- Path-based trigger matching
- Branch context (develop-only, schedule-only, tag-only)
- Force-all triggers (.circleci/, mise.toml)
- Fuzz package routing (only fuzz affected packages)

CircleCI integration via .circleci/continue/shadow-ci.yml:
- Merged with main.yml via path-filtering
- Shadow jobs reference main.yml jobs (contracts-bedrock-build, etc.)
- Every shadow job checks the decision file and halts if not needed
- Shadow mode: test failures captured but don't gate PRs
- Comparison job aggregates results and compares against main CI

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove cross-workflow requires (contracts-bedrock-build, cannon-prestate
  from main workflow can't be referenced); build artifacts inline instead
- Consolidate dual workspace roots to single /tmp root
- Replace comma-containing matrix values with explicit job entries to
  avoid CircleCI parameter parsing issues
- Remove PyYAML import (not on cimg/base); use JSON-only fallback
- Add Rust binary builds (kona, op-rbuilder, rollup-boost) for
  acceptance tests via submodule checkout + cargo build

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add comprehensive tests for the pipeline decision engine covering:
- All category types: path-based, graph-based, fuzz, always, tag-only,
  schedule-only, develop-only, always-on-develop
- Feature matrix and config propagation through both path and graph paths
- Force-all behavior and its interaction with skip-type categories
- Priority ordering (schedule-only > tag-only > develop-only > always > ...)
- matchPaths prefix, suffix, and glob-prefix matching
- mergeTargets deduplication and empty-set edge cases
- resolveAlwaysRun exact and prefix matching
- applyConfidence threshold promotion
- checkForceAllPaths
- Comparison engine: partial catch rate, flake dedup, performance metrics
- Full realistic pipeline integration test
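
The `matchPaths` cases listed above (prefix, suffix, glob-prefix) could look roughly like the following. The pattern conventions are assumptions made for this sketch: a trailing `/` means directory prefix, a leading `*` means suffix, a trailing `/**` means glob-prefix, and anything else is an exact match. The real function may use different rules.

```go
package main

import (
	"fmt"
	"strings"
)

// matchPath applies one trigger pattern to one changed file path.
func matchPath(pattern, path string) bool {
	switch {
	case strings.HasSuffix(pattern, "/**"):
		// Glob-prefix: "rust/**" matches anything under rust/.
		return strings.HasPrefix(path, strings.TrimSuffix(pattern, "**"))
	case strings.HasPrefix(pattern, "*"):
		// Suffix: "*.sol" matches any path ending in .sol.
		return strings.HasSuffix(path, strings.TrimPrefix(pattern, "*"))
	case strings.HasSuffix(pattern, "/"):
		// Directory prefix: ".circleci/" matches files below it.
		return strings.HasPrefix(path, pattern)
	default:
		return path == pattern
	}
}

// matchPaths reports whether any changed file matches any pattern.
func matchPaths(patterns, changed []string) bool {
	for _, p := range patterns {
		for _, c := range changed {
			if matchPath(p, c) {
				return true
			}
		}
	}
	return false
}

func main() {
	forceAll := []string{".circleci/", "mise.toml"}
	fmt.Println(matchPaths(forceAll, []string{".circleci/config.yml"}))
	fmt.Println(matchPaths(forceAll, []string{"docs/mise.toml"}))
}
```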

Engine coverage: 44.1% -> 60.7%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Go tests, cannon tests, fuzz, and acceptance tests only need src/ and
scripts/ artifacts — not test contracts. Switch from `just forge-build`
(compiles everything) to `just build-no-tests` (skips **/test/**) to
cut contract build time in shadow CI jobs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds `validate` command that checks CircleCI pipeline YAML for the class
of bugs we just fixed:

- Dual workspace roots (persist_to_workspace/attach_workspace at
  multiple paths in the same job)
- Cross-workflow requires (referencing jobs from another workflow)
- Commas in matrix parameter values
- Non-stdlib Python imports (yaml, requests) in inline scripts
- Requires referencing non-existent job names (with matrix expansion)

The validator runs as the first step in shadow-ci-setup, before computing
affected targets. Includes an integration test that validates the actual
shadow-ci.yml.

Usage: ops/shadow-ci/bin/validate --pipeline .circleci/continue/shadow-ci.yml --strict

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov

codecov bot commented Mar 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.3%. Comparing base (c14cd1e) to head (ff282b1).
⚠️ Report is 30 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #19440      +/-   ##
===========================================
+ Coverage     75.3%    76.3%    +1.0%     
===========================================
  Files          193      729     +536     
  Lines        11256    81441   +70185     
===========================================
+ Hits          8476    62196   +53720     
- Misses        2636    19101   +16465     
  Partials       144      144              
Flag Coverage Δ
cannon-go-tests-64 66.4% <ø> (ø)
contracts-bedrock-tests 80.2% <ø> (ø)
unit 76.5% <ø> (?)

Flags with carried forward coverage won't be shown.
see 536 files with indirect coverage changes


smartcontracts and others added 18 commits March 7, 2026 04:12
Implement the full adaptive test placement system:

- 4-stage CI model (PR, merge queue, post-merge, nightly) with
  automatic stage detection from branch patterns
- Stage-aware decision engine that filters categories by placement,
  with pinned constraints that cannot be overridden
- Dynamic pipeline generation via CircleCI continuation API,
  replacing the 834-line static shadow-ci.yml with a ~80-line setup shell
- Flake lifecycle management (healthy → suspected → quarantined →
  shaking → diagnosed → fixed/accepted) with severity escalation
- Flake reactor for hot-path detection and auto-recovery
- Co-failure correlation engine with Wilson score confidence intervals
- Placement optimizer (cold path) with 4 optimization rules:
  merge queue flake demotion, correlation-based deferral,
  slow-test nightly promotion, false negative feedback promotion
- False negative feedback loop via correlation decay detection
- Auto-revert system for post-merge failures with flake filtering
- New CLI tools: flake-reactor, optimize, auto-revert
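
The Wilson score interval mentioned above gives a confidence range for a proportion that stays sensible at small sample sizes, which is why it suits co-failure counts. A small sketch of the standard formula follows; the function name and the way the engine would consume the bounds are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// WilsonInterval returns the Wilson score interval for hits out of n
// observations at the given z value (1.96 for roughly 95% confidence).
func WilsonInterval(hits, n int, z float64) (lo, hi float64) {
	if n <= 0 {
		// No data: the proportion could be anything.
		return 0, 1
	}
	nf := float64(n)
	p := float64(hits) / nf
	z2 := z * z
	denom := 1 + z2/nf
	center := (p + z2/(2*nf)) / denom
	half := z * math.Sqrt(p*(1-p)/nf+z2/(4*nf*nf)) / denom
	return math.Max(0, center-half), math.Min(1, center+half)
}

func main() {
	// Same observed rate, different sample sizes: the interval narrows
	// as evidence accumulates.
	for _, n := range []int{10, 100} {
		lo, hi := WilsonInterval(n*8/10, n, 1.96)
		fmt.Printf("%d/%d co-failures: [%.3f, %.3f]\n", n*8/10, n, lo, hi)
	}
}
```

Acting on the lower bound rather than the raw ratio avoids treating two tests as correlated after only a handful of shared failures.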

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce a state.Store interface for persisting blobs (flake DB,
correlation matrix) across CI pipeline runs. Two backends:

- LocalStore: filesystem (dev/testing)
- CircleCIStore: fetches state from previous pipeline's artifacts
  via CircleCI API, saves locally for upload via store_artifacts

The CircleCI flow:
1. Pipeline starts, CircleCIStore.Load() fetches flake-db.json from
   the last successful pipeline's artifacts
2. Flake reactor and decision engine work with the loaded state
3. Updated state is saved to artifacts dir
4. store_artifacts step uploads it for the next run

This replaces the assumption that /tmp persists across pipelines.
FlakeDB gains LoadFlakeDBFromStore/SaveToStore methods. All CLIs
fall back to direct file paths (--flake-db) when no state store
is configured.
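
To make the `state.Store` idea concrete, here is a minimal sketch of a blob store interface with a filesystem backend. The method names, the `ErrNotFound` sentinel, and the flat-key assumption are illustrative; the PR's interface may differ.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// ErrNotFound is returned when no blob exists for a key (assumed sentinel).
var ErrNotFound = errors.New("state: key not found")

// Store persists opaque blobs such as the flake DB between pipeline runs.
type Store interface {
	Load(key string) ([]byte, error)
	Save(key string, data []byte) error
}

// LocalStore keeps each blob as a file under Dir. Keys are assumed to be
// flat file names such as "flake-db.json".
type LocalStore struct{ Dir string }

func (s LocalStore) Load(key string) ([]byte, error) {
	data, err := os.ReadFile(filepath.Join(s.Dir, key))
	if errors.Is(err, os.ErrNotExist) {
		return nil, ErrNotFound
	}
	return data, err
}

func (s LocalStore) Save(key string, data []byte) error {
	if err := os.MkdirAll(s.Dir, 0o755); err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(s.Dir, key), data, 0o644)
}

func main() {
	dir, err := os.MkdirTemp("", "shadow-ci-state")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	var store Store = LocalStore{Dir: dir}
	if _, err := store.Load("flake-db.json"); errors.Is(err, ErrNotFound) {
		fmt.Println("first run: starting with empty state")
	}
	if err := store.Save("flake-db.json", []byte(`{"flakes":[]}`)); err != nil {
		panic(err)
	}
	data, _ := store.Load("flake-db.json")
	fmt.Println(string(data))
}
```

A distinct not-found error lets callers treat "first run" differently from a genuine I/O failure.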

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The RenderFromDecision method was generating flat test jobs without
build prerequisites. When test categories (go_tests, acceptance_tests)
depend on build categories (contracts_build, cannon_prestate), the
build jobs must exist in the rendered YAML and test jobs need requires
clauses pointing to them.

Changes:
- Add DependsOn, Command, WorkspacePaths, RunnerClass to JobCategoryConfig
- Wire depends_on in scoping.yaml for all test→build relationships
- RenderFromDecision now resolves the dependency graph: auto-includes
  build prerequisite jobs, renders them with persist_to_workspace,
  and wires requires clauses on dependent test jobs
- Separate build and test job templates (build: checkout+build+persist,
  test: checkout+attach+runner+artifacts)
- Add transitive dependency resolution (e.g. cannon_prestate→contracts_build)
- Add render tests including pipeline validator integration test
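
The transitive dependency resolution described above amounts to a depth-first walk over `depends_on` edges that emits prerequisites before their dependents. The sketch below assumes a plain map of category name to dependencies; `resolveDeps` and the example edges are illustrative rather than taken from `scoping.yaml`.

```go
package main

import "fmt"

// resolveDeps returns the selected categories plus all transitive
// prerequisites, ordered so that every prerequisite comes first.
func resolveDeps(dependsOn map[string][]string, selected []string) ([]string, error) {
	const (
		visiting = 1
		done     = 2
	)
	state := make(map[string]int)
	var order []string

	var visit func(name string) error
	visit = func(name string) error {
		switch state[name] {
		case done:
			return nil
		case visiting:
			return fmt.Errorf("dependency cycle at %q", name)
		}
		state[name] = visiting
		for _, dep := range dependsOn[name] {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[name] = done
		order = append(order, name)
		return nil
	}

	for _, s := range selected {
		if err := visit(s); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	// Assumed edges, loosely following the categories named above.
	deps := map[string][]string{
		"acceptance_tests": {"cannon_prestate"},
		"cannon_prestate":  {"contracts_build"},
		"go_tests":         {"contracts_build"},
	}
	order, err := resolveDeps(deps, []string{"acceptance_tests", "go_tests"})
	fmt.Println(order, err)
}
```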

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The circleci state backend requires CIRCLE_TOKEN which may not be set
in all CI environments. Fall back to local filesystem store with a
warning instead of hard-failing. State won't persist across pipelines
but everything else works.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ence

Upstash free tier (10K commands/day, 256MB) provides a zero-cost
key-value store for shadow CI state. REST API only — no client library.

Set UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN in CircleCI
project env vars. Falls back to local store if not set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
shadow-ci.yml is already a continued pipeline (via path-filtering in
config.yml). CircleCI only supports one level of setup→continuation,
so CIRCLE_CONTINUATION_KEY is not available here.

Replace with a self-contained setup job that computes the pipeline
decision, prints the summary, and stores artifacts. This proves the
engine works end-to-end in CI. Dynamic job execution will use a
different approach (runtime skip or pre-computed config).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- shadow-ci-setup: compute decision, persist to workspace
- shadow-ci-verify: validate decision coherence against mainline CI
  (checks all categories, stage filtering, develop-only, schedule-only)
- shadow-ci-tests: run shadow CI Go test suite in CI

The coherence checker compares the decision engine's output against a
ground truth mapping of mainline CI jobs, ensuring no categories are
missing or incorrectly classified.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lidation

The coherence checker now validates all stages bidirectionally:
- PR jobs verified at ALL stages (not just PR/MQ)
- Schedule-only jobs verified to RUN on schedule (not just skip off-schedule)
- Develop-only jobs validated on merge queue (not just PR)
- Deferred jobs flagged as errors if still deferred at post_merge/nightly

Verified locally with 0 errors across all 4 stages:
- PR: 20 run, 9 skip
- Merge Queue: 20 run, 9 skip
- Post-Merge: 24 run, 5 skip
- Nightly: 28 run, 1 skip

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Shadow CI now actually runs tests, not just computes decisions.

Architecture:
- `cmd/execute`: reads decision + scoping config, runs all needed categories
  for a given group (build/go/sol/rust/misc). One binary dispatches all test
  execution dynamically — no static job-per-category in YAML.
- `cmd/generate-ci`: renders shadow-ci.yml from scoping.yaml. Users configure
  shadow CI in its own language, this tool generates the CircleCI config.
  The shadow-ci-tests job checks for staleness — if you change scoping.yaml
  and forget to regenerate, CI fails with instructions.
- `group` field on job categories: determines which CI executor runs it.
  Adding a new category to a group = one line in scoping.yaml.
- `command` field on all executable categories: the actual test command.

CI pipeline (8 jobs):
  shadow-ci-setup → shadow-ci-verify (coherence check)
                  → shadow-ci-build → shadow-ci-go (Go tests, lint, fuzz)
                                    → shadow-ci-sol (Sol tests, checks)
                                    → shadow-ci-rust (Rust CI)
                  → shadow-ci-misc (shellcheck, semgrep, etc.)
  shadow-ci-tests (self-tests + staleness check, independent)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Set SHELL=/bin/bash before foundryup install (fixes "could not detect shell")
- Add || true to foundry curl to handle non-zero exit on PATH message
- Install semgrep via pip3 in misc group

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace manual curl/install toolchain setup with mise, matching how
mainline CI works. mise.toml already defines all tool versions (Go,
Rust, Foundry, golangci-lint, gotestsum, shellcheck, semgrep, just).

Also fix cannon_prestate command (cannon-prestates, plural).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mise install with all tools hits GitHub API rate limits. Each group
now installs only the specific tools it needs:
- build: go, rust, forge, cast, anvil, just, make
- go: go, gotestsum, golangci-lint, make
- sol: forge, cast, anvil, just
- rust: rust
- misc: shellcheck, python, uv, semgrep

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
reth-mdbx-sys uses bindgen which needs libclang. Install libclang-dev
in build and rust groups.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mise install hits GitHub API rate limits when multiple CI jobs run
in parallel. The misc group only needs shellcheck and semgrep, both
available via apt/pip without touching GitHub API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Persist build artifacts (forge-artifacts, cannon bins, rust binaries)
  from build group to workspace for downstream groups
- Fix rust_ci command to use justfile (install-nightly, lint, test)
- Add just to rust group, mockery to go group
- Downstream groups (go, sol, rust) restore build artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-nextest for rust

- Add forge to go group's mise install (needed by cannon_tests which
  runs `forge build` via make)
- Add circleci-repo-readonly-authenticated-github-token context to all
  group jobs (go tests need RPC access for op-validator tests)
- Add cargo-binstall + cargo-nextest to rust group (rust justfile uses
  `cargo nextest run`)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
smartcontracts and others added 12 commits March 9, 2026 20:19
rust_build and rust_submodule_build both trigger rustup to install the
toolchain on first cargo invocation. Running them in parallel causes
filesystem conflicts in ~/.rustup/toolchains/. Fix by making
rust_submodule_build depend on rust_build so they run sequentially.

The parallel DAG executor still runs contracts_build and rust_build
concurrently — the critical path is now max(contracts+cannon, rust+submodule)
≈ max(3.5m, 13m) ≈ 13m vs 17m sequential.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…workspace

Switch from persisting mise tools via workspace (1.6GB upload every run,
5+ minutes) to CircleCI save_cache/restore_cache keyed on mise.toml
checksum. Tools are pre-installed as flat binaries in the setup job and
cached permanently. Group jobs restore the cache in seconds.

- Setup job: installs mise, builds flat tool binaries to /tmp/shadow-ci-tools/
- Go/Sol groups: restore cache + PATH export (one-liner, ~5s)
- Build/Rust groups: restore cache + install Rust via mise directly
- Workspace now only carries lightweight artifacts (~50MB vs 1.6GB)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The mise shims in ~/.local/share/mise/shims/ are wrapper scripts, not
the actual binaries. Copying them to /tmp/shadow-ci-tools/ caused
"Argument list too long" errors when they tried to exec. Use `mise which`
to resolve the real underlying binary path before copying.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The v1 cache contained mise shims instead of real binaries. Bump to v2
to force a cache miss and rebuild with the corrected mise which logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Flat binary copying doesn't work for Go (needs GOROOT/SDK tree) or
mise-managed tools. Switch to caching ~/.local/share/mise/installs and
~/.local/bin (mise binary). Group jobs restore the cache, install mise
if needed, and run mise reshim to regenerate working shims.

Skip mise download on cache hit (mise binary cached in ~/.local/bin).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a content-addressed build cache that guarantees cached artifacts are
never stale. Each build category declares cache_inputs (for key computation),
workspace_paths (for caching), and verify_command (for post-restore validation).

The executor computes cache keys from git tree hashes of declared inputs +
mise.toml checksum, restores on hit, then always runs verify_command to
confirm validity. On verify failure, rebuilds and emits a warning event.

- New pkg/cache with Resolver (ComputeKey, Resolve, Restore, Save)
- Executor integration with --cache-dir flag and fail-open semantics
- CircleCI cache for /tmp/shadow-ci-cache across pipeline runs
- 5 build categories configured with cache_inputs and verify_command
- 12 tests covering key computation, cache hit/miss, verify, round-trip
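
As a rough picture of the key computation, the sketch below derives a cache key from per-input content hashes plus a toolchain checksum. It assumes the hashes have already been collected (for example with `git rev-parse HEAD:<path>`, which prints the tree or blob hash of a path); the function signature, the key format, and the example paths are hypothetical and may not match `pkg/cache`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// ComputeKey hashes the category name, the toolchain checksum, and the
// content hash of every declared input into a short, stable key.
func ComputeKey(category string, inputHashes map[string]string, toolchainChecksum string) string {
	// Sort paths so that map iteration order cannot change the key.
	paths := make([]string, 0, len(inputHashes))
	for p := range inputHashes {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	h := sha256.New()
	fmt.Fprintf(h, "category=%s\n", category)
	fmt.Fprintf(h, "toolchain=%s\n", toolchainChecksum)
	for _, p := range paths {
		fmt.Fprintf(h, "%s=%s\n", p, inputHashes[p])
	}
	return category + "-" + hex.EncodeToString(h.Sum(nil))[:16]
}

func main() {
	// Placeholder hashes; real values would come from git and mise.toml.
	inputs := map[string]string{
		"packages/contracts-bedrock/src":     "1111aaaa",
		"packages/contracts-bedrock/scripts": "2222bbbb",
	}
	fmt.Println(ComputeKey("contracts_build", inputs, "mise-3333cccc"))

	// Changing any input hash produces a different key.
	inputs["packages/contracts-bedrock/src"] = "4444dddd"
	fmt.Println(ComputeKey("contracts_build", inputs, "mise-3333cccc"))
}
```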

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove packages/contracts-bedrock/remappings.txt (doesn't exist)
- Remove Cargo.lock from rust_build (it's at rust/Cargo.lock, already
  covered by the rust/ directory input)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The verify commands for contracts_build (forge build --sizes, 20min) and
rust_build (cargo check --release, 3min) were too expensive, defeating
the purpose of caching. Switch to simple existence checks — the
content-addressed hash already guarantees input correctness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verify commands check for artifacts in the repo (e.g. test -f cannon/bin/cannon),
but those artifacts only exist after cache restore. The previous flow ran verify
before restore, so verify always failed on fresh checkouts.

New flow: check key → restore → verify → use or rebuild.
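
The corrected flow can be sketched as a small function with the cache behind an interface. Everything here (the `Cache` interface, `Outcome` values, the in-memory fake) is an assumption made for illustration, and the rebuild-on-restore-error behaviour reflects the fail-open semantics mentioned in an earlier commit.

```go
package main

import "fmt"

// Outcome records how a build category was satisfied.
type Outcome string

const (
	Hit   Outcome = "hit"   // restored and verified
	Miss  Outcome = "miss"  // no cache entry, rebuilt
	Stale Outcome = "stale" // entry existed but restore or verify failed
)

// Cache is a hypothetical minimal view of the build cache.
type Cache interface {
	Has(key string) bool
	Restore(key string) error
	Save(key string) error
}

// Resolve follows: check key, restore, verify, then use or rebuild.
func Resolve(c Cache, key string, verify func() bool, build func() error) (Outcome, error) {
	outcome := Miss
	if c.Has(key) {
		// Verify runs only after restore, since the artifacts it
		// checks do not exist before that point.
		if err := c.Restore(key); err == nil && verify() {
			return Hit, nil
		}
		outcome = Stale
	}
	if err := build(); err != nil {
		return outcome, err
	}
	return outcome, c.Save(key)
}

// memCache is an in-memory fake used to exercise Resolve.
type memCache struct {
	entries    map[string]bool
	restoreErr error
}

func (m *memCache) Has(key string) bool      { return m.entries[key] }
func (m *memCache) Restore(key string) error { return m.restoreErr }
func (m *memCache) Save(key string) error    { m.entries[key] = true; return nil }

func main() {
	c := &memCache{entries: map[string]bool{}}
	built := false
	verify := func() bool { return built }
	build := func() error { built = true; return nil }

	first, _ := Resolve(c, "cannon_prestate-abc", verify, build)
	second, _ := Resolve(c, "cannon_prestate-abc", verify, build)
	fmt.Println(first, second)
}
```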

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rewrite to reflect the group-based execution model (generate-ci + execute)
instead of the original per-target model (planner → render → runner).
Document the build cache, CI pipeline topology, activation phases, and
explain why the architecture changed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Honest accounting of what works end-to-end vs what's computed but unused.
Prioritized list of gaps to close, with the activation sequence from
shadow mode through to primary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a category has use_graph: true and the decision contains affected
targets, the executor now uses target_command instead of command. This
runs only the affected packages/files instead of the full test suite.

- Add target_command field to JobCategoryConfig with generic placeholders:
  {{targets}} (space-separated), {{targets_csv}}, {{targets_glob}} ({a,b})
- Add resolveCommand() — language-agnostic, all formatting in config
- Falls back to full command on force-all, always-on-develop, or empty targets
- Configure target_command for go_tests and sol_tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
smartcontracts and others added 17 commits March 10, 2026 15:50
Three fixes:
- rust_submodule_build: deinit before init to handle existing directories
- go_binaries_for_sysgo: remove expensive go build from verify command
- save_cache/stage artifacts: run with when:always so cache updates
  survive individual category failures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sol_tests and sol_upgrade both run `just build-go-ffi` which does
git submodule init. Running them in parallel causes lock file
conflicts. Make sol_upgrade and sol_coverage wait for sol_tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The git diff at the end of generated_mocks picks up submodule pointer
drift that's unrelated to Go mock generation. Use --ignore-submodules
to only check Go file changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rift

- Add debug output when cache verify fails showing workspace path state
- Fix generated_mocks to ignore submodule drift (--ignore-submodules)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add debug output for:
- Cache directory contents after restore (CircleCI step)
- Workspace path state after Restore() call (executor)
- Verify command output on failure

Temporary commit for debugging cannon_prestate cache staleness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cannon_prestate and go_binaries_for_sysgo both declared op-program/bin
and cannon/bin as workspace_paths. Their parallel Restore operations
raced — one's os.RemoveAll deleted the other's just-restored files,
causing verify to always fail ("CACHE STALE").

Fix: remove overlapping workspace_paths from go_binaries_for_sysgo
(keeping only .devnet) and add cannon_prestate to its depends_on.
Also remove the debug CI step that was added for investigation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove verify_command from config — the framework now auto-generates
verification by checking that each workspace_path exists after restore.
This eliminates a class of bugs where verify_command drifts from the
actual cached paths (e.g., rust_submodule_build checking a path the
command never creates).

Changes:
- Delete VerifyCommand field from JobCategoryConfig struct
- Delete Verified field from Resolution struct (dead code)
- Add cache.Verify() function that checks workspace_paths exist
- Add validateConfig() in LoadConfig to detect workspace_path overlaps
  between categories in the same group at load time
- Remove redundant cache_inputs from rust_build and rust_submodule_build
- Update README.md Build Cache section
- Update tests: new Verify tests, overlap validation tests
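
A load-time overlap check of the kind described above might look like this. The function names are hypothetical and the input is assumed to be the `workspace_paths` of the categories in one group; two paths overlap when they are equal or one contains the other.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// overlaps reports whether two workspace paths are equal or nested.
func overlaps(a, b string) bool {
	a, b = path.Clean(a), path.Clean(b)
	return a == b || strings.HasPrefix(a, b+"/") || strings.HasPrefix(b, a+"/")
}

// findOverlaps returns one message per conflicting pair of paths
// declared by two different categories.
func findOverlaps(workspacePaths map[string][]string) []string {
	names := make([]string, 0, len(workspacePaths))
	for n := range workspacePaths {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic output

	var conflicts []string
	for i, a := range names {
		for _, b := range names[i+1:] {
			for _, pa := range workspacePaths[a] {
				for _, pb := range workspacePaths[b] {
					if overlaps(pa, pb) {
						conflicts = append(conflicts,
							fmt.Sprintf("%s (%s) overlaps %s (%s)", a, pa, b, pb))
					}
				}
			}
		}
	}
	return conflicts
}

func main() {
	// The configuration that caused the restore race described earlier.
	group := map[string][]string{
		"cannon_prestate":       {"op-program/bin", "cannon/bin"},
		"go_binaries_for_sysgo": {"op-program/bin", "cannon/bin", ".devnet"},
	}
	for _, c := range findOverlaps(group) {
		fmt.Println(c)
	}
}
```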

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…kage

The executor's DAG scheduling, cache resolve/verify/rebuild, and parallel
execution logic was embedded in cmd/execute/main.go with no seams for
testing. Every bug required a full CI push-and-wait cycle to diagnose.

Extract orchestration into pkg/executor with a Runner interface for command
execution and CacheResolver interface for cache operations. The cmd/execute
binary becomes a thin CLI wrapper. 19 local tests now cover the full flow:
DAG ordering, diamond deps, parallel execution, cache hit/miss/stale/restore
failure, dry run, targeted commands — all running in <100ms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three issues found from CI output:

1. rust_build listed trigger_path "Cargo.lock" but no Cargo.lock exists
   at repo root (it's rust/Cargo.lock). The "rust/" trigger_path already
   covers it. Removed the non-existent path.

2. go_binaries_for_sysgo had workspace_paths: [".devnet"] but
   `make op-program cannon` produces op-program/bin and cannon/bin,
   not .devnet. Since cannon_prestate already caches those exact paths
   and go_binaries_for_sysgo depends on it, removed workspace_paths
   entirely (no separate caching needed).

3. The "Stage build artifacts" step in shadow-ci.yml had a hardcoded
   path list that included .devnet and could drift from config.
   Now auto-derived from workspace_paths of build-group categories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cannon_prestate already runs `make cannon-prestates` which builds
op-program/bin and cannon/bin. go_binaries_for_sysgo depended on
cannon_prestate then ran `make op-program cannon` — a 3.5min no-op
since those binaries already existed.

Removed the category and updated acceptance_tests to depend directly
on cannon_prestate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Follow-up to removing the redundant category — the coherence checker
still expected it in its mainline mapping.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oration

Replace the monolithic workspace-based build artifact passing with
per-category CircleCI caches. Each build category (contracts_build,
cannon_prestate, rust_build, rust_submodule_build) gets its own cache
key, so unchanged categories don't need re-uploading.

Downstream groups (go, sol, rust) restore build artifacts from
per-category caches via the executor's restoreCrossGroupDeps() method,
eliminating the 7-minute persist_to_workspace step.

Key changes:
- Add restoreCrossGroupDeps() to executor for cross-group cache restore
- Auto-derive BuildWorkspacePaths and BuildCategories from config
- Generate per-category save_cache/restore_cache in CI template
- Remove workspace-based build artifact staging and restoration
- Add tests for cross-group dependency restoration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…line

Categories with a `language` field (go_tests, sol_tests) now dispatch to
the adapter-based TestExecutor instead of shelling out to opaque commands.
This produces per-test results with status, duration, and flake
classification. Categories without `language` (builds, lints) continue
using ShellRunner unchanged.

Key changes:
- Runner interface takes RunContext instead of (category, command, logPath)
- New AdapterRunner wraps engine.TestExecutor for adapter dispatch
- Executor dispatches based on language + !isFuzzCategory guard
- cmd/execute builds adapter registry and events emitter
- cmd/compare handles both JobResult and []TestResult JSON formats
- cmd/runner deleted (functionality absorbed into cmd/execute)
- render.go templates updated to use bin/execute instead of bin/runner
- 7 new tests for dispatch logic and PlannedJob construction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements the 8-phase adaptive test placement plan:

Phase 1 — Stats Aggregation: StatsAggregator computes per-test and
per-category stats (duration, flake rate, failure rate, false negatives)
from the event store. Wired into cmd/optimize to replace nil stubs.

Phase 2 — Per-Test Placement: TestPlacer uses marginal coverage algorithm
to assign tests to stages within miss rate budgets (PR 5%, MQ 0.1%,
post-merge 0.01%, nightly 0%). Shadow mode default — records WouldDefer
without actually skipping. Integrated into DecisionEngine via new
testPlacer parameter.

Phase 3 — Test Filtering: Executor builds TestFilter from per-test
placements, flows through RunContext → AdapterRunner → PlannedJob →
RunOptions.TestFilter. Shadow deferral annotations added to TestResult.

Phase 4 — Shadow Deferral Reporting: ShadowDeferralAnalyzer produces
deferral reports tracking would-have-deferred tests, estimated savings,
actual misses (real failures that would have been missed).

Phase 5 — Demand-Driven Builds: BuildResolver resolves test selections
to required build categories via DependsOn chains. Unrequired build
categories are automatically skipped.

Phase 6 — LLM Placement Advisor: LLMAdvisor provides per-PR override
suggestions via Anthropic API. Disabled by default. LLM overrides cannot
override pinned constraints.

Phase 7 — Auto-Revert Notification: Notifier interface with
SlackNotifier and LogNotifier implementations. Wired into AutoReverter.

Phase 8 — Historical Import: cmd/import-history pulls historical test
results from CircleCI API. Parsers for gotestsum JSON, JUnit XML, and
forge JSON output.

New test files: 30+ tests across stats_aggregator, test_placer,
shadow_deferral, build_resolver, llm_advisor, and circleci parser.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three fixes for CI failures:

1. adapters.yaml had a spurious top-level `adapters:` wrapper key that
   caused YAML unmarshaling to produce nil adapter configs. Remove the
   wrapper so fields parse correctly into AdaptersConfig.

2. Executor dispatched to adapter runner for all categories with a
   language, even when the category has a shell command. The adapter
   runner bypasses the shell command and runs gotestsum directly, which
   is wrong for categories like go_tests that use `make go-tests-short-ci`.
   Now only dispatches to adapter when the registry has the language AND
   the category has no shell command.

3. Rust group's cargo-nextest install failed because cargo-binstall fell
   back to source compilation without --locked. Add --no-symlinks flag
   and --locked fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The setup job's "Install tools" step only wrote the mise PATH to
$BASH_ENV on cache miss (exited early on hit). This meant the
"Compute pipeline decision" step couldn't find `go` in $PATH when
running `go list` for the dependency graph.

Move the $BASH_ENV setup before the cache check so it persists
regardless of hit/miss. On cache hit, reshim to regenerate tool shims.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The affected command failed hard when cargo wasn't in PATH because
the Rust adapter's graph builder calls `cargo metadata`. The setup
job only installs the Go toolchain, not Rust.

Instead of failing the entire pipeline, log a warning and skip
graph-based analysis for that language. Categories fall back to
path-based matching via the decision engine.
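
One way to picture this fallback: before building graphs, check which language toolchains are actually on `PATH` and route the rest to path-based matching. The sketch below injects the lookup function so it can be tried without changing the environment; `toolFor`, `splitByToolchain`, and `onlyTools` are hypothetical names, not the PR's code.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// toolFor maps a language to the executable its graph builder shells
// out to. Languages without an entry need no external tool (assumed for
// Solidity, whose graph comes from import parsing).
var toolFor = map[string]string{"go": "go", "rust": "cargo"}

// splitByToolchain separates languages whose graph can be built from
// those that must fall back to path-based matching.
func splitByToolchain(languages []string, lookPath func(string) (string, error)) (graph, pathOnly []string) {
	for _, lang := range languages {
		tool, needsTool := toolFor[lang]
		if needsTool {
			if _, err := lookPath(tool); err != nil {
				fmt.Printf("warning: %s not found, skipping graph analysis for %s\n", tool, lang)
				pathOnly = append(pathOnly, lang)
				continue
			}
		}
		graph = append(graph, lang)
	}
	return graph, pathOnly
}

// onlyTools returns a lookPath stub that pretends only the named tools
// are installed, for experimenting without touching PATH.
func onlyTools(names ...string) func(string) (string, error) {
	return func(tool string) (string, error) {
		for _, n := range names {
			if n == tool {
				return "/usr/bin/" + tool, nil
			}
		}
		return "", errors.New("executable not found: " + tool)
	}
}

func main() {
	langs := []string{"go", "solidity", "rust"}

	// Simulate the setup job, which has Go but not cargo.
	graph, pathOnly := splitByToolchain(langs, onlyTools("go"))
	fmt.Println(graph, pathOnly)

	// The real lookup against the current PATH.
	graph, pathOnly = splitByToolchain(langs, exec.LookPath)
	fmt.Println(graph, pathOnly)
}
```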

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>