Execute PLAN.md in small, merge-safe increments with clear dependencies, parallel work, and explicit stop points.
- PR #985 merged (
2f649d62d164185755c19e33906a7064ae4ff132): contract-first pack toggles, startup bootstrap visibility, and migration hardening are now onmaster. - PR #986 merged (
4db3931ac8cbb753da7ff160311a4f8e0d6904a3): removed planner prompt pack-hint inference (pack/pack_aliases) to keep Chat planner context generic.
- Decoupling cleanup: AD domain guardrail user hint text no longer hardcodes tool ids (
ad_scope_discovery/ad_domain_controllers). - Stabilization hotfix: finalize-time host structured next-action replay now rejects stale single-host replays when user/assistant host hints indicate different or multi-host scope.
- Stabilization hotfix: finalize-time scope-shift guard now evaluates raw user intent (instead of routed rewrite payload), reducing stale AD0 replay on contextual compact follow-ups.
- Stabilization hotfix: structured-next-action carryover replay now blocks stale self-loop replays and host-hint-conflicting replays.
- Stabilization hotfix: startup deferred metadata no longer skips metadata sync solely due to initial unauthenticated state.
- Stabilization hotfix: bootstrap progress status can publish while connected startup metadata sync is still active.
- Closed migration gap:
ToolPackBootstrapnow performs descriptor-driven built-in pack discovery (no hardcoded per-pack bootstrap chain). - Closed migration gap: host-hint helpers moved to
ChatServiceSession.HostHints.cs(fallback-era naming removed). - Closed startup perf gap (warm path): server-scoped tooling bootstrap snapshot cache now avoids repeated full bootstrap during reconnect/session churn.
- Stabilization hotfix: carryover structured-next-action replay now accepts compact non-question follow-ups without requiring continuation-expansion text rewrites.
- Stabilization hotfix: duplicate final
chat_resultpublishes are suppressed per request/thread/text at the service writer boundary. - Stabilization hotfix: connected session status now stays in startup-pending mode until metadata/tool-pack readiness settles.
- Stabilization regression: added end-to-end two-turn
go aheadcarryover replay test to prevent follow-up execution stalls. - Stabilization hotfix: compact follow-up question turns no longer force blocker/cached-evidence finalize rewrites, preserving direct tool-capability answers.
- Stabilization hotfix: cached evidence fallback now requires explicit tool-name match when request text references a concrete tool id.
- Stabilization hotfix: deferred startup metadata sync now waits for authenticated state and is re-queued after login completion.
- Stabilization hotfix: host structured next-action auto-replay now blocks same-tool/same-arguments self-loops (
next_action_self_loop) to prevent repeated AD0-style churn. - Stabilization hotfix: carryover structured-next-action auto-replay now suppresses repeated identical tool+args replays until fresh context is provided (or explicit host pin matches).
- Stabilization hotfix: carryover host-hint mismatch detection now consumes assistant-draft host targets as contextual hints, blocking stale single-host replay after multi-host continuation guidance.
- Stabilization hotfix: carryover host-hint gating now rejects single-host auto-replay whenever follow-up context contains multi-host hints (including mixed stale/fresh host mentions).
- Stabilization hotfix: carryover replay freshness/scope-shift guards now read raw user request text (assistant draft host hints remain mismatch-only), preventing repeated AD0 replay caused by assistant-draft host echo.
- Stabilization hardening: carryover replay now uses separate replay-intent and host-hint inputs (no in-band text marker coupling), reducing marker-collision risk while keeping assistant-draft mismatch safeguards.
- Startup stabilization hotfix: transient reconnects now preserve interactive auth state when appropriate (authenticated or login-in-progress without explicit unauthenticated probe), reducing sign-in/connect churn.
- Startup visibility hotfix: connect stage now publishes per-attempt retry/timeout/delay progress status during pipe-connect retries.
- Validation checkpoint:
analyze validate-catalogcurrently reportspass (0 error(s), 0 warning(s))on this branch. - Stabilization hotfix: contextual follow-up detection now reads the
Follow-up:tail from legacy continuation expansion before carryover replay decisions. - Stabilization cleanup: removed standalone lowercase
adlexical alias auto-routing from domain-intent signal resolution. - Stabilization hotfix: continuation subset reuse now skips when follow-up explicitly references a tool outside the remembered subset, enabling fresh cross-pack tool routing.
- Startup visibility hotfix: header chip diagnostics now maintain a bounded runtime lifecycle timeline (tooltip + debug panel) across connect/auth/bootstrap transitions.
- Stabilization hotfix: routing-meta activity timeline labels now include strategy + selected/total tool counts for explicit route-stage observability.
- Stabilization hotfix: explicit tool-capability questions now bypass finalize-time execution-blocker cached-evidence substitution, preventing stale evidence fallbacks on
tool_name?clarification turns. - Stabilization hotfix: carryover structured-next-action auto-replay now skips compact contextual scope-shift follow-ups (for example "other DCs"), forcing fresh routing instead of stale single-host replay.
- Startup perf hotfix: plugin duplicate packs are now skipped via loaded-assembly fast-path before dependency preload/reflection, cutting avoidable bootstrap latency on first metadata sync.
- Stabilization hotfix: explicit tool-id follow-ups now bypass pending-action/carryover auto-replay rewrites, and escaped Markdown tool ids (for example
eventlog\_evtx\_query) are honored by cached-evidence gating. - Startup resilience hotfix: deferred metadata sync phases now retry once on transient disconnects (
hello/list_tools/auth_refresh), reducing cold-start states where runtime is connected but catalog/policy sync is missing. - Startup/turn UX hotfix: final assistant replacement now targets the latest assistant row when only
System/Toolsrows followed (no newer user turn), reducing duplicate assistant finals across reconnect/retry churn. - Stabilization hotfix: carryover structured-next-action replay now evaluates compact follow-up eligibility from raw user text (instead of routed payload rewrite text), restoring queued
go aheadfollow-up execution. - Stabilization hotfix: domain-intent payload parsing now safely handles invalid UTF-16 input (
ArgumentException+JsonException) to avoid compact follow-up expansion crashes. - Contract-alignment cleanup: updated routing/output lifecycle tests to match strict routing metadata requirements and one-meaningful-final-per-request output policy.
- Startup stability hotfix: deferred startup metadata sync now reruns when login success arrives during an in-flight metadata sync, preventing dropped post-login
hello/list_tools/auth_refreshrefreshes and stale tool visibility. - Stabilization hotfix: continuation subset follow-ups now treat escaped Markdown tool ids as explicit tool references, preventing stale subset reuse when users switch to tools outside the remembered subset.
- Stabilization regression coverage: startup metadata rerun scheduling now includes dedicated dispatch-gating tests for shutdown/connectivity safety.
- Stabilization hotfix: quoted/multiline tool-descriptor references now keep explicit tool-capability routing (no finalize-time cached-evidence rewrite), and explicit tool-id extraction now strips invisible Unicode format chars to keep descriptor parsing robust.
- Startup/dispatch stabilization hotfix: app turn dispatch startup/send transitions now claim lifecycle state atomically under lock, preventing sign-in/manual-send double-dispatch races and duplicate assistant finals.
- Stabilization hotfix: contextual compact follow-up questions now block stale single-host carryover replay when thread evidence is multi-host, while short acknowledgement questions stay replay-eligible.
- Startup UX hotfix: login-completed status now queues deferred startup metadata sync before publishing connected status, avoiding transient ready-state flicker while startup sync is still pending.
- Startup UX hotfix: bootstrap progress emitted while already connected now keeps
Runtime connected...phrasing (pluscause metadata_sync) instead of regressing toStarting runtime...wording. - Stabilization regression coverage: finalize host scope-shift user-request resolution now has explicit tests proving raw user intent takes precedence over routed rewrite text.
- Live strict scenario validation:
ad-ad0-then-all-dcs-followthrough-10-turnpasses end-to-end with cross-DC fanout and strict call/output pairing. - Live strict scenario validation:
ad-eventlog-tool-capability-followthrough-10-turnpasses end-to-end and explicitly blocks cached-evidence fallback responses for directeventlog_evtx_querycapability questions. - Stabilization hotfix: domain-intent action catalog now preserves all declared same-family action ids as valid
/actaliases independent of definition order; canonical family action ids are deterministic and ambiguous cross-family ids do not use first-wins suppression. - Live strict scenario validation: transcript-derived
ad-other-dcs-go-ahead-followthrough-10-turnpasses end-to-end, covering continuation-stylego aheadexecution across multiple DC hosts and expliciteventlog_evtx_querycapability follow-ups. - Scenario-contract hardening: host scenario contracts now support forbidden tool-input values (
forbid_tool_input_values/forbidden_tool_inputs) and enforce them during retry repair, fallback host patching, and assertion evaluation. - Transcript guardrail hardening:
ad-other-dcs-go-ahead-followthrough-10-turncontinuation turns now include explicit non-AD0 host exclusions, and catalog strictness tests lock those exclusions. - Transcript-derived strict scenario seed added:
ad-domainwide-reboot-followthrough-10-turn(AD0 reboot baseline -> non-AD0 domain-wide continuation + expliciteventlog_evtx_querycapability question + DNS cross-pack turn). - Forbidden-input equivalence hardening: scenario contract enforcement now treats short-host and FQDN forms as equivalent when applying forbidden host targets (for example
AD0blocksAD0.ad.evotec.xyz) across repair/fallback/assertion paths. - Live strict rerun passed:
ad-domainwide-reboot-followthrough-10-turnnow completes10/10turns with non-AD0 continuation turns preserving host exclusions after input-repair fallback. - Startup/send race hardening: manual resend now skips enqueue when an equivalent queued-after-login prompt is already in-flight, reducing duplicate assistant replies after switch-account recovery.
- Transcript phrase lock-in: strict cross-DC follow-through scenarios now include "
those are correct DCs, go ahead" continuation wording to exercise replay suppression under real-world follow-up phrasing. - Stabilization hotfix: domain host-scope guardrail now blocks stale single-host AD-scope replay on compact scope-shift follow-ups when thread evidence is multi-host, unless a single host is explicitly pinned by the user.
- Stabilization regression coverage: domain host-scope guardrail now has explicit compact scope-shift replay tests (block stale replay, allow explicit host pin, allow short acknowledgement question).
- Live strict validation rerun: transcript-derived follow-through scenarios (
ad-other-dcs-go-ahead-followthrough-10-turn,ad-domainwide-reboot-followthrough-10-turn,ad-ad0-then-all-dcs-followthrough-10-turn) pass end-to-end (10/10) after this hardening. - Stabilization hotfix:
ad_monitoring_probe_runADWS port normalization now keeps default9389when non-positiveportis supplied, preventing false endpoint probes on:1. - Transcript-derived strict scenario added:
ad-ldap-go-ahead-followthrough-8-turnto lock continuation execution from scope confirmation into explicit LDAP diagnostics after compactgo ahead. - Live strict validation:
ad-ldap-go-ahead-followthrough-8-turnpasses end-to-end (8/8) and asserts ADWS endpoint probes do not regress to:1/ActiveDirectoryWebServices. - Startup/send dedupe hardening: queued-after-login suppression now treats both-missing-conversation-id startup prompts as equivalent when normalized text matches, reducing duplicate post-login greeting replies.
- Startup/send regression coverage expanded: queue dedupe tests now include explicit both-missing-conversation-id cases for in-flight queued-after-login manual-resend suppression.
- Stabilization hotfix: weighted/planner subset routing now retains explicitly requested tool ids (including escaped markdown ids) in candidate selection, preventing false "tool inactive" responses for registered tools during follow-up turns.
- Regression coverage expanded: planner/routing tests now lock explicit escaped tool-id retention in weighted subset selection and ensure planner minimum-selection backfill replaces non-explicit tools at limit when needed.
- Live strict validation rerun:
ad-eventlog-tool-capability-followthrough-10-turnpasses end-to-end (10/10) after explicit tool-id subset retention hardening. - Scenario-contract clarity hardening: host scenario execution contracts/retry prompts now emit forbidden input directives as
not-in [..](parser remains backward-compatible with legacy!=syntax) to avoid non-AD0 constraint inversion during repair. - Transcript replay guardrail scenario added and validated:
ad-other-dcs-transcript-replay-guardrail-10-turnpasses10/10, proving cross-DC continuation execution and explicit non-AD0 follow-up behavior under transcript wording. - Transcript fanout guardrail scenario added and validated:
ad-c400-transcript-cross-dc-fanout-10-turnpasses10/10, proving explicit non-AD0 4-host fanout after continuation wording that previously regressed into AD0-only replay loops. - Startup visibility hardening: startup/connect/reconnect status text now emits structured context tokens (
phase startup_*,cause ...) and connected bootstrap rewrites legacy cause-only suffixes into phase+cause form, so "runtime connected" no longer hides in-flight startup work. - Stabilization hotfix: no-text tool-output synthesis retry now runs only for review-loop, non-redacted turns with at least one successful tool output; deterministic fallback handles redacted/tool-failure paths without an extra model round.
- Follow-through quality hardening: no-text synthesis prompts now include compact executed tool-argument context (generic key/value summaries) to keep target/scope details available during retry synthesis.
- Startup UX hardening: header status chip fallback now consumes structured startup phase/cause context to render compact in-progress labels (
Loading tool packs,Sign in to continue loading tool packs,Starting runtime (retrying connection)). - Live strict rerun checkpoint after this batch:
ad-c400-transcript-cross-dc-fanout-10-turn(10/10),ad-eventlog-tool-capability-followthrough-10-turn(10/10),ad-ldap-go-ahead-followthrough-8-turn(8/8) all pass. - Host fallback decoupling cleanup: removed host runtime hardcoded tool-specific retry transforms (
ApplyAdDiscoveryRootDseFallback,ApplyAdReplicationProbeFallback,ApplyDomainDetectiveSummaryTimeoutFallback) and added architecture guardrail coverage to block reintroduction. - Typed-surface guardrail expansion:
SourceGuardrailTestsnow scans typed-pipeline tool wrappers pack-wide and fails if refactored tools reintroduce ad-hocarguments?.Get.../arguments.Get...parsing. - Typed-envelope increment:
ad_scope_discoverymigrated toToolResultV2success/error envelope path and included in typed-wrapper guardrail enforcement list. - Typed-envelope base hardening:
ActiveDirectoryToolBase*shared helpers now emitToolResultV2envelopes and are protected by guardrail coverage preventing directToolResponseregressions. - Startup runtime-connect visibility increment: service emits
[startup] provider_connect_progressphase/status/elapsed telemetry for runtime-provider connect attempts, and app status parsing publishes those lines (including send-time override) so first-turn connect stalls are diagnosable. - Decoupling guardrail increment: Chat architecture tests now block hardcoded tool-pack ids from reappearing in runtime app/service source (
testimox,active_directory,adplayground,domaindetective,dnsclientx,reviewer_setup). - Contract-first routing increment: routing-scoring pack hints now derive from explicit routing contract pack ids only (removed
ToolSelectionMetadata.TryResolvePackId(...)fallback), and architecture guardrails lock this behavior. - Scenario-contract reliability increment:
ad-eventlog-tool-capability-followthrough-10-turnavailability assertion now accepts semantic wording variants (eventlogorevent log) to avoid non-behavioral phrase drift failures. - Live strict rerun validation:
ad-eventlog-tool-capability-followthrough-10-turnpasses end-to-end (10/10) after contract-only routing-hint cleanup and scenario assertion hardening. - Contract-first routing decoupling increment: removed Chat-side hardcoded compound-pack token heuristic (
ToolSelectionMetadata.IsKnownCompoundPackRoutingCompact) from routing tokenization and added architecture guardrail coverage. - Live strict rerun validation: transcript follow-up guardrail scenarios stay green after compound-token heuristic removal (
ad-eventlog-tool-capability-followthrough-10-turn10/10,ad-other-dcs-transcript-replay-guardrail-10-turn10/10). - Stabilization hardening: compact continuation recovery now treats linked structured deferred-execution drafts as execution-nudge eligible (language-neutral shape checks), preventing
go aheadturns from ending on evidence-only summaries with zero in-turn tool activity. - Live strict rerun validation:
ad-ldap-go-ahead-followthrough-8-turnpasses end-to-end (8/8) after compact-follow-up structured-draft recovery hardening. - Regression fix (2026-03-04): no-text finalize path no longer triggers an extra synthesis model request for redacted/tool-failure turns; end-to-end regressions restored (
RunChatOnCurrentThreadAsync_DoesNotAutoSwitchPacksAfterToolFailure,RunChatOnCurrentThreadAsync_RedactsToolOutputRecoveryFallbackWhenRedactionEnabled). - Live strict rerun validation (2026-03-04):
ad-c400-transcript-cross-dc-fanout-10-turn(10/10),ad-eventlog-tool-capability-followthrough-10-turn(10/10), andad-ldap-go-ahead-followthrough-8-turn(8/8) pass after the no-text recovery gating fix. - Startup UX wording hardening (2026-03-04): Shell header status now rewrites generic connected unauthenticated
Sign in to continueintoSign in to continue loading tool packswhile startup tools-loading is pending. - Language-neutral strict scenario increment (2026-03-04): added transcript-derived Polish scenario
ad-pl-eventlog-capability-followthrough-10-turnto lock AD-to-EventLog capability follow-through without cached-evidence/no-tool fallback regressions. - Live strict validation (2026-03-04):
ad-pl-eventlog-capability-followthrough-10-turnpasses end-to-end (10/10) with strict tool call/output pairing and no duplicate call/output ids. - Startup visibility hardening (2026-03-04): send-safe bootstrap progress phases now publish during startup turn/send waits, and connected sessions continue surfacing send-safe startup statuses even when metadata-sync flags have transient lag.
- Contract-first domain-intent hardening (2026-03-04): removed raw tool-name domain-family fallback from
ToolRouting.ResolveDomainIntentFamily(string toolName)so family resolution relies on registered definition/catalog contracts; architecture guardrail now enforces this method-level boundary. - Transcript-snippet scenario hardening (2026-03-04): updated
ad-pl-eventlog-capability-followthrough-10-turnwith the original multiline descriptor turn (eventlog_evtx_query · Event Log (EventViewerX) ...) that previously regressed into cached-evidence fallback. - Live strict rerun validation (2026-03-04):
ad-pl-eventlog-capability-followthrough-10-turnpasses end-to-end (10/10) after descriptor-snippet hardening, with no cached-evidence fallback output on the explicit capability turn. - Documentation increment (2026-03-04): published
InternalDocs/agent-playbooks/chat-pack-contract-first-onboarding.mdwith zero-Chat-edits onboarding flow and plugin contract schema examples. - Language-neutral regression checkpoint (2026-03-04):
ChatServiceRoutingTrimTestssuite passes (709/709), covering Unicode ordinal parsing and compact follow-up routing behaviors. - Routing heuristic-removal checkpoint (2026-03-04): no Chat service domain-routing paths call raw tool-name family inference (
TryResolveDomainIntentFamily(toolName, ...)); domain routing now remains contract-first with definition metadata fallback only. - Decision checkpoint (2026-03-04): strict pack-boundary isolation is considered complete and locked by cross-pack orchestration catalog tests (no inferred ADPlayground↔DomainDetective handoff without explicit contracts).
- Typed-surface adapter increment (2026-03-04): introduced
ToolRequestAdapter<TRequest>andToolBaseadapter overload inIntelligenceX.Tools.Common, then migratedDnsClientXPackInfoToolandDomainDetectivePackInfoToolto use the adapter path with dedicated unit coverage. - Typed-surface adapter increment (2026-03-04): migrated
DomainDetectiveChecksCatalogTool,ReviewerSetupPackInfoTool, andReviewerSetupContractVerifyToolto typed binder/adapter pipelines usingToolResultV2envelopes; added source guardrails to keep these wrappers free of rawarguments?.Get*parsing. - Typed-surface backfill increment (2026-03-04): migrated
SystemBitlockerStatusTool,SystemInstalledApplicationsTool,SystemNetworkAdaptersTool,SystemPatchComplianceTool,EventLogChannelListTool, andEventLogProviderListToolto typed pipeline binders; source guardrails now enforce typed binder usage across migrated non-AD pack folders. - Documentation cleanup increment (2026-03-04): updated ADR wording (
adr-0001-chat-tools-contract-boundary.md) so Chat cross-pack fallback references are historical/current-state accurate after fallback-engine removal. - Fallback-marker cleanup checkpoint (2026-03-04): verified Chat runtime service source contains no legacy cross-pack fallback markers/constants; active projection-fallback diagnostics (
projection_fallback_*) remain intentionally for view-repair transparency.
- Startup preview truthfulness hardening (2026-03-29): persisted descriptor-preview startup now surfaces explicit descriptor-preview telemetry/cache-mode/status detail instead of replaying stale full-bootstrap timings from older live cache payloads.
- Startup phase telemetry rename (2026-03-29): live bootstrap now reports
descriptor_discovery,pack_activation, andregistry_activation_finalizephases, and app-facing summary/detail text now matches that descriptor-first lifecycle. - Startup disabled-pack activation trim (2026-03-29): disabled known built-ins now publish descriptor-only availability without entering raw
pack_load_progressactivation steps on the default bootstrap path, so pack toggles reduce bootstrap churn instead of only hiding the pack later. - Versioned descriptor snapshot contract (2026-03-29): persisted tooling bootstrap cache now writes and prefers an explicit versioned descriptor/capability snapshot payload for preview-safe metadata responses, with legacy fallback and fail-open diagnostics when nested descriptor snapshot schema drifts.
- Workspace output probing hardening (2026-03-29): built-in pack assembly resolution now respects
EnableWorkspaceBuiltInToolOutputProbingon the actual load path, keeping workspace/project-output fallback strictly opt-in for dev/bootstrap-repair scenarios. - Built-in discovery metadata-first refactor (2026-03-29): default built-in assembly discovery now starts from known built-in descriptor metadata and only falls back to dependency-context / embedded-manifest reads for compatibility, reducing coupling to the embedded manifest as a primary source.
- Descriptor-first first-paint startup (2026-03-29): startup and post-connect app flows now accept persisted-preview
hellopolicy snapshots as enough metadata for first paint, seed plugin metadata fromhello, and defer finallist_toolsreplacement work to the existing background refresh path. - Activation-on-demand status truthfulness (2026-03-29): deferred chat pack activation now emits explicit in-progress routing status before descriptor-matched or handoff-target pack activation begins, so users no longer only see the activation after live schemas are already ready.
- Phase-based bootstrap reporting compatibility (2026-03-29): app and shell bootstrap summaries now resolve canonical
descriptor_discovery,pack_activation, andregistry_activation_finalizetimings from phase telemetry instead of depending on legacypackLoadMs/packRegisterMsfield names. - Startup release-gate coverage expansion (2026-03-29): app-side tests now lock zero-pack startup sparsity, plugin-only persisted-preview metadata/snapshot projection, and descriptor-first shell summary labels without relying on the currently blocked shared Chat test project.
- Known built-in pre-activation filtering (2026-03-29): default startup now excludes disabled known built-in assemblies from live discovery before assembly/type reflection work, while still publishing descriptor-only disabled availability and truthful on-demand disabled-pack results.
- Workspace probing opt-in hardening (2026-03-29): workspace/project-output assembly fallback now remains behind explicit opt-in/test-only workspace-root helpers, and known built-ins selected without a resolvable trusted path now surface unavailable availability instead of disappearing from on-demand activation results.
- Structured-only runtime self-report checkpoint (2026-03-29): app/shared runtime-introspection classifiers now require trusted
ix:runtime-self-report:v1metadata instead of inferring runtime mode from rawmodel/toolscue words, with legacy lexical-fallback prompt behavior covered only through explicit precomputed-analysis tests. - Multilingual prompt-mode release-gate checkpoint (2026-03-29): app tests now lock non-English broad capability asks into
assistant_capability_questionmode and keep plain cue-word runtime asks out of implicit runtime self-report mode unless trusted runtime directive metadata is present. - Literal-confirmation routing-prelude checkpoint (2026-03-29): shared Chat tests now prove bare
go ahead/go ahead?inputs do not gain structured continuation context or compact-follow-up treatment unless real pending-action or continuation state is already present. - Shape-based capability boundary checkpoint (2026-03-29): app capability-question classification now relies on open-ended question shape instead of exact English internal-noun blockers, with focused regressions proving broad capability asks still enter
assistant_capability_questionmode while short runtime/inventory asks stay out. - Obsolete runtime cue-catalog removal checkpoint (2026-03-30): deleted the unused
RuntimeSelfReportCueCatalogEnglish noun blocker and the shared host test that froze it, leaving runtime/capability handling on the newer shape/directive paths only. - Contract-backed host-target hint checkpoint (2026-03-30): host retry/scenario prompts now require concrete thread-sourced host target values before reusing them and otherwise instruct the model to ask for the minimal missing target input instead of inferring or defaulting a host from prose.
- Known-host hint consolidation checkpoint (2026-03-30): removed duplicate “remaining discovered DCs/hosts” prose from host correction prompts and kept distinct-target guidance only inside the structured known-host hint path.
- Ordered known-host summary checkpoint (2026-03-30): the structured known-host hint now reports ordered distinct host/DC candidates from prior tool inputs instead of keeping a second narrative distinct-coverage rule in the prompt, with a focused regression for duplicate filtering and old-phrase removal.
- No-text warning known-host checkpoint (2026-03-30): host no-text fallback warnings now include the same structured known-host/DC summary when prior thread targets exist, keeping recovery warnings aligned with the contract/locality hint surface instead of dropping concrete target context.
- Canonical startup phase-duration contract checkpoint (2026-03-31): startup bootstrap telemetry now includes pre-resolved canonical descriptor/activation/finalize durations from C#, and the shell summary reads those canonical fields instead of maintaining its own legacy alias fallback mapping.
- Persisted preview fingerprint alignment checkpoint (2026-03-31): tooling bootstrap cache now stores the deferred descriptor-preview fingerprint computed from bootstrap options, so persisted preview validation no longer rejects valid fast-path snapshots just because the live snapshot carried a fuller tool-definition surface.
- Split tooling lifecycle into explicit
descriptor_discoveryandpack_activationphases with telemetry, tests, and release budgets for both. - Move pack enablement filtering ahead of activation work so disabled packs stop paying constructor/reflection cost on the default startup path.
- Keep workspace/project-output assembly probing available only for explicit dev/bootstrap-repair modes, not on the default warm path.
- Add activation-on-demand for descriptor-known packs/tools selected after preview startup.
- Remove English cue-word routing from runtime/capability classification and replace it with structure-first + contract-backed behavior.
- Keep each PR focused to one objective and one rollback boundary.
- No net-new Chat hardcoded fallback logic during migration.
- Prefer additive contracts first, then Chat rewiring, then deletion.
- Every deletion PR must include equivalent or stricter tests.
Files:
PLAN.mdPLAN-EXECUTION-ORDER.mdTODO.mdInternalDocs/architecture/adr-0001-chat-tools-contract-boundary.md
Checklist:
- Add ADR for contract-first boundary.
- Add migration tracker entries to
TODO.md. - Confirm maintainers accept no-legacy direction (drop Chat fallback engine target).
Files:
IntelligenceX.Chat/IntelligenceX.Chat.Tests/ChatFallbackArchitectureGuardrailTests.cs
Checklist:
- Add test to freeze current
TryBuildCross*Fallback*method set (only allowed to shrink). - Add test to block
PackIdMatches(...)usage outside legacy fallback file. - Add test to freeze hardcoded pack-id set in legacy fallback file.
Files (expected):
IntelligenceX/Tools/ToolRoutingContract.csIntelligenceX/Tools/ToolDefinition.csIntelligenceX/Tools/ToolRegistry.cs- New contract files in
IntelligenceX/ToolsandIntelligenceX.Tools.Common
Checklist:
- Add role/setup/handoff/recovery contract types.
- Wire validation in
ToolRegistry. - Keep backward-compatible defaults temporarily only if needed for staged migration.
Dependency: PR 1
Files (expected):
IntelligenceX.Chat/IntelligenceX.Chat.Tooling/ToolRoutingCatalogDiagnostics.csIntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceSession.PolicyAndTypes.csIntelligenceX.Chat/IntelligenceX.Chat.App/Ui/Shell.10.core.js
Checklist:
- Extend diagnostics with new contract health fields.
- Expose diagnostics in session policy payload.
- Add runtime policy toggle (
--require-explicit-routing-metadata) with profile persistence and session policy exposure. - Render new health signals in UI policy panel.
Dependency: PR 2
Files (expected):
- New catalog builder in
IntelligenceX.Chat.ToolingorIntelligenceX.Chat.Service ChatServiceSession.ProfilesAndModels.csChatServiceSession.cs
Checklist:
- Build runtime orchestration catalog from registry definitions/contracts.
- Replace direct suffix/prefix pack derivation consumers with catalog lookups where possible.
- Keep behavior equivalent before deleting legacy code.
Dependency: PR 2
Files:
ChatServiceSession.PackPreflight.cs- Related tests in
IntelligenceX.Chat.Tests
Checklist:
- Replace
_pack_info/_environment_discoversuffix selection with role metadata selection. - Keep existing semantics for required-arg checks and remembered successful preflight calls.
Dependency: PR 4
Files:
ChatServiceSession.ChatRouting.RoutingScoring.csChatServiceSession.ToolRouting.Secondary.csChatServiceSession.ToolRouting.DomainIntentSignals.cs- Related tests
Checklist:
- Replace name-based family keys with contract pack/role/family metadata.
- Keep Unicode/language-neutral behavior and existing pending-action ordinal support.
Dependency: PR 4
Files:
IntelligenceX.Tools.ADPlayground/*IntelligenceX.Tools.DomainDetective/*IntelligenceX.Tools.DnsClientX/*IntelligenceX.Tools.Tests/*
Checklist:
- Add explicit role/handoff/setup/recovery contracts.
- Add tests proving AD vs public-domain separation unless explicit handoff contract exists.
Dependency: PR 2
Files:
ChatServiceSession.PackCapabilityFallback.csChatServiceSession.HostHints.csChatServiceSession.ChatRouting.NoExtractedFinalize.csChatServiceSession.csChatServiceSession.ProfilesAndModels.cs- Heavy test cleanup in
ChatServiceRoutingTrimTests.ToolNudge.*PackFallback*.cs
Checklist:
- Remove fallback replay path and fallback contract cache.
- Remove cross-pack builder methods and helper methods.
- Delete or rewrite fallback-specific tests.
- Keep model retry/review loops intact.
Dependency: PR 5, PR 6, PR 7
Files:
IntelligenceX.Tools.System/*IntelligenceX.Tools.EventLog/*IntelligenceX.Tools.TestimoX/*IntelligenceX.Tools.FileSystem/*IntelligenceX.Tools.Email/*IntelligenceX.Tools.PowerShell/*IntelligenceX.Tools.OfficeIMO/*
Checklist:
- Complete explicit contracts for all packs/tools.
- Ensure tool wrappers remain thin and engine-first.
Dependency: PR 2
Files:
IntelligenceX.Tools.Common/*(typed adapters/helpers)- Analyzer/guardrail tests in
IntelligenceX.Tools.Testsor shared analyzer project
Checklist:
- Prefer typed binders for all migrated/refactored tools.
- Add guardrail to flag ad-hoc direct argument parsing in target packs.
- Standardize on
ToolResultV2for migrated paths. - Wave-2 typed migration batch completed for: pack/discovery tools,
SystemDevicesSummary,SystemHardwareIdentity,SystemHardwareSummary,SystemInfo,SystemBiosSummary,SystemSecurityOptions,SystemBootConfiguration,SystemRdpPosture,SystemSmbPosture,SystemFeaturesList,SystemUpdatesInstalled,SystemPatchDetails,SystemDisksList,SystemLogicalDisksList,SystemPortsList,SystemProcessList,SystemFirewallProfiles,SystemFirewallRules,SystemServiceList,SystemScheduledTasksList,SystemTimeSync,SystemWhoAmI,WslStatus,SystemBitlockerStatus,SystemInstalledApplications,SystemNetworkAdapters,SystemPatchCompliance,FsList/FsRead/FsSearch,EventLogNamedEventsCatalog,EventLogNamedEventsQuery,EventLogLiveQuery,EventLogTopEvents,EventLogLiveStats,EventLogEvtxFind,EventLogEvtxQuery,EventLogEvtxStats,EventLogEvtxSecuritySummary,EventLogChannelList,EventLogProviderList,TestimoXRulesList,TestimoXRulesRun,PowerShellRun,EmailImapSearch,EmailImapGet,OfficeImoRead, plus Wave-1/AD carryover typed migrations:DomainDetectivePackInfo,DomainDetectiveChecksCatalog,DomainDetectiveNetworkProbe,DomainDetectiveDomainSummary,DnsClientXPackInfo,DnsClientXQuery,DnsClientXPing,AdPackInfo,AdMonitoringProbeCatalog,AdRecycleBinLifetime,AdGroupMembers,AdGroupMembersResolved,AdGroupsList,AdAdminCountReport,AdGpoChanges,AdGpoList,AdGpoInventoryHealth,AdGpoHealth,AdGpoPermissionReport,AdGpoPermissionConsistency,AdLdapQuery,AdLdapQueryPaged,AdObjectGet,AdObjectResolve,AdSearch,AdSpnSearch,AdDomainAdminsSummary,AdPrivilegedGroupsSummary,AdStaleAccounts,AdUsersExpired,AdWhoAmI,AdDomainInfo,AdLdapDiagnostics,AdSearchFacets,AdReplicationSummary,AdReplicationConnections,ReviewerSetupPackInfo, andReviewerSetupContractVerify.
Dependency: PR 7, PR 9
Files:
- Remaining stale constants/docs/tests across Chat/Tools.
Checklist:
- Remove obsolete telemetry markers/constants tied to deleted fallback engine.
- Close migration tracker entries.
- Validate final DoD from
PLAN.md.
Dependency: PR 8, PR 10
Files (expected):
IntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceServer.csIntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceSession*.csIntelligenceX.Chat/IntelligenceX.Chat.Tooling/ToolPackBootstrap*.csIntelligenceX.Chat/IntelligenceX.Chat.App/MainWindow*.csPLAN.mdPLAN-EXECUTION-ORDER.md
Checklist:
- Move tooling bootstrap out of per-connection session constructor into reusable runtime cache/lifecycle.
- Keep startup status truthful: do not surface "ready" semantics until metadata/auth probes settle or explicitly fail-open with reason.
- Replace hardcoded known-pack bootstrap chain with descriptor/manifest-driven registration.
- Remove/rename fallback-era host-hint file so architecture guardrails match current source layout.
- Add regression tests for reconnect warm path and multi-turn follow-up carryover against host scope changes.
Files (expected):
IntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceSession.ToolRouting.DomainIntentAffinity.csIntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceSession.ChatRouting.NoExtractedFinalize.csIntelligenceX.Chat/IntelligenceX.Chat.App/MainWindow.Messaging.Connection.csIntelligenceX.Chat/IntelligenceX.Chat.App/MainWindow.StartupReadiness.csIntelligenceX.Chat/IntelligenceX.Chat.Tests/ChatServiceRoutingTrimTests.*
Checklist:
- Escape continuation subset reuse for language-neutral tool-capability question turns (even without explicit tool-id literals) to restore cross-pack follow-up awareness.
- Extend startup status/debug timeline with churn cause labels (
auth_wait,pipe_retry,metadata_retry,runtime_disconnect) so reconnect loops are diagnosable from UI alone. - Add finalize-path regression that proves contextual follow-up scope shifts are evaluated from raw user intent and cannot replay stale single-host next actions.
- Validate with targeted chat tests + catalog validation before PR open.
Files (expected):
IntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceToolingBootstrapCache.csIntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceSession*.csIntelligenceX.Chat/IntelligenceX.Chat.Tooling/ToolPackBootstrap*.csIntelligenceX.Chat/IntelligenceX.Chat.Abstractions/*IntelligenceX.Chat/IntelligenceX.Chat.Tests/*
Checklist:
- Introduce persisted descriptor/capability snapshot contract that does not require full pack activation to answer startup-safe metadata requests.
- Make
hello,list_tools, and app first-paint able to use descriptor-preview state without pretending full activation has completed. - Split startup telemetry/status into
descriptor_discovery,descriptor_cache_hit,pack_activation, andregistry_activation_finalize. - Add zero-pack, plugin-only, and persisted-preview startup tests.
- Add versioned snapshot compatibility tests and fail-open diagnostics for stale preview payloads.
Files (expected):
IntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceSession.ProfilesAndModels.csIntelligenceX.Chat/IntelligenceX.Chat.Tooling/ToolPackBootstrap.csIntelligenceX.Chat/IntelligenceX.Chat.Tooling/ToolPackBootstrap.RegistryAndReflection.csIntelligenceX.Chat/IntelligenceX.Chat.Tooling/PluginFolderToolPackLoader*.csIntelligenceX.Chat/IntelligenceX.Chat.Tests/*
Checklist:
- Refactor cache-key validation so persisted preview reuse does not require full discovery fingerprint recomputation on the warm path.
- Filter disabled packs before constructor/reflection-heavy activation work for built-in and plugin packs.
- Make workspace
binprobing and similar repair-oriented scans opt-in for dev/bootstrap recovery only. - Add startup profiler assertions proving pack disablement reduces warm-path activation cost.
- Add regression coverage for plugin roots/manifests changing independently from activation cache contents.
Files (expected):
IntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceSession*.csIntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceServer.csIntelligenceX.Chat/IntelligenceX.Chat.App/MainWindow*.csIntelligenceX.Chat/IntelligenceX.Chat.Tests/*
Checklist:
- Add activation-on-demand for descriptor-known packs/tools when a turn first selects a capability area that is not yet activated.
- Emit truthful status transitions while activation is pending so runtime does not claim full readiness early.
- Preserve deterministic routing/retry behavior across descriptor-only and activated states.
- Add first-activation latency and activation-failure regression tests.
Files (expected):
IntelligenceX.Chat/IntelligenceX.Chat.Abstractions/RuntimeSelfReportTurnClassifier.csIntelligenceX.Chat/IntelligenceX.Chat.App/ConversationTurnShapeClassifier.csIntelligenceX.Chat/IntelligenceX.Chat.App.Tests/*IntelligenceX.Chat/IntelligenceX.Chat.Tests/*
Checklist:
- Remove English runtime cue-word control flow from app/shared classifiers while preserving natural English prompt text/examples.
- Keep routing/safety/execution decisions structure-first and contract-backed instead of literal phrase driven.
- Add multilingual runtime-introspection/capability tests that do not rely on borrowed English nouns.
- Add negative tests proving literal confirmations like
go aheaddo not become privileged routing signals without structured pending-action state.
- Track A can run PR 2, PR 7, PR 9, PR 10.
- Track B can run PR 3, PR 4, PR 5, PR 6.
- Track C can continuously update tests/docs in PR 1 and PR 11.
- Track D can run PR 14, PR 15, and PR 16 once current bootstrap contracts stay stable.
- Track E can run PR 17 in parallel with PR 14/15 because it is mostly classifier/test surface work.
- Critical merge gate before lazy activation rollout: PR 14 + PR 15 must be merged.
- Critical merge gate before full release closeout: PR 16 + PR 17 must be merged alongside final DoD validation.
chore/chat-tools-contract-adrtest/chat-fallback-guardrailsfeat/tool-contract-role-setup-handoff-recoveryfeat/chat-orchestration-catalogrefactor/chat-preflight-contract-rolerefactor/chat-routing-contract-taxonomyfeat/tools-wave1-ad-dd-dnsx-contractsrefactor/chat-remove-pack-fallback-enginefeat/tools-wave2-pack-contractsrefactor/tools-typed-surface-enforcementchore/chat-tools-decoupling-cleanupfeat/chat-descriptor-preview-startupperf/chat-warm-path-bootstrap-costfeat/chat-lazy-pack-activationrefactor/chat-language-neutral-classifiers
- Checkpoint A (after PR 4): catalog live, no behavior deletion yet.
- Checkpoint B (after PR 6 + PR 7): contract-driven selection/preflight proven in tests.
- Checkpoint C (after PR 8): Chat fallback engine removed.
- Checkpoint D (after PR 14): descriptor-preview startup is live and truthful.
- Checkpoint E (after PR 15): warm-path bootstrap budgets improve and disabled packs reduce startup cost.
- Checkpoint F (after PR 16 + PR 17): lazy activation plus language-neutral classifier cleanup are green.
- Checkpoint G (after final cleanup): all DoD checks complete.