Chat/Tools Decoupling Execution Order

Goal

Execute PLAN.md in small, merge-safe increments with clear dependencies, parallel work, and explicit stop points.

Progress Update (2026-03-02)

  • PR #985 merged (2f649d62d164185755c19e33906a7064ae4ff132): contract-first pack toggles, startup bootstrap visibility, and migration hardening are now on master.
  • PR #986 merged (4db3931ac8cbb753da7ff160311a4f8e0d6904a3): removed planner prompt pack-hint inference (pack/pack_aliases) to keep Chat planner context generic.

Audit Update (2026-03-03)

  • Decoupling cleanup: AD domain guardrail user hint text no longer hardcodes tool ids (ad_scope_discovery/ad_domain_controllers).
  • Stabilization hotfix: finalize-time host structured next-action replay now rejects stale single-host replays when user/assistant host hints indicate different or multi-host scope.
  • Stabilization hotfix: finalize-time scope-shift guard now evaluates raw user intent (instead of routed rewrite payload), reducing stale AD0 replay on contextual compact follow-ups.
  • Stabilization hotfix: structured-next-action carryover replay now blocks stale self-loop replays and host-hint-conflicting replays.
  • Stabilization hotfix: startup deferred metadata no longer skips metadata sync solely due to initial unauthenticated state.
  • Stabilization hotfix: bootstrap progress status can publish while connected startup metadata sync is still active.
  • Closed migration gap: ToolPackBootstrap now performs descriptor-driven built-in pack discovery (no hardcoded per-pack bootstrap chain).
  • Closed migration gap: host-hint helpers moved to ChatServiceSession.HostHints.cs (fallback-era naming removed).
  • Closed startup perf gap (warm path): server-scoped tooling bootstrap snapshot cache now avoids repeated full bootstrap during reconnect/session churn.
  • Stabilization hotfix: carryover structured-next-action replay now accepts compact non-question follow-ups without requiring continuation-expansion text rewrites.
  • Stabilization hotfix: duplicate final chat_result publishes are suppressed per request/thread/text at the service writer boundary.
  • Stabilization hotfix: connected session status now stays in startup-pending mode until metadata/tool-pack readiness settles.
  • Stabilization regression: added end-to-end two-turn go ahead carryover replay test to prevent follow-up execution stalls.
  • Stabilization hotfix: compact follow-up question turns no longer force blocker/cached-evidence finalize rewrites, preserving direct tool-capability answers.
  • Stabilization hotfix: cached evidence fallback now requires explicit tool-name match when request text references a concrete tool id.
  • Stabilization hotfix: deferred startup metadata sync now waits for authenticated state and is re-queued after login completion.
  • Stabilization hotfix: host structured next-action auto-replay now blocks same-tool/same-arguments self-loops (next_action_self_loop) to prevent repeated AD0-style churn.
  • Stabilization hotfix: carryover structured-next-action auto-replay now suppresses repeated identical tool+args replays until fresh context is provided (or explicit host pin matches).
  • Stabilization hotfix: carryover host-hint mismatch detection now consumes assistant-draft host targets as contextual hints, blocking stale single-host replay after multi-host continuation guidance.
  • Stabilization hotfix: carryover host-hint gating now rejects single-host auto-replay whenever follow-up context contains multi-host hints (including mixed stale/fresh host mentions).
  • Stabilization hotfix: carryover replay freshness/scope-shift guards now read raw user request text (assistant draft host hints remain mismatch-only), preventing repeated AD0 replay caused by assistant-draft host echo.
  • Stabilization hardening: carryover replay now uses separate replay-intent and host-hint inputs (no in-band text marker coupling), reducing marker-collision risk while keeping assistant-draft mismatch safeguards.
  • Startup stabilization hotfix: transient reconnects now preserve interactive auth state when appropriate (authenticated or login-in-progress without explicit unauthenticated probe), reducing sign-in/connect churn.
  • Startup visibility hotfix: connect stage now publishes per-attempt retry/timeout/delay progress status during pipe-connect retries.
  • Validation checkpoint: analyze validate-catalog currently reports pass (0 error(s), 0 warning(s)) on this branch.
  • Stabilization hotfix: contextual follow-up detection now reads the Follow-up: tail from legacy continuation expansion before carryover replay decisions.
  • Stabilization cleanup: removed standalone lowercase ad lexical alias auto-routing from domain-intent signal resolution.
  • Stabilization hotfix: continuation subset reuse now skips when follow-up explicitly references a tool outside the remembered subset, enabling fresh cross-pack tool routing.
  • Startup visibility hotfix: header chip diagnostics now maintain a bounded runtime lifecycle timeline (tooltip + debug panel) across connect/auth/bootstrap transitions.
  • Stabilization hotfix: routing-meta activity timeline labels now include strategy + selected/total tool counts for explicit route-stage observability.
  • Stabilization hotfix: explicit tool-capability questions now bypass finalize-time execution-blocker cached-evidence substitution, preventing stale evidence fallbacks on tool_name? clarification turns.
  • Stabilization hotfix: carryover structured-next-action auto-replay now skips compact contextual scope-shift follow-ups (for example "other DCs"), forcing fresh routing instead of stale single-host replay.
  • Startup perf hotfix: plugin duplicate packs are now skipped via loaded-assembly fast-path before dependency preload/reflection, cutting avoidable bootstrap latency on first metadata sync.
  • Stabilization hotfix: explicit tool-id follow-ups now bypass pending-action/carryover auto-replay rewrites, and escaped Markdown tool ids (for example eventlog\_evtx\_query) are honored by cached-evidence gating.
  • Startup resilience hotfix: deferred metadata sync phases now retry once on transient disconnects (hello/list_tools/auth_refresh), reducing cold-start states where runtime is connected but catalog/policy sync is missing.
  • Startup/turn UX hotfix: final assistant replacement now targets the latest assistant row when only System/Tools rows followed (no newer user turn), reducing duplicate assistant finals across reconnect/retry churn.
  • Stabilization hotfix: carryover structured-next-action replay now evaluates compact follow-up eligibility from raw user text (instead of routed payload rewrite text), restoring queued go ahead follow-up execution.
  • Stabilization hotfix: domain-intent payload parsing now safely handles invalid UTF-16 input (ArgumentException + JsonException) to avoid compact follow-up expansion crashes.
  • Contract-alignment cleanup: updated routing/output lifecycle tests to match strict routing metadata requirements and one-meaningful-final-per-request output policy.
  • Startup stability hotfix: deferred startup metadata sync now reruns when login success arrives during an in-flight metadata sync, preventing dropped post-login hello/list_tools/auth_refresh refreshes and stale tool visibility.
  • Stabilization hotfix: continuation subset follow-ups now treat escaped Markdown tool ids as explicit tool references, preventing stale subset reuse when users switch to tools outside the remembered subset.
  • Stabilization regression coverage: startup metadata rerun scheduling now includes dedicated dispatch-gating tests for shutdown/connectivity safety.
  • Stabilization hotfix: quoted/multiline tool-descriptor references now keep explicit tool-capability routing (no finalize-time cached-evidence rewrite), and explicit tool-id extraction now strips invisible Unicode format chars to keep descriptor parsing robust.
  • Startup/dispatch stabilization hotfix: app turn dispatch startup/send transitions now claim lifecycle state atomically under lock, preventing sign-in/manual-send double-dispatch races and duplicate assistant finals.
  • Stabilization hotfix: contextual compact follow-up questions now block stale single-host carryover replay when thread evidence is multi-host, while short acknowledgement questions stay replay-eligible.
  • Startup UX hotfix: login-completed status now queues deferred startup metadata sync before publishing connected status, avoiding transient ready-state flicker while startup sync is still pending.
  • Startup UX hotfix: bootstrap progress emitted while already connected now keeps the "Runtime connected..." phrasing (plus cause metadata_sync) instead of regressing to the "Starting runtime..." wording.
  • Stabilization regression coverage: finalize host scope-shift user-request resolution now has explicit tests proving raw user intent takes precedence over routed rewrite text.
  • Live strict scenario validation: ad-ad0-then-all-dcs-followthrough-10-turn passes end-to-end with cross-DC fanout and strict call/output pairing.
  • Live strict scenario validation: ad-eventlog-tool-capability-followthrough-10-turn passes end-to-end and explicitly blocks cached-evidence fallback responses for direct eventlog_evtx_query capability questions.
  • Stabilization hotfix: domain-intent action catalog now preserves all declared same-family action ids as valid /act aliases independent of definition order; canonical family action ids are deterministic and ambiguous cross-family ids do not use first-wins suppression.
  • Live strict scenario validation: transcript-derived ad-other-dcs-go-ahead-followthrough-10-turn passes end-to-end, covering continuation-style go ahead execution across multiple DC hosts and explicit eventlog_evtx_query capability follow-ups.
  • Scenario-contract hardening: host scenario contracts now support forbidden tool-input values (forbid_tool_input_values / forbidden_tool_inputs) and enforce them during retry repair, fallback host patching, and assertion evaluation.
  • Transcript guardrail hardening: ad-other-dcs-go-ahead-followthrough-10-turn continuation turns now include explicit non-AD0 host exclusions, and catalog strictness tests lock those exclusions.
  • Transcript-derived strict scenario seed added: ad-domainwide-reboot-followthrough-10-turn (AD0 reboot baseline -> non-AD0 domain-wide continuation + explicit eventlog_evtx_query capability question + DNS cross-pack turn).
  • Forbidden-input equivalence hardening: scenario contract enforcement now treats short-host and FQDN forms as equivalent when applying forbidden host targets (for example AD0 blocks AD0.ad.evotec.xyz) across repair/fallback/assertion paths.
  • Live strict rerun passed: ad-domainwide-reboot-followthrough-10-turn now completes 10/10 turns with non-AD0 continuation turns preserving host exclusions after input-repair fallback.
  • Startup/send race hardening: manual resend now skips enqueue when an equivalent queued-after-login prompt is already in-flight, reducing duplicate assistant replies after switch-account recovery.
  • Transcript phrase lock-in: strict cross-DC follow-through scenarios now include "those are correct DCs, go ahead" continuation wording to exercise replay suppression under real-world follow-up phrasing.
  • Stabilization hotfix: domain host-scope guardrail now blocks stale single-host AD-scope replay on compact scope-shift follow-ups when thread evidence is multi-host, unless a single host is explicitly pinned by the user.
  • Stabilization regression coverage: domain host-scope guardrail now has explicit compact scope-shift replay tests (block stale replay, allow explicit host pin, allow short acknowledgement question).
  • Live strict validation rerun: transcript-derived follow-through scenarios (ad-other-dcs-go-ahead-followthrough-10-turn, ad-domainwide-reboot-followthrough-10-turn, ad-ad0-then-all-dcs-followthrough-10-turn) pass end-to-end (10/10) after this hardening.
  • Stabilization hotfix: ad_monitoring_probe_run ADWS port normalization now keeps default 9389 when non-positive port is supplied, preventing false endpoint probes on :1.
  • Transcript-derived strict scenario added: ad-ldap-go-ahead-followthrough-8-turn to lock continuation execution from scope confirmation into explicit LDAP diagnostics after compact go ahead.
  • Live strict validation: ad-ldap-go-ahead-followthrough-8-turn passes end-to-end (8/8) and asserts ADWS endpoint probes do not regress to :1/ActiveDirectoryWebServices.
  • Startup/send dedupe hardening: queued-after-login suppression now treats both-missing-conversation-id startup prompts as equivalent when normalized text matches, reducing duplicate post-login greeting replies.
  • Startup/send regression coverage expanded: queue dedupe tests now include explicit both-missing-conversation-id cases for in-flight queued-after-login manual-resend suppression.
  • Stabilization hotfix: weighted/planner subset routing now retains explicitly requested tool ids (including escaped markdown ids) in candidate selection, preventing false "tool inactive" responses for registered tools during follow-up turns.
  • Regression coverage expanded: planner/routing tests now lock explicit escaped tool-id retention in weighted subset selection and ensure planner minimum-selection backfill replaces non-explicit tools at limit when needed.
  • Live strict validation rerun: ad-eventlog-tool-capability-followthrough-10-turn passes end-to-end (10/10) after explicit tool-id subset retention hardening.
  • Scenario-contract clarity hardening: host scenario execution contracts/retry prompts now emit forbidden input directives as not-in [..] (parser remains backward-compatible with legacy != syntax) to avoid non-AD0 constraint inversion during repair.
  • Transcript replay guardrail scenario added and validated: ad-other-dcs-transcript-replay-guardrail-10-turn passes 10/10, proving cross-DC continuation execution and explicit non-AD0 follow-up behavior under transcript wording.
  • Transcript fanout guardrail scenario added and validated: ad-c400-transcript-cross-dc-fanout-10-turn passes 10/10, proving explicit non-AD0 4-host fanout after continuation wording that previously regressed into AD0-only replay loops.
  • Startup visibility hardening: startup/connect/reconnect status text now emits structured context tokens (phase startup_*, cause ...) and connected bootstrap rewrites legacy cause-only suffixes into phase+cause form, so "runtime connected" no longer hides in-flight startup work.
  • Stabilization hotfix: no-text tool-output synthesis retry now runs only for review-loop, non-redacted turns with at least one successful tool output; deterministic fallback handles redacted/tool-failure paths without an extra model round.
  • Follow-through quality hardening: no-text synthesis prompts now include compact executed tool-argument context (generic key/value summaries) to keep target/scope details available during retry synthesis.
  • Startup UX hardening: header status chip fallback now consumes structured startup phase/cause context to render compact in-progress labels ("Loading tool packs", "Sign in to continue loading tool packs", "Starting runtime (retrying connection)").
  • Live strict rerun checkpoint after this batch: ad-c400-transcript-cross-dc-fanout-10-turn (10/10), ad-eventlog-tool-capability-followthrough-10-turn (10/10), ad-ldap-go-ahead-followthrough-8-turn (8/8) all pass.
  • Host fallback decoupling cleanup: removed host runtime hardcoded tool-specific retry transforms (ApplyAdDiscoveryRootDseFallback, ApplyAdReplicationProbeFallback, ApplyDomainDetectiveSummaryTimeoutFallback) and added architecture guardrail coverage to block reintroduction.
  • Typed-surface guardrail expansion: SourceGuardrailTests now scans typed-pipeline tool wrappers pack-wide and fails if refactored tools reintroduce ad-hoc arguments?.Get.../arguments.Get... parsing.
  • Typed-envelope increment: ad_scope_discovery migrated to ToolResultV2 success/error envelope path and included in typed-wrapper guardrail enforcement list.
  • Typed-envelope base hardening: ActiveDirectoryToolBase* shared helpers now emit ToolResultV2 envelopes and are protected by guardrail coverage preventing direct ToolResponse regressions.
  • Startup runtime-connect visibility increment: service emits [startup] provider_connect_progress phase/status/elapsed telemetry for runtime-provider connect attempts, and app status parsing publishes those lines (including send-time override) so first-turn connect stalls are diagnosable.
  • Decoupling guardrail increment: Chat architecture tests now block hardcoded tool-pack ids from reappearing in runtime app/service source (testimox, active_directory, adplayground, domaindetective, dnsclientx, reviewer_setup).
  • Contract-first routing increment: routing-scoring pack hints now derive from explicit routing contract pack ids only (removed ToolSelectionMetadata.TryResolvePackId(...) fallback), and architecture guardrails lock this behavior.
  • Scenario-contract reliability increment: ad-eventlog-tool-capability-followthrough-10-turn availability assertion now accepts semantic wording variants (eventlog or event log) to avoid non-behavioral phrase drift failures.
  • Live strict rerun validation: ad-eventlog-tool-capability-followthrough-10-turn passes end-to-end (10/10) after contract-only routing-hint cleanup and scenario assertion hardening.
  • Contract-first routing decoupling increment: removed Chat-side hardcoded compound-pack token heuristic (ToolSelectionMetadata.IsKnownCompoundPackRoutingCompact) from routing tokenization and added architecture guardrail coverage.
  • Live strict rerun validation: transcript follow-up guardrail scenarios stay green after compound-token heuristic removal (ad-eventlog-tool-capability-followthrough-10-turn 10/10, ad-other-dcs-transcript-replay-guardrail-10-turn 10/10).
  • Stabilization hardening: compact continuation recovery now treats linked structured deferred-execution drafts as execution-nudge eligible (language-neutral shape checks), preventing go ahead turns from ending on evidence-only summaries with zero in-turn tool activity.
  • Live strict rerun validation: ad-ldap-go-ahead-followthrough-8-turn passes end-to-end (8/8) after compact-follow-up structured-draft recovery hardening.
  • Regression fix (2026-03-04): no-text finalize path no longer triggers an extra synthesis model request for redacted/tool-failure turns; end-to-end regressions restored (RunChatOnCurrentThreadAsync_DoesNotAutoSwitchPacksAfterToolFailure, RunChatOnCurrentThreadAsync_RedactsToolOutputRecoveryFallbackWhenRedactionEnabled).
  • Live strict rerun validation (2026-03-04): ad-c400-transcript-cross-dc-fanout-10-turn (10/10), ad-eventlog-tool-capability-followthrough-10-turn (10/10), and ad-ldap-go-ahead-followthrough-8-turn (8/8) pass after the no-text recovery gating fix.
  • Startup UX wording hardening (2026-03-04): Shell header status now rewrites the generic connected-but-unauthenticated "Sign in to continue" status into "Sign in to continue loading tool packs" while startup tools-loading is pending.
  • Language-neutral strict scenario increment (2026-03-04): added transcript-derived Polish scenario ad-pl-eventlog-capability-followthrough-10-turn to lock AD-to-EventLog capability follow-through without cached-evidence/no-tool fallback regressions.
  • Live strict validation (2026-03-04): ad-pl-eventlog-capability-followthrough-10-turn passes end-to-end (10/10) with strict tool call/output pairing and no duplicate call/output ids.
  • Startup visibility hardening (2026-03-04): send-safe bootstrap progress phases now publish during startup turn/send waits, and connected sessions continue surfacing send-safe startup statuses even when metadata-sync flags have transient lag.
  • Contract-first domain-intent hardening (2026-03-04): removed raw tool-name domain-family fallback from ToolRouting.ResolveDomainIntentFamily(string toolName) so family resolution relies on registered definition/catalog contracts; architecture guardrail now enforces this method-level boundary.
  • Transcript-snippet scenario hardening (2026-03-04): updated ad-pl-eventlog-capability-followthrough-10-turn with the original multiline descriptor turn (eventlog_evtx_query · Event Log (EventViewerX) ...) that previously regressed into cached-evidence fallback.
  • Live strict rerun validation (2026-03-04): ad-pl-eventlog-capability-followthrough-10-turn passes end-to-end (10/10) after descriptor-snippet hardening, with no cached-evidence fallback output on the explicit capability turn.
  • Documentation increment (2026-03-04): published InternalDocs/agent-playbooks/chat-pack-contract-first-onboarding.md with zero-Chat-edits onboarding flow and plugin contract schema examples.
  • Language-neutral regression checkpoint (2026-03-04): ChatServiceRoutingTrimTests suite passes (709/709), covering Unicode ordinal parsing and compact follow-up routing behaviors.
  • Routing heuristic-removal checkpoint (2026-03-04): no Chat service domain-routing paths call raw tool-name family inference (TryResolveDomainIntentFamily(toolName, ...)); domain routing is now contract-first, with definition metadata as the only fallback.
  • Decision checkpoint (2026-03-04): strict pack-boundary isolation is considered complete and locked by cross-pack orchestration catalog tests (no inferred ADPlayground↔DomainDetective handoff without explicit contracts).
  • Typed-surface adapter increment (2026-03-04): introduced ToolRequestAdapter<TRequest> and ToolBase adapter overload in IntelligenceX.Tools.Common, then migrated DnsClientXPackInfoTool and DomainDetectivePackInfoTool to use the adapter path with dedicated unit coverage.
  • Typed-surface adapter increment (2026-03-04): migrated DomainDetectiveChecksCatalogTool, ReviewerSetupPackInfoTool, and ReviewerSetupContractVerifyTool to typed binder/adapter pipelines using ToolResultV2 envelopes; added source guardrails to keep these wrappers free of raw arguments?.Get* parsing.
  • Typed-surface backfill increment (2026-03-04): migrated SystemBitlockerStatusTool, SystemInstalledApplicationsTool, SystemNetworkAdaptersTool, SystemPatchComplianceTool, EventLogChannelListTool, and EventLogProviderListTool to typed pipeline binders; source guardrails now enforce typed binder usage across migrated non-AD pack folders.
  • Documentation cleanup increment (2026-03-04): updated ADR wording (adr-0001-chat-tools-contract-boundary.md) so Chat cross-pack fallback references are historical/current-state accurate after fallback-engine removal.
  • Fallback-marker cleanup checkpoint (2026-03-04): verified Chat runtime service source contains no legacy cross-pack fallback markers/constants; active projection-fallback diagnostics (projection_fallback_*) are retained intentionally for view-repair transparency.
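
Several audit entries above describe blocking same-tool/same-arguments replays (next_action_self_loop) until fresh context arrives. The following is a minimal Python sketch of that idea only, not the repository's C# implementation. The `NextAction` type, `action_key`, `replay_block_reason`, and the `has_fresh_context` flag are hypothetical names introduced for illustration; only the `next_action_self_loop` reason string comes from the entries above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NextAction:
    """Hypothetical structured next-action: a tool name plus its arguments."""
    tool: str
    arguments: dict


def action_key(action: NextAction) -> str:
    # Canonical JSON (sorted keys) makes the comparison independent of argument order.
    return action.tool + "|" + json.dumps(action.arguments, sort_keys=True)


def replay_block_reason(candidate: NextAction,
                        last_executed: Optional[NextAction],
                        has_fresh_context: bool) -> Optional[str]:
    """Return a block reason, or None when the replay may proceed."""
    if last_executed is None:
        return None  # nothing ran before, so there is no loop to detect
    if action_key(candidate) == action_key(last_executed) and not has_fresh_context:
        return "next_action_self_loop"
    return None
```

Comparing a canonical key rather than object identity is one way to catch replays whose arguments were rebuilt in a different order; the host-hint and scope-shift guards listed above would be separate checks layered on top.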

Planning Extension (2026-03-25)

  • Startup preview truthfulness hardening (2026-03-29): persisted descriptor-preview startup now surfaces explicit descriptor-preview telemetry/cache-mode/status detail instead of replaying stale full-bootstrap timings from older live cache payloads.
  • Startup phase telemetry rename (2026-03-29): live bootstrap now reports descriptor_discovery, pack_activation, and registry_activation_finalize phases, and app-facing summary/detail text now matches that descriptor-first lifecycle.
  • Startup disabled-pack activation trim (2026-03-29): disabled known built-ins now publish descriptor-only availability without entering raw pack_load_progress activation steps on the default bootstrap path, so pack toggles reduce bootstrap churn instead of only hiding the pack later.
  • Versioned descriptor snapshot contract (2026-03-29): persisted tooling bootstrap cache now writes and prefers an explicit versioned descriptor/capability snapshot payload for preview-safe metadata responses, with legacy fallback and fail-open diagnostics when nested descriptor snapshot schema drifts.
  • Workspace output probing hardening (2026-03-29): built-in pack assembly resolution now respects EnableWorkspaceBuiltInToolOutputProbing on the actual load path, keeping workspace/project-output fallback strictly opt-in for dev/bootstrap-repair scenarios.
  • Built-in discovery metadata-first refactor (2026-03-29): default built-in assembly discovery now starts from known built-in descriptor metadata and only falls back to dependency-context / embedded-manifest reads for compatibility, reducing coupling to the embedded manifest as a primary source.
  • Descriptor-first first-paint startup (2026-03-29): startup and post-connect app flows now accept persisted-preview hello policy snapshots as enough metadata for first paint, seed plugin metadata from hello, and defer final list_tools replacement work to the existing background refresh path.
  • Activation-on-demand status truthfulness (2026-03-29): deferred chat pack activation now emits explicit in-progress routing status before descriptor-matched or handoff-target pack activation begins, so users see activation while it is in progress rather than only after live schemas are ready.
  • Phase-based bootstrap reporting compatibility (2026-03-29): app and shell bootstrap summaries now resolve canonical descriptor_discovery, pack_activation, and registry_activation_finalize timings from phase telemetry instead of depending on legacy packLoadMs/packRegisterMs field names.
  • Startup release-gate coverage expansion (2026-03-29): app-side tests now lock zero-pack startup sparsity, plugin-only persisted-preview metadata/snapshot projection, and descriptor-first shell summary labels without relying on the currently blocked shared Chat test project.
  • Known built-in pre-activation filtering (2026-03-29): default startup now excludes disabled known built-in assemblies from live discovery before assembly/type reflection work, while still publishing descriptor-only disabled availability and truthful on-demand disabled-pack results.
  • Workspace probing opt-in hardening (2026-03-29): workspace/project-output assembly fallback now remains behind explicit opt-in/test-only workspace-root helpers, and known built-ins selected without a resolvable trusted path now surface unavailable availability instead of disappearing from on-demand activation results.
  • Structured-only runtime self-report checkpoint (2026-03-29): app/shared runtime-introspection classifiers now require trusted ix:runtime-self-report:v1 metadata instead of inferring runtime mode from raw model / tools cue words, with legacy lexical-fallback prompt behavior covered only through explicit precomputed-analysis tests.
  • Multilingual prompt-mode release-gate checkpoint (2026-03-29): app tests now lock non-English broad capability asks into assistant_capability_question mode and keep plain cue-word runtime asks out of implicit runtime self-report mode unless trusted runtime directive metadata is present.
  • Literal-confirmation routing-prelude checkpoint (2026-03-29): shared Chat tests now prove bare "go ahead" / "go ahead?" inputs do not gain structured continuation context or compact-follow-up treatment unless real pending-action or continuation state is already present.
  • Shape-based capability boundary checkpoint (2026-03-29): app capability-question classification now relies on open-ended question shape instead of exact English internal-noun blockers, with focused regressions proving broad capability asks still enter assistant_capability_question mode while short runtime/inventory asks stay out.
  • Obsolete runtime cue-catalog removal checkpoint (2026-03-30): deleted the unused RuntimeSelfReportCueCatalog English noun blocker and the shared host test that froze it, leaving runtime/capability handling on the newer shape/directive paths only.
  • Contract-backed host-target hint checkpoint (2026-03-30): host retry/scenario prompts now require concrete thread-sourced host target values before reusing them and otherwise instruct the model to ask for the minimal missing target input instead of inferring or defaulting a host from prose.
  • Known-host hint consolidation checkpoint (2026-03-30): removed duplicate “remaining discovered DCs/hosts” prose from host correction prompts and kept distinct-target guidance only inside the structured known-host hint path.
  • Ordered known-host summary checkpoint (2026-03-30): the structured known-host hint now reports ordered distinct host/DC candidates from prior tool inputs instead of keeping a second narrative distinct-coverage rule in the prompt, with a focused regression for duplicate filtering and old-phrase removal.
  • No-text warning known-host checkpoint (2026-03-30): host no-text fallback warnings now include the same structured known-host/DC summary when prior thread targets exist, keeping recovery warnings aligned with the contract/locality hint surface instead of dropping concrete target context.
  • Canonical startup phase-duration contract checkpoint (2026-03-31): startup bootstrap telemetry now includes pre-resolved canonical descriptor/activation/finalize durations from C#, and the shell summary reads those canonical fields instead of maintaining its own legacy alias fallback mapping.
  • Persisted preview fingerprint alignment checkpoint (2026-03-31): tooling bootstrap cache now stores the deferred descriptor-preview fingerprint computed from bootstrap options, so persisted preview validation no longer rejects valid fast-path snapshots just because the live snapshot carried a fuller tool-definition surface.
  • Split tooling lifecycle into explicit descriptor_discovery and pack_activation phases with telemetry, tests, and release budgets for both.
  • Move pack enablement filtering ahead of activation work so disabled packs stop paying constructor/reflection cost on the default startup path.
  • Keep workspace/project-output assembly probing available only for explicit dev/bootstrap-repair modes, not on the default warm path.
  • Add activation-on-demand for descriptor-known packs/tools selected after preview startup.
  • Remove English cue-word routing from runtime/capability classification and replace it with structure-first + contract-backed behavior.
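
The first two planning items above (splitting the lifecycle into descriptor_discovery and pack_activation, and filtering disabled packs before activation work) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the actual bootstrap code: `PackDescriptor`, its `activate` callable, and the shape of the returned dictionary are hypothetical; only the two phase names come from the list above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class PackDescriptor:
    pack_id: str
    # Hypothetical stand-in for the expensive step (assembly load, reflection, ...).
    activate: Callable[[], List[str]]


def bootstrap(descriptors: List[PackDescriptor], enabled_ids: Set[str]) -> dict:
    phases: List[str] = []
    availability: Dict[str, str] = {}
    tools: List[str] = []

    # Phase 1: cheap, metadata-only. Every known pack gets an availability entry.
    phases.append("descriptor_discovery")
    for d in descriptors:
        availability[d.pack_id] = "enabled" if d.pack_id in enabled_ids else "disabled"

    # Phase 2: only enabled packs pay the activation cost.
    phases.append("pack_activation")
    for d in descriptors:
        if availability[d.pack_id] == "enabled":
            tools.extend(d.activate())

    return {"phases": phases, "availability": availability, "tools": tools}
```

In this sketch a disabled pack still appears in `availability`, so a UI could list it, but its `activate` callable is never invoked on the default path.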

Rules For This Migration

  • Keep each PR focused to one objective and one rollback boundary.
  • No net-new Chat hardcoded fallback logic during migration.
  • Prefer additive contracts first, then Chat rewiring, then deletion.
  • Every deletion PR must include equivalent or stricter tests.

PR Sequence

PR 0 - Baseline Governance (Docs + Tracker)

Files:

  • PLAN.md
  • PLAN-EXECUTION-ORDER.md
  • TODO.md
  • InternalDocs/architecture/adr-0001-chat-tools-contract-boundary.md

Checklist:

  • Add ADR for contract-first boundary.
  • Add migration tracker entries to TODO.md.
  • Confirm maintainers accept no-legacy direction (drop Chat fallback engine target).

PR 1 - Architecture Guardrails (Prevent Re-growth)

Files:

  • IntelligenceX.Chat/IntelligenceX.Chat.Tests/ChatFallbackArchitectureGuardrailTests.cs

Checklist:

  • Add test to freeze current TryBuildCross*Fallback* method set (only allowed to shrink).
  • Add test to block PackIdMatches(...) usage outside legacy fallback file.
  • Add test to freeze hardcoded pack-id set in legacy fallback file.
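
A "only allowed to shrink" guardrail of the kind listed above can be approximated by scanning source text for method names and comparing them with a frozen allowlist. The sketch below is in Python for illustration; the real test lives in a C# test project, and the allowlist entries shown here are invented placeholders rather than actual method names from the codebase.

```python
import re

# Hypothetical frozen allowlist. A real guardrail would list the methods that
# exist today; entries may be removed over time but never added.
ALLOWED_FALLBACK_METHODS = frozenset({
    "TryBuildCrossPackFallbackAlpha",
    "TryBuildCrossPackFallbackBeta",
})

# Matches identifiers shaped like TryBuildCross*Fallback*.
FALLBACK_METHOD_PATTERN = re.compile(r"\bTryBuildCross\w*Fallback\w*\b")


def find_fallback_methods(source_text: str) -> set:
    """Collect every TryBuildCross*Fallback* identifier found in the text."""
    return set(FALLBACK_METHOD_PATTERN.findall(source_text))


def unexpected_fallback_methods(source_text: str,
                                allowed=ALLOWED_FALLBACK_METHODS) -> set:
    """Names present in the source but absent from the frozen allowlist."""
    return find_fallback_methods(source_text) - allowed
```

A test would then assert that `unexpected_fallback_methods(...)` is empty. Deleting a method keeps the test green, while introducing a new one fails it.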

PR 2 - Contract Surface Expansion (Tools Core)

Files (expected):

  • IntelligenceX/Tools/ToolRoutingContract.cs
  • IntelligenceX/Tools/ToolDefinition.cs
  • IntelligenceX/Tools/ToolRegistry.cs
  • New contract files in IntelligenceX/Tools and IntelligenceX.Tools.Common

Checklist:

  • Add role/setup/handoff/recovery contract types.
  • Wire validation in ToolRegistry.
  • Keep backward-compatible defaults only temporarily, and only if the staged migration needs them.

Dependency: PR 1
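
To make the PR 2 checklist concrete, here is a small Python sketch of contract types with registry-side validation. It is an assumption-laden illustration, not the C# contract surface: `RoutingContract`, `Definition`, `Registry`, the role vocabulary, and the `allow_missing_contract` switch (standing in for the temporary backward-compatible default) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical role vocabulary; the real contract types may differ.
KNOWN_ROLES = frozenset({"operational", "pack_info", "environment_discover"})


@dataclass(frozen=True)
class RoutingContract:
    pack_id: str
    role: str
    handoff_targets: tuple = ()  # tool names this tool may hand off to
    recovery_tools: tuple = ()   # tool names to try after a failure


@dataclass(frozen=True)
class Definition:
    name: str
    contract: Optional[RoutingContract] = None


class Registry:
    def __init__(self, allow_missing_contract: bool = False):
        # Models the temporary backward-compatible default during staged migration.
        self.allow_missing_contract = allow_missing_contract
        self.tools: Dict[str, Definition] = {}

    def validate(self, definition: Definition) -> List[str]:
        contract = definition.contract
        if contract is None:
            return [] if self.allow_missing_contract else ["missing contract"]
        errors = []
        if not contract.pack_id:
            errors.append("empty pack_id")
        if contract.role not in KNOWN_ROLES:
            errors.append(f"unknown role: {contract.role}")
        return errors

    def register(self, definition: Definition) -> None:
        errors = self.validate(definition)
        if errors:
            raise ValueError(f"{definition.name}: {'; '.join(errors)}")
        self.tools[definition.name] = definition
```

Validating at registration time means a malformed contract fails early, before any routing code has to guess at missing metadata.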

PR 3 - Contract Diagnostics + UI Transparency

Files (expected):

  • IntelligenceX.Chat/IntelligenceX.Chat.Tooling/ToolRoutingCatalogDiagnostics.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceSession.PolicyAndTypes.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.App/Ui/Shell.10.core.js

Checklist:

  • Extend diagnostics with new contract health fields.
  • Expose diagnostics in session policy payload.
  • Add runtime policy toggle (--require-explicit-routing-metadata) with profile persistence and session policy exposure.
  • Render new health signals in UI policy panel.

Dependency: PR 2

PR 4 - Orchestration Catalog In Chat

Files (expected):

  • New catalog builder in IntelligenceX.Chat.Tooling or IntelligenceX.Chat.Service
  • ChatServiceSession.ProfilesAndModels.cs
  • ChatServiceSession.cs

Checklist:

  • Build runtime orchestration catalog from registry definitions/contracts.
  • Replace direct suffix/prefix pack derivation consumers with catalog lookups where possible.
  • Keep behavior equivalent before deleting legacy code.

Dependency: PR 2

PR 5 - Preflight Contractization

Files:

  • ChatServiceSession.PackPreflight.cs
  • Related tests in IntelligenceX.Chat.Tests

Checklist:

  • Replace _pack_info / _environment_discover suffix selection with role metadata selection.
  • Keep existing semantics for required-arg checks and remembered successful preflight calls.
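
The move from suffix matching to role metadata might look like the sketch below. It is written in Python for brevity, and the `ToolDefinition` fields and role identifiers are assumed names that simply mirror the suffixes being replaced.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToolDefinition:
    """Hypothetical slice of a tool definition; only what preflight selection needs."""
    name: str
    pack_id: str
    role: str


# Assumed role identifiers, mirroring the _pack_info / _environment_discover suffixes.
PREFLIGHT_ROLES = ("pack_info", "environment_discover")


def select_preflight_tools(definitions: List[ToolDefinition], pack_id: str) -> List[str]:
    """Pick preflight tools by declared role, ordered as in PREFLIGHT_ROLES."""
    selected: List[str] = []
    for role in PREFLIGHT_ROLES:
        selected.extend(
            d.name for d in definitions if d.pack_id == pack_id and d.role == role
        )
    return selected
```

With this shape a tool is selected because of what it declares, so renaming a tool no longer changes preflight behavior and a tool whose name happens to end in `_pack_info` is not picked up by accident.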

Dependency: PR 4

PR 6 - Routing Heuristic Contractization

Files:

  • ChatServiceSession.ChatRouting.RoutingScoring.cs
  • ChatServiceSession.ToolRouting.Secondary.cs
  • ChatServiceSession.ToolRouting.DomainIntentSignals.cs
  • Related tests

Checklist:

  • Replace name-based family keys with contract pack/role/family metadata.
  • Keep Unicode/language-neutral behavior and existing pending-action ordinal support.

Dependency: PR 4

PR 7 - Pack Migration Wave 1 (Highest Ambiguity Packs)

Files:

  • IntelligenceX.Tools.ADPlayground/*
  • IntelligenceX.Tools.DomainDetective/*
  • IntelligenceX.Tools.DnsClientX/*
  • IntelligenceX.Tools.Tests/*

Checklist:

  • Add explicit role/handoff/setup/recovery contracts.
  • Add tests proving AD vs public-domain separation unless explicit handoff contract exists.
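
The separation rule can be stated as a small predicate: cross-pack routing is denied unless a handoff contract names the pair. The Python sketch below assumes hypothetical pack ids and a hypothetical contract set; the real contracts would come from the packs themselves.

```python
from typing import FrozenSet, Tuple

# Hypothetical explicit handoff contracts, stored as (source pack, target pack) pairs.
HANDOFF_CONTRACTS: FrozenSet[Tuple[str, str]] = frozenset({
    ("domaindetective", "dnsclientx"),
})


def can_route_follow_up(
    source_pack: str,
    target_pack: str,
    handoffs: FrozenSet[Tuple[str, str]] = HANDOFF_CONTRACTS,
) -> bool:
    """Allow same-pack follow-ups; allow cross-pack ones only with an explicit contract."""
    if source_pack == target_pack:
        return True
    return (source_pack, target_pack) in handoffs
```

Contracts are directional in this sketch, so a handoff from one pack to another does not imply the reverse.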

Dependency: PR 2

PR 8 - Remove Chat Fallback Engine

Files:

  • ChatServiceSession.PackCapabilityFallback.cs
  • ChatServiceSession.HostHints.cs
  • ChatServiceSession.ChatRouting.NoExtractedFinalize.cs
  • ChatServiceSession.cs
  • ChatServiceSession.ProfilesAndModels.cs
  • Heavy test cleanup in ChatServiceRoutingTrimTests.ToolNudge.*PackFallback*.cs

Checklist:

  • Remove fallback replay path and fallback contract cache.
  • Remove cross-pack builder methods and helper methods.
  • Delete or rewrite fallback-specific tests.
  • Keep model retry/review loops intact.

Dependency: PR 5, PR 6, PR 7

PR 9 - Pack Migration Wave 2 (Remaining Packs)

Files:

  • IntelligenceX.Tools.System/*
  • IntelligenceX.Tools.EventLog/*
  • IntelligenceX.Tools.TestimoX/*
  • IntelligenceX.Tools.FileSystem/*
  • IntelligenceX.Tools.Email/*
  • IntelligenceX.Tools.PowerShell/*
  • IntelligenceX.Tools.OfficeIMO/*

Checklist:

  • Complete explicit contracts for all packs/tools.
  • Ensure tool wrappers remain thin and engine-first.

Dependency: PR 2

PR 10 - Typed Tool Surface Enforcement

Files:

  • IntelligenceX.Tools.Common/* (typed adapters/helpers)
  • Analyzer/guardrail tests in IntelligenceX.Tools.Tests or shared analyzer project

Checklist:

  • Prefer typed binders for all migrated/refactored tools.
  • Add guardrail to flag ad-hoc direct argument parsing in target packs.
  • Standardize on ToolResultV2 for migrated paths.
  • Wave-2 typed migration batch completed for:
    • pack/discovery tools
    • SystemDevicesSummary, SystemHardwareIdentity, SystemHardwareSummary, SystemInfo, SystemBiosSummary, SystemSecurityOptions, SystemBootConfiguration, SystemRdpPosture, SystemSmbPosture, SystemFeaturesList, SystemUpdatesInstalled, SystemPatchDetails, SystemDisksList, SystemLogicalDisksList, SystemPortsList, SystemProcessList, SystemFirewallProfiles, SystemFirewallRules, SystemServiceList, SystemScheduledTasksList, SystemTimeSync, SystemWhoAmI, WslStatus, SystemBitlockerStatus, SystemInstalledApplications, SystemNetworkAdapters, SystemPatchCompliance
    • FsList/FsRead/FsSearch
    • EventLogNamedEventsCatalog, EventLogNamedEventsQuery, EventLogLiveQuery, EventLogTopEvents, EventLogLiveStats, EventLogEvtxFind, EventLogEvtxQuery, EventLogEvtxStats, EventLogEvtxSecuritySummary, EventLogChannelList, EventLogProviderList
    • TestimoXRulesList, TestimoXRulesRun
    • PowerShellRun
    • EmailImapSearch, EmailImapGet
    • OfficeImoRead
  • Wave-1/AD carryover typed migrations completed for:
    • DomainDetectivePackInfo, DomainDetectiveChecksCatalog, DomainDetectiveNetworkProbe, DomainDetectiveDomainSummary
    • DnsClientXPackInfo, DnsClientXQuery, DnsClientXPing
    • AdPackInfo, AdMonitoringProbeCatalog, AdRecycleBinLifetime, AdGroupMembers, AdGroupMembersResolved, AdGroupsList, AdAdminCountReport, AdGpoChanges, AdGpoList, AdGpoInventoryHealth, AdGpoHealth, AdGpoPermissionReport, AdGpoPermissionConsistency, AdLdapQuery, AdLdapQueryPaged, AdObjectGet, AdObjectResolve, AdSearch, AdSpnSearch, AdDomainAdminsSummary, AdPrivilegedGroupsSummary, AdStaleAccounts, AdUsersExpired, AdWhoAmI, AdDomainInfo, AdLdapDiagnostics, AdSearchFacets, AdReplicationSummary, AdReplicationConnections
    • ReviewerSetupPackInfo, ReviewerSetupContractVerify
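
To make the typed-binder preference concrete, here is a minimal Python sketch of binding raw arguments into a typed object once, at the tool boundary. `ListArgs`, `bind_list_args`, and the argument names are hypothetical; the actual binders live in the C# `IntelligenceX.Tools.Common` helpers and may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListArgs:
    """Hypothetical typed arguments for a directory-listing tool."""
    path: str
    max_results: int = 100


def bind_list_args(raw: dict) -> ListArgs:
    """Validate raw JSON-style arguments once so tool bodies never parse dictionaries ad hoc."""
    unknown = sorted(set(raw) - {"path", "max_results"})
    if unknown:
        raise ValueError(f"unknown arguments: {', '.join(unknown)}")
    path = raw.get("path")
    if not isinstance(path, str) or not path:
        raise ValueError("path must be a non-empty string")
    max_results = raw.get("max_results", 100)
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int) or max_results < 1:
        raise ValueError("max_results must be a positive integer")
    return ListArgs(path=path, max_results=max_results)
```

A guardrail can then flag any tool body that reads the raw dictionary directly instead of going through a binder like this one.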

Dependency: PR 7, PR 9

PR 11 - Final Cleanup + Hardening

Files:

  • Remaining stale constants/docs/tests across Chat/Tools.

Checklist:

  • Remove obsolete telemetry markers/constants tied to deleted fallback engine.
  • Close migration tracker entries.
  • Validate final DoD from PLAN.md.

Dependency: PR 8, PR 10

PR 12 - Startup Perf And Decoupling Reality-Close

Files (expected):

  • IntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceServer.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceSession*.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.Tooling/ToolPackBootstrap*.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.App/MainWindow*.cs
  • PLAN.md
  • PLAN-EXECUTION-ORDER.md

Checklist:

  • Move tooling bootstrap out of per-connection session constructor into reusable runtime cache/lifecycle.
  • Keep startup status truthful: do not surface "ready" semantics until metadata/auth probes settle or fail open explicitly with a stated reason.
  • Replace hardcoded known-pack bootstrap chain with descriptor/manifest-driven registration.
  • Remove/rename fallback-era host-hint file so architecture guardrails match current source layout.
  • Add regression tests for reconnect warm path and multi-turn follow-up carryover against host scope changes.
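
Moving bootstrap out of the per-connection constructor amounts to a server-scoped cache that sessions read from. The Python sketch below shows the pattern under stated assumptions: `ToolingBootstrapCache` and its method names are hypothetical, and the snapshot is modeled as a plain dictionary.

```python
import threading
from typing import Callable, Optional


class ToolingBootstrapCache:
    """Server-scoped cache: the expensive bootstrap runs once and sessions reuse the snapshot."""

    def __init__(self, bootstrap: Callable[[], dict]) -> None:
        self._bootstrap = bootstrap
        self._lock = threading.Lock()
        self._snapshot: Optional[dict] = None

    def get_snapshot(self) -> dict:
        # The lock keeps concurrent first connections from bootstrapping twice.
        with self._lock:
            if self._snapshot is None:
                self._snapshot = self._bootstrap()
            return self._snapshot

    def invalidate(self) -> None:
        """Drop the snapshot, for example after pack configuration changes."""
        with self._lock:
            self._snapshot = None
```

A reconnect warm-path regression test can count bootstrap invocations across several simulated sessions and assert that the count stays at one until `invalidate` is called.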

PR 13 - Follow-Up Execution Reliability + Startup Churn Visibility

Files (expected):

  • IntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceSession.ToolRouting.DomainIntentAffinity.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceSession.ChatRouting.NoExtractedFinalize.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.App/MainWindow.Messaging.Connection.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.App/MainWindow.StartupReadiness.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.Tests/ChatServiceRoutingTrimTests.*

Checklist:

  • Bypass continuation-subset reuse for language-neutral tool-capability question turns (even those without explicit tool-id literals) so that cross-pack follow-up awareness is restored.
  • Extend startup status/debug timeline with churn cause labels (auth_wait, pipe_retry, metadata_retry, runtime_disconnect) so reconnect loops are diagnosable from UI alone.
  • Add finalize-path regression that proves contextual follow-up scope shifts are evaluated from raw user intent and cannot replay stale single-host next actions.
  • Validate with targeted chat tests + catalog validation before PR open.
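
The churn cause labels lend themselves to a closed enumeration plus a compact summary for the status line. The following Python sketch uses the four labels from the checklist; the `ChurnCause` type, the timeline shape, and the summary format are assumptions for illustration.

```python
from collections import Counter
from enum import Enum
from typing import List, Tuple


class ChurnCause(str, Enum):
    """The four labels named in the checklist; the enum itself is hypothetical."""
    AUTH_WAIT = "auth_wait"
    PIPE_RETRY = "pipe_retry"
    METADATA_RETRY = "metadata_retry"
    RUNTIME_DISCONNECT = "runtime_disconnect"


def summarize_churn(timeline: List[Tuple[float, ChurnCause]]) -> str:
    """Render a most-frequent-first summary suitable for a status/debug line."""
    counts = Counter(cause for _, cause in timeline)
    return ", ".join(f"{cause.value} x{n}" for cause, n in counts.most_common())
```

For example, a timeline with two pipe retries followed by one auth wait summarizes as `pipe_retry x2, auth_wait x1`, which is enough to tell a reconnect loop from an authentication stall without opening logs.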

PR 14 - Descriptor Snapshot And Preview-Startup Contract

Files (expected):

  • IntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceToolingBootstrapCache.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceSession*.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.Tooling/ToolPackBootstrap*.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.Abstractions/*
  • IntelligenceX.Chat/IntelligenceX.Chat.Tests/*

Checklist:

  • Introduce persisted descriptor/capability snapshot contract that does not require full pack activation to answer startup-safe metadata requests.
  • Let hello, list_tools, and app first paint use descriptor-preview state without reporting full activation as complete.
  • Split startup telemetry/status into descriptor_discovery, descriptor_cache_hit, pack_activation, and registry_activation_finalize.
  • Add zero-pack, plugin-only, and persisted-preview startup tests.
  • Add versioned snapshot compatibility tests and fail-open diagnostics for stale preview payloads.
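
Versioned snapshot loading with fail-open diagnostics could follow the pattern below. This Python sketch assumes a JSON payload with a `schema_version` field and invents the diagnostic strings; the persisted format in the real cache is not specified here.

```python
import json
from typing import Optional, Tuple

SNAPSHOT_SCHEMA_VERSION = 2  # hypothetical current version


def load_preview_snapshot(raw: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (snapshot, None) when usable, or (None, diagnostic) so startup can fail open."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None, "preview_snapshot_unreadable"
    if not isinstance(payload, dict):
        return None, "preview_snapshot_unreadable"
    version = payload.get("schema_version")
    if version != SNAPSHOT_SCHEMA_VERSION:
        return None, (
            f"preview_snapshot_stale: found {version!r}, "
            f"expected {SNAPSHOT_SCHEMA_VERSION}"
        )
    return payload, None
```

Failing open here means a stale or corrupt preview never blocks startup: the caller falls back to full descriptor discovery and surfaces the diagnostic instead of raising.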

PR 15 - Warm-Path Bootstrap Cost Reduction

Files (expected):

  • IntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceSession.ProfilesAndModels.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.Tooling/ToolPackBootstrap.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.Tooling/ToolPackBootstrap.RegistryAndReflection.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.Tooling/PluginFolderToolPackLoader*.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.Tests/*

Checklist:

  • Refactor cache-key validation so persisted preview reuse does not require full discovery fingerprint recomputation on the warm path.
  • Filter disabled packs before constructor/reflection-heavy activation work for built-in and plugin packs.
  • Make workspace bin probing and similar repair-oriented scans opt-in for dev/bootstrap recovery only.
  • Add startup profiler assertions proving pack disablement reduces warm-path activation cost.
  • Add regression coverage for plugin roots/manifests changing independently from activation cache contents.
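
Filtering before activation only saves cost if the enablement check runs against the cheap descriptor and the constructor is never reached for disabled packs. A minimal Python sketch of that ordering, with a hypothetical `PackDescriptor` type and factory callback:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Set


@dataclass(frozen=True)
class PackDescriptor:
    """Hypothetical descriptor: cheap metadata plus a deferred constructor."""
    pack_id: str
    factory: Callable[[], object]  # constructing the pack is the expensive step


def activate_enabled_packs(
    descriptors: Iterable[PackDescriptor], disabled_ids: Set[str]
) -> Dict[str, object]:
    """Check enablement on the descriptor so disabled packs are never constructed."""
    return {
        d.pack_id: d.factory()
        for d in descriptors
        if d.pack_id not in disabled_ids
    }
```

A profiler-style assertion can record which factories ran and verify that disabling a pack removes its constructor call from the warm path entirely.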

PR 16 - Lazy Activation And Activation-Aware Status Flow

Files (expected):

  • IntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceSession*.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.Service/ChatServiceServer.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.App/MainWindow*.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.Tests/*

Checklist:

  • Add activation-on-demand for descriptor-known packs/tools when a turn first selects a capability area that is not yet activated.
  • Emit truthful status transitions while activation is pending so runtime does not claim full readiness early.
  • Preserve deterministic routing/retry behavior across descriptor-only and activated states.
  • Add first-activation latency and activation-failure regression tests.
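
Activation-on-demand with truthful status can be sketched as a small state holder that distinguishes descriptor-only, activated, and failed packs. The class, method, and status names in this Python sketch are hypothetical.

```python
from typing import Callable, Dict


class LazyPackActivator:
    """Activates descriptor-known packs on first use and reports a truthful per-pack status."""

    def __init__(self, factories: Dict[str, Callable[[], object]]) -> None:
        self._factories = factories            # known from descriptors, not yet constructed
        self._active: Dict[str, object] = {}
        self._failures: Dict[str, str] = {}

    def status(self, pack_id: str) -> str:
        if pack_id in self._active:
            return "activated"
        if pack_id in self._failures:
            return "activation_failed"
        return "descriptor_only" if pack_id in self._factories else "unknown"

    def ensure_activated(self, pack_id: str) -> object:
        if pack_id in self._active:
            return self._active[pack_id]
        factory = self._factories[pack_id]     # KeyError for packs with no descriptor
        try:
            pack = factory()
        except Exception as exc:
            self._failures[pack_id] = str(exc)  # keep the reason for status reporting
            raise
        self._failures.pop(pack_id, None)
        self._active[pack_id] = pack
        return pack
```

Because `status` never reports `activated` before the factory has returned, a UI bound to it cannot claim readiness early, and a failed activation stays visible until a later attempt succeeds.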

PR 17 - Language-Neutral Capability And Runtime Classification Cleanup

Files (expected):

  • IntelligenceX.Chat/IntelligenceX.Chat.Abstractions/RuntimeSelfReportTurnClassifier.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.App/ConversationTurnShapeClassifier.cs
  • IntelligenceX.Chat/IntelligenceX.Chat.App.Tests/*
  • IntelligenceX.Chat/IntelligenceX.Chat.Tests/*

Checklist:

  • Remove English runtime cue-word control flow from app/shared classifiers while preserving natural English prompt text/examples.
  • Keep routing/safety/execution decisions structure-first and contract-backed instead of literal phrase driven.
  • Add multilingual runtime-introspection/capability tests that do not rely on borrowed English nouns.
  • Add negative tests proving literal confirmations like "go ahead" do not become privileged routing signals without structured pending-action state.
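
One structure-first way to handle follow-ups is to accept only an ordinal that indexes into the pending-action list and to let every other reply fall through to normal model handling. The Python sketch below assumes a hypothetical `resolve_follow_up` helper and a list of pending tool names; it is not the classifier used in the app.

```python
from typing import List, Optional


def resolve_follow_up(user_text: str, pending_actions: List[str]) -> Optional[str]:
    """Map an ordinal reply onto a pending action; free text is never a routing signal."""
    stripped = user_text.strip()
    # str.isdecimal accepts decimal digits from any script, so no English cue is needed.
    if not pending_actions or not stripped.isdecimal():
        return None
    index = int(stripped) - 1
    if 0 <= index < len(pending_actions):
        return pending_actions[index]
    return None
```

In this sketch a reply such as "go ahead" returns `None` whether or not actions are pending, while "2" and its Arabic-Indic equivalent both select the second pending action.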

Parallelization Map

  • Track A can run PR 2, PR 7, PR 9, PR 10.
  • Track B can run PR 3, PR 4, PR 5, PR 6.
  • Track C can continuously update tests/docs in PR 1 and PR 11.
  • Track D can run PR 14, PR 15, and PR 16 once the current bootstrap contracts are stable.
  • Track E can run PR 17 in parallel with PR 14/15 because it is mostly classifier/test surface work.
  • Critical merge gate before lazy activation rollout: PR 14 + PR 15 must be merged.
  • Critical merge gate before full release closeout: PR 16 + PR 17 must be merged alongside final DoD validation.
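
The stated "Dependency:" lines for PR 2 through PR 11 can be checked mechanically by sorting them into merge waves. The Python sketch below copies those dependencies into a dictionary and uses the standard-library `graphlib` module (Python 3.9+); PRs without a Dependency line (PR 0 and PR 12 to PR 17) are left out, and the `merge_waves` helper is a hypothetical name.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# PR number -> PRs it depends on, as listed in the PR Sequence section.
DEPENDENCIES: Dict[int, Set[int]] = {
    2: {1}, 3: {2}, 4: {2}, 5: {4}, 6: {4}, 7: {2},
    8: {5, 6, 7}, 9: {2}, 10: {7, 9}, 11: {8, 10},
}


def merge_waves(dependencies: Dict[int, Set[int]]) -> List[List[int]]:
    """Group PRs into waves; every PR in a wave has all of its dependencies merged."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan contains a cycle
    waves: List[List[int]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(merge_waves(DEPENDENCIES))
# → [[1], [2], [3, 4, 7, 9], [5, 6, 10], [8], [11]]
```

The waves show the earliest point at which each PR can start, which is a lower bound rather than a schedule: the tracks above may still serialize work within a wave.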

Recommended Branch Names

  • chore/chat-tools-contract-adr
  • test/chat-fallback-guardrails
  • feat/tool-contract-role-setup-handoff-recovery
  • feat/chat-orchestration-catalog
  • refactor/chat-preflight-contract-role
  • refactor/chat-routing-contract-taxonomy
  • feat/tools-wave1-ad-dd-dnsx-contracts
  • refactor/chat-remove-pack-fallback-engine
  • feat/tools-wave2-pack-contracts
  • refactor/tools-typed-surface-enforcement
  • chore/chat-tools-decoupling-cleanup
  • feat/chat-descriptor-preview-startup
  • perf/chat-warm-path-bootstrap-cost
  • feat/chat-lazy-pack-activation
  • refactor/chat-language-neutral-classifiers

Release Safety Checkpoints

  1. Checkpoint A (after PR 4): catalog live, no behavior deletion yet.
  2. Checkpoint B (after PR 5 + PR 6 + PR 7): contract-driven selection/preflight proven in tests.
  3. Checkpoint C (after PR 8): Chat fallback engine removed.
  4. Checkpoint D (after PR 14): descriptor-preview startup is live and truthful.
  5. Checkpoint E (after PR 15): warm-path bootstrap budgets improve and disabled packs reduce startup cost.
  6. Checkpoint F (after PR 16 + PR 17): lazy activation plus language-neutral classifier cleanup are green.
  7. Checkpoint G (after final cleanup): all DoD checks complete.