Commit 666dc78

Update docs to reflect kernel restart bug (#34) resolution (#41)

- Update AGENTS.md with critical materializer determinism requirements
- Add detailed section about ctx.query() being forbidden in materializers
- Document specific commits (6e0fb4f, a1bf20d) that fixed the issue
- Provide code examples showing wrong vs correct materializer patterns
- Update HANDOFF.md to remove bug from critical issues, add to recent fixes
- Update ROADMAP.md to mark kernel session reliability as completed
- Upgrade auto kernel management priority due to more solid foundation

This prevents future developers from accidentally reintroducing side effects in materializers that would cause LiveStore hash mismatches and kernel failures.
1 parent a1bf20d commit 666dc78
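
The commit message ties materializer side effects to hash mismatches. A toy model of that failure mode, assuming a hypothetical `fingerprint` function in place of LiveStore's real hashing (which this commit does not show), might look like the sketch below. The `State` and `Payload` shapes and both materializer functions are invented for illustration.

```typescript
// Toy model only: these types and functions are hypothetical, not LiveStore's API.
type Row = { id: string; cellId: string };
type State = { executionQueue: Row[] };
type Payload = { queueId: string; cellId: string };

// Impure: the result depends on whatever local state exists at replay time.
function impureMaterializer(p: Payload, state: State): string[] {
  const entry = state.executionQueue.find((row) => row.id === p.queueId);
  return entry ? [`cells.update(${entry.cellId})`] : [];
}

// Pure: the result depends only on the event payload.
function pureMaterializer(p: Payload): string[] {
  return [`cells.update(${p.cellId})`];
}

// Stand-in for a result hash: a stable string form of the materializer output.
const fingerprint = (ops: string[]): string => JSON.stringify(ops);

const event: Payload = { queueId: "q1", cellId: "c1" };
// Two replays of the same event that happen to see different local state.
const replayA: State = { executionQueue: [{ id: "q1", cellId: "c1" }] };
const replayB: State = { executionQueue: [] };

const impureMatches =
  fingerprint(impureMaterializer(event, replayA)) ===
  fingerprint(impureMaterializer(event, replayB));
const pureMatches =
  fingerprint(pureMaterializer(event)) === fingerprint(pureMaterializer(event));

console.log({ impureMatches, pureMatches });
```

In this model the impure version produces different fingerprints for the same event, while the pure version cannot, which is the property the documentation changes below ask developers to preserve.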

3 files changed: +80 -7 lines changed

AGENTS.md

Lines changed: 63 additions & 0 deletions
@@ -112,6 +112,67 @@ pnpm cache:clear # Clear package cache
 - **Single source of truth**: No compiled artifacts needed - TypeScript handles type checking from source
 - **No timestamp fields** - LiveStore handles timing automatically
 
+### ⚠️ CRITICAL: Materializer Determinism Requirements
+
+**NEVER use `ctx.query()` in materializers** - This was the root cause of kernel restart bug #34.
+
+LiveStore requires all materializers to be **pure functions without side effects**. Any data needed by a materializer must be passed via the event payload, not looked up during materialization.
+
+**What caused the bug:**
+```typescript
+// ❌ WRONG - This causes LiveStore.UnexpectedError materializer hash mismatch
+"v1.ExecutionCompleted": ({ queueId }, ctx) => {
+  const queueEntries = ctx.query(
+    tables.executionQueue.select().where({ id: queueId }).limit(1)
+  );
+  // ... rest of materializer
+}
+```
+
+**Correct approach:**
+```typescript
+// ✅ CORRECT - All needed data in event payload
+"v1.ExecutionCompleted": ({ queueId, cellId, status }) => [
+  tables.executionQueue.update({
+    status: status === "success" ? "completed" : "failed"
+  }).where({ id: queueId }),
+  tables.cells.update({
+    executionState: status === "success" ? "completed" : "error"
+  }).where({ id: cellId }),
+]
+```
+
+**Fixed commits for reference:**
+- `6e0fb4f`: Fixed ExecutionCompleted/ExecutionCancelled materializers
+- `a1bf20d`: Fixed ExecutionStarted materializer
+
+**Rule**: If you need data in a materializer, add it to the event schema and pass it when committing the event. Materializers must be deterministic and reproducible.
+
+### Recent Critical Fixes (December 2024)
+
+**Kernel Restart Bug (#34) - RESOLVED**
+
+The project recently resolved a major stability issue where 3rd+ kernel sessions would fail to receive work assignments due to LiveStore materializer hash mismatches. This was caused by non-deterministic materializers using `ctx.query()` calls.
+
+**What was broken:**
+- ExecutionCompleted, ExecutionCancelled, and ExecutionStarted materializers were using `ctx.query()`
+- This made them non-deterministic, causing LiveStore to shut down with "UnexpectedError materializer hash mismatch"
+- Kernel restarts would accumulate terminated sessions and eventually fail
+
+**How it was fixed (commits 6e0fb4f and a1bf20d):**
+1. **Added cellId to event schemas**: ExecutionCompleted, ExecutionCancelled, ExecutionStarted now include `cellId` in payload
+2. **Removed all ctx.query() calls**: Materializers now receive all needed data via event payload
+3. **Updated all event commits**: All places that commit these events now pass `cellId` explicitly
+4. **Made materializers pure functions**: No side effects, deterministic output for same input
+
+**Impact:** Kernel sessions are now reliable across multiple restarts, enabling future automated kernel management.
+
+**For Future Development:**
+- Always check that new materializers are pure functions
+- Never use `ctx.query()` in materializers - pass data via event payload
+- Reference these commits when adding new execution-related events
+- Test kernel restart scenarios when modifying execution flow
+
 ### Local-First Architecture
 - All data operations happen locally first
 - Events synced across clients via document worker
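
The fix steps listed in this hunk (add `cellId` to the payload, drop the lookup from the materializer, pass `cellId` at every commit site) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `UpdateOp`, `buildExecutionCompleted`, and the `"error"` status value are hypothetical names introduced here, and plain objects stand in for LiveStore's table query builders.

```typescript
// Hypothetical types for illustration; these are not LiveStore's actual API.
type ExecutionStatus = "success" | "error"; // "error" is an assumed non-success value

interface ExecutionCompletedPayload {
  queueId: string;
  cellId: string; // carried in the payload so the materializer needs no lookup
  status: ExecutionStatus;
}

// Plain description of a table update, standing in for a query-builder result.
interface UpdateOp {
  table: "executionQueue" | "cells";
  where: { id: string };
  set: Record<string, string>;
}

// Pure materializer: the output depends only on the payload.
function materializeExecutionCompleted(p: ExecutionCompletedPayload): UpdateOp[] {
  const ok = p.status === "success";
  return [
    { table: "executionQueue", where: { id: p.queueId }, set: { status: ok ? "completed" : "failed" } },
    { table: "cells", where: { id: p.cellId }, set: { executionState: ok ? "completed" : "error" } },
  ];
}

// Commit site: the lookup that used to live in the materializer happens here,
// before the event is created, and its result travels in the payload.
function buildExecutionCompleted(
  queue: ReadonlyMap<string, { cellId: string }>,
  queueId: string,
  status: ExecutionStatus,
): ExecutionCompletedPayload {
  const entry = queue.get(queueId);
  if (!entry) throw new Error(`Unknown queue entry: ${queueId}`);
  return { queueId, cellId: entry.cellId, status };
}

const queue = new Map([["q1", { cellId: "c1" }]]);
const payload = buildExecutionCompleted(queue, "q1", "success");
console.log(materializeExecutionCompleted(payload));
```

Moving the lookup to the commit site means the value is resolved once and recorded in the event, so every replay of that event sees the same `cellId`.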
@@ -215,6 +276,8 @@ pnpm cache:warm-up # Pre-loads numpy, pandas, matplotlib, requests, etc.
 
 **Do NOT use manual timestamps in code or events.** LiveStore automatically handles all timing through its event sourcing system. Focus development on features and architecture rather than timestamp management.
 
+**⚠️ CRITICAL: Do NOT use `ctx.query()` in materializers.** This causes LiveStore materializer hash mismatches and kernel restart failures (see bug #34 - RESOLVED in commits 6e0fb4f and a1bf20d). All materializers must be pure functions with all needed data passed via event payload.
+
 **Testing is Critical**: Many claims about functionality need verification through proper integration tests. Core features exist but integration testing is minimal.
 
 **AI Tool Calling**: The next major enhancement is enabling AI to actively participate in notebook development through function calling - creating cells, modifying content, and executing code.
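
One way to act on the "no `ctx.query()` in materializers" rule in tests is to invoke each materializer with a context whose `query` throws. The sketch below assumes a simplified, hypothetical `MaterializerCtx` shape and a helper named `checkPure`; neither comes from LiveStore or from this repository.

```typescript
// Hypothetical shapes for illustration; not LiveStore's actual types.
interface MaterializerCtx {
  query: (q: unknown) => unknown;
}
type Materializer<P> = (payload: P, ctx: MaterializerCtx) => unknown;

// A context whose query() always throws, so any lookup fails loudly.
const forbiddenCtx: MaterializerCtx = {
  query: () => {
    throw new Error("ctx.query() called inside a materializer");
  },
};

// Runs the materializer twice with the guard context and compares the results.
// Returns false if it touched ctx.query() or produced differing output.
function checkPure<P>(materialize: Materializer<P>, payload: P): boolean {
  try {
    const first = JSON.stringify(materialize(payload, forbiddenCtx));
    const second = JSON.stringify(materialize(payload, forbiddenCtx));
    return first === second;
  } catch {
    return false;
  }
}

// Example materializers (both invented for this sketch).
const good: Materializer<{ cellId: string }> = ({ cellId }) => [
  { table: "cells", id: cellId },
];
const bad: Materializer<{ queueId: string }> = ({ queueId }, ctx) =>
  ctx.query({ id: queueId });

console.log(checkPure(good, { cellId: "c1" }), checkPure(bad, { queueId: "q1" }));
```

A check like this only catches `query` calls and differences between two runs; it cannot prove purity in general, so it complements rather than replaces the kernel restart testing recommended above.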

HANDOFF.md

Lines changed: 14 additions & 6 deletions
@@ -60,7 +60,7 @@
 
 ## Current Work State
 
-**Status**: 🚧 **Working Prototype** - Core collaborative editing, Python execution, and basic AI integration functional, rich outputs need verification
+**Status**: 🚧 **Working Prototype** - Core collaborative editing, Python execution, and basic AI integration functional, rich outputs need verification. Major kernel restart bug (#34) resolved.
 
 ### What's Actually Working ✅
 
@@ -72,6 +72,7 @@
 
 #### Python Execution
 - **Basic Python execution** - Code cells run Python via Pyodide (manual kernel startup)
+- **Kernel session reliability** - Fixed materializer side effects causing restart failures (#34) ✅
 - **Error handling** - Python exceptions properly captured and displayed
 - **Text output** - print() statements and basic stdout/stderr capture
 - **Execution queue** - Proper job queuing and status tracking
@@ -144,7 +145,7 @@
 - Streaming responses for better UX
 - Multi-turn conversation support
 
-### Priority 3: MCP Integration Foundation 🔮 LONG-TERM
+### Priority 4: MCP Integration Foundation 🔮 LONG-TERM
 **Status**: 🎯 **Architecture planning**
 
 **Goal**: Connect to Model Context Protocol providers for extensible AI tooling
@@ -169,23 +170,26 @@
 **Estimated effort**: 1-2 months (after tool calling foundation)
 **Impact**: Enables unlimited AI tool extensibility through Python ecosystem
 
-### Priority 4: Auto Kernel Management (High Impact)
+### Priority 3: Auto Kernel Management (High Impact) 🚀 UPGRADED
 **Current friction**: Manual `NOTEBOOK_ID=xyz pnpm dev:kernel` per notebook
 
 **Goal**: One-click notebook startup with automatic kernel lifecycle
 
+**Foundation now solid**: With kernel restart bug (#34) fixed, automated management is much more viable
+
 **Next actions**:
 - Modify `pnpm dev` to auto-spawn kernels per notebook
 - Add kernel health monitoring and restart capability
 - Better error messages when kernels fail or disconnect
+- Leverage fixed kernel session reliability for robust auto-restart
 
 **Files to modify**:
 - Root `package.json` - Update dev script
 - `packages/web-client/src/components/notebook/NotebookViewer.tsx` - Status display
 - Add kernel process management utilities
 
-**Estimated effort**: 2-3 hours
-**Impact**: Removes major user friction
+**Estimated effort**: 2-3 hours (reduced due to fixed foundation)
+**Impact**: Removes major user friction + leverages recent reliability improvements
 
 ### Priority 5: Rich Output Verification (Medium)
 **Current state**: Code exists but integration unclear
@@ -262,7 +266,11 @@ pnpm test # Full test suite (27 passing, 13 skipped)
 - **Performance claims unverified**: Need integration tests to validate speed/output claims
 
 ### Known Critical Issues
-- **🐛 Kernel Restart Bug**: 3rd+ kernel sessions fail to receive work assignments due to LiveStore web client shutdowns when multiple terminated sessions accumulate (see https://github.com/rgbkrk/anode/issues/34 and branch `annoying-multiples-bug`)
+- **None currently identified** - Major kernel restart bug (#34) resolved by cleaning up side effects in materializers
+
+### Recent Fixes ✅
+- **Kernel restart bug (#34)** - Fixed materializer side effects that caused 3rd+ kernel sessions to fail work assignments
+- **LiveStore web client shutdowns** - Resolved accumulation of terminated sessions causing kernel communication failures
 
 ### Schema & Architecture Notes
 - All packages use direct TypeScript imports: `../../../shared/schema.js`

ROADMAP.md

Lines changed: 3 additions & 1 deletion
@@ -2,14 +2,15 @@
 
 **Vision**: A real-time collaborative notebook system enabling seamless AI ↔ Python ↔ User interactions through local-first architecture.
 
-**Current Status**: Core prototype with collaborative editing, basic Python execution, and AI integration with context awareness working. AI tool calling and rich outputs in active development.
+**Current Status**: Core prototype with collaborative editing, basic Python execution, and AI integration with context awareness working. Major kernel restart bug (#34) resolved by fixing materializer side effects. AI tool calling and rich outputs in active development.
 
 ## Foundation Complete ✅
 
 ### Core Architecture
 - **LiveStore event-sourcing** - Real-time collaborative state management
 - **Direct TypeScript schema** - No build complexity, full type safety
 - **Reactive execution queue** - Kernel work detection without polling
+- **Reliable kernel sessions** - Fixed materializer side effects, stable multi-session operation
 - **Cell management** - Create, edit, move, delete with proper state sync
 - **Basic Python execution** - Code cells run via Pyodide (manual kernel startup)
 
@@ -54,6 +55,7 @@
 ### Automated Kernel Management
 **Goal**: Remove manual `NOTEBOOK_ID=xyz pnpm dev:kernel` friction
 
+- [x] **Kernel session reliability** - Fixed materializer side effects causing restart failures (#34)
 - [ ] **Auto-spawning kernels** - One-click notebook startup
 - [ ] **Kernel health monitoring** - Detect failures and restart
 - [ ] **Better status UI** - Clear feedback on kernel state
