
Commit dab1396

jserv and claude committed
Fix SMP=4 boot hang regression from PR #110
Root cause: After PR #110 (coro-uart), the adaptive timer/UART fd registration logic would exclude timer fd monitoring when all harts entered WFI state during early boot. This created a deadlock: harts waited for timer interrupts, but the timer fd wasn't being polled, preventing wakeup.

Symptom:
- SMP=4 hung after "smp: Brought up 1 node, 4 CPUs" (49 lines of output)
- Never reached "clocksource: Switched to clocksource" or login prompt
- SMP=1 continued to work correctly

Fix: Introduce boot completion heuristic using peripheral_update_ctr. Consider boot "incomplete" for the first 5000 scheduler iterations after all harts start. During this period, always keep timer and UART fds active to ensure harts can receive timer interrupts even when temporarily in WFI.

Verification:
- SMP=1: Boots successfully to login prompt ✓
- SMP=4: Now completes boot to "Run /init" and login ✓
- Pre-fix SMP=4: Hung at line 49 ✗
- Pre-regression (4552c62): Worked correctly ✓

The fix preserves PR #110's CPU optimization benefits (0.3% idle usage) while ensuring multi-core boot reliability.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
1 parent a56f5bd commit dab1396
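
The change to the `harts_active` predicate described in the commit message can be sketched in isolation. This is a minimal illustration, not code from the repository: the `sched_snapshot_t` struct and the two function names are hypothetical, and only the variable names and the 5000-iteration threshold are taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the values the predicate reads. The field names
 * mirror the variables in the commit; the struct itself is illustrative. */
typedef struct {
    uint32_t n_hart;                /* configured number of harts */
    uint32_t started_harts;         /* harts that have been started */
    uint32_t idle_harts;            /* harts currently in WFI */
    uint64_t peripheral_update_ctr; /* scheduler iteration counter */
} sched_snapshot_t;

static const uint64_t BOOT_SETTLE_ITERATIONS = 5000;

/* Predicate as it stood before the fix: once every hart has started and
 * every hart is idle, the timer is no longer considered necessary. */
bool harts_active_before(const sched_snapshot_t *s)
{
    assert(s != NULL);
    bool all_harts_started = (s->started_harts >= s->n_hart);
    return (s->n_hart == 1) || !all_harts_started || (s->idle_harts == 0);
}

/* Predicate after the fix: the timer stays active until the boot
 * completion heuristic says the system has settled. */
bool harts_active_after(const sched_snapshot_t *s)
{
    assert(s != NULL);
    bool all_harts_started = (s->started_harts >= s->n_hart);
    bool boot_complete =
        all_harts_started &&
        (s->peripheral_update_ctr > BOOT_SETTLE_ITERATIONS);
    return (s->n_hart == 1) || !boot_complete || (s->idle_harts == 0);
}
```

For an SMP=4 snapshot with all four harts started, all four idle, and a counter of 100, `harts_active_before` returns false while `harts_active_after` returns true, which corresponds to the early-boot window the commit describes. Once the counter exceeds 5000, both predicates agree again and the idle optimization applies.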

1 file changed, +14 -4 lines changed


main.c

Lines changed: 14 additions & 4 deletions
@@ -1203,10 +1203,20 @@ static int semu_run(emu_state_t *emu)
  * For single-hart configurations (n_hart == 1), disable
  * optimization entirely to avoid boot issues, as the first hart
  * starts immediately.
+ *
+ * CRITICAL FIX: During early boot, always keep timer active even after
+ * all harts started, as the kernel may briefly put all harts in WFI
+ * while waiting for timer interrupts. Use a boot completion heuristic:
+ * consider boot "complete" only after sufficient execution (e.g., 5000
+ * scheduler iterations after SMP init).
  */
 bool all_harts_started = (started_harts >= vm->n_hart);
-bool harts_active =
-    (vm->n_hart == 1) || !all_harts_started || (idle_harts == 0);
+const uint64_t BOOT_SETTLE_ITERATIONS = 5000;
+bool boot_complete =
+    all_harts_started &&
+    (emu->peripheral_update_ctr > BOOT_SETTLE_ITERATIONS);
+bool harts_active = (vm->n_hart == 1) || !boot_complete ||
+                    (idle_harts == 0);
 #ifdef __APPLE__
 /* macOS: use kqueue with EVFILT_TIMER */
 if (kq >= 0 && pfd_count < poll_capacity && harts_active) {
@@ -1227,7 +1237,7 @@ static int semu_run(emu_state_t *emu)
 /* Add UART input fd (stdin for keyboard input).
  * Only add UART when:
  * 1. Single-hart configuration (n_hart == 1), OR
- * 2. Not all harts started (!all_harts_started), OR
+ * 2. Boot not complete (!boot_complete), OR
  * 3. All harts are active (idle_harts == 0), OR
  * 4. A hart is actively waiting for UART input
  *
@@ -1236,7 +1246,7 @@ static int semu_run(emu_state_t *emu)
  * input (Ctrl+A x) may be delayed by up to poll_timeout (10ms)
  * when harts are idle, which is acceptable for an emulator.
  */
-bool need_uart = (vm->n_hart == 1) || !all_harts_started ||
+bool need_uart = (vm->n_hart == 1) || !boot_complete ||
                  (idle_harts == 0) || emu->uart.has_waiting_hart;
 if (emu->uart.in_fd >= 0 && pfd_count < poll_capacity &&
     need_uart) {
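
The way `harts_active` and `need_uart` gate fd registration can be sketched as a standalone helper. This is only an illustration under stated assumptions: `build_pollfds` is a hypothetical function that does not exist in main.c, and it treats the timer as a plain file descriptor passed to POSIX `poll()`, whereas the macOS path in the diff uses kqueue.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of main.c): fill pfds with the descriptors
 * that should be monitored this iteration and return how many were added.
 * A descriptor is skipped when it is invalid (< 0), when the array is full,
 * or when its gating flag is false. */
size_t build_pollfds(struct pollfd *pfds, size_t capacity,
                     int timer_fd, int uart_fd,
                     bool harts_active, bool need_uart)
{
    assert(pfds != NULL || capacity == 0);
    size_t count = 0;

    /* Timer: only monitored while harts_active is true. */
    if (timer_fd >= 0 && count < capacity && harts_active) {
        pfds[count].fd = timer_fd;
        pfds[count].events = POLLIN;
        pfds[count].revents = 0;
        count++;
    }

    /* UART input: only monitored while need_uart is true. */
    if (uart_fd >= 0 && count < capacity && need_uart) {
        pfds[count].fd = uart_fd;
        pfds[count].events = POLLIN;
        pfds[count].revents = 0;
        count++;
    }

    return count;
}
```

When both flags are false the helper registers nothing, so a subsequent `poll()` call has no timer descriptor to wake it. Keeping `!boot_complete` in both predicates means this empty set cannot occur during the early-boot window on multi-hart configurations.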
