py/vm: Add branch-point VM switching for dual-VM settrace.#23

Draft
andrewleech wants to merge 3 commits into review/py-settrace-dual-vm from py-settrace-dual-vm

@andrewleech andrewleech commented Mar 6, 2026

Summary

With the dual-VM settrace in micropython#18571, calling sys.settrace() mid-function only takes effect at the next function call boundary. This means a long-running loop that doesn't call other functions won't start tracing until the loop exits. This PR makes settrace take effect at the next branch instruction (loop iteration or conditional jump) within the currently-executing function.
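The function-boundary limitation can be reproduced with a small script. The sketch below is plain Python with hypothetical names (`tracer`, `helper`, `loop_without_calls`, `loop_with_calls`); it records which functions produce trace events after `sys.settrace()` is called mid-loop. On an implementation that only starts tracing at function boundaries, the call-free loop contributes no events of its own, while the loop that calls `helper()` is traced from the first call made after `settrace()`.

```python
import sys

events = []


def tracer(frame, event, arg):
    # Record (event, function name) and keep tracing this frame.
    events.append((event, frame.f_code.co_name))
    return tracer


def helper(x):
    return x + 1


def loop_without_calls(n):
    total = 0
    for i in range(n):
        if i == 1:
            sys.settrace(tracer)  # enabled mid-function, no calls follow
        total += i
    sys.settrace(None)
    return total


def loop_with_calls(n):
    total = 0
    for i in range(n):
        if i == 1:
            sys.settrace(tracer)  # enabled mid-function, helper() calls follow
        total = helper(total)
    sys.settrace(None)
    return total


no_calls_total = loop_without_calls(4)
events_without_calls = list(events)

events.clear()
with_calls_total = loop_with_calls(4)
events_with_calls = list(events)

print("call-free loop events:", events_without_calls)
print("loop with calls events:", events_with_calls)
```

With branch-point switching as described in this PR, the expectation is that `loop_without_calls` would also begin producing line events from its next loop iteration. The exact event sequence is implementation-specific, so the sketch prints the events rather than fixing an expected output.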

Built on top of micropython#18571 as a suggestion - I may have misunderstood the exact limitation you described in our conversation, but this seemed worth exploring.

The standard VM has zero additional overhead from this change. settrace() bumps sched_state to MP_SCHED_PENDING, and the existing slow-path check (if (sched_state == PENDING || ...)) picks it up at the next branch instruction. No new reads on the fast path. The tracing VM checks prof_trace_callback directly on its hot path, which is acceptable since tracing already has per-instruction overhead.

A vm_switch_pending flag in mp_state_vm_t prevents nested function calls between settrace() and the caller's next branch point from consuming the sched_state signal via mp_sched_unlock. Without this, a function call made before the branch point clears PENDING and the switch never happens.
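The signal-consumption problem can be illustrated with a toy model. The sketch below is plain Python, not the C implementation: the class `SchedModel` and its methods are hypothetical, and only the names `sched_state`, `vm_switch_pending` and the PENDING state are borrowed from the description above. It assumes that a nested call's unlock resets the scheduler state unless the flag asks for it to stay pending.

```python
IDLE, PENDING = "idle", "pending"  # stand-ins for the scheduler states


class SchedModel:
    """Toy model of the settrace signal; not MicroPython's real scheduler."""

    def __init__(self, use_switch_flag):
        self.use_switch_flag = use_switch_flag
        self.sched_state = IDLE
        self.vm_switch_pending = False
        self.trace_callback = None

    def settrace(self, callback):
        # settrace() bumps the scheduler state so the slow path runs.
        self.trace_callback = callback
        self.sched_state = PENDING
        if self.use_switch_flag:
            self.vm_switch_pending = True

    def sched_unlock(self):
        # Assumption: a nested call's unlock clears PENDING unless the
        # switch flag says a VM switch is still outstanding.
        self.sched_state = PENDING if self.vm_switch_pending else IDLE

    def branch_point(self):
        # The slow path is only entered when the state is PENDING.
        if self.sched_state != PENDING:
            return False
        self.sched_state = IDLE
        if self.trace_callback is not None:
            self.vm_switch_pending = False
            return True  # the caller would switch VMs here
        return False


def switch_happens(use_switch_flag):
    model = SchedModel(use_switch_flag)
    model.settrace(print)
    model.sched_unlock()  # a nested call runs before the next branch
    return model.branch_point()


print("without flag:", switch_happens(False))
print("with flag:   ", switch_happens(True))
```

In this model the run without the flag never switches because the nested call has already reset the state, while the run with the flag keeps the state pending until the branch point consumes it.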

The dispatcher in vm_outer.c also checks prof_callback_is_executing before selecting the standard VM - this fixes an assert that fires when settrace(None) is called mid-function while the trace callback itself is executing.
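The dispatcher's role can be sketched as a loop that re-selects the VM each time the running VM asks for a switch. The model below is plain Python with hypothetical names (`State`, `run_vm`, `dispatch`, `wants_tracing_vm`); it is not the code in vm_outer.c. For simplicity it assumes both VMs check for a switch only at branch points, and it treats "a trace callback is executing" as a reason to stay on the tracing VM.

```python
from dataclasses import dataclass, field

NORMAL, SWITCH_VM = "normal", "switch_vm"  # stand-ins for VM return kinds


@dataclass
class State:
    program: list  # items: "op", "branch", or ("settrace", callback)
    pc: int = 0
    trace_callback: object = None
    callback_is_executing: bool = False
    log: list = field(default_factory=list)  # which VM ran each instruction


def wants_tracing_vm(state):
    # Never pick the standard VM while a trace callback is executing.
    return state.trace_callback is not None or state.callback_is_executing


def run_vm(state, tracing):
    name = "tracing" if tracing else "standard"
    while state.pc < len(state.program):
        op = state.program[state.pc]
        state.pc += 1
        state.log.append(name)
        if isinstance(op, tuple):
            state.trace_callback = op[1]  # models sys.settrace(callback)
        elif op == "branch" and wants_tracing_vm(state) != tracing:
            return SWITCH_VM  # state (pc) is already saved in `state`
    return NORMAL


def dispatch(state):
    # Loop on SWITCH_VM, re-selecting the VM each time.
    while True:
        kind = run_vm(state, tracing=wants_tracing_vm(state))
        if kind != SWITCH_VM:
            return kind


state = State(program=["op", ("settrace", print), "op", "branch",
                       "op", ("settrace", None), "op", "branch", "op"])
result = dispatch(state)
print(result, state.log)
```

In this model, the four instructions up to and including the first branch run on the standard VM, the next four run on the tracing VM, and the final instruction runs on the standard VM again after `settrace(None)` takes effect at the second branch.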

The webassembly/api.js commit is a separate fix for CI breakage in the base branch. Commit f6f572b ("webassembly/api: Fix CLI") changed the stdin guard from process.stdin.isTTY === false to !process.stdin.isTTY, which made it also match undefined (non-TTY CI environments). This causes runCLI() to block on fs.readFileSync(0) even when file arguments are provided. The fix adds && repl so stdin is only read when no file args were given and the REPL would otherwise start.

For threaded builds with the GIL, the calling thread holds the GIL from settrace() through to the pending exception check, so it always processes its own signal. On non-GIL threaded builds there's a theoretical race where another thread could consume sched_state first, degrading to function-boundary switching.

Size cost on unix/coverage x86-64: +816 bytes .text, +128 bytes .bss.

Testing

8 test cases in tests/misc/sys_settrace_midfunction.py covering: enable/disable mid-loop, nested calls after branch, toggle on/off/on, exception handling, return value integrity, generator mid-iteration, and self-disabling trace callback. Tested on unix/coverage only.

Note: the existing sys_settrace_features.py, sys_settrace_generator.py, sys_settrace_loop.py and sys_settrace_cov.py tests fail on the base dual-VM branch (before these commits). The cause is <listcomp> frame tracking: the dual-VM's standard VM copy maintains the current_code_state frame chain, so listcomps appear as separate call/return frames in trace output. This is correct for MicroPython's execution model (listcomps are separate bytecode functions) and matches CPython 3.11 and earlier, but diverges from CPython 3.12+, which inlines comprehensions (PEP 709). The tests have no .exp files, so their output is compared against the host CPython, which is 3.12 in CI.
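The version dependence can be checked on any host CPython. The snippet below is a minimal sketch with hypothetical helper names (`traced_function_names`, `uses_listcomp`); it collects the names of functions that receive a "call" trace event while a list comprehension runs.

```python
import sys


def traced_function_names(func):
    """Return the co_name of every frame that gets a 'call' event."""
    names = []

    def tracer(frame, event, arg):
        if event == "call":
            names.append(frame.f_code.co_name)
        return None  # line events are not needed here

    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(None)
    return names


def uses_listcomp():
    return [x * 2 for x in range(3)]


names = traced_function_names(uses_listcomp)
print(sys.version_info[:2], names)
```

On a CPython release that predates PEP 709 the list should include a separate `<listcomp>` entry; on 3.12 and later the comprehension runs inside the frame of `uses_listcomp`, so only that name is expected.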

Trade-offs and Alternatives

The main trade-off is 816 bytes of .text (measured on unix/coverage x86-64) for a feature that may not be needed if function-boundary switching is sufficient. Alternatives considered during development:

  • Direct prof_trace_callback read on the hot path - adds overhead for all bytecode execution, not just settrace users.
  • Computed goto table swap - not portable to WebAssembly (no computed goto).
  • mp_sched_schedule - disproportionate overhead for a simple flag check.
  • Keeping sched_state PENDING unconditionally - starves GIL bounce on threaded builds; evolved into the vm_switch_pending approach.

Generative AI

I used generative AI tools when creating this PR, but a human has checked the code and is responsible for the description above.

@andrewleech

The webassembly CI was failing because commit f6f572b ("webassembly/api: Fix CLI.") changed the stdin check from process.stdin.isTTY === false to !process.stdin.isTTY. The original condition was effectively dead code: isTTY is true on a TTY or undefined on a pipe, never literally false. The new condition correctly handles the undefined case, but it means the stdin read now fires whenever the binary is invoked via subprocess (pipe), and the = assignment overwrites any file contents already loaded from argv.

When the test runner does subprocess.check_output(["node", "micropython.mjs", "script.py"]), stdin is a pipe with no data, so fs.readFileSync(0) returns "", clobbering the script. Platform detection in run-tests.py gets empty output and fails with ValueError: cannot detect platform.

Fixed by adding && repl to the condition. The repl flag is set false during the argv loop when a file argument is consumed, so stdin is only read as source code when no files were passed on the command line.
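The three versions of the guard can be compared as a truth table. The sketch below models the JavaScript conditions in Python, using `None` as a stand-in for `undefined`; the function names are hypothetical and the real code lives in webassembly/api.js.

```python
def old_guard(is_tty):
    # Models `process.stdin.isTTY === false`.
    return is_tty is False


def broad_guard(is_tty):
    # Models `!process.stdin.isTTY` (also true for undefined).
    return not is_tty


def fixed_guard(is_tty, repl):
    # Models `!process.stdin.isTTY && repl`.
    return (not is_tty) and repl


cases = [
    ("terminal, no file args", True, True),
    ("terminal, file arg", True, False),
    ("pipe, no file args", None, True),
    ("pipe, file arg", None, False),
]

for label, is_tty, repl in cases:
    print(f"{label:24} old={old_guard(is_tty)!s:5} "
          f"broad={broad_guard(is_tty)!s:5} fixed={fixed_guard(is_tty, repl)}")
```

The row that matters for the test runner is "pipe, file arg": the broad guard reads stdin and replaces the script, while the fixed guard leaves the file contents alone. Reading stdin when piped with no file arguments still works with the fixed guard.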

@andrewleech

Current-frame tracing: CPython compatibility and enhancement options

With dual-VM, sys.settrace(callback) doesn't produce trace events for the currently executing function or anything already on the call stack — tracing only starts at the next function call boundary when the dispatcher selects the tracing VM. This matches CPython's behaviour exactly. CPython's docs state explicitly that settrace() doesn't activate tracing on the current frame; bdb.set_trace() works around this by walking the frame stack and setting frame.f_trace on every frame manually.
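CPython's behaviour and its frame-level workaround can be seen side by side. The sketch below is plain Python for a CPython host, with hypothetical function names (`tracer`, `settrace_only`, `settrace_with_f_trace`); the second function sets `f_trace` on its own frame before calling `sys.settrace()`, which is a reduced form of what `bdb.set_trace()` does for every frame on the stack.

```python
import sys

events = []


def tracer(frame, event, arg):
    events.append((event, frame.f_code.co_name))
    return tracer


def settrace_only():
    sys.settrace(tracer)  # the current frame is not traced
    x = 1
    x += 1
    sys.settrace(None)
    return x


def settrace_with_f_trace():
    # Opt the current frame in explicitly, then enable tracing globally.
    sys._getframe().f_trace = tracer
    sys.settrace(tracer)
    y = 1
    y += 1
    sys.settrace(None)
    return y


settrace_only()
settrace_with_f_trace()

for event in events:
    print(event)
```

Only the second function should contribute line events, since its frame has a local trace function installed. `sys._getframe()` is a CPython implementation detail, so this is a host-side illustration rather than something the MicroPython tests would run.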

If we want to go beyond CPython's default in the future, there are a few options:

Option A — Auto-propagate frames on settrace: Walk the prev_state chain in mp_prof_settrace() and fire synthetic "call" events for all existing frames. Sets up valid frame objects for introspection but doesn't give line events for functions still in the standard VM. Small change in py/profile.c.

Option B — VM switch at branch points: Piggyback on the standard VM's pending_exception_check (runs at backward branches, conditionals, for-iter). When prof_trace_callback is non-NULL, save state and return a new MP_VM_RETURN_SWITCH_VM to the dispatcher, which re-enters the tracing VM. Gives full line-level tracing of existing frames from the next branch point. Zero per-opcode overhead. Would need care to suppress the spurious "call" event when re-entering a mid-execution function. Could also handle the reverse (tracing→standard on settrace(None)).

Option C — Expose frame.f_trace as settable: Let Python code do frame.f_trace = callback, matching CPython's escape hatch. Debuggers would walk the frame stack themselves, identical to bdb.set_trace().

Option D — No changes (current recommendation): Current behaviour is CPython-compatible. Revisit if a concrete debugger use case requires it.

These are documented in docs/settrace-dual-vm-spec.md.

pi-anl added 3 commits March 12, 2026 14:03
Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
When sys.settrace(callback) is called while a function is already
running in the standard VM, switch to the tracing VM at the next branch
instruction rather than waiting for the next function call.

Adds MP_VM_RETURN_SWITCH_VM to the return kind enum.  The dispatcher in
vm_outer.c loops on this value to re-select the VM.  The switch triggers
at pending_exception_check in vm.c (after jumps, conditionals, for-iter).

The check placement is asymmetric for zero hot-path overhead in the
standard VM: settrace() bumps sched_state to MP_SCHED_PENDING, which
triggers the existing slow-path block where the prof_trace_callback check
lives.  The tracing VM has the reverse check on its hot path, which is
acceptable since tracing already has per-instruction overhead.

A vm_switch_pending flag in mp_state_vm_t keeps sched_state at PENDING
(via mp_sched_unlock) until the standard VM actually switches, preventing
nested function calls from consuming the signal.

The dispatcher also checks prof_callback_is_executing to avoid selecting
the standard VM while a trace callback is executing, which would hit an
assert in FRAME_ENTER.

With GIL threading, the calling thread holds the GIL from settrace()
through to pending_exception_check, so it always processes its own signal.
On non-GIL threaded ports, another thread may consume the sched_state
signal first, degrading to function-boundary switching.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Tests mid-function VM switching: enable/disable mid-loop, nested calls
after branch, toggle on/off/on, exception handling, return value
integrity, generator mid-iteration switch, and self-disabling trace
callback.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
@andrewleech andrewleech force-pushed the py-settrace-dual-vm branch from c0861e5 to db849c1 March 17, 2026 01:30
@andrewleech andrewleech changed the title from "py: Add a new dual-VM option to improve performance of sys.settrace" to "py/vm: Add branch-point VM switching for dual-VM settrace." Mar 17, 2026
@andrewleech andrewleech force-pushed the review/py-settrace-dual-vm branch from c3ca843 to bf7cab6 March 17, 2026 01:41
@github-actions

Code size report:

Reference:  rp2/modules/rp2.py: Don't corrupt globals on asm_pio() exception. [c3ca843]
Comparison: tests: Add sys_settrace_midfunction test. [merge of db849c1]
  mpy-cross:    +0 +0.000% 
   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
      esp32:    +0 +0.000% ESP32_GENERIC
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32
