Description
I’ve been exploring rv32emu's interpreter mode and found that while the RVOP macro is designed to support MUST_TAIL tail calls via:
MUST_TAIL return next->impl(rv, next, cycle, PC);
In practice, the compiler often emits full stack frames and callee-saved register spills, which breaks the intended tail call optimization (TCO). This happens even though the control flow would otherwise be eligible for a proper tail jump.
For example, in a simple handler like do_addi, the compiler emits a proper tail call via jmp *%rax, and avoids saving any callee-saved registers:
af38: ff e0 jmp *%rax # tail call to next->impl
However, in do_lw, due to a non-tail call (e.g., memory access through a function pointer), the compiler emits callee-saved register preservation and a full frame:
aaa0: 55 push %rbp
aaa1: 41 57 push %r15
...
ab75: c3 ret
This overhead defeats the benefits of using MUST_TAIL, especially in a hot loop of instruction dispatch.
## Suggestion
Clang provides `__attribute__((preserve_none))` (available since 19.1.0), which changes the calling convention so that the callee does not have to preserve any registers. It’s suitable when the function is only ever tail-called and doesn’t return via ret.
This attribute allows better register usage and should remove the callee-saved pushes and pops seen in handlers like do_lw, which makes it a good fit for interpreter dispatch functions.
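A minimal sketch of what this could look like, assuming a toy interpreter rather than rv32emu's real types: `vm_t`, `insn_t`, `handler_t`, `PRESERVE_NONE`, and `run_demo` are hypothetical names, and the handler signature is simplified compared with the real RVOP one. The attributes are guarded with `__has_attribute` so the code still builds, without the optimization, on compilers that lack them. Note that the attribute has to appear on the function pointer type as well as on every handler, because a must-tail call requires caller and callee to use the same calling convention.

```c
#include <stdint.h>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Fall back to the default calling convention when unsupported. */
#if __has_attribute(preserve_none)
#define PRESERVE_NONE __attribute__((preserve_none))
#else
#define PRESERVE_NONE
#endif

/* Fall back to a plain return (the compiler may still tail-call it). */
#if __has_attribute(musttail)
#define MUST_TAIL __attribute__((musttail))
#else
#define MUST_TAIL
#endif

typedef struct insn insn_t;

/* Hypothetical VM state for this sketch only. */
typedef struct {
    int64_t acc;
    uint64_t cycles;
} vm_t;

/* The calling convention is part of the function pointer type. */
typedef int (PRESERVE_NONE *handler_t)(vm_t *vm, const insn_t *ir,
                                       uint64_t cycle);

struct insn {
    handler_t impl;
    int32_t imm;
};

/* Terminates the chain: the only handler that actually returns. */
static PRESERVE_NONE int do_halt(vm_t *vm, const insn_t *ir, uint64_t cycle)
{
    (void) ir;
    vm->cycles = cycle;
    return 0;
}

/* Does its work, then jumps to the next handler. */
static PRESERVE_NONE int do_addi(vm_t *vm, const insn_t *ir, uint64_t cycle)
{
    vm->acc += ir->imm;
    const insn_t *next = ir + 1; /* simplification: instructions are contiguous */
    MUST_TAIL return next->impl(vm, next, cycle + 1);
}

/* Ordinary function that enters the dispatch chain with a normal call. */
int64_t run_demo(void)
{
    const insn_t prog[] = {
        {do_addi, 5}, {do_addi, 7}, {do_addi, -2}, {do_halt, 0},
    };
    vm_t vm = {0, 0};
    prog[0].impl(&vm, &prog[0], 0);
    return vm.acc;
}
```

Whether the prologue and epilogue actually disappear in a handler that also makes a non-tail call (as do_lw does) would need to be confirmed by inspecting the disassembly of the real handlers after the change.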