Initial strawperson proposal for debugging modules #6
* **`SourceDebugger`:** The `SourceDebugger` interface provides source-level
debugging APIs for inspecting the debuggee Wasm module. It is implemented by
the debugging module, translates between source-level information and
Wasm-level information, and wraps a `WasmDebugger` instance.
The term `WasmDebugger` in the prior bullet refers to an interface. Should this be `Debugging module` as defined in the 2nd bullet?
Yeah, I suppose it is wrapping both the `WasmDebugger` (since it is translating methods from source-level into Wasm-level method calls on its wrapped `WasmDebugger`) and internal tables/routines of the debugging module (which are used for that translation).
Do you have any suggestions on wording or clarifications we can make?
When a user asks the debugger to set a source-level breakpoint, the debugger
should perform the following steps:

* Get the `SourceDebugger` associated with the breakpoint's source.
I'd assume this should be a collection of `SourceDebugger`s if there is one per `Debuggee Module`. For example, I can imagine common headers with inline functions (e.g. `<string>`) would result in the same source running in multiple modules.
Great point! This should work out since even if we end up setting many breakpoints in the "same" source that end up in different debuggee modules, only one Wasm breakpoint is ever hit at a time.
I'll make an update in a little bit.
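A minimal sketch of the fan-out described above, assuming one `SourceDebugger` per debuggee module and a registry keyed by source name. The names `FakeSourceDebugger`, `DebuggerRegistry`, and `set_breakpoint` are hypothetical and not part of the proposal.

```python
from collections import defaultdict


class FakeSourceDebugger:
    """Hypothetical stand-in for a per-module SourceDebugger."""

    def __init__(self, module_name):
        self.module_name = module_name
        self.breakpoints = set()

    def set_breakpoint(self, source, line):
        # A real implementation would translate (source, line) to Wasm offsets.
        self.breakpoints.add((source, line))


class DebuggerRegistry:
    """Maps each source file to every SourceDebugger whose module contains it."""

    def __init__(self):
        self._by_source = defaultdict(list)

    def register(self, source, debugger):
        self._by_source[source].append(debugger)

    def set_source_breakpoint(self, source, line):
        # Fan out: the same header may be inlined into several modules.
        debuggers = self._by_source.get(source, [])
        for dbg in debuggers:
            dbg.set_breakpoint(source, line)
        return [dbg.module_name for dbg in debuggers]


registry = DebuggerRegistry()
a, b = FakeSourceDebugger("a.wasm"), FakeSourceDebugger("b.wasm")
registry.register("<string>", a)
registry.register("<string>", b)
hit_modules = registry.set_source_breakpoint("<string>", 42)
```

Here one source-level request ends up as a breakpoint in both modules, while a source that no module contains yields an empty list.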
@@ -0,0 +1,224 @@
typedef unsigned long WasmBreakpointId;
typedef unsigned long WasmCodeOffset;
I believe `unsigned long` is 32-bits in WebIDL https://www.w3.org/TR/WebIDL-1/#idl-unsigned-long. Is this specific to Wasm32?
It's true, I wasn't thinking of a future wasm 64. Even with wasm 64, I'd be a bit surprised if a code section was larger than 2^32 - 1 bytes, but I suppose we shouldn't artificially limit ourselves in that situation.
It isn't clear to me whether we want to eagerly make this a 64 bit integer or not before wasm 64 is a thing, however.
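For reference, the ranges in question can be written out directly. This is only arithmetic on the Web IDL integer widths (`unsigned long` is 32 bits, `unsigned long long` is 64 bits); the helper name `fits_unsigned_long` is hypothetical.

```python
# Largest values representable by the two Web IDL unsigned integer types.
U32_MAX = 2**32 - 1   # unsigned long
U64_MAX = 2**64 - 1   # unsigned long long


def fits_unsigned_long(code_offset: int) -> bool:
    """True if a code offset is representable as a Web IDL `unsigned long`."""
    return 0 <= code_offset <= U32_MAX


# A code section just under 4 GiB still fits; one byte past it does not.
largest_ok = fits_unsigned_long(U32_MAX)
first_too_big = fits_unsigned_long(U32_MAX + 1)
```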
A wire protocol requires defining the same set of operations we want to support
that we define as interface methods in this proposal, and *also* a serialization
format. Defining a serialization format that is both compact and
future-extensible is no small task. Additionally, nothing about source-level
I'm sure you could use an existing serialization format (Protobuf, FlatBuffers, Cap'n Proto..) that is already future extensible?
that is often a good architectural decision. Implementations are free to proxy
this proposal's interface method calls across a protocol or to another
process. It doesn't make sense to bake a specific wire protocol into the
standard, when it can be left as an implementation detail.
That seems entirely a question of perspective. I could say, "why couple this tightly to a specific API that needs to be explicitly forwarded and implemented, when you could simply have a protocol message that is trivially forwardable, inspectable, loggable, forwards and backwards compatible, extensible and can be interfaced with in a wide range of languages with already available code generators?" :)
> with [WebAssembly Interface
> Types](https://github.com/WebAssembly/interface-types/). However, since that
> standard is still coming together, we are temporarily describing the
> interfaces with Web IDL.
It could be more clearly noted earlier in the document that this is not meant to be specific to an embedding that uses JS, as that is the impression I got from this and earlier documents until I saw this.
Out of interest... was there any "prior art" that went into this, or was researched for comparison? For example, VS Code based their debug adapters on, and documented, the Debug Adapter Protocol (https://microsoft.github.io/debug-adapter-protocol/overview), which is pretty well battle-tested at this point as an interface/protocol for various languages to engines. There may be some hard-learned lessons in there we can leverage. (Or just use it as a base, which would make integration with a few IDEs relatively easy.)
As mentioned in the subgroup meeting today, it is based on my experience with DWARF, source maps, and SpiderMonkey's debugging API. I agree that it is worth looking into how VS Code solves cross-language support problems and taking inspiration from there.
We discussed this a little in the subgroup meeting today. Even if we use an existing serialization format, we would need some sort of calling convention describing how to get the serialized buffer into the debugging module's memory, as well as taking new serialized messages out of it. This would require understanding malloc vs pre-allocated space, ownership, etc... and it turns out this is exactly the type of thing that WebAssembly Interface Types is already solving for us. So why not fully leverage WebAssembly Interface Types directly?
I was imagining that when the embedder uses a debugging module to create a This is definitely something that we should clarify moving forward.
Oh also! As far as representing debug info programmatically, Norman Ramsay's
Both FlatBuffers and Cap'n Proto allow accessing serialized data in place, so beyond the initial buffer, there is no allocating/owning etc going on.
It will be a while before these can express data structures as rich as what these serializers can do, besides all the other advantages I mentioned.
Once a Wasm module has access to the serialized data in its memory, yes it can deserialize it in place in a zero-copy fashion. But how does it get that initial access to the serialized data? That requires some sort of copy into some region of linear memory, and doing that requires understanding malloc/ownership/etc.
Yes, and why is that a problem? Surely there's plenty of Wasm modules out there that exchange buffers with the outside world just fine? I can't imagine this would be a major factor in deciding between using a protocol/serialized format or a set of API calls.
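To make the hand-off under discussion concrete, here is a toy sketch in which a `bytearray` stands in for the debugging module's linear memory and a bump allocator stands in for an exported `malloc`. Every name here is hypothetical; a real module would export its own allocator and define its own ownership rules.

```python
class ToyLinearMemory:
    """A bytearray standing in for a Wasm module's linear memory."""

    def __init__(self, size):
        self.data = bytearray(size)
        self._next_free = 0  # bump-allocator cursor

    def alloc(self, length):
        # Stand-in for an exported allocator; this toy version never frees.
        if self._next_free + length > len(self.data):
            raise MemoryError("toy linear memory exhausted")
        ptr = self._next_free
        self._next_free += length
        return ptr

    def write_message(self, payload: bytes):
        # The one unavoidable copy: embedder bytes -> linear memory.
        ptr = self.alloc(len(payload))
        self.data[ptr:ptr + len(payload)] = payload
        return ptr, len(payload)

    def view(self, ptr, length):
        # Read in place without copying, as an in-place deserializer would.
        return memoryview(self.data)[ptr:ptr + length]


memory = ToyLinearMemory(64)
ptr, length = memory.write_message(b"set-breakpoint:42")
message = bytes(memory.view(ptr, length))
```

The `(ptr, length)` pair is the calling convention both sides have to agree on; who frees the region afterwards is the ownership question raised above.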
Regarding the need to register a callback function that the debugger module should use to talk with the debuggee... I agree that we should clarify this moving forward, but this is quite important in my opinion because it really impacts the design of the architecture. The idea is that the debugging module (DM) would expose WebIDL interfaces (like SourceDebugger) to the debugger/devtools, and this is very clear. Then, the wasm engine would expose WebIDL interfaces (like WasmDebugger) to the DM, but I think we need to clarify how the DM will call this interface. It could be the embedder's responsibility to register somehow the WasmDebugger interface to the DM, but how? The wasm engine runs in a different process from the DM, and possibly even in a different machine. Would that mean that the embedder also needs to implement the same WasmDebugger interface, register it to the DM, and then forward every call remotely to the actual wasm engine, which also implements the same interface? The mechanism that the embedder/debugger will use to communicate with the wasm engine could vary considerably; if the debugger is in the browser DevTools, it already has its own channel to communicate with the script engine running in the debuggee webpage, but if the debugger is a standalone app like LLDB we could use to debug wasmtime, we would need to introduce a new mechanism to communicate with the wasm engine. This is why I proposed that we should at least define, as part of the DM interface, a function that the embedder would use to register a callback function that the DM will invoke to send messages to the debuggee engine, so abstracting the concrete communication channel. It would then be up to the embedder to actually send the messages to the debuggee using its own channels. 
For these reasons, I am totally fine with the SourceDebugger interfaces exposed by the DM and consumed by the debugger/embedder, but I am not so sure about the WasmDebugger interface that should be exposed by the engine.
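One possible reading of the callback idea above, as a sketch: the debugging module exposes a registration function, and the embedder supplies whatever channel it already has. The names `DebuggingModule`, `register_transport`, and the JSON message shape are assumptions for illustration, not part of the proposal.

```python
import json


class DebuggingModule:
    """Hypothetical DM that reaches the engine only through a callback."""

    def __init__(self):
        self._send = None

    def register_transport(self, send):
        # `send` takes a serialized request and returns a serialized reply.
        self._send = send

    def set_wasm_breakpoint(self, offset):
        if self._send is None:
            raise RuntimeError("no transport registered")
        request = json.dumps({"op": "setBreakpoint", "offset": offset})
        return json.loads(self._send(request))["breakpointId"]


def fake_engine_channel(request):
    # Stand-in for the embedder's own channel (DevTools pipe, socket, ...).
    msg = json.loads(request)
    return json.dumps({"breakpointId": msg["offset"] + 1000})


dm = DebuggingModule()
dm.register_transport(fake_engine_channel)
breakpoint_id = dm.set_wasm_breakpoint(16)
```

The debugging module never learns whether the engine is in the same process or on another machine; that choice stays with the embedder.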
I pushed two more commits, notably 6d74a42 which expands on the "why not a protocol?" FAQ question with this additional text:
@paolosevMSFT, thanks for your patience while waiting for my response. Replies inline below.
The embedder is creating the
Yes, these are things that we want to ensure are possible for implementations to do, but also aren't required of all implementations.
Yes, if an implementation is running the debugging module inside the browser devtools frontend, it will need to proxy calls over the browser's internal remote debugging protocol, similar to how existing JS debugging calls are proxied from the devtools frontend to the JS engine's debugging APIs.
This proposal is explicitly not attempting to support the use case of "just connect LLDB to the Wasm engine's generated native code". See the FAQ item about AOT compilation. It is true that if
For any event that originates within the debugging module itself, yes, we would need a way for the debugging module to notify the embedder of the event (such as giving it a callback). But what sorts of events originate in the debugging module, as opposed to originating in the debuggee or being explicitly requested by the embedder?
The gdb remote protocol is not a specified standard in any meaningful sense. It is more of a documented implementation. Additionally, in the same way that DWARF won't work off the shelf with Wasm, and requires extensions to model the Wasm execution stack, locals, and globals, the gdb remote protocol would also need to be modified. I remain unconvinced that standardizing a protocol is the right choice, let alone any existing native debugger's protocol.
@fitzgen
I am not sure what the
Also:
Maybe I am missing something here, but even if we ignore the case of remote debugging, I don't think there will ever be any implementation where the But if this is the case, shouldn't the design anticipate the need that the communication between the DM and the debuggee always needs to be proxied? The communication channel between DM and debuggee should be a central aspect of this spec, since we need to cover very different debugging scenarios. At the very least, we should mention that the
There are several requests made by the embedder that the debugging module can satisfy only by querying the state of the wasm engine, thereby originating query events.
Also, to support source-level stepping (which is fairly complicated) the DM could need to have access to the whole module bytecode, to be able to disassemble instructions in the current function. It could then make requests to add (or remove) breakpoints at specific locations in the Wasm module:
All these requests originate in the debugging module. This is why my proposal was that while the
It is true that DWARF won't work off the shelf with Wasm, but the changes required are very small, and even if the proposal wisely hides the debug information details behind a clean API, I don't think that realistically we will have other options if we want to debug Clang-generated code. To summarize, my main concern is that we should define more clearly the communication mechanism between the debugging module and the debuggee engine.
* Call `dbg.onStep` with the `SourceStepOptions`
* The `onStep` implementation should use the `WasmDebugger` to set Wasm
  breakpoint(s) where it determines execution should pause after taking the
  requested step.
It is not possible to implement "step-over" just by setting a single breakpoint, because the debugger does not know which source line will be executed next. It might not be the one directly succeeding the current line, as we could be in a loop, or some conditional construct.
Debuggers can examine what instructions are being executed and work out all of the possible branch targets, setting breakpoints on all of them. To do this, the debugging module should be able to disassemble the wasm module bytecode (which means that it also needs access to that bytecode).
A simpler (but very inefficient) alternative can be to send a number of instruction-level 'step' requests to the engine, until a different line is reached.
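Both strategies can be sketched on a toy control-flow table. This is an illustration only: `PROGRAM` is an invented mapping from instruction offset to source line and successor offsets, not real Wasm bytecode, and calls (the difference between step-over and step-into) are ignored.

```python
# Toy "function": offset -> (source line, possible successor offsets).
# A conditional branch has two successors; straight-line code has one.
PROGRAM = {
    0: (10, [1]),
    1: (10, [2, 4]),   # conditional branch on line 10
    2: (11, [3]),
    3: (11, [1]),      # loop back-edge to line 10
    4: (12, []),       # return
}


def step_over_targets(start):
    """Offsets needing a breakpoint so that a step from `start` stops on the
    first instruction that belongs to a different source line."""
    start_line = PROGRAM[start][0]
    targets, seen, worklist = set(), {start}, [start]
    while worklist:
        offset = worklist.pop()
        for succ in PROGRAM[offset][1]:
            if PROGRAM[succ][0] != start_line:
                targets.add(succ)      # leaves the current line: stop here
            elif succ not in seen:
                seen.add(succ)         # same line: keep exploring
                worklist.append(succ)
    return targets


def step_by_single_stepping(start, choose):
    """Slow alternative: single-step until the line changes.
    `choose` picks the taken successor, standing in for real execution."""
    start_line = PROGRAM[start][0]
    offset = start
    while PROGRAM[offset][1]:
        offset = choose(PROGRAM[offset][1])
        if PROGRAM[offset][0] != start_line:
            break
    return offset
```

Stepping from offset 0 needs breakpoints at both offsets 2 and 4, because the branch on line 10 can go either way; stepping from offset 2 needs one at offset 1, the loop back-edge.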
debugging *requires* over-the-wire communication or message passing, even if
that is often a good architectural decision. Implementations are free to proxy
this proposal's interface method calls across a protocol or to another
process. It doesn't make sense to bake a specific wire protocol into the
I think that implementations will always need to proxy the interface method calls across a protocol, because the debugger and debuggee will always be in different processes.
This does not mean that we necessarily need to add a specific wire protocol into the standard, but the standard should mention how the Debugger Module will communicate with the debuggee.
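As an illustration of proxying interface method calls without standardizing the wire format, the sketch below forwards any method call as a JSON message over a pluggable transport. `EngineWasmDebugger`, `RemoteProxy`, `make_loopback`, and the `setBreakpoint` method are hypothetical names; the loopback function stands in for a real socket or pipe.

```python
import json


class EngineWasmDebugger:
    """Hypothetical engine-side object; method names are illustrative."""

    def __init__(self):
        self.breakpoints = []

    def setBreakpoint(self, offset):
        self.breakpoints.append(offset)
        return len(self.breakpoints) - 1  # breakpoint id


class RemoteProxy:
    """Turns any method call into a JSON message sent through `transport`."""

    def __init__(self, transport):
        self._transport = transport

    def __getattr__(self, method):
        def call(*args):
            request = json.dumps({"method": method, "args": list(args)})
            return json.loads(self._transport(request))["result"]
        return call


def make_loopback(target):
    # Stand-in for a socket or pipe: decode, dispatch, encode the reply.
    def transport(request):
        msg = json.loads(request)
        result = getattr(target, msg["method"])(*msg["args"])
        return json.dumps({"result": result})
    return transport


engine = EngineWasmDebugger()
wasm_debugger = RemoteProxy(make_loopback(engine))
first_id = wasm_debugger.setBreakpoint(128)
second_id = wasm_debugger.setBreakpoint(256)
```

The caller sees the same method-call surface either way; only the transport function changes between an in-process and a cross-process setup.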
* **Debuggee module:** A debuggee module is the Wasm module that is being
  debugged or profiled.
* **Debugging module:** A debugging module is referenced from its debuggee Wasm
  module, and is a separate Wasm module that translates between Wasm-level
Could the requirement that a debugging module be a Wasm module be problematic? When we start implementing debugging modules for DWARF, for example, we could leverage existing code from LLDB/LLVM that will give us the ability to parse DWARF files, decode DWARF information, demangle names according to the source language specified, support multiple languages, evaluate expressions to determine the value of source-level variables, and so on.
Modifying all this existing code so that it compiles to Wasm could be a daunting task. It would be much simpler to have a 'native' debugging module and define a (websocket-based?) debugging protocol to communicate with it.
+1 to this. Probably we should just leave it as an implementation detail.
Disclaimer: This is very much a work-in-progress and nothing I've written up is set in stone! My hope is that we can merge this PR and continue design in the form of follow-up PRs, issue discussions, and video meetings.