Remove the native ABI calling convention from Wasmtime #8629

Merged on May 17, 2024 (2 commits)

alexcrichton
Member

This commit proposes removing the "native ABI" calling convention used in Wasmtime. For background, this ABI dates back to the origins of Wasmtime. Originally Wasmtime only had Func::call, and eventually I added TypedFunc with TypedFunc::call and Func::wrap for a faster path. At the time, given the state of trampolines, it was easiest to call WebAssembly code directly, without any trampolines, using the native ABI that wasm used at the time. This is the original source of the native ABI, and it has persisted under the assumption that it's faster than the array ABI because it keeps arguments in registers rather than spilling them to the stack.

Over time, however, the decision to use the native ABI has not aged well. Trampolines have changed quite a lot in the meantime, and it's no longer possible for the host to call wasm without a trampoline, for example. Compilations nowadays maintain both native and array trampolines for wasm functions in addition to host functions. There's a large split between Func::new and Func::wrap. Overall, we're carrying quite a lot of weight for the design decision of using the native ABI.

Functionally this hasn't ever really been the end of the world. Trampolines aren't a known issue in terms of performance or code size, and there's no known faster way to invoke WebAssembly from the host (or vice versa). One major downside of this design, however, is that Func::new requires Cranelift to exist as a backend, because it needs to synthesize various entries in the matrix of ABIs we have that aren't available at any other time. While this is not itself the worst of issues, it means that the C API cannot be built without a compiler, because the C API does not have access to Func::wrap.

Overall, given where Wasmtime is today, I'd like to reevaluate whether it makes sense to keep the native ABI trampolines. Sure, they're supposed to be fast, but are they really that much faster than the array-call ABI as an alternative? This commit is intended to measure this.

This commit removes the native ABI calling convention entirely. For example, VMFuncRef is now one pointer smaller, and all of TypedFunc now uses *mut ValRaw for loads/stores rather than dealing with ABI-specific details. The benchmarks with this PR are:

  • sync/no-hook/core - host-to-wasm - typed - nop - 5% faster
  • sync/no-hook/core - host-to-wasm - typed - nop-params-and-results - 10% slower
  • sync/no-hook/core - wasm-to-host - typed - nop - no change
  • sync/no-hook/core - wasm-to-host - typed - nop-params-and-results - 7% faster

These numbers are a bit surprising: I would have expected no change in both "nop" benchmarks, and both params-and-results benchmarks to be slower. Regardless, it is apparent that this is not a major change in terms of performance given Wasmtime's current state. In general my hunch is that there are more expensive sources of overhead than reads/writes from the stack when dealing with wasm values (e.g. trap handling, store management, etc.).
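
For readers unfamiliar with the array ABI, here is a minimal sketch of the idea, not Wasmtime's actual definitions. `RawSlot` is a hypothetical stand-in for `ValRaw`, the `ArrayCall` signature omits the context pointers a real trampoline would also receive, and the `FuncRefBefore`/`FuncRefAfter` field names and order are illustrative guesses used only to show what "one pointer smaller" means.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `ValRaw`: one untagged slot per wasm value.
#[derive(Clone, Copy)]
#[repr(C)]
pub union RawSlot {
    pub i32: i32,
    pub i64: i64,
    pub f32_bits: u32,
    pub f64_bits: u64,
}

/// Array-call shape (simplified): arguments are read from `values` and
/// results are written back into the same buffer.
pub type ArrayCall = unsafe extern "C" fn(values: *mut RawSlot, len: usize);

/// A callee with wasm signature `(i32, i32) -> i32` in the array convention.
///
/// Safety: `values` must point to at least two initialized `i32` slots.
pub unsafe extern "C" fn add_array(values: *mut RawSlot, len: usize) {
    debug_assert!(len >= 2);
    unsafe {
        let a = (*values.add(0)).i32;
        let b = (*values.add(1)).i32;
        // The single result overwrites slot 0.
        (*values.add(0)).i32 = a.wrapping_add(b);
    }
}

/// Host side: spill arguments to a stack buffer, call, read the result back.
pub fn call_add(f: ArrayCall, a: i32, b: i32) -> i32 {
    let mut space = [RawSlot { i32: a }, RawSlot { i32: b }];
    unsafe {
        f(space.as_mut_ptr(), space.len());
        space[0].i32
    }
}

/// Illustrative layout with a native-ABI entry point (assumed field names).
#[repr(C)]
pub struct FuncRefBefore {
    pub native_call: *const u8,
    pub array_call: *const u8,
    pub wasm_call: *const u8,
    pub type_index: u32,
    pub vmctx: *const u8,
}

/// The same layout with the native-ABI entry point removed.
#[repr(C)]
pub struct FuncRefAfter {
    pub array_call: *const u8,
    pub wasm_call: *const u8,
    pub type_index: u32,
    pub vmctx: *const u8,
}

/// Number of pointer-sized words saved by dropping `native_call`.
pub fn pointers_saved() -> usize {
    (size_of::<FuncRefBefore>() - size_of::<FuncRefAfter>()) / size_of::<usize>()
}
```

Calling `call_add(add_array, 2, 3)` goes through the stack buffer rather than registers, which is the read/write overhead the benchmarks above are measuring.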

Overall this commit feels like a large simplification of what we currently do in TypedFunc:

  • The number of ABIs that Wasmtime deals with is reduced by one. ABIs are pretty much always tricky and having fewer moving parts should help improve the understandability of the system.
  • All of the WasmTy trait methods and TypedFunc infrastructure are simplified. Traits now work with simple load/store methods rather than various other flavors of conversion.
  • The multi-return-value handling of the native ABI, which gave rise to significant complexity within Wasmtime's Cranelift translation layer in addition to the TypedFunc backing traits, is all gone now.
  • This aligns components and core wasm where components always use the array ABI and now core wasm additionally will always use the array ABI when communicating with the host.
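
As a rough illustration of the load/store style mentioned in the list above, here is a small sketch under stated assumptions. `RawSlot` and `SlotTy` are hypothetical names invented for this example; they are not the real `ValRaw` or `WasmTy` definitions, and the real trait methods take additional arguments.

```rust
/// Hypothetical stand-in for `ValRaw`: one untagged slot per wasm value.
#[derive(Clone, Copy)]
#[repr(C)]
pub union RawSlot {
    pub i32: i32,
    pub i64: i64,
    pub f32_bits: u32,
    pub f64_bits: u64,
}

/// Hypothetical trait: each value type only knows how to write itself into
/// a slot and read itself back out. No per-ABI conversions are involved.
pub trait SlotTy: Sized {
    fn store(self, slot: &mut RawSlot);

    /// Safety: `slot` must have last been written by `store` for this type.
    unsafe fn load(slot: &RawSlot) -> Self;
}

impl SlotTy for i32 {
    fn store(self, slot: &mut RawSlot) {
        slot.i32 = self;
    }

    unsafe fn load(slot: &RawSlot) -> Self {
        unsafe { slot.i32 }
    }
}

impl SlotTy for f64 {
    fn store(self, slot: &mut RawSlot) {
        // Floats travel as their raw bit pattern.
        slot.f64_bits = self.to_bits();
    }

    unsafe fn load(slot: &RawSlot) -> Self {
        f64::from_bits(unsafe { slot.f64_bits })
    }
}

/// Multiple values simply occupy consecutive slots, so multi-value results
/// need no special handling beyond indexing.
pub fn roundtrip_pair(a: i32, b: f64) -> (i32, f64) {
    let mut space = [RawSlot { i64: 0 }; 2];
    a.store(&mut space[0]);
    b.store(&mut space[1]);
    unsafe { (i32::load(&space[0]), f64::load(&space[1])) }
}
```

With this shape, parameters and results of any arity are handled the same way, one slot per value.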

I'll note that this still leaves a major ABI "complexity": native functions do not have a wasm ABI function pointer until they're "attached" to a Store with a Module. That's required to avoid needing Cranelift for creating host functions, and that property is still true today. This is a bit simpler to understand, though, now that Func::new and Func::wrap are treated uniformly rather than one being special-cased.
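
The late "attachment" described above can be modeled roughly as follows. This is a toy model only: `HostFuncRef`, `ModuleTrampolines`, `TrampolineAddr`, and `attach` are hypothetical names, and the lookup by type index is an assumption about how a compiled module could supply the missing wasm-ABI entry point.

```rust
use std::collections::HashMap;

/// Hypothetical signature identifier.
pub type TypeIndex = u32;

/// Placeholder for the address of a compiled wasm-to-array trampoline.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TrampolineAddr(pub usize);

/// Toy model of a host function: the array-call entry point always exists,
/// while the wasm-ABI entry point is filled in later.
pub struct HostFuncRef {
    pub type_index: TypeIndex,
    pub array_call: fn(&mut [i64]),
    pub wasm_call: Option<TrampolineAddr>,
}

/// Toy model of the trampolines a compiled module carries, keyed by signature.
pub struct ModuleTrampolines {
    pub by_type: HashMap<TypeIndex, TrampolineAddr>,
}

impl HostFuncRef {
    /// Creating the host function needs no compiler: only `array_call` is set.
    pub fn new(type_index: TypeIndex, array_call: fn(&mut [i64])) -> Self {
        HostFuncRef { type_index, array_call, wasm_call: None }
    }

    /// Fill in `wasm_call` from a module with a trampoline for this signature.
    /// Returns whether the function is now callable from wasm.
    pub fn attach(&mut self, module: &ModuleTrampolines) -> bool {
        if self.wasm_call.is_none() {
            self.wasm_call = module.by_type.get(&self.type_index).copied();
        }
        self.wasm_call.is_some()
    }
}

fn double(values: &mut [i64]) {
    values[0] *= 2;
}

/// Returns (has wasm_call before attach, has wasm_call after attach).
pub fn demo() -> (bool, bool) {
    let mut f = HostFuncRef::new(0, double);
    let before = f.wasm_call.is_some();
    let mut by_type = HashMap::new();
    by_type.insert(0, TrampolineAddr(0x1000));
    let after = f.attach(&ModuleTrampolines { by_type });
    (before, after)
}
```

In this model the same path serves functions created in either style, which mirrors the uniform treatment of Func::new and Func::wrap noted above.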

@alexcrichton alexcrichton requested a review from a team as a code owner May 15, 2024 21:15
@alexcrichton alexcrichton requested review from elliottt and fitzgen and removed request for a team and elliottt May 15, 2024 21:15
@github-actions github-actions bot added wasmtime:api Related to the API of the `wasmtime` crate itself wasmtime:ref-types Issues related to reference types and GC in Wasmtime labels May 15, 2024
Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "wasmtime:api", "wasmtime:ref-types"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: wasmtime:ref-types

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Member

@fitzgen fitzgen left a comment

Ship it!

@alexcrichton alexcrichton added this pull request to the merge queue May 17, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 17, 2024
prtest:full
@alexcrichton alexcrichton added this pull request to the merge queue May 17, 2024
Merged via the queue into bytecodealliance:main with commit 1d11b26 May 17, 2024
115 checks passed
@alexcrichton alexcrichton deleted the remove-native-abi branch May 17, 2024 04:42
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request May 17, 2024
This commit enables the `Func::new` constructor and related other
functions when `cranelift` and `winch` features are both disabled,
meaning this is now available in compiler-less builds. This builds on
the support of bytecodealliance#8629.
github-merge-queue bot pushed a commit that referenced this pull request May 17, 2024
This commit enables the `Func::new` constructor and related other
functions when `cranelift` and `winch` features are both disabled,
meaning this is now available in compiler-less builds. This builds on
the support of #8629.
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request May 20, 2024
…nce#8629)

* Remove the native ABI calling convention from Wasmtime

* Fix miri unsafety

prtest:full
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request May 20, 2024
This commit enables the `Func::new` constructor and related other
functions when `cranelift` and `winch` features are both disabled,
meaning this is now available in compiler-less builds. This builds on
the support of bytecodealliance#8629.