* Remove callee saves from Winch's MacroAssembler trait
prtest:mingw-x64
* Remove the unused callee_saved_regs function
* Removed the unused callee_saved function from Winch's aarch64 backend
* Remove additional unused functions from the Winch ABI trait
This commit correctly handles the result of the memory grow builtin function. Previously, it was assumed that the result of memory grow must be of the target's pointer type, which doesn't accurately represent the address space covered by the memory type.
This commit is a follow-up to https://github.com/bytecodealliance/wasmtime/pull/8059. Instead of arbitrarily using the target's pointer size for bounds check calculations, it derives the size from the heap information, which enables checking against the right limits.
This commit completely replaces the `XmmMemAligned` operand in
`XmmCmove` with `Xmm` instead. Looking more into the fix in #8113 I was
looking to add some `*.wast` runtime tests to assert that the fix works
at the wasm layer in addition to the instruction selection layer. I was
poking around and there's a second user of `XmmCmove` which wasn't
addressed in #8113.
In #8113 the only caller of `cmove_xmm` was updated to ensure that
everything was always in memory. This was done to both prevent the
upgrade-to-an-aligned-load from loading too much but additionally to
ensure that the load always happened, regardless of the condition. There
was a second constructor of `XmmCmove`, however, from `cmove_or_xmm`.
This is triggered when the condition is a `f64.ne` instruction, for
example, and ran into the same bug that #8112 was exposing.
To fix both of these and prevent any future issues about skipping a load
by accident due to control flow this commit removes the `XmmMemAligned`
argument entirely from `XmmCmove` and replaces it with `Xmm`. This
prevents sinking loads entirely and sidesteps all these issues at the
type level.
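The type-level idea can be sketched as follows. This is a minimal illustration with hypothetical, simplified operand types (`Xmm`, `XmmMem`, `Inst`, and `put_in_xmm` here are stand-ins, not Cranelift's real definitions): because the conditional move only accepts registers, a memory operand has to be loaded by a separate, unconditional instruction first.

```rust
// Hypothetical, simplified operand types; not Cranelift's real definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Xmm(u8);

#[derive(Debug, Clone, Copy, PartialEq)]
enum XmmMem {
    Reg(Xmm),
    Mem { base: u8, offset: i32 },
}

#[derive(Debug, PartialEq)]
enum Inst {
    Load { dst: Xmm, base: u8, offset: i32 },
    // Only registers are accepted, so a load can never be folded into
    // (and conditionally skipped by) the conditional move.
    XmmCmove { consequent: Xmm, alternative: Xmm, dst: Xmm },
}

/// Force an operand into a register, emitting an unconditional load
/// into `tmp` when the operand lives in memory.
fn put_in_xmm(op: XmmMem, tmp: Xmm, out: &mut Vec<Inst>) -> Xmm {
    match op {
        XmmMem::Reg(r) => r,
        XmmMem::Mem { base, offset } => {
            out.push(Inst::Load { dst: tmp, base, offset });
            tmp
        }
    }
}

/// Lower a conditional move: the load (if any) always precedes the cmove.
fn lower_cmove(consequent: XmmMem, alternative: Xmm, dst: Xmm, out: &mut Vec<Inst>) {
    let consequent = put_in_xmm(consequent, Xmm(15), out);
    out.push(Inst::XmmCmove { consequent, alternative, dst });
}
```

With this shape, passing an `XmmMem` directly to `XmmCmove` is a compile error rather than a latent miscompile.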
This commit introduces the `TypeConverter` struct, used to enable partial borrowing of a `ModuleTranslation` and a `ModuleTypesBuilder`. This makes it easier to defer type conversions until they are actually needed, for example when resolving callee signatures.
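A minimal sketch of the partial-borrowing pattern, assuming heavily simplified stand-ins for `ModuleTranslation` and `ModuleTypesBuilder`. The fields, the `Env` struct, and the `signature_of` and `resolve_callee` names are hypothetical and only illustrate the borrowing shape, not Winch's actual API.

```rust
// Hypothetical, simplified stand-ins for the real Wasmtime types.
struct ModuleTranslation {
    func_type_indices: Vec<usize>,
}

struct ModuleTypesBuilder {
    signatures: Vec<String>,
}

/// Borrows only the two pieces needed for type conversion.
struct TypeConverter<'a> {
    translation: &'a ModuleTranslation,
    types: &'a ModuleTypesBuilder,
}

impl<'a> TypeConverter<'a> {
    fn new(translation: &'a ModuleTranslation, types: &'a ModuleTypesBuilder) -> Self {
        TypeConverter { translation, types }
    }

    /// Resolve a function's signature only when it is asked for.
    fn signature_of(&self, func_index: usize) -> Option<&'a str> {
        let type_index = *self.translation.func_type_indices.get(func_index)?;
        self.types.signatures.get(type_index).map(String::as_str)
    }
}

// Hypothetical environment that owns more state than the converter needs.
struct Env {
    translation: ModuleTranslation,
    types: ModuleTypesBuilder,
    resolved: Vec<String>,
}

fn resolve_callee(env: &mut Env, func_index: usize) -> bool {
    // Two fields are borrowed immutably...
    let converter = TypeConverter::new(&env.translation, &env.types);
    match converter.signature_of(func_index) {
        Some(sig) => {
            // ...while a third is mutated, which borrowing all of `env`
            // immutably would not allow.
            env.resolved.push(sig.to_string());
            true
        }
        None => false,
    }
}
```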
winch: Don't create the converter until it's actually needed.
This commit fixes the bounds check comparison for dynamic heaps. The previous implementation compared the wrong value: for the general case of dynamic heaps, we want to check whether
`index + offset + access_size > bounds`
But it was only checking
`index > bounds`
This commit addresses the issue by introducing a new temporary register, which is used for the bounds check comparison and for overflow checks. This approach is preferred over using the scratch register because it's harder to know when the scratch register might get clobbered (for example, it could be clobbered by spectre checks).
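The corrected comparison can be sketched in plain Rust. This models the check's arithmetic only, not the emitted machine code; the function names are hypothetical, and the local `end` plays the role of the temporary register so that `index` itself is left untouched.

```rust
/// Returns `true` when the access must trap.
/// Checks `index + offset + access_size > bounds`, treating any
/// arithmetic overflow as out of bounds.
fn is_out_of_bounds(index: u64, offset: u64, access_size: u64, bounds: u64) -> bool {
    // Compute into a temporary so `index` stays intact for the access.
    let end = index
        .checked_add(offset)
        .and_then(|sum| sum.checked_add(access_size));
    match end {
        Some(end) => end > bounds,
        None => true,
    }
}

/// The previous, incomplete comparison, kept here for contrast.
fn is_out_of_bounds_index_only(index: u64, bounds: u64) -> bool {
    index > bounds
}
```

For a one-page heap (`bounds = 65536`), an 8-byte access at index 65532 ends at 65540 and must trap, yet the index-only comparison lets it through.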
This commit defers the usage of `format!` until it's actually needed when retrieving addresses for locals. This helps improve performance in modules that perform a considerable number of local lookups.
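The general pattern looks roughly like the following sketch. The `HashMap` of local slots and both function names are assumptions made for illustration; Winch's real lookup code differs.

```rust
use std::collections::HashMap;

/// Eager: the message is formatted (and a `String` allocated) on every
/// call, even when the lookup succeeds.
fn local_offset_eager(locals: &HashMap<u32, u32>, index: u32) -> u32 {
    *locals
        .get(&index)
        .expect(&format!("invalid local slot: {}", index))
}

/// Deferred: the closure only runs, and only formats, on the failure path.
fn local_offset_deferred(locals: &HashMap<u32, u32>, index: u32) -> u32 {
    *locals
        .get(&index)
        .unwrap_or_else(|| panic!("invalid local slot: {}", index))
}
```

Both functions return the same value on success. The difference is only that the deferred version does no formatting work on the hot path.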
This commit fixes a bug where, when an out-of-bounds access is detected at compile time, `emit_wasm_store` was failing to free the source register. By the time the address is computed, the register is already allocated, so it must be freed regardless of whether it ends up being used.
* winch: Fix bounds checks for dynamic heaps
This commit fixes a fuzz bug in which the current implementation was incorrectly clobbering the index register of a memory access (for the addition overflow check) and then using that same clobbered register to perform the memory access. The clobbered register contained the value `index + offset + access_size`, which resulted in an incorrect access and consequently in an incorrect `HeapOutOfBounds` trap.
This bug is only reproducible when modifying Wasmtime's memory settings, forcing the heap to be treated as `Dynamic`.
Currently in Winch there's no easy way to have specific Wasmtime configurations, so having a filetest for this case is not straightforward. I've opted to add an integration test, in which it's easier to configure Wasmtime.
Note that the `tests/all/winch.rs` file is temporary, and the plan is to execute all the other integration tests with Winch at some point. In the case of `memory.rs`, that will be once Winch supports `memory64`, which should reduce the amount of code needed to integrate Winch.
* Remove unused variable in integration tests
* Wasmtime: Add a `gc` cargo feature
This controls whether support for `ExternRef` and its associated deferred,
reference-counting garbage collector is enabled at compile time or not. It will
also be used similarly for Wasmtime's full Wasm GC support as that gets
added.
* Add CI for `gc` Cargo feature
* Cut down on the number of `#[cfg(feature = "gc")]`s outside the implementation of `[VM]ExternRef`
* Fix wasmparser reference types configuration with GC disabled/enabled
* More config fix
* doc cfg
* Make the dummy `VMExternRefActivationsTable` inhabited
* Fix winch tests
* final review bits
* Enable wasmtime's gc cargo feature for the C API
* Enable wasmtime's gc cargo feature from wasmtime-cli-flags
* enable gc cargo feature in a couple other crates
* Update some CI dependencies
* Update to the latest nightly toolchain
* Update mdbook
* Update QEMU for cross-compiled testing
* Update `cargo nextest` for usage with MIRI
prtest:full
* Remove lots of unnecessary imports
* Downgrade qemu as 8.2.1 seems to segfault
* Remove more imports
* Remove unused winch trait method
* Fix warnings about unused trait methods
* More unused imports
* More unused imports
* Refactor the prologue and epilogue interface in winch
* Remove the locals reservation from the MacroAssembler's prologue/epilogue
* Update winch tests
* winch: Overhaul the internal ABI
This change overhauls Winch's ABI. As part of this change, the default ABI now closely resembles Cranelift's ABI, particularly in its treatment of the VMContext. This change also fixes many wrong assumptions about trampolines, which were tied to how the previous ABI operated.
The main motivations behind this change are:
* Make it easier to integrate Winch-generated functions with Wasmtime
* Fix fuzz bugs related to imports
* Solidify the implementation regarding the usage of a pinned register to hold the VMContext value throughout the lifetime of a function.
The previous implementation had the following characteristics and wrong assumptions:
* Assumed that internal functions don't receive the caller or callee `VMContext`s as parameters.
* Worked correctly in the following scenarios:
  * `Wasm -> Native`: since we can explicitly load the caller and callee `VMContext`, because we're
    calling a native import.
  * `(Native, Array) -> Wasm`: because the native signatures define a tuple of `VMContext` as arguments.
* It didn't work in the following scenario:
  * `Wasm -> Wasm`: When calling imports from another WebAssembly instance (via
    direct call or `call_indirect`). The previous implementation wrongly assumed
    that there should be a trampoline in this case, but there isn't. The code
    was generated by the same compiler, so the same ABI should be used in
    both functions, but it wasn't.
This change introduces the following changes, which fix the previous assumptions and bugs:
* All internal functions declare two extra pointer-sized parameters, which will hold the callee and caller `VMContext`s
* Use a pinned register that will be considered live through the lifetime of the function instead of pinning it at the trampoline level. The pinning explicitly happens when entering the function body and no other assumptions are made from there on.
* Introduce the concept of special `ContextArgs` for function calls. This enum holds metadata about which context arguments are needed depending on the callee. The previous implementation of introducing register values at arbitrary locations in the value stack conflicts with the stack ordering principle which states that older values must *always* precede newer values. So we can't insert a register, because if a spill happens the order of the values will be wrong.
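To illustrate the last point, here is a rough sketch of how context arguments can be kept apart from the value stack. The variant names, the `Arg` type, and `build_call_args` are hypothetical and chosen for illustration; they are not Winch's actual definitions.

```rust
// Hypothetical sketch: names are illustrative, not Winch's definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ContextArgs {
    /// The callee needs no context arguments.
    None,
    /// The callee needs only its own `VMContext`.
    Callee,
    /// The callee needs both the callee and caller `VMContext`.
    CalleeAndCaller,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Arg {
    CalleeVmctx,
    CallerVmctx,
    Value(u32),
}

/// Build the final argument list: context arguments first, then the
/// stack values from oldest to newest. The value stack itself is only
/// read, so its ordering is preserved even if a spill happens.
fn build_call_args(ctx: ContextArgs, stack_values: &[u32]) -> Vec<Arg> {
    let mut args = Vec::with_capacity(2 + stack_values.len());
    match ctx {
        ContextArgs::None => {}
        ContextArgs::Callee => args.push(Arg::CalleeVmctx),
        ContextArgs::CalleeAndCaller => {
            args.push(Arg::CalleeVmctx);
            args.push(Arg::CallerVmctx);
        }
    }
    args.extend(stack_values.iter().map(|v| Arg::Value(*v)));
    args
}
```

The design choice is that context arguments travel as call metadata instead of being pushed into the value stack as registers.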
Finally, given that this change also enables the `imports.wast` test suite, it also includes a fix to `global.{get, set}` instructions which didn't account entirely for imported globals.
Resolved conflicts
Update Winch filetests
* Fix typos
* Use `get_wasm_local` and `get_frame_local` instead of `get_local` and `get_local_unchecked`
* Introduce `MAX_CONTEXT_ARGS` and use it in the trampoline to skip context arguments.
* Support patchable regions in MachBuffer
* Patch stack size into stack checks once functions have been visited
* Move the stack overflow check into the function prologue
* Start addressing review feedback
* Use an explicit encoding of add for fixing up add-with-immediate
* Track a single stack max addition
* Better comments about `sp_max`
* Fix comments
* Update winch filetest outputs
* Reuse the cranelift implementation of RexFlags
* Update winch/codegen/src/isa/x64/masm.rs
Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>
---------
Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>
This commit fixes a fuzz bug where modules involving known libcalls
would fail to compile given that they were unconditionally treated as
colocated libcalls.
This bug is only reproducible in non-SSE4.1 environments, given that some
operations like `floor` default to libcalls in this case. The
`use_colocated_libcalls` setting is not configurable within Wasmtime and
as such, they should be loaded into a register prior to emitting the
call. This will also ensure that the right 8-byte absolute relocation is
used.
* Restructure the MacroAssembler interface to clobbers
* Update winch/codegen/src/trampoline.rs
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
* Revert "Update winch/codegen/src/trampoline.rs"
This reverts commit 53ca1dea4f9c72370a0e49200cf14260a750628c.
* Review feedback
* Prevent calling multiple `emit` functions on a `Trampoline` value
---------
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
* winch: Optimize calls
This commit introduces several optimizations to speed up the compilation
of function calls:
* Keep track of previously resolved function signatures for local or
imported callees to avoid computing the `ABISig` on every
function call.
* Keep track of previously resolved type signatures for indirect calls
to avoid computing the `ABISig` on every function call.
* Refactor `CallKnown` and `CallUnknown` instructions to make the
`BoxCallInfo` field in the struct optional. Prior to this change,
from Winch's perspective each call lowering involved a heap
allocation, using the default values for `BoxCallInfo`, which in the
end are not used by Winch.
* Switch Winch's internal `Stack` to use a `SmallVec` rather than
a `Vec`. Many of the operations involving builtin function calls
require inserting elements at arbitrary offsets in the stack and
using a `SmallVec` makes this process more efficient.
With the changes mentioned above, I observed a ~30% improvement in
compilation times for modules that are call-heavy.
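The signature caching described in the first two bullets could look roughly like the sketch below. `AbiSig`, `SigCache`, and `get_or_compute` are simplified, hypothetical stand-ins; the real `ABISig` carries far more information than a pair of counts.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for Winch's `ABISig`.
#[derive(Debug, Clone, PartialEq)]
struct AbiSig {
    params: usize,
    results: usize,
}

struct SigCache {
    resolved: HashMap<u32, AbiSig>,
    /// Counts how often a signature had to be computed from scratch.
    computations: usize,
}

impl SigCache {
    fn new() -> Self {
        SigCache { resolved: HashMap::new(), computations: 0 }
    }

    /// Return the cached signature for `func_index`, computing it only
    /// the first time this callee is seen.
    fn get_or_compute(&mut self, func_index: u32, wasm_sig: (usize, usize)) -> &AbiSig {
        let computations = &mut self.computations;
        self.resolved.entry(func_index).or_insert_with(|| {
            *computations += 1;
            AbiSig { params: wasm_sig.0, results: wasm_sig.1 }
        })
    }
}
```

Repeated calls to the same callee then cost a single hash lookup instead of a fresh signature computation.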
* Expect `CallInfo` where applicable and add a comment about the type
definition
* Remove unneeded types and lifetimes
* x64: Refactor multiplication instructions
This commit was inspired by reading over some code from #7865
and #7866. The goal of this commit was to refactor
scalar multiplication-related instructions in the x64 backend to more
closely align with their native instructions. Changes include:
* The `MulHi` instruction is renamed to `Mul`. This represents either
`mul` or `imul` producing a doublewide result.
* A `Mul8` instruction was added to correspond to `Mul` for the 8-bit
variants that produce a doublewide result in the `AX` register rather
than the other instructions which split between `RAX` and `RDX`.
* The `UMulLo` instruction was removed as now it's covered by `Mul`
* The `AluRmiROpcode::Mul` opcode was removed in favor of new `IMul` and
`IMulImm` instructions. Register allocation and emission already had
special cases for `Mul` which felt better as standalone instructions
rather than putting in an existing variant.
Lowerings using `imul` are not affected in general but the `IMulImm`
instruction has different register allocation behavior than before which
allows the destination to have a different register than the first
operand. The `umulhi` and `smulhi` instructions are also reimplemented
with their 8-bit variants instead of extension-plus-16-bit variants.
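As a worked example of the doublewide results mentioned above, the following Rust functions model the arithmetic only. The function names are hypothetical; the register comments describe where x86 `mul`/`imul` place each half of the product.

```rust
/// 64-bit unsigned doublewide multiply: returns (low, high), which the
/// one-operand `mul` instruction places in `RAX` and `RDX` respectively.
fn mul_wide_u64(a: u64, b: u64) -> (u64, u64) {
    let wide = (a as u128) * (b as u128);
    (wide as u64, (wide >> 64) as u64)
}

/// 8-bit unsigned doublewide multiply: the whole 16-bit product lands
/// in `AX` (low byte in `AL`, high byte in `AH`).
fn mul_wide_u8(a: u8, b: u8) -> u16 {
    (a as u16) * (b as u16)
}

/// `umulhi` on 8-bit operands: the high byte of the 16-bit product.
fn umulhi_u8(a: u8, b: u8) -> u8 {
    (mul_wide_u8(a, b) >> 8) as u8
}

/// `smulhi` on 8-bit operands: the high byte of the signed product.
fn smulhi_i8(a: i8, b: i8) -> i8 {
    (((a as i16) * (b as i16)) >> 8) as i8
}
```

For instance, `255 * 255 = 0xFE01`, so the 8-bit `umulhi` yields `0xFE`.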
* Remove outdated emit tests
These are all covered by the filetests framework now too.
* Fix Winch build