* PCC: x64: insertlane instructions read only scalar-sized values.
Also fix `clamp_range` on greater-than-64-bit values: no range fact is
possible in this case (propagate `Option` a bit deeper to represent
this).
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=67538.
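  For the `clamp_range` change, here is a minimal, self-contained sketch of the idea; `RangeFact` and the function body are hypothetical stand-ins, not the actual PCC fact types:
  ```rust
  /// Hypothetical stand-in for a PCC range fact: the value is known to lie in
  /// `[min, max]` when interpreted as an unsigned `bit_width`-bit integer.
  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  struct RangeFact {
      bit_width: u16,
      min: u64,
      max: u64,
  }

  /// Clamp a known range to `bit_width` bits. For widths over 64, no range
  /// fact is representable with `u64` bounds, so return `None` rather than
  /// inventing one; this mirrors the `Option` being propagated a bit deeper.
  fn clamp_range(fact: RangeFact, bit_width: u16) -> Option<RangeFact> {
      if bit_width > 64 {
          return None;
      }
      let mask = if bit_width == 64 {
          u64::MAX
      } else {
          (1u64 << bit_width) - 1
      };
      Some(RangeFact {
          bit_width,
          min: fact.min.min(mask),
          max: fact.max.min(mask),
      })
  }

  fn main() {
      let fact = RangeFact { bit_width: 64, min: 0, max: 0xffff };
      assert!(clamp_range(fact, 32).is_some());
      assert!(clamp_range(fact, 128).is_none()); // no fact for >64-bit values
  }
  ```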
* Rebase to latest main with leaf-function changes and update test expectations.
* Cranelift: make `bitcast`s between integer and reference types do a copy
This avoids putting multiple conflicting regalloc constraints on values that
would otherwise be aliases of each other (one constraint that the value must be
in a register as a function argument and another that it must be in a stack slot
for a safepoint) by splitting the value in two and each split value getting its
own constraint.
Fixes #8180
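  A toy model of the constraint conflict being avoided (illustrative only; these are not Cranelift's real register-allocator types):
  ```rust
  /// Toy model: a single SSA value used both as a call argument and as a
  /// reference live across a safepoint would need two locations at once;
  /// emitting a copy for the `bitcast` splits it into two values, each with
  /// one satisfiable constraint.
  #[derive(Debug, PartialEq)]
  enum Constraint {
      AnyRegister, // e.g. a function-call argument
      StackSlot,   // e.g. a reference value live across a safepoint
  }

  struct VirtualValue {
      constraints: Vec<Constraint>,
  }

  fn satisfiable(v: &VirtualValue) -> bool {
      // A value gets exactly one location, so all of its constraints must agree.
      v.constraints.windows(2).all(|w| w[0] == w[1])
  }

  fn main() {
      // Before: the `bitcast` was a pure alias, so both uses constrain one value.
      let aliased = VirtualValue {
          constraints: vec![Constraint::AnyRegister, Constraint::StackSlot],
      };
      assert!(!satisfiable(&aliased));

      // After: the bitcast lowers to a copy, so each value has one constraint.
      let arg = VirtualValue { constraints: vec![Constraint::AnyRegister] };
      let gc_ref = VirtualValue { constraints: vec![Constraint::StackSlot] };
      assert!(satisfiable(&arg) && satisfiable(&gc_ref));
  }
  ```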
* Fix test expectations
* Return the last result through registers in the winch calling convention
* Add a run test for winch calling convention functions
* Disable the Winch calling convention in cranelift's aarch64 backend
* Remove the aarch64 winch.clif test
* Skip realignment for winch results on the stack
* cranelift: Optimize select_spectre_guard, carefully
This commit makes two changes to our treatment of
`select_spectre_guard`.
First, stop annotating this instruction as having any side effects. We
only care that if its value result is used, then it's computed without
branching on the condition input. We don't otherwise care when the value
is computed, or if it's computed at all.
Second, introduce some carefully selected ISLE egraph rewrites for this
instruction. These particular rewrites are those where we can statically
determine which SSA value will be the result of the instruction. Since
there is no actual choice involved, there's no way to accidentally
introduce speculation on the condition input.
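  A rough sketch of what "statically determinable" means here (a stand-in model, not the actual ISLE rules):
  ```rust
  // Only cases where the chosen arm is statically known are rewritten, so the
  // condition is never branched on. `Expr` is a stand-in, not Cranelift's IR.
  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  enum Expr {
      Const(u64),
      Value(u32), // stand-in for an arbitrary SSA value
  }

  /// `select_spectre_guard cond, if_true, if_false`
  fn simplify_select_spectre_guard(cond: Expr, if_true: Expr, if_false: Expr) -> Option<Expr> {
      match (cond, if_true, if_false) {
          // The condition is a known constant: the result is one specific arm.
          (Expr::Const(0), _, f) => Some(f),
          (Expr::Const(_), t, _) => Some(t),
          // Both arms are the same value: the condition can't matter.
          (_, t, f) if t == f => Some(t),
          // Anything else is left alone to preserve the guard.
          _ => None,
      }
  }

  fn main() {
      assert_eq!(
          simplify_select_spectre_guard(Expr::Const(1), Expr::Value(10), Expr::Value(11)),
          Some(Expr::Value(10))
      );
      // A non-constant condition is never rewritten.
      assert_eq!(
          simplify_select_spectre_guard(Expr::Value(7), Expr::Value(10), Expr::Value(11)),
          None
      );
  }
  ```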
* Add filetests
* Canonicalize fpromote/fdemote operations
This commit changes the strategy implemented in #8146 for canonicalizing
promotes/demotes of floats so that it additionally handles #8179.
Closes #8179
* Canonicalize fvpromote_low/fvdemote as well
In fact, we can even use this fact to infer ranges of the results!
(Previously we had a "default", copy-and-pasted across all instruction
cases, stating that 64-bit register writes produce 64 bits of
undefinedness, and we hadn't tightened the spec here.)
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=67436.
We have various constant-propagation/folding rules in the mid-end that
generate new `iconst`s in place of other expressions. We got a fuzzbug
with PCC wherein it was not able to verify that an iadd-iadd-uextend
combination generating a Wasm address was in-range when rules
reassociated the iadds to put constants together. Rather than carefully
augment all rules to propagate constant facts only where they exist on
the inputs, I opted to add a hook to the optimizer to generate brand-new
assertions on *all* iconsts that we insert. This adds a little more work
during verification (not too much hopefully: it's pretty low-overhead to
check that `mov $1, %rax` puts `1` in `rax`) but should provide broader
coverage of interesting corner-cases where optimization breaks the PCC
chain.
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=67432.
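  The shape of the hook, in a self-contained sketch (the real hook attaches Cranelift's PCC facts in the egraph pass; the names below are illustrative):
  ```rust
  /// Hypothetical stand-in for a PCC fact attached to an SSA value.
  #[derive(Debug, PartialEq)]
  enum Fact {
      /// The value is a `bit_width`-bit integer in `[min, max]`.
      Range { bit_width: u16, min: u64, max: u64 },
  }

  /// Whenever the optimizer materializes a new `iconst`, attach the strongest
  /// range fact we can state about it: exactly its own value.
  fn fact_for_new_iconst(bit_width: u16, value: u64) -> Fact {
      Fact::Range { bit_width, min: value, max: value }
  }

  fn main() {
      // e.g. a rule that reassociates `(x + 16) + 32` into `x + 48` would
      // attach a `[48, 48]` fact to the new constant, keeping the PCC chain
      // for the address computation intact.
      assert_eq!(
          fact_for_new_iconst(64, 48),
          Fact::Range { bit_width: 64, min: 48, max: 48 }
      );
  }
  ```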
* x64: Refactor lowerings for `insertlane`
Going through old PRs I stumbled across #2716 which is quite old at this
point. Upon adding the tests to `main` I see that most of it is actually
implemented except for load-lane-from-memory where the lane size is 8 or
16 bits. That requires explicitly opting in with `sinkable_load_exact`,
so this PR subsumes the tests of #2716 in addition to filling
this hole in the lowerings.
This refactoring shuffles around where definitions are located to more
easily have access to `Value` to perform the relevant match. The generic
`vec_insert_lane` helper is now gone as well, in favor of direct matches
on `insertlane` in the lowerings.
Closes #2716
* Remove a no-longer-needed helper function
This allows us to make a single pass over an argument's slots instead of
needing to first pre-allocate temporaries, because we don't need to hold
a borrow of `ctx`. For the same reason, we can also emit the new
instructions directly instead of buffering them and copying them at the
end.
This approach also moves less data around. A SmallInstVec is a usize
plus four `M::Inst`s, which are 32 or 40 bytes each, while an ABIArg is
only 40 bytes. Since the SmallVecs inside ABIArg almost never spill to
the heap, cloning one uses less memory than allocating temporary space
for a few instructions.
* Refactor the `MemFlags` internal representation
This commit refactors the internal `MemFlags` bits to be not just flags.
Instead some bits are now grouped together and interpreted as a unit.
This enables two primary API changes:
* First the `heap`/`table`/`vmctx` split is now represented as an "alias
region" which is set as an enum. This means that all `MemFlags` carry
an `Option<AliasRegion>` internally.
* Second trapping state is now represented as an `Option<TrapCode>`.
This means that `notrap` and `tabletrap` are no longer separate
flags. It does, however, enable storing arbitrary trap codes, so
long as they aren't `TrapCode::User(_)`.
The main purpose of this commit is to enable using more trap codes with
`MemFlags` for when a segfault is detected. For example with #5291 we
want a segfault to indicate a call-to-null, which is not currently
covered by `MemFlags`.
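  A rough, self-contained model of the resulting shape (field and type layouts here are illustrative, not the exact `MemFlags` internals):
  ```rust
  /// Illustrative model: instead of independent `heap`/`table`/`vmctx` and
  /// `notrap` bits, the flags carry two small "enum or nothing" fields.
  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  enum AliasRegion {
      Heap,
      Table,
      Vmctx,
  }

  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  enum TrapCode {
      HeapOutOfBounds,
      TableOutOfBounds,
      // ... other non-user trap codes
  }

  #[derive(Debug, Clone, Copy)]
  struct MemFlagsModel {
      /// At most one alias region, rather than three mutually exclusive bits.
      alias_region: Option<AliasRegion>,
      /// `None` plays the role of the old `notrap` flag; `Some(code)` lets a
      /// memory access report an arbitrary (non-user) trap code when it faults.
      trap: Option<TrapCode>,
  }

  fn main() {
      // A table access that should report TableOutOfBounds when it faults:
      let flags = MemFlagsModel {
          alias_region: Some(AliasRegion::Table),
          trap: Some(TrapCode::TableOutOfBounds),
      };
      assert_eq!(flags.trap, Some(TrapCode::TableOutOfBounds));

      // A heap access with the old default behavior:
      let heap = MemFlagsModel {
          alias_region: Some(AliasRegion::Heap),
          trap: Some(TrapCode::HeapOutOfBounds),
      };
      assert!(heap.trap.is_some());
  }
  ```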
* Remove individual alias region setters/getters from `MemFlags`
Use the `AliasRegion` enum instead, to help emphasize that only
one region is possible on a `MemFlags`, not multiple.
* Update cranelift/codegen/src/ir/memflags.rs
Co-authored-by: Jamey Sharp <jamey@minilop.net>
---------
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Delay argument location use in `CallSite::gen_arg`
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Directly emit the instructions produced by gen_arg
---------
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
This commit updates Cranelift's nan-canonicalization pass to canonicalize
the results of bitcasts from arbitrary integers, in addition to
floating-point arithmetic operations.
Closes #8145
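  For context, a scalar sketch of what canonicalization does to a value (the real pass inserts equivalent selects in CLIF; the canonical bit pattern below is an assumption of the sketch, the usual f32 quiet-NaN encoding):
  ```rust
  /// Assumed canonical quiet-NaN bit pattern for f32 results.
  const CANON_NAN_F32: u32 = 0x7fc0_0000;

  /// Scalar model of the canonicalization: if the (possibly bitcast-produced)
  /// value is any NaN, replace it with the single canonical NaN encoding.
  fn canonicalize_f32(bits: u32) -> u32 {
      if f32::from_bits(bits).is_nan() {
          CANON_NAN_F32
      } else {
          bits
      }
  }

  fn main() {
      // A payload-carrying NaN produced by bitcasting an arbitrary integer...
      let weird_nan = 0xffab_cdefu32;
      assert!(f32::from_bits(weird_nan).is_nan());
      // ...comes out as the one canonical NaN.
      assert_eq!(canonicalize_f32(weird_nan), CANON_NAN_F32);
      // Non-NaN values pass through untouched.
      assert_eq!(canonicalize_f32(1.5f32.to_bits()), 1.5f32.to_bits());
  }
  ```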
* Update PCC test to expose failure.
* Reduce test coverage only to fully-static (bounds-check-elided) case.
* Configure PCC when fuzzing
* Ensure that we panic for pcc errors in the instantiate fuzz target
* Adjust Wasmtime configuration generation: PCC forces static memory configuration.
* Properly force memory config when fuzzing PCC.
---------
Co-authored-by: Trevor Elliott <telliott@fastly.com>
Ideally we'd allow the frontend to specify what trap code a memory
access instruction can produce, but we don't have room to store that
information.
We do have a few bits to spare in `MemFlags`, though, so add a new
`tabletrap` bit to indicate that the trap code should be
TableOutOfBounds instead of the default HeapOutOfBounds.
Nothing yet sets the `tabletrap` flag, so the TableOutOfBounds case is
never hit, but I wanted to separate this change out from the
cranelift-wasm changes which will use it.
The original version of this patch reused the `table` flag instead of
introducing a new `tabletrap` flag. But this usage changes the semantics
of an instruction, while the `table` flag is meant to be a hint for
alias analysis in the egraph optimization pass.
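A sketch of how a backend would consume the new bit when attaching trap metadata to a memory access (illustrative types and names, not the real emit-path code):
```rust
#[derive(Debug, PartialEq)]
enum TrapCode {
    HeapOutOfBounds,
    TableOutOfBounds,
}

/// Simplified model of the two flag bits relevant here.
struct MemFlagsBits {
    notrap: bool,
    tabletrap: bool,
}

/// Pick the trap code to record for a potentially faulting access, if any.
fn trap_code_for_access(flags: &MemFlagsBits) -> Option<TrapCode> {
    if flags.notrap {
        None
    } else if flags.tabletrap {
        Some(TrapCode::TableOutOfBounds)
    } else {
        Some(TrapCode::HeapOutOfBounds)
    }
}

fn main() {
    let table_access = MemFlagsBits { notrap: false, tabletrap: true };
    assert_eq!(
        trap_code_for_access(&table_access),
        Some(TrapCode::TableOutOfBounds)
    );

    let default_access = MemFlagsBits { notrap: false, tabletrap: false };
    assert_eq!(
        trap_code_for_access(&default_access),
        Some(TrapCode::HeapOutOfBounds)
    );
}
```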
On s390x we support ABIs with little-endian or big-endian vector
lane ordering. When calling a function with a different lane
order, vector arguments need to be lane-swapped. This currently
happens for arguments passed in registers, but not for those
passed on the stack - that is a bug.
Fixed by swapping stack arguments as well.
Fixes https://github.com/bytecodealliance/wasmtime/issues/8132
We have a special lowering that allows us to fuse a `bitselect` with a comparison instruction. This saves us a few instructions due to the mismatch that exists between native RISC-V masks and WASM masks.
Native RISC-V masks have a single bit per lane, whereas WASM masks have all bits in a lane set to 1.
The lowering for `bitselect+bitcast+{i,f}cmp` avoids the need to generate the WASM mask, by directly using the comparison mask with `vmerge`.
The bug this fixes is that when we introduce a `bitcast` in the middle, the comparison and the merge may have types with different lane counts. If that happens, the `vmerge` will only look at the first n bits of the mask, where n is the number of lanes currently configured.
This commit ensures that they are always equal by using the same type for both vmerge and the comparison instruction.
I also manually checked all other uses of `gen_{f,i}cmp_mask` and they are all using the same type in the subsequent instructions.
With this fix we no longer really care about the type of the `bitselect` as long as it has the same bitlength as the type of `{i,f}cmp`, which I think is enforced by the verifier. (i.e. We would have the same bug if `bitselect.i8x16+icmp.i8xi8` was legal.)
Wasmtime no longer needs any of this infrastructure and neither should
anybody else.
This diff is nearly identical to @bjorn3's version of the same change,
except I didn't remove Uimm64, which has started being used in other
places. I forgot bjorn3 had already tackled this part until after I was
already done, but it's reassuring that we both made the same changes.
https://github.com/bjorn3/wasmtime/commit/fb82ccb3948e949641a6d9581aa84472f68f97b8
Fixes #5532
* cranelift: Remove srcloc from emit state on all targets
In #2426, @cfallin wrote:
> […] don't emit trap info unless an op can trap.
>
> This end result was previously enacted by carrying a SourceLoc on
> every load/store, which was somewhat cumbersome, and only indirectly
> encoded metadata about a memory reference (can it trap) by its
> presence or absence.
That PR changed both backends that existed at the time to check both the
source location and the memory flags to determine whether a memory
access could trap.
Then in #2685, @cfallin wrote:
> Finally, while working out why trap records were not appearing, I had
> noticed that isa::x64::emit_std_enc_mem() was only emitting heap-OOB
> trap metadata for loads/stores when it had a srcloc. This PR ensures
> that the metadata is emitted even when the srcloc is empty.
However that PR did not apply the same change to other backends. Since
then, the pattern from #2426 has been copied to new backends.
I believe checking the source location has been unnecessary since #2426
and is now just a source of confusion at best, and possibly bugs at
worst. So this PR makes all targets match the behavior of the x64
backend.
In addition, this pattern was the only reason why source locations were
provided to any backend's emit state, so I'm removing that entirely.
The `cur_srcloc` field has been unused on x64 since #2685.
This change is mostly straightforward, but there are two questionable
changes in the riscv64 backend:
- The riscv64 backend had one use of this pattern for a
BadConversionToInteger trap. All other uses on all backends were for
HeapOutOfBounds traps. I suspect that was a copy-paste bug so I've
removed it just like all the others.
- The riscv64 `Inst::Atomic` does not have a MemFlags field, so this
means the HeapOutOfBounds trap metadata is added unconditionally for
such instructions.
* Filetests don't have srclocs so they get traps now
We were accidentally using the `max` opcode in the `min` helper when using the `Zbb` extension. This wasn't caught by testing since we never enabled the `smin` tests with the `Zbb` extension enabled.
Furthermore this should have been caught by fuzzing, but we also don't regularly fuzz RISC-V.
This commit aims to address #8116 by fixing these two instructions to
load the proper number of bytes when a load is sunk into them. Currently
the instruction variant used here requires an aligned `XmmMem` which
is today auto-translated with a 16-byte load unconditionally. For loads
near the end of memory this loads too much and can erroneously cause a
trap. The fix in this commit is to force the load to happen manually
with the appropriate type rather than a 16-byte type.
I'll note that `XmmMem` is left as an argument in this case because the
AVX variants of these instructions can continue to leverage unaligned
accesses. This also means that the test added here won't fail on a
machine with AVX support; AVX needs to be explicitly disabled.
Closes #8116