167 Commits

Author SHA1 Message Date
JosJuice
58487f1633 Jit: Implement error-free transformation for single-precision FMA
This implements the equivalent of 07443e2d41 in Jit64 and JitArm64.
Aims to fix https://bugs.dolphin-emu.org/issues/13865.
2026-01-18 20:02:49 +01:00
JosJuice
84261cfc23 Arm64Emitter: Fix Q bit of vector SHL/URSHR encoding
This doesn't affect any existing callers, because all existing callers
use quad registers.
2026-01-18 20:02:49 +01:00
JosJuice
cd4902f0ed Merge pull request #13875 from JosJuice/jitarm64-orr-base-without-mirror
JitArm64: Add missing ORR pattern in MOVI2RImpl
2025-11-10 20:16:21 +01:00
JosJuice
9716148203 Arm64Emitter: Replace shifting size by 4 with IntLog2 minus 3
The instruction implementations that were shifting the size by 4 would
emit an incorrect instruction when given a size of 64. The correct
implementation is to count the number of leading or trailing zeroes in
the size parameter, which is what IntLog2 does.

No callers are affected by this, as they all use sizes other than 64.
Actually, some of these instructions are even invalid with a size of 64,
but I'm changing them anyway for consistency with the others.
2025-08-24 10:48:21 +02:00
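The difference described in 9716148203 can be checked with a few lines of C++. This is only a sketch: `IntLog2Sketch`, `SizeFieldByShift` and `SizeFieldByLog2` are hypothetical names, not the emitter's own functions, and it assumes the encoded size field maps element sizes 8/16/32/64 to 0/1/2/3.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an integer log2: floor(log2(value)) for value > 0.
constexpr int IntLog2Sketch(std::uint64_t value)
{
  int result = 0;
  while (value >>= 1)
    ++result;
  return result;
}

// Old approach from the commit message: shift the size right by 4.
constexpr std::uint32_t SizeFieldByShift(std::uint32_t size)
{
  return size >> 4;
}

// New approach from the commit message: IntLog2 minus 3.
constexpr std::uint32_t SizeFieldByLog2(std::uint32_t size)
{
  // Assumption of this sketch: size is a power of two and at least 8.
  assert(size >= 8 && (size & (size - 1)) == 0);
  return static_cast<std::uint32_t>(IntLog2Sketch(size) - 3);
}

// Both approaches agree for 8, 16 and 32...
static_assert(SizeFieldByShift(8) == 0 && SizeFieldByLog2(8) == 0, "size 8");
static_assert(SizeFieldByShift(16) == 1 && SizeFieldByLog2(16) == 1, "size 16");
static_assert(SizeFieldByShift(32) == 2 && SizeFieldByLog2(32) == 2, "size 32");
// ...but diverge for 64: the shift yields 4, the log2 form yields 3.
static_assert(SizeFieldByShift(64) == 4, "shift form at size 64");
static_assert(SizeFieldByLog2(64) == 3, "log2 form at size 64");
```

The two forms only coincide for sizes up to 32, which matches the message's note that no existing caller was affected.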
JosJuice
c553344282 JitArm64: Add early exit in MOVI2RImpl ORR loop
Just for performance.
2025-08-21 20:56:06 +02:00
JosJuice
596b290177 JitArm64: Add missing ORR pattern in MOVI2RImpl
We should attempt to use not only mirrored versions of the immediate as
an ORR base, but also the immediate itself. This lets us emit certain
64-bit constants using fewer instructions.
2025-08-21 20:56:06 +02:00
Luz Paz
1b47dbf519 Core/Common: Fix typos
Found via `codespell -q 3 -S "./Externals,./Data/Sys/wiitdb-??.txt,*.po,*.pot" -L andf,asnd,bootup,brocken,bufferin,clen,collet,datas,delt,diety,extint,fpr,inout,inport,interm,nd,nin,ontop,pixelx,re-use,re-used,sav,stateman,strat,transer,wil`
2025-03-11 19:48:45 -04:00
mitaclaw
ffc7bcfbf8 Emitters: Define Trivial Getters Inline
2024-07-21 21:35:29 -07:00
mitaclaw
28f8ab9e8a Arm64FloatEmitter: 64-Bit Assert In ABI_PushRegisters
2024-05-07 13:51:50 -07:00
Pokechu22
fbbfea8e8e Replace Common::BitCast with std::bit_cast
2024-05-03 18:43:51 -07:00
JosJuice
de33831783 Arm64Emitter: Fix shadowed variable
A lambda at the end of ARM64XEmitter::ParallelMoves named its parameter
`move`.
2024-04-21 16:20:59 +02:00
JosJuice
e140491fa9 Arm64Emitter: Fix incorrect assert category
2024-04-21 16:19:10 +02:00
JosJuice
b5c5371848 Arm64Emitter: Don't optimize ADD to MOV for SP
Unlike ADD (immediate), MOV (register) treats SP as ZR. Therefore the
ADDI2R optimization that was added in 67791d227c can't optimize ADD to
MOV when exactly one of the registers is SP.

There currently isn't any code in Dolphin that calls ADDI2R with
parameters that would trigger this case.
2024-02-06 21:58:07 +01:00
JosJuice
d8c78f2a92 JitArm64: Fix the "do nothing" cases of ANDI2R and friends
So somehow I forgot that AArch64 uses three-operand encoding...

Fixes a regression from 6303416201 which manifested in various ways,
such as incorrect rendering of the Wind Waker title screen.
2023-12-21 20:51:32 +01:00
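The three-operand point in d8c78f2a92 can be illustrated with a small planner. This is a sketch under assumptions, not the actual ANDI2R code: `AndPlan` and `PlanAndImmediate` are hypothetical names, and the assumed behavior is that an AND with an all-ones 32-bit mask is only a true no-op when the destination and source registers are the same; otherwise the value still has to be copied.

```cpp
#include <cstdint>

// Hypothetical description of what to emit for "dest = src & imm" (32-bit).
enum class AndPlan
{
  Nothing,       // dest already holds the result
  MoveRegister,  // result equals src, but it lives in a different register
  LoadZero,      // result is always zero
  EmitAnd,       // a real AND is needed
};

constexpr AndPlan PlanAndImmediate(int dest_reg, int src_reg, std::uint32_t imm)
{
  if (imm == 0)
    return AndPlan::LoadZero;
  if (imm == 0xFFFFFFFFu)
  {
    // On a three-operand ISA, "do nothing" is only correct if dest == src.
    return dest_reg == src_reg ? AndPlan::Nothing : AndPlan::MoveRegister;
  }
  return AndPlan::EmitAnd;
}
```

For example, `PlanAndImmediate(3, 3, 0xFFFFFFFFu)` yields `AndPlan::Nothing`, while `PlanAndImmediate(3, 4, 0xFFFFFFFFu)` yields `AndPlan::MoveRegister`.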
JosJuice
dc60bc5f1e JitArm64: Improve codegen in ANDI2R and friends
The codegen for the functions themselves, not for the emitted code.

This seems to save 32 bytes per function. We also get rid of the oddity
we had before where ANDI2R would do masking for 32-bit operations but
the other functions wouldn't.
2023-12-17 18:13:32 +01:00
JosJuice
a8e1e1ae48 JitArm64: Optimize additional cases of ANDI2R and friends
Now we'll never need a scratch register for values that are all zeroes
or all ones.
2023-12-17 18:13:32 +01:00
JosJuice
6303416201 JitArm64: Optimize ANDI2R and friends to no-ops when possible
This optimizes rlwnmx with mask == 0xFFFFFFFF.
2023-12-17 18:13:30 +01:00
JosJuice
e0eb4ef5bc JitArm64: Use enum class for LogicalImm size parameter
This should prevent issues like the one fixed in the previous commit
from happening again.
2023-12-16 16:48:26 +01:00
JosJuice
67791d227c JitArm64: Add special zero case to ADDI2R
This normally doesn't reduce the instruction count, but is nonetheless
useful on CPUs that can do 0-cycle moves.
2023-12-01 21:31:11 +01:00
JosJuice
25ffb0dbfc JitArm64: Mask input to 32-bit ADDI2R
In case the input was a s32 that got sign extended as part of conversion
to u64.
2023-12-01 21:26:37 +01:00
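The sign-extension case mentioned in 25ffb0dbfc looks roughly like this in plain C++. The helper names `WidenSigned` and `MaskTo32` are hypothetical and exist only for this sketch; the real ADDI2R takes its own parameters.

```cpp
#include <cstdint>

// Converting a negative s32 to u64 sign-extends it, setting the upper 32 bits.
constexpr std::uint64_t WidenSigned(std::int32_t value)
{
  return static_cast<std::uint64_t>(value);
}

// Masking keeps only the low 32 bits, which is all a 32-bit operation uses.
constexpr std::uint64_t MaskTo32(std::uint64_t imm)
{
  return imm & 0xFFFFFFFFull;
}

static_assert(WidenSigned(-4) == 0xFFFFFFFFFFFFFFFCull, "upper bits are set");
static_assert(MaskTo32(WidenSigned(-4)) == 0xFFFFFFFCull, "upper bits cleared");
```

Without the mask, code that inspects the immediate as a 64-bit value would see the set upper bits even though the operation is 32-bit.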
JosJuice
c248a69268 JitArm64: Add utility for calling a function with arguments
With this, situations where multiple arguments need to be moved
from multiple registers become easy to handle, and we also get
compile-time checking that the number of arguments is correct.
2023-11-01 19:01:58 +01:00
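The compile-time argument count check mentioned in c248a69268 can be sketched with a variadic template. This is not the JitArm64 utility itself, which emits code to move values between registers; `CallChecked` and `AddThree` are hypothetical names, and the sketch simply calls the function directly to show the checking idea.

```cpp
#include <cassert>
#include <utility>

// Hypothetical wrapper: rejects calls whose argument count does not match
// the function pointer's parameter count, at compile time.
template <typename Ret, typename... Params, typename... Args>
Ret CallChecked(Ret (*func)(Params...), Args&&... args)
{
  static_assert(sizeof...(Params) == sizeof...(Args), "wrong number of arguments");
  return func(std::forward<Args>(args)...);
}

// Example callee used only for this sketch.
inline int AddThree(int a, int b, int c)
{
  return a + b + c;
}
```

A call such as `CallChecked(&AddThree, 1, 2, 3)` compiles and returns the sum, whereas `CallChecked(&AddThree, 1, 2)` fails the `static_assert`.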
JosJuice
6e88c44d5d Move SmallVector to Common
We had one implementation of this type of data structure in Arm64Emitter
and one in VideoCommon. This moves the Arm64Emitter implementation to
its own file and adds begin and end functions to it, so that VideoCommon
can use it.

You may notice that the license header for the new file is CC0. I wrote
the Arm64Emitter implementation of SmallVector, so this should be no
problem.
2023-08-22 13:19:49 +02:00
Lioncash
784a216927 Common/MathUtil: Move IntLog2 into MathUtil namespace
Gets this out of the global namespace.
2023-04-15 03:35:05 -04:00
JosJuice
b5b8871bce Arm64Emitter: Fix SHRN/SHRN2
The "vector shift by immediate" category encodes the shift amount for
right shifts as `size - amount`, whereas left shifts use `amount`.

We're not actually using SHRN/SHRN2 anywhere, which is why this has gone
undetected.
2022-12-10 11:20:23 +01:00
JosJuice
06e60ac327 JitArm64: Implement accurate NaNs
For quite some time now, we've had a setting on x86-64 that makes Dolphin
handle NaNs in a more accurate but slower way. There's only one game that
cares about this, Dragon Ball: Revenge of King Piccolo, and what that game
cares about more specifically is that the default NaN (or "generated NaN"
as I believe it's called in PowerPC documentation) is the same as on
PowerPC. On ARM, the default NaN is the same as on PowerPC, so for the
longest time we didn't need to do anything special to get Dragon Ball:
Revenge of King Piccolo working. However, in 93e636a I changed how we
handle FMA instructions in a way that resulted in the sign of NaNs
becoming inverted for nmadd/nmsub instructions, breaking the game.
To fix this, let's implement the AccurateNaNs setting, like on x86-64.
2022-12-03 19:41:32 +01:00