The instruction implementations that were shifting the size by 4 would
emit an incorrect instruction when given a size of 64. The correct
implementation is to count the number of leading or trailing zeroes in
the size parameter, which is what IntLog2 does.
No callers are affected by this, as they all use sizes other than 64.
Actually, some of these instructions are even invalid with a size of 64,
but I'm changing them anyway for consistency with the others.
We should attempt to use not only mirrored versions of the immediate as
an ORR base, but also the immediate itself. This lets us emit certain
64-bit constants using fewer instructions.
Found via `codespell -q 3 -S "./Externals,./Data/Sys/wiitdb-??.txt,*.po,*.pot" -L andf,asnd,bootup,brocken,bufferin,clen,collet,datas,delt,diety,extint,fpr,inout,inport,interm,nd,nin,ontop,pixelx,re-use,re-used,sav,stateman,strat,transer,wil`
Unlike ADD (immediate), MOV (register) treats SP as ZR. Therefore the
ADDI2R optimization that was added in 67791d227c can't optimize ADD to
MOV when exactly one of the registers is SP.
There currently isn't any code in Dolphin that calls ADDI2R with
parameters that would trigger this case.
So somehow I forgot that AArch64 uses three-operand encoding...
Fixes a regression from 6303416201 which manifested in various ways,
such as incorrect rendering of the Wind Waker title screen.
The codegen for the functions themselves, not for the emitted code.
This seems to save 32 bytes per function. We also get rid of the oddity
we had before where ANDI2R would do masking for 32-bit operations but
the other functions wouldn't.
With this, situations where multiple arguments need to be moved
from multiple registers become easy to handle, and we also get
compile-time checking that the number of arguments is correct.
We had one implementation of this type of data structure in Arm64Emitter
and one in VideoCommon. This moves the Arm64Emitter implementation to
its own file and adds begin and end functions to it, so that VideoCommon
can use it.
You may notice that the license header for the new file is CC0. I wrote
the Arm64Emitter implementation of SmallVector, so this should be no
problem.
The "vector shift by immediate" category encodes the shift amount for
right shifts as `size - amount`, whereas left shifts use `amount`.
We're not actually using SHRN/SHRN2 anywhere, which is why this has gone
undetected.
For quite some time now, we've had a setting on x86-64 that makes Dolphin
handle NaNs in a more accurate but slower way. There's only one game that
cares about this, Dragon Ball: Revenge of King Piccolo, and what that game
cares about more specifically is that the default NaN (or "generated NaN"
as I believe it's called in PowerPC documentation) is the same as on
PowerPC. On ARM, the default NaN is the same as on PowerPC, so for the
longest time we didn't need to do anything special to get Dragon Ball:
Revenge of King Piccolo working. However, in 93e636a I changed how we
handle FMA instructions in a way that resulted in the sign of NaNs
becoming inverted for nmadd/nmsub instructions, breaking the game.
To fix this, let's implement the AccurateNaNs setting, like on x86-64.