Tests have already been implemented in 92044d5e; this commit also reduces
the scope of some of the todos (because now they're implemented!).
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=55154
Co-authored-by: Giovanni Mascellani <gmascellani@codeweavers.com>
These may happen when storing to structured buffers, and we are not
handling them properly yet. The included test reaches unreacheable code
before this patch.
Storing to buffers is complicated since we need to split the index
chain in two paths:
- The path within the variable where the resource is.
- The subpath to the part of the resource element that is being stored
to.
For now, we will emit a fixme when the index chain in the lhs is not a
direct resource access.
For temporary registers, SM1-SM3 integer types are internally
represented as floating point, so, in order to perform a cast
from ints to floats we need a mere MOV.
For constant integer registers "iN" there is no operation for casting
from a floating point register to them. For address registers "aN", and
the loop counting register "aL", vertex shaders have the "mova" operation
but we haven't used these registers in any way yet.
We probably would want to introduce these as synthetic variables
allocated in a special register set. In that case we have to remember to
use MOVA instead of MOV in the store operations, but they shouldn't be src
or dst of CAST operations.
Regarding constant integer registers, in some shaders, constants are
expected to be received formatted as an integer, such as:
int m;
float4 main() : sv_target
{
float4 res = {0, 0, 0, 0};
for (int k = 0; k < m; ++k)
res += k;
return res;
}
which compiles as:
// Registers:
//
// Name Reg Size
// ------------ ----- ----
// m i0 1
//
ps_3_0
def c0, 0, 1, 0, 0
mov r0, c0.x
mov r1.x, c0.x
rep i0
add r0, r0, r1.x
add r1.x, r1.x, c0.y
endrep
mov oC0, r0
but this only happens if the integer constant is used directly in an
instruction that needs it, and as I said there is no instruction that
allows converting them to a float representation.
Notice how a more complex shader, that performs operations with this
integer variable "m":
int m;
float4 main() : sv_target
{
float4 res = {0, 0, 0, 0};
for (int k = 0; k < m * m; ++k)
res += k;
return res;
}
gives the following output:
// Registers:
//
// Name Reg Size
// ------------ ----- ----
// m c0 1
//
ps_3_0
def c1, 0, 0, 1, 0
defi i0, 255, 0, 0, 0
mul r0.x, c0.x, c0.x
mov r1, c1.y
mov r0.y, c1.y
rep i0
mov r0.z, r0.x
break_ge r0.y, r0.z
add r1, r0.y, r1
add r0.y, r0.y, c1.z
endrep
mov oC0, r1
Meaning that the uniform "m" is just stored as a floating point in
"c0", the constant integer register "i0" is just set to 255 (hoping
it is a high enough value) using "defi", and the "break_ge"
involving c0 is used to break from the loop.
We could potentially use this approach to implement loops from SM3
without expecting the variables being received as constant integer
registers.
According to the D3D documentation, for SM1-SM3 constant integer
registers are only used by the 'loop' and 'rep' instructions.
The structurizer is implemented along the lines of what is usually called
the "structured program theorem": the control flow is completely
virtualized by mean of an additional TEMP register which stores the
block index which is currently running. The whole program is then
converted to a huge switch construction enclosed in a loop, executing
at each iteration the appropriate block and updating the register
depending on block jump instruction.
The algorithm's generality is also its major weakness: it accepts any
input program, even if its CFG is not reducible, but the output
program lacks any useful convergence information. It satisfies the
letter of the SPIR-V requirements, but it is expected that it will
be very inefficient to run on a GPU (unless a downstream compiler is
able to devirtualize the control flow and do a proper convergence
analysis pass). The algorithm is however very simple, and good enough
to at least pass tests, enabling further development. A better
alternative is expected to be upstreamed incrementally.
Side note: the structured program theorem is often called the
Böhm-Jacopini theorem; Böhm and Jacopini did indeed prove a variation
of it, but their algorithm is different from what is commontly attributed
to them and implemented here, so I opted for not using their name.
For simplicity PHI nodes are not currently handled.
The goal for this pass is to make the CFG structurizer simpler, because
it doesn't have to care about the more rigid rules SSA registers have
to satisfy than TEMP registers.
It is likely that the generated code will be harder for downstream
compilers to optimize and execute efficiently, so once a complete
structurizer is in place this pass should be removed, or at least
greatly reduced in scope.