libs/vkd3d-shader/hlsl.c: In function ‘declare_predefined_types’:
libs/vkd3d-shader/hlsl.c:3408:10: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
{"technique", 9},
^~~~~~~~~~~
...
programs/vkd3d-compiler/main.c: In function ‘parse_formatting’:
programs/vkd3d-compiler/main.c:303:10: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
{"colour", VKD3D_SHADER_COMPILE_OPTION_FORMATTING_COLOUR},
^~~~~~~~
...
macOS tigetstr() takes a non-const char *, so account for that as well.
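A minimal sketch of the kind of fix involved, assuming the warnings come from building with -Wwrite-strings (under which GCC gives string literals a const-qualified type). The structure, the values and the legacy_query() stand-in are hypothetical and are not the actual vkd3d definitions or the real tigetstr() interface:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry; the real structures in hlsl.c and main.c differ. */
struct keyword
{
    const char *name;   /* const-qualified, so a string literal initialises it cleanly */
    unsigned int value;
};

static const struct keyword keywords[] =
{
    {"technique", 9},
    {"colour", 1},      /* placeholder value, not the real enum constant */
};

/* Returns the value associated with a name, or -1 if it is not in the table. */
static int lookup_keyword(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(keywords) / sizeof(*keywords); ++i)
    {
        if (!strcmp(keywords[i].name, name))
            return (int)keywords[i].value;
    }
    return -1;
}

/* Stand-in for an API that takes a non-const char *. */
static size_t legacy_query(char *name)
{
    return strlen(name);
}

/* Passes a mutable copy instead of casting the const qualifier away. */
static size_t query_capability(const char *name)
{
    char buffer[32];

    assert(strlen(name) < sizeof(buffer));
    strcpy(buffer, name);
    return legacy_query(buffer);
}
```

Copying into a local buffer is only one option; a cast at the call site is shorter, but relies on the callee never writing through the pointer.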
Tests have already been implemented in 92044d5e; this commit also reduces
the scope of some of the todos (because now they're implemented!).
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=55154
Co-authored-by: Giovanni Mascellani <gmascellani@codeweavers.com>
These may happen when storing to structured buffers, and we are not
handling them properly yet. The included test reaches unreachable code
before this patch.
Storing to buffers is complicated since we need to split the index
chain in two paths:
- The path within the variable where the resource is.
- The subpath to the part of the resource element that is being stored
to.
For now, we will emit a fixme when the index chain in the lhs is not a
direct resource access.
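The split described above could be sketched as follows. This is an illustration only: the step kinds, the structure and the notion of "direct" used here (no steps after the resource index) are assumptions, not the compiler's actual IR or checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step kinds of an index chain. */
enum step_kind
{
    STEP_FIELD,          /* struct field access */
    STEP_ARRAY,          /* array element access */
    STEP_RESOURCE_INDEX, /* indexing into the resource itself, e.g. buf[i] */
};

struct index_split
{
    size_t resource_path_len; /* steps locating the resource within the variable */
    size_t element_path_len;  /* steps locating the stored part within the element */
    int found;                /* non-zero if the chain contains a resource index */
};

/* Splits the chain at the first resource index. */
static struct index_split split_index_chain(const enum step_kind *chain, size_t count)
{
    struct index_split split = {0, 0, 0};
    size_t i;

    assert(chain || !count);
    for (i = 0; i < count; ++i)
    {
        if (chain[i] == STEP_RESOURCE_INDEX)
        {
            split.resource_path_len = i;
            split.element_path_len = count - i - 1;
            split.found = 1;
            break;
        }
    }
    return split;
}

/* Assumed meaning of "direct resource access": nothing follows the resource index. */
static int is_direct_resource_access(const enum step_kind *chain, size_t count)
{
    struct index_split split = split_index_chain(chain, count);

    return split.found && !split.element_path_len;
}
```

Under these assumptions, a store like "s.bufs[2][i].x = v" has a resource path of two steps and an element subpath of one step, which is the case that would get the fixme.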
For temporary registers, SM1-SM3 integer types are internally
represented as floating point, so, in order to perform a cast
from ints to floats we need a mere MOV.
For constant integer registers "iN" there is no operation for casting
from a floating point register to them. For address registers "aN", and
the loop counting register "aL", vertex shaders have the "mova" operation
but we haven't used these registers in any way yet.
We probably would want to introduce these as synthetic variables
allocated in a special register set. In that case we have to remember to
use MOVA instead of MOV in the store operations, but they shouldn't be src
or dst of CAST operations.
Regarding constant integer registers, in some shaders, constants are
expected to be received formatted as an integer, such as:
    int m;

    float4 main() : sv_target
    {
        float4 res = {0, 0, 0, 0};

        for (int k = 0; k < m; ++k)
            res += k;
        return res;
    }
which compiles as:
    // Registers:
    //
    //   Name         Reg   Size
    //   ------------ ----- ----
    //   m            i0       1
    //

        ps_3_0
        def c0, 0, 1, 0, 0
        mov r0, c0.x
        mov r1.x, c0.x
        rep i0
          add r0, r0, r1.x
          add r1.x, r1.x, c0.y
        endrep
        mov oC0, r0
but this only happens if the integer constant is used directly in an
instruction that needs it, and as I said there is no instruction that
allows converting them to a float representation.
Notice how a more complex shader, that performs operations with this
integer variable "m":
    int m;

    float4 main() : sv_target
    {
        float4 res = {0, 0, 0, 0};

        for (int k = 0; k < m * m; ++k)
            res += k;
        return res;
    }
gives the following output:
    // Registers:
    //
    //   Name         Reg   Size
    //   ------------ ----- ----
    //   m            c0       1
    //

        ps_3_0
        def c1, 0, 0, 1, 0
        defi i0, 255, 0, 0, 0
        mul r0.x, c0.x, c0.x
        mov r1, c1.y
        mov r0.y, c1.y
        rep i0
          mov r0.z, r0.x
          break_ge r0.y, r0.z
          add r1, r0.y, r1
          add r0.y, r0.y, c1.z
        endrep
        mov oC0, r1
Meaning that the uniform "m" is just stored as a floating point value in
"c0", the constant integer register "i0" is just set to 255 (hoping
it is a high enough value) using "defi", and a "break_ge" against the
value computed from c0 is used to break from the loop.
We could potentially use this approach to implement loops for SM3
without expecting the variables to be received as constant integer
registers.
According to the D3D documentation, for SM1-SM3 constant integer
registers are only used by the 'loop' and 'rep' instructions.
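To make the behaviour of the second listing concrete, here is a small C simulation of it. It models a single component of the float4 result, and the function name is made up for illustration; the comments map each statement to the instruction it stands for.

```c
#include <assert.h>

/* Simulates the "defi" + "rep" + "break_ge" loop for one component of the result. */
static float run_emulated_loop(float m)
{
    const int rep_count = 255;      /* defi i0, 255, 0, 0, 0 */
    float limit = m * m;            /* mul r0.x, c0.x, c0.x */
    float res = 0.0f;               /* mov r1, c1.y */
    float k = 0.0f;                 /* mov r0.y, c1.y */
    int i;

    for (i = 0; i < rep_count; ++i) /* rep i0 */
    {
        if (k >= limit)             /* mov r0.z, r0.x; break_ge r0.y, r0.z */
            break;
        res += k;                   /* add r1, r0.y, r1 */
        k += 1.0f;                  /* add r0.y, r0.y, c1.z */
    }                               /* endrep */

    return res;                     /* mov oC0, r1 */
}
```

With m = 3 the loop stops after 9 iterations, as the HLSL source asks. With m = 20 the source asks for 400 iterations, but the simulation stops at 255, which is the caveat behind "hoping it is a high enough value".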
The structurizer is implemented along the lines of what is usually called
the "structured program theorem": the control flow is completely
virtualized by means of an additional TEMP register which stores the
block index which is currently running. The whole program is then
converted to a huge switch construction enclosed in a loop, executing
at each iteration the appropriate block and updating the register
depending on the block's jump instruction.
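As an illustration of the general shape of such output (not the code the structurizer actually emits), here is a hand-written C analogue for a tiny four-block CFG. The "block" variable plays the role of the additional TEMP register; the function and its CFG are invented for the example.

```c
#include <assert.h>

/* Original, unstructured CFG, computing 1 + 2 + ... + n:
 *   block 0: sum = 0; i = 1;   jump to block 1
 *   block 1: if (i > n) jump to block 3, else jump to block 2
 *   block 2: sum += i; ++i;    jump to block 1
 *   block 3: return
 */
static unsigned int sum_to_n(unsigned int n)
{
    unsigned int block = 0; /* index of the block to run next */
    unsigned int sum = 0, i = 0;
    int running = 1;

    while (running)
    {
        switch (block)
        {
            case 0:
                sum = 0;
                i = 1;
                block = 1;
                break;

            case 1:
                block = (i > n) ? 3 : 2;
                break;

            case 2:
                sum += i;
                ++i;
                block = 1;
                break;

            case 3:
                running = 0;
                break;

            default:
                assert(0); /* no other block index is ever stored */
                running = 0;
                break;
        }
    }

    return sum;
}
```

Every jump, including the loop back edge from block 2 to block 1, becomes a plain store to "block", so the same scheme works for any CFG; the price is that the original loop structure is no longer visible in the output.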
The algorithm's generality is also its major weakness: it accepts any
input program, even if its CFG is not reducible, but the output
program lacks any useful convergence information. It satisfies the
letter of the SPIR-V requirements, but it is expected that it will
be very inefficient to run on a GPU (unless a downstream compiler is
able to devirtualize the control flow and do a proper convergence
analysis pass). The algorithm is however very simple, and good enough
to at least pass tests, enabling further development. A better
alternative is expected to be upstreamed incrementally.
Side note: the structured program theorem is often called the
Böhm-Jacopini theorem; Böhm and Jacopini did indeed prove a variation
of it, but their algorithm is different from what is commonly attributed
to them and implemented here, so I opted for not using their name.
For simplicity PHI nodes are not currently handled.
The goal for this pass is to make the CFG structurizer simpler, because
it doesn't have to care about the stricter rules that SSA registers,
unlike TEMP registers, have to satisfy.
It is likely that the generated code will be harder for downstream
compilers to optimize and execute efficiently, so once a complete
structurizer is in place this pass should be removed, or at least
greatly reduced in scope.
These can be disassembled by D3DDisassemble() just fine, and perhaps
more importantly, shader model 1 vertex shaders do not require dcl_
instructions in Direct3D 8.
The handling of write masks and swizzles for 64 bit data types is
currently irregular: write masks are always 64 bit, while swizzles
are usually 32 bit, except for SSA registers, which are 64 bit.
With this change we always use 64 bit swizzles, in order to make
the situation less surprising and make it easier to convert
registers between SSA and TEMP.
64 bit swizzles are always required to have X in their last two
components.
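A sketch of what converting a 32 bit swizzle to a 64 bit one might look like. The encoding (2 bits per component) and all names are hypothetical and do not match the actual vkd3d definitions; the sketch also assumes each 64 bit value occupies an aligned pair of 32 bit components (xy or zw).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: component idx is stored in bits [2 * idx, 2 * idx + 1]. */
static uint32_t swizzle_make(unsigned int x, unsigned int y, unsigned int z, unsigned int w)
{
    assert(x < 4 && y < 4 && z < 4 && w < 4);
    return x | (y << 2) | (z << 4) | (w << 6);
}

static unsigned int swizzle_get(uint32_t swizzle, unsigned int idx)
{
    assert(idx < 4);
    return (swizzle >> (2 * idx)) & 3;
}

/* Each 64 bit component is selected by the first half of a 32 bit pair:
 * 32 bit x or y selects 64 bit X, 32 bit z or w selects 64 bit Y. */
static uint32_t swizzle_64_from_32(uint32_t swizzle32)
{
    unsigned int first = swizzle_get(swizzle32, 0) / 2;
    unsigned int second = swizzle_get(swizzle32, 2) / 2;

    /* The last two components are always X (0). */
    return swizzle_make(first, second, 0, 0);
}

/* Checks the requirement that the last two components are X. */
static int swizzle_64_is_valid(uint32_t swizzle64)
{
    return !swizzle_get(swizzle64, 2) && !swizzle_get(swizzle64, 3);
}
```

For example, the 32 bit swizzle .zwxy, which swaps the two 64 bit halves, becomes the 64 bit swizzle .yxxx under this scheme.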
There are only three cases, and while the code is longer it is also
hopefully easier to read. Moreover, an error message is emitted if
we're doing something unexpected.
IBFE and UBFE are not emitted for HLSL sources which perform bitfield
extractions, e.g. loading a value from a struct containing bitfields, or
the equivalent done with bit shifts. These instructions are probably
only emitted by the TPF -> DXIL converter, which makes them hard to test.
PHI nodes must be fixed up after this pass, because the block references
might have become broken. For simplicity this is not handled yet.
The goal for this pass is to make the CFG structurizer simpler, because
only conditional and unconditional branches must be supported.
Eventually this limitation might be lifted if there is an advantage in
doing so.