This codepath path is currently triggered when transpiling d3dbc shaders
that use vPos (or other of these special registers).
While vPos gets added to the input signature and gets assigned an INPUT
register, the registers in the shader instructions are still of
VKD3DSPR_MISCTYPE type and are not propperly mapped yet. This gives
invalid results.
Some SM1 tests must be set back to "todo" but they only work because, by
coincidence, we are assigning vPos the input register with index 0.
Propper mapping of these registers is still required.
Properly passing the inverse-trig.shader_test tests whose qualifiers
have been removed requires making spirv.c capable of handling ABS.
The same happens for the ps_3_0 equality test in
float-comparison.shader_test.
We use "printf" instead of "print" in awk in order to avoid a newline in
the value of $xfcount, and use "-gt" instead of ">", which creates the
spurious file, in the comparison.
Values in DXIL have no signedness, so it is ambiguous whether 16-bit
constants should or should not be sign-extended when 16-bit execution
is not supported.
For the driver script to run properly it is necessary to run
"autoreconf" in the source directory and call the configure script again
in the build directory.
There is no way to tell in spirv_compiler_emit_load_reg() if the write
mask is 64-bit. All loads are 32-bit except for IMMCONST64 and SSA, and
the latter ignores the mask, so the only issue lies with IMMCONST64.
Tests have already been implemented in 92044d5e; this commit also reduces
the scope of some of the todos (because now they're implemented!).
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=55154
Co-authored-by: Giovanni Mascellani <gmascellani@codeweavers.com>
These may happen when storing to structured buffers, and we are not
handling them properly yet. The included test reaches unreacheable code
before this patch.
Storing to buffers is complicated since we need to split the index
chain in two paths:
- The path within the variable where the resource is.
- The subpath to the part of the resource element that is being stored
to.
For now, we will emit a fixme when the index chain in the lhs is not a
direct resource access.
On shader_test files, now resources should be declared this way:
[texture n] -> [srv n]
[srv buffer n] -> [srv n]
[uav n] -> [uav n]
[uav buffer n] -> [uav n]
[vertex buffer n] -> [vb n]
[render target n] -> [rtv n]
The dimension (buffer or 2D) is now specified as an additional parameter
in the "size" directive:
For 2D resources:
size (n, m) -> size (2d, n, m)
For buffers:
size (n, 1) -> size (buffer, n)
If in the same shader_test file we have both a [buffer uav n] and a
[uav n] with the same slot "n", we want the last one to override the
first one instead of passing both resources to the backends.
Same for [buffer srv n] and [texture n] after we introduce SRV buffers.
For temporary registers, SM1-SM3 integer types are internally
represented as floating point, so, in order to perform a cast
from ints to floats we need a mere MOV.
For constant integer registers "iN" there is no operation for casting
from a floating point register to them. For address registers "aN", and
the loop counting register "aL", vertex shaders have the "mova" operation
but we haven't used these registers in any way yet.
We probably would want to introduce these as synthetic variables
allocated in a special register set. In that case we have to remember to
use MOVA instead of MOV in the store operations, but they shouldn't be src
or dst of CAST operations.
Regarding constant integer registers, in some shaders, constants are
expected to be received formatted as an integer, such as:
int m;
float4 main() : sv_target
{
float4 res = {0, 0, 0, 0};
for (int k = 0; k < m; ++k)
res += k;
return res;
}
which compiles as:
// Registers:
//
// Name Reg Size
// ------------ ----- ----
// m i0 1
//
ps_3_0
def c0, 0, 1, 0, 0
mov r0, c0.x
mov r1.x, c0.x
rep i0
add r0, r0, r1.x
add r1.x, r1.x, c0.y
endrep
mov oC0, r0
but this only happens if the integer constant is used directly in an
instruction that needs it, and as I said there is no instruction that
allows converting them to a float representation.
Notice how a more complex shader, that performs operations with this
integer variable "m":
int m;
float4 main() : sv_target
{
float4 res = {0, 0, 0, 0};
for (int k = 0; k < m * m; ++k)
res += k;
return res;
}
gives the following output:
// Registers:
//
// Name Reg Size
// ------------ ----- ----
// m c0 1
//
ps_3_0
def c1, 0, 0, 1, 0
defi i0, 255, 0, 0, 0
mul r0.x, c0.x, c0.x
mov r1, c1.y
mov r0.y, c1.y
rep i0
mov r0.z, r0.x
break_ge r0.y, r0.z
add r1, r0.y, r1
add r0.y, r0.y, c1.z
endrep
mov oC0, r1
Meaning that the uniform "m" is just stored as a floating point in
"c0", the constant integer register "i0" is just set to 255 (hoping
it is a high enough value) using "defi", and the "break_ge"
involving c0 is used to break from the loop.
We could potentially use this approach to implement loops from SM3
without expecting the variables being received as constant integer
registers.
According to the D3D documentation, for SM1-SM3 constant integer
registers are only used by the 'loop' and 'rep' instructions.
These tests should actually compile and run in SM1, which is possible
if we pass the int and uint uniforms in the expected IEEE 754 float
format for SM1 shaders.
Also, bools should be passed as 1.0f or 0.0f to SM1.
When the "if" qualifier is added to a directive, the directive is
skipped if the shader->minimum_shader_model is not included in the
range.
This can be used on the "probe" directives for tests that have different
expected results on different shader models, without having to resort to
[require] blocks.
This test currently hit a Metal bug when run on Apple Silicon with
MoltenVK and fails. We don't have an easy way to mark shader runner
tests as buggy and we're not interested in tracking that bug anyway,
so I'm just working around it.
The structurizer is implemented along the lines of what is usually called
the "structured program theorem": the control flow is completely
virtualized by mean of an additional TEMP register which stores the
block index which is currently running. The whole program is then
converted to a huge switch construction enclosed in a loop, executing
at each iteration the appropriate block and updating the register
depending on block jump instruction.
The algorithm's generality is also its major weakness: it accepts any
input program, even if its CFG is not reducible, but the output
program lacks any useful convergence information. It satisfies the
letter of the SPIR-V requirements, but it is expected that it will
be very inefficient to run on a GPU (unless a downstream compiler is
able to devirtualize the control flow and do a proper convergence
analysis pass). The algorithm is however very simple, and good enough
to at least pass tests, enabling further development. A better
alternative is expected to be upstreamed incrementally.
Side note: the structured program theorem is often called the
Böhm-Jacopini theorem; Böhm and Jacopini did indeed prove a variation
of it, but their algorithm is different from what is commontly attributed
to them and implemented here, so I opted for not using their name.
These can be disassembled by D3DDisassemble() just fine, and perhaps
more importantly, shader model 1 vertex shaders do not require dcl_
instructions in Direct3D 8.
The implementation of upload_buffer_data_with_states(), unlike the
implementation of upload_texture_data_with_states(), does not expect a
pointer to a D3D12_SUBRESOURCE_DATA, but rather, a direct pointer to the
data.
The generated Vulkan calls look right and do not trigger any
validation error, but the returned timestamp is 0. A valid
timestamp is returned if the CopyResource() call is commented,
or the second EndQuery() call is moved before CopyResource(),
or the first EndQuery() call is commented. I am not seeing any
sensible pattern here, so I guess there is just a bug in
MoltenVK.
Specifically, MoltenVK seems to be able to load from stencil, but
the specific replicating swizzle (repeating the stencil value on
all the channels) is not honored. The stencil value is read only
on the red channel.
At the current moment this is a little odd because for SM1 [test]
directives are skipped, and the [shader] directives are not executed by
the shader_runner_vulkan.c:compile_shader() but by the general
shader_runner.c:compile_shader(). So in principle it is a little weird
that we go through the vulkan runner.
But fret not, because in the future we plan to make the parser agnostic
to the language of the tests, so we will get rid of the general
shader_runner.c:compile_shader() function and instead call a
runner->compile_shader() function, defined for each runner. Granted,
most of these may call a generic implementation that uses native
compiler in Windows, and vkd3d-shader on Linux, but it would be more
conceptually correct.
If the runners require multiple calls to run_shader_tests() for
different shader model ranges, these are moved inside the sole runner
call.
For the same reason, the trace() messages are also moved inside the
runner calls.
We can now pass (sm<4) and (sm>=4) to "fail" and "todo" qualifiers, and
we can use multiple of these qualifier arguments using "&" for AND and
"|" for OR.
examples:
todo(sm>=4 & sm<6)
todo(sm<4 | sm>6)
parenthesis are not supported.
Adding additional model ranges for the tests, if we need them, should be
easier now, since they only have to be added to the "valid_args" array.
The relative-addressed case in shader_register_normalise_arrayed_addressing()
leaves the control point id in idx[0], while for constant register
indices it is placed in idx[1]. The latter case could be fixed instead,
but placing the control point count in the outer dimension is more
logical.
The FXC optimiser sometimes converts a local array of input values into
direct array addressing of the inputs, which can result in a
dcl_indexrange instruction spanning input elements with different masks.
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=56162
Storing to a vector component using a non-constant index is not allowed
on profiles lower than 6.0. Unless this happens inside a loop that can be
unrolled, which we are not doing yet.
For this reason, a validate_nonconstant_vector_store_derefs pass is
added to detect these cases.
Ideally we would want to emit an hlsl_error on this pass, but before
implementing loop unrolling, we could reach this point on valid HLSL.
Also, as pointed out by Nikolay in the mentioned bug, currently
new_offset_from_path_index() fails an assertion when this happens,
because it expects an hlsl_ir_constant, so a check is added.
It also felt correct to emit an hlsl_fixme there, despite the
redundancy.
Apparently Metal doesn't support specifying a bias directly in the
sampler, and, with "nearest" mip filtering, it doesn't switch
precisely at LOD 0.5 (though still between 0.5 and 0.6).
Currently, if a probe fails, it will print the line number of the [test]
block the probe is in, not the line number of the probe itself. This
makes it somewhat difficult to debug.
This commit makes it print the line number that a test fails at.
This preempts us from replacing a swizzle incorrectly, as in the
following example:
1: A.x = 1.0
2: A
3: A.x = 2.0
4: @2.x
were @4 ends up being 2.0 instead of 1.0, because that's the value stored in
A.x at time 4, and we should be querying it at time 2.
This also helps us to avoid replacing a swizzle with itself in copy-prop
which can result in infinite loops, as with the included tests this commit.
Consider the following sequence of instructions:
1 : A
2 : B = @1
3 : B
4 : A = @3
5 : @1.x
Current copy-prop would replace 5 so it points to @3 now:
1 : A
2 : B = @1
3 : B
4 : A = @3
5 : @3.x
But in the next iteration it would make it point back to @1, keeping it
spinning infinitively.
The solution is to index the instructions and don't replace the swizzle
if the new load happens after the current load.
The included test fails because copy_propagation_transform_swizzle()
is using the value recorded for the variable when the swizzle is being
read, and not the swizzle's load.
This was broken by commit e390bc35e2c9b0a2110370f916033eea2366317e; that
commit fixed the source count for these instructions, but didn't adjust
shader_sm1_skip_opcode(). Note that this only affects shader model 1;
later versions have a token count embedded in the initial opcode token.