When the client acquires the Vulkan queue it has to ensure that
it is not submitting work before other work it depends on already
submitted through the Direct3D 12 API but currently in the internal
vkd3d queue. Currently we suggest to enqueue signalling a fence and
than wait for it before acquiring the Vulkan queue, which is
correct but excessive: it will wait not just for the work currently
in the queue to be submitted, but for it to be executed too,
introducing useless dependencies.
By adding a way to enqueue signalling a fence on the CPU side we
allow the client to wait for the currently outstanding work to
be submitted to Vulkan, but nothing more.
Turns out we are not doing this correctly in SM1, because the rounding
should be to the number that is closer to zero and lower_casts_to_int()
doesn't do that.
A vertex shader test is added since in SM1 they must rely on the SLT
operation instead of the CMP operation.
fxc compiles this method even without the backcompat option.
Furthermore, it even does it on ps_2_0 despite the fact that it maps to
a texldd instruction, which is not available on plain ps_2_0, nor ps_2_b,
only on ps_2_a and ps_3_0 according to documentation.
It is worth mentioning that the additional offset parameter is not
supported when lowering.
Otherwise ubsan reports these errors on the bitwise.shader_test:
libs/vkd3d-shader/hlsl_constant_ops.c:970:50: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
libs/vkd3d-shader/hlsl_constant_ops.c:970:50: runtime error: left shift of negative value -12
We apply distributivity to applicable expressions, specifically with
the following rewrite rules:
(x OPL a) OPR (x OPL b) -> x OPL (a OPR b)
(y OPR (x OPL a)) OPR (x OPL b) -> y OPR (x OPL (a OPR b))
((x OPL a) OPR y) OPR (x OPL b) -> (x OPL (a OPR b)) OPR y
(x OPL a) OPR ((x OPL b) OPR y) -> (x OPL (a OPR b)) OPR y
(x OPL a) OPR (y OPR (x OPL b)) -> (x OPL (a OPR b)) OPR y
where a, b are constants.
Like we did before commit 067e6deee4.
These skips aren't all that interesting; it's entirely intentional that
e.g. a 2.0-3.0 run wouldn't run 4.0 shaders.