7270 Commits

Author SHA1 Message Date
Elizabeth Figura
79ad8c9354 vkd3d-shader/hlsl: Handle error instructions in hlsl_new_swizzle().
We already check for error instructions when parsing swizzles, but if allocation
fails at codegen time we would like to avoid asserting when subsequently
constructing a swizzle.
2025-02-20 15:49:40 +01:00
Elizabeth Figura
4072aa4a4b vkd3d-shader/hlsl: Remove the type equality assertions in hlsl_new_ternary_expr().
Similar to d1c2ae3f0e, this is a bit too strict
and may prevent e.g. simultaneous use of float and float1 at codegen time.

However, in this case the inciting factor is that in the case of allocation
failure at codegen time, we would like to allow one or more arguments to have
error type.
2025-02-20 15:48:25 +01:00
Elizabeth Figura
ba868ed4a6 vkd3d-shader/hlsl: Skip transformation passes on error.
The primary motivation here is to avoid needing to worry about instructions
potentially pointing to the preallocated error instruction in the case of
allocation failure.

This doesn't cover all passes, but none of the other passes make assumptions
about instruction sources.
2025-02-20 15:48:24 +01:00
Francisco Casas
153b7c8460 vkd3d-shader/hlsl: Run folding passes again after lower_nonconstant_array_loads.
This is because lower_nonconstant_array_loads() can potentially turn
nonconstant loads into constant loads, allowing copy-prop to turn these
loads into previous instructions, which might help other passes as well.

This patch lowers the number of required temps for the following ps_2_0
shader from 19 to 16:

    int i;
    float3x3 mats[4];

    float4 main() : sv_target
    {
        return mul(mats[i], float3(1, 2, 3)).xyzz;
    }
2025-02-20 15:44:09 +01:00
Francisco Casas
e60c89c532 tests: Test unused invalid samples with a static sampler. 2025-02-20 15:44:09 +01:00
Francisco Casas
321fda9c26 vkd3d-shader/hlsl: Only use the temp copy for variables that are written.
This can save a significant amount of temp registers because it allows to
avoid referencing the temp (and having to store it) when not needed.

For instance, this patch lowers the number of required temps for the
following ps_2_0 shader from 24 to 19:

    int i;
    float3x3 mats[4];

    float4 main() : sv_target
    {
        return mul(mats[i], float3(1, 2, 3)).xyzz;
    }

Also, it is needed for SM1 vertex shader relative addressing since
non-constant loads are required to be directly on the uniform ('c'
registers) instead of the temp, and non-constant loads cannot be
transformed by copy propagation.
2025-02-20 15:44:09 +01:00
Elizabeth Figura
8e6ddb0c1a vkd3d-shader/hlsl: Don't mark extern variables with an explicit first_write or last_read.
Fix the last few places that care.
2025-02-20 15:44:09 +01:00
Francisco Casas
1d74ff075e vkd3d-make/hlsl: Trace the number of registers allocated in allocate_temp_registers(). 2025-02-20 15:44:04 +01:00
Nikolay Sivov
f830ac1206 vkd3d-shader/preproc: Do not attempt to load empty included files.
Signed-off-by: Nikolay Sivov <nsivov@codeweavers.com>
2025-02-20 15:40:34 +01:00
Giovanni Mascellani
07b7975d09 vkd3d: Put all root descriptors in a single Vulkan descriptor set when using Vulkan heaps.
Since 4a94bfc2f6 we segregate
different D3D12 descriptor types in different Vulkan descriptor sets.
This change was introduced to reduce descriptor wasting when
allocating a new descriptor pool; that can be very useful when
using virtual heaps, which have to often cycle through many
descriptors, but it is expected to have limited impact for Vulkan
heaps, given that in that case most descriptors are allocated through
the descriptor heap rather than through the command allocator.

Instead, it has a rather detrimental effect with Vulkan heaps,
because it tends to use many more Vulkan descriptor sets than
necessary, often with just a handful of descriptors each. This
causes a regression on some Vulkan implementations that support
too few descriptor sets.

With this change we revert to a situation similar to before,
stuffing all the descriptors that do not live in a root
descriptor table in as few descriptor sets as possible (at most
one or two, depending on whether push descriptors are used).
2025-02-19 17:58:23 +01:00
Giovanni Mascellani
6415c6b0e0 vkd3d: Rename push_descriptor_set to root_descriptor_set.
Soon it won't be used necessarily for push descriptors anymore, but
it will still contain root descriptors.
2025-02-19 17:57:15 +01:00
Conor McCarthy
67d8cf744c tests/hlsl: Add a conditional 16-bit test. 2025-02-19 17:53:57 +01:00
Conor McCarthy
36f9510bb3 tests/hlsl: Add interstage interface 16-bit tests. 2025-02-19 17:52:05 +01:00
Conor McCarthy
6ee650f316 tests/hlsl: Add a typed SRV load 16-bit test. 2025-02-19 17:48:38 +01:00
Conor McCarthy
2b325d3b59 tests/hlsl: Add a TGSM 16-bit test. 2025-02-19 17:47:29 +01:00
Conor McCarthy
94b8747da4 tests/hlsl: Add a sampler 16-bit test. 2025-02-19 17:46:05 +01:00
Conor McCarthy
e355cdbae0 tests/hlsl: Add typed buffer SRV 16-bit tests. 2025-02-19 17:44:59 +01:00
Giovanni Mascellani
a7337bc999 vkd3d: Require extension VK_KHR_maintenance2.
We're already implicitly using it for image layouts in which
either depth or stencil is writeable and the other is not.
Correspondingly, add the _KHR suffix in those cases, so the
extension usage is more evident.

According to the Vulkan Hardware Database, only four reports
without this extension were filed since 2023, and all of them
for configurations we likely don't target.
2025-02-19 17:41:30 +01:00
Francisco Casas
604fe3ccee tests/test-driver: Merge the same consecutive tags togeter. 2025-02-19 17:36:19 +01:00
Francisco Casas
681b839419 tests/test-driver: Group together tags in the same line and shader model. 2025-02-19 17:36:19 +01:00
Francisco Casas
72b603780c tests/test-driver: Print the shader model for the detailed output of the hlsl backend. 2025-02-19 17:36:19 +01:00
Francisco Casas
3aecbc5ac1 vkd3d-shader/hlsl: Also dump preprocessed shaders.
This could be useful since there are many shaders that contain `#include`
directives or use parameter-defined macros and we can't reproduce bugs
from the source alone.
2025-02-19 17:34:24 +01:00
Giovanni Mascellani
665c29f0be vkd3d-shader/tpf: Allow I/O index ranges to not intersect a signature element for a given register.
The current TPF validator enforces that for each register involved
in a DCL_INDEX_RANGE instruction there must be a signature element
for that register and the DCL_INDEX_RANGE write mask. This is an
excessively strong request, and causes some shaders from The Falconeer
to be invalidly rejected.

The excessively strong check was needed to avoid triggering a bug
in the I/O normaliser. Since that bug is now solved, the check
can be relaxed.
2025-02-19 17:30:25 +01:00
Giovanni Mascellani
4b84fb486b vkd3d-shader/ir: Handle index ranges that do not touch a signature element for each register.
A good part of the I/O normaliser job is to merge together signature
elements that are spanned by DCL_INDEX_RANGE instructions. The
current algorithm assumes that each index range touches exactly
one signature element for each index spanned by the range. The
assumption is used in shader_signature_merge() in the form of
expecting that, if the index range is N registers long, then,
once you find the first signature element of an index range, the
other elements that will have to be merged with it are exactly the
following N-1 according to the order given by
signature_element_register_compare() or
signature_element_mask_compare(), depending on the signature type.

This doesn't necessarily happen. For example, The Falconeer has a
few hull shaders in which this happens:

    hs_fork_phase
    dcl_hs_fork_phase_instance_count 13
    dcl_input vForkInstanceId
    dcl_output o4.z
    dcl_output o5.z
    dcl_output o6.z
    dcl_output o7.z
    dcl_output o12.z
    dcl_output o13.z
    dcl_output o14.z
    dcl_output o15.z
    dcl_output o16.z
    dcl_output o17.z
    dcl_output o18.z
    dcl_output o19.z
    dcl_output o20.z
    dcl_temps 1
    dcl_index_range o4.z 17
    iadd r0.x, vForkInstanceId.x, l(4)
    ult r0.y, vForkInstanceId.x, l(4)
    movc r0.x, r0.y, vForkInstanceId.x, r0.x
    mov o[r0.x + 4].z, l(0)
    ret

Here the index range "skips" o8.z through o11.z, because those
registers only use mask .xy. The current algorithm fails on such
a shader.

Even depending on the signature element order doesn't look ideal.
I don't have a full counterexample for that, but it looks fragile,
especially given that the register allocation algorithm in FXC is
notoriously full of unexpected corner cases.

We solve both problems by slightly changing the architecture of
the normaliser: first we move computing the masks for the merge
signature element from signature_element_range_expand_mask(),
which is executed while merging signature, to
io_normaliser_add_index_range(), which is executed before merging
signatures. Then, while we are merging signatures, we can decide
for each single signature element whether it has to be retained
or not, and how it should be patched. The algorithm becomes
independent of the order, because each signature element can be
processed individually.
2025-02-19 17:30:00 +01:00
Giovanni Mascellani
b5350f9387 vkd3d-shader/ir: Use a structure to record range data in the I/O normaliser.
In order to make it able to have other fields.
2025-02-19 17:01:54 +01:00