I think the main argument for preallocating instructions and
passing them to helpers is that this simplifies error handling.
However, it seems that the simplification is close to negligible,
while the current solution makes it harder to use the iterator
abstraction layer for the instruction array, and it also makes
the code harder to read and check.
The Metal runner could in principle support this feature using
MTLSamplerDescriptor.reductionMode, but that requires macOS 26.0/Tahoe,
which is newer than my current setup.
Its main advantage over vkd3d_shader_message_context_copy_messages() is
that it can't fail. The original issue this addresses is that
vkd3d_shader_compile() should free its output when
vkd3d_shader_message_context_copy_messages() fails, as spotted by
Giovanni; that likely would have applied to a number of the other uses
of vkd3d_shader_message_context_copy_messages() as well.
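
As a rough illustration of the cleanup issue described above, here is a
minimal sketch. All names in it (struct output, copy_messages(),
compile()) are hypothetical stand-ins and not the vkd3d API; the
simulate_failure parameter exists only to exercise the error path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a compiler's output blob. */
struct output
{
    void *code;
    size_t size;
};

/* Hypothetical fallible copy: returns NULL on failure. */
static char *copy_messages(const char *messages, int simulate_failure)
{
    size_t len;
    char *copy;

    if (simulate_failure)
        return NULL;
    len = strlen(messages) + 1;
    if (!(copy = malloc(len)))
        return NULL;
    memcpy(copy, messages, len);
    return copy;
}

/* Produces the output first, then copies the messages. If the copy
 * fails, the already-allocated output must be freed before returning. */
static int compile(const char *messages, int simulate_failure,
        struct output *out, char **out_messages)
{
    assert(out && out_messages);

    out->size = 4;
    if (!(out->code = calloc(1, out->size)))
        return -1;

    if (!(*out_messages = copy_messages(messages, simulate_failure)))
    {
        /* Easy to forget: release the output on this late failure. */
        free(out->code);
        out->code = NULL;
        out->size = 0;
        return -1;
    }

    return 0;
}
```

A message hand-off that cannot fail removes the late error path
entirely, and with it the need for this cleanup.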
Much like e.g. commits a686fa7750 and
93d2bb2d5d.
Direct3D 12 doesn't guarantee any implicit barriers after UAV clears,
but unfortunately missing barriers often go unnoticed.
Loads of components of vectors (i.e. functionally a subset of SWIZZLE
instructions, but expressed using LOAD) are legal, and generated elsewhere.
Due to circumstances they never reach this point currently, but we shouldn't use
vkd3d_unreachable() here.
Scalars have a reg_size of 4 on sm1. In the case of a deref of a vector or
matrix resulting in a scalar, however, this yields a required_bind_count that is
one higher than it should be. reg_size is the wrong thing to be using here,
since it describes the size of a type in isolation, but this is conceptually an
embedded type that doesn't include any padding. Since we're only dealing with
scalars and vectors here, just use their width.
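
The off-by-one can be illustrated with a small sketch. The helper name
bind_count_for() is hypothetical, and the rounding shown (4 components
per register) is an assumption about how the bind count is derived from
a component offset plus a size.

```c
#include <assert.h>

/* Hypothetical helper: number of 4-component registers needed to cover
 * `size` components starting at component offset `index`. */
static unsigned int bind_count_for(unsigned int index, unsigned int size)
{
    assert(size > 0);
    return (index + size + 3) / 4;
}
```

For a scalar loaded from component 1 of a vector, passing the sm1
reg_size of 4 gives bind_count_for(1, 4) == 2, while passing the width
of 1 gives bind_count_for(1, 1) == 1.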
Arrays are allowed for clip/cull distance semantics. Their maximum size
is 2 since that's the maximum number of registers allowed for clip/cull
distances.
Indirect addressing of these arrays is allowed on shader model 6.
These tests are introduced after the transformation of clip/cull
inputs/outputs into arrays in vsir since otherwise they segfault.
Takes care of transforming clip/cull system values from the Direct3D
convention of two 4-component registers into the SPIR-V/GLSL convention
of 8-element scalar float arrays.
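
The change of layout can be sketched as follows. This is only an
illustration of the two conventions, not the vsir pass itself: struct
vec4, scalar_array_index() and flatten_distances() are hypothetical
names, and the sketch assumes fully populated registers, ignoring write
masks and how clip and cull distances share the registers.

```c
#include <assert.h>

#define REGISTER_COUNT 2
#define COMPONENT_COUNT 4

/* Hypothetical 4-component register. */
struct vec4
{
    float c[COMPONENT_COUNT];
};

/* Maps (register, component) in the two-register layout to an element
 * index in the flat 8-element array. */
static unsigned int scalar_array_index(unsigned int reg, unsigned int component)
{
    assert(reg < REGISTER_COUNT && component < COMPONENT_COUNT);
    return reg * COMPONENT_COUNT + component;
}

/* Copies two 4-component registers into an 8-element scalar array. */
static void flatten_distances(const struct vec4 *regs, float *out)
{
    unsigned int i, j;

    for (i = 0; i < REGISTER_COUNT; ++i)
    {
        for (j = 0; j < COMPONENT_COUNT; ++j)
            out[scalar_array_index(i, j)] = regs[i].c[j];
    }
}
```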
This fixes SPIR-V validation errors in clip-cull-distance.shader_test,
as well as segfaults on Mesa 25.1.1-arch1.2 if those shaders are
executed regardless.
We create indexable temporaries of the appropriate size, and replace
accesses to clip/cull I/O signature elements with accesses to those
temporaries. The existing clip/cull signature elements are then replaced
with new scalar signature element arrays, and we copy the contents of
those I/O signature elements to/from the corresponding temporaries at
the start/end of the vsir program.
It is worth pointing out that the current implementation assumes that
every instance of the control point phase of a hull shader only writes
to the output registers of its control point, given by
vOutputControlPointID, and not to other control points. Shader
compilation will fail if that constraint is violated.