I think the main argument for preallocating instructions and
passing them to helpers is that this simplifies error handling.
However it seems that the simplification is close to negligible,
while the current solution makes it harder to use the iterator
abstraction layer for the instruction array, and it also makes
the code harder to read and check.
The Metal runner could in principle support this feature using
MTLSamplerDescriptor.reductionMode, but that requires macOS 26.0/Tahoe,
which is newer than my current setup.
It's main advantage over vkd3d_shader_message_context_copy_messages() is
that it can't fail. The original issue this addresses is that
vkd3d_shader_compile() should free its output when
vkd3d_shader_message_context_copy_messages() fails, as spotted by
Giovanni; that likely would have applied to a number of the other uses
of vkd3d_shader_message_context_copy_messages() as well.
Much like e.g. commits a686fa7750 and
93d2bb2d5d.
Direct3D 12 doesn't guarantee any implicit barriers after UAV clears,
but unfortunately missing these is often easy to go unnoticed.
Loads of components of vectors (i.e. functionally a subset of SWIZZLE
instructions, but expressed using LOAD) are legal, and generated elsewhere.
Due to circumstances they never reach this point currently, but we shouldn't use
vkd3d_unreachable() here.
Scalars have a reg_size of 4 on sm1. In the case of a deref of a vector or
matrix resulting in a scalar, however, this yields a required_bind_count that is
one higher than it should be. reg_size is the wrong thing to be using here,
since it describes the size of a type in isolation, but this is conceptually an
embedded type that doesn't include any padding. Since we're only dealing with
scalars and vectors here, just use their width.