In practice, this avoids potential trailing padding at the end of DXBC
blobs. Unfortunately it also means we need to be more careful when using
bytecode_get_size() to find the offset at which subsequent data will be
written, although in many cases this follows a put_u32() call.
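As a rough sketch of the careful pattern (not actual vkd3d code; it assumes
the private put_u32(), bytecode_put_bytes(), bytecode_get_size() and align()
helpers with their usual shapes):

    /* Illustrative only: offsets taken right after put_u32() stay 4-byte
     * aligned, while offsets taken after arbitrary-length writes may need
     * explicit padding now that blobs are no longer padded at the end. */
    static size_t write_chunk(struct vkd3d_bytecode_buffer *buffer,
            const void *data, size_t data_size)
    {
        static const uint32_t zero;
        size_t size;

        /* Safe: put_u32() writes exactly four bytes, so the size following
         * it is a correctly aligned offset for the next chunk. */
        put_u32(buffer, 0xffffffffu);

        /* Needs care: after an arbitrary-length write the size may be
         * unaligned, so pad explicitly before using it as the next offset. */
        bytecode_put_bytes(buffer, data, data_size);
        size = bytecode_get_size(buffer);
        bytecode_put_bytes(buffer, &zero, align(size, 4) - size);

        return bytecode_get_size(buffer);
    }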
Some drivers (AMD Radeon RX 6700 XT, with radeonsi from Mesa 22.2.0-rc3) emit
less than one invocation per pixel, presumably because they detect that the
shader control flow is uniform for all pixels. Having the control flow depend on
SV_Position avoids this test failure.
Cf. 34bd0dd0704c613abef8a9aa3ba2a2507ed02843 in wine.
The expected use case where a heap is freed before its contained
resources is not reasonably testable, so the ability to place a new
resource is tested instead.
Normalise the incoming vkd3d_shader_instruction IR to the shader model 6
pattern. This allows generation of a single patch constant function in
SPIR-V.
Since the introduction of instruction arrays, the parser location no
longer matches the location of the current instruction. Ultimately we'll
likely want to add some kind of explicit location information to struct
vkd3d_shader_instruction_array, because we might do transformations that
change the order of the original instructions.
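One possible shape for that, purely as a sketch (the "locations" member is
hypothetical, and the other fields are only meant to suggest the existing
array layout):

    struct vkd3d_shader_instruction_array
    {
        struct vkd3d_shader_instruction *elements;
        size_t capacity;
        size_t count;

        /* Hypothetical addition: per-instruction source locations, kept in
         * parallel with "elements" and permuted together with it whenever a
         * transformation reorders instructions. */
        struct vkd3d_shader_location *locations;
    };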
VKD3D_SM4_OP_DCL_RESOURCE currently has 1 for "dst_count", but NULL for
"dst". This is largely harmless because we never attempt to access the
destination register of VKD3DSIH_DCL instructions, but it is nevertheless
not quite proper.
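Purely as an illustration of the invariant being argued for (the check itself
is hypothetical):

    /* A non-zero destination count should always be backed by a valid
     * destination parameter array. */
    assert(!ins->dst_count || ins->dst);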
d3d12_command_queue_flush_ops() can re-enter itself while processing signal
events. Since we don't use recursive mutexes, we currently have to check
some of the queue variables without holding the mutex, which is not safe.
This is solved by allowing the queue to release its mutex while it is
processing entries: when flushing, the queue is briefly locked, the
is_flushing flag is set, the queue content is copied away and the
queue is unlocked again. After the entries have been processed, the
queue is locked again to check whether something else was added in the
meantime. This is repeated until the queue is empty (or a wait operation
is blocking it).
This should also remove some latency when a thread pushes to the queue
while another one is processing it, though I didn't try to measure the
impact. While this patch is expected to make the queue mutex be locked
and unlocked more frequently, the mutex should also remain locked for
shorter periods, hopefully keeping contention low.
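A minimal sketch of the flushing pattern described above, using an
illustrative queue structure rather than the actual d3d12_command_queue types:

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdlib.h>

    struct op;
    void process_ops(struct op *ops, size_t count);

    struct op_queue
    {
        pthread_mutex_t mutex;
        bool is_flushing;
        struct op *ops;
        size_t count;
    };

    static void op_queue_flush(struct op_queue *queue)
    {
        struct op *ops;
        size_t count;

        pthread_mutex_lock(&queue->mutex);
        if (queue->is_flushing)
        {
            /* Re-entrant call from op processing; the outer flush will pick
             * up anything pushed in the meantime. */
            pthread_mutex_unlock(&queue->mutex);
            return;
        }
        queue->is_flushing = true;

        while (queue->count)
        {
            /* Steal the current entries and drop the mutex while they are
             * processed, so pushers and re-entrant flushes never deadlock. */
            ops = queue->ops;
            count = queue->count;
            queue->ops = NULL;
            queue->count = 0;
            pthread_mutex_unlock(&queue->mutex);

            process_ops(ops, count);
            free(ops);

            /* Re-acquire the mutex to see whether more work arrived while
             * the entries were being processed. */
            pthread_mutex_lock(&queue->mutex);
        }

        queue->is_flushing = false;
        pthread_mutex_unlock(&queue->mutex);
    }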