Instead of an int3.
Gather operations expect an offset with only two components.
Currently the following field (which is the gather channel) is
parsed as a third component, which leads to wrong and invalid
results.
This fixes a crash on a shader from WRC Generations.
The input DXIL can sometimes contain constant arrays not referenced by the
resulting vsir program. It doesn't hurt much to generate ICBs for those
anyway, but it's a little pointless.
Currently, on what we consider normalized vsir, destination write masks
are not relative to the signature element's mask, even though source
swizzles are. Also for most instructions, the source swizzles are masked
by the destination write mask, as given by vsir_src_is_masked().
The DXIL parser however, is not derelativizing the destination write
masks for system value signature elements, so we fix that to make it
consistent with how other front-ends are handled.
For instance, when the test introduced in commit
ca5bc63e5e is compiled to DXIL using DXC,
and then parsed using vkd3d-compiler, we get the following store
instructions:
vs_6_0
.input
.param POSITION.xyzw, v0.xyzw, float
.output
.param SV_Position.xyzw, o0.xyzw, float, POS
.param SV_CullDistance.x, o1.x, float, CULLDST
.param SV_ClipDistance.y, o1.y, float, CLIPDST
.descriptors
.text
label l1
...
mov o1.x <v4:f32>, sr1 <s:f32>
mov o2.x <v4:f32>, sr2 <s:f32> // Note the .x write mask!
ret
whereas, when compiling using FXC and parsing the TPF using
vkd3d-compiler we get:
vs_4_0
.input
.param POSITION.xyzw, v0.xyzw, float
.output
.param SV_POSITION.xyzw, o0.xyzw, float, POS
.param SV_CULLDISTANCE.x, o1.x, float, CULLDST
.param SV_CLIPDISTANCE.y, o1.y, float, CLIPDST
.descriptors
.text
label l1
mov o0.xyzw <v4:f32>, v0.xyzw <v4:f32>
mov o1.x <v4:f32>, v0.x <v4:f32>
mov o2.y <v4:f32>, v0.y <v4:f32> // Note the .y write mask.
ret
This only really matters for cases where we have a system value semantic
whose mask doesn't start at .x, which is very rare. For instance, it
requires the clip/cull distance combo, which share registers, so one of
them pushes the other to start on another component.
According to the tests, the only thing relying on this behaviour is the
handling of private variables for system value semantics on the SPIR-V
backend, which expects destination write masks as if the element started
at .x even though it might not. This is modified then.
More recent versions of the Vulkan/SPIR-V validation layers have started
to complain about our usage of "SequentiallyConsistent" in our SPIR-V
output. Specifically, VUID-StandaloneSpirv-MemorySemantics-10866 "Memory
Semantics with SequentiallyConsistent memory order must not be used in
the Vulkan API".
The SPIR-V specification says: "If the declared memory model is Vulkan,
SequentiallyConsistent must not be used." However, we're using the
GLSL450 memory model with SPIR-V 1.3, and "Vulkan" is not available
before SPIR-V 1.5.
The Vulkan specification says "Sequentially consistent atomics and
barriers are not supported and SequentiallyConsistent is treated as
AcquireRelease. SequentiallyConsistent should not be used." in the
"Shader Memory Access Ordering" section.
Those don't quite add up to the "... must not be used in the Vulkan
API", from the validation layers, but it does seem clear that
SequentiallyConsistent isn't actually supported. On the DXIL side, when
targetting SPIR-V with dxc, the generated SPIR-V uses the
"None"/"Relaxed" memory semantics. I wasn't immediately able to find a
reference for what seq_cst is supposed to mean in the context of DXIL,
but "None"/"Relaxed" does seem consistent with how the HLSL
atomic/interlocked intrinsics are expected to behave, as well as with
our behaviour for tpf shaders.
It's not obvious what this last remaining use of
sm6_parser_require_space() is preallocating space for, and that's as
good of a reason as any to get rid of it.
Only calls to sm6_parser_add_instruction() where we are using the
returned vkd3d_shader_instruction are checked for, since these return
values might cause NULL dereferences if unchecked.
Other calls to sm6_parser_add_instruction() can be left alone since the
error is still recorded via sm6->p.status.