Currently, in what we consider normalized vsir, destination write masks
are not relative to the signature element's mask, even though source
swizzles are. Also, for most instructions, the source swizzles are masked
by the destination write mask, as given by vsir_src_is_masked().

The DXIL parser, however, does not derelativize the destination write
masks for system value signature elements, so we fix that to make it
consistent with how the other front-ends are handled.
For instance, when the test introduced in commit
ca5bc63e5e is compiled to DXIL using DXC,
and then parsed using vkd3d-compiler, we get the following store
instructions:

    vs_6_0
    .input
    .param POSITION.xyzw, v0.xyzw, float
    .output
    .param SV_Position.xyzw, o0.xyzw, float, POS
    .param SV_CullDistance.x, o1.x, float, CULLDST
    .param SV_ClipDistance.y, o1.y, float, CLIPDST
    .descriptors
    .text
    label l1
    ...
    mov o1.x <v4:f32>, sr1 <s:f32>
    mov o2.x <v4:f32>, sr2 <s:f32> // Note the .x write mask!
    ret

whereas, when compiling using FXC and parsing the TPF using
vkd3d-compiler, we get:

    vs_4_0
    .input
    .param POSITION.xyzw, v0.xyzw, float
    .output
    .param SV_POSITION.xyzw, o0.xyzw, float, POS
    .param SV_CULLDISTANCE.x, o1.x, float, CULLDST
    .param SV_CLIPDISTANCE.y, o1.y, float, CLIPDST
    .descriptors
    .text
    label l1
    mov o0.xyzw <v4:f32>, v0.xyzw <v4:f32>
    mov o1.x <v4:f32>, v0.x <v4:f32>
    mov o2.y <v4:f32>, v0.y <v4:f32> // Note the .y write mask.
    ret

This only really matters for cases where we have a system value semantic
whose mask doesn't start at .x, which is very rare. For instance, it
requires the clip/cull distance combo, which share registers, so one of
them pushes the other to start on another component.
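
As a rough illustration of the difference between the two forms,
converting a write mask that is relative to the signature element into
an absolute one amounts to shifting it by the index of the element's
first component. The helper names below are hypothetical and are not
taken from vkd3d; masks are assumed to be 4-bit, with bit 0 standing
for .x:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the lowest component set in a 4-bit
 * component mask (bit 0 = .x, bit 3 = .w). */
static unsigned int first_component(uint32_t mask)
{
    unsigned int i = 0;

    assert(mask & 0xfu);
    while (!(mask & (1u << i)))
        ++i;

    return i;
}

/* Hypothetical helper: turn a write mask expressed relative to the start
 * of a signature element into an absolute register write mask. */
static uint32_t derelativize_write_mask(uint32_t relative_mask, uint32_t element_mask)
{
    return (relative_mask << first_component(element_mask)) & 0xfu;
}
```

For SV_ClipDistance.y (element mask 0x2), a relative .x write mask (0x1)
becomes .y (0x2), which matches the TPF output above.
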
According to the tests, the only thing relying on this behaviour is the
handling of private variables for system value semantics in the SPIR-V
backend, which expects destination write masks as if the element started
at .x even though it might not. That handling is modified accordingly.
For example, an HLSL shader like this:

    tbuffer t0
    {
        float4 data;
    };

    float4 main() : sv_target
    {
        return data;
    }

would be compiled to the following:

    ps_4_0
    dcl_resource_buffer (mixed,mixed,mixed,mixed) t0
    dcl_output o0.xyzw
    dcl_temps 1
    ld r0.xyzw, l(0, 0, 0, 0), t0.xyzw
    mov o0.xyzw, r0.xyzw
    ret

VSIR_DATA_MIXED used to be mapped to VKD3D_SHADER_COMPONENT_UINT by
vkd3d_component_type_from_data_type(), but we no longer use that.
Found by Giovanni Mascellani.
This achieves two things:
- The GLSL backend no longer needs to handle this by itself. Likewise, the
  MSL backend won't have to either.
- We no longer handle division by zero for DXIL UDiv and URem instructions,
  which leave this undefined.
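
For context, the Direct3D shader model 4 udiv instruction is defined to
return 0xffffffff for both the quotient and the remainder when the
divisor is zero, and that is presumably the handling referred to above.
A minimal C model of that rule, with hypothetical names that are not
part of vkd3d:

```c
#include <stdint.h>

/* Hypothetical result type holding both outputs of an unsigned division. */
struct udiv_result
{
    uint32_t quotient;
    uint32_t remainder;
};

/* Model of an unsigned division with explicit division-by-zero handling:
 * both outputs are 0xffffffff when the divisor is zero. */
static struct udiv_result udiv_guarded(uint32_t dividend, uint32_t divisor)
{
    struct udiv_result r = {UINT32_MAX, UINT32_MAX};

    if (divisor)
    {
        r.quotient = dividend / divisor;
        r.remainder = dividend % divisor;
    }

    return r;
}
```

An instruction that leaves division by zero undefined, like DXIL UDiv
and URem, does not need the guard at all.
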
These correspond to the DXIL SDiv and SRem instructions, for which division by
zero is undefined. Division by zero is also undefined for DXIL UDiv and URem;
addressing those is slightly more involved.
More recent versions of the Vulkan/SPIR-V validation layers have started
to complain about our usage of "SequentiallyConsistent" in our SPIR-V
output. Specifically, VUID-StandaloneSpirv-MemorySemantics-10866 "Memory
Semantics with SequentiallyConsistent memory order must not be used in
the Vulkan API".
The SPIR-V specification says: "If the declared memory model is Vulkan,
SequentiallyConsistent must not be used." However, we're using the
GLSL450 memory model with SPIR-V 1.3, and "Vulkan" is not available
before SPIR-V 1.5.
The Vulkan specification says "Sequentially consistent atomics and
barriers are not supported and SequentiallyConsistent is treated as
AcquireRelease. SequentiallyConsistent should not be used." in the
"Shader Memory Access Ordering" section.
Those don't quite add up to the "... must not be used in the Vulkan
API" from the validation layers, but it does seem clear that
SequentiallyConsistent isn't actually supported. On the DXIL side, when
targeting SPIR-V with dxc, the generated SPIR-V uses the
"None"/"Relaxed" memory semantics. I wasn't immediately able to find a
reference for what seq_cst is supposed to mean in the context of DXIL,
but "None"/"Relaxed" does seem consistent with how the HLSL
atomic/interlocked intrinsics are expected to behave, as well as with
our behaviour for tpf shaders.
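
As a sketch only, and assuming the change amounts to dropping the
SequentiallyConsistent bit so that the ordering becomes "None"/"Relaxed",
the mapping could look like the following. The enumerator values are
those of the SPIR-V Memory Semantics bitmask; the function and enum
names are hypothetical and not taken from vkd3d:

```c
#include <stdint.h>

/* Subset of the SPIR-V Memory Semantics bitmask. */
enum memory_semantics
{
    MEMORY_SEMANTICS_NONE                    = 0x0,
    MEMORY_SEMANTICS_ACQUIRE                 = 0x2,
    MEMORY_SEMANTICS_RELEASE                 = 0x4,
    MEMORY_SEMANTICS_ACQUIRE_RELEASE         = 0x8,
    MEMORY_SEMANTICS_SEQUENTIALLY_CONSISTENT = 0x10,
    MEMORY_SEMANTICS_UNIFORM_MEMORY          = 0x40,
    MEMORY_SEMANTICS_WORKGROUP_MEMORY        = 0x100,
};

/* Hypothetical helper: clear the SequentiallyConsistent ordering bit and
 * leave the other bits, such as the storage class bits, untouched. */
static uint32_t drop_sequentially_consistent(uint32_t semantics)
{
    return semantics & ~(uint32_t)MEMORY_SEMANTICS_SEQUENTIALLY_CONSISTENT;
}
```
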