The following table shows how each intrinsic maps to d3d assembly and the
flags that appear in the tpf bytecode, in binary.
Intrinsic                             D3D assembly      Flags
GroupMemoryBarrier()                  sync_g            0010
GroupMemoryBarrierWithGroupSync()     sync_g_t          0011
DeviceMemoryBarrier()                 sync_uglobal      1000
DeviceMemoryBarrierWithGroupSync()    sync_uglobal_t    1001
AllMemoryBarrier()                    sync_uglobal_g    1010
AllMemoryBarrierWithGroupSync()       sync_uglobal_g_t  1011
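Reading the flag bits off the table, the "_t" suffix corresponds to bit 0,
"_g" to bit 1 and "_uglobal" to bit 3. A minimal C sketch of decoding these
values (the constant names are illustrative, not identifiers used by any
actual tpf parser):

    #include <stdio.h>

    /* Flag bits as implied by the table above; the names are illustrative. */
    enum
    {
        SYNC_THREADS_IN_GROUP = 0x1, /* the "_t" suffix */
        SYNC_GROUP_SHARED     = 0x2, /* the "_g" suffix */
        SYNC_UAV_GLOBAL       = 0x8, /* the "_uglobal" flag */
    };

    static void describe_sync(unsigned int flags)
    {
        printf("sync%s%s%s\n",
                flags & SYNC_UAV_GLOBAL ? "_uglobal" : "",
                flags & SYNC_GROUP_SHARED ? "_g" : "",
                flags & SYNC_THREADS_IN_GROUP ? "_t" : "");
    }

    int main(void)
    {
        describe_sync(0x3); /* prints "sync_g_t": GroupMemoryBarrierWithGroupSync() */
        describe_sync(0xb); /* prints "sync_uglobal_g_t": AllMemoryBarrierWithGroupSync() */
        return 0;
    }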
It seems that the NVIDIA driver leaves VBV bindings untouched
when they are NULL (or when the GPU buffer address is NULL), instead of
setting them to a null binding.
Unlike other cases of inconsistent behaviour between AMD
and NVIDIA, here I'm explicitly marking the NVIDIA behaviour as
broken, because the expected behaviour is spelled out clearly
(at least by the standards of the D3D12 specification).
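For context, a null vertex buffer binding can be expressed either with a
view whose GPU address is NULL or with a NULL view array; a minimal sketch
using the C interface of the D3D12 API (the helper function is made up for
illustration, and device/command-list setup is assumed to exist elsewhere):

    #define COBJMACROS
    #include <d3d12.h>

    /* Illustrative helper: unbind vertex buffer slot 0.  A conforming
     * implementation is expected to end up with a null binding in both
     * cases; NVIDIA appears to keep the previously bound VBV instead. */
    static void unbind_vb_slot0(ID3D12GraphicsCommandList *list)
    {
        D3D12_VERTEX_BUFFER_VIEW vbv = {0};

        /* A view whose GPU buffer address is NULL ... */
        vbv.BufferLocation = 0;
        vbv.SizeInBytes = 0;
        vbv.StrideInBytes = 0;
        ID3D12GraphicsCommandList_IASetVertexBuffers(list, 0, 1, &vbv);

        /* ... or a NULL view array. */
        ID3D12GraphicsCommandList_IASetVertexBuffers(list, 0, 1, NULL);
    }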
The code doesn't make sense in the first place, even though the
compiler accepts it, so it's not surprising that the behaviour
is undefined. And indeed the behaviour differs on AMD (4 is
returned), NVIDIA (QNaN is returned) and WARP (the device is removed).
The stride didn't match the structure size used in the shader.
This didn't seem to be a problem on AMD or WARP, but it was
on NVIDIA on Windows. Specifically, it seems that the buffer
is read using the shader's structure size (so most tests pass),
but bounds are checked using the buffer stride, so one test
returned zero simply because an out-of-bounds read was detected.
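As a rough sketch of the mismatch, using the C interface of the D3D12 API
(names, sizes and the element count are illustrative and not taken from the
actual test): the shader declares a structure of one size while the view is
created with a different StructureByteStride.

    #define COBJMACROS
    #include <d3d12.h>

    /* HLSL side, for reference:
     *     struct data { float4 v; };            // 16 bytes
     *     StructuredBuffer<data> b : register(t0);
     */

    /* Illustrative sketch: the view stride below does not match the
     * 16-byte structure declared in the shader.  AMD and WARP seemed
     * to tolerate this; on NVIDIA (Windows) reads appear to use the
     * shader's structure size while bounds are checked against the
     * view stride, so some reads are flagged as out of bounds and
     * return zero. */
    static void create_mismatched_srv(ID3D12Device *device, ID3D12Resource *buffer,
            D3D12_CPU_DESCRIPTOR_HANDLE handle)
    {
        D3D12_SHADER_RESOURCE_VIEW_DESC desc = {0};

        desc.Format = DXGI_FORMAT_UNKNOWN;
        desc.ViewDimension = D3D12_SRV_DIMENSION_BUFFER;
        desc.Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
        desc.Buffer.FirstElement = 0;
        desc.Buffer.NumElements = 4;
        desc.Buffer.StructureByteStride = 8; /* smaller than the shader's 16 */

        ID3D12Device_CreateShaderResourceView(device, buffer, &desc, handle);
    }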