This effectively moves "null_event_cond" from struct d3d12_fence and
"latch" from struct vkd3d_waiting_event together into a separate
structure, as well as storing the signalling function in struct
vkd3d_waiting_event instead of getting it from struct d3d12_device. I
think that largely makes sense on its own, but storing the signalling
function in struct vkd3d_waiting_event also allows us to more easily
implement d3d12_device_SetEventOnMultipleFenceCompletion() in a
subsequent commit.
Currently structured TGSM loads are encoded to something like this:
ld_structured sr12 <s:float>, sr1 <s:uint>, l(0) <s:uint>, g0[sr1 <s:uint> + 0] <s:float>
Notice how the TGSM offset, expressed by sr1, is encoded twice in
the instruction. In TPF there is no expectation of two indices
in the resource source, so let's avoid producing it for DXIL as
well. The same instruction will therefore become:
ld_structured sr12 <s:float>, sr1 <s:uint>, l(0) <s:uint>, g0 <s:float>
Rather than an array of pointers to registers. This makes it nicer
to use with registers that are synthesized on the fly, a situation
that already exists and is likely to become more common in future
commits.
The following table shows how each intrinsic maps to d3d assembly and the
flags that appear in the tpf bytecode, in binary.
GroupMemoryBarrier() sync_g 0010
GroupMemoryBarrierWithGroupSync() sync_g_t 0011
DeviceMemoryBarrier() sync_uglobal 1000
DeviceMemoryBarrierWithGroupSync() sync_uglobal_t 1001
AllMemoryBarrier() sync_uglobal_g 1010
AllMemoryBarrierWithGroupSync() sync_uglobal_g_t 1011