The following table shows how each intrinsic maps to d3d assembly and the
flags that appear in the tpf bytecode, in binary.
GroupMemoryBarrier() sync_g 0010
GroupMemoryBarrierWithGroupSync() sync_g_t 0011
DeviceMemoryBarrier() sync_uglobal 1000
DeviceMemoryBarrierWithGroupSync() sync_uglobal_t 1001
AllMemoryBarrier() sync_uglobal_g 1010
AllMemoryBarrierWithGroupSync() sync_uglobal_g_t 1011
All stream output objects need to have a stream index allocated,
whether they are used or not.
We allocate stream outputs here, before other objects are allocated,
because the stream index is needed to create the appropriate output
semantic variables during append_output_copy(), which will be called
in a lowering pass for the Append() method.
The following pixel shader currently triggers an infinite loop during
copy propagation, which is fixed by this commit:
sampler s;
Texture2D t1, t2;
float4 main() : sv_target
{
Texture2D t = t1;
t1 = t2;
t2 = t;
return t1.Sample(s, float2(0, 0)) + t2.Sample(s, float2(0, 0));
}
The infinite loop occurs because copy_propagation_transform_object_load()
replaces t1 in the resource_load(t1, ...) instruction
with t2, t1, t2, ... repeatedly.
Note that we still have to preempt the propagation to SM1 pixel shader
uniforms. Otherwise this will turn the many constant derefs that appear
from the <index-val> copy generated in lower_index_loads() into a single
non-constant deref, causing it to allocate all the registers instead of
up until the last one used.