The following pixel shader currently triggers an infinite loop during
copy propagation, which is fixed by this commit:
sampler s;
Texture2D t1, t2;
float4 main() : sv_target
{
Texture2D t = t1;
t1 = t2;
t2 = t;
return t1.Sample(s, float2(0, 0)) + t2.Sample(s, float2(0, 0));
}
The infinite loop occurs because copy_propagation_transform_object_load()
replaces t1 in the resource_load(t1, ...) instruction
with t2, t1, t2, ... repeatedly.
Note that we still have to preempt the propagation to SM1 pixel shader
uniforms. Otherwise this will turn the many constant derefs that appear
from the <index-val> copy generated in lower_index_loads() into a single
non-constant deref, causing it to allocate all the registers instead of
up until the last one used.
Basically, separate lower_casts_to_int() into the lowering of the CAST
and the lowering of the TRUNC, so that TRUNCs that are not part of a
cast are lowered as well.
We implement a transformation that propagates loads with a single
non-constant index in its deref path. Consider a load of the form
var[[a0][a1]...[i]...[an]], where ak are integral constants, and i is
an arbitrary non-constant node. If, for all j, the following holds:
var[[a0][a1]...[j]...[an]] = x[[c0*j + d0][c1*j + d1]...[cm*j + dm]],
where ck, dk are constants, then we can replace the load with
x[[c0*i + d0]...[cm*i + dm]]. This pass is implemented by
copy_propagation_replace_with_deref().