My main motivation for this is to avoid generating a lot of useless
log lines from other executors when I'm interested in just one of
them, but I can imagine this also somewhat improving efficiency.
This test is currently miscompiling on SM4 because
copy_propagation_invalidate_variable_from_deref_recurse() is not always
invalidating the right components.
Valid stream output objects must be single-element containing a
PointStream/LineStream/TriangleStream object.
Moreover, stream output objects cannot be declared globally.
Basically, separate lower_casts_to_int() into the lowering of the CAST
and the lowering of the TRUNC, so that TRUNCs that are not part of a
cast are lowered as well.
We implement a transformation that propagates loads with a single
non-constant index in their deref path. Consider a load of the form
var[[a0][a1]...[i]...[an]], where ak are integral constants, and i is
an arbitrary non-constant node. If, for all j, the following holds:
var[[a0][a1]...[j]...[an]] = x[[c0*j + d0][c1*j + d1]...[cm*j + dm]],
where ck, dk are constants, then we can replace the load with
x[[c0*i + d0]...[cm*i + dm]]. This pass is implemented by
copy_propagation_replace_with_deref().
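
As a rough illustration of the condition above, here is a minimal
standalone sketch in C. It is not the code from
copy_propagation_replace_with_deref(); the names fit_affine_path(),
affine_index_eval(), struct affine_index and MAX_PATH are hypothetical
and exist only for this example. It assumes the source path recorded for
each constant j is given as a row of integers, fits ck and dk from the
first two rows, and then checks that every remaining j agrees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on the length of a replacement path. */
#define MAX_PATH 4

/* One component of the replacement path, evaluated as c * j + d. */
struct affine_index
{
    int c, d;
};

/* src[j][k] is the k-th index of the path in x that var[...][j][...] was
 * copied from. Returns true and fills out[0..path_len) if every component
 * is an affine function of j, which is the condition for propagation. */
static bool fit_affine_path(const int (*src)[MAX_PATH], size_t j_count,
        size_t path_len, struct affine_index *out)
{
    size_t j, k;

    assert(src && out);

    if (!j_count || path_len > MAX_PATH)
        return false;

    for (k = 0; k < path_len; ++k)
    {
        /* Fit d from j = 0 and c from the step between j = 0 and j = 1. */
        out[k].d = src[0][k];
        out[k].c = j_count > 1 ? src[1][k] - src[0][k] : 0;

        /* The fit must hold for all j, not just the first two. */
        for (j = 0; j < j_count; ++j)
        {
            if (src[j][k] != out[k].c * (int)j + out[k].d)
                return false;
        }
    }

    return true;
}

/* Index to use in the replacement load for a given value of i. */
static int affine_index_eval(const struct affine_index *a, int i)
{
    return a->c * i + a->d;
}
```

For example, if var[0], var[1] and var[2] were copied from x[1][0],
x[1][2] and x[1][4], the fit gives (c0, d0) = (0, 1) and
(c1, d1) = (2, 0), so a load of var[i] could be rewritten as
x[1][2*i]. If the rows were x[0], x[1], x[3] instead, no single (c, d)
pair fits and the load would be left alone.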