Note that we still have to preempt the propagation to SM1 pixel shader
uniforms. Otherwise this will turn the many constant derefs that appear
from the <index-val> copy generated in lower_index_loads() into a single
non-constant deref, causing it to allocate all the registers instead of
up until the last one used.
Here, a vertex shader version of the previous test by Shaun is
introduced. Note that in this case the uniform allocates all 4 registers
instead of 3 because it is indirectly addressed.
Note that, for indexes with a decimal part, the behavior is different
depending on whether it is a temp load or a direct uniform load (which
can only happen on vertex shaders). The former rounds to the
closest-to-zero, while the latter rounds to the nearest even.
Similarly to the modulus operator, d3dbc results with constant folding
are different from results when constant folding cannot be applied, and
different from tpf results.
Note that in d3dbc target profiles it gives different results when this
operation is constant folded compared to when it is not.
This suggests that whatever pass lowers the modulus operation to d3dbc
operations doesn't do it before constant folding.
Also note that when constant folded, d3dbc results differ from tpf
results for negative operands, because of the loss of precision that
happens when NEG is constant folded.
So the same integer modulus expression can have 3 different results
depending on the context.
This test is currently miscompiling on SM4 because
copy_propagation_invalidate_variable_from_deref_recurse() is not always
invalidating the right components.
Basically, separate lower_casts_to_int() into the lowering of the CAST
and the lowering of the TRUNC, so that TRUNCs that are not part of a
cast are lowered as well.
While fxc allows full expressions inside the angle brackets, we don't parse that
yet as it'd be quite a mess to properly do so with yacc, and I'm not aware of any
game doing so in their shaders.