This can save a significant amount of temp registers because it allows to
avoid referencing the temp (and having to store it) when not needed.
For instance, this patch lowers the number of required temps for the
following ps_2_0 shader from 24 to 19:
int i;
float3x3 mats[4];
float4 main() : sv_target
{
return mul(mats[i], float3(1, 2, 3)).xyzz;
}
Also, it is needed for SM1 vertex shader relative addressing since
non-constant loads are required to be directly on the uniform ('c'
registers) instead of the temp, and non-constant loads cannot be
transformed by copy propagation.
The semantic variables from a patch parameter are split as usual, with
the difference being that the semantic variable being added is a patch
variable itself, with the type being the split variable type, and its
number of control points being equal to the original patch variable's
number of control points. It is then stored in the original patch
variable as follows:
for (i = 0; i < n; ++i)
patch[i][f] := <inputpatch-sem-var>[i]
where n is the number of control points of "patch", and f is the field
index in patch corresponding to "<inputpatch-sem-var>".
We use special prefixes, "inputpatch-" or "outputpatch-", when adding
the semantic patch variables, in order to distinguish them from
non-patch semantic variables of the same name.
In anticipation of the need for is_patch_constant_func in
sm4_generate_vsir_reg_from_deref(), in order to generate vsir
registers from InputPatch/OutputPatch dereferences.
While so far it has been posible to do this at parse time, this must
happen after knowing if the complex cast is on the lhs or not.
The modified tests show that before this patch we are currently
miscompiling when this happens, because a complex lhs cast is transformed
into a load, and add_assigment() just stores to the generated "cast"
temp.
Vertex shaders do not have CMP, so we use SLT and MAD.
For example, this vertex shader:
uniform float4 f;
void main(inout float4 pos : position, out float4 t1 : TEXCOORD1)
{
t1 = (int4)f;
}
results in:
vs_2_0
dcl_position v0
slt r0, c0, -c0
frc r1, c0
add r2, -r1, c0
slt r1, -r1, r1
mad oT1, r0, r1, r2
mov oPos, v0
while we have the lower_cmp() pass, each time it is applied many
instructions are generated, so this patch introduces a specialized
version of the cast-to-int lowering for efficiency.