This is because lower_nonconstant_array_loads() can potentially turn
nonconstant loads into constant loads, allowing copy-prop to turn these
loads into previous instructions, which might help other passes as well.
This patch lowers the number of required temps for the following ps_2_0
shader from 19 to 16:
int i;
float3x3 mats[4];
float4 main() : sv_target
{
return mul(mats[i], float3(1, 2, 3)).xyzz;
}