Reinterpret min16float, min10float, min16int, min12int, and min16uint
as their regular counterparts: float, float, int, int, uint,
respectively.
A proper implementation would require adding minimum precision
indicators to all the dxbc-tpf instructions that use these types.
Consider the output of fxc 10.1 with the following shader:
uniform int i;
float4 main() : sv_target
{
min16float4 a = {0, 1, 2, i};
min16int2 b = {4, i};
min10float3 c = {6.4, 7, i};
min12int d = 9.4;
min16uint4x2 e = {14.4, 15, 16, 17, 18, 19, 20, i};
return mul(e, b) + a + c.xyzx + d;
}
However, if the graphics driver doesn't have minimum precision support,
it ignores the minimum precision indicators and runs at 32-bit
precision, which is equivalent as working with regular types.
If the offset of a gather resource load can be represented as an
aoffimmi (vectori of ints from -8 to 7), use one.
This is of particular importance for 4.0 profiles, where this is the only
valid way of representing offsets for this operation.
If a hlsl_ir_load loads a variable whose components are stored from different
instructions, copy propagation doesn't replace it.
But if all these instructions are constants (which currently is the case
for value constructors), the load could be replaced with a constant value.
Which is expected in some other instructions, e.g. texel_offsets when
using aoffimmi modifiers.
For instance, this shader:
```
sampler s;
Texture2D t;
float4 main() : sv_target
{
return t.Gather(s, float2(0.6, 0.6), int2(0, 0));
}
```
results in the following IR before applying the patch:
```
float | 6.00000024e-01
float | 6.00000024e-01
uint | 0
| = (<constructor-2>[@4].x @2)
uint | 1
| = (<constructor-2>[@6].x @3)
float2 | <constructor-2>
int | 0
int | 0
uint | 0
| = (<constructor-5>[@11].x @9)
uint | 1
| = (<constructor-5>[@13].x @10)
int2 | <constructor-5>
float4 | gather_red(resource = t, sampler = s, coords = @8, offset = @15)
| return
| = (<output-sv_target0> @16)
```
and this IR afterwards:
```
float2 | {6.00000024e-01 6.00000024e-01 }
int2 | {0 0 }
float4 | gather_red(resource = t, sampler = s, coords = @2, offset = @3)
| return
| = (<output-sv_target0> @4)
```
Rename it to copy_propagation_replace_with_single_instr() accordingly.
The idea is to introduce a constant vector replacement pass which will do the
same thing.
copy_propagation_compute_replacement() is not doing very much for us, and
conceptually is a bit of an odd fit anyway, since it's meant to deal with
multi-component types.
validate_static_object_references() validates that uninitialized static
objects are not referenced in the shader.
In case a static variable contains both numeric and object types, the
"Static variables cannot have both numeric and resource components."
error should preempt uninitialized numeric values to reach further
compilation steps.
Note that in the future we should call
validate_static_object_references() after DCE and pruning branches,
because shaders such as these compile (at least in more modern versions
of the native compiler):
Branch pruning:
```
static RWTexture2D<float> tex;
float4 main() : sv_target
{
if (0)
{
tex[int2(0, 0)] = 2;
}
return 0;
}
```
DCE:
```
static Texture2D tex;
uniform uint i;
float4 main() : sv_target
{
float4 unused = tex.Load(int3(0, 1, 2));
return 0;
}
```
These are "todo" tests in hlsl-static-initializer.shader_test
that depend on this.
We are currently not initializing static values to zero by default.
Consider the following shader:
```hlsl
static float4 va;
float4 main() : sv_target
{
return va;
}
```
we get the following output:
```
ps_5_0
dcl_output o0.xyzw
dcl_temps 2
mov r0.xyzw, r1.xyzw
mov o0.xyzw, r0.xyzw
ret
```
where r1.xyzw is not initialized.
This patch solves this by assigning the static variable the value of an
uint 0, and thus, relying on complex broadcasts.
This seems to be the behaviour of the 9.29.952.3111 version of the native
compiler, since it retrieves the following error on a shader that lacks
an initializer on a data type with object components:
```
error X3017: cannot convert from 'uint' to 'struct <unnamed>'
```
We have a different system of generating intrinsics, which makes it easier to
deal with "polymorphic" arithmetic functions.
Defining and storing intrinsics as hlsl_ir_function_decls would also require
more space in memory (and more optimization passes to get rid of the parameter
variables), and doesn't really save us any effort in terms of source code.