Co-authored-by: Francisco Casas <fcasas@codeweavers.com>
Co-authored-by: Zebediah Figura <zfigura@codeweavers.com>
Because copy_propagation_transform_object_load() replaces a deref
instead of an instruction, it is currently prone to two problems:
1- It can replace a deref with the same deref, returning true every
time and getting the compilation stuck in an endless loop of
copy-propagation iterations.
2- When performed multiple times in the same deref, the second time it
can replace the deref with a deref from a temp that is only valid in
another point of the program execution, resulting in an incorrect value.
This patch preempts this by avoiding replacing derefs when the new deref
doesn't point to a uniform variable. Because, uniform variables cannot
be written to.
Reinterpret min16float, min10float, min16int, min12int, and min16uint
as their regular counterparts: float, float, int, int, uint,
respectively.
A proper implementation would require adding minimum precision
indicators to all the dxbc-tpf instructions that use these types.
Consider the output of fxc 10.1 with the following shader:
uniform int i;
float4 main() : sv_target
{
min16float4 a = {0, 1, 2, i};
min16int2 b = {4, i};
min10float3 c = {6.4, 7, i};
min12int d = 9.4;
min16uint4x2 e = {14.4, 15, 16, 17, 18, 19, 20, i};
return mul(e, b) + a + c.xyzx + d;
}
However, if the graphics driver doesn't have minimum precision support,
it ignores the minimum precision indicators and runs at 32-bit
precision, which is equivalent as working with regular types.
A pointer to the containing descriptor heap can be derived from this
information.
PE build of vkd3d uses Windows critical sections for synchronisation,
and these slow down on the very high lock/unlock rate during multithreaded
descriptor copying in Shadow of the Tomb Raider. This patch speeds up the
demo by about 8%. By comparison, using SRW locks in the allocators and
locking them for read only where applicable is about 4% faster.
If the offset of a gather resource load can be represented as an
aoffimmi (vectori of ints from -8 to 7), use one.
This is of particular importance for 4.0 profiles, where this is the only
valid way of representing offsets for this operation.
If a hlsl_ir_load loads a variable whose components are stored from different
instructions, copy propagation doesn't replace it.
But if all these instructions are constants (which currently is the case
for value constructors), the load could be replaced with a constant value.
Which is expected in some other instructions, e.g. texel_offsets when
using aoffimmi modifiers.
For instance, this shader:
```
sampler s;
Texture2D t;
float4 main() : sv_target
{
return t.Gather(s, float2(0.6, 0.6), int2(0, 0));
}
```
results in the following IR before applying the patch:
```
float | 6.00000024e-01
float | 6.00000024e-01
uint | 0
| = (<constructor-2>[@4].x @2)
uint | 1
| = (<constructor-2>[@6].x @3)
float2 | <constructor-2>
int | 0
int | 0
uint | 0
| = (<constructor-5>[@11].x @9)
uint | 1
| = (<constructor-5>[@13].x @10)
int2 | <constructor-5>
float4 | gather_red(resource = t, sampler = s, coords = @8, offset = @15)
| return
| = (<output-sv_target0> @16)
```
and this IR afterwards:
```
float2 | {6.00000024e-01 6.00000024e-01 }
int2 | {0 0 }
float4 | gather_red(resource = t, sampler = s, coords = @2, offset = @3)
| return
| = (<output-sv_target0> @4)
```
Rename it to copy_propagation_replace_with_single_instr() accordingly.
The idea is to introduce a constant vector replacement pass which will do the
same thing.
copy_propagation_compute_replacement() is not doing very much for us, and
conceptually is a bit of an odd fit anyway, since it's meant to deal with
multi-component types.
On cross builds, shaders are compiled with d3dcompiler_47.dll and
run with d3dN.dll. On non-cross builds, shaders are compiled with
vkd3d-shader and run with d3dN.dll (on Windows) or Vulkan and vkd3d
(on Linux).