The new fixmes can be triggered in presence of object components within
structs (for SM5).
In shaders such as this one:
struct apple
{
Texture2D tex : TEX;
float4 color : COLOR;
};
float4 main(struct apple input) : sv_target
{
return input.tex.Load(int3(1, 2, 3));
}
Or this one:
struct
{
Texture2D tex;
float4 color;
} s;
float4 main() : sv_target
{
return s.tex.Load(int3(1, 2, 3));
}
Variables that contain more than one object (arrays or structs) require
the allocation of contiguous registers in the respective object
register spaces.
Otherwise, in the added test, we get:
vkd3d-compiler: vkd3d-shader/hlsl.c:452: hlsl_init_deref_from_index_chain: Assertion `chain' failed.
because on the path that triggers the following error:
E5002: Wrong type for argument 1 of 'tex3D': expected 'sampler' or 'sampler3D', but got 'sampler2D'.
a NULL params.resource is passed to hlsl_new_resource_load() and
then to hlsl_init_deref_from_index_chain().
The use of the hlsl_semantic.reported_duplicated_output_next_index field
allows reporting multiple overlapping indexes, such as in the following
vertex shader:
void main(out float1x3 x : OVERLAP0, out float1x3 y : OVERLAP1)
{
x = float3(1.0, 2.0, 3.2);
y = float3(5.0, 6.0, 5.0);
}
apple.hlsl:1:41: E5013: Output semantic "OVERLAP1" is used multiple times.
apple.hlsl:1:13: First use of "OVERLAP1" is here.
apple.hlsl:1:41: E5013: Output semantic "OVERLAP2" is used multiple times.
apple.hlsl:1:13: First use of "OVERLAP2" is here.
While at the same time avoiding reporting overlaps more than once for
large arrays:
struct apple
{
float2 p : sv_position;
};
void main(out apple aps[4])
{
}
apple.hlsl:3:8: E5013: Output semantic "sv_position0" is used multiple times.
apple.hlsl:3:8: First use of "sv_position0" is here.
Thanks to Giovanni for the second set of tests! Note that the
tolerance for the final pixel was set much higher than the others;
this test seems to be an issue for some devices (in my case, a 7900
XTX running RADV).
Co-authored-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Ethan Lee <flibitijibibo@gmail.com>
From this point on, it is no longer true that only hlsl_ir_loads can
return objects, because an object can also come from chain of
hlsl_ir_indexes that ends in an hlsl_ir_load.
The lower_index_loads pass takes care of lowering all hlsl_ir_indexes
into hlsl_ir_loads.
For this reason, hlsl_resource_load_params now expects both the resource
as the sampler to be just an hlsl_ir_node pointer instead of a pointer
to a more specific hlsl_ir_load.
Some drivers (AMD Radeon RX 6700 XT, with radeonsi from Mesa 22.2.0-rc3) emit
less than one invocation per pixel, presumably because they detect that the
shader control flow is uniform for all pixels. Having the control flow depend on
SV_Position avoids this test failure.
Cf. 34bd0dd0704c613abef8a9aa3ba2a2507ed02843 in wine.
The expected use case where a heap is freed before its contained
resources is not reasonably testable, so the ability to place a new
resource is tested instead.
But still throw hlsl_fixme() when there is more than one.
Prioritizing among multiple compatible function overloads in the same way
as the native compiler would require systematic testing.
This was originally left alone in order to allow functions without early return
to succeed, since in that case we would already emit the correct bytecode
despite not handling the HLSL_IR_JUMP_RETURN instruction.
Now that we lower return statements, however, any unhandled instructions are
either definitely going to result in invalid bytecode, or rare enough that it's
not worth returning success anyway.
Vectors cannot be used as array indexes, however, single-component
swizzles (such as vec.x) can be used.
This suggests that single-component swizzles should actually be
scalars and not vectors of dimx = 1.
It is worth noting that the use of single-component swizzles on scalars
should still be allowed.
Co-authored-by: Francisco Casas <fcasas@codeweavers.com>
Co-authored-by: Zebediah Figura <zfigura@codeweavers.com>
Because copy_propagation_transform_object_load() replaces a deref
instead of an instruction, it is currently prone to two problems:
1- It can replace a deref with the same deref, returning true every
time and getting the compilation stuck in an endless loop of
copy-propagation iterations.
2- When performed multiple times in the same deref, the second time it
can replace the deref with a deref from a temp that is only valid in
another point of the program execution, resulting in an incorrect value.
This patch preempts this by avoiding replacing derefs when the new deref
doesn't point to a uniform variable. Because, uniform variables cannot
be written to.
Reinterpret min16float, min10float, min16int, min12int, and min16uint
as their regular counterparts: float, float, int, int, uint,
respectively.
A proper implementation would require adding minimum precision
indicators to all the dxbc-tpf instructions that use these types.
Consider the output of fxc 10.1 with the following shader:
uniform int i;
float4 main() : sv_target
{
min16float4 a = {0, 1, 2, i};
min16int2 b = {4, i};
min10float3 c = {6.4, 7, i};
min12int d = 9.4;
min16uint4x2 e = {14.4, 15, 16, 17, 18, 19, 20, i};
return mul(e, b) + a + c.xyzx + d;
}
However, if the graphics driver doesn't have minimum precision support,
it ignores the minimum precision indicators and runs at 32-bit
precision, which is equivalent as working with regular types.
If a hlsl_ir_load loads a variable whose components are stored from different
instructions, copy propagation doesn't replace it.
But if all these instructions are constants (which currently is the case
for value constructors), the load could be replaced with a constant value.
Which is expected in some other instructions, e.g. texel_offsets when
using aoffimmi modifiers.
For instance, this shader:
```
sampler s;
Texture2D t;
float4 main() : sv_target
{
return t.Gather(s, float2(0.6, 0.6), int2(0, 0));
}
```
results in the following IR before applying the patch:
```
float | 6.00000024e-01
float | 6.00000024e-01
uint | 0
| = (<constructor-2>[@4].x @2)
uint | 1
| = (<constructor-2>[@6].x @3)
float2 | <constructor-2>
int | 0
int | 0
uint | 0
| = (<constructor-5>[@11].x @9)
uint | 1
| = (<constructor-5>[@13].x @10)
int2 | <constructor-5>
float4 | gather_red(resource = t, sampler = s, coords = @8, offset = @15)
| return
| = (<output-sv_target0> @16)
```
and this IR afterwards:
```
float2 | {6.00000024e-01 6.00000024e-01 }
int2 | {0 0 }
float4 | gather_red(resource = t, sampler = s, coords = @2, offset = @3)
| return
| = (<output-sv_target0> @4)
```
On cross builds, shaders are compiled with d3dcompiler_47.dll and
run with d3dN.dll. On non-cross builds, shaders are compiled with
vkd3d-shader and run with d3dN.dll (on Windows) or Vulkan and vkd3d
(on Linux).