Normalise the incoming vkd3d_shader_instruction IR to the shader model 6
pattern. This allows generation of a single patch constant function in
SPIR-V.
Since the introduction of instruction arrays, the parser location no
longer matches the location of the current instruction. Ultimately we'll
likely want to add some kind of explicit location information to struct
vkd3d_shader_instruction_array, because we might do transformations that
change the order of the original instructions.
VKD3D_SM4_OP_DCL_RESOURCE currently has 1 for "dst_count", but NULL for
"dst". This is largely harmless because we never attempt to access the
destination register of VKD3DSIH_DCL instructions, but nevertheless not
quite proper.
Without this patch, dp3 and dp4 map src swizzles to the dst writemask,
which is not correct.
Before b84f560bdf, these ops worked
despite this, because the dst register had, incorrectly, the full
writemask.
To solve this problem, write_sm1_binary_op_dot() is introduced,
similarly to write_sm4_binary_op_dot().
But still throw hlsl_fixme() when there is more than one.
Prioritizing among multiple compatible function overloads in the same way
as the native compiler would require systematic testing.
Otherwise we may create nodes of different dimensions than the ones we
are replacing.
"count" is the number of components of the source deref (without
considering the swizzle), while "instr_component_count" is the actual
number of components of the instruction to be replaced.
This was originally left alone in order to allow functions without early return
to succeed, since in that case we would already emit the correct bytecode
despite not handling the HLSL_IR_JUMP_RETURN instruction.
Now that we lower return statements, however, any unhandled instructions are
either definitely going to result in invalid bytecode, or rare enough that it's
not worth returning success anyway.
SM1 dp2add doesn't map src swizzles to the dst writemask, also it
expects the last argument to have a replicate swizzle.
Before this patch we were writing the operation as:
```
dp2add r0.x, r1.x, r0.x, r2.x
```
and now it is:
```
dp2add r0.x, r1.xyxx, r0.xyxx, r2.x
```
dp2add now has its own function, write_sm1_dp2add(), since it seems to
be the only instruction with this structure.
Ideally we would be using the default swizzles for the first two src
arguments:
```
dp2add r0.x, r1, r0, r2.x
```
since, according to native's documentation, these are supported for all
sm < 4.
But this change -- along with following the convention of repeating the
last component of the swizzle when fewer than 4 components are to be
specified -- would require more global changes, probably in
hlsl_swizzle_from_writemask() and hlsl_map_swizzle().
Because of the change introduced in
f21693b2 vkd3d-shader/hlsl: Use reg_size as component count when allocating a single register.
SM1 scalars and vectors were not longer getting the correct writemask
when they are allocated.
This happened because they have to reserve the whole register even if
they only use some of its components, so their reg_size may differ from
the number of components.
This commit fixes that.
Prevent them from being ever looked up.
Our naming scheme for synthetic variables already effectively prevents this, but
this is better for clarity. We also will need to be able to move some named
variables into a dummy scope to account for complexities around function
definition and declarations.
Co-authored-by: Francisco Casas <fcasas@codeweavers.com>
Co-authored-by: Zebediah Figura <zfigura@codeweavers.com>
Because copy_propagation_transform_object_load() replaces a deref
instead of an instruction, it is currently prone to two problems:
1- It can replace a deref with the same deref, returning true every
time and getting the compilation stuck in an endless loop of
copy-propagation iterations.
2- When performed multiple times in the same deref, the second time it
can replace the deref with a deref from a temp that is only valid in
another point of the program execution, resulting in an incorrect value.
This patch preempts this by avoiding replacing derefs when the new deref
doesn't point to a uniform variable. Because, uniform variables cannot
be written to.
Reinterpret min16float, min10float, min16int, min12int, and min16uint
as their regular counterparts: float, float, int, int, uint,
respectively.
A proper implementation would require adding minimum precision
indicators to all the dxbc-tpf instructions that use these types.
Consider the output of fxc 10.1 with the following shader:
uniform int i;
float4 main() : sv_target
{
min16float4 a = {0, 1, 2, i};
min16int2 b = {4, i};
min10float3 c = {6.4, 7, i};
min12int d = 9.4;
min16uint4x2 e = {14.4, 15, 16, 17, 18, 19, 20, i};
return mul(e, b) + a + c.xyzx + d;
}
However, if the graphics driver doesn't have minimum precision support,
it ignores the minimum precision indicators and runs at 32-bit
precision, which is equivalent as working with regular types.
If the offset of a gather resource load can be represented as an
aoffimmi (vectori of ints from -8 to 7), use one.
This is of particular importance for 4.0 profiles, where this is the only
valid way of representing offsets for this operation.
If a hlsl_ir_load loads a variable whose components are stored from different
instructions, copy propagation doesn't replace it.
But if all these instructions are constants (which currently is the case
for value constructors), the load could be replaced with a constant value.
Which is expected in some other instructions, e.g. texel_offsets when
using aoffimmi modifiers.
For instance, this shader:
```
sampler s;
Texture2D t;
float4 main() : sv_target
{
return t.Gather(s, float2(0.6, 0.6), int2(0, 0));
}
```
results in the following IR before applying the patch:
```
float | 6.00000024e-01
float | 6.00000024e-01
uint | 0
| = (<constructor-2>[@4].x @2)
uint | 1
| = (<constructor-2>[@6].x @3)
float2 | <constructor-2>
int | 0
int | 0
uint | 0
| = (<constructor-5>[@11].x @9)
uint | 1
| = (<constructor-5>[@13].x @10)
int2 | <constructor-5>
float4 | gather_red(resource = t, sampler = s, coords = @8, offset = @15)
| return
| = (<output-sv_target0> @16)
```
and this IR afterwards:
```
float2 | {6.00000024e-01 6.00000024e-01 }
int2 | {0 0 }
float4 | gather_red(resource = t, sampler = s, coords = @2, offset = @3)
| return
| = (<output-sv_target0> @4)
```
Rename it to copy_propagation_replace_with_single_instr() accordingly.
The idea is to introduce a constant vector replacement pass which will do the
same thing.
copy_propagation_compute_replacement() is not doing very much for us, and
conceptually is a bit of an odd fit anyway, since it's meant to deal with
multi-component types.
validate_static_object_references() validates that uninitialized static
objects are not referenced in the shader.
In case a static variable contains both numeric and object types, the
"Static variables cannot have both numeric and resource components."
error should preempt uninitialized numeric values to reach further
compilation steps.
Note that in the future we should call
validate_static_object_references() after DCE and pruning branches,
because shaders such as these compile (at least in more modern versions
of the native compiler):
Branch pruning:
```
static RWTexture2D<float> tex;
float4 main() : sv_target
{
if (0)
{
tex[int2(0, 0)] = 2;
}
return 0;
}
```
DCE:
```
static Texture2D tex;
uniform uint i;
float4 main() : sv_target
{
float4 unused = tex.Load(int3(0, 1, 2));
return 0;
}
```
These are "todo" tests in hlsl-static-initializer.shader_test
that depend on this.
We are currently not initializing static values to zero by default.
Consider the following shader:
```hlsl
static float4 va;
float4 main() : sv_target
{
return va;
}
```
we get the following output:
```
ps_5_0
dcl_output o0.xyzw
dcl_temps 2
mov r0.xyzw, r1.xyzw
mov o0.xyzw, r0.xyzw
ret
```
where r1.xyzw is not initialized.
This patch solves this by assigning the static variable the value of an
uint 0, and thus, relying on complex broadcasts.
This seems to be the behaviour of the 9.29.952.3111 version of the native
compiler, since it retrieves the following error on a shader that lacks
an initializer on a data type with object components:
```
error X3017: cannot convert from 'uint' to 'struct <unnamed>'
```
We have a different system of generating intrinsics, which makes it easier to
deal with "polymorphic" arithmetic functions.
Defining and storing intrinsics as hlsl_ir_function_decls would also require
more space in memory (and more optimization passes to get rid of the parameter
variables), and doesn't really save us any effort in terms of source code.
Using add_unary_arithmetic_expr() instead of hlsl_new_unary_expr()
allows the intrinsic to work with matrices.
Otherwise we get:
E5017: Aborting due to not yet implemented feature: Copying from unsupported node type.
because an HLSL_IR_EXPR reaches split_matrix_copies().
Atomic ops on images with Unknown type will cause SPIR-V validation failure,
and assertion failure in Mesa debug builds. D3D12 allows atomics on typed
buffers, and this requires a distinction to be made between UAV reads and
atomic ops.
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=53874
Unlike compatible_data_types() and implicit_compatible_data_types(),
this function is intended to be symmetrical. So it makes sense to
preserve the names "t1" and "t2" for the arguments.
Otherwise, for instance, the added test results in:
debug_hlsl_writemask: Assertion `!(writemask & ~VKD3DSP_WRITEMASK_ALL)' failed.
Which happens in allocate_variable_temp_register() when the variable's
type reg_size is <= 4 but its component count is larger, which may
happen if it contains objects.
We would like to generate SPIR-V for input formats other than DXBC.
The "vkd3d_" prefix is dropped, partly to make names shorter, and partly to help
clarify what is an internal function.
I prefer avoiding the vkd3d_* prefix on all internal functions, for these
reasons. However, I'm open to restoring it.
The function has far too many arguments, including multiple different arguments
with the same type. Use a structure for clarity and to avoid errors.
Merge hlsl_new_sample_lod() into hlsl_new_resource_load() accordingly.
This should silence warnings about some branches non returning any value
without requiring additional "return 0" statement or similar.
Also, in theory this might enable to compiler to optimize the program
a little bit more, though that's unlikely to have any measurable effect.
Also, TextureCube and TextureCubeArray don't support the offset
argument, so this check is updated here too.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Also, TextureCube and TextureCubeArray don't support the offset
argument, so this check is updated.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
HLSL_ARRAY_ELEMENTS_COUNT_IMPLICIT (zero) is used as a temporal value
for elements_count for implicit size arrays.
This value is replaced by the correct one after parsing the initializer.
In case the implicit array is not initialized correctly, hlsl_error()
is called but the array size is kept at 0. So the rest of the code
must handle these cases.
In shader model 5.1, unlike in 5.0, declaring a multi-dimensional
object-type array with the last dimension implicit results in
an error. This happens even in presence of an initializer.
So, both gen_struct_fields() and declare_vars() first check if the
shader model is 5.1, the array elements are objects, and if there is
at least one implicit array size to handle the whole type as an
unbounded resource array.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
It is responsibility of the shader's programmer to ensure that
object references can be solved statically.
Resource arrays for ps_5_1 and vs_5_1 are an exception which is not
properly handled yet. They probably deserve a different object type.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Otherwise we get false in implicit_compatible_data_types() when passing
types that are equal but not convertible according to
convertible_data_type(); e.g. getting:
"Can't implicitly convert from Texture2D<float4> to Texture2D<float4>."
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Don't assume that enums and uint32_t parameters are identical. Clang
16 changes the diagonstic for incompatible function pointer types
from a warning into an error by default.
This fixes the following error, when built (for aarch64, but probably
also for other architectures) in MSVC mode:
../src/libs/vkd3d/libs/vkd3d-shader/spirv.c:1083:13: error: incompatible function pointer types passing 'uint32_t (struct vkd3d_spirv_builder *, uint32_t, SpvDim, uint32_t, uint32_t, uint32_t, uint32_t, SpvImageFormat)' (aka 'unsigned int (struct vkd3d_spirv_builder *, unsigned int, enum SpvDim_, unsigned int, unsigned int, unsigned int, unsigned int, enum SpvImageFormat_)') to parameter of type 'vkd3d_spirv_build7_pfn' (aka 'unsigned int (*)(struct vkd3d_spirv_builder *, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int)') [-Wincompatible-function-pointer-types]
vkd3d_spirv_build_op_type_image);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/libs/vkd3d/libs/vkd3d-shader/spirv.c:612:68: note: passing argument to parameter 'build_pfn' here
SpvOp op, const uint32_t *operands, vkd3d_spirv_build7_pfn build_pfn)
^
Signed-off-by: Martin Storsjö <martin@martin.st>
hlsl_new_store() and hlsl_new_load() are deleted, so now there are no more
direct ways to create derefs with offsets in hlsl.c and hlsl.h.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
This can be done now, to ensure that register offsets are no longer used
in hlsl.c and hlsl.h.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
At this point, the parse code is free of offsets; it only uses index
paths.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
The transform_deref_paths_into_offsets pass turns these index paths back
into register offsets.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
At this point add_load() is split into add_load_component() and
add_load_index(); register offsets are hidden for these functions.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Specifying R32 for UAVs created with a vector format, e.g. R32G32B32A32_FLOAT,
results in only the red being loaded/stored, potentially causing images to
contain only the red component.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>