We are using the hlsl_ir_var.is_uniform flag to indicate when an object
is a uniform copy created from a variable with the HLSL_STORAGE_UNIFORM
modifier.
We should be checking for this flag instead of HLSL_STORAGE_UNIFORM,
which is also set to 1 for the original variables; there should be no
reason to use HLSL_STORAGE_UNIFORM instead of "is_uniform" after the
uniform copies and combined/separated samplers are created.
After lowering the deref path to a single offset node, there was no way
of knowing the type of the referenced part of the variable. This little
modification avoids having to pass the data type everywhere, and it is
required to support instructions that reference object components
within struct types.
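For instance, a shader along these lines (an illustrative sketch; the
struct and variable names are made up, not taken from the test suite)
references an object component within a struct type, so the deref that
reaches "a.tex" needs to know the type of the referenced part of the
variable:
```
struct apple
{
    Texture2D tex;
    float4 color;
};

apple a;
sampler sam;

float4 main() : sv_target
{
    /* the deref for "a.tex" refers to an object component of a struct */
    return a.tex.Sample(sam, float2(0.5, 0.5)) + a.color;
}
```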
Since deref->data_type allows us to retrieve the type of the deref,
deref->offset_regset is no longer necessary.
lower_narrowing_casts() currently creates a new cast by calling
hlsl_new_cast(). This cast may be redundant, but it is not folded, which
makes SM1 emit an unnecessary fixme in some shaders:
Aborting due to not yet implemented feature: SM1 "cast" expression.
Other passes that call hlsl_new_cast() are lower_int_division() and
lower_int_modulus(), so the new fold_redundant_casts() pass is called
after these as well.
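One shader shape that exercises a narrowing cast (an illustrative
example, not necessarily one of the shaders that emitted the fixme) is:
```
float4 color;

float4 main() : sv_target
{
    float2 narrowed = color;   /* implicit float4 -> float2 narrowing cast */
    return narrowed.xyxy;
}
```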
Non-constant vector indexing is not solved with relative addressing
in the register indexes, because relative addressing cannot operate at
the level of register components.
Mathematical operations must be used instead.
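For example (an illustrative shader, not from the test suite), a
component index that is not known at compile time has to be turned into
arithmetic on the vector's components:
```
uniform float4 v;
uniform uint i;

float4 main() : sv_target
{
    /* "v[i]" cannot use relative addressing on register components */
    return float4(v[i], 0.0, 0.0, 0.0);
}
```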
Variables that contain more than one object (arrays or structs) require
the allocation of contiguous registers in the respective object
register spaces.
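As a sketch of the situation (hypothetical shader), the elements of a
texture array are expected to end up in contiguous slots of the 't'
register space:
```
Texture2D tex[3];
sampler sam;

float4 main() : sv_target
{
    /* tex[0]..tex[2] are allocated to contiguous t registers */
    return tex[0].Sample(sam, float2(0.0, 0.0))
            + tex[2].Sample(sam, float2(0.5, 0.5));
}
```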
This patch makes index expressions on resources hlsl_ir_index nodes
instead of hlsl_ir_resource_load nodes, because it is not known if they
will be used later as the lhs of an hlsl_ir_resource_store.
For now, the only benefit is consistency.
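To illustrate (hypothetical shader), the same kind of index expression
can appear both as the lhs of a resource store and inside a resource
load, and at parse time we cannot tell yet which one it will be:
```
RWTexture2D<float4> u;

float4 main() : sv_target
{
    u[uint2(0, 0)] = float4(1.0, 2.0, 3.0, 4.0);   /* index used as the lhs of a resource store */
    return u[uint2(1, 1)];                         /* index used in a resource load */
}
```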
Since in SM1 all vector types use 4 register components, and since SM1
doesn't consider vectors of different dimx incompatible, it is necessary
to ensure that the semantic var is created with dimx=4, and to add a
cast node.
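For instance (an illustrative SM1 vertex shader), a float2 output
semantic still occupies a full 4-component register, so its semantic
variable is created with dimx=4 and a cast is inserted between it and
the float2 value:
```
void main(out float2 tex : TEXCOORD0, out float4 pos : POSITION)
{
    pos = float4(0.0, 0.0, 0.0, 1.0);
    tex = float2(0.5, 0.5);   /* stored into a dimx=4 semantic variable through a cast */
}
```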
The use of the hlsl_semantic.reported_duplicated_output_next_index field
allows reporting multiple overlapping indexes, such as in the following
vertex shader:
```
void main(out float1x3 x : OVERLAP0, out float1x3 y : OVERLAP1)
{
    x = float3(1.0, 2.0, 3.2);
    y = float3(5.0, 6.0, 5.0);
}
```
```
apple.hlsl:1:41: E5013: Output semantic "OVERLAP1" is used multiple times.
apple.hlsl:1:13: First use of "OVERLAP1" is here.
apple.hlsl:1:41: E5013: Output semantic "OVERLAP2" is used multiple times.
apple.hlsl:1:13: First use of "OVERLAP2" is here.
```
While at the same time avoiding reporting overlaps more than once for
large arrays:
```
struct apple
{
    float2 p : sv_position;
};

void main(out apple aps[4])
{
}
```
```
apple.hlsl:3:8: E5013: Output semantic "sv_position0" is used multiple times.
apple.hlsl:3:8: First use of "sv_position0" is here.
```
From this point on, it is no longer true that only hlsl_ir_loads can
return objects, because an object can also come from a chain of
hlsl_ir_indexes that ends in an hlsl_ir_load.
The lower_index_loads pass takes care of lowering all hlsl_ir_indexes
into hlsl_ir_loads.
For this reason, hlsl_resource_load_params now expects both the resource
and the sampler to be just hlsl_ir_node pointers instead of pointers to
the more specific hlsl_ir_load.
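As a sketch of what this refers to (hypothetical shader), the object
reaching a resource load can now come from an index node rather than
directly from an hlsl_ir_load:
```
Texture2D tex[4];
sampler sam;

float4 main() : sv_target
{
    /* the resource comes from an hlsl_ir_index applied to the hlsl_ir_load of "tex" */
    return tex[1].Sample(sam, float2(0.5, 0.5));
}
```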
This node type is intended for use at parse time.
While we parse an indexing expression such as "a[3]", we don't know
whether it will end up as part of an expression (in which case it must
be folded into a load) or as the lhs of a store (in which case it must
be folded into the store's deref).
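A small hypothetical example showing both situations:
```
uniform float value;

float4 main() : sv_target
{
    float4 a = float4(0.0, 0.0, 0.0, 0.0);

    a[3] = value;     /* "a[3]" ends up folded into the store's deref */
    return a[3].xxxx; /* "a[3]" ends up folded into a load */
}
```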
Otherwise we may create nodes of different dimensions than the ones we
are replacing.
"count" is the number of components of the source deref (without
considering the swizzle), while "instr_component_count" is the actual
number of components of the instruction to be replaced.
Because of the change introduced in
f21693b2 vkd3d-shader/hlsl: Use reg_size as component count when allocating a single register.
SM1 scalars and vectors were no longer getting the correct writemask
when allocated.
This happened because they have to reserve the whole register even if
they only use some of its components, so their reg_size may differ from
the number of components.
This commit fixes that.
Co-authored-by: Francisco Casas <fcasas@codeweavers.com>
Co-authored-by: Zebediah Figura <zfigura@codeweavers.com>
Because copy_propagation_transform_object_load() replaces a deref
instead of an instruction, it is currently prone to two problems:
1- It can replace a deref with the same deref, returning true every
time and getting the compilation stuck in an endless loop of
copy-propagation iterations.
2- When performed multiple times on the same deref, the second time it
can replace the deref with a deref from a temp that is only valid at
another point of the program execution, resulting in an incorrect value.
This patch preempts both problems by only replacing derefs when the new
deref points to a uniform variable, since uniform variables cannot be
written to.
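As a hedged sketch of the kind of code involved (the names are made up),
the deref on the temporary "t2" below can safely be replaced with a
deref on "t", because "t" is a uniform and is never written to:
```
Texture2D t;
sampler sam;

float4 main() : sv_target
{
    Texture2D t2 = t;   /* copy propagation can point the load's deref back at the uniform "t" */
    return t2.Sample(sam, float2(0.5, 0.5));
}
```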
If an hlsl_ir_load loads a variable whose components are stored by
different instructions, copy propagation doesn't replace it.
But if all these instructions are constants (which is currently the case
for value constructors), the load can be replaced with a constant value,
which is what some other instructions expect, e.g. texel_offsets when
using aoffimmi modifiers.
For instance, this shader:
```
sampler s;
Texture2D t;
float4 main() : sv_target
{
return t.Gather(s, float2(0.6, 0.6), int2(0, 0));
}
```
results in the following IR before applying the patch:
```
float | 6.00000024e-01
float | 6.00000024e-01
uint | 0
| = (<constructor-2>[@4].x @2)
uint | 1
| = (<constructor-2>[@6].x @3)
float2 | <constructor-2>
int | 0
int | 0
uint | 0
| = (<constructor-5>[@11].x @9)
uint | 1
| = (<constructor-5>[@13].x @10)
int2 | <constructor-5>
float4 | gather_red(resource = t, sampler = s, coords = @8, offset = @15)
| return
| = (<output-sv_target0> @16)
```
and this IR afterwards:
```
float2 | {6.00000024e-01 6.00000024e-01 }
int2 | {0 0 }
float4 | gather_red(resource = t, sampler = s, coords = @2, offset = @3)
| return
| = (<output-sv_target0> @4)
```
Rename it to copy_propagation_replace_with_single_instr() accordingly.
The idea is to introduce a constant vector replacement pass which will do the
same thing.
copy_propagation_compute_replacement() is not doing very much for us, and
conceptually is a bit of an odd fit anyway, since it's meant to deal with
multi-component types.
Note that in the future we should call
validate_static_object_references() after DCE and pruning branches,
because shaders such as these compile (at least in more modern versions
of the native compiler):
Branch pruning:
```
static RWTexture2D<float> tex;
float4 main() : sv_target
{
if (0)
{
tex[int2(0, 0)] = 2;
}
return 0;
}
```
DCE:
```
static Texture2D tex;
uniform uint i;
float4 main() : sv_target
{
float4 unused = tex.Load(int3(0, 1, 2));
return 0;
}
```
These are "todo" tests in hlsl-static-initializer.shader_test
that depend on this.
Otherwise, for instance, the added test results in:
debug_hlsl_writemask: Assertion `!(writemask & ~VKD3DSP_WRITEMASK_ALL)' failed.
This happens in allocate_variable_temp_register() when the variable's
type has reg_size <= 4 but a larger component count, which may be the
case if it contains objects.
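A hypothetical variable that hits this case (not the actual added test)
is a local struct mixing an object with numeric fields: its numeric
reg_size fits in a single register, but its component count, which also
counts the object, is larger:
```
Texture2D tex;
sampler sam;

struct apple
{
    Texture2D t;
    float2 coords;
};

float4 main() : sv_target
{
    apple a;   /* numeric reg_size of 2, but 3 components counting the object */

    a.t = tex;
    a.coords = float2(0.5, 0.5);
    return a.t.Sample(sam, a.coords);
}
```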
This should silence warnings about some branches not returning any
value, without requiring an additional "return 0" statement or similar.
Also, in theory this might enable the compiler to optimize the program
a little bit more, though that's unlikely to have any measurable effect.
It is the responsibility of the shader programmer to ensure that
object references can be resolved statically.
Resource arrays for ps_5_1 and vs_5_1 are an exception which is not
properly handled yet. They probably deserve a different object type.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
hlsl_new_store() and hlsl_new_load() are deleted, so now there are no more
direct ways to create derefs with offsets in hlsl.c and hlsl.h.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
This can be done now, to ensure that register offsets are no longer used
in hlsl.c and hlsl.h.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
At this point, the parse code is free of offsets; it only uses index
paths.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
The transform_deref_paths_into_offsets pass turns these index paths back
into register offsets.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Otherwise we can get failed assertions:
assert(node->type == HLSL_IR_LOAD);
because broadcasts to these types are not implemented yet.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
With this change it is possible to store booleans as 0xffffffff,
similarly to what happens at runtime.
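A trivial illustration (not from the test suite): a constant comparison
folds to a boolean that is now stored as 0xffffffff:
```
float4 main() : sv_target
{
    bool b = 3 > 2;   /* folds to a constant bool stored as 0xffffffff */
    return b ? float4(1.0, 1.0, 1.0, 1.0) : float4(0.0, 0.0, 0.0, 0.0);
}
```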
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Matteo Bruni <mbruni@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Also rename it to hlsl_fold_constants().
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Matteo Bruni <mbruni@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Also rename it to hlsl_replace_node().
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Matteo Bruni <mbruni@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Some register types do not use a consistent swizzle type, so the
sm4_swizzle_type() function is removed.
The swizzle type must now be specified using the swizzle_type field.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Matteo Bruni <mbruni@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The return variable was already added to the extern_vars list and marked as an
output semantic by the append_output_var_copy() call above, so the preceding
loop will take care of setting the last_read field.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Matteo Bruni <mbruni@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This doesn't hold anything other than a list, nor do I have any immediate plans
for it to hold anything other than a list, but I'm adding it for some degree of
clarity. Passing around untyped list pointers is not my favourite hobby.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Matteo Bruni <mbruni@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
These can be assigned to when compatibility mode is used.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Matteo Bruni <mbruni@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This allows us to more easily manipulate individual elements in a type-agnostic
way. For example, it allows easier implementation of constant swizzle folding.
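For instance (illustrative only), storing the value per component makes
it straightforward to fold a swizzle on a constant into a new constant:
```
float4 main() : sv_target
{
    /* the .wzyx swizzle on the constant can be folded into {4.0 3.0 2.0 1.0} */
    return float4(1.0, 2.0, 3.0, 4.0).wzyx;
}
```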
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Matteo Bruni <mbruni@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Some places assert() that the register is allocated.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Matteo Bruni <mbruni@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
For the sake of simplicity and clarity, especially in the interest of allowing
us to have expressions with larger numbers of terms.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Matteo Bruni <mbruni@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>