Native does not always do this. For example, a pair of overloads whose
parameters are float and float1 respectively always results in an "ambiguous
function call" error when called.
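For illustration (a hypothetical snippet, not taken from the test suite), such
a pair of overloads can never be called:

```hlsl
float4 f(float x)  { return x; }
float4 f(float1 x) { return x; }

float4 main() : sv_target
{
    /* Both overloads are considered equally good matches, so the native
       compiler reports an "ambiguous function call" error here. */
    return f(1.0);
}
```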
This does not fix any tests, because the relevant tests previously (incorrectly)
succeeded, and now fail with:
E5017: Aborting due to not yet implemented feature: Prioritize between multiple compatible function overloads.
when they should simply fail.
The choice to store them in an rbtree was made early on. It does not seem likely
that HLSL programs would define many overloads for any of their functions, but I
suspect the idea was rather that intrinsics would be defined as plain
hlsl_ir_function_decl structures [cf. 447463e590]
and that some intrinsics that could operate on any type would therefore need
many overloads.
This is not how we deal with intrinsics, however. When the first intrinsics were
implemented I made the choice to disregard this intended design, and instead match
and convert their types manually, in C. Nothing that has happened in the time
since has led me to question that choice, and in fact, the flexibility with
which we must accommodate functions has led me to believe that matching in this
way was definitely the right choice. The main alternative designs I can see
would have been:
* define each intrinsic variant separately using existing HLSL types. Besides
efficiency concerns (i.e. this would take more space in memory, and would take
longer to generate each variant), the normal type-matching rules don't really
apply to intrinsics.
[For example: elementwise intrinsics like abs() return the same type as the
input, including preserving the distinction between float and float1. It is
legal to define separate HLSL overloads taking float and float1, but trying to
invoke these functions yields an "ambiguous function call" error.]
* introduce new (semi-)generic types. This is far more code and ends up acting
like our current scheme (with helpers) in a slightly more complex form.
So I think we can go ahead and rip out this vestige of the original design for
intrinsics.
As for why to change it: rbtrees are simply more complex to deal with, and it
seems unlikely to me that the difference is going to matter. I do not expect any
program to define large quantities of overloads; linked list search should be
good enough.
None of these currently have any meaning, and none of these can currently be
parsed as distinct tokens either (i.e. they will generate a syntax error
anyway).
A struct declaration with variables is now absorbed into the 'declaration'
rule, like any other variable declaration.
A struct declaration without variables is now reduced to the
'struct_declaration_without_vars' rule.
Both are ultimately reduced to a 'declaration_statement'.
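For illustration (a hypothetical snippet), the two cases look like this:

```hlsl
/* Struct declaration with variables: absorbed into the 'declaration' rule. */
struct apple
{
    float4 a;
} fruit1, fruit2;

/* Struct declaration without variables: 'struct_declaration_without_vars'. */
struct orange
{
    float4 b;
};
```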
In a declaration with multiple variables, the variables must be created
before the initializer of the next variable is parsed. This is required
for initializers such as:
float a = 1, b = a, c = b + 1;
A requisite for this is that the type information is parsed in the same
rule as the first variable (as a variable_def_typed) so it is
immediately available to declare the first variable. Then, the next
untyped variable declaration is parsed, and the type from the first
variable can be used to declare the second, before the third is parsed,
and so on.
Basically, declare_vars() is split into three functions:
1. check_invalid_in_out_modifiers(), which is to be called once per
declaration and emits an error when in or out modifiers are used for
these non-parameter variables.
2. declare_var(), which now handles one variable at a time and doesn't
free any memory.
3. initialize_vars(), which takes care of preparing the initialization
instructions of several variables and frees their struct
parse_variable_def, using exclusively free_parse_variable_def().
This makes it possible to declare variables individually before the initializer
of the next variable in the same declaration is parsed, which is used in
the following patches.
It also simplifies memory management.
This patch makes index expressions on resources hlsl_ir_index nodes
instead of hlsl_ir_resource_load nodes, because it is not known if they
will be used later as the lhs of an hlsl_ir_resource_store.
For now, the only benefit is consistency.
Otherwise, in the added test, we get:
vkd3d-compiler: vkd3d-shader/hlsl.c:452: hlsl_init_deref_from_index_chain: Assertion `chain' failed.
because on the path that triggers the following error:
E5002: Wrong type for argument 1 of 'tex3D': expected 'sampler' or 'sampler3D', but got 'sampler2D'.
a NULL params.resource is passed to hlsl_new_resource_load() and
then to hlsl_init_deref_from_index_chain().
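The failing path can be reached with a shader shaped roughly like this (an
illustrative sketch; the actual added test may differ):

```hlsl
sampler2D sam;

float4 main() : sv_target
{
    /* Argument 1 has the wrong type: tex3D() expects a 'sampler' or a
       'sampler3D', so E5002 is reported and params.resource stays NULL. */
    return tex3D(sam, float3(0.0, 0.0, 0.0));
}
```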
The use of the hlsl_semantic.reported_duplicated_output_next_index field
allows reporting multiple overlapping indexes, such as in the following
vertex shader:
void main(out float1x3 x : OVERLAP0, out float1x3 y : OVERLAP1)
{
x = float3(1.0, 2.0, 3.2);
y = float3(5.0, 6.0, 5.0);
}
apple.hlsl:1:41: E5013: Output semantic "OVERLAP1" is used multiple times.
apple.hlsl:1:13: First use of "OVERLAP1" is here.
apple.hlsl:1:41: E5013: Output semantic "OVERLAP2" is used multiple times.
apple.hlsl:1:13: First use of "OVERLAP2" is here.
While at the same time avoiding reporting overlaps more than once for
large arrays:
struct apple
{
float2 p : sv_position;
};
void main(out apple aps[4])
{
}
apple.hlsl:3:8: E5013: Output semantic "sv_position0" is used multiple times.
apple.hlsl:3:8: First use of "sv_position0" is here.
From this point on, it is no longer true that only hlsl_ir_loads can
return objects, because an object can also come from a chain of
hlsl_ir_indexes that ends in an hlsl_ir_load.
The lower_index_loads pass takes care of lowering all hlsl_ir_indexes
into hlsl_ir_loads.
For this reason, hlsl_resource_load_params now expects both the resource
and the sampler to be just an hlsl_ir_node pointer instead of a pointer
to a more specific hlsl_ir_load.
This node type is intended for use at parse time.
While we parse an indexing expression such as "a[3]", we don't know if
it will end up as part of an expression (in which case it must be folded
into a load) or as the lhs of a store (in which case it must be
folded into the store's deref).
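For example (an illustrative sketch), the same indexing expression can end up
on either side of an assignment:

```hlsl
float4 main() : sv_target
{
    float a[4] = {1.0, 2.0, 3.0, 4.0};
    /* Here "a[3]" is the lhs of a store, so the index must be folded into
       the store's deref... */
    a[3] = 5.0;
    /* ...while here it is part of an expression, so it must be folded into
       a load. */
    return a[3];
}
```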
But still emit hlsl_fixme() when there is more than one compatible overload.
Prioritizing among multiple compatible function overloads in the same way
as the native compiler would require systematic testing.
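A hypothetical case that would still hit the hlsl_fixme() path, assuming both
overloads count as compatible with the argument:

```hlsl
float4 f(float x) { return x; }
float4 f(int x)   { return x; }

float4 main() : sv_target
{
    /* A uint argument can be implicitly converted to either parameter type,
       so more than one compatible overload is found. */
    uint u = 3;
    return f(u);
}
```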
Reinterpret min16float, min10float, min16int, min12int, and min16uint
as their regular counterparts: float, float, int, int, uint,
respectively.
A proper implementation would require adding minimum precision
indicators to all the dxbc-tpf instructions that use these types.
Consider the output of fxc 10.1 with the following shader:
uniform int i;
float4 main() : sv_target
{
min16float4 a = {0, 1, 2, i};
min16int2 b = {4, i};
min10float3 c = {6.4, 7, i};
min12int d = 9.4;
min16uint4x2 e = {14.4, 15, 16, 17, 18, 19, 20, i};
return mul(e, b) + a + c.xyzx + d;
}
However, if the graphics driver doesn't have minimum precision support,
it ignores the minimum precision indicators and runs at 32-bit
precision, which is equivalent to working with regular types.
validate_static_object_references() validates that uninitialized static
objects are not referenced in the shader.
In case a static variable contains both numeric and object types, the
"Static variables cannot have both numeric and resource components."
error should prevent uninitialized numeric values from reaching further
compilation steps.
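A case of this kind might look like the following (a hypothetical example):

```hlsl
struct apple
{
    float4 color;
    Texture2D tex;
};

static struct apple a;

float4 main() : sv_target
{
    /* The "numeric and resource components" error is reported for 'a', and no
       complaint about its uninitialized numeric part should follow. */
    return a.color;
}
```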
We are currently not initializing static values to zero by default.
Consider the following shader:
```hlsl
static float4 va;
float4 main() : sv_target
{
return va;
}
```
We get the following output:
```
ps_5_0
dcl_output o0.xyzw
dcl_temps 2
mov r0.xyzw, r1.xyzw
mov o0.xyzw, r0.xyzw
ret
```
where r1.xyzw is not initialized.
This patch solves this by assigning the static variable the value of a
uint 0, thus relying on complex broadcasts.
This seems to be the behaviour of the 9.29.952.3111 version of the native
compiler, since it emits the following error on a shader that lacks
an initializer on a data type with object components:
```
error X3017: cannot convert from 'uint' to 'struct <unnamed>'
```
We have a different system of generating intrinsics, which makes it easier to
deal with "polymorphic" arithmetic functions.
Defining and storing intrinsics as hlsl_ir_function_decls would also require
more space in memory (and more optimization passes to get rid of the parameter
variables), and doesn't really save us any effort in terms of source code.
Using add_unary_arithmetic_expr() instead of hlsl_new_unary_expr()
allows the intrinsic to work with matrices.
Otherwise we get:
E5017: Aborting due to not yet implemented feature: Copying from unsupported node type.
because an HLSL_IR_EXPR reaches split_matrix_copies().
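For instance, applying an elementwise intrinsic such as abs() to a matrix
(an illustrative sketch; the actual intrinsic involved may differ) exercises
this path:

```hlsl
float4 main() : sv_target
{
    float2x2 m = {1.0, -2.0, -3.0, 4.0};
    /* With add_unary_arithmetic_expr() the operation is performed
       per-component on the matrix type. */
    float2x2 r = abs(m);
    return float4(r[0], r[1]);
}
```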
Unlike compatible_data_types() and implicit_compatible_data_types(),
this function is intended to be symmetrical. So it makes sense to
preserve the names "t1" and "t2" for the arguments.
The function has far too many arguments, including multiple different arguments
with the same type. Use a structure for clarity and to avoid errors.
Merge hlsl_new_sample_lod() into hlsl_new_resource_load() accordingly.
This should silence warnings about some branches not returning any value
without requiring an additional "return 0" statement or similar.
Also, in theory this might enable the compiler to optimize the program
a little bit more, though that's unlikely to have any measurable effect.
Also, TextureCube and TextureCubeArray don't support the offset
argument, so this check is updated here too.
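For reference (an illustrative sketch), Texture2D.Sample() accepts an optional
offset argument, while TextureCube.Sample() does not:

```hlsl
Texture2D t2d;
TextureCube tcube;
SamplerState s;

float4 main() : sv_target
{
    /* Texture2D.Sample() takes an optional int2 offset... */
    float4 a = t2d.Sample(s, float2(0.5, 0.5), int2(1, 1));
    /* ...but TextureCube.Sample() has no offset parameter. */
    float4 b = tcube.Sample(s, float3(0.0, 0.0, 1.0));
    return a + b;
}
```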
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Also, TextureCube and TextureCubeArray don't support the offset
argument, so this check is updated.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
HLSL_ARRAY_ELEMENTS_COUNT_IMPLICIT (zero) is used as a temporary value
for elements_count for implicit size arrays.
This value is replaced by the correct one after parsing the initializer.
In case the implicit array is not initialized correctly, hlsl_error()
is called but the array size is kept at 0. So the rest of the code
must handle these cases.
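For example (an illustrative sketch), the array below keeps
HLSL_ARRAY_ELEMENTS_COUNT_IMPLICIT as its elements_count until its initializer
has been parsed:

```hlsl
float4 main() : sv_target
{
    /* The size (4) is only known once the initializer is parsed. */
    float a[] = {1, 2, 3, 4};
    return float4(a[0], a[1], a[2], a[3]);
}
```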
In shader model 5.1, unlike in 5.0, declaring a multi-dimensional
object-type array with the last dimension implicit results in
an error. This happens even in the presence of an initializer.
So, both gen_struct_fields() and declare_vars() first check whether the
shader model is 5.1, the array elements are objects, and there is
at least one implicit array size; if so, the whole type is handled as
an unbounded resource array.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Otherwise we get false in implicit_compatible_data_types() when passing
types that are equal but not convertible according to
convertible_data_type(); e.g. getting:
"Can't implicitly convert from Texture2D<float4> to Texture2D<float4>."
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
At this point, the parse code is free of offsets; it only uses index
paths.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
The transform_deref_paths_into_offsets pass turns these index paths back
into register offsets.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
At this point add_load() is split into add_load_component() and
add_load_index(); register offsets are hidden from these functions.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>