So that they are dumped even if parsing fails, which is a circumstance
in which one likely wants to see the problematic shader.
The downside of that is that for shader types other than HLSL
the profile is not written any more in the filename. This should
not be a big problem, because in those cases the shader describes
its own type.
When dumping an HLSL shader, the id is brought in front of the profile
in the file name, in order to make it more tab-friendly: when dealing
with a directory full of shaders it's likely that the id determines
the profile, but the other way around.
This field is now analogous to vkd3d_shader_register_index.rel_addr.
Also, it makes sense to rename it now because all the constant part of
the offset is now handled to hlsl_deref.const_offset. Consequently, it
may also be NULL now.
This uint will be used for the following:
- Since SM4's relative addressing (the capability of passing a register
as an index to another register) only has whole-register granularity,
we will need to make the offset node express the offset in
whole-registers and specify the register component in this uint,
otherwise we would have to add additional / and % operations in the
output binary.
- If, after we apply constant folding and copy propagation, we determine
that the offset is a single constant node, we can store all the offset
in this uint constant, and remove the offset src.
This allows DCE to remove a good bunch of the nodes previously required
only for the offset constants, which makes the output more liteweight
and readable, and simplifies the implementation of relative addressing
when writing tpf in the following patches.
In dump_deref(), we use "c" to indicate components instead of whole
registers. Since now both the offset node and the offset uint are in
components a lowered deref would look like:
var[@42c + 2c]
But, once we express the offset node in whole registers we will remove
the "c" from the node part:
var[@22 + 3c]
Some functions work with dereferences and need to know if they are
lowered yet.
This can be known checking if deref->offset.node is NULL or
deref->data_type is NULL. I am using the latter since it keeps working
even after the following patches that split deref->offset into
constant and variable parts.
Components only span across a single regset, so instead of expecting the
regset as input for the offset, hlsl_type_get_component_offset() can
actually retrieve it.
We have to distinguish between the "bind count" and the "allocation size"
of variables.
The "allocation size" affects the starting register id for the resource to
be allocated next, while the "bind count" is determined by the last field
actually used. The former may be larger than the latter.
What we are currently calling hlsl_reg.bind_count is actually the
"allocation size", so a rename is in order.
The real "bind count", which will be introduced in following patches,
is important because it is what should be shown in the RDEF table and
some resource allocation rules depend on it.
For instance, for this shader:
texture2D texs[3];
texture2D tex;
float4 main() : sv_target
{
return texs[0].Load(int3(0, 0, 0)) + tex.Load(int3(0, 0, 0));
}
the variable "texs" has a "bind count" of 1, but an "allocation size" of
3:
// Resource Bindings:
//
// Name Type Format Dim HLSL Bind Count
// ------------------------------ ---------- ------- ----------- -------------- ------
// texs texture float4 2d t0 1
// tex texture float4 2d t3 1
After lowering the derefs path to a single offset node, there was no way
of knowing the type of the referenced part of the variable. This little
modification allows to avoid having to pass the data type everywhere and
it is required for supporting instructions that reference objects
components within struct types.
Since deref->data_type allows us to retrieve the type of the deref,
deref->offset_regset is no longer necessary.
Non-constant vector indexing is not solved with relative addressing
in the register indexes because this indexation cannot be at the level
of register-components.
Mathematical operations must be used instead.
Variables that contain more than one object (arrays or structs) require
the allocation of contiguous registers in the respective object
register spaces.
This patch makes index expressions on resources hlsl_ir_index nodes
instead of hlsl_ir_resource_load nodes, because it is not known if they
will be used later as the lhs of an hlsl_ir_resource_store.
For now, the only benefit is consistency.
The use of the hlsl_semantic.reported_duplicated_output_next_index field
allows reporting multiple overlapping indexes, such as in the following
vertex shader:
void main(out float1x3 x : OVERLAP0, out float1x3 y : OVERLAP1)
{
x = float3(1.0, 2.0, 3.2);
y = float3(5.0, 6.0, 5.0);
}
apple.hlsl:1:41: E5013: Output semantic "OVERLAP1" is used multiple times.
apple.hlsl:1:13: First use of "OVERLAP1" is here.
apple.hlsl:1:41: E5013: Output semantic "OVERLAP2" is used multiple times.
apple.hlsl:1:13: First use of "OVERLAP2" is here.
While at the same time avoiding reporting overlaps more than once for
large arrays:
struct apple
{
float2 p : sv_position;
};
void main(out apple aps[4])
{
}
apple.hlsl:3:8: E5013: Output semantic "sv_position0" is used multiple times.
apple.hlsl:3:8: First use of "sv_position0" is here.
Fix a compile warning:
../vkd3d/libs/vkd3d-shader/hlsl_codegen.c: In function 'allocate_semantic_register':
../vkd3d/libs/vkd3d-shader/hlsl_codegen.c:2947:85: error: passing argument 4 of 'hlsl_sm4_register_from_semantic' from incompatible pointer type [-Werror=incompatible-pointer-types]
2947 | if ((builtin = hlsl_sm4_register_from_semantic(ctx, &var->semantic, output, &type, NULL, &has_idx)))
| ^~~~~
| |
| unsigned int *
In file included from ../vkd3d/libs/vkd3d-shader/hlsl_codegen.c:21:
../vkd3d/libs/vkd3d-shader/hlsl.h:1171:52: note: expected 'enum vkd3d_sm4_register_type *' but argument is of type 'unsigned int *'
1171 | bool output, enum vkd3d_sm4_register_type *type, enum vkd3d_sm4_swizzle_type *swizzle_type, bool *has_idx);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
From this point on, it is no longer true that only hlsl_ir_loads can
return objects, because an object can also come from chain of
hlsl_ir_indexes that ends in an hlsl_ir_load.
The lower_index_loads pass takes care of lowering all hlsl_ir_indexes
into hlsl_ir_loads.
For this reason, hlsl_resource_load_params now expects both the resource
as the sampler to be just an hlsl_ir_node pointer instead of a pointer
to a more specific hlsl_ir_load.
This node type is intended for use during parse-time.
While we parse an indexing expression such as "a[3]", we don't know if
it will end up as part of an expression (in which case it must be folded
into a load) or it is for the lhs of a store (in which case it must be
folded into the store's deref).
Prevent them from being ever looked up.
Our naming scheme for synthetic variables already effectively prevents this, but
this is better for clarity. We also will need to be able to move some named
variables into a dummy scope to account for complexities around function
definition and declarations.
Reinterpret min16float, min10float, min16int, min12int, and min16uint
as their regular counterparts: float, float, int, int, uint,
respectively.
A proper implementation would require adding minimum precision
indicators to all the dxbc-tpf instructions that use these types.
Consider the output of fxc 10.1 with the following shader:
uniform int i;
float4 main() : sv_target
{
min16float4 a = {0, 1, 2, i};
min16int2 b = {4, i};
min10float3 c = {6.4, 7, i};
min12int d = 9.4;
min16uint4x2 e = {14.4, 15, 16, 17, 18, 19, 20, i};
return mul(e, b) + a + c.xyzx + d;
}
However, if the graphics driver doesn't have minimum precision support,
it ignores the minimum precision indicators and runs at 32-bit
precision, which is equivalent as working with regular types.
We have a different system of generating intrinsics, which makes it easier to
deal with "polymorphic" arithmetic functions.
Defining and storing intrinsics as hlsl_ir_function_decls would also require
more space in memory (and more optimization passes to get rid of the parameter
variables), and doesn't really save us any effort in terms of source code.
The function has far too many arguments, including multiple different arguments
with the same type. Use a structure for clarity and to avoid errors.
Merge hlsl_new_sample_lod() into hlsl_new_resource_load() accordingly.
This should silence warnings about some branches non returning any value
without requiring additional "return 0" statement or similar.
Also, in theory this might enable to compiler to optimize the program
a little bit more, though that's unlikely to have any measurable effect.