This preempts us from replacing a swizzle incorrectly, as in the
following example:
1: A.x = 1.0
2: A
3: A.x = 2.0
4: @2.x
were @4 ends up being 2.0 instead of 1.0, because that's the value stored in
A.x at time 4, and we should be querying it at time 2.
This also helps us to avoid replacing a swizzle with itself in copy-prop
which can result in infinite loops, as with the included tests this commit.
Consider the following sequence of instructions:
1 : A
2 : B = @1
3 : B
4 : A = @3
5 : @1.x
Current copy-prop would replace 5 so it points to @3 now:
1 : A
2 : B = @1
3 : B
4 : A = @3
5 : @3.x
But in the next iteration it would make it point back to @1, keeping it
spinning infinitively.
The solution is to index the instructions and don't replace the swizzle
if the new load happens after the current load.
Instead of only storing the value that each variable's component has at
the moment of the instruction currently handled by copy-prop, we store
the trace of all the historic values with their timestamps, i.e. the
instruction index on which the value was stored.
This would allow us to query the value that the variable had at the time
of execution of previous instructions.
The choice to store them in an rbtree was made early on. It does not seem likely
that HLSL programs would define many overloads for any of their functions, but I
suspect the idea was rather that intrinsics would be defined as plain
hlsl_ir_function_decl structures [cf. 447463e590]
and that some intrinsics that could operate on any type would therefore need
many overrides.
This is not how we deal with intrinsics, however. When the first intrinsics were
implemented I made the choice disregard this intended design, and instead match
and convert their types manually, in C. Nothing that has happened in the time
since has led me to question that choice, and in fact, the flexibility with
which we must accommodate functions has led me to believe that matching in this
way was definitely the right choice. The main other designs I see would have
been:
* define each intrinsic variant separately using existing HLSL types. Besides
efficiency concerns (i.e. this would take more space in memory, and would take
longer to generate each variant), the normal type-matching rules don't really
apply to intrinsics.
[For example: elementwise intrinsics like abs() return the same type as the
input, including preserving the distinction between float and float1. It is
legal to define separate HLSL overloads taking float and float1, but trying to
invoke these functions yields an "ambiguous function call" error.]
* introduce new (semi-)generic types. This is far more code and ends up acting
like our current scheme (with helpers) in a slightly more complex form.
So I think we can go ahead and rip out this vestige of the original design for
intrinsics.
As for why to change it: rbtrees are simply more complex to deal with, and it
seems unlikely to me that the difference is going to matter. I do not expect any
program to define large quantities of intrinsics; linked list search should be
good enough.
This field is now analogous to vkd3d_shader_register_index.rel_addr.
Also, it makes sense to rename it now because all the constant part of
the offset is now handled to hlsl_deref.const_offset. Consequently, it
may also be NULL now.
This uint will be used for the following:
- Since SM4's relative addressing (the capability of passing a register
as an index to another register) only has whole-register granularity,
we will need to make the offset node express the offset in
whole-registers and specify the register component in this uint,
otherwise we would have to add additional / and % operations in the
output binary.
- If, after we apply constant folding and copy propagation, we determine
that the offset is a single constant node, we can store all the offset
in this uint constant, and remove the offset src.
This allows DCE to remove a good bunch of the nodes previously required
only for the offset constants, which makes the output more liteweight
and readable, and simplifies the implementation of relative addressing
when writing tpf in the following patches.
In dump_deref(), we use "c" to indicate components instead of whole
registers. Since now both the offset node and the offset uint are in
components a lowered deref would look like:
var[@42c + 2c]
But, once we express the offset node in whole registers we will remove
the "c" from the node part:
var[@22 + 3c]
Some functions work with dereferences and need to know if they are
lowered yet.
This can be known checking if deref->offset.node is NULL or
deref->data_type is NULL. I am using the latter since it keeps working
even after the following patches that split deref->offset into
constant and variable parts.