The label itself is certainly an unsigned integer, but the register
has no meaningful data type. It cannot be evaluated to anything.
The goal of this is to reduce cluttering in the internal ASM dumps.
The new structurizer therefore reaches feature parity with the
older simple one, except for a couple of points:
* the old structurizer accepts any CFG, without requiring reducibility;
however, the DXIL specification requires the CFG to be reducible
anyway, so we're not really losing anything;
* the new structurizer additionally requires that no block has two
incoming back arrows; AFAIK this is condition that can happen,
but in practice it seems to be rare; also, it's not hard to add
support for it, as soon as it is decided it is useful.
On the other hand, the new structurizer makes use of the merging
information that are reconstructed from the CFG, which is important
for downstream optimization and fundamental for correctly emitting
tangled instructions.
Taking these considerations into account, the old structurizer is
considered superseded and is therefore removed.
The structurizer is implemented along the lines of what is usually called
the "structured program theorem": the control flow is completely
virtualized by mean of an additional TEMP register which stores the
block index which is currently running. The whole program is then
converted to a huge switch construction enclosed in a loop, executing
at each iteration the appropriate block and updating the register
depending on block jump instruction.
The algorithm's generality is also its major weakness: it accepts any
input program, even if its CFG is not reducible, but the output
program lacks any useful convergence information. It satisfies the
letter of the SPIR-V requirements, but it is expected that it will
be very inefficient to run on a GPU (unless a downstream compiler is
able to devirtualize the control flow and do a proper convergence
analysis pass). The algorithm is however very simple, and good enough
to at least pass tests, enabling further development. A better
alternative is expected to be upstreamed incrementally.
Side note: the structured program theorem is often called the
Böhm-Jacopini theorem; Böhm and Jacopini did indeed prove a variation
of it, but their algorithm is different from what is commontly attributed
to them and implemented here, so I opted for not using their name.
For simplicity PHI nodes are not currently handled.
The goal for this pass is to make the CFG structurizer simpler, because
it doesn't have to care about the more rigid rules SSA registers have
to satisfy than TEMP registers.
It is likely that the generated code will be harder for downstream
compilers to optimize and execute efficiently, so once a complete
structurizer is in place this pass should be removed, or at least
greatly reduced in scope.
PHI nodes must be fixed up after this pass, because the block references
might have become broken. For simplicitly this is not handled yet.
The goal for this pass is to make the CFG structurizer simpler, because
only conditional and unconditional branches must be supported.
Eventually this limitation might be lifted if there is advantage in
doing so.
The relative-addressed case in shader_register_normalise_arrayed_addressing()
leaves the control point id in idx[0], while for constant register
indices it is placed in idx[1]. The latter case could be fixed instead,
but placing the control point count in the outer dimension is more
logical.
For example, this occurred in a shader:
reg_idx write_mask
0 xyz
1 xyzw
2 xyzw
3 xyz
The dcl_indexrange instruction covered only xyz, so once merged, searching for
xyzw failed.
It is impossible to declare an input array where elements have different
component counts, but the optimiser can create this case. One way for
this to occur is to dynamically index input values via a local array
containing copies of the input values. The optimiser converts this to
dynamically indexed inputs.
They were originally made const because no optimization/normalization
pass existed. Now having to cast away const all the time is becoming
more and more burdening.
Regression in signature normalisation, however the old code was not
correct either because it would apply the interpolation mode to all
components. Found in an Assassin's Creed: Valhalla shader.
Specifically, map COLOROUT to OUTPUT, and map INCONTROLPOINT to INPUT for domain
shaders as well as hull shaders.
Obscure the non-existent differences from the view of the backend.
Hull shaders have a different temps count for each phase, and the
parser only reports the count for the patch constant phase.
In order to properly check for temps count on hull shaders, we first
need to decode its phases.
The hull shader barrier used for this was broken by I/O normalization, since
vocp is no longer exposed to the spirv backend.
Restore this barrier by checking for vocp during normalization instead.
It was originally intended that this structure could hold other information
about the next stage which compilation might depend on. For example, compilation
to GLSL needs to know the type of the next shader in some circumstances.
That was never actualized, and since the API is fixed at this point for 1.9, it
makes the most sense to rename the structure to match its actual scope.
The documentation was written and arranged to imply that the structure would
hold other information about the next shader than the varying map; this is
changed accordingly as well.
In Shader Model 6 each signature element can span a range of register
indices, or 'rows', and system values do not share a register index with
non-system values. Inputs and outputs are referenced by element index
instead of register index. This patch merges multiple signature elements
into a single element under the following conditions:
- The register index in a load or store is specified dynamically by
including a relative address parameter with a base register index. The
dcl_index_range instruction is used to identify these.
- A register declaration is split across multiple elements which declare
different components of the register.
- A patch constant function writes tessellation factors. These are an
array in SPIR-V, but in SM 5.x each factor is declared as a separate
register, and these are dynamically indexed by the fork/join instance
id. Elimination of multiple fork/join phases converts the indices to
constants, but merging the signature elements into a single arrayed
element matches the SPIR-V output.
All references to input/output register indices are converted to element
indices. If a relative address is present, the element index is moved up
a slot so it cannot be confused with a constant offset. Existing code
only handles register index relative addressing for tessellation factors.
This patch adds generic support for it.
The SPIR-V backend will emit a default control point phase. Inserting
inputs into the IR allows handling of declarations via the usual path
instead of an ad hoc implementation which may not match later changes
to input handling.
In SPIR-V the address must include the invocation id, but in TPF it
is implicit. Move the register index up one slot and insert an
OUTPOINTID relative address.
Normalise the incoming vkd3d_shader_instruction IR to the shader model 6
pattern. This allows generation of a single patch constant function in
SPIR-V.