For an if block
if (cond)
{
<then_block>
}
else
{
<else_block>
}
We flatten it by first replacing any store instruction `v[[k]] = x`
in the then_block with the following:
1: load(v[[k]])
2: cond ? x : @1
3: v[[k]] = @2
Similarly, we replace any store instruction `v[[k]] = x` in the
else_block with the following:
1: load(v[[k]])
2: cond ? @1 : x
3: v[[k]] = @2
Then we can concatenate <then_block> and <else_block> together and
get rid of the if block.
The combined sampler is created as a SAMPLER instead of a TEXTURE
because that fits all our current infrastructure. The only problem is
that in the CTAB it must appear as a Texture, so the new field
hlsl_type.is_combined_sampler is added.
Co-authored-by: Elizabeth Figura <zfigura@codeweavers.com>
This is simply unnecessary and wastes time.
As part of this, simply remove the "all" directive. Only for a couple of tests
is it even potentially interesting to validate all pixels (e.g.
nointerpolation.shader_test), and for those "all" is replaced with an explicit
(0, 0, 640, 480) rect.
In all other cases we just probe (0, 0).
According to the documentation, if_comp is available from 2_x pixel
and vertex shaders and, unlike "if bool" it doesn't expect a constant
boolean register (from the input signature), so:
if_neq cond -cond
seems like a convenient way to write these, for profiles above 2.0.
The structurizer is implemented along the lines of what is usually called
the "structured program theorem": the control flow is completely
virtualized by mean of an additional TEMP register which stores the
block index which is currently running. The whole program is then
converted to a huge switch construction enclosed in a loop, executing
at each iteration the appropriate block and updating the register
depending on block jump instruction.
The algorithm's generality is also its major weakness: it accepts any
input program, even if its CFG is not reducible, but the output
program lacks any useful convergence information. It satisfies the
letter of the SPIR-V requirements, but it is expected that it will
be very inefficient to run on a GPU (unless a downstream compiler is
able to devirtualize the control flow and do a proper convergence
analysis pass). The algorithm is however very simple, and good enough
to at least pass tests, enabling further development. A better
alternative is expected to be upstreamed incrementally.
Side note: the structured program theorem is often called the
Böhm-Jacopini theorem; Böhm and Jacopini did indeed prove a variation
of it, but their algorithm is different from what is commontly attributed
to them and implemented here, so I opted for not using their name.