This test currently hits a Metal bug when run on Apple Silicon with
MoltenVK and fails. We don't have an easy way to mark shader runner
tests as buggy and we're not interested in tracking that bug anyway,
so I'm just working around it.
The structurizer is implemented along the lines of what is usually called
the "structured program theorem": the control flow is completely
virtualized by means of an additional TEMP register which stores the
index of the block which is currently running. The whole program is
then converted to a huge switch construction enclosed in a loop, which
at each iteration executes the appropriate block and updates the
register according to the block's jump instruction.
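As an illustration only, here is a minimal C sketch of the same idea
applied to a toy program that sums the integers from 1 to n. It is not
the structurizer code; the function name, the block numbering and the
"block" variable (standing in for the TEMP register) are all
hypothetical.

```c
/* Hypothetical input program, as four basic blocks:
 *   block 0: i = 1; sum = 0; jump to block 1
 *   block 1: if (i <= n) jump to block 2, else jump to block 3
 *   block 2: sum += i; ++i; jump to block 1
 *   block 3: return sum
 */
static unsigned int sum_virtualized(unsigned int n)
{
    /* Stands in for the additional TEMP register: it holds the index
     * of the block to execute on the next loop iteration. */
    unsigned int block = 0;
    unsigned int i = 0, sum = 0;

    for (;;)
    {
        switch (block)
        {
            case 0:
                i = 1;
                sum = 0;
                block = 1; /* Unconditional jump. */
                break;

            case 1:
                block = (i <= n) ? 2 : 3; /* Conditional jump. */
                break;

            case 2:
                sum += i;
                ++i;
                block = 1; /* Back edge, expressed as a register update. */
                break;

            default: /* Block 3: the only exit from the loop. */
                return sum;
        }
    }
}
```

Calling sum_virtualized(10) returns 55. Every jump, including the back
edge, becomes a plain assignment to the block index, which is why the
same transformation applies unchanged to an irreducible CFG, and also
why the output carries no useful convergence information.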
The algorithm's generality is also its major weakness: it accepts any
input program, even if its CFG is not reducible, but the output
program lacks any useful convergence information. It satisfies the
letter of the SPIR-V requirements, but it is expected that it will
be very inefficient to run on a GPU (unless a downstream compiler is
able to devirtualize the control flow and do a proper convergence
analysis pass). The algorithm is, however, very simple, and good enough
to at least pass tests, enabling further development. A better
alternative is expected to be upstreamed incrementally.
Side note: the structured program theorem is often called the
Böhm-Jacopini theorem; Böhm and Jacopini did indeed prove a variation
of it, but their algorithm is different from what is commonly attributed
to them and implemented here, so I opted not to use their name.
These can be disassembled by D3DDisassemble() just fine, and perhaps
more importantly, shader model 1 vertex shaders do not require dcl_
instructions in Direct3D 8.
The implementation of upload_buffer_data_with_states(), unlike the
implementation of upload_texture_data_with_states(), does not expect a
pointer to a D3D12_SUBRESOURCE_DATA, but rather, a direct pointer to the
data.
The generated Vulkan calls look right and do not trigger any
validation error, but the returned timestamp is 0. A valid
timestamp is returned if the CopyResource() call is commented out,
or the second EndQuery() call is moved before CopyResource(),
or the first EndQuery() call is commented out. I am not seeing any
sensible pattern here, so I guess there is just a bug in
MoltenVK.
Specifically, MoltenVK seems to be able to load from stencil, but
the specific replicating swizzle (repeating the stencil value on
all the channels) is not honored. The stencil value is only available
in the red channel.