It's hard to pinpoint exactly what's going wrong with these
tests. They seem to be related to atomics and GPU timestamps,
both categories that are known to have problems on MoltenVK in
one way or another. The failures clearly depend on a few factors,
such as the MoltenVK version, the macOS version and whether we're
in a virtual machine or not, but the exact dependency on those
factors is hard to describe. For example, the paravirtualized
device offered inside virtual machines generally has a lot more
problems than real devices, but I've seen tests that, with all
other conditions fixed, work on the paravirtualized device and
not on the real device.
The only thing all tests in this batch have in common is that I've
never seen them fail on a Sequoia system, so I've settled for
using just that as the bug_if() condition. Ultimately, spending a
lot of time to get to the bottom of each individual test failure
isn't worthwhile, and being able to mark the CI job as not allowed
to fail gives better regression protection than investigating each
of them. Also, I routinely run the tests on a Sequoia system, so
if these tests get broken it is going to be noticed anyway.

Note that we still have to preempt the propagation to SM1 pixel shader
uniforms. Otherwise this will turn the many constant derefs that appear
from the <index-val> copy generated in lower_index_loads() into a single
non-constant deref, causing the uniform to be allocated all of its
registers instead of only those up to the last one used.

This introduces a vertex shader version of the previous test by Shaun.
Note that in this case the uniform is allocated all 4 registers instead
of 3 because it is indirectly addressed.
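
As a rough illustration of the allocation rule described above, here is
a minimal C sketch. The function name and its parameters are
hypothetical and not taken from the compiler. It only assumes that a
directly addressed uniform needs registers up to the last one used,
while an indirectly addressed one needs all of them.

```c
#include <stdbool.h>

/* Hypothetical helper, for illustration only: number of registers
 * reserved for a uniform that spans reg_count registers. */
static unsigned int uniform_reg_count(unsigned int reg_count,
        unsigned int last_used_idx, bool indirectly_addressed)
{
    /* With a non-constant index any register may be read, so the whole
     * uniform has to be allocated. */
    if (indirectly_addressed)
        return reg_count;

    /* Otherwise only registers 0..last_used_idx are needed. */
    return last_used_idx + 1 < reg_count ? last_used_idx + 1 : reg_count;
}
```

Assuming a 4-register uniform whose highest constant index is 2, this
gives 3 registers when directly addressed and 4 when indirectly
addressed.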

Note that, for indexes with a fractional part, the behavior differs
depending on whether it is a temp load or a direct uniform load (which
can only happen in vertex shaders). The former rounds toward zero,
while the latter rounds to the nearest even.
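
The two rounding rules can be sketched in C as follows. The function
names are hypothetical, and the sketch reflects my reading of the
description rather than the actual hardware or compiler code: the temp
load case is taken to truncate toward zero, and the direct uniform load
case is taken to round to the nearest integer with ties going to the
even one. Indexes are assumed to fit in an int.

```c
/* Hypothetical helpers, for illustration only. */

/* Temp load: the fractional part is dropped (rounding toward zero).
 * A float-to-int conversion in C truncates in the same way. */
static int index_from_temp_load(float idx)
{
    return (int)idx;
}

/* Direct uniform load: round to the nearest integer, ties to even. */
static int index_from_uniform_load(float idx)
{
    int lower = (int)idx;
    float frac;

    /* Turn the truncated value into floor(idx) for negative inputs. */
    if ((float)lower > idx)
        --lower;
    frac = idx - (float)lower; /* In the range [0, 1). */

    if (frac > 0.5f)
        return lower + 1;
    if (frac < 0.5f)
        return lower;
    /* Exact tie: pick the even neighbour. */
    return (lower % 2 == 0) ? lower : lower + 1;
}
```

For example, an index of 1.7 selects element 1 under the first rule and
element 2 under the second, while 2.5 selects element 2 under both.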

On MoltenVK it seems that all draws are always executed,
regardless of the early depth-stencil test. The problem doesn't
seem to lie in vkd3d or MoltenVK, because the generated Metal
commands look correct. I tried looking at a GPU capture with Xcode,
which was not very conclusive because it doesn't state clearly
whether early fragment tests passed or not. Sometimes it
says that a fragment shader execution had no thread execution
data, which I interpret as the early fragment tests having
prevented the fragment shader from running, but it's not really
consistent, and it's never clear which results are based on
software simulation and which on the hardware run.
However, taking everything into account, I think the most likely
explanation is some incorrect optimization at the Metal level.

The graphics pipeline triggers an internal error in the Metal
pipeline compiler, with a completely generic error message. I have
no idea what the actual problem is.