Note that, for indexes with a decimal part, the behavior is different
depending on whether it is a temp load or a direct uniform load (which
can only happen on vertex shaders). The former rounds to the
closest-to-zero, while the latter rounds to the nearest even.
On MoltenVK it seems that all draws are always executed,
independently of the early depth stencil test. The problem doesn't
seem to belong to vkd3d or MoltenVK, because the generated Metal
commands look correct. I tried looking at a GPU capture with Xcode,
which was not very conclusive because it doesn't state clearly
whether early fragment tests were passed or not. Sometimes it
says that a fragment shader execution had no thread execution
data, which I interpret as the early fragment tests having
prevented the fragment shader from running, but it's not really
consistent, and it's never clear which results are based on
software simulation and which on the hardware run.
However taking everything into account I think the most likely
explanation is some incorrect optimization at the Metal level.
The graphics pipeline triggers an internal error in the Metal
pipeline compiler, with a completely generic error message. I have
no idea what the actual problem is.
This makes it more similar to the MSL and GLSL generators. It also looks
like a cleaner design, the backend is supposed to get access to the vsir
program after it has gone through the pipeline.
Similarly to the modulus operator, d3dbc results with constant folding
are different from results when constant folding cannot be applied, and
different from tpf results.
Note that in d3dbc target profiles it gives different results when this
operation is constant folded compared to when it is not.
This suggests that whatever pass lowers the modulus operation to d3dbc
operations doesn't do it before constant folding.
Also note that when constant folded, d3dbc results differ from tpf
results for negative operands, because of the loss of precision that
happens when NEG is constant folded.
So the same integer modulus expression can have 3 different results
depending on the context.