We currently check that non-shader-visible heaps have a NULL
handle, but that doesn't seem to be guaranteed: beside WARP, also
NVIDIA drivers still return a valid pointer. And that's a pretty
useless check anyway; rather, check that shader-visible heaps
have a valid pointer, which is more interesting.
Currently the tests expect that creating buffers in COMMON or
COPY_SOURCE state on UPLOAD heaps or in COMMON state on READBACK
heaps leads to a failure. I tested WARP, AMD and NVIDIA, and in
all cases the operations is successful.
I think the D3D12 runtime used reject resources created in the
configurations detailed above, but it doesn't any more (both
using the latest Agility SDK and the runtime distributed with
an updated Windows 11 system). However the CI still uses an
earlier runtime, so the old behavior is still allowed as
broken.
It seems that the NVIDIA drivers leaves VBVs bindings untouched
when they are NULL (or the GPU buffer address is NULL), instead of
setting them to a null binding.
Differently from other cases of inconsistent behaviour between AMD
and NVIDIA, here I'm explicitly marking the NVIDIA behaviour as
broken, because the expected behaviour is spelled out explicitly
(at least for the D3D12 specification standards).
It's hard to pinpoint exactly what's going wrong with these
tests. They seem to be related to atomics and GPU timestamps,
both categories that are known to have problems on MoltenVK in a
way or another; those failures clearly depend on a few factors
like the MoltenVK version, the macOS version and whether we're in
a virtual machine or not, but the exact dependency on those factors
is hard to describe (for example, in general the paravirtualized
device offered inside virtual machines has a lot more problems than
real devices, but I've seen tests, fixed all other conditions,
working on the paravirtualized device and not on the real device).
The only thing all tests in this batch have in common is that I've
never seen them fail on a Sequoia system, thus I've settled for
using just that as the bug_if() condition. Ultimately, wasting a
lot of time to get to the bottom of each single test failure is
pointless, and being able to mark the CI job as not allowed to
fail gives better regression protection than investigating each
of those. Also, I routinely run the tests on a Sequoia system, so
if these tests get broken this is going to be noticed anyway.
On MoltenVK it seems that all draws are always executed,
independently of the early depth stencil test. The problem doesn't
seem to belong to vkd3d or MoltenVK, because the generated Metal
commands look correct. I tried looking at a GPU capture with Xcode,
which was not very conclusive because it doesn't state clearly
whether early fragment tests were passed or not. Sometimes it
says that a fragment shader execution had no thread execution
data, which I interpret as the early fragment tests having
prevented the fragment shader from running, but it's not really
consistent, and it's never clear which results are based on
software simulation and which on the hardware run.
However taking everything into account I think the most likely
explanation is some incorrect optimization at the Metal level.
The graphics pipeline triggers an internal error in the Metal
pipeline compiler, with a completely generic error message. I have
no idea what the actual problem is.
As far as I know there is no way to implement this properly on
Vulkan, and all the other Vulkan implementations essentially work
by luck. In Vulkan the initial layout of a resource must always be
UNDEFINED or PREINITIALIZED and it must be transitioned away from
before any meaningful use of that image is done. Therefore it's
possible to alias two images and let the second one inherit the
content in the first one only if both already exist (and are in
the same layout) before the first writing is done. If, as in this
example, the second image is created after the first one has
already been written to, the obligatory transition away from
UNDEFINED or PREINITIALIZED will potentially wipe out the content.
Therefore I am marking this as todo, not as a bug. I might also be
that there is a bug in MoltenVK, and ultimately that's the reason
why we're reading invalid data, but technically the Vulkan
commands we generate are incorrect anyway.
When textures[1] is read for the second time it is aliased to
textures[0]. But textures[0] was left in COPY_DEST state (since
its creation), and textures[1] is currently recreated in COPY_SOURCE
state, which AFAIU is invalid. So recreate textures[1] in COPY_DEST
state and then transition it before reading it.
I haven't investigated the actual problem here, but the generated
Vulkan commands look correct (and work with basically all other
Vulkan implementations) and MoltenVK is known to have incomplete
tessellation support, so it's likely that the problem is there.
Similarly to 5d4edba925, it seems
that sometimes MoltenVK returns 0 to timestamp queries. It
doesn't happen deterministically and it might depend on the
hardware (I have seen differences between the M2 I used some
time ago and the M3 Max I have now).