This effectively moves "null_event_cond" out of struct d3d12_fence and
"latch" out of struct vkd3d_waiting_event, combining the two into a
separate structure, and stores the signalling function in struct
vkd3d_waiting_event instead of getting it from struct d3d12_device. I
think that largely makes sense on its own, but storing the signalling
function in struct vkd3d_waiting_event also allows us to more easily
implement d3d12_device_SetEventOnMultipleFenceCompletion() in a
subsequent commit.
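A minimal sketch of the resulting arrangement; the structure and field
names here are assumptions for illustration, not the exact vkd3d
declarations:

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdint.h>

    typedef void *HANDLE; /* stand-in for the Windows type */

    /* Combines the fence's former "null_event_cond" with the waiting
     * event's former "latch" (names assumed). */
    struct vkd3d_null_event
    {
        pthread_cond_t cond;
        bool latch;
    };

    struct vkd3d_waiting_event
    {
        uint64_t value;
        HANDLE event;
        /* Stored per waiter instead of being fetched from struct
         * d3d12_device, so each waiter can carry its own completion
         * behaviour; that is what makes
         * d3d12_device_SetEventOnMultipleFenceCompletion() easier to
         * implement later. */
        void (*signal)(HANDLE event);
        struct vkd3d_null_event *null_event;
    };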
Since 4a94bfc2f6 we have segregated
different D3D12 descriptor types into different Vulkan descriptor sets.
This change was introduced to reduce the number of descriptors wasted
when allocating a new descriptor pool; that can be very useful when
using virtual heaps, which often have to cycle through many
descriptors, but it is expected to have limited impact for Vulkan
heaps, given that in that case most descriptors are allocated through
the descriptor heap rather than through the command allocator.
In practice, it has a rather detrimental effect with Vulkan heaps,
because it tends to use many more Vulkan descriptor sets than
necessary, often with just a handful of descriptors each. This
causes a regression on some Vulkan implementations that support
too few descriptor sets.
With this change we revert to a situation similar to before,
packing all the descriptors that do not live in a root
descriptor table into as few descriptor sets as possible (at most
one or two, depending on whether push descriptors are used).
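For illustration only (this is not vkd3d's actual code), the reverted
arrangement corresponds to a single set layout whose bindings mix
descriptor types, rather than one layout per type; the binding counts
below are arbitrary:

    #include <vulkan/vulkan.h>

    static VkResult create_mixed_set_layout(VkDevice device,
            VkDescriptorSetLayout *layout)
    {
        /* One set holds bindings of several descriptor types,
         * instead of one set per type. */
        static const VkDescriptorSetLayoutBinding bindings[] =
        {
            {0, VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, 4, VK_SHADER_STAGE_ALL, NULL},
            {1, VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE, 8, VK_SHADER_STAGE_ALL, NULL},
            {2, VK_DESCRIPTOR_TYPE_STORAGE_IMAGE, 2, VK_SHADER_STAGE_ALL, NULL},
            {3, VK_DESCRIPTOR_TYPE_SAMPLER, 4, VK_SHADER_STAGE_ALL, NULL},
        };
        const VkDescriptorSetLayoutCreateInfo info =
        {
            .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
            .bindingCount = sizeof(bindings) / sizeof(*bindings),
            .pBindings = bindings,
        };

        return vkCreateDescriptorSetLayout(device, &info, NULL, layout);
    }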
We're already implicitly using it for image layouts in which
either depth or stencil is writeable and the other is not.
Correspondingly, add the _KHR suffix in those cases, so the
extension usage is more evident.
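Concretely, assuming the extension in question is VK_KHR_maintenance2
(which is where the mixed read-only depth/stencil layouts come from),
the affected cases look like this:

    #include <stdbool.h>
    #include <vulkan/vulkan.h>

    /* Layouts in which exactly one aspect is writeable; the _KHR
     * spellings make the extension dependency evident. */
    static VkImageLayout depth_stencil_layout(bool depth_writeable)
    {
        return depth_writeable
                ? VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_STENCIL_READ_ONLY_OPTIMAL_KHR
                : VK_IMAGE_LAYOUT_DEPTH_READ_ONLY_STENCIL_ATTACHMENT_OPTIMAL_KHR;
    }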
According to the Vulkan Hardware Database, only four reports
without this extension have been filed since 2023, all of them
for configurations we likely don't target.
When the client acquires the Vulkan queue, it has to ensure that
it does not submit work that depends on other work already
submitted through the Direct3D 12 API but still pending in the
internal vkd3d queue. Currently we suggest enqueueing a fence
signal and then waiting for it before acquiring the Vulkan queue,
which is correct but excessive: it waits not just for the work
currently in the queue to be submitted, but also for it to be
executed, introducing useless dependencies.
By adding a way to enqueue signalling a fence on the CPU side we
allow the client to wait for the currently outstanding work to
be submitted to Vulkan, but nothing more.
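A rough usage sketch; this assumes the new interface is
vkd3d_queue_signal_on_cpu() with the shape shown here, and
wait_for_fence() is a hypothetical helper wrapping
ID3D12Fence_SetEventOnCompletion() and a wait on the event:

    #include <vulkan/vulkan.h>
    #include <vkd3d.h>

    /* Hypothetical helper: SetEventOnCompletion() plus a wait. */
    extern void wait_for_fence(ID3D12Fence *fence, uint64_t value);

    static void submit_external_work(ID3D12CommandQueue *queue,
            ID3D12Fence *fence)
    {
        VkQueue vk_queue;

        /* Signalled on the CPU once all previously enqueued work has
         * been submitted to Vulkan; we do not wait for it to execute. */
        vkd3d_queue_signal_on_cpu(queue, fence, 1);
        wait_for_fence(fence, 1);

        vk_queue = vkd3d_acquire_vk_queue(queue);
        /* ... vkQueueSubmit() our own work here ... */
        vkd3d_release_vk_queue(queue);
    }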
Creating a pool of 16k descriptors is wasteful if an allocator only uses
a fraction of them, so start at 1k and double for each subsequent
allocation until 16k is reached.
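A minimal sketch of the growth policy (illustrative names, not vkd3d's
actual constants):

    #define INITIAL_POOL_SIZE 1024      /* 1k */
    #define MAX_POOL_SIZE (16 * 1024)   /* 16k */

    static unsigned int next_pool_size(unsigned int current_size)
    {
        if (!current_size)
            return INITIAL_POOL_SIZE;
        if (current_size >= MAX_POOL_SIZE)
            return MAX_POOL_SIZE;
        /* Double on each subsequent allocation until the cap. */
        return current_size * 2;
    }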
Now that our Vulkan descriptor sets contain only a single vkd3d
descriptor type, we're able to create descriptor pools containing only a
single vkd3d descriptor type as well. This avoids wasting unallocated
descriptors of one type when we run out of descriptors of another type.
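An illustrative sketch of such a per-type pool (not vkd3d's actual
code); exhausting one type no longer strands unallocated descriptors
of the others:

    #include <vulkan/vulkan.h>

    static VkResult create_pool_for_type(VkDevice device,
            VkDescriptorType type, uint32_t descriptor_count,
            VkDescriptorPool *pool)
    {
        /* A single pool size entry: the pool contains only one
         * descriptor type. */
        const VkDescriptorPoolSize size = {type, descriptor_count};
        const VkDescriptorPoolCreateInfo info =
        {
            .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO,
            .maxSets = 512, /* arbitrary for the sketch */
            .poolSizeCount = 1,
            .pPoolSizes = &size,
        };

        return vkCreateDescriptorPool(device, &info, NULL, pool);
    }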
We currently create statically sized descriptor pools, shared among
different descriptor types. Once we're unable to allocate a descriptor
set from a pool, we create a new pool. The unfortunate but predictable
consequence is that when we run out of descriptors of one type, we waste
any unallocated descriptors of the other types.
Dynamically adjusting the pool sizes could mitigate the issue, but it
seems non-trivial to handle all the edge cases, particularly in
situations where the descriptor count ratios change significantly
between frames. Instead, by storing only a single vkd3d descriptor type
in each Vulkan descriptor set we're able to create separate descriptor
pools for each vkd3d descriptor type, which also avoids the issue.
The main drawback of using separate descriptor sets for each descriptor
type is that we can no longer pack all bounded descriptor ranges into a
single descriptor set, potentially leaving fewer descriptor sets
available for unbounded ranges. That seems worth it, but we may have
to switch to a more complicated strategy if this turns out to be a
problem on Vulkan implementations with a very limited number of
available descriptor sets.
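As a sketch (illustrative names again), the arrangement amounts to one
single-binding set layout per vkd3d descriptor type, which is what
allows the matching per-type pools shown earlier:

    #include <vulkan/vulkan.h>

    static VkResult create_set_layout_for_type(VkDevice device,
            VkDescriptorType type, uint32_t count,
            VkDescriptorSetLayout *layout)
    {
        const VkDescriptorSetLayoutBinding binding =
        {
            .binding = 0,
            .descriptorType = type,
            .descriptorCount = count,
            .stageFlags = VK_SHADER_STAGE_ALL,
        };
        const VkDescriptorSetLayoutCreateInfo info =
        {
            .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
            .bindingCount = 1,
            .pBindings = &binding,
        };

        return vkCreateDescriptorSetLayout(device, &info, NULL, layout);
    }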
Slightly simplifies descriptor write addressing, and makes layouts
essentially the same as array layouts, differing only in the binding
details, and therefore easier to understand. This also simplifies the
addition of storage buffer bindings, which can all be added onto the end.
The design document states, "The runtime will not clamp or validate
the input, but implementations may clamp to the range [0,1] if
necessary.", so we test for the VK_EXT_depth_range_unrestricted
extension and only clamp if it's not available (and thus necessary to
do so). NaNs are converted to zero, as per "NaNs must be treated as 0,
but the runtime will convert NaNs to 0 on behalf of the
implementation.", and the default bounds are set to 0.0 and 1.0.
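A sketch of the resulting rule (helper name and parameters are
illustrative, not vkd3d's):

    #include <math.h>
    #include <stdbool.h>

    static float sanitise_depth_bound(float value, bool unrestricted)
    {
        /* "NaNs must be treated as 0". */
        if (isnan(value))
            return 0.0f;
        /* Clamp only when VK_EXT_depth_range_unrestricted is
         * unavailable, i.e. when clamping is necessary. */
        if (!unrestricted)
            return value < 0.0f ? 0.0f : value > 1.0f ? 1.0f : value;
        return value;
    }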