When the client acquires the Vulkan queue, it has to ensure that it does not submit work before other work it depends on, which has already been submitted through the Direct3D 12 API but is still sitting in the internal vkd3d queue. Currently we suggest enqueueing a fence signal and then waiting for it before acquiring the Vulkan queue. That is correct, but excessive: it waits not just for the work currently in the queue to be submitted, but also for it to be executed, introducing useless dependencies.
By adding a way to enqueue signalling a fence on the CPU side we
allow the client to wait for the currently outstanding work to
be submitted to Vulkan, but nothing more.
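For illustration, the intended flow might look like the sketch below. The signature of the CPU-side signalling helper is an assumption made for the example; vkd3d_acquire_vk_queue() and the D3D12 calls are existing API.

    /* Sketch: wait only until the outstanding D3D12 work has been submitted
     * to Vulkan, then take the queue. vkd3d_queue_signal_on_cpu() stands in
     * for the new entry point; its exact signature is assumed here. */
    static VkQueue acquire_vk_queue_after_submission(ID3D12CommandQueue *queue,
            ID3D12Fence *fence, uint64_t value, HANDLE event)
    {
        /* The fence is signalled from the CPU as soon as the work ahead of
         * it has been handed to Vulkan, without waiting for the GPU to
         * execute it. */
        vkd3d_queue_signal_on_cpu(queue, fence, value);
        ID3D12Fence_SetEventOnCompletion(fence, value, event);
        WaitForSingleObject(event, INFINITE);
        return vkd3d_acquire_vk_queue(queue);
    }

The caller is still expected to hand the queue back with vkd3d_release_vk_queue() once its direct Vulkan submissions are done.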
We're explicitly replacing zero with one in the only place where a plane count of zero rather than one makes a difference. It also makes sense: UNKNOWN is used for buffers, which for all intents and purposes have one plane.
Otherwise ubsan reports runtime errors such as:
libs/vkd3d-shader/ir.c:4731:5: runtime error: null pointer passed as argument 1, which is declared to never be null
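The representative fix (with hypothetical names, not the actual ir.c code) is to guard such calls, since passing a null pointer to memcpy() is undefined behaviour even when the size is zero:

    /* Skip the call entirely when there is nothing to copy; a NULL source
     * or destination is invalid for memcpy() regardless of the size. */
    if (count)
        memcpy(dst, src, count * sizeof(*src));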
Currently, when using Vulkan heaps, we create descriptor set layouts with as many descriptors as the Vulkan implementation limits allow. For some implementations this can mean hundreds of millions of descriptors or more, which is wasteful, given that even on the highest resource binding tier Direct3D 12 applications should not expect to have more than a million usable descriptors.
This recently became a problem: since Mesa 24.2.7 the Intel driver advertises more than 200 million descriptors, but pipeline compilation takes RAM linear in the number of descriptors declared in the pipeline layout. This means that compiling even a simple shader requires 10-20 GB of RAM.
To avoid using too much memory, with this commit we clamp the number of descriptors declared in the set layouts to the number we actually need to guarantee tier 3 resource binding support.
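A minimal sketch of the clamping, with an assumed constant standing in for the tier 3 requirement:

    /* Tier 3 resource binding guarantees heaps of 1,000,000 descriptors;
     * declaring more than can ever be used only costs memory. The constant
     * is illustrative. */
    static unsigned int clamp_descriptor_count(unsigned int device_limit)
    {
        const unsigned int required = 1000000;

        return device_limit < required ? device_limit : required;
    }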
Creating a pool of 16k descriptors is wasteful if an allocator only uses
a fraction of them, so start at 1k and double for each subsequent
allocation until 16k is reached.
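The growth policy amounts to something like the following sketch (names illustrative):

    /* Start at 1k descriptors and double on each new pool until the 16k
     * cap is reached. */
    static unsigned int next_pool_size(unsigned int current_size)
    {
        if (!current_size)
            return 1024;
        return current_size < 16384 ? current_size * 2 : 16384;
    }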
Now that our Vulkan descriptor sets contain only a single vkd3d
descriptor type, we're able to create descriptor pools containing only a
single vkd3d descriptor type as well. This avoids wasting unallocated
descriptors of one type when we run out of descriptors of another type.
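Creating such a pool then takes a single VkDescriptorPoolSize entry; a sketch under assumed names:

    static VkResult create_single_type_pool(VkDevice vk_device,
            VkDescriptorType vk_descriptor_type, uint32_t descriptor_count,
            uint32_t max_sets, VkDescriptorPool *vk_pool)
    {
        VkDescriptorPoolSize pool_size =
        {
            .type = vk_descriptor_type,
            .descriptorCount = descriptor_count,
        };
        VkDescriptorPoolCreateInfo pool_info =
        {
            .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO,
            .maxSets = max_sets,
            .poolSizeCount = 1,
            .pPoolSizes = &pool_size,
        };

        /* Exhausting this pool cannot strand unallocated descriptors of
         * any other type. */
        return vkCreateDescriptorPool(vk_device, &pool_info, NULL, vk_pool);
    }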
We currently create statically sized descriptor pools, shared among
different descriptor types. Once we're unable to allocate a descriptor
set from a pool, we create a new pool. The unfortunate but predictable
consequence is that when we run out of descriptors of one type, we waste
any unallocated descriptors of the other types.
Dynamically adjusting the pool sizes could mitigate the issue, but it
seems non-trivial to handle all the edge cases, particularly in
situations where the descriptor count ratios change significantly
between frames. Instead, by storing only a single vkd3d descriptor type
in each Vulkan descriptor set we're able to create separate descriptor
pools for each vkd3d descriptor type, which also avoids the issue.
The main drawback of using separate descriptor sets for each descriptor
type is that we can no longer pack all bounded descriptor ranges into a
single descriptor set, potentially leaving fewer descriptor sets
available for unbounded ranges. That seems a worthwhile trade-off, but we may have to switch to a more complicated strategy if this turns out to be a problem on Vulkan implementations with a very limited number of available descriptor sets.
ERR is used to indicate internal inconsistencies in vkd3d. That's not the case here; we simply have to forward the error condition to the caller.
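The change therefore follows the usual pattern (the call name is hypothetical):

    if (FAILED(hr = vkd3d_some_operation(device)))
    {
        /* WARN rather than ERR: the failure originates outside vkd3d and
         * is simply forwarded. */
        WARN("Operation failed, hr %#x.\n", hr);
        return hr;
    }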
This fixes failures on the CI with llvmpipe, because the build we use is
compiled without support for VK_KHR_surface and related extensions.
Without LTO, gcc doesn't know that hresult_from_vk_result() will always return a
failure HRESULT for a failure VkResult, and so thinks that we might exit from
vkd3d_check_device_extensions() with a success HRESULT but without initializing
vk_extensions.
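Schematically (names and types assumed), the situation looks like this:

    /* In the callee: */
    if ((vr = vkEnumerateDeviceExtensionProperties(vk_physical_device,
            NULL, &count, NULL)) < 0)
        return hresult_from_vk_result(vr); /* always a failure HRESULT */
    *vk_extensions = count;
    return S_OK;

    /* In the caller: */
    uint32_t vk_extensions;
    if (FAILED(hr = vkd3d_check_device_extensions(device, &vk_extensions)))
        return hr;
    /* Without LTO, gcc cannot rule out reaching this read with
     * vk_extensions never having been written. */
    TRACE("%u device extensions.\n", vk_extensions);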
Currently the mutable descriptor set is repeated many times in the pipeline layout in order to cover the indices of all the descriptor types that would be present if mutable descriptors were not used. This is useless and wasteful, but it was necessary before the descriptor sets backing the SRV-UAV-CBV heap were moved to the end of the allocation table, because descriptor set indices are currently a compile-time constant in many places.
Now that is no longer needed, and we can avoid putting many copies of the mutable descriptor set in the pipeline layout, making it easier to stay within Vulkan implementation limits.
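Schematically, the pipeline layout goes from repeating the mutable set to a single copy (illustrative names and counts):

    /* Before: the mutable set repeated to cover every index a non-mutable
     * configuration would occupy. */
    VkDescriptorSetLayout before[] = {mutable_layout, mutable_layout,
            mutable_layout, mutable_layout, sampler_layout};

    /* After: one copy is enough, leaving more sets available under
     * maxBoundDescriptorSets. */
    VkDescriptorSetLayout after[] = {mutable_layout, sampler_layout};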
So that when mutable descriptors are in use we can avoid putting
the other descriptor sets backing the SRV-UAV-CBV descriptor heap
in the pipeline layout altogether.
So we avoid hardcoding that it is number zero. There are two goals here: first, making the code easier to understand; second, allowing the descriptor set allocation to be reshuffled in a later commit.
The descriptor heap implementation is a rather central part of vkd3d's behavior, so it's useful to have all the relevant information logged in a single place.
Slightly simplifies descriptor write addressing, and makes layouts
essentially the same as array layouts, differing only in the binding
details, and therefore easier to understand. This also simplifies the
addition of storage buffer bindings, which can all be added onto the end.
Allows descriptor set layouts to be created after all bindings are
mapped. This is less complex and fragile than the current scheme, and in
a future patch it will support separating descriptor types into different
sets. Descriptors on virtual heaps are currently allocated from pools
which contain an equal number of each descriptor type used by vkd3d, and
this can waste a significant amount of device memory.
Enables the bounded range to be mapped to the unbounded one, instead of
being mapped to a separate binding which will be populated from the same
d3d12 descriptors as the unbounded one.
The design document states: "The runtime will not clamp or validate the input, but implementations may clamp to the range [0,1] if necessary." We therefore test for the EXT_depth_range_unrestricted extension and clamp only if it's not available (and thus clamping is necessary).
NaNs are converted to zero as per "NaNs must be treated as 0, but the runtime will convert NaNs to 0 on behalf of the implementation.", and the default bounds are set to 0.0 and 1.0.
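A minimal sketch of the resulting per-bound handling, assuming a flag indicating VK_EXT_depth_range_unrestricted support:

    #include <math.h>
    #include <stdbool.h>

    /* NaNs are always turned into 0; clamping to [0, 1] only happens when
     * the extension is unavailable and clamping is thus necessary. */
    static float normalise_depth_bound(float value, bool unrestricted)
    {
        if (isnan(value))
            return 0.0f;
        if (!unrestricted && value < 0.0f)
            return 0.0f;
        if (!unrestricted && value > 1.0f)
            return 1.0f;
        return value;
    }

The default bounds of 0.0 and 1.0 are valid either way and need no clamping.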