The list suffers from the ABA problem, where comparison with the head
succeeds but the head's `next` pointer has changed in the meantime.
This occurs in Cyberpunk 2077, at least on NVIDIA.
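
For illustration only, one common mitigation for ABA is to pair the
head with a tag that changes on every successful update, so a stale
compare-and-swap fails even when the same head value reappears. The
sketch below assumes an index-based free list and C11 atomics. All
names (free_list, pack_head, FREE_LIST_END) are hypothetical and it is
not necessarily the approach taken by this patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define FREE_LIST_CAPACITY 16
#define FREE_LIST_END UINT32_MAX

/* Hypothetical index-based free list; not the vkd3d structure. */
struct free_list
{
    _Atomic uint64_t head;  /* low 32 bits: head index, high 32 bits: tag */
    _Atomic uint32_t next[FREE_LIST_CAPACITY];
};

static uint64_t pack_head(uint32_t tag, uint32_t index)
{
    return ((uint64_t)tag << 32) | index;
}

static void free_list_init(struct free_list *list)
{
    atomic_init(&list->head, pack_head(0, FREE_LIST_END));
}

static void free_list_push(struct free_list *list, uint32_t index)
{
    uint64_t old = atomic_load(&list->head);

    assert(index < FREE_LIST_CAPACITY);
    do
    {
        /* Link the new entry to the current head index. */
        atomic_store(&list->next[index], (uint32_t)old);
    } while (!atomic_compare_exchange_weak(&list->head, &old,
            pack_head((uint32_t)(old >> 32) + 1, index)));
}

static uint32_t free_list_pop(struct free_list *list)
{
    uint64_t old = atomic_load(&list->head);
    uint32_t index, next;

    do
    {
        index = (uint32_t)old;
        if (index == FREE_LIST_END)
            return FREE_LIST_END;
        next = atomic_load(&list->next[index]);
        /* The tag is part of the compared value, so the exchange fails
         * if the head was popped and pushed back since 'old' was read. */
    } while (!atomic_compare_exchange_weak(&list->head, &old,
            pack_head((uint32_t)(old >> 32) + 1, next)));

    return index;
}
```

The tag can still wrap around after 2^32 updates, so this narrows the
window rather than closing it in theory.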
Provides a simple way to disable Vulkan writes for non-shader-visible
heaps. It may also avoid accesses to the d3d12_device object, which
helps memory cache performance.
In this case d3d12_command_allocator_allocate_descriptor_set() is
only called for clearing UAVs. This helps on platforms with low
maximum descriptor counts.
The descriptor component of struct d3d12_desc is replaced with a union
containing a pointer which can be swapped out using
InterlockedExchangePointer(). To make it safe to increment the refcount
of such an object, it is necessary to cache freed objects. Eliminating
the descriptor mutexes nearly doubles the framerate on recent hardware
in games which use multithreaded descriptor writes.
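
A minimal sketch of the pointer swap, using C11 atomic_exchange() as a
portable stand-in for InterlockedExchangePointer(). The struct and
function names are hypothetical, and the cache shown here is a plain
singly linked list that is not itself thread-safe; it only illustrates
that released objects are parked instead of freed, so their memory
remains valid for a concurrent refcount access.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the vkd3d types; field names are assumptions. */
struct desc_object
{
    atomic_int refcount;
    int payload;
    struct desc_object *next_free;
};

struct desc
{
    _Atomic(struct desc_object *) object;
};

/* Released objects are kept here rather than returned to the allocator.
 * Not thread-safe; a real cache would need its own synchronisation. */
static struct desc_object *object_cache;

static void desc_object_decref(struct desc_object *object)
{
    int prev;

    if (!object)
        return;

    prev = atomic_fetch_sub(&object->refcount, 1);
    assert(prev > 0);
    if (prev == 1)
    {
        object->next_free = object_cache;
        object_cache = object;
    }
}

/* Publish a new object without a mutex, then drop the reference the
 * descriptor held on the object it previously pointed to. */
static void desc_write(struct desc *dst, struct desc_object *object)
{
    struct desc_object *old = atomic_exchange(&dst->object, object);

    desc_object_decref(old);
}
```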
Eliminates vk_sets_mutex. Performance on average may be lower until
the descriptor mutexes are replaced and Vulkan writes are buffered
to reduce thunk calls.
In practice they never fail. If they fail, it means that there
is some underlying platform problem and there is little we can do
anyway. Under pthreads, the function prototypes allow returning
failure, but that's only used for "error checking" mutexes, which we
don't use.
On the other hand, error handling in vkd3d is rather inconsistent:
sometimes the errors are ignored, sometimes logged, sometimes
passed to the caller. It's hard to handle failures appropriately
if you can't even keep your state consistent, so I think it's
better to avoid trying, assume that synchronization primitives do
not fail and at least have consistent logging if something goes
wrong.
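
As a rough sketch of that approach under pthreads, the wrappers return
void and log any unexpected error in one place. The sketch_mutex names
and the message wording are illustrative assumptions, not the helpers
actually added by this patch.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical wrapper type; the name is an assumption. */
struct sketch_mutex
{
    pthread_mutex_t lock;
};

static inline void sketch_mutex_init(struct sketch_mutex *mutex)
{
    int ret = pthread_mutex_init(&mutex->lock, NULL);

    if (ret)
        fprintf(stderr, "Could not initialize the mutex, error %d.\n", ret);
}

static inline void sketch_mutex_lock(struct sketch_mutex *mutex)
{
    int ret = pthread_mutex_lock(&mutex->lock);

    /* Callers cannot recover from this, so log it and carry on. */
    if (ret)
        fprintf(stderr, "Could not lock the mutex, error %d.\n", ret);
}

static inline void sketch_mutex_unlock(struct sketch_mutex *mutex)
{
    int ret = pthread_mutex_unlock(&mutex->lock);

    if (ret)
        fprintf(stderr, "Could not unlock the mutex, error %d.\n", ret);
}

static inline void sketch_mutex_destroy(struct sketch_mutex *mutex)
{
    int ret = pthread_mutex_destroy(&mutex->lock);

    if (ret)
        fprintf(stderr, "Could not destroy the mutex, error %d.\n", ret);
}
```

Callers then have no return value to check, which removes the
inconsistent per-call-site handling.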
A pointer to the containing descriptor heap can be derived from this
information.
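
The text above does not say what the stored information is. Purely as
an assumption for illustration, if each descriptor stored its own index
within the heap's descriptor array, the heap pointer could be recovered
with pointer arithmetic as sketched below. The sketch_desc and
sketch_heap layouts are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_HEAP_SIZE 8

/* Hypothetical layouts; the real structures differ. */
struct sketch_desc
{
    unsigned int index;  /* assumed: position within the heap array */
};

struct sketch_heap
{
    unsigned int desc_count;
    struct sketch_desc descriptors[SKETCH_HEAP_SIZE];
};

static struct sketch_heap *sketch_heap_from_desc(const struct sketch_desc *desc)
{
    const struct sketch_desc *first;

    assert(desc->index < SKETCH_HEAP_SIZE);
    /* Step back to element 0, then back to the start of the heap. */
    first = desc - desc->index;
    return (struct sketch_heap *)((char *)first
            - offsetof(struct sketch_heap, descriptors));
}
```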
The PE build of vkd3d uses Windows critical sections for
synchronisation, and these slow down at the very high lock/unlock rate
seen during multithreaded descriptor copying in Shadow of the Tomb
Raider. This patch speeds up the
demo by about 8%. By comparison, using SRW locks in the allocators and
locking them for read only where applicable is about 4% faster.