Provides a simple way to disable Vulkan writes for non-shader-visible
heaps. Also there is a chance of avoiding access to the d3d12_device
object which helps memory cache performance.
In this case d3d12_command_allocator_allocate_descriptor_set() is
only called for clearing UAVs. This helps on platforms with limited
descriptor maximum counts.
The descriptor component of struct d3d12_desc is replaced with a union
containing a pointer which can be swapped out using
InterlockedExchangePointer(). To make it safe to increment the refcount
of such an object it is necessary to cache freed objects. Elimination
of the descriptor mutexes on games which use multithreaded descriptor
writes nearly doubles framerate on recent hardware.
Eliminates vk_sets_mutex. Performance on average may be lower until
the descriptor mutexes are replaced and Vulkan writes are buffered
to reduce thunk calls.
d3d12_command_queue_flush_ops() can renter itself while processing signal
events. Since we don't use recursive mutexes, we currently have to check
some of the queue variables without holding the mutex, which is not safe.
This is solved by allowing the queue to release its mutex while it is
processing entries: when flushing, the queue is briefly locked, the
is_flushing flag is set, the queue content is copied away and the
queue is unlocked again. After having processed the entries, the
queue is locked again to check is something else was added in the
meantime. This is repeated until the queue is empty (or a wait operation
is blocking it).
This should also remove some latency when a thread pushes to the queue
while another one is processing it, but I didn't try to measure any
impact. While it is expected that with this patch the queue mutex
will be locked and unlocked more frequently, it should also remain
locked for less time, hopefully creating little contention.
The goal is to simplify the CS queue handling: with this and the following
changes operations are always started by d3d12_command_queue_flush_ops(),
in order to make further refactoring easier.
Notice that while with this change executing an operation on an empty CS
queue is a bit less efficient, it doesn't require more locking. On the other
hand, this change paves the road for executing CS operations without holding
the queue lock.
Otherwise it could be added more than once.
Note that the deleted comment is wrong: between when d3d12_command_queue_flush_ops()
returns and when the queue is added back to the blocked list, the queue
might have been pushed to and flushed an arbitrary number of times.
In practice they never fail. If they fail, it means that there
is some underlying platform problem and there is little we can do
anyway. Under pthreads function prototypes allow returning failure,
but that's only used for "error checking" mutexes, which we
don't use.
On the other hand, error handling in vkd3d is rather inconsistent:
sometimes the errors are ignored, sometimes logged, sometimes
passed to the caller. It's hard to handle failures appropriately
if you can't even keep your state consistent, so I think it's
better to avoid trying, assume that synchronization primitives do
not fail and at least have consistent logging if something goes
wrong.