VKD3D_SM4_OP_DCL_RESOURCE currently has 1 for "dst_count", but NULL for
"dst". This is largely harmless because we never attempt to access the
destination register of VKD3DSIH_DCL instructions, but nevertheless not
quite proper.
d3d12_command_queue_flush_ops() can renter itself while processing signal
events. Since we don't use recursive mutexes, we currently have to check
some of the queue variables without holding the mutex, which is not safe.
This is solved by allowing the queue to release its mutex while it is
processing entries: when flushing, the queue is briefly locked, the
is_flushing flag is set, the queue content is copied away and the
queue is unlocked again. After having processed the entries, the
queue is locked again to check is something else was added in the
meantime. This is repeated until the queue is empty (or a wait operation
is blocking it).
This should also remove some latency when a thread pushes to the queue
while another one is processing it, but I didn't try to measure any
impact. While it is expected that with this patch the queue mutex
will be locked and unlocked more frequently, it should also remain
locked for less time, hopefully creating little contention.
The goal is to simplify the CS queue handling: with this and the following
changes operations are always started by d3d12_command_queue_flush_ops(),
in order to make further refactoring easier.
Notice that while with this change executing an operation on an empty CS
queue is a bit less efficient, it doesn't require more locking. On the other
hand, this change paves the road for executing CS operations without holding
the queue lock.
Otherwise it could be added more than once.
Note that the deleted comment is wrong: between when d3d12_command_queue_flush_ops()
returns and when the queue is added back to the blocked list, the queue
might have been pushed to and flushed an arbitrary number of times.
Otherwise it's not clear which clauses in vkd3d_shader_compile() really
apply to other functions. For example, many of the functions currently
refering to vkd3d_shader_compile() don't even take a vkd3d_shader_compile_info
parameter.
Without this patch, dp3 and dp4 map src swizzles to the dst writemask,
which is not correct.
Before b84f560bdf, these ops worked
despite this, because the dst register had, incorrectly, the full
writemask.
To solve this problem, write_sm1_binary_op_dot() is introduced,
similarly to write_sm4_binary_op_dot().