I ran the compilation of ~1000 DXBC-TPF shaders randomly taken from
my collection and measured the performance using callgrind and the
kcachegrind "cycle count" estimation.
BEFORE:
* 1,846,641,596 cycles
* 1,845,635,336 cycles
* 1,841,335,225 cycles
AFTER:
* 1,764,035,136 cycles
* 1,767,948,767 cycles
* 1,773,927,734 cycles
So callgrind would estimate a 3.6% improvement at least.
The counterpoint is that the caller might get an allocation that
is potentially bigger than necessary. I would expect that allocation
to be rather short-lived anyway, so that's probably not a problem.
shader_signature_find_element_for_reg() is also used in the TPF parser,
where the program has not been validated yet, so it must not crash
on errors.
The I/O normaliser can instead assume that the shader is already
validated.
This fixes a crash with a shader used by The Falconeer. The bug is still
present, because the shader will be incorrectly rejected, but at least
the vkd3d-shader will fail gracefully.