Numeric types are used very frequently, and doing a tree search
each time one is needed tends to waste a lot of time.
I ran the compilation of ~1000 DXBC-TPF shaders randomly taken from
my collection and measured the performance using callgrind and the
kcachegrind "cycle count" estimation.
BEFORE:
* 1,764,035,136 cycles
* 1,767,948,767 cycles
* 1,773,927,734 cycles
AFTER:
* 1,472,384,755 cycles
* 1,469,506,188 cycles
* 1,470,191,425 cycles
So callgrind would estimate a 16% improvement at least.
shader_signature_find_element_for_reg() is also used in the TPF parser,
where the program has not been validated yet, so it must not crash
on errors.
The I/O normaliser can instead assume that the shader is already
validated.
This fixes a crash with a shader used by The Falconeer. The bug is still
present, because the shader will be incorrectly rejected, but at least
the vkd3d-shader will fail gracefully.
The hlsl_ir_compile node is introduced to represent the "compile"
syntax, and later the CompileShader() and ConstructGSWithSO()
constructs.
It basically represents a function call that remembers its arguments
using hlsl_srcs and keeps its own instruction block, which is discarded
when working on non-effect shaders.
For shader compilations it can be asserted that args_count is 1, and
that this argument (and the last node in hlsl_ir_effect_call.instrs)
is a regular hlsl_ir_call pointing to the declaration of the function
to be compiled.