Sometimes SM1-3 shaders contain write masks that exceed the
signature element masks. That happens because SM1-3 shaders do not
have a concept of signature and signature masks, and OTOH aren't
always able to express any given write mask.
In VSIR we don't want to deal with I/O register masks exceeding the
corresponding signature element mask or usage mask, because, for
instance, for higher shader models it can complicate dealing with
DCL_INDEX_RANGE. In order to have uniform rules for all shader
models we normalise masks coming from SM1-3 shaders.
We don't do that normalisation when disassembling, in order to
preserve the expected output.
The previous names "not normalised" and "fully normalised" have meanings
which are likely to change with time. OTOH including a description of the
normalisation level in the enumerant seems excessive. Relating
normalisation levels to shader model versions might be a reasonable
compromise.
The alternative to adding the vsir_program->tess_output_primitive and
vsir_program->tess_partitioning fields would be to emit the vsir
DCL_TESSELLATOR_OUTPUT_PRIMITIVE and DCL_TESSELLATOR_PARTITIONING
instructions, like DXIL does, but I think that the preference is to store
these kind of data directly in the vsir_program.
Most I/O registers are already described by the shader signatures.
The registers that are not do not have any property other then
being used by the program or not, so they can be collectively
described with a bitmap.
Since the sort index is just a convenience field it is more
appropriate to only set it where it is required, instead of
requiring all frontends and passes to retain sensible values for
it.
Our ASM dumper currently hides or interprets some register indices
in order to match users expectations. This can be inconvenient for
developers, though, because it makes it harder to understand what's
really going on in the VSIR code when reading logs. With this change
the whole index structure is dumped.
Numeric types are used very frequently, and doing a tree search
each time one is needed tends to waste a lot of time.
I ran the compilation of ~1000 DXBC-TPF shaders randomly taken from
my collection and measured the performance using callgrind and the
kcachegrind "cycle count" estimation.
BEFORE:
* 1,764,035,136 cycles
* 1,767,948,767 cycles
* 1,773,927,734 cycles
AFTER:
* 1,472,384,755 cycles
* 1,469,506,188 cycles
* 1,470,191,425 cycles
So callgrind would estimate a 16% improvement at least.
shader_signature_find_element_for_reg() is also used in the TPF parser,
where the program has not been validated yet, so it must not crash
on errors.
The I/O normaliser can instead assume that the shader is already
validated.
This fixes a crash with a shader used by The Falconeer. The bug is still
present, because the shader will be incorrectly rejected, but at least
the vkd3d-shader will fail gracefully.