The primary goal here is to move the compilation profile type and version
check out of the parsing stage. Default values for parameters were
never subjected to this fixup, and it does look TPF-specific, so move
it where it belongs.
Signed-off-by: Nikolay Sivov <nsivov@codeweavers.com>
Our ASM dumper currently hides or interprets some register indices
in order to match users' expectations. This can be inconvenient for
developers, though, because it makes it harder to understand what's
really going on in the VSIR code when reading logs. With this change
the whole index structure is dumped.
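As a rough illustration of dumping "the whole index structure", here is a minimal sketch. It assumes a hypothetical, simplified register type: example_register, example_index and dump_register_full are made up for this example and do not match the real VSIR structures. Every index is printed in order, with nothing hidden or reinterpreted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified register description (not the real VSIR layout). */
struct example_index
{
    unsigned int offset;
};

struct example_register
{
    const char *prefix;          /* register file name, e.g. "cb" */
    unsigned int idx_count;      /* number of valid entries in idx[] */
    struct example_index idx[3];
};

/* Write the prefix followed by every index as "[n]", without hiding or
 * reinterpreting any of them. Output is truncated if "size" is too small;
 * "size" must be at least 1. */
static void dump_register_full(char *buf, size_t size, const struct example_register *reg)
{
    size_t len;
    unsigned int i;
    int ret;

    assert(size > 0 && reg->idx_count <= 3);
    if ((ret = snprintf(buf, size, "%s", reg->prefix)) < 0)
        return;
    len = (size_t)ret;
    for (i = 0; i < reg->idx_count && len < size; ++i)
    {
        if ((ret = snprintf(buf + len, size - len, "[%u]", reg->idx[i].offset)) < 0)
            return;
        len += (size_t)ret;
    }
}
```

For an illustrative register with prefix "cb" and indices 0, 0 and 5, this writes cb[0][0][5], so a developer reading the log sees all three indices rather than a user-friendly abbreviation.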
SPIR-V images have a "depth" parameter that, as far as I understand
(the spec doesn't look terribly clear in that regard), specifies
whether the image can be used for depth-comparison operations.
In TPF (and therefore in VSIR) the same information is specified
on the sampler type instead of on the image type. This puts us in
a hard spot, because in principle an image can be used with
many different samplers, and the mapping might even be unknown
at compilation time, so it's not clear how we should define our
images.
We currently have some algorithms to deal with that, but they are
incomplete and lead to SPIR-V validation errors like:
Expected Image to have the same type as Result Type Image
%63 = OpSampledImage %62 %59 %61
The problem here is that the image has a non-depth type, but is
being sampled as a depth image. This check was added recently to
SPIRV-Tools, which is how we became aware of the problem.
As I said, it's not easy in general to decide whether an image is
going to be sampled with depth-comparison operators or not.
Fortunately the SPIR-V spec allows marking the depth parameter as
unknown (using value 2), so until we come up with something better
we use that for all images to please the validator and avoid
giving misleading hints to the driver.
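To make the "value 2" concrete, the sketch below encodes a 2D OpTypeImage instruction by hand. The opcode and operand values (OpTypeImage = 25, Dim2D = 1, image format Unknown = 0, Depth 0/1/2) come from the SPIR-V specification; they are spelled out as local macros instead of using the official SPIR-V headers only so the example is self-contained, and encode_sampled_image_type_2d is a hypothetical helper, not the vkd3d-shader API.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the SPIR-V specification. */
#define SPIRV_OP_TYPE_IMAGE        25u
#define SPIRV_DIM_2D               1u
#define SPIRV_IMAGE_FORMAT_UNKNOWN 0u

/* OpTypeImage "Depth" operand: 0 = not a depth image, 1 = depth image,
 * 2 = no indication as to whether it is a depth image. */
enum spirv_image_depth
{
    SPIRV_IMAGE_DEPTH_NO = 0,
    SPIRV_IMAGE_DEPTH_YES = 1,
    SPIRV_IMAGE_DEPTH_UNKNOWN = 2,
};

/* Encode a non-arrayed, single-sampled 2D image type meant to be used
 * with a sampler. Writes 9 words and returns the word count. */
static unsigned int encode_sampled_image_type_2d(uint32_t words[9],
        uint32_t result_id, uint32_t sampled_type_id)
{
    const uint32_t word_count = 9;

    assert(result_id && sampled_type_id); /* SPIR-V ids are never 0 */
    words[0] = (word_count << 16) | SPIRV_OP_TYPE_IMAGE;
    words[1] = result_id;
    words[2] = sampled_type_id;
    words[3] = SPIRV_DIM_2D;
    /* We can't tell here whether the image will be sampled with a
     * comparison sampler, so don't commit to either answer. */
    words[4] = SPIRV_IMAGE_DEPTH_UNKNOWN;
    words[5] = 0; /* Arrayed: no */
    words[6] = 0; /* MS: single-sampled */
    words[7] = 1; /* Sampled: used with a sampler */
    words[8] = SPIRV_IMAGE_FORMAT_UNKNOWN;
    return word_count;
}
```

Since the Depth operand no longer depends on how the image is used, the same image type works whether the matching OpSampledImage ends up feeding a regular or a depth-comparison sample.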
Numeric types are used very frequently, and doing a tree search
each time one is needed tends to waste a lot of time.
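One common way to avoid the repeated tree search is a small, directly indexed cache in front of it. The sketch below assumes numeric types are keyed only by component type and component count; example_get_numeric_type, example_emit_type_slow and the enum are hypothetical names, and the slow path merely stands in for the existing tree-based lookup.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical component types; the real compiler has its own enum. */
enum example_component_type
{
    EXAMPLE_TYPE_FLOAT,
    EXAMPLE_TYPE_INT,
    EXAMPLE_TYPE_UINT,
    EXAMPLE_TYPE_BOOL,
    EXAMPLE_TYPE_COUNT,
};

#define EXAMPLE_MAX_COMPONENTS 4

struct example_type_cache
{
    /* 0 means "not emitted yet"; SPIR-V ids are never 0. */
    uint32_t ids[EXAMPLE_TYPE_COUNT][EXAMPLE_MAX_COMPONENTS];
    uint32_t next_id;
    unsigned int slow_lookups; /* how often the slow path ran */
};

/* Stand-in for the generic (tree-based) type lookup/emission path. */
static uint32_t example_emit_type_slow(struct example_type_cache *cache,
        enum example_component_type type, unsigned int component_count)
{
    (void)type;
    (void)component_count;
    ++cache->slow_lookups;
    return cache->next_id++;
}

/* Look the type up in a directly indexed array first, and only take the
 * slow path the first time a given numeric type is requested. */
static uint32_t example_get_numeric_type(struct example_type_cache *cache,
        enum example_component_type type, unsigned int component_count)
{
    uint32_t *slot;

    assert(type < EXAMPLE_TYPE_COUNT);
    assert(component_count >= 1 && component_count <= EXAMPLE_MAX_COMPONENTS);
    slot = &cache->ids[type][component_count - 1];
    if (!*slot)
        *slot = example_emit_type_slow(cache, type, component_count);
    return *slot;
}
```

The trade-off is a fixed-size table per compilation in exchange for a constant-time lookup on the hot path.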
I ran the compilation of ~1000 DXBC-TPF shaders randomly taken from
my collection and measured the performance using callgrind and the
kcachegrind "cycle count" estimation.
BEFORE:
* 1,764,035,136 cycles
* 1,767,948,767 cycles
* 1,773,927,734 cycles
AFTER:
* 1,472,384,755 cycles
* 1,469,506,188 cycles
* 1,470,191,425 cycles
So callgrind estimates an improvement of at least 16%.
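One way to read "at least" is to compare the fastest BEFORE run with the slowest AFTER run. The helper below (a hypothetical name, written only to check the arithmetic) does exactly that; with the three runs per side listed above, it gives roughly 16.5%.

```c
#include <assert.h>
#include <stddef.h>

/* Conservative improvement estimate: compare the best (lowest) "before"
 * cycle count with the worst (highest) "after" one, and return the
 * relative reduction in percent. */
static double conservative_improvement(const double *before, size_t before_count,
        const double *after, size_t after_count)
{
    double best_before, worst_after;
    size_t i;

    assert(before_count > 0 && after_count > 0);
    best_before = before[0];
    worst_after = after[0];
    for (i = 1; i < before_count; ++i)
        if (before[i] < best_before)
            best_before = before[i];
    for (i = 1; i < after_count; ++i)
        if (after[i] > worst_after)
            worst_after = after[i];
    return 100.0 * (1.0 - worst_after / best_before);
}
```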
If the condition and argument types are compatible, i.e. there is no broadcast,
the resulting shape should be the shape of the arguments, not the shape of the
condition.
Currently, if an expression successfully parses according to the bison grammar,
but for one reason or another cannot generate a meaningful IR instruction, we
abort parsing with YYABORT. This includes, for example, an undefined variable or
function, invalid swizzle or field reference, or a constructor with a complex or
non-numeric data type.
Aborting parsing is unfortunate, however, because it means that any further
errors in the program are not reported to the programmer in the same
compilation, increasing the number of times they will need to fix errors and
recompile.
The idea of this patch is that any such expression will instead generate an IR
node whose data type is of HLSL_CLASS_ERROR. Any further expression which would
consume an "error" typed instruction will immediately return an expression of
type "error" (probably the same one), instead of aborting or doing any other
type-checking.
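The propagation rule can be sketched as follows. This is a minimal illustration on a hypothetical, heavily simplified IR: example_node, EXAMPLE_CLASS_ERROR (standing in for HLSL_CLASS_ERROR) and example_add_binary_expr are made-up names, not the vkd3d-shader HLSL API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified IR: only the type class matters here. */
enum example_type_class
{
    EXAMPLE_CLASS_SCALAR,
    EXAMPLE_CLASS_VECTOR,
    EXAMPLE_CLASS_ERROR, /* stands in for HLSL_CLASS_ERROR */
};

struct example_node
{
    enum example_type_class type_class;
};

static struct example_node *example_new_node(enum example_type_class type_class)
{
    struct example_node *node;

    if ((node = malloc(sizeof(*node))))
        node->type_class = type_class;
    return node;
}

/* If either operand is already an error, return that error node as-is:
 * no type checking, no further diagnostics, and no YYABORT. */
static struct example_node *example_add_binary_expr(struct example_node *arg1,
        struct example_node *arg2)
{
    if (arg1->type_class == EXAMPLE_CLASS_ERROR)
        return arg1;
    if (arg2->type_class == EXAMPLE_CLASS_ERROR)
        return arg2;
    /* Real type checking would go here; in this sketch the result
     * simply takes the first operand's class. */
    return example_new_node(arg1->type_class);
}
```

Because the error node is returned unchanged, a single mistake produces a single diagnostic, while unrelated mistakes elsewhere in the program are still reported.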
Currently these "error" instructions should not pass the parsing stage, since
hlsl_compile_shader() will immediately notice that compilation has failed and
skip any optimization, lowering, or bytecode-writing.
A further direction to take this is to pre-allocate one "error" expression
immediately when creating the HLSL parser, and return that expression when we
fail to allocate an hlsl_ir_node of any type. This means we do not need to
handle allocation errors when constructing nodes, saving us quite a lot of error
handling (which is not only tedious but currently often broken, if nothing else
by virtue of neglecting cleanup of local variables).
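The pre-allocation idea could look roughly like this. It is a sketch under stated assumptions: example_parser, example_new_node and the "failed" flag are hypothetical, and the flag stands in for whatever mechanism makes compilation fail so that an error node never leaves the parsing stage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum example_type_class
{
    EXAMPLE_CLASS_NUMERIC,
    EXAMPLE_CLASS_ERROR, /* stands in for HLSL_CLASS_ERROR */
};

struct example_node
{
    enum example_type_class type_class;
};

struct example_parser
{
    struct example_node *error_node; /* allocated once, at parser creation */
    bool failed;                     /* compilation must not proceed if set */
};

static bool example_parser_init(struct example_parser *parser)
{
    parser->failed = false;
    if (!(parser->error_node = malloc(sizeof(*parser->error_node))))
        return false;
    parser->error_node->type_class = EXAMPLE_CLASS_ERROR;
    return true;
}

/* Never returns NULL: if allocation fails, record the failure and hand
 * back the shared error node, which callers then propagate like any
 * other "error" expression. */
static struct example_node *example_new_node(struct example_parser *parser,
        enum example_type_class type_class)
{
    struct example_node *node;

    if (!(node = malloc(sizeof(*node))))
    {
        parser->failed = true;
        return parser->error_node;
    }
    node->type_class = type_class;
    return node;
}
```

One consequence of this design is that the shared error node must be freed exactly once, when the parser is destroyed, rather than together with the instructions that reference it.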
I ran the compilation of ~1000 DXBC-TPF shaders randomly taken from
my collection and measured the performance using callgrind and the
kcachegrind "cycle count" estimation.
BEFORE:
* 1,846,641,596 cycles
* 1,845,635,336 cycles
* 1,841,335,225 cycles
AFTER:
* 1,764,035,136 cycles
* 1,767,948,767 cycles
* 1,773,927,734 cycles
So callgrind estimates an improvement of at least 3.6%.
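Comparing the fastest BEFORE run with the slowest AFTER run gives the conservative figure:

```latex
1 - \frac{\max(\text{after})}{\min(\text{before})}
  = 1 - \frac{1{,}773{,}927{,}734}{1{,}841{,}335{,}225}
  \approx 1 - 0.9634
  \approx 3.66\%
```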
The counterpoint is that the caller might get an allocation that
is bigger than necessary. I would expect that allocation to be
rather short-lived anyway, so that's probably not a problem.
shader_signature_find_element_for_reg() is also used in the TPF parser,
where the program has not been validated yet, so it must not crash
on errors.
The I/O normaliser can instead assume that the shader is already
validated.
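A lookup that is safe on unvalidated input reports "not found" instead of crashing. The sketch below assumes a hypothetical, simplified signature layout in which an element matches when its register index is equal and its write mask overlaps the requested one; the names are illustrative and do not reproduce the real shader_signature_find_element_for_reg() signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified signature element. */
struct example_signature_element
{
    unsigned int register_index;
    unsigned int mask; /* one bit per component */
};

struct example_signature
{
    const struct example_signature_element *elements;
    size_t element_count;
};

/* Safe on unvalidated input: returns false when no element matches,
 * instead of asserting or dereferencing a missing element. */
static bool example_find_element_for_reg(const struct example_signature *signature,
        unsigned int reg_idx, unsigned int write_mask, size_t *element_idx)
{
    size_t i;

    assert(element_idx); /* caller bug, not bad shader input */
    for (i = 0; i < signature->element_count; ++i)
    {
        const struct example_signature_element *e = &signature->elements[i];

        if (e->register_index == reg_idx && (e->mask & write_mask))
        {
            *element_idx = i;
            return true;
        }
    }
    return false;
}
```

The TPF parser can turn a false return into a compilation error, while a pass that runs after validation, like the I/O normaliser, could treat it as an internal error.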
This fixes a crash with a shader used by The Falconeer. The underlying
bug is still present, since the shader is incorrectly rejected, but at
least vkd3d-shader now fails gracefully.