The handling of write masks and swizzles for 64 bit data types is
currently irregular: write masks are always 64 bit, while swizzles
are usually 32 bit, except for SSA registers with are 64 bit.
With this change we always use 64 bit swizzles, in order to make
the situation less surprising and make it easier to convert
registers between SSA and TEMP.
64 bit swizzles are always required to have X in their last two
components.
For example, this occurred in a shader:
reg_idx write_mask
0 xyz
1 xyzw
2 xyzw
3 xyz
The dcl_indexrange instruction covered only xyz, so once merged, searching for
xyzw failed.
It is impossible to declare an input array where elements have different
component counts, but the optimiser can create this case. One way for
this to occur is to dynamically index input values via a local array
containing copies of the input values. The optimiser converts this to
dynamically indexed inputs.
They were originally made const because no optimization/normalization
pass existed. Now having to cast away const all the time is becoming
more and more burdening.
For relative addressing, the vkd3d_shader_registers must point to
another vkd3d_shader_src_param. For now, use the sm4_instruction to save
them, since the only purpose of this struct is to be used as paramter
for write_sm4_instruction.
RWBuffer objects would trigger a vkd3d_unreachable() in sm4_base_type().
It would be easy enough to add the required case there, but (manual,
unfortunately) tests show that we aren't supposed to write constant
buffer entries for objects in the first place, as you'd expect.
This particular path ends up being exercised by vkd3d's internal UAV
clear shaders, but unfortunately it looks like our RDEF data may have
more issues; the ability to write tests for it would seem helpful.
Co-authored-by: Evan Tang <etang@codeweavers.com>
Evan Tang reported that new fixmes appeared on the shader_runner when
running some of his tests after
f50d0ae2cb.
vkd3d:652593:fixme:shader_sm4_read_src_param Unhandled mask 0x4.
The change to blame seems to be this added line in
sm4_src_from_constant_value().
+ src->swizzle = VKD3D_SHADER_NO_SWIZZLE;
On tpf binaries the last 12 bits of each src register in an instruction
specify the swizzle, and there are 5 possible combinations:
Dimension NONE
-------- 00
Dimension SCALAR
-------- 01
Dimension VEC4, with a 4 bit writemask:
---- xxxx 00 01
Dimension VEC4, with an 8 bit swizzle:
xx xx xx xx 01 01
Dimension VEC4, with a 2bit scalar dimension number:
------ xx 10 01
So far, we have only seen src registers use 4 bit writemasks in a
single case: for vec4 constants, and it is always zero.
So we expect this:
---- 0000 00 01
Now, I probably wanted to initialize src->swizzle to zero when writing
constants, but VKD3D_SHADER_NO_SWIZZLE is not zero, it is actually the
default swizzle:
11 10 01 00
And the last 4 bits (0x4) get written in the mask part, which causes
the reader to complain.
Adding extra bits to instr->opcode doesn't seem correct, given that it
is an enum.
For instance, get_opcode_info() would return NULL if additional bits are
added to instr->opcode. This is not a problem now because that function
is called when reading and not writing.
Components only span across a single regset, so instead of expecting the
regset as input for the offset, hlsl_type_get_component_offset() can
actually retrieve it.
vkd3d-shader/tpf.c:3810:39: warning: passing argument 2 of ‘sm4_register_from_node’ from incompatible pointer type [-Wincompatible-pointer-types]
vkd3d-shader/tpf.c:4750:59: warning: passing argument 3 of ‘sm4_register_from_deref’ from incompatible pointer type [-Wincompatible-pointer-types]
Change to use uint32_t as requested.
We are passing map writemasks that may have more components than the
constant's data type, so a (j < width) check is added to avoid reading
components from the constant that are not intended to be used.
Remaining values in the register are initialized to zero.
We have to distinguish between the "bind count" and the "allocation size"
of variables.
The "allocation size" affects the starting register id for the resource to
be allocated next, while the "bind count" is determined by the last field
actually used. The former may be larger than the latter.
What we are currently calling hlsl_reg.bind_count is actually the
"allocation size", so a rename is in order.
The real "bind count", which will be introduced in following patches,
is important because it is what should be shown in the RDEF table and
some resource allocation rules depend on it.
For instance, for this shader:
texture2D texs[3];
texture2D tex;
float4 main() : sv_target
{
return texs[0].Load(int3(0, 0, 0)) + tex.Load(int3(0, 0, 0));
}
the variable "texs" has a "bind count" of 1, but an "allocation size" of
3:
// Resource Bindings:
//
// Name Type Format Dim HLSL Bind Count
// ------------------------------ ---------- ------- ----------- -------------- ------
// texs texture float4 2d t0 1
// tex texture float4 2d t3 1
We will add register information lookup tables on this struct later.
They will be used by sm4_encode_register(), and thus, they will be
required by write_sm4_instruction().
This struct is required for handling both whole-variable resources for
SM < 5 and single-component resources for SM 5 in the same way, when
writting the RDEF block and resource declarations within the shader.
After lowering the derefs path to a single offset node, there was no way
of knowing the type of the referenced part of the variable. This little
modification allows to avoid having to pass the data type everywhere and
it is required for supporting instructions that reference objects
components within struct types.
Since deref->data_type allows us to retrieve the type of the deref,
deref->offset_regset is no longer necessary.
Store it in the shader_desc, and declare temps from that when compiling SPIR-V,
instead of parsing dcl_instructions.
As part of this change, we declare a single, global temps array (with Private
scope instead of Function) which is as large as the maximum of all dcl_temps
instructions. It is not clear to me whether this will improve, hurt, or have no
significant effect on the lower-level compiler. An alternative is to still
redeclare a new temps array every time (although still with a smaller size).
Instead of modifying the swizzle after calling sm4_src_from_node().
This fixes the case where sm4_src_from_node() returns an immediate constant.
Fixes: a471c5567a
The new fixmes can be triggered in presence of object components within
structs (for SM5).
In shaders such as this one:
struct apple
{
Texture2D tex : TEX;
float4 color : COLOR;
};
float4 main(struct apple input) : sv_target
{
return input.tex.Load(int3(1, 2, 3));
}
Or this one:
struct
{
Texture2D tex;
float4 color;
} s;
float4 main() : sv_target
{
return s.tex.Load(int3(1, 2, 3));
}
Variables that contain more than one object (arrays or structs) require
the allocation of contiguous registers in the respective object
register spaces.