We have to distinguish between the "bind count" and the "allocation size"
of variables.
The "allocation size" affects the starting register id for the resource to
be allocated next, while the "bind count" is determined by the last field
actually used. The former may be larger than the latter.
What we are currently calling hlsl_reg.bind_count is actually the
"allocation size", so a rename is in order.
The real "bind count", which will be introduced in following patches,
is important because it is what should be shown in the RDEF table and
some resource allocation rules depend on it.
For instance, for this shader:
texture2D texs[3];
texture2D tex;
float4 main() : sv_target
{
return texs[0].Load(int3(0, 0, 0)) + tex.Load(int3(0, 0, 0));
}
the variable "texs" has a "bind count" of 1, but an "allocation size" of
3:
// Resource Bindings:
//
// Name Type Format Dim HLSL Bind Count
// ------------------------------ ---------- ------- ----------- -------------- ------
// texs texture float4 2d t0 1
// tex texture float4 2d t3 1
We are using the hlsl_ir_var.is_uniform flag to indicate when an object
is a uniform copy created from a variable with the HLSL_STORAGE_UNIFORM
modifier.
We should be checking for this instead of the HLSL_STORAGE_UNIFORM flag
which is also set to 1 for the original variables, and there should be
no reason to use this flag instead of "is_uniform" after the uniform
copies and combined/separated samplers are created.
After lowering the derefs path to a single offset node, there was no way
of knowing the type of the referenced part of the variable. This little
modification allows to avoid having to pass the data type everywhere and
it is required for supporting instructions that reference objects
components within struct types.
Since deref->data_type allows us to retrieve the type of the deref,
deref->offset_regset is no longer necessary.
lower_narrowing_casts() currently creates a new cast calling
hlsl_new_cast(). This cast may be redundant, but it is not folded, which
is making SM1 emit an unnecessary fixme in some shaders:
Aborting due to not yet implemented feature: SM1 "cast" expression.
Other passes that call hlsl_new_cast() are lower_int_division() and
lower_int_modulus(), so the new fold_redundant_casts() pass is called
after these as well.
Non-constant vector indexing is not solved with relative addressing
in the register indexes because this indexation cannot be at the level
of register-components.
Mathematical operations must be used instead.
Variables that contain more than one object (arrays or structs) require
the allocation of contiguous registers in the respective object
register spaces.
This patch makes index expressions on resources hlsl_ir_index nodes
instead of hlsl_ir_resource_load nodes, because it is not known if they
will be used later as the lhs of an hlsl_ir_resource_store.
For now, the only benefit is consistency.
Since in SM1 all vector types use 4 register components, and since SM1
doesn't consider vectors of different dimx incompatible, it is necessary
to ensure that the semantic var is created with dimx=4, and to add a
cast node.
The use of the hlsl_semantic.reported_duplicated_output_next_index field
allows reporting multiple overlapping indexes, such as in the following
vertex shader:
void main(out float1x3 x : OVERLAP0, out float1x3 y : OVERLAP1)
{
x = float3(1.0, 2.0, 3.2);
y = float3(5.0, 6.0, 5.0);
}
apple.hlsl:1:41: E5013: Output semantic "OVERLAP1" is used multiple times.
apple.hlsl:1:13: First use of "OVERLAP1" is here.
apple.hlsl:1:41: E5013: Output semantic "OVERLAP2" is used multiple times.
apple.hlsl:1:13: First use of "OVERLAP2" is here.
While at the same time avoiding reporting overlaps more than once for
large arrays:
struct apple
{
float2 p : sv_position;
};
void main(out apple aps[4])
{
}
apple.hlsl:3:8: E5013: Output semantic "sv_position0" is used multiple times.
apple.hlsl:3:8: First use of "sv_position0" is here.
From this point on, it is no longer true that only hlsl_ir_loads can
return objects, because an object can also come from chain of
hlsl_ir_indexes that ends in an hlsl_ir_load.
The lower_index_loads pass takes care of lowering all hlsl_ir_indexes
into hlsl_ir_loads.
For this reason, hlsl_resource_load_params now expects both the resource
as the sampler to be just an hlsl_ir_node pointer instead of a pointer
to a more specific hlsl_ir_load.
This node type is intended for use during parse-time.
While we parse an indexing expression such as "a[3]", we don't know if
it will end up as part of an expression (in which case it must be folded
into a load) or it is for the lhs of a store (in which case it must be
folded into the store's deref).
Otherwise we may create nodes of different dimensions than the ones we
are replacing.
"count" is the number of components of the source deref (without
considering the swizzle), while "instr_component_count" is the actual
number of components of the instruction to be replaced.