The practical effect this has is that we avoid potential trailing padding at
the end of DXBC blobs. Unfortunately this also means we need to be more
careful about using bytecode_get_size() to find the offset where subsequent
data would get written, although in many cases this follows a put_u32() call.
Without this patch, dp3 and dp4 map src swizzles to the dst writemask,
which is not correct.
Before b84f560bdf, these ops worked
despite this, because the dst register had, incorrectly, the full
writemask.
To solve this problem, write_sm1_binary_op_dot() is introduced,
similarly to write_sm4_binary_op_dot().
This was originally left alone in order to allow functions without early return
to succeed, since in that case we would already emit the correct bytecode
despite not handling the HLSL_IR_JUMP_RETURN instruction.
Now that we lower return statements, however, any unhandled instructions are
either definitely going to result in invalid bytecode, or rare enough that it's
not worth returning success anyway.
SM1 dp2add doesn't map src swizzles to the dst writemask, also it
expects the last argument to have a replicate swizzle.
Before this patch we were writing the operation as:
```
dp2add r0.x, r1.x, r0.x, r2.x
```
and now it is:
```
dp2add r0.x, r1.xyxx, r0.xyxx, r2.x
```
dp2add now has its own function, write_sm1_dp2add(), since it seems to
be the only instruction with this structure.
Ideally we would be using the default swizzles for the first two src
arguments:
```
dp2add r0.x, r1, r0, r2.x
```
since, according to native's documentation, these are supported for all
sm < 4.
But this change -- along with following the convention of repeating the
last component of the swizzle when fewer than 4 components are to be
specified -- would require more global changes, probably in
hlsl_swizzle_from_writemask() and hlsl_map_swizzle().
This should silence warnings about some branches non returning any value
without requiring additional "return 0" statement or similar.
Also, in theory this might enable to compiler to optimize the program
a little bit more, though that's unlikely to have any measurable effect.
This doesn't hold anything other than a list, nor do I have any immediate plans
for it to hold anything other than a list, but I'm adding it for some degree of
clarity. Passing around untyped list pointers is not my favourite hobby.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Matteo Bruni <mbruni@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
For the sake of simplicity and clarity, especially in the interest of allowing
us to have expressions with larger numbers of terms.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Matteo Bruni <mbruni@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>