These can be disassembled by D3DDisassemble() just fine, and perhaps
more importantly, shader model 1 vertex shaders do not require dcl_
instructions in Direct3D 8.
The implementation of upload_buffer_data_with_states(), unlike the
implementation of upload_texture_data_with_states(), does not expect a
pointer to a D3D12_SUBRESOURCE_DATA, but rather, a direct pointer to the
data.
The generated Vulkan calls look right and do not trigger any
validation error, but the returned timestamp is 0. A valid
timestamp is returned if the CopyResource() call is commented,
or the second EndQuery() call is moved before CopyResource(),
or the first EndQuery() call is commented. I am not seeing any
sensible pattern here, so I guess there is just a bug in
MoltenVK.
Specifically, MoltenVK seems to be able to load from stencil, but
the specific replicating swizzle (repeating the stencil value on
all the channels) is not honored. The stencil value is read only
on the red channel.
At the current moment this is a little odd because for SM1 [test]
directives are skipped, and the [shader] directives are not executed by
the shader_runner_vulkan.c:compile_shader() but by the general
shader_runner.c:compile_shader(). So in principle it is a little weird
that we go through the vulkan runner.
But fret not, because in the future we plan to make the parser agnostic
to the language of the tests, so we will get rid of the general
shader_runner.c:compile_shader() function and instead call a
runner->compile_shader() function, defined for each runner. Granted,
most of these may call a generic implementation that uses native
compiler in Windows, and vkd3d-shader on Linux, but it would be more
conceptually correct.
If the runners require multiple calls to run_shader_tests() for
different shader model ranges, these are moved inside the sole runner
call.
For the same reason, the trace() messages are also moved inside the
runner calls.
We can now pass (sm<4) and (sm>=4) to "fail" and "todo" qualifiers, and
we can use multiple of these qualifier arguments using "&" for AND and
"|" for OR.
examples:
todo(sm>=4 & sm<6)
todo(sm<4 | sm>6)
parenthesis are not supported.
Adding additional model ranges for the tests, if we need them, should be
easier now, since they only have to be added to the "valid_args" array.
The relative-addressed case in shader_register_normalise_arrayed_addressing()
leaves the control point id in idx[0], while for constant register
indices it is placed in idx[1]. The latter case could be fixed instead,
but placing the control point count in the outer dimension is more
logical.
The FXC optimiser sometimes converts a local array of input values into
direct array addressing of the inputs, which can result in a
dcl_indexrange instruction spanning input elements with different masks.
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=56162
Storing to a vector component using a non-constant index is not allowed
on profiles lower than 6.0. Unless this happens inside a loop that can be
unrolled, which we are not doing yet.
For this reason, a validate_nonconstant_vector_store_derefs pass is
added to detect these cases.
Ideally we would want to emit an hlsl_error on this pass, but before
implementing loop unrolling, we could reach this point on valid HLSL.
Also, as pointed out by Nikolay in the mentioned bug, currently
new_offset_from_path_index() fails an assertion when this happens,
because it expects an hlsl_ir_constant, so a check is added.
It also felt correct to emit an hlsl_fixme there, despite the
redundancy.
Apparently Metal doesn't support specifying a bias directly in the
sampler, and, with "nearest" mip filtering, it doesn't switch
precisely at LOD 0.5 (though still between 0.5 and 0.6).
Currently, if a probe fails, it will print the line number of the [test]
block the probe is in, not the line number of the probe itself. This
makes it somewhat difficult to debug.
This commit makes it print the line number that a test fails at.
This preempts us from replacing a swizzle incorrectly, as in the
following example:
1: A.x = 1.0
2: A
3: A.x = 2.0
4: @2.x
were @4 ends up being 2.0 instead of 1.0, because that's the value stored in
A.x at time 4, and we should be querying it at time 2.
This also helps us to avoid replacing a swizzle with itself in copy-prop
which can result in infinite loops, as with the included tests this commit.
Consider the following sequence of instructions:
1 : A
2 : B = @1
3 : B
4 : A = @3
5 : @1.x
Current copy-prop would replace 5 so it points to @3 now:
1 : A
2 : B = @1
3 : B
4 : A = @3
5 : @3.x
But in the next iteration it would make it point back to @1, keeping it
spinning infinitively.
The solution is to index the instructions and don't replace the swizzle
if the new load happens after the current load.
The included test fails because copy_propagation_transform_swizzle()
is using the value recorded for the variable when the swizzle is being
read, and not the swizzle's load.
This was broken by commit e390bc35e2c9b0a2110370f916033eea2366317e; that
commit fixed the source count for these instructions, but didn't adjust
shader_sm1_skip_opcode(). Note that this only affects shader model 1;
later versions have a token count embedded in the initial opcode token.
For relative addressing, the vkd3d_shader_registers must point to
another vkd3d_shader_src_param. For now, use the sm4_instruction to save
them, since the only purpose of this struct is to be used as paramter
for write_sm4_instruction.
I'm not sure of what's happening here, but it seems that this change
fixes a crash when running on Windows in the CI. Since most of the
test excludes SM1-3 anyway, this shouldn't be a big loss.
I.e., top-left origin, and clockwise front-facing. These end up cancelling
each other out when drawing full-screen quads, but that's not necessarily true
for other geometry.
The location of dxcompiler should be set during configuration with
'DXCOMPILER_LIBS=-L/path/to/dxcompiler', and then at runtime with
LD_LIBRARY_PATH, WINEPATH or PATH as applicable.
A new 'fail(sm<6)' decoration is needed on many shader declarations
because dxcompiler succeeds on many shaders which fail with fxc. The
opposite case is less common and is flagged with 'fail(sm>=6)'. A few
tests cause dxcompiler to crash or hang, so these are avoided using
[require], which now skips tests until reset instead of exiting. Also,
'todo(sm<6)' and 'todo(sm>=6)' are used to separate checking of results.
A struct declaration with variables is now absorbed into the 'declaration'
rule, like any other variable declaration.
A struct declaration without variables is now reduced to the
'struct_declaration_without_vars' rule.
They both are reduced to a 'declaration_statement' in the end.
In a declaration with multiple variables, the variables must be created
before the initializer of the next variable is parsed. This is required
for initializers such as:
float a = 1, b = a, c = b + 1;
A requisite for this is that the type information is parsed in the same
rule as the first variable (as a variable_def_typed) so it is
immediately available to declare the first variable. Then, the next
untyped variable declaration is parsed, and the type from the first
variable can be used to declare the second, before the third is parsed,
and so on.
Non-constant vector indexing is not solved with relative addressing
in the register indexes because this indexation cannot be at the level
of register-components.
Mathematical operations must be used instead.
Currently, the compiler requires that dereferences be HLSL_IR_CONSTANT, so that
it can compute the offset at compile time. However, scenarios such as this test
will produce a dereference with HLSL_IR_EXPR, which will generate an error.
Passing this test in particular will require adding support for SM4 relative
addressing, as well as support for non-constant indexing in general.
Signed-off-by: Ethan Lee <flibitijibibo@gmail.com>
In Shader Model 6 each signature element can span a range of register
indices, or 'rows', and system values do not share a register index with
non-system values. Inputs and outputs are referenced by element index
instead of register index. This patch merges multiple signature elements
into a single element under the following conditions:
- The register index in a load or store is specified dynamically by
including a relative address parameter with a base register index. The
dcl_index_range instruction is used to identify these.
- A register declaration is split across multiple elements which declare
different components of the register.
- A patch constant function writes tessellation factors. These are an
array in SPIR-V, but in SM 5.x each factor is declared as a separate
register, and these are dynamically indexed by the fork/join instance
id. Elimination of multiple fork/join phases converts the indices to
constants, but merging the signature elements into a single arrayed
element matches the SPIR-V output.
All references to input/output register indices are converted to element
indices. If a relative address is present, the element index is moved up
a slot so it cannot be confused with a constant offset. Existing code
only handles register index relative addressing for tessellation factors.
This patch adds generic support for it.
The new fixmes can be triggered in presence of object components within
structs (for SM5).
In shaders such as this one:
struct apple
{
Texture2D tex : TEX;
float4 color : COLOR;
};
float4 main(struct apple input) : sv_target
{
return input.tex.Load(int3(1, 2, 3));
}
Or this one:
struct
{
Texture2D tex;
float4 color;
} s;
float4 main() : sv_target
{
return s.tex.Load(int3(1, 2, 3));
}
Variables that contain more than one object (arrays or structs) require
the allocation of contiguous registers in the respective object
register spaces.
Otherwise, in the added test, we get:
vkd3d-compiler: vkd3d-shader/hlsl.c:452: hlsl_init_deref_from_index_chain: Assertion `chain' failed.
because on the path that triggers the following error:
E5002: Wrong type for argument 1 of 'tex3D': expected 'sampler' or 'sampler3D', but got 'sampler2D'.
a NULL params.resource is passed to hlsl_new_resource_load() and
then to hlsl_init_deref_from_index_chain().
The use of the hlsl_semantic.reported_duplicated_output_next_index field
allows reporting multiple overlapping indexes, such as in the following
vertex shader:
void main(out float1x3 x : OVERLAP0, out float1x3 y : OVERLAP1)
{
x = float3(1.0, 2.0, 3.2);
y = float3(5.0, 6.0, 5.0);
}
apple.hlsl:1:41: E5013: Output semantic "OVERLAP1" is used multiple times.
apple.hlsl:1:13: First use of "OVERLAP1" is here.
apple.hlsl:1:41: E5013: Output semantic "OVERLAP2" is used multiple times.
apple.hlsl:1:13: First use of "OVERLAP2" is here.
While at the same time avoiding reporting overlaps more than once for
large arrays:
struct apple
{
float2 p : sv_position;
};
void main(out apple aps[4])
{
}
apple.hlsl:3:8: E5013: Output semantic "sv_position0" is used multiple times.
apple.hlsl:3:8: First use of "sv_position0" is here.
Thanks to Giovanni for the second set of tests! Note that the
tolerance for the final pixel was set much higher than the others;
this test seems to be an issue for some devices (in my case, a 7900
XTX running RADV).
Co-authored-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Ethan Lee <flibitijibibo@gmail.com>
From this point on, it is no longer true that only hlsl_ir_loads can
return objects, because an object can also come from chain of
hlsl_ir_indexes that ends in an hlsl_ir_load.
The lower_index_loads pass takes care of lowering all hlsl_ir_indexes
into hlsl_ir_loads.
For this reason, hlsl_resource_load_params now expects both the resource
as the sampler to be just an hlsl_ir_node pointer instead of a pointer
to a more specific hlsl_ir_load.
Some drivers (AMD Radeon RX 6700 XT, with radeonsi from Mesa 22.2.0-rc3) emit
less than one invocation per pixel, presumably because they detect that the
shader control flow is uniform for all pixels. Having the control flow depend on
SV_Position avoids this test failure.
Cf. 34bd0dd0704c613abef8a9aa3ba2a2507ed02843 in wine.
The expected use case where a heap is freed before its contained
resources is not reasonably testable, so the ability to place a new
resource is tested instead.
But still throw hlsl_fixme() when there is more than one.
Prioritizing among multiple compatible function overloads in the same way
as the native compiler would require systematic testing.
This was originally left alone in order to allow functions without early return
to succeed, since in that case we would already emit the correct bytecode
despite not handling the HLSL_IR_JUMP_RETURN instruction.
Now that we lower return statements, however, any unhandled instructions are
either definitely going to result in invalid bytecode, or rare enough that it's
not worth returning success anyway.
Vectors cannot be used as array indexes, however, single-component
swizzles (such as vec.x) can be used.
This suggests that single-component swizzles should actually be
scalars and not vectors of dimx = 1.
It is worth noting that the use of single-component swizzles on scalars
should still be allowed.
Co-authored-by: Francisco Casas <fcasas@codeweavers.com>
Co-authored-by: Zebediah Figura <zfigura@codeweavers.com>
Because copy_propagation_transform_object_load() replaces a deref
instead of an instruction, it is currently prone to two problems:
1- It can replace a deref with the same deref, returning true every
time and getting the compilation stuck in an endless loop of
copy-propagation iterations.
2- When performed multiple times in the same deref, the second time it
can replace the deref with a deref from a temp that is only valid in
another point of the program execution, resulting in an incorrect value.
This patch preempts this by avoiding replacing derefs when the new deref
doesn't point to a uniform variable. Because, uniform variables cannot
be written to.
Reinterpret min16float, min10float, min16int, min12int, and min16uint
as their regular counterparts: float, float, int, int, uint,
respectively.
A proper implementation would require adding minimum precision
indicators to all the dxbc-tpf instructions that use these types.
Consider the output of fxc 10.1 with the following shader:
uniform int i;
float4 main() : sv_target
{
min16float4 a = {0, 1, 2, i};
min16int2 b = {4, i};
min10float3 c = {6.4, 7, i};
min12int d = 9.4;
min16uint4x2 e = {14.4, 15, 16, 17, 18, 19, 20, i};
return mul(e, b) + a + c.xyzx + d;
}
However, if the graphics driver doesn't have minimum precision support,
it ignores the minimum precision indicators and runs at 32-bit
precision, which is equivalent as working with regular types.
If a hlsl_ir_load loads a variable whose components are stored from different
instructions, copy propagation doesn't replace it.
But if all these instructions are constants (which currently is the case
for value constructors), the load could be replaced with a constant value.
Which is expected in some other instructions, e.g. texel_offsets when
using aoffimmi modifiers.
For instance, this shader:
```
sampler s;
Texture2D t;
float4 main() : sv_target
{
return t.Gather(s, float2(0.6, 0.6), int2(0, 0));
}
```
results in the following IR before applying the patch:
```
float | 6.00000024e-01
float | 6.00000024e-01
uint | 0
| = (<constructor-2>[@4].x @2)
uint | 1
| = (<constructor-2>[@6].x @3)
float2 | <constructor-2>
int | 0
int | 0
uint | 0
| = (<constructor-5>[@11].x @9)
uint | 1
| = (<constructor-5>[@13].x @10)
int2 | <constructor-5>
float4 | gather_red(resource = t, sampler = s, coords = @8, offset = @15)
| return
| = (<output-sv_target0> @16)
```
and this IR afterwards:
```
float2 | {6.00000024e-01 6.00000024e-01 }
int2 | {0 0 }
float4 | gather_red(resource = t, sampler = s, coords = @2, offset = @3)
| return
| = (<output-sv_target0> @4)
```
On cross builds, shaders are compiled with d3dcompiler_47.dll and
run with d3dN.dll. On non-cross builds, shaders are compiled with
vkd3d-shader and run with d3dN.dll (on Windows) or Vulkan and vkd3d
(on Linux).
validate_static_object_references() validates that uninitialized static
objects are not referenced in the shader.
In case a static variable contains both numeric and object types, the
"Static variables cannot have both numeric and resource components."
error should preempt uninitialized numeric values to reach further
compilation steps.
Note that in the future we should call
validate_static_object_references() after DCE and pruning branches,
because shaders such as these compile (at least in more modern versions
of the native compiler):
Branch pruning:
```
static RWTexture2D<float> tex;
float4 main() : sv_target
{
if (0)
{
tex[int2(0, 0)] = 2;
}
return 0;
}
```
DCE:
```
static Texture2D tex;
uniform uint i;
float4 main() : sv_target
{
float4 unused = tex.Load(int3(0, 1, 2));
return 0;
}
```
These are "todo" tests in hlsl-static-initializer.shader_test
that depend on this.
We are currently not initializing static values to zero by default.
Consider the following shader:
```hlsl
static float4 va;
float4 main() : sv_target
{
return va;
}
```
we get the following output:
```
ps_5_0
dcl_output o0.xyzw
dcl_temps 2
mov r0.xyzw, r1.xyzw
mov o0.xyzw, r0.xyzw
ret
```
where r1.xyzw is not initialized.
This patch solves this by assigning the static variable the value of an
uint 0, and thus, relying on complex broadcasts.
This seems to be the behaviour of the 9.29.952.3111 version of the native
compiler, since it retrieves the following error on a shader that lacks
an initializer on a data type with object components:
```
error X3017: cannot convert from 'uint' to 'struct <unnamed>'
```
Using add_unary_arithmetic_expr() instead of hlsl_new_unary_expr()
allows the intrinsic to work with matrices.
Otherwise we get:
E5017: Aborting due to not yet implemented feature: Copying from unsupported node type.
because an HLSL_IR_EXPR reaches split_matrix_copies().
Some intrinsics have different rules for the allowed data types than
expressions:
- Vectors and matrices at the same time are not allowed, regardless of
their dimensions. Even if they have the same number of components.
- Any combination of matrices is always allowed, even those when no
matrix fits inside another, e.g.:
float2x3 is compatible with float3x2, resulting in float 2x2.
The common data type is the min on each dimension.
This is the case for max, pow, ldexp, clamp and smoothstep; which suggest that
it is the case for all intrinsics where the operation is applied element-wise.
Tests for mul() are also added as a counter-example where the operation
is not element-wise.
Until vkd3d-shader is patched, an atomic op on a typed buffer where
StorageImageReadWithoutFormat is available will cause SPIR-V validation
failure, and assertion in Mesa debug builds, because the image will be
declared with Unknown format.
Otherwise, for instance, the added test results in:
debug_hlsl_writemask: Assertion `!(writemask & ~VKD3DSP_WRITEMASK_ALL)' failed.
Which happens in allocate_variable_temp_register() when the variable's
type reg_size is <= 4 but its component count is larger, which may
happen if it contains objects.
Do not rely on a draw or dispatch command to do this.
This allows more efficiently testing syntax, in cases where testing the actual
shader functionality is not interesting.
Otherwise the test fails on a NVIDIA GeForce GTX 1050 Ti GPU.
The error being:
shader_runner:535:Section [test], line 9: Test failed: Got {2.72165507e-01, 4.08248246e-01, 5.44331014e-01, 6.80413783e-01}, expected {2.72165537e-01, 4.08248305e-01, 5.44331074e-01, 6.80413842e-01} at (0, 0).
This should silence warnings about some branches non returning any value
without requiring additional "return 0" statement or similar.
Also, in theory this might enable to compiler to optimize the program
a little bit more, though that's unlikely to have any measurable effect.
HLSL_ARRAY_ELEMENTS_COUNT_IMPLICIT (zero) is used as a temporal value
for elements_count for implicit size arrays.
This value is replaced by the correct one after parsing the initializer.
In case the implicit array is not initialized correctly, hlsl_error()
is called but the array size is kept at 0. So the rest of the code
must handle these cases.
In shader model 5.1, unlike in 5.0, declaring a multi-dimensional
object-type array with the last dimension implicit results in
an error. This happens even in presence of an initializer.
So, both gen_struct_fields() and declare_vars() first check if the
shader model is 5.1, the array elements are objects, and if there is
at least one implicit array size to handle the whole type as an
unbounded resource array.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
It is responsibility of the shader's programmer to ensure that
object references can be solved statically.
Resource arrays for ps_5_1 and vs_5_1 are an exception which is not
properly handled yet. They probably deserve a different object type.
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Otherwise we get false in implicit_compatible_data_types() when passing
types that are equal but not convertible according to
convertible_data_type(); e.g. getting:
"Can't implicitly convert from Texture2D<float4> to Texture2D<float4>."
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Fixes reflections in Control appearing with only their red component.
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=52146
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
This currently fails if the shader loads from the UAV, because it causes
vkd3d-shader to specify the R32f format instead of Unknown.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Prepare to allow for dynamically changing the bound attachments in consecutive
draw calls.
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Split the probe_vec4() directive into get_rt_readback() and release_readback().
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Move the "resource" field to a new "d3d12_resource_readback" structure
encapsulating struct resource_readback.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Strictly increasing timeline values must be mapped to fence virtual values
to avoid invalid use of Vulkan timeline semaphores. In particular, non-
increasing values and value jumps of >= 4G are permitted in d3d12.
Different virtual D3D12 command queues may map to the same Vulkan queue.
If a wait of value N is submitted on one command queue, and then a signal
for >= N is submitted on another, but they are sent to the same Vk queue,
the wait will never complete. The solution is to buffer out-of-order waits
and any subsequent queue commands until an unblocking signal value is
submitted to a different D3D12 queue, or signaled on the CPU.
Buffering out-of-order waits also fixes the old fence implementation so it
is fully functional, though a bit less efficient than timeline semaphores.
Based in part on vkd3d-proton patches by Hans-Kristian Arntzen. Unlike the
vkd3d-proton implementation, this patch does not use worker threads for
submissions to the Vulkan queue.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
We would like to allow overriding the soname of libvulkan, in which case the
tests and demos should respect that override.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
We would like to allow overriding the soname of libvulkan, in which case the
tests and demos should respect that override.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
We would like to allow overriding the soname of libvulkan, in which case the
tests and demos should respect that override.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This commit includes work by Francisco Casas.
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This commit includes work by Francisco Casas.
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Francisco Casas <fcasas@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
We will need to allocate some structures in the Vulkan backend; this is easier
if we don't need to worry about allocating them dynamically.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
We will need to allocate some structures in the Vulkan backend; this is easier
if we don't need to worry about allocating them dynamically.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
For compatibility with shader models before 4.0.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Ensures the new fence implementation using timeline semaphores handles
this correctly.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
D3D12 supports signalling a fence to a lower value, while Vulkan timeline
semaphores do not. On the GPU side this is handled by simply submitting
the signal anyway, if a test for this passes on device creation, because
working around this is impractical. For CPU signals the Vulkan semaphore
is replaced with a new one at the lower value only if no waits and/or
signals are pending on the GPU. Otherwise, a fixme is emitted.
Partly based on a vkd3d-proton patch by Hans-Kristian Arntzen (not
including the handling of lower fence values).
The old implementation is used if KHR_timeline_semaphore is not
available or GPU signals do not work for a lower value.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Use it to hold any type of resource, regardless of type and binding.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This is a bit clearer, and avoids colliding with other things named "context",
e.g. struct test_context, or d3d11 device contexts.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The existing implementation using virtual descriptor heaps, where Vk
descriptor sets are created for the bindings in the root descriptor tables,
is inefficient when multiple command lists are used with large descriptor
heaps. It also cannot support updating a descriptor set after it is bound.
This patch creates Vk sets for each D3D12 heap. Because D3D12 heaps
can contain CBV, SRV and UAV descriptors in the same heap, multiple Vk sets
are needed for each heap, however the total number of populated descriptors
is never more than (heap size + UAV counter count).
A new 'virtual_heaps' config option is introduced to make the old
implementation available when needed. It's not always possible to determine
if this is necessary when the device is created.
Up to nine Vk descriptor sets may be used. It's theoretically possible to
reduce this to eight by placing immutable samplers in the push descriptor
set layout, but contradictions in earlier versions of the Vulkan spec made
driver support inconsistent. The documentation was corrected in version
1.2.203.
This patch also adds support for UAV counter descriptor arrays. It's not
practical to add this in a separate patch due to complications with
combining the old UAV counter implementation with the new descriptor heap
implementation.
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=47713
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=47154
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
They seem to work with mesa 21.3.7. Since some developers are
using older releases where the bug is not yet fixed, I am leaving
it marked as a bug.
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
NVIDIA and AMD round differently the assignment of 0.5f to a UAV
of type R16G16_UNORM. NVIDIA rounds to 0x7fff and AMD to 0x8000.
According to both Vulkan and D3D12 specifications, both values
are acceptable, but the discrepancy currently appears as a
failure on NVIDIA cards.
Work around the issue by using 0.25f instead of 0.5f, which is
rounded as 0x4000 on both implementations.
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The d3d12 specification forbids simultaneously binding more than one heap of a
given type.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Partly to make the tests easier to navigate, and partly to allow marking some
tests as SM4+.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Technically we shouldn't allow "uu" or "ll" either, but we also don't really
handle preprocessor parsing errors the way we should.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Matteo Bruni <mbruni@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This reverts commit 41230b6e76.
The tests can hang and the Wait() implementation has issues.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Testing this before NULL event handling is patched results in a crash.
Based on a vkd3d-proton patch by Hans-Kristian Arntzen.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This results in a valid format instead of NULL being returned for
buffers and any other case where DXGI_FORMAT_UNKNOWN is specified.
In some cases invalid use of a buffer or DXGI_FORMAT_UNKNOWN will
not result in E_INVALIDARG, and would need to be tested explicitly
if proven to be an issue.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Extends vkd3d_instance_create_info with struct vkd3d_host_time_domain_info
to allow host ticks per second to be changed from the default 10000000.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Based on a vkd3d-proton patch by Hans-Kristian Arntzen.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
It's technically correct in warning about overwriting the entire
vec4 from the address of .x, if a bit overzealous. It's probably a
good warning to keep enabled.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Has no effect in Windows at least on AMD. Not calling SetDescriptorHeaps()
at all has no effect in Windows on any hardware so this is likely to be
universal too.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
The Wait() implementation was added some time ago and these tests succeed.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Otherwise if the input is located above position 0 in the private array
it will be swizzled incorrectly, e.g. a.yz instead of a.xy in
test_domain_shader_inputs().
Based on a vkd3d-proton patch by Hans-Kristian Arntzen.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
For isoline tessellation, "density" is specified by OL0, and "detail" by OL1.
Based on a vkd3d-proton patch by Hans-Kristian Arntzen.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This specifically tests the case where "count" would end up being one for
arrayed builtins in needs_private_io_variable().
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Based on a vkd3d-proton patch by Hans-Kristian Arntzen.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
It avoids a compiler warning for me. There is no actual issue though.
Signed-off-by: Matteo Bruni <mbruni@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Based on a vkd3d-proton patch by Philip Rebohle.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Using vk_logic_op_from_d3d12() from a vkd3d-proton patch by Philip
Rebohle.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Windows returns E_INVALIDARG at least on AMD and Intel.
Psychonauts 2 attempts to create resources with this argument.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Part of a vkd3d-proton patch by Hans-Kristian Arntzen.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Testing without the fixes in vkd3d_create_texture_uav() has nasty results in
radv and possibly others.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Based in part on vkd3d-proton patches by Philip Rebohle and Hans-Kristian Arntzen.
Signed-off-by: Conor McCarthy <cmccarthy@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Similarly to ReadFromSubresource, WriteToSubresource is currently
implemented only for images with linear tiling, which NVIDIA drivers
currently do not support.
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Nominally, at least. Not all of the early shader models explicitly
declare resources and samplers, so those may currently get missed for
those.
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
This is largely derived from the parser in Wine/wined3d, as of wine-6.18.
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
So as to avoid clashing with the preprocessor.
Signed-off-by: Zebediah Figura <zfigura@codeweavers.com>
Signed-off-by: Henri Verbeet <hverbeet@codeweavers.com>
Signed-off-by: Matteo Bruni <mbruni@codeweavers.com>
Signed-off-by: Giovanni Mascellani <gmascellani@codeweavers.com>
Signed-off-by: Alexandre Julliard <julliard@winehq.org>