So long as we adhere to the fence registers rules for alignment and no
overlaps (including with unfenced accesses to linear memory) and account
for the tiled access in our size allocation, we do not have to allocate
the full fenced region for the object. This allows us to fight the bloat
tiling imposed on pre-i965 chipsets and frees up RAM for real use. [Inside
the GTT we still suffer the additional alignment constraints, so it doesn't
magic allow us to render larger scenes without stalls -- we need the
expanded GTT and fence pipelining to overcome those...]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Like before add a parameter mappable (also to gem_object_pin) and
set it depending upon the context. Only bos that are brought into
the gtt due to an execbuffer call can be put into the unmappable
part of the gtt, everything else (especially pinned objects) need
to be put into the mappable part of the gtt.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Preparing the ringbuffer for adding new commands can fail (a timeout
whilst waiting for the GPU to catch up and free some space). So check
for any potential error before overwriting HEAD with new commands, and
propagate that error back to the user where possible.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The ringbuffer keeps a pointer to the parent device, so we can use that
instead of passing around the pointer on the stack.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This should fix the error along the reset path were we tried to clear the
tail register by setting it to 0, but were in fact setting it to the
current value and complaining when it did not reset to 0.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Based on an original patch by Zhenyu Wang, this initializes the BLT ring for
SandyBridge and enables support for user execbuffers.
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
... before someone tries to use it. The code both calls
intel_ring_begin/advance() and open-codes the bookkeeping performed by
those two functions.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In commit 8187a2b, the number of dwords used in the ringbuffer for
executing the batch buffer was erroneously changed from 2 to 4.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If userspace is submitting so many long running batches that the ring
becomes full, throttle by sleeping for a 1ms before checking for free
space. Simply yielding was causing excessive scheduler overhead whilst
making no progress.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
It's the same code, essentially, so kill all copies safe one unified
version.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
All functions are extremely similar, so fold them into one generic
implementation.
This function isn't used anyway, because there's not yet a bsd ring
error state dumper.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Two macros that use a base address for HWS_PGA were missing, add them.
Also switch the remaining users of *_ACTHD to the ring-base one.
Kill the other ring-specific macros because they're now unused.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[ickle: And silence checkpatch whilst in the vicinity]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Avoid cause latencies in other clients by not taking the global struct
mutex and moving the per-client request manipulation a local per-client
mutex. For example, this allows a compositor to schedule a page-flip
(through X) whilst an OpenGL application is monopolising the GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>