These registers are automatically incremented by the hardware during
transform feedback to track where the next streamed vertex output
should go. Unlike the previous generation, which had a packet for
setting the corresponding registers to a defined value, gen7 only has
MI_LOAD_REGISTER_IMM to do so. That's a secure packet (since it loads
an arbitrary register), so we need to do it from the kernel, and it
needs to be settable atomically with the batchbuffer execution so that
two clients doing transform feedback don't stomp on each other's
state.
Instead of building a more complicated interface involving setting the
registers to a specific value, just set them to 0 when asked and
userland can tweak its pointers accordingly.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
The docs say this is required for Gen7, and since the bit was added for
Gen6, we are also setting it there out of paranoia, particularly if, as
Chris points out, PIPE_CONTROL counts as a 3d state packet.
This was found through doc inspection by Ken and applies to Gen6+.
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
dev_priv keeps track of the current addressing mode that gets set at
execbuffer time. Unfortunately the existing code was doing this before
acquiring struct_mutex which leaves a race with another thread also
doing an execbuffer. If that wasn't bad enough, relocate_slow drops
struct_mutex which opens a much more likely error where another thread
comes in and modifies the state while relocate_slow is being slow.
The solution here is to just defer setting this state until we
absolutely need it, and we know we'll have struct_mutex for the
remainder of our code path.
v2: Keith noticed a bug in the original patch.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Keith Packard <keithp@keithp.com>
While I think the previous code is correct, it was hard to follow and
hard to debug. Since we already have a ring abstraction, might as well
use it to handle the semaphore updates and compares.
I don't expect this code to make semaphores better or worse, but you
never know...
v2:
Remove magic per Keith's suggestions.
Ran Daniel's gem_ring_sync_loop test on this.
v3:
Ignored one of Keith's suggestions.
v4:
Removed some bloat per Daniel's recommendation.
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
This reverts commit 4a684a4117.
Userland has always been required to set the object's domain to GTT
before using it through a GTT mapping; it's not something that the
kernel is supposed to enforce. (The pagefault support is so that we
can handle multiple mappings without userland having to pin across
them, not so that userland can use GTT after GPU domains without
telling the kernel).
Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl
firefox-talos-gfx on my T420 laptop.
Signed-off-by: Keith Packard <keithp@keithp.com>
Along the fast path for relocation handling, we attempt to copy directly
from the user data structures whilst holding our mutex. This causes
lockdep to warn about circular lock dependencies if we need to pagefault
the user pages. [Since when handling a page fault on a mmapped bo, we
need to acquire the struct mutex whilst already holding the mm
semaphore, it is then verboten to acquire the mm semaphore when already
holding the struct mutex. The likelihood of the user passing in the
relocations contained in a GTT mmapped bo is low, but conceivable for
extreme pathology.] In order to force the mm to return EFAULT rather
than handle the pagefault, we therefore need to disable pagefaults
across the relocation fast path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Andi Kleen narrowed his GPU hangs on his Sugar Bay (SNB desktop) rev 09
down to the use of GPU semaphores, and we already know that they appear
broken up to Huron River (mobile) rev 08. (I'm optimistic that disabling
GPU semaphores is simply hiding another bug by the latency and
side-effects of the additional device interaction it introduces...)
However, use of semaphores is a massive performance improvement... Only
as long as the system remains stable. Enable at your peril.
Reported-by: Andi Kleen <andi-fd@firstfloor.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33921
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This seems to be running stably on my test laptop, so hopefully the
reported hangs were just symptoms of other bugs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Userspace has a legitimate requirement to use a delta that points to
outside of the target bo, and so we need to enable this. (As this is an
abi break, albeit a relaxation of the current restrictions, mark the change
with a new flag.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The code paths for modesetting are growing in complexity as we may need
to move the buffers around in order to fit the scanout in the aperture.
Therefore we face a choice as to whether to thread the interruptible status
through the entire pinning and unbinding code paths or to add a flag to
the device when we may not be interrupted by a signal. This does the
latter and so fixes a few instances of modesetting failures under stress.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we just need a temporary array whilst performing the relocations for
the execbuffer, first attempt to allocate using kmalloc even if it is
not an order-0 allocation. This avoids the overhead of remapping the
discontiguous array and so gives a moderate boost to execution
throughput.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Dave Airlie spotted that we had a potential bug should we ever rearrange
the drm_i915_gem_object so that the base drm_gem_object was not its first
member. He noticed that we often convert the return of
drm_gem_object_lookup() immediately into drm_i915_gem_object and then
check the result for nullity. This is only valid when the base object is
the first member and so the superobject has the same address. Play safe
instead and use the compiler to convert back to the original return
address for sanity testing.
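The hazard described above can be shown in plain userspace C. This is only a
sketch under assumptions: the struct names, the lookup helper and the local
container_of macro are hypothetical stand-ins, not the driver's code. The
point is that the NULL test is applied to the pointer the lookup returned,
before converting to the containing type.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(); an assumption made
 * for illustration only. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical types: the base object is deliberately NOT the first
 * member, so the two structs do not share an address. */
struct base_object {
	int handle;
};

struct super_object {
	int driver_private;
	struct base_object base;
};

/* Hypothetical lookup that returns NULL for an unknown handle. */
struct base_object *lookup(struct super_object *table, size_t n, int handle)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].base.handle == handle)
			return &table[i].base;
	return NULL;
}

/* Safe pattern: check the pointer that was actually returned, and only
 * then convert it. Converting first would subtract the member offset
 * from NULL, and a later NULL check on the result would not fire. */
struct super_object *lookup_super(struct super_object *table, size_t n,
				  int handle)
{
	struct base_object *base = lookup(table, n, handle);

	if (base == NULL)
		return NULL;
	return container_of(base, struct super_object, base);
}
```

With a two-entry table, lookup_super() returns the matching entry for a known
handle and NULL for an unknown one, regardless of where the base member sits.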
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
A lot of minor tweaks to fix the tracepoints, improve the outputting for
ftrace, and to generally make the tracepoints useful again. It is a start
and enough to begin identifying performance issues and gaps in our
coverage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
There are I915_NUM_RINGS-1 inter-ring synchronisation counters, but we
were clearing I915_NUM_RINGS of them. Oops.
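A minimal sketch of the off-by-one, assuming a hypothetical ring count and
struct layout (neither is taken from the driver). Deriving the loop bound
from the array itself, rather than from the ring count, keeps the clear
inside the array.

```c
#include <stddef.h>

/* Hypothetical ring count; the real I915_NUM_RINGS value is not stated
 * in this log. */
#define NUM_RINGS 3

/* Hypothetical ring state: one counter per OTHER ring, so the array has
 * NUM_RINGS - 1 entries. */
struct ring_state {
	unsigned int sync_seqno[NUM_RINGS - 1];
	unsigned int next_field; /* whatever happens to follow the array */
};

/* Clear exactly the counters that exist. Looping to NUM_RINGS instead
 * would write one element past the end of sync_seqno. */
void clear_sync_counters(struct ring_state *ring)
{
	size_t count = sizeof(ring->sync_seqno) / sizeof(ring->sync_seqno[0]);

	for (size_t i = 0; i < count; i++)
		ring->sync_seqno[i] = 0;
}
```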
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Tested-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Move code around and invoke iomem annotation in a few more places in
order to silence sparse. Still a few more iomem annotations to go...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Hopefully, this is a temporary measure whilst the root cause is
understood. At the moment, we experience a hard hang whilst looping
urbanterror that has been identified as a result of the use of
semaphores, but so far only on SNB mobile.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32752
Tested-by: mengmeng.meng@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
After reordering the sequence of relocating objects, commit 6fe4f1404,
we can no longer rely on seeing all reloc targets prior to performing
the relocation. As a result we were ignoring the need to flush objects
from the render cache and invalidate the sampler caches, resulting in
rendering glitches. So we need to clear the relocation domains earlier.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
On the fault path, commit 6fe4f140 introduced a regression whereby it
changed the sequence of the objects but continued to use the original
ordering of relocation entries. The result was that incorrect GTT offsets
were being fed into the execbuffer causing lots of misrendering and
potential hangs.
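The mismatch can be sketched as follows, assuming hypothetical types and a
hypothetical per-object offset table (this is an illustration, not the
driver's code). Once the objects have been reordered, the relocation
entries for an object must be found through its original submission index,
not through its current position in the list.

```c
#include <stddef.h>

/* Hypothetical object record that remembers where userspace originally
 * placed it in the submitted array. */
struct exec_object {
	int handle;
	size_t exec_index; /* position in the array as submitted */
};

/* reloc_offset[i] is the index of the first relocation entry belonging
 * to the object that was submitted at position i.
 *
 * Returns the first relocation entry for objs[pos], where objs may have
 * been reordered since submission. Using reloc_offset[pos] here would
 * pick up another object's relocations. */
size_t reloc_start(const struct exec_object *objs, size_t pos,
		   const size_t *reloc_offset)
{
	return reloc_offset[objs[pos].exec_index];
}
```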
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the mappable portion of the aperture is always a small subset at the
start of the GTT, it is allocated preferentially by drm_mm. This is
useful in case we ever need to map an object later. However, if you have
a large object that can consume the entire mappable region of the
GTT, this prevents the batchbuffer from fitting and so causes an error.
Instead allocate all those that require a mapping up front in order to
improve the likelihood of finding sufficient space to bind them.
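One way to picture the ordering is a stable partition that moves the
requests needing a mappable placement to the front. The types and function
name below are hypothetical, and the real driver operates on its own object
lists; this is only a sketch of the idea.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bind request. */
struct bind_request {
	int handle;
	bool needs_mappable;
};

/* Move every request that needs the mappable region to the front,
 * preserving relative order within each group. Quadratic, which is
 * acceptable for a sketch. */
void order_for_binding(struct bind_request *reqs, size_t n)
{
	size_t front = 0; /* next slot for a mappable request */

	for (size_t i = 0; i < n; i++) {
		if (!reqs[i].needs_mappable)
			continue;

		struct bind_request tmp = reqs[i];

		/* Shift the intervening non-mappable requests down by one. */
		for (size_t j = i; j > front; j--)
			reqs[j] = reqs[j - 1];
		reqs[front++] = tmp;
	}
}
```

Binding in the resulting order gives the mappable requests first pick of the
small region at the start of the GTT.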
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Before releasing the lock in order to copy the relocation list from user
pages, we need to drop all the object references as another thread may
usurp and execute another batchbuffer before we reacquire the lock.
However, the code was buggy and failed to clear the list...
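A minimal sketch of the intended behaviour, using a hypothetical singly
linked list and a plain integer reference count (not the driver's data
structures). Each object is unlinked as its reference is dropped, so the
list is empty by the time the lock would be released.

```c
#include <stddef.h>

/* Hypothetical tracked object with a simple reference count. */
struct tracked_obj {
	int refcount;
	struct tracked_obj *next;
};

struct obj_list {
	struct tracked_obj *head;
};

/* Drop one reference on every object AND unlink it from the list.
 * Dropping the references while leaving the entries linked would leave
 * stale pointers behind for whoever walks the list next. */
void drop_all_references(struct obj_list *list)
{
	while (list->head != NULL) {
		struct tracked_obj *obj = list->head;

		list->head = obj->next; /* unlink before releasing */
		obj->next = NULL;
		obj->refcount--;
	}
}
```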
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org