Commit Graph

75 Commits

Author SHA1 Message Date
Chris Wilson a7b9761d0a drm/i915: Split i915_gem_flush_ring() into seperate invalidate/flush funcs
By moving the function to intel_ringbuffer and currying the appropriate
parameter, hopefully we make the callsites easier to read and
understand.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:55 +02:00
Chris Wilson 016fd0c1ae drm/i915: Clear the pending_gpu_fenced_access flag at the start of execbuffer
Otherwise once we use the buffer with a BLT command on gen2/3, we will
always regard future command submissions as continuing the fenced
access. However, now that we flush/invalidate between every batch we can
drop this pessimism.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:55 +02:00
Daniel Vetter 6ac42f4148 drm/i915: Replace the complex flushing logic with simple invalidate/flush all
Now that we unconditionally flush and invalidate between every batch
buffer, we no longer need the complex logic to decide which domains
require flushing. Remove it and rejoice.

v2 (danvet): Keep around the flip waiting logic. It's gross and
broken, I know, but we can't just kill that thing ... even if we just
keep it around as a reminder that things are broken.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:54 +02:00
Chris Wilson 69c2fc8913 drm/i915: Remove the per-ring write list
This is now handled by a global flag to ensure we emit a flush before
the next serialisation point (if we failed to queue one previously).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:53 +02:00
Chris Wilson 0201f1ecf4 drm/i915: Replace the pending_gpu_write flag with an explicit seqno
As we always flush the GPU cache prior to emitting the breadcrumb, we no
longer have to worry about the deferred flush causing the
pending_gpu_write to be delayed. So we can instead utilize the known
last_write_seqno to hopefully minimise the wait times.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:52 +02:00
Chris Wilson 3bb73aba1e drm/i915: Allow late allocation of request for i915_add_request()
Request preallocation was added to i915_add_request() in order to
support the overlay. However, not all users care and can quite happily
ignore the failure to allocate the request as they will simply repeat
the request in the future.

By pushing the allocation down into i915_add_request(), we can then
remove some rather ugly error handling in the callers.

v2: Nullify request->file_priv otherwise we chase a garbage pointer
when retiring requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:51 +02:00
Eric Anholt 0da5cec1de drm/i915: Set the context before setting up regs for the context.
Fixes failures in transform feedback on gen7 because our SOL_RESET
flag was setting the transform feedback offsets in the old context
(occasionally happened to be ours) instead of the new context.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 10:39:59 +02:00
Chris Wilson 09cf7c9a12 drm/i915: Insert a flush between batches if the breadcrumb was dropped
If we drop the breadcrumb request after a batch due to a signal for
example we aim to fix it up at the next opportunity. In this case we
emit a second batchbuffer with no waits upon the first and so no
opportunity to insert the missing request, so we need to emit the
missing flush for coherency. (Note that that invalidating the render
cache is the same as flushing it, so there should have been no
observable corruption.)

Note that beside simply adding the missing flush, avoiding potential
render corruption, this will also fix at least parts of the problem
introduced by some funny interaction of these two commits:

commit de2b998552
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jul 4 22:52:50 2012 +0200

    drm/i915: don't return a spurious -EIO from intel_ring_begin

which allowed intel_ring_begin to return -ERESTARTSYS and

commit cc889e0f6c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jun 13 20:45:19 2012 +0200

    drm/i915: disable flushing_list/gpu_write_list

which essentially disabled the flushing list.

The issue happens when we submit a batch & emit it, but get
interrupted (thanks to the first patch) while trying to emit the
flush. On the next batch we still assume that the full gpu domain
handling is in effect and hence compute the invalidate&flushing
domains. But thanks to the 2nd patch we totally ignore these and only
invalidate all gpu domains, presuming that any required flushes have
been issued already.  Which is wrong and eventually results in us
updating the new write_domain values with the computed
pending_write_domain values, which leaves an object with write_domain
== 0 on the gpu_write_list.

As soon as we try to unbind that object, things blow up.

Fix this by emitting the missing flush according to the new
ring->gpu_caches_dirty flag.

Note that this does _not_ fix all the current cases where we end up
with an object on the flushing_list that can't be flushed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52040
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add bug explanation to commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-20 12:21:40 +02:00
Daniel Vetter cc889e0f6c drm/i915: disable flushing_list/gpu_write_list
This is just the minimal patch to disable all this code so that we can
do decent amounts of QA before we rip it all out.

The complicating thing is that we need to flush the gpu caches after
the batchbuffer is emitted. Which is past the point of no return where
execbuffer can't fail any more (otherwise we risk submitting the same
batch multiple times).

Hence we need to add a flag to track whether any caches associated
with that ring are dirty. And emit the flush in add_request if that's
the case.

Note that this has a quite a few behaviour changes:
- Caches get flushed/invalidated unconditionally.
- Invalidation now happens after potential inter-ring sync.

I've bantered around a bit with Chris on irc whether this fixes
anything, and it might or might not. The only thing clear is that with
these changes it's much easier to reason about correctness.

Also rip out a lone get_next_request_seqno in the execbuffer
retire_commands function. I've dug around and I couldn't figure out
why that is still there, with the outstanding lazy request stuff it
shouldn't be necessary.

v2: Chris Wilson complained that I also invalidate the read caches
when flushing after a batchbuffer. Now optimized.

v3: Added some comments to explain the new flushing behaviour.

Cc: Eric Anholt <eric@anholt.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-06-20 13:54:28 +02:00
Ben Widawsky 6e0a69dbc8 drm/i915/context: switch contexts with execbuf2
Use the rsvd1 field in execbuf2 to specify the context ID associated
with the workload. This will allow the driver to do the proper context
switch when/if needed.

v2: Add checks for context switches on rings not supporting contexts.
Before the code would silently ignore such requests.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
2012-06-14 17:36:21 +02:00
Chris Wilson a15817cf16 drm/i915: Check whether the ring is initialised prior to dispatch
Rather than use the magic feature tests HAS_BLT/HAS_BSD just check
whether the ring we are about to dispatch the execbuffer on is
initialised.

v2: Use intel_ring_initialized()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-19 22:39:53 +02:00
Chris Wilson acb87dfb4b drm/i915: Limit calling mark-busy only for potential scanouts
The principle of intel_mark_busy() is that we want to spot the
transition of when the display engine is being used in order to bump
powersaving modes and increase display clocks. As such it is only
important when the display is changing, i.e. when rendering to the
scanout or other sprite/plane, and these are characterised by being
pinned.

v2: Mark the whole device as busy on execbuffer and pageflips as well
and rebase against dinq for the minor bug fix to be immediately
applicable.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: fix compile fail.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-08 15:10:34 +02:00
Daniel Vetter 5e13a0c5ec Merge remote-tracking branch 'airlied/drm-core-next' into drm-intel-next-queued
Backmerge of drm-next to resolve a few ugly conflicts and to get a few
fixes from 3.4-rc6 (which drm-next has already merged). Note that this
merge also restricts the stencil cache lra evict policy workaround to
snb (as it should) - I had to frob the code anyway because the
CM0_MASK_SHIFT define died in the masked bit cleanups.

We need the backmerge to get Paulo Zanoni's infoframe regression fix
for gm45 - further bugfixes from him touch the same area and would
needlessly conflict.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-08 13:39:59 +02:00
Daniel Vetter dc257cf154 Merge tag 'v3.4-rc6' into drm-intel-next
Conflicts:
	drivers/gpu/drm/i915/intel_display.c

Ok, this is a fun story of git totally messing things up. There
/shouldn't/ be any conflict in here, because the fixes in -rc6 do only
touch functions that have not been changed in -next.

The offending commits in drm-next are 14415745b2..1fa611065 which
simply move a few functions from intel_display.c to intel_pm.c. The
problem seems to be that git diff gets completely confused:

$ git diff 14415745b2..1fa611065

is a nice mess in intel_display.c, and the diff leaks into totally
unrelated functions, whereas

$git diff --minimal  14415745b2..1fa611065

is exactly what we want.

Unfortunately there seems to be no way to teach similar smarts to the
merge diff and conflict generation code, because with the minimal diff
there really shouldn't be any conflicts. For added hilarity, every
time something in that area changes the + and - lines in the diff move
around like crazy, again resulting in new conflicts. So I fear this
mess will stay with us for a little longer (and might result in
another backmerge down the road).

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-07 14:02:14 +02:00
Daniel Vetter 6ebebc9206 drm/i915: disallow clip rects on gen5+
Unfortunately there has been dri1 userspace that used gem to manage
the gtt and hence also needed cliprects in the execbuf ioctl. So
we can't ever remove that code without breaking the ioctl abi.

But at least we can disable it on gen5+, because these horrible
versions of mesa have not supported these chips.

Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:29 +02:00
Ben Widawsky b2da9fe5d5 drm/i915: remove do_retire from i915_wait_request
This originates from a hack by me to quickly fix a bug in an earlier
patch where we needed control over whether or not waiting on a seqno
actually did any retire list processing. Since the two operations aren't
clearly related, we should pull the parameter out of the wait function,
and make the caller responsible for retiring if the action is desired.

The only function call site which did not get an explicit retire_request call
(on purpose) is i915_gem_inactive_shrink(). That code was already calling
retire_request a second time.

v2: don't modify any behavior excepit i915_gem_inactive_shrink(Daniel)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-05-03 11:18:20 +02:00
Xi Wang 44afb3a043 drm/i915: fix integer overflow in i915_gem_do_execbuffer()
On 32-bit systems, a large args->num_cliprects from userspace via ioctl
may overflow the allocation size, leading to out-of-bounds access.

This vulnerability was introduced in commit 432e58ed ("drm/i915: Avoid
allocation for execbuffer object list").

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-23 22:32:15 +02:00
Xi Wang ed8cd3b2cd drm/i915: fix integer overflow in i915_gem_execbuffer2()
On 32-bit systems, a large args->buffer_count from userspace via ioctl
may overflow the allocation size, leading to out-of-bounds access.

This vulnerability was introduced in commit 8408c282 ("drm/i915:
First try a normal large kmalloc for the temporary exec buffers").

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-23 22:32:02 +02:00
Chris Wilson 06d9813157 drm/i915: Remove the pipelined parameter from get_fence()
We never succeeded in getting pipelined fencing to work (unresolved
spurious GPU hangs), so begin the process of dismantling and removal
the broken code.

Step 1 is the removal of the pipeline parameter to get_fence().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-18 13:15:43 +02:00
Chris Wilson 7b09638f45 drm/i915: Always flush tiling changes before accessing through the GTT
As we defer updating the fence register from set-tiling to the point of
use, we need to declare every access through the GTT as either fenced or
unfenced.

This patches fixes an old bug in the execbuffer relocation processing
which could conceivably be hit by a pathological userspace.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-18 10:48:38 +02:00
Ben Widawsky 2911a35b2e drm/i915: use semaphores for the display plane
In theory this will have performance and power improvements. Performance
because we don't need to stall when the scanout BO is busy, and power
because we don't have to stall when the BO is busy (and the ring can
even go to sleep if the HW supports it).

v2:
squash 2 patches into 1 (me)
un-inline the enable_semaphores function (Daniel)
remove comment about SNB hangs from i915_gem_object_sync (Chris)
rename intel_enable_semaphores to i915_semaphore_is_enabled (me)
removed page flip comment; "no why" (Chris)

To address other comments from Daniel (irc):
update the comment to say 'vt-d is crap, don't enable semaphores'
  - I think you misinterpreted Chris' comment, it already exists.
checking out whether we can pageflip on the render ring on ivb (didn't
work on early silicon)
  - We don't want to enable workarounds for early silicon unless we have
    to.
  - I can't find any references in the docs about this.
optionally use it if the fb is already busy on the render ring
  - This should be how the code already worked, unless I am
    misunderstanding your meaning.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:05 +02:00
Chris Wilson 9a5a53b392 drm/i915: Reorganise rules for get_fence/put_fence
By simplifying the rules to calling get_fence when writing to the
through the GTT in a tiled manner, and calling put_fence before writing
to the object through the GTT in a linear manner, the code becomes
clearer and there is less chance of making a mistake.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: fixed up conflict with ppgtt code and spelling in a new
comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-12 21:14:04 +02:00
Dave Airlie effbc4fd8e Merge branch 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel into drm-core-next
Daniel Vetter wrote
First pull request for 3.5-next, slightly large than usual because new
things kept coming in since the last pull for 3.4.
Highlights:
- first batch of hw enablement for vlv (Jesse et al) and hsw (Eugeni). pci
 ids are not yet added, and there's still quite a few patches to merge
 (mostly modesetting). To make QA easier I've decided to merge this stuff
 in pieces.
- loads of cleanups and prep patches spurred by the above. Especially vlv
 is a real frankenstein chip, but also hsw is stretching our driver's
 code design. Expect more to come in this area for 3.5.
- more gmbus fixes, cleanups and improvements by Daniel Kurtz. Again,
 there are more patches needed (and some already queued up), but I wanted
 to split this a bit for better testing.
- pwrite/pread rework and retuning. This series has been in the works for
 a few months already and a lot of i-g-t tests have been created for it.
 Now it's finally ready to be merged.  Note that one patch in this series
 touches include/pagemap.h, that patch is acked-by akpm.
- reduce mappable pressure and relocation throughput improvements from
 Chris.
- mmap offset exhaustion mitigation by Chris Wilson.
- a start at figuring out which codepaths in our messy dri1/ums+gem/kms
 driver we actually need to support by bailing out of unsupported case.
 The driver now refuses to load without kms on gen6+ and disallows a few
 ioctls that userspace never used in certain cases. More of this will
 definitely come.
- More decoupling of global gtt and ppgtt.
- Improved dual-link lvds detection by Takashi Iwai.
- Shut up the compiler + plus fix the fallout (Ben)
- Inverted panel brightness handling (mostly Acer manages to break things
 in this way).
- Small fixlets and adjustements and some minor things to help debugging.

Regression-wise QA reported quite a few issues on ivb, but all of them
turned out to be hw stability issues which are already fixed in
drm-intel-fixes (QA runs the nightly regression tests on -next alone,
without -fixes automatically merged in). There's still one issue open on
snb, it looks like occlusion query writes are not quite as cache coherent
as we've expected. With some of the pwrite adjustements we can now
reliably hit this. Kernel workaround for it is in the works."

* 'drm-intel-next' of git://people.freedesktop.org/~danvet/drm-intel: (101 commits)
  drm/i915: VCS is not the last ring
  drm/i915: Add a dual link lvds quirk for MacBook Pro 8,2
  drm/i915: make quirks more verbose
  drm/i915: dump the DMA fetch addr register on pre-gen6
  drm/i915/sdvo: Include YRPB as an additional TV output type
  drm/i915: disallow gem init ioctl on ilk
  drm/i915: refuse to load on gen6+ without kms
  drm/i915: extract gt interrupt handler
  drm/i915: use render gen to switch ring irq functions
  drm/i915: rip out old HWSTAM missed irq WA for vlv
  drm/i915: open code gen6+ ring irqs
  drm/i915: ring irq cleanups
  drm/i915: add SFUSE_STRAP registers for digital port detection
  drm/i915: add WM_LINETIME registers
  drm/i915: add WRPLL clocks
  drm/i915: add LCPLL control registers
  drm/i915: add SSC offsets for SBI access
  drm/i915: add port clock selection support for HSW
  drm/i915: add S PLL control
  drm/i915: add PIXCLK_GATE register
  ...

Conflicts:
	drivers/char/agp/intel-agp.h
	drivers/char/agp/intel-gtt.c
	drivers/gpu/drm/i915/i915_debugfs.c
2012-04-12 10:27:01 +01:00
Chris Wilson 7dd4906586 drm/i915: Mark untiled BLT commands as fenced on gen2/3
The BLT commands on gen2/3 utilize the fence registers and so we cannot
modify any fences for the object whilst those commands are in flight.
Currently we marked tiled commands as occupying a fence, but forgot to
restrict the untiled commands from preventing a fence being assigned
before they were completed.

One side-effect is that we ten have to double check that a fence was
allocated for a fenced buffer during move-to-active.

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Testcase: i-g-t/tests/gem_tiled_after_untiled_blt
Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-04-01 12:26:05 +02:00
Daniel Vetter f56f821feb mm: extend prefault helpers to fault in more than PAGE_SIZE
drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.

Add new functions in filemap.h to make that possible.

Also kill a copy&pasted spurious space in both functions while at it.

v2: As suggested by Andrew Morton, add a multipage parameter to both
functions to avoid the additional branch for the pagemap.c hotpath.
My gcc 4.6 here seems to dtrt and indeed reap these branches where not
needed.

v3: Becaus I couldn't find a way around adding a uaddr += PAGE_SIZE to
the filemap.c hotpaths (that the compiler couldn't remove again),
let's go with separate new functions for the multipage use-case.

v4: Adjust comment to CodingStlye and fix spelling.

Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-03-27 13:36:30 +02:00