Because whatever.*
* This should contain a fairly long list of issues and still
unresolved resgressions, but I didn't really get a vote.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The ring will emit too many if semaphores are disabled since we do not
add the correct number to num_dwords anymore.
This was introduced:
commit 52ed23253b
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Mon Dec 16 20:50:38 2013 -0800
drm/i915: Don't emit mbox updates without semaphores
FWIW, the bug was fixed later in the series.
/me hangs head in shame.
Daniel: Also note that we should have merged the read-only semaphore
modparam before this patch.
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In very rare cases (such as a memory failure stress test) it is possible
to fill the entire ring without emitting a request. Under this
circumstance, the outstanding request is flushed and waited upon. After
space on the ring is cleared, we return to emitting the new command -
except that we just cleared the seqno allocated for this operation and
trigger the sanity check that a request is only ever emitted with a
valid seqno. The fix is to rearrange the code to make sure the
allocation of the seqno for this operation is after any required flushes
of outstanding operations.
The bug exists since the preallocation was introduced in
commit 9d7730914f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Nov 27 16:22:52 2012 +0000
drm/i915: Preallocate next seqno before touching the ring
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Aside from the fact that it leaves confusing dumps on error capture, it
is entirely unnecessary, and potentially harmful in cases like BDW,
where the instruction has changed.
In reality (seemingly), this will have no behavioral impact.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Added power well arguments to all the force wake routines
to help us individually control power well based on the
scenario.
Signed-off-by: Deepak S <deepak.s@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: Resolve conflict with the removed forcewake hack and drop one
spurious hunk Jesse noticed.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I believe, and an evening of i-g-t, that our original workaround for the
missed interrupts on Sandybridge, that of holding forcewake whilst we
wait for an interrupts, is no longer required. This leaves us dependent
on the second workaround of forcing an UC read of the ACTHD before
reading back the seqno from the snooped HWS. Dropping the forcewake
should allow us to conserve a little power, not much as the GPU is meant
to be busy whilst we wait for it!
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Don't issue the FBC nuke/cache clean command when invalidate_domains!=0.
That would indicate that we're not being called for the post-batch
flush.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This confused me some many times that I think it is appropriate to add a
small comment to instruct the reader of the code that it is indeed doing
what it is supposed to do.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The handling of the error interrupts isn't wired up at all. And it
hasn't been ever since ilk happened, so don't bother.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This was an oversight and should have been in a previous series
somewhere.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
PIPE_CONTROL added the high address dword. I'm not sure how the
simulator let me get away with this. I've explicitly left out all the
workarounds from Gen7 because in the minimal digging that I did, most
don't seem necessary, and the simulator doesn't complain without them
Note that BLT and BSD ring commands had already been updated previously.
Just render/pipe_control should have been broken.
v2: Squash in a fixup from Ville to follow the recent IVB PIPE_CONTROL
updates: "BDW uses the IVB PIPE_CONTROL style for specifying GTT vs.
PPGTT for the PIPE_CONTROL QW/DW write."
v3: Rebase on top of Chris' cleanup to have an explicit ring->scratch
buffer object instead of an opaque ring->private where everyone stores
the same stuff inside.
Reported-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (for the fixup)
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Squash in fix from Ben: Set PPGTT batches as necessary
This fixes the regression in the last couple of days when we enabled
PPGTT.
v3: Squash in fixup to still use GTT for secure batches from Ville:
BDW doesn't have a separate secure vs. non-secure bit in
MI_BATCH_BUFFER_START. So for secure batches we have to simply
leave the PPGTT bit unset. Fortunately older generations (except
HSW) had similar limitations so execbuffer already creates a GTT
mapping for all secure batches.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The code is more verbose than necessary for the reader's sake, hopefully
the compiler optimizes away the if.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The command to emit batch buffers has changed to address 48b addresses.
It seemed reasonable that we could still use the old instruction where
emitting 0 for length would do the right thing, but it seems to bother
the simulator when the code does that.
Now the second dword in the command has the upper 16b of the address of
the batchbuffer.
v2: Remove duplicated vfun assignment.
v3: Squash in VECS support changes from Zhao Yakui <yakui.zhao@intel.com>
v4: Make checkpatch happy.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The interrupt handling implementation remains the same as previous
generations with the 4 types of registers, status, identity, mask, and
enable. However the layout of where the bits go have changed entirely.
To address these changes, all of the interrupt vfuncs needed special
gen8 code.
The way it works is there is a top level status register now which
informs the interrupt service routine which unit caused the interrupt,
and therefore which interrupt registers to read to process the
interrupt. For display the division is quite logical, a set of interrupt
registers for each pipe, and in addition to those, a set each for "misc"
and port.
For GT the things get a bit hairy, as seen by the code. Each of the GT
units has it's own bits defined. They all look *very similar* and
resides in 16 bits of a GT register. As an example, RCS and BCS share
register 0. To compact the code a bit, at a slight expense to
complexity, this is exactly how the code works as well. 2 structures are
added to the ring buffer so that our ring buffer interrupt handling code
knows which ring shares the interrupt registers, and a shift value (ie.
the top or bottom 16 bits of the register).
The above allows us to kept the interrupt register caching scheme, the
per interrupt enables, and the code to mask and unmask interrupts
relatively clean (again at the cost of some more complexity).
Most of the GT units mentioned above are command streamers, and so the
symmetry should work quite well for even the yet to be implemented rings
which Broadwell adds.
v2: Fixes up a couple of bugs, and is more verbose about errors in the
Broadwell interrupt handler.
v3: fix DE_MISC IER offset
v4: Simplify interrupts:
I totally misread the docs the first time I implemented interrupts, and
so this should greatly simplify the mess. Unlike GEN6, we never touch
the regular mask registers in irq_get/put.
v5: Rebased on to of recent pch hotplug setup changes.
v6: Fixup on top of moving num_pipes to intel_info.
v7: Rebased on top of Egbert Eich's hpd irq handling rework. Also
wired up ibx_hpd_irq_setup for gen8.
v8: Rebase on top of Jani's asle handling rework.
v9: Rebase on top of Ben's VECS enabling for Haswell, where he
unfortunately went OCD on the gt irq #defines. Not that they're still
not yet fully consistent:
- Used the GT_RENDER_ #defines + bdw shifts.
- Dropped the shift from the L3_PARITY stuff, seemed clearer.
- s/irq_refcount/irq_refcount.gt/
v10: Squash in VECS enabling patches and the gen8_gt_irq_handler
refactoring from Zhao Yakui <yakui.zhao@intel.com>
v11: Rebase on top of the interrupt cleanups in upstream.
v12: Rebase on top of Ben's DPF changes in upstream.
v13: Drop bdw from the HAS_L3_DPF feature flag for now, it's unclear what
exactly needs to be done. Requested by Ben.
v14: Fix the patch.
- Drop the mask of reserved bits and assorted logic, it doesn't match
the spec.
- Do the posting read inconditionally instead of commenting it out.
- Add a GEN8_MASTER_IRQ_CONTROL definition and use it.
- Fix up the GEN8_PIPE interrupt defines and give the GEN8_ prefixes -
we actually will need to use them.
- Enclose macros in do {} while (0) (checkpatch).
- Clear DE_MISC interrupt bits only after having processed them.
- Fix whitespace fail (checkpatch).
- Fix overtly long lines where appropriate (checkpatch).
- Don't use typedef'ed private_t (maintainer-scripts).
- Align the function parameter list correctly.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v4)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
bikeshed
I had this lying around from he original PPGTT series, and thought we
might try to get it in by itself.
It's convenient to just call i915_gem_init_hw at reset because we'll be
adding new things to that function, and having just one function to call
instead of reimplementing it in two places is nice.
In order to accommodate we cleanup ringbuffers in order to bring them
back up cleanly. Optionally, we could also teardown/re initialize the
default context but this was causing some problems on reset which I
wasn't able to fully debug, and is unnecessary with the previous context
init/enable split.
This essentially reverts:
commit 8e88a2bd59
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Jun 19 18:40:00 2012 +0200
drm/i915: don't call modeset_init_hw in i915_reset
It seems to work for me on ILK now. Perhaps it's due to:
commit 8a5c2ae753
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Thu Mar 28 13:57:19 2013 -0700
drm/i915: fix ILK GPU reset for render
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that MMIO has been split up into gen specific functions it is
obvious when HAS_FPGA_DBG_UNCLAIMED, HAS_FORCE_WAKE are needed. As such,
we can remove this extraneous condition.
As a result of this, as well as previously existing function pointers
for forcewake, we no longer need the has_force_wake member in the device
specific data structure.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We'd only ever used this define to denote whether or not we have the
dynamic parity feature (DPF) and never to determine whether or not L3
exists. Baytrail is a good example of where L3 exists, and not DPF.
This patch provides clarify in the code for future use cases which might
want to actually query whether or not L3 exists.
v2: Add /* DPF == dynamic parity feature */
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Certain HSW SKUs have a second bank of L3. This L3 remapping has a
separate register set, and interrupt from the first "slice". A slice is
simply a term to define some subset of the GPU's l3 cache. This patch
implements both the interrupt handler, and ability to communicate with
userspace about this second slice.
v2: Remove redundant check about non-existent slice.
Change warning about interrupts of unknown slices to WARN_ON_ONCE
Handle the case where we get 2 slice interrupts concurrently, and switch
the tracking of interrupts to be non-destructive (all Ville)
Don't enable/mask the second slice parity interrupt for ivb/vlv (even
though all docs I can find claim it's rsvd) (Ville + Bryan)
Keep BYT excluded from L3 parity
v3: Fix the slice = ffs to be decremented by one (found by Ville). When
I initially did my testing on the series, I was using 1-based slice
counting, so this code was correct. Not sure why my simpler tests that
I've been running since then didn't pick it up sooner.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Ignoring the legacy DRI1 code, and a couple of special cases (to be
discussed later), all access to the ring is mediated through requests.
The first write to a ring will grab a seqno and mark the ring as having
an outstanding_lazy_request. Either through explicitly adding a request
after an execbuffer or through an implicit wait (either by the CPU or by
a semaphore), that sequence of writes will be terminated with a request.
So we can ellide all the intervening writes to the tail register and
send the entire command stream to the GPU at once. This will reduce the
number of *serialising* writes to the tail register by a factor or 3-5
times (depending upon architecture and number of workarounds, context
switches, etc involved). This becomes even more noticeable when the
register write is overloaded with a number of debugging tools. The
astute reader will wonder if it is then possible to overflow the ring
with a single command. It is not. When we start a command sequence to
the ring, we check for available space and issue a wait in case we have
not. The ring wait will in this case be forced to flush the outstanding
register write and then poll the ACTHD for sufficient space to continue.
The exception to the rule where everything is inside a request are a few
initialisation cases where we may want to write GPU commands via the CS
before userspace wakes up and page flips.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It is possible for us to be forced to perform an allocation for the lazy
request whilst running the shrinker. This allocation may fail, leaving
us unable to reclaim any memory leading to premature OOM. A neat
solution to the problem is to preallocate the request at the same time
as acquiring the seqno for the ring transaction. This means that we can
report ENOMEM prior to touching the rings.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Prior to preallocating an request for lazy emission, rename the existing
field to make way (and differentiate the seqno from the request struct).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We now have more devices using ring->private than not, and they all want
the same structure. Worse, I would like to use a scratch page from
outside of intel_ringbuffer.c and so for convenience would like to reuse
ring->private. Embed the object into the struct intel_ringbuffer so that
we can keep the code clean.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>