Commit Graph

483210 Commits

Author SHA1 Message Date
Imre Deak dbea3cea69 drm/i915: sanitize RPS resetting during GPU reset
Atm, we don't disable RPS interrupts and related work items before
resetting the GPU. This may interfere with the following GPU
initialization and cause RPS interrupts to show up in PM_IIR too early
before calling gen6_enable_rps_interrupts() (triggering a WARN there).

Solve this by disabling RPS interrupts and flushing any related work
items before resetting the GPU.

v2:
- split out the common parts of the gt suspend and the new gt reset
  functions (Paulo)
v3:
- remove the check for UMS, it's a NOP nowadays (Daniel)

Reported-by: He, Shuang <shuang.he@intel.com>
Testcase: igt/gem_reset_stats/ban-render
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86644
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-12-15 19:14:04 +02:00
Imre Deak 78e68d36da drm/i915: move RPS PM_IER enabling to gen6_enable_rps_interrupts
Paulo noticed that we don't enable RPS interrupts via PM_IER in
gen6_enable_rps_interrupts(). This wasn't a problem so far, since the
only place we disabled RPS interrupts was during system/runtime suspend
and after that we reenable all interrupts in the IRQ pre/postinstall
hooks.

In the next patch we'll disable/reenable RPS interrupts during GPU reset
too, but not call IRQ uninstall, pre/postinstall hooks, so there the
above wouldn't work. The logical place for programming PM_IER is
gen6_enable_rps_interrupts() and this also makes the function more
symmetric with gen6_disable_rps_interrupts(), so move the programming
there from the postinstall hooks.

Note that these changes don't affect the ILK RPS interrupt code, which
could be sanitized in a similar way. But that can be done as a
follow-up.

Credits-to: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-12-15 19:13:46 +02:00
Imre Deak c352d1ba1e drm/i915: vlv: fix IRQ masking when uninstalling interrupts
irq_mask should include all IRQ bits that we want to mask, but atm we
set it incorrectly to the inverse of this. If the mask is used
subsequently to enable/disable some IRQ bits, we may unintentionally
unmask unrelated IRQs. I can't see any way that this can lead to a real
problem in the current -nightly code, since the first place the mask
will be used next (after a suspend/resume cycle) is in
valleyview_irq_postinstall(), but the mask is reset there to its proper
value.

This causes a problem in the upstream kernel though, where - due to another
issue - the mask is used in the above way to disable only the display IRQs.
This other issue is fixed by:

commit 950eabaf5a
Author: Imre Deak <imre.deak@intel.com>
Date:   Mon Sep 8 15:21:09 2014 +0300

    drm/i915: vlv: fix display IRQ enable/disable

Interestingly, even with the above two bugs, we shouldn't in theory have
any real problems (arguably a famous last sentence:). That's because
even if we unmask something unintentionally via the VLV_IMR/VLV_IER
register the master IRQ masking bit in VLV_MASTER_IER is still set and
should prevent all i915 interrupts. According to my testing on an ASUS
T100 with DSI output this isn't the case at least with the
MIPIA_INTERRUPT. Leaving this one unmasked in IMR/IER, while having
VLV_MASTER_IER set to 0 may lead to a lockup during system suspend as
shown in the bugzilla ticket below. This fix should get rid of the
problem reported there in upstream and older kernels.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85920
Cc: stable@vger.kernel.org (v3.15+)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-12-11 16:06:10 +02:00
Jesse Barnes 9f49c37635 drm/i915: save/restore GMBUS freq across suspend/resume on gen4
Should probably just init this in the GMbus code all the time, based on
the cdclk and HPLL like we do on newer platforms.  Ville has code for
that in a rework branch, but until then we can fix this bug fairly
easily.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76301
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Nikolay <mar.kolya@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-12-11 15:31:59 +02:00
Damien Lespiau 26459343e0 drm/i915: Remove '& 0xffff' from the mask given to WA_REG()
We may be hidding bugs by doing that, so let remove it and have the
actual mask value shine through, for better or worse.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-12-10 16:39:24 +02:00
Damien Lespiau cf4b0de6a3 drm/i915: Invert the mask and val arguments in wa_add() and WA_REG()
While trying to unify the order of those arguments throughout the
driver, Daniel noticed what we were inverting them in this part of the
code.

Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-12-10 16:33:30 +02:00
Damien Lespiau 98533251b0 drm/i915/bdw: Fix the write setting up the WIZ hashing mode
I was playing with clang and oh surprise! a warning trigerred by
-Wshift-overflow (gcc doesn't have this one):

    WA_SET_BIT_MASKED(GEN7_GT_MODE,
                      GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);

    drivers/gpu/drm/i915/intel_ringbuffer.c:786:2: warning: signed shift result
      (0x28002000000) requires 43 bits to represent, but 'int' only has 32 bits
      [-Wshift-overflow]
        WA_SET_BIT_MASKED(GEN7_GT_MODE,
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    drivers/gpu/drm/i915/intel_ringbuffer.c:737:15: note: expanded from macro
      'WA_SET_BIT_MASKED'
        WA_REG(addr, _MASKED_BIT_ENABLE(mask), (mask) & 0xffff)

Turned out GEN6_WIZ_HASHING_MASK was already shifted by 16, and we were
trying to shift it a bit more.

The other thing is that it's not the usual case of setting WA bits here, we
need to have separate mask and value.

To fix this, I've introduced a new _MASKED_FIELD() macro that takes both the
(unshifted) mask and the desired value and the rest of the patch ripples
through from it.

This bug was introduced when reworking the WA emission in:

  Commit 7225342ab5
  Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
  Date:   Tue Oct 7 17:21:26 2014 +0300

      drm/i915: Build workaround list in ring initialization

v2: Invert the order of the mask and value arguments (Daniel Vetter)
    Rewrite _MASKED_BIT_ENABLE() and _MASKED_BIT_DISABLE() with
    _MASKED_FIELD() (Jani Nikula)
    Make sure we only evaluate 'a' once in _MASKED_BIT_ENABLE() (Dave Gordon)
    Add check to ensure the value is within the mask boundaries (Chris Wilson)

v3: Ensure the the value and mask are 16 bits (Dave Gordon)

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-12-10 11:20:46 +02:00
Daniel Vetter 0b6d24c019 drm/i915: Don't complain about stolen conflicts on gen3
Apparently stuff works that way on those machines.

I agree with Chris' concern that this is a bit risky but imo worth a
shot in -next just for fun. Afaics all these machines have the pci
resources allocated like that by the BIOS, so I suspect that it's all
ok.

This regression goes back to

commit eaba1b8f33
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jul 4 12:28:35 2013 +0100

    drm/i915: Verify that our stolen memory doesn't conflict

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76983
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71031
Tested-by: lu hua <huax.lu@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Tested-by: Paul Menzel <paulepanter@users.sourceforge.net>
Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-12-10 11:11:28 +02:00
Dave Airlie e7d6f7d708 drm/i915: resume MST after reading back hw state
Otherwise the MST resume paths can hit DPMS paths
which hit state checker paths, which hit WARN_ON,
because the state checker is inconsistent with the
hw.

This fixes a bunch of WARN_ON's on resume after
undocking.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-12-08 14:07:52 +02:00
Daniel Vetter 9cca306880 drm/i915: Handle inaccurate time conversion issues
So apparently jiffies<->nsec<->ktime isn't accurate or something. At
elast if we timeout there's occasionally still a few hundred us left
(in a 2 second timeout).

Stuff I've tried and thrown out again:
- Sampling the before timestamp before jiffies. Doesn't improve test
  path rate at all.
- Using jiffies. Way to inaccurate, which means way too much drift
  with signals plus automatic ioctl restarting in userspace. In
  hindsight we should have used an absolute timeout, but hey we need
  something for v3 of the i915 gem wait interfaces ;-)
- Trying to figure out where accuracy gets lost. gl testcase really
  don't care all that much about this (as long as isn't not massively
  off), it's just that the testcase gets a bit upset if it receives an
  EITME with timeout > 0.

So as long as we're in the ballbark it's good enough. So patch
everything up if we're at most one jiffies off. I get's me a solid
test again.

This regression is probably introduced in

commit 5ed0bdf21a
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jul 16 21:05:06 2014 +0000

    drm: i915: Use nsec based interfaces

    Use ktime_get_raw_ns() and get rid of the back and forth timespec
    conversions.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Signed-off-by: John Stultz <john.stultz@linaro.org>

Probably because I'm too lazy to confirm myself and still waiting for
QA ;-)

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82749
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-12-05 15:24:45 +02:00
Daniel Vetter 7bd0e226e3 drm/i915: compute wait_ioctl timeout correctly
We've lost the +1 required for correct timeouts in

commit 5ed0bdf21a
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Jul 16 21:05:06 2014 +0000

    drm: i915: Use nsec based interfaces

    Use ktime_get_raw_ns() and get rid of the back and forth timespec
    conversions.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Signed-off-by: John Stultz <john.stultz@linaro.org>

So fix this up by reinstating our handrolled _timeout function. While
at it bother with handling MAX_JIFFIES.

v2: Convert to usecs (we don't care about the accuracy anyway) first
to avoid overflow issues Dave Gordon spotted.

v3: Drop the explicit MAX_JIFFY_OFFSET check, usecs_to_jiffies should
take care of that already. It might be a bit too enthusiastic about it
though.

v4: Chris has a much nicer color, so use his implementation.

This requires to export nsec_to_jiffies from time.c.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82749
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-12-05 15:20:24 +02:00
Jesse Barnes af15d2ce5d drm/i915: don't always do full mode sets when infoframes are enabled
Partial revert of

commit 206645910b
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Wed Nov 5 14:26:09 2014 -0800

    drm/i915: check for audio and infoframe changes across mode sets v2

References: https://bugs.freedesktop.org/show_bug.cgi?id=86683
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Tested-by: Li Xu <li.l.xu@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2014-12-05 15:03:46 +02:00
Ville Syrjälä 00f0b37810 drm/i915: Reject modeset when the same digital port is used more than once
On pre-HSW we have two encoders per digital port: one HDMI, one DP.
However they are the same physical port in hardware and we can't enable
both at the same time. Reject the modeset if the user attempts this.

So far we've been saved by the fact that we never see both HDMI and DP
connectors as connected. But if the user decides to force a mode anyway,
all kinds of funny stuff might happen.

Unfortunately we don't seem to have any way to inform userspace that
such configurations are invalid except by returning an error from
setcrtc. possible_clones only covers real cloning situations, and
looking at the connector names doesn't work either since we don't
always register both connectors for the same port. I suppose the
only way to fix that would be to expose only a single encoder per
digital port like we do on HSW+ but that would be a fairly large
undertaking for little gain.

kms_setmode hits this since it forces modes on non-connected VGA and
HDMI connectors. Previosuly it just resulted in weirdness such as
failed link training. With this patch it will now get an error back
from the kernel and will die with an assert since it thinks that the
configuration should be fine.

v2: Deal with INTEL_OUTPUT_UNKNOWN (Paulo)

Cc: Paulo Zanoni <przanoni@gmail.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:31:53 +01:00
Imre Deak 9939fba226 drm/i915: mask RPS IRQs properly when disabling RPS
Atm, igt/gem_reset_stats can trigger the recently added WARN on
left-over PM_IIR bits in gen6_enable_rps_interrupts(). There are two
reasons for this:
1. we call intel_enable_gt_powersave() without a preceeding
   intel_disable_gt_powersave()
2. gen6_disable_rps_interrupts() doesn't mask interrupts in PM_IMR

1. means RPS interrupts will remain enabled and can be serviced during
the HW initialization after a GPU reset. 2. means even if we called
gen6_disable_rps_interrupts() any new RPS interrupt during RPS
initialization would still propagate to PM_IIR too early (though
wouldn't be serviced).

This patch solves the 2. issue by also masking interrupts in PM_IMR, the
following patch fixes 1. getting rid of the WARN. This also makes
intel_enable_gt_powersave() and intel_disable_gt_powersave() more
symmetric.

Since gen6_disable_rps_interrupts() is called during driver loading with
i915 interrupts disabled add a new version of gen6_disable_pm_irq() that
doesn't WARN for this.

Also while at it, get the irq_lock around the whole PM_IMR/IER/IIR
programming sequence and make sure that any queued PM_IIR bit is also
cleared.

The WARN was caught by PRTS after I sent my previous RPS sanitizing
patchset and I could easily reproduce it on HSW. To actually fix it we
also need the next patch.

Reported-by: He, Shuang <shuang.he@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:31:53 +01:00
Daniel Vetter 34273620d9 drm/i915: Tune down spurious CRC interrupt warning
We don't really synchronously turn them off from debugfs. We try to
avoid hitting them too badly by waiting one vblank, but apparently the
irq handler can still race through that gap.

Since this isn't really all that important for testcases, only for
debugging CRC issues let's tune it down to a debug message.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82602
Cc: Damien Lespiau <damien.lespiau@intel.com>
Acked-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2014-12-03 09:29:41 +01:00
Thomas Daniel 0794aed302 drm/i915: Fix context object leak for legacy contexts
Dynamic context pinning for LRCs introduced a leak in legacy mode.
Reinstate context unreference in i915_gem_free_request for legacy contexts.

Leak reported by i-g-t/drv_module_reload fixed by this patch.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86507
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: John Harrison<John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:29:41 +01:00
Akash Goel 8ee558d804 drm/i915/skl: Update in Gen9 multi-engine forcewake range
Updates in forcewake range for Render/Media/Common
power wells for Gen9.

Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Zhe Wang <zhe1.wang@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:29:41 +01:00
Egbert Eich 2c623c11c7 drm/i915/eDP: When enabling panel VDD cancel pending disable worker
Before testing if the panel VDD is enabled on eDP cancel any pending
disable worker. This makes sure the worker will be triggered with a
delay from the last time edp_panel_vdd_schedule_off() is called, not
the first time. This avoids unnecessary overhead.

https://bugs.freedesktop.org/show_bug.cgi?id=86201

v2: use cancel_delayed_work() instead of cancel_delayed_work_sync()
as the pps_mutexes will provide the required serialization with
edp_panel_vdd_work() while the sync variant may deadlock. Suggested
by Ville Syrjälä <ville.syrjala@linux.intel.com>.
Made commit message a bit clearer.

Signed-off-by: Egbert Eich <eich@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:29:40 +01:00
Daniel Vetter 9d8b0588cb drm/i915: Handle runtime pm in the CRC setup code
The crc code doesn't handle anything really that could drop the
register state (by design so that we have less complexity). Which
means userspace may only start crc capture once the pipe is fully set
up.

With an i-g-t patch this will be the case, but there's still the
problem that this results in obscure unclaimed register write
failures. Which is a pain to debug.

So instead make sure we don't have the basic unclaimed register write
failure by grabbing runtime pm references. And reject completely
invalid requests with -EIO. This is still racy of course, but for a
test library we don't really care - if userspace shuts down the pipe
right afterwards the entire setup will be lost anyway.

v2: Put instead of get, spotted by Damien. Also explain the runtime pm
dance.

v3: There's really no need for rpm get/put since power_is_enabled only
checks software state (Damien).

References: https://bugs.freedesktop.org/show_bug.cgi?id=86092
Cc: Damien Lespiau <damien.lespiau@intel.com> (v2)
Tested-by: lu hua <huax.lu@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2014-12-03 09:29:40 +01:00
Ville Syrjälä f98ce92fea drm/i915: Disable crtcs gracefully before GPU reset on gen3/4
The GPU reset also resets the display on gen3/4. The g33 docs say we
should disable all planes before flipping the reset switch. Just
disable all the crtcs instead. That seems a nicer thing to do anyway.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:29:39 +01:00
Ville Syrjälä 7514747d27 drm/i915: Grab modeset locks for GPU rest on pre-ctg
On gen4 and earlier the GPU reset also resets the display, so we should
protect against concurrent modeset operations. Grab all the modeset locks
around the entire GPU reset dance, remebering first ti dislogde any
pending page flip to make sure we don't deadlock. Any pageflip coming
in between these two steps should fail anyway due to reset_in_progress,
so this should be safe.

This fixes a lot of failed asserts in the modeset code when there's a
modeset racing with the reset. Naturally the asserts aren't happy when
the expected state has disappeared.

v2: Drop UMS checks, complete pending flips after the reset (Daniel)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:29:38 +01:00
Ville Syrjälä 408d4b9e1f drm/i915: Implement GPU reset for g33
g33 seems to sit somewhere between the 915/945/965 style and the
g4x style. The bits look like g4x, but we still need to do a full
reset including display.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:29:38 +01:00
Ville Syrjälä 59ea90543f drm/i915: Implement GPU reset for 915/945
915/945 have the same reset registers as 965, so share the code.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:29:37 +01:00
Ville Syrjälä ca83b9361b drm/i915: Restore the display config after a GPU reset on gen4
On pre-ctg GPU reset also resets the display hardware. Force a mode
restore after the GPU reset, and also re-init clock gating.

v2: Use intel_modeset_init_hw() instead of intel_init_clock_gating()
    in case more relevant stuff gets added there at some point
    Restore interrupts after the reset as well

Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:29:37 +01:00
Ville Syrjälä 73bbf6bd90 drm/i915: Fix gen4 GPU reset
On pre-ctg the reset bit directly controls the reset signal. We must
assert it for >=20usec and then deassert it. Bit 1 is a RO status bit
which should also go down when the reset is no longer asserted.

Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-12-03 09:29:36 +01:00