Atm, we don't disable RPS interrupts and related work items before
resetting the GPU. This may interfere with the following GPU
initialization and cause RPS interrupts to show up in PM_IIR too early
before calling gen6_enable_rps_interrupts() (triggering a WARN there).
Solve this by disabling RPS interrupts and flushing any related work
items before resetting the GPU.
v2:
- split out the common parts of the gt suspend and the new gt reset
functions (Paulo)
v3:
- remove the check for UMS, it's a NOP nowadays (Daniel)
Reported-by: He, Shuang <shuang.he@intel.com>
Testcase: igt/gem_reset_stats/ban-render
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86644
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Paulo noticed that we don't enable RPS interrupts via PM_IER in
gen6_enable_rps_interrupts(). This wasn't a problem so far, since the
only place we disabled RPS interrupts was during system/runtime suspend
and after that we reenable all interrupts in the IRQ pre/postinstall
hooks.
In the next patch we'll disable/reenable RPS interrupts during GPU reset
too, but not call IRQ uninstall, pre/postinstall hooks, so there the
above wouldn't work. The logical place for programming PM_IER is
gen6_enable_rps_interrupts() and this also makes the function more
symmetric with gen6_disable_rps_interrupts(), so move the programming
there from the postinstall hooks.
Note that these changes don't affect the ILK RPS interrupt code, which
could be sanitized in a similar way. But that can be done as a
follow-up.
Credits-to: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
irq_mask should include all IRQ bits that we want to mask, but atm we
set it incorrectly to the inverse of this. If the mask is used
subsequently to enable/disable some IRQ bits, we may unintentionally
unmask unrelated IRQs. I can't see any way that this can lead to a real
problem in the current -nightly code, since the first place the mask
will be used next (after a suspend/resume cycle) is in
valleyview_irq_postinstall(), but the mask is reset there to its proper
value.
This causes a problem in the upstream kernel though, where - due to another
issue - the mask is used in the above way to disable only the display IRQs.
This other issue is fixed by:
commit 950eabaf5a
Author: Imre Deak <imre.deak@intel.com>
Date: Mon Sep 8 15:21:09 2014 +0300
drm/i915: vlv: fix display IRQ enable/disable
Interestingly, even with the above two bugs, we shouldn't in theory have
any real problems (arguably a famous last sentence:). That's because
even if we unmask something unintentionally via the VLV_IMR/VLV_IER
register the master IRQ masking bit in VLV_MASTER_IER is still set and
should prevent all i915 interrupts. According to my testing on an ASUS
T100 with DSI output this isn't the case at least with the
MIPIA_INTERRUPT. Leaving this one unmasked in IMR/IER, while having
VLV_MASTER_IER set to 0 may lead to a lockup during system suspend as
shown in the bugzilla ticket below. This fix should get rid of the
problem reported there in upstream and older kernels.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85920
Cc: stable@vger.kernel.org (v3.15+)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
We may be hidding bugs by doing that, so let remove it and have the
actual mask value shine through, for better or worse.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
I was playing with clang and oh surprise! a warning trigerred by
-Wshift-overflow (gcc doesn't have this one):
WA_SET_BIT_MASKED(GEN7_GT_MODE,
GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
drivers/gpu/drm/i915/intel_ringbuffer.c:786:2: warning: signed shift result
(0x28002000000) requires 43 bits to represent, but 'int' only has 32 bits
[-Wshift-overflow]
WA_SET_BIT_MASKED(GEN7_GT_MODE,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/intel_ringbuffer.c:737:15: note: expanded from macro
'WA_SET_BIT_MASKED'
WA_REG(addr, _MASKED_BIT_ENABLE(mask), (mask) & 0xffff)
Turned out GEN6_WIZ_HASHING_MASK was already shifted by 16, and we were
trying to shift it a bit more.
The other thing is that it's not the usual case of setting WA bits here, we
need to have separate mask and value.
To fix this, I've introduced a new _MASKED_FIELD() macro that takes both the
(unshifted) mask and the desired value and the rest of the patch ripples
through from it.
This bug was introduced when reworking the WA emission in:
Commit 7225342ab5
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date: Tue Oct 7 17:21:26 2014 +0300
drm/i915: Build workaround list in ring initialization
v2: Invert the order of the mask and value arguments (Daniel Vetter)
Rewrite _MASKED_BIT_ENABLE() and _MASKED_BIT_DISABLE() with
_MASKED_FIELD() (Jani Nikula)
Make sure we only evaluate 'a' once in _MASKED_BIT_ENABLE() (Dave Gordon)
Add check to ensure the value is within the mask boundaries (Chris Wilson)
v3: Ensure the the value and mask are 16 bits (Dave Gordon)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Otherwise the MST resume paths can hit DPMS paths
which hit state checker paths, which hit WARN_ON,
because the state checker is inconsistent with the
hw.
This fixes a bunch of WARN_ON's on resume after
undocking.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
So apparently jiffies<->nsec<->ktime isn't accurate or something. At
elast if we timeout there's occasionally still a few hundred us left
(in a 2 second timeout).
Stuff I've tried and thrown out again:
- Sampling the before timestamp before jiffies. Doesn't improve test
path rate at all.
- Using jiffies. Way to inaccurate, which means way too much drift
with signals plus automatic ioctl restarting in userspace. In
hindsight we should have used an absolute timeout, but hey we need
something for v3 of the i915 gem wait interfaces ;-)
- Trying to figure out where accuracy gets lost. gl testcase really
don't care all that much about this (as long as isn't not massively
off), it's just that the testcase gets a bit upset if it receives an
EITME with timeout > 0.
So as long as we're in the ballbark it's good enough. So patch
everything up if we're at most one jiffies off. I get's me a solid
test again.
This regression is probably introduced in
commit 5ed0bdf21a
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Wed Jul 16 21:05:06 2014 +0000
drm: i915: Use nsec based interfaces
Use ktime_get_raw_ns() and get rid of the back and forth timespec
conversions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Probably because I'm too lazy to confirm myself and still waiting for
QA ;-)
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82749
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
We've lost the +1 required for correct timeouts in
commit 5ed0bdf21a
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Wed Jul 16 21:05:06 2014 +0000
drm: i915: Use nsec based interfaces
Use ktime_get_raw_ns() and get rid of the back and forth timespec
conversions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: John Stultz <john.stultz@linaro.org>
So fix this up by reinstating our handrolled _timeout function. While
at it bother with handling MAX_JIFFIES.
v2: Convert to usecs (we don't care about the accuracy anyway) first
to avoid overflow issues Dave Gordon spotted.
v3: Drop the explicit MAX_JIFFY_OFFSET check, usecs_to_jiffies should
take care of that already. It might be a bit too enthusiastic about it
though.
v4: Chris has a much nicer color, so use his implementation.
This requires to export nsec_to_jiffies from time.c.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82749
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
On pre-HSW we have two encoders per digital port: one HDMI, one DP.
However they are the same physical port in hardware and we can't enable
both at the same time. Reject the modeset if the user attempts this.
So far we've been saved by the fact that we never see both HDMI and DP
connectors as connected. But if the user decides to force a mode anyway,
all kinds of funny stuff might happen.
Unfortunately we don't seem to have any way to inform userspace that
such configurations are invalid except by returning an error from
setcrtc. possible_clones only covers real cloning situations, and
looking at the connector names doesn't work either since we don't
always register both connectors for the same port. I suppose the
only way to fix that would be to expose only a single encoder per
digital port like we do on HSW+ but that would be a fairly large
undertaking for little gain.
kms_setmode hits this since it forces modes on non-connected VGA and
HDMI connectors. Previosuly it just resulted in weirdness such as
failed link training. With this patch it will now get an error back
from the kernel and will die with an assert since it thinks that the
configuration should be fine.
v2: Deal with INTEL_OUTPUT_UNKNOWN (Paulo)
Cc: Paulo Zanoni <przanoni@gmail.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Atm, igt/gem_reset_stats can trigger the recently added WARN on
left-over PM_IIR bits in gen6_enable_rps_interrupts(). There are two
reasons for this:
1. we call intel_enable_gt_powersave() without a preceeding
intel_disable_gt_powersave()
2. gen6_disable_rps_interrupts() doesn't mask interrupts in PM_IMR
1. means RPS interrupts will remain enabled and can be serviced during
the HW initialization after a GPU reset. 2. means even if we called
gen6_disable_rps_interrupts() any new RPS interrupt during RPS
initialization would still propagate to PM_IIR too early (though
wouldn't be serviced).
This patch solves the 2. issue by also masking interrupts in PM_IMR, the
following patch fixes 1. getting rid of the WARN. This also makes
intel_enable_gt_powersave() and intel_disable_gt_powersave() more
symmetric.
Since gen6_disable_rps_interrupts() is called during driver loading with
i915 interrupts disabled add a new version of gen6_disable_pm_irq() that
doesn't WARN for this.
Also while at it, get the irq_lock around the whole PM_IMR/IER/IIR
programming sequence and make sure that any queued PM_IIR bit is also
cleared.
The WARN was caught by PRTS after I sent my previous RPS sanitizing
patchset and I could easily reproduce it on HSW. To actually fix it we
also need the next patch.
Reported-by: He, Shuang <shuang.he@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We don't really synchronously turn them off from debugfs. We try to
avoid hitting them too badly by waiting one vblank, but apparently the
irq handler can still race through that gap.
Since this isn't really all that important for testcases, only for
debugging CRC issues let's tune it down to a debug message.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82602
Cc: Damien Lespiau <damien.lespiau@intel.com>
Acked-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Before testing if the panel VDD is enabled on eDP cancel any pending
disable worker. This makes sure the worker will be triggered with a
delay from the last time edp_panel_vdd_schedule_off() is called, not
the first time. This avoids unnecessary overhead.
https://bugs.freedesktop.org/show_bug.cgi?id=86201
v2: use cancel_delayed_work() instead of cancel_delayed_work_sync()
as the pps_mutexes will provide the required serialization with
edp_panel_vdd_work() while the sync variant may deadlock. Suggested
by Ville Syrjälä <ville.syrjala@linux.intel.com>.
Made commit message a bit clearer.
Signed-off-by: Egbert Eich <eich@suse.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The crc code doesn't handle anything really that could drop the
register state (by design so that we have less complexity). Which
means userspace may only start crc capture once the pipe is fully set
up.
With an i-g-t patch this will be the case, but there's still the
problem that this results in obscure unclaimed register write
failures. Which is a pain to debug.
So instead make sure we don't have the basic unclaimed register write
failure by grabbing runtime pm references. And reject completely
invalid requests with -EIO. This is still racy of course, but for a
test library we don't really care - if userspace shuts down the pipe
right afterwards the entire setup will be lost anyway.
v2: Put instead of get, spotted by Damien. Also explain the runtime pm
dance.
v3: There's really no need for rpm get/put since power_is_enabled only
checks software state (Damien).
References: https://bugs.freedesktop.org/show_bug.cgi?id=86092
Cc: Damien Lespiau <damien.lespiau@intel.com> (v2)
Tested-by: lu hua <huax.lu@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
The GPU reset also resets the display on gen3/4. The g33 docs say we
should disable all planes before flipping the reset switch. Just
disable all the crtcs instead. That seems a nicer thing to do anyway.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On gen4 and earlier the GPU reset also resets the display, so we should
protect against concurrent modeset operations. Grab all the modeset locks
around the entire GPU reset dance, remebering first ti dislogde any
pending page flip to make sure we don't deadlock. Any pageflip coming
in between these two steps should fail anyway due to reset_in_progress,
so this should be safe.
This fixes a lot of failed asserts in the modeset code when there's a
modeset racing with the reset. Naturally the asserts aren't happy when
the expected state has disappeared.
v2: Drop UMS checks, complete pending flips after the reset (Daniel)
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
g33 seems to sit somewhere between the 915/945/965 style and the
g4x style. The bits look like g4x, but we still need to do a full
reset including display.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On pre-ctg GPU reset also resets the display hardware. Force a mode
restore after the GPU reset, and also re-init clock gating.
v2: Use intel_modeset_init_hw() instead of intel_init_clock_gating()
in case more relevant stuff gets added there at some point
Restore interrupts after the reset as well
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On pre-ctg the reset bit directly controls the reset signal. We must
assert it for >=20usec and then deassert it. Bit 1 is a RO status bit
which should also go down when the reset is no longer asserted.
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>