Commit Graph

387681 Commits

Author SHA1 Message Date
Dave Airlie fb328d7365 Merge branch 'drm-fixes-3.11' of git://people.freedesktop.org/~agd5f/linux
more DPM fixes for radeon.

* 'drm-fixes-3.11' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon/dpm: add debugfs support for RS780/RS880 (v3)
  drm/radeon/dpm/atom: fix broken gcc harder
  drm/radeon/dpm/atom: restructure logic to work around a compiler bug
  drm/radeon/dpm: fix atom vram table parsing
  drm/radeon: fix an endian bug in atom table parsing
  drm/radeon: add a module parameter to disable aspm
2013-07-18 10:19:46 +10:00
Alex Deucher 444bddc4b9 drm/radeon/dpm: add debugfs support for RS780/RS880 (v3)
This allows you to look at the current DPM state via
debugfs.

Due to the way the hardware works on these asics, there's
no way to look up exactly what power state we are in, so
we make the best guess we can based on the current sclk.

v2: Anthoine's version
v3: fix ref div

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-17 16:47:52 -04:00
Alex Deucher f90555cbe6 drm/radeon/dpm/atom: fix broken gcc harder
See bugs:
https://bugs.freedesktop.org/show_bug.cgi?id=66932
https://bugs.freedesktop.org/show_bug.cgi?id=66972
https://bugs.freedesktop.org/show_bug.cgi?id=66945

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-17 16:35:06 -04:00
Andre Heider 48fa04c3fc drm/radeon/dpm/atom: restructure logic to work around a compiler bug
It seems gcc 4.8.1 generates bogus code for the old logic causing
part of the function to get skipped.

Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=66932
https://bugs.freedesktop.org/show_bug.cgi?id=66972
https://bugs.freedesktop.org/show_bug.cgi?id=66945

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-17 14:52:49 -04:00
Alex Deucher 77c7d50a4a drm/radeon/dpm: fix atom vram table parsing
Parsing the table in incorrectly led to problems with
certain asics with mclk switching.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-17 14:52:48 -04:00
Alex Deucher 1fa4252af7 drm/radeon: fix an endian bug in atom table parsing
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-17 14:52:47 -04:00
Alex Deucher 1294d4a36d drm/radeon: add a module parameter to disable aspm
Can cause hangs when enabled in certain motherboards.
Set radeon.aspm=0 to disable aspm.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-17 14:52:46 -04:00
Dave Airlie 6bd2cab2c1 Merge tag 'drm-intel-fixes-2013-07-11' of git://people.freedesktop.org/~danvet/drm-intel
One feature latecomer, I've forgotten to merge the patch to reeanble the
Haswell power well feature now that the audio interaction is fixed up.
Since that was the only unfixed issue with it I've figured I could throw
it in a bit late, and it's trivial to revert in case I'm wrong.

Otherwise all bug/regression fixes:
- Fix status page reinit after gpu hangs, spotted by more paranoid igt
  checks.
- Fix object list walking fumble regression in the shrinker (only the
  counting part, the actual shrinking code was correct so no Oops
  potential), from Xiong Zhang.
- Fix DP 1.2 bw limits (Imre).
- Restore legacy forcewake on ivb, too many broken biosen out there. We
  dump a warn though that recent userspace might fall over with that
  config (Guenter Roeck).
- Patch up the gen2 cs tlb w/a.
- Improve the fence coherency w/a now that we have a better understanding
  what's going on. The removed wbinvd+ipi should make -rt folks happy. Big
  thanks to Jon Bloomfield for figuring this out, patches from Chris.
- Fix write-read race when switching ring (Chris). Spotted with code
  inspection, but now we also have an igt for it.

There's an ugly regression we're still working on introduced between
3.10-rc7 and 3.10.0. Unfortunately we can't just revert the offender since
that one fixes another regression :( I've asked Steven to include my
-fixes branch into linux-next to prevent such fallout in the future,
hopefully.

* tag 'drm-intel-fixes-2013-07-11' of git://people.freedesktop.org/~danvet/drm-intel:
  Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs"
  drm/i915: Fix incoherence with fence updates on Sandybridge+
  drm/i915: Fix write-read race with multiple rings
  Partially revert "drm/i915: unconditionally use mt forcewake on hsw/ivb"
  drm/i915: fix lane bandwidth capping for DP 1.2 sinks
  drm/i915: fix up ring cleanup for the i830/i845 CS tlb w/a
  drm/i915: Correct obj->mm_list link to dev_priv->dev_priv->mm.inactive_list
  drm/i915: switch disable_power_well default value to 1
  drm/i915: reinit status page registers after gpu reset
2013-07-17 08:40:49 +10:00
Sylvain 'ythier' Hitier d1ce3d5496 uvesafb: Really allow mtrr being 0, as documented and warn()ed
Fixup for commit "uvesafb: Clean up MTRR code"
    (63e28a7a5f)

Signed-off-by: Sylvain "ythier" Hitier <sylvain.hitier@gmail.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Also-spotted-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-07-16 10:24:28 +10:00
Dave Airlie d4639ebadb Merge branch 'drm-fixes-3.11' of git://people.freedesktop.org/~agd5f/linux
More DPM fixes, r6xx DMA fix for bo moving, UVD fixes,
one major regression fix on bootup on some machine (ttm backoff missing)

* 'drm-fixes-3.11' of git://people.freedesktop.org/~agd5f/linux:
  radeon kms: do not flush uninitialized hotplug work
  drm/radeon/dpm/sumo: handle boost states properly when forcing a perf level
  drm/radeon: align VM PTBs (Page Table Blocks) to 32K
  drm/radeon: allow selection of alignment in the sub-allocator
  drm/radeon: never unpin UVD bo v3
  drm/radeon: fix UVD fence emit
  drm/radeon: add fault decode function for CIK
  drm/radeon: add fault decode function for SI (v2)
  drm/radeon: add fault decode function for cayman/TN (v2)
  drm/radeon: use radeon device for request firmware
  drm/radeon: add missing ttm_eu_backoff_reservation to radeon_bo_list_validate
  drm/radeon: use CP DMA on r6xx for bo moves
  drm/radeon: implement bo copy callback using CP DMA (v2)
  drm/radeon: Disable dma rings for bo moves on r6xx
  drm/radeon/dpm: disable gfx PG on PALM
  drm/radeon/hdmi: make sure we have an afmt block assigned
2013-07-16 10:19:44 +10:00
Sergey Senozhatsky a01c34f72e radeon kms: do not flush uninitialized hotplug work
Fix a warning from lockdep caused by calling flush_work() for
uninitialized hotplug work. Initialize hotplug_work, audio_work
and reset_work upon successful radeon_irq_kms_init() completion
and thus perform hotplug flush_work only when rdev->irq.installed
is true.

[    4.790019] [drm] Loading CEDAR Microcode
[    4.790943] r600_cp: Failed to load firmware "radeon/CEDAR_smc.bin"
[    4.791152] [drm:evergreen_startup] *ERROR* Failed to load firmware!
[    4.791330] radeon 0000:01:00.0: disabling GPU acceleration

[    4.792633] INFO: trying to register non-static key.
[    4.792792] the code is fine but needs lockdep annotation.
[    4.792953] turning off the locking correctness validator.

[    4.793114] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.11.0-rc0-dbg-10676-gfe56456-dirty #1816
[    4.793314] Hardware name: Acer             Aspire 5741G    /Aspire 5741G    , BIOS V1.20 02/08/2011
[    4.793507]  ffffffff821fd810 ffff8801530b9a18 ffffffff8160434e 0000000000000002
[    4.794155]  ffff8801530b9ad8 ffffffff810b8404 ffff8801530b0798 ffff8801530b0000
[    4.794789]  ffff8801530b9b00 0000000000000046 00000000000004c0 ffffffff00000000
[    4.795418] Call Trace:
[    4.795573]  [<ffffffff8160434e>] dump_stack+0x4e/0x82
[    4.795731]  [<ffffffff810b8404>] __lock_acquire+0x1a64/0x1d30
[    4.795893]  [<ffffffff814a87f0>] ? dev_vprintk_emit+0x50/0x60
[    4.796034]  [<ffffffff810b8fb4>] lock_acquire+0xa4/0x200
[    4.796216]  [<ffffffff8106cd75>] ? flush_work+0x5/0x280
[    4.796375]  [<ffffffff8106cdad>] flush_work+0x3d/0x280
[    4.796520]  [<ffffffff8106cd75>] ? flush_work+0x5/0x280
[    4.796682]  [<ffffffff810b659d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
[    4.796862]  [<ffffffff8131d775>] ? delay_tsc+0x95/0xf0
[    4.797024]  [<ffffffff8141bb8b>] radeon_irq_kms_fini+0x2b/0x70
[    4.797186]  [<ffffffff814557c9>] evergreen_init+0x2a9/0x2e0
[    4.797347]  [<ffffffff813ebb1f>] radeon_device_init+0x5ef/0x700
[    4.797511]  [<ffffffff81335bc7>] ? pci_find_capability+0x47/0x50
[    4.797672]  [<ffffffff813edaed>] radeon_driver_load_kms+0x8d/0x150
[    4.797843]  [<ffffffff813ce426>] drm_get_pci_dev+0x166/0x280
[    4.798007]  [<ffffffff8116cff5>] ? kfree+0xf5/0x2e0
[    4.798168]  [<ffffffff813ea298>] ? radeon_pci_probe+0x98/0xd0
[    4.798329]  [<ffffffff813ea2aa>] radeon_pci_probe+0xaa/0xd0
[    4.798489]  [<ffffffff81339404>] pci_device_probe+0x84/0xe0
[    4.798644]  [<ffffffff814ac7d6>] driver_probe_device+0x76/0x240
[    4.798805]  [<ffffffff814aca73>] __driver_attach+0x93/0xa0
[    4.798948]  [<ffffffff814ac9e0>] ? __device_attach+0x40/0x40
[    4.799126]  [<ffffffff814aa82b>] bus_for_each_dev+0x6b/0xb0
[    4.799272]  [<ffffffff814ac2be>] driver_attach+0x1e/0x20
[    4.799434]  [<ffffffff814abec0>] bus_add_driver+0x1f0/0x280
[    4.799596]  [<ffffffff814ad0e4>] driver_register+0x74/0x150
[    4.799758]  [<ffffffff8133923d>] __pci_register_driver+0x5d/0x60
[    4.799936]  [<ffffffff81d16efc>] ? ttm_init+0x67/0x67
[    4.800081]  [<ffffffff813ce655>] drm_pci_init+0x115/0x130
[    4.800243]  [<ffffffff81d16efc>] ? ttm_init+0x67/0x67
[    4.800405]  [<ffffffff81d16f98>] radeon_init+0x9c/0xba
[    4.800586]  [<ffffffff810002ca>] do_one_initcall+0xfa/0x150
[    4.800746]  [<ffffffff81073f60>] ? parse_args+0x120/0x330
[    4.800909]  [<ffffffff81cdafae>] kernel_init_freeable+0x111/0x191
[    4.801052]  [<ffffffff81cda87a>] ? do_early_param+0x88/0x88
[    4.801233]  [<ffffffff815fb670>] ? rest_init+0x140/0x140
[    4.801393]  [<ffffffff815fb67e>] kernel_init+0xe/0x180
[    4.801556]  [<ffffffff8160dcac>] ret_from_fork+0x7c/0xb0
[    4.801718]  [<ffffffff815fb670>] ? rest_init+0x140/0x140

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2013-07-15 09:37:38 -04:00
Alex Deucher 13f69c2c9c drm/radeon/dpm/sumo: handle boost states properly when forcing a perf level
Need to properly enable/disable boost states when forcing a performance
level.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-15 09:37:25 -04:00
Alex Deucher 1c01103cb9 drm/radeon: align VM PTBs (Page Table Blocks) to 32K
Covers requirements of all current asics.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2013-07-15 09:37:10 -04:00
Linus Torvalds ad81f0545e Linux 3.11-rc1 2013-07-14 15:18:27 -07:00
Linus Torvalds 54be820019 Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
Pull slab update from Pekka Enberg:
 "Highlights:

  - Fix for boot-time problems on some architectures due to
    init_lock_keys() not respecting kmalloc_caches boundaries
    (Christoph Lameter)

  - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim)

  - Fix for excessive slab freelist draining (Wanpeng Li)

  - SLUB and SLOB cleanups and fixes (various people)"

I ended up editing the branch, and this avoids two commits at the end
that were immediately reverted, and I instead just applied the oneliner
fix in between myself.

* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
  slub: Check for page NULL before doing the node_match check
  mm/slab: Give s_next and s_stop slab-specific names
  slob: Check for NULL pointer before calling ctor()
  slub: Make cpu partial slab support configurable
  slab: add kmalloc() to kernel API documentation
  slab: fix init_lock_keys
  slob: use DIV_ROUND_UP where possible
  slub: do not put a slab to cpu partial list when cpu_partial is 0
  mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo
  mm/slub: Drop unnecessary nr_partials
  mm/slab: Fix /proc/slabinfo unwriteable for slab
  mm/slab: Sharing s_next and s_stop between slab and slub
  mm/slab: Fix drain freelist excessively
  slob: Rework #ifdeffery in slab.h
  mm, slab: moved kmem_cache_alloc_node comment to correct place
2013-07-14 15:14:29 -07:00
Steven Rostedt c25f195e82 slub: Check for page NULL before doing the node_match check
In the -rt kernel (mrg), we hit the following dump:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
PGD a2d39067 PUD b1641067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: sunrpc cpufreq_ondemand ipv6 tg3 joydev sg serio_raw pcspkr k8temp amd64_edac_mod edac_core i2c_piix4 e100 mii shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom sata_svw ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU 3
Pid: 20878, comm: hackbench Not tainted 3.6.11-rt25.14.el6rt.x86_64 #1 empty empty/Tyan Transport GT24-B3992
RIP: 0010:[<ffffffff811573f1>]  [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
RSP: 0018:ffff8800a9b17d70  EFLAGS: 00010213
RAX: 0000000000000000 RBX: 0000000001200011 RCX: ffff8800a06d8000
RDX: 0000000004d92a03 RSI: 00000000000000d0 RDI: ffff88013b805500
RBP: ffff8800a9b17dc0 R08: ffff88023fd14d10 R09: ffffffff81041cbd
R10: 00007f4e3f06e9d0 R11: 0000000000000246 R12: ffff88013b805500
R13: ffff8801ff46af40 R14: 0000000000000001 R15: 0000000000000000
FS:  00007f4e3f06e700(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000a2d3a000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process hackbench (pid: 20878, threadinfo ffff8800a9b16000, task ffff8800a06d8000)
Stack:
 ffff8800a9b17da0 ffffffff81202e08 ffff8800a9b17de0 000000d001200011
 0000000001200011 0000000001200011 0000000000000000 0000000000000000
 00007f4e3f06e9d0 0000000000000000 ffff8800a9b17e60 ffffffff81041cbd
Call Trace:
 [<ffffffff81202e08>] ? current_has_perm+0x68/0x80
 [<ffffffff81041cbd>] copy_process+0xdd/0x15b0
 [<ffffffff810a2125>] ? rt_up_read+0x25/0x30
 [<ffffffff8104369a>] do_fork+0x5a/0x360
 [<ffffffff8107c66b>] ? migrate_enable+0xeb/0x220
 [<ffffffff8100b068>] sys_clone+0x28/0x30
 [<ffffffff81527423>] stub_clone+0x13/0x20
 [<ffffffff81527152>] ? system_call_fastpath+0x16/0x1b
Code: 89 fc 89 75 cc 41 89 d6 4d 8b 04 24 65 4c 03 04 25 48 ae 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 74 12 41 83 fe ff 74 27 <48> 8b 00 48 c1 e8 3a 41 39 c6 74 1b 8b 75 cc 4c 89 c9 44 89 f2
RIP  [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
 RSP <ffff8800a9b17d70>
CR2: 0000000000000000
---[ end trace 0000000000000002 ]---

Now, this uses SLUB pretty much unmodified, but as it is the -rt kernel
with CONFIG_PREEMPT_RT set, spinlocks are mutexes, although they do
disable migration. But the SLUB code is relatively lockless, and the
spin_locks there are raw_spin_locks (not converted to mutexes), thus I
believe this bug can happen in mainline without -rt features. The -rt
patch is just good at triggering mainline bugs ;-)

Anyway, looking at where this crashed, it seems that the page variable
can be NULL when passed to the node_match() function (which does not
check if it is NULL). When this happens we get the above panic.

As page is only used in slab_alloc() to check if the node matches, if
it's NULL I'm assuming that we can say it doesn't and call the
__slab_alloc() code. Is this a correct assumption?

Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-14 15:13:16 -07:00
Linus Torvalds 41d9884c44 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs stuff from Al Viro:
 "O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including
  making simple_lookup() usable for filesystems with non-NULL s_d_op,
  which allows us to get rid of quite a bit of ugliness"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  sunrpc: now we can just set ->s_d_op
  cgroup: we can use simple_lookup() now
  efivarfs: we can use simple_lookup() now
  make simple_lookup() usable for filesystems that set ->s_d_op
  configfs: don't open-code d_alloc_name()
  __rpc_lookup_create_exclusive: pass string instead of qstr
  rpc_create_*_dir: don't bother with qstr
  llist: llist_add() can use llist_add_batch()
  llist: fix/simplify llist_add() and llist_add_batch()
  fput: turn "list_head delayed_fput_list" into llist_head
  fs/file_table.c:fput(): add comment
  Safer ABI for O_TMPFILE
2013-07-14 11:42:26 -07:00
Alex Deucher 6c4f978b35 drm/radeon: allow selection of alignment in the sub-allocator
There are cases where we need more than 4k alignment.  No
functional change with this commit.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2013-07-14 10:11:31 -04:00
Christian König 9cc2e0e9f1 drm/radeon: never unpin UVD bo v3
Changing the UVD BOs offset on suspend/resume doesn't work because the VCPU
internally keeps pointers to it. Just keep it always pinned and save the
content manually.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=66425

v2: fix compiler warning
v3: fix CIK support

Note: a version of this patch needs to go to stable.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-14 10:11:30 -04:00
Christian König c9a6ca4abd drm/radeon: fix UVD fence emit
Currently doesn't matter cause we allocate the fence in the
lower 265MB anyway.

Reported-by: Frank Huang <FrankR.Huang@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2013-07-14 10:11:30 -04:00
Alex Deucher 3ec7d11b9a drm/radeon: add fault decode function for CIK
Helpful for debugging GPUVM errors as we can see what
hw block and page generated the fault in the log.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-14 10:11:29 -04:00
Alex Deucher fbf6dc7ac7 drm/radeon: add fault decode function for SI (v2)
Helpful for debugging GPUVM errors as we can see what
hw block and page generated the fault in the log.

v2: simplify fault decoding

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2013-07-14 10:11:29 -04:00
Alex Deucher 54e2e49ce2 drm/radeon: add fault decode function for cayman/TN (v2)
Helpful for debugging GPUVM errors as we can see what
hw block and page generated the fault in the log.

v2: simplify fault decoding

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2013-07-14 10:11:28 -04:00
Jerome Glisse 0a16893397 drm/radeon: use radeon device for request firmware
Avoid creating temporary platform device that will lead to issue
when several radeon gpu are in same computer. Instead directly use
the radeon device for requesting firmware.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-14 10:11:27 -04:00
Maarten Lankhorst 1b6e5fd5f4 drm/radeon: add missing ttm_eu_backoff_reservation to radeon_bo_list_validate
Op 10-07-13 12:03, Markus Trippelsdorf schreef:
> On 2013.07.10 at 11:56 +0200, Maarten Lankhorst wrote:
>> Op 10-07-13 11:46, Markus Trippelsdorf schreef:
>>> On 2013.07.10 at 11:29 +0200, Maarten Lankhorst wrote:
>>>> Op 10-07-13 11:22, Markus Trippelsdorf schreef:
>>>>> By simply copy/pasting a big document under LibreOffice my system hangs
>>>>> itself up. Only a hard reset gets it working again.
>>>>> see also: https://bugs.freedesktop.org/show_bug.cgi?id=66551
>>>>>
>>>>> I've bisected the issue to:
>>>>>
>>>>> commit ecff665f5e
>>>>> Author: Maarten Lankhorst <m.b.lankhorst@gmail.com>
>>>>> Date:   Thu Jun 27 13:48:17 2013 +0200
>>>>>
>>>>>     drm/ttm: make ttm reservation calls behave like reservation calls
>>>>>
>>>>>     This commit converts the source of the val_seq counter to
>>>>>     the ww_mutex api. The reservation objects are converted later,
>>>>>     because there is still a lockdep splat in nouveau that has to
>>>>>     resolved first.
>>>>>
>>>>>     Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>>>>     Reviewed-by: Jerome Glisse <jglisse@redhat.com>
>>>>>     Signed-off-by: Dave Airlie <airlied@redhat.com>
>>>> Hey,
>>>>
>>>> Can you try current head with CONFIG_PROVE_LOCKING set and post the
>>>> lockdep splat from dmesg, if any? If there is any locking issue
>>>> lockdep should warn about it.  Lockdep will turn itself off after the
>>>> first splat, so if the lockdep splat happens before running the
>>>> affected parts those will have to be fixed first.
>>> There was an unrelated EDAC lockdep splat, so I simply disabled it.
>>>
>>> This is what I get:
>>>
>>> Jul 10 11:40:44 x4 kernel: ================================================
>>> Jul 10 11:40:44 x4 kernel: [ BUG: lock held when returning to user space! ]
>>> Jul 10 11:40:44 x4 kernel: 3.10.0-08587-g496322b #35 Not tainted
>>> Jul 10 11:40:44 x4 kernel: ------------------------------------------------
>>> Jul 10 11:40:44 x4 kernel: X/211 is leaving the kernel with locks still held!
>>> Jul 10 11:40:44 x4 kernel: 2 locks held by X/211:
>>> Jul 10 11:40:44 x4 kernel: #0:  (reservation_ww_class_acquire){+.+.+.}, at: [<ffffffff813279f0>] radeon_bo_list_validate+0x20/0xd0
>>> Jul 10 11:40:44 x4 kernel: #1:  (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffff81309306>] ttm_eu_reserve_buffers+0x126/0x4b0
>>> Jul 10 11:40:52 x4 kernel: SysRq : Emergency Sync
>>> Jul 10 11:40:53 x4 kernel: Emergency Sync complete
>>>
>> Thanks, exactly what I thought. I missed a backoff somewhere..
>>
>> Does the below patch fix it?
> Yes. Thank you for your quick reply.

8<------
If radeon_cs_parser_relocs fails ttm_eu_backoff_reservation doesn't get called.
This left open a bug where ttm_eu_reserve_buffers succeeded but the bo's were
not unlocked afterwards:

Jul 10 11:40:44 x4 kernel: ================================================
Jul 10 11:40:44 x4 kernel: [ BUG: lock held when returning to user space! ]
Jul 10 11:40:44 x4 kernel: 3.10.0-08587-g496322b #35 Not tainted
Jul 10 11:40:44 x4 kernel: ------------------------------------------------
Jul 10 11:40:44 x4 kernel: X/211 is leaving the kernel with locks still held!
Jul 10 11:40:44 x4 kernel: 2 locks held by X/211:
Jul 10 11:40:44 x4 kernel: #0:  (reservation_ww_class_acquire){+.+.+.}, at: [<ffffffff813279f0>] radeon_bo_list_validate+0x20/0xd0
Jul 10 11:40:44 x4 kernel: #1:  (reservation_ww_class_mutex){+.+.+.}, at: [<ffffffff81309306>] ttm_eu_reserve_buffers+0x126/0x4b0
Jul 10 11:40:52 x4 kernel: SysRq : Emergency Sync
Jul 10 11:40:53 x4 kernel: Emergency Sync complete

This is a regression caused by commit ecff665f5e.
"drm/ttm: make ttm reservation calls behave like reservation calls"

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2013-07-14 10:11:27 -04:00