Commit Graph

190757 Commits

Author SHA1 Message Date
Don Zickus 58687acba5 lockup_detector: Combine nmi_watchdog and softlockup detector
The new nmi_watchdog (which uses the perf event subsystem) is very
similar in structure to the softlockup detector.  Using Ingo's
suggestion, I combined the two functionalities into one file:
kernel/watchdog.c.

Now both the nmi_watchdog (or hardlockup detector) and softlockup
detector sit on top of the perf event subsystem, which is run every
60 seconds or so to see if there are any lockups.

To detect hardlockups, cpus not responding to interrupts, I
implemented an hrtimer that runs 5 times for every perf event
overflow event.  If that stops counting on a cpu, then the cpu is
most likely in trouble.

To detect softlockups, tasks not yielding to the scheduler, I used the
previous kthread idea that now gets kicked every time the hrtimer fires.
If the kthread isn't being scheduled neither is anyone else and the
warning is printed to the console.

I tested this on x86_64 and both the softlockup and hardlockup paths
work.

V2:
- cleaned up the Kconfig and softlockup combination
- surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI
- seperated out the softlockup case from perf event subsystem
- re-arranged the enabling/disabling nmi watchdog from proc space
- added cpumasks for hardlockup failure cases
- removed fallback to soft events if no PMU exists for hard events

V3:
- comment cleanups
- drop support for older softlockup code
- per_cpu cleanups
- completely remove software clock base hardlockup detector
- use per_cpu masking on hard/soft lockup detection
- #ifdef cleanups
- rename config option NMI_WATCHDOG to LOCKUP_DETECTOR
- documentation additions

V4:
- documentation fixes
- convert per_cpu to __get_cpu_var
- powerpc compile fixes

V5:
- split apart warn flags for hard and soft lockups

TODO:
- figure out how to make an arch-agnostic clock2cycles call
  (if possible) to feed into perf events as a sample period

[fweisbec: merged conflict patch]

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <1273266711-18706-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-05-12 23:55:33 +02:00
Frederic Weisbecker a9aa1d02de Merge commit 'v2.6.34-rc7' into perf/nmi
Merge reason: catch up with latest softlockup detector changes.
2010-05-12 23:20:33 +02:00
Linus Torvalds b57f95a382 Linux 2.6.34-rc7 2010-05-09 18:36:28 -07:00
Linus Torvalds 93cb463141 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] Retry commands with UNIT_ATTENTION sense codes to fix ext3/ext4 I/O error
  [SCSI] Enable retries for SYNCRONIZE_CACHE commands to fix I/O error
  [SCSI] scsi_debug: virtual_gb ignores sector_size
  [SCSI] libiscsi: regression: fix header digest errors
  [SCSI] fix locking around blk_abort_request()
  [SCSI] advansys: fix narrow board error path
2010-05-09 18:35:53 -07:00
Arjan van de Ven 1c6fe0364f cpuidle: Fix incorrect optimization
commit 672917dcc7 ("cpuidle: menu governor: reduce latency on exit")
added an optimization, where the analysis on the past idle period moved
from the end of idle, to the beginning of the new idle.

Unfortunately, this optimization had a bug where it zeroed one key
variable for new use, that is needed for the analysis.  The fix is
simple, zero the variable after doing the work from the previous idle.

During the audit of the code that found this issue, another issue was
also found; the ->measured_us data structure member is never set, a
local variable is always used instead.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Corrado Zoccolo <czoccolo@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-09 18:35:36 -07:00
Linus Torvalds f1c448e0a9 Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
  md: restore ability of spare drives to spin down.
  md/raid6: Fix raid-6 read-error correction in degraded state
2010-05-07 14:11:40 -07:00
Linus Torvalds 2c32b1dab5 Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6
* 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6:
  pcmcia: fix compilation after 16bit state locking changes
  pcmcia: order userspace suspend and resume requests
  pcmcia: avoid pccard_validate_cis failure in resume callpath
2010-05-07 14:11:09 -07:00
Linus Torvalds 48fe37cb53 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  blk-cgroup: Fix an RCU warning in blkiocg_create()
  blk-cgroup: Fix RCU correctness warning in cfq_init_queue()
  drbd: don't expose failed local READ to upper layers
2010-05-07 14:07:20 -07:00
Linus Torvalds e33b3e7567 Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/ttm: Remove the ttm_bo_block_reservation() function.
  drm/ttm: Remove some leftover debug messages.
  drm/radeon: async event synchronization for drmWaitVblank
2010-05-07 14:02:01 -07:00
Stijn Tintel e2dbe06c27 virtio: initialize earlier
Move initialization of the virtio framework before the initialization of
mtd, so that block2mtd can be used on virtio-based block devices.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=15644

Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-07 14:01:17 -07:00
Linus Torvalds 9167746716 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix RCU issues in the NFSv4 delegation code
  NFSv4: Fix the locking in nfs_inode_reclaim_delegation()
2010-05-07 13:59:48 -07:00
Linus Torvalds 4a22533136 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  ACPI: sleep: init_set_sci_en_on_resume for Dell Studio 155x
  ACPI: fix acpi_hest_firmware_first_pci() caused oops
  sbshc: acpi_device_class "smbus_host_controller" too long
  power_meter: acpi_device_class "power_meter_resource" too long
  acpi_pad: "processor_aggregator" name too long
  PNP: don't check for conflicts with bridge windows
  ACPI: DMI init_set_sci_en_on_resume for multiple Lenovo ThinkPads
  PNPACPI: compute Address Space length rather than using _LEN
  ACPI: silence kmemcheck false positive
2010-05-07 13:59:22 -07:00
Linus Torvalds 417a9ef1f4 Merge branch 'v4l_for_2.6.34' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6
* 'v4l_for_2.6.34' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
  V4L/DVB: pxa_camera: move fifo reset direct before dma start
  V4L/DVB: video: testing unsigned for less than 0
  V4L/DVB: mx1-camera: compile fix
  V4L/DVB: budget: Oops: "BUG: unable to handle kernel NULL pointer 	dereference"
  V4L/DVB: ngene: Workaround for stuck DiSEqC pin
  V4L/DVB: saa7146: fix regression of the av7110/budget-av driver
  V4L/DVB: v4l: fix config dependencies: mxb and saa7191 are V4L2 drivers, not V4L1
  V4L/DVB: feature-removal: announce videotext.h removal
  V4L/DVB: V4L - vpfe capture - fix for kernel crash
  V4L/DVB: gspca: make usb id 0461:0815 get handled by the right driver
  V4L/DVB: gspca - stv06xx: Remove the 046d:08da from the stv06xx driver
  V4L/DVB: gspca - sn9c20x: Correct onstack wait_queue_head declaration
  V4L/DVB: saa7146: fix up bytesperline if it is an impossible value
  V4L/DVB: V4L: vpfe_capture - free ccdc_lock when memory allocation fails
  V4L/DVB: V4L - Makfile:Removed duplicate entry of davinci
  V4L/DVB: omap24xxcam: potential buffer overflow
2010-05-07 13:58:56 -07:00
Linus Torvalds 91bc482ec5 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: create rcu_my_thread_group_empty() wrapper
  memcg: css_id() must be called under rcu_read_lock()
  cgroup: Check task_lock in task_subsys_state()
  sched: Fix an RCU warning in print_task()
  cgroup: Fix an RCU warning in alloc_css_id()
  cgroup: Fix an RCU warning in cgroup_path()
  KEYS: Fix an RCU warning in the reading of user keys
  KEYS: Fix an RCU warning
2010-05-07 13:58:21 -07:00
NeilBrown 1176568de7 md: restore ability of spare drives to spin down.
Some time ago we stopped the clean/active metadata updates
from being written to a 'spare' device in most cases so that
it could spin down and say spun down.  Device failure/removal
etc are still recorded on spares.

However commit 51d5668cb2 broke this 50% of the time,
depending on whether the event count is even or odd.
The change log entry said:

   This means that the alignment between 'odd/even' and
    'clean/dirty' might take a little longer to attain,

how ever the code makes no attempt to create that alignment, so it
could take arbitrarily long.

So when we find that clean/dirty is not aligned with odd/even,
force a second metadata-update immediately.  There are already cases
where a second metadata-update is needed immediately (e.g. when a
device fails during the metadata update).  We just piggy-back on that.

Reported-by: Joe Bryant <tenminjoe@yahoo.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
2010-05-07 21:10:57 +10:00
Gabriele A. Trombetti 87aa63000c md/raid6: Fix raid-6 read-error correction in degraded state
Fix: Raid-6 was not trying to correct a read-error when in
singly-degraded state and was instead dropping one more device, going to
doubly-degraded state. This patch fixes this behaviour.

Tested-by: Janos Haar <janos.haar@netcenter.hu>
Signed-off-by: Gabriele A. Trombetti <g.trombetti.lkrnl1213@logicschema.com>
Reported-by: Janos Haar <janos.haar@netcenter.hu>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
2010-05-07 21:10:35 +10:00
Li Zefan 0341509fdf blk-cgroup: Fix an RCU warning in blkiocg_create()
with CONFIG_PROVE_RCU=y, a warning can be triggered:

  # mount -t cgroup -o blkio xxx /mnt
  # mkdir /mnt/subgroup

...
kernel/cgroup.c:4442 invoked rcu_dereference_check() without protection!
...

To fix this, we avoid caling css_depth() here, which is a bit simpler
than the original code.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-07 08:57:00 +02:00
Len Brown 1468cf0542 Merge branches 'bugzilla-14337', 'bugzilla-14998', 'bugzilla-15407', 'bugzilla-15903' and 'misc-2.6.34' into release 2010-05-06 22:04:31 -04:00
Thomas Hellstrom 8cfe92d683 drm/ttm: Remove the ttm_bo_block_reservation() function.
It's unused and buggy in its current form, since it can place a bo
in the reserved state without removing it from lru lists.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-05-07 09:21:28 +10:00
Thomas Hellstrom 5be6eff965 drm/ttm: Remove some leftover debug messages.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-05-07 09:20:56 +10:00
Jerome Glisse 4fa07bf146 drm/radeon: async event synchronization for drmWaitVblank
Bring radeon up to speed with the async event synchronization for
drmWaitVblank. See c9a9c5e02a for
more information. Without this patch event never get delivered
to userspace client.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2010-05-07 09:16:56 +10:00
Stefan Herbrechtsmeier a47f6be456 V4L/DVB: pxa_camera: move fifo reset direct before dma start
Move the fifo reset from pxa_camera_start_capture to pxa_camera_irq direct
before the dma start after an end of frame interrupt to prevent images from
shifting because of old data at the begin of the frame.

Signed-off-by: Stefan Herbrechtsmeier <hbmeier@hni.uni-paderborn.de>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Tested-by: Antonio Ospite <ospite@studenti.unina.it>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-06 19:20:52 -03:00
Dan Carpenter 981cbef2c3 V4L/DVB: video: testing unsigned for less than 0
soc_mbus_bytes_per_line() returns -EINVAL on error but we store it in an
unsigned int so the test for less than zero doesn't work.  I think it
always returns "small" positive values so we can just cast it to int
here.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-06 19:20:52 -03:00
Uwe Kleine-König b7d41d6d58 V4L/DVB: mx1-camera: compile fix
This fixes a regression of

	7d58289 (mx1: prefix SOC specific defines with MX1_ and deprecate old names)

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-06 19:20:51 -03:00
Bjørn Mork 6f550dc083 V4L/DVB: budget: Oops: "BUG: unable to handle kernel NULL pointer dereference"
Never call dvb_frontend_detach if we failed to attach a frontend. This fixes
the following oops, which will be triggered by a missing stv090x module:

[    8.172997] DVB: registering new adapter (TT-Budget S2-1600 PCI)
[    8.209018] adapter has MAC addr = 00:d0:5c:cc:a7:29
[    8.328665] Intel ICH 0000:00:1f.5: PCI INT B -> GSI 17 (level, low) -> IRQ 17
[    8.328753] Intel ICH 0000:00:1f.5: setting latency timer to 64
[    8.562047] DVB: Unable to find symbol stv090x_attach()
[    8.562117] BUG: unable to handle kernel NULL pointer dereference at 000000ac
[    8.562239] IP: [<e08b04a3>] dvb_frontend_detach+0x4/0x67 [dvb_core]

Ref http://bugs.debian.org/575207

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: stable@kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-06 19:20:51 -03:00