Commit Graph

50 Commits

Author SHA1 Message Date
Christoph Hellwig 1eeb66a1bb move die notifier handling to common code
This patch moves the die notifier handling to common code.  Previous
various architectures had exactly the same code for it.  Note that the new
code is compiled unconditionally, this should be understood as an appel to
the other architecture maintainer to implement support for it aswell (aka
sprinkling a notify_die or two in the proper place)

arm had a notifiy_die that did something totally different, I renamed it to
arm_notify_die as part of the patch and made it static to the file it's
declared and used at.  avr32 used to pass slightly less information through
this interface and I brought it into line with the other architectures.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
[bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:04 -07:00
Andi Kleen 09198e6850 [PATCH] i386: Clean up NMI watchdog code
- Introduce a wd_ops structure
- Convert the various nmi watchdogs over to it
- This allows to split the perfctr reservation from the watchdog
setup cleanly.
- Do perfctr reservation globally as it should have always been
- Remove dead code referenced only by unused EXPORT_SYMBOLs

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:20 +02:00
Andi Kleen bbba11c35b [PATCH] i386: Remove unneeded externs in nmi.c
All were already in some header
Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:11 +02:00
Stephane Eranian bf8696ed6d [PATCH] i386: i386 make NMI use PERFCTR1 for architectural perfmon (take 2)
Hello,

This patch against 2.6.20-git14 makes the NMI watchdog use PERFSEL1/PERFCTR1
instead of PERFSEL0/PERFCTR0 on processors supporting Intel architectural
perfmon, such as Intel Core 2. Although all PMU events can work on
both counters, the Precise Event-Based Sampling (PEBS) requires that the
event be in PERFCTR0 to work correctly (see section 18.14.4.1 in the
IA32 SDM Vol 3b).

A similar patch for x86-64 is to follow.

Changelog:
        - make the i386 NMI watchdog use PERFSEL1/PERFCTR1 instead of PERFSEL0/PERFCTR0
          on processors supporting the Intel architectural perfmon (e.g. Core 2 Duo).
          This allows PEBS to work when the NMI watchdog is active.

signed-off-by: stephane eranian <eranian@hpl.hp.com>

Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02 19:27:05 +02:00
Andi Kleen 8689b517be [PATCH] i386: Fix some warnings added by earlier patch
Signed-off-by: Andi Kleen <ak@suse.de>
2007-04-24 13:05:37 +02:00
Andi Kleen 1714f9bfc9 [PATCH] x86: Fix potential overflow in perfctr reservation
While reviewing this code again I found a potential overflow of the bitmap.
The p4 oprofile can theoretically set bits beyond the reservation bitmap for
specific configurations. Avoid that by sizing the bitmaps properly.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-04-16 10:30:27 +02:00
Andi Kleen 0fb2ebfcb5 [PATCH] x86-64: Increase NMI watchdog probing timeout
A 4 core Opteron needs longer than 10 ticks for this.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-04-02 12:14:12 +02:00
Andi Kleen 89e07569e4 [PATCH] x86-64: Let oprofile reserve MSR on all CPUs
The MSR reservation is per CPU and oprofile would only allocate them
on the CPU it was initialized on. Change this to handle all CPUs.

This also fixes a warning about unprotected use of smp_processor_id()
in preemptible kernels.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-04-02 12:14:12 +02:00
Linus Torvalds 8ce5e3e45e Disable NMI watchdog by default properly
This reverts commit 6ebf622b25 and
replaces it with one that actually works.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-14 17:53:43 -07:00
Thomas Gleixner f8b5035b9a [PATCH] i386 prepare nmi watchdog for dynticks
The NMI watchdog implementation assumes that the local APIC timer interrupt is
happening.  This assumption is not longer true when high resolution timers and
dynamic ticks come into play, as they may switch off the local APIC timer
completely.  Take the PIT/HPET interrupts into account too, to avoid false
positives.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rohit Seth <rohitseth@google.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-16 08:13:59 -08:00
Andi Kleen 0a4599c894 [PATCH] x86: Enable NMI watchdog for AMD Family 0x10 CPUs
For i386/x86-64.

Straight forward -- just reuse the Family 0xf code.

Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:25 +01:00
Ingo Molnar 5d0e600d90 [PATCH] x86: fix laptop bootup hang in init_acpi()
During kernel bootup, a new T60 laptop (CoreDuo, 32-bit) hangs about
10%-20% of the time in acpi_init():

 Calling initcall 0xc055ce1a: topology_init+0x0/0x2f()
 Calling initcall 0xc055d75e: mtrr_init_finialize+0x0/0x2c()
 Calling initcall 0xc05664f3: param_sysfs_init+0x0/0x175()
 Calling initcall 0xc014cb65: pm_sysrq_init+0x0/0x17()
 Calling initcall 0xc0569f99: init_bio+0x0/0xf4()
 Calling initcall 0xc056b865: genhd_device_init+0x0/0x50()
 Calling initcall 0xc056c4bd: fbmem_init+0x0/0x87()
 Calling initcall 0xc056dd74: acpi_init+0x0/0x1ee()

It's a hard hang that not even an NMI could punch through!  Frustratingly,
adding printks or function tracing to the ACPI code made the hangs go away
...

After some time an additional detail emerged: disabling the NMI watchdog
made these occasional hangs go away.

So i spent the better part of today trying to debug this and trying out
various theories when i finally found the likely reason for the hang: if
acpi_ns_initialize_devices() executes an _INI AML method and an NMI
happens to hit that AML execution in the wrong moment, the machine would
hang.  (my theory is that this must be some sort of chipset setup method
doing stores to chipset mmio registers?)

Unfortunately given the characteristics of the hang it was sheer
impossible to figure out which of the numerous AML methods is impacted
by this problem.

As a workaround i wrote an interface to disable chipset-based NMIs while
executing _INI sections - and indeed this fixed the hang.  I did a
boot-loop of 100 separate reboots and none hung - while without the patch
it would hang every 5-10 attempts.  Out of caution i did not touch the
nmi_watchdog=2 case (it's not related to the chipset anyway and didnt
hang).

I implemented this for both x86_64 and i686, tested the i686 laptop both
with nmi_watchdog=1 [which triggered the hangs] and nmi_watchdog=2, and
tested an Athlon64 box with the 64-bit kernel as well. Everything builds
and works with the patch applied.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-02-13 13:26:24 +01:00
Venkatesh Pallipadi 90ce4bc454 [PATCH] i386: Handle 32 bit PerfMon Counter writes cleanly in i386 nmi_watchdog
Change i386 nmi handler to handle 32 bit perfmon counter MSR writes cleanly.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2007-02-13 13:26:22 +01:00
Venkatesh Pallipadi 58d9ce7d75 [PATCH] Revert nmi_known_cpu() check during boot option parsing
Commit f2802e7f57 and its x86 version
(b7471c6da9) adds nmi_known_cpu() check
while parsing boot options in x86_64 and i386.

With that, "nmi_watchdog=2" stops working for me on Intel Core 2 CPU
based system.

The problem is, setup_nmi_watchdog is called while parsing the boot
option and identify_cpu is not done yet.  So, the return value of
nmi_known_cpu() is not valid at this point.

So revert that check.  This should not have any adverse effect as the
nmi_known_cpu() check is done again later in enable_lapic_nmi_watchdog().

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-23 07:52:05 -08:00
Ravikiran G Thirumalai 92715e282b [PATCH] x86: Fix boot hang due to nmi watchdog init code
2.6.19  stopped booting (or booted based on build/config) on our x86_64
systems due to a bug introduced in 2.6.19.  check_nmi_watchdog schedules an
IPI on all cpus to  busy wait on a flag, but fails to set the busywait
flag if NMI functionality is disabled.  This causes the secondary cpus
to spin in an endless loop, causing the kernel bootup to hang.
Depending upon the build, the  busywait flag got overwritten (stack variable)
and caused  the kernel to bootup on certain builds.  Following patch fixes
the bug by setting the busywait flag before returning from check_nmi_watchdog.
I guess using a stack variable is not good here as the calling function could
potentially return while the busy wait loop is still spinning on the flag.

AK: I redid the patch significantly to be cleaner

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-09 21:33:35 +01:00
Jan Beulich c6ea396de6 [PATCH] i386: Don't touch per cpu memory of offline CPUs in touch_nmi_watchdog
Just like on x86-64, don't touch foreign CPUs' memory if the watchdog
isn't enabled at all.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:09 +01:00
Andrew Morton bb81a09e55 [PATCH] x86: all cpu backtrace
When a spinlock lockup occurs, arrange for the NMI code to emit an all-cpu
backtrace, so we get to see which CPU is holding the lock, and where.

Cc: Andi Kleen <ak@muc.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07 02:14:01 +01:00
Andi Kleen a1bae67243 [PATCH] i386: Disable nmi watchdog on all ThinkPads
Even newer Thinkpads have bugs in SMM code that causes hangs with
NMI watchdog.

Signed-off-by: Andi Kleen <ak@suse.de>
2006-10-21 18:37:02 +02:00
Zachary Amsden 5a73fdc5ea [PATCH] Some config.h removals
During tracking down a PAE compile failure, I found that config.h was being
included in a bunch of places in i386 code.  It is no longer necessary, so
drop it.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-01 00:39:34 -07:00
Andi Kleen 29cbc78b90 [PATCH] x86: Clean up x86 NMI sysctls
Use prototypes in headers
Don't define panic_on_unrecovered_nmi for all architectures

Cc: dzickus@redhat.com

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-30 01:47:55 +02:00
Fernando Luis Vázquez Cao 06039754d7 [PATCH] i386: Disallow kprobes on NMI handlers
A kprobe executes IRET early and that could cause NMI recursion and stack
corruption.

Note: This problem was originally spotted and solved by Andi Kleen in the
x86_64 architecture. This patch is an adaption of his patch for i386.

AK: Merged with current code which was a bit different.
AK: Removed printk in nmi handler that shouldn't be there in the first time
AK: Added missing include.
AK: added KPROBES_END

Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:36 +02:00
Venkatesh Pallipadi 248dcb2fff [PATCH] x86: i386/x86-64 Add nmi watchdog support for new Intel CPUs
AK: This redoes the changes I temporarily reverted.

Intel now has support for Architectural Performance Monitoring Counters
( Refer to IA-32 Intel Architecture Software Developer's Manual
http://www.intel.com/design/pentium4/manuals/253669.htm ). This
feature is present starting from Intel Core Duo and Intel Core Solo processors.

What this means is, the performance monitoring counters and some performance
monitoring events are now defined in an architectural way (using cpuid).
And there will be no need to check for family/model etc for these architectural
events.

Below is the patch to use this performance counters in nmi watchdog driver.
Patch handles both i386 and x86-64 kernels.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:27 +02:00
Andi Kleen 1de84979df [PATCH] i386: Enable NMI watchdog by default
I've had good experiences with having this on by default on x86-64.
It turns nasty hangs into easier to debug oopses.

Enable the local APIC wdog by default for systems newer than 2004.

This comes from a strange compromise: according to arjan the reason
it was off by default was some old IBM systems that corrupted
registered when NMI happened in SMI. Can't remember more specific,
but >= 2004 should avoid these. It's probably overly broad
because most older systems should be ok (and the really old systems
won't be supported by the local apic watchdog anyways)

Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 10:52:27 +02:00
Shaohua Li 4038f901cf [PATCH] i386/x86-64: Fix NMI watchdog suspend/resume
Making NMI suspend/resume work with SMP. We use CPU hotplug to offline
APs in SMP suspend/resume. Only BSP executes sysdev's .suspend/.resume
method. APs should follow CPU hotplug code path.

And:

+From: Don Zickus <dzickus@redhat.com>

Makes the start/stop paths of nmi watchdog more robust to handle the
suspend/resume cases more gracefully.

AK: I merged the two patches together

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26 10:52:27 +02:00
Don Zickus e33e89ab1a [PATCH] x86: Add abilty to enable/disable nmi watchdog from procfs (update)
Adds a new /proc/sys/kernel/nmi_watchdog call that will enable/disable the
nmi watchdog.

By entering a non-zero value here, a user can enable the nmi watchdog to
monitor the online cpus in the system.  By entering a zero value here, a
user can disable the nmi watchdog and free up a performance counter which
could then be utilized by the oprofile subsystem, otherwise oprofile may be
short a counter when in use.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-26 10:52:27 +02:00