OK, so 25-mm1 gave a lockdep error which made me look into this.
The first thing that I noticed was the horrible mess; the second thing I
saw was hacks like: 71e93d1561
The problem is that arch idle routines are somewhat inconsitent with
their IRQ state handling and instead of fixing _that_, we go paper over
the problem.
So the thing I've tried to do is set a standard for idle routines and
fix them all up to adhere to that. So the rules are:
idle routines are entered with IRQs disabled
idle routines will exit with IRQs enabled
Nearly all already did this in one form or another.
Merge the 32 and 64 bit bits so they no longer have different bugs.
As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
irq-enable.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Only allocate the FPU area when the application actually uses FPU, i.e., in the
first lazy FPU trap. This could save memory for non-fpu using apps.
for example: on my system after boot, there are around 300 processes, with
only 17 using FPU.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Split the FPU save area from the task struct. This allows easy migration
of FPU context, and it's generally cleaner. It also allows the following
two optimizations:
1) only allocate when the application actually uses FPU, so in the first
lazy FPU trap. This could save memory for non-fpu using apps. Next patch
does this lazy allocation.
2) allocate the right size for the actual cpu rather than 512 bytes always.
Patches enabling xsave/xrstor support (coming shortly) will take advantage
of this.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch implements the PR_GET_TSC and PR_SET_TSC prctl()
commands on the x86 platform (both 32 and 64 bit.) These
commands control the ability to read the timestamp counter
from userspace (the RDTSC instruction.)
While the RDTSC instuction is a useful profiling tool,
it is also the source of some non-determinism in ring-3.
For deterministic replay applications it is useful to be
able to trap and emulate (and record the outcome of) this
instruction.
This patch uses code earlier used to disable the timestamp
counter for the SECCOMP framework. A side-effect of this
patch is that the SECCOMP environment will now also disable
the timestamp counter on x86_64 due to the addition of the
TIF_NOTSC define on this platform.
The code which enables/disables the RDTSC instruction during
context switches is in the __switch_to_xtra function, which
already handles other unusual conditions, so normal
performance should not have to suffer from this change.
Signed-off-by: Erik Bosman <ejbosman@cs.vu.nl>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
I've made a small investigation about vm86.h inclusion rules and it
looks like everything is more or less ok.
Files that rely on asm/vm86.h symbols are:
- kprobes.c
- process_32.c
- signal_32.c
- traps_32.c
- vm86_32.c
File process_32.c includes vm86.h explicitly. We can remove that
include and it won't break anything.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This matches x86_64 behaviour, which is a superior one IMHO
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Otherwise, enabling (or better, subsequent disabling) of single
stepping would cause a kernel oops on CPUs not having this MSR.
The patch could have been added a conditional to the MSR write in
user_disable_single_step(), but centralizing the updates seems safer
and (looking forward) better manageable.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch also resolves hangs on boot:
http://lkml.org/lkml/2008/2/23/263http://bugzilla.kernel.org/show_bug.cgi?id=10093
The bug was causing once-in-few-reboots 10-15 sec wait during boot on
certain laptops.
Earlier commit 40d6a14662 added
smp_call_function in cpu_idle_wait() to kick cpus that are in tickless
idle. Looking at cpu_idle_wait code at that time, code seemed to be
over-engineered for a case which is rarely used (while changing idle
handler).
Below is a simplified version of cpu_idle_wait, which just makes a dummy
smp_call_function to all cpus, to make them come out of old idle handler
and start using the new idle handler. It eliminates code in the idle
loop to handle cpu_idle_wait.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
revert the BTS ptrace extension for now.
based on general objections from Roland McGrath:
http://lkml.org/lkml/2008/2/21/323
we'll let the BTS functionality cook some more and re-enable
it in v2.6.26. We'll leave the dead code around to help the
development of this code.
(X86_BTS is not defined at the moment)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Suppress A.OUT library support if CONFIG_ARCH_SUPPORTS_AOUT is not set.
Not all architectures support the A.OUT binfmt, so the ELF binfmt should not
be permitted to go looking for A.OUT libraries to load in such a case. Not
only that, but under such conditions A.OUT core dumps are not produced either.
To make this work, this patch also does the following:
(1) Makes the existence of the contents of linux/a.out.h contingent on
CONFIG_ARCH_SUPPORTS_AOUT.
(2) Renames dump_thread() to aout_dump_thread() as it's only called by A.OUT
core dumping code.
(3) Moves aout_dump_thread() into asm/a.out-core.h and makes it inline. This
is then included only where needed. This means that this bit of arch
code will be stored in the appropriate A.OUT binfmt module rather than
the core kernel.
(4) Drops A.OUT support for Blackfin (according to Mike Frysinger it's not
needed) and FRV.
This patch depends on the previous patch to move STACK_TOP[_MAX] out of
asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT
format is available.
[jdike@addtoit.com: uml: re-remove accidentally restored code]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/x86/kernel/process_32.c:254:43: warning: Using plain integer as NULL pointer
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fix the following warning:
WARNING: arch/x86/kernel/built-in.o(.text+0x3): Section mismatch: reference to .cpuinit.data:force_mwait in 'mwait_usable'
[Seen on 64 bit only but similar pattern exist on 32 bit so fix it there too]
mwait_usable() were only used by a function annotated __cpuinit
so annotate mwait_usable() with __cpuinit to fix the warning.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The warning message at idle_setup() is never shown because
smp_num_sibling hasn't been updated at this point yet.
Move this polling idle and HT enabled warning to select_idle_routine().
I also implement this warning on 64-bit kernel.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Previously there was a AMD specific quirk to handle the case of
AMD Fam10h MWAIT not supporting any C states. But it turns out
that CPUID already has ways to detectly detect that without
using special quirks.
The new code simply checks if MWAIT supports at least C1 and doesn't
use it if it doesn't. No more vendor specific code.
Note this is does not simply clear MWAIT because MWAIT can be still
useful even without C states.
Credit goes to Ben Serebrin for pointing out the (nearly) obvious.
Cc: "Andreas Herrmann" <andreas.herrmann3@amd.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
an otherwise idle system takes about 3 ticks per network
interface in unregister_netdev() due to multiple calls to synchronize_rcu(),
which adds up to quite a few seconds for tearing down thousands of
interfaces. By flushing pending rcu callbacks in the idle loop, the system
makes progress hundreds of times faster. If this is indeed a sane thing to,
it probably needs to be done for other architectures than x86. And yes, the
network stack shouldn't call synchronize_rcu() quite so much, but fixing that
is a little more involved.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Right now, we take the stack pointer early during the backtrace path, but
only calculate bp several functions deep later, making it hard to reconcile
the stack and bp backtraces (as well as showing several internal backtrace
functions on the stack with bp based backtracing).
This patch moves the bp taking to the same place we take the stack pointer;
sadly this ripples through several layers of the back tracing stack,
but it's not all that bad in the end I hope.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This removes a bunch of dead code that is no longer needed now
that the user_regset interfaces are being used for all these jobs.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Resend using different mail client
Changes to the last version:
- split implementation into two layers: ds/bts and ptrace
- renamed TIF's
- save/restore ds save area msr in __switch_to_xtra()
- make block-stepping only look at BTF bit
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>