The WARN_ON() in the fixup return path of futex_lock_pi() can
trigger on false positives.
The following scenario triggers it:
t1 holds the futex, and t2 and t3 are blocked on the kernel-side rt_mutex.
t1 releases the futex (and the rt_mutex) and assigns t2 to be the next
owner of the futex.
t2 is interrupted and returns without acquiring the rt_mutex, before t1
can release the rt_mutex.
t1 releases the rt_mutex and t3 becomes the pending owner of the rt_mutex.
t2 notices that it is the designated owner (user space variable) and
fails to acquire the rt_mutex via trylock, because it is not allowed to
steal the rt_mutex from t3. It then looks at the rt_mutex pending owner (t3)
and assigns the futex and the pi_state to it.
During the fixup, t4 steals the rt_mutex from t3.
t2 returns from the fixup, and the owner of the rt_mutex has changed from
t3 to t4.
There is no need for another round of fixups from t2. The important
part (t2 not returning as the user-space-visible owner) is
done. The remaining fixups are done before either t3 or t4 returns to
user space.
For user space it is not relevant which task (t3 or t4) is the real
owner, as long as both are in the kernel, which is guaranteed by
the serialization of the hash bucket lock. Both tasks, whichever returns
to user space first (t4 because it locked the rt_mutex, or t3 due to a
signal), go through the futex_lock_pi() return path, where the ownership
is fixed before the return to user space.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To allow better diagnosis of tick-sched related problems, especially
NOHZ related ones, we need to know when the last wakeup via an irq
happened and when the CPU left the idle state.
Add two fields (idle_waketime, idle_exittime) to the tick_sched
structure and add them to the timer_list output.
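The bookkeeping can be pictured with a small user-space model. Everything here is hypothetical: `struct tick_sched_model` keeps only an idle flag and the two new timestamps, and the `model_*` hooks stand in for wherever the NOHZ code actually records them. It is a sketch of the idea, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct tick_sched: only the idle flag
 * and the two new diagnostic timestamps (in nanoseconds). */
struct tick_sched_model {
	bool     idle_active;
	uint64_t idle_waketime;	/* last irq that arrived while idle */
	uint64_t idle_exittime;	/* last time the CPU left idle */
};

static void model_idle_enter(struct tick_sched_model *ts)
{
	ts->idle_active = true;
}

/* Modelled irq entry: only an irq that hits an idle CPU counts as a wakeup. */
static void model_irq_enter(struct tick_sched_model *ts, uint64_t now)
{
	if (ts->idle_active)
		ts->idle_waketime = now;
}

/* Modelled idle loop exit: record when the CPU left the idle state. */
static void model_idle_exit(struct tick_sched_model *ts, uint64_t now)
{
	assert(ts->idle_active);
	ts->idle_active = false;
	ts->idle_exittime = now;
}
```

Keeping the two timestamps separate lets a reader of the timer_list output tell "an irq arrived" apart from "the CPU actually left idle".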
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
xtime_cache needs to be updated whenever xtime and/or wall_to_monotonic
are changed. Otherwise users of xtime_cache might see a stale (and, in
the case of timezone changes, utterly wrong) value until the next
update happens.
Fix up the obvious places that miss this update.
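A minimal sketch of the invariant, assuming the cache is a plain copy: `struct ts_model`, `model_settimeofday()` and `model_update_xtime_cache()` are hypothetical stand-ins, and the real cache update may also fold in pending nanoseconds. The point is that every writer of xtime (and of wall_to_monotonic) refreshes xtime_cache in the same step.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for struct timespec. */
struct ts_model {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Hypothetical globals mirroring the timekeeping state. */
static struct ts_model xtime;			/* wall-clock time */
static struct ts_model wall_to_monotonic;	/* monotonic = xtime + this */
static struct ts_model xtime_cache;		/* cached copy of xtime */

/* Bring tv_nsec back into [0, NSEC_PER_SEC). */
static void ts_normalize(struct ts_model *ts)
{
	while (ts->tv_nsec >= NSEC_PER_SEC) {
		ts->tv_nsec -= NSEC_PER_SEC;
		ts->tv_sec++;
	}
	while (ts->tv_nsec < 0) {
		ts->tv_nsec += NSEC_PER_SEC;
		ts->tv_sec--;
	}
}

static void model_update_xtime_cache(void)
{
	xtime_cache = xtime;
}

/* Set the wall clock. wall_to_monotonic absorbs the jump so that
 * monotonic time stays put, and the cache is refreshed in the same
 * step; omitting that last call is the class of bug fixed here. */
static void model_settimeofday(struct ts_model now)
{
	wall_to_monotonic.tv_sec  -= now.tv_sec  - xtime.tv_sec;
	wall_to_monotonic.tv_nsec -= now.tv_nsec - xtime.tv_nsec;
	ts_normalize(&wall_to_monotonic);

	xtime = now;
	model_update_xtime_cache();
}
```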
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <johnstul@us.ibm.com>
Tested-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch:
commit 37bb6cb409
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri Jan 25 21:08:32 2008 +0100
hrtimer: unlock hrtimer_wakeup
broke hrtimer_init_sleeper() users. It forgot to fix up the futex
caller of this function to detect the failed queueing, and it broke
the do_nanosleep() caller, which could leak a TASK_INTERRUPTIBLE
state.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The recent UDP patch exposed this bug in the audit code: it
called pskb_expand_head() without increasing skb->truesize.
The caller of pskb_expand_head() needs to do so, because that function
is designed to be called in places where truesize is already fixed,
and therefore it doesn't update its value.
Because the audit system uses it in a place where truesize
has not yet been fixed, it needs to update the value manually.
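The accounting rule can be modelled in user space. `struct skb_model`, `model_expand_head()` and `model_audit_expand()` are hypothetical; only the rule itself (the expander leaves truesize alone, so this caller adds the growth) comes from the text above. Measuring the growth as the change in tailroom is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct skb_model {
	unsigned char *head;
	size_t len;		/* bytes of data in use */
	size_t size;		/* bytes allocated for data */
	size_t truesize;	/* memory charged for this buffer */
};

static size_t skb_model_tailroom(const struct skb_model *skb)
{
	return skb->size - skb->len;
}

/* Like pskb_expand_head(): grows the buffer but deliberately leaves
 * truesize alone, because its usual callers have already fixed it. */
static int model_expand_head(struct skb_model *skb, size_t ntail)
{
	unsigned char *p = realloc(skb->head, skb->size + ntail);

	if (!p)
		return -1;
	skb->head = p;
	skb->size += ntail;
	return 0;
}

/* A caller whose truesize is not fixed yet accounts for the growth itself. */
static int model_audit_expand(struct skb_model *skb, size_t extra)
{
	size_t oldtail = skb_model_tailroom(skb);

	if (model_expand_head(skb, extra) < 0)
		return -1;
	skb->truesize += skb_model_tailroom(skb) - oldtail;
	return 0;
}
```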
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It may be used by the nfs.ko and sunrpc.ko modules.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ Made it a regular export rather than GPL-only - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
Remove commented-out code copied from NFS
NFS: Switch from intr mount option to TASK_KILLABLE
Add wait_for_completion_killable
Add wait_event_killable
Add schedule_timeout_killable
Use mutex_lock_killable in vfs_readdir
Add mutex_lock_killable
Use lock_page_killable
Add lock_page_killable
Add fatal_signal_pending
Add TASK_WAKEKILL
exit: Use task_is_*
signal: Use task_is_*
sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
ptrace: Use task_is_*
power: Use task_is_*
wait: Use TASK_NORMAL
proc/base.c: Use task_is_*
proc/array.c: Use TASK_REPORT
perfmon: Use task_is_*
...
Fixed up conflicts in NFS/sunrpc manually.
I was debugging early crashes and wondered where all the printks
went. The reason: ignore_loglevel_setup() had not been called yet ...
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This removes the extra struct task_struct *p parameter from the
inc_nr_running() and dec_nr_running() functions.
Signed-off-by: Jerry Stralko <gerb.stralko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Michel Dänzer has bisected an interactivity problem with
plus-reniced tasks back to this commit:
810e95ccd5 is first bad commit
commit 810e95ccd5
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon Oct 15 17:00:14 2007 +0200
sched: another wakeup_granularity fix
unit mismatch: wakeup_gran was used against a vruntime.
Fix this by asymmetrically scaling the vtime of positively reniced
tasks.
Bisected-by: Michel Dänzer <michel@tungstengraphics.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The reason we get better wakeup latencies for
!FAIR_USER_SCHED is this snippet of code in place_entity():
        if (!initial) {
                /* sleeps upto a single latency don't count. */
                if (sched_feat(NEW_FAIR_SLEEPERS) && entity_is_task(se))
                                                     ^^^^^^^^^^^^^^^^^^
                        vruntime -= sysctl_sched_latency;
                /* ensure we never gain time by being placed backwards. */
                vruntime = max_vruntime(se->vruntime, vruntime);
        }
The NEW_FAIR_SLEEPERS feature gives sleeper credit only to tasks, not to
group-level entities. With the patch attached, I could see that
wakeup latencies with FAIR_USER_SCHED are restored to the same level as
with !FAIR_USER_SCHED.
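A user-space model of the quoted branch, assuming the patch simply drops the entity_is_task() test (the text implies this but does not show the diff). `struct se_model` and `place_waking()` are hypothetical; `max_vruntime()` uses the same wrap-safe signed comparison as CFS, and the NEW_FAIR_SLEEPERS feature is assumed to be enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;
typedef int64_t s64;

/* Hypothetical stand-in for struct sched_entity. */
struct se_model {
	u64  vruntime;
	bool is_task;	/* false for a group-level entity */
};

/* Wrap-safe max of two vruntimes. */
static u64 max_vruntime(u64 max_vr, u64 vr)
{
	s64 delta = (s64)(vr - max_vr);

	if (delta > 0)
		max_vr = vr;
	return max_vr;
}

/* Placement of a waking entity. task_only == true mirrors the quoted
 * code (credit for tasks only); task_only == false models the assumed
 * patched behaviour (group entities get the credit too). */
static u64 place_waking(const struct se_model *se, u64 min_vruntime,
			u64 sched_latency, bool task_only)
{
	u64 vruntime = min_vruntime;

	/* sleeps up to a single latency don't count */
	if (!task_only || se->is_task)
		vruntime -= sched_latency;

	/* ensure we never gain time by being placed backwards */
	return max_vruntime(se->vruntime, vruntime);
}
```

With `task_only` set, a waking group entity is placed at min_vruntime with no sleeper credit, which is the latency gap described above.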
Signed-off-by: Ingo Molnar <mingo@elte.hu>
These bugs are harder to find than they seem; a stack dump helps.
Make it dependent on CONFIG_DEBUG_SHIRQ so that people can turn it off
if it annoys them.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
During the work on the x86 32- and 64-bit backtrace code I found it useful
to have a simple test module to test a process and irq context backtrace.
Since the existing backtrace code was buggy, I figured it might be useful
to have such a test module in the kernel so that maybe we can even
detect such bugs earlier.
[ mingo@elte.hu: build fix ]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Here is a quick and naive smoke test for kprobes. It is intended only to
verify whether some unrelated change broke the *probes subsystem. It is
self-contained and architecture-agnostic, and isn't of any great use by itself.
This needs to be built into the kernel and runs a basic set of tests to
verify that kprobes, jprobes and kretprobes work on the kernel. In case
of an error, it prints a message with a "BUG" prefix.
This is a start; we intend to add more tests to this bucket over time.
Thanks to Jim Keniston and Masami Hiramatsu for comments and suggestions.
Tested on x86 (32/64) and powerpc.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Unlike oopses, WARN_ON() currently doesn't print the loaded modules list.
This makes it harder to take action on certain bug reports. For example,
recently there was a set of WARN_ON()s reported in the mac80211 stack,
which were just signalling a driver bug. It then takes another round trip
to the bug reporter (if he responds at all) to find out which driver
is at fault.
Another issue is that, unlike oopses, WARN_ON() doesn't currently printk
the helpful "cut here" line, nor the "end of trace" marker.
Now that WARN_ON() is out of line, the size increase due to this is
minimal and it's worth adding.
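The framing described above can be sketched as a plain formatter. The function name and the exact line formats are assumptions for illustration; only the three ingredients (a "cut here" line, the loaded-modules list and an end-of-trace marker around the warning) come from the text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter for a WARN_ON() report; the layout is
 * illustrative, not the kernel's exact output. Returns the length
 * snprintf() would have written. */
static int model_warn_report(char *buf, size_t len, const char *file,
			     int line, const char *modules)
{
	const char *mods = (modules && *modules) ? modules : "(none)";

	return snprintf(buf, len,
			"------------[ cut here ]------------\n"
			"WARNING: at %s:%d\n"
			"Modules linked in: %s\n"
			"---[ end trace ]---\n",
			file, line, mods);
}
```

With the modules list in the report itself, the mac80211 example above would name the faulting driver without a second round trip.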
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
A quick grep shows that there are currently 1145 instances of WARN_ON
in the kernel. Currently, WARN_ON is pretty much entirely inlined,
which makes it hard to enhance it without growing the size of the kernel
(and getting Andrew unhappy).
This patch builds on top of Olof's patch that introduces __WARN,
and moves the slow path out of line. It also follows Ingo's suggestion
not to use __FUNCTION__ but to use kallsyms for the lookup;
this saves a ton of space, since gcc no longer needs to store the
function name string twice:
   text    data     bss     dec     hex filename
3936367  833603  624736 5394706  525112 vmlinux.before
3917508  833603  624736 5375847  520767 vmlinux-slowpath
That is 18859 bytes (roughly 18 KB) of savings...
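A user-space sketch of the split between an inline fast path and an out-of-line slow path. `MODEL_WARN_ON()`, `model_warn_slowpath()` and the call counter are hypothetical; like kernel code, the sketch relies on GNU C extensions (statement expressions, `__builtin_expect`, `__builtin_return_address`). Where the kernel would resolve the caller address via kallsyms instead of storing a `__FUNCTION__` string, this sketch just prints the raw address.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter so the behaviour can be checked. */
static int warn_slowpath_calls;

/* Out-of-line slow path: only file and line are passed in. The caller
 * is identified by its return address rather than a per-site function
 * name string. */
static __attribute__((noinline, cold)) void
model_warn_slowpath(const char *file, int line)
{
	warn_slowpath_calls++;
	fprintf(stderr, "WARNING: at %s:%d (caller %p)\n",
		file, line, __builtin_return_address(0));
}

/* Inline fast path: a test and a call; evaluates to the condition. */
#define MODEL_WARN_ON(cond) ({					\
	int warn_ret_ = !!(cond);				\
	if (__builtin_expect(warn_ret_, 0))			\
		model_warn_slowpath(__FILE__, __LINE__);	\
	warn_ret_;						\
})
```

Because all formatting lives in the slow path, later enhancements (such as printing the modules list) cost space once rather than at every call site.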
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Olof Johansson <olof@lixom.net>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is useful for debugging problems with interrupt handlers that
sometimes return IRQ_NONE.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This allows them to be changed at runtime via sysfs, with no need to
reboot to set them.
I only added aliases (kernel.noirqdebug etc.) so that the old options
still work.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This adds a generic definition of compat_sys_ptrace that calls
compat_arch_ptrace, parallel to sys_ptrace/arch_ptrace. Some
machines needing this already define a function by that name.
The new generic function is defined only on machines that
put #define __ARCH_WANT_COMPAT_SYS_PTRACE into asm/ptrace.h.
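A user-space model of the opt-in pattern: the generic entry point exists only when the architecture defines the switch, and it hands off to the arch hook. `struct task_model`, `model_find_task()` and the request code `MODEL_REQ_GETPID` are hypothetical, and the typedefs only approximate the kernel's compat types. A real implementation would also need the shared attach and tracee-state checks, which are omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Opt-in switch, as an arch would put it in asm/ptrace.h. */
#define __ARCH_WANT_COMPAT_SYS_PTRACE

typedef int compat_long_t;
typedef unsigned int compat_ulong_t;

#define MODEL_REQ_GETPID 42	/* hypothetical request code */

/* Hypothetical, trimmed-down task type and lookup. */
struct task_model { int pid; };
static struct task_model tasks[] = { { 100 }, { 200 } };

static struct task_model *model_find_task(compat_long_t pid)
{
	for (size_t i = 0; i < sizeof tasks / sizeof tasks[0]; i++)
		if (tasks[i].pid == pid)
			return &tasks[i];
	return NULL;
}

/* Arch hook: each 32-on-64 architecture supplies its own. */
static long compat_arch_ptrace(struct task_model *child, compat_long_t request,
			       compat_ulong_t addr, compat_ulong_t data)
{
	(void)addr;
	(void)data;
	return request == MODEL_REQ_GETPID ? child->pid : -EIO;
}

#ifdef __ARCH_WANT_COMPAT_SYS_PTRACE
/* Generic entry point: shared task lookup, then the arch-specific part,
 * parallel to sys_ptrace() calling arch_ptrace(). */
static long compat_sys_ptrace(compat_long_t request, compat_long_t pid,
			      compat_long_t addr, compat_long_t data)
{
	struct task_model *child = model_find_task(pid);

	if (!child)
		return -ESRCH;
	return compat_arch_ptrace(child, request, addr, data);
}
#endif
```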
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This adds a compat_ptrace_request() that is the analogue of ptrace_request()
for the things that 32-on-64 ptrace implementations can share.
So far, only a couple of requests are handled generically.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This makes ptrace_request handle {PEEK,POKE}{TEXT,DATA} directly.
Every arch_ptrace that could call generic_ptrace_peekdata already
has a default case calling ptrace_request, so this keeps things
simpler for the arch code.
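The resulting dispatch can be modelled as follows: the arch layer handles only its own requests and defers everything else, so the peek/poke cases live in one generic place. The `model_*` functions and `struct tracee_model` are hypothetical, peeked values come back through an out-parameter rather than being copied to the tracer, and the request numbers are the Linux values.

```c
#include <assert.h>
#include <errno.h>

/* Request numbers as used on Linux. */
#define PTRACE_PEEKTEXT 1
#define PTRACE_PEEKDATA 2
#define PTRACE_POKETEXT 4
#define PTRACE_POKEDATA 5

/* Hypothetical tracee with a tiny word-addressed memory. */
struct tracee_model { long mem[16]; };

static int model_peekdata(struct tracee_model *t, unsigned long addr, long *out)
{
	if (addr >= 16)
		return -EIO;
	*out = t->mem[addr];
	return 0;
}

static int model_pokedata(struct tracee_model *t, unsigned long addr, long data)
{
	if (addr >= 16)
		return -EIO;
	t->mem[addr] = data;
	return 0;
}

/* Generic layer: handles {PEEK,POKE}{TEXT,DATA} itself. */
static long model_ptrace_request(struct tracee_model *t, long request,
				 unsigned long addr, long data, long *out)
{
	switch (request) {
	case PTRACE_PEEKTEXT:
	case PTRACE_PEEKDATA:
		return model_peekdata(t, addr, out);
	case PTRACE_POKETEXT:
	case PTRACE_POKEDATA:
		return model_pokedata(t, addr, data);
	default:
		return -EIO;	/* unknown request */
	}
}

/* Arch layer: arch-specific requests would be handled here; the
 * default case defers to the generic layer. */
static long model_arch_ptrace(struct tracee_model *t, long request,
			      unsigned long addr, long data, long *out)
{
	switch (request) {
	default:
		return model_ptrace_request(t, request, addr, data, out);
	}
}
```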
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>