Fix the pm_qos_add_request() kerneldoc comment that doesn't reflect
the behavior of the function after the last PM QoS update.
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: mark gross <markgross@thegnar.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf, x86, Pentium4: Clear the P4_CCCR_FORCE_OVF flag
tracing/trace_stack: Fix stack trace on ppc64
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states
sched: Fix rq->clock synchronization when migrating tasks
sparse spotted that the kzalloc() in pm_qos_power_open() in the
current Linus' git tree had its parameters swapped. Fix this.
Signed-off-by: David Alan Gilbert <linux@treblig.org>
Acked-by: mark gross <markgross@thegnar.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
There is a scalability issue for current implementation of optimistic
mutex spin in the kernel. It is found on a 8 node 64 core Nehalem-EX
system (HT mode).
The intention of the optimistic mutex spin is to busy wait and spin on a
mutex if the owner of the mutex is running, in the hope that the mutex
will be released soon and be acquired, without the thread trying to
acquire mutex going to sleep. However, when we have a large number of
threads, contending for the mutex, we could have the mutex grabbed by
other thread, and then another ……, and we will keep spinning, wasting cpu
cycles and adding to the contention. One possible fix is to quit
spinning and put the current thread on wait-list if mutex lock switch to
a new owner while we spin, indicating heavy contention (see the patch
included).
I did some testing on a 8 socket Nehalem-EX system with a total of 64
cores. Using Ingo's test-mutex program that creates/delete files with 256
threads (http://lkml.org/lkml/2006/1/8/50) , I see the following speed up
after putting in the mutex spin fix:
./mutex-test V 256 10
Ops/sec
2.6.34 62864
With fix 197200
Repeating the test with Aim7 fserver workload, again there is a speed up
with the fix:
Jobs/min
2.6.34 91657
With fix 149325
To look at the impact on the distribution of mutex acquisition time, I
collected the mutex acquisition time on Aim7 fserver workload with some
instrumentation. The average acquisition time is reduced by 48% and
number of contentions reduced by 32%.
#contentions Time to acquire mutex (cycles)
2.6.34 72973 44765791
With fix 49210 23067129
The histogram of mutex acquisition time is listed below. The acquisition
time is in 2^bin cycles. We see that without the fix, the acquisition
time is mostly around 2^26 cycles. With the fix, we the distribution get
spread out a lot more towards the lower cycles, starting from 2^13.
However, there is an increase of the tail distribution with the fix at
2^28 and 2^29 cycles. It seems a small price to pay for the reduced
average acquisition time and also getting the cpu to do useful work.
Mutex acquisition time distribution (acq time = 2^bin cycles):
2.6.34 With Fix
bin #occurrence % #occurrence %
11 2 0.00% 120 0.24%
12 10 0.01% 790 1.61%
13 14 0.02% 2058 4.18%
14 86 0.12% 3378 6.86%
15 393 0.54% 4831 9.82%
16 710 0.97% 4893 9.94%
17 815 1.12% 4667 9.48%
18 790 1.08% 5147 10.46%
19 580 0.80% 6250 12.70%
20 429 0.59% 6870 13.96%
21 311 0.43% 1809 3.68%
22 255 0.35% 2305 4.68%
23 317 0.44% 916 1.86%
24 610 0.84% 233 0.47%
25 3128 4.29% 95 0.19%
26 63902 87.69% 122 0.25%
27 619 0.85% 286 0.58%
28 0 0.00% 3536 7.19%
29 0 0.00% 903 1.83%
30 0 0.00% 0 0.00%
I've done similar experiments with 2.6.35 kernel on smaller boxes as
well. One is on a dual-socket Westmere box (12 cores total, with HT).
Another experiment is on an old dual-socket Core 2 box (4 cores total, no
HT)
On the 12-core Westmere box, I see a 250% increase for Ingo's mutex-test
program with my mutex patch but no significant difference in aim7's
fserver workload.
On the 4-core Core 2 box, I see the difference with the patch for both
mutex-test and aim7 fserver are negligible.
So far, it seems like the patch has not caused regression on smaller
systems.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <stable@kernel.org> # .35.x
LKML-Reference: <1282168827.9542.72.camel@schen9-DESK>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Stephane reported that when the machine locks up, the regular ticks,
which are responsible to resetting the throttle count, stop too.
Hence the NMI watchdog can end up being throttled before it reports on
the locked up state, and we end up being sad..
Cure this by having the watchdog overflow reset its own throttle count.
Reported-by: Stephane Eranian <eranian@google.com>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1282215916.1926.4696.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With the introduction of the new unified work queue thread pools,
we lost one feature: It's no longer possible to know which worker
is causing the CPU to wake out of idle. The result is that PowerTOP
now reports a lot of "kworker/a:b" instead of more readable results.
This patch adds a pair of tracepoints to the new workqueue code,
similar in style to the timer/hrtimer tracepoints.
With this pair of tracepoints, the next PowerTOP can correctly
report which work item caused the wakeup (and how long it took):
Interrupt (43) i915 time 3.51ms wakeups 141
Work ieee80211_iface_work time 0.81ms wakeups 29
Work do_dbs_timer time 0.55ms wakeups 24
Process Xorg time 21.36ms wakeups 4
Timer sched_rt_period_timer time 0.01ms wakeups 1
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's a really simple list, and several of the users want to go backwards
in it to find the previous vma. So rather than have to look up the
previous entry with 'find_vma_prev()' or something similar, just make it
doubly linked instead.
Tested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sysrq operations do not accept tty argument anymore so no need to pass
it to us.
[Stephen Rothwell <sfr@canb.auug.org.au>: fix build breakage in drm code
caused by sysrq using bool but not including linux/types.h]
[Sachin Sant <sachinp@in.ibm.com>: fix build breakage in s390 keyboadr
driver]
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
sched_fork() -- we do task placement in ->task_fork_fair() ensure we
update_rq_clock() so we work with current time. We leave the vruntime
in relative state, so the time delay until wake_up_new_task() doesn't
matter.
wake_up_new_task() -- Since task_fork_fair() left p->vruntime in
relative state we can safely migrate, the activate_task() on the
remote rq will call update_rq_clock() and causes the clock to be
synced (enough).
Tested-by: Jack Daniel <wanders.thirst@gmail.com>
Tested-by: Philby John <pjohn@mvista.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1281002322.1923.1708.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
fs: brlock vfsmount_lock
fs: scale files_lock
lglock: introduce special lglock and brlock spin locks
tty: fix fu_list abuse
fs: cleanup files_lock locking
fs: remove extra lookup in __lookup_hash
fs: fs_struct rwlock to spinlock
apparmor: use task path helpers
fs: dentry allocation consolidation
fs: fix do_lookup false negative
mbcache: Limit the maximum number of cache entries
hostfs ->follow_link() braino
hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
remove SWRITE* I/O types
kill BH_Ordered flag
vfs: update ctime when changing the file's permission by setfacl
cramfs: only unlock new inodes
fix reiserfs_evict_inode end_writeback second call
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf tools: Fix build on POSIX shells
latencytop: Fix kconfig dependency warnings
perf annotate tui: Fix exit and RIGHT keys handling
tracing: Sanitize value returned from write(trace_marker, "...", len)
tracing/events: Convert format output to seq_file
tracing: Extend recordmcount to better support Blackfin mcount
tracing: Fix ring_buffer_read_page reading out of page boundary
tracing: Fix an unallocated memory access in function_graph
fs: fs_struct rwlock to spinlock
struct fs_struct.lock is an rwlock with the read-side used to protect root and
pwd members while taking references to them. Taking a reference to a path
typically requires just 2 atomic ops, so the critical section is very small.
Parallel read-side operations would have cacheline contention on the lock, the
dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a
real parallelism increase.
Replace it with a spinlock to avoid one or two atomic operations in typical
path lookup fastpath.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
vt,console,kdb: preserve console_blanked while in kdb
vt: fix regression warnings from KMS merge
arm,kgdb: fix GDB_MAX_REGS no longer used
kgdb: add missing __percpu markup in arch/x86/kernel/kgdb.c
kdb: fix compile error without CONFIG_KALLSYMS
Using a program like the following:
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int main() {
id_t id;
siginfo_t infop;
pid_t res;
id = fork();
if (id == 0) { sleep(1); exit(0); }
kill(id, SIGSTOP);
alarm(1);
waitid(P_PID, id, &infop, WCONTINUED);
return 0;
}
to call waitid() on a stopped process results in access to the child task's
credentials without the RCU read lock being held - which may be replaced in the
meantime - eliciting the following warning:
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/exit.c:1460 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
2 locks held by waitid02/22252:
#0: (tasklist_lock){.?.?..}, at: [<ffffffff81061ce5>] do_wait+0xc5/0x310
#1: (&(&sighand->siglock)->rlock){-.-...}, at: [<ffffffff810611da>]
wait_consider_task+0x19a/0xbe0
stack backtrace:
Pid: 22252, comm: waitid02 Not tainted 2.6.35-323cd+ #3
Call Trace:
[<ffffffff81095da4>] lockdep_rcu_dereference+0xa4/0xc0
[<ffffffff81061b31>] wait_consider_task+0xaf1/0xbe0
[<ffffffff81061d15>] do_wait+0xf5/0x310
[<ffffffff810620b6>] sys_waitid+0x86/0x1f0
[<ffffffff8105fce0>] ? child_wait_callback+0x0/0x70
[<ffffffff81003282>] system_call_fastpath+0x16/0x1b
This is fixed by holding the RCU read lock in wait_task_continued() to ensure
that the task's current credentials aren't destroyed between us reading the
cred pointer and us reading the UID from those credentials.
Furthermore, protect wait_task_stopped() in the same way.
We don't need to keep holding the RCU read lock once we've read the UID from
the credentials as holding the RCU read lock doesn't stop the target task from
changing its creds under us - so the credentials may be outdated immediately
after we've read the pointer, lock or no lock.
Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make do_execve() take a const filename pointer so that kernel_execve() compiles
correctly on ARM:
arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type
This also requires the argv and envp arguments to be consted twice, once for
the pointer array and once for the strings the array points to. This is
because do_execve() passes a pointer to the filename (now const) to
copy_strings_kernel(). A simpler alternative would be to cast the filename
pointer in do_execve() when it's passed to copy_strings_kernel().
do_execve() may not change any of the strings it is passed as part of the argv
or envp lists as they are some of them in .rodata, so marking these strings as
const should be fine.
Further kernel_execve() and sys_execve() need to be changed to match.
This has been test built on x86_64, frv, arm and mips.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If CONFIG_KGDB_KDB is set and CONFIG_KALLSYMS is not set the kernel
will fail to build with the error:
kernel/built-in.o: In function `kallsyms_symbol_next':
kernel/debug/kdb/kdb_support.c:237: undefined reference to `kdb_walk_kallsyms'
kernel/built-in.o: In function `kallsyms_symbol_complete':
kernel/debug/kdb/kdb_support.c:193: undefined reference to `kdb_walk_kallsyms'
The kdb_walk_kallsyms needs a #ifdef proper header to match the C
implementation. This patch also fixes the compiler warnings in
kdb_support.c when compiling without CONFIG_KALLSYMS set. The
compiler warnings are a result of the kallsyms_lookup() macro not
initializing the two of the pass by reference variables.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Reported-by: Michal Simek <monstr@monstr.eu>