Commit Graph

61 Commits

Author SHA1 Message Date
Ingo Molnar
29d7b90c15 sched: fix kernel warning on /proc/sched_debug access
Luis Henriques reported that with CONFIG_PREEMPT=y + CONFIG_PREEMPT_DEBUG=y +
CONFIG_SCHED_DEBUG=y + CONFIG_LATENCYTOP=y enabled, the following warning
triggers when using latencytop:

> [  775.663239] BUG: using smp_processor_id() in preemptible [00000000] code: latencytop/6585
> [  775.663303] caller is native_sched_clock+0x3a/0x80
> [  775.663314] Pid: 6585, comm: latencytop Tainted: G        W 2.6.28-rc4-00355-g9c7c354 #1
> [  775.663322] Call Trace:
> [  775.663343]  [<ffffffff803a94e4>] debug_smp_processor_id+0xe4/0xf0
> [  775.663356]  [<ffffffff80213f7a>] native_sched_clock+0x3a/0x80
> [  775.663368]  [<ffffffff80213e19>] sched_clock+0x9/0x10
> [  775.663381]  [<ffffffff8024550d>] proc_sched_show_task+0x8bd/0x10e0
> [  775.663395]  [<ffffffff8034466e>] sched_show+0x3e/0x80
> [  775.663408]  [<ffffffff8031039b>] seq_read+0xdb/0x350
> [  775.663421]  [<ffffffff80368776>] ? security_file_permission+0x16/0x20
> [  775.663435]  [<ffffffff802f4198>] vfs_read+0xc8/0x170
> [  775.663447]  [<ffffffff802f4335>] sys_read+0x55/0x90
> [  775.663460]  [<ffffffff8020c67a>] system_call_fastpath+0x16/0x1b
> ...

This breakage was caused by me via:

  7cbaef9: sched: optimize sched_clock() a bit

Change the calls to cpu_clock().

Reported-by: Luis Henriques <henrix@sapo.pt>
2008-11-16 08:07:15 +01:00
Peter Zijlstra
5ac5c4d604 sched: clean up debug info
Impact: clean up and fix debug info printout

While looking over the sched_debug code I noticed that we printed the rq
schedstats for every cfs_rq, ammend this.

Also change nr_spead_over into an int, and fix a little buglet in
min_vruntime printing.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-10 10:51:51 +01:00
Li Zefan
a9cf4ddb3b sched: change sched_debug's mode to 0444
Impact: change /proc/sched/debug from rw-r--r-- to r--r--r--

/proc/sched_debug is read-only.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-30 11:37:57 +01:00
Lai Jiangshan
a6bebbc87a [PATCH] signal, procfs: some lock_task_sighand() users do not need rcu_read_lock()
lock_task_sighand() make sure task->sighand is being protected,
so we do not need rcu_read_lock().
[ exec() will get task->sighand->siglock before change task->sighand! ]

But code using rcu_read_lock() _just_ to protect lock_task_sighand()
only appear in procfs. (and some code in procfs use lock_task_sighand()
without such redundant protection.)

Other subsystem may put lock_task_sighand() into rcu_read_lock()
critical region, but these rcu_read_lock() are used for protecting
"for_each_process()", "find_task_by_vpid()" etc. , not for protecting
lock_task_sighand().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
[ok from Oleg]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-10 04:18:57 +04:00
Peter Zijlstra
32df2ee86a sched: add full schedstats to /proc/sched_debug
show all the schedstats in /debug/sched_debug as well.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 14:31:31 +02:00
Peter Zijlstra
c09595f63b sched: revert revert of: fair-group: SMP-nice for group scheduling
Try again..

Initial commit: 18d95a2832
Revert: 6363ca57c7

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 14:31:29 +02:00
Peter Zijlstra
ada18de2eb sched: debug: add some rt debug output
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Daniel K." <dk@uw.no>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 10:25:59 +02:00
Ingo Molnar
6363ca57c7 revert ("sched: fair-group: SMP-nice for group scheduling")
Yanmin Zhang reported:

Comparing with 2.6.25, volanoMark has big regression with kernel 2.6.26-rc1.
It's about 50% on my 8-core stoakley, 16-core tigerton, and Itanium Montecito.

With bisect, I located the following patch:

| 18d95a2832 is first bad commit
| commit 18d95a2832
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date:   Sat Apr 19 19:45:00 2008 +0200
|
|     sched: fair-group: SMP-nice for group scheduling

Revert it so that we get v2.6.25 behavior.

Bisected-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-29 11:28:57 +02:00
Peter Zijlstra
3e51f33fcc sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
this replaces the rq->clock stuff (and possibly cpu_clock()).

 - architectures that have an 'imperfect' hardware clock can set
   CONFIG_HAVE_UNSTABLE_SCHED_CLOCK

 - the 'jiffie' window might be superfulous when we update tick_gtod
   before the __update_sched_clock() call in sched_clock_tick()

 - cpu_clock() might be implemented as:

     sched_clock_cpu(smp_processor_id())

   if the accuracy proves good enough - how far can TSC drift in a
   single jiffie when considering the filtering and idle hooks?

[ mingo@elte.hu: various fixes and cleanups ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-05 23:56:18 +02:00
Roman Zippel
6f6d6a1a6a rename div64_64 to div64_u64
Rename div64_64 to div64_u64 to make it consistent with the other divide
functions, so it clearly includes the type of the divide.  Move its definition
to math64.h as currently no architecture overrides the generic implementation.
 They can still override it of course, but the duplicated declarations are
avoided.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01 08:03:58 -07:00
Denis V. Lunev
c33fff0afb kernel: use non-racy method for proc entries creation
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:22 -07:00
Ingo Molnar
486fdae214 sched: build fix
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:45:01 +02:00
Peter Zijlstra
d19ca30874 sched: debug: add some debug code to handle the full hierarchy
Add some extra debug output so we can get a better overview of the
full hierarchy.

We print the cgroup path after each cfs_rq, so we can see what group
we're looking at.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:45:00 +02:00
Ingo Molnar
50df5d6aea sched: remove sysctl_sched_batch_wakeup_granularity
it's unused.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19 19:44:57 +02:00
Ingo Molnar
4ae7d5cefd sched: improve affine wakeups
improve affine wakeups. Maintain the 'overlap' metric based on CFS's
sum_exec_runtime - which means the amount of time a task executes
after it wakes up some other task.

Use the 'overlap' for the wakeup decisions: if the 'overlap' is short,
it means there's strong workload coupling between this task and the
woken up task. If the 'overlap' is large then the workload is decoupled
and the scheduler will move them to separate CPUs more easily.

( Also slightly move the preempt_check within try_to_wake_up() - this has
  no effect on functionality but allows 'early wakeups' (for still-on-rq
  tasks) to be correctly accounted as well.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-19 04:27:53 +01:00
Arjan van de Ven
6d082592b6 sched: keep total / count stats in addition to the max for
Right now, the linux kernel (with scheduler statistics enabled) keeps track
of the maximum time a process is waiting to be scheduled. While the maximum
is a very useful metric, tracking average and total is equally useful
(at least for latencytop) to figure out the accumulated effect of scheduler
delays. The accumulated effect is important to judge the performance impact
of scheduler tuning/behavior.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:35 +01:00
Guillaume Chazarain
cc203d2422 sched: monitor clock underflows in /proc/sched_debug
We monitor clock overflows, let's also monitor clock underflows.

Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25 21:08:34 +01:00
Ingo Molnar
90b2628f1f sched: fix gcc warnings
Meelis Roos reported these warnings on sparc64:

  CC      kernel/sched.o
  In file included from kernel/sched.c:879:
  kernel/sched_debug.c: In function 'nsec_high':
  kernel/sched_debug.c:38: warning: comparison of distinct pointer types lacks a cast

the debug check in do_div() is over-eager here, because the long long
is always positive in these places. Mark this by casting them to
unsigned long long.

no change in code output:

   text    data     bss     dec     hex filename
  51471    6582     376   58429    e43d sched.o.before
  51471    6582     376   58429    e43d sched.o.after

  md5:
   7f7729c111f185bf3ccea4d542abc049  sched.o.before.asm
   7f7729c111f185bf3ccea4d542abc049  sched.o.after.asm

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-12-30 17:24:35 +01:00
Ingo Molnar
c1a89740da sched: clean up overlong line in kernel/sched_debug.c
clean up overlong line in kernel/sched_debug.c.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-28 15:52:56 +01:00
Ingo Molnar
f7b9329e55 sched: bump version of kernel/sched_debug.c
bump version of kernel/sched_debug.c and remove CFS version
information from it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-26 21:21:49 +01:00
Peter Zijlstra
b2be5e96dc sched: reintroduce the sched_min_granularity tunable
we lost the sched_min_granularity tunable to a clever optimization
that uses the sched_latency/min_granularity ratio - but the ratio
is quite unintuitive to users and can also crash the kernel if the
ratio is set to 0. So reintroduce the min_granularity tunable,
while keeping the ratio maintained internally.

no functionality changed.

[ mingo@elte.hu: some fixlets. ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-09 22:39:37 +01:00
Peter Zijlstra
ab63a633cf sched: fix unconditional irq lock
Lockdep noticed that this lock can also be taken from hardirq context, and can
thus not unconditionally disable/enable irqs.

 WARNING: at kernel/lockdep.c:2033 trace_hardirqs_on()
  [show_trace_log_lvl+26/48] show_trace_log_lvl+0x1a/0x30
  [show_trace+18/32] show_trace+0x12/0x20
  [dump_stack+22/32] dump_stack+0x16/0x20
  [trace_hardirqs_on+405/416] trace_hardirqs_on+0x195/0x1a0
  [_read_unlock_irq+34/48] _read_unlock_irq+0x22/0x30
  [sched_debug_show+2615/4224] sched_debug_show+0xa37/0x1080
  [show_state_filter+326/368] show_state_filter+0x146/0x170
  [sysrq_handle_showstate+10/16] sysrq_handle_showstate+0xa/0x10
  [__handle_sysrq+123/288] __handle_sysrq+0x7b/0x120
  [handle_sysrq+40/64] handle_sysrq+0x28/0x40
  [kbd_event+1045/1680] kbd_event+0x415/0x690
  [input_pass_event+206/208] input_pass_event+0xce/0xd0
  [input_handle_event+170/928] input_handle_event+0xaa/0x3a0
  [input_event+95/112] input_event+0x5f/0x70
  [atkbd_interrupt+434/1456] atkbd_interrupt+0x1b2/0x5b0
  [serio_interrupt+59/128] serio_interrupt+0x3b/0x80
  [i8042_interrupt+263/576] i8042_interrupt+0x107/0x240
  [handle_IRQ_event+40/96] handle_IRQ_event+0x28/0x60
  [handle_edge_irq+175/320] handle_edge_irq+0xaf/0x140
  [do_IRQ+64/128] do_IRQ+0x40/0x80
  [common_interrupt+46/52] common_interrupt+0x2e/0x34

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-25 14:02:45 +02:00
Ken Chen
480b9434c5 sched: reduce schedstat variable overhead a bit
schedstat is useful in investigating CPU scheduler behavior.  Ideally,
I think it is beneficial to have it on all the time.  However, the
cost of turning it on in production system is quite high, largely due
to number of events it collects and also due to its large memory
footprint.

Most of the fields probably don't need to be full 64-bit on 64-bit
arch.  Rolling over 4 billion events will most like take a long time
and user space tool can be made to accommodate that.  I'm proposing
kernel to cut back most of variable width on 64-bit system.  (note,
the following patch doesn't affect 32-bit system).

Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-18 21:32:56 +02:00
Arjan van de Ven
0dbee3a6b0 Make scheduler debug file operations const
In general, struct file_operations are const in the kernel, to not have
false cacheline sharing and to catch bugs at compiletime with accidental
writes to them. The new scheduler code introduces a new non-const one;
fix this up.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:19 +02:00
Ingo Molnar
cc367732ff sched: debug, improve migration statistics
add new migration statistics when SCHED_DEBUG and SCHEDSTATS
is enabled. Available in /proc/<PID>/sched.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:18 +02:00