Commit Graph

79 Commits

Author SHA1 Message Date
Li Zefan
b629317e66 sched: Fix an RCU warning in print_task()
With CONFIG_PROVE_RCU=y, a warning can be triggered:

  $ cat /proc/sched_debug

...
kernel/cgroup.c:1649 invoked rcu_dereference_check() without protection!
...

Both cgroup_path() and task_group() should be called with either
rcu_read_lock or cgroup_mutex held.

The rcu_dereference_check() does include cgroup_lock_is_held(), so we
know that this lock is not held.  Therefore, in a CONFIG_PREEMPT kernel,
to say nothing of a CONFIG_PREEMPT_RT kernel, the original code could
have ended up copying a string out of the freelist.

This patch inserts RCU read-side primitives needed to prevent this
scenario.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2010-05-04 09:25:01 -07:00
Mike Galbraith
269484a492 sched: Fix proc_sched_set_task()
Latencytop clearing sum_exec_runtime via proc_sched_set_task() breaks
task_times().  Other places in kernel use nvcsw and nivcsw, which are
being cleared as well,  Clear task statistics only.

Reported-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1269940193.19286.14.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 20:06:40 +02:00
Thomas Gleixner
05fa785cf8 sched: Convert rq->lock to raw_spinlock
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 23:55:33 +01:00
Ingo Molnar
b9889ed1dd sched: Remove forced2_migrations stats
This build warning:

 kernel/sched.c: In function 'set_task_cpu':
 kernel/sched.c:2070: warning: unused variable 'old_rq'

Made me realize that the forced2_migrations stat looks pretty
pointless (and a misnomer) - remove it.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-10 20:32:39 +01:00
Christian Ehrhardt
1983a922a1 sched: Make tunable scaling style configurable
As scaling now takes place on all kind of cpu add/remove events a user
that configures values via proc should be able to configure if his set
values are still rescaled or kept whatever happens.

As the comments state that log2 was just a second guess that worked the
interface is not just designed for on/off, but to choose a scaling type.
Currently this allows none, log and linear, but more important it allwos
us to keep the interface even if someone has an even better idea how to
scale the values.

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1259579808-11357-3-git-send-email-ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09 10:04:01 +01:00
Peter Zijlstra
6cecd084d0 sched: Discard some old bits
WAKEUP_RUNNING was an experiment, not sure why that ever ended up being
merged...

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09 10:03:07 +01:00
Mike Galbraith
1b9508f683 sched: Rate-limit newidle
Rate limit newidle to migration_cost. It's a win for all
stages of sysbench oltp tests.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-04 19:13:48 +01:00
Peter Zijlstra
ad4b78bbcb sched: Add new wakeup preemption mode: WAKEUP_RUNNING
Create a new wakeup preemption mode, preempt towards tasks that run
shorter on avg. It sets next buddy to be sure we actually run the task
we preempted for.

Test results:

 root@twins:~# while :; do :; done &
 [1] 6537
 root@twins:~# while :; do :; done &
 [2] 6538
 root@twins:~# while :; do :; done &
 [3] 6539
 root@twins:~# while :; do :; done &
 [4] 6540

 root@twins:/home/peter# ./latt -c4 sleep 4
 Entries: 48 (clients=4)

 Averages:
 ------------------------------
        Max          4750 usec
        Avg           497 usec
        Stdev         737 usec

 root@twins:/home/peter# echo WAKEUP_RUNNING > /debug/sched_features

 root@twins:/home/peter# ./latt -c4 sleep 4
 Entries: 48 (clients=4)

 Averages:
 ------------------------------
        Max            14 usec
        Avg             5 usec
        Stdev           3 usec

Disabled by default - needs more testing.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <new-submission>
2009-09-17 10:17:25 +02:00
Arjan van de Ven
8f0dfc34e9 sched: Provide iowait counters
For counting how long an application has been waiting for
(disk) IO, there currently is only the HZ sample driven
information available, while for all other counters in this
class, a high resolution version is available via
CONFIG_SCHEDSTATS.

In order to make an improved bootchart tool possible, we also
need a higher resolution version of the iowait time.

This patch below adds this scheduler statistic to the kernel.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4A64B813.1080506@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02 08:44:08 +02:00
Hitoshi Mitake
348ec61e62 sched: Hide runqueues from direct refer at source code level
There are some points which refer the per-cpu value "runqueues" directly.
sched.c provides nice abstraction, such as cpu_rq() and this_rq(),
so we should use these macros when looking runqueues.

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
LKML-Reference: <20090617.222055.374768827975756908.mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 18:29:42 +02:00
Luis Henriques
67aa0f767a sched: remove unused fields from struct rq
Impact: cleanup, new schedstat ABI

Since they are used on in statistics and are always set to zero, the
following fields from struct rq have been removed: yld_exp_empty,
yld_act_empty and yld_both_empty.

Both Sched Debug and SCHEDSTAT_VERSION versions has also been
incremented since ABIs have been changed.

The schedtop tool has been updated to properly handle new version of
schedstat:

   http://rt.wiki.kernel.org/index.php/Schedtop_utility

Signed-off-by: Luis Henriques <henrix@sapo.pt>
Acked-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090324221002.GA10061@hades.domain.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-24 23:16:51 +01:00
Luis Henriques
af66df5ecf sched: jiffies not printed per CPU
The jiffies value was being printed for each CPU, which does not seem to make
sense.  Moved jiffies to system section.

Signed-off-by: Luis Henriques <henrix@sapo.pt>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090318000425.GA2228@hades.domain.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18 09:57:26 +01:00
Peter Zijlstra
831451ac4e sched: introduce avg_wakeup
Introduce a new avg_wakeup statistic.

avg_wakeup is a measure of how frequently a task wakes up other tasks, it
represents the average time between wakeups, with a limit of avg_runtime
for when it doesn't wake up anybody.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-15 12:00:08 +01:00
Li Zefan
805194c35b sched: partly revert "sched debug: remove NULL checking in print_cfs_rt_rq()"
Impact: avoid accessing NULL tg.css->cgroup

In commit 0a0db8f5c9, I removed checking
NULL tg.css->cgroup, but I realized I was wrong when I found reading
/proc/sched_debug can race with cgroup_create().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-11 02:40:32 +01:00
Arun R Bharadwaj
6c415b9234 sched: add uid information to sched_debug for CONFIG_USER_SCHED
Impact: extend information in /proc/sched_debug

This patch adds uid information in sched_debug for CONFIG_USER_SCHED

Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-01 20:39:50 +01:00
Ingo Molnar
3ac3ba0b39 Merge branch 'linus' into sched/core
Conflicts:
	kernel/Makefile
2008-11-19 09:44:37 +01:00
Ingo Molnar
29d7b90c15 sched: fix kernel warning on /proc/sched_debug access
Luis Henriques reported that with CONFIG_PREEMPT=y + CONFIG_PREEMPT_DEBUG=y +
CONFIG_SCHED_DEBUG=y + CONFIG_LATENCYTOP=y enabled, the following warning
triggers when using latencytop:

> [  775.663239] BUG: using smp_processor_id() in preemptible [00000000] code: latencytop/6585
> [  775.663303] caller is native_sched_clock+0x3a/0x80
> [  775.663314] Pid: 6585, comm: latencytop Tainted: G        W 2.6.28-rc4-00355-g9c7c354 #1
> [  775.663322] Call Trace:
> [  775.663343]  [<ffffffff803a94e4>] debug_smp_processor_id+0xe4/0xf0
> [  775.663356]  [<ffffffff80213f7a>] native_sched_clock+0x3a/0x80
> [  775.663368]  [<ffffffff80213e19>] sched_clock+0x9/0x10
> [  775.663381]  [<ffffffff8024550d>] proc_sched_show_task+0x8bd/0x10e0
> [  775.663395]  [<ffffffff8034466e>] sched_show+0x3e/0x80
> [  775.663408]  [<ffffffff8031039b>] seq_read+0xdb/0x350
> [  775.663421]  [<ffffffff80368776>] ? security_file_permission+0x16/0x20
> [  775.663435]  [<ffffffff802f4198>] vfs_read+0xc8/0x170
> [  775.663447]  [<ffffffff802f4335>] sys_read+0x55/0x90
> [  775.663460]  [<ffffffff8020c67a>] system_call_fastpath+0x16/0x1b
> ...

This breakage was caused by me via:

  7cbaef9: sched: optimize sched_clock() a bit

Change the calls to cpu_clock().

Reported-by: Luis Henriques <henrix@sapo.pt>
2008-11-16 08:07:15 +01:00
Bharata B Rao
ff9b48c359 sched: include group statistics in /proc/sched_debug
Impact: extend /proc/sched_debug info

Since the statistics of a group entity isn't exported directly from the
kernel, it becomes difficult to obtain some of the group statistics.
For example, the current method to obtain exec time of a group entity
is not always accurate. One has to read the exec times of all
the tasks(/proc/<pid>/sched) in the group and add them. This method
fails (or becomes difficult) if we want to collect stats of a group over
a duration where tasks get created and terminated.

This patch makes it easier to obtain group stats by directly including
them in /proc/sched_debug. Stats like group exec time would help user
programs (like LTP) to accurately measure the group fairness.

An example output of group stats from /proc/sched_debug:

cfs_rq[3]:/3/a/1
  .exec_clock                    : 89.598007
  .MIN_vruntime                  : 0.000001
  .min_vruntime                  : 256300.970506
  .max_vruntime                  : 0.000001
  .spread                        : 0.000000
  .spread0                       : -25373.372248
  .nr_running                    : 0
  .load                          : 0
  .yld_exp_empty                 : 0
  .yld_act_empty                 : 0
  .yld_both_empty                : 0
  .yld_count                     : 4474
  .sched_switch                  : 0
  .sched_count                   : 40507
  .sched_goidle                  : 12686
  .ttwu_count                    : 15114
  .ttwu_local                    : 11950
  .bkl_count                     : 67
  .nr_spread_over                : 0
  .shares                        : 0
  .se->exec_start                : 113676.727170
  .se->vruntime                  : 1592.612714
  .se->sum_exec_runtime          : 89.598007
  .se->wait_start                : 0.000000
  .se->sleep_start               : 0.000000
  .se->block_start               : 0.000000
  .se->sleep_max                 : 0.000000
  .se->block_max                 : 0.000000
  .se->exec_max                  : 1.000282
  .se->slice_max                 : 1.999750
  .se->wait_max                  : 54.981093
  .se->wait_sum                  : 217.610521
  .se->wait_count                : 50
  .se->load.weight               : 2

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Acked-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11 11:44:18 +01:00
Peter Zijlstra
5ac5c4d604 sched: clean up debug info
Impact: clean up and fix debug info printout

While looking over the sched_debug code I noticed that we printed the rq
schedstats for every cfs_rq, ammend this.

Also change nr_spead_over into an int, and fix a little buglet in
min_vruntime printing.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-10 10:51:51 +01:00
Li Zefan
0a0db8f5c9 sched debug: remove NULL checking in print_cfs/rt_rq()
Impact: cleanup

cfs->tg is initialized in init_tg_cfs_entry() with tg != NULL, and
will never be invalidated to NULL. And the underlying cgroup of a
valid task_group is always valid.

Same for rt->tg.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04 10:23:18 +01:00
Li Zefan
a9cf4ddb3b sched: change sched_debug's mode to 0444
Impact: change /proc/sched/debug from rw-r--r-- to r--r--r--

/proc/sched_debug is read-only.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-30 11:37:57 +01:00
Lai Jiangshan
a6bebbc87a [PATCH] signal, procfs: some lock_task_sighand() users do not need rcu_read_lock()
lock_task_sighand() make sure task->sighand is being protected,
so we do not need rcu_read_lock().
[ exec() will get task->sighand->siglock before change task->sighand! ]

But code using rcu_read_lock() _just_ to protect lock_task_sighand()
only appear in procfs. (and some code in procfs use lock_task_sighand()
without such redundant protection.)

Other subsystem may put lock_task_sighand() into rcu_read_lock()
critical region, but these rcu_read_lock() are used for protecting
"for_each_process()", "find_task_by_vpid()" etc. , not for protecting
lock_task_sighand().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
[ok from Oleg]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2008-10-10 04:18:57 +04:00
Peter Zijlstra
32df2ee86a sched: add full schedstats to /proc/sched_debug
show all the schedstats in /debug/sched_debug as well.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 14:31:31 +02:00
Peter Zijlstra
c09595f63b sched: revert revert of: fair-group: SMP-nice for group scheduling
Try again..

Initial commit: 18d95a2832
Revert: 6363ca57c7

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-27 14:31:29 +02:00
Peter Zijlstra
ada18de2eb sched: debug: add some rt debug output
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Daniel K." <dk@uw.no>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-20 10:25:59 +02:00