Commit Graph

351 Commits

Author SHA1 Message Date
Peter Zijlstra 94dfb5e75e sched: add tree based averages
add support for tree based vruntime averages.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:05 +02:00
Ingo Molnar 28a1f6fa2f sched: remove SCHED_FEAT_SKIP_INITIAL
remove SCHED_FEAT_SKIP_INITIAL - it was off by default and even
when enabled it never made any real difference.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:05 +02:00
Ingo Molnar 6cb5819514 sched: optimize vruntime based scheduling
optimize vruntime based scheduling.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:04 +02:00
Ingo Molnar bf5c91ba8c sched: move sched_feat() definitions
move sched_feat() definitions so that it can be used sooner by generic
code too.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:04 +02:00
Ingo Molnar e9acbff648 sched: introduce se->vruntime
introduce se->vruntime as a sum of weighted delta-exec's, and use that
as the key into the tree.

the idea to use absolute virtual time as the basic metric of scheduling
has been first raised by William Lee Irwin, advanced by Tong Li and first
prototyped by Roman Zippel in the "Really Fair Scheduler" (RFS) patchset.

also see:

   http://lkml.org/lkml/2007/9/2/76

for a simpler variant of this patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:04 +02:00
Ingo Molnar 1091985b48 sched: speed up update_load_add/_sub()
speed up update_load_add/_sub() by not delaying the division - this
reduces CPU pipeline dependencies.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:04 +02:00
Ingo Molnar 62160e3f4a sched: track cfs_rq->curr on !group-scheduling too
Noticed by Roman Zippel: use cfs_rq->curr in the !group-scheduling
case too. Small micro-optimization and cleanup effect:

   text    data     bss     dec     hex filename
   36269    3482      24   39775    9b5f sched.o.before
   36177    3486      24   39687    9b07 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:03 +02:00
Ingo Molnar 53df556e06 sched: remove precise CPU load calculations #2
continued removal of precise CPU load calculations.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:03 +02:00
Ingo Molnar a25707f3ae sched: remove precise CPU load
CPU load calculations are statistical anyway, and there's little benefit
from having it calculated on every scheduling event. So remove this code,
it gets rid of a divide from the scheduler wakeup and context-switch
fastpath.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:03 +02:00
Ingo Molnar 8ebc91d936 sched: remove stat_gran
remove the stat_gran code - it was disabled by default and it causes
unnecessary overhead.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:03 +02:00
Ingo Molnar 2bd8e6d422 sched: use constants if !CONFIG_SCHED_DEBUG
use constants if !CONFIG_SCHED_DEBUG.

this speeds up the code and reduces code-size:

    text    data     bss     dec     hex filename
   27464    3014      16   30494    771e sched.o.before
   26929    3010      20   29959    7507 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:02 +02:00
Ingo Molnar 38ad464d41 sched: uniform tunings
use the same defaults on both UP and SMP.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:02 +02:00
Ingo Molnar eba1ed4b7e sched: debug: track maximum 'slice'
track the maximum amount of time a task has executed while
the CPU load was at least 2x. (i.e. at least two nice-0
tasks were runnable)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:02 +02:00
Matthias Kaehlcke 2e45874c5a sched: use list_for_each_entry_safe() in __wake_up_common()
Use list_for_each_entry_safe() instead of list_for_each_safe() in
__wake_up_common()

Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:02 +02:00
Ingo Molnar 44142fac34 sched: fix sysctl_sched_child_runs_first flag
fix the sched_child_runs_first flag: always call into ->task_new()
if we are on the same CPU, as SCHED_OTHER tasks depend on it for
correct initial setup.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15 17:00:01 +02:00
Jens Axboe f5ff8422bb Fix warnings with !CONFIG_BLOCK
Hide everything in blkdev.h with CONFIG_BLOCK isn't set, and fixup
the (few) files that fail to build because they were relying on blkdev.h
pulling in extra includes for them.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-10 09:25:57 +02:00
Hiroshi Shimamoto 9c95e7319b sched: fix invalid sched_class use
When using rt_mutex, a NULL pointer dereference is occurred at
enqueue_task_rt. Here is a scenario;
1) there are two threads, the thread A is fair_sched_class and
   thread B is rt_sched_class.
2) Thread A is boosted up to rt_sched_class, because the thread A
   has a rt_mutex lock and the thread B is waiting the lock.
3) At this time, when thread A create a new thread C, the thread
   C has a rt_sched_class.
4) When doing wake_up_new_task() for the thread C, the priority
   of the thread C is out of the RT priority range, because the
   normal priority of thread A is not the RT priority. It makes
   data corruption by overflowing the rt_prio_array.
The new thread C should be fair_sched_class.

The new thread should be valid scheduler class before queuing.
This patch fixes to set the suitable scheduler class.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-09-19 23:34:46 +02:00
Ingo Molnar 1799e35d5b sched: add /proc/sys/kernel/sched_compat_yield
add /proc/sys/kernel/sched_compat_yield to make sys_sched_yield()
more agressive, by moving the yielding task to the last position
in the rbtree.

with sched_compat_yield=0:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  2539 mingo     20   0  1576  252  204 R   50  0.0   0:02.03 loop_yield
  2541 mingo     20   0  1576  244  196 R   50  0.0   0:02.05 loop

with sched_compat_yield=1:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  2584 mingo     20   0  1576  248  196 R   99  0.0   0:52.45 loop
  2582 mingo     20   0  1576  256  204 R    0  0.0   0:00.00 loop_yield

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-09-19 23:34:46 +02:00
Ingo Molnar cf2ab4696e sched: fix xtensa build warning
rename RSR to SRR - 'RSR' is already defined on xtensa.

found by Adrian Bunk.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-09-05 14:32:49 +02:00
Ingo Molnar a206c07213 sched: debug: fix cfs_rq->wait_runtime accounting
the cfs_rq->wait_runtime debug/statistics counter was not maintained
properly - fix this.

this also removes some code:

   text    data     bss     dec     hex filename
  13420     228    1204   14852    3a04 sched.o.before
  13404     228    1204   14836    39f4 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-09-05 14:32:49 +02:00
Suresh Siddha 7fd0d2dde9 sched: fix MC/HT scheduler optimization, without breaking the FUZZ logic.
First fix the check
	if (*imbalance + SCHED_LOAD_SCALE_FUZZ < busiest_load_per_task)
with this
	if (*imbalance < busiest_load_per_task)

As the current check is always false for nice 0 tasks (as
SCHED_LOAD_SCALE_FUZZ is same as busiest_load_per_task for nice 0
tasks).

With the above change, imbalance was getting reset to 0 in the corner
case condition, making the FUZZ logic fail. Fix it by not corrupting the
imbalance and change the imbalance, only when it finds that the HT/MC
optimization is needed.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-09-05 14:32:48 +02:00
Ingo Molnar f6cf891c4d sched: make the scheduler converge to the ideal latency
de-HZ-ification of the granularity defaults unearthed a pre-existing
property of CFS: while it correctly converges to the granularity goal,
it does not prevent run-time fluctuations in the range of
[-gran ... 0 ... +gran].

With the increase of the granularity due to the removal of HZ
dependencies, this becomes visible in chew-max output (with 5 tasks
running):

 out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40
 out:  27 . 27. 32 | flu:  0 .  0 | ran:   17 .   13 | per:   44 .   40
 out:  27 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   36 .   40
 out:  29 . 27. 32 | flu:  2 .  0 | ran:   17 .   13 | per:   46 .   40
 out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40
 out:  29 . 27. 32 | flu:  0 .  0 | ran:   18 .   13 | per:   47 .   40
 out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40

average slice is the ideal 13 msecs and the period is picture-perfect 40
msecs. But the 'ran' field fluctuates around 13.33 msecs and there's no
mechanism in CFS to keep that from happening: it's a perfectly valid
solution that CFS finds.

to fix this we add a granularity/preemption rule that knows about
the "target latency", which makes tasks that run longer than the ideal
latency run a bit less. The simplest approach is to simply decrease the
preemption granularity when a task overruns its ideal latency. For this
we have to track how much the task executed since its last preemption.

( this adds a new field to task_struct, but we can eliminate that
  overhead in 2.6.24 by putting all the scheduler timestamps into an
  anonymous union. )

with this change in place, chew-max output is fluctuation-less all
around:

 out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
 out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
 out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
 out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
 out:  28 . 27. 39 | flu:  0 .  1 | ran:   13 .   13 | per:   41 .   40
 out:  28 . 27. 39 | flu:  0 .  1 | ran:   13 .   13 | per:   41 .   40

this patch has no impact on any fastpath or on any globally observable
scheduling property. (unless you have sharp enough eyes to see
millisecond-level ruckles in glxgears smoothness :-)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mike Galbraith <efault@gmx.de>
2007-08-28 12:53:24 +02:00
Ingo Molnar 50c46637aa sched: s/sched_latency/sched_min_granularity
runtime limit and wakeup granularity used to be a function of
granularity and that was incorrect changed to sched_latency.

Fix this to make wakeup granularity a function of min-granularity,
and the runtime limit equal to latency.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-08-25 22:17:19 +02:00
Ingo Molnar 172ac3dbb7 sched: cleanup, sched_granularity -> sched_min_granularity
due to adaptive granularity scheduling the role of sched_granularity
has changed to "minimum granularity", so rename the variable (and the
tunable) accordingly.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-08-25 18:41:53 +02:00
Peter Zijlstra 218050855e sched: adaptive scheduler granularity
Instead of specifying the preemption granularity, specify the wanted
latency. By fixing the granlarity to a constany the wakeup latency
it a function of the number of running tasks on the rq.

Invert this relation.

sysctl_sched_granularity becomes a minimum for the dynamic granularity
computed from the new sysctl_sched_latency.

Then use this latency to do more intelligent granularity decisions: if
there are fewer tasks running then we can schedule coarser. This helps
performance while still always keeping the latency target.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-08-25 18:41:53 +02:00