Commit Graph

408 Commits

Author SHA1 Message Date
Ingo Molnar
9c63d9c021 sched: sync wakeups preempt too
make sure sync wakeups preempt too - the scheduler will not
overschedule as we've got various throttles against that.
As a result, sync wakeups can be used more widely in the kernel
(to signal wakeup affinity between tasks), and no arbitrary
latencies will be introduced either.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:20 +02:00
Ingo Molnar
71e20f1873 sched: affine sync wakeups
make sync wakeups affine for cache-cold tasks: if a cache-cold task
is woken up by a sync wakeup then use the opportunity to migrate it
straight away. (the two tasks are 'related' because they communicate)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:19 +02:00
Laurent Vivier
94886b84b1 sched: guest CPU accounting: maintain stats in account_system_time()
modify account_system_time() to add cputime to cpustat->guest if we are
running a VCPU. We add this cputime to cpustat->user instead of
cpustat->system because this part of KVM code is in fact user code
although it is executed in the kernel. We duplicate VCPU time between
guest and user to allow an unmodified "top(1)" to display correct value.
A modified "top(1)" is able to display good cpu user time and cpu guest
time by subtracting cpu guest time from cpu user time. Update "gtime" in
task_struct accordingly.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:19 +02:00
Milton Miller
6323469f9b sched: domain sysctl fixes: add terminator comment
we had an incorrect-terminator bug in sd_alloc_ctl_domain_table()
before, so add a comment that documents it.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:19 +02:00
Milton Miller
ad1cdc1d78 sched: domain sysctl fixes: do not crash on allocation failure
Now that we are calling this at runtime, a more relaxed error path is
suggested.  If an allocation fails, we just register the partial table,
which will show empty directories.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:19 +02:00
Milton Miller
6382bc90f5 sched: domain sysctl fixes: unregister the sysctl table before domains
Unregister and free the sysctl table before destroying domains, then
rebuild and register after creating the new domains.  This prevents the
sysctl table from pointing to freed memory for root to write.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:19 +02:00
Milton Miller
97b6ea7b63 sched: domain sysctl fixes: use for_each_online_cpu()
init_sched_domain_sysctl was walking cpus 0-n and referencing per_cpu
variables.  If the cpus_possible mask is not contigious this will result
in a crash referencing unallocated data.  If the online mask is not
contigious then we would show offline cpus and miss online ones.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:19 +02:00
Milton Miller
5cf9f062c8 sched: domain sysctl fixes: use kcalloc()
kcalloc checks for n * sizeof(element) overflows and it zeros.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:19 +02:00
Ingo Molnar
6bc1665ba7 sched: allow the immediate migration of cache-cold tasks
allow the immediate migration of cache-cold tasks.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:18 +02:00
Ingo Molnar
cc367732ff sched: debug, improve migration statistics
add new migration statistics when SCHED_DEBUG and SCHEDSTATS
is enabled. Available in /proc/<PID>/sched.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:18 +02:00
Peter Zijlstra
ff56b2f015 sched: activate task_hot() only on fair-scheduled tasks
activate task_hot() only for fair-scheduled tasks (i.e. disable it
for RT tasks).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:18 +02:00
Ingo Molnar
da84d96176 sched: reintroduce cache-hot affinity
reintroduce a simplified version of cache-hot/cold scheduling
affinity. This improves performance with certain SMP workloads,
such as sysbench.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:18 +02:00
Ingo Molnar
178be79348 sched: do not normalize kernel threads via SysRq-N
do not normalize kernel threads via SysRq-N: the migration threads,
softlockup threads, etc. might be essential for the system to
function properly. So only zap user tasks.

pointed out by Andi Kleen.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:18 +02:00
Andi Kleen
1666703af9 sched: remove stale comment from sched_group_set_shares()
remove stale comment from sched_group_set_shares().

Function never returns -EINVAL.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:18 +02:00
Ingo Molnar
d5036e89dc sched: clean up is_migration_thread()
clean up is_migration_thread() and turn it into an inline function.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:15 +02:00
Andi Kleen
3a5e4dc12f sched: cleanup: refactor normalize_rt_tasks
Replace a particularly ugly ifdef with an inline and a new macro.
Also split up the function to be easier to read.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:15 +02:00
Andi Kleen
8cbbe86dfc sched: cleanup: refactor common code of sleep_on / wait_for_completion
Refactor common code of sleep_on / wait_for_completion

These functions were largely cut'n'pasted. This moves
the common code into single helpers instead.  Advantage
is about 1k less code on x86-64 and 91 lines of code removed.
It adds one function call to the non timeout version of
the functions; i don't expect this to be measurable.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:14 +02:00
Andi Kleen
3a5c359a58 sched: cleanup: remove unnecessary gotos
Replace loops implemented with gotos with real loops.
Replace err = ...; goto x; x: return err; with return ...;

No functional changes.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:14 +02:00
Mike Galbraith
95938a35c5 sched: prevent wakeup over-scheduling
Prevent wakeup over-scheduling.  Once a task has been preempted by a
task of the same or lower priority, it becomes ineligible for repeated
preemption by same until it has been ticked, or slept.  Instead, the
task is marked for preemption at the next tick.  Tasks of higher
priority still preempt immediately.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:14 +02:00
Peter Zijlstra
ce6c131131 sched: disable forced preemption by default
Implement feature bit to disable forced preemption. This way
it can be checked whether a workload is overscheduling or not.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:14 +02:00
Zou Nan hai
ace8b3d633 sched: some proc entries are missed in sched_domain sys_ctl debug code
cache_nice_tries and flags entry do not appear in proc fs sched_domain
directory, because ctl_table entry is skipped.

This patch fixes the issue.

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:14 +02:00
Gautham R Shenoy
638e13ac37 sched: fix rt ptracer monopolizing CPU
yield() in wait_task_inactive(), can cause a high priority thread to be
scheduled back in, and there by loop forever while it is waiting for some
lower priority thread which is unfortunately still on the runqueue.

Use schedule_timeout_uninterruptible(1) instead.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Credit: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:14 +02:00
Dhaval Giani
5cb350baf5 sched: group scheduling, sysfs tunables
Add tunables in sysfs to modify a user's cpu share.

A directory is created in sysfs for each new user in the system.

	/sys/kernel/uids/<uid>/cpu_share

Reading this file returns the cpu shares granted for the user.
Writing into this file modifies the cpu share for the user. Only an
administrator is allowed to modify a user's cpu share.

Ex:
	# cd /sys/kernel/uids/
	# cat 512/cpu_share
	1024
	# echo 2048 > 512/cpu_share
	# cat 512/cpu_share
	2048
	#

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:14 +02:00
Paul E. McKenney
a58f6f253d sched: export cpu_clock()
export cpu_clock() - the preferred API instead of sched_clock().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15 17:00:14 +02:00
Ingo Molnar
00bf7bfc2e sched: fix: move the CPU check into ->task_new_fair()
noticed by Peter Zijlstra:

fix: move the CPU check into ->task_new_fair(), this way we
can call place_entity() and get child ->vruntime right at
initial wakeup time.

(without this there can be large latencies)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15 17:00:14 +02:00