Commit Graph

373 Commits

Author SHA1 Message Date
Peter Zijlstra 669c55e9f9 sched: Pre-compute cpumask_weight(sched_domain_span(sd))
Dave reported that his large SPARC machines spend lots of time in
hweight64(), try and optimize some of those needless cpumask_weight()
invocations (esp. with the large offstack cpumasks these are very
expensive indeed).

Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-23 11:02:02 +02:00
Peter Zijlstra 371fd7e7a5 sched: Add enqueue/dequeue flags
In order to reduce the dependency on TASK_WAKING rework the enqueue
interface to support a proper flags field.

Replace the int wakeup, bool head arguments with an int flags argument
and create the following flags:

  ENQUEUE_WAKEUP - the enqueue is a wakeup of a sleeping task,
  ENQUEUE_WAKING - the enqueue has relative vruntime due to
                   having sched_class::task_waking() called,
  ENQUEUE_HEAD - the waking task should be places on the head
                 of the priority queue (where appropriate).

For symmetry also convert sched_class::dequeue() to a flags scheme.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 20:12:05 +02:00
Peter Zijlstra 0017d73509 sched: Fix TASK_WAKING vs fork deadlock
Oleg noticed a few races with the TASK_WAKING usage on fork.

 - since TASK_WAKING is basically a spinlock, it should be IRQ safe
 - since we set TASK_WAKING (*) without holding rq->lock it could
   be there still is a rq->lock holder, thereby not actually
   providing full serialization.

(*) in fact we clear PF_STARTING, which in effect enables TASK_WAKING.

Cure the second issue by not setting TASK_WAKING in sched_fork(), but
only temporarily in wake_up_new_task() while calling select_task_rq().

Cure the first by holding rq->lock around the select_task_rq() call,
this will disable IRQs, this however requires that we push down the
rq->lock release into select_task_rq_fair()'s cgroup stuff.

Because select_task_rq_fair() still needs to drop the rq->lock we
cannot fully get rid of TASK_WAKING.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 20:12:03 +02:00
Ingo Molnar c9494727cf Merge branch 'linus' into sched/core
Merge reason: update to latest upstream

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-02 20:03:08 +02:00
Mike Galbraith beac4c7e4a sched: Remove AFFINE_WAKEUPS feature
Disabling affine wakeups is too horrible to contemplate.  Remove the feature flag.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301890.6785.50.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-11 18:32:53 +01:00
Mike Galbraith 13814d42e4 sched: Remove ASYM_GRAN feature
This features has been enabled for quite a while, after testing showed that
easing preemption for light tasks was harmful to high priority threads.

Remove the feature flag.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301675.6785.44.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-11 18:32:53 +01:00
Mike Galbraith f2e74eeac0 sched: Remove WAKEUP_SYNC feature
This feature never earned its keep, remove it.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301591.6785.42.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-11 18:32:52 +01:00
Mike Galbraith 5ca9880c6f sched: Remove FAIR_SLEEPERS feature
Our preemption model relies too heavily on sleeper fairness to disable it
without dire consequences.  Remove the feature, and save a branch or two.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301520.6785.40.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-11 18:32:52 +01:00
Mike Galbraith 6bc6cf2b61 sched: Remove NORMALIZED_SLEEPER
This feature hasn't been enabled in a long time, remove effectively dead code.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301447.6785.38.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-11 18:32:52 +01:00
Mike Galbraith 8b911acdf0 sched: Fix select_idle_sibling()
Don't bother with selection when the current cpu is idle.  Recent load
balancing changes also make it no longer necessary to check wake_affine()
success before returning the selected sibling, so we now always use it.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301369.6785.36.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-11 18:32:51 +01:00
Mike Galbraith 21406928af sched: Tweak sched_latency and min_granularity
Allow LAST_BUDDY to kick in sooner, improving cache utilization as soon as
a second buddy pair arrives on scene.  The cost is latency starting to climb
sooner, the tbenefit for tbench 8 on my Q6600 box is ~2%.  No detrimental
effects noted in normal idesktop usage.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301285.6785.34.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-11 18:32:51 +01:00
Mike Galbraith a64692a3af sched: Cleanup/optimize clock updates
Now that we no longer depend on the clock being updated prior to enqueueing
on migratory wakeup, we can clean up a bit, placing calls to update_rq_clock()
exactly where they are needed, ie on enqueue, dequeue and schedule events.

In the case of a freshly enqueued task immediately preempting, we can skip the
update during preemption, as the clock was just updated by the enqueue event.
We also save an unneeded call during a migratory wakeup by not updating the
previous runqueue, where update_curr() won't be invoked.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301199.6785.32.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-11 18:32:50 +01:00
Mike Galbraith e12f31d3e5 sched: Remove avg_overlap
Both avg_overlap and avg_wakeup had an inherent problem in that their accuracy
was detrimentally affected by cross-cpu wakeups, this because we are missing
the necessary call to update_curr().  This can't be fixed without increasing
overhead in our already too fat fastpath.

Additionally, with recent load balancing changes making us prefer to place tasks
in an idle cache domain (which is good for compute bound loads), communicating
tasks suffer when a sync wakeup, which would enable affine placement, is turned
into a non-sync wakeup by SYNC_LESS.  With one task on the runqueue, wake_affine()
rejects the affine wakeup request, leaving the unfortunate where placed, taking
frequent cache misses.

Remove it, and recover some fastpath cycles.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301121.6785.30.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-11 18:32:50 +01:00
Mike Galbraith b42e0c41a4 sched: Remove avg_wakeup
Testing the load which led to this heuristic (nfs4 kbuild) shows that it has
outlived it's usefullness.  With intervening load balancing changes, I cannot
see any difference with/without, so recover there fastpath cycles.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268301062.6785.29.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-11 18:32:50 +01:00
Lucas De Marchi 41acab8851 sched: Implement group scheduler statistics in one struct
Put all statistic fields of sched_entity in one struct, sched_statistics,
and embed it into sched_entity.

This change allows to memset the sched_statistics to 0 when needed (for
instance when forking), avoiding bugs of non initialized fields.

Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1268275065-18542-1-git-send-email-lucas.de.marchi@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-11 15:22:28 +01:00
Paul E. McKenney 90a6501f94 sched, rcu: Fix rcu_dereference() for RCU-lockdep
Make rcu_dereference() of runqueue data structures be
rcu_dereference_sched().

Located-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <20100228163218.GD6846@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-01 09:29:58 +01:00
Suresh Siddha dd5feea14a sched: Fix SCHED_MC regression caused by change in sched cpu_power
On platforms like dual socket quad-core platform, the scheduler load
balancer is not detecting the load imbalances in certain scenarios. This
is leading to scenarios like where one socket is completely busy (with
all the 4 cores running with 4 tasks) and leaving another socket
completely idle. This causes performance issues as those 4 tasks share
the memory controller, last-level cache bandwidth etc. Also we won't be
taking advantage of turbo-mode as much as we would like, etc.

Some of the comparisons in the scheduler load balancing code are
comparing the "weighted cpu load that is scaled wrt sched_group's
cpu_power" with the "weighted average load per task that is not scaled
wrt sched_group's cpu_power". While this has probably been broken for a
longer time (for multi socket numa nodes etc), the problem got aggrevated
via this recent change:

 |
 |  commit f93e65c186
 |  Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
 |  Date:   Tue Sep 1 10:34:32 2009 +0200
 |
 |	sched: Restore __cpu_power to a straight sum of power
 |

Also with this change, the sched group cpu power alone no longer reflects
the group capacity that is needed to implement MC, MT performance
(default) and power-savings (user-selectable) policies.

We need to use the computed group capacity (sgs.group_capacity, that is
computed using the SD_PREFER_SIBLING logic in update_sd_lb_stats()) to
find out if the group with the max load is above its capacity and how
much load to move etc.

Reported-by: Ma Ling <ling.ma@intel.com>
Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
[ -v2: build fix ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org> # [2.6.32.x, 2.6.33.x]
LKML-Reference: <1266970432.11588.22.camel@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-26 15:45:13 +01:00
Thomas Gleixner 6e40f5bbbc Merge branch 'sched/urgent' into sched/core
Conflicts: kernel/sched.c

Necessary due to the urgent fixes which conflict with the code move
from sched.c to sched_fair.c

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-16 16:48:56 +01:00
Ingo Molnar 6d3e0907b8 Merge branch 'sched/urgent' into sched/core
Merge reason: Merge dependent fix, update to latest -rc.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-08 08:55:46 +01:00
Thomas Gleixner ea87bb7853 sched: Extend enqueue_task to allow head queueing
The ability of enqueueing a task to the head of a SCHED_FIFO priority
list is required to fix some violations of POSIX scheduling policy.

Extend the related functions with a "head" argument.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Tested-by: Mathias Weber <mathias.weber.mw1@roche.com>
LKML-Reference: <20100120171629.734886007@linutronix.de>
2010-01-22 18:09:59 +01:00
Gautham R Shenoy 871e35bc97 sched: Fix the place where group powers are updated
We want to update the sched_group_powers when balance_cpu == this_cpu.

Currently the group powers are updated only if the balance_cpu is the
first CPU in the local group. But balance_cpu = this_cpu could also be
the first idle cpu in the group. Hence fix the place where the group
powers are updated.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1264017764.5717.127.camel@jschopp-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:17 +01:00
Peter Zijlstra 8f190fb3f7 sched: Assume *balance is valid
Since all load_balance() callers will have !NULL balance parameters we
can now assume so and remove a few checks.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:15 +01:00
Peter Zijlstra f492e12ef0 sched: Remove load_balance_newidle()
The two functions: load_balance{,_newidle}() are very similar, with the
following differences:

 - rq->lock usage
 - sb->balance_interval updates
 - *balance check

So remove the load_balance_newidle() call with load_balance(.idle =
CPU_NEWLY_IDLE), explicitly unlock the rq->lock before calling (would be
done by double_lock_balance() anyway), and ignore the other differences
for now.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:14 +01:00
Peter Zijlstra 1af3ed3ddf sched: Unify load_balance{,_newidle}()
load_balance() and load_balance_newidle() look remarkably similar, one
key point they differ in is the condition on when to active balance.

So split out that logic into a separate function.

One side effect is that previously load_balance_newidle() used to fail
and return -1 under these conditions, whereas now it doesn't. I've not
yet fully figured out the whole -1 return case for either
load_balance{,_newidle}().

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:13 +01:00
Peter Zijlstra baa8c1102f sched: Add a lock break for PREEMPT=y
Since load-balancing can hold rq->locks for quite a long while, allow
breaking out early when there is lock contention.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:13 +01:00