* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel: (62 commits)
sched: build fix
sched: better rt-group documentation
sched: features fix
sched: /debug/sched_features
sched: add SCHED_FEAT_DEADLINE
sched: debug: show a weight tree
sched: fair: weight calculations
sched: fair-group: de-couple load-balancing from the rb-trees
sched: fair-group scheduling vs latency
sched: rt-group: optimize dequeue_rt_stack
sched: debug: add some debug code to handle the full hierarchy
sched: fair-group: SMP-nice for group scheduling
sched, cpuset: customize sched domains, core
sched, cpuset: customize sched domains, docs
sched: prepatory code movement
sched: rt: multi level group constraints
sched: task_group hierarchy
sched: fix the task_group hierarchy for UID grouping
sched: allow the group scheduler to have multiple levels
sched: mix tasks and groups
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: (77 commits)
x86: UV startup of slave cpus
x86: integrate pci-dma.c
x86: don't do dma if mask is NULL.
x86: return conditional to mmu
x86: remove kludge from x86_64
x86: unify gfp masks
x86: retry allocation if failed
x86: don't try to allocate from DMA zone at first
x86: use a fallback dev for i386
x86: use numa allocation function in i386
x86: remove virt_to_bus in pci-dma_64.c
x86: adjust dma_free_coherent for i386
x86: move bad_dma_address
x86: isolate coherent mapping functions
x86: move dma_coherent functions to pci-dma.c
x86: merge iommu initialization parameters
x86: merge dma_supported
x86: move pci fixup to pci-dma.c
x86: move x86_64-specific to common code.
x86: move initialization functions to pci-dma.c
...
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (27 commits)
sh: Fix up L2 cache probe.
sh: Fix up SH-4A part probe.
sh: Add support for SH7723 CPU subtype.
sh: Fix up SH7763 build.
sh: Add migor_ts support to MigoR
sh: Add rs5c732b RTC support to MigoR
sh: Add I2C support to MigoR
sh: Add I2C platform data to sh7722
sh: MigoR NAND flash support using gen_flash
sh: MigoR NOR flash support using physmap-flash
sh: Fix up mach-types formatting from merge damage.
sh: r7780rp: Hook up the I2C and SMBus platform devices.
sh: Use phyical addresses for MigoR smc91x resources
sh: Use physical addresses for sh7722 USBF resources
sh: Add MigoR header file
Fix sh_keysc double free
sh: Fix up __access_ok() check for nommu.
sh: Allow optimized clear/copy page routines to be used on SH-2.
sh: Hook up the rest of the SH7770 serial ports.
sh: Add support for Solution Engine SH7721 board
...
The RCU iterators used 'rcu_dereference()' on an already-fetched RCU
pointer value, which defeats the whole point of the exercise.
When we dereference a pointer protected by RCU, we need to make sure
that we only fetch the value _once_, because if the compiler ends up
re-loading it due to register pressure, the newly reloaded value could
be different from the previously fetched one, and you get inconsistent
results.
Cleaned-up, fixed, and the pointless list_for_each_safe_rcu #define
deleted by Paul Kenney.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Viktor was nice enough to enhance the document based on my replies to
his questions on the subject.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
provide a text based interface to the scheduler features; this saves the
'user' from setting bits using decimal arithmetic.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In order to level the hierarchy, we need to calculate load based on the
root view. That is, each task's load is in the same unit.
A
/ \
B 1
/ \
2 3
To compute 1's load we do:
weight(1)
--------------
rq_weight(A)
To compute 2's load we do:
weight(2) weight(B)
------------ * -----------
rq_weight(B) rw_weight(A)
This yields load fractions in comparable units.
The consequence is that it changes virtual time. We used to have:
time_{i}
vtime_{i} = ------------
weight_{i}
vtime = \Sum vtime_{i} = time / rq_weight.
But with the new way of load calculation we get that vtime equals time.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
De-couple load-balancing from the rb-trees, so that I can change their
organization.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently FAIR_GROUP sched grows the scheduler latency outside of
sysctl_sched_latency, invert this so it stays within.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Now that the group hierarchy can have an arbitrary depth the O(n^2) nature
of RT task dequeues will really hurt. Optimize this by providing space to
store the tree path, so we can walk it the other way.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add some extra debug output so we can get a better overview of the
full hierarchy.
We print the cgroup path after each cfs_rq, so we can see what group
we're looking at.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Implement SMP nice support for the full group hierarchy.
On each load-balance action, compile a sched_domain wide view of the full
task_group tree. We compute the domain wide view when walking down the
hierarchy, and readjust the weights when walking back up.
After collecting and readjusting the domain wide view, we try to balance the
tasks within the task_groups. The current approach is a naively balance each
task group until we've moved the targeted amount of load.
Inspired by Srivatsa Vaddsgiri's previous code and Abhishek Chandra's H-SMP
paper.
XXX: there will be some numerical issues due to the limited nature of
SCHED_LOAD_SCALE wrt to representing a task_groups influence on the
total weight. When the tree is deep enough, or the task weight small
enough, we'll run out of bits.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Abhishek Chandra <chandra@cs.umn.edu>
CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[rebased for sched-devel/latest]
- Add a new cpuset file, having levels:
sched_relax_domain_level
- Modify partition_sched_domains() and build_sched_domains()
to take attributes parameter passed from cpuset.
- Fill newidle_idx for node domains which currently unused but
might be required if sched_relax_domain_level become higher.
- We can change the default level by boot option 'relax_domain_level='.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch introduces new feature of cpuset - sched domain customization.
This version provides a per-cpuset file 'sched_relax_domain_level' that
enable us to change the searching range of scheduler, which used to limit
how many cpus the scheduler searches at some schedule events, such as
wakening task and running out of runqueue.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add the full parent<->child relation thing into task_groups as well.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
UID grouping doesn't actually have a task_group representing the root of
the task_group tree. Add one.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch makes the group scheduler multi hierarchy aware.
[a.p.zijlstra@chello.nl: rt-parts and assorted fixes]
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>