kernel

mirror of https://github.com/ukui/kernel.git synced 2026-03-09 10:07:04 -07:00

Author	SHA1	Message	Date
Peter Zijlstra	799e2205ec	sched: Disable SD_PREFER_LOCAL for MC/CPU domains Yanmin reported that both tbench and hackbench were significantly hurt by trying to keep tasks local on these domains, esp on small cache machines. So disable it in order to promote spreading outside of the cache domains. Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Mike Galbraith <efault@gmx.de> LKML-Reference: <1255083400.8802.15.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-14 15:02:34 +02:00
Rusty Russell	6f401420e2	cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: core There were replaced by topology_core_cpumask and topology_thread_cpumask. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-09-24 09:34:41 +09:30
Peter Zijlstra	182a85f8a1	sched: Disable wakeup balancing Sysbench thinks SD_BALANCE_WAKE is too agressive and kbuild doesn't really mind too much, SD_BALANCE_NEWIDLE picks up most of the slack. On a dual socket, quad core, dual thread nehalem system: sysbench (--num_threads=16): SD_BALANCE_WAKE-: 13982 tx/s SD_BALANCE_WAKE+: 15688 tx/s kbuild (-j16): SD_BALANCE_WAKE-: 47.648295846 seconds time elapsed ( +- 0.312% ) SD_BALANCE_WAKE+: 47.608607360 seconds time elapsed ( +- 0.026% ) (same within noise) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-16 16:44:33 +02:00
Peter Zijlstra	59abf02644	sched: Add SD_PREFER_LOCAL And turn it on for NUMA and MC domains. This improves locality in balancing decisions by keeping up to capacity amount of tasks local before looking for idle CPUs. (and twice the capacity if SD_POWERSAVINGS_BALANCE is set.) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-16 08:42:40 +02:00
Peter Zijlstra	b8a543ea5a	sched: Reduce forkexec_idx If we're looking to place a new task, we might as well find the idlest position _now_, not 1 tick ago. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:51:23 +02:00
Mike Galbraith	0ec9fab3d1	sched: Improve latencies and throughput Make the idle balancer more agressive, to improve a x264 encoding workload provided by Jason Garrett-Glaser: NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 252.82 fps, 22096.60 kb/s encoded 600 frames, 250.69 fps, 22096.60 kb/s encoded 600 frames, 245.76 fps, 22096.60 kb/s NO_NEXT_BUDDY LB_BIAS encoded 600 frames, 344.44 fps, 22096.60 kb/s encoded 600 frames, 346.66 fps, 22096.60 kb/s encoded 600 frames, 352.59 fps, 22096.60 kb/s NO_NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 425.75 fps, 22096.60 kb/s encoded 600 frames, 425.45 fps, 22096.60 kb/s encoded 600 frames, 422.49 fps, 22096.60 kb/s Peter pointed out that this is better done via newidle_idx, not via LB_BIAS, newidle balancing should look for where there is load _now_, not where there was load 2 ticks ago. Worst-case latencies are improved as well as no buddies means less vruntime spread. (as per prior lkml discussions) This change improves kbuild-peak parallelism as well. Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1253011667.9128.16.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:51:16 +02:00
Peter Zijlstra	6bd7821f90	sched: Fix some domain tunings CPU level should have WAKE_AFFINE, whereas ALLNODES is dubious. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:08 +02:00
Peter Zijlstra	78e7ed53c9	sched: Tweak wake_idx When merging select_task_rq_fair() and sched_balance_self() we lost the use of wake_idx, restore that and set them to 0 to make wake balancing more aggressive. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:07 +02:00
Peter Zijlstra	c88d591089	sched: Merge select_task_rq_fair() and sched_balance_self() The problem with wake_idle() is that is doesn't respect things like cpu_power, which means it doesn't deal well with SMT nor the recent RT interaction. To cure this, it needs to do what sched_balance_self() does, which leads to the possibility of merging select_task_rq_fair() and sched_balance_self(). Modify sched_balance_self() to: - update_shares() when walking up the domain tree, (it only called it for the top domain, but it should have done this anyway), which allows us to remove this ugly bit from try_to_wake_up(). - do wake_affine() on the smallest domain that contains both this (the waking) and the prev (the wakee) cpu for WAKE invocations. Then use the top-down balance steps it had to replace wake_idle(). This leads to the dissapearance of SD_WAKE_BALANCE and SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE. SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective. Touch all topology bits to replace the old with new SD flags -- platforms might need re-tuning, enabling SD_BALANCE_WAKE conditionally on a NUMA distance seems like a good additional feature, magny-core and small nehalem systems would want this enabled, systems with slow interconnects would not. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 16:01:05 +02:00
Peter Zijlstra	a8fae3ec5f	sched: enable SD_WAKE_IDLE Now that SD_WAKE_IDLE doesn't make pipe-test suck anymore, enable it by default for MC, CPU and NUMA domains. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-07 22:00:17 +02:00
Ingo Molnar	840a065310	sched: Turn on SD_BALANCE_NEWIDLE Start the re-tuning of the balancer by turning on newidle. It improves hackbench performance and parallelism on a 4x4 box. The "perf stat --repeat 10" measurements give us: domain0 domain1 ....................................... -SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE: 2041.273208 task-clock-msecs # 9.354 CPUs ( +- 0.363% ) +SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE: 2086.326925 task-clock-msecs # 11.934 CPUs ( +- 0.301% ) +SD_BALANCE_NEWIDLE +SD_BALANCE_NEWIDLE: 2115.289791 task-clock-msecs # 12.158 CPUs ( +- 0.263% ) Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 11:52:54 +02:00
Ingo Molnar	47734f89be	sched: Clean up topology.h Re-organize the flag settings so that it's visible at a glance which sched-domains flags are set and which not. With the new balancer code we'll need to re-tune these details anyway, so make it cleaner to make fewer mistakes down the road ;-) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 11:52:53 +02:00
Peter Zijlstra	a52bfd7358	sched: Add smt_gain The idea is that multi-threading a core yields more work capacity than a single thread, provide a way to express a static gain for threads. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083826.073345955@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-04 10:09:54 +02:00
Rusty Russell	082edb7bf4	numa, cpumask: move numa_node_id default implementation to topology.h Impact: cleanup, potential bugfix Not sure what changed to expose this, but clearly that numa_node_id() doesn't belong in mmzone.h (the inline in gfp.h is probably overkill, too). In file included from include/linux/topology.h:34, from arch/x86/mm/numa.c:2: /home/rusty/patches-cpumask/linux-2.6/arch/x86/include/asm/topology.h:64:1: warning: "numa_node_id" redefined In file included from include/linux/topology.h:32, from arch/x86/mm/numa.c:2: include/linux/mmzone.h:770:1: warning: this is the location of the previous definition Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Mike Travis <travis@sgi.com> LKML-Reference: <200903132343.37661.rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 14:35:31 +01:00
Rusty Russell	a70f730282	cpumask: replace node_to_cpumask with cpumask_of_node. Impact: cleanup node_to_cpumask (and the blecherous node_to_cpumask_ptr which contained a declaration) are replaced now everyone implements cpumask_of_node. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-03-13 14:49:46 +10:30
Rusty Russell	fbd59a8d1f	cpumask: Use topology_core_cpumask()/topology_thread_cpumask() Impact: reduce stack usage, use new cpumask API. This actually uses topology_core_cpumask() and topology_thread_cpumask(), removing the only users of topology_core_siblings() and topology_thread_siblings() Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Cc: linux-net-drivers@solarflare.com	2009-01-11 19:12:49 +01:00
Vaidyanathan Srinivasan	100fdaee70	sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 Impact: change task balancing to save power more agressively Add SD_BALANCE_NEWIDLE flag at MC level and CPU level if sched_mc is set. This helps power savings and will not affect performance when sched_mc=0 Ingo and Mike Galbraith have optimised the SD flags by removing SD_BALANCE_NEWIDLE at MC and CPU level. This helps performance but hurts power savings since this slows down task consolidation by reducing the number of times load_balance is run. sched: fine-tune SD_MC_INIT commit `1480098470` Author: Mike Galbraith <efault@gmx.de> Date: Fri Nov 7 15:26:50 2008 +0100 sched: re-tune balancing -- revert commit `9fcd18c9e6` Author: Ingo Molnar <mingo@elte.hu> Date: Wed Nov 5 16:52:08 2008 +0100 This patch selectively enables SD_BALANCE_NEWIDLE flag only when sched_mc is set to 1 or 2. This helps power savings by task consolidation and also does not hurt performance at sched_mc=0 where all power saving optimisations are turned off. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 09:21:55 +01:00
Vaidyanathan Srinivasan	716707b299	sched: convert BALANCE_FOR_xx_POWER to inline functions Impact: cleanup BALANCE_FOR_MC_POWER and similar macros defined in sched.h are not constants and have various condition checks and significant amount of code that is not suitable to be contain in a macro. Also there could be side effects on the expressions passed to some of them like test_sd_parent(). This patch converts all complex macros related to power savings balance to inline functions. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-19 09:21:45 +01:00
Heiko Carstens	ee79d1bdb6	sched: let arch_update_cpu_topology indicate if topology changed Change arch_update_cpu_topology so it returns 1 if the cpu topology changed and 0 if it didn't change. This will be useful for the next patch which adds a call to this function in partition_sched_domains. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-12-12 13:47:21 +01:00
Ingo Molnar	52c642f33b	sched: fine-tune SD_SIBLING_INIT fine-tune the HT sched-domains parameters as well. On a HT capable box, this increases lat_ctx performance from 23.87 usecs to 1.49 usecs: # before $ ./lat_ctx -s 0 2 "size=0k ovr=1.89 2 23.87 # after $ ./lat_ctx -s 0 2 "size=0k ovr=1.84 2 1.49 Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-07 16:09:23 +01:00
Mike Galbraith	1480098470	sched: fine-tune SD_MC_INIT Tune SD_MC_INIT the same way as SD_CPU_INIT: unset SD_BALANCE_NEWIDLE, and set SD_WAKE_BALANCE. This improves vmark by 5%: vmark 132102 125968 125497 messages/sec avg 127855.66 .984 vmark 139404 131719 131272 messages/sec avg 134131.66 1.033 Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> # DOCUMENTATION	2008-11-07 15:35:11 +01:00
Ingo Molnar	9fcd18c9e6	sched: re-tune balancing Impact: improve wakeup affinity on NUMA systems, tweak SMP systems Given the fixes+tweaks to the wakeup-buddy code, re-tweak the domain balancing defaults on NUMA and SMP systems. Turn on SD_WAKE_AFFINE which was off on x86 NUMA - there's no reason why we would not want to have wakeup affinity across nodes as well. (we already do this in the standard NUMA template.) lat_ctx on a NUMA box is particularly happy about this change: before: \| phoenix:~/l> ./lat_ctx -s 0 2 \| "size=0k ovr=2.60 \| 2 5.70 after: \| phoenix:~/l> ./lat_ctx -s 0 2 \| "size=0k ovr=2.65 \| 2 2.07 a 2.75x speedup. pipe-test is similarly happy about it too: \| phoenix:~/sched-tests> ./pipe-test \| 18.26 usecs/loop. \| 14.70 usecs/loop. \| 14.38 usecs/loop. \| 10.55 usecs/loop. # +WAKE_AFFINE on domain0+domain1 \| 8.63 usecs/loop. \| 8.59 usecs/loop. \| 9.03 usecs/loop. \| 8.94 usecs/loop. \| 8.96 usecs/loop. \| 8.63 usecs/loop. Also: - disable SD_BALANCE_NEWIDLE on NUMA and SMP domains (keep it for siblings) - enable SD_WAKE_BALANCE on SMP domains Sysbench+postgresql improves all around the board, quite significantly: .28-rc3-11474e2c .28-rc3-11474e2c-tune ------------------------------------------------- 1: 571 688 +17.08% 2: 1236 1206 -2.55% 4: 2381 2642 +9.89% 8: 4958 5164 +3.99% 16: 9580 9574 -0.07% 32: 7128 8118 +12.20% 64: 7342 8266 +11.18% 128: 7342 8064 +8.95% 256: 7519 7884 +4.62% 512: 7350 7731 +4.93% ------------------------------------------------- SUM: 55412 59341 +6.62% So it's a win both for the runup portion, the peak area and the tail. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-11-05 18:04:38 +01:00
Ben Hutchings	c50cbb05a0	cpu topology: always define CPU topology information This can result in an empty topology directory in sysfs, and requires in-kernel users to protect all uses with #ifdef - see <http://marc.info/?l=linux-netdev&m=120639033904472&w=2>. The documentation of CPU topology specifies what the defaults should be if only partial information is available from the hardware. So we can provide these defaults as a fallback. This patch: - Adds default definitions of the 4 topology macros to <linux/topology.h> - Changes drivers/base/topology.c to use the topology macros unconditionally and to cope with definitions that aren't lvalues - Updates documentation accordingly [ From: Andrew Morton <akpm@linux-foundation.org> - fold now-duplicated code - fix layout ] Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Chandra Seetharaman <sekharan@us.ibm.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Mike Travis <travis@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: John Hawkes <hawkes@sgi.com> Cc: Zhang, Yanmin <yanmin.zhang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-06-13 10:09:46 +02:00
Ingo Molnar	ea3f01f8af	sched: re-tune NUMA topologies improve the sysbench ramp-up phase and its peak throughput on a 16way NUMA box, by turning on WAKE_AFFINE: tip/sched tip/sched+wake-affine ------------------------------------------------- 1: 700 830 +15.65% 2: 1465 1391 -5.28% 4: 3017 3105 +2.81% 8: 5100 6021 +15.30% 16: 10725 10745 +0.19% 32: 10135 10150 +0.16% 64: 9338 9240 -1.06% 128: 8599 8252 -4.21% 256: 8475 8144 -4.07% ------------------------------------------------- SUM: 57558 57882 +0.56% this change also improves lat_ctx from 6.69 usecs to 1.11 usec: $ ./lat_ctx -s 0 2 "size=0k ovr=1.19 2 1.11 $ ./lat_ctx -s 0 2 "size=0k ovr=1.22 2 6.69 in sysbench it's an overall win with some weakness at the lots-of-clients side. That happens because we now under-balance this workload a bit. To counter that effect, turn on NEWIDLE: wake-idle wake-idle+newidle ------------------------------------------------- 1: 830 834 +0.43% 2: 1391 1401 +0.65% 4: 3105 3091 -0.43% 8: 6021 6046 +0.42% 16: 10745 10736 -0.08% 32: 10150 10206 +0.55% 64: 9240 9533 +3.08% 128: 8252 8355 +1.24% 256: 8144 8384 +2.87% ------------------------------------------------- SUM: 57882 58591 +1.21% as a bonus this not only improves the many-clients case but also improves the (more important) rampup phase. sysbench is a workload that quickly breaks down if the scheduler over-balances, so since it showed an improvement under NEWIDLE this change is definitely good.	2008-05-29 14:46:30 +02:00
Mike Travis	7c16ec585c	cpumask: reduce stack usage in SD_x_INIT initializers * Remove empty cpumask_t (and all non-zero/non-null) variables in SD__INIT macros. Use memset(0) to clear. Also, don't inline the initializer functions to save on stack space in build_sched_domains(). Merge change to include/linux/topology.h that uses the new node_to_cpumask_ptr function in the nr_cpus_node macro into this patch. Depends on: [mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch [sched-devel]: sched: add new set_cpus_allowed_ptr function Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-04-19 19:44:59 +02:00

1 2

49 Commits