* Use new set_cpus_allowed_ptr() function added by previous patch,
which instead of passing the "newly allowed cpus" cpumask_t arg
by value, pass it by pointer:
-int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
+int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)
* Cleanup uses of CPU_MASK_ALL.
* Collapse other NR_CPUS changes to arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
Use pointers to cpumask_t arguments whenever possible.
Depends on:
[sched-devel]: sched: add new set_cpus_allowed_ptr function
Cc: Len Brown <len.brown@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Change fixed size arrays to per_cpu variables or dynamically allocated
arrays in sched_init() and sched_init_smp().
(1) static struct sched_entity *init_sched_entity_p[NR_CPUS];
(1) static struct cfs_rq *init_cfs_rq_p[NR_CPUS];
(1) static struct sched_rt_entity *init_sched_rt_entity_p[NR_CPUS];
(1) static struct rt_rq *init_rt_rq_p[NR_CPUS];
static struct sched_group **sched_group_nodes_bycpu[NR_CPUS];
(1) - these arrays are allocated via alloc_bootmem_low()
* Change sched_domain_debug_one() to use cpulist_scnprintf instead of
cpumask_scnprintf. This reduces the output buffer required and improves
readability when large NR_CPU count machines arrive.
* In sched_create_group() we allocate new arrays based on nr_cpu_ids.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Replace usages of CPU_MASK_NONE, CPU_MASK_ALL, NODE_MASK_NONE,
NODE_MASK_ALL to reduce stack requirements for large NR_CPUS
and MAXNODES counts.
* In some cases, the cpumask variable was initialized but then overwritten
with another value. This is the case for changes like this:
- cpumask_t oldmask = CPU_MASK_ALL;
+ cpumask_t oldmask;
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Move large array "struct bootnode nodes" from stack to _initdata
section to reduce amount of stack space required.
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Create a simple macro to always return a pointer to the node_to_cpumask(node)
value. This relies on compiler optimization to remove the extra indirection:
#define node_to_cpumask_ptr(v, node) \
cpumask_t _##v = node_to_cpumask(node), *v = &_##v
For those systems with a large cpumask size, then a true pointer
to the array element can be used:
#define node_to_cpumask_ptr(v, node) \
cpumask_t *v = &(node_to_cpumask_map[node])
A node_to_cpumask_ptr_next() macro is provided to access another
node_to_cpumask value.
The other change is to always include asm-generic/topology.h moving the
ifdef CONFIG_NUMA to this same file.
Note: there are no references to either of these new macros in this patch,
only the definition.
Based on 2.6.25-rc5-mm1
# alpha
Cc: Richard Henderson <rth@twiddle.net>
# fujitsu
Cc: David Howells <dhowells@redhat.com>
# ia64
Cc: Tony Luck <tony.luck@intel.com>
# powerpc
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
# sparc
Cc: David S. Miller <davem@davemloft.net>
Cc: William L. Irwin <wli@holomorphy.com>
# x86
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Change the following arrays sized by NR_CPUS to be PERCPU variables:
static struct op_msrs cpu_msrs[NR_CPUS];
static unsigned long saved_lvtpc[NR_CPUS];
Also some minor complaints from checkpatch.pl fixed.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
All changes were transparent except for:
static void nmi_shutdown(void)
{
+ struct op_msrs *msrs = &__get_cpu_var(cpu_msrs);
nmi_enabled = 0;
on_each_cpu(nmi_cpu_shutdown, NULL, 0, 1);
unregister_die_notifier(&profile_exceptions_nb);
- model->shutdown(cpu_msrs);
+ model->shutdown(msrs);
free_msrs();
}
The existing code passed a reference to cpu 0's instance of struct op_msrs
to model->shutdown, whilst the other functions are passed a reference to
<this cpu's> instance of a struct op_msrs. This seemed to be a bug to me
even though as long as cpu 0 and <this cpu> are of the same type it would
have the same effect...?
Cc: Philippe Elie <phil.el@wanadoo.fr>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Change the following static arrays sized by NR_CPUS to
per_cpu data variables:
_cpuid4_info *cpuid4_info[NR_CPUS];
_index_kobject *index_kobject[NR_CPUS];
kobject * cache_kobject[NR_CPUS];
* Remove the local NR_CPUS array with a kmalloc'd region in
show_shared_cpu_map().
Also some minor complaints from checkpatch.pl fixed.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a new function cpumask_scnprintf_len() to return the number of
characters needed to display "len" cpumask bits. The current method
of allocating NR_CPUS bytes is incorrect as what's really needed is
9 characters per 32-bit word of cpumask bits (8 hex digits plus the
seperator [','] or the terminating NULL.) This function provides the
caller the means to allocate the correct string length.
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently the schedstats implementation does not allow the statistics
to be reset. This patch aims to allow that.
echo 0 > cpuacct.usage
resets the usage. Any other value is not allowed and returns -EINVAL.
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I noticed this when looking at an openswan issue. Openswan (ab?)uses the
tasklet API to defer processing of packets in some situations, with one
packet per tasklet_action(). I started noticing sequences of
backwards-ordered sequence numbers coming over the wire, since new tasklets
are always queued at the head of the list but processed sequentially.
Convert it to instead append new entries to the tail of the list. As an
extra bonus, the splicing code in takeover_tasklets() no longer has to
iterate over the list.
Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently the rt group scheduling does a per cpu runtime limit, however
the rt load balancer makes no guarantees about an equal spread of real-
time tasks, just that at any one time, the highest priority tasks run.
Solve this by making the runtime limit a global property by borrowing
excessive runtime from the other cpus once the local limit runs out.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Various SMP balancing algorithms require that the bandwidth period
run in sync.
Possible improvements are moving the rt_bandwidth thing into root_domain
and keeping a span per rt_bandwidth which marks throttled cpus.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Balbir Singh reported:
> 1:mon> t
> [c0000000e7677da0] c000000000067de0 .sys_sched_yield+0x6c/0xbc
> [c0000000e7677e30] c000000000008748 syscall_exit+0x0/0x40
> --- Exception: c01 (System Call) at 00000400001d09e4
> SP (4000664cb10) is in userspace
> 1:mon> r
> cpu 0x1: Vector: 300 (Data Access) at [c0000000e7677aa0]
> pc: c000000000068e50: .yield_task_fair+0x94/0xc4
> lr: c000000000067de0: .sys_sched_yield+0x6c/0xbc
the check that should have avoided that is:
/*
* Are we the only task in the tree?
*/
if (unlikely(rq->load.weight == curr->se.load.weight))
return;
But I guess that overlooks rt tasks, they also increase the load.
So I guess something like this ought to fix it..
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no need to loop any longer when 'same == 0'.
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
turn off sync wakeups by default. They are not needed anymore - the
buddy logic should be smart enough to keep the system from
overscheduling.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The wakeup buddy logic didn't use the same wakeup granularity logic as the
wakeup preemption did, this might cause the ->next buddy to be selected past
the point where we would have preempted had the task been a single running
instance.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When using CONFIG_NO_HZ, rq->tick_timestamp is not updated every TICK_NSEC.
We check that the number of skipped ticks matches the clock jump seen in
__update_rq_clock().
Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
kernel/sched.c:506: erreur: implicit declaration of function tick_get_tick_sched
kernel/sched.c:506: erreur: invalid type argument of ->
kernel/sched.c:506: erreur: NOHZ_MODE_INACTIVE undeclared (first use in this function)
kernel/sched.c:506: erreur: (Each undeclared identifier is reported only once
kernel/sched.c:506: erreur: for each function it appears in.)
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>