Impact: Reduce stack usage, use new cpumask API. ALPHA mod!
Main change is that irq_default_affinity becomes a cpumask_var_t, so
treat it as a pointer (this effects alpha).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: change existing irq_chip API
Not much point with gentle transition here: the struct irq_chip's
setaffinity method signature needs to change.
Fortunately, not widely used code, but hits a few architectures.
Note: In irq_select_affinity() I save a temporary in by mangling
irq_desc[irq].affinity directly. Ingo, does this break anything?
(Folded in fix from KOSAKI Motohiro)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: jeremy@xensource.com
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Impact: new feature
Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
NR_CPUS set to large values. The goal is to be able to scale up to much
larger NR_IRQS value without impacting the (important) common case.
To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
irq_desc pointers.
When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
this also makes the IRQ descriptors NUMA-local (to the site that calls
request_irq()).
This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
uses desc->chip_data for x86 to store irq_cfg.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: preserve user-modified affinities on interrupts
Kumar Galak noticed that commit
1840475676 (genirq: Expose default irq
affinity mask (take 3))
overrides an already set affinity setting across a free /
request_irq(). Happens e.g. with ifdown/ifup of a network device.
Change the logic to mark the affinities as set and keep them
intact. This also fixes the unlocked access to irq_desc in
irq_select_affinity() when called from irq_affinity_proc_write()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This variable is only used in the source file, so make it static.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There are a handful of loops that go from 0 to nr_irqs and use
get_irq_desc() on them. These would allocate all the irq_desc
entries, regardless of the need for them.
Use the smarter for_each_irq_desc() iterator that will only iterate
over the present ones.
v2: make sure arch without GENERIC_HARDIRQS work too
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
add CONFIG_HAVE_SPARSE_IRQ to for use condensed array.
Get rid of irq_desc[] array assumptions.
Preallocate 32 irq_desc, and irq_desc() will try to get more.
( No change in functionality is expected anywhere, except the odd build
failure where we missed a code site or where a crossing commit itroduces
new irq_desc[] usage. )
v2: according to Eric, change get_irq_desc() to irq_desc()
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
at this point nr_irqs is equal NR_IRQS
convert a few easy users from NR_IRQS to dynamic nr_irqs.
v2: according to Eric, we need to take care of arch without generic_hardirqs
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Switch /proc/irq/*/smp_affinity , /proc/irq/default_smp_affinity to
seq_files.
cat(1) reads with 1024 chunks by default, with high enough NR_CPUS, there
will be -EINVAL.
As side effect, there are now two less users of the ->read_proc interface.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current IRQ affinity interface does not provide a way to set affinity
for the IRQs that will be allocated/activated in the future.
This patch creates /proc/irq/default_smp_affinity that lets users set
default affinity mask for the newly allocated IRQs. Changing the default
does not affect affinity masks for the currently active IRQs, they
have to be changed explicitly.
Updated based on Paul J's comments and added some more documentation.
Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Cc: pj@sgi.com
Cc: a.p.zijlstra@chello.nl
Cc: tglx@linutronix.de
Cc: rdunlap@xenotime.net
Cc: mingo@elte.hu
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is useful to debug problems with interrupt handlers that return
sometimes IRQ_NONE.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
On SN, only allow one bit to be set in the smp_affinty mask when
redirecting an interrupt. Currently setting multiple bits is allowed, but
only the first bit is used in determining the CPU to redirect to. This has
caused confusion among some customers.
[akpm@linux-foundation.org: fixes]
Signed-off-by: John Keller <jpk@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
setup_irq() releases a desc->lock before calling register_handler_proc(), so
the iteration over the IRQ action list is not protected.
(akpm: the check itself is still racy, but at least it probably won't oops
now).
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide funtions to:
- check, whether an interrupt can set the affinity
- pin the interrupt to a given cpu
Necessary for the ability to setup clocksources more flexible (e.g. use the
different HPET channels per CPU)
[akpm@osdl.org: alpha build fix]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a flag so we can prevent the irq balancing of an interrupt. Move the
bits, so we have room for more :)
Necessary for the ability to setup clocksources more flexible (e.g. use the
different HPET channels per CPU)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While running my MCA test (hardware error injection) on 2.6.19,
I got some warning like following:
> BUG: warning at kernel/irq/migration.c:27/move_masked_irq()
>
> Call Trace:
> [<a000000100013d20>] show_stack+0x40/0xa0
> sp=e00000006b2578d0 bsp=e00000006b2510b0
> [<a000000100013db0>] dump_stack+0x30/0x60
> sp=e00000006b257aa0 bsp=e00000006b251098
> [<a0000001000de430>] move_masked_irq+0xb0/0x240
> sp=e00000006b257aa0 bsp=e00000006b251070
> [<a0000001000de6a0>] move_native_irq+0xe0/0x180
> sp=e00000006b257aa0 bsp=e00000006b251040
> [<a00000010004ff50>] iosapic_end_level_irq+0x30/0xe0
> sp=e00000006b257aa0 bsp=e00000006b251020
> [<a0000001000d94d0>] __do_IRQ+0x170/0x400
> sp=e00000006b257aa0 bsp=e00000006b250fd8
> [<a0000001000116f0>] ia64_handle_irq+0x1b0/0x260
> sp=e00000006b257aa0 bsp=e00000006b250fa8
> [<a00000010000c3a0>] ia64_leave_kernel+0x0/0x280
> sp=e00000006b257aa0 bsp=e00000006b250fa8
> [<a000000100690cf0>] _spin_unlock_irqrestore+0x30/0x60
> sp=e00000006b257c70 bsp=e00000006b250f90
It comes from:
[kernel/irq/migration.c]
26 if (CHECK_IRQ_PER_CPU(desc->status)) {
27 WARN_ON(1);
28 return;
29 }
By putting some printk in kernel, I found that irqbalance is trying to
move CPEI which is handled as PER_CPU irq. That's why.
CPEI(Corrected Platform Error Interrupt) is ia64 specific irq, is
allowed to pin to particular processor which selected by the platform, and
even it is PER_CPU but it has set_affinity handler (=iosapic_set_affinity)
as same as other IO-SAPIC-level interrupts. (I don't know why, but
I guess that there would be typical situation where the handler for
migration is needed, such as hotplug - the processor going to be
offline/hot-removed.)
To shut up this warning, there are 2 way at least:
a) fix CPEI stuff
b) prohibit setting affinity to PER_CPU irq
I'm not sure what stuff of CPEI need to be fixed, but I think that
returning error to attempting move PER_CPU irq is useful for all
applications since it will never work.
Following small patch takes b) style.
It works, the warning disappeared and irqbalance still runs well.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
lib/bitmap.c:bitmap_parse() is a library function that received as input a
user buffer. This seemed to have originated from the way the write_proc
function of the /proc filesystem operates.
This has been reworked to not use kmalloc and eliminates a lot of
get_user() overhead by performing one access_ok before using __get_user().
We need to test if we are in kernel or user space (is_user) and access the
buffer differently. We cannot use __get_user() to access kernel addresses
in all cases, for example in architectures with separate address space for
kernel and user.
This function will be useful for other uses as well; for example, taking
input for /sysfs instead of /proc, so it was changed to accept kernel
buffers. We have this use for the Linux UWB project, as part as the
upcoming bandwidth allocator code.
Only a few routines used this function and they were changed too.
Signed-off-by: Reinette Chatre <reinette.chatre@linux.intel.com>
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Joe Korty <joe.korty@ccur.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consolidation: remove the irq_dir[NR_IRQS] and the smp_affinity_entry[NR_IRQS]
arrays and move them into the irq_desc[] array.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consolidation: remove the irq_affinity[NR_IRQS] array and move it into the
irq_desc[NR_IRQS].affinity field.
[akpm@osdl.org: sparc64 build fix]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>