The SMP IPI data is passed around and written by other cpus, and should
be separated from the per-cpu data consumed only by this cpu.
While looking for hot cache lines, I saw call_function_data sharing a
line with tick_cpu_sched.
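A minimal sketch of the kind of separation this implies, putting the IPI
data that other cpus write into its own cacheline-aligned per-cpu storage
(the variable names are illustrative):

    /* IPI data written by other cpus gets its own aligned per-cpu slots,
     * away from per-cpu state that only this cpu touches. */
    static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_function_data, cfd_data);
    static DEFINE_PER_CPU_SHARED_ALIGNED(struct call_single_data, csd_data);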
Signed-off-by: Milton Miller <miltonm@bga.com>
Acked-by: Anton Blanchard <anton@samba.org>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Piggin <npiggin@suse.de>
LKML-Reference: <20100118020051.GR12666@kryten>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The change in acpi_cpufreq to use smp_call_function_any() causes a warning
when it is called, since that function erroneously passes the cpu id to
cpumask_of_node() rather than the node that the cpu is on. Fix this.
cpumask_of_node(3): node > nr_node_ids(1)
Pid: 1, comm: swapper Not tainted 2.6.33-rc3-00097-g2c1f189 #223
Call Trace:
[<ffffffff81028bb3>] cpumask_of_node+0x23/0x58
[<ffffffff81061f51>] smp_call_function_any+0x65/0xfa
[<ffffffff810160d1>] ? do_drv_read+0x0/0x2f
[<ffffffff81015fba>] get_cur_val+0xb0/0x102
[<ffffffff81016080>] get_cur_freq_on_cpu+0x74/0xc5
[<ffffffff810168a7>] acpi_cpufreq_cpu_init+0x417/0x515
[<ffffffff81562ce9>] ? __down_write+0xb/0xd
[<ffffffff8148055e>] cpufreq_add_dev+0x278/0x922
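A minimal sketch of the fix, reduced to the relevant line (placement
inside smp_call_function_any() assumed from the description above):

    /* Map the cpu to its node first; cpumask_of_node() expects a node id,
     * not a cpu id. */
    nodemask = cpumask_of_node(cpu_to_node(cpu));  /* was: cpumask_of_node(cpu) */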
Signed-off-by: David John <davidjon@xenontk.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
clockevents: Convert to raw_spinlock
clockevents: Make tick_device_lock static
debugobjects: Convert to raw_spinlocks
perf_event: Convert to raw_spinlock
hrtimers: Convert to raw_spinlocks
genirq: Convert irq_desc.lock to raw_spinlock
smp: Convert smplocks to raw_spinlocks
rtmutes: Convert rtmutex.lock to raw_spinlock
sched: Convert pi_lock to raw_spinlock
sched: Convert cpupri lock to raw_spinlock
sched: Convert rt_runtime_lock to raw_spinlock
sched: Convert rq->lock to raw_spinlock
plist: Make plist debugging raw_spinlock aware
bkl: Fixup core_lock fallout
locking: Cleanup the name space completely
locking: Further name space cleanups
alpha: Fix fallout from locking changes
locking: Implement new raw_spinlock
locking: Convert raw_rwlock functions to arch_rwlock
locking: Convert raw_rwlock to arch_rwlock
...
Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
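A sketch of the conversion pattern; the lock name and critical section
here are purely illustrative:

    /* A lock that must keep spinning on preempt-rt uses the raw_ variants. */
    static DEFINE_RAW_SPINLOCK(example_lock);   /* was: DEFINE_SPINLOCK(example_lock) */

    static void example_critical_section(void)
    {
        unsigned long flags;

        raw_spin_lock_irqsave(&example_lock, flags);    /* was: spin_lock_irqsave() */
        /* work that must never sleep, even on preempt-rt */
        raw_spin_unlock_irqrestore(&example_lock, flags);
    }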
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
This patch removes the spinlock from struct call_function_data; the
reasons are below:
1: Add a new cpumask interface named cpumask_test_and_clear_cpu(),
   which atomically tests and clears a specific cpu. We can use it
   instead of the cpumask_test_cpu()/cpumask_clear_cpu() pair, so
   data->lock is no longer needed to protect that sequence in
   generic_smp_call_function_interrupt() (see the sketch after this
   list).
2: In smp_call_function_many(), once csd_lock() returns, the current
   cpu's cfd_data is no longer on the call_function list, so there is
   no race with other cpus. After that, cfd_data is only used in
   smp_call_function_many(), which must be called with preemption
   disabled and not from a hardware interrupt handler or a bottom half
   handler, so only the corresponding cpu can use it and there is no
   race on the current cpu either; cfd_data->lock is not needed to
   protect it.
3: After 1 and 2, cfd_data->lock is only used to protect cfd_data->refs
   in generic_smp_call_function_interrupt(), so we can make
   cfd_data->refs an atomic_t and drop cfd_data->lock entirely.
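A sketch of the two building blocks above; the helper follows the
existing cpumask accessors, and the handler fragment is abridged and
illustrative rather than the exact kernel code:

    /* 1: atomically test and clear a cpu in a mask. */
    static inline int cpumask_test_and_clear_cpu(int cpu, struct cpumask *cpumask)
    {
        return test_and_clear_bit(cpumask_check(cpu), cpumask_bits(cpumask));
    }

    /* 3: in generic_smp_call_function_interrupt(), refs is an atomic_t, so
     * no spinlock is needed around the test/clear and the refcount drop. */
    if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
        continue;                       /* not addressed to this cpu */
    data->csd.func(data->csd.info);
    refs = atomic_dec_return(&data->refs);
    WARN_ON(refs < 0);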
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
[akpm@linux-foundation.org: use atomic_dec_return()]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reason: Change to is_new_memtype_allowed() in x86/urgent
Resolved semantic conflicts in:
arch/x86/mm/pat.c
arch/x86/mm/ioremap.c
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Because of deadlock possibilities, smp_call_function() is not allowed to
be called with interrupts disabled. Add an exception for a cpu that is
not yet online: no one else can send an smp-call-function interrupt to a
cpu that is not yet online, so the deadlock condition is not possible.
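A sketch of the relaxed check (only the condition is shown; this_cpu is
the calling cpu):

    /* Only warn about irqs-disabled callers once this cpu is online; a cpu
     * that is not yet online cannot be sent a smp-call-function IPI, so the
     * deadlock cannot occur there. */
    WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled());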
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
These are defined as static cpumask_var_t so if MAXSMP is not used,
they are cleared already. Avoid surprises when MAXSMP is enabled.
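For reference, what MAXSMP (CPUMASK_OFFSTACK) implies for such a mask,
sketched with a hypothetical mask name:

    /* With CPUMASK_OFFSTACK the cpumask_var_t is a pointer to heap storage
     * whose contents are undefined after allocation, so clear it explicitly. */
    static cpumask_var_t example_mask;

    if (!alloc_cpumask_var(&example_mask, GFP_KERNEL))
        return -ENOMEM;
    cpumask_clear(example_mask);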
Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Remove the use of kmalloc() from the smp_call_function_*()
calls.
Steven's generic-ipi patch (d7240b98: generic-ipi: use per cpu
data for single cpu ipi calls) started the discussion on the use
of kmalloc() in this code and fixed the
smp_call_function_single(.wait=0) fallback case.
In this patch we complete this by also providing means for the
_many() call, which fully removes the need for kmalloc() in this
code.
The problem with the _many() call is that other cpus might still
be observing our entry when we're done with it. The existing code
solved this by dynamically allocating data elements and RCU-freeing
them.
We solve it by using a single per-cpu entry which provides
static storage and solves one half of the problem (avoiding
referencing freed data).
The other half, ensuring that queue iteration is still possible,
is done by placing re-used entries at the head of the list. This
means that if someone was still iterating over that entry when it
got moved, they will now re-visit the entries on the list they had
already seen, but this avoids skipping over entries, as would have
happened had we placed the new entry at the end.
Furthermore, visiting entries twice is not a problem, since we
remove our cpu from the entry's cpumask once it has been called.
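A sketch of the re-queueing described above; the locking and field names
follow that era's kernel/smp.c but are illustrative:

    /* Re-use the per-cpu entry and add it at the HEAD of the global queue:
     * an iterator still parked on this entry may re-see entries it already
     * processed, but can never skip one. */
    spin_lock_irqsave(&call_function.lock, flags);
    list_add_rcu(&data->csd.list, &call_function.queue);
    spin_unlock_irqrestore(&call_function.lock, flags);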
Many thanks to Oleg for his suggestions and him poking holes in
my earlier attempts.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Simplify the barriers in generic remote function call interrupt
code.
Firstly, just unconditionally take the lock and check the list
in the generic_call_function_single_interrupt IPI handler. As
we've just taken an IPI here, the chances are fairly high that
there will be work on the list for us, so do the locking
unconditionally. This removes the tricky lockless list_empty
check and dubious barriers. The change looks bigger than it is
because it is just removing an outer loop.
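A sketch of the simplified handler entry (abridged; q is the per-cpu
call_single_queue):

    struct call_single_queue *q = &__get_cpu_var(call_single_queue);
    LIST_HEAD(list);

    /* We were just sent an IPI, so the list is very likely non-empty: take
     * the lock unconditionally instead of a lockless list_empty() check
     * with its dubious barriers. */
    spin_lock(&q->lock);
    list_replace_init(&q->list, &list);
    spin_unlock(&q->lock);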
Secondly, clarify architecture specific IPI locking rules.
Generic code has no tools to impose any sane ordering on IPIs if
they go outside normal cache coherency, ergo the arch code must
make them appear to obey cache coherency as a "memory operation"
to initiate an IPI, and a "memory operation" to receive one.
This way at least they can be reasoned about in generic code,
and smp_mb used to provide ordering.
The combination of these two changes means that explicit barriers
can be taken out of queue handling for the single case -- shared
data is explicitly locked, and IPI ordering must conform to
that, so no barriers are needed. An extra barrier is needed in the
many handler, to ensure we load the list element after the IPI is
received.
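A sketch of that extra barrier at the top of the many handler (abridged,
placement illustrative):

    /* Ensure the list entry is loaded only after the IPI has been received;
     * without this we could miss a freshly queued element and never get
     * another IPI for it. */
    smp_mb();

    list_for_each_entry_rcu(data, &call_function.queue, csd.list) {
        /* process entries whose cpumask includes this cpu */
    }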
Does any architecture actually *need* these barriers? For the
initiator I could see it, but for the handler I would be
surprised. So the other thing we could do for simplicity is just
to require that, rather than just matching with cache coherency,
we just require a full barrier before generating an IPI, and
after receiving an IPI. In which case, the smp_mb()s can go
away. But just for now, we'll be on the safe side and use the
barriers (they're in the slow case anyway).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-arch@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
smp_call_function() can be passed a wait parameter telling it to
wait for all the functions running on other CPUs to complete before
returning, or to return without waiting. Unfortunately, this is
currently just a suggestion and not mandatory. That is,
smp_call_function() can decide to wait instead of returning
immediately.
The reason for this is that it uses kmalloc() to allocate storage
to send to the called CPU, and that CPU will free it when it is done.
But if we fail to allocate the storage, the stack is used instead.
This means we must wait for the called CPU to finish before
continuing.
Unfortunately, some callers do not abide by this hint and act as if
the non-wait option is mandatory. The MTRR code for instance will
deadlock if the smp_call_function is set to wait. This is because
the smp_call_function will wait for the other CPUs to finish their
called functions, but those functions are waiting on the caller to
continue.
This patch changes the generic smp_call_function code to use per-cpu
variables if the allocation of the data fails for a single CPU call.
smp_call_function_many() will fall back to smp_call_function_single()
if its allocation fails. smp_call_function_single() is modified to not
force the wait state.
Since we are now using a single data element per cpu, we must
synchronize the callers to prevent a second caller from modifying the
data before the first caller's IPI functions complete. To do so, I
added a flag to the call_single_data called CSD_FLAG_LOCK. When the
single-CPU path is used (which can also happen when a many call fails
its allocation), we set the LOCK bit on this per-cpu data. When the
caller finishes, it clears the LOCK bit.
The caller must wait till the LOCK bit is cleared before setting
it. When it is cleared, there is no IPI function using it.
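A sketch of the CSD_FLAG_LOCK protocol described above (helper names and
barriers illustrate the idea, not necessarily the exact code):

    /* Spin until any previous user of this per-cpu csd has released it. */
    static void csd_lock_wait(struct call_single_data *data)
    {
        while (data->flags & CSD_FLAG_LOCK)
            cpu_relax();
    }

    /* Claim the csd; a second caller spins in csd_lock_wait() until the
     * previous call has completed and the LOCK bit has been cleared. */
    static void csd_lock(struct call_single_data *data)
    {
        csd_lock_wait(data);
        data->flags |= CSD_FLAG_LOCK;
        smp_mb();       /* make the flag visible before the csd is used */
    }

    /* Release the csd once the IPI function has been run. */
    static void csd_unlock(struct call_single_data *data)
    {
        smp_mb();       /* ensure func/info have been consumed */
        data->flags &= ~CSD_FLAG_LOCK;
    }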
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
In future, all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids. So use that instead of NR_CPUS in iterators
and other comparisons.
This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.
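For illustration (per_cpu_work() is a hypothetical per-cpu action):

    /* Compare and iterate against nr_cpu_ids, not NR_CPUS: no valid cpu
     * number can be >= nr_cpu_ids, and nr_cpu_ids <= NR_CPUS. */
    for (cpu = 0; cpu < nr_cpu_ids; cpu++)
        if (cpu_online(cpu))
            per_cpu_work(cpu);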
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: James Morris <jmorris@namei.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Impact: new API to reduce stack usage
We're weaning the core code off handing cpumask's around on-stack.
This introduces arch_send_call_function_ipi_mask().
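For reference, the shape of the new hook, which takes a cpumask pointer
so no mask needs to live on the stack:

    /* Arch hook: raise the call-function IPI on every cpu set in *mask. */
    void arch_send_call_function_ipi_mask(const struct cpumask *mask);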
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: Implementation change to remove cpumask_t from stack.
Actually change smp_call_function_mask() to smp_call_function_many().
We avoid cpumasks on the stack in this version.
(S390 has its own version, but that's going away apparently).
We have to do some dancing to figure out if 0 or 1 other cpus are in
the mask supplied and the online mask without allocating a tmp
cpumask. It's still fairly cheap.
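A sketch of that dance using the _and iterators (abridged and
illustrative):

    /* Find the first wanted online cpu other than ourselves ... */
    cpu = cpumask_first_and(mask, cpu_online_mask);
    if (cpu == this_cpu)
        cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
    if (cpu >= nr_cpu_ids)                  /* no other online cpu in the mask */
        return;

    /* ... then check whether there is a second one, without a tmp mask. */
    next_cpu = cpumask_next_and(cpu, mask, cpu_online_mask);
    if (next_cpu == this_cpu)
        next_cpu = cpumask_next_and(next_cpu, mask, cpu_online_mask);
    if (next_cpu >= nr_cpu_ids) {           /* exactly one other cpu: fastpath */
        smp_call_function_single(cpu, func, info, wait);
        return;
    }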
We allocate the cpumask at the end of the call_function_data
structure: if allocation fails we fall back to smp_call_function_single
rather than using the baroque quiescing code (which needs a cpumask on
stack).
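A sketch of the layout described, with the mask storage appended to the
structure rather than living on the stack (fields abridged):

    struct call_function_data {
        struct call_single_data csd;
        spinlock_t lock;
        unsigned int refs;
        struct rcu_head rcu_head;
        unsigned long cpumask_bits[];   /* variable-size mask storage */
    };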
(Thanks to Hiroshi Shimamoto for spotting several bugs in previous versions!)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: npiggin@suse.de
Cc: axboe@kernel.dk
smp_mb() is needed on the sender (to make the memory operations visible
globally) before sending the IPI, and the receiver (on Alpha at least)
needs smp_read_barrier_depends() in the handler before reading the
call_single_queue list in a lock-free fashion.
On x86, x2apic mode register accesses for sending IPI's don't have serializing
semantics. So the need for smp_mb() before sending the IPI becomes more
critical in x2apic mode.
Remove the unnecessary smp_mb() in csd_flag_wait(), as the presence of that
smp_mb() doesn't mean anything on the sender when the IPI receiver is not
doing anything special (like a memory fence) after clearing CSD_FLAG_WAIT.
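A sketch of the two sides described (sender and receiver fragments,
illustrative):

    /* Sender: make the queued csd globally visible before raising the IPI;
     * this matters even more with non-serializing x2apic register writes. */
    smp_mb();
    arch_send_call_function_single_ipi(cpu);

    /* Receiver, in the handler: on Alpha, order the dependent loads before
     * walking the lock-free call_single_queue list. */
    smp_read_barrier_depends();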
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Have smp_call_function_single() detect invalid CPU indices and return
-ENXIO. This function is already executed inside a
get_cpu()..put_cpu() pair which locks out CPU removal, so rather than
having the higher layers do another layer of locking to guard
against unplugged CPUs, do the test here.
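A sketch of where the test sits (structure illustrative):

    int err = 0;
    int this_cpu = get_cpu();       /* pins this cpu; no hotplug underneath us */

    if ((unsigned int)cpu >= nr_cpu_ids || !cpu_online(cpu)) {
        err = -ENXIO;               /* invalid or unplugged cpu */
    } else {
        /* issue the cross-cpu call as before */
    }

    put_cpu();
    return err;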
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
> > Nick Piggin (1):
> > generic-ipi: fix stack and rcu interaction bug in
> > smp_call_function_mask()
>
> I'm still not 100% sure that I have this patch right... I might have seen
> a lockup trace implicating the smp call function path... which may have
> been due to some other problem or a different bug in the new call function
> code, but if some more people can take a look at it before merging?
OK indeed it did have a couple of bugs. Firstly, I wasn't freeing the
data properly in the alloc && wait case. Secondly, I wasn't resetting
CSD_FLAG_WAIT in the for each cpu loop (so only the first CPU would
wait).
After those fixes, the patch boots and runs with the kmalloc commented
out (so it always executes the slowpath).
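A sketch of the second fix in the slowpath loop (names follow that era's
code; details are illustrative):

    /* When falling back to per-cpu single calls on a stack csd, the WAIT
     * flag must be re-armed for every target cpu, because the receiving
     * cpu clears it when done -- otherwise only the first cpu is waited for. */
    for_each_cpu_mask(cpu, mask) {
        data->flags = CSD_FLAG_WAIT;
        generic_exec_single(cpu, data);
    }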
Signed-off-by: Ingo Molnar <mingo@elte.hu>