This patch adds missed "__" prefixes, otherwise these functions
works as irq/preemption safe.
Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
It doesn't make sense to trace irq off or do irq flags
lock proving inside 'this_cpu' operations, so replace local_irq_*
with raw_local_irq_* in 'this_cpu' op.
Also the patch fixes onelockdep warning[1] by the replacement, see
below:
In commit: 933393f58fef9963eac61db8093689544e29a600(percpu:
Remove irqsafe_cpu_xxx variants), local_irq_save/restore(flags) are
added inside this_cpu_inc operation, so that trace_hardirqs_off_caller
will be called by trace_hardirqs_on_caller directly because
__debug_atomic_inc is implemented as this_cpu_inc, which may trigger
the lockdep warning[1], for example in the below ARM scenary:
kernel_thread_helper /*irq disabled*/
->trace_hardirqs_on_caller /*hardirqs_enabled was set*/
->trace_hardirqs_off_caller /*hardirqs_enabled cleared*/
__this_cpu_add(redundant_hardirqs_on)
->trace_hardirqs_off_caller /*irq disabled, so call here*/
The 'unannotated irqs-on' warning will be triggered somewhere because
irq is just enabled after the irq trace in kernel_thread_helper.
[1],
[ 0.162841] ------------[ cut here ]------------
[ 0.167694] WARNING: at kernel/lockdep.c:3493 check_flags+0xc0/0x1d0()
[ 0.174468] Modules linked in:
[ 0.177703] Backtrace:
[ 0.180328] [<c00171f0>] (dump_backtrace+0x0/0x110) from [<c0412320>] (dump_stack+0x18/0x1c)
[ 0.189086] r6:c051f778 r5:00000da5 r4:00000000 r3:60000093
[ 0.195007] [<c0412308>] (dump_stack+0x0/0x1c) from [<c00410e8>] (warn_slowpath_common+0x54/0x6c)
[ 0.204223] [<c0041094>] (warn_slowpath_common+0x0/0x6c) from [<c0041124>] (warn_slowpath_null+0x24/0x2c)
[ 0.214111] r8:00000000 r7:00000000 r6:ee069598 r5:60000013 r4:ee082000
[ 0.220825] r3:00000009
[ 0.223693] [<c0041100>] (warn_slowpath_null+0x0/0x2c) from [<c0088f38>] (check_flags+0xc0/0x1d0)
[ 0.232910] [<c0088e78>] (check_flags+0x0/0x1d0) from [<c008d348>] (lock_acquire+0x4c/0x11c)
[ 0.241668] [<c008d2fc>] (lock_acquire+0x0/0x11c) from [<c0415aa4>] (_raw_spin_lock+0x3c/0x74)
[ 0.250610] [<c0415a68>] (_raw_spin_lock+0x0/0x74) from [<c010a844>] (set_task_comm+0x20/0xc0)
[ 0.259521] r6:ee069588 r5:ee0691c0 r4:ee082000
[ 0.264404] [<c010a824>] (set_task_comm+0x0/0xc0) from [<c0060780>] (kthreadd+0x28/0x108)
[ 0.272857] r8:00000000 r7:00000013 r6:c0044a08 r5:ee0691c0 r4:ee082000
[ 0.279571] r3:ee083fe0
[ 0.282470] [<c0060758>] (kthreadd+0x0/0x108) from [<c0044a08>] (do_exit+0x0/0x6dc)
[ 0.290405] r5:c0060758 r4:00000000
[ 0.294189] ---[ end trace 1b75b31a2719ed1c ]---
[ 0.299041] possible reason: unannotated irqs-on.
[ 0.303955] irq event stamp: 5
[ 0.307159] hardirqs last enabled at (4): [<c001331c>] no_work_pending+0x8/0x2c
[ 0.314880] hardirqs last disabled at (5): [<c0089b08>] trace_hardirqs_on_caller+0x60/0x26c
[ 0.323547] softirqs last enabled at (0): [<c003f754>] copy_process+0x33c/0xef4
[ 0.331207] softirqs last disabled at (0): [< (null)>] (null)
[ 0.337585] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This patch adds missed "__" into function prefix.
Otherwise on all archectures (except x86) it expands to irq/preemtion-safe
variant: _this_cpu_generic_add_return(), which do extra irq-save/irq-restore.
Optimal generic implementation is __this_cpu_generic_add_return().
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
We simply say that regular this_cpu use must be safe regardless of
preemption and interrupt state. That has no material change for x86
and s390 implementations of this_cpu operations. However, arches that
do not provide their own implementation for this_cpu operations will
now get code generated that disables interrupts instead of preemption.
-tj: This is part of on-going percpu API cleanup. For detailed
discussion of the subject, please refer to the following thread.
http://thread.gmane.org/gmane.linux.kernel/1222078
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <alpine.DEB.2.00.1112221154380.11787@router.home>
On an architecture without CMPXCHG_LOCAL but with DEBUG_VM enabled,
the VM_BUG_ON() in __pcpu_double_call_return_bool() will cause an early
panic during boot unless we always align cpu_slab properly.
In principle we could remove the alignment-testing VM_BUG_ON() for
architectures that don't have CMPXCHG_LOCAL, but leaving it in means
that new code will tend not to break x86 even if it is introduced
on another platform, and it's low cost to require alignment.
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
The SLUB allocator use of the cmpxchg_double logic was wrong: it
actually needs the irq-safe one.
That happens automatically when we use the native unlocked 'cmpxchg8b'
instruction, but when compiling the kernel for older x86 CPUs that do
not support that instruction, we fall back to the generic emulation
code.
And if you don't specify that you want the irq-safe version, the generic
code ends up just open-coding the cmpxchg8b equivalent without any
protection against interrupts or preemption. Which definitely doesn't
work for SLUB.
This was reported by Werner Landgraf <w.landgraf@ru.ru>, who saw
instability with his distro-kernel that was compiled to support pretty
much everything under the sun. Most big Linux distributions tend to
compile for PPro and later, and would never have noticed this problem.
This also fixes the prototypes for the irqsafe cmpxchg_double functions
to use 'bool' like they should.
[ Btw, that whole "generic code defaults to no protection" design just
sounds stupid - if the code needs no protection, there is no reason to
use "cmpxchg_double" to begin with. So we should probably just remove
the unprotected version entirely as pointless. - Linus ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reported-and-tested-by: werner <w.landgraf@ru.ru>
Acked-and-tested-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1105041539050.3005@ionos
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce this_cpu_cmpxchg_double(). this_cpu_cmpxchg_double() allows
the comparison between two consecutive words and replaces them if
there is a match.
bool this_cpu_cmpxchg_double(pcp1, pcp2,
old_word1, old_word2, new_word1, new_word2)
this_cpu_cmpxchg_double does not return the old value (difficult since
there are two words) but a boolean indicating if the operation was
successful.
The first percpu variable must be double word aligned!
-tj: Updated to return bool instead of int, converted size check to
BUILD_BUG_ON() instead of VM_BUG_ON() and other cosmetic changes.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Generic code to provide new per cpu atomic features
this_cpu_cmpxchg
this_cpu_xchg
Fallback occurs to functions using interrupts disable/enable
to ensure correct per cpu atomicity.
Fallback to regular cmpxchg and xchg is not possible since per cpu atomic
semantics include the guarantee that the current cpus per cpu data is
accessed atomically. Use of regular cmpxchg and xchg requires the
determination of the address of the per cpu data before regular cmpxchg
or xchg which therefore cannot be atomically included in an xchg or
cmpxchg without segment override.
tj: - Relocated new ops to conform better to the general organization.
- This patch contains a trivial comment fix.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
- include/linux/percpu.h: this_cpu_add_return() and friends were
located next to __this_cpu_add_return(). However, the overall
organization is to first group by preemption safeness. Relocate
this_cpu_add_return() and friends to preemption-safe area.
- arch/x86/include/asm/percpu.h: Relocate percpu_add_return_op() after
other more basic operations. Relocate [__]this_cpu_add_return_8()
so that they're first grouped by preemption safeness.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Introduce generic support for this_cpu_add_return etc.
The fallback is to realize these operations with simpler __this_cpu_ops.
tj: - Reformatted __cpu_size_call_return2() to make it more consistent
with its neighbors.
- Dropped unnecessary temp variable ret__ from
__this_cpu_generic_add_return().
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: update comments to reflect that percpu allocations are always zero-filled
percpu: Optimize __get_cpu_var()
x86, percpu: Optimize this_cpu_ptr
percpu: clear memory allocated with the km allocator
percpu: fix build breakage on s390 and cleanup build configuration tests
percpu: use percpu allocator on UP too
percpu: reduce PCPU_MIN_UNIT_SIZE to 32k
vmalloc: pcpu_get/free_vm_areas() aren't needed on UP
Fixed up trivial conflicts in include/linux/percpu.h
On UP, percpu allocations were redirected to kmalloc. This has the
following problems.
* For certain amount of allocations (determined by
PERCPU_DYNAMIC_EARLY_SLOTS and PERCPU_DYNAMIC_EARLY_SIZE), percpu
allocator can be used before the usual kernel memory allocator is
brought online. On SMP, this is used to initialize the kernel
memory allocator.
* percpu allocator honors alignment upto PAGE_SIZE but kmalloc()
doesn't. For example, workqueue makes use of larger alignments for
cpu_workqueues.
Currently, users of percpu allocators need to handle UP differently,
which is somewhat fragile and ugly. Other than small amount of
memory, there isn't much to lose by enabling percpu allocator on UP.
It can simply use kernel memory based chunk allocation which was added
for SMP archs w/o MMUs.
This patch removes mm/percpu_up.c, builds mm/percpu.c on UP too and
makes UP build use percpu-km. As percpu addresses and kernel
addresses are always identity mapped and static percpu variables don't
need any special treatment, nothing is arch dependent and mm/percpu.c
implements generic setup_per_cpu_areas() for UP.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
In preparation of enabling percpu allocator for UP, reduce
PCPU_MIN_UNIT_SIZE to 32k. On UP, the first chunk doesn't have to
include static percpu variables and chunk size can be smaller which is
important as UP percpu allocator will use contiguous kernel memory to
populate chunks.
PCPU_MIN_UNIT_SIZE also determines the maximum supported allocation
size but 32k should still be enough.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
UP accessors didn't take care of __percpu notations leading to a lot
of spurious sparse warnings on UP configurations. Fix it.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This patch updates percpu allocator such that it can serve limited
amount of allocation before slab comes online. This is primarily to
allow slab to depend on working percpu allocator.
Two parameters, PERCPU_DYNAMIC_EARLY_SIZE and SLOTS, determine how
much memory space and allocation map slots are reserved. If this
reserved area is exhausted, WARN_ON_ONCE() will trigger and allocation
will fail till slab comes online.
The following changes are made to implement early alloc.
* pcpu_mem_alloc() now checks slab_is_available()
* Chunks are allocated using pcpu_mem_alloc()
* Init paths make sure ai->dyn_size is at least as large as
PERCPU_DYNAMIC_EARLY_SIZE.
* Initial alloc maps are allocated in __initdata and copied to
kmalloc'd areas once slab is online.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
In pcpu_build_alloc_info() and pcpu_embed_first_chunk(), @dyn_size was
ssize_t, -1 meant auto-size, 0 forced 0 and positive meant minimum
size. There's no use case for forcing 0 and the upcoming early alloc
support always requires non-zero dynamic size. Make @dyn_size always
mean minimum dyn_size.
While at it, make pcpu_build_alloc_info() static which doesn't have
any external caller as suggested by David Rientjes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
* 'slabh' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc:
eeepc-wmi: include slab.h
staging/otus: include slab.h from usbdrv.h
percpu: don't implicitly include slab.h from percpu.h
kmemcheck: Fix build errors due to missing slab.h
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
iwlwifi: don't include iwl-dev.h from iwl-devtrace.h
x86: don't include slab.h from arch/x86/include/asm/pgtable_32.h
Fix up trivial conflicts in include/linux/percpu.h due to
is_kernel_percpu_address() having been introduced since the slab.h
cleanup with the percpu_up.c splitup.
percpu.h has always been including slab.h to get k[mz]alloc/free() for
UP inline implementation. percpu.h being used by very low level
headers including module.h and sched.h, this meant that a lot files
unintentionally got slab.h inclusion.
Lee Schermerhorn was trying to make topology.h use percpu.h and got
bitten by this implicit inclusion. The right thing to do is break
this ultimately unnecessary dependency. The previous patch added
explicit inclusion of either gfp.h or slab.h to the source files using
them. This patch updates percpu.h such that slab.h is no longer
included from percpu.h.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
lockdep has custom code to check whether a pointer belongs to static
percpu area which is somewhat broken. Implement proper
is_kernel/module_percpu_address() and replace the custom code.
On UP, percpu variables are regular static variables and can't be
distinguished from them. Always return %false on UP.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
Commit 3b034b0d08 implemented
per_cpu_ptr_to_phys() but forgot to add UP definition. Add UP
definition which is simple wrapper around __pa().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>