UP accessors didn't take care of __percpu notations leading to a lot
of spurious sparse warnings on UP configurations. Fix it.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This patch updates percpu allocator such that it can serve limited
amount of allocation before slab comes online. This is primarily to
allow slab to depend on working percpu allocator.
Two parameters, PERCPU_DYNAMIC_EARLY_SIZE and SLOTS, determine how
much memory space and allocation map slots are reserved. If this
reserved area is exhausted, WARN_ON_ONCE() will trigger and allocation
will fail till slab comes online.
The following changes are made to implement early alloc.
* pcpu_mem_alloc() now checks slab_is_available()
* Chunks are allocated using pcpu_mem_alloc()
* Init paths make sure ai->dyn_size is at least as large as
PERCPU_DYNAMIC_EARLY_SIZE.
* Initial alloc maps are allocated in __initdata and copied to
kmalloc'd areas once slab is online.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
In pcpu_build_alloc_info() and pcpu_embed_first_chunk(), @dyn_size was
ssize_t, -1 meant auto-size, 0 forced 0 and positive meant minimum
size. There's no use case for forcing 0 and the upcoming early alloc
support always requires non-zero dynamic size. Make @dyn_size always
mean minimum dyn_size.
While at it, make pcpu_build_alloc_info() static which doesn't have
any external caller as suggested by David Rientjes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
* 'slabh' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc:
eeepc-wmi: include slab.h
staging/otus: include slab.h from usbdrv.h
percpu: don't implicitly include slab.h from percpu.h
kmemcheck: Fix build errors due to missing slab.h
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
iwlwifi: don't include iwl-dev.h from iwl-devtrace.h
x86: don't include slab.h from arch/x86/include/asm/pgtable_32.h
Fix up trivial conflicts in include/linux/percpu.h due to
is_kernel_percpu_address() having been introduced since the slab.h
cleanup with the percpu_up.c splitup.
percpu.h has always been including slab.h to get k[mz]alloc/free() for
UP inline implementation. percpu.h being used by very low level
headers including module.h and sched.h, this meant that a lot files
unintentionally got slab.h inclusion.
Lee Schermerhorn was trying to make topology.h use percpu.h and got
bitten by this implicit inclusion. The right thing to do is break
this ultimately unnecessary dependency. The previous patch added
explicit inclusion of either gfp.h or slab.h to the source files using
them. This patch updates percpu.h such that slab.h is no longer
included from percpu.h.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
lockdep has custom code to check whether a pointer belongs to static
percpu area which is somewhat broken. Implement proper
is_kernel/module_percpu_address() and replace the custom code.
On UP, percpu variables are regular static variables and can't be
distinguished from them. Always return %false on UP.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
Commit 3b034b0d08 implemented
per_cpu_ptr_to_phys() but forgot to add UP definition. Add UP
definition which is simple wrapper around __pa().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
o kdump functionality reserves a per cpu area at boot time and exports the
physical address of that area to user space through sys interface. This
area stores some dump related information like cpu register states etc
at the time of crash.
o We were assuming that per cpu area always come from linearly mapped meory
region and using __pa() to determine physical address.
With percpu_alloc=page, per cpu area can come from vmalloc region also and
__pa() breaks.
o This patch implments a new function to convert per cpu address to
physical address.
Before the patch, crash_notes addresses looked as follows.
cpu0 60fffff49800
cpu1 60fffff60800
cpu2 60fffff77800
These are bogus phsyical addresses.
After the patch, address are following.
cpu0 13eb44000
cpu1 13eb43000
cpu2 13eb42000
cpu3 13eb41000
These look fine. I got 4G of memory and /proc/iomem tell me following.
100000000-13fffffff : System RAM
tj: * added missing asm/io.h include reported by Stephen Rothwell
* repositioned per_cpu_ptr_phys() in percpu.c and added comment.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
The previous patch made sparse warn about percpu variables being used
directly without going through percpu accessors. This patch
implements the other half - checking whether non percpu variable is
passed into percpu accessors.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
We have to make __kernel "__attribute__((address_space(0)))" so we can
cast to it.
tj: * put_cpu_var() update.
* Annotations added to dynamic allocator interface.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tejun Heo <tj@kernel.org>
Now that per_cpu__ prefix is gone, there's no distinction between
static and dynamic percpu variables. Make get_cpu_var() take dynamic
percpu variables and ensure that all macros have parentheses around
the parameter evaluation and evaluate the variable parameter only once
such that any expression which evaluates to percpu address can be used
safely.
Signed-off-by: Tejun Heo <tj@kernel.org>
Now that the return from alloc_percpu is compatible with the address
of per-cpu vars, it makes sense to hand around the address of per-cpu
variables. To make this sane, we remove the per_cpu__ prefix we used
created to stop people accidentally using these vars directly.
Now we have sparse, we can use that (next patch).
tj: * Updated to convert stuff which were missed by or added after the
original patch.
* Kill per_cpu_var() macro.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Make the following changes to remove some sparse warnings.
* Make DEFINE_PER_CPU_SECTION() declare __pcpu_unique_* before
defining it.
* Annotate pcpu_extend_area_map() that it is entered with pcpu_lock
held, releases it and then reacquires it.
* Make percpu related macros use unique nested variable names.
* While at it, add pcpu prefix to __size_call[_return]() macros as
to-be-implemented sparse annotations will add percpu specific stuff
to these macros.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
alloc_percpu() couldn't handle array types like "int [100]" due to the
way return type was casted. Fix it by using typeof() instead.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
This patch introduces two things: First this_cpu_ptr and then per cpu
atomic operations.
this_cpu_ptr
------------
A common operation when dealing with cpu data is to get the instance of the
cpu data associated with the currently executing processor. This can be
optimized by
this_cpu_ptr(xx) = per_cpu_ptr(xx, smp_processor_id).
The problem with per_cpu_ptr(x, smp_processor_id) is that it requires
an array lookup to find the offset for the cpu. Processors typically
have the offset for the current cpu area in some kind of (arch dependent)
efficiently accessible register or memory location.
We can use that instead of doing the array lookup to speed up the
determination of the address of the percpu variable. This is particularly
significant because these lookups occur in performance critical paths
of the core kernel. this_cpu_ptr() can avoid memory accesses and
this_cpu_ptr comes in two flavors. The preemption context matters since we
are referring the the currently executing processor. In many cases we must
insure that the processor does not change while a code segment is executed.
__this_cpu_ptr -> Do not check for preemption context
this_cpu_ptr -> Check preemption context
The parameter to these operations is a per cpu pointer. This can be the
address of a statically defined per cpu variable (&per_cpu_var(xxx)) or
the address of a per cpu variable allocated with the per cpu allocator.
per cpu atomic operations: this_cpu_*(var, val)
-----------------------------------------------
this_cpu_* operations (like this_cpu_add(struct->y, value) operate on
abitrary scalars that are members of structures allocated with the new
per cpu allocator. They can also operate on static per_cpu variables
if they are passed to per_cpu_var() (See patch to use this_cpu_*
operations for vm statistics).
These operations are guaranteed to be atomic vs preemption when modifying
the scalar. The calculation of the per cpu offset is also guaranteed to
be atomic at the same time. This means that a this_cpu_* operation can be
safely used to modify a per cpu variable in a context where interrupts are
enabled and preemption is allowed. Many architectures can perform such
a per cpu atomic operation with a single instruction.
Note that the atomicity here is different from regular atomic operations.
Atomicity is only guaranteed for data accessed from the currently executing
processor. Modifications from other processors are still possible. There
must be other guarantees that the per cpu data is not modified from another
processor when using these instruction. The per cpu atomicity is created
by the fact that the processor either executes and instruction or not.
Embedded in the instruction is the relocation of the per cpu address to
the are reserved for the current processor and the RMW action. Therefore
interrupts or preemption cannot occur in the mids of this processing.
Generic fallback functions are used if an arch does not define optimized
this_cpu operations. The functions come also come in the two flavors used
for this_cpu_ptr().
The firstparameter is a scalar that is a member of a structure allocated
through allocpercpu or a per cpu variable (use per_cpu_var(xxx)). The
operations are similar to what percpu_add() and friends do.
this_cpu_read(scalar)
this_cpu_write(scalar, value)
this_cpu_add(scale, value)
this_cpu_sub(scalar, value)
this_cpu_inc(scalar)
this_cpu_dec(scalar)
this_cpu_and(scalar, value)
this_cpu_or(scalar, value)
this_cpu_xor(scalar, value)
Arch code can override the generic functions and provide optimized atomic
per cpu operations. These atomic operations must provide both the relocation
(x86 does it through a segment override) and the operation on the data in a
single instruction. Otherwise preempt needs to be disabled and there is no
gain from providing arch implementations.
A third variant is provided prefixed by irqsafe_. These variants are safe
against hardware interrupts on the *same* processor (all per cpu atomic
primitives are *always* *only* providing safety for code running on the
*same* processor!). The increment needs to be implemented by the hardware
in such a way that it is a single RMW instruction that is either processed
before or after an interrupt.
cc: David Howells <dhowells@redhat.com>
cc: Ingo Molnar <mingo@elte.hu>
cc: Rusty Russell <rusty@rustcorp.com.au>
cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
With x86 converted to embedding allocator, lpage doesn't have any user
left. Kill it along with cpa handling code.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Now that percpu core can handle very sparse units, given that vmalloc
space is large enough, embedding first chunk allocator can use any
memory to build the first chunk. This patch teaches
pcpu_embed_first_chunk() about distances between cpus and to use
alloc/free callbacks to allocate node specific areas for each group
and use them for the first chunk.
This brings the benefits of embedding allocator to NUMA configurations
- no extra TLB pressure with the flexibility of unified dynamic
allocator and no need to restructure arch code to build memory layout
suitable for percpu. With units put into atom_size aligned groups
according to cpu distances, using large page for dynamic chunks is
also easily possible with falling back to reuglar pages if large
allocation fails.
Embedding allocator users are converted to specify NULL
cpu_distance_fn, so this patch doesn't cause any visible behavior
difference. Following patches will convert them.
Signed-off-by: Tejun Heo <tj@kernel.org>
Currently units are mapped sequentially into address space. This
patch adds pcpu_unit_offsets[] which allows units to be mapped to
arbitrary offsets from the chunk base address. This is necessary to
allow sparse embedding which might would need to allocate address
ranges and memory areas which aren't aligned to unit size but
allocation atom size (page or large page size). This also simplifies
things a bit by removing the need to calculate offset from unit
number.
With this change, there's no need for the arch code to know
pcpu_unit_size. Update pcpu_setup_first_chunk() and first chunk
allocators to return regular 0 or -errno return code instead of unit
size or -errno.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Till now, non-linear cpu->unit map was expressed using an integer
array which maps each cpu to a unit and used only by lpage allocator.
Although how many units have been placed in a single contiguos area
(group) is known while building unit_map, the information is lost when
the result is recorded into the unit_map array. For lpage allocator,
as all allocations are done by lpages and whether two adjacent lpages
are in the same group or not is irrelevant, this didn't cause any
problem. Non-linear cpu->unit mapping will be used for sparse
embedding and this grouping information is necessary for that.
This patch introduces pcpu_alloc_info which contains all the
information necessary for initializing percpu allocator.
pcpu_alloc_info contains array of pcpu_group_info which describes how
units are grouped and mapped to cpus. pcpu_group_info also has
base_offset field to specify its offset from the chunk's base address.
pcpu_build_alloc_info() initializes this field as if all groups are
allocated back-to-back as is currently done but this will be used to
sparsely place groups.
pcpu_alloc_info is a rather complex data structure which contains a
flexible array which in turn points to nested cpu_map arrays.
* pcpu_alloc_alloc_info() and pcpu_free_alloc_info() are provided to
help dealing with pcpu_alloc_info.
* pcpu_lpage_build_unit_map() is updated to build pcpu_alloc_info,
generalized and renamed to pcpu_build_alloc_info().
@cpu_distance_fn may be NULL indicating that all cpus are of
LOCAL_DISTANCE.
* pcpul_lpage_dump_cfg() is updated to process pcpu_alloc_info,
generalized and renamed to pcpu_dump_alloc_info(). It now also
prints which group each alloc unit belongs to.
* pcpu_setup_first_chunk() now takes pcpu_alloc_info instead of the
separate parameters. All first chunk allocators are updated to use
pcpu_build_alloc_info() to build alloc_info and call
pcpu_setup_first_chunk() with it. This has the side effect of
packing units for sparse possible cpus. ie. if cpus 0, 2 and 4 are
possible, they'll be assigned unit 0, 1 and 2 instead of 0, 2 and 4.
* x86 setup_pcpu_lpage() is updated to deal with alloc_info.
* sparc64 setup_per_cpu_areas() is updated to build alloc_info.
Although the changes made by this patch are pretty pervasive, it
doesn't cause any behavior difference other than packing of sparse
cpus. It mostly changes how information is passed among
initialization functions and makes room for more flexibility.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Unit map handling will be generalized and extended and used for
embedding sparse first chunk and other purposes. Relocate two
unit_map related functions upward in preparation. This patch just
moves the code without any actual change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Now that all actual first chunk allocation and copying happen in the
first chunk allocators and helpers, there's no reason for
pcpu_setup_first_chunk() to try to determine @dyn_size automatically.
The only left user is page first chunk allocator. Make it determine
dyn_size like other allocators and make @dyn_size mandatory for
pcpu_setup_first_chunk().
Signed-off-by: Tejun Heo <tj@kernel.org>