Now that explicit invocation of percpu_ref_exit() is necessary to free
the percpu counter, we can implement percpu_ref_reinit() which
reinitializes a released percpu_ref. This can be used implement
scalable gating switch which can be drained and then re-opened without
worrying about memory allocation failures.
percpu_ref_is_zero() is added to be used in a sanity check in
percpu_ref_exit(). As this function will be useful for other purposes
too, make it a public interface.
v2: Use smp_read_barrier_depends() instead of smp_load_acquire(). We
only need data dep barrier and smp_load_acquire() is stronger and
heavier on some archs. Spotted by Lai Jiangshan.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Currently, a percpu_ref undoes percpu_ref_init() automatically by
freeing the allocated percpu area when the percpu_ref is killed.
While seemingly convenient, this has the following niggles.
* It's impossible to re-init a released reference counter without
going through re-allocation.
* In the similar vein, it's impossible to initialize a percpu_ref
count with static percpu variables.
* We need and have an explicit destructor anyway for failure paths -
percpu_ref_cancel_init().
This patch removes the automatic percpu counter freeing in
percpu_ref_kill_rcu() and repurposes percpu_ref_cancel_init() into a
generic destructor now named percpu_ref_exit(). percpu_ref_destroy()
is considered but it gets confusing with percpu_ref_kill() while
"exit" clearly indicates that it's the counterpart of
percpu_ref_init().
All percpu_ref_cancel_init() users are updated to invoke
percpu_ref_exit() instead and explicit percpu_ref_exit() calls are
added to the destruction path of all percpu_ref users.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: Li Zefan <lizefan@huawei.com>
percpu_ref->pcpu_count is a percpu pointer with a status flag in its
lowest bit. As such, it always goes through arithmetic operations
which is very cumbersome to do on a pointer. It has to be first
casted to unsigned long and then back.
Let's just make the field unsigned long so that we can skip the first
casts. While at it, rename it to pcpu_counter_ptr to clarify that
it's a pointer value.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
* All four percpu_ref_*() operations implemented in the header file
perform the same operation to determine whether the percpu_ref is
alive and extract the percpu pointer. Factor out the common logic
into __pcpu_ref_alive(). This doesn't change the generated code.
* There are a couple places in percpu-refcount.c which masks out
PCPU_REF_DEAD to obtain the percpu pointer. Factor it out into
pcpu_count_ptr().
* The above changes make the WARN_ON_ONCE() conditional at the top of
percpu_ref_kill_and_confirm() the only user of REF_STATUS(). Test
PCPU_REF_DEAD directly and remove REF_STATUS().
This patch doesn't introduce any functional change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
percpu-refcount currently reserves two lowest bits of its percpu
pointer to indicate its state; however, only one bit is used for
PCPU_REF_DEAD.
Simplify it by removing PCPU_STATUS_BITS/MASK and testing
PCPU_REF_DEAD directly. This also allows the compiler to choose a
more efficient instruction depending on the architecture.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
ioctx_alloc() reaches inside percpu_ref and directly frees
->pcpu_count in its failure path, which is quite gross. percpu_ref
has been providing a proper interface to do this,
percpu_ref_cancel_init(), for quite some time now. Let's use that
instead.
This patch doesn't introduce any behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: Kent Overstreet <kmo@daterainc.com>
After the recent changes, when POOL_DISASSOCIATED is cleared, the
running worker's local CPU should be the same as pool->cpu without any
exception even during cpu-hotplug. Update the sanity check in
process_one_work() accordingly.
This patch changes "(proposition_A && proposition_B && proposition_C)"
to "(proposition_B && proposition_C)", so if the old compound
proposition is true, the new one must be true too. so this will not
hide any possible bug which can be caught by the old test.
tj: Minor updates to the description.
CC: Jason J. Herne <jjherne@linux.vnet.ibm.com>
CC: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The commit a9ab775bca ("workqueue: directly restore CPU affinity of
workers from CPU_ONLINE") moved the pool->lock into rebind_workers()
without also moving "pool->flags &= ~POOL_DISASSOCIATED".
There is nothing wrong with "pool->flags &= ~POOL_DISASSOCIATED" not
being moved together, but there isn't any benefit either. We move it
into rebind_workers() and achieve these benefits:
1) Better readability. POOL_DISASSOCIATED is cleared in
rebind_workers() as expected.
2) When POOL_DISASSOCIATED is cleared, we can ensure that all the
running workers of the pool are on the local CPU (pool->cpu).
tj: Cosmetic updates to the code and description.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
__verify_pcpu_ptr() is used to verify that a specified parameter is
actually an percpu pointer by percpu accessor and operation
implementations. Currently, where it's called isn't clearly defined
and we just ensure that it's invoked at least once for all accessors
and operations.
The lack of clarity on when it should be called isn't nice and given
that this is a completely generic issue, there's no reason to make
archs worry about it.
This patch updates __verify_pcpu_ptr() invocations such that it's
always invoked from the final generic wrapper once per access or
operation. As this is already the case for {raw|this}_cpu_*()
definitions through __pcpu_size_*(), only the {raw|per|this}_cpu_ptr()
accessors need to be updated.
This change makes it unnecessary for archs to worry about
__verify_pcpu_ptr(). x86's arch_raw_cpu_ptr() is updated accordingly.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
percpu macros are difficult to read. It's partly because they're
fairly complex but also because they simply lack visual and
conventional consistency to an unusual degree. The preceding patches
tried to organize macro definitions consistently by their roles. This
patch makes the following cosmetic changes to improve overall
readability.
* Use consistent convention for multi-line macro definitions - "do {"
or "({" are now put on their own lines and the line continuing '\'
are all put on the same column.
* Temp variables used inside macro are consistently given "__" prefix.
* When a macro argument is passed to another macro or a function,
putting extra parenthses around it doesn't help anything. Don't put
them.
* _this_cpu_generic_*() are renamed to this_cpu_generic_*() so that
they're consistent with raw_cpu_generic_*().
* Reorganize raw_cpu_*() and this_cpu_*() definitions so that trivial
wrappers are collected in one place after actual operation
definitions.
* Other misc cleanups including reorganizing comments.
All changes in this patch are cosmetic and cause no functional
difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
__this_cpu_*() operations are the same as raw_cpu_*() operations
except for the added __this_cpu_preempt_check(). Curiously, these
were defined using __pcu_size_call_*() instead of being layered on top
of raw_cpu_*().
Let's layer them so that __this_cpu_*() are defined in terms of
raw_cpu_*(). It's simpler and less error-prone this way.
This patch doesn't introduce any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
* In include/asm-generic/percpu.h, collect {raw|_this}_cpu_generic*()
macros into one place. They were dispersed through
{raw|this}_cpu_*_N() definitions and the visiual inconsistency was
making following the code unnecessarily difficult.
* In include/linux/percpu-defs.h, move __verify_pcpu_ptr() later in
the file so that it's right above accessor definitions where it's
actually used.
This is pure reorganization.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
We're in the process of moving all percpu accessors and operations to
include/linux/percpu-defs.h so that they're available to arch headers
without having to include full include/linux/percpu.h which may cause
cyclic inclusion dependency.
This patch moves {raw|this}_cpu_*() definitions from
include/linux/percpu.h to include/linux/percpu-defs.h. The code is
moved mostly verbatim; however, raw_cpu_*() are placed above
this_cpu_*() which is more conventional as the raw operations may be
used to defined other variants.
This is pure reorganization.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
{raw|this}_cpu_*_N() operations are expected to be provided by archs
and the generic definitions are provided as fallbacks. As such, these
firmly belong to include/asm-generic/percpu.h.
Move the generic definitions to include/asm-generic/percpu.h. The
code is moved mostly verbatim; however, raw_cpu_*_N() are placed above
this_cpu_*_N() which is more conventional as the raw operations may be
used to defined other variants.
This is pure reorganization.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Currently, percpu allows two separate methods for overriding
{raw|this}_cpu_*() ops - for a given operation, an arch can provide
whole replacement or sized sub operations to override specific parts
of it. e.g. arch either can provide this_cpu_add() or
this_cpu_add_4() to override only the 4 byte operation.
While quite flexible on a glance, the dual-overriding scheme
complicates the code path for no actual gain. It compilcates the
already complex operation definitions and if an arch wants to override
all sizes, it can easily provide all variants anyway. In fact, no
arch is actually making use of whole operation override.
Another oddity is that __this_cpu_*() operations are defined in the
same way as raw_cpu_*() but ignores full overrides of the raw_cpu_*()
and doesn't allow full operation override, so if an arch provides
whole overrides for raw_cpu_*() operations __this_cpu_*() ends up
using the generic implementations.
More importantly, it takes away the layering between arch-specific and
generic parts making it impossible for the generic part to implement
arch-independent features on top of arch-specific overrides.
This patch removes the support for whole operation overrides. As no
arch is using it, this doesn't cause any actual difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Reorganize for better readability.
* Accessor definitions are collected into one place and SMP and UP now
define them in the same order.
* Definitions are layered when possible - e.g. per_cpu() is now
defined in terms of this_cpu_ptr().
* Rather pointless comment dropped.
* per_cpu(), __raw_get_cpu_var() and __get_cpu_var() are defined in a
way which can be shared between SMP and UP and moved out of
CONFIG_SMP blocks.
This patch doesn't introduce any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
include/linux/percpu-defs.h is gonna host all accessors and operations
so that arch headers can make use of them too without worrying about
circular dependency through include/linux/percpu.h.
This patch moves the following accessors from include/linux/percpu.h
to include/linux/percpu-defs.h.
* get/put_cpu_var()
* get/put_cpu_ptr()
* per_cpu_ptr()
This is pure reorgniazation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
The roles of the various percpu header files has become unclear.
There are four header files involved.
include/linux/percpu-defs.h
include/linux/percpu.h
include/asm-generic/percpu.h
arch/*/include/asm/percpu.h
The original intention for include/asm-generic/percpu.h is providing
generic definitions for arch-overridable parts; however, it now hosts
various stuff which can't be overridden by archs.
Also, include/linux/percpu-defs.h was initially added to contain
section and percpu variable definition macros so that arch header
files can make use of them without worrying about introducing cyclic
inclusion dependency by including include/linux/percpu.h; however,
arch headers sometimes need to access percpu variables too and this is
one of the reasons why some accessors were implemented in
include/linux/asm-generic/percpu.h.
Let's clear up the situation by making include/asm-generic/percpu.h
contain only arch-overridable parts and moving accessors and
operations into include/linux/percpu-defs. Note that this patch only
moves things from include/asm-generic/percpu.h.
include/linux/percpu.h will be taken care of by later patches.
This patch moves the followings.
* SHIFT_PERCPU_PTR() / VERIFY_PERCPU_PTR()
* per_cpu()
* raw_cpu_ptr()
* this_cpu_ptr()
* __get_cpu_var()
* __raw_get_cpu_var()
* __this_cpu_ptr()
* PER_CPU_[SHARED_]ALIGNED_SECTION
* PER_CPU_[SHARED_]ALIGNED_SECTION
* PER_CPU_FIRST_SECTION
This patch is pure reorganization.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Currently, archs can override raw_cpu_ptr() directly; however, we
wanna build a layer of indirection in the generic part of percpu so
that we can implement generic features there without affecting archs.
Introduce arch_raw_cpu_ptr() which is used to define raw_cpu_ptr() by
generic percpu code. The two are identical for now. x86 is currently
the only arch which overrides raw_cpu_ptr() and is converted to
define arch_raw_cpu_ptr() instead.
This doesn't introduce any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
It has been about half a decade since all archs started using the
dynamic percpu allocator and thus the same SHIFT_PERCPU_PTR()
implementation. There's no benefit in overriding SHIFT_PERCPU_PTR()
anymore.
Remove #ifndef around it to clarify that this is identical regardless
of the arch.
This patch doesn't cause any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
This fixes use-after-free of epi->fllink.next inside list loop macro.
This loop actually releases elements in the body. The list is
rcu-protected but here we cannot hold rcu_read_lock because we need to
lock mutex inside.
The obvious solution is to use list_for_each_entry_safe(). RCU-ness
isn't essential because nobody can change this list under us, it's final
fput for this file.
The bug was introduced by ae10b2b4eb ("epoll: optimize EPOLL_CTL_DEL
using rcu")
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Stable <stable@vger.kernel.org> # 3.13+
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit e1edf18b20.
This patch was a misguided attempt at fixing offb for LE ppc64
kernels on BE qemu but is just wrong ... it breaks real LE/LE
setups, LE with real HW, and existing mixed endian systems
that did the fight thing with the appropriate device-tree
property. Bad reviewing on my part, sorry.
The right fix is to either make qemu change its endian when
the guest changes endian (working on that) or to use the
existing foreign endian support.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org> [v3.13+]
---
Pull networking fixes from David Miller:
1) Fix checksumming regressions, from Tom Herbert.
2) Undo unintentional permissions changes for SCTP rto_alpha and
rto_beta sysfs knobs, from Denial Borkmann.
3) VXLAN, like other IP tunnels, should advertize it's encapsulation
size using dev->needed_headroom instead of dev->hard_header_len.
From Cong Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: sctp: fix permissions for rto_alpha and rto_beta knobs
vxlan: Checksum fixes
net: add skb_pop_rcv_encapsulation
udp: call __skb_checksum_complete when doing full checksum
net: Fix save software checksum complete
net: Fix GSO constants to match NETIF flags
udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup
vxlan: use dev->needed_headroom instead of dev->hard_header_len
MAINTAINERS: update cxgb4 maintainer