Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu

Conflicts:
	arch/x86/kernel/ptrace.c

Pull the latest RCU tree from Paul E. McKenney:

"       The major features of this series are:

  1.	A first version of no-callbacks CPUs.  This version prohibits
  	offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y.
  	Relaxing this constraint is in progress, but not yet ready
  	for prime time.  These commits were posted to LKML at
  	https://lkml.org/lkml/2012/10/30/724, and are at branch rcu/nocb.

  2.	Changes to SRCU that allows statically initialized srcu_struct
  	structures.  These commits were posted to LKML at
  	https://lkml.org/lkml/2012/10/30/296, and are at branch rcu/srcu.

  3.	Restructuring of RCU's debugfs output.  These commits were posted
  	to LKML at https://lkml.org/lkml/2012/10/30/341, and are at
  	branch rcu/tracing.

  4.	Additional CPU-hotplug/RCU improvements, posted to LKML at
  	https://lkml.org/lkml/2012/10/30/327, and are at branch rcu/hotplug.
  	Note that the commit eliminating __stop_machine() was judged to
  	be too-high of risk, so is deferred to 3.9.

  5.	Changes to RCU's idle interface, most notably a new module
  	parameter that redirects normal grace-period operations to
  	their expedited equivalents.  These were posted to LKML at
  	https://lkml.org/lkml/2012/10/30/739, and are at branch rcu/idle.

  6.	Additional diagnostics for RCU's CPU stall warning facility,
  	posted to LKML at https://lkml.org/lkml/2012/10/30/315, and
  	are at branch rcu/stall.  The most notable change reduces the
  	default RCU CPU stall-warning time from 60 seconds to 21 seconds,
  	so that it once again happens sooner than the softlockup timeout.

  7.	Documentation updates, which were posted to LKML at
  	https://lkml.org/lkml/2012/10/30/280, and are at branch rcu/doc.
  	A couple of late-breaking changes were posted at
  	https://lkml.org/lkml/2012/11/16/634 and
  	https://lkml.org/lkml/2012/11/16/547.

  8.	Miscellaneous fixes, which were posted to LKML at
  	https://lkml.org/lkml/2012/10/30/309, along with a late-breaking
  	change posted at Fri, 16 Nov 2012 11:26:25 -0800 with message-ID
  	<20121116192625.GA447@linux.vnet.ibm.com>, but which lkml.org
  	seems to have missed.  These are at branch rcu/fixes.

  9.	Finally, a fix for an lockdep-RCU splat was posted to LKML
  	at https://lkml.org/lkml/2012/11/7/486.  This is at rcu/next. "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
This commit is contained in:
Ingo Molnar
2012-12-03 06:27:05 +01:00
39 changed files with 1484 additions and 619 deletions
+1 -1
View File
@@ -186,7 +186,7 @@ Bibtex Entries
@article{Kung80
,author="H. T. Kung and Q. Lehman"
,title="Concurrent Maintenance of Binary Search Trees"
,title="Concurrent Manipulation of Binary Search Trees"
,Year="1980"
,Month="September"
,journal="ACM Transactions on Database Systems"
+8 -9
View File
@@ -271,15 +271,14 @@ over a rather long period of time, but improvements are always welcome!
The same cautions apply to call_rcu_bh() and call_rcu_sched().
9. All RCU list-traversal primitives, which include
rcu_dereference(), list_for_each_entry_rcu(),
list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
must be either within an RCU read-side critical section or
must be protected by appropriate update-side locks. RCU
read-side critical sections are delimited by rcu_read_lock()
and rcu_read_unlock(), or by similar primitives such as
rcu_read_lock_bh() and rcu_read_unlock_bh(), in which case
the matching rcu_dereference() primitive must be used in order
to keep lockdep happy, in this case, rcu_dereference_bh().
rcu_dereference(), list_for_each_entry_rcu(), and
list_for_each_safe_rcu(), must be either within an RCU read-side
critical section or must be protected by appropriate update-side
locks. RCU read-side critical sections are delimited by
rcu_read_lock() and rcu_read_unlock(), or by similar primitives
such as rcu_read_lock_bh() and rcu_read_unlock_bh(), in which
case the matching rcu_dereference() primitive must be used in
order to keep lockdep happy, in this case, rcu_dereference_bh().
The reason that it is permissible to use RCU list-traversal
primitives when the update-side lock is held is that doing so
+1 -1
View File
@@ -205,7 +205,7 @@ RCU ("read-copy update") its name. The RCU code is as follows:
audit_copy_rule(&ne->rule, &e->rule);
ne->rule.action = newaction;
ne->rule.file_count = newfield_count;
list_replace_rcu(e, ne);
list_replace_rcu(&e->list, &ne->list);
call_rcu(&e->rcu, audit_free_rule);
return 0;
}
+59 -2
View File
@@ -20,7 +20,7 @@ release_referenced() delete()
{ {
... write_lock(&list_lock);
atomic_dec(&el->rc, relfunc) ...
... delete_element
... remove_element
} write_unlock(&list_lock);
...
if (atomic_dec_and_test(&el->rc))
@@ -52,7 +52,7 @@ release_referenced() delete()
{ {
... spin_lock(&list_lock);
if (atomic_dec_and_test(&el->rc)) ...
call_rcu(&el->head, el_free); delete_element
call_rcu(&el->head, el_free); remove_element
... spin_unlock(&list_lock);
} ...
if (atomic_dec_and_test(&el->rc))
@@ -64,3 +64,60 @@ Sometimes, a reference to the element needs to be obtained in the
update (write) stream. In such cases, atomic_inc_not_zero() might be
overkill, since we hold the update-side spinlock. One might instead
use atomic_inc() in such cases.
It is not always convenient to deal with "FAIL" in the
search_and_reference() code path. In such cases, the
atomic_dec_and_test() may be moved from delete() to el_free()
as follows:
1. 2.
add() search_and_reference()
{ {
alloc_object rcu_read_lock();
... search_for_element
atomic_set(&el->rc, 1); atomic_inc(&el->rc);
spin_lock(&list_lock); ...
add_element rcu_read_unlock();
... }
spin_unlock(&list_lock); 4.
} delete()
3. {
release_referenced() spin_lock(&list_lock);
{ ...
... remove_element
if (atomic_dec_and_test(&el->rc)) spin_unlock(&list_lock);
kfree(el); ...
... call_rcu(&el->head, el_free);
} ...
5. }
void el_free(struct rcu_head *rhp)
{
release_referenced();
}
The key point is that the initial reference added by add() is not removed
until after a grace period has elapsed following removal. This means that
search_and_reference() cannot find this element, which means that the value
of el->rc cannot increase. Thus, once it reaches zero, there are no
readers that can or ever will be able to reference the element. The
element can therefore safely be freed. This in turn guarantees that if
any reader finds the element, that reader may safely acquire a reference
without checking the value of the reference counter.
In cases where delete() can sleep, synchronize_rcu() can be called from
delete(), so that el_free() can be subsumed into delete as follows:
4.
delete()
{
spin_lock(&list_lock);
...
remove_element
spin_unlock(&list_lock);
...
synchronize_rcu();
if (atomic_dec_and_test(&el->rc))
kfree(el);
...
}
File diff suppressed because it is too large Load Diff
+12 -5
View File
@@ -499,6 +499,8 @@ The foo_reclaim() function might appear as follows:
{
struct foo *fp = container_of(rp, struct foo, rcu);
foo_cleanup(fp->a);
kfree(fp);
}
@@ -521,6 +523,12 @@ o Use call_rcu() -after- removing a data element from an
read-side critical sections that might be referencing that
data item.
If the callback for call_rcu() is not doing anything more than calling
kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
to avoid having to write your own callback:
kfree_rcu(old_fp, rcu);
Again, see checklist.txt for additional rules governing the use of RCU.
@@ -773,8 +781,8 @@ a single atomic update, converting to RCU will require special care.
Also, the presence of synchronize_rcu() means that the RCU version of
delete() can now block. If this is a problem, there is a callback-based
mechanism that never blocks, namely call_rcu(), that can be used in
place of synchronize_rcu().
mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can
be used in place of synchronize_rcu().
7. FULL LIST OF RCU APIs
@@ -789,9 +797,7 @@ RCU list traversal:
list_for_each_entry_rcu
hlist_for_each_entry_rcu
hlist_nulls_for_each_entry_rcu
list_for_each_continue_rcu (to be deprecated in favor of new
list_for_each_entry_continue_rcu)
list_for_each_entry_continue_rcu
RCU pointer/list update:
@@ -813,6 +819,7 @@ RCU: Critical sections Grace period Barrier
rcu_read_unlock synchronize_rcu
rcu_dereference synchronize_rcu_expedited
call_rcu
kfree_rcu
bh: Critical sections Grace period Barrier
+21
View File
@@ -2394,6 +2394,27 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
ramdisk_size= [RAM] Sizes of RAM disks in kilobytes
See Documentation/blockdev/ramdisk.txt.
rcu_nocbs= [KNL,BOOT]
In kernels built with CONFIG_RCU_NOCB_CPU=y, set
the specified list of CPUs to be no-callback CPUs.
Invocation of these CPUs' RCU callbacks will
be offloaded to "rcuoN" kthreads created for
that purpose. This reduces OS jitter on the
offloaded CPUs, which can be useful for HPC and
real-time workloads. It can also improve energy
efficiency for asymmetric multiprocessors.
rcu_nocbs_poll [KNL,BOOT]
Rather than requiring that offloaded CPUs
(specified by rcu_nocbs= above) explicitly
awaken the corresponding "rcuoN" kthreads,
make these kthreads poll for callbacks.
This improves the real-time response for the
offloaded CPUs by relieving them of the need to
wake up the corresponding kthread, but degrades
energy efficiency by requiring that the kthreads
periodically wake up to do the polling.
rcutree.blimit= [KNL,BOOT]
Set maximum number of finished RCU callbacks to process
in one batch.
+5 -4
View File
@@ -251,12 +251,13 @@ And there are a number of things that _must_ or _must_not_ be assumed:
And for:
*A = X; Y = *A;
*A = X; *(A + 4) = Y;
we may get either of:
we may get any of:
STORE *A = X; Y = LOAD *A;
STORE *A = Y = X;
STORE *A = X; STORE *(A + 4) = Y;
STORE *(A + 4) = Y; STORE *A = X;
STORE {*A, *(A + 4) } = {X, Y};
=========================
+8 -7
View File
@@ -300,15 +300,16 @@ config SECCOMP_FILTER
See Documentation/prctl/seccomp_filter.txt for details.
config HAVE_RCU_USER_QS
config HAVE_CONTEXT_TRACKING
bool
help
Provide kernel entry/exit hooks necessary for userspace
RCU extended quiescent state. Syscalls need to be wrapped inside
rcu_user_exit()-rcu_user_enter() through the slow path using
TIF_NOHZ flag. Exceptions handlers must be wrapped as well. Irqs
are already protected inside rcu_irq_enter/rcu_irq_exit() but
preemption or signal handling on irq exit still need to be protected.
Provide kernel/user boundaries probes necessary for subsystems
that need it, such as userspace RCU extended quiescent state.
Syscalls need to be wrapped inside user_exit()-user_enter() through
the slow path using TIF_NOHZ flag. Exceptions handlers must be
wrapped as well. Irqs are already protected inside
rcu_irq_enter/rcu_irq_exit() but preemption or signal handling on
irq exit still need to be protected.
config HAVE_VIRT_CPU_ACCOUNTING
bool
+1 -1
View File
@@ -648,7 +648,7 @@ static void stack_proc(void *arg)
struct task_struct *from = current, *to = arg;
to->thread.saved_task = from;
rcu_switch(from, to);
rcu_user_hooks_switch(from, to);
switch_to(from, to, from);
}
+1 -1
View File
@@ -106,7 +106,7 @@ config X86
select KTIME_SCALAR if X86_32
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
select HAVE_RCU_USER_QS if X86_64
select HAVE_CONTEXT_TRACKING if X86_64
select HAVE_IRQ_TIME_ACCOUNTING
select GENERIC_KERNEL_THREAD
select GENERIC_KERNEL_EXECVE
@@ -1,27 +1,26 @@
#ifndef _ASM_X86_RCU_H
#define _ASM_X86_RCU_H
#ifndef _ASM_X86_CONTEXT_TRACKING_H
#define _ASM_X86_CONTEXT_TRACKING_H
#ifndef __ASSEMBLY__
#include <linux/rcupdate.h>
#include <linux/context_tracking.h>
#include <asm/ptrace.h>
static inline void exception_enter(struct pt_regs *regs)
{
rcu_user_exit();
user_exit();
}
static inline void exception_exit(struct pt_regs *regs)
{
#ifdef CONFIG_RCU_USER_QS
#ifdef CONFIG_CONTEXT_TRACKING
if (user_mode(regs))
rcu_user_enter();
user_enter();
#endif
}
#else /* __ASSEMBLY__ */
#ifdef CONFIG_RCU_USER_QS
#ifdef CONFIG_CONTEXT_TRACKING
# define SCHEDULE_USER call schedule_user
#else
# define SCHEDULE_USER call schedule
+1 -1
View File
@@ -56,7 +56,7 @@
#include <asm/ftrace.h>
#include <asm/percpu.h>
#include <asm/asm.h>
#include <asm/rcu.h>
#include <asm/context_tracking.h>
#include <asm/smap.h>
#include <linux/err.h>
+4 -3
View File
@@ -23,6 +23,7 @@
#include <linux/hw_breakpoint.h>
#include <linux/rcupdate.h>
#include <linux/module.h>
#include <linux/context_tracking.h>
#include <asm/uaccess.h>
#include <asm/pgtable.h>
@@ -1491,7 +1492,7 @@ long syscall_trace_enter(struct pt_regs *regs)
{
long ret = 0;
rcu_user_exit();
user_exit();
/*
* If we stepped into a sysenter/syscall insn, it trapped in
@@ -1546,7 +1547,7 @@ void syscall_trace_leave(struct pt_regs *regs)
* or do_notify_resume(), in which case we can be in RCU
* user mode.
*/
rcu_user_exit();
user_exit();
audit_syscall_exit(regs);
@@ -1564,5 +1565,5 @@ void syscall_trace_leave(struct pt_regs *regs)
if (step || test_thread_flag(TIF_SYSCALL_TRACE))
tracehook_report_syscall_exit(regs, step);
rcu_user_enter();
user_enter();
}
+3 -2
View File
@@ -22,6 +22,7 @@
#include <linux/uaccess.h>
#include <linux/user-return-notifier.h>
#include <linux/uprobes.h>
#include <linux/context_tracking.h>
#include <asm/processor.h>
#include <asm/ucontext.h>
@@ -816,7 +817,7 @@ static void do_signal(struct pt_regs *regs)
void
do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
{
rcu_user_exit();
user_exit();
#ifdef CONFIG_X86_MCE
/* notify userspace of pending MCEs */
@@ -838,7 +839,7 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
if (thread_info_flags & _TIF_USER_RETURN_NOTIFY)
fire_user_return_notifiers();
rcu_user_enter();
user_enter();
}
void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
+1 -1
View File
@@ -55,7 +55,7 @@
#include <asm/i387.h>
#include <asm/fpu-internal.h>
#include <asm/mce.h>
#include <asm/rcu.h>
#include <asm/context_tracking.h>
#include <asm/mach_traps.h>
+1 -1
View File
@@ -18,7 +18,7 @@
#include <asm/pgalloc.h> /* pgd_*(), ... */
#include <asm/kmemcheck.h> /* kmemcheck_*(), ... */
#include <asm/fixmap.h> /* VSYSCALL_START */
#include <asm/rcu.h> /* exception_enter(), ... */
#include <asm/context_tracking.h> /* exception_enter(), ... */
/*
* Page fault error code bits:
+18
View File
@@ -0,0 +1,18 @@
#ifndef _LINUX_CONTEXT_TRACKING_H
#define _LINUX_CONTEXT_TRACKING_H
#ifdef CONFIG_CONTEXT_TRACKING
#include <linux/sched.h>
extern void user_enter(void);
extern void user_exit(void);
extern void context_tracking_task_switch(struct task_struct *prev,
struct task_struct *next);
#else
static inline void user_enter(void) { }
static inline void user_exit(void) { }
static inline void context_tracking_task_switch(struct task_struct *prev,
struct task_struct *next) { }
#endif /* !CONFIG_CONTEXT_TRACKING */
#endif
-17
View File
@@ -286,23 +286,6 @@ static inline void list_splice_init_rcu(struct list_head *list,
&pos->member != (head); \
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
/**
* list_for_each_continue_rcu
* @pos: the &struct list_head to use as a loop cursor.
* @head: the head for your list.
*
* Iterate over an rcu-protected list, continuing after current point.
*
* This list-traversal primitive may safely run concurrently with
* the _rcu list-mutation primitives such as list_add_rcu()
* as long as the traversal is guarded by rcu_read_lock().
*/
#define list_for_each_continue_rcu(pos, head) \
for ((pos) = rcu_dereference_raw(list_next_rcu(pos)); \
(pos) != (head); \
(pos) = rcu_dereference_raw(list_next_rcu(pos)))
/**
* list_for_each_entry_continue_rcu - continue iteration over list of given type
* @pos: the type * to use as a loop cursor.
+27 -2
View File
@@ -90,6 +90,25 @@ extern void do_trace_rcu_torture_read(char *rcutorturename,
* that started after call_rcu() was invoked. RCU read-side critical
* sections are delimited by rcu_read_lock() and rcu_read_unlock(),
* and may be nested.
*
* Note that all CPUs must agree that the grace period extended beyond
* all pre-existing RCU read-side critical section. On systems with more
* than one CPU, this means that when "func()" is invoked, each CPU is
* guaranteed to have executed a full memory barrier since the end of its
* last RCU read-side critical section whose beginning preceded the call
* to call_rcu(). It also means that each CPU executing an RCU read-side
* critical section that continues beyond the start of "func()" must have
* executed a memory barrier after the call_rcu() but before the beginning
* of that RCU read-side critical section. Note that these guarantees
* include CPUs that are offline, idle, or executing in user mode, as
* well as CPUs that are executing in the kernel.
*
* Furthermore, if CPU A invoked call_rcu() and CPU B invoked the
* resulting RCU callback function "func()", then both CPU A and CPU B are
* guaranteed to execute a full memory barrier during the time interval
* between the call to call_rcu() and the invocation of "func()" -- even
* if CPU A and CPU B are the same CPU (but again only if the system has
* more than one CPU).
*/
extern void call_rcu(struct rcu_head *head,
void (*func)(struct rcu_head *head));
@@ -118,6 +137,9 @@ extern void call_rcu(struct rcu_head *head,
* OR
* - rcu_read_lock_bh() and rcu_read_unlock_bh(), if in process context.
* These may be nested.
*
* See the description of call_rcu() for more detailed information on
* memory ordering guarantees.
*/
extern void call_rcu_bh(struct rcu_head *head,
void (*func)(struct rcu_head *head));
@@ -137,6 +159,9 @@ extern void call_rcu_bh(struct rcu_head *head,
* OR
* anything that disables preemption.
* These may be nested.
*
* See the description of call_rcu() for more detailed information on
* memory ordering guarantees.
*/
extern void call_rcu_sched(struct rcu_head *head,
void (*func)(struct rcu_head *rcu));
@@ -197,13 +222,13 @@ extern void rcu_user_enter(void);
extern void rcu_user_exit(void);
extern void rcu_user_enter_after_irq(void);
extern void rcu_user_exit_after_irq(void);
extern void rcu_user_hooks_switch(struct task_struct *prev,
struct task_struct *next);
#else
static inline void rcu_user_enter(void) { }
static inline void rcu_user_exit(void) { }
static inline void rcu_user_enter_after_irq(void) { }
static inline void rcu_user_exit_after_irq(void) { }
static inline void rcu_user_hooks_switch(struct task_struct *prev,
struct task_struct *next) { }
#endif /* CONFIG_RCU_USER_QS */
extern void exit_rcu(void);

Some files were not shown because too many files have changed in this diff Show More