Because RCU callbacks are now associated with the number of the grace
period that they must wait for, CPUs can now advance callbacks
corresponding to grace periods that ended while a given CPU was in
dyntick-idle mode. This eliminates the need to try forcing the RCU
state machine while entering idle, thus reducing the CPU intensiveness
of RCU_FAST_NO_HZ, which should increase its energy efficiency.
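The following is a minimal userspace sketch of grace-period-tagged callback advancement. It is not RCU's implementation: `struct toy_cb` and `toy_advance_cbs()` are hypothetical names. The point it illustrates is that once each callback records the grace period it waits for, a CPU leaving dyntick-idle mode can mark ready every callback whose grace period has already completed. It does this with a single comparison, however many grace periods elapsed while the CPU was idle.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical callback record: each callback remembers the number of the
 * grace period it must wait for. */
struct toy_cb {
	unsigned long gp_num;   /* grace period that must complete first */
	int ready;              /* set once that grace period has ended */
};

/* Wrap-safe "a >= b" for counters that may overflow, in the style of the
 * kernel's ULONG_CMP_GE(). */
static int toy_gp_done(unsigned long completed, unsigned long gp_num)
{
	return ULONG_MAX / 2 >= completed - gp_num;
}

/* Mark ready every callback whose grace period has completed.
 * Returns the number of callbacks newly marked ready. */
static size_t toy_advance_cbs(struct toy_cb *cbs, size_t n,
			      unsigned long completed)
{
	size_t advanced = 0;

	for (size_t i = 0; i < n; i++) {
		if (!cbs[i].ready && toy_gp_done(completed, cbs[i].gp_num)) {
			cbs[i].ready = 1;
			advanced++;
		}
	}
	return advanced;
}
```

For example, suppose callbacks are tagged with grace periods 5, 6, and 8, and the last completed grace period is 6. Then the first two are advanced and the third keeps waiting.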
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
doctorture.2013.01.11a: Changes to rcutorture and to RCU documentation.
fixes.2013.01.26a: Miscellaneous fixes.
tagcb.2013.01.24a: Tag RCU callbacks with grace-period number to
simplify callback advancement.
tiny.2013.01.29b: Enhancements to uniprocessor handling in tiny RCU.
Currently, rcutorture traces every read-side access. This can be
problematic because even a two-minute rcutorture run on a two-CPU system
can generate 28,853,363 reads. Normally, only a failing read is of
interest, so this commit adjusts rcutorture's tracing to trace only
failing reads. The resulting event tracing records the time
and the ->completed value captured at the beginning of the RCU read-side
critical section, allowing correlation with other event-tracing messages.
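The sketch below illustrates the "trace only failures" policy. It assumes a hypothetical bounded trace buffer (`struct toy_trace`) and a hypothetical hook `toy_trace_read()`, neither of which is the actual rcutorture or event-tracing API. The reader snapshots a timestamp and the `->completed` value when its critical section begins. The snapshot is only written out if the read turns out to have failed, so successful reads cost just the check.

```c
#include <stddef.h>

/* Hypothetical trace record: time and ->completed snapshot taken at the
 * start of the read-side critical section. */
struct toy_read_event {
	unsigned long ts;
	unsigned long completed;
};

/* Hypothetical fixed-size trace buffer. */
struct toy_trace {
	struct toy_read_event ev[16];
	size_t n;
};

/* Called at the end of each simulated read with the values captured at its
 * start. Only failing reads are recorded. Returns 1 if an event was logged. */
static int toy_trace_read(struct toy_trace *t, int failed,
			  unsigned long ts, unsigned long completed)
{
	if (!failed)
		return 0;   /* the common case: nothing is traced */
	if (t->n >= sizeof(t->ev) / sizeof(t->ev[0]))
		return 0;   /* buffer full: drop the event */
	t->ev[t->n].ts = ts;
	t->ev[t->n].completed = completed;
	t->n++;
	return 1;
}
```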
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
[ paulmck: Add fix to build problem located by Randy Dunlap based on
diagnosis by Steven Rostedt. ]
Create a new subsystem that probes on kernel boundaries
to keep track of the transitions between context levels,
with two basic initial contexts: user or kernel.
This is an abstraction of some RCU code that uses such tracking
to implement its userspace extended quiescent state.
We need to pull this up from RCU into this new level of indirection
because this tracking is also going to be used to implement "on
demand" generic virtual cputime accounting, a necessary step toward
shutting down the tick while still accounting the cputime.
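Here is a minimal sketch of the two-state tracking described above. The names `toy_ctx`, `toy_user_enter()`, and `toy_user_exit()` are hypothetical and do not reproduce the new subsystem's code. Probes on the kernel/user boundaries update a per-CPU state, and redundant calls are ignored. Clients such as RCU's extended quiescent state, or cputime accounting, would hook the actual transitions.

```c
/* Hypothetical per-CPU context state with two initial contexts. */
enum toy_ctx_state { TOY_CTX_KERNEL, TOY_CTX_USER };

struct toy_ctx {
	enum toy_ctx_state state;
	unsigned long transitions;  /* boundary crossings actually observed */
};

/* Probe on the kernel->user boundary (e.g. return to userspace). */
static void toy_user_enter(struct toy_ctx *c)
{
	if (c->state != TOY_CTX_USER) {
		c->state = TOY_CTX_USER;
		c->transitions++;
		/* A client such as RCU would enter its extended QS here. */
	}
}

/* Probe on the user->kernel boundary (syscall, exception, interrupt). */
static void toy_user_exit(struct toy_ctx *c)
{
	if (c->state != TOY_CTX_KERNEL) {
		c->state = TOY_CTX_KERNEL;
		c->transitions++;
		/* ...and leave it here; cputime accounting could hook in too. */
	}
}
```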
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
[ paulmck: fix whitespace error and email address. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
urgent.2012.10.27a: Fix for RCU user-mode transition (already in -tip).
doc.2012.11.08a: Documentation updates, most notably codifying the
memory-barrier guarantees inherent to grace periods.
fixes.2012.11.13a: Miscellaneous fixes.
srcu.2012.10.27a: Allow statically allocated and initialized srcu_struct
structures (courtesy of Lai Jiangshan).
stall.2012.11.13a: Add more diagnostic information to RCU CPU stall
warnings, and decrease the stall timeout from 60 seconds to 21 seconds.
hotplug.2012.11.08a: Minor updates to CPU hotplug handling.
tracing.2012.11.08a: Improved debugfs tracing, courtesy of Michael Wang.
idle.2012.10.24a: Updates to RCU idle/adaptive-idle handling, including
a boot parameter that maps normal grace periods to expedited.
Resolved conflict in kernel/rcutree.c due to side-by-side change.
This commit explicitly states the memory-ordering properties of the
RCU grace-period primitives. Although these properties were in some
sense implied by the fundamental property of RCU ("a grace period must
wait for all pre-existing RCU read-side critical sections to complete"),
stating them explicitly will be a great labor-saving device.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
In some cases, it is necessary to enter or exit userspace-RCU-idle mode
from an interrupt handler, for example, if some other CPU sends this
CPU a resched IPI. In this case, the current CPU would enter the IPI
handler in userspace-RCU-idle mode, but would need to exit the IPI handler
after having exited that mode.
To allow this to work, this commit adds two new APIs to TREE_RCU:
- rcu_user_enter_after_irq(). This must be called from an interrupt between
rcu_irq_enter() and rcu_irq_exit(). After the irq calls rcu_irq_exit(),
the irq handler will return into an RCU extended quiescent state.
In theory, this interrupt is never a nested interrupt, but in practice
it might interrupt softirq, which looks to RCU like a nested interrupt.
- rcu_user_exit_after_irq(). This must be called from a non-nesting
interrupt, interrupting an RCU extended quiescent state, also
between rcu_irq_enter() and rcu_irq_exit(). After the irq calls
rcu_irq_exit(), the irq handler will return into an RCU non-quiescent
state.
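The two new APIs can be understood with a toy nesting-counter model. The struct, the helpers, and the `TOY_TASK_NEST` constant are all hypothetical simplifications, not TREE_RCU's code. A counter of zero means the CPU is in an RCU extended quiescent state. Process level contributes `TOY_TASK_NEST` and each interrupt level contributes 1. "Enter after irq" discards the process-level share, so the matching irq exit lands in the quiescent state. "Exit after irq" adds that share back, so the irq exit returns non-quiescent. Nested interrupts are not modeled.

```c
/* Toy model: nesting == 0 means an RCU extended quiescent state (EQS). */
#define TOY_TASK_NEST 0x100L   /* hypothetical process-level contribution */

struct toy_rcu_cpu {
	long nesting;
};

static int toy_in_eqs(const struct toy_rcu_cpu *r) { return r->nesting == 0; }

static void toy_irq_enter(struct toy_rcu_cpu *r) { r->nesting++; }
static void toy_irq_exit(struct toy_rcu_cpu *r)  { r->nesting--; }

/* Called between toy_irq_enter() and toy_irq_exit() while non-idle:
 * keep only the current interrupt's contribution, so the matching
 * toy_irq_exit() returns into an EQS. */
static void toy_user_enter_after_irq(struct toy_rcu_cpu *r)
{
	r->nesting = 1;
}

/* Called from an interrupt that arrived during an EQS: add the
 * process-level contribution so toy_irq_exit() returns non-quiescent. */
static void toy_user_exit_after_irq(struct toy_rcu_cpu *r)
{
	r->nesting += TOY_TASK_NEST;
}
```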
[ Combined with "Allow calls to rcu_exit_user_irq from nesting irqs." ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
RCU currently insists that only idle tasks can enter RCU idle mode, which
prohibits an adaptive tickless kernel (AKA nohz cpusets), which in turn
would mean that usermode execution would always take scheduling-clock
interrupts, even when there is only one task runnable on the CPU in
question.
This commit therefore adds rcu_user_enter() and rcu_user_exit(), which
allow non-idle tasks to enter RCU idle mode. These are quite similar
to rcu_idle_enter() and rcu_idle_exit(), respectively, except that they
omit the idle-task checks.
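As a rough sketch of "a user flag rather than separate check functions", one common helper can serve both paths, with only the idle path checking for the idle task. All names here are hypothetical. As a simplifying assumption, the warning is modeled as a message plus a return value rather than a kernel WARN.

```c
#include <stdio.h>

struct toy_task { int is_idle_task; };

struct toy_cpu {
	int rcu_idle;   /* 1 while RCU treats this CPU as idle */
};

/* Shared entry path. Returns 1 if the idle-task check complained. */
static int toy_eqs_enter_common(struct toy_cpu *cpu,
				const struct toy_task *cur, int user)
{
	int warned = 0;

	/* Only idle entry insists on the idle task; user entry skips it. */
	if (!user && !cur->is_idle_task) {
		fprintf(stderr, "warning: non-idle task entering RCU idle mode\n");
		warned = 1;
	}
	cpu->rcu_idle = 1;
	return warned;
}

static int toy_idle_enter(struct toy_cpu *cpu, const struct toy_task *cur)
{
	return toy_eqs_enter_common(cpu, cur, 0);
}

static int toy_user_enter(struct toy_cpu *cpu, const struct toy_task *cur)
{
	return toy_eqs_enter_common(cpu, cur, 1);
}
```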
[ Updated to use "user" flag rather than separate check functions. ]
[ paulmck: Updated to drop exports of new functions based on Josh's patch
getting rid of the need for them. ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
There is a need to use RCU from interrupt context, but either before
rcu_irq_enter() is called or after rcu_irq_exit() is called. If the
interrupt occurs from idle, then lockdep-RCU will complain about such
uses, as they appear to be illegal uses of RCU from the idle loop.
In other environments, RCU_NONIDLE() could be used to properly protect
the use of RCU, but RCU_NONIDLE() currently cannot be invoked except
from process context.
This commit therefore modifies RCU_NONIDLE() to permit its use more
globally, including from interrupt context.
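Below is a sketch of an RCU_NONIDLE()-style wrapper, assuming hypothetical hooks (`toy_rcu_exit_idle()`, `toy_rcu_enter_idle()`) that maintain a nesting depth. Because the hooks nest, the wrapper can be invoked even when the CPU is already non-idle, for example from an interrupt handler or from inside another wrapper. The `do { } while (0)` form makes the macro behave as a single statement.

```c
/* Hypothetical hooks telling RCU that this CPU is temporarily non-idle. */
static int toy_nonidle_depth;

static void toy_rcu_exit_idle(void)  { toy_nonidle_depth++; }
static void toy_rcu_enter_idle(void) { toy_nonidle_depth--; }

static int toy_rcu_is_watching(void) { return toy_nonidle_depth > 0; }

/* Make RCU watch for the duration of one statement. */
#define TOY_RCU_NONIDLE(a)			\
	do {					\
		toy_rcu_exit_idle();		\
		do { a; } while (0);		\
		toy_rcu_enter_idle();		\
	} while (0)
```

For instance, `TOY_RCU_NONIDLE(seen = toy_rcu_is_watching());` sets `seen` to 1, and RCU stops watching again once the statement completes.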
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Although the C language allows you to break strings across lines, doing
this makes it hard for people to find the Linux kernel code corresponding
to a given console message. This commit therefore fixes broken strings
throughout RCU's source code.
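The problem is easy to demonstrate. The compiler concatenates adjacent string literals, so a broken string prints exactly the same text as an unbroken one. However, only the unbroken form can be found by searching the source for the console message. The message text and function names below are illustrative, not actual RCU output.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative message text only. */
static void toy_format_unbroken(char *buf, size_t len, int cpu)
{
	/* A grep for "detected stall on CPU" finds this line. */
	snprintf(buf, len, "toy: detected stall on CPU %d\n", cpu);
}

static void toy_format_broken(char *buf, size_t len, int cpu)
{
	/* Same output, but a grep for "detected stall" misses this. */
	snprintf(buf, len, "toy: detected "
		 "stall on CPU %d\n", cpu);
}

/* Returns 1 if both formulations produce identical text. */
static int toy_messages_match(int cpu)
{
	char a[64], b[64];

	toy_format_unbroken(a, sizeof(a), cpu);
	toy_format_broken(b, sizeof(b), cpu);
	return strcmp(a, b) == 0;
}
```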
Suggested-by: Josh Triplett <josh@joshtriplett.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
bigrtm: First steps towards getting RCU out of the way of
tens-of-microseconds real-time response on systems compiled
with NR_CPUS=4096. Also cleanups for and increased concurrency
of rcu_barrier() family of primitives.
doctorture: rcutorture and documentation improvements.
fixes: Miscellaneous fixes.
fnh: RCU_FAST_NO_HZ fixes and improvements.
Commit d8169d4c (Make __kfree_rcu() less dependent on compiler choices)
created a macro out of an inline function in order to avoid build
breakage for certain combinations of gcc flags. Unfortunately, it also
converted a kfree_call_rcu() to a call_rcu(), which made the rcu_data
structure's ->qlen_lazy field lose counts. This commit therefore changes
the call_rcu() back to kfree_call_rcu().
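A toy model shows why the distinction matters for the counters. The struct is loosely modeled on the `->qlen`/`->qlen_lazy` fields named above, but its layout and the helper names are hypothetical. Only the kfree-style entry point marks a callback as lazy. A `__kfree_rcu()` that routes through the plain `call_rcu()`-style path therefore leaves `qlen_lazy` undercounted.

```c
/* Hypothetical per-CPU callback counters. */
struct toy_rcu_data {
	long qlen;       /* all queued callbacks */
	long qlen_lazy;  /* subset that merely free memory */
};

/* Common enqueue path; "lazy" marks memory-freeing callbacks. */
static void toy_enqueue(struct toy_rcu_data *rdp, int lazy)
{
	rdp->qlen++;
	if (lazy)
		rdp->qlen_lazy++;
}

static void toy_call_rcu(struct toy_rcu_data *rdp)
{
	toy_enqueue(rdp, 0);
}

static void toy_kfree_call_rcu(struct toy_rcu_data *rdp)
{
	toy_enqueue(rdp, 1);
}
```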
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Although __call_rcu() is handled correctly when called from a momentary
non-idle period, if it is called on a CPU that RCU believes to be idle
on RCU_FAST_NO_HZ kernels, the callback might be indefinitely postponed.
This commit therefore ensures that RCU is aware of the new callback and
has a chance to force the CPU out of dyntick-idle mode when a new callback
is posted.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Commit d8169d4c (Make __kfree_rcu() less dependent on compiler choices)
added cpp macro versions of __kfree_rcu() and __is_kfree_rcu_offset(),
but failed to remove the old inline-function versions. This commit does
this cleanup.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
The CONFIG_TREE_PREEMPT_RCU and CONFIG_TINY_PREEMPT_RCU versions of
__rcu_read_lock() and __rcu_read_unlock() are identical, so this commit
consolidates them into kernel/rcupdate.h.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
The return value from rcu_assign_pointer() is not used, and using it
would be quite ugly, for example:
q = rcu_assign_pointer(global_p, p);
To prevent this sort of ugliness from spreading, this commit wraps
rcu_assign_pointer() in a do-while loop.
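The following is a userspace analogue, not the kernel macro. It assumes a C11 release store stands in for the kernel's write barrier plus assignment, and `TOY_RCU_ASSIGN_POINTER()` is a hypothetical name. Wrapping the body in `do { } while (0)` turns the macro into a statement with no value, so `q = TOY_RCU_ASSIGN_POINTER(global_p, p);` no longer compiles, while ordinary uses are unaffected.

```c
#include <stdatomic.h>

struct foo { int a; };

/* Publish v into p with release ordering; usable only as a statement. */
#define TOY_RCU_ASSIGN_POINTER(p, v)					\
	do {								\
		atomic_store_explicit(&(p), (v), memory_order_release);	\
	} while (0)

static struct foo *_Atomic global_p;

static void toy_publish(struct foo *p, int val)
{
	p->a = val;                           /* initialize first... */
	TOY_RCU_ASSIGN_POINTER(global_p, p);  /* ...then publish */
}
```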
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reported-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
The return value from RCU_INIT_POINTER() is not used, and using it
would be quite ugly, for example:
q = RCU_INIT_POINTER(global_p, p);
To prevent this sort of ugliness from appearing, this commit wraps
RCU_INIT_POINTER() in a do-while loop.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: David Howells <dhowells@redhat.com>
RCU_INIT_POINTER() returns a value that is never used, and which should
be abolished due to terminal ugliness:
q = RCU_INIT_POINTER(global_p, p);
However, there are two uses that cannot be handled by a do-while
formulation because they do gcc-style initialization:
RCU_INIT_POINTER(.real_cred, &init_cred),
RCU_INIT_POINTER(.cred, &init_cred),
This usage is clever, but not necessarily the nicest approach.
This commit therefore creates an RCU_POINTER_INITIALIZER() macro that
is specifically designed for gcc-style initialization.
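Here is a sketch of an initializer-only macro in the spirit of RCU_POINTER_INITIALIZER(), using hypothetical cut-down structures that mirror the `.real_cred`/`.cred` uses quoted above. The macro expands to a designated initializer, so it can appear only inside a brace-enclosed initializer and never as an expression. Sparse `__rcu` annotations are omitted from this sketch.

```c
/* Hypothetical stand-ins for the credential and task structures. */
struct toy_cred { int uid; };

struct toy_task {
	const struct toy_cred *real_cred;
	const struct toy_cred *cred;
};

/* Expands to ".p = (v)": valid only in an initializer list. */
#define TOY_RCU_POINTER_INITIALIZER(p, v) .p = (v)

static const struct toy_cred toy_init_cred = { 0 };

static struct toy_task toy_init_task = {
	TOY_RCU_POINTER_INITIALIZER(real_cred, &toy_init_cred),
	TOY_RCU_POINTER_INITIALIZER(cred, &toy_init_cred),
};
```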
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
This reverts commit 616c310e83
(Move PREEMPT_RCU preemption to switch_to() invocation).
Testing by Sasha Levin <levinsasha928@gmail.com> showed that this
can result in deadlock due to invoking the scheduler when one of
the runqueue locks is held. Because this commit was simply a
performance optimization, revert it.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
barrier: Reduce the amount of disturbance by rcu_barrier() to the rest of
the system. This branch also includes improvements to
RCU_FAST_NO_HZ, which are included here due to conflicts.
fixes: Miscellaneous fixes.
inline: Remaining changes from an abortive attempt to inline
preemptible RCU's __rcu_read_lock(). These are (1) making
exit_rcu() avoid unnecessary work and (2) avoiding having
preemptible RCU record a blocked thread when the scheduler
declines to do a context switch.
srcu: Lai Jiangshan's algorithmic implementation of SRCU, including
call_srcu().
When running preemptible RCU, if a task exits in an RCU read-side
critical section having blocked within that same RCU read-side critical
section, the task must be removed from the list of tasks blocking a
grace period (perhaps the current grace period, perhaps the next grace
period, depending on timing). The exit() path invokes exit_rcu() to
do this cleanup.
However, the current implementation of exit_rcu() needlessly does the
cleanup even if the task did not block within the current RCU read-side
critical section, which wastes time and needlessly increases the size
of the state space. Fix this by only doing the cleanup if the current
task is actually on the list of tasks blocking some grace period.
While we are at it, consolidate the two identical exit_rcu() functions
into a single function.
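The fast-path check can be sketched as follows. The struct fields are a hypothetical simplification of preemptible RCU's per-task state, where `on_blocked_list` stands in for "this task's list entry is non-empty", and the helper names are also hypothetical. The exit-time hook returns immediately unless the task actually blocked within its current read-side critical section.

```c
/* Hypothetical per-task state. */
struct toy_task {
	int rcu_read_lock_nesting;
	int on_blocked_list;   /* stand-in for a non-empty list entry */
};

static int toy_cleanups;   /* counts slow-path cleanups performed */

/* Slow path: dequeue the task so it no longer blocks a grace period. */
static void toy_rcu_read_unlock_slow(struct toy_task *t)
{
	t->rcu_read_lock_nesting = 0;
	t->on_blocked_list = 0;
	toy_cleanups++;
}

/* exit()-time hook: skip the work unless the task is actually queued. */
static void toy_exit_rcu(struct toy_task *t)
{
	if (!t->on_blocked_list)
		return;
	toy_rcu_read_unlock_slow(t);
}
```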
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
kernel/rcupdate.c