This commit moves from the old ACCESS_ONCE() API to the new READ_ONCE()
and WRITE_ONCE() APIs.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Updated to include kernel/torture.c as suggested by Jason Low. ]
This commit adds a CONFIG_RCU_EXPEDITE_BOOT Kconfig parameter
that emulates a very early boot rcu_expedite_gp(). A late-boot
call to rcu_end_inkernel_boot() will provide the corresponding
rcu_unexpedite_gp(). The late-boot call to rcu_end_inkernel_boot()
should be made just before init is spawned.
According to Arjan:
> To show the boot time, I'm using the timestamp of the "Write protecting"
> line, that's pretty much the last thing we print prior to ring 3 execution.
>
> A kernel with default RCU behavior (inside KVM, only virtual devices)
> looks like this:
>
> [ 0.038724] Write protecting the kernel read-only data: 10240k
>
> a kernel with expedited RCU (using the command line option, so that I
> don't have to recompile between measurements and thus am completely
> oranges-to-oranges)
>
> [ 0.031768] Write protecting the kernel read-only data: 10240k
>
> which, in percentage, is an 18% improvement.
Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Arjan van de Ven <arjan@linux.intel.com>
Currently, expediting of normal synchronous grace-period primitives
(synchronize_rcu() and friends) is controlled by the rcu_expedited()
boot/sysfs parameter. This works well, but does not handle nesting.
This commit therefore provides rcu_expedite_gp() to enable expediting
and rcu_unexpedite_gp() to cancel a prior rcu_expedite_gp(), both of
which support nesting.
Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There are currently duplicate identical definitions of the
rcu_synchronize() structure and the wakeme_after_rcu() function.
Thie commit therefore consolidates them.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit affines rcu_tasks_kthread() to the housekeeping CPUs
in CONFIG_NO_HZ_FULL builds. This is just a default, so systems
administrators are free to put this kthread somewhere else if they wish.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Add early boot self tests for RCU under CONFIG_PROVE_RCU.
Currently the only test is adding a dummy callback which increments a counter
which we then later verify after calling rcu_barrier*().
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
PREEMPT_RCU and TREE_PREEMPT_RCU serve the same function after
TINY_PREEMPT_RCU has been removed. This patch removes TREE_PREEMPT_RCU
and uses PREEMPT_RCU config option in its place.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_preempt_note_context_switch() function is on a scheduling fast
path, so it would be good to avoid disabling irqs. The reason that irqs
are disabled is to synchronize process-level and irq-handler access to
the task_struct ->rcu_read_unlock_special bitmask. This commit therefore
makes ->rcu_read_unlock_special instead be a union of bools with a short
allowing single-access checks in RCU's __rcu_read_unlock(). This results
in the process-level and irq-handler accesses being simple loads and
stores, so that irqs need no longer be disabled. This commit therefore
removes the irq disabling from rcu_preempt_note_context_switch().
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The grace-period-wait loop in rcu_tasks_kthread() is under (unnecessary)
RCU protection, and therefore has no preemption points in a PREEMPT=n
kernel. This commit therefore removes the RCU protection and inserts
cond_resched().
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Currently TASKS_RCU would ignore a CPU running a task in nohz_full=
usermode execution. There would be neither a context switch nor a
scheduling-clock interrupt to tell TASKS_RCU that the task in question
had passed through a quiescent state. The grace period would therefore
extend indefinitely. This commit therefore makes RCU's dyntick-idle
subsystem record the task_struct structure of the task that is running
in dyntick-idle mode on each CPU. The TASKS_RCU grace period can
then access this information and record a quiescent state on
behalf of any CPU running in dyntick-idle usermode.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
It is expected that many sites will have CONFIG_TASKS_RCU=y, but
will never actually invoke call_rcu_tasks(). For such sites, creating
rcu_tasks_kthread() at boot is wasteful. This commit therefore defers
creation of this kthread until the time of the first call_rcu_tasks().
This of course means that the first call_rcu_tasks() must be invoked
from process context after the scheduler is fully operational.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The current RCU-tasks implementation uses strict polling to detect
callback arrivals. This works quite well, but is not so good for
energy efficiency. This commit therefore replaces the strict polling
with a wait queue.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds a ten-minute RCU-tasks stall warning. The actual
time is controlled by the boot/sysfs parameter rcu_task_stall_timeout,
with values less than or equal to zero disabling the stall warnings.
The default value is ten minutes, which means that the tasks that have
not yet responded will get their stacks dumped every ten minutes, until
they pass through a voluntary context switch.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit exports the RCU-tasks synchronous APIs,
synchronize_rcu_tasks() and rcu_barrier_tasks(), to
GPL-licensed kernel modules.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Once a task has passed exit_notify() in the do_exit() code path, it
is no longer on the task lists, and is therefore no longer visible
to rcu_tasks_kthread(). This means that an almost-exited task might
be preempted while within a trampoline, and this task won't be waited
on by rcu_tasks_kthread(). This commit fixes this bug by adding an
srcu_struct. An exiting task does srcu_read_lock() just before calling
exit_notify(), and does the corresponding srcu_read_unlock() after
doing the final preempt_disable(). This means that rcu_tasks_kthread()
can do synchronize_srcu() to wait for all mostly-exited tasks to reach
their final preempt_disable() region, and then use synchronize_sched()
to wait for those tasks to finish exiting.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
It turns out to be easier to add the synchronous grace-period waiting
functions to RCU-tasks than to work around their absense in rcutorture,
so this commit adds them. The key point is that the existence of
call_rcu_tasks() means that rcutorture needs an rcu_barrier_tasks().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds a new RCU-tasks flavor of RCU, which provides
call_rcu_tasks(). This RCU flavor's quiescent states are voluntary
context switch (not preemption!) and userspace execution (not the idle
loop -- use some sort of schedule_on_each_cpu() if you need to handle the
idle tasks. Note that unlike other RCU flavors, these quiescent states
occur in tasks, not necessarily CPUs. Includes fixes from Steven Rostedt.
This RCU flavor is assumed to have very infrequent latency-tolerant
updaters. This assumption permits significant simplifications, including
a single global callback list protected by a single global lock, along
with a single task-private linked list containing all tasks that have not
yet passed through a quiescent state. If experience shows this assumption
to be incorrect, the required additional complexity will be added.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit uninlines rcu_read_lock_held(). According to "size vmlinux"
this saves 28549 in .text:
- 5541731 3014560 14757888 23314179
+ 5513182 3026848 14757888 23297918
Note: it looks as if the data grows by 12288 bytes but this is not true,
it does not actually grow. But .data starts with ALIGN(THREAD_SIZE) and
since .text shrinks the padding grows, and thus .data grows too as it
seen by /bin/size. diff System.map:
- ffffffff81510000 D _sdata
- ffffffff81510000 D init_thread_union
+ ffffffff81509000 D _sdata
+ ffffffff8150c000 D init_thread_union
Perhaps we can change vmlinux.lds.S to .data itself, so that /bin/size
can't "wrongly" report that .data grows if .text shinks.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The CONFIG_PROVE_RCU_DELAY Kconfig parameter doesn't appear to be very
effective at finding race conditions, so this commit removes it.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
[ paulmck: Remove definition and uses as noted by Paul Bolle. ]
Commit ac1bea8578 (Make cond_resched() report RCU quiescent states)
fixed a problem where a CPU looping in the kernel with but one runnable
task would give RCU CPU stall warnings, even if the in-kernel loop
contained cond_resched() calls. Unfortunately, in so doing, it introduced
performance regressions in Anton Blanchard's will-it-scale "open1" test.
The problem appears to be not so much the increased cond_resched() path
length as an increase in the rate at which grace periods complete, which
increased per-update grace-period overhead.
This commit takes a different approach to fixing this bug, mainly by
moving the RCU-visible quiescent state from cond_resched() to
rcu_note_context_switch(), and by further reducing the check to a
simple non-zero test of a single per-CPU variable. However, this
approach requires that the force-quiescent-state processing send
resched IPIs to the offending CPUs. These will be sent only once
the grace period has reached an age specified by the boot/sysfs
parameter rcutree.jiffies_till_sched_qs, or once the grace period
reaches an age halfway to the point at which RCU CPU stall warnings
will be emitted, whichever comes first.
Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Lameter <cl@gentwo.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
[ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the
ktest build robot. Also fixed smp_mb() comment as noted by
Oleg Nesterov. ]
Merge with e552592e (Reduce overhead of cond_resched() checks for RCU)
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Currently, call_rcu() relies on implicit allocation and initialization
for the debug-objects handling of RCU callbacks. If you hammer the
kernel hard enough with Sasha's modified version of trinity, you can end
up with the sl*b allocators recursing into themselves via this implicit
call_rcu() allocation.
This commit therefore exports the debug_init_rcu_head() and
debug_rcu_head_free() functions, which permits the allocators to allocated
and pre-initialize the debug-objects information, so that there no longer
any need for call_rcu() to do that initialization, which in turn prevents
the recursion into the memory allocators.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Looks-good-to: Christoph Lameter <cl@linux.com>