Now that the kfree_rcu() special-casing has been removed from tree RCU,
this commit removes kfree_call_rcu_nobatch() since it is no longer needed.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Recently a discussion about stability and performance of a system
involving a high rate of kfree_rcu() calls surfaced on the list [1]
which led to another discussion how to prepare for this situation.
This patch adds basic batching support for kfree_rcu(). It is "basic"
because we do none of the slab management, dynamic allocation, code
moving or any of the other things, some of which previous attempts did
[2]. These fancier improvements can be follow-up patches and there are
different ideas being discussed in those regards. This is an effort to
start simple, and build up from there. In the future, an extension to
use kfree_bulk and possibly per-slab batching could be done to further
improve performance due to cache-locality and slab-specific bulk free
optimizations. By using an array of pointers, the worker thread
processing the work would need to read lesser data since it does not
need to deal with large rcu_head(s) any longer.
Torture tests follow in the next patch and show improvements of around
5x reduction in number of grace periods on a 16 CPU system. More
details and test data are in that patch.
There is an implication with rcu_barrier() with this patch. Since the
kfree_rcu() calls can be batched, and may not be handed yet to the RCU
machinery in fact, the monitor may not have even run yet to do the
queue_rcu_work(), there seems no easy way of implementing rcu_barrier()
to wait for those kfree_rcu()s that are already made. So this means a
kfree_rcu() followed by an rcu_barrier() does not imply that memory will
be freed once rcu_barrier() returns.
Another implication is higher active memory usage (although not
run-away..) until the kfree_rcu() flooding ends, in comparison to
without batching. More details about this are in the second patch which
adds an rcuperf test.
Finally, in the near future we will get rid of kfree_rcu() special casing
within RCU such as in rcu_do_batch and switch everything to just
batching. Currently we don't do that since timer subsystem is not yet up
and we cannot schedule the kfree_rcu() monitor as the timer subsystem's
lock are not initialized. That would also mean getting rid of
kfree_call_rcu_nobatch() entirely.
[1] http://lore.kernel.org/lkml/20190723035725-mutt-send-email-mst@kernel.org
[2] https://lkml.org/lkml/2017/12/19/824
Cc: kernel-team@android.com
Cc: kernel-team@lge.com
Co-developed-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
[ paulmck: Applied 0day and Paul Walmsley feedback on ->monitor_todo. ]
[ paulmck: Make it work during early boot. ]
[ paulmck: Add a crude early boot self-test. ]
[ paulmck: Style adjustments and experimental docbook structure header. ]
Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.9999.1908161931110.32497@viisi.sifive.com/T/#me9956f66cb611b95d26ae92700e1d901f46e8c59
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
When multi_cpu_stop() loops waiting for other tasks, it can trigger an RCU
CPU stall warning. This can be misleading because what is instead needed
is information on whatever task is blocking multi_cpu_stop(). This commit
therefore inserts an RCU quiescent state into the multi_cpu_stop()
function's waitloop.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Replace the license boiler plate with a SPDX license identifier.
While in the area, update an email address.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Update .h SPDX format per Joe Perches. ]
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Now that rcu_all_qs() is used only in !PREEMPT builds, move it to
tree_plugin.h so that it is defined only in those builds. This in
turn means that rcu_momentary_dyntick_idle() is only used in !PREEMPT
builds, but it is simply marked __maybe_unused in order to keep it
near the rest of the dyntick-idle code.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit saves a few lines by consolidating the RCU-sched function
definitions at the end of include/linux/rcupdate.h. This consolidation
also makes it easier to remove them all when the time comes.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit saves a few lines by consolidating the RCU-bh function
definitions at the end of include/linux/rcupdate.h. This consolidation
also makes it easier to remove them all when the time comes.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit renames Tiny RCU functions so that the lowest level of
functionality is RCU (e.g., synchronize_rcu()) rather than RCU-sched
(e.g., synchronize_sched()). This provides greater naming compatibility
with Tree RCU, which will in turn permit more LoC removal once
the RCU-sched and RCU-bh update-side API is removed.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix Tiny call_rcu()'s EXPORT_SYMBOL() in response to a bug
report from kbuild test robot. ]
Now that RCU-preempt knows about preemption disabling, its implementation
of synchronize_rcu() works for synchronize_sched(), and likewise for the
other RCU-sched update-side API members. This commit therefore confines
the RCU-sched update-side code to CONFIG_PREEMPT=n builds, and defines
RCU-sched's update-side API members in terms of those of RCU-preempt.
This means that any given build of the Linux kernel has only one
update-side flavor of RCU, namely RCU-preempt for CONFIG_PREEMPT=y builds
and RCU-sched for CONFIG_PREEMPT=n builds. This in turn means that kernels
built with CONFIG_RCU_NOCB_CPU=y have only one rcuo kthread per CPU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
This commit updates comments and help text to account for the fact that
RCU-bh update-side functions are now simple wrappers for their RCU or
RCU-sched counterparts.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that the main RCU API knows about softirq disabling and softirq's
quiescent states, the RCU-bh update code can be dispensed with.
This commit therefore removes the RCU-bh update-side implementation and
defines RCU-bh's update-side API in terms of that of either RCU-preempt or
RCU-sched, depending on the setting of the CONFIG_PREEMPT Kconfig option.
In kernels built with CONFIG_RCU_NOCB_CPU=y this has the knock-on effect
of reducing by one the number of rcuo kthreads per CPU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
One necessary step towards consolidating the three flavors of RCU is to
make sure that the resulting consolidated "one flavor to rule them all"
correctly handles networking denial-of-service attacks. One thing that
allows RCU-bh to do so is that __do_softirq() invokes rcu_bh_qs() every
so often, and so something similar has to happen for consolidated RCU.
This must be done carefully. For example, if a preemption-disabled
region of code takes an interrupt which does softirq processing before
returning, consolidated RCU must ignore the resulting rcu_bh_qs()
invocations -- preemption is still disabled, and that means an RCU
reader for the consolidated flavor.
This commit therefore creates a new rcu_softirq_qs() that is called only
from the ksoftirqd task, thus avoiding the interrupted-a-preempted-region
problem. This new rcu_softirq_qs() function invokes rcu_sched_qs(),
rcu_preempt_qs(), and rcu_preempt_deferred_qs(). The latter call handles
any deferred quiescent states.
Note that __do_softirq() still invokes rcu_bh_qs(). It will continue to
do so until a later stage of cleanup when the RCU-bh flavor is removed.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix !SMP issue located by kbuild test robot. ]
The x86/mtrr code does horrific things because hardware. It uses
stop_machine_from_inactive_cpu(), which does a wakeup (of the stopper
thread on another CPU), which uses RCU, all before the CPU is onlined.
RCU complains about this, because wakeups use RCU and RCU does
(rightfully) not consider offline CPUs for grace-periods.
Fix this by initializing RCU way early in the MTRR case.
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Add !SMP support, per 0day Test Robot report. ]
Because rcu_eqs_special_set() is declared only in internal header
kernel/rcu/tree.h and stubbed in include/linux/rcutiny.h, it is
inaccessible outside of the RCU implementation. This patch therefore
moves the rcu_eqs_special_set() declaration to include/linux/rcutree.h,
which allows it to be used in non-rcu kernel code.
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Now that the irq path uses the rcu_nmi_{enter,exit}() algorithm,
rcu_irq_enter() and rcu_irq_exit() may be used from any context. There is
thus no need for rcu_irq_enter_disabled() and for the checks using it.
This commit therefore eliminates rcu_irq_enter_disabled().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit removes a #ifdef and saves a few lines of code by moving
the rcu_end_inkernel_boot() function from include/linux/rcupdate.h to
include/linux/rcutiny.h (for TINY_RCU) and to include/linux/rcutree.h
(for TREE_RCU).
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_request_urgent_qs_task() function is used only within RCU,
so there is no point in exporting it to the rest of the kernel from
nclude/linux/rcutiny.h and include/linux/rcutree.h. This commit therefore
moves this function to kernel/rcu/rcu.h.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The various functions similar to rcu_batches_started(), the
function show_rcu_gp_kthreads(), the various functions similar to
rcu_force_quiescent_state(), and the variables rcutorture_testseq and
rcutorture_vernum are used only within RCU. There is therefore no point
in exporting them to the kernel at large from include/linux/rcutiny.h
and include/linux/rcutree.h. This commit therefore moves all of these
to kernel/rcu/rcu.h.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The include/linux/rcupdate.h file contains a number of definitions that
are used only to communicate between rcutorture, rcuperf, and the RCU code
itself. There is no point in having these definitions exposed globally
throughout the kernel, so this commit moves them to kernel/rcu/rcu.h.
This change has the added benefit of shrinking rcupdate.h.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Currently, a call to schedule() acts as a Tasks RCU quiescent state
only if a context switch actually takes place. However, just the
call to schedule() guarantees that the calling task has moved off of
whatever tracing trampoline that it might have been one previously.
This commit therefore plumbs schedule()'s "preempt" parameter into
rcu_note_context_switch(), which then records the Tasks RCU quiescent
state, but only if this call to schedule() was -not- due to a preemption.
To avoid adding overhead to the common-case context-switch path,
this commit hides the rcu_note_context_switch() check under an existing
non-common-case check.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit provides rcu_exp_batches_completed() and
rcu_exp_batches_completed_sched() functions to allow torture-test modules
to check how many expedited grace period batches have completed.
These are analogous to the existing rcu_batches_completed(),
rcu_batches_completed_bh(), and rcu_batches_completed_sched() functions.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit replaces a local_irq_save()/local_irq_restore() pair with
a lockdep assertion that interrupts are already disabled. This should
remove the corresponding overhead from the interrupt entry/exit fastpaths.
This change was inspired by the fact that Iftekhar Ahmed's mutation
testing showed that removing rcu_irq_enter()'s call to local_ird_restore()
had no effect, which might indicate that interrupts were always enabled
anyway.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
We need the scheduler's fastpaths to be, well, fast, and unnecessarily
disabling and re-enabling interrupts is not necessarily consistent with
this goal. Especially given that there are regions of the scheduler that
already have interrupts disabled.
This commit therefore moves the call to rcu_note_context_switch()
to one of the interrupts-disabled regions of the scheduler, and
removes the now-redundant disabling and re-enabling of interrupts from
rcu_note_context_switch() and the functions it calls.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Shift rcu_note_context_switch() to avoid deadlock, as suggested
by Peter Zijlstra. ]