Commit Graph

48 Commits

Author SHA1 Message Date
Paul E. McKenney 52d7e48b86 rcu: Narrow early boot window of illegal synchronous grace periods
The current preemptible RCU implementation goes through three phases
during bootup.  In the first phase, there is only one CPU that is running
with preemption disabled, so that a no-op is a synchronous grace period.
In the second mid-boot phase, the scheduler is running, but RCU has
not yet gotten its kthreads spawned (and, for expedited grace periods,
workqueues are not yet running.  During this time, any attempt to do
a synchronous grace period will hang the system (or complain bitterly,
depending).  In the third and final phase, RCU is fully operational and
everything works normally.

This has been OK for some time, but there has recently been some
synchronous grace periods showing up during the second mid-boot phase.
This code worked "by accident" for awhile, but started failing as soon
as expedited RCU grace periods switched over to workqueues in commit
8b355e3bc1 ("rcu: Drive expedited grace periods from workqueue").
Note that the code was buggy even before this commit, as it was subject
to failure on real-time systems that forced all expedited grace periods
to run as normal grace periods (for example, using the rcu_normal ksysfs
parameter).  The callchain from the failure case is as follows:

early_amd_iommu_init()
|-> acpi_put_table(ivrs_base);
|-> acpi_tb_put_table(table_desc);
|-> acpi_tb_invalidate_table(table_desc);
|-> acpi_tb_release_table(...)
|-> acpi_os_unmap_memory
|-> acpi_os_unmap_iomem
|-> acpi_os_map_cleanup
|-> synchronize_rcu_expedited

The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
which caused the code to try using workqueues before they were
initialized, which did not go well.

This commit therefore reworks RCU to permit synchronous grace periods
to proceed during this mid-boot phase.  This commit is therefore a
fix to a regression introduced in v4.9, and is therefore being put
forward post-merge-window in v4.10.

This commit sets a flag from the existing rcu_scheduler_starting()
function which causes all synchronous grace periods to take the expedited
path.  The expedited path now checks this flag, using the requesting task
to drive the expedited grace period forward during the mid-boot phase.
Finally, this flag is updated by a core_initcall() function named
rcu_exp_runtime_mode(), which causes the runtime codepaths to be used.

Note that this arrangement assumes that tasks are not sent POSIX signals
(or anything similar) from the time that the first task is spawned
through core_initcall() time.

Fixes: 8b355e3bc1 ("rcu: Drive expedited grace periods from workqueue")
Reported-by: "Zheng, Lv" <lv.zheng@intel.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Stan Kain <stan.kain@gmail.com>
Tested-by: Ivan <waffolz@hotmail.com>
Tested-by: Emanuel Castelo <emanuel.castelo@gmail.com>
Tested-by: Bruno Pesavento <bpesavento@infinito.it>
Tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Frederic Bezies <fredbezies@gmail.com>
Cc: <stable@vger.kernel.org> # 4.9.0-
2017-01-14 21:23:48 -08:00
Paul Gortmaker e77b704125 rcu: Don't use modular infrastructure in non-modular code
The Kconfig currently controlling compilation of tree.c is:

init/Kconfig:config TREE_RCU
init/Kconfig:   bool

...and update.c and sync.c are "obj-y" meaning that none are ever
built as a module by anyone.

Since MODULE_ALIAS is a no-op for non-modular code, we can remove
them from these files.

We leave moduleparam.h behind since the files instantiate some boot
time configuration parameters with module_param() still.

Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-08-22 09:35:27 -07:00
Paul E. McKenney 4929c913bd rcu: Make call_rcu_tasks() tolerate first call with irqs disabled
Currently, if the very first call to call_rcu_tasks() has irqs disabled,
it will create the rcu_tasks_kthread with irqs disabled, which will
result in a splat in the memory allocator, which kthread_run() invokes
with the expectation that irqs are enabled.

This commit fixes this problem by deferring kthread creation if called
with irqs disabled.  The first call to call_rcu_tasks() that has irqs
enabled will create the kthread.

This bug was detected by rcutorture changes that were motivated by
Iftekhar Ahmed's mutation-testing efforts.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-06-15 15:45:00 -07:00
Du, Changbin b9fdac7f66 debugobjects: insulate non-fixup logic related to static obj from fixup callbacks
When activating a static object we need make sure that the object is
tracked in the object tracker.  If it is a non-static object then the
activation is illegal.

In previous implementation, each subsystem need take care of this in
their fixup callbacks.  Actually we can put it into debugobjects core.
Thus we can save duplicated code, and have *pure* fixup callbacks.

To achieve this, a new callback "is_static_object" is introduced to let
the type specific code decide whether a object is static or not.  If
yes, we take it into object tracker, otherwise give warning and invoke
fixup callback.

This change has paassed debugobjects selftest, and I also do some test
with all debugobjects supports enabled.

At last, I have a concern about the fixups that can it change the object
which is in incorrect state on fixup? Because the 'addr' may not point
to any valid object if a non-static object is not tracked.  Then Change
such object can overwrite someone's memory and cause unexpected
behaviour.  For example, the timer_fixup_activate bind timer to function
stub_timer.

Link: http://lkml.kernel.org/r/1462576157-14539-1-git-send-email-changbin.du@intel.com
[changbin.du@intel.com: improve code comments where invoke the new is_static_object callback]
  Link: http://lkml.kernel.org/r/1462777431-8171-1-git-send-email-changbin.du@intel.com
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Du, Changbin 3263d28eb5 rcu: update debugobjects fixup callbacks return type
Update the return type to use bool instead of int, corresponding to
cheange (debugobjects: make fixup functions return bool instead of int).

Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Boqun Feng 293e2421fe rcu: Remove superfluous versions of rcu_read_lock_sched_held()
Currently, we have four versions of rcu_read_lock_sched_held(), depending
on the combined choices on PREEMPT_COUNT and DEBUG_LOCK_ALLOC.  However,
there is an existing function preemptible() that already distinguishes
between the PREEMPT_COUNT=y and PREEMPT_COUNT=n cases, and allows these
four implementations to be consolidated down to two.

This commit therefore uses preemptible() to achieve this consolidation.

Note that there could be a small performance regression in the case
of CONFIG_DEBUG_LOCK_ALLOC=y && PREEMPT_COUNT=n.  However, given the
overhead associated with CONFIG_DEBUG_LOCK_ALLOC=y, this should be
down in the noise.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-03-31 13:34:50 -07:00
Paul E. McKenney 4f2a848c56 rcu: Export rcu_gp_is_normal()
This commit exports rcu_gp_is_normal() in order to allow it to be used
by rcutorture and rcuperf.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-02-23 20:04:51 -08:00
Paul E. McKenney 79cfea0273 rcu: Remove TINY_RCU bloat from pointless boot parameters
The rcu_expedited, rcu_normal, and rcu_normal_after_boot kernel boot
parameters are pointless in the case of TINY_RCU because in that case
synchronous grace periods, both expedited and normal, are no-ops.
However, these three symbols contribute several hundred bytes of bloat.
This commit therefore uses CPP directives to avoid compiling this code
in TINY_RCU kernels.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-12-07 16:59:37 -08:00
Paul E. McKenney 3e42ec1aa7 rcu: Allow expedited grace periods to be disabled at init
Expedited grace periods can speed up boot, but are undesirable in
aggressive real-time systems.  This commit therefore introduces a
kernel parameter rcupdate.rcu_normal_after_boot that disables
expedited grace periods just before init is spawned.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:54 -08:00
Paul E. McKenney 5a9be7c628 rcu: Add rcu_normal kernel parameter to suppress expediting
Although expedited grace periods can be quite useful, and although their
OS jitter has been greatly reduced, they can still pose problems for
extreme real-time workloads.  This commit therefore adds a rcu_normal
kernel boot parameter (which can also be manipulated via sysfs)
to suppress expedited grace periods, that is, to treat requests for
expedited grace periods as if they were requests for normal grace periods.
If both rcu_expedited and rcu_normal are specified, rcu_normal wins.
This means that if you are relying on expedited grace periods to speed up
boot, you will want to specify rcu_expedited on the kernel command line,
and then specify rcu_normal via sysfs once boot completes.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-12-04 12:26:53 -08:00
Boqun Feng b6a4ae766e rcu: Use rcu_callback_t in call_rcu*() and friends
As we now have rcu_callback_t typedefs as the type of rcu callbacks, we
should use it in call_rcu*() and friends as the type of parameters. This
could save us a few lines of code and make it clear which function
requires an rcu callbacks rather than other callbacks as its argument.

Besides, this can also help cscope to generate a better database for
code reading.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:08:05 -07:00
Paul E. McKenney a76a9a485d rcu: Fix backwards RCU_LOCKDEP_WARN() in synchronize_rcu_tasks()
The RCU_LOCKDEP_WARN() in synchronize_rcu_tasks() triggers if the
scheduler is active, which is backwards.  This commit therefore
negates the test.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-22 15:27:33 -07:00
Paul E. McKenney f78f5b90c4 rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()
This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for
consistency with the WARN() series of macros.  This also requires
inverting the sense of the conditional, which this commit also does.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2015-07-22 15:27:32 -07:00
Paul E. McKenney ec90a194ae rcu: Create a synchronize_rcu_mult()
There have been several requests for a primitive that waits for
grace periods for several RCU flavors concurrently, so this
commit creates it.  This is a variadic macro, and you pass in
the call_rcu() functions of the flavors of RCU that you wish to
wait for.

Note that you cannot pass in call_srcu() for two reasons: (1) This
would result in a type mismatch and (2) You need to specify which
srcu_struct you want to use.  Handle this by creating a wrapper
function for your SRCU domain, for example:

	void call_srcu_mine(struct rcu_head *head, rcu_callback_t func)
	{
		call_srcu(&ss_mine, head, func);
	}

You can then do something like this:

	synchronize_rcu_mult(call_srcu_mine, call_rcu, call_rcu_sched);

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-22 15:27:29 -07:00
Denys Vlasenko d5671f6bf2 rcu: Deinline rcu_read_lock_sched_held() if DEBUG_LOCK_ALLOC
DEBUG_LOCK_ALLOC=y is not a production setting, but it is
not very unusual either. Many developers routinely
use kernels built with it enabled.

Apart from being selected by hand, it is also auto-selected by
PROVE_LOCKING "Lock debugging: prove locking correctness" and
LOCK_STAT "Lock usage statistics" config options.
LOCK STAT is necessary for "perf lock" to work.

I wouldn't spend too much time optimizing it, but this particular
function has a very large cost in code size: when it is deinlined,
code size decreases by 830,000 bytes:

    text     data      bss       dec     hex filename
85674192 22294776 20627456 128596424 7aa39c8 vmlinux.before
84837612 22294424 20627456 127759492 79d7484 vmlinux

(with this config: http://busybox.net/~vda/kernel_config)

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Tejun Heo <tj@kernel.org>
CC: Oleg Nesterov <oleg@redhat.com>
CC: linux-kernel@vger.kernel.org
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-07-15 14:43:52 -07:00
Paul E. McKenney 7d0ae8086b rcu: Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
This commit moves from the old ACCESS_ONCE() API to the new READ_ONCE()
and WRITE_ONCE() APIs.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck:  Updated to include kernel/torture.c as suggested by Jason Low. ]
2015-05-27 12:56:15 -07:00
Paul E. McKenney 42528795ac Merge branches 'doc.2015.02.26a', 'earlycb.2015.03.03a', 'fixes.2015.03.03a', 'gpexp.2015.02.26a', 'hotplug.2015.03.20a', 'sysidle.2015.02.26b' and 'tiny.2015.02.26a' into HEAD
doc.2015.02.26a:  Documentation changes
earlycb.2015.03.03a:  Permit early-boot RCU callbacks
fixes.2015.03.03a:  Miscellaneous fixes
gpexp.2015.02.26a:  In-kernel expediting of normal grace periods
hotplug.2015.03.20a:  CPU hotplug fixes
sysidle.2015.02.26b:  NO_HZ_FULL_SYSIDLE fixes
tiny.2015.02.26a:  TINY_RCU fixes
2015-03-20 08:31:01 -07:00
Paul E. McKenney ee42571f43 rcu: Add Kconfig option to expedite grace periods during boot
This commit adds a CONFIG_RCU_EXPEDITE_BOOT Kconfig parameter
that emulates a very early boot rcu_expedite_gp().  A late-boot
call to rcu_end_inkernel_boot() will provide the corresponding
rcu_unexpedite_gp().  The late-boot call to rcu_end_inkernel_boot()
should be made just before init is spawned.

According to Arjan:

> To show the boot time, I'm using the timestamp of the "Write protecting"
> line, that's pretty much the last thing we print prior to ring 3 execution.
>
> A kernel with default RCU behavior (inside KVM, only virtual devices)
> looks like this:
>
> [    0.038724] Write protecting the kernel read-only data: 10240k
>
> a kernel with expedited RCU (using the command line option, so that I
> don't have to recompile between measurements and thus am completely
> oranges-to-oranges)
>
> [    0.031768] Write protecting the kernel read-only data: 10240k
>
> which, in percentage, is an 18% improvement.

Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Arjan van de Ven <arjan@linux.intel.com>
2015-02-26 12:03:03 -08:00
Paul E. McKenney 0d39482c3d rcu: Provide rcu_expedite_gp() and rcu_unexpedite_gp()
Currently, expediting of normal synchronous grace-period primitives
(synchronize_rcu() and friends) is controlled by the rcu_expedited()
boot/sysfs parameter.  This works well, but does not handle nesting.
This commit therefore provides rcu_expedite_gp() to enable expediting
and rcu_unexpedite_gp() to cancel a prior rcu_expedite_gp(), both of
which support nesting.

Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-26 12:02:59 -08:00
Paul E. McKenney ee376dbdf2 rcu: Consolidate rcu_synchronize and wakeme_after_rcu()
There are currently duplicate identical definitions of the
rcu_synchronize() structure and the wakeme_after_rcu() function.
Thie commit therefore consolidates them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-02-25 17:03:03 -08:00
Paul E. McKenney 9ea6c58856 Merge branches 'torture.2014.11.03a', 'cpu.2014.11.03a', 'doc.2014.11.13a', 'fixes.2014.11.13a', 'signal.2014.10.29a' and 'rt.2014.10.29a' into HEAD
cpu.2014.11.03a: Changes for per-CPU variables.
doc.2014.11.13a: Documentation updates.
fixes.2014.11.13a: Miscellaneous fixes.
signal.2014.10.29a: Signal changes.
rt.2014.10.29a: Real-time changes.
torture.2014.11.03a: torture-test changes.
2014-11-13 10:39:04 -08:00
Paul E. McKenney 60ced4950c rcu: Fix FIXME in rcu_tasks_kthread()
This commit affines rcu_tasks_kthread() to the housekeeping CPUs
in CONFIG_NO_HZ_FULL builds.  This is just a default, so systems
administrators are free to put this kthread somewhere else if they wish.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-11-13 10:35:41 -08:00
Pranith Kumar aa23c6fbc5 rcutorture: Add early boot self tests
Add early boot self tests for RCU under CONFIG_PROVE_RCU.

Currently the only test is adding a dummy callback which increments a counter
which we then later verify after calling rcu_barrier*().

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-11-03 19:26:37 -08:00
Pranith Kumar 28f6569ab7 rcu: Remove redundant TREE_PREEMPT_RCU config option
PREEMPT_RCU and TREE_PREEMPT_RCU serve the same function after
TINY_PREEMPT_RCU has been removed. This patch removes TREE_PREEMPT_RCU
and uses PREEMPT_RCU config option in its place.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-10-29 10:20:05 -07:00
Paul E. McKenney 96b4672703 Merge branch 'rcu-tasks.2014.09.10a' into HEAD
rcu-tasks.2014.09.10a: Add RCU-tasks flavor of RCU.
2014-09-16 10:10:44 -07:00