Commit Graph

202 Commits

Author SHA1 Message Date
Hugh Dickins
1cc85961e2 rcu: Stop spurious warnings from synchronize_sched_expedited
synchronize_sched_expedited() is spamming CONFIG_DEBUG_PREEMPT=y
users with an unintended warning from the cpu_is_offline() check: use
raw_smp_processor_id() instead of smp_processor_id() there.

Because the warning is under a get_online_cpus(), it is not possible
for any CPUs to go offline, though it is quite possible that the
task might migrate between the raw_smp_processor_id() and the check
of cpu_is_offline().  This is not a problem because the task cannot
migrate from an offline CPU to an online one or vice versa.  The point
of the check is to verify that synchronize_sched_expedited() is not
called from an offline CPU, for example, from a CPU_DYING notifier, or,
more important, from an outgoing CPU making its way from its CPU_DYING
notifiers to the idle loop.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 15:33:34 -08:00
Paul E. McKenney
8a2ecf474d rcu: Add RCU_NONIDLE() for idle-loop RCU read-side critical sections
RCU, RCU-bh, and RCU-sched read-side critical sections are forbidden
in the inner idle loop, that is, between the rcu_idle_enter() and the
rcu_idle_exit() -- RCU will happily ignore any such read-side critical
sections.  However, things like powertop need tracepoints in the inner
idle loop.

This commit therefore provides an RCU_NONIDLE() macro that can be used to
wrap code in the idle loop that requires RCU read-side critical sections.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
2012-02-21 09:06:13 -08:00
Paul E. McKenney
29e37d8141 rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit()
Use of RCU in the idle loop is incorrect, quite a few instances of
just that have made their way into mainline, primarily event tracing.
The problem with RCU read-side critical sections on CPUs that RCU believes
to be idle is that RCU is completely ignoring the CPU, along with any
attempts and RCU read-side critical sections.

The approaches of eliminating the offending uses and of pushing the
definition of idle down beyond the offending uses have both proved
impractical.  The new approach is to encapsulate offending uses of RCU
with rcu_idle_exit() and rcu_idle_enter(), but this requires nesting
for code that is invoked both during idle and and during normal execution.
Therefore, this commit modifies rcu_idle_enter() and rcu_idle_exit() to
permit nesting.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
2012-02-21 09:06:12 -08:00
Paul E. McKenney
236fefafe5 rcu: Call out dangers of expedited RCU primitives
The expedited RCU primitives can be quite useful, but they have some
high costs as well.  This commit updates and creates docbook comments
calling out the costs, and updates the RCU documentation as well.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:06:08 -08:00
Paul E. McKenney
2036d94a7b rcu: Rework detection of use of RCU by offline CPUs
Because newly offlined CPUs continue executing after completing the
CPU_DYING notifiers, they legitimately enter the scheduler and use
RCU while appearing to be offline.  This calls for a more sophisticated
approach as follows:

1.	RCU marks the CPU online during the CPU_UP_PREPARE phase.

2.	RCU marks the CPU offline during the CPU_DEAD phase.

3.	Diagnostics regarding use of read-side RCU by offline CPUs use
	RCU's accounting rather than the cpu_online_map.  (Note that
	__call_rcu() still uses cpu_online_map to detect illegal
	invocations within CPU_DYING notifiers.)

4.	Offline CPUs are prevented from hanging the system by
	force_quiescent_state(), which pays attention to cpu_online_map.
	Some additional work (in a later commit) will be needed to
	guarantee that force_quiescent_state() waits a full jiffy before
	assuming that a CPU is offline, for example, when called from
	idle entry.  (This commit also makes the one-jiffy wait
	explicit, since the old-style implicit wait can now be defeated
	by RCU_FAST_NO_HZ and by rcutorture.)

This approach avoids the false positives encountered when attempting to
use more exact classification of CPU online/offline state.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:06:07 -08:00
Paul E. McKenney
3d3b7db0a2 rcu: Move synchronize_sched_expedited() to rcutree.c
Now that TREE_RCU and TREE_PREEMPT_RCU no longer do anything different
for the single-CPU case, there is no need for multiple definitions of
synchronize_sched_expedited().  It is no longer in any sense a plug-in,
so move it from kernel/rcutree_plugin.h to kernel/rcutree.c.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:06:04 -08:00
Paul E. McKenney
c0d6d01bff rcu: Check for illegal use of RCU from offlined CPUs
Although it is legal to use RCU during early boot, it is anything
but legal to use RCU at runtime from an offlined CPU.  After all, RCU
explicitly ignores offlined CPUs.  This commit therefore adds checks
for runtime use of RCU from offlined CPUs.

These checks are not perfect, in particular, they can be subverted
through use of things like rcu_dereference_raw().  Note that it is not
possible to put checks in rcu_read_lock() and friends due to the fact
that these primitives are used in code that might be used under either
RCU or lock-based protection, which means that checking rcu_read_lock()
gets you fat piles of false positives.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:06:03 -08:00
Paul E. McKenney
a858af2875 rcu: Print scheduling-clock information on RCU CPU stall-warning messages
There have been situations where RCU CPU stall warnings were caused by
issues in scheduling-clock timer initialization.  To make it easier to
track these down, this commit causes the RCU CPU stall-warning messages
to print out the number of scheduling-clock interrupts taken in the
current grace period for each stalled CPU.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:49 -08:00
Paul E. McKenney
13cfcca0e4 rcu: Set RCU CPU stall times via sysfs
The default CONFIG_RCU_CPU_STALL_TIMEOUT value of 60 seconds has served
Linux users well for production use for quite some time.  However, for
debugging, there will be more than three minutes between subsequent
stall-warning messages.  This can be an annoyingly long wait if you
are trying to work out where the offending infinite loop is hiding.

Therefore, this commit provides a rcu_cpu_stall_timeout sysfs
parameter that may be adjusted at boot time and at runtime to speed
up debugging.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:48 -08:00
Paul E. McKenney
27565d64a4 rcu: Remove #ifdef CONFIG_SMP from TREE_RCU
Now that both TINY_RCU and TINY_PREEMPT_RCU have been in place for awhile,
it is time to remove UP support from TREE_RCU, which is what this commit
does.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:47 -08:00
Paul E. McKenney
c44e2cddac rcu: Check for idle-loop entry while in RCU read-side critical section
The inner idle loop is an extended quiescent state for all flavors
of RCU, but there have been recent bug involving use of RCU read-side
primitives from within the idle loop.  Therefore, this commit enlists
lockdep-RCU to detect attempts to enter the inner idle loop while in
an RCU read-side critical section, emitting a lockdep-RCU splat if so.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:45 -08:00
Paul E. McKenney
30fbcc90b0 rcu: Clean up straggling rcu_preempt_needs_cpu() name
The recent updates to RCU_CPU_FAST_NO_HZ have an rcu_needs_cpu() that
does more than just check for callbacks, so get the name for
rcu_preempt_needs_cpu() consistent with that change, now calling it
rcu_preempt_cpu_has_callbacks().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:44 -08:00
Paul E. McKenney
f38bd1020f rcu: Remove single-rcu_node optimization in rcu_start_gp()
The grace-period initialization sequence in rcu_start_gp() has a special
case for systems where the rcu_node tree is a single rcu_node structure.
This made sense some years ago when systems were smaller and up to 64
CPUs could share a single rcu_node structure, but now that large systems
are common and a given leaf rcu_node structure can support only 16 CPUs
(due to lock contention on the rcu_node's ->lock field), this optimization
is almost never taken.  And even the small mobile platforms that might
make use of it might rather have the kernel text reduction.

Therefore, this commit removes the check for single-rcu_node trees.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-02-21 09:03:38 -08:00
Paul E. McKenney
a50c3af910 rcu: Don't make callbacks go through second full grace period
RCU's current CPU-offline code path dumps all of the outgoing CPU's
callbacks onto the RCU_NEXT_TAIL portion of the surviving CPU's
callback list.  This means that all the ready-to-invoke callbacks from
the outgoing CPU must wait for another full RCU grace period.  This was
just fine when CPU-hotplug events were rare, but there is increasing
evidence that users are planning to make increasing use of CPU hotplug.

Therefore, this commit changes the callback-dumping procedure so that
callbacks that are ready to invoke are moved to the RCU_DONE_TAIL
portion of the surviving CPU's callback list.  This avoids running
these callbacks through a second unnecessary grace period.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:37 -08:00
Paul E. McKenney
8146c4e2e2 rcu: Check for callback invocation from offline CPUs
Because quiescent states are now reported from offline CPUs in
CPU_DYING state, there is some possibility that such a CPU might
note the end of a grace period and attempt to start invoking
callbacks.  This would be a very bad thing, and is supposed to
be prevented by the fact that the CPU_DYING CPU gets rid of all
its callbacks before reporting the quiescent state.  However,
there is other CPU-offline code in the kernel, and it is quite
possible that someone will invoke RCU core processing from that
code.  Therefore, this commit adds a warning for this case.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:37 -08:00
Paul E. McKenney
e560140008 rcu: Simplify offline processing
Move ->qsmaskinit and blkd_tasks[] manipulation to the CPU_DYING
notifier.  This simplifies the code by eliminating a potential
deadlock and by reducing the responsibilities of force_quiescent_state().
Also rename functions to make their connection to the CPU-hotplug
stages explicit.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:34 -08:00
Paul E. McKenney
486e259340 rcu: Avoid waking up CPUs having only kfree_rcu() callbacks
When CONFIG_RCU_FAST_NO_HZ is enabled, RCU will allow a given CPU to
enter dyntick-idle mode even if it still has RCU callbacks queued.
RCU avoids system hangs in this case by scheduling a timer for several
jiffies in the future.  However, if all of the callbacks on that CPU
are from kfree_rcu(), there is no reason to wake the CPU up, as it is
not a problem to defer freeing of memory.

This commit therefore tracks the number of callbacks on a given CPU
that are from kfree_rcu(), and avoids scheduling the timer if all of
a given CPU's callbacks are from kfree_rcu().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:25 -08:00
Paul E. McKenney
0bb7b59d6e rcu: Add diagnostic for misaligned rcu_head structures
The push for energy efficiency will require that RCU tag rcu_head
structures to indicate whether or not their invocation is time critical.
This tagging is best carried out in the bottom bits of the ->next
pointers in the rcu_head structures.  This tagging requires that the
rcu_head structures be properly aligned, so this commit adds the required
diagnostics.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:24 -08:00
Paul E. McKenney
fe15d706cf rcu: Add lockdep-RCU checks for simple self-deadlock
It is illegal to have a grace period within a same-flavor RCU read-side
critical section, so this commit adds lockdep-RCU checks to splat when
such abuse is encountered.  This commit does not detect more elaborate
RCU deadlock situations.  These situations might be a job for lockdep
enhancements.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:23 -08:00
Paul E. McKenney
4968c300e1 rcu: Augment rcu_batch_end tracing for idle and callback state
The current rcu_batch_end event trace records only the name of the RCU
flavor and the total number of callbacks that remain queued on the
current CPU.  This is insufficient for testing and tuning the new
dyntick-idle RCU_FAST_NO_HZ code, so this commit adds idle state along
with whether or not any of the callbacks that were ready to invoke
at the beginning of rcu_do_batch() are still queued.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:32:22 -08:00
Paul E. McKenney
2d1dc9a600 rcu: Remove redundant rcu_cpu_stall_suppress declaration
No point in having two identical rcu_cpu_stall_suppress declarations,
so remove the more obscure of the two.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:32:11 -08:00
Paul E. McKenney
dff1672d91 rcu: Keep invoking callbacks if CPU otherwise idle
The rcu_do_batch() function that invokes callbacks for TREE_RCU and
TREE_PREEMPT_RCU normally throttles callback invocation to avoid degrading
scheduling latency.  However, as long as the CPU would otherwise be idle,
there is no downside to continuing to invoke any callbacks that have passed
through their grace periods.  In fact, processing such callbacks in a
timely manner has the benefit of increasing the probability that the
CPU can enter the power-saving dyntick-idle mode.

Therefore, this commit allows callback invocation to continue beyond the
preset limit as long as the scheduler does not have some other task to
run and as long as context is that of the idle task or the relevant
RCU kthread.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:32:09 -08:00
Frederic Weisbecker
facc4e1596 rcu: Irq nesting is always 0 on rcu_enter_idle_common
Because tasks don't nest, the ->dyntick_nesting must always be zero upon
entry to rcu_idle_enter_common().  Therefore, pass "0" rather than the
counter itself.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:32:08 -08:00
Frederic Weisbecker
b6fc602014 rcu: Don't check irq nesting from rcu idle entry/exit
Because tasks do not nest, rcu_idle_enter() and rcu_idle_exit() do
not need to check for nesting.  This commit therefore moves nesting
checks from rcu_idle_enter_common() to rcu_irq_exit() and from
rcu_idle_exit_common() to rcu_irq_enter().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:32:07 -08:00
Paul E. McKenney
7cb9249900 rcu: Permit dyntick-idle with callbacks pending
The current implementation of RCU_FAST_NO_HZ prevents CPUs from entering
dyntick-idle state if they have RCU callbacks pending.  Unfortunately,
this has the side-effect of often preventing them from entering this
state, especially if at least one other CPU is not in dyntick-idle state.
However, the resulting per-tick wakeup is wasteful in many cases: if the
CPU has already fully responded to the current RCU grace period, there
will be nothing for it to do until this grace period ends, which will
frequently take several jiffies.

This commit therefore permits a CPU that has done everything that the
current grace period has asked of it (rcu_pending() == 0) even if it
still as RCU callbacks pending.  However, such a CPU posts a timer to
wake it up several jiffies later (6 jiffies, based on experience with
grace-period lengths).  This wakeup is required to handle situations
that can result in all CPUs being in dyntick-idle mode, thus failing
to ever complete the current grace period.  If a CPU wakes up before
the timer goes off, then it cancels that timer, thus avoiding spurious
wakeups.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11 10:32:07 -08:00