Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Make kfree_rcu() use kfree_bulk() for added performance
   - RCU updates
   - Callback-overload handling updates
   - Tasks-RCU KCSAN and sparse updates
   - Locking torture test and RCU torture test updates
   - Documentation updates
   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
  rcu: Make rcu_barrier() account for offline no-CBs CPUs
  rcu: Mark rcu_state.gp_seq to detect concurrent writes
  Documentation/memory-barriers: Fix typos
  doc: Add rcutorture scripting to torture.txt
  doc/RCU/rcu: Use https instead of http if possible
  doc/RCU/rcu: Use absolute paths for non-rst files
  doc/RCU/rcu: Use ':ref:' for links to other docs
  doc/RCU/listRCU: Update example function name
  doc/RCU/listRCU: Fix typos in a example code snippets
  doc/RCU/Design: Remove remaining HTML tags in ReST files
  doc: Add some more RCU list patterns in the kernel
  rcutorture: Set KCSAN Kconfig options to detect more data races
  rcutorture: Manually clean up after rcu_barrier() failure
  rcutorture: Make rcu_torture_barrier_cbs() post from corresponding CPU
  rcuperf: Measure memory footprint during kfree_rcu() test
  rcutorture: Annotation lockless accesses to rcu_torture_current
  rcutorture: Add READ_ONCE() to rcu_torture_count and rcu_torture_batch
  rcutorture: Fix stray access to rcu_fwd_cb_nodelay
  rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race
  rcutorture: Make kvm-find-errors.sh abort on bad directory
  ...
@@ -4,7 +4,7 @@ A Tour Through TREE_RCU's Grace-Period Memory Ordering

 August 8, 2017

 This article was contributed by Paul E. McKenney

 Introduction
 ============

@@ -48,7 +48,7 @@ Tree RCU Grace Period Memory Ordering Building Blocks

 The workhorse for RCU's grace-period memory ordering is the
 critical section for the ``rcu_node`` structure's
 ``->lock``. These critical sections use helper functions for lock
 acquisition, including ``raw_spin_lock_rcu_node()``,
 ``raw_spin_lock_irq_rcu_node()``, and ``raw_spin_lock_irqsave_rcu_node()``.
 Their lock-release counterparts are ``raw_spin_unlock_rcu_node()``,

@@ -102,9 +102,9 @@ lock-acquisition and lock-release functions::

    23   r3 = READ_ONCE(x);
    24 }
    25
    26 WARN_ON(r1 == 0 && r2 == 0 && r3 == 0);

-The ``WARN_ON()`` is evaluated at “the end of time”,
+The ``WARN_ON()`` is evaluated at "the end of time",
 after all changes have propagated throughout the system.
 Without the ``smp_mb__after_unlock_lock()`` provided by the
 acquisition functions, this ``WARN_ON()`` could trigger, for example
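For orientation, the lock-acquisition helpers named in this hunk wrap an ordinary raw spinlock acquisition together with the full barrier that the surrounding text discusses. Conceptually they look like the following sketch (close to, but not quoted from, the kernel's definition in kernel/rcu/rcu.h):

	#define raw_spin_lock_rcu_node(p)				\
	do {								\
		raw_spin_lock(&ACCESS_PRIVATE(p, lock));		\
		smp_mb__after_unlock_lock(); /* full barrier for Tree RCU */ \
	} while (0)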
@@ -4,12 +4,61 @@ Using RCU to Protect Read-Mostly Linked Lists
 =============================================

 One of the best applications of RCU is to protect read-mostly linked lists
-("struct list_head" in list.h). One big advantage of this approach
+(``struct list_head`` in list.h). One big advantage of this approach
 is that all of the required memory barriers are included for you in
 the list macros. This document describes several applications of RCU,
 with the best fits first.

-Example 1: Read-Side Action Taken Outside of Lock, No In-Place Updates
------------------------------------------------------------------------
+Example 1: Read-mostly list: Deferred Destruction
+-------------------------------------------------
+
+A widely used use case for RCU lists in the kernel is lockless iteration over
+all processes in the system. ``task_struct::tasks`` represents the list node
+that links all the processes. The list can be traversed in parallel with any
+list additions or removals.
+
+The traversal of the list is done using ``for_each_process()``, which is
+defined by these two macros::
+
+	#define next_task(p) \
+		list_entry_rcu((p)->tasks.next, struct task_struct, tasks)
+
+	#define for_each_process(p) \
+		for (p = &init_task ; (p = next_task(p)) != &init_task ; )
+
+The code traversing the list of all processes typically looks like::
+
+	rcu_read_lock();
+	for_each_process(p) {
+		/* Do something with p */
+	}
+	rcu_read_unlock();
+
+The simplified code for removing a process from a task list is::
+
+	void release_task(struct task_struct *p)
+	{
+		write_lock(&tasklist_lock);
+		list_del_rcu(&p->tasks);
+		write_unlock(&tasklist_lock);
+		call_rcu(&p->rcu, delayed_put_task_struct);
+	}
+
+When a process exits, ``release_task()`` calls ``list_del_rcu(&p->tasks)``
+under ``tasklist_lock`` writer lock protection to remove the task from the
+list of all tasks. The ``tasklist_lock`` prevents concurrent list
+additions/removals from corrupting the list. Readers using
+``for_each_process()`` are not protected by the ``tasklist_lock``. To prevent
+readers from noticing changes in the list pointers, the ``task_struct`` object
+is freed only after one or more grace periods elapse (with the help of
+call_rcu()). This deferral of destruction ensures that any readers traversing
+the list will see valid ``p->tasks.next`` pointers, so deletion/freeing can
+happen in parallel with traversal of the list. This pattern is also called an
+**existence lock**, since RCU pins the object in memory until all existing
+readers finish.
+
+
+Example 2: Read-Side Action Taken Outside of Lock: No In-Place Updates
+----------------------------------------------------------------------

 The best applications are cases where, if reader-writer locking were
@@ -26,7 +75,7 @@ added or deleted, rather than being modified in place.

 A straightforward example of this use of RCU may be found in the
 system-call auditing support. For example, a reader-writer locked
-implementation of audit_filter_task() might be as follows::
+implementation of ``audit_filter_task()`` might be as follows::

	static enum audit_state audit_filter_task(struct task_struct *tsk)
	{
@@ -34,7 +83,7 @@ implementation of audit_filter_task() might be as follows::
		enum audit_state   state;

		read_lock(&auditsc_lock);
-		/* Note: audit_netlink_sem held by caller. */
+		/* Note: audit_filter_mutex held by caller. */
		list_for_each_entry(e, &audit_tsklist, list) {
			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
				read_unlock(&auditsc_lock);
@@ -58,7 +107,7 @@ This means that RCU can be easily applied to the read side, as follows::
		enum audit_state   state;

		rcu_read_lock();
-		/* Note: audit_netlink_sem held by caller. */
+		/* Note: audit_filter_mutex held by caller. */
		list_for_each_entry_rcu(e, &audit_tsklist, list) {
			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
				rcu_read_unlock();
@@ -69,18 +118,18 @@ This means that RCU can be easily applied to the read side, as follows::
		return AUDIT_BUILD_CONTEXT;
	}

-The read_lock() and read_unlock() calls have become rcu_read_lock()
+The ``read_lock()`` and ``read_unlock()`` calls have become rcu_read_lock()
 and rcu_read_unlock(), respectively, and the list_for_each_entry() has
-become list_for_each_entry_rcu(). The _rcu() list-traversal primitives
+become list_for_each_entry_rcu(). The **_rcu()** list-traversal primitives
 insert the read-side memory barriers that are required on DEC Alpha CPUs.

-The changes to the update side are also straightforward. A reader-writer
-lock might be used as follows for deletion and insertion::
+The changes to the update side are also straightforward. A reader-writer lock
+might be used as follows for deletion and insertion::

	static inline int audit_del_rule(struct audit_rule *rule,
					 struct list_head *list)
	{
		struct audit_entry *e;

		write_lock(&auditsc_lock);
		list_for_each_entry(e, list, list) {
@@ -113,9 +162,9 @@ Following are the RCU equivalents for these two functions::
	static inline int audit_del_rule(struct audit_rule *rule,
					 struct list_head *list)
	{
		struct audit_entry *e;

-		/* Do not use the _rcu iterator here, since this is the only
+		/* No need to use the _rcu iterator here, since this is the only
		 * deletion routine. */
		list_for_each_entry(e, list, list) {
			if (!audit_compare_rule(rule, &e->rule)) {
@@ -139,45 +188,45 @@ Following are the RCU equivalents for these two functions::
		return 0;
	}

-Normally, the write_lock() and write_unlock() would be replaced by
-a spin_lock() and a spin_unlock(), but in this case, all callers hold
-audit_netlink_sem, so no additional locking is required. The auditsc_lock
-can therefore be eliminated, since use of RCU eliminates the need for
-writers to exclude readers. Normally, the write_lock() calls would
-be converted into spin_lock() calls.
+Normally, the ``write_lock()`` and ``write_unlock()`` would be replaced by a
+spin_lock() and a spin_unlock(). But in this case, all callers hold
+``audit_filter_mutex``, so no additional locking is required. The
+``auditsc_lock`` can therefore be eliminated, since use of RCU eliminates the
+need for writers to exclude readers.

 The list_del(), list_add(), and list_add_tail() primitives have been
 replaced by list_del_rcu(), list_add_rcu(), and list_add_tail_rcu().
-The _rcu() list-manipulation primitives add memory barriers that are
-needed on weakly ordered CPUs (most of them!). The list_del_rcu()
-primitive omits the pointer poisoning debug-assist code that would
-otherwise cause concurrent readers to fail spectacularly.
+The **_rcu()** list-manipulation primitives add memory barriers that are needed on
+weakly ordered CPUs (most of them!). The list_del_rcu() primitive omits the
+pointer poisoning debug-assist code that would otherwise cause concurrent
+readers to fail spectacularly.

-So, when readers can tolerate stale data and when entries are either added
-or deleted, without in-place modification, it is very easy to use RCU!
+So, when readers can tolerate stale data and when entries are either added or
+deleted, without in-place modification, it is very easy to use RCU!

-Example 2: Handling In-Place Updates
+Example 3: Handling In-Place Updates
 ------------------------------------

-The system-call auditing code does not update auditing rules in place.
-However, if it did, reader-writer-locked code to do so might look as
-follows (presumably, the field_count is only permitted to decrease,
-otherwise, the added fields would need to be filled in)::
+The system-call auditing code does not update auditing rules in place. However,
+if it did, the reader-writer-locked code to do so might look as follows
+(assuming only ``field_count`` is updated, otherwise, the added fields would
+need to be filled in)::

	static inline int audit_upd_rule(struct audit_rule *rule,
					 struct list_head *list,
					 __u32 newaction,
					 __u32 newfield_count)
	{
		struct audit_entry *e;
-		struct audit_newentry *ne;
+		struct audit_entry *ne;

		write_lock(&auditsc_lock);
-		/* Note: audit_netlink_sem held by caller. */
+		/* Note: audit_filter_mutex held by caller. */
		list_for_each_entry(e, list, list) {
			if (!audit_compare_rule(rule, &e->rule)) {
				e->rule.action = newaction;
-				e->rule.file_count = newfield_count;
+				e->rule.field_count = newfield_count;
				write_unlock(&auditsc_lock);
				return 0;
			}
@@ -188,16 +237,16 @@ otherwise, the added fields would need to be filled in)::

 The RCU version creates a copy, updates the copy, then replaces the old
 entry with the newly updated entry. This sequence of actions, allowing
-concurrent reads while doing a copy to perform an update, is what gives
-RCU ("read-copy update") its name. The RCU code is as follows::
+concurrent reads while making a copy to perform an update, is what gives
+RCU (*read-copy update*) its name. The RCU code is as follows::

	static inline int audit_upd_rule(struct audit_rule *rule,
					 struct list_head *list,
					 __u32 newaction,
					 __u32 newfield_count)
	{
		struct audit_entry *e;
-		struct audit_newentry *ne;
+		struct audit_entry *ne;

		list_for_each_entry(e, list, list) {
			if (!audit_compare_rule(rule, &e->rule)) {
@@ -206,7 +255,7 @@ RCU ("read-copy update") its name. The RCU code is as follows::
					return -ENOMEM;
				audit_copy_rule(&ne->rule, &e->rule);
				ne->rule.action = newaction;
-				ne->rule.file_count = newfield_count;
+				ne->rule.field_count = newfield_count;
				list_replace_rcu(&e->list, &ne->list);
				call_rcu(&e->rcu, audit_free_rule);
				return 0;
@@ -215,34 +264,45 @@ RCU ("read-copy update") its name. The RCU code is as follows::
		return -EFAULT; /* No matching rule */
	}

-Again, this assumes that the caller holds audit_netlink_sem. Normally,
-the reader-writer lock would become a spinlock in this sort of code.
+Again, this assumes that the caller holds ``audit_filter_mutex``. Normally, the
+writer lock would become a spinlock in this sort of code.

-Example 3: Eliminating Stale Data
+Another use of this pattern can be found in the openvswitch driver's
+*connection tracking table* code in ``ct_limit_set()``. The table holds
+connection tracking entries and has a limit on the maximum entries. There is
+one such table per zone and hence one *limit* per zone. The zones are mapped
+to their limits through a hashtable using an RCU-managed hlist for the hash
+chains. When a new limit is set, a new limit object is allocated and
+``ct_limit_set()`` is called to replace the old limit object with the new one
+using list_replace_rcu(). The old limit object is then freed after a grace
+period using kfree_rcu().
+
+
+Example 4: Eliminating Stale Data
 ---------------------------------

-The auditing examples above tolerate stale data, as do most algorithms
+The auditing example above tolerates stale data, as do most algorithms
 that are tracking external state. Because there is a delay from the
 time the external state changes before Linux becomes aware of the change,
-additional RCU-induced staleness is normally not a problem.
+additional RCU-induced staleness is generally not a problem.

 However, there are many examples where stale data cannot be tolerated.
-One example in the Linux kernel is the System V IPC (see the ipc_lock()
-function in ipc/util.c). This code checks a "deleted" flag under a
-per-entry spinlock, and, if the "deleted" flag is set, pretends that the
+One example in the Linux kernel is the System V IPC (see the shm_lock()
+function in ipc/shm.c). This code checks a *deleted* flag under a
+per-entry spinlock, and, if the *deleted* flag is set, pretends that the
 entry does not exist. For this to be helpful, the search function must
-return holding the per-entry spinlock, as ipc_lock() does in fact do.
+return holding the per-entry spinlock, as shm_lock() does in fact do.
+
+.. _quick_quiz:

 Quick Quiz:
-	Why does the search function need to return holding the per-entry lock for
-	this deleted-flag technique to be helpful?
+	For the deleted-flag technique to be helpful, why is it necessary
+	to hold the per-entry lock while returning from the search function?

-:ref:`Answer to Quick Quiz <answer_quick_quiz_list>`
+:ref:`Answer to Quick Quiz <quick_quiz_answer>`

-If the system-call audit module were to ever need to reject stale data,
-one way to accomplish this would be to add a "deleted" flag and a "lock"
-spinlock to the audit_entry structure, and modify audit_filter_task()
-as follows::
+If the system-call audit module were to ever need to reject stale data, one way
+to accomplish this would be to add a ``deleted`` flag and a ``lock`` spinlock to the
+audit_entry structure, and modify ``audit_filter_task()`` as follows::

	static enum audit_state audit_filter_task(struct task_struct *tsk)
	{
@@ -267,20 +327,20 @@ as follows::
	}

 Note that this example assumes that entries are only added and deleted.
-Additional mechanism is required to deal correctly with the
-update-in-place performed by audit_upd_rule(). For one thing,
-audit_upd_rule() would need additional memory barriers to ensure
-that the list_add_rcu() was really executed before the list_del_rcu().
+Additional mechanism is required to deal correctly with the update-in-place
+performed by ``audit_upd_rule()``. For one thing, ``audit_upd_rule()`` would
+need additional memory barriers to ensure that the list_add_rcu() was really
+executed before the list_del_rcu().

-The audit_del_rule() function would need to set the "deleted"
-flag under the spinlock as follows::
+The ``audit_del_rule()`` function would need to set the ``deleted`` flag under the
+spinlock as follows::

	static inline int audit_del_rule(struct audit_rule *rule,
					 struct list_head *list)
	{
		struct audit_entry *e;

-		/* Do not need to use the _rcu iterator here, since this
+		/* No need to use the _rcu iterator here, since this
		 * is the only deletion routine. */
		list_for_each_entry(e, list, list) {
			if (!audit_compare_rule(rule, &e->rule)) {
@@ -295,6 +355,91 @@ flag under the spinlock as follows::
		return -EFAULT; /* No matching rule */
	}

+This too assumes that the caller holds ``audit_filter_mutex``.
+
+
+Example 5: Skipping Stale Objects
+---------------------------------
+
+For some use cases, reader performance can be improved by skipping stale
+objects during read-side list traversal, where a stale object is one that is
+pending destruction after one or more grace periods. One such example can be
+found in the timerfd subsystem. When a ``CLOCK_REALTIME`` clock is
+reprogrammed, for example due to setting of the system time, all programmed
+timerfds that depend on this clock get triggered and processes waiting on them
+to expire are woken up in advance of their scheduled expiry. To facilitate
+this, all such timers are added to an RCU-managed ``cancel_list`` when they
+are set up in ``timerfd_setup_cancel()``::
+
+	static void timerfd_setup_cancel(struct timerfd_ctx *ctx, int flags)
+	{
+		spin_lock(&ctx->cancel_lock);
+		if (ctx->clockid == CLOCK_REALTIME &&
+		    (flags & TFD_TIMER_ABSTIME) && (flags & TFD_TIMER_CANCEL_ON_SET)) {
+			if (!ctx->might_cancel) {
+				ctx->might_cancel = true;
+				spin_lock(&cancel_lock);
+				list_add_rcu(&ctx->clist, &cancel_list);
+				spin_unlock(&cancel_lock);
+			}
+		}
+		spin_unlock(&ctx->cancel_lock);
+	}
+
+When a timerfd is freed (fd is closed), then the ``might_cancel`` flag of the
+timerfd object is cleared, and the object is removed from the ``cancel_list``
+and destroyed::
+
+	int timerfd_release(struct inode *inode, struct file *file)
+	{
+		struct timerfd_ctx *ctx = file->private_data;
+
+		spin_lock(&ctx->cancel_lock);
+		if (ctx->might_cancel) {
+			ctx->might_cancel = false;
+			spin_lock(&cancel_lock);
+			list_del_rcu(&ctx->clist);
+			spin_unlock(&cancel_lock);
+		}
+		spin_unlock(&ctx->cancel_lock);
+
+		hrtimer_cancel(&ctx->t.tmr);
+		kfree_rcu(ctx, rcu);
+		return 0;
+	}
+
+If the ``CLOCK_REALTIME`` clock is set, for example by a time server, the
+hrtimer framework calls ``timerfd_clock_was_set()`` which walks the
+``cancel_list`` and wakes up processes waiting on the timerfd. While iterating
+the ``cancel_list``, the ``might_cancel`` flag is consulted to skip stale
+objects::
+
+	void timerfd_clock_was_set(void)
+	{
+		struct timerfd_ctx *ctx;
+		unsigned long flags;
+
+		rcu_read_lock();
+		list_for_each_entry_rcu(ctx, &cancel_list, clist) {
+			if (!ctx->might_cancel)
+				continue;
+			spin_lock_irqsave(&ctx->wqh.lock, flags);
+			if (ctx->moffs != ktime_mono_to_real(0)) {
+				ctx->moffs = KTIME_MAX;
+				ctx->ticks++;
+				wake_up_locked_poll(&ctx->wqh, EPOLLIN);
+			}
+			spin_unlock_irqrestore(&ctx->wqh.lock, flags);
+		}
+		rcu_read_unlock();
+	}
+
+The key point here is that, because RCU traversal of the ``cancel_list``
+happens concurrently with list additions and removals, the traversal can
+sometimes step on an object that has already been removed from the list.
+In this example, such stale objects are simply skipped with the help of
+a flag.
+

 Summary
 -------
@@ -303,19 +448,21 @@ the most amenable to use of RCU. The simplest case is where entries are
 either added or deleted from the data structure (or atomically modified
 in place), but non-atomic in-place modifications can be handled by making
 a copy, updating the copy, then replacing the original with the copy.
-If stale data cannot be tolerated, then a "deleted" flag may be used
+If stale data cannot be tolerated, then a *deleted* flag may be used
 in conjunction with a per-entry spinlock in order to allow the search
 function to reject newly deleted data.

-.. _answer_quick_quiz_list:
+.. _quick_quiz_answer:

 Answer to Quick Quiz:
-	Why does the search function need to return holding the per-entry
-	lock for this deleted-flag technique to be helpful?
+	For the deleted-flag technique to be helpful, why is it necessary
+	to hold the per-entry lock while returning from the search function?

	If the search function drops the per-entry lock before returning,
	then the caller will be processing stale data in any case. If it
	is really OK to be processing stale data, then you don't need a
-	"deleted" flag. If processing stale data really is a problem,
+	*deleted* flag. If processing stale data really is a problem,
	then you need to hold the per-entry lock across all of the code
	that uses the value that was returned.
+
+:ref:`Back to Quick Quiz <quick_quiz>`
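The listRCU.rst changes above repeatedly lean on one idiom: publish a replacement with list_replace_rcu() and defer freeing the old element with kfree_rcu(). A minimal self-contained sketch of that idiom follows; the struct limit type, the limit_update() function, and the locking are illustrative assumptions, not code from this merge:

	struct limit {
		struct list_head list;
		int max_entries;
		struct rcu_head rcu;
	};

	static DEFINE_SPINLOCK(limit_lock);	/* serializes updaters only */

	/* Swap in an updated copy; readers see either the old or new entry. */
	static int limit_update(struct limit *old, int new_max)
	{
		struct limit *new = kmalloc(sizeof(*new), GFP_KERNEL);

		if (!new)
			return -ENOMEM;
		new->max_entries = new_max;

		spin_lock(&limit_lock);
		list_replace_rcu(&old->list, &new->list);
		spin_unlock(&limit_lock);

		kfree_rcu(old, rcu);	/* free only after a grace period */
		return 0;
	}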
@@ -11,8 +11,8 @@ must be long enough that any readers accessing the item being deleted have
 since dropped their references. For example, an RCU-protected deletion
 from a linked list would first remove the item from the list, wait for
 a grace period to elapse, then free the element. See the
-Documentation/RCU/listRCU.rst file for more information on using RCU with
-linked lists.
+:ref:`Documentation/RCU/listRCU.rst <list_rcu_doc>` for more information on
+using RCU with linked lists.

 Frequently Asked Questions
 --------------------------
@@ -50,7 +50,7 @@ Frequently Asked Questions
 - If I am running on a uniprocessor kernel, which can only do one
   thing at a time, why should I wait for a grace period?

-  See the Documentation/RCU/UP.rst file for more information.
+  See :ref:`Documentation/RCU/UP.rst <up_doc>` for more information.

 - How can I see where RCU is currently used in the Linux kernel?
@@ -68,18 +68,18 @@ Frequently Asked Questions

 - Why the name "RCU"?

-  "RCU" stands for "read-copy update". The file Documentation/RCU/listRCU.rst
-  has more information on where this name came from, search for
-  "read-copy update" to find it.
+  "RCU" stands for "read-copy update".
+  :ref:`Documentation/RCU/listRCU.rst <list_rcu_doc>` has more information on
+  where this name came from; search for "read-copy update" to find it.

 - I hear that RCU is patented? What is with that?

   Yes, it is. There are several known patents related to RCU,
-  search for the string "Patent" in RTFP.txt to find them.
+  search for the string "Patent" in Documentation/RCU/RTFP.txt to find them.
   Of these, one was allowed to lapse by the assignee, and the
   others have been contributed to the Linux kernel under GPL.
   There are now also LGPL implementations of user-level RCU
-  available (http://liburcu.org/).
+  available (https://liburcu.org/).

 - I hear that RCU needs work in order to support realtime kernels?
@@ -88,5 +88,5 @@ Frequently Asked Questions

 - Where can I find more information on RCU?

-  See the RTFP.txt file in this directory.
+  See the Documentation/RCU/RTFP.txt file.
   Or point your browser at (http://www.rdrop.com/users/paulmck/RCU/).
@@ -124,9 +124,14 @@ using a dynamically allocated srcu_struct (hence "srcud-" rather than
 debugging. The final "T" entry contains the totals of the counters.


-USAGE
+USAGE ON SPECIFIC KERNEL BUILDS

-The following script may be used to torture RCU:
+It is sometimes desirable to torture RCU on a specific kernel build,
+for example, when preparing to put that kernel build into production.
+In that case, the kernel should be built with CONFIG_RCU_TORTURE_TEST=m
+so that the test can be started using modprobe and terminated using rmmod.
+
+For example, the following script may be used to torture RCU:

	#!/bin/sh
@@ -142,8 +147,136 @@ checked for such errors. The "rmmod" command forces a "SUCCESS",
 two are self-explanatory, while the last indicates that while there
 were no RCU failures, CPU-hotplug problems were detected.

-However, the tools/testing/selftests/rcutorture/bin/kvm.sh script
-provides better automation, including automatic failure analysis.
-It assumes a qemu/kvm-enabled platform, and runs guest OSes out of initrd.
-See tools/testing/selftests/rcutorture/doc/initrd.txt for instructions
-on setting up such an initrd.
+
+USAGE ON MAINLINE KERNELS
+
+When using rcutorture to test changes to RCU itself, it is often
+necessary to build a number of kernels in order to test that change
+across a broad range of combinations of the relevant Kconfig options
+and of the relevant kernel boot parameters. In this situation, use
+of modprobe and rmmod can be quite time-consuming and error-prone.
+
+Therefore, the tools/testing/selftests/rcutorture/bin/kvm.sh
+script is available for mainline testing for x86, arm64, and
+powerpc. By default, it will run the series of tests specified by
+tools/testing/selftests/rcutorture/configs/rcu/CFLIST, with each test
+running for 30 minutes within a guest OS using a minimal userspace
+supplied by an automatically generated initrd. After the tests are
+complete, the resulting build products and console output are analyzed
+for errors and the results of the runs are summarized.
+
+On larger systems, rcutorture testing can be accelerated by passing the
+--cpus argument to kvm.sh. For example, on a 64-CPU system, "--cpus 43"
+would use up to 43 CPUs to run tests concurrently, which as of v5.4 would
+complete all the scenarios in two batches, reducing the time to complete
+from about eight hours to about one hour (not counting the time to build
+the sixteen kernels). The "--dryrun sched" argument will not run tests,
+but rather tell you how the tests would be scheduled into batches. This
+can be useful when working out how many CPUs to specify in the --cpus
+argument.
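Putting those two flags together, a scheduling dry run on the 64-CPU example system would be invoked roughly as follows (an illustrative command line using only the flags documented above):

	tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 43 --dryrun sched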
+
+Not all changes require that all scenarios be run. For example, a change
+to Tree SRCU might run only the SRCU-N and SRCU-P scenarios using the
+--configs argument to kvm.sh as follows: "--configs 'SRCU-N SRCU-P'".
+Large systems can run multiple copies of the full set of scenarios,
+for example, a system with 448 hardware threads can run five instances
+of the full set concurrently. To make this happen:
+
+	kvm.sh --cpus 448 --configs '5*CFLIST'
+
+Alternatively, such a system can run 56 concurrent instances of a single
+eight-CPU scenario:
+
+	kvm.sh --cpus 448 --configs '56*TREE04'
+
+Or 28 concurrent instances of each of two eight-CPU scenarios:
+
+	kvm.sh --cpus 448 --configs '28*TREE03 28*TREE04'
+
+Of course, each concurrent instance will use memory, which can be
+limited using the --memory argument, which defaults to 512M. Small
+values for memory may require disabling the callback-flooding tests
+using the --bootargs parameter discussed below.
+
+Sometimes additional debugging is useful, and in such cases the --kconfig
+parameter to kvm.sh may be used, for example, "--kconfig 'CONFIG_KASAN=y'".
+
+Kernel boot arguments can also be supplied, for example, to control
+rcutorture's module parameters. For example, to test a change to RCU's
+CPU stall-warning code, use "--bootargs 'rcutorture.stall_cpu=30'".
+This will of course result in the scripting reporting a failure, namely
+the resulting RCU CPU stall warning. As noted above, reducing memory may
+require disabling rcutorture's callback-flooding tests:
+
+	kvm.sh --cpus 448 --configs '56*TREE04' --memory 128M \
+		--bootargs 'rcutorture.fwd_progress=0'
+
+Sometimes all that is needed is a full set of kernel builds. This is
+what the --buildonly argument does.
+
+Finally, the --trust-make argument allows each kernel build to reuse what
+it can from the previous kernel build.
+
+There are additional, more arcane arguments that are documented in the
+source code of the kvm.sh script.
+
+If a run contains failures, the number of buildtime and runtime failures
+is listed at the end of the kvm.sh output, which you really should redirect
+to a file. The build products and console output of each run are kept in
+tools/testing/selftests/rcutorture/res in timestamped directories. A
+given directory can be supplied to kvm-find-errors.sh in order to have
+it cycle you through summaries of errors and full error logs. For example:
+
+	tools/testing/selftests/rcutorture/bin/kvm-find-errors.sh \
+		tools/testing/selftests/rcutorture/res/2020.01.20-15.54.23
+
+However, it is often more convenient to access the files directly.
+Files pertaining to all scenarios in a run reside in the top-level
+directory (2020.01.20-15.54.23 in the example above), while per-scenario
+files reside in a subdirectory named after the scenario (for example,
+"TREE04"). If a given scenario ran more than once (as in "--configs
+'56*TREE04'" above), the directories corresponding to the second and
+subsequent runs of that scenario include a sequence number, for example,
+"TREE04.2", "TREE04.3", and so on.
+
+The most frequently used file in the top-level directory is testid.txt.
+If the test ran in a git repository, then this file contains the commit
+that was tested and any uncommitted changes in diff format.
+
+The most frequently used files in each per-scenario-run directory are:
+
+.config: This file contains the Kconfig options.
+
+Make.out: This contains build output for a specific scenario.
+
+console.log: This contains the console output for a specific scenario.
+	This file may be examined once the kernel has booted, but
+	it might not exist if the build failed.
+
+vmlinux: This contains the kernel, which can be useful with tools like
+	objdump and gdb.
+
+A number of additional files are available, but are less frequently used.
+Many are intended for debugging of rcutorture itself or of its scripting.
+
+As of v5.4, a successful run with the default set of scenarios produces
+the following summary at the end of the run on a 12-CPU system:
+
+SRCU-N ------- 804233 GPs (148.932/s) [srcu: g10008272 f0x0 ]
+SRCU-P ------- 202320 GPs (37.4667/s) [srcud: g1809476 f0x0 ]
+SRCU-t ------- 1122086 GPs (207.794/s) [srcu: g0 f0x0 ]
+SRCU-u ------- 1111285 GPs (205.794/s) [srcud: g1 f0x0 ]
+TASKS01 ------- 19666 GPs (3.64185/s) [tasks: g0 f0x0 ]
+TASKS02 ------- 20541 GPs (3.80389/s) [tasks: g0 f0x0 ]
+TASKS03 ------- 19416 GPs (3.59556/s) [tasks: g0 f0x0 ]
+TINY01 ------- 836134 GPs (154.84/s) [rcu: g0 f0x0 ] n_max_cbs: 34198
+TINY02 ------- 850371 GPs (157.476/s) [rcu: g0 f0x0 ] n_max_cbs: 2631
+TREE01 ------- 162625 GPs (30.1157/s) [rcu: g1124169 f0x0 ]
+TREE02 ------- 333003 GPs (61.6672/s) [rcu: g2647753 f0x0 ] n_max_cbs: 35844
+TREE03 ------- 306623 GPs (56.782/s) [rcu: g2975325 f0x0 ] n_max_cbs: 1496497
+CPU count limited from 16 to 12
+TREE04 ------- 246149 GPs (45.5831/s) [rcu: g1695737 f0x0 ] n_max_cbs: 434961
+TREE05 ------- 314603 GPs (58.2598/s) [rcu: g2257741 f0x2 ] n_max_cbs: 193997
+TREE07 ------- 167347 GPs (30.9902/s) [rcu: g1079021 f0x0 ] n_max_cbs: 478732
+CPU count limited from 16 to 12
+TREE09 ------- 752238 GPs (139.303/s) [rcu: g13075057 f0x0 ] n_max_cbs: 99011
@@ -4005,6 +4005,15 @@
			Set threshold of queued RCU callbacks below which
			batch limiting is re-enabled.

+	rcutree.qovld= [KNL]
+			Set threshold of queued RCU callbacks beyond which
+			RCU's force-quiescent-state scan will aggressively
+			enlist help from cond_resched() and sched IPIs to
+			help CPUs more quickly reach quiescent states.
+			Set to less than zero to make this be set based
+			on rcutree.qhimark at boot time and to zero to
+			disable more aggressive help enlistment.
+
	rcutree.rcu_idle_gp_delay= [KNL]
			Set wakeup interval for idle CPUs that have
			RCU callbacks (RCU_FAST_NO_HZ=y).

@@ -4220,6 +4229,12 @@
	rcupdate.rcu_cpu_stall_suppress= [KNL]
			Suppress RCU CPU stall warning messages.

+	rcupdate.rcu_cpu_stall_suppress_at_boot= [KNL]
+			Suppress RCU CPU stall warning messages and
+			rcutorture writer stall warnings that occur
+			during early boot, that is, during the time
+			before the init task is spawned.
+
	rcupdate.rcu_cpu_stall_timeout= [KNL]
			Set timeout for RCU CPU stall warning messages.

@@ -4892,6 +4907,10 @@
			topology updates sent by the hypervisor to this
			LPAR.

+	torture.disable_onoff_at_boot= [KNL]
+			Prevent the CPU-hotplug component of torturing
+			until after init has spawned.
+
	tp720=		[HW,PS2]

	tpm_suspend_pcr=[HW,TPM]
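As an illustration only (the parameter names are from the entries above; the values are arbitrary), a boot command line exercising these new parameters might contain:

	rcutree.qovld=1024 rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot=1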
@@ -185,7 +185,7 @@ As a further example, consider this sequence of events:

	===============	===============
	{ A == 1, B == 2, C == 3, P == &A, Q == &C }
	B = 4;		Q = P;
-	P = &B		D = *Q;
+	P = &B;		D = *Q;

 There is an obvious data dependency here, as the value loaded into D depends on
 the address retrieved from P by CPU 2. At the end of the sequence, any of the

@@ -569,7 +569,7 @@ following sequence of events:

	{ A == 1, B == 2, C == 3, P == &A, Q == &C }
	B = 4;
	<write barrier>
-	WRITE_ONCE(P, &B)
+	WRITE_ONCE(P, &B);
			Q = READ_ONCE(P);
			D = *Q;

@@ -1721,7 +1721,7 @@ of optimizations:
      and WRITE_ONCE() are more selective: With READ_ONCE() and
      WRITE_ONCE(), the compiler need only forget the contents of the
      indicated memory locations, while with barrier() the compiler must
-     discard the value of all memory locations that it has currented
+     discard the value of all memory locations that it has currently
      cached in any machine registers. Of course, the compiler must also
      respect the order in which the READ_ONCE()s and WRITE_ONCE()s occur,
      though the CPU of course need not do so.

@@ -1833,7 +1833,7 @@ Aside: In the case of data dependencies, the compiler would be expected
 to issue the loads in the correct order (eg. `a[b]` would have to load
 the value of b before loading a[b]), however there is no guarantee in
 the C specification that the compiler may not speculate the value of b
-(eg. is equal to 1) and load a before b (eg. tmp = a[1]; if (b != 1)
+(eg. is equal to 1) and load a[b] before b (eg. tmp = a[1]; if (b != 1)
 tmp = a[b]; ). There is also the problem of a compiler reloading b after
 having loaded a[b], thus having a newer copy of b than a[b]. A consensus
 has not yet been reached about these problems, however the READ_ONCE()
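For reference, the corrected pointer-publication sequence in the second hunk corresponds roughly to the following sketch (variable names are from the example; the two functions are illustrative, not part of memory-barriers.txt):

	int A = 1, B = 2;
	int *P = &A;

	void cpu1(void)			/* CPU 1 */
	{
		B = 4;
		smp_wmb();		/* the <write barrier> above */
		WRITE_ONCE(P, &B);	/* publish the pointer */
	}

	void cpu2(void)			/* CPU 2 */
	{
		int *Q = READ_ONCE(P);	/* also provides dependency ordering */
		int D = *Q;		/* if Q == &B, D is guaranteed to be 4 */
	}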
@@ -2489,7 +2489,7 @@ static int nfs_access_get_cached_rcu(struct inode *inode, const struct cred *cred,
	rcu_read_lock();
	if (nfsi->cache_validity & NFS_INO_INVALID_ACCESS)
		goto out;
-	lh = rcu_dereference(nfsi->access_cache_entry_lru.prev);
+	lh = rcu_dereference(list_tail_rcu(&nfsi->access_cache_entry_lru));
	cache = list_entry(lh, struct nfs_access_entry, lru);
	if (lh == &nfsi->access_cache_entry_lru ||
	    cred_fscmp(cred, cache->cred) != 0)
@@ -60,9 +60,9 @@ static inline void INIT_LIST_HEAD_RCU(struct list_head *list)
 #define __list_check_rcu(dummy, cond, extra...)				\
	({								\
	check_arg_count_one(extra);					\
-	RCU_LOCKDEP_WARN(!cond && !rcu_read_lock_any_held(),		\
+	RCU_LOCKDEP_WARN(!(cond) && !rcu_read_lock_any_held(),		\
			 "RCU-list traversed in non-reader section!");	\
	})
 #else
 #define __list_check_rcu(dummy, cond, extra...)				\
	({ check_arg_count_one(extra); })
@@ -83,6 +83,7 @@ void rcu_scheduler_starting(void);
 static inline void rcu_scheduler_starting(void) { }
 #endif /* #else #ifndef CONFIG_SRCU */
 static inline void rcu_end_inkernel_boot(void) { }
+static inline bool rcu_inkernel_boot_has_ended(void) { return true; }
 static inline bool rcu_is_watching(void) { return true; }
 static inline void rcu_momentary_dyntick_idle(void) { }
 static inline void kfree_rcu_scheduler_running(void) { }
@@ -54,6 +54,7 @@ void exit_rcu(void);
 void rcu_scheduler_starting(void);
 extern int rcu_scheduler_active __read_mostly;
 void rcu_end_inkernel_boot(void);
+bool rcu_inkernel_boot_has_ended(void);
 bool rcu_is_watching(void);
 #ifndef CONFIG_PREEMPTION
 void rcu_all_qs(void);
@@ -164,7 +164,7 @@ static inline void destroy_timer_on_stack(struct timer_list *timer) { }
  */
 static inline int timer_pending(const struct timer_list * timer)
 {
-	return timer->entry.pprev != NULL;
+	return !hlist_unhashed_lockless(&timer->entry);
 }

 extern void add_timer_on(struct timer_list *timer, int cpu);
@@ -623,6 +623,34 @@ TRACE_EVENT_RCU(rcu_invoke_kfree_callback,
		  __entry->rcuname, __entry->rhp, __entry->offset)
 );

+/*
+ * Tracepoint for the invocation of a single RCU callback of the special
+ * kfree_bulk() form. The first argument is the RCU flavor, the second
+ * argument is the number of elements in the array to free, and the third
+ * is the address of the array holding nr_records entries.
+ */
+TRACE_EVENT_RCU(rcu_invoke_kfree_bulk_callback,
+
+	TP_PROTO(const char *rcuname, unsigned long nr_records, void **p),
+
+	TP_ARGS(rcuname, nr_records, p),
+
+	TP_STRUCT__entry(
+		__field(const char *, rcuname)
+		__field(unsigned long, nr_records)
+		__field(void **, p)
+	),
+
+	TP_fast_assign(
+		__entry->rcuname = rcuname;
+		__entry->nr_records = nr_records;
+		__entry->p = p;
+	),
+
+	TP_printk("%s bulk=0x%p nr_records=%lu",
+		__entry->rcuname, __entry->p, __entry->nr_records)
+);
+
 /*
  * Tracepoint for exiting rcu_do_batch after RCU callbacks have been
  * invoked. The first argument is the name of the RCU flavor,
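The new tracepoint is fired through its generated trace_rcu_invoke_kfree_bulk_callback() helper just before the bulk free. A rough sketch of a call site (the local variable names here are illustrative):

	trace_rcu_invoke_kfree_bulk_callback(rcu_state.name, nr_records, records);
	kfree_bulk(nr_records, records);	/* free the whole batch at once */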
@@ -712,6 +740,7 @@ TRACE_EVENT_RCU(rcu_torture_read,
 *	"Begin": rcu_barrier() started.
 *	"EarlyExit": rcu_barrier() piggybacked, thus early exit.
 *	"Inc1": rcu_barrier() piggyback check counter incremented.
+*	"OfflineNoCBQ": rcu_barrier() found offline no-CBs CPU with callbacks.
 *	"OnlineQ": rcu_barrier() found online CPU with callbacks.
 *	"OnlineNQ": rcu_barrier() found online CPU, no callbacks.
 *	"IRQ": An rcu_barrier_callback() callback posted on remote CPU.
@@ -618,7 +618,7 @@ static struct lock_torture_ops percpu_rwsem_lock_ops = {
 static int lock_torture_writer(void *arg)
 {
	struct lock_stress_stats *lwsp = arg;
-	static DEFINE_TORTURE_RANDOM(rand);
+	DEFINE_TORTURE_RANDOM(rand);

	VERBOSE_TOROUT_STRING("lock_torture_writer task started");
	set_user_nice(current, MAX_NICE);

@@ -655,7 +655,7 @@ static int lock_torture_writer(void *arg)
 static int lock_torture_reader(void *arg)
 {
	struct lock_stress_stats *lrsp = arg;
-	static DEFINE_TORTURE_RANDOM(rand);
+	DEFINE_TORTURE_RANDOM(rand);

	VERBOSE_TOROUT_STRING("lock_torture_reader task started");
	set_user_nice(current, MAX_NICE);

@@ -696,15 +696,16 @@ static void __torture_print_stats(char *page,
		if (statp[i].n_lock_fail)
			fail = true;
		sum += statp[i].n_lock_acquired;
-		if (max < statp[i].n_lock_fail)
-			max = statp[i].n_lock_fail;
-		if (min > statp[i].n_lock_fail)
-			min = statp[i].n_lock_fail;
+		if (max < statp[i].n_lock_acquired)
+			max = statp[i].n_lock_acquired;
+		if (min > statp[i].n_lock_acquired)
+			min = statp[i].n_lock_acquired;
	}
	page += sprintf(page,
			"%s:  Total: %lld  Max/Min: %ld/%ld %s  Fail: %d %s\n",
			write ? "Writes" : "Reads ",
-			sum, max, min, max / 2 > min ? "???" : "",
+			sum, max, min,
+			!onoff_interval && max / 2 > min ? "???" : "",
			fail, fail ? "!!!" : "");
	if (fail)
		atomic_inc(&cxt.n_lock_torture_errors);
@@ -57,7 +57,7 @@ rt_mutex_set_owner(struct rt_mutex *lock, struct task_struct *owner)
	if (rt_mutex_has_waiters(lock))
		val |= RT_MUTEX_HAS_WAITERS;

-	lock->owner = (struct task_struct *)val;
+	WRITE_ONCE(lock->owner, (struct task_struct *)val);
 }

 static inline void clear_rt_mutex_waiters(struct rt_mutex *lock)
@@ -3,6 +3,10 @@
 # and is generally not a function of system call inputs.
 KCOV_INSTRUMENT := n

+ifeq ($(CONFIG_KCSAN),y)
+KBUILD_CFLAGS += -g -fno-omit-frame-pointer
+endif
+
 obj-y += update.o sync.o
 obj-$(CONFIG_TREE_SRCU) += srcutree.o
 obj-$(CONFIG_TINY_SRCU) += srcutiny.o
@@ -198,6 +198,13 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
 }
 #endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */

+extern int rcu_cpu_stall_suppress_at_boot;
+
+static inline bool rcu_stall_is_suppressed_at_boot(void)
+{
+	return rcu_cpu_stall_suppress_at_boot && !rcu_inkernel_boot_has_ended();
+}
+
 #ifdef CONFIG_RCU_STALL_COMMON

 extern int rcu_cpu_stall_ftrace_dump;

@@ -205,6 +212,11 @@ extern int rcu_cpu_stall_suppress;
 extern int rcu_cpu_stall_timeout;
 int rcu_jiffies_till_stall_check(void);

+static inline bool rcu_stall_is_suppressed(void)
+{
+	return rcu_stall_is_suppressed_at_boot() || rcu_cpu_stall_suppress;
+}
+
 #define rcu_ftrace_dump_stall_suppress() \
 do { \
	if (!rcu_cpu_stall_suppress) \

@@ -218,6 +230,11 @@ do { \
 } while (0)

 #else /* #endif #ifdef CONFIG_RCU_STALL_COMMON */

+static inline bool rcu_stall_is_suppressed(void)
+{
+	return rcu_stall_is_suppressed_at_boot();
+}
+
 #define rcu_ftrace_dump_stall_suppress()
 #define rcu_ftrace_dump_stall_unsuppress()
 #endif /* #ifdef CONFIG_RCU_STALL_COMMON */

@@ -325,7 +342,8 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
 * Iterate over all possible CPUs in a leaf RCU node.
 */
 #define for_each_leaf_node_possible_cpu(rnp, cpu) \
-	for ((cpu) = cpumask_next((rnp)->grplo - 1, cpu_possible_mask); \
+	for (WARN_ON_ONCE(!rcu_is_leaf_node(rnp)), \
+	     (cpu) = cpumask_next((rnp)->grplo - 1, cpu_possible_mask); \
	     (cpu) <= rnp->grphi; \
	     (cpu) = cpumask_next((cpu), cpu_possible_mask))

@@ -335,7 +353,8 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
 #define rcu_find_next_bit(rnp, cpu, mask) \
	((rnp)->grplo + find_next_bit(&(mask), BITS_PER_LONG, (cpu)))
 #define for_each_leaf_node_cpu_mask(rnp, cpu, mask) \
-	for ((cpu) = rcu_find_next_bit((rnp), 0, (mask)); \
+	for (WARN_ON_ONCE(!rcu_is_leaf_node(rnp)), \
+	     (cpu) = rcu_find_next_bit((rnp), 0, (mask)); \
	     (cpu) <= rnp->grphi; \
	     (cpu) = rcu_find_next_bit((rnp), (cpu) + 1 - (rnp->grplo), (mask)))
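The WARN_ON_ONCE() added to these iterators fires if they are handed a non-leaf rcu_node structure. A minimal (hypothetical) use site looks like:

	/* Sketch: walk every possible CPU covered by a leaf rcu_node. */
	static void show_leaf_cpus(struct rcu_node *rnp)
	{
		int cpu;

		for_each_leaf_node_possible_cpu(rnp, cpu)
			pr_info("rcu_node %p covers CPU %d\n", rnp, cpu);
	}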
@@ -182,7 +182,7 @@ void rcu_segcblist_offload(struct rcu_segcblist *rsclp)
 bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp)
 {
	return rcu_segcblist_is_enabled(rsclp) &&
-	       &rsclp->head != rsclp->tails[RCU_DONE_TAIL];
+	       &rsclp->head != READ_ONCE(rsclp->tails[RCU_DONE_TAIL]);
 }

 /*

@@ -381,8 +381,6 @@ void rcu_segcblist_insert_pend_cbs(struct rcu_segcblist *rsclp,
		return; /* Nothing to do. */
	WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rclp->head);
	WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], rclp->tail);
-	rclp->head = NULL;
-	rclp->tail = &rclp->head;
 }

 /*
@@ -12,6 +12,7 @@
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
+#include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/kthread.h>
 #include <linux/err.h>

@@ -611,6 +612,7 @@ kfree_perf_thread(void *arg)
	long me = (long)arg;
	struct kfree_obj *alloc_ptr;
	u64 start_time, end_time;
+	long long mem_begin, mem_during = 0;

	VERBOSE_PERFOUT_STRING("kfree_perf_thread task started");
	set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));

@@ -626,6 +628,12 @@ kfree_perf_thread(void *arg)
	}

	do {
+		if (!mem_during) {
+			mem_during = mem_begin = si_mem_available();
+		} else if (loop % (kfree_loops / 4) == 0) {
+			mem_during = (mem_during + si_mem_available()) / 2;
+		}
+
		for (i = 0; i < kfree_alloc_num; i++) {
			alloc_ptr = kmalloc(sizeof(struct kfree_obj), GFP_KERNEL);
			if (!alloc_ptr)

@@ -645,9 +653,11 @@ kfree_perf_thread(void *arg)
	else
		b_rcu_gp_test_finished = cur_ops->get_gp_seq();

-	pr_alert("Total time taken by all kfree'ers: %llu ns, loops: %d, batches: %ld\n",
+	pr_alert("Total time taken by all kfree'ers: %llu ns, loops: %d, batches: %ld, memory footprint: %lldMB\n",
		 (unsigned long long)(end_time - start_time), kfree_loops,
-		 rcuperf_seq_diff(b_rcu_gp_test_finished, b_rcu_gp_test_started));
+		 rcuperf_seq_diff(b_rcu_gp_test_finished, b_rcu_gp_test_started),
+		 (mem_begin - mem_during) >> (20 - PAGE_SHIFT));

	if (shutdown) {
		smp_mb(); /* Assign before wake. */
		wake_up(&shutdown_wq);
@@ -339,7 +339,7 @@ rcu_read_delay(struct torture_random_state *rrsp, struct rt_read_seg *rtrsp)
	 * period, and we want a long delay occasionally to trigger
	 * force_quiescent_state. */

-	if (!rcu_fwd_cb_nodelay &&
+	if (!READ_ONCE(rcu_fwd_cb_nodelay) &&
	    !(torture_random(rrsp) % (nrealreaders * 2000 * longdelay_ms))) {
		started = cur_ops->get_gp_seq();
		ts = rcu_trace_clock_local();

@@ -375,11 +375,12 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
 {
	int i;

-	i = rp->rtort_pipe_count;
+	i = READ_ONCE(rp->rtort_pipe_count);
	if (i > RCU_TORTURE_PIPE_LEN)
		i = RCU_TORTURE_PIPE_LEN;
	atomic_inc(&rcu_torture_wcount[i]);
-	if (++rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
+	WRITE_ONCE(rp->rtort_pipe_count, i + 1);
+	if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
		rp->rtort_mbtest = 0;
		return true;
	}

@@ -1015,7 +1016,8 @@ rcu_torture_writer(void *arg)
			if (i > RCU_TORTURE_PIPE_LEN)
				i = RCU_TORTURE_PIPE_LEN;
			atomic_inc(&rcu_torture_wcount[i]);
-			old_rp->rtort_pipe_count++;
+			WRITE_ONCE(old_rp->rtort_pipe_count,
+				   old_rp->rtort_pipe_count + 1);
			switch (synctype[torture_random(&rand) % nsynctypes]) {
			case RTWS_DEF_FREE:
				rcu_torture_writer_state = RTWS_DEF_FREE;

@@ -1067,7 +1069,8 @@ rcu_torture_writer(void *arg)
		if (stutter_wait("rcu_torture_writer") &&
		    !READ_ONCE(rcu_fwd_cb_nodelay) &&
		    !cur_ops->slow_gps &&
-		    !torture_must_stop())
+		    !torture_must_stop() &&
+		    rcu_inkernel_boot_has_ended())
			for (i = 0; i < ARRAY_SIZE(rcu_tortures); i++)
				if (list_empty(&rcu_tortures[i].rtort_free) &&
				    rcu_access_pointer(rcu_torture_current) !=

@@ -1290,7 +1293,7 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp)
		atomic_inc(&n_rcu_torture_mberror);
	rtrsp = rcutorture_loop_extend(&readstate, trsp, rtrsp);
	preempt_disable();
-	pipe_count = p->rtort_pipe_count;
+	pipe_count = READ_ONCE(p->rtort_pipe_count);
	if (pipe_count > RCU_TORTURE_PIPE_LEN) {
		/* Should not happen, but... */
		pipe_count = RCU_TORTURE_PIPE_LEN;

@@ -1404,14 +1407,15 @@ rcu_torture_stats_print(void)
	int i;
	long pipesummary[RCU_TORTURE_PIPE_LEN + 1] = { 0 };
	long batchsummary[RCU_TORTURE_PIPE_LEN + 1] = { 0 };
+	struct rcu_torture *rtcp;
	static unsigned long rtcv_snap = ULONG_MAX;
	static bool splatted;
	struct task_struct *wtp;

	for_each_possible_cpu(cpu) {
		for (i = 0; i < RCU_TORTURE_PIPE_LEN + 1; i++) {
-			pipesummary[i] += per_cpu(rcu_torture_count, cpu)[i];
-			batchsummary[i] += per_cpu(rcu_torture_batch, cpu)[i];
+			pipesummary[i] += READ_ONCE(per_cpu(rcu_torture_count, cpu)[i]);
+			batchsummary[i] += READ_ONCE(per_cpu(rcu_torture_batch, cpu)[i]);
		}
	}
	for (i = RCU_TORTURE_PIPE_LEN - 1; i >= 0; i--) {

@@ -1420,9 +1424,10 @@ rcu_torture_stats_print(void)
	}

	pr_alert("%s%s ", torture_type, TORTURE_FLAG);
+	rtcp = rcu_access_pointer(rcu_torture_current);
	pr_cont("rtc: %p %s: %lu tfle: %d rta: %d rtaf: %d rtf: %d ",
-		rcu_torture_current,
-		rcu_torture_current ? "ver" : "VER",
+		rtcp,
+		rtcp && !rcu_stall_is_suppressed_at_boot() ? "ver" : "VER",
		rcu_torture_current_version,
		list_empty(&rcu_torture_freelist),
		atomic_read(&n_rcu_torture_alloc),

@@ -1478,7 +1483,8 @@ rcu_torture_stats_print(void)
	if (cur_ops->stats)
		cur_ops->stats();
	if (rtcv_snap == rcu_torture_current_version &&
-	    rcu_torture_current != NULL) {
+	    rcu_access_pointer(rcu_torture_current) &&
+	    !rcu_stall_is_suppressed()) {
		int __maybe_unused flags = 0;
		unsigned long __maybe_unused gp_seq = 0;

@@ -1993,8 +1999,11 @@ static int rcu_torture_fwd_prog(void *args)
		schedule_timeout_interruptible(fwd_progress_holdoff * HZ);
		WRITE_ONCE(rcu_fwd_emergency_stop, false);
		register_oom_notifier(&rcutorture_oom_nb);
-		rcu_torture_fwd_prog_nr(rfp, &tested, &tested_tries);
-		rcu_torture_fwd_prog_cr(rfp);
+		if (!IS_ENABLED(CONFIG_TINY_RCU) ||
+		    rcu_inkernel_boot_has_ended())
+			rcu_torture_fwd_prog_nr(rfp, &tested, &tested_tries);
+		if (rcu_inkernel_boot_has_ended())
+			rcu_torture_fwd_prog_cr(rfp);
		unregister_oom_notifier(&rcutorture_oom_nb);

		/* Avoid slow periods, better to test when busy. */

@@ -2044,6 +2053,14 @@ static void rcu_torture_barrier_cbf(struct rcu_head *rcu)
	atomic_inc(&barrier_cbs_invoked);
 }

+/* IPI handler to get callback posted on desired CPU, if online. */
+static void rcu_torture_barrier1cb(void *rcu_void)
+{
+	struct rcu_head *rhp = rcu_void;
+
+	cur_ops->call(rhp, rcu_torture_barrier_cbf);
+}
+
 /* kthread function to register callbacks used to test RCU barriers. */
 static int rcu_torture_barrier_cbs(void *arg)
 {

@@ -2067,9 +2084,11 @@ static int rcu_torture_barrier_cbs(void *arg)
		 * The above smp_load_acquire() ensures barrier_phase load
		 * is ordered before the following ->call().
		 */
-		local_irq_disable(); /* Just to test no-irq call_rcu(). */
-		cur_ops->call(&rcu, rcu_torture_barrier_cbf);
-		local_irq_enable();
+		if (smp_call_function_single(myid, rcu_torture_barrier1cb,
+					     &rcu, 1)) {
+			// IPI failed, so use direct call from current CPU.
+			cur_ops->call(&rcu, rcu_torture_barrier_cbf);
+		}
		if (atomic_dec_and_test(&barrier_cbs_count))
			wake_up(&barrier_wq);
	} while (!torture_must_stop());

@@ -2105,7 +2124,21 @@ static int rcu_torture_barrier(void *arg)
			pr_err("barrier_cbs_invoked = %d, n_barrier_cbs = %d\n",
			       atomic_read(&barrier_cbs_invoked),
			       n_barrier_cbs);
-			WARN_ON_ONCE(1);
+			WARN_ON(1);
+			// Wait manually for the remaining callbacks
+			i = 0;
+			do {
+				if (WARN_ON(i++ > HZ))
+					i = INT_MIN;
+				schedule_timeout_interruptible(1);
+				cur_ops->cb_barrier();
+			} while (atomic_read(&barrier_cbs_invoked) !=
+				 n_barrier_cbs &&
+				 !torture_must_stop());
+			smp_mb(); // Can't trust ordering if broken.
+			if (!torture_must_stop())
+				pr_err("Recovered: barrier_cbs_invoked = %d\n",
+				       atomic_read(&barrier_cbs_invoked));
		} else {
			n_barrier_successes++;
		}
@@ -5,7 +5,7 @@
 * Copyright (C) IBM Corporation, 2006
 * Copyright (C) Fujitsu, 2012
 *
- * Author: Paul McKenney <paulmck@linux.ibm.com>
+ * Authors: Paul McKenney <paulmck@linux.ibm.com>
+ *	   Lai Jiangshan <laijs@cn.fujitsu.com>
 *
 * For detailed explanation of Read-Copy Update mechanism see -

@@ -450,7 +450,7 @@ static void srcu_gp_start(struct srcu_struct *ssp)
	spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */
	smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
	rcu_seq_start(&ssp->srcu_gp_seq);
-	state = rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq));
+	state = rcu_seq_state(ssp->srcu_gp_seq);
	WARN_ON_ONCE(state != SRCU_STATE_SCAN1);
 }

@@ -534,7 +534,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
	rcu_seq_end(&ssp->srcu_gp_seq);
	gpseq = rcu_seq_current(&ssp->srcu_gp_seq);
	if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, gpseq))
-		ssp->srcu_gp_seq_needed_exp = gpseq;
+		WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, gpseq);
	spin_unlock_irq_rcu_node(ssp);
	mutex_unlock(&ssp->srcu_gp_mutex);
	/* A new grace period can start at this point. But only one. */

@@ -550,7 +550,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
		snp->srcu_have_cbs[idx] = gpseq;
		rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
		if (ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, gpseq))
-			snp->srcu_gp_seq_needed_exp = gpseq;
+			WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
		mask = snp->srcu_data_have_cbs[idx];
		snp->srcu_data_have_cbs[idx] = 0;
		spin_unlock_irq_rcu_node(snp);

@@ -614,7 +614,7 @@ static void srcu_funnel_exp_start(struct srcu_struct *ssp, struct srcu_node *snp,
	}
	spin_lock_irqsave_rcu_node(ssp, flags);
	if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, s))
-		ssp->srcu_gp_seq_needed_exp = s;
+		WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, s);
	spin_unlock_irqrestore_rcu_node(ssp, flags);
 }

@@ -660,7 +660,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
		if (snp == sdp->mynode)
			snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
		if (!do_norm && ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, s))
-			snp->srcu_gp_seq_needed_exp = s;
+			WRITE_ONCE(snp->srcu_gp_seq_needed_exp, s);
		spin_unlock_irqrestore_rcu_node(snp, flags);
	}

@@ -674,7 +674,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *ssp, struct srcu_data *sdp,
		smp_store_release(&ssp->srcu_gp_seq_needed, s); /*^^^*/
	}
	if (!do_norm && ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, s))
-		ssp->srcu_gp_seq_needed_exp = s;
+		WRITE_ONCE(ssp->srcu_gp_seq_needed_exp, s);

	/* If grace period not already done and none in progress, start it. */
	if (!rcu_seq_done(&ssp->srcu_gp_seq, s) &&

@@ -1079,7 +1079,7 @@ EXPORT_SYMBOL_GPL(srcu_barrier);
 */
 unsigned long srcu_batches_completed(struct srcu_struct *ssp)
 {
-	return ssp->srcu_idx;
+	return READ_ONCE(ssp->srcu_idx);
 }
 EXPORT_SYMBOL_GPL(srcu_batches_completed);

@@ -1130,7 +1130,9 @@ static void srcu_advance_state(struct srcu_struct *ssp)
			return; /* readers present, retry later. */
		}
		srcu_flip(ssp);
+		spin_lock_irq_rcu_node(ssp);
		rcu_seq_set_state(&ssp->srcu_gp_seq, SRCU_STATE_SCAN2);
+		spin_unlock_irq_rcu_node(ssp);
	}

	if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)) == SRCU_STATE_SCAN2) {
Some files were not shown because too many files have changed in this diff.