mirror of
https://github.com/linux-apfs/linux-apfs.git
synced 2026-05-01 15:00:59 -07:00
Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes are:

   - Debloat RCU headers

   - Parallelize SRCU callback handling (plus overlapping patches)

   - Improve the performance of Tree SRCU on a CPU-hotplug stress test

   - Documentation updates

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
  rcu: Open-code the rcu_cblist_n_lazy_cbs() function
  rcu: Open-code the rcu_cblist_n_cbs() function
  rcu: Open-code the rcu_cblist_empty() function
  rcu: Separately compile large rcu_segcblist functions
  srcu: Debloat the <linux/rcu_segcblist.h> header
  srcu: Adjust default auto-expediting holdoff
  srcu: Specify auto-expedite holdoff time
  srcu: Expedite first synchronize_srcu() when idle
  srcu: Expedited grace periods with reduced memory contention
  srcu: Make rcutorture writer stalls print SRCU GP state
  srcu: Exact tracking of srcu_data structures containing callbacks
  srcu: Make SRCU be built by default
  srcu: Fix Kconfig botch when SRCU not selected
  rcu: Make non-preemptive schedule be Tasks RCU quiescent state
  srcu: Expedite srcu_schedule_cbs_snp() callback invocation
  srcu: Parallelize callback handling
  kvm: Move srcu_struct fields to end of struct kvm
  rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
  rcu: Use true/false in assignment to bool
  rcu: Use bool value directly
  ...
@@ -17,7 +17,7 @@ rcu_dereference.txt
 rcubarrier.txt
 	- RCU and Unloadable Modules
 rculist_nulls.txt
-	- RCU list primitives for use with SLAB_DESTROY_BY_RCU
+	- RCU list primitives for use with SLAB_TYPESAFE_BY_RCU
 rcuref.txt
 	- Reference-count design for elements of lists/arrays protected by RCU
 rcu.txt
@@ -19,6 +19,8 @@ to each other.
 The <tt>rcu_state</tt> Structure</a>
 <li>	<a href="#The rcu_node Structure">
 The <tt>rcu_node</tt> Structure</a>
+<li>	<a href="#The rcu_segcblist Structure">
+The <tt>rcu_segcblist</tt> Structure</a>
 <li>	<a href="#The rcu_data Structure">
 The <tt>rcu_data</tt> Structure</a>
 <li>	<a href="#The rcu_dynticks Structure">
@@ -841,6 +843,134 @@ for lockdep lock-class names.
 Finally, lines 64-66 produce an error if the maximum number of
 CPUs is too large for the specified fanout.
 
+<h3><a name="The rcu_segcblist Structure">
+The <tt>rcu_segcblist</tt> Structure</a></h3>
+
+The <tt>rcu_segcblist</tt> structure maintains a segmented list of
+callbacks as follows:
+
+<pre>
+  1 #define RCU_DONE_TAIL        0
+  2 #define RCU_WAIT_TAIL        1
+  3 #define RCU_NEXT_READY_TAIL  2
+  4 #define RCU_NEXT_TAIL        3
+  5 #define RCU_CBLIST_NSEGS     4
+  6
+  7 struct rcu_segcblist {
+  8   struct rcu_head *head;
+  9   struct rcu_head **tails[RCU_CBLIST_NSEGS];
+ 10   unsigned long gp_seq[RCU_CBLIST_NSEGS];
+ 11   long len;
+ 12   long len_lazy;
+ 13 };
+</pre>
+
+<p>
+The segments are as follows:
+
+<ol>
+<li>	<tt>RCU_DONE_TAIL</tt>: Callbacks whose grace periods have elapsed.
+	These callbacks are ready to be invoked.
+<li>	<tt>RCU_WAIT_TAIL</tt>: Callbacks that are waiting for the
+	current grace period.
+	Note that different CPUs can have different ideas about which
+	grace period is current, hence the <tt>->gp_seq</tt> field.
+<li>	<tt>RCU_NEXT_READY_TAIL</tt>: Callbacks waiting for the next
+	grace period to start.
+<li>	<tt>RCU_NEXT_TAIL</tt>: Callbacks that have not yet been
+	associated with a grace period.
+</ol>
+
+<p>
+The <tt>->head</tt> pointer references the first callback or
+is <tt>NULL</tt> if the list contains no callbacks (which is
+<i>not</i> the same as being empty).
+Each element of the <tt>->tails[]</tt> array references the
+<tt>->next</tt> pointer of the last callback in the corresponding
+segment of the list, or the list's <tt>->head</tt> pointer if
+that segment and all previous segments are empty.
+If the corresponding segment is empty but some previous segment is
+not empty, then the array element is identical to its predecessor.
+Older callbacks are closer to the head of the list, and new callbacks
+are added at the tail.
+This relationship between the <tt>->head</tt> pointer, the
+<tt>->tails[]</tt> array, and the callbacks is shown in this
+diagram:
+
+</p><p><img src="nxtlist.svg" alt="nxtlist.svg" width="40%">
+
+</p><p>In this figure, the <tt>->head</tt> pointer references the
+first
+RCU callback in the list.
+The <tt>->tails[RCU_DONE_TAIL]</tt> array element references
+the <tt>->head</tt> pointer itself, indicating that none
+of the callbacks is ready to invoke.
+The <tt>->tails[RCU_WAIT_TAIL]</tt> array element references callback
+CB 2's <tt>->next</tt> pointer, which indicates that
+CB 1 and CB 2 are both waiting on the current grace period,
+give or take possible disagreements about exactly which grace period
+is the current one.
+The <tt>->tails[RCU_NEXT_READY_TAIL]</tt> array element
+references the same RCU callback that <tt>->tails[RCU_WAIT_TAIL]</tt>
+does, which indicates that there are no callbacks waiting on the next
+RCU grace period.
+The <tt>->tails[RCU_NEXT_TAIL]</tt> array element references
+CB 4's <tt>->next</tt> pointer, indicating that all the
+remaining RCU callbacks have not yet been assigned to an RCU grace
+period.
+Note that the <tt>->tails[RCU_NEXT_TAIL]</tt> array element
+always references the last RCU callback's <tt>->next</tt> pointer
+unless the callback list is empty, in which case it references
+the <tt>->head</tt> pointer.
+
+<p>
+There is one additional important special case for the
+<tt>->tails[RCU_NEXT_TAIL]</tt> array element: It can be <tt>NULL</tt>
+when this list is <i>disabled</i>.
+Lists are disabled when the corresponding CPU is offline or when
+the corresponding CPU's callbacks are offloaded to a kthread,
+both of which are described elsewhere.
+
+</p><p>CPUs advance their callbacks from the
+<tt>RCU_NEXT_TAIL</tt> to the <tt>RCU_NEXT_READY_TAIL</tt> to the
+<tt>RCU_WAIT_TAIL</tt> to the <tt>RCU_DONE_TAIL</tt> list segments
+as grace periods advance.
+
+</p><p>The <tt>->gp_seq[]</tt> array records grace-period
+numbers corresponding to the list segments.
+This is what allows different CPUs to have different ideas as to
+which is the current grace period while still avoiding premature
+invocation of their callbacks.
+In particular, this allows CPUs that go idle for extended periods
+to determine which of their callbacks are ready to be invoked after
+reawakening.
+
+</p><p>The <tt>->len</tt> counter contains the number of
+callbacks in <tt>->head</tt>, and the
+<tt>->len_lazy</tt> contains the number of those callbacks that
+are known to only free memory, and whose invocation can therefore
+be safely deferred.
+
+<p><b>Important note</b>: It is the <tt>->len</tt> field that
+determines whether or not there are callbacks associated with
+this <tt>rcu_segcblist</tt> structure, <i>not</i> the <tt>->head</tt>
+pointer.
+The reason for this is that all the ready-to-invoke callbacks
+(that is, those in the <tt>RCU_DONE_TAIL</tt> segment) are extracted
+all at once at callback-invocation time.
+If callback invocation must be postponed, for example, because a
+high-priority process just woke up on this CPU, then the remaining
+callbacks are placed back on the <tt>RCU_DONE_TAIL</tt> segment.
+Either way, the <tt>->len</tt> and <tt>->len_lazy</tt> counts
+are adjusted after the corresponding callbacks have been invoked, and so
+again it is the <tt>->len</tt> count that accurately reflects whether
+or not there are callbacks associated with this <tt>rcu_segcblist</tt>
+structure.
+Of course, off-CPU sampling of the <tt>->len</tt> count requires
+the use of appropriate synchronization, for example, memory barriers.
+This synchronization can be a bit subtle, particularly in the case
+of <tt>rcu_barrier()</tt>.
+
 <h3><a name="The rcu_data Structure">
 The <tt>rcu_data</tt> Structure</a></h3>
 
@@ -983,62 +1113,18 @@ choice.
 as follows:
 
 <pre>
-  1 struct rcu_head *nxtlist;
-  2 struct rcu_head **nxttail[RCU_NEXT_SIZE];
-  3 unsigned long nxtcompleted[RCU_NEXT_SIZE];
-  4 long qlen_lazy;
-  5 long qlen;
-  6 long qlen_last_fqs_check;
+  1 struct rcu_segcblist cblist;
+  2 long qlen_last_fqs_check;
+  3 unsigned long n_cbs_invoked;
+  4 unsigned long n_nocbs_invoked;
+  5 unsigned long n_cbs_orphaned;
+  6 unsigned long n_cbs_adopted;
   7 unsigned long n_force_qs_snap;
-  8 unsigned long n_cbs_invoked;
-  9 unsigned long n_cbs_orphaned;
- 10 unsigned long n_cbs_adopted;
- 11 long blimit;
+  8 long blimit;
 </pre>
 
-<p>The <tt>->nxtlist</tt> pointer and the
-<tt>->nxttail[]</tt> array form a four-segment list with
-older callbacks near the head and newer ones near the tail.
-Each segment contains callbacks with the corresponding relationship
-to the current grace period.
-The pointer out of the end of each of the four segments is referenced
-by the element of the <tt>->nxttail[]</tt> array indexed by
-<tt>RCU_DONE_TAIL</tt> (for callbacks handled by a prior grace period),
-<tt>RCU_WAIT_TAIL</tt> (for callbacks waiting on the current grace period),
-<tt>RCU_NEXT_READY_TAIL</tt> (for callbacks that will wait on the next
-grace period), and
-<tt>RCU_NEXT_TAIL</tt> (for callbacks that are not yet associated
-with a specific grace period)
-respectively, as shown in the following figure.
-
-</p><p><img src="nxtlist.svg" alt="nxtlist.svg" width="40%">
-
-</p><p>In this figure, the <tt>->nxtlist</tt> pointer references the
-first
-RCU callback in the list.
-The <tt>->nxttail[RCU_DONE_TAIL]</tt> array element references
-the <tt>->nxtlist</tt> pointer itself, indicating that none
-of the callbacks is ready to invoke.
-The <tt>->nxttail[RCU_WAIT_TAIL]</tt> array element references callback
-CB 2's <tt>->next</tt> pointer, which indicates that
-CB 1 and CB 2 are both waiting on the current grace period.
-The <tt>->nxttail[RCU_NEXT_READY_TAIL]</tt> array element
-references the same RCU callback that <tt>->nxttail[RCU_WAIT_TAIL]</tt>
-does, which indicates that there are no callbacks waiting on the next
-RCU grace period.
-The <tt>->nxttail[RCU_NEXT_TAIL]</tt> array element references
-CB 4's <tt>->next</tt> pointer, indicating that all the
-remaining RCU callbacks have not yet been assigned to an RCU grace
-period.
-Note that the <tt>->nxttail[RCU_NEXT_TAIL]</tt> array element
-always references the last RCU callback's <tt>->next</tt> pointer
-unless the callback list is empty, in which case it references
-the <tt>->nxtlist</tt> pointer.
-
-</p><p>CPUs advance their callbacks from the
-<tt>RCU_NEXT_TAIL</tt> to the <tt>RCU_NEXT_READY_TAIL</tt> to the
-<tt>RCU_WAIT_TAIL</tt> to the <tt>RCU_DONE_TAIL</tt> list segments
-as grace periods advance.
+<p>The <tt>->cblist</tt> structure is the segmented callback list
+described earlier.
 The CPU advances the callbacks in its <tt>rcu_data</tt> structure
 whenever it notices that another RCU grace period has completed.
 The CPU detects the completion of an RCU grace period by noticing
@@ -1049,16 +1135,7 @@ Recall that each <tt>rcu_node</tt> structure's
 <tt>->completed</tt> field is updated at the end of each
 grace period.
 
-</p><p>The <tt>->nxtcompleted[]</tt> array records grace-period
-numbers corresponding to the list segments.
-This allows CPUs that go idle for extended periods to determine
-which of their callbacks are ready to be invoked after reawakening.
-
-</p><p>The <tt>->qlen</tt> counter contains the number of
-callbacks in <tt>->nxtlist</tt>, and the
-<tt>->qlen_lazy</tt> contains the number of those callbacks that
-are known to only free memory, and whose invocation can therefore
-be safely deferred.
+<p>
 The <tt>->qlen_last_fqs_check</tt> and
 <tt>->n_force_qs_snap</tt> coordinate the forcing of quiescent
 states from <tt>call_rcu()</tt> and friends when callback
@@ -1069,6 +1146,10 @@ lists grow excessively long.
 fields count the number of callbacks invoked,
 sent to other CPUs when this CPU goes offline,
 and received from other CPUs when those other CPUs go offline.
+The <tt>->n_nocbs_invoked</tt> is used when the CPU's callbacks
+are offloaded to a kthread.
+
+<p>
 Finally, the <tt>->blimit</tt> counter is the maximum number of
 RCU callbacks that may be invoked at a given time.
 
@@ -1104,6 +1185,9 @@ Its fields are as follows:
   1   int dynticks_nesting;
   2   int dynticks_nmi_nesting;
   3   atomic_t dynticks;
+  4   bool rcu_need_heavy_qs;
+  5   unsigned long rcu_qs_ctr;
+  6   bool rcu_urgent_qs;
 </pre>
 
 <p>The <tt>->dynticks_nesting</tt> field counts the
@@ -1117,11 +1201,32 @@ NMIs are counted by the <tt>->dynticks_nmi_nesting</tt>
 field, except that NMIs that interrupt non-dyntick-idle execution
 are not counted.
 
-</p><p>Finally, the <tt>->dynticks</tt> field counts the corresponding
+</p><p>The <tt>->dynticks</tt> field counts the corresponding
 CPU's transitions to and from dyntick-idle mode, so that this counter
 has an even value when the CPU is in dyntick-idle mode and an odd
 value otherwise.
 
+</p><p>The <tt>->rcu_need_heavy_qs</tt> field is used
+to record the fact that the RCU core code would really like to
+see a quiescent state from the corresponding CPU, so much so that
+it is willing to call for heavy-weight dyntick-counter operations.
+This flag is checked by RCU's context-switch and <tt>cond_resched()</tt>
+code, which provide a momentary idle sojourn in response.
+
+</p><p>The <tt>->rcu_qs_ctr</tt> field is used to record
+quiescent states from <tt>cond_resched()</tt>.
+Because <tt>cond_resched()</tt> can execute quite frequently, this
+must be quite lightweight, as in a non-atomic increment of this
+per-CPU field.
+
+</p><p>Finally, the <tt>->rcu_urgent_qs</tt> field is used to record
+the fact that the RCU core code would really like to see a quiescent
+state from the corresponding CPU, with the various other fields indicating
+just how badly RCU wants this quiescent state.
+This flag is checked by RCU's context-switch and <tt>cond_resched()</tt>
+code, which, if nothing else, non-atomically increment <tt>->rcu_qs_ctr</tt>
+in response.
+
 <table>
 <tr><th> </th></tr>
 <tr><th align="left">Quick Quiz:</th></tr>
@@ -19,7 +19,7 @@
      id="svg2"
      version="1.1"
      inkscape:version="0.48.4 r9939"
-     sodipodi:docname="nxtlist.fig">
+     sodipodi:docname="segcblist.svg">
   <metadata
      id="metadata94">
     <rdf:RDF>
@@ -28,7 +28,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -241,61 +241,51 @@
        xml:space="preserve"
        x="225"
        y="675"
-       fill="#000000"
-       font-family="Courier"
        font-style="normal"
        font-weight="bold"
        font-size="324"
-       text-anchor="start"
-       id="text64">nxtlist</text>
+       id="text64"
+       style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->head</text>
     <!-- Text -->
     <text
        xml:space="preserve"
        x="225"
        y="1800"
-       fill="#000000"
-       font-family="Courier"
        font-style="normal"
        font-weight="bold"
        font-size="324"
-       text-anchor="start"
-       id="text66">nxttail[RCU_DONE_TAIL]</text>
+       id="text66"
+       style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_DONE_TAIL]</text>
     <!-- Text -->
     <text
        xml:space="preserve"
        x="225"
        y="2925"
-       fill="#000000"
-       font-family="Courier"
        font-style="normal"
        font-weight="bold"
        font-size="324"
-       text-anchor="start"
-       id="text68">nxttail[RCU_WAIT_TAIL]</text>
+       id="text68"
+       style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_WAIT_TAIL]</text>
     <!-- Text -->
     <text
        xml:space="preserve"
        x="225"
        y="4050"
-       fill="#000000"
-       font-family="Courier"
        font-style="normal"
        font-weight="bold"
        font-size="324"
-       text-anchor="start"
-       id="text70">nxttail[RCU_NEXT_READY_TAIL]</text>
+       id="text70"
+       style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_NEXT_READY_TAIL]</text>
     <!-- Text -->
     <text
        xml:space="preserve"
        x="225"
        y="5175"
-       fill="#000000"
-       font-family="Courier"
        font-style="normal"
        font-weight="bold"
        font-size="324"
-       text-anchor="start"
-       id="text72">nxttail[RCU_NEXT_TAIL]</text>
+       id="text72"
+       style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_NEXT_TAIL]</text>
     <!-- Text -->
     <text
        xml:space="preserve"
@@ -284,6 +284,7 @@ Expedited Grace Period Refinements</a></h2>
 	Funnel locking and wait/wakeup</a>.
 <li>	<a href="#Use of Workqueues">Use of Workqueues</a>.
 <li>	<a href="#Stall Warnings">Stall warnings</a>.
+<li>	<a href="#Mid-Boot Operation">Mid-boot operation</a>.
 </ol>
 
 <h3><a name="Idle-CPU Checks">Idle-CPU Checks</a></h3>
@@ -524,7 +525,7 @@ their grace periods and carrying out their wakeups.
 In earlier implementations, the task requesting the expedited
 grace period also drove it to completion.
 This straightforward approach had the disadvantage of needing to
-account for signals sent to user tasks,
+account for POSIX signals sent to user tasks,
 so more recent implementations use the Linux kernel's
 <a href="https://www.kernel.org/doc/Documentation/workqueue.txt">workqueues</a>.
 
@@ -533,8 +534,8 @@ The requesting task still does counter snapshotting and funnel-lock
 processing, but the task reaching the top of the funnel lock
 does a <tt>schedule_work()</tt> (from <tt>_synchronize_rcu_expedited()</tt>)
 so that a workqueue kthread does the actual grace-period processing.
-Because workqueue kthreads do not accept signals, grace-period-wait
-processing need not allow for signals.
+Because workqueue kthreads do not accept POSIX signals, grace-period-wait
+processing need not allow for POSIX signals.
 
 In addition, this approach allows wakeups for the previous expedited
 grace period to be overlapped with processing for the next expedited
@@ -586,6 +587,46 @@ blocking the current grace period are printed.
 Each stall warning results in another pass through the loop, but the
 second and subsequent passes use longer stall times.
 
+<h3><a name="Mid-Boot Operation">Mid-boot operation</a></h3>
+
+<p>
+The use of workqueues has the advantage that the expedited
+grace-period code need not worry about POSIX signals.
+Unfortunately, it has the
+corresponding disadvantage that workqueues cannot be used until
+they are initialized, which does not happen until some time after
+the scheduler spawns the first task.
+Given that there are parts of the kernel that really do want to
+execute grace periods during this mid-boot “dead zone”,
+expedited grace periods must do something else during this time.
+
+<p>
+What they do is to fall back to the old practice of requiring that the
+requesting task drive the expedited grace period, as was the case
+before the use of workqueues.
+However, the requesting task is only required to drive the grace period
+during the mid-boot dead zone.
+Before mid-boot, a synchronous grace period is a no-op.
+Some time after mid-boot, workqueues are used.
+
+<p>
+Non-expedited non-SRCU synchronous grace periods must also operate
+normally during mid-boot.
+This is handled by causing non-expedited grace periods to take the
+expedited code path during mid-boot.
+
+<p>
+The current code assumes that there are no POSIX signals during
+the mid-boot dead zone.
+However, if an overwhelming need for POSIX signals somehow arises,
+appropriate adjustments can be made to the expedited stall-warning code.
+One such adjustment would reinstate the pre-workqueue stall-warning
+checks, but only during the mid-boot dead zone.
+
+<p>
+With this refinement, synchronous grace periods can now be used from
+task context pretty much any time during the life of the kernel.
+
 <h3><a name="Summary">
 Summary</a></h3>
 
@@ -659,8 +659,9 @@ systems with more than one CPU:
 In other words, a given instance of <tt>synchronize_rcu()</tt>
 can avoid waiting on a given RCU read-side critical section only
 if it can prove that <tt>synchronize_rcu()</tt> started first.
+</font>
 
-<p>
+<p><font color="ffffff">
 A related question is “When <tt>rcu_read_lock()</tt>
 doesn't generate any code, why does it matter how it relates
 to a grace period?”
@@ -675,8 +676,9 @@ systems with more than one CPU:
 within the critical section, in which case none of the accesses
 within the critical section may observe the effects of any
 access following the grace period.
+</font>
 
-<p>
+<p><font color="ffffff">
 As of late 2016, mathematical models of RCU take this
 viewpoint, for example, see slides 62 and 63
 of the
@@ -1616,8 +1618,8 @@ CPUs should at least make reasonable forward progress.
 In return for its shorter latencies, <tt>synchronize_rcu_expedited()</tt>
 is permitted to impose modest degradation of real-time latency
 on non-idle online CPUs.
-That said, it will likely be necessary to take further steps to reduce this
-degradation, hopefully to roughly that of a scheduling-clock interrupt.
+Here, “modest” means roughly the same latency
+degradation as a scheduling-clock interrupt.
 
 <p>
 There are a number of situations where even
@@ -1913,12 +1915,9 @@ This requirement is another factor driving batching of grace periods,
 but it is also the driving force behind the checks for large numbers
 of queued RCU callbacks in the <tt>call_rcu()</tt> code path.
 Finally, high update rates should not delay RCU read-side critical
-sections, although some read-side delays can occur when using
+sections, although some small read-side delays can occur when using
 <tt>synchronize_rcu_expedited()</tt>, courtesy of this function's use
-of <tt>try_stop_cpus()</tt>.
-(In the future, <tt>synchronize_rcu_expedited()</tt> will be
-converted to use lighter-weight inter-processor interrupts (IPIs),
-but this will still disturb readers, though to a much smaller degree.)
+of <tt>smp_call_function_single()</tt>.
 
 <p>
 Although all three of these corner cases were understood in the early
@@ -2154,7 +2153,8 @@ as will <tt>rcu_assign_pointer()</tt>.
 <p>
 Although <tt>call_rcu()</tt> may be invoked at any
 time during boot, callbacks are not guaranteed to be invoked until after
-the scheduler is fully up and running.
+all of RCU's kthreads have been spawned, which occurs at
+<tt>early_initcall()</tt> time.
 This delay in callback invocation is due to the fact that RCU does not
 invoke callbacks until it is fully initialized, and this full initialization
 cannot occur until after the scheduler has initialized itself to the
@@ -2167,8 +2167,10 @@ on what operations those callbacks could invoke.
 Perhaps surprisingly, <tt>synchronize_rcu()</tt>,
 <a href="#Bottom-Half Flavor"><tt>synchronize_rcu_bh()</tt></a>
 (<a href="#Bottom-Half Flavor">discussed below</a>),
-and
-<a href="#Sched Flavor"><tt>synchronize_sched()</tt></a>
+<a href="#Sched Flavor"><tt>synchronize_sched()</tt></a>,
+<tt>synchronize_rcu_expedited()</tt>,
+<tt>synchronize_rcu_bh_expedited()</tt>, and
+<tt>synchronize_sched_expedited()</tt>
 will all operate normally
 during very early boot, the reason being that there is only one CPU
 and preemption is disabled.
@@ -2178,45 +2180,59 @@ state and thus a grace period, so the early-boot implementation can
 be a no-op.
 
 <p>
-Both <tt>synchronize_rcu_bh()</tt> and <tt>synchronize_sched()</tt>
-continue to operate normally through the remainder of boot, courtesy
-of the fact that preemption is disabled across their RCU read-side
-critical sections and also courtesy of the fact that there is still
-only one CPU.
-However, once the scheduler starts initializing, preemption is enabled.
-There is still only a single CPU, but the fact that preemption is enabled
-means that the no-op implementation of <tt>synchronize_rcu()</tt> no
-longer works in <tt>CONFIG_PREEMPT=y</tt> kernels.
-Therefore, as soon as the scheduler starts initializing, the early-boot
-fastpath is disabled.
-This means that <tt>synchronize_rcu()</tt> switches to its runtime
-mode of operation where it posts callbacks, which in turn means that
-any call to <tt>synchronize_rcu()</tt> will block until the corresponding
-callback is invoked.
-Unfortunately, the callback cannot be invoked until RCU's runtime
-grace-period machinery is up and running, which cannot happen until
-the scheduler has initialized itself sufficiently to allow RCU's
-kthreads to be spawned.
-Therefore, invoking <tt>synchronize_rcu()</tt> during scheduler
-initialization can result in deadlock.
+However, once the scheduler has spawned its first kthread, this early
+boot trick fails for <tt>synchronize_rcu()</tt> (as well as for
+<tt>synchronize_rcu_expedited()</tt>) in <tt>CONFIG_PREEMPT=y</tt>
+kernels.
+The reason is that an RCU read-side critical section might be preempted,
+which means that a subsequent <tt>synchronize_rcu()</tt> really does have
+to wait for something, as opposed to simply returning immediately.
+Unfortunately, <tt>synchronize_rcu()</tt> can't do this until all of
+its kthreads are spawned, which doesn't happen until some time during
+<tt>early_initcalls()</tt> time.
+But this is no excuse: RCU is nevertheless required to correctly handle
+synchronous grace periods during this time period.
+Once all of its kthreads are up and running, RCU starts running
+normally.
 
 <table>
 <tr><th> </th></tr>
 <tr><th align="left">Quick Quiz:</th></tr>
 <tr><td>
-	So what happens with <tt>synchronize_rcu()</tt> during
-	scheduler initialization for <tt>CONFIG_PREEMPT=n</tt>
-	kernels?
+	How can RCU possibly handle grace periods before all of its
+	kthreads have been spawned???
 </td></tr>
 <tr><th align="left">Answer:</th></tr>
 <tr><td bgcolor="#ffffff"><font color="ffffff">
-	In <tt>CONFIG_PREEMPT=n</tt> kernel, <tt>synchronize_rcu()</tt>
-	maps directly to <tt>synchronize_sched()</tt>.
-	Therefore, <tt>synchronize_rcu()</tt> works normally throughout
-	boot in <tt>CONFIG_PREEMPT=n</tt> kernels.
-	However, your code must also work in <tt>CONFIG_PREEMPT=y</tt> kernels,
-	so it is still necessary to avoid invoking <tt>synchronize_rcu()</tt>
-	during scheduler initialization.
+	Very carefully!
+	</font>
+
+	<p><font color="ffffff">
+	During the “dead zone” between the time that the
+	scheduler spawns the first task and the time that all of RCU's
+	kthreads have been spawned, all synchronous grace periods are
+	handled by the expedited grace-period mechanism.
+	At runtime, this expedited mechanism relies on workqueues, but
+	during the dead zone the requesting task itself drives the
+	desired expedited grace period.
+	Because dead-zone execution takes place within task context,
+	everything works.
+	Once the dead zone ends, expedited grace periods go back to
+	using workqueues, as is required to avoid problems that would
+	otherwise occur when a user task received a POSIX signal while
+	driving an expedited grace period.
+	</font>
+
+	<p><font color="ffffff">
+	And yes, this does mean that it is unhelpful to send POSIX
+	signals to random tasks between the time that the scheduler
+	spawns its first kthread and the time that RCU's kthreads
+	have all been spawned.
+	If there ever turns out to be a good reason for sending POSIX
+	signals during that time, appropriate adjustments will be made.
+	(If it turns out that POSIX signals are sent during this time for
+	no good reason, other adjustments will be made, appropriate
+	or otherwise.)
 </font></td></tr>
 <tr><td> </td></tr>
 </table>
@@ -2295,12 +2311,61 @@ situation, and Dipankar Sarma incorporated <tt>rcu_barrier()</tt> into RCU.
 The need for <tt>rcu_barrier()</tt> for module unloading became
 apparent later.
 
+<p>
+<b>Important note</b>: The <tt>rcu_barrier()</tt> function is not,
+repeat, <i>not</i>, obligated to wait for a grace period.
+It is instead only required to wait for RCU callbacks that have
+already been posted.
+Therefore, if there are no RCU callbacks posted anywhere in the system,
+<tt>rcu_barrier()</tt> is within its rights to return immediately.
+Even if there are callbacks posted, <tt>rcu_barrier()</tt> does not
+necessarily need to wait for a grace period.
+
+<table>
+<tr><th> </th></tr>
+<tr><th align="left">Quick Quiz:</th></tr>
+<tr><td>
+	Wait a minute!
+	Each RCU callback must wait for a grace period to complete,
+	and <tt>rcu_barrier()</tt> must wait for each pre-existing
+	callback to be invoked.
+	Doesn't <tt>rcu_barrier()</tt> therefore need to wait for
+	a full grace period if there is even one callback posted anywhere
+	in the system?
+</td></tr>
+<tr><th align="left">Answer:</th></tr>
+<tr><td bgcolor="#ffffff"><font color="ffffff">
+	Absolutely not!!!
+	</font>
+
+	<p><font color="ffffff">
+	Yes, each RCU callback must wait for a grace period to complete,
+	but it might well be partly (or even completely) finished waiting
+	by the time <tt>rcu_barrier()</tt> is invoked.
+	In that case, <tt>rcu_barrier()</tt> need only wait for the
+	remaining portion of the grace period to elapse.
+	So even if there are quite a few callbacks posted,
+	<tt>rcu_barrier()</tt> might well return quite quickly.
+	</font>
+
+	<p><font color="ffffff">
+	So if you need to wait for a grace period as well as for all
+	pre-existing callbacks, you will need to invoke both
+	<tt>synchronize_rcu()</tt> and <tt>rcu_barrier()</tt>.
+	If latency is a concern, you can always use workqueues
+	to invoke them concurrently.
+	</font></td></tr>
+<tr><td> </td></tr>
+</table>
+
 <h3><a name="Hotplug CPU">Hotplug CPU</a></h3>
 
 <p>
 The Linux kernel supports CPU hotplug, which means that CPUs
 can come and go.
-It is of course illegal to use any RCU API member from an offline CPU.
+It is of course illegal to use any RCU API member from an offline CPU,
+with the exception of <a href="#Sleepable RCU">SRCU</a> read-side
+critical sections.
 This requirement was present from day one in DYNIX/ptx, but
 on the other hand, the Linux kernel's CPU-hotplug implementation
 is “interesting.”
@@ -2310,19 +2375,18 @@ The Linux-kernel CPU-hotplug implementation has notifiers that
 are used to allow the various kernel subsystems (including RCU)
 to respond appropriately to a given CPU-hotplug operation.
 Most RCU operations may be invoked from CPU-hotplug notifiers,
-including even normal synchronous grace-period operations
-such as <tt>synchronize_rcu()</tt>.
-However, expedited grace-period operations such as
-<tt>synchronize_rcu_expedited()</tt> are not supported,
-due to the fact that current implementations block CPU-hotplug
-operations, which could result in deadlock.
+including even synchronous grace-period operations such as
+<tt>synchronize_rcu()</tt> and <tt>synchronize_rcu_expedited()</tt>.
 
 <p>
-In addition, all-callback-wait operations such as
+However, all-callback-wait operations such as
 <tt>rcu_barrier()</tt> are also not supported, due to the
 fact that there are phases of CPU-hotplug operations where
 the outgoing CPU's callbacks will not be invoked until after
 the CPU-hotplug operation ends, which could also result in deadlock.
+Furthermore, <tt>rcu_barrier()</tt> blocks CPU-hotplug operations
+during its execution, which results in another type of deadlock
+when invoked from a CPU-hotplug notifier.
 
 <h3><a name="Scheduler and RCU">Scheduler and RCU</a></h3>
 
@@ -2863,6 +2927,27 @@ It also motivates the <tt>smp_mb__after_srcu_read_unlock()</tt>
 API, which, in combination with <tt>srcu_read_unlock()</tt>,
 guarantees a full memory barrier.
 
+<p>
+Also unlike other RCU flavors, SRCU's callbacks-wait function
+<tt>srcu_barrier()</tt> may be invoked from CPU-hotplug notifiers,
+though this is not necessarily a good idea.
+The reason that this is possible is that SRCU is insensitive
+to whether or not a CPU is online, which means that <tt>srcu_barrier()</tt>
+need not exclude CPU-hotplug operations.
+
+<p>
+As of v4.12, SRCU's callbacks are maintained per-CPU, eliminating
+a locking bottleneck present in prior kernel versions.
+Although this will allow users to put much heavier stress on
+<tt>call_srcu()</tt>, it is important to note that SRCU does not
+yet take any special steps to deal with callback flooding.
+So if you are posting (say) 10,000 SRCU callbacks per second per CPU,
+you are probably totally OK, but if you intend to post (say) 1,000,000
+SRCU callbacks per second per CPU, please run some tests first.
+SRCU just might need a few adjustments to deal with that sort of load.
+Of course, your mileage may vary based on the speed of your CPUs and
+the size of your memory.
+
 <p>
 The
 <a href="https://lwn.net/Articles/609973/#RCU Per-Flavor API Table">SRCU API</a>
@@ -3021,8 +3106,8 @@ to do some redesign to avoid this scalability problem.
 
 <p>
 RCU disables CPU hotplug in a few places, perhaps most notably in the
-expedited grace-period and <tt>rcu_barrier()</tt> operations.
-If there is a strong reason to use expedited grace periods in CPU-hotplug
+<tt>rcu_barrier()</tt> operations.
+If there is a strong reason to use <tt>rcu_barrier()</tt> in CPU-hotplug
 notifiers, it will be necessary to avoid disabling CPU hotplug.
 This would introduce some complexity, so there had better be a <i>very</i>
 good reason.
@@ -3096,9 +3181,5 @@ Andy Lutomirski for their help in rendering
 this article human readable, and to Michelle Rankin for her support
 of this effort.
 Other contributions are acknowledged in the Linux kernel's git archive.
-The cartoon is copyright (c) 2013 by Melissa Broussard,
-and is provided
-under the terms of the Creative Commons Attribution-Share Alike 3.0
-United States license.
 
 </body></html>
@@ -138,6 +138,15 @@ o	Be very careful about comparing pointers obtained from
 	This sort of comparison occurs frequently when scanning
 	RCU-protected circular linked lists.
 
+	Note that if checks for being within an RCU read-side
+	critical section are not required and the pointer is never
+	dereferenced, rcu_access_pointer() should be used in place
+	of rcu_dereference(). The rcu_access_pointer() primitive
+	does not require an enclosing read-side critical section,
+	and also omits the smp_read_barrier_depends() included in
+	rcu_dereference(), which in turn should provide a small
+	performance gain in some CPUs (e.g., the DEC Alpha).
+
 o	The comparison is against a pointer that references memory
 	that was initialized "a long time ago." The reason
 	this is safe is that even if misordering occurs, the
@@ -1,5 +1,5 @@
 Using hlist_nulls to protect read-mostly linked lists and
-objects using SLAB_DESTROY_BY_RCU allocations.
+objects using SLAB_TYPESAFE_BY_RCU allocations.
 
 Please read the basics in Documentation/RCU/listRCU.txt
 
@@ -7,7 +7,7 @@ Using special makers (called 'nulls') is a convenient way
 to solve following problem :
 
 A typical RCU linked list managing objects which are
-allocated with SLAB_DESTROY_BY_RCU kmem_cache can
+allocated with SLAB_TYPESAFE_BY_RCU kmem_cache can
 use following algos :
 
 1) Lookup algo
@@ -96,7 +96,7 @@ unlock_chain(); // typically a spin_unlock()
 3) Remove algo
 --------------
 Nothing special here, we can use a standard RCU hlist deletion.
-But thanks to SLAB_DESTROY_BY_RCU, beware a deleted object can be reused
+But thanks to SLAB_TYPESAFE_BY_RCU, beware a deleted object can be reused
 very very fast (before the end of RCU grace period)
 
 if (put_last_reference_on(obj) {
@@ -1,9 +1,102 @@
 Using RCU's CPU Stall Detector
 
-The rcu_cpu_stall_suppress module parameter enables RCU's CPU stall
-detector, which detects conditions that unduly delay RCU grace periods.
-This module parameter enables CPU stall detection by default, but
-may be overridden via boot-time parameter or at runtime via sysfs.
+This document first discusses what sorts of issues RCU's CPU stall
+detector can locate, and then discusses kernel parameters and Kconfig
+options that can be used to fine-tune the detector's operation. Finally,
+this document explains the stall detector's "splat" format.
+
+
+What Causes RCU CPU Stall Warnings?
+
+So your kernel printed an RCU CPU stall warning. The next question is
+"What caused it?" The following problems can result in RCU CPU stall
+warnings:
+
+o	A CPU looping in an RCU read-side critical section.
+
+o	A CPU looping with interrupts disabled.
+
+o	A CPU looping with preemption disabled. This condition can
+	result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh
+	stalls.
+
+o	A CPU looping with bottom halves disabled. This condition can
+	result in RCU-sched and RCU-bh stalls.
+
+o	For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the
+	kernel without invoking schedule(). Note that cond_resched()
+	does not necessarily prevent RCU CPU stall warnings. Therefore,
+	if the looping in the kernel is really expected and desirable
+	behavior, you might need to replace some of the cond_resched()
+	calls with calls to cond_resched_rcu_qs().
+
+o	Booting Linux using a console connection that is too slow to
+	keep up with the boot-time console-message rate. For example,
+	a 115Kbaud serial console can be -way- too slow to keep up
+	with boot-time message rates, and will frequently result in
+	RCU CPU stall warning messages. Especially if you have added
+	debug printk()s.
+
+o	Anything that prevents RCU's grace-period kthreads from running.
+	This can result in the "All QSes seen" console-log message.
+	This message will include information on when the kthread last
+	ran and how often it should be expected to run.
+
+o	A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
+	happen to preempt a low-priority task in the middle of an RCU
+	read-side critical section. This is especially damaging if
+	that low-priority task is not permitted to run on any other CPU,
+	in which case the next RCU grace period can never complete, which
+	will eventually cause the system to run out of memory and hang.
+	While the system is in the process of running itself out of
+	memory, you might see stall-warning messages.
+
+o	A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
+	is running at a higher priority than the RCU softirq threads.
+	This will prevent RCU callbacks from ever being invoked,
+	and in a CONFIG_PREEMPT_RCU kernel will further prevent
+	RCU grace periods from ever completing. Either way, the
+	system will eventually run out of memory and hang. In the
+	CONFIG_PREEMPT_RCU case, you might see stall-warning
+	messages.
+
+o	A hardware or software issue shuts off the scheduler-clock
+	interrupt on a CPU that is not in dyntick-idle mode. This
+	problem really has happened, and seems to be most likely to
+	result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels.
+
+o	A bug in the RCU implementation.
+
+o	A hardware failure. This is quite unlikely, but has occurred
+	at least once in real life. A CPU failed in a running system,
+	becoming unresponsive, but not causing an immediate crash.
+	This resulted in a series of RCU CPU stall warnings, eventually
+	leading to the realization that the CPU had failed.
+
+The RCU, RCU-sched, RCU-bh, and RCU-tasks implementations have CPU stall
+warnings. Note that SRCU does -not- have CPU stall warnings. Please note
+that RCU only detects CPU stalls when there is a grace period in progress.
+No grace period, no CPU stall warnings.
+
+To diagnose the cause of the stall, inspect the stack traces.
+The offending function will usually be near the top of the stack.
+If you have a series of stall warnings from a single extended stall,
+comparing the stack traces can often help determine where the stall
+is occurring, which will usually be in the function nearest the top of
+that portion of the stack which remains the same from trace to trace.
+If you can reliably trigger the stall, ftrace can be quite helpful.
+
+RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE
+and with RCU's event tracing. For information on RCU's event tracing,
+see include/trace/events/rcu.h.
+
+
+Fine-Tuning the RCU CPU Stall Detector
+
+The rcuupdate.rcu_cpu_stall_suppress module parameter disables RCU's
+CPU stall detector, which detects conditions that unduly delay RCU grace
+periods. This module parameter enables CPU stall detection by default,
+but may be overridden via boot-time parameter or at runtime via sysfs.
 The stall detector's idea of what constitutes "unduly delayed" is
 controlled by a set of kernel configuration variables and cpp macros:
 
@@ -56,6 +149,9 @@ rcupdate.rcu_task_stall_timeout
 	And continues with the output of sched_show_task() for each
 	task stalling the current RCU-tasks grace period.
 
+
+Interpreting RCU's CPU Stall-Detector "Splats"
+
 For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling,
 it will print a message similar to the following:
 
@@ -178,89 +274,3 @@ grace period is in flight.
 
 It is entirely possible to see stall warnings from normal and from
 expedited grace periods at about the same time from the same run.
-
-
-What Causes RCU CPU Stall Warnings?
-
-So your kernel printed an RCU CPU stall warning. The next question is
-"What caused it?" The following problems can result in RCU CPU stall
-warnings:
-
-o	A CPU looping in an RCU read-side critical section.
-
-o	A CPU looping with interrupts disabled. This condition can
-	result in RCU-sched and RCU-bh stalls.
-
-o	A CPU looping with preemption disabled. This condition can
-	result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh
-	stalls.
-
-o	A CPU looping with bottom halves disabled. This condition can
-	result in RCU-sched and RCU-bh stalls.
-
-o	For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the
-	kernel without invoking schedule(). Note that cond_resched()
-	does not necessarily prevent RCU CPU stall warnings. Therefore,
-	if the looping in the kernel is really expected and desirable
-	behavior, you might need to replace some of the cond_resched()
-	calls with calls to cond_resched_rcu_qs().
-
-o	Booting Linux using a console connection that is too slow to
-	keep up with the boot-time console-message rate. For example,
-	a 115Kbaud serial console can be -way- too slow to keep up
-	with boot-time message rates, and will frequently result in
-	RCU CPU stall warning messages. Especially if you have added
-	debug printk()s.
-
-o	Anything that prevents RCU's grace-period kthreads from running.
-	This can result in the "All QSes seen" console-log message.
-	This message will include information on when the kthread last
-	ran and how often it should be expected to run.
-
-o	A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
-	happen to preempt a low-priority task in the middle of an RCU
-	read-side critical section. This is especially damaging if
-	that low-priority task is not permitted to run on any other CPU,
-	in which case the next RCU grace period can never complete, which
-	will eventually cause the system to run out of memory and hang.
-	While the system is in the process of running itself out of
-	memory, you might see stall-warning messages.
-
-o	A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
-	is running at a higher priority than the RCU softirq threads.
-	This will prevent RCU callbacks from ever being invoked,
-	and in a CONFIG_PREEMPT_RCU kernel will further prevent
-	RCU grace periods from ever completing. Either way, the
-	system will eventually run out of memory and hang. In the
-	CONFIG_PREEMPT_RCU case, you might see stall-warning
-	messages.
-
-o	A hardware or software issue shuts off the scheduler-clock
-	interrupt on a CPU that is not in dyntick-idle mode. This
-	problem really has happened, and seems to be most likely to
-	result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels.
-
-o	A bug in the RCU implementation.
-
-o	A hardware failure. This is quite unlikely, but has occurred
-	at least once in real life. A CPU failed in a running system,
-	becoming unresponsive, but not causing an immediate crash.
-	This resulted in a series of RCU CPU stall warnings, eventually
-	leading the realization that the CPU had failed.
-
-The RCU, RCU-sched, RCU-bh, and RCU-tasks implementations have CPU stall
-warning. Note that SRCU does -not- have CPU stall warnings. Please note
-that RCU only detects CPU stalls when there is a grace period in progress.
-No grace period, no CPU stall warnings.
-
-To diagnose the cause of the stall, inspect the stack traces.
-The offending function will usually be near the top of the stack.
-If you have a series of stall warnings from a single extended stall,
-comparing the stack traces can often help determine where the stall
-is occurring, which will usually be in the function nearest the top of
-that portion of the stack which remains the same from trace to trace.
-If you can reliably trigger the stall, ftrace can be quite helpful.
-
-RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE
-and with RCU's event tracing. For information on RCU's event tracing,
-see include/trace/events/rcu.h.
@@ -562,7 +562,9 @@ This section presents a "toy" RCU implementation that is based on
|
|||||||
familiar locking primitives. Its overhead makes it a non-starter for
|
familiar locking primitives. Its overhead makes it a non-starter for
|
||||||
real-life use, as does its lack of scalability. It is also unsuitable
|
real-life use, as does its lack of scalability. It is also unsuitable
|
||||||
for realtime use, since it allows scheduling latency to "bleed" from
|
for realtime use, since it allows scheduling latency to "bleed" from
|
||||||
one read-side critical section to another.
|
one read-side critical section to another. It also assumes recursive
|
||||||
|
reader-writer locks: If you try this with non-recursive locks, and
|
||||||
|
you allow nested rcu_read_lock() calls, you can deadlock.
|
||||||
|
|
||||||
However, it is probably the easiest implementation to relate to, so is
|
However, it is probably the easiest implementation to relate to, so is
|
||||||
a good starting point.
|
a good starting point.
|
||||||
@@ -587,20 +589,21 @@ It is extremely simple:
 		write_unlock(&rcu_gp_mutex);
 	}
 
-[You can ignore rcu_assign_pointer() and rcu_dereference() without
-missing much.  But here they are anyway.  And whatever you do, don't
-forget about them when submitting patches making use of RCU!]
+[You can ignore rcu_assign_pointer() and rcu_dereference() without missing
+much.  But here are simplified versions anyway.  And whatever you do,
+don't forget about them when submitting patches making use of RCU!]
 
-	#define rcu_assign_pointer(p, v)	({ \
-						smp_wmb(); \
-						(p) = (v); \
-						})
+	#define rcu_assign_pointer(p, v) \
+	({ \
+		smp_store_release(&(p), (v)); \
+	})
 
-	#define rcu_dereference(p)	({ \
-					typeof(p) _________p1 = p; \
-					smp_read_barrier_depends(); \
-					(_________p1); \
-					})
+	#define rcu_dereference(p) \
+	({ \
+		typeof(p) _________p1 = p; \
+		smp_read_barrier_depends(); \
+		(_________p1); \
+	})
 
 The rcu_read_lock() and rcu_read_unlock() primitive read-acquire
@@ -925,7 +928,8 @@ d.	Do you need RCU grace periods to complete even in the face
 
 e.	Is your workload too update-intensive for normal use of
 	RCU, but inappropriate for other synchronization mechanisms?
-	If so, consider SLAB_DESTROY_BY_RCU.  But please be careful!
+	If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
+	named SLAB_DESTROY_BY_RCU).  But please be careful!
 
 f.	Do you need read-side critical sections that are respected
 	even though they are in the middle of the idle loop, during
@@ -3800,6 +3800,14 @@
 	spia_pedr=
 	spia_peddr=
 
+	srcutree.exp_holdoff	[KNL]
+			Specifies how many nanoseconds must elapse
+			since the end of the last SRCU grace period for
+			a given srcu_struct until the next normal SRCU
+			grace period will be considered for automatic
+			expediting.  Set to zero to disable automatic
+			expediting.
+
 	stacktrace	[FTRACE]
 			Enabled the stack tracer on boot up.
 
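For context, the new knob is set like any other kernel boot parameter. This fragment is illustrative only: the 50000 value is an arbitrary example, and the sysfs path is an assumption to verify on your own kernel.

```
# Bootloader kernel command line fragment (example value: 50 microseconds;
# 0 disables automatic expediting entirely):
...  srcutree.exp_holdoff=50000

# On a running Tree-SRCU kernel, the current value is expected to be
# visible as a read-only module parameter (path assumed, verify locally):
cat /sys/module/srcutree/parameters/exp_holdoff
```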
@@ -768,7 +768,7 @@ equal to zero, in which case the compiler is within its rights to
 transform the above code into the following:
 
 	q = READ_ONCE(a);
-	WRITE_ONCE(b, 1);
+	WRITE_ONCE(b, 2);
 	do_something_else();
 
 Given this transformation, the CPU is not required to respect the ordering
@@ -324,6 +324,9 @@ config HAVE_CMPXCHG_LOCAL
 config HAVE_CMPXCHG_DOUBLE
 	bool
 
+config ARCH_WEAK_RELEASE_ACQUIRE
+	bool
+
 config ARCH_WANT_IPC_PARSE_VERSION
 	bool
 
@@ -146,6 +146,7 @@ config PPC
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF		if PPC64
 	select ARCH_WANT_IPC_PARSE_VERSION
+	select ARCH_WEAK_RELEASE_ACQUIRE
 	select BINFMT_ELF
 	select BUILDTIME_EXTABLE_SORT
 	select CLONE_BACKWARDS
@@ -4789,7 +4789,7 @@ i915_gem_load_init(struct drm_i915_private *dev_priv)
 	dev_priv->requests = KMEM_CACHE(drm_i915_gem_request,
 					SLAB_HWCACHE_ALIGN |
 					SLAB_RECLAIM_ACCOUNT |
-					SLAB_DESTROY_BY_RCU);
+					SLAB_TYPESAFE_BY_RCU);
 	if (!dev_priv->requests)
 		goto err_vmas;
 
@@ -521,7 +521,7 @@ static inline struct drm_i915_gem_request *
 __i915_gem_active_get_rcu(const struct i915_gem_active *active)
 {
 	/* Performing a lockless retrieval of the active request is super
-	 * tricky. SLAB_DESTROY_BY_RCU merely guarantees that the backing
+	 * tricky. SLAB_TYPESAFE_BY_RCU merely guarantees that the backing
 	 * slab of request objects will not be freed whilst we hold the
 	 * RCU read lock. It does not guarantee that the request itself
 	 * will not be freed and then *reused*. Viz,
@@ -174,7 +174,7 @@ struct drm_i915_private *mock_gem_device(void)
 	i915->requests = KMEM_CACHE(mock_request,
 				    SLAB_HWCACHE_ALIGN |
 				    SLAB_RECLAIM_ACCOUNT |
-				    SLAB_DESTROY_BY_RCU);
+				    SLAB_TYPESAFE_BY_RCU);
 	if (!i915->requests)
 		goto err_vmas;
 
@@ -1115,7 +1115,7 @@ int ldlm_init(void)
 	ldlm_lock_slab = kmem_cache_create("ldlm_locks",
 			      sizeof(struct ldlm_lock), 0,
 			      SLAB_HWCACHE_ALIGN |
-			      SLAB_DESTROY_BY_RCU, NULL);
+			      SLAB_TYPESAFE_BY_RCU, NULL);
 	if (!ldlm_lock_slab) {
 		kmem_cache_destroy(ldlm_resource_slab);
 		return -ENOMEM;
@@ -2363,7 +2363,7 @@ static int jbd2_journal_init_journal_head_cache(void)
 	jbd2_journal_head_cache = kmem_cache_create("jbd2_journal_head",
 				sizeof(struct journal_head),
 				0,		/* offset */
-				SLAB_TEMPORARY | SLAB_DESTROY_BY_RCU,
+				SLAB_TEMPORARY | SLAB_TYPESAFE_BY_RCU,
 				NULL);		/* ctor */
 	retval = 0;
 	if (!jbd2_journal_head_cache) {
@@ -38,7 +38,7 @@ void signalfd_cleanup(struct sighand_struct *sighand)
 	/*
 	 * The lockless check can race with remove_wait_queue() in progress,
 	 * but in this case its caller should run under rcu_read_lock() and
-	 * sighand_cachep is SLAB_DESTROY_BY_RCU, we can safely return.
+	 * sighand_cachep is SLAB_TYPESAFE_BY_RCU, we can safely return.
 	 */
 	if (likely(!waitqueue_active(wqh)))
 		return;
@@ -229,7 +229,7 @@ static inline struct dma_fence *dma_fence_get_rcu(struct dma_fence *fence)
  *
  * Function returns NULL if no refcount could be obtained, or the fence.
  * This function handles acquiring a reference to a fence that may be
- * reallocated within the RCU grace period (such as with SLAB_DESTROY_BY_RCU),
+ * reallocated within the RCU grace period (such as with SLAB_TYPESAFE_BY_RCU),
  * so long as the caller is using RCU on the pointer to the fence.
  *
  * An alternative mechanism is to employ a seqlock to protect a bunch of
@@ -257,7 +257,7 @@ dma_fence_get_rcu_safe(struct dma_fence * __rcu *fencep)
 		 * have successfully acquire a reference to it. If it no
 		 * longer matches, we are holding a reference to some other
 		 * reallocated pointer. This is possible if the allocator
-		 * is using a freelist like SLAB_DESTROY_BY_RCU where the
+		 * is using a freelist like SLAB_TYPESAFE_BY_RCU where the
 		 * fence remains valid for the RCU grace period, but it
 		 * may be reallocated. When using such allocators, we are
 		 * responsible for ensuring the reference we get is to
Some files were not shown because too many files have changed in this diff