You've already forked linux-apfs
mirror of
https://github.com/linux-apfs/linux-apfs.git
synced 2026-05-01 15:00:59 -07:00
Merge branch 'mcpm' of git://git.linaro.org/people/nico/linux into devel-stable
This commit is contained in:
@@ -953,11 +953,11 @@ S: Blacksburg, Virginia 24061
|
||||
S: USA
|
||||
|
||||
N: Randy Dunlap
|
||||
E: rdunlap@xenotime.net
|
||||
W: http://www.xenotime.net/linux/linux.html
|
||||
W: http://www.linux-usb.org
|
||||
E: rdunlap@infradead.org
|
||||
W: http://www.infradead.org/~rdunlap/
|
||||
D: Linux-USB subsystem, USB core/UHCI/printer/storage drivers
|
||||
D: x86 SMP, ACPI, bootflag hacking
|
||||
D: documentation, builds
|
||||
S: (ask for current address)
|
||||
S: USA
|
||||
|
||||
|
||||
@@ -60,8 +60,7 @@ own source tree. For example:
|
||||
"dontdiff" is a list of files which are generated by the kernel during
|
||||
the build process, and should be ignored in any diff(1)-generated
|
||||
patch. The "dontdiff" file is included in the kernel tree in
|
||||
2.6.12 and later. For earlier kernel versions, you can get it
|
||||
from <http://www.xenotime.net/linux/doc/dontdiff>.
|
||||
2.6.12 and later.
|
||||
|
||||
Make sure your patch does not include any extra files which do not
|
||||
belong in a patch submission. Make sure to review your patch -after-
|
||||
|
||||
@@ -0,0 +1,498 @@
|
||||
Cluster-wide Power-up/power-down race avoidance algorithm
|
||||
=========================================================
|
||||
|
||||
This file documents the algorithm which is used to coordinate CPU and
|
||||
cluster setup and teardown operations and to manage hardware coherency
|
||||
controls safely.
|
||||
|
||||
The section "Rationale" explains what the algorithm is for and why it is
|
||||
needed. "Basic model" explains general concepts using a simplified view
|
||||
of the system. The other sections explain the actual details of the
|
||||
algorithm in use.
|
||||
|
||||
|
||||
Rationale
|
||||
---------
|
||||
|
||||
In a system containing multiple CPUs, it is desirable to have the
|
||||
ability to turn off individual CPUs when the system is idle, reducing
|
||||
power consumption and thermal dissipation.
|
||||
|
||||
In a system containing multiple clusters of CPUs, it is also desirable
|
||||
to have the ability to turn off entire clusters.
|
||||
|
||||
Turning entire clusters off and on is a risky business, because it
|
||||
involves performing potentially destructive operations affecting a group
|
||||
of independently running CPUs, while the OS continues to run. This
|
||||
means that we need some coordination in order to ensure that critical
|
||||
cluster-level operations are only performed when it is truly safe to do
|
||||
so.
|
||||
|
||||
Simple locking may not be sufficient to solve this problem, because
|
||||
mechanisms like Linux spinlocks may rely on coherency mechanisms which
|
||||
are not immediately enabled when a cluster powers up. Since enabling or
|
||||
disabling those mechanisms may itself be a non-atomic operation (such as
|
||||
writing some hardware registers and invalidating large caches), other
|
||||
methods of coordination are required in order to guarantee safe
|
||||
power-down and power-up at the cluster level.
|
||||
|
||||
The mechanism presented in this document describes a coherent memory
|
||||
based protocol for performing the needed coordination. It aims to be as
|
||||
lightweight as possible, while providing the required safety properties.
|
||||
|
||||
|
||||
Basic model
|
||||
-----------
|
||||
|
||||
Each cluster and CPU is assigned a state, as follows:
|
||||
|
||||
DOWN
|
||||
COMING_UP
|
||||
UP
|
||||
GOING_DOWN
|
||||
|
||||
+---------> UP ----------+
|
||||
| v
|
||||
|
||||
COMING_UP GOING_DOWN
|
||||
|
||||
^ |
|
||||
+--------- DOWN <--------+
|
||||
|
||||
|
||||
DOWN: The CPU or cluster is not coherent, and is either powered off or
|
||||
suspended, or is ready to be powered off or suspended.
|
||||
|
||||
COMING_UP: The CPU or cluster has committed to moving to the UP state.
|
||||
It may be part way through the process of initialisation and
|
||||
enabling coherency.
|
||||
|
||||
UP: The CPU or cluster is active and coherent at the hardware
|
||||
level. A CPU in this state is not necessarily being used
|
||||
actively by the kernel.
|
||||
|
||||
GOING_DOWN: The CPU or cluster has committed to moving to the DOWN
|
||||
state. It may be part way through the process of teardown and
|
||||
coherency exit.
|
||||
|
||||
|
||||
Each CPU has one of these states assigned to it at any point in time.
|
||||
The CPU states are described in the "CPU state" section, below.
|
||||
|
||||
Each cluster is also assigned a state, but it is necessary to split the
|
||||
state value into two parts (the "cluster" state and "inbound" state) and
|
||||
to introduce additional states in order to avoid races between different
|
||||
CPUs in the cluster simultaneously modifying the state. The cluster-
|
||||
level states are described in the "Cluster state" section.
|
||||
|
||||
To help distinguish the CPU states from cluster states in this
|
||||
discussion, the state names are given a CPU_ prefix for the CPU states,
|
||||
and a CLUSTER_ or INBOUND_ prefix for the cluster states.
|
||||
|
||||
|
||||
CPU state
|
||||
---------
|
||||
|
||||
In this algorithm, each individual core in a multi-core processor is
|
||||
referred to as a "CPU". CPUs are assumed to be single-threaded:
|
||||
therefore, a CPU can only be doing one thing at a single point in time.
|
||||
|
||||
This means that CPUs fit the basic model closely.
|
||||
|
||||
The algorithm defines the following states for each CPU in the system:
|
||||
|
||||
CPU_DOWN
|
||||
CPU_COMING_UP
|
||||
CPU_UP
|
||||
CPU_GOING_DOWN
|
||||
|
||||
cluster setup and
|
||||
CPU setup complete policy decision
|
||||
+-----------> CPU_UP ------------+
|
||||
| v
|
||||
|
||||
CPU_COMING_UP CPU_GOING_DOWN
|
||||
|
||||
^ |
|
||||
+----------- CPU_DOWN <----------+
|
||||
policy decision CPU teardown complete
|
||||
or hardware event
|
||||
|
||||
|
||||
The definitions of the four states correspond closely to the states of
|
||||
the basic model.
|
||||
|
||||
Transitions between states occur as follows.
|
||||
|
||||
A trigger event (spontaneous) means that the CPU can transition to the
|
||||
next state as a result of making local progress only, with no
|
||||
requirement for any external event to happen.
|
||||
|
||||
|
||||
CPU_DOWN:
|
||||
|
||||
A CPU reaches the CPU_DOWN state when it is ready for
|
||||
power-down. On reaching this state, the CPU will typically
|
||||
power itself down or suspend itself, via a WFI instruction or a
|
||||
firmware call.
|
||||
|
||||
Next state: CPU_COMING_UP
|
||||
Conditions: none
|
||||
|
||||
Trigger events:
|
||||
|
||||
a) an explicit hardware power-up operation, resulting
|
||||
from a policy decision on another CPU;
|
||||
|
||||
b) a hardware event, such as an interrupt.
|
||||
|
||||
|
||||
CPU_COMING_UP:
|
||||
|
||||
A CPU cannot start participating in hardware coherency until the
|
||||
cluster is set up and coherent. If the cluster is not ready,
|
||||
then the CPU will wait in the CPU_COMING_UP state until the
|
||||
cluster has been set up.
|
||||
|
||||
Next state: CPU_UP
|
||||
Conditions: The CPU's parent cluster must be in CLUSTER_UP.
|
||||
Trigger events: Transition of the parent cluster to CLUSTER_UP.
|
||||
|
||||
Refer to the "Cluster state" section for a description of the
|
||||
CLUSTER_UP state.
|
||||
|
||||
|
||||
CPU_UP:
|
||||
When a CPU reaches the CPU_UP state, it is safe for the CPU to
|
||||
start participating in local coherency.
|
||||
|
||||
This is done by jumping to the kernel's CPU resume code.
|
||||
|
||||
Note that the definition of this state is slightly different
|
||||
from the basic model definition: CPU_UP does not mean that the
|
||||
CPU is coherent yet, but it does mean that it is safe to resume
|
||||
the kernel. The kernel handles the rest of the resume
|
||||
procedure, so the remaining steps are not visible as part of the
|
||||
race avoidance algorithm.
|
||||
|
||||
The CPU remains in this state until an explicit policy decision
|
||||
is made to shut down or suspend the CPU.
|
||||
|
||||
Next state: CPU_GOING_DOWN
|
||||
Conditions: none
|
||||
Trigger events: explicit policy decision
|
||||
|
||||
|
||||
CPU_GOING_DOWN:
|
||||
|
||||
While in this state, the CPU exits coherency, including any
|
||||
operations required to achieve this (such as cleaning data
|
||||
caches).
|
||||
|
||||
Next state: CPU_DOWN
|
||||
Conditions: local CPU teardown complete
|
||||
Trigger events: (spontaneous)
|
||||
|
||||
|
||||
Cluster state
|
||||
-------------
|
||||
|
||||
A cluster is a group of connected CPUs with some common resources.
|
||||
Because a cluster contains multiple CPUs, it can be doing multiple
|
||||
things at the same time. This has some implications. In particular, a
|
||||
CPU can start up while another CPU is tearing the cluster down.
|
||||
|
||||
In this discussion, the "outbound side" is the view of the cluster state
|
||||
as seen by a CPU tearing the cluster down. The "inbound side" is the
|
||||
view of the cluster state as seen by a CPU setting the CPU up.
|
||||
|
||||
In order to enable safe coordination in such situations, it is important
|
||||
that a CPU which is setting up the cluster can advertise its state
|
||||
independently of the CPU which is tearing down the cluster. For this
|
||||
reason, the cluster state is split into two parts:
|
||||
|
||||
"cluster" state: The global state of the cluster; or the state
|
||||
on the outbound side:
|
||||
|
||||
CLUSTER_DOWN
|
||||
CLUSTER_UP
|
||||
CLUSTER_GOING_DOWN
|
||||
|
||||
"inbound" state: The state of the cluster on the inbound side.
|
||||
|
||||
INBOUND_NOT_COMING_UP
|
||||
INBOUND_COMING_UP
|
||||
|
||||
|
||||
The different pairings of these states results in six possible
|
||||
states for the cluster as a whole:
|
||||
|
||||
CLUSTER_UP
|
||||
+==========> INBOUND_NOT_COMING_UP -------------+
|
||||
# |
|
||||
|
|
||||
CLUSTER_UP <----+ |
|
||||
INBOUND_COMING_UP | v
|
||||
|
||||
^ CLUSTER_GOING_DOWN CLUSTER_GOING_DOWN
|
||||
# INBOUND_COMING_UP <=== INBOUND_NOT_COMING_UP
|
||||
|
||||
CLUSTER_DOWN | |
|
||||
INBOUND_COMING_UP <----+ |
|
||||
|
|
||||
^ |
|
||||
+=========== CLUSTER_DOWN <------------+
|
||||
INBOUND_NOT_COMING_UP
|
||||
|
||||
Transitions -----> can only be made by the outbound CPU, and
|
||||
only involve changes to the "cluster" state.
|
||||
|
||||
Transitions ===##> can only be made by the inbound CPU, and only
|
||||
involve changes to the "inbound" state, except where there is no
|
||||
further transition possible on the outbound side (i.e., the
|
||||
outbound CPU has put the cluster into the CLUSTER_DOWN state).
|
||||
|
||||
The race avoidance algorithm does not provide a way to determine
|
||||
which exact CPUs within the cluster play these roles. This must
|
||||
be decided in advance by some other means. Refer to the section
|
||||
"Last man and first man selection" for more explanation.
|
||||
|
||||
|
||||
CLUSTER_DOWN/INBOUND_NOT_COMING_UP is the only state where the
|
||||
cluster can actually be powered down.
|
||||
|
||||
The parallelism of the inbound and outbound CPUs is observed by
|
||||
the existence of two different paths from CLUSTER_GOING_DOWN/
|
||||
INBOUND_NOT_COMING_UP (corresponding to GOING_DOWN in the basic
|
||||
model) to CLUSTER_DOWN/INBOUND_COMING_UP (corresponding to
|
||||
COMING_UP in the basic model). The second path avoids cluster
|
||||
teardown completely.
|
||||
|
||||
CLUSTER_UP/INBOUND_COMING_UP is equivalent to UP in the basic
|
||||
model. The final transition to CLUSTER_UP/INBOUND_NOT_COMING_UP
|
||||
is trivial and merely resets the state machine ready for the
|
||||
next cycle.
|
||||
|
||||
Details of the allowable transitions follow.
|
||||
|
||||
The next state in each case is notated
|
||||
|
||||
<cluster state>/<inbound state> (<transitioner>)
|
||||
|
||||
where the <transitioner> is the side on which the transition
|
||||
can occur; either the inbound or the outbound side.
|
||||
|
||||
|
||||
CLUSTER_DOWN/INBOUND_NOT_COMING_UP:
|
||||
|
||||
Next state: CLUSTER_DOWN/INBOUND_COMING_UP (inbound)
|
||||
Conditions: none
|
||||
Trigger events:
|
||||
|
||||
a) an explicit hardware power-up operation, resulting
|
||||
from a policy decision on another CPU;
|
||||
|
||||
b) a hardware event, such as an interrupt.
|
||||
|
||||
|
||||
CLUSTER_DOWN/INBOUND_COMING_UP:
|
||||
|
||||
In this state, an inbound CPU sets up the cluster, including
|
||||
enabling of hardware coherency at the cluster level and any
|
||||
other operations (such as cache invalidation) which are required
|
||||
in order to achieve this.
|
||||
|
||||
The purpose of this state is to do sufficient cluster-level
|
||||
setup to enable other CPUs in the cluster to enter coherency
|
||||
safely.
|
||||
|
||||
Next state: CLUSTER_UP/INBOUND_COMING_UP (inbound)
|
||||
Conditions: cluster-level setup and hardware coherency complete
|
||||
Trigger events: (spontaneous)
|
||||
|
||||
|
||||
CLUSTER_UP/INBOUND_COMING_UP:
|
||||
|
||||
Cluster-level setup is complete and hardware coherency is
|
||||
enabled for the cluster. Other CPUs in the cluster can safely
|
||||
enter coherency.
|
||||
|
||||
This is a transient state, leading immediately to
|
||||
CLUSTER_UP/INBOUND_NOT_COMING_UP. All other CPUs on the cluster
|
||||
should consider treat these two states as equivalent.
|
||||
|
||||
Next state: CLUSTER_UP/INBOUND_NOT_COMING_UP (inbound)
|
||||
Conditions: none
|
||||
Trigger events: (spontaneous)
|
||||
|
||||
|
||||
CLUSTER_UP/INBOUND_NOT_COMING_UP:
|
||||
|
||||
Cluster-level setup is complete and hardware coherency is
|
||||
enabled for the cluster. Other CPUs in the cluster can safely
|
||||
enter coherency.
|
||||
|
||||
The cluster will remain in this state until a policy decision is
|
||||
made to power the cluster down.
|
||||
|
||||
Next state: CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP (outbound)
|
||||
Conditions: none
|
||||
Trigger events: policy decision to power down the cluster
|
||||
|
||||
|
||||
CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP:
|
||||
|
||||
An outbound CPU is tearing the cluster down. The selected CPU
|
||||
must wait in this state until all CPUs in the cluster are in the
|
||||
CPU_DOWN state.
|
||||
|
||||
When all CPUs are in the CPU_DOWN state, the cluster can be torn
|
||||
down, for example by cleaning data caches and exiting
|
||||
cluster-level coherency.
|
||||
|
||||
To avoid wasteful unnecessary teardown operations, the outbound
|
||||
should check the inbound cluster state for asynchronous
|
||||
transitions to INBOUND_COMING_UP. Alternatively, individual
|
||||
CPUs can be checked for entry into CPU_COMING_UP or CPU_UP.
|
||||
|
||||
|
||||
Next states:
|
||||
|
||||
CLUSTER_DOWN/INBOUND_NOT_COMING_UP (outbound)
|
||||
Conditions: cluster torn down and ready to power off
|
||||
Trigger events: (spontaneous)
|
||||
|
||||
CLUSTER_GOING_DOWN/INBOUND_COMING_UP (inbound)
|
||||
Conditions: none
|
||||
Trigger events:
|
||||
|
||||
a) an explicit hardware power-up operation,
|
||||
resulting from a policy decision on another
|
||||
CPU;
|
||||
|
||||
b) a hardware event, such as an interrupt.
|
||||
|
||||
|
||||
CLUSTER_GOING_DOWN/INBOUND_COMING_UP:
|
||||
|
||||
The cluster is (or was) being torn down, but another CPU has
|
||||
come online in the meantime and is trying to set up the cluster
|
||||
again.
|
||||
|
||||
If the outbound CPU observes this state, it has two choices:
|
||||
|
||||
a) back out of teardown, restoring the cluster to the
|
||||
CLUSTER_UP state;
|
||||
|
||||
b) finish tearing the cluster down and put the cluster
|
||||
in the CLUSTER_DOWN state; the inbound CPU will
|
||||
set up the cluster again from there.
|
||||
|
||||
Choice (a) permits the removal of some latency by avoiding
|
||||
unnecessary teardown and setup operations in situations where
|
||||
the cluster is not really going to be powered down.
|
||||
|
||||
|
||||
Next states:
|
||||
|
||||
CLUSTER_UP/INBOUND_COMING_UP (outbound)
|
||||
Conditions: cluster-level setup and hardware
|
||||
coherency complete
|
||||
Trigger events: (spontaneous)
|
||||
|
||||
CLUSTER_DOWN/INBOUND_COMING_UP (outbound)
|
||||
Conditions: cluster torn down and ready to power off
|
||||
Trigger events: (spontaneous)
|
||||
|
||||
|
||||
Last man and First man selection
|
||||
--------------------------------
|
||||
|
||||
The CPU which performs cluster tear-down operations on the outbound side
|
||||
is commonly referred to as the "last man".
|
||||
|
||||
The CPU which performs cluster setup on the inbound side is commonly
|
||||
referred to as the "first man".
|
||||
|
||||
The race avoidance algorithm documented above does not provide a
|
||||
mechanism to choose which CPUs should play these roles.
|
||||
|
||||
|
||||
Last man:
|
||||
|
||||
When shutting down the cluster, all the CPUs involved are initially
|
||||
executing Linux and hence coherent. Therefore, ordinary spinlocks can
|
||||
be used to select a last man safely, before the CPUs become
|
||||
non-coherent.
|
||||
|
||||
|
||||
First man:
|
||||
|
||||
Because CPUs may power up asynchronously in response to external wake-up
|
||||
events, a dynamic mechanism is needed to make sure that only one CPU
|
||||
attempts to play the first man role and do the cluster-level
|
||||
initialisation: any other CPUs must wait for this to complete before
|
||||
proceeding.
|
||||
|
||||
Cluster-level initialisation may involve actions such as configuring
|
||||
coherency controls in the bus fabric.
|
||||
|
||||
The current implementation in mcpm_head.S uses a separate mutual exclusion
|
||||
mechanism to do this arbitration. This mechanism is documented in
|
||||
detail in vlocks.txt.
|
||||
|
||||
|
||||
Features and Limitations
|
||||
------------------------
|
||||
|
||||
Implementation:
|
||||
|
||||
The current ARM-based implementation is split between
|
||||
arch/arm/common/mcpm_head.S (low-level inbound CPU operations) and
|
||||
arch/arm/common/mcpm_entry.c (everything else):
|
||||
|
||||
__mcpm_cpu_going_down() signals the transition of a CPU to the
|
||||
CPU_GOING_DOWN state.
|
||||
|
||||
__mcpm_cpu_down() signals the transition of a CPU to the CPU_DOWN
|
||||
state.
|
||||
|
||||
A CPU transitions to CPU_COMING_UP and then to CPU_UP via the
|
||||
low-level power-up code in mcpm_head.S. This could
|
||||
involve CPU-specific setup code, but in the current
|
||||
implementation it does not.
|
||||
|
||||
__mcpm_outbound_enter_critical() and __mcpm_outbound_leave_critical()
|
||||
handle transitions from CLUSTER_UP to CLUSTER_GOING_DOWN
|
||||
and from there to CLUSTER_DOWN or back to CLUSTER_UP (in
|
||||
the case of an aborted cluster power-down).
|
||||
|
||||
These functions are more complex than the __mcpm_cpu_*()
|
||||
functions due to the extra inter-CPU coordination which
|
||||
is needed for safe transitions at the cluster level.
|
||||
|
||||
A cluster transitions from CLUSTER_DOWN back to CLUSTER_UP via
|
||||
the low-level power-up code in mcpm_head.S. This
|
||||
typically involves platform-specific setup code,
|
||||
provided by the platform-specific power_up_setup
|
||||
function registered via mcpm_sync_init.
|
||||
|
||||
Deep topologies:
|
||||
|
||||
As currently described and implemented, the algorithm does not
|
||||
support CPU topologies involving more than two levels (i.e.,
|
||||
clusters of clusters are not supported). The algorithm could be
|
||||
extended by replicating the cluster-level states for the
|
||||
additional topological levels, and modifying the transition
|
||||
rules for the intermediate (non-outermost) cluster levels.
|
||||
|
||||
|
||||
Colophon
|
||||
--------
|
||||
|
||||
Originally created and documented by Dave Martin for Linaro Limited, in
|
||||
collaboration with Nicolas Pitre and Achin Gupta.
|
||||
|
||||
Copyright (C) 2012-2013 Linaro Limited
|
||||
Distributed under the terms of Version 2 of the GNU General Public
|
||||
License, as defined in linux/COPYING.
|
||||
@@ -0,0 +1,211 @@
|
||||
vlocks for Bare-Metal Mutual Exclusion
|
||||
======================================
|
||||
|
||||
Voting Locks, or "vlocks" provide a simple low-level mutual exclusion
|
||||
mechanism, with reasonable but minimal requirements on the memory
|
||||
system.
|
||||
|
||||
These are intended to be used to coordinate critical activity among CPUs
|
||||
which are otherwise non-coherent, in situations where the hardware
|
||||
provides no other mechanism to support this and ordinary spinlocks
|
||||
cannot be used.
|
||||
|
||||
|
||||
vlocks make use of the atomicity provided by the memory system for
|
||||
writes to a single memory location. To arbitrate, every CPU "votes for
|
||||
itself", by storing a unique number to a common memory location. The
|
||||
final value seen in that memory location when all the votes have been
|
||||
cast identifies the winner.
|
||||
|
||||
In order to make sure that the election produces an unambiguous result
|
||||
in finite time, a CPU will only enter the election in the first place if
|
||||
no winner has been chosen and the election does not appear to have
|
||||
started yet.
|
||||
|
||||
|
||||
Algorithm
|
||||
---------
|
||||
|
||||
The easiest way to explain the vlocks algorithm is with some pseudo-code:
|
||||
|
||||
|
||||
int currently_voting[NR_CPUS] = { 0, };
|
||||
int last_vote = -1; /* no votes yet */
|
||||
|
||||
bool vlock_trylock(int this_cpu)
|
||||
{
|
||||
/* signal our desire to vote */
|
||||
currently_voting[this_cpu] = 1;
|
||||
if (last_vote != -1) {
|
||||
/* someone already volunteered himself */
|
||||
currently_voting[this_cpu] = 0;
|
||||
return false; /* not ourself */
|
||||
}
|
||||
|
||||
/* let's suggest ourself */
|
||||
last_vote = this_cpu;
|
||||
currently_voting[this_cpu] = 0;
|
||||
|
||||
/* then wait until everyone else is done voting */
|
||||
for_each_cpu(i) {
|
||||
while (currently_voting[i] != 0)
|
||||
/* wait */;
|
||||
}
|
||||
|
||||
/* result */
|
||||
if (last_vote == this_cpu)
|
||||
return true; /* we won */
|
||||
return false;
|
||||
}
|
||||
|
||||
bool vlock_unlock(void)
|
||||
{
|
||||
last_vote = -1;
|
||||
}
|
||||
|
||||
|
||||
The currently_voting[] array provides a way for the CPUs to determine
|
||||
whether an election is in progress, and plays a role analogous to the
|
||||
"entering" array in Lamport's bakery algorithm [1].
|
||||
|
||||
However, once the election has started, the underlying memory system
|
||||
atomicity is used to pick the winner. This avoids the need for a static
|
||||
priority rule to act as a tie-breaker, or any counters which could
|
||||
overflow.
|
||||
|
||||
As long as the last_vote variable is globally visible to all CPUs, it
|
||||
will contain only one value that won't change once every CPU has cleared
|
||||
its currently_voting flag.
|
||||
|
||||
|
||||
Features and limitations
|
||||
------------------------
|
||||
|
||||
* vlocks are not intended to be fair. In the contended case, it is the
|
||||
_last_ CPU which attempts to get the lock which will be most likely
|
||||
to win.
|
||||
|
||||
vlocks are therefore best suited to situations where it is necessary
|
||||
to pick a unique winner, but it does not matter which CPU actually
|
||||
wins.
|
||||
|
||||
* Like other similar mechanisms, vlocks will not scale well to a large
|
||||
number of CPUs.
|
||||
|
||||
vlocks can be cascaded in a voting hierarchy to permit better scaling
|
||||
if necessary, as in the following hypothetical example for 4096 CPUs:
|
||||
|
||||
/* first level: local election */
|
||||
my_town = towns[(this_cpu >> 4) & 0xf];
|
||||
I_won = vlock_trylock(my_town, this_cpu & 0xf);
|
||||
if (I_won) {
|
||||
/* we won the town election, let's go for the state */
|
||||
my_state = states[(this_cpu >> 8) & 0xf];
|
||||
I_won = vlock_lock(my_state, this_cpu & 0xf));
|
||||
if (I_won) {
|
||||
/* and so on */
|
||||
I_won = vlock_lock(the_whole_country, this_cpu & 0xf];
|
||||
if (I_won) {
|
||||
/* ... */
|
||||
}
|
||||
vlock_unlock(the_whole_country);
|
||||
}
|
||||
vlock_unlock(my_state);
|
||||
}
|
||||
vlock_unlock(my_town);
|
||||
|
||||
|
||||
ARM implementation
|
||||
------------------
|
||||
|
||||
The current ARM implementation [2] contains some optimisations beyond
|
||||
the basic algorithm:
|
||||
|
||||
* By packing the members of the currently_voting array close together,
|
||||
we can read the whole array in one transaction (providing the number
|
||||
of CPUs potentially contending the lock is small enough). This
|
||||
reduces the number of round-trips required to external memory.
|
||||
|
||||
In the ARM implementation, this means that we can use a single load
|
||||
and comparison:
|
||||
|
||||
LDR Rt, [Rn]
|
||||
CMP Rt, #0
|
||||
|
||||
...in place of code equivalent to:
|
||||
|
||||
LDRB Rt, [Rn]
|
||||
CMP Rt, #0
|
||||
LDRBEQ Rt, [Rn, #1]
|
||||
CMPEQ Rt, #0
|
||||
LDRBEQ Rt, [Rn, #2]
|
||||
CMPEQ Rt, #0
|
||||
LDRBEQ Rt, [Rn, #3]
|
||||
CMPEQ Rt, #0
|
||||
|
||||
This cuts down on the fast-path latency, as well as potentially
|
||||
reducing bus contention in contended cases.
|
||||
|
||||
The optimisation relies on the fact that the ARM memory system
|
||||
guarantees coherency between overlapping memory accesses of
|
||||
different sizes, similarly to many other architectures. Note that
|
||||
we do not care which element of currently_voting appears in which
|
||||
bits of Rt, so there is no need to worry about endianness in this
|
||||
optimisation.
|
||||
|
||||
If there are too many CPUs to read the currently_voting array in
|
||||
one transaction then multiple transations are still required. The
|
||||
implementation uses a simple loop of word-sized loads for this
|
||||
case. The number of transactions is still fewer than would be
|
||||
required if bytes were loaded individually.
|
||||
|
||||
|
||||
In principle, we could aggregate further by using LDRD or LDM, but
|
||||
to keep the code simple this was not attempted in the initial
|
||||
implementation.
|
||||
|
||||
|
||||
* vlocks are currently only used to coordinate between CPUs which are
|
||||
unable to enable their caches yet. This means that the
|
||||
implementation removes many of the barriers which would be required
|
||||
when executing the algorithm in cached memory.
|
||||
|
||||
packing of the currently_voting array does not work with cached
|
||||
memory unless all CPUs contending the lock are cache-coherent, due
|
||||
to cache writebacks from one CPU clobbering values written by other
|
||||
CPUs. (Though if all the CPUs are cache-coherent, you should be
|
||||
probably be using proper spinlocks instead anyway).
|
||||
|
||||
|
||||
* The "no votes yet" value used for the last_vote variable is 0 (not
|
||||
-1 as in the pseudocode). This allows statically-allocated vlocks
|
||||
to be implicitly initialised to an unlocked state simply by putting
|
||||
them in .bss.
|
||||
|
||||
An offset is added to each CPU's ID for the purpose of setting this
|
||||
variable, so that no CPU uses the value 0 for its ID.
|
||||
|
||||
|
||||
Colophon
|
||||
--------
|
||||
|
||||
Originally created and documented by Dave Martin for Linaro Limited, for
|
||||
use in ARM-based big.LITTLE platforms, with review and input gratefully
|
||||
received from Nicolas Pitre and Achin Gupta. Thanks to Nicolas for
|
||||
grabbing most of this text out of the relevant mail thread and writing
|
||||
up the pseudocode.
|
||||
|
||||
Copyright (C) 2012-2013 Linaro Limited
|
||||
Distributed under the terms of Version 2 of the GNU General Public
|
||||
License, as defined in linux/COPYING.
|
||||
|
||||
|
||||
References
|
||||
----------
|
||||
|
||||
[1] Lamport, L. "A New Solution of Dijkstra's Concurrent Programming
|
||||
Problem", Communications of the ACM 17, 8 (August 1974), 453-455.
|
||||
|
||||
http://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm
|
||||
|
||||
[2] linux/arch/arm/common/vlock.S, www.kernel.org.
|
||||
@@ -30,6 +30,7 @@ The target is named "raid" and it accepts the following parameters:
|
||||
raid10 Various RAID10 inspired algorithms chosen by additional params
|
||||
- RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
|
||||
- RAID1E: Integrated Adjacent Stripe Mirroring
|
||||
- RAID1E: Integrated Offset Stripe Mirroring
|
||||
- and other similar RAID10 variants
|
||||
|
||||
Reference: Chapter 4 of
|
||||
@@ -64,15 +65,15 @@ The target is named "raid" and it accepts the following parameters:
|
||||
synchronisation state for each region.
|
||||
|
||||
[raid10_copies <# copies>]
|
||||
[raid10_format near]
|
||||
[raid10_format <near|far|offset>]
|
||||
These two options are used to alter the default layout of
|
||||
a RAID10 configuration. The number of copies is can be
|
||||
specified, but the default is 2. There are other variations
|
||||
to how the copies are laid down - the default and only current
|
||||
option is "near". Near copies are what most people think of
|
||||
with respect to mirroring. If these options are left
|
||||
unspecified, or 'raid10_copies 2' and/or 'raid10_format near'
|
||||
are given, then the layouts for 2, 3 and 4 devices are:
|
||||
specified, but the default is 2. There are also three
|
||||
variations to how the copies are laid down - the default
|
||||
is "near". Near copies are what most people think of with
|
||||
respect to mirroring. If these options are left unspecified,
|
||||
or 'raid10_copies 2' and/or 'raid10_format near' are given,
|
||||
then the layouts for 2, 3 and 4 devices are:
|
||||
2 drives 3 drives 4 drives
|
||||
-------- ---------- --------------
|
||||
A1 A1 A1 A1 A2 A1 A1 A2 A2
|
||||
@@ -85,6 +86,33 @@ The target is named "raid" and it accepts the following parameters:
|
||||
3-device layout is what might be called a 'RAID1E - Integrated
|
||||
Adjacent Stripe Mirroring'.
|
||||
|
||||
If 'raid10_copies 2' and 'raid10_format far', then the layouts
|
||||
for 2, 3 and 4 devices are:
|
||||
2 drives 3 drives 4 drives
|
||||
-------- -------------- --------------------
|
||||
A1 A2 A1 A2 A3 A1 A2 A3 A4
|
||||
A3 A4 A4 A5 A6 A5 A6 A7 A8
|
||||
A5 A6 A7 A8 A9 A9 A10 A11 A12
|
||||
.. .. .. .. .. .. .. .. ..
|
||||
A2 A1 A3 A1 A2 A2 A1 A4 A3
|
||||
A4 A3 A6 A4 A5 A6 A5 A8 A7
|
||||
A6 A5 A9 A7 A8 A10 A9 A12 A11
|
||||
.. .. .. .. .. .. .. .. ..
|
||||
|
||||
If 'raid10_copies 2' and 'raid10_format offset', then the
|
||||
layouts for 2, 3 and 4 devices are:
|
||||
2 drives 3 drives 4 drives
|
||||
-------- ------------ -----------------
|
||||
A1 A2 A1 A2 A3 A1 A2 A3 A4
|
||||
A2 A1 A3 A1 A2 A2 A1 A4 A3
|
||||
A3 A4 A4 A5 A6 A5 A6 A7 A8
|
||||
A4 A3 A6 A4 A5 A6 A5 A8 A7
|
||||
A5 A6 A7 A8 A9 A9 A10 A11 A12
|
||||
A6 A5 A9 A7 A8 A10 A9 A12 A11
|
||||
.. .. .. .. .. .. .. .. ..
|
||||
Here we see layouts closely akin to 'RAID1E - Integrated
|
||||
Offset Stripe Mirroring'.
|
||||
|
||||
<#raid_devs>: The number of devices composing the array.
|
||||
Each device consists of two entries. The first is the device
|
||||
containing the metadata (if any); the second is the one containing the
|
||||
@@ -142,3 +170,5 @@ Version History
|
||||
1.3.0 Added support for RAID 10
|
||||
1.3.1 Allow device replacement/rebuild for RAID 10
|
||||
1.3.2 Fix/improve redundancy checking for RAID10
|
||||
1.4.0 Non-functional change. Removes arg from mapping function.
|
||||
1.4.1 Add RAID10 "far" and "offset" algorithm support.
|
||||
|
||||
@@ -13,9 +13,6 @@ Required parent device properties:
|
||||
4 = active high level-sensitive
|
||||
8 = active low level-sensitive
|
||||
|
||||
Optional parent device properties:
|
||||
- reg : contains the PRCMU mailbox address for the AB8500 i2c port
|
||||
|
||||
The AB8500 consists of a large and varied group of sub-devices:
|
||||
|
||||
Device IRQ Names Supply Names Description
|
||||
@@ -86,9 +83,8 @@ Non-standard child device properties:
|
||||
- stericsson,amic2-bias-vamic1 : Analoge Mic wishes to use a non-standard Vamic
|
||||
- stericsson,earpeice-cmv : Earpeice voltage (only: 950 | 1100 | 1270 | 1580)
|
||||
|
||||
ab8500@5 {
|
||||
ab8500 {
|
||||
compatible = "stericsson,ab8500";
|
||||
reg = <5>; /* mailbox 5 is i2c */
|
||||
interrupts = <0 40 0x4>;
|
||||
interrupt-controller;
|
||||
#interrupt-cells = <2>;
|
||||
|
||||
@@ -11,6 +11,9 @@ Required properties:
|
||||
- "nvidia,tegra20-uart"
|
||||
- "nxp,lpc3220-uart"
|
||||
- "ibm,qpace-nwp-serial"
|
||||
- "altr,16550-FIFO32"
|
||||
- "altr,16550-FIFO64"
|
||||
- "altr,16550-FIFO128"
|
||||
- "serial" if the port type is unknown.
|
||||
- reg : offset and length of the register set for the device.
|
||||
- interrupts : should contain uart interrupt.
|
||||
|
||||
@@ -15,7 +15,7 @@ Supported chips:
|
||||
Addresses scanned: -
|
||||
Datasheet: www.analog.com/static/imported-files/data_sheets/ADM1276.pdf
|
||||
|
||||
Author: Guenter Roeck <guenter.roeck@ericsson.com>
|
||||
Author: Guenter Roeck <linux@roeck-us.net>
|
||||
|
||||
|
||||
Description
|
||||
|
||||
@@ -4,9 +4,14 @@ Kernel driver adt7410
|
||||
Supported chips:
|
||||
* Analog Devices ADT7410
|
||||
Prefix: 'adt7410'
|
||||
Addresses scanned: I2C 0x48 - 0x4B
|
||||
Addresses scanned: None
|
||||
Datasheet: Publicly available at the Analog Devices website
|
||||
http://www.analog.com/static/imported-files/data_sheets/ADT7410.pdf
|
||||
* Analog Devices ADT7420
|
||||
Prefix: 'adt7420'
|
||||
Addresses scanned: None
|
||||
Datasheet: Publicly available at the Analog Devices website
|
||||
http://www.analog.com/static/imported-files/data_sheets/ADT7420.pdf
|
||||
|
||||
Author: Hartmut Knaack <knaack.h@gmx.de>
|
||||
|
||||
@@ -27,6 +32,10 @@ value per second or even justget one sample on demand for power saving.
|
||||
Besides, it can completely power down its ADC, if power management is
|
||||
required.
|
||||
|
||||
The ADT7420 is register compatible, the only differences being the package,
|
||||
a slightly narrower operating temperature range (-40°C to +150°C), and a
|
||||
better accuracy (0.25°C instead of 0.50°C.)
|
||||
|
||||
Configuration Notes
|
||||
-------------------
|
||||
|
||||
|
||||
@@ -49,7 +49,7 @@ Supported chips:
|
||||
Addresses scanned: I2C 0x18 - 0x1f
|
||||
|
||||
Author:
|
||||
Guenter Roeck <guenter.roeck@ericsson.com>
|
||||
Guenter Roeck <linux@roeck-us.net>
|
||||
|
||||
|
||||
Description
|
||||
|
||||
@@ -8,7 +8,7 @@ Supported devices:
|
||||
Documentation:
|
||||
http://www.lineagepower.com/oem/pdf/CPLI2C.pdf
|
||||
|
||||
Author: Guenter Roeck <guenter.roeck@ericsson.com>
|
||||
Author: Guenter Roeck <linux@roeck-us.net>
|
||||
|
||||
|
||||
Description
|
||||
|
||||
@@ -19,7 +19,7 @@ Supported chips:
|
||||
Datasheet:
|
||||
http://www.national.com/pf/LM/LM5066.html
|
||||
|
||||
Author: Guenter Roeck <guenter.roeck@ericsson.com>
|
||||
Author: Guenter Roeck <linux@roeck-us.net>
|
||||
|
||||
|
||||
Description
|
||||
|
||||
@@ -5,13 +5,13 @@ Supported chips:
|
||||
* Linear Technology LTC2978
|
||||
Prefix: 'ltc2978'
|
||||
Addresses scanned: -
|
||||
Datasheet: http://cds.linear.com/docs/Datasheet/2978fa.pdf
|
||||
Datasheet: http://www.linear.com/product/ltc2978
|
||||
* Linear Technology LTC3880
|
||||
Prefix: 'ltc3880'
|
||||
Addresses scanned: -
|
||||
Datasheet: http://cds.linear.com/docs/Datasheet/3880f.pdf
|
||||
Datasheet: http://www.linear.com/product/ltc3880
|
||||
|
||||
Author: Guenter Roeck <guenter.roeck@ericsson.com>
|
||||
Author: Guenter Roeck <linux@roeck-us.net>
|
||||
|
||||
|
||||
Description
|
||||
|
||||
@@ -8,7 +8,7 @@ Supported chips:
|
||||
Datasheet:
|
||||
http://cds.linear.com/docs/Datasheet/42612fb.pdf
|
||||
|
||||
Author: Guenter Roeck <guenter.roeck@ericsson.com>
|
||||
Author: Guenter Roeck <linux@roeck-us.net>
|
||||
|
||||
|
||||
Description
|
||||
|
||||
@@ -7,7 +7,7 @@ Supported chips:
|
||||
Addresses scanned: -
|
||||
Datasheet: http://datasheets.maxim-ic.com/en/ds/MAX16064.pdf
|
||||
|
||||
Author: Guenter Roeck <guenter.roeck@ericsson.com>
|
||||
Author: Guenter Roeck <linux@roeck-us.net>
|
||||
|
||||
|
||||
Description
|
||||
|
||||
@@ -24,7 +24,7 @@ Supported chips:
|
||||
http://datasheets.maxim-ic.com/en/ds/MAX16070-MAX16071.pdf
|
||||
|
||||
|
||||
Author: Guenter Roeck <guenter.roeck@ericsson.com>
|
||||
Author: Guenter Roeck <linux@roeck-us.net>
|
||||
|
||||
|
||||
Description
|
||||
|
||||
@@ -27,7 +27,7 @@ Supported chips:
|
||||
Addresses scanned: -
|
||||
Datasheet: http://datasheets.maximintegrated.com/en/ds/MAX34461.pdf
|
||||
|
||||
Author: Guenter Roeck <guenter.roeck@ericsson.com>
|
||||
Author: Guenter Roeck <linux@roeck-us.net>
|
||||
|
||||
|
||||
Description
|
||||
|
||||
@@ -7,7 +7,7 @@ Supported chips:
|
||||
Addresses scanned: -
|
||||
Datasheet: http://datasheets.maxim-ic.com/en/ds/MAX8688.pdf
|
||||
|
||||
Author: Guenter Roeck <guenter.roeck@ericsson.com>
|
||||
Author: Guenter Roeck <linux@roeck-us.net>
|
||||
|
||||
|
||||
Description
|
||||
|
||||
@@ -34,7 +34,7 @@ Supported chips:
|
||||
Addresses scanned: -
|
||||
Datasheet: n.a.
|
||||
|
||||
Author: Guenter Roeck <guenter.roeck@ericsson.com>
|
||||
Author: Guenter Roeck <linux@roeck-us.net>
|
||||
|
||||
|
||||
Description
|
||||
|
||||
@@ -29,7 +29,7 @@ Supported chips:
|
||||
http://www.summitmicro.com/prod_select/summary/SMM766/SMM766_2086.pdf
|
||||
http://www.summitmicro.com/prod_select/summary/SMM766B/SMM766B_2122.pdf
|
||||
|
||||
Author: Guenter Roeck <guenter.roeck@ericsson.com>
|
||||
Author: Guenter Roeck <linux@roeck-us.net>
|
||||
|
||||
|
||||
Module Parameters
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user