Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM updates from Russell King:
 "The major items included in here are:

   - MCPM, multi-cluster power management, part of the infrastructure
     required for ARMs big.LITTLE support.

   - A rework of the ARM KVM code to allow re-use by ARM64.

   - Error handling cleanups of the IS_ERR_OR_NULL() madness and fixes
     of that stuff for arch/arm

   - Preparatory patches for Cortex-M3 support from Uwe Kleine-König.

  There is also a set of three patches in here from Hugh/Catalin to
  address freeing of inappropriate page tables on LPAE.  You already
  have these from akpm, but they were already part of my tree at the
  time he sent them, so unfortunately they'll end up with duplicate
  commits"

* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (77 commits)
  ARM: EXYNOS: remove unnecessary use of IS_ERR_VALUE()
  ARM: IMX: remove unnecessary use of IS_ERR_VALUE()
  ARM: OMAP: use consistent error checking
  ARM: cleanup: OMAP hwmod error checking
  ARM: 7709/1: mcpm: Add explicit AFLAGS to support v6/v7 multiplatform kernels
  ARM: 7700/2: Make cpu_init() notrace
  ARM: 7702/1: Set the page table freeing ceiling to TASK_SIZE
  ARM: 7701/1: mm: Allow arch code to control the user page table ceiling
  ARM: 7703/1: Disable preemption in broadcast_tlb*_a15_erratum()
  ARM: mcpm: provide an interface to set the SMP ops at run time
  ARM: mcpm: generic SMP secondary bringup and hotplug support
  ARM: mcpm_head.S: vlock-based first man election
  ARM: mcpm: Add baremetal voting mutexes
  ARM: mcpm: introduce helpers for platform coherency exit/setup
  ARM: mcpm: introduce the CPU/cluster power API
  ARM: multi-cluster PM: secondary kernel entry code
  ARM: cacheflush: add synchronization helpers for mixed cache state accesses
  ARM: cpu hotplug: remove majority of cache flushing from platforms
  ARM: smp: flush L1 cache in cpu_die()
  ARM: tegra: remove tegra specific cpu_disable()
  ...
This commit is contained in:
Linus Torvalds
2013-05-03 09:13:19 -07:00
96 changed files with 3105 additions and 1228 deletions
@@ -0,0 +1,498 @@
Cluster-wide Power-up/power-down race avoidance algorithm
=========================================================
This file documents the algorithm which is used to coordinate CPU and
cluster setup and teardown operations and to manage hardware coherency
controls safely.
The section "Rationale" explains what the algorithm is for and why it is
needed. "Basic model" explains general concepts using a simplified view
of the system. The other sections explain the actual details of the
algorithm in use.
Rationale
---------
In a system containing multiple CPUs, it is desirable to have the
ability to turn off individual CPUs when the system is idle, reducing
power consumption and thermal dissipation.
In a system containing multiple clusters of CPUs, it is also desirable
to have the ability to turn off entire clusters.
Turning entire clusters off and on is a risky business, because it
involves performing potentially destructive operations affecting a group
of independently running CPUs, while the OS continues to run. This
means that we need some coordination in order to ensure that critical
cluster-level operations are only performed when it is truly safe to do
so.
Simple locking may not be sufficient to solve this problem, because
mechanisms like Linux spinlocks may rely on coherency mechanisms which
are not immediately enabled when a cluster powers up. Since enabling or
disabling those mechanisms may itself be a non-atomic operation (such as
writing some hardware registers and invalidating large caches), other
methods of coordination are required in order to guarantee safe
power-down and power-up at the cluster level.
The mechanism presented in this document describes a coherent memory
based protocol for performing the needed coordination. It aims to be as
lightweight as possible, while providing the required safety properties.
Basic model
-----------
Each cluster and CPU is assigned a state, as follows:
DOWN
COMING_UP
UP
GOING_DOWN
+---------> UP ----------+
| v
COMING_UP GOING_DOWN
^ |
+--------- DOWN <--------+
DOWN: The CPU or cluster is not coherent, and is either powered off or
suspended, or is ready to be powered off or suspended.
COMING_UP: The CPU or cluster has committed to moving to the UP state.
It may be part way through the process of initialisation and
enabling coherency.
UP: The CPU or cluster is active and coherent at the hardware
level. A CPU in this state is not necessarily being used
actively by the kernel.
GOING_DOWN: The CPU or cluster has committed to moving to the DOWN
state. It may be part way through the process of teardown and
coherency exit.
Each CPU has one of these states assigned to it at any point in time.
The CPU states are described in the "CPU state" section, below.
Each cluster is also assigned a state, but it is necessary to split the
state value into two parts (the "cluster" state and "inbound" state) and
to introduce additional states in order to avoid races between different
CPUs in the cluster simultaneously modifying the state. The cluster-
level states are described in the "Cluster state" section.
To help distinguish the CPU states from cluster states in this
discussion, the state names are given a CPU_ prefix for the CPU states,
and a CLUSTER_ or INBOUND_ prefix for the cluster states.
CPU state
---------
In this algorithm, each individual core in a multi-core processor is
referred to as a "CPU". CPUs are assumed to be single-threaded:
therefore, a CPU can only be doing one thing at a single point in time.
This means that CPUs fit the basic model closely.
The algorithm defines the following states for each CPU in the system:
CPU_DOWN
CPU_COMING_UP
CPU_UP
CPU_GOING_DOWN
cluster setup and
CPU setup complete policy decision
+-----------> CPU_UP ------------+
| v
CPU_COMING_UP CPU_GOING_DOWN
^ |
+----------- CPU_DOWN <----------+
policy decision CPU teardown complete
or hardware event
The definitions of the four states correspond closely to the states of
the basic model.
Transitions between states occur as follows.
A trigger event (spontaneous) means that the CPU can transition to the
next state as a result of making local progress only, with no
requirement for any external event to happen.
CPU_DOWN:
A CPU reaches the CPU_DOWN state when it is ready for
power-down. On reaching this state, the CPU will typically
power itself down or suspend itself, via a WFI instruction or a
firmware call.
Next state: CPU_COMING_UP
Conditions: none
Trigger events:
a) an explicit hardware power-up operation, resulting
from a policy decision on another CPU;
b) a hardware event, such as an interrupt.
CPU_COMING_UP:
A CPU cannot start participating in hardware coherency until the
cluster is set up and coherent. If the cluster is not ready,
then the CPU will wait in the CPU_COMING_UP state until the
cluster has been set up.
Next state: CPU_UP
Conditions: The CPU's parent cluster must be in CLUSTER_UP.
Trigger events: Transition of the parent cluster to CLUSTER_UP.
Refer to the "Cluster state" section for a description of the
CLUSTER_UP state.
CPU_UP:
When a CPU reaches the CPU_UP state, it is safe for the CPU to
start participating in local coherency.
This is done by jumping to the kernel's CPU resume code.
Note that the definition of this state is slightly different
from the basic model definition: CPU_UP does not mean that the
CPU is coherent yet, but it does mean that it is safe to resume
the kernel. The kernel handles the rest of the resume
procedure, so the remaining steps are not visible as part of the
race avoidance algorithm.
The CPU remains in this state until an explicit policy decision
is made to shut down or suspend the CPU.
Next state: CPU_GOING_DOWN
Conditions: none
Trigger events: explicit policy decision
CPU_GOING_DOWN:
While in this state, the CPU exits coherency, including any
operations required to achieve this (such as cleaning data
caches).
Next state: CPU_DOWN
Conditions: local CPU teardown complete
Trigger events: (spontaneous)
Cluster state
-------------
A cluster is a group of connected CPUs with some common resources.
Because a cluster contains multiple CPUs, it can be doing multiple
things at the same time. This has some implications. In particular, a
CPU can start up while another CPU is tearing the cluster down.
In this discussion, the "outbound side" is the view of the cluster state
as seen by a CPU tearing the cluster down. The "inbound side" is the
view of the cluster state as seen by a CPU setting the CPU up.
In order to enable safe coordination in such situations, it is important
that a CPU which is setting up the cluster can advertise its state
independently of the CPU which is tearing down the cluster. For this
reason, the cluster state is split into two parts:
"cluster" state: The global state of the cluster; or the state
on the outbound side:
CLUSTER_DOWN
CLUSTER_UP
CLUSTER_GOING_DOWN
"inbound" state: The state of the cluster on the inbound side.
INBOUND_NOT_COMING_UP
INBOUND_COMING_UP
The different pairings of these states results in six possible
states for the cluster as a whole:
CLUSTER_UP
+==========> INBOUND_NOT_COMING_UP -------------+
# |
|
CLUSTER_UP <----+ |
INBOUND_COMING_UP | v
^ CLUSTER_GOING_DOWN CLUSTER_GOING_DOWN
# INBOUND_COMING_UP <=== INBOUND_NOT_COMING_UP
CLUSTER_DOWN | |
INBOUND_COMING_UP <----+ |
|
^ |
+=========== CLUSTER_DOWN <------------+
INBOUND_NOT_COMING_UP
Transitions -----> can only be made by the outbound CPU, and
only involve changes to the "cluster" state.
Transitions ===##> can only be made by the inbound CPU, and only
involve changes to the "inbound" state, except where there is no
further transition possible on the outbound side (i.e., the
outbound CPU has put the cluster into the CLUSTER_DOWN state).
The race avoidance algorithm does not provide a way to determine
which exact CPUs within the cluster play these roles. This must
be decided in advance by some other means. Refer to the section
"Last man and first man selection" for more explanation.
CLUSTER_DOWN/INBOUND_NOT_COMING_UP is the only state where the
cluster can actually be powered down.
The parallelism of the inbound and outbound CPUs is observed by
the existence of two different paths from CLUSTER_GOING_DOWN/
INBOUND_NOT_COMING_UP (corresponding to GOING_DOWN in the basic
model) to CLUSTER_DOWN/INBOUND_COMING_UP (corresponding to
COMING_UP in the basic model). The second path avoids cluster
teardown completely.
CLUSTER_UP/INBOUND_COMING_UP is equivalent to UP in the basic
model. The final transition to CLUSTER_UP/INBOUND_NOT_COMING_UP
is trivial and merely resets the state machine ready for the
next cycle.
Details of the allowable transitions follow.
The next state in each case is notated
<cluster state>/<inbound state> (<transitioner>)
where the <transitioner> is the side on which the transition
can occur; either the inbound or the outbound side.
CLUSTER_DOWN/INBOUND_NOT_COMING_UP:
Next state: CLUSTER_DOWN/INBOUND_COMING_UP (inbound)
Conditions: none
Trigger events:
a) an explicit hardware power-up operation, resulting
from a policy decision on another CPU;
b) a hardware event, such as an interrupt.
CLUSTER_DOWN/INBOUND_COMING_UP:
In this state, an inbound CPU sets up the cluster, including
enabling of hardware coherency at the cluster level and any
other operations (such as cache invalidation) which are required
in order to achieve this.
The purpose of this state is to do sufficient cluster-level
setup to enable other CPUs in the cluster to enter coherency
safely.
Next state: CLUSTER_UP/INBOUND_COMING_UP (inbound)
Conditions: cluster-level setup and hardware coherency complete
Trigger events: (spontaneous)
CLUSTER_UP/INBOUND_COMING_UP:
Cluster-level setup is complete and hardware coherency is
enabled for the cluster. Other CPUs in the cluster can safely
enter coherency.
This is a transient state, leading immediately to
CLUSTER_UP/INBOUND_NOT_COMING_UP. All other CPUs on the cluster
should consider treat these two states as equivalent.
Next state: CLUSTER_UP/INBOUND_NOT_COMING_UP (inbound)
Conditions: none
Trigger events: (spontaneous)
CLUSTER_UP/INBOUND_NOT_COMING_UP:
Cluster-level setup is complete and hardware coherency is
enabled for the cluster. Other CPUs in the cluster can safely
enter coherency.
The cluster will remain in this state until a policy decision is
made to power the cluster down.
Next state: CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP (outbound)
Conditions: none
Trigger events: policy decision to power down the cluster
CLUSTER_GOING_DOWN/INBOUND_NOT_COMING_UP:
An outbound CPU is tearing the cluster down. The selected CPU
must wait in this state until all CPUs in the cluster are in the
CPU_DOWN state.
When all CPUs are in the CPU_DOWN state, the cluster can be torn
down, for example by cleaning data caches and exiting
cluster-level coherency.
To avoid wasteful unnecessary teardown operations, the outbound
should check the inbound cluster state for asynchronous
transitions to INBOUND_COMING_UP. Alternatively, individual
CPUs can be checked for entry into CPU_COMING_UP or CPU_UP.
Next states:
CLUSTER_DOWN/INBOUND_NOT_COMING_UP (outbound)
Conditions: cluster torn down and ready to power off
Trigger events: (spontaneous)
CLUSTER_GOING_DOWN/INBOUND_COMING_UP (inbound)
Conditions: none
Trigger events:
a) an explicit hardware power-up operation,
resulting from a policy decision on another
CPU;
b) a hardware event, such as an interrupt.
CLUSTER_GOING_DOWN/INBOUND_COMING_UP:
The cluster is (or was) being torn down, but another CPU has
come online in the meantime and is trying to set up the cluster
again.
If the outbound CPU observes this state, it has two choices:
a) back out of teardown, restoring the cluster to the
CLUSTER_UP state;
b) finish tearing the cluster down and put the cluster
in the CLUSTER_DOWN state; the inbound CPU will
set up the cluster again from there.
Choice (a) permits the removal of some latency by avoiding
unnecessary teardown and setup operations in situations where
the cluster is not really going to be powered down.
Next states:
CLUSTER_UP/INBOUND_COMING_UP (outbound)
Conditions: cluster-level setup and hardware
coherency complete
Trigger events: (spontaneous)
CLUSTER_DOWN/INBOUND_COMING_UP (outbound)
Conditions: cluster torn down and ready to power off
Trigger events: (spontaneous)
Last man and First man selection
--------------------------------
The CPU which performs cluster tear-down operations on the outbound side
is commonly referred to as the "last man".
The CPU which performs cluster setup on the inbound side is commonly
referred to as the "first man".
The race avoidance algorithm documented above does not provide a
mechanism to choose which CPUs should play these roles.
Last man:
When shutting down the cluster, all the CPUs involved are initially
executing Linux and hence coherent. Therefore, ordinary spinlocks can
be used to select a last man safely, before the CPUs become
non-coherent.
First man:
Because CPUs may power up asynchronously in response to external wake-up
events, a dynamic mechanism is needed to make sure that only one CPU
attempts to play the first man role and do the cluster-level
initialisation: any other CPUs must wait for this to complete before
proceeding.
Cluster-level initialisation may involve actions such as configuring
coherency controls in the bus fabric.
The current implementation in mcpm_head.S uses a separate mutual exclusion
mechanism to do this arbitration. This mechanism is documented in
detail in vlocks.txt.
Features and Limitations
------------------------
Implementation:
The current ARM-based implementation is split between
arch/arm/common/mcpm_head.S (low-level inbound CPU operations) and
arch/arm/common/mcpm_entry.c (everything else):
__mcpm_cpu_going_down() signals the transition of a CPU to the
CPU_GOING_DOWN state.
__mcpm_cpu_down() signals the transition of a CPU to the CPU_DOWN
state.
A CPU transitions to CPU_COMING_UP and then to CPU_UP via the
low-level power-up code in mcpm_head.S. This could
involve CPU-specific setup code, but in the current
implementation it does not.
__mcpm_outbound_enter_critical() and __mcpm_outbound_leave_critical()
handle transitions from CLUSTER_UP to CLUSTER_GOING_DOWN
and from there to CLUSTER_DOWN or back to CLUSTER_UP (in
the case of an aborted cluster power-down).
These functions are more complex than the __mcpm_cpu_*()
functions due to the extra inter-CPU coordination which
is needed for safe transitions at the cluster level.
A cluster transitions from CLUSTER_DOWN back to CLUSTER_UP via
the low-level power-up code in mcpm_head.S. This
typically involves platform-specific setup code,
provided by the platform-specific power_up_setup
function registered via mcpm_sync_init.
Deep topologies:
As currently described and implemented, the algorithm does not
support CPU topologies involving more than two levels (i.e.,
clusters of clusters are not supported). The algorithm could be
extended by replicating the cluster-level states for the
additional topological levels, and modifying the transition
rules for the intermediate (non-outermost) cluster levels.
Colophon
--------
Originally created and documented by Dave Martin for Linaro Limited, in
collaboration with Nicolas Pitre and Achin Gupta.
Copyright (C) 2012-2013 Linaro Limited
Distributed under the terms of Version 2 of the GNU General Public
License, as defined in linux/COPYING.
+211
View File
@@ -0,0 +1,211 @@
vlocks for Bare-Metal Mutual Exclusion
======================================
Voting Locks, or "vlocks" provide a simple low-level mutual exclusion
mechanism, with reasonable but minimal requirements on the memory
system.
These are intended to be used to coordinate critical activity among CPUs
which are otherwise non-coherent, in situations where the hardware
provides no other mechanism to support this and ordinary spinlocks
cannot be used.
vlocks make use of the atomicity provided by the memory system for
writes to a single memory location. To arbitrate, every CPU "votes for
itself", by storing a unique number to a common memory location. The
final value seen in that memory location when all the votes have been
cast identifies the winner.
In order to make sure that the election produces an unambiguous result
in finite time, a CPU will only enter the election in the first place if
no winner has been chosen and the election does not appear to have
started yet.
Algorithm
---------
The easiest way to explain the vlocks algorithm is with some pseudo-code:
int currently_voting[NR_CPUS] = { 0, };
int last_vote = -1; /* no votes yet */
bool vlock_trylock(int this_cpu)
{
/* signal our desire to vote */
currently_voting[this_cpu] = 1;
if (last_vote != -1) {
/* someone already volunteered himself */
currently_voting[this_cpu] = 0;
return false; /* not ourself */
}
/* let's suggest ourself */
last_vote = this_cpu;
currently_voting[this_cpu] = 0;
/* then wait until everyone else is done voting */
for_each_cpu(i) {
while (currently_voting[i] != 0)
/* wait */;
}
/* result */
if (last_vote == this_cpu)
return true; /* we won */
return false;
}
bool vlock_unlock(void)
{
last_vote = -1;
}
The currently_voting[] array provides a way for the CPUs to determine
whether an election is in progress, and plays a role analogous to the
"entering" array in Lamport's bakery algorithm [1].
However, once the election has started, the underlying memory system
atomicity is used to pick the winner. This avoids the need for a static
priority rule to act as a tie-breaker, or any counters which could
overflow.
As long as the last_vote variable is globally visible to all CPUs, it
will contain only one value that won't change once every CPU has cleared
its currently_voting flag.
Features and limitations
------------------------
* vlocks are not intended to be fair. In the contended case, it is the
_last_ CPU which attempts to get the lock which will be most likely
to win.
vlocks are therefore best suited to situations where it is necessary
to pick a unique winner, but it does not matter which CPU actually
wins.
* Like other similar mechanisms, vlocks will not scale well to a large
number of CPUs.
vlocks can be cascaded in a voting hierarchy to permit better scaling
if necessary, as in the following hypothetical example for 4096 CPUs:
/* first level: local election */
my_town = towns[(this_cpu >> 4) & 0xf];
I_won = vlock_trylock(my_town, this_cpu & 0xf);
if (I_won) {
/* we won the town election, let's go for the state */
my_state = states[(this_cpu >> 8) & 0xf];
I_won = vlock_lock(my_state, this_cpu & 0xf));
if (I_won) {
/* and so on */
I_won = vlock_lock(the_whole_country, this_cpu & 0xf];
if (I_won) {
/* ... */
}
vlock_unlock(the_whole_country);
}
vlock_unlock(my_state);
}
vlock_unlock(my_town);
ARM implementation
------------------
The current ARM implementation [2] contains some optimisations beyond
the basic algorithm:
* By packing the members of the currently_voting array close together,
we can read the whole array in one transaction (providing the number
of CPUs potentially contending the lock is small enough). This
reduces the number of round-trips required to external memory.
In the ARM implementation, this means that we can use a single load
and comparison:
LDR Rt, [Rn]
CMP Rt, #0
...in place of code equivalent to:
LDRB Rt, [Rn]
CMP Rt, #0
LDRBEQ Rt, [Rn, #1]
CMPEQ Rt, #0
LDRBEQ Rt, [Rn, #2]
CMPEQ Rt, #0
LDRBEQ Rt, [Rn, #3]
CMPEQ Rt, #0
This cuts down on the fast-path latency, as well as potentially
reducing bus contention in contended cases.
The optimisation relies on the fact that the ARM memory system
guarantees coherency between overlapping memory accesses of
different sizes, similarly to many other architectures. Note that
we do not care which element of currently_voting appears in which
bits of Rt, so there is no need to worry about endianness in this
optimisation.
If there are too many CPUs to read the currently_voting array in
one transaction then multiple transations are still required. The
implementation uses a simple loop of word-sized loads for this
case. The number of transactions is still fewer than would be
required if bytes were loaded individually.
In principle, we could aggregate further by using LDRD or LDM, but
to keep the code simple this was not attempted in the initial
implementation.
* vlocks are currently only used to coordinate between CPUs which are
unable to enable their caches yet. This means that the
implementation removes many of the barriers which would be required
when executing the algorithm in cached memory.
packing of the currently_voting array does not work with cached
memory unless all CPUs contending the lock are cache-coherent, due
to cache writebacks from one CPU clobbering values written by other
CPUs. (Though if all the CPUs are cache-coherent, you should be
probably be using proper spinlocks instead anyway).
* The "no votes yet" value used for the last_vote variable is 0 (not
-1 as in the pseudocode). This allows statically-allocated vlocks
to be implicitly initialised to an unlocked state simply by putting
them in .bss.
An offset is added to each CPU's ID for the purpose of setting this
variable, so that no CPU uses the value 0 for its ID.
Colophon
--------
Originally created and documented by Dave Martin for Linaro Limited, for
use in ARM-based big.LITTLE platforms, with review and input gratefully
received from Nicolas Pitre and Achin Gupta. Thanks to Nicolas for
grabbing most of this text out of the relevant mail thread and writing
up the pseudocode.
Copyright (C) 2012-2013 Linaro Limited
Distributed under the terms of Version 2 of the GNU General Public
License, as defined in linux/COPYING.
References
----------
[1] Lamport, L. "A New Solution of Dijkstra's Concurrent Programming
Problem", Communications of the ACM 17, 8 (August 1974), 453-455.
http://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm
[2] linux/arch/arm/common/vlock.S, www.kernel.org.
+11 -1
View File
@@ -59,6 +59,7 @@ config ARM
select CLONE_BACKWARDS
select OLD_SIGSUSPEND3
select OLD_SIGACTION
select HAVE_CONTEXT_TRACKING
help
The ARM series is a line of low-power-consumption RISC chip designs
licensed by ARM Ltd and targeted at embedded applications and
@@ -1479,6 +1480,14 @@ config HAVE_ARM_TWD
help
This options enables support for the ARM timer and watchdog unit
config MCPM
bool "Multi-Cluster Power Management"
depends on CPU_V7 && SMP
help
This option provides the common power management infrastructure
for (multi-)cluster based systems, such as big.LITTLE based
systems.
choice
prompt "Memory split"
default VMSPLIT_3G
@@ -1565,8 +1574,9 @@ config SCHED_HRTICK
def_bool HIGH_RES_TIMERS
config THUMB2_KERNEL
bool "Compile the kernel in Thumb-2 mode"
bool "Compile the kernel in Thumb-2 mode" if !CPU_THUMBONLY
depends on CPU_V7 && !CPU_V6 && !CPU_V6K
default y if CPU_THUMBONLY
select AEABI
select ARM_ASM_UNIFIED
select ARM_UNWIND
+11
View File
@@ -641,6 +641,17 @@ config DEBUG_LL_INCLUDE
default "debug/zynq.S" if DEBUG_ZYNQ_UART0 || DEBUG_ZYNQ_UART1
default "mach/debug-macro.S"
config DEBUG_UNCOMPRESS
bool
default y if ARCH_MULTIPLATFORM && DEBUG_LL && \
!DEBUG_OMAP2PLUS_UART && \
!DEBUG_TEGRA_UART
config UNCOMPRESS_INCLUDE
string
default "debug/uncompress.h" if ARCH_MULTIPLATFORM
default "mach/uncompress.h"
config EARLY_PRINTK
bool "Early printk"
depends on DEBUG_LL
+3
View File
@@ -24,6 +24,9 @@ endif
AFLAGS_head.o += -DTEXT_OFFSET=$(TEXT_OFFSET)
HEAD = head.o
OBJS += misc.o decompress.o
ifeq ($(CONFIG_DEBUG_UNCOMPRESS),y)
OBJS += debug.o
endif
FONTC = $(srctree)/drivers/video/console/font_acorn_8x8.c
# string library code (-Os is enforced to keep it much smaller)
+12
View File
@@ -0,0 +1,12 @@
#include <linux/linkage.h>
#include <asm/assembler.h>
#include CONFIG_DEBUG_LL_INCLUDE
ENTRY(putc)
addruart r1, r2, r3
waituart r3, r1
senduart r0, r1
busyuart r3, r1
mov pc, lr
ENDPROC(putc)
+1 -7
View File
@@ -25,13 +25,7 @@ unsigned int __machine_arch_type;
static void putstr(const char *ptr);
extern void error(char *x);
#ifdef CONFIG_ARCH_MULTIPLATFORM
static inline void putc(int c) {}
static inline void flush(void) {}
static inline void arch_decomp_setup(void) {}
#else
#include <mach/uncompress.h>
#endif
#include CONFIG_UNCOMPRESS_INCLUDE
#ifdef CONFIG_DEBUG_ICEDCC
+3
View File
@@ -11,3 +11,6 @@ obj-$(CONFIG_SHARP_PARAM) += sharpsl_param.o
obj-$(CONFIG_SHARP_SCOOP) += scoop.o
obj-$(CONFIG_PCI_HOST_ITE8152) += it8152.o
obj-$(CONFIG_ARM_TIMER_SP804) += timer-sp.o
obj-$(CONFIG_MCPM) += mcpm_head.o mcpm_entry.o mcpm_platsmp.o vlock.o
AFLAGS_mcpm_head.o := -march=armv7-a
AFLAGS_vlock.o := -march=armv7-a
+263
View File
@@ -0,0 +1,263 @@
/*
* arch/arm/common/mcpm_entry.c -- entry point for multi-cluster PM
*
* Created by: Nicolas Pitre, March 2012
* Copyright: (C) 2012-2013 Linaro Limited
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/irqflags.h>
#include <asm/mcpm.h>
#include <asm/cacheflush.h>
#include <asm/idmap.h>
#include <asm/cputype.h>
extern unsigned long mcpm_entry_vectors[MAX_NR_CLUSTERS][MAX_CPUS_PER_CLUSTER];
void mcpm_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
{
unsigned long val = ptr ? virt_to_phys(ptr) : 0;
mcpm_entry_vectors[cluster][cpu] = val;
sync_cache_w(&mcpm_entry_vectors[cluster][cpu]);
}
static const struct mcpm_platform_ops *platform_ops;
int __init mcpm_platform_register(const struct mcpm_platform_ops *ops)
{
if (platform_ops)
return -EBUSY;
platform_ops = ops;
return 0;
}
int mcpm_cpu_power_up(unsigned int cpu, unsigned int cluster)
{
if (!platform_ops)
return -EUNATCH; /* try not to shadow power_up errors */
might_sleep();
return platform_ops->power_up(cpu, cluster);
}
typedef void (*phys_reset_t)(unsigned long);
void mcpm_cpu_power_down(void)
{
phys_reset_t phys_reset;
BUG_ON(!platform_ops);
BUG_ON(!irqs_disabled());
/*
* Do this before calling into the power_down method,
* as it might not always be safe to do afterwards.
*/
setup_mm_for_reboot();
platform_ops->power_down();
/*
* It is possible for a power_up request to happen concurrently
* with a power_down request for the same CPU. In this case the
* power_down method might not be able to actually enter a
* powered down state with the WFI instruction if the power_up
* method has removed the required reset condition. The
* power_down method is then allowed to return. We must perform
* a re-entry in the kernel as if the power_up method just had
* deasserted reset on the CPU.
*
* To simplify race issues, the platform specific implementation
* must accommodate for the possibility of unordered calls to
* power_down and power_up with a usage count. Therefore, if a
* call to power_up is issued for a CPU that is not down, then
* the next call to power_down must not attempt a full shutdown
* but only do the minimum (normally disabling L1 cache and CPU
* coherency) and return just as if a concurrent power_up request
* had happened as described above.
*/
phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
phys_reset(virt_to_phys(mcpm_entry_point));
/* should never get here */
BUG();
}
void mcpm_cpu_suspend(u64 expected_residency)
{
phys_reset_t phys_reset;
BUG_ON(!platform_ops);
BUG_ON(!irqs_disabled());
/* Very similar to mcpm_cpu_power_down() */
setup_mm_for_reboot();
platform_ops->suspend(expected_residency);
phys_reset = (phys_reset_t)(unsigned long)virt_to_phys(cpu_reset);
phys_reset(virt_to_phys(mcpm_entry_point));
BUG();
}
int mcpm_cpu_powered_up(void)
{
if (!platform_ops)
return -EUNATCH;
if (platform_ops->powered_up)
platform_ops->powered_up();
return 0;
}
struct sync_struct mcpm_sync;
/*
* __mcpm_cpu_going_down: Indicates that the cpu is being torn down.
* This must be called at the point of committing to teardown of a CPU.
* The CPU cache (SCTRL.C bit) is expected to still be active.
*/
void __mcpm_cpu_going_down(unsigned int cpu, unsigned int cluster)
{
mcpm_sync.clusters[cluster].cpus[cpu].cpu = CPU_GOING_DOWN;
sync_cache_w(&mcpm_sync.clusters[cluster].cpus[cpu].cpu);
}
/*
* __mcpm_cpu_down: Indicates that cpu teardown is complete and that the
* cluster can be torn down without disrupting this CPU.
* To avoid deadlocks, this must be called before a CPU is powered down.
* The CPU cache (SCTRL.C bit) is expected to be off.
* However L2 cache might or might not be active.
*/
void __mcpm_cpu_down(unsigned int cpu, unsigned int cluster)
{
dmb();
mcpm_sync.clusters[cluster].cpus[cpu].cpu = CPU_DOWN;
sync_cache_w(&mcpm_sync.clusters[cluster].cpus[cpu].cpu);
dsb_sev();
}
/*
* __mcpm_outbound_leave_critical: Leave the cluster teardown critical section.
* @state: the final state of the cluster:
* CLUSTER_UP: no destructive teardown was done and the cluster has been
* restored to the previous state (CPU cache still active); or
* CLUSTER_DOWN: the cluster has been torn-down, ready for power-off
* (CPU cache disabled, L2 cache either enabled or disabled).
*/
void __mcpm_outbound_leave_critical(unsigned int cluster, int state)
{
dmb();
mcpm_sync.clusters[cluster].cluster = state;
sync_cache_w(&mcpm_sync.clusters[cluster].cluster);
dsb_sev();
}
/*
* __mcpm_outbound_enter_critical: Enter the cluster teardown critical section.
* This function should be called by the last man, after local CPU teardown
* is complete. CPU cache expected to be active.
*
* Returns:
* false: the critical section was not entered because an inbound CPU was
* observed, or the cluster is already being set up;
* true: the critical section was entered: it is now safe to tear down the
* cluster.
*/
bool __mcpm_outbound_enter_critical(unsigned int cpu, unsigned int cluster)
{
unsigned int i;
struct mcpm_sync_struct *c = &mcpm_sync.clusters[cluster];
/* Warn inbound CPUs that the cluster is being torn down: */
c->cluster = CLUSTER_GOING_DOWN;
sync_cache_w(&c->cluster);
/* Back out if the inbound cluster is already in the critical region: */
sync_cache_r(&c->inbound);
if (c->inbound == INBOUND_COMING_UP)
goto abort;
/*
* Wait for all CPUs to get out of the GOING_DOWN state, so that local
* teardown is complete on each CPU before tearing down the cluster.
*
* If any CPU has been woken up again from the DOWN state, then we
* shouldn't be taking the cluster down at all: abort in that case.
*/
sync_cache_r(&c->cpus);
for (i = 0; i < MAX_CPUS_PER_CLUSTER; i++) {
int cpustate;
if (i == cpu)
continue;
while (1) {
cpustate = c->cpus[i].cpu;
if (cpustate != CPU_GOING_DOWN)
break;
wfe();
sync_cache_r(&c->cpus[i].cpu);
}
switch (cpustate) {
case CPU_DOWN:
continue;
default:
goto abort;
}
}
return true;
abort:
__mcpm_outbound_leave_critical(cluster, CLUSTER_UP);
return false;
}
int __mcpm_cluster_state(unsigned int cluster)
{
sync_cache_r(&mcpm_sync.clusters[cluster].cluster);
return mcpm_sync.clusters[cluster].cluster;
}
extern unsigned long mcpm_power_up_setup_phys;
int __init mcpm_sync_init(
void (*power_up_setup)(unsigned int affinity_level))
{
unsigned int i, j, mpidr, this_cluster;
BUILD_BUG_ON(MCPM_SYNC_CLUSTER_SIZE * MAX_NR_CLUSTERS != sizeof mcpm_sync);
BUG_ON((unsigned long)&mcpm_sync & (__CACHE_WRITEBACK_GRANULE - 1));
/*
* Set initial CPU and cluster states.
* Only one cluster is assumed to be active at this point.
*/
for (i = 0; i < MAX_NR_CLUSTERS; i++) {
mcpm_sync.clusters[i].cluster = CLUSTER_DOWN;
mcpm_sync.clusters[i].inbound = INBOUND_NOT_COMING_UP;
for (j = 0; j < MAX_CPUS_PER_CLUSTER; j++)
mcpm_sync.clusters[i].cpus[j].cpu = CPU_DOWN;
}
mpidr = read_cpuid_mpidr();
this_cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
for_each_online_cpu(i)
mcpm_sync.clusters[this_cluster].cpus[i].cpu = CPU_UP;
mcpm_sync.clusters[this_cluster].cluster = CLUSTER_UP;
sync_cache_w(&mcpm_sync);
if (power_up_setup) {
mcpm_power_up_setup_phys = virt_to_phys(power_up_setup);
sync_cache_w(&mcpm_power_up_setup_phys);
}
return 0;
}
+219
View File
@@ -0,0 +1,219 @@
/*
* arch/arm/common/mcpm_head.S -- kernel entry point for multi-cluster PM
*
* Created by: Nicolas Pitre, March 2012
* Copyright: (C) 2012-2013 Linaro Limited
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
*
* Refer to Documentation/arm/cluster-pm-race-avoidance.txt
* for details of the synchronisation algorithms used here.
*/
#include <linux/linkage.h>
#include <asm/mcpm.h>
#include "vlock.h"
.if MCPM_SYNC_CLUSTER_CPUS
.error "cpus must be the first member of struct mcpm_sync_struct"
.endif
.macro pr_dbg string
#if defined(CONFIG_DEBUG_LL) && defined(DEBUG)
b 1901f
1902: .asciz "CPU"
1903: .asciz " cluster"
1904: .asciz ": \string"
.align
1901: adr r0, 1902b
bl printascii
mov r0, r9
bl printhex8
adr r0, 1903b
bl printascii
mov r0, r10
bl printhex8
adr r0, 1904b
bl printascii
#endif
.endm
.arm
.align
ENTRY(mcpm_entry_point)
THUMB( adr r12, BSYM(1f) )
THUMB( bx r12 )
THUMB( .thumb )
1:
mrc p15, 0, r0, c0, c0, 5 @ MPIDR
ubfx r9, r0, #0, #8 @ r9 = cpu
ubfx r10, r0, #8, #8 @ r10 = cluster
mov r3, #MAX_CPUS_PER_CLUSTER
mla r4, r3, r10, r9 @ r4 = canonical CPU index
cmp r4, #(MAX_CPUS_PER_CLUSTER * MAX_NR_CLUSTERS)
blo 2f
/* We didn't expect this CPU. Try to cheaply make it quiet. */
1: wfi
wfe
b 1b
2: pr_dbg "kernel mcpm_entry_point\n"
/*
* MMU is off so we need to get to various variables in a
* position independent way.
*/
adr r5, 3f
ldmia r5, {r6, r7, r8, r11}
add r6, r5, r6 @ r6 = mcpm_entry_vectors
ldr r7, [r5, r7] @ r7 = mcpm_power_up_setup_phys
add r8, r5, r8 @ r8 = mcpm_sync
add r11, r5, r11 @ r11 = first_man_locks
mov r0, #MCPM_SYNC_CLUSTER_SIZE
mla r8, r0, r10, r8 @ r8 = sync cluster base
@ Signal that this CPU is coming UP:
mov r0, #CPU_COMING_UP
mov r5, #MCPM_SYNC_CPU_SIZE
mla r5, r9, r5, r8 @ r5 = sync cpu address
strb r0, [r5]
@ At this point, the cluster cannot unexpectedly enter the GOING_DOWN
@ state, because there is at least one active CPU (this CPU).
mov r0, #VLOCK_SIZE
mla r11, r0, r10, r11 @ r11 = cluster first man lock
mov r0, r11
mov r1, r9 @ cpu
bl vlock_trylock @ implies DMB
cmp r0, #0 @ failed to get the lock?
bne mcpm_setup_wait @ wait for cluster setup if so
ldrb r0, [r8, #MCPM_SYNC_CLUSTER_CLUSTER]
cmp r0, #CLUSTER_UP @ cluster already up?
bne mcpm_setup @ if not, set up the cluster
@ Otherwise, release the first man lock and skip setup:
mov r0, r11
bl vlock_unlock
b mcpm_setup_complete
mcpm_setup:
@ Control dependency implies strb not observable before previous ldrb.
@ Signal that the cluster is being brought up:
mov r0, #INBOUND_COMING_UP
strb r0, [r8, #MCPM_SYNC_CLUSTER_INBOUND]
dmb
@ Any CPU trying to take the cluster into CLUSTER_GOING_DOWN from this
@ point onwards will observe INBOUND_COMING_UP and abort.
@ Wait for any previously-pending cluster teardown operations to abort
@ or complete:
mcpm_teardown_wait:
ldrb r0, [r8, #MCPM_SYNC_CLUSTER_CLUSTER]
cmp r0, #CLUSTER_GOING_DOWN
bne first_man_setup
wfe
b mcpm_teardown_wait
first_man_setup:
dmb
@ If the outbound gave up before teardown started, skip cluster setup:
cmp r0, #CLUSTER_UP
beq mcpm_setup_leave
@ power_up_setup is now responsible for setting up the cluster:
cmp r7, #0
mov r0, #1 @ second (cluster) affinity level
blxne r7 @ Call power_up_setup if defined
dmb
mov r0, #CLUSTER_UP
strb r0, [r8, #MCPM_SYNC_CLUSTER_CLUSTER]
dmb
mcpm_setup_leave:
@ Leave the cluster setup critical section:
mov r0, #INBOUND_NOT_COMING_UP
strb r0, [r8, #MCPM_SYNC_CLUSTER_INBOUND]
dsb
sev
mov r0, r11
bl vlock_unlock @ implies DMB
b mcpm_setup_complete
@ In the contended case, non-first men wait here for cluster setup
@ to complete:
mcpm_setup_wait:
ldrb r0, [r8, #MCPM_SYNC_CLUSTER_CLUSTER]
cmp r0, #CLUSTER_UP
wfene
bne mcpm_setup_wait
dmb
mcpm_setup_complete:
@ If a platform-specific CPU setup hook is needed, it is
@ called from here.
cmp r7, #0
mov r0, #0 @ first (CPU) affinity level
blxne r7 @ Call power_up_setup if defined
dmb
@ Mark the CPU as up:
mov r0, #CPU_UP
strb r0, [r5]
@ Observability order of CPU_UP and opening of the gate does not matter.
mcpm_entry_gated:
ldr r5, [r6, r4, lsl #2] @ r5 = CPU entry vector
cmp r5, #0
wfeeq
beq mcpm_entry_gated
dmb
pr_dbg "released\n"
bx r5
.align 2
3: .word mcpm_entry_vectors - .
.word mcpm_power_up_setup_phys - 3b
.word mcpm_sync - 3b
.word first_man_locks - 3b
ENDPROC(mcpm_entry_point)
.bss
.align CACHE_WRITEBACK_ORDER
.type first_man_locks, #object
first_man_locks:
.space VLOCK_SIZE * MAX_NR_CLUSTERS
.align CACHE_WRITEBACK_ORDER
.type mcpm_entry_vectors, #object
ENTRY(mcpm_entry_vectors)
.space 4 * MAX_NR_CLUSTERS * MAX_CPUS_PER_CLUSTER
.type mcpm_power_up_setup_phys, #object
ENTRY(mcpm_power_up_setup_phys)
.space 4 @ set by mcpm_sync_init()
+92
View File
@@ -0,0 +1,92 @@
/*
* linux/arch/arm/mach-vexpress/mcpm_platsmp.c
*
* Created by: Nicolas Pitre, November 2012
* Copyright: (C) 2012-2013 Linaro Limited
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* Code to handle secondary CPU bringup and hotplug for the cluster power API.
*/
#include <linux/init.h>
#include <linux/smp.h>
#include <linux/spinlock.h>
#include <linux/irqchip/arm-gic.h>
#include <asm/mcpm.h>
#include <asm/smp.h>
#include <asm/smp_plat.h>
static void __init simple_smp_init_cpus(void)
{
}
static int __cpuinit mcpm_boot_secondary(unsigned int cpu, struct task_struct *idle)
{
unsigned int mpidr, pcpu, pcluster, ret;
extern void secondary_startup(void);
mpidr = cpu_logical_map(cpu);
pcpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
pcluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
pr_debug("%s: logical CPU %d is physical CPU %d cluster %d\n",
__func__, cpu, pcpu, pcluster);
mcpm_set_entry_vector(pcpu, pcluster, NULL);
ret = mcpm_cpu_power_up(pcpu, pcluster);
if (ret)
return ret;
mcpm_set_entry_vector(pcpu, pcluster, secondary_startup);
arch_send_wakeup_ipi_mask(cpumask_of(cpu));
dsb_sev();
return 0;
}
static void __cpuinit mcpm_secondary_init(unsigned int cpu)
{
mcpm_cpu_powered_up();
gic_secondary_init(0);
}
#ifdef CONFIG_HOTPLUG_CPU
static int mcpm_cpu_disable(unsigned int cpu)
{
/*
* We assume all CPUs may be shut down.
* This would be the hook to use for eventual Secure
* OS migration requests as described in the PSCI spec.
*/
return 0;
}
static void mcpm_cpu_die(unsigned int cpu)
{
unsigned int mpidr, pcpu, pcluster;
mpidr = read_cpuid_mpidr();
pcpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
pcluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
mcpm_set_entry_vector(pcpu, pcluster, NULL);
mcpm_cpu_power_down();
}
#endif
static struct smp_operations __initdata mcpm_smp_ops = {
.smp_init_cpus = simple_smp_init_cpus,
.smp_boot_secondary = mcpm_boot_secondary,
.smp_secondary_init = mcpm_secondary_init,
#ifdef CONFIG_HOTPLUG_CPU
.cpu_disable = mcpm_cpu_disable,
.cpu_die = mcpm_cpu_die,
#endif
};
void __init mcpm_smp_set_ops(void)
{
smp_set_ops(&mcpm_smp_ops);
}
+108
View File
@@ -0,0 +1,108 @@
/*
* vlock.S - simple voting lock implementation for ARM
*
* Created by: Dave Martin, 2012-08-16
* Copyright: (C) 2012-2013 Linaro Limited
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
*
* This algorithm is described in more detail in
* Documentation/arm/vlocks.txt.
*/
#include <linux/linkage.h>
#include "vlock.h"
/* Select different code if voting flags can fit in a single word. */
#if VLOCK_VOTING_SIZE > 4
#define FEW(x...)
#define MANY(x...) x
#else
#define FEW(x...) x
#define MANY(x...)
#endif
@ voting lock for first-man coordination
.macro voting_begin rbase:req, rcpu:req, rscratch:req
mov \rscratch, #1
strb \rscratch, [\rbase, \rcpu]
dmb
.endm
.macro voting_end rbase:req, rcpu:req, rscratch:req
dmb
mov \rscratch, #0
strb \rscratch, [\rbase, \rcpu]
dsb
sev
.endm
/*
* The vlock structure must reside in Strongly-Ordered or Device memory.
* This implementation deliberately eliminates most of the barriers which
* would be required for other memory types, and assumes that independent
* writes to neighbouring locations within a cacheline do not interfere
* with one another.
*/
@ r0: lock structure base
@ r1: CPU ID (0-based index within cluster)
ENTRY(vlock_trylock)
add r1, r1, #VLOCK_VOTING_OFFSET
voting_begin r0, r1, r2
ldrb r2, [r0, #VLOCK_OWNER_OFFSET] @ check whether lock is held
cmp r2, #VLOCK_OWNER_NONE
bne trylock_fail @ fail if so
@ Control dependency implies strb not observable before previous ldrb.
strb r1, [r0, #VLOCK_OWNER_OFFSET] @ submit my vote
voting_end r0, r1, r2 @ implies DMB
@ Wait for the current round of voting to finish:
MANY( mov r3, #VLOCK_VOTING_OFFSET )
0:
MANY( ldr r2, [r0, r3] )
FEW( ldr r2, [r0, #VLOCK_VOTING_OFFSET] )
cmp r2, #0
wfene
bne 0b
MANY( add r3, r3, #4 )
MANY( cmp r3, #VLOCK_VOTING_OFFSET + VLOCK_VOTING_SIZE )
MANY( bne 0b )
@ Check who won:
dmb
ldrb r2, [r0, #VLOCK_OWNER_OFFSET]
eor r0, r1, r2 @ zero if I won, else nonzero
bx lr
trylock_fail:
voting_end r0, r1, r2
mov r0, #1 @ nonzero indicates that I lost
bx lr
ENDPROC(vlock_trylock)
@ r0: lock structure base
ENTRY(vlock_unlock)
dmb
mov r1, #VLOCK_OWNER_NONE
strb r1, [r0, #VLOCK_OWNER_OFFSET]
dsb
sev
bx lr
ENDPROC(vlock_unlock)
+29
View File
@@ -0,0 +1,29 @@
/*
* vlock.h - simple voting lock implementation
*
* Created by: Dave Martin, 2012-08-16
* Copyright: (C) 2012-2013 Linaro Limited
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*/
#ifndef __VLOCK_H
#define __VLOCK_H
#include <asm/mcpm.h>
/* Offsets and sizes are rounded to a word (4 bytes) */
#define VLOCK_OWNER_OFFSET 0
#define VLOCK_VOTING_OFFSET 4
#define VLOCK_VOTING_SIZE ((MAX_CPUS_PER_CLUSTER + 3) / 4 * 4)
#define VLOCK_SIZE (VLOCK_VOTING_OFFSET + VLOCK_VOTING_SIZE)
#define VLOCK_OWNER_NONE 0
#endif /* ! __VLOCK_H */
+24
View File
@@ -243,6 +243,29 @@ typedef struct {
#define ATOMIC64_INIT(i) { (i) }
#ifdef CONFIG_ARM_LPAE
static inline u64 atomic64_read(const atomic64_t *v)
{
u64 result;
__asm__ __volatile__("@ atomic64_read\n"
" ldrd %0, %H0, [%1]"
: "=&r" (result)
: "r" (&v->counter), "Qo" (v->counter)
);
return result;
}
static inline void atomic64_set(atomic64_t *v, u64 i)
{
__asm__ __volatile__("@ atomic64_set\n"
" strd %2, %H2, [%1]"
: "=Qo" (v->counter)
: "r" (&v->counter), "r" (i)
);
}
#else
static inline u64 atomic64_read(const atomic64_t *v)
{
u64 result;
@@ -269,6 +292,7 @@ static inline void atomic64_set(atomic64_t *v, u64 i)
: "r" (&v->counter), "r" (i)
: "cc");
}
#endif
static inline void atomic64_add(u64 i, atomic64_t *v)
{
+75
View File
@@ -363,4 +363,79 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
flush_cache_all();
}
/*
* Memory synchronization helpers for mixed cached vs non cached accesses.
*
* Some synchronization algorithms have to set states in memory with the
* cache enabled or disabled depending on the code path. It is crucial
* to always ensure proper cache maintenance to update main memory right
* away in that case.
*
* Any cached write must be followed by a cache clean operation.
* Any cached read must be preceded by a cache invalidate operation.
* Yet, in the read case, a cache flush i.e. atomic clean+invalidate
* operation is needed to avoid discarding possible concurrent writes to the
* accessed memory.
*
* Also, in order to prevent a cached writer from interfering with an
* adjacent non-cached writer, each state variable must be located to
* a separate cache line.
*/
/*
* This needs to be >= the max cache writeback size of all
* supported platforms included in the current kernel configuration.
* This is used to align state variables to their own cache lines.
*/
#define __CACHE_WRITEBACK_ORDER 6 /* guessed from existing platforms */
#define __CACHE_WRITEBACK_GRANULE (1 << __CACHE_WRITEBACK_ORDER)
/*
* There is no __cpuc_clean_dcache_area but we use it anyway for
* code intent clarity, and alias it to __cpuc_flush_dcache_area.
*/
#define __cpuc_clean_dcache_area __cpuc_flush_dcache_area
/*
* Ensure preceding writes to *p by this CPU are visible to
* subsequent reads by other CPUs:
*/
static inline void __sync_cache_range_w(volatile void *p, size_t size)
{
char *_p = (char *)p;
__cpuc_clean_dcache_area(_p, size);
outer_clean_range(__pa(_p), __pa(_p + size));
}
/*
* Ensure preceding writes to *p by other CPUs are visible to
* subsequent reads by this CPU. We must be careful not to
* discard data simultaneously written by another CPU, hence the
* usage of flush rather than invalidate operations.
*/
static inline void __sync_cache_range_r(volatile void *p, size_t size)
{
char *_p = (char *)p;
#ifdef CONFIG_OUTER_CACHE
if (outer_cache.flush_range) {
/*
* Ensure dirty data migrated from other CPUs into our cache
* are cleaned out safely before the outer cache is cleaned:
*/
__cpuc_clean_dcache_area(_p, size);
/* Clean and invalidate stale data for *p from outer ... */
outer_flush_range(__pa(_p), __pa(_p + size));
}
#endif
/* ... and inner cache: */
__cpuc_flush_dcache_area(_p, size);
}
#define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))
#define sync_cache_r(ptr) __sync_cache_range_r(ptr, sizeof *(ptr))
#endif
+15 -1
View File
@@ -42,6 +42,8 @@
#define vectors_high() (0)
#endif
#ifdef CONFIG_CPU_CP15
extern unsigned long cr_no_alignment; /* defined in entry-armv.S */
extern unsigned long cr_alignment; /* defined in entry-armv.S */
@@ -82,6 +84,18 @@ static inline void set_copro_access(unsigned int val)
isb();
}
#endif
#else /* ifdef CONFIG_CPU_CP15 */
/*
* cr_alignment and cr_no_alignment are tightly coupled to cp15 (at least in the
* minds of the developers). Yielding 0 for machines without a cp15 (and making
* it read-only) is fine for most cases and saves quite some #ifdeffery.
*/
#define cr_no_alignment UL(0)
#define cr_alignment UL(0)
#endif /* ifdef CONFIG_CPU_CP15 / else */
#endif /* ifndef __ASSEMBLY__ */
#endif
+49 -26
View File
@@ -38,32 +38,6 @@
#define MPIDR_AFFINITY_LEVEL(mpidr, level) \
((mpidr >> (MPIDR_LEVEL_BITS * level)) & MPIDR_LEVEL_MASK)
extern unsigned int processor_id;
#ifdef CONFIG_CPU_CP15
#define read_cpuid(reg) \
({ \
unsigned int __val; \
asm("mrc p15, 0, %0, c0, c0, " __stringify(reg) \
: "=r" (__val) \
: \
: "cc"); \
__val; \
})
#define read_cpuid_ext(ext_reg) \
({ \
unsigned int __val; \
asm("mrc p15, 0, %0, c0, " ext_reg \
: "=r" (__val) \
: \
: "cc"); \
__val; \
})
#else
#define read_cpuid(reg) (processor_id)
#define read_cpuid_ext(reg) 0
#endif
#define ARM_CPU_IMP_ARM 0x41
#define ARM_CPU_IMP_INTEL 0x69
@@ -82,6 +56,46 @@ extern unsigned int processor_id;
#define ARM_CPU_XSCALE_ARCH_V2 0x4000
#define ARM_CPU_XSCALE_ARCH_V3 0x6000
extern unsigned int processor_id;
#ifdef CONFIG_CPU_CP15
#define read_cpuid(reg) \
({ \
unsigned int __val; \
asm("mrc p15, 0, %0, c0, c0, " __stringify(reg) \
: "=r" (__val) \
: \
: "cc"); \
__val; \
})
#define read_cpuid_ext(ext_reg) \
({ \
unsigned int __val; \
asm("mrc p15, 0, %0, c0, " ext_reg \
: "=r" (__val) \
: \
: "cc"); \
__val; \
})
#else /* ifdef CONFIG_CPU_CP15 */
/*
* read_cpuid and read_cpuid_ext should only ever be called on machines that
* have cp15 so warn on other usages.
*/
#define read_cpuid(reg) \
({ \
WARN_ON_ONCE(1); \
0; \
})
#define read_cpuid_ext(reg) read_cpuid(reg)
#endif /* ifdef CONFIG_CPU_CP15 / else */
#ifdef CONFIG_CPU_CP15
/*
* The CPU ID never changes at run time, so we might as well tell the
* compiler that it's constant. Use this function to read the CPU ID
@@ -92,6 +106,15 @@ static inline unsigned int __attribute_const__ read_cpuid_id(void)
return read_cpuid(CPUID_ID);
}
#else /* ifdef CONFIG_CPU_CP15 */
static inline unsigned int __attribute_const__ read_cpuid_id(void)
{
return processor_id;
}
#endif /* ifdef CONFIG_CPU_CP15 / else */
static inline unsigned int __attribute_const__ read_cpuid_implementor(void)
{
return (read_cpuid_id() & 0xFF000000) >> 24;
+18 -18
View File
@@ -18,12 +18,12 @@
* ================
*
* We have the following to choose from:
* arm6 - ARM6 style
* arm7 - ARM7 style
* v4_early - ARMv4 without Thumb early abort handler
* v4t_late - ARMv4 with Thumb late abort handler
* v4t_early - ARMv4 with Thumb early abort handler
* v5tej_early - ARMv5 with Thumb and Java early abort handler
* v5t_early - ARMv5 with Thumb early abort handler
* v5tj_early - ARMv5 with Thumb and Java early abort handler
* xscale - ARMv5 with Thumb with Xscale extensions
* v6_early - ARMv6 generic early abort handler
* v7_early - ARMv7 generic early abort handler
@@ -39,14 +39,6 @@
# endif
#endif
#ifdef CONFIG_CPU_ABRT_LV4T
# ifdef CPU_DABORT_HANDLER
# define MULTI_DABORT 1
# else
# define CPU_DABORT_HANDLER v4t_late_abort
# endif
#endif
#ifdef CONFIG_CPU_ABRT_EV4
# ifdef CPU_DABORT_HANDLER
# define MULTI_DABORT 1
@@ -55,6 +47,14 @@
# endif
#endif
#ifdef CONFIG_CPU_ABRT_LV4T
# ifdef CPU_DABORT_HANDLER
# define MULTI_DABORT 1
# else
# define CPU_DABORT_HANDLER v4t_late_abort
# endif
#endif
#ifdef CONFIG_CPU_ABRT_EV4T
# ifdef CPU_DABORT_HANDLER
# define MULTI_DABORT 1
@@ -63,14 +63,6 @@
# endif
#endif
#ifdef CONFIG_CPU_ABRT_EV5TJ
# ifdef CPU_DABORT_HANDLER
# define MULTI_DABORT 1
# else
# define CPU_DABORT_HANDLER v5tj_early_abort
# endif
#endif
#ifdef CONFIG_CPU_ABRT_EV5T
# ifdef CPU_DABORT_HANDLER
# define MULTI_DABORT 1
@@ -79,6 +71,14 @@
# endif
#endif
#ifdef CONFIG_CPU_ABRT_EV5TJ
# ifdef CPU_DABORT_HANDLER
# define MULTI_DABORT 1
# else
# define CPU_DABORT_HANDLER v5tj_early_abort
# endif
#endif
#ifdef CONFIG_CPU_ABRT_EV6
# ifdef CPU_DABORT_HANDLER
# define MULTI_DABORT 1
+4
View File
@@ -211,4 +211,8 @@
#define HSR_HVC_IMM_MASK ((1UL << 16) - 1)
#define HSR_DABT_S1PTW (1U << 7)
#define HSR_DABT_CM (1U << 8)
#define HSR_DABT_EA (1U << 9)
#endif /* __ARM_KVM_ARM_H__ */
+1 -1
View File
@@ -75,7 +75,7 @@ extern char __kvm_hyp_code_end[];
extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
extern void __kvm_flush_vm_context(void);
extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
#endif

Some files were not shown because too many files have changed in this diff Show More