Commit Graph

238 Commits

Author SHA1 Message Date
Olof Johansson
e306dfd06f ARM64: unwind: Fix PC calculation
The frame PC value in the unwind code used to just take the saved LR
value and use that.  That's incorrect as a stack trace, since it shows
the return path stack, not the call path stack.

In particular, it shows faulty information in case the bl is done as
the very last instruction of one label, since the return point will be
in the next label. That can easily be seen with tail calls to panic(),
which is marked __noreturn and thus doesn't have anything useful after it.

Easiest here is to just correct the unwind code and do a -4, to get the
actual call site for the backtrace instead of the return site.

Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: stable@vger.kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-17 09:16:33 +00:00
Will Deacon
8e86f0b409 arm64: atomics: fix use of acquire + release for full barrier semantics
Linux requires a number of atomic operations to provide full barrier
semantics, that is no memory accesses after the operation can be
observed before any accesses up to and including the operation in
program order.

On arm64, these operations have been incorrectly implemented as follows:

	// A, B, C are independent memory locations

	<Access [A]>

	// atomic_op (B)
1:	ldaxr	x0, [B]		// Exclusive load with acquire
	<op(B)>
	stlxr	w1, x0, [B]	// Exclusive store with release
	cbnz	w1, 1b

	<Access [C]>

The assumption here being that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).

Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.

The simple way to fix this is to implement the same algorithm as ARMv7
using explicit barriers:

	<Access [A]>

	// atomic_op (B)
	dmb	ish		// Full barrier
1:	ldxr	x0, [B]		// Exclusive load
	<op(B)>
	stxr	w1, x0, [B]	// Exclusive store
	cbnz	w1, 1b
	dmb	ish		// Full barrier

	<Access [C]>

but this has the undesirable effect of introducing *two* full barrier
instructions. A better approach is actually the following, non-intuitive
sequence:

	<Access [A]>

	// atomic_op (B)
1:	ldxr	x0, [B]		// Exclusive load
	<op(B)>
	stlxr	w1, x0, [B]	// Exclusive store with release
	cbnz	w1, 1b
	dmb	ish		// Full barrier

	<Access [C]>

The simple observations here are:

  - The dmb ensures that no subsequent accesses (e.g. the access to C)
    can enter or pass the atomic sequence.

  - The dmb also ensures that no prior accesses (e.g. the access to A)
    can pass the atomic sequence.

  - Therefore, no prior access can pass a subsequent access, or
    vice-versa (i.e. A is strictly ordered before C).

  - The stlxr ensures that no prior access can pass the store component
    of the atomic operation.

The only tricky part remaining is the ordering between the ldxr and the
access to A, since the absence of the first dmb means that we're now
permitting re-ordering between the ldxr and any prior accesses.

From an (arbitrary) observer's point of view, there are two scenarios:

  1. We have observed the ldxr. This means that if we perform a store to
     [B], the ldxr will still return older data. If we can observe the
     ldxr, then we can potentially observe the permitted re-ordering
     with the access to A, which is clearly an issue when compared to
     the dmb variant of the code. Thankfully, the exclusive monitor will
     save us here since it will be cleared as a result of the store and
     the ldxr will retry. Notice that any use of a later memory
     observation to imply observation of the ldxr will also imply
     observation of the access to A, since the stlxr/dmb ensure strict
     ordering.

  2. We have not observed the ldxr. This means we can perform a store
     and influence the later ldxr. However, that doesn't actually tell
     us anything about the access to [A], so we've not lost anything
     here either when compared to the dmb variant.

This patch implements this solution for our barriered atomic operations,
ensuring that we satisfy the full barrier requirements where they are
needed.

Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-07 16:45:43 +00:00
Nathan Lynch
d4022a3352 arm64: vdso: update wtm fields for CLOCK_MONOTONIC_COARSE
Update wall-to-monotonic fields in the VDSO data page
unconditionally.  These are used to service CLOCK_MONOTONIC_COARSE,
which is not guarded by use_syscall.

Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-05 11:55:49 +00:00
Nathan Lynch
069b918623 arm64: vdso: fix coarse clock handling
When __kernel_clock_gettime is called with a CLOCK_MONOTONIC_COARSE or
CLOCK_REALTIME_COARSE clock id, it returns incorrectly to whatever the
caller has placed in x2 ("ret x2" to return from the fast path).  Fix
this by saving x30/LR to x2 only in code that will call
__do_get_tspec, restoring x30 afterward, and using a plain "ret" to
return from the routine.

Also: while the resulting tv_nsec value for CLOCK_REALTIME and
CLOCK_MONOTONIC must be computed using intermediate values that are
left-shifted by cs_shift (x12, set by __do_get_tspec), the results for
coarse clocks should be calculated using unshifted values
(xtime_coarse_nsec is in units of actual nanoseconds).  The current
code shifts intermediate values by x12 unconditionally, but x12 is
uninitialized when servicing a coarse clock.  Fix this by setting x12
to 0 once we know we are dealing with a coarse clock id.

Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-05 11:55:30 +00:00
Will Deacon
4050740348 arm64: vdso: prevent ld from aligning PT_LOAD segments to 64k
Whilst the text segment for our VDSO is marked as PT_LOAD in the ELF
headers, it is mapped by the kernel and not actually subject to
demand-paging. ld doesn't realise this, and emits a p_align field of 64k
(the maximum supported page size), which conflicts with the load address
picked by the kernel on 4k systems, which will be 4k aligned. This
causes GDB to fail with "Failed to read a valid object file image from
memory" when attempting to load the VDSO.

This patch passes the -n option to ld, which prevents it from aligning
PT_LOAD segments to the maximum page size.

Cc: <stable@vger.kernel.org>
Reported-by: Kyle McMartin <kyle@redhat.com>
Acked-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-04 17:52:47 +00:00
Linus Torvalds
deb2a1d29b Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pyll ARM64 patches from Catalin Marinas:
 - Build fix with DMA_CMA enabled
 - Introduction of PTE_WRITE to distinguish between writable but clean
   and truly read-only pages
 - FIQs enabling/disabling clean-up (they aren't used on arm64)
 - CPU resume fix for the per-cpu offset restoring
 - Code comment typos

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: Introduce PTE_WRITE
  arm64: mm: Remove PTE_BIT_FUNC macro
  arm64: FIQs are unused
  arm64: mm: fix the function name in comment of cpu_do_switch_mm
  arm64: fix build error if DMA_CMA is enabled
  arm64: kernel: fix per-cpu offset restore on resume
  arm64: mm: fix the function name in comment of __flush_dcache_area
  arm64: mm: use ubfm for dcache_line_size
2014-01-31 14:25:52 -08:00
Nicolas Pitre
f864b61ee4 arm64: FIQs are unused
So any FIQ handling is superfluous at the moment.  The functions to
disable/enable FIQs is kept around if ever someone needs them in the
future, but existing calling sites including arch_cpu_idle_prepare()
may go for now.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-01-30 13:51:43 +00:00
Lorenzo Pieralisi
fb4a96029c arm64: kernel: fix per-cpu offset restore on resume
The introduction of percpu offset optimisation through tpidr_el1 in:

Commit id :7158627686f02319c50c8d9d78f75d4c8
"arm64: percpu: implement optimised pcpu access using tpidr_el1"

requires cpu_{suspend/resume} to restore the tpidr_el1 register upon resume
so that percpu variables can be addressed correctly when a CPU comes out
of reset from warm-boot.

This patch fixes cpu_{suspend}/{resume} tpidr_el1 restoration on resume, by
calling the set_my_cpu_offset C API, as it is done on primary and secondary
CPUs on cold boot, so that, even if the register used to store the percpu
offset is changed, the save and restore of general purpose registers does not
have to be updated.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-01-24 14:27:40 +00:00
Linus Torvalds
82b51734b4 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull ARM64 updates from Catalin Marinas:
 - CPU suspend support on top of PSCI (firmware Power State Coordination
   Interface)
 - jump label support
 - CMA can now be enabled on arm64
 - HWCAP bits for crypto and CRC32 extensions
 - optimised percpu using tpidr_el1 register
 - code cleanup

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits)
  arm64: fix typo in entry.S
  arm64: kernel: restore HW breakpoint registers in cpu_suspend
  jump_label: use defined macros instead of hard-coding for better readability
  arm64, jump label: optimize jump label implementation
  arm64, jump label: detect %c support for ARM64
  arm64: introduce aarch64_insn_gen_{nop|branch_imm}() helper functions
  arm64: move encode_insn_immediate() from module.c to insn.c
  arm64: introduce interfaces to hotpatch kernel and module code
  arm64: introduce basic aarch64 instruction decoding helpers
  arm64: dts: Reduce size of virtio block device for foundation model
  arm64: Remove unused __data_loc variable
  arm64: Enable CMA
  arm64: Warn on NULL device structure for dma APIs
  arm64: Add hwcaps for crypto and CRC32 extensions.
  arm64: drop redundant macros from read_cpuid()
  arm64: Remove outdated comment
  arm64: cmpxchg: update macros to prevent warnings
  arm64: support single-step and breakpoint handler hooks
  ARM64: fix framepointer check in unwind_frame
  ARM64: check stack pointer in get_wchan
  ...
2014-01-20 15:40:44 -08:00
Neil Zhang
883c057367 arm64: fix typo in entry.S
Commit 64681787 (arm64: let the core code deal with preempt_count)
changed the code, but left the comments unchanged, fix it.

Signed-off-by: Neil Zhang <zhangwm@marvell.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-01-13 13:55:13 +00:00
Lorenzo Pieralisi
65c021bb49 arm64: kernel: restore HW breakpoint registers in cpu_suspend
When a CPU resumes from low-power, it restores HW breakpoint and
watchpoint slots through a CPU PM notifier. Since we want to enable
debugging as early as possible in the resume path, the mdscr content
is restored along the general purpose registers in the cpu_suspend API
and debug exceptions are reenabled when cpu_suspend returns. Since the
CPU PM notifier is run after a CPU has been resumed, we cannot expect
HW breakpoint registers to contain sane values till the notifier is run,
since the HW breakpoints registers content is unknown at reset; this means
that the CPU might run with debug exceptions enabled, mdscr restored but HW
breakpoint registers containing junk values that can trigger spurious
debug exceptions.

This patch fixes current HW breakpoints restore by moving the HW breakpoints
registers restoration to the cpu_suspend API, before the debug exceptions are
enabled. This way, as soon as the cpu_suspend function returns the
kernel can resume debugging with sane values in HW breakpoint registers.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-01-10 17:51:35 +00:00
Jiang Liu
9732cafd9d arm64, jump label: optimize jump label implementation
Optimize jump label implementation for ARM64 by dynamically patching
kernel text.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-01-08 15:23:53 +00:00
Jiang Liu
5c5bf25d4f arm64: introduce aarch64_insn_gen_{nop|branch_imm}() helper functions
Introduce aarch64_insn_gen_{nop|branch_imm}() helper functions, which
will be used to implement jump label on ARM64.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-01-08 15:21:29 +00:00
Jiang Liu
c84fced8d9 arm64: move encode_insn_immediate() from module.c to insn.c
Function encode_insn_immediate() will be used by other instruction
manipulate related functions, so move it into insn.c and rename it
as aarch64_insn_encode_immediate().

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-01-08 15:21:29 +00:00
Jiang Liu
ae16480785 arm64: introduce interfaces to hotpatch kernel and module code
Introduce three interfaces to patch kernel and module code:
aarch64_insn_patch_text_nosync():
	patch code without synchronization, it's caller's responsibility
	to synchronize all CPUs if needed.
aarch64_insn_patch_text_sync():
	patch code and always synchronize with stop_machine()
aarch64_insn_patch_text():
	patch code and synchronize with stop_machine() if needed

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-01-08 15:21:29 +00:00
Jiang Liu
b11a64a48c arm64: introduce basic aarch64 instruction decoding helpers
Introduce basic aarch64 instruction decoding helper
aarch64_get_insn_class() and aarch64_insn_hotpatch_safe().

Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jiang Liu <liuj97@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-01-08 15:21:28 +00:00
Geoff Levand
b22cf637bb arm64: Remove unused __data_loc variable
The __data_loc variable is an unused left over from the 32 bit arm implementation.
Remove that variable and adjust the __mmap_switched startup routine accordingly.

Signed-off-by: Geoff Levand <geoff@infradead.org> for Huawei, Linaro
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-12-20 12:04:48 +00:00
Catalin Marinas
0a5be743e8 Merge tag 'arm64-suspend' of git://linux-arm.org/linux-2.6-lp into upstream
* tag 'arm64-suspend' of git://linux-arm.org/linux-2.6-lp:
  arm64: add CPU power management menu/entries
  arm64: kernel: add PM build infrastructure
  arm64: kernel: add CPU idle call
  arm64: enable generic clockevent broadcast
  arm64: kernel: implement HW breakpoints CPU PM notifier
  arm64: kernel: refactor code to install/uninstall breakpoints
  arm: kvm: implement CPU PM notifier
  arm64: kernel: implement fpsimd CPU PM notifier
  arm64: kernel: cpu_{suspend/resume} implementation
  arm64: kernel: suspend/resume registers save/restore
  arm64: kernel: build MPIDR_EL1 hash function data structure
  arm64: kernel: add MPIDR_EL1 accessors macros

Conflicts:
	arch/arm64/Kconfig
2013-12-19 17:57:51 +00:00
Steve Capper
4bff28ccda arm64: Add hwcaps for crypto and CRC32 extensions.
Advertise the optional cryptographic and CRC32 instructions to
user space where present. Several hwcap bits [3-7] are allocated.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
[bit 2 is taken now so use bits 3-7 instead]
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-12-19 17:44:08 +00:00
Liviu Dudau
81cac69944 arm64: Remove outdated comment
Code referenced in the comment has moved to arch/arm64/kernel/cputable.c

Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-12-19 17:44:06 +00:00
Sandeepa Prabhu
ee6214cec7 arm64: support single-step and breakpoint handler hooks
AArch64 Single Steping and Breakpoint debug exceptions will be
used by multiple debug framworks like kprobes & kgdb.

This patch implements the hooks for those frameworks to register
their own handlers for handling breakpoint and single step events.

Reworked the debug exception handler in entry.S: do_dbg to route
software breakpoint (BRK64) exception to do_debug_exception()

Signed-off-by: Sandeepa Prabhu <sandeepa.prabhu@linaro.org>
Signed-off-by: Deepak Saxena <dsaxena@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-12-19 17:43:11 +00:00
Konstantin Khlebnikov
26920dd2da ARM64: fix framepointer check in unwind_frame
We need at least 24 bytes above frame pointer.

Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-12-19 17:43:10 +00:00
Konstantin Khlebnikov
408c3658b0 ARM64: check stack pointer in get_wchan
get_wchan() is lockless. Task may wakeup at any time and change its own stack,
thus each next stack frame may be overwritten and filled with random stuff.

Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-12-19 17:43:09 +00:00
Will Deacon
12a0ef7b0a arm64: use generic strnlen_user and strncpy_from_user functions
This patch implements the word-at-a-time interface for arm64 using the
same algorithm as ARM. We use the fls64 macro, which expands to a clz
instruction via a compiler builtin. Big-endian configurations make use
of the implementation from asm-generic.

With this implemented, we can replace our byte-at-a-time strnlen_user
and strncpy_from_user functions with the optimised generic versions.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-12-19 17:43:06 +00:00
Will Deacon
7158627686 arm64: percpu: implement optimised pcpu access using tpidr_el1
This patch implements optimised percpu variable accesses using the
el1 r/w thread register (tpidr_el1) along the same lines as arch/arm/.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-12-19 17:43:06 +00:00