Commit Graph

292 Commits

Author SHA1 Message Date
zhichang.yuan d875c9b372 arm64: lib: Implement optimized memcmp routine
This patch, based on Linaro's Cortex Strings library, adds
an assembly optimized memcmp() function.

Signed-off-by: Zhichang Yuan <zhichang.yuan@linaro.org>
Signed-off-by: Deepak Saxena <dsaxena@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-23 15:07:57 +01:00
Arun KS b9acc49ee9 arm64: Fix deadlock scenario with smp_send_stop()
If one process calls sys_reboot and that process then stops other
CPUs while those CPUs are within a spin_lock() region we can
potentially encounter a deadlock scenario like below.

CPU 0                   CPU 1
-----                   -----
                        spin_lock(my_lock)
smp_send_stop()
 <send IPI>             handle_IPI()
                         disable_preemption/irqs
                          while(1);
 <PREEMPT>
spin_lock(my_lock) <--- Waits forever

We shouldn't attempt to run any other tasks after we send a stop
IPI to a CPU so disable preemption so that this task runs to
completion. We use local_irq_disable() here for cross-arch
consistency with x86.

Based-on-work-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Arun KS <getarunks@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-16 18:03:33 +01:00
Arun KS 90f51a09ef arm64: Fix machine_shutdown() definition
This patch ports most of commit 19ab428f4b "ARM: 7759/1: decouple CPU
offlining from reboot/shutdown" by Stephen Warren from arch/arm to
arch/arm64.

machine_shutdown() is a hook for kexec. Add a comment saying so, since
it isn't obvious from the function name.

Halt, power-off, and restart have different requirements re: stopping
secondary CPUs than kexec has. The former simply require the secondary
CPUs to be quiesced somehow, whereas kexec requires them to be
completely non-operational, so that no matter where the kexec target
images are written in RAM, they won't influence operation of the
secondary CPUS,which could happen if the CPUs were still executing some
kind of pin loop. To this end, modify machine_halt, power_off, and
restart to call smp_send_stop() directly, rather than calling
machine_shutdown().

In machine_shutdown(), replace the call to smp_send_stop() with a call
to disable_nonboot_cpus(). This completely disables all but one CPU,
thus satisfying the kexec requirements a couple paragraphs above.

Signed-off-by: Arun KS <getarunks@gmail.com>
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-16 18:03:18 +01:00
Larry Bassel eb631bb5bf arm64: Support arch_irq_work_raise() via self IPIs
Support for arch_irq_work_raise() was missing from
arm64 (a prerequisite for FULL_NOHZ).

This patch is based on the arm32 patch ARM 7872/1.

commit bf18525fd7
Author: Stephen Boyd <sboyd@codeaurora.org>
Date:   Tue Oct 29 20:32:56 2013 +0100

    ARM: 7872/1: Support arch_irq_work_raise() via self IPIs

    By default, IRQ work is run from the tick interrupt (see
    irq_work_run() in update_process_times()). When we're in full
    NOHZ mode, restarting the tick requires the use of IRQ work and
    if the only place we run IRQ work is in the tick interrupt we
    have an unbreakable cycle. Implement arch_irq_work_raise() via
    self IPIs to break this cycle and get the tick started again.
    Note that we implement this via IPIs which are only available on
    SMP builds. This shouldn't be a problem because full NOHZ is only
    supported on SMP builds anyway.

    Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
    Reviewed-by: Kevin Hilman <khilman@linaro.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Signed-off-by: Larry Bassel <larry.bassel@linaro.org>
Reviewed-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-16 17:42:21 +01:00
Mark Brown ebdc94470d arm64: topology: Add support for topology DT bindings
Add support for parsing the explicit topology bindings to discover the
topology of the system.

Since it is not currently clear how to map multi-level clusters for the
scheduler all leaf clusters are presented to the scheduler at the same
level. This should be enough to provide good support for current systems.

Signed-off-by: Mark Brown <broonie@linaro.org>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-16 17:12:41 +01:00
Mark Brown c31bf0488d arm64: topology: Initialise default topology state immediately
As a legacy of the way 32 bit ARM did things the topology code uses a null
topology map by default and then overwrites it by mapping cores with no
information to a cluster by themselves later. In order to make it simpler
to reset things as part of recovering from parse failures in firmware
information directly set this configuration on init. A core will always be
its own sibling so there should be no risk of confusion with firmware
provided information.

Signed-off-by: Mark Brown <broonie@linaro.org>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-16 17:12:16 +01:00
Catalin Marinas cf5c95db57 Merge tag 'for-3.16' of git://git.linaro.org/people/ard.biesheuvel/linux-arm into upstream
FPSIMD register bank context switching and crypto algorithms
optimisations for arm64 from Ard Biesheuvel.

* tag 'for-3.16' of git://git.linaro.org/people/ard.biesheuvel/linux-arm:
  arm64/crypto: AES-ECB/CBC/CTR/XTS using ARMv8 NEON and Crypto Extensions
  arm64: pull in <asm/simd.h> from asm-generic
  arm64/crypto: AES in CCM mode using ARMv8 Crypto Extensions
  arm64/crypto: AES using ARMv8 Crypto Extensions
  arm64/crypto: GHASH secure hash using ARMv8 Crypto Extensions
  arm64/crypto: SHA-224/SHA-256 using ARMv8 Crypto Extensions
  arm64/crypto: SHA-1 using ARMv8 Crypto Extensions
  arm64: add support for kernel mode NEON in interrupt context
  arm64: defer reloading a task's FPSIMD state to userland resume
  arm64: add abstractions for FPSIMD state manipulation
  asm-generic: allow generic unaligned access if the arch supports it

Conflicts:
	arch/arm64/include/asm/thread_info.h
2014-05-16 10:05:11 +01:00
AKASHI Takahiro fd92d4a54a arm64: is_compat_task is defined both in asm/compat.h and linux/compat.h
Some kernel files may include both linux/compat.h and asm/compat.h directly
or indirectly. Since both header files contain is_compat_task() under
!CONFIG_COMPAT, compiling them with !CONFIG_COMPAT will eventually fail.
Such files include kernel/auditsc.c, kernel/seccomp.c and init/do_mountfs.c
(do_mountfs.c may read asm/compat.h via asm/ftrace.h once ftrace is
implemented).

So this patch proactively
1) removes is_compat_task() under !CONFIG_COMPAT from asm/compat.h
2) replaces asm/compat.h to linux/compat.h in kernel/*.c,
   but asm/compat.h is still necessary in ptrace.c and process.c because
   they use is_compat_thread().

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-12 16:43:29 +01:00
AKASHI Takahiro 3157858fef arm64: split syscall_trace() into separate functions for enter/exit
As done in arm, this change makes it easy to confirm we invoke syscall
related hooks, including syscall tracepoint, audit and seccomp which would
be implemented later, in correct order. That is, undoing operations in the
opposite order on exit that they were done on entry.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-12 16:43:29 +01:00
AKASHI Takahiro 449f81a4da arm64: make a single hook to syscall_trace() for all syscall features
Currently syscall_trace() is called only for ptrace.
With additional TIF_xx flags defined, it is now called in all the cases
of audit, ftrace and seccomp in addition to ptrace.

Acked-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-12 16:43:28 +01:00
Will Deacon 2a2830703a arm64: debug: avoid accessing mdscr_el1 on fault paths where possible
Since mdscr_el1 is part of the debug register group, it is highly likely
to be trapped by a hypervisor to prevent virtual machines from debugging
(buggering?) each other. Unfortunately, this absolutely destroys our
performance, since we access the register on many of our low-level
fault handling paths to keep track of the various debug state machines.

This patch removes our dependency on mdscr_el1 in the case that debugging
is not being used. More specifically we:

  - Use TIF_SINGLESTEP to indicate that a task is stepping at EL0 and
    avoid disabling step in the MDSCR when we don't need to.
    MDSCR_EL1.SS handling is moved to kernel_entry, when trapping from
    userspace.

  - Ensure debug exceptions are re-enabled on *all* exception entry
    paths, even the debug exception handling path (where we re-enable
    exceptions after invoking the handler). Since we can now rely on
    MDSCR_EL1.SS being cleared by the entry code, exception handlers can
    usually enable debug immediately before enabling interrupts.

  - Remove all debug exception unmasking from ret_to_user and
    el1_preempt, since we will never get here with debug exceptions
    masked.

This results in a slight change to kernel debug behaviour, where we now
step into interrupt handlers and data aborts from EL1 when debugging the
kernel, which is actually a useful thing to do. A side-effect of this is
that it *does* potentially prevent stepping off {break,watch}points when
there is a high-frequency interrupt source (e.g. a timer), so a debugger
would need to use either breakpoints or manually disable interrupts to
get around this issue.

With this patch applied, guest performance is restored under KVM when
debug register accesses are trapped (and we get a measurable performance
increase on the host on Cortex-A57 too).

Cc: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-12 16:43:28 +01:00
Will Deacon d0488597a1 arm64: head: fix cache flushing and barriers in set_cpu_boot_mode_flag
set_cpu_boot_mode_flag is used to identify which exception levels are
encountered across the system by CPUs trying to enter the kernel. The
basic algorithm is: if a CPU is booting at EL2, it will set a flag at
an offset of #4 from __boot_cpu_mode, a cacheline-aligned variable.
Otherwise, a flag is set at an offset of zero into the same cacheline.
This enables us to check that all CPUs booted at the same exception
level.

This cacheline is written with the stage-1 MMU off (that is, via a
strongly-ordered mapping) and will bypass any clean lines in the cache,
leading to potential coherence problems when the variable is later
checked via the normal, cacheable mapping of the kernel image.

This patch reworks the broken flushing code so that we:

  (1) Use a DMB to order the strongly-ordered write of the cacheline
      against the subsequent cache-maintenance operation (by-VA
      operations only hazard against normal, cacheable accesses).

  (2) Use a single dc ivac instruction to invalidate any clean lines
      containing a stale copy of the line after it has been updated.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-09 17:04:12 +01:00
Will Deacon 98f7685ee6 arm64: barriers: make use of barrier options with explicit barriers
When calling our low-level barrier macros directly, we can often suffice
with more relaxed behaviour than the default "all accesses, full system"
option.

This patch updates the users of dsb() to specify the option which they
actually require.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-09 17:03:15 +01:00
Catalin Marinas a501e32430 arm64: Clean up the default pgprot setting
The primary aim of this patchset is to remove the pgprot_default and
prot_sect_default global variables and rely strictly on predefined
values. The original goal was to be able to run SMP kernels on UP
hardware by not setting the Shareability bit. However, it is unlikely to
see UP ARMv8 hardware and even if we do, the Shareability bit is no
longer assumed to disable cacheable accesses.

A side effect is that the device mappings now have the Shareability
attribute set. The hardware, however, should ignore it since Device
accesses are always Outer Shareable.

Following the removal of the two global variables, there is some PROT_*
macro reshuffling and cleanup, including the __PAGE_* macros (replaced
by PAGE_*).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
2014-05-09 15:53:37 +01:00
Catalin Marinas 15af1942dd arm64: Expose ESR_EL1 information to user when SIGSEGV/SIGBUS
This information is useful for instruction emulators to detect
read/write and access size without having to decode the faulting
instruction. The current patch exports it via sigcontext (struct
esr_context) and is only valid for SIGSEGV and SIGBUS.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-09 15:47:49 +01:00
Catalin Marinas 0e0276d1e1 arm64: Remove the aux_context structure
This patch removes the aux_context structure (and the containing file)
to allow the placement of the _aarch64_ctx end magic based on the
context stored on the signal stack.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-09 15:47:48 +01:00
Catalin Marinas 9141300a58 arm64: Provide read/write fault information in compat signal handlers
For AArch32, bit 11 (WnR) of the FSR/ESR register is set when the fault
was caused by a write access and applications like Qemu rely on such
information being provided in sigcontext. This patch introduces the
ESR_EL1 tracking for the arm64 kernel faults and sets bit 11 accordingly
in compat sigcontext.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-09 15:47:47 +01:00
Catalin Marinas 6400111399 arm64: Remove boot thread synchronisation for spin-table release method
The synchronisation with the boot thread already happens in __cpu_up()
via wait_for_completion_timeout(). In addition, __cpu_up() calls are
protected by the cpu_add_remove_lock mutex and already serialised.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-09 15:47:46 +01:00
Catalin Marinas a41dc0e841 arm64: Implement cache_line_size() based on CTR_EL0.CWG
The hardware provides the maximum cache line size in the system via the
CTR_EL0.CWG bits. This patch implements the cache_line_size() function
to read such information, together with a sanity check if the statically
defined L1_CACHE_BYTES is smaller than the hardware value.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
2014-05-09 15:47:45 +01:00
Ard Biesheuvel 190f1ca85d arm64: add support for kernel mode NEON in interrupt context
This patch modifies kernel_neon_begin() and kernel_neon_end(), so
they may be called from any context. To address the case where only
a couple of registers are needed, kernel_neon_begin_partial(u32) is
introduced which takes as a parameter the number of bottom 'n' NEON
q-registers required. To mark the end of such a partial section, the
regular kernel_neon_end() should be used.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2014-05-08 11:31:57 +02:00
Ard Biesheuvel 005f78cd88 arm64: defer reloading a task's FPSIMD state to userland resume
If a task gets scheduled out and back in again and nothing has touched
its FPSIMD state in the mean time, there is really no reason to reload
it from memory. Similarly, repeated calls to kernel_neon_begin() and
kernel_neon_end() will preserve and restore the FPSIMD state every time.

This patch defers the FPSIMD state restore to the last possible moment,
i.e., right before the task returns to userland. If a task does not return to
userland at all (for any reason), the existing FPSIMD state is preserved
and may be reused by the owning task if it gets scheduled in again on the
same CPU.

This patch adds two more functions to abstract away from straight FPSIMD
register file saves and restores:
- fpsimd_restore_current_state -> ensure current's FPSIMD state is loaded
- fpsimd_flush_task_state -> invalidate live copies of a task's FPSIMD state

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2014-05-08 11:31:57 +02:00
Ard Biesheuvel c51f92693c arm64: add abstractions for FPSIMD state manipulation
There are two tacit assumptions in the FPSIMD handling code that will no longer
hold after the next patch that optimizes away some FPSIMD state restores:
. the FPSIMD registers of this CPU contain the userland FPSIMD state of
  task 'current';
. when switching to a task, its FPSIMD state will always be restored from
  memory.

This patch adds the following functions to abstract away from straight FPSIMD
register file saves and restores:
- fpsimd_preserve_current_state -> ensure current's FPSIMD state is saved
- fpsimd_update_current_state -> replace current's FPSIMD state

Where necessary, the signal handling and fork code are updated to use the above
wrappers instead of poking into the FPSIMD registers directly.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2014-05-08 11:31:41 +02:00
Catalin Marinas 6ecba8eb51 arm64: Use bus notifiers to set per-device coherent DMA ops
Recently, the default DMA ops have been changed to non-coherent for
alignment with 32-bit ARM platforms (and DT files). This patch adds bus
notifiers to be able to set the coherent DMA ops (with no cache
maintenance) for devices explicitly marked as coherent via the
"dma-coherent" DT property.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-03 22:20:34 +01:00
Marc Zyngier f774b7d10e arm64: fixmap: fix missing sub-page offset for earlyprintk
Commit d57c33c5da (add generic fixmap.h) added (among other
similar things) set_fixmap_io to deal with early ioremap of devices.

More recently, commit bf4b558eba (arm64: add early_ioremap support)
converted the arm64 earlyprintk to use set_fixmap_io. A side effect of
this conversion is that my virtual machines have stopped booting when
I pass "earlyprintk=uart8250-8bit,0x3f8" to the guest kernel.

Turns out that the new earlyprintk code doesn't care at all about
sub-page offsets, and just assumes that the earlyprintk device will
be page-aligned. Obviously, that doesn't play well with the above example.

Further investigation shows that set_fixmap_io uses __set_fixmap instead
of __set_fixmap_offset. A fix is to introduce a set_fixmap_offset_io that
uses the latter, and to remove the superflous call to fix_to_virt
(which only returns the value that set_fixmap_io has already given us).

With this applied, my VMs are back in business. Tested on a Cortex-A57
platform with kvmtool as platform emulation.

Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Salter <msalter@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-03 22:20:31 +01:00
Chanho Min bc3ee18a7a arm64: init: Move of_clk_init to time_init
Clock providers should be initialized before clocksource_of_init.
If not, Clock source initialization can be fail to get the clock.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Chanho Min <chanho.min@lge.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-04-25 18:15:56 +01:00