The pmd containing memblock_limit is cleared by prepare_page_table()
which creates the opportunity for early_alloc() to allocate unmapped
memory if memblock_limit is not pmd aligned causing a boot-time hang.
Commit 965278dcb8 ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM")
attempted to resolve this problem, but there is a path through the
adjust_lowmem_bounds() routine where if all memory regions start and
end on pmd-aligned addresses the memblock_limit will be set to
arm_lowmem_limit.
Since arm_lowmem_limit can be affected by the vmalloc early parameter,
the value of arm_lowmem_limit may not be pmd-aligned. This commit
corrects this oversight such that memblock_limit is always rounded
down to pmd-alignment.
Fixes: 965278dcb8 ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Pull KVM updates from Paolo Bonzini:
"ARM:
- HYP mode stub supports kexec/kdump on 32-bit
- improved PMU support
- virtual interrupt controller performance improvements
- support for userspace virtual interrupt controller (slower, but
necessary for KVM on the weird Broadcom SoCs used by the Raspberry
Pi 3)
MIPS:
- basic support for hardware virtualization (ImgTec P5600/P6600/I6400
and Cavium Octeon III)
PPC:
- in-kernel acceleration for VFIO
s390:
- support for guests without storage keys
- adapter interruption suppression
x86:
- usual range of nVMX improvements, notably nested EPT support for
accessed and dirty bits
- emulation of CPL3 CPUID faulting
generic:
- first part of VCPU thread request API
- kvm_stat improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
kvm: nVMX: Don't validate disabled secondary controls
KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick
Revert "KVM: Support vCPU-based gfn->hva cache"
tools/kvm: fix top level makefile
KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING
KVM: Documentation: remove VM mmap documentation
kvm: nVMX: Remove superfluous VMX instruction fault checks
KVM: x86: fix emulation of RSM and IRET instructions
KVM: mark requests that need synchronization
KVM: return if kvm_vcpu_wake_up() did wake up the VCPU
KVM: add explicit barrier to kvm_vcpu_kick
KVM: perform a wake_up in kvm_make_all_cpus_request
KVM: mark requests that do not need a wakeup
KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up
KVM: x86: always use kvm_make_request instead of set_bit
KVM: add kvm_{test,clear}_request to replace {test,clear}_bit
s390: kvm: Cpu model support for msa6, msa7 and msa8
KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
kvm: better MWAIT emulation for guests
KVM: x86: virtualize cpuid faulting
...
To cope with the variety in ARM architectures and configurations, the
pagetable attributes for kernel memory are generated at runtime to match
the system the kernel finds itself on. This calculated value is stored
in pgprot_kernel.
However, when early fixmap support was added for ARM (commit
a5f4c561b3) the attributes used for mappings were hard coded because
pgprot_kernel is not set up early enough. Unfortunately, when fixmap is
used after early boot this means the memory being mapped can have
different attributes to existing mappings, potentially leading to
unpredictable behaviour. A specific problem also exists due to the hard
coded values not include the 'shareable' attribute which means on
systems where this matters (e.g. those with multiple CPU clusters) the
cache contents for a memory location can become inconsistent between
CPUs.
To resolve these issues we change fixmap to use the same memory
attributes (from pgprot_kernel) that the rest of the kernel uses. To
enable this we need to refactor the initialisation code so
build_mem_type_table() is called early enough. Note, that relies on early
param parsing for memory type overrides passed via the kernel command
line, so we need to make sure this call is still after
parse_early_params().
[ardb: keep early_fixmap_init() before param parsing, for earlycon]
Fixes: a5f4c561b3 ("ARM: 8415/1: early fixmap support for earlycon")
Cc: <stable@vger.kernel.org> # v4.3+
Tested-by: afzal mohammed <afzal.mohd.ma@gmail.com>
Signed-off-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
The KVM code needs to be able to compute the address of
symbols in its idmap page (the equivalent of a virt_to_idmap()
call). Unfortunately, virt_to_idmap is slightly complicated,
depending on the use of arch_phys_to_idmap_offset or not, and
none of that is readily available at HYP.
Instead, expose a single kimage_voffset variable which contains the
offset between a kernel VA and its idmap address, enabling the
VA->IDMAP conversion. This allows the KVM code to behave similarily
to its arm64 counterpart.
Tested-by: Keerthy <j-keerthy@ti.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
In preparation for adding CONFIG_DEBUG_VIRTUAL support, define a set of
common constants: KERNEL_START and KERNEL_END which abstract
CONFIG_XIP_KERNEL vs. !CONFIG_XIP_KERNEL. Update the code where
relevant.
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
adjust_lowmem_bounds is responsible for setting up the boundary for
lowmem/highmem. This needs to be setup before memblock reservations can
occur. At the time memblock reservations can occur, memory can also be
removed from the system. The lowmem/highmem boundary and end of memory
may be affected by this but it is currently not recalculated. On some
systems this may be harmless, on others this may result in incorrect
ranges being passed to the main memory allocator. Correct this by
recalculating the lowmem/highmem boundary after all reservations have
been made.
Tested-by: Magnus Lilja <lilja.magnus@gmail.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The logic for sanity_check_meminfo has become difficult to
follow. Clean up the code so it's more obvious what the code
is actually trying to do. Additionally, meminfo is now removed
so rename the function to better describe its purpose.
Tested-by: Magnus Lilja <lilja.magnus@gmail.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The cachepolicy variable gets initialized using a masked pmd
value. So far, the pmd has been masked with flags valid for the
2-page table format, but the 3-page table format requires a
different mask. On LPAE, this lead to a wrong assumption of what
initial cache policy has been used. Later a check forces the
cache policy to writealloc and prints the following warning:
Forcing write-allocate cache policy for SMP
This patch introduces a new definition PMD_SECT_CACHE_MASK for
both page table formats which masks in all cache flags in both
cases.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Guided by grsecurity's analogous __read_only markings in arch/arm,
this applies several uses of __ro_after_init to structures that are
only updated during __init.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The late_alloc() PTE allocation function used by create_mapping_late()
does not call pgtable_page_ctor() on PTE pages it allocates, leaving
the per-page spinlock uninitialized.
Since generic page table manipulation code may assume that translation
table pages that are not owned by init_mm are covered by fully
constructed struct pages, the following crash may occur with the new
UEFI memory attributes table code.
efi: memattr: Processing EFI Memory Attributes table:
efi: memattr: 0x0000ffa16000-0x0000ffa82fff [Runtime Code |RUN| | |XP| | | | | | | | ]
Unable to handle kernel NULL pointer dereference at virtual address 00000010
pgd = c0204000
[00000010] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc4-00063-g3882aa7b340b #361
Hardware name: Generic DT based system
task: ed858000 ti: ed842000 task.ti: ed842000
PC is at __lock_acquire+0xa0/0x19a8
...
[<c038c830>] (__lock_acquire) from [<c038e4f8>] (lock_acquire+0x6c/0x88)
[<c038e4f8>] (lock_acquire) from [<c0c06134>] (_raw_spin_lock+0x2c/0x3c)
[<c0c06134>] (_raw_spin_lock) from [<c0410384>] (apply_to_page_range+0xe8/0x238)
[<c0410384>] (apply_to_page_range) from [<c1205f34>] (efi_set_mapping_permissions+0x54/0x5c)
[<c1205f34>] (efi_set_mapping_permissions) from [<c1247474>] (efi_memattr_apply_permissions+0x2b8/0x378)
[<c1247474>] (efi_memattr_apply_permissions) from [<c1248258>] (arm_enable_runtime_services+0x1f0/0x22c)
[<c1248258>] (arm_enable_runtime_services) from [<c0301f0c>] (do_one_initcall+0x44/0x174)
[<c0301f0c>] (do_one_initcall) from [<c1200d10>] (kernel_init_freeable+0x90/0x1e8)
[<c1200d10>] (kernel_init_freeable) from [<c0bff690>] (kernel_init+0x8/0x114)
[<c0bff690>] (kernel_init) from [<c0307ed0>] (ret_from_fork+0x14/0x24)
The crash is due to the fact that the UEFI page tables are not owned by
init_mm, but are not covered by fully constructed struct pages.
Given that the UEFI subsystem is currently the only user of
create_mapping_late(), add an unconditional call to pgtable_page_ctor() to
late_alloc().
Fixes: 9fc68b717c ("ARM/efi: Apply strict permissions for UEFI Runtime Services regions")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
To limit the amount of mapped low memory, we determine a physical address
boundary based on the start of the vmalloc area using __pa().
Strictly speaking, the vmalloc area location is arbitrary and does not
necessarily corresponds to a valid physical address. For example, if
PAGE_OFFSET = 0x80000000
PHYS_OFFSET = 0x90000000
vmalloc_min = 0xf0000000
then __pa(vmalloc_min) overflows and returns a wrapped 0 when phys_addr_t
is a 32-bit type. Then the code that follows determines that the entire
physical memory is above that boundary and no low memory gets mapped at
all:
|[...]
|Machine model: Freescale i.MX51 NA04 Board
|Ignoring RAM at 0x90000000-0xb0000000 (!CONFIG_HIGHMEM)
|Consider using a HIGHMEM enabled kernel.
To avoid this problem let's make vmalloc_limit a 64-bit value all the
time and determine that boundary explicitly without using __pa().
Reported-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull ARM updates from Russell King:
"Another mixture of changes this time around:
- Split XIP linker file from main linker file to make it more
maintainable, and various XIP fixes, and clean up a resulting
macro.
- Decompressor cleanups from Masahiro Yamada
- Avoid printing an error for a missing L2 cache
- Remove some duplicated symbols in System.map, and move
vectors/stubs back into kernel VMA
- Various low priority fixes from Arnd
- Updates to allow bus match functions to return negative errno
values, touching some drivers and the driver core. Greg has acked
these changes.
- Virtualisation platform udpates form Jean-Philippe Brucker.
- Security enhancements from Kees Cook
- Rework some Kconfig dependencies and move PSCI idle management code
out of arch/arm into drivers/firmware/psci.c
- ARM DMA mapping updates, touching media, acked by Mauro.
- Fix places in ARM code which should be using virt_to_idmap() so
that Keystone2 can work.
- Fix Marvell Tauros2 to work again with non-DT boots.
- Provide a delay timer for ARM Orion platforms"
* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (45 commits)
ARM: 8546/1: dma-mapping: refactor to fix coherent+cma+gfp=0
ARM: 8547/1: dma-mapping: store buffer information
ARM: 8543/1: decompressor: rename suffix_y to compress-y
ARM: 8542/1: decompressor: merge piggy.*.S and simplify Makefile
ARM: 8541/1: decompressor: drop redundant FORCE in Makefile
ARM: 8540/1: decompressor: use clean-files instead of extra-y to clean files
ARM: 8539/1: decompressor: drop more unneeded assignments to "targets"
ARM: 8538/1: decompressor: drop unneeded assignments to "targets"
ARM: 8532/1: uncompress: mark putc as inline
ARM: 8531/1: turn init_new_context into an inline function
ARM: 8530/1: remove VIRT_TO_BUS
ARM: 8537/1: drop unused DEBUG_RODATA from XIP_KERNEL
ARM: 8536/1: mm: hide __start_rodata_section_aligned for non-debug builds
ARM: 8535/1: mm: DEBUG_RODATA makes no sense with XIP_KERNEL
ARM: 8534/1: virt: fix hyp-stub build for pre-ARMv7 CPUs
ARM: make the physical-relative calculation more obvious
ARM: 8512/1: proc-v7.S: Adjust stack address when XIP_KERNEL
ARM: 8411/1: Add default SPARSEMEM settings
ARM: 8503/1: clk_register_clkdev: remove format string interface
ARM: 8529/1: remove 'i' and 'zi' targets
...
There are few things about *pte_alloc*() helpers worth cleaning up:
- 'vma' argument is unused, let's drop it;
- most __pte_alloc() callers do speculative check for pmd_none(),
before taking ptl: let's introduce pte_alloc() macro which does
the check.
The only direct user of __pte_alloc left is userfaultfd, which has
different expectation about atomicity wrt pmd.
- pte_alloc_map() and pte_alloc_map_lock() are redefined using
pte_alloc().
[sudeep.holla@arm.com: fix build for arm64 hugetlbpage]
[sfr@canb.auug.org.au: fix arch/arm/mm/mmu.c some more]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For an XIP build, _etext does not represent the end of the
binary image that needs to stay mapped into the MODULES_VADDR area.
Years ago, data came before text in the memory map. However,
now that the order is text/init/data, an XIP_KERNEL needs to map
up to the data location in order to keep from cutting off
parts of the kernel that are needed.
We only map up to the beginning of data because data has already been
copied, so there's no reason to keep it around anymore.
A new symbol is created to make it clear what it is we are referring
to.
This fixes the bug where you might lose the end of your kernel area
after page table setup is complete.
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull ARM SoC multiplatform code updates from Arnd Bergmann:
"This branch is the culmination of 5 years of effort to bring the ARMv6
and ARMv7 platforms together such that they can all be enabled and
boot the same kernel. It has been a tremendous amount of cleanup and
refactoring by a huge number of people, and creation of several new
(and major) subsystems to better abstract out all the platform details
in an appropriate manner.
The bulk of this branch is a large patchset from Arnd that brings
several of the more minor and older platforms we have closer to
multiplatform support. Among these are MMP, S3C64xx, Orion5x, mv78xx0
and realview Much of this is moving around header files from old mach
directories, but there are also some cleanup patches of debug_ll
(lowlevel debug per-platform options) and other parts.
Linus Walleij also has some patchs to clean up the older ARM Realview
platforms by finally introducing DT support, and Rob Herring has some
for ARM Versatile which is now DT-only. Both of these platforms are
now multiplatform.
Finally, a couple of patches from Russell for Dove PMU, and a fix from
Valentin Rothberg for Exynos ADC, which were rebased on top of the
series to avoid conflicts"
* tag 'armsoc-multiplatform' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (75 commits)
ARM: realview: don't select SMP_ON_UP for UP builds
ARM: s3c: simplify s3c_irqwake_{e,}intallow definition
ARM: s3c64xx: fix pm-debug compilation
iio: exynos-adc: fix irqf_oneshot.cocci warnings
ARM: realview: build realview-dt SMP support only when used
ARM: realview: select apropriate targets
ARM: realview: clean up header files
ARM: realview: make all header files local
ARM: no longer make CPU targets visible separately
ARM: integrator: use explicit core module options
ARM: realview: enable multiplatform
ARM: make default platform work for NOMMU
ARM: debug-ll: move DEBUG_LL_UART_EFM32 to correct Kconfig location
ARM: defconfig: use correct debug_ll settings
ARM: versatile: convert to multi-platform
ARM: versatile: merge mach code into a single file
ARM: versatile: switch to DT only booting and remove legacy code
ARM: versatile: add DT based PCI detection
ARM: pxa: mark ezx structures as __maybe_unused
ARM: pxa: mark raumfeld init functions as __maybe_unused
...
The VMSA field of MMFR0 (bottom 4 bits) is incremented for each
added feature. PXN is supported if the value is >= 4 and LPAE
is supported if it is >= 5.
In case a kernel with CONFIG_ARM_LPAE disabled is used on a
processor that supports LPAE, we can still use PXN in short
descriptors. So check for >= 4 not == 4.
Signed-off-by: Jungseung Lee <js07.lee@samsung.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Take the new memblock attribute MEMBLOCK_NOMAP into account when
deciding whether a certain region is or should be covered by the
kernel direct mapping.
Tested-by: Ryan Harkin <ryan.harkin@linaro.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Add support to the kernel translation table population routines for
creating non-global mappings. This will be used by the UEFI runtime
services, which will use temporary mappings in the userland range.
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To allow __create_mapping() to be used for populating UEFI Runtime
Services page tables, factor out the allocation routine 'early_alloc'
and pass it down as a function pointer into alloc_init_[pud|pmd|pte].
This way, new users of __create_mapping() can supply another allocation
function.
Tested-by: Ryan Harkin <ryan.harkin@linaro.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
In order to be able to reuse the core mapping logic of create_mapping
for mapping the UEFI Runtime Services into a private set of page tables,
split it off from create_mapping() into a separate function
__create_mapping which we will wire up in a subsequent patch.
Tested-by: Ryan Harkin <ryan.harkin@linaro.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
This enables the generic early_ioremap implementation for ARM.
It uses the fixmap region reserved for kmap. Since early_ioremap
is only supported before paging_init(), and kmap is only supported
afterwards, this is guaranteed not to cause any clashes.
Tested-by: Ryan Harkin <ryan.harkin@linaro.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
In a multiplatform configuration, we may end up building a kernel for
both Marvell PJ1 and an ARMv4 CPU implementation. In that case, the
xscale-cp0 code is built with gcc -march=armv4{,t}, which results in a
build error from the coprocessor instructions.
Since we know this code will only have to run on an actual xscale
processor, we can simply build the entire file for ARMv5TE.
Related to this, we need to handle the iWMMXT initialization sequence
differently during boot, to ensure we don't try to touch xscale
specific registers on other CPUs from the xscale_cp0_init initcall.
cpu_is_xscale() used to be hardcoded to '1' in any configuration that
enables any XScale-compatible core, but this breaks once we can have a
combined kernel with MMP1 and something else.
In this patch, I replace the existing cpu_is_xscale() macro with a new
cpu_is_xscale_family() macro that evaluates true for xscale, xsc3 and
mohawk, which makes the behavior more deterministic.
The two existing users of cpu_is_xscale() are modified accordingly,
but slightly change behavior for kernels that enable CPU_MOHAWK without
also enabling CPU_XSCALE or CPU_XSC3. Previously, these would leave leave
PMD_BIT4 in the page tables untouched, now they clear it as we've always
done for kernels that enable both MOHAWK and the support for the older
CPU types.
Since the previous behavior was inconsistent, I assume it was
unintentional.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>