Merge tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Highlights include:

   - Larger virtual address space on 64-bit server CPUs. By default we
     use a 128TB virtual address space, but a process can request access
     to the full 512TB by passing a hint to mmap().

   - Support for the new Power9 "XIVE" interrupt controller.

   - TLB flushing optimisations for the radix MMU on Power9.

   - Support for CAPI cards on Power9, using the "Coherent Accelerator
     Interface Architecture 2.0".

   - The ability to configure the mmap randomisation limits at build and
     runtime.

   - Several small fixes and cleanups to the kprobes code, as well as
     support for KPROBES_ON_FTRACE.

   - Major improvements to handling of system reset interrupts,
     correctly treating them as NMIs, giving them a dedicated stack and
     using a new hypervisor call to trigger them, all of which should
     aid debugging and robustness.

   - Many fixes and other minor enhancements.

  Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple,
  Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton
  Blanchard, Balbir Singh, Ben Hutchings, Benjamin Herrenschmidt,
  Bhupesh Sharma, Chris Packham, Christian Zigotzky, Christophe Leroy,
  Christophe Lombard, Daniel Axtens, David Gibson, Gautham R. Shenoy,
  Gavin Shan, Geert Uytterhoeven, Guilherme G. Piccoli, Hamish Martin,
  Hari Bathini, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mahesh J
  Salgaonkar, Mahesh Salgaonkar, Masami Hiramatsu, Matt Brown, Matthew
  R. Ochs, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Oliver
  O'Halloran, Pan Xinhui, Paul Mackerras, Rashmica Gupta, Russell
  Currey, Sukadev Bhattiprolu, Thadeu Lima de Souza Cascardo, Tobin C.
  Harding, Tyrel Datwyler, Uma Krishnan, Vaibhav Jain, Vipin K Parashar,
  Yang Shi"

* tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
  powerpc/64s: Power9 has no LPCR[VRMASD] field so don't set it
  powerpc/powernv: Fix TCE kill on NVLink2
  powerpc/mm/radix: Drop support for CPUs without lockless tlbie
  powerpc/book3s/mce: Move add_taint() later in virtual mode
  powerpc/sysfs: Move #ifdef CONFIG_HOTPLUG_CPU out of the function body
  powerpc/smp: Document irq enable/disable after migrating IRQs
  powerpc/mpc52xx: Don't select user-visible RTAS_PROC
  powerpc/powernv: Document cxl dependency on special case in pnv_eeh_reset()
  powerpc/eeh: Clean up and document event handling functions
  powerpc/eeh: Avoid use after free in eeh_handle_special_event()
  cxl: Mask slice error interrupts after first occurrence
  cxl: Route eeh events to all drivers in cxl_pci_error_detected()
  cxl: Force context lock during EEH flow
  powerpc/64: Allow CONFIG_RELOCATABLE if COMPILE_TEST
  powerpc/xmon: Teach xmon oops about radix vectors
  powerpc/mm/hash: Fix off-by-one in comment about kernel contexts ids
  powerpc/pseries: Enable VFIO
  powerpc/powernv: Fix iommu table size calculation hook for small tables
  powerpc/powernv: Check kzalloc() return value in pnv_pci_table_alloc
  powerpc: Add arch/powerpc/tools directory
  ...
This commit is contained in:
Linus Torvalds
2017-05-05 11:36:44 -07:00
212 changed files with 8702 additions and 2794 deletions
@@ -26,7 +26,7 @@
| nios2: | TODO |
| openrisc: | TODO |
| parisc: | TODO |
| powerpc: | TODO |
| powerpc: | ok |
| s390: | TODO |
| score: | TODO |
| sh: | TODO |
+13 -2
View File
@@ -21,7 +21,7 @@ Introduction
Hardware overview
=================
POWER8 FPGA
POWER8/9 FPGA
+----------+ +---------+
| | | |
| CPU | | AFU |
@@ -34,7 +34,7 @@ Hardware overview
| | CAPP |<------>| |
+---+------+ PCIE +---------+
The POWER8 chip has a Coherently Attached Processor Proxy (CAPP)
The POWER8/9 chip has a Coherently Attached Processor Proxy (CAPP)
unit which is part of the PCIe Host Bridge (PHB). This is managed
by Linux by calls into OPAL. Linux doesn't directly program the
CAPP.
@@ -59,6 +59,17 @@ Hardware overview
the fault. The context to which this fault is serviced is based on
who owns that acceleration function.
POWER8 <-----> PSL Version 8 is compliant to the CAIA Version 1.0.
POWER9 <-----> PSL Version 9 is compliant to the CAIA Version 2.0.
This PSL Version 9 provides new features such as:
* Interaction with the nest MMU on the P9 chip.
* Native DMA support.
* Supports sending ASB_Notify messages for host thread wakeup.
* Supports Atomic operations.
* ....
Cards with a PSL9 won't work on a POWER8 system and cards with a
PSL8 won't work on a POWER9 system.
AFU Modes
=========
@@ -105,21 +105,21 @@ memory is held.
If there is no waiting dump data, then only the memory required
to hold CPU state, HPTE region, boot memory dump and elfcore
header, is reserved at the top of memory (see Fig. 1). This area
is *not* released: this region will be kept permanently reserved,
so that it can act as a receptacle for a copy of the boot memory
content in addition to CPU state and HPTE region, in the case a
crash does occur.
header, is usually reserved at an offset greater than boot memory
size (see Fig. 1). This area is *not* released: this region will
be kept permanently reserved, so that it can act as a receptacle
for a copy of the boot memory content in addition to CPU state
and HPTE region, in the case a crash does occur.
o Memory Reservation during first kernel
Low memory Top of memory
Low memory Top of memory
0 boot memory size |
| | |<--Reserved dump area -->|
V V | Permanent Reservation V
+-----------+----------/ /----------+---+----+-----------+----+
| | |CPU|HPTE| DUMP |ELF |
+-----------+----------/ /----------+---+----+-----------+----+
| | |<--Reserved dump area -->| |
V V | Permanent Reservation | V
+-----------+----------/ /---+---+----+-----------+----+------+
| | |CPU|HPTE| DUMP |ELF | |
+-----------+----------/ /---+---+----+-----------+----+------+
| ^
| |
\ /
@@ -135,12 +135,12 @@ crash does occur.
0 boot memory size |
| |<------------- Reserved dump area ----------- -->|
V V V
+-----------+----------/ /----------+---+----+-----------+----+
| | |CPU|HPTE| DUMP |ELF |
+-----------+----------/ /----------+---+----+-----------+----+
| |
V V
Used by second /proc/vmcore
+-----------+----------/ /---+---+----+-----------+----+------+
| | |CPU|HPTE| DUMP |ELF | |
+-----------+----------/ /---+---+----+-----------+----+------+
| |
V V
Used by second /proc/vmcore
kernel to boot
Fig. 2
+2 -1
View File
@@ -5310,6 +5310,7 @@ M: Scott Wood <oss@buserror.net>
L: linuxppc-dev@lists.ozlabs.org
L: linux-arm-kernel@lists.infradead.org
S: Maintained
F: Documentation/devicetree/bindings/powerpc/fsl/
F: drivers/soc/fsl/
F: include/linux/fsl/
@@ -7568,7 +7569,7 @@ Q: http://patchwork.ozlabs.org/project/linuxppc-dev/list/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
S: Supported
F: Documentation/ABI/stable/sysfs-firmware-opal-*
F: Documentation/devicetree/bindings/powerpc/opal/
F: Documentation/devicetree/bindings/powerpc/
F: Documentation/devicetree/bindings/rtc/rtc-opal.txt
F: Documentation/devicetree/bindings/i2c/i2c-opal.txt
F: Documentation/powerpc/
+63 -3
View File
@@ -22,6 +22,48 @@ config MMU
bool
default y
config ARCH_MMAP_RND_BITS_MAX
# On Book3S 64, the default virtual address space for 64-bit processes
# is 2^47 (128TB). As a maximum, allow randomisation to consume up to
# 32T of address space (2^45), which should ensure a reasonable gap
# between bottom-up and top-down allocations for applications that
# consume "normal" amounts of address space. Book3S 64 only supports 64K
# and 4K page sizes.
default 29 if PPC_BOOK3S_64 && PPC_64K_PAGES # 29 = 45 (32T) - 16 (64K)
default 33 if PPC_BOOK3S_64 # 33 = 45 (32T) - 12 (4K)
#
# On all other 64-bit platforms (currently only Book3E), the virtual
# address space is 2^46 (64TB). Allow randomisation to consume up to 16T
# of address space (2^44). Only 4K page sizes are supported.
default 32 if 64BIT # 32 = 44 (16T) - 12 (4K)
#
# For 32-bit, use the compat values, as they're the same.
default ARCH_MMAP_RND_COMPAT_BITS_MAX
config ARCH_MMAP_RND_BITS_MIN
# Allow randomisation to consume up to 1GB of address space (2^30).
default 14 if 64BIT && PPC_64K_PAGES # 14 = 30 (1GB) - 16 (64K)
default 18 if 64BIT # 18 = 30 (1GB) - 12 (4K)
#
# For 32-bit, use the compat values, as they're the same.
default ARCH_MMAP_RND_COMPAT_BITS_MIN
config ARCH_MMAP_RND_COMPAT_BITS_MAX
# Total virtual address space for 32-bit processes is 2^31 (2GB).
# Allow randomisation to consume up to 512MB of address space (2^29).
default 11 if PPC_256K_PAGES # 11 = 29 (512MB) - 18 (256K)
default 13 if PPC_64K_PAGES # 13 = 29 (512MB) - 16 (64K)
default 15 if PPC_16K_PAGES # 15 = 29 (512MB) - 14 (16K)
default 17 # 17 = 29 (512MB) - 12 (4K)
config ARCH_MMAP_RND_COMPAT_BITS_MIN
# Total virtual address space for 32-bit processes is 2^31 (2GB).
# Allow randomisation to consume up to 8MB of address space (2^23).
default 5 if PPC_256K_PAGES # 5 = 23 (8MB) - 18 (256K)
default 7 if PPC_64K_PAGES # 7 = 23 (8MB) - 16 (64K)
default 9 if PPC_16K_PAGES # 9 = 23 (8MB) - 14 (16K)
default 11 # 11 = 23 (8MB) - 12 (4K)
config HAVE_SETUP_PER_CPU_AREA
def_bool PPC64
@@ -38,6 +80,11 @@ config NR_IRQS
/proc/interrupts. If you configure your system to have too few,
drivers will fail to load or worse - handle with care.
config NMI_IPI
bool
depends on SMP && (DEBUGGER || KEXEC_CORE)
default y
config STACKTRACE_SUPPORT
bool
default y
@@ -119,6 +166,8 @@ config PPC
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_CBPF_JIT if !PPC64
@@ -141,6 +190,7 @@ config PPC
select HAVE_IRQ_EXIT_ON_IRQ_STACK
select HAVE_KERNEL_GZIP
select HAVE_KPROBES
select HAVE_KPROBES_ON_FTRACE
select HAVE_KRETPROBES
select HAVE_LIVEPATCH if HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_MEMBLOCK
@@ -489,7 +539,7 @@ config KEXEC_FILE
config RELOCATABLE
bool "Build a relocatable kernel"
depends on (PPC64 && !COMPILE_TEST) || (FLATMEM && (44x || FSL_BOOKE))
depends on PPC64 || (FLATMEM && (44x || FSL_BOOKE))
select NONSTATIC_KERNEL
select MODULE_REL_CRCS if MODVERSIONS
help
@@ -523,7 +573,7 @@ config RELOCATABLE_TEST
config CRASH_DUMP
bool "Build a kdump crash kernel"
depends on PPC64 || 6xx || FSL_BOOKE || (44x && !SMP)
select RELOCATABLE if (PPC64 && !COMPILE_TEST) || 44x || FSL_BOOKE
select RELOCATABLE if PPC64 || 44x || FSL_BOOKE
help
Build a kernel suitable for use as a kdump capture kernel.
The same kernel binary can be used as production kernel and dump
@@ -585,7 +635,7 @@ config ARCH_SPARSEMEM_ENABLE
config ARCH_SPARSEMEM_DEFAULT
def_bool y
depends on (SMP && PPC_PSERIES) || PPC_PS3
depends on PPC_BOOK3S_64
config SYS_SUPPORTS_HUGETLBFS
bool
@@ -677,6 +727,16 @@ config PPC_256K_PAGES
endchoice
config THREAD_SHIFT
int "Thread shift" if EXPERT
range 13 15
default "15" if PPC_256K_PAGES
default "14" if PPC64
default "13"
help
Used to define the stack size. The default is almost always what you
want. Only change this if you know what you are doing.
config FORCE_MAX_ZONEORDER
int "Maximum zone order"
range 8 9 if PPC64 && PPC_64K_PAGES
+1 -12
View File
@@ -136,7 +136,7 @@ CFLAGS-$(CONFIG_GENERIC_CPU) += -mcpu=powerpc64
endif
ifdef CONFIG_MPROFILE_KERNEL
ifeq ($(shell $(srctree)/arch/powerpc/scripts/gcc-check-mprofile-kernel.sh $(CC) -I$(srctree)/include -D__KERNEL__),OK)
ifeq ($(shell $(srctree)/arch/powerpc/tools/gcc-check-mprofile-kernel.sh $(CC) -I$(srctree)/include -D__KERNEL__),OK)
CC_FLAGS_FTRACE := -pg -mprofile-kernel
KBUILD_CPPFLAGS += -DCC_USING_MPROFILE_KERNEL
else
@@ -274,17 +274,6 @@ PHONY += $(BOOT_TARGETS1) $(BOOT_TARGETS2)
boot := arch/$(ARCH)/boot
ifeq ($(CONFIG_RELOCATABLE),y)
quiet_cmd_relocs_check = CALL $<
cmd_relocs_check = $(CONFIG_SHELL) $< "$(OBJDUMP)" "$(obj)/vmlinux"
PHONY += relocs_check
relocs_check: arch/powerpc/relocs_check.sh vmlinux
$(call cmd,relocs_check)
zImage: relocs_check
endif
$(BOOT_TARGETS1): vmlinux
$(Q)$(MAKE) $(build)=$(boot) $(patsubst %,$(boot)/%,$@)
$(BOOT_TARGETS2): vmlinux
+34
View File
@@ -0,0 +1,34 @@
# ===========================================================================
# Post-link powerpc pass
# ===========================================================================
#
# 1. Check that vmlinux relocations look sane
PHONY := __archpost
__archpost:
include include/config/auto.conf
include scripts/Kbuild.include
quiet_cmd_relocs_check = CHKREL $@
cmd_relocs_check = $(CONFIG_SHELL) $(srctree)/arch/powerpc/tools/relocs_check.sh "$(OBJDUMP)" "$@"
# `@true` prevents complaint when there is nothing to be done
vmlinux: FORCE
@true
ifdef CONFIG_RELOCATABLE
$(call if_changed,relocs_check)
endif
%.ko: FORCE
@true
clean:
@true
PHONY += FORCE clean
FORCE:
.PHONY: $(PHONY)
+3 -3
View File
@@ -33,7 +33,7 @@ CONFIG_BLK_DEV_INITRD=y
CONFIG_BPF_SYSCALL=y
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
CONFIG_OPROFILE=m
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
CONFIG_MODULES=y
@@ -261,7 +261,7 @@ CONFIG_NILFS2_FS=m
CONFIG_AUTOFS4_FS=m
CONFIG_FUSE_FS=m
CONFIG_OVERLAY_FS=m
CONFIG_ISO9660_FS=m
CONFIG_ISO9660_FS=y
CONFIG_UDF_FS=m
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=m
@@ -306,7 +306,7 @@ CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPT_CRC32C_VPMSUM=m
CONFIG_CRYPTO_CRC32C_VPMSUM=m
CONFIG_CRYPTO_MD5_PPC=m
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_SHA256=y
+3 -3
View File
@@ -19,7 +19,7 @@ CONFIG_BLK_DEV_INITRD=y
CONFIG_BPF_SYSCALL=y
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
CONFIG_OPROFILE=m
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
CONFIG_MODULES=y
@@ -290,7 +290,7 @@ CONFIG_NILFS2_FS=m
CONFIG_AUTOFS4_FS=m
CONFIG_FUSE_FS=m
CONFIG_OVERLAY_FS=m
CONFIG_ISO9660_FS=m
CONFIG_ISO9660_FS=y
CONFIG_UDF_FS=m
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=m
@@ -339,7 +339,7 @@ CONFIG_PPC_EARLY_DEBUG=y
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPT_CRC32C_VPMSUM=m
CONFIG_CRYPTO_CRC32C_VPMSUM=m
CONFIG_CRYPTO_MD5_PPC=m
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_SHA256=y
+3 -3
View File
@@ -34,7 +34,7 @@ CONFIG_BLK_DEV_INITRD=y
CONFIG_BPF_SYSCALL=y
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=y
CONFIG_OPROFILE=m
CONFIG_KPROBES=y
CONFIG_JUMP_LABEL=y
CONFIG_MODULES=y
@@ -259,7 +259,7 @@ CONFIG_NILFS2_FS=m
CONFIG_AUTOFS4_FS=m
CONFIG_FUSE_FS=m
CONFIG_OVERLAY_FS=m
CONFIG_ISO9660_FS=m
CONFIG_ISO9660_FS=y
CONFIG_UDF_FS=m
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=m
@@ -303,7 +303,7 @@ CONFIG_XMON=y
CONFIG_CRYPTO_TEST=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPT_CRC32C_VPMSUM=m
CONFIG_CRYPTO_CRC32C_VPMSUM=m
CONFIG_CRYPTO_MD5_PPC=m
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_SHA256=y
@@ -17,6 +17,8 @@
#include <asm/checksum.h>
#include <linux/uaccess.h>
#include <asm/epapr_hcalls.h>
#include <asm/dcr.h>
#include <asm/mmu_context.h>
#include <uapi/asm/ucontext.h>
@@ -120,6 +122,8 @@ extern s64 __ashrdi3(s64, int);
extern int __cmpdi2(s64, s64);
extern int __ucmpdi2(u64, u64);
/* tracing */
void _mcount(void);
unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip);
#endif /* _ASM_POWERPC_ASM_PROTOTYPES_H */
+8
View File
@@ -55,6 +55,14 @@
#define PPC_BITEXTRACT(bits, ppc_bit, dst_bit) \
((((bits) >> PPC_BITLSHIFT(ppc_bit)) & 1) << (dst_bit))
#define PPC_BITLSHIFT32(be) (32 - 1 - (be))
#define PPC_BIT32(bit) (1UL << PPC_BITLSHIFT32(bit))
#define PPC_BITMASK32(bs, be) ((PPC_BIT32(bs) - PPC_BIT32(be))|PPC_BIT32(bs))
#define PPC_BITLSHIFT8(be) (8 - 1 - (be))
#define PPC_BIT8(bit) (1UL << PPC_BITLSHIFT8(bit))
#define PPC_BITMASK8(bs, be) ((PPC_BIT8(bs) - PPC_BIT8(be))|PPC_BIT8(bs))
#include <asm/barrier.h>
/* Macro for generating the ***_bits() functions */
+1 -1
View File
@@ -8,7 +8,7 @@
#define H_PTE_INDEX_SIZE 9
#define H_PMD_INDEX_SIZE 7
#define H_PUD_INDEX_SIZE 9
#define H_PGD_INDEX_SIZE 9
#define H_PGD_INDEX_SIZE 12
#ifndef __ASSEMBLY__
#define H_PTE_TABLE_SIZE (sizeof(pte_t) << H_PTE_INDEX_SIZE)
@@ -4,10 +4,14 @@
#define H_PTE_INDEX_SIZE 8
#define H_PMD_INDEX_SIZE 5
#define H_PUD_INDEX_SIZE 5
#define H_PGD_INDEX_SIZE 12
#define H_PGD_INDEX_SIZE 15
#define H_PAGE_COMBO 0x00001000 /* this is a combo 4k page */
#define H_PAGE_4K_PFN 0x00002000 /* PFN is for a single 4k page */
/*
* 64k aligned address free up few of the lower bits of RPN for us
* We steal that here. For more deatils look at pte_pfn/pfn_pte()
*/
#define H_PAGE_COMBO _RPAGE_RPN0 /* this is a combo 4k page */
#define H_PAGE_4K_PFN _RPAGE_RPN1 /* PFN is for a single 4k page */
/*
* We need to differentiate between explicit huge page and THP huge
* page, since THP huge page also need to track real subpage details
+5 -11
View File
@@ -6,19 +6,13 @@
* Common bits between 4K and 64K pages in a linux-style PTE.
* Additional bits may be defined in pgtable-hash64-*.h
*
* Note: We only support user read/write permissions. Supervisor always
* have full read/write to pages above PAGE_OFFSET (pages below that
* always use the user access permissions).
*
* We could create separate kernel read-only if we used the 3 PP bits
* combinations that newer processors provide but we currently don't.
*/
#define H_PAGE_BUSY 0x00800 /* software: PTE & hash are busy */
#define H_PTE_NONE_MASK _PAGE_HPTEFLAGS
#define H_PAGE_F_GIX_SHIFT 57
#define H_PAGE_F_GIX (7ul << 57) /* HPTE index within HPTEG */
#define H_PAGE_F_SECOND (1ul << 60) /* HPTE is in 2ndary HPTEG */
#define H_PAGE_HASHPTE (1ul << 61) /* PTE has associated HPTE */
#define H_PAGE_F_GIX_SHIFT 56
#define H_PAGE_BUSY _RPAGE_RSV1 /* software: PTE & hash are busy */
#define H_PAGE_F_SECOND _RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
#define H_PAGE_F_GIX (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
#define H_PAGE_HASHPTE _RPAGE_RPN43 /* PTE has associated HPTE */
#ifdef CONFIG_PPC_64K_PAGES
#include <asm/book3s/64/hash-64k.h>
+1 -1
View File
@@ -46,7 +46,7 @@ static inline pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
*/
VM_WARN_ON(page_shift == mmu_psize_defs[MMU_PAGE_1G].shift);
if (page_shift == mmu_psize_defs[MMU_PAGE_2M].shift)
return __pte(pte_val(entry) | _PAGE_LARGE);
return __pte(pte_val(entry) | R_PAGE_LARGE);
else
return entry;
}
+126 -74
View File
@@ -39,6 +39,7 @@
/* Bits in the SLB VSID word */
#define SLB_VSID_SHIFT 12
#define SLB_VSID_SHIFT_256M SLB_VSID_SHIFT
#define SLB_VSID_SHIFT_1T 24
#define SLB_VSID_SSIZE_SHIFT 62
#define SLB_VSID_B ASM_CONST(0xc000000000000000)
@@ -408,7 +409,7 @@ static inline unsigned long hpt_vpn(unsigned long ea,
static inline unsigned long hpt_hash(unsigned long vpn,
unsigned int shift, int ssize)
{
int mask;
unsigned long mask;
unsigned long hash, vsid;
/* VPN_SHIFT can be atmost 12 */
@@ -491,13 +492,14 @@ extern void slb_set_size(u16 size);
* We first generate a 37-bit "proto-VSID". Proto-VSIDs are generated
* from mmu context id and effective segment id of the address.
*
* For user processes max context id is limited to ((1ul << 19) - 5)
* for kernel space, we use the top 4 context ids to map address as below
* For user processes max context id is limited to MAX_USER_CONTEXT.
* For kernel space, we use context ids 1-4 to map addresses as below:
* NOTE: each context only support 64TB now.
* 0x7fffc - [ 0xc000000000000000 - 0xc0003fffffffffff ]
* 0x7fffd - [ 0xd000000000000000 - 0xd0003fffffffffff ]
* 0x7fffe - [ 0xe000000000000000 - 0xe0003fffffffffff ]
* 0x7ffff - [ 0xf000000000000000 - 0xf0003fffffffffff ]
* 0x00001 - [ 0xc000000000000000 - 0xc0003fffffffffff ]
* 0x00002 - [ 0xd000000000000000 - 0xd0003fffffffffff ]
* 0x00003 - [ 0xe000000000000000 - 0xe0003fffffffffff ]
* 0x00004 - [ 0xf000000000000000 - 0xf0003fffffffffff ]
*
* The proto-VSIDs are then scrambled into real VSIDs with the
* multiplicative hash:
@@ -511,20 +513,28 @@ extern void slb_set_size(u16 size);
* robust scattering in the hash table (at least based on some initial
* results).
*
* We also consider VSID 0 special. We use VSID 0 for slb entries mapping
* bad address. This enables us to consolidate bad address handling in
* hash_page.
* We use VSID 0 to indicate an invalid VSID. The means we can't use context id
* 0, because a context id of 0 and an EA of 0 gives a proto-VSID of 0, which
* will produce a VSID of 0.
*
* We also need to avoid the last segment of the last context, because that
* would give a protovsid of 0x1fffffffff. That will result in a VSID 0
* because of the modulo operation in vsid scramble. But the vmemmap
* (which is what uses region 0xf) will never be close to 64TB in size
* (it's 56 bytes per page of system memory).
* because of the modulo operation in vsid scramble.
*/
/*
* Max Va bits we support as of now is 68 bits. We want 19 bit
* context ID.
* Restrictions:
* GPU has restrictions of not able to access beyond 128TB
* (47 bit effective address). We also cannot do more than 20bit PID.
* For p4 and p5 which can only do 65 bit VA, we restrict our CONTEXT_BITS
* to 16 bits (ie, we can only have 2^16 pids at the same time).
*/
#define VA_BITS 68
#define CONTEXT_BITS 19
#define ESID_BITS 18
#define ESID_BITS_1T 6
#define ESID_BITS (VA_BITS - (SID_SHIFT + CONTEXT_BITS))
#define ESID_BITS_1T (VA_BITS - (SID_SHIFT_1T + CONTEXT_BITS))
#define ESID_BITS_MASK ((1 << ESID_BITS) - 1)
#define ESID_BITS_1T_MASK ((1 << ESID_BITS_1T) - 1)
@@ -532,63 +542,70 @@ extern void slb_set_size(u16 size);
/*
* 256MB segment
* The proto-VSID space has 2^(CONTEX_BITS + ESID_BITS) - 1 segments
* available for user + kernel mapping. The top 4 contexts are used for
* kernel mapping. Each segment contains 2^28 bytes. Each
* context maps 2^46 bytes (64TB) so we can support 2^19-1 contexts
* (19 == 37 + 28 - 46).
* available for user + kernel mapping. VSID 0 is reserved as invalid, contexts
* 1-4 are used for kernel mapping. Each segment contains 2^28 bytes. Each
* context maps 2^49 bytes (512TB).
*
* We also need to avoid the last segment of the last context, because that
* would give a protovsid of 0x1fffffffff. That will result in a VSID 0
* because of the modulo operation in vsid scramble.
*/
#define MAX_USER_CONTEXT ((ASM_CONST(1) << CONTEXT_BITS) - 5)
#define MAX_USER_CONTEXT ((ASM_CONST(1) << CONTEXT_BITS) - 2)
#define MIN_USER_CONTEXT (5)
/* Would be nice to use KERNEL_REGION_ID here */
#define KERNEL_REGION_CONTEXT_OFFSET (0xc - 1)
/*
* For platforms that support on 65bit VA we limit the context bits
*/
#define MAX_USER_CONTEXT_65BIT_VA ((ASM_CONST(1) << (65 - (SID_SHIFT + ESID_BITS))) - 2)
/*
* This should be computed such that protovosid * vsid_mulitplier
* doesn't overflow 64 bits. It should also be co-prime to vsid_modulus
* doesn't overflow 64 bits. The vsid_mutliplier should also be
* co-prime to vsid_modulus. We also need to make sure that number
* of bits in multiplied result (dividend) is less than twice the number of
* protovsid bits for our modulus optmization to work.
*
* The below table shows the current values used.
* |-------+------------+----------------------+------------+-------------------|
* | | Prime Bits | proto VSID_BITS_65VA | Total Bits | 2* prot VSID_BITS |
* |-------+------------+----------------------+------------+-------------------|
* | 1T | 24 | 25 | 49 | 50 |
* |-------+------------+----------------------+------------+-------------------|
* | 256MB | 24 | 37 | 61 | 74 |
* |-------+------------+----------------------+------------+-------------------|
*
* |-------+------------+----------------------+------------+--------------------|
* | | Prime Bits | proto VSID_BITS_68VA | Total Bits | 2* proto VSID_BITS |
* |-------+------------+----------------------+------------+--------------------|
* | 1T | 24 | 28 | 52 | 56 |
* |-------+------------+----------------------+------------+--------------------|
* | 256MB | 24 | 40 | 64 | 80 |
* |-------+------------+----------------------+------------+--------------------|
*
*/
#define VSID_MULTIPLIER_256M ASM_CONST(12538073) /* 24-bit prime */
#define VSID_BITS_256M (CONTEXT_BITS + ESID_BITS)
#define VSID_MODULUS_256M ((1UL<<VSID_BITS_256M)-1)
#define VSID_BITS_256M (VA_BITS - SID_SHIFT)
#define VSID_BITS_65_256M (65 - SID_SHIFT)
/*
* Modular multiplicative inverse of VSID_MULTIPLIER under modulo VSID_MODULUS
*/
#define VSID_MULINV_256M ASM_CONST(665548017062)
#define VSID_MULTIPLIER_1T ASM_CONST(12538073) /* 24-bit prime */
#define VSID_BITS_1T (CONTEXT_BITS + ESID_BITS_1T)
#define VSID_MODULUS_1T ((1UL<<VSID_BITS_1T)-1)
#define VSID_BITS_1T (VA_BITS - SID_SHIFT_1T)
#define VSID_BITS_65_1T (65 - SID_SHIFT_1T)
#define VSID_MULINV_1T ASM_CONST(209034062)
/* 1TB VSID reserved for VRMA */
#define VRMA_VSID 0x1ffffffUL
#define USER_VSID_RANGE (1UL << (ESID_BITS + SID_SHIFT))
/*
* This macro generates asm code to compute the VSID scramble
* function. Used in slb_allocate() and do_stab_bolted. The function
* computed is: (protovsid*VSID_MULTIPLIER) % VSID_MODULUS
*
* rt = register containing the proto-VSID and into which the
* VSID will be stored
* rx = scratch register (clobbered)
*
* - rt and rx must be different registers
* - The answer will end up in the low VSID_BITS bits of rt. The higher
* bits may contain other garbage, so you may need to mask the
* result.
*/
#define ASM_VSID_SCRAMBLE(rt, rx, size) \
lis rx,VSID_MULTIPLIER_##size@h; \
ori rx,rx,VSID_MULTIPLIER_##size@l; \
mulld rt,rt,rx; /* rt = rt * MULTIPLIER */ \
\
srdi rx,rt,VSID_BITS_##size; \
clrldi rt,rt,(64-VSID_BITS_##size); \
add rt,rt,rx; /* add high and low bits */ \
/* NOTE: explanation based on VSID_BITS_##size = 36 \
* Now, r3 == VSID (mod 2^36-1), and lies between 0 and \
* 2^36-1+2^28-1. That in particular means that if r3 >= \
* 2^36-1, then r3+1 has the 2^36 bit set. So, if r3+1 has \
* the bit clear, r3 already has the answer we want, if it \
* doesn't, the answer is the low 36 bits of r3+1. So in all \
* cases the answer is the low 36 bits of (r3 + ((r3+1) >> 36))*/\
addi rx,rt,1; \
srdi rx,rx,VSID_BITS_##size; /* extract 2^VSID_BITS bit */ \
add rt,rt,rx
/* 4 bits per slice and we have one slice per 1TB */
#define SLICE_ARRAY_SIZE (H_PGTABLE_RANGE >> 41)
#define SLICE_ARRAY_SIZE (H_PGTABLE_RANGE >> 41)
#define TASK_SLICE_ARRAY_SZ(x) ((x)->context.addr_limit >> 41)
#ifndef __ASSEMBLY__
@@ -634,7 +651,7 @@ static inline void subpage_prot_init_new_context(struct mm_struct *mm) { }
#define vsid_scramble(protovsid, size) \
((((protovsid) * VSID_MULTIPLIER_##size) % VSID_MODULUS_##size))
#else /* 1 */
/* simplified form avoiding mod operation */
#define vsid_scramble(protovsid, size) \
({ \
unsigned long x; \
@@ -642,6 +659,21 @@ static inline void subpage_prot_init_new_context(struct mm_struct *mm) { }
x = (x >> VSID_BITS_##size) + (x & VSID_MODULUS_##size); \
(x + ((x+1) >> VSID_BITS_##size)) & VSID_MODULUS_##size; \
})
#else /* 1 */
static inline unsigned long vsid_scramble(unsigned long protovsid,
unsigned long vsid_multiplier, int vsid_bits)
{
unsigned long vsid;
unsigned long vsid_modulus = ((1UL << vsid_bits) - 1);
/*
* We have same multipler for both 256 and 1T segements now
*/
vsid = protovsid * vsid_multiplier;
vsid = (vsid >> vsid_bits) + (vsid & vsid_modulus);
return (vsid + ((vsid + 1) >> vsid_bits)) & vsid_modulus;
}
#endif /* 1 */
/* Returns the segment size indicator for a user address */
@@ -656,36 +688,56 @@ static inline int user_segment_size(unsigned long addr)
static inline unsigned long get_vsid(unsigned long context, unsigned long ea,
int ssize)
{
unsigned long va_bits = VA_BITS;
unsigned long vsid_bits;
unsigned long protovsid;
/*
* Bad address. We return VSID 0 for that
*/
if ((ea & ~REGION_MASK) >= H_PGTABLE_RANGE)
return 0;
if (ssize == MMU_SEGSIZE_256M)
return vsid_scramble((context << ESID_BITS)
| ((ea >> SID_SHIFT) & ESID_BITS_MASK), 256M);
return vsid_scramble((context << ESID_BITS_1T)
| ((ea >> SID_SHIFT_1T) & ESID_BITS_1T_MASK), 1T);
if (!mmu_has_feature(MMU_FTR_68_BIT_VA))
va_bits = 65;
if (ssize == MMU_SEGSIZE_256M) {
vsid_bits = va_bits - SID_SHIFT;
protovsid = (context << ESID_BITS) |
((ea >> SID_SHIFT) & ESID_BITS_MASK);
return vsid_scramble(protovsid, VSID_MULTIPLIER_256M, vsid_bits);
}
/* 1T segment */
vsid_bits = va_bits - SID_SHIFT_1T;
protovsid = (context << ESID_BITS_1T) |
((ea >> SID_SHIFT_1T) & ESID_BITS_1T_MASK);
return vsid_scramble(protovsid, VSID_MULTIPLIER_1T, vsid_bits);
}
/*
* This is only valid for addresses >= PAGE_OFFSET
*
* For kernel space, we use the top 4 context ids to map address as below
* 0x7fffc - [ 0xc000000000000000 - 0xc0003fffffffffff ]
* 0x7fffd - [ 0xd000000000000000 - 0xd0003fffffffffff ]
* 0x7fffe - [ 0xe000000000000000 - 0xe0003fffffffffff ]
* 0x7ffff - [ 0xf000000000000000 - 0xf0003fffffffffff ]
*/
static inline unsigned long get_kernel_vsid(unsigned long ea, int ssize)
{
unsigned long context;
if (!is_kernel_addr(ea))
return 0;
/*
* kernel take the top 4 context from the available range
* For kernel space, we use context ids 1-4 to map the address space as
* below:
*
* 0x00001 - [ 0xc000000000000000 - 0xc0003fffffffffff ]
* 0x00002 - [ 0xd000000000000000 - 0xd0003fffffffffff ]
* 0x00003 - [ 0xe000000000000000 - 0xe0003fffffffffff ]
* 0x00004 - [ 0xf000000000000000 - 0xf0003fffffffffff ]
*
* So we can compute the context from the region (top nibble) by
* subtracting 11, or 0xc - 1.
*/
context = (MAX_USER_CONTEXT) + ((ea >> 60) - 0xc) + 1;
context = (ea >> 60) - KERNEL_REGION_CONTEXT_OFFSET;
return get_vsid(context, ea, ssize);
}
+9
View File
@@ -65,6 +65,8 @@ extern struct patb_entry *partition_tb;
* MAX_USER_CONTEXT * 16 bytes of space.
*/
#define PRTB_SIZE_SHIFT (CONTEXT_BITS + 4)
#define PRTB_ENTRIES (1ul << CONTEXT_BITS)
/*
* Power9 currently only support 64K partition table size.
*/
@@ -73,13 +75,20 @@ extern struct patb_entry *partition_tb;
typedef unsigned long mm_context_id_t;
struct spinlock;
/* Maximum possible number of NPUs in a system. */
#define NV_MAX_NPUS 8
typedef struct {
mm_context_id_t id;
u16 user_psize; /* page size index */
/* NPU NMMU context */
struct npu_context *npu_context;
#ifdef CONFIG_PPC_MM_SLICES
u64 low_slices_psize; /* SLB page size encodings */
unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
unsigned long addr_limit;
#else
u16 sllp; /* SLB page size encoding */
#endif
+41 -17
View File
@@ -13,6 +13,7 @@
#define _PAGE_BIT_SWAP_TYPE 0
#define _PAGE_RO 0
#define _PAGE_SHARED 0
#define _PAGE_EXEC 0x00001 /* execute permission */
#define _PAGE_WRITE 0x00002 /* write access allowed */
@@ -37,21 +38,47 @@
#define _RPAGE_RSV3 0x0400000000000000UL
#define _RPAGE_RSV4 0x0200000000000000UL
#ifdef CONFIG_MEM_SOFT_DIRTY
#define _PAGE_SOFT_DIRTY _RPAGE_SW3 /* software: software dirty tracking */
#else
#define _PAGE_SOFT_DIRTY 0x00000
#endif
#define _PAGE_SPECIAL _RPAGE_SW2 /* software: special page */
#define _PAGE_PTE 0x4000000000000000UL /* distinguishes PTEs from pointers */
#define _PAGE_PRESENT 0x8000000000000000UL /* pte contains a translation */
/*
* For P9 DD1 only, we need to track whether the pte's huge.
* Top and bottom bits of RPN which can be used by hash
* translation mode, because we expect them to be zero
* otherwise.
*/
#define _PAGE_LARGE _RPAGE_RSV1
#define _RPAGE_RPN0 0x01000
#define _RPAGE_RPN1 0x02000
#define _RPAGE_RPN44 0x0100000000000000UL
#define _RPAGE_RPN43 0x0080000000000000UL
#define _RPAGE_RPN42 0x0040000000000000UL
#define _RPAGE_RPN41 0x0020000000000000UL
/* Max physical address bit as per radix table */
#define _RPAGE_PA_MAX 57
#define _PAGE_PTE (1ul << 62) /* distinguishes PTEs from pointers */
#define _PAGE_PRESENT (1ul << 63) /* pte contains a translation */
/*
* Max physical address bit we will use for now.
*
* This is mostly a hardware limitation and for now Power9 has
* a 51 bit limit.
*
* This is different from the number of physical bit required to address
* the last byte of memory. That is defined by MAX_PHYSMEM_BITS.
* MAX_PHYSMEM_BITS is a linux limitation imposed by the maximum
* number of sections we can support (SECTIONS_SHIFT).
*
* This is different from Radix page table limitation above and
* should always be less than that. The limit is done such that
* we can overload the bits between _RPAGE_PA_MAX and _PAGE_PA_MAX
* for hash linux page table specific bits.
*
* In order to be compatible with future hardware generations we keep
* some offsets and limit this for now to 53
*/
#define _PAGE_PA_MAX 53
#define _PAGE_SOFT_DIRTY _RPAGE_SW3 /* software: software dirty tracking */
#define _PAGE_SPECIAL _RPAGE_SW2 /* software: special page */
/*
* Drivers request for cache inhibited pte mapping using _PAGE_NO_CACHE
* Instead of fixing all of them, add an alternate define which
@@ -59,10 +86,11 @@
*/
#define _PAGE_NO_CACHE _PAGE_TOLERANT
/*
* We support 57 bit real address in pte. Clear everything above 57, and
* every thing below PAGE_SHIFT;
* We support _RPAGE_PA_MAX bit real address in pte. On the linux side
* we are limited by _PAGE_PA_MAX. Clear everything above _PAGE_PA_MAX
* and every thing below PAGE_SHIFT;
*/
#define PTE_RPN_MASK (((1UL << 57) - 1) & (PAGE_MASK))
#define PTE_RPN_MASK (((1UL << _PAGE_PA_MAX) - 1) & (PAGE_MASK))
/*
* set of bits not changed in pmd_modify. Even though we have hash specific bits
* in here, on radix we expect them to be zero.
@@ -205,10 +233,6 @@ extern unsigned long __pte_frag_nr;
extern unsigned long __pte_frag_size_shift;
#define PTE_FRAG_SIZE_SHIFT __pte_frag_size_shift
#define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
/*
* Pgtable size used by swapper, init in asm code
*/
#define MAX_PGD_TABLE_SIZE (sizeof(pgd_t) << RADIX_PGD_INDEX_SIZE)
#define PTRS_PER_PTE (1 << PTE_INDEX_SIZE)
#define PTRS_PER_PMD (1 << PMD_INDEX_SIZE)
+7 -1
View File
@@ -11,6 +11,12 @@
#include <asm/book3s/64/radix-4k.h>
#endif
/*
* For P9 DD1 only, we need to track whether the pte's huge.
*/
#define R_PAGE_LARGE _RPAGE_RSV1
#ifndef __ASSEMBLY__
#include <asm/book3s/64/tlbflush-radix.h>
#include <asm/cpu_has_feature.h>
@@ -252,7 +258,7 @@ static inline int radix__pmd_trans_huge(pmd_t pmd)
static inline pmd_t radix__pmd_mkhuge(pmd_t pmd)
{
if (cpu_has_feature(CPU_FTR_POWER9_DD1))
return __pmd(pmd_val(pmd) | _PAGE_PTE | _PAGE_LARGE);
return __pmd(pmd_val(pmd) | _PAGE_PTE | R_PAGE_LARGE);
return __pmd(pmd_val(pmd) | _PAGE_PTE);
}
static inline void radix__pmdp_huge_split_prepare(struct vm_area_struct *vma,

Some files were not shown because too many files have changed in this diff Show More