Pull x86 fixes from Ingo Molnar:
"Misc fixes and small updates all around the place:
- Fix mitigation state sysfs output
- Fix an FPU xstate/sxave code assumption bug triggered by
Architectural LBR support
- Fix Lightning Mountain SoC TSC frequency enumeration bug
- Fix kexec debug output
- Fix kexec memory range assumption bug
- Fix a boundary condition in the crash kernel code
- Optimize porgatory.ro generation a bit
- Enable ACRN guests to use X2APIC mode
- Reduce a __text_poke() IRQs-off critical section for the benefit of
PREEMPT_RT"
* tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternatives: Acquire pte lock with interrupts enabled
x86/bugs/multihit: Fix mitigation reporting when VMX is not in use
x86/fpu/xstate: Fix an xstate size check warning with architectural LBRs
x86/purgatory: Don't generate debug info for purgatory.ro
x86/tsr: Fix tsc frequency enumeration bug on Lightning Mountain SoC
kexec_file: Correctly output debugging information for the PT_LOAD ELF header
kexec: Improve & fix crash_exclude_mem_range() to handle overlapping ranges
x86/crash: Correct the address boundary of function parameters
x86/acrn: Remove redundant chars from ACRN signature
x86/acrn: Allow ACRN guest to use X2APIC mode
Pull perf fixes from Ingo Molnar:
"Misc fixes, an expansion of perf syscall access to CAP_PERFMON
privileged tools, plus a RAPL HW-enablement for Intel SPR platforms"
* tag 'perf-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/rapl: Add support for Intel SPR platform
perf/x86/rapl: Support multiple RAPL unit quirks
perf/x86/rapl: Fix missing psys sysfs attributes
hw_breakpoint: Remove unused __register_perf_hw_breakpoint() declaration
kprobes: Remove show_registers() function prototype
perf/core: Take over CAP_SYS_PTRACE creds to CAP_PERFMON capability
Pull timekeeping updates from Thomas Gleixner:
"A set of timekeeping/VDSO updates:
- Preparatory work to allow S390 to switch over to the generic VDSO
implementation.
S390 requires that the VDSO data pointer is handed in to the
counter read function when time namespace support is enabled.
Adding the pointer is a NOOP for all other architectures because
the compiler is supposed to optimize that out when it is unused in
the architecture specific inline. The change also solved a similar
problem for MIPS which fortunately has time namespaces not yet
enabled.
S390 needs to update clock related VDSO data independent of the
timekeeping updates. This was solved so far with yet another
sequence counter in the S390 implementation. A better solution is
to utilize the already existing VDSO sequence count for this. The
core code now exposes helper functions which allow to serialize
against the timekeeper code and against concurrent readers.
S390 needs extra data for their clock readout function. The initial
common VDSO data structure did not provide a way to add that. It
now has an embedded architecture specific struct embedded which
defaults to an empty struct.
Doing this now avoids tree dependencies and conflicts post rc1 and
allows all other architectures which work on generic VDSO support
to work from a common upstream base.
- A trivial comment fix"
* tag 'timers-urgent-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Delete repeated words in comments
lib/vdso: Allow to add architecture-specific vdso data
timekeeping/vsyscall: Provide vdso_update_begin/end()
vdso/treewide: Add vdso_data pointer argument to __arch_get_hw_counter()
Pull more timer updates from Thomas Gleixner:
"A set of posix CPU timer changes which allows to defer the heavy work
of posix CPU timers into task work context. The tick interrupt is
reduced to a quick check which queues the work which is doing the
heavy lifting before returning to user space or going back to guest
mode. Moving this out is deferring the signal delivery slightly but
posix CPU timers are inaccurate by nature as they depend on the tick
so there is no real damage. The relevant test cases all passed.
This lifts the last offender for RT out of the hard interrupt context
tick handler, but it also has the general benefit that the actual
heavy work is accounted to the task/process and not to the tick
interrupt itself.
Further optimizations are possible to break long sighand lock hold and
interrupt disabled (on !RT kernels) times when a massive amount of
posix CPU timers (which are unpriviledged) is armed for a
task/process.
This is currently only enabled for x86 because the architecture has to
ensure that task work is handled in KVM before entering a guest, which
was just established for x86 with the new common entry/exit code which
got merged post 5.8 and is not the case for other KVM architectures"
* tag 'timers-core-2020-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Select POSIX_CPU_TIMERS_TASK_WORK
posix-cpu-timers: Provide mechanisms to defer timer handling to task_work
posix-cpu-timers: Split run_posix_cpu_timers()
Pull more xen updates from Juergen Gross:
- Remove support for running as 32-bit Xen PV-guest.
32-bit PV guests are rarely used, are lacking security fixes for
Meltdown, and can be easily replaced by PVH mode. Another series for
doing more cleanup will follow soon (removal of 32-bit-only pvops
functionality).
- Fixes and additional features for the Xen display frontend driver.
* tag 'for-linus-5.9-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
drm/xen-front: Pass dumb buffer data offset to the backend
xen: Sync up with the canonical protocol definition in Xen
drm/xen-front: Add YUYV to supported formats
drm/xen-front: Fix misused IS_ERR_OR_NULL checks
xen/gntdev: Fix dmabuf import with non-zero sgt offset
x86/xen: drop tests for highmem in pv code
x86/xen: eliminate xen-asm_64.S
x86/xen: remove 32-bit Xen PV guest support
Pull hyper-v fixes from Wei Liu:
- fix oops reporting on Hyper-V
- make objtool happy
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Make hv_setup_sched_clock inline
Drivers: hv: vmbus: Only notify Hyper-V for die events that are oops
syzbot found its way in 86_fsgsbase_read_task() and triggered this oops:
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 6866 Comm: syz-executor262 Not tainted 5.8.0-syzkaller #0
RIP: 0010:x86_fsgsbase_read_task+0x16d/0x310 arch/x86/kernel/process_64.c:393
Call Trace:
putreg32+0x3ab/0x530 arch/x86/kernel/ptrace.c:876
genregs32_set arch/x86/kernel/ptrace.c:1026 [inline]
genregs32_set+0xa4/0x100 arch/x86/kernel/ptrace.c:1006
copy_regset_from_user include/linux/regset.h:326 [inline]
ia32_arch_ptrace arch/x86/kernel/ptrace.c:1061 [inline]
compat_arch_ptrace+0x36c/0xd90 arch/x86/kernel/ptrace.c:1198
__do_compat_sys_ptrace kernel/ptrace.c:1420 [inline]
__se_compat_sys_ptrace kernel/ptrace.c:1389 [inline]
__ia32_compat_sys_ptrace+0x220/0x2f0 kernel/ptrace.c:1389
do_syscall_32_irqs_on arch/x86/entry/common.c:84 [inline]
__do_fast_syscall_32+0x57/0x80 arch/x86/entry/common.c:126
do_fast_syscall_32+0x2f/0x70 arch/x86/entry/common.c:149
entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
This can happen if ptrace() or sigreturn() pokes an LDT selector into FS
or GS for a task with no LDT and something tries to read the base before
a return to usermode notices the bad selector and fixes it.
The fix is to make sure ldt pointer is not NULL.
Fixes: 07e1d88ada ("x86/fsgsbase/64: Fix ptrace() to read the FS/GS base accurately")
Co-developed-by: Jann Horn <jannh@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pte lock is never acquired in-IRQ context so it does not require interrupts
to be disabled. The lock is a regular spinlock which cannot be acquired
with interrupts disabled on RT.
RT complains about pte_lock() in __text_poke() because it's invoked after
disabling interrupts.
__text_poke() has to disable interrupts as use_temporary_mm() expects
interrupts to be off because it invokes switch_mm_irqs_off() and uses
per-CPU (current active mm) data.
Move the PTE lock handling outside the interrupt disabled region.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by; Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20200813105026.bvugytmsso6muljw@linutronix.de
Pull more KVM updates from Paolo Bonzini:
"PPC:
- Improvements and bugfixes for secure VM support, giving reduced
startup time and memory hotplug support.
- Locking fixes in nested KVM code
- Increase number of guests supported by HV KVM to 4094
- Preliminary POWER10 support
ARM:
- Split the VHE and nVHE hypervisor code bases, build the EL2 code
separately, allowing for the VHE code to now be built with
instrumentation
- Level-based TLB invalidation support
- Restructure of the vcpu register storage to accomodate the NV code
- Pointer Authentication available for guests on nVHE hosts
- Simplification of the system register table parsing
- MMU cleanups and fixes
- A number of post-32bit cleanups and other fixes
MIPS:
- compilation fixes
x86:
- bugfixes
- support for the SERIALIZE instruction"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (70 commits)
KVM: MIPS/VZ: Fix build error caused by 'kvm_run' cleanup
x86/kvm/hyper-v: Synic default SCONTROL MSR needs to be enabled
MIPS: KVM: Convert a fallthrough comment to fallthrough
MIPS: VZ: Only include loongson_regs.h for CPU_LOONGSON64
x86: Expose SERIALIZE for supported cpuid
KVM: x86: Don't attempt to load PDPTRs when 64-bit mode is enabled
KVM: arm64: Move S1PTW S2 fault logic out of io_mem_abort()
KVM: arm64: Don't skip cache maintenance for read-only memslots
KVM: arm64: Handle data and instruction external aborts the same way
KVM: arm64: Rename kvm_vcpu_dabt_isextabt()
KVM: arm: Add trace name for ARM_NISV
KVM: arm64: Ensure that all nVHE hyp code is in .hyp.text
KVM: arm64: Substitute RANDOMIZE_BASE for HARDEN_EL2_VECTORS
KVM: arm64: Make nVHE ASLR conditional on RANDOMIZE_BASE
KVM: PPC: Book3S HV: Rework secure mem slot dropping
KVM: PPC: Book3S HV: Move kvmppc_svm_page_out up
KVM: PPC: Book3S HV: Migrate hot plugged memory
KVM: PPC: Book3S HV: In H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs
KVM: PPC: Book3S HV: Track the state GFNs associated with secure VMs
KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START
...
Merge more updates from Andrew Morton:
- most of the rest of MM (memcg, hugetlb, vmscan, proc, compaction,
mempolicy, oom-kill, hugetlbfs, migration, thp, cma, util,
memory-hotplug, cleanups, uaccess, migration, gup, pagemap),
- various other subsystems (alpha, misc, sparse, bitmap, lib, bitops,
checkpatch, autofs, minix, nilfs, ufs, fat, signals, kmod, coredump,
exec, kdump, rapidio, panic, kcov, kgdb, ipc).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (164 commits)
mm/gup: remove task_struct pointer for all gup code
mm: clean up the last pieces of page fault accountings
mm/xtensa: use general page fault accounting
mm/x86: use general page fault accounting
mm/sparc64: use general page fault accounting
mm/sparc32: use general page fault accounting
mm/sh: use general page fault accounting
mm/s390: use general page fault accounting
mm/riscv: use general page fault accounting
mm/powerpc: use general page fault accounting
mm/parisc: use general page fault accounting
mm/openrisc: use general page fault accounting
mm/nios2: use general page fault accounting
mm/nds32: use general page fault accounting
mm/mips: use general page fault accounting
mm/microblaze: use general page fault accounting
mm/m68k: use general page fault accounting
mm/ia64: use general page fault accounting
mm/hexagon: use general page fault accounting
mm/csky: use general page fault accounting
...
Some of our servers spend significant time at kernel boot initializing
memory block sysfs directories and then creating symlinks between them and
the corresponding nodes. The slowness happens because the machines get
stuck with the smallest supported memory block size on x86 (128M), which
results in 16,288 directories to cover the 2T of installed RAM. The
search for each memory block is noticeable even with commit 4fb6eabf10
("drivers/base/memory.c: cache memory blocks in xarray to accelerate
lookup").
Commit 078eb6aa50 ("x86/mm/memory_hotplug: determine block size based on
the end of boot memory") chooses the block size based on alignment with
memory end. That addresses hotplug failures in qemu guests, but for bare
metal systems whose memory end isn't aligned to even the smallest size, it
leaves them at 128M.
Make kernels that aren't running on a hypervisor use the largest supported
size (2G) to minimize overhead on big machines. Kernel boot goes 7%
faster on the aforementioned servers, shaving off half a second.
[daniel.m.jordan@oracle.com: v3]
Link: http://lkml.kernel.org/r/20200714205450.945834-1-daniel.m.jordan@oracle.com
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200609225451.3542648-1-daniel.m.jordan@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull virtio updates from Michael Tsirkin:
- IRQ bypass support for vdpa and IFC
- MLX5 vdpa driver
- Endianness fixes for virtio drivers
- Misc other fixes
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (71 commits)
vdpa/mlx5: fix up endian-ness for mtu
vdpa: Fix pointer math bug in vdpasim_get_config()
vdpa/mlx5: Fix pointer math in mlx5_vdpa_get_config()
vdpa/mlx5: fix memory allocation failure checks
vdpa/mlx5: Fix uninitialised variable in core/mr.c
vdpa_sim: init iommu lock
virtio_config: fix up warnings on parisc
vdpa/mlx5: Add VDPA driver for supported mlx5 devices
vdpa/mlx5: Add shared memory registration code
vdpa/mlx5: Add support library for mlx5 VDPA implementation
vdpa/mlx5: Add hardware descriptive header file
vdpa: Modify get_vq_state() to return error code
net/vdpa: Use struct for set/get vq state
vdpa: remove hard coded virtq num
vdpasim: support batch updating
vhost-vdpa: support IOTLB batching hints
vhost-vdpa: support get/set backend features
vhost: generialize backend features setting/getting
vhost-vdpa: refine ioctl pre-processing
vDPA: dont change vq irq after DRIVER_OK
...
Pull iommu updates from Joerg Roedel:
- Remove of the dev->archdata.iommu (or similar) pointers from most
architectures. Only Sparc is left, but this is private to Sparc as
their drivers don't use the IOMMU-API.
- ARM-SMMU updates from Will Deacon:
- Support for SMMU-500 implementation in Marvell Armada-AP806 SoC
- Support for SMMU-500 implementation in NVIDIA Tegra194 SoC
- DT compatible string updates
- Remove unused IOMMU_SYS_CACHE_ONLY flag
- Move ARM-SMMU drivers into their own subdirectory
- Intel VT-d updates from Lu Baolu:
- Misc tweaks and fixes for vSVA
- Report/response page request events
- Cleanups
- Move the Kconfig and Makefile bits for the AMD and Intel drivers into
their respective subdirectory.
- MT6779 IOMMU Support
- Support for new chipsets in the Renesas IOMMU driver
- Other misc cleanups and fixes (e.g. to improve compile test coverage)
* tag 'iommu-updates-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (77 commits)
iommu/amd: Move Kconfig and Makefile bits down into amd directory
iommu/vt-d: Move Kconfig and Makefile bits down into intel directory
iommu/arm-smmu: Move Arm SMMU drivers into their own subdirectory
iommu/vt-d: Skip TE disabling on quirky gfx dedicated iommu
iommu: Add gfp parameter to io_pgtable_ops->map()
iommu: Mark __iommu_map_sg() as static
iommu/vt-d: Rename intel-pasid.h to pasid.h
iommu/vt-d: Add page response ops support
iommu/vt-d: Report page request faults for guest SVA
iommu/vt-d: Add a helper to get svm and sdev for pasid
iommu/vt-d: Refactor device_to_iommu() helper
iommu/vt-d: Disable multiple GPASID-dev bind
iommu/vt-d: Warn on out-of-range invalidation address
iommu/vt-d: Fix devTLB flush for vSVA
iommu/vt-d: Handle non-page aligned address
iommu/vt-d: Fix PASID devTLB invalidation
iommu/vt-d: Remove global page support in devTLB flush
iommu/vt-d: Enforce PASID devTLB field mask
iommu: Make some functions static
iommu/amd: Remove double zero check
...
With support for 32-bit pv guests gone pure pv-code no longer needs to
test for highmem. Dropping those tests removes the need for flushing
in some places.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
With 32-bit pv-guest support removed xen-asm_64.S can be merged with
xen-asm.S
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Xen is requiring 64-bit machines today and since Xen 4.14 it can be
built without 32-bit PV guest support. There is no need to carry the
burden of 32-bit PV guest support in the kernel any longer, as new
guests can be either HVM or PVH, or they can use a 64 bit kernel.
Remove the 32-bit Xen PV support from the kernel.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>