Pull x86 paravirt updates from Ingo Molnar:
"Two main changes:
- Remove no longer used parts of the paravirt infrastructure and put
large quantities of paravirt ops under a new config option
PARAVIRT_XXL=y, which is selected by XEN_PV only. (Joergen Gross)
- Enable PV spinlocks on Hyperv (Yi Sun)"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyperv: Enable PV qspinlock for Hyper-V
x86/hyperv: Add GUEST_IDLE_MSR support
x86/paravirt: Clean up native_patch()
x86/paravirt: Prevent redefinition of SAVE_FLAGS macro
x86/xen: Make xen_reservation_lock static
x86/paravirt: Remove unneeded mmu related paravirt ops bits
x86/paravirt: Move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella
x86/paravirt: Move items in pv_info under PARAVIRT_XXL umbrella
x86/paravirt: Introduce new config option PARAVIRT_XXL
x86/paravirt: Remove unused paravirt bits
x86/paravirt: Use a single ops structure
x86/paravirt: Remove clobbers from struct paravirt_patch_site
x86/paravirt: Remove clobbers parameter from paravirt patch functions
x86/paravirt: Make paravirt_patch_call() and paravirt_patch_jmp() static
x86/xen: Add SPDX identifier in arch/x86/xen files
x86/xen: Link platform-pci-unplug.o only if CONFIG_XEN_PVHVM
x86/xen: Move pv specific parts of arch/x86/xen/mmu.c to mmu_pv.c
x86/xen: Move pv irq related functions under CONFIG_XEN_PV umbrella
Merge -rc6 in, for two reasons:
1) Resolve a trivial conflict in the blk-mq-tag.c documentation
2) A few important regression fixes went into upstream directly, so
they aren't in the 4.20 branch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* tag 'v4.19-rc6': (780 commits)
Linux 4.19-rc6
MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c
cpufreq: qcom-kryo: Fix section annotations
perf/core: Add sanity check to deal with pinned event failure
xen/blkfront: correct purging of persistent grants
Revert "xen/blkfront: When purging persistent grants, keep them in the buffer"
selftests/powerpc: Fix Makefiles for headers_install change
blk-mq: I/O and timer unplugs are inverted in blktrace
dax: Fix deadlock in dax_lock_mapping_entry()
x86/boot: Fix kexec booting failure in the SEV bit detection code
bcache: add separate workqueue for journal_write to avoid deadlock
drm/amd/display: Fix Edid emulation for linux
drm/amd/display: Fix Vega10 lightup on S3 resume
drm/amdgpu: Fix vce work queue was not cancelled when suspend
Revert "drm/panel: Add device_link from panel device to DRM device"
xen/blkfront: When purging persistent grants, keep them in the buffer
clocksource/drivers/timer-atmel-pit: Properly handle error cases
block: fix deadline elevator drain for zoned block devices
ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Having multiple externs in arch headers is not a good way to provide
a common interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Scrubbing pages on initial balloon down can take some time, especially
in nested virtualization case (nested EPT is slow). When HVM/PVH guest is
started with memory= significantly lower than maxmem=, all the extra
pages will be scrubbed before returning to Xen. But since most of them
weren't used at all at that point, Xen needs to populate them first
(from populate-on-demand pool). In nested virt case (Xen inside KVM)
this slows down the guest boot by 15-30s with just 1.5GB needed to be
returned to Xen.
Add runtime parameter to enable/disable it, to allow initially disabling
scrubbing, then enable it back during boot (for example in initramfs).
Such usage relies on assumption that a) most pages ballooned out during
initial boot weren't used at all, and b) even if they were, very few
secrets are in the guest at that time (before any serious userspace
kicks in).
Convert CONFIG_XEN_SCRUB_PAGES to CONFIG_XEN_SCRUB_PAGES_DEFAULT (also
enabled by default), controlling default value for the new runtime
switch.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Pull input updates from Dmitry Torokhov:
- a new driver for Rohm BU21029 touch controller
- new bitmap APIs: bitmap_alloc, bitmap_zalloc and bitmap_free
- updates to Atmel, eeti. pxrc and iforce drivers
- assorted driver cleanups and fixes.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (57 commits)
MAINTAINERS: Add PhoenixRC Flight Controller Adapter
Input: do not use WARN() in input_alloc_absinfo()
Input: mark expected switch fall-throughs
Input: raydium_i2c_ts - use true and false for boolean values
Input: evdev - switch to bitmap API
Input: gpio-keys - switch to bitmap_zalloc()
Input: elan_i2c_smbus - cast sizeof to int for comparison
bitmap: Add bitmap_alloc(), bitmap_zalloc() and bitmap_free()
md: Avoid namespace collision with bitmap API
dm: Avoid namespace collision with bitmap API
Input: pm8941-pwrkey - add resin entry
Input: pm8941-pwrkey - abstract register offsets and event code
Input: iforce - reorganize joystick configuration lists
Input: atmel_mxt_ts - move completion to after config crc is updated
Input: atmel_mxt_ts - don't report zero pressure from T9
Input: atmel_mxt_ts - zero terminate config firmware file
Input: atmel_mxt_ts - refactor config update code to add context struct
Input: atmel_mxt_ts - config CRC may start at T71
Input: atmel_mxt_ts - remove unnecessary debug on ENOMEM
Input: atmel_mxt_ts - remove duplicate setup of ABS_MT_PRESSURE
...
Extend grant table module API to allow allocating buffers that can
be used for DMA operations and mapping foreign grant references
on top of those.
The resulting buffer is similar to the one allocated by the balloon
driver in that proper memory reservation is made by
({increase|decrease}_reservation and VA mappings are updated if
needed).
This is useful for sharing foreign buffers with HW drivers which
cannot work with scattered buffers provided by the balloon driver,
but require DMAable memory instead.
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Memory {increase|decrease}_reservation and VA mappings update/reset
code used in balloon driver can be made common, so other drivers can
also re-use the same functionality without open-coding.
Create a dedicated file for the shared code and export corresponding
symbols for other kernel modules.
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Pull xen fixes from Juergen Gross:
"This contains the following fixes/cleanups:
- the removal of a BUG_ON() which wasn't necessary and which could
trigger now due to a recent change
- a correction of a long standing bug happening very rarely in Xen
dom0 when a hypercall buffer from user land was not accessible by
the hypervisor for very short periods of time due to e.g. page
migration or compaction
- usage of EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL() in a
Xen-related driver (no breakage possible as using those symbols
without others already exported via EXPORT-SYMBOL_GPL() wouldn't
make any sense)
- a simplification for Xen PVH or Xen ARM guests
- some additional error handling for callers of xenbus_printf()"
* tag 'for-linus-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Remove unnecessary BUG_ON from __unbind_from_irq()
xen: add new hypercall buffer mapping device
xen/scsiback: add error handling for xenbus_printf
scsi: xen-scsifront: add error handling for xenbus_printf
xen/grant-table: Export gnttab_{alloc|free}_pages as GPL
xen: add error handling for xenbus_printf
xen: share start flags between PV and PVH
Use a global variable to store the start flags for both PV and PVH.
This allows the xen_initial_domain macro to work properly on PVH.
Note that ARM is also switched to use the new variable.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
This is the sync up with the canonical definitions of the input,
sound and display protocols in Xen.
Changes to kbdif:
1. Add missing string constants for {feature|request}-raw-pointer
to align with the rest of the interface file.
2. Add new XenStore feature fields, so it is possible to individually
control set of exposed virtual devices for each guest OS:
- set feature-disable-keyboard to 1 if no keyboard device needs
to be created
- set feature-disable-pointer to 1 if no pointer device needs
to be created
3. Move multi-touch device parameters to backend nodes: these are
described as a part of frontend's XenBus configuration nodes
while they belong to backend's configuration. Fix this by moving
the parameters to the proper section.
Unique-id field:
1. Add unique-id XenBus entry for virtual input and display.
2. Change type of unique-id field to string for sndif to align with
display and input protocols.
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
My recent Xen patch series introduces a new HYPERVISOR_memory_op to
support direct priv-mapping of certain guest resources (such as ioreq
pages, used by emulators) by a tools domain, rather than having to access
such resources via the guest P2M.
This patch adds the necessary infrastructure to the privcmd driver and
Xen MMU code to support direct resource mapping.
NOTE: The adjustment in the MMU code is partially cosmetic. Xen will now
allow a PV tools domain to map guest pages either by GFN or MFN, thus
the term 'mfn' has been swapped for 'pfn' in the lower layers of the
remap code.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
This is the sync up with the canonical definition of the sound
protocol in Xen:
1. Protocol version was referenced in the protocol description,
but missed its definition. Fixed by adding a constant
for current protocol version.
2. Some of the request descriptions have "reserved" fields
missed: fixed by adding corresponding entries.
3. Extend the size of the requests and responses to 64 octets.
Bump protocol version to 2.
4. Add explicit back and front synchronization
In order to provide explicit synchronization between backend and
frontend the following changes are introduced in the protocol:
- add new ring buffer for sending asynchronous events from
backend to frontend to report number of bytes played by the
frontend (XENSND_EVT_CUR_POS)
- introduce trigger events for playback control: start/stop/pause/resume
- add "req-" prefix to event-channel and ring-ref to unify naming
of the Xen event channels for requests and events
5. Add explicit back and front parameter negotiation
In order to provide explicit stream parameter negotiation between
backend and frontend the following changes are introduced in the protocol:
add XENSND_OP_HW_PARAM_QUERY request to read/update
configuration space for the parameters given: request passes
desired parameter's intervals/masks and the response to this request
returns allowed min/max intervals/masks to be used.
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Pre-4.17 kernels ignored start_info's rsdp_paddr pointer and instead
relied on finding RSDP in standard location in BIOS RO memory. This
has worked since that's where Xen used to place it.
However, with recent Xen change (commit 4a5733771e6f ("libxl: put RSDP
for PVH guest near 4GB")) it prefers to keep RSDP at a "non-standard"
address. Even though as of commit b17d9d1df3 ("x86/xen: Add pvh
specific rsdp address retrieval function") Linux is able to find RSDP,
for back-compatibility reasons we need to indicate to Xen that we can
handle this, an we do so by setting XENFEAT_linux_rsdp_unrestricted
flag in ELF notes.
(Also take this opportunity and sync features.h header file with Xen)
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Pull xen fixes from Juergen Gross:
"This contains two fixes for running under Xen:
- a fix avoiding resource conflicts between adding mmio areas and
memory hotplug
- a fix setting NX bits in page table entries copied from Xen when
running a PV guest"
* tag 'for-linus-4.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: Mark unallocated host memory as UNUSABLE
x86-64/Xen: eliminate W+X mappings
Commit f5775e0b61 ("x86/xen: discard RAM regions above the maximum
reservation") left host memory not assigned to dom0 as available for
memory hotplug.
Unfortunately this also meant that those regions could be used by
others. Specifically, commit fa564ad963 ("x86/PCI: Enable a 64bit BAR
on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)") may try to map those
addresses as MMIO.
To prevent this mark unallocated host memory as E820_TYPE_UNUSABLE (thus
effectively reverting f5775e0b61) and keep track of that region as
a hostmem resource that can be used for the hotplug.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Pull xen updates from Juergen Gross:
"Xen features and fixes for v4.15-rc1
Apart from several small fixes it contains the following features:
- a series by Joao Martins to add vdso support of the pv clock
interface
- a series by Juergen Gross to add support for Xen pv guests to be
able to run on 5 level paging hosts
- a series by Stefano Stabellini adding the Xen pvcalls frontend
driver using a paravirtualized socket interface"
* tag 'for-linus-4.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (34 commits)
xen/pvcalls: fix potential endless loop in pvcalls-front.c
xen/pvcalls: Add MODULE_LICENSE()
MAINTAINERS: xen, kvm: track pvclock-abi.h changes
x86/xen/time: setup vcpu 0 time info page
x86/xen/time: set pvclock flags on xen_time_init()
x86/pvclock: add setter for pvclock_pvti_cpu0_va
ptp_kvm: probe for kvm guest availability
xen/privcmd: remove unused variable pageidx
xen: select grant interface version
xen: update arch/x86/include/asm/xen/cpuid.h
xen: add grant interface version dependent constants to gnttab_ops
xen: limit grant v2 interface to the v1 functionality
xen: re-introduce support for grant v2 interface
xen: support priv-mapping in an HVM tools domain
xen/pvcalls: remove redundant check for irq >= 0
xen/pvcalls: fix unsigned less than zero error check
xen/time: Return -ENODEV from xen_get_wallclock()
xen/pvcalls-front: mark expected switch fall-through
xen: xenbus_probe_frontend: mark expected switch fall-throughs
xen/time: do not decrease steal time after live migration on xen
...
In order to support pvclock vdso on xen we need to setup the time
info page for vcpu 0 and register the page with Xen using the
VCPUOP_register_vcpu_time_memory_area hypercall. This hypercall
will also forcefully update the pvti which will set some of the
necessary flags for vdso. Afterwards we check if it supports the
PVCLOCK_TSC_STABLE_BIT flag which is mandatory for having
vdso/vsyscall support. And if so, it will set the cpu 0 pvti that
will be later on used when mapping the vdso image.
The xen headers are also updated to include the new hypercall for
registering the secondary vcpu_time_info struct.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
As there is currently no user for sub-page grants or transient grants
remove that functionality. This at once makes it possible to switch
from grant v2 to grant v1 without restrictions, as there is no loss of
functionality other than the limited frame number width related to
the switch.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
The grant v2 support was removed from the kernel with
commit 438b33c714 ("xen/grant-table:
remove support for V2 tables") as the higher memory footprint of v2
grants resulted in less grants being possible for a kernel compared
to the v1 grant interface.
As machines with more than 16TB of memory are expected to be more
common in the near future support of grant v2 is mandatory in order
to be able to run a Xen pv domain at any memory location.
So re-add grant v2 support basically by reverting above commit.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
If the domain has XENFEAT_auto_translated_physmap then use of the PV-
specific HYPERVISOR_mmu_update hypercall is clearly incorrect.
This patch adds checks in xen_remap_domain_gfn_array() and
xen_unmap_domain_gfn_array() which call through to the approprate
xlate_mmu function if the feature is present. A check is also added
to xen_remap_domain_gfn_range() to fail with -EOPNOTSUPP since this
should not be used in an HVM tools domain.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
After guest live migration on xen, steal time in /proc/stat
(cpustat[CPUTIME_STEAL]) might decrease because steal returned by
xen_steal_lock() might be less than this_rq()->prev_steal_time which is
derived from previous return value of xen_steal_clock().
For instance, steal time of each vcpu is 335 before live migration.
cpu 198 0 368 200064 1962 0 0 1340 0 0
cpu0 38 0 81 50063 492 0 0 335 0 0
cpu1 65 0 97 49763 634 0 0 335 0 0
cpu2 38 0 81 50098 462 0 0 335 0 0
cpu3 56 0 107 50138 374 0 0 335 0 0
After live migration, steal time is reduced to 312.
cpu 200 0 370 200330 1971 0 0 1248 0 0
cpu0 38 0 82 50123 500 0 0 312 0 0
cpu1 65 0 97 49832 634 0 0 312 0 0
cpu2 39 0 82 50167 462 0 0 312 0 0
cpu3 56 0 107 50207 374 0 0 312 0 0
Since runstate times are cumulative and cleared during xen live migration
by xen hypervisor, the idea of this patch is to accumulate runstate times
to global percpu variables before live migration suspend. Once guest VM is
resumed, xen_get_runstate_snapshot_cpu() would always return the sum of new
runstate times and previously accumulated times stored in global percpu
variables.
Comment above HYPERVISOR_suspend() has been removed as it is inaccurate:
the call can return an error code (e.g., possibly -EPERM in the future).
Similar and more severe issue would impact prior linux 4.8-4.10 as
discussed by Michael Las at
https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest,
which would overflow steal time and lead to 100% st usage in top command
for linux 4.8-4.10. A backport of this patch would fix that issue.
[boris: added linux/slab.h to driver/xen/time.c, slightly reformatted
commit message]
References: https://0xstubs.org/debugging-a-flaky-cpu-steal-time-counter-on-a-paravirtualized-xen-guest
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>