Originally Xen PV drivers only use single-page ring to pass along
information. This might limit the throughput between frontend and
backend.
The patch extends Xenbus driver to support multi-page ring, which in
general should improve throughput if ring is the bottleneck. Changes to
various frontend / backend to adapt to the new interface are also
included.
Affected Xen drivers:
* blkfront/back
* netfront/back
* pcifront/back
* scsifront/back
* vtpmfront
The interface is documented, as before, in xenbus_client.c.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Make the IOCTL_PRIVCMD_MMAPBATCH_V2 (and older V1 version) map
multiple frames at a time rather than one at a time, despite the pages
being non-consecutive GFNs.
xen_remap_foreign_mfn_array() is added which maps an array of GFNs
(instead of a consecutive range of GFNs).
Since per-frame errors are returned in an array, privcmd must set the
MMAPBATCH_V1 error bits as part of the "report errors" phase, after
all the frames are mapped.
Migrate times are significantly improved (when using a PV toolstack
domain). For example, for an idle 12 GiB PV guest:
Before After
real 0m38.179s 0m26.868s
user 0m15.096s 0m13.652s
sys 0m28.988s 0m18.732s
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Auto-translated physmap guests (arm, arm64 and x86 PVHVM/PVH) map and
unmap foreign GFNs using the same method (updating the physmap).
Unify the two arm and x86 implementations into one commont one.
Note that on arm and arm64, the correct error code will be returned
(instead of always -EFAULT) and map/unmap failure warnings are no
longer printed. These changes are required if the foreign domain is
paging (-ENOENT failures are expected and must be propagated up to the
caller).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
The header include/xen/interface/xen.h doesn't contain all definitions
from Xen's version of that header. Update it accordingly.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Hypercalls submitted by user space tools via the privcmd driver can
take a long time (potentially many 10s of seconds) if the hypercall
has many sub-operations.
A fully preemptible kernel may deschedule such as task in any upcall
called from a hypercall continuation.
However, in a kernel with voluntary or no preemption, hypercall
continuations in Xen allow event handlers to be run but the task
issuing the hypercall will not be descheduled until the hypercall is
complete and the ioctl returns to user space. These long running
tasks may also trigger the kernel's soft lockup detection.
Add xen_preemptible_hcall_begin() and xen_preemptible_hcall_end() to
bracket hypercalls that may be preempted. Use these in the privcmd
driver.
When returning from an upcall, call xen_maybe_preempt_hcall() which
adds a schedule point if if the current task was within a preemptible
hypercall.
Since _cond_resched() can move the task to a different CPU, clear and
set xen_in_preemptible_hcall around the call.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Pull networking updates from David Miller:
1) More iov_iter conversion work from Al Viro.
[ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
wrong, and this pull actually adds an extra commit on top of the
branch I'm pulling to fix that up, so that the pre-merge state is
ok. - Linus ]
2) Various optimizations to the ipv4 forwarding information base trie
lookup implementation. From Alexander Duyck.
3) Remove sock_iocb altogether, from CHristoph Hellwig.
4) Allow congestion control algorithm selection via routing metrics.
From Daniel Borkmann.
5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.
6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.
7) Add xmit_more support to r8169, e1000, and e1000e drivers. From
Florian Westphal.
8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.
9) Add BPF packet actions to packet scheduler, from Jiri Pirko.
10) Add support for uniqu flow IDs to openvswitch, from Joe Stringer.
11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
Kwok.
12) More sanely handle out-of-window dupacks, which can result in
serious ACK storms. From Neal Cardwell.
13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
Patrick McHardy, and Thomas Graf.
14) Support xmit_more in be2net, from Sathya Perla.
15) Group Policy extensions for vxlan, from Thomas Graf.
16) Remove Checksum Offload support for vxlan, from Tom Herbert.
17) Like ipv4, support lockless transmit over ipv6 UDP sockets. From
Vlad Yasevich.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
crypto: fix af_alg_make_sg() conversion to iov_iter
ipv4: Namespecify TCP PMTU mechanism
i40e: Fix for stats init function call in Rx setup
tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
ipv6: Make __ipv6_select_ident static
ipv6: Fix fragment id assignment on LE arches.
bridge: Fix inability to add non-vlan fdb entry
net: Mellanox: Delete unnecessary checks before the function call "vunmap"
cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
net: dsa: Remove redundant phy_attach()
IB/mlx4: Reset flow support for IB kernel ULPs
IB/mlx4: Always use the correct port for mirrored multicast attachments
net/bonding: Fix potential bad memory access during bonding events
tipc: remove tipc_snprintf
tipc: nl compat add noop and remove legacy nl framework
tipc: convert legacy nl stats show to nl compat
tipc: convert legacy nl net id get to nl compat
tipc: convert legacy nl net id set to nl compat
...
In an x86 PV guest, get_user_pages_fast() on a userspace address range
containing foreign mappings does not work correctly because the M2P
lookup of the MFN from a userspace PTE may return the wrong page.
Force get_user_pages_fast() to fail on such addresses by marking the PTEs
as special.
If Xen has XENFEAT_gnttab_map_avail_bits (available since at least
4.0), we can do so efficiently in the grant map hypercall. Otherwise,
it needs to be done afterwards. This is both inefficient and racy
(the mapping is visible to the task before we fixup the PTEs), but
will be fine for well-behaved applications that do not use the mapping
until after the mmap() system call returns.
Guests with XENFEAT_auto_translated_physmap (ARM and x86 HVM or PVH)
do not need this since get_user_pages() has always worked correctly
for them.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Introduce gnttab_unmap_refs_async() that can be used to safely unmap
pages that may be in use (ref count > 1). If the pages are in use the
unmap is deferred and retried later. This polling is not very clever
but it should be good enough if the cases where the delay is necessary
are rare.
The initial delay is 5 ms and is increased linearly on each subsequent
retry (to reduce load if the page is in use for a long time).
This is needed to allow block backends using grant mapping to safely
use network storage (block or filesystem based such as iSCSI or NFS).
The network storage driver may complete a block request whilst there
is a queued network packet retry (because the ack from the remote end
races with deciding to queue the retry). The pages for the retried
packet would be grant unmapped and the network driver (or hardware)
would access the unmapped page.
Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Use the "foreign" page flag to mark pages that have a grant map. Use
page->private to store information of the grant (the granting domain
and the grant reference).
Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Add gnttab_alloc_pages() and gnttab_free_pages() to allocate/free pages
suitable to for granted maps.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
When unmapping grants, instead of converting the kernel map ops to
unmap ops on the fly, pre-populate the set of unmap ops.
This allows the grant unmap for the kernel mappings to be trivially
batched in the future.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Conflicts:
drivers/net/xen-netfront.c
Minor overlapping changes in xen-netfront.c, mostly to do
with some buffer management changes alongside the split
of stats into TX and RX.
Signed-off-by: David S. Miller <davem@davemloft.net>
pfn_to_mfn(page_to_pfn(p)) is a common use case so add a generic
helper for it.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using the native code here can't work properly, as the hypervisor would
normally have cleared the two reason bits by the time Dom0 gets to see
the NMI (if passed to it at all). There's a shared info field for this,
and there's an existing hook to use - just fit the two together. This
is particularly relevant so that NMIs intended to be handled by APEI /
GHES actually make it to the respective handler.
Note that the hook can (and should) be used irrespective of whether
being in Dom0, as accessing port 0x61 in a DomU would be even worse,
while the shared info field would just hold zero all the time. Note
further that hardware NMI handling for PVH doesn't currently work
anyway due to missing code in the hypervisor (but it is expected to
work the native rather than the PV way).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Introduce support for new hypercall GNTTABOP_cache_flush.
Use it to perform cache flashing on pages used for dma when necessary.
If GNTTABOP_cache_flush is supported by the hypervisor, we don't need to
bounce dma map operations that involve foreign grants and non-coherent
devices.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
The DEFINE_XENBUS_DRIVER() macro looks a bit weird and causes sparse
errors.
Replace the uses with standard structure definitions instead. This is
similar to pci and usb device registration.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
To be able to use an initially unmapped initrd with xen the following
header files must be synced to a newer version from the xen tree:
include/xen/interface/elfnote.h
include/xen/interface/xen.h
As the KEXEC and DUMPCORE related ELFNOTES are not relevant for the
kernel they are omitted from elfnote.h.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Add the definition of pvSCSI protocol used between the pvSCSI frontend
in a XEN domU and the pvSCSI backend in a XEN driver domain (usually
Dom0).
This header was originally provided by Fujitsu for Xen based on Linux
2.6.18. Changes are:
- Added comments.
- Adapt to Linux style guide.
- Add support for larger SG-lists by putting them in an own granted
page.
- Remove stale definitions.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
The flag tells us that the hypervisor maps a grant page to guest
physical address == machine address of the page in addition to the
normal grant mapping address. It is needed to properly issue cache
maintenance operation at the completion of a DMA operation involving a
foreign grant.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: Denis Schneider <v1ne2go@gmail.com>
Pull Xen updates from David Vrabel:
- remove unused V2 grant table support
- note that Konrad is xen-blkkback/front maintainer
- add 'xen_nopv' option to disable PV extentions for x86 HVM guests
- misc minor cleanups
* tag 'stable/for-linus-3.17-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen-pciback: Document the 'quirks' sysfs file
xen/pciback: Fix error return code in xen_pcibk_attach()
xen/events: drop negativity check of unsigned parameter
xen/setup: Remove Identity Map Debug Message
xen/events/fifo: remove a unecessary use of BM()
xen/events/fifo: ensure all bitops are properly aligned even on x86
xen/events/fifo: reset control block and local HEADs on resume
xen/arm: use BUG_ON
xen/grant-table: remove support for V2 tables
x86/xen: safely map and unmap grant frames when in atomic context
MAINTAINERS: Make me the Xen block subsystem (front and back) maintainer
xen: Introduce 'xen_nopv' to disable PV extensions for HVM guests.
Pull EFI changes from Ingo Molnar:
"Main changes in this cycle are:
- arm64 efi stub fixes, preservation of FP/SIMD registers across
firmware calls, and conversion of the EFI stub code into a static
library - Ard Biesheuvel
- Xen EFI support - Daniel Kiper
- Support for autoloading the efivars driver - Lee, Chun-Yi
- Use the PE/COFF headers in the x86 EFI boot stub to request that
the stub be loaded with CONFIG_PHYSICAL_ALIGN alignment - Michael
Brown
- Consolidate all the x86 EFI quirks into one file - Saurabh Tangri
- Additional error logging in x86 EFI boot stub - Ulf Winkelvos
- Support loading initrd above 4G in EFI boot stub - Yinghai Lu
- EFI reboot patches for ACPI hardware reduced platforms"
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
efi/arm64: Handle missing virtual mapping for UEFI System Table
arch/x86/xen: Silence compiler warnings
xen: Silence compiler warnings
x86/efi: Request desired alignment via the PE/COFF headers
x86/efi: Add better error logging to EFI boot stub
efi: Autoload efivars
efi: Update stale locking comment for struct efivars
arch/x86: Remove efi_set_rtc_mmss()
arch/x86: Replace plain strings with constants
xen: Put EFI machinery in place
xen: Define EFI related stuff
arch/x86: Remove redundant set_bit(EFI_MEMMAP) call
arch/x86: Remove redundant set_bit(EFI_SYSTEM_TABLES) call
efi: Introduce EFI_PARAVIRT flag
arch/x86: Do not access EFI memory map if it is not available
efi: Use early_mem*() instead of early_io*()
arch/ia64: Define early_memunmap()
x86/reboot: Add EFI reboot quirk for ACPI Hardware Reduced flag
efi/reboot: Allow powering off machines using EFI
efi/reboot: Add generic wrapper around EfiResetSystem()
...
Pull Xen fix from David Vrabel:
"Fix BUG when trying to expand the grant table. This seems to occur
often during boot with Ubuntu 14.04 PV guests"
* tag 'stable/for-linus-3.16-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: safely map and unmap grant frames when in atomic context