Baihua reported an error when boot an ARM64 guest with PAGE_SIZE=64k and
BALLOON is enabled:
hv_vmbus: registering driver hv_balloon
hv_vmbus: probe failed for device 1eccfd72-4b41-45ef-b73a-4a6e44c12924 (-22)
The cause of this is that the ringbuffer size for hv_balloon is not
adjusted with VMBUS_RING_SIZE(), which makes the size not large enough
for ringbuffers on guest with PAGE_SIZE=64k. Therefore use
VMBUS_RING_SIZE() to calculate the ringbuffer size. Note that the old
size (20 * 1024) counts a 4k header in the total size, while
VMBUS_RING_SIZE() expects the parameter as the payload size, so use
16 * 1024.
Cc: <stable@vger.kernel.org> # 5.15.x
Reported-by: Baihua Lu <baihua.lu@microsoft.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20211101150026.736124-1-boqun.feng@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Pull hyperv updates from Wei Liu:
- Initial patch set for Hyper-V isolation VM support (Tianyu Lan)
- Fix a warning on preemption (Vitaly Kuznetsov)
- A bunch of misc cleanup patches
* tag 'hyperv-next-signed-20211102' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Protect set_hv_tscchange_cb() against getting preempted
Drivers: hv : vmbus: Adding NULL pointer check
x86/hyperv: Remove duplicate include
x86/hyperv: Remove duplicated include in hv_init
Drivers: hv: vmbus: Remove unused code to check for subchannels
Drivers: hv: vmbus: Initialize VMbus ring buffer for Isolation VM
Drivers: hv: vmbus: Add SNP support for VMbus channel initiate message
x86/hyperv: Add ghcb hvcall support for SNP VM
x86/hyperv: Add Write/Read MSR registers via ghcb page
Drivers: hv: vmbus: Mark vmbus ring buffer visible to host in Isolation VM
x86/hyperv: Add new hvcall guest address host visibility support
x86/hyperv: Initialize shared memory boundary in the Isolation VM.
x86/hyperv: Initialize GHCB page in Isolation VM
VMbus ring buffer are shared with host and it's need to
be accessed via extra address space of Isolation VM with
AMD SNP support. This patch is to map the ring buffer
address in extra address space via vmap_pfn(). Hyperv set
memory host visibility hvcall smears data in the ring buffer
and so reset the ring buffer memory to zero after mapping.
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/20211025122116.264793-10-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The monitor pages in the CHANNELMSG_INITIATE_CONTACT msg are shared
with host in Isolation VM and so it's necessary to use hvcall to set
them visible to host. In Isolation VM with AMD SEV SNP, the access
address should be in the extra space which is above shared gpa
boundary. So remap these pages into the extra address(pa +
shared_gpa_boundary).
Introduce monitor_pages_original[] in the struct vmbus_connection
to store monitor page virtual address returned by hv_alloc_hyperv_
zeroed_page() and free monitor page via monitor_pages_original in
the vmbus_disconnect(). The monitor_pages[] is to used to access
monitor page and it is initialized to be equal with monitor_pages_
original. The monitor_pages[] will be overridden in the isolation VM
with va of extra address. Introduce monitor_pages_pa[] to store
monitor pages' physical address and use it to populate pa in the
initiate msg.
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/20211025122116.264793-9-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Hyperv provides GHCB protocol to write Synthetic Interrupt
Controller MSR registers in Isolation VM with AMD SEV SNP
and these registers are emulated by hypervisor directly.
Hyperv requires to write SINTx MSR registers twice. First
writes MSR via GHCB page to communicate with hypervisor
and then writes wrmsr instruction to talk with paravisor
which runs in VMPL0. Guest OS ID MSR also needs to be set
via GHCB page.
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/20211025122116.264793-7-ltykernel@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
On arm64 randconfig builds, hyperv sometimes fails with this
error:
In file included from drivers/hv/hv_trace.c:3:
In file included from drivers/hv/hyperv_vmbus.h:16:
In file included from arch/arm64/include/asm/sync_bitops.h:5:
arch/arm64/include/asm/bitops.h:11:2: error: only <linux/bitops.h> can be included directly
In file included from include/asm-generic/bitops/hweight.h:5:
include/asm-generic/bitops/arch_hweight.h:9:9: error: implicit declaration of function '__sw_hweight32' [-Werror,-Wimplicit-function-declaration]
include/asm-generic/bitops/atomic.h:17:7: error: implicit declaration of function 'BIT_WORD' [-Werror,-Wimplicit-function-declaration]
Include the correct header first.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211018131929.2260087-1-arnd@kernel.org
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The following crash happens when a never-used device is unbound from
uio_hv_generic driver:
kernel BUG at mm/slub.c:321!
invalid opcode: 0000 [#1] SMP PTI
CPU: 0 PID: 4001 Comm: bash Kdump: loaded Tainted: G X --------- --- 5.14.0-0.rc2.23.el9.x86_64 #1
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
RIP: 0010:__slab_free+0x1d5/0x3d0
...
Call Trace:
? pick_next_task_fair+0x18e/0x3b0
? __cond_resched+0x16/0x40
? vunmap_pmd_range.isra.0+0x154/0x1c0
? __vunmap+0x22d/0x290
? hv_ringbuffer_cleanup+0x36/0x40 [hv_vmbus]
kfree+0x331/0x380
? hv_uio_remove+0x43/0x60 [uio_hv_generic]
hv_ringbuffer_cleanup+0x36/0x40 [hv_vmbus]
vmbus_free_ring+0x21/0x60 [hv_vmbus]
hv_uio_remove+0x4f/0x60 [uio_hv_generic]
vmbus_remove+0x23/0x30 [hv_vmbus]
__device_release_driver+0x17a/0x230
device_driver_detach+0x3c/0xa0
unbind_store+0x113/0x130
...
The problem appears to be that we free 'ring_info->pkt_buffer' twice:
first, when the device is unbound from in-kernel driver (netvsc in this
case) and second from hv_uio_remove(). Normally, ring buffer is supposed
to be re-initialized from hv_uio_open() but this happens when UIO device
is being opened and this is not guaranteed to happen.
Generally, it is OK to call hv_ringbuffer_cleanup() twice for the same
channel (which is being handed over between in-kernel drivers and UIO) even
if we didn't call hv_ringbuffer_init() in between. We, however, need to
avoid kfree() call for an already freed pointer.
Fixes: adae1e931a ("Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20210831143916.144983-1-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Pull hyperv updates from Wei Liu:
- make Hyper-V code arch-agnostic (Michael Kelley)
- fix sched_clock behaviour on Hyper-V (Ani Sinha)
- fix a fault when Linux runs as the root partition on MSHV (Praveen
Kumar)
- fix VSS driver (Vitaly Kuznetsov)
- cleanup (Sonia Sharma)
* tag 'hyperv-next-signed-20210831' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
hv_utils: Set the maximum packet size for VSS driver to the length of the receive buffer
Drivers: hv: Enable Hyper-V code to be built on ARM64
arm64: efi: Export screen_info
arm64: hyperv: Initialize hypervisor on boot
arm64: hyperv: Add panic handler
arm64: hyperv: Add Hyper-V hypercall and register access utilities
x86/hyperv: fix root partition faults when writing to VP assist page MSR
hv: hyperv.h: Remove unused inline functions
drivers: hv: Decouple Hyper-V clock/timer code from VMbus drivers
x86/hyperv: add comment describing TSC_INVARIANT_CONTROL MSR setting bit 0
Drivers: hv: Move Hyper-V misc functionality to arch-neutral code
Drivers: hv: Add arch independent default functions for some Hyper-V handlers
Drivers: hv: Make portions of Hyper-V init code be arch neutral
x86/hyperv: fix for unwanted manipulation of sched_clock when TSC marked unstable
asm-generic/hyperv: Add missing #include of nmi.h
Commit adae1e931a ("Drivers: hv: vmbus: Copy packets sent by Hyper-V out
of the ring buffer") introduced a notion of maximum packet size and for
KVM and FCOPY drivers set it to the length of the receive buffer. VSS
driver wasn't updated, this means that the maximum packet size is now
VMBUS_DEFAULT_MAX_PKT_SIZE (4k). Apparently, this is not enough. I'm
observing a packet of 6304 bytes which is being truncated to 4096. When
VSS driver tries to read next packet from ring buffer it starts from the
wrong offset and receives garbage.
Set the maximum packet size to 'HV_HYP_PAGE_SIZE * 2' in VSS driver. This
matches the length of the receive buffer and is in line with other utils
drivers.
Fixes: adae1e931a ("Drivers: hv: vmbus: Copy packets sent by Hyper-V out of the ring buffer")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20210825133857.847866-1-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The vmbus module uses a rotational algorithm to assign target CPUs to
a device's channels. Depending on the timing of different device's channel
offers, different channels of a device may be assigned to the same CPU.
For example on a VM with 2 CPUs, if NIC A and B's channels are offered
in the following order, NIC A will have both channels on CPU0, and
NIC B will have both channels on CPU1 -- see below. This kind of
assignment causes RSS load that is spreading across different channels
to end up on the same CPU.
Timing of channel offers:
NIC A channel 0
NIC B channel 0
NIC A channel 1
NIC B channel 1
VMBUS ID 14: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
Device_ID = {cab064cd-1f31-47d5-a8b4-9d57e320cccd}
Sysfs path: /sys/bus/vmbus/devices/cab064cd-1f31-47d5-a8b4-9d57e320cccd
Rel_ID=14, target_cpu=0
Rel_ID=17, target_cpu=0
VMBUS ID 16: Class_ID = {f8615163-df3e-46c5-913f-f2d2f965ed0e} - Synthetic network adapter
Device_ID = {244225ca-743e-4020-a17d-d7baa13d6cea}
Sysfs path: /sys/bus/vmbus/devices/244225ca-743e-4020-a17d-d7baa13d6cea
Rel_ID=16, target_cpu=1
Rel_ID=18, target_cpu=1
Update the vmbus CPU assignment algorithm to avoid duplicate CPU
assignments within a device.
The new algorithm iterates num_online_cpus + 1 times.
The existing rotational algorithm to find "next NUMA & CPU" is still here.
But if the resulting CPU is already used by the same device, it will try
the next CPU.
In the last iteration, it assigns the channel to the next available CPU
like the existing algorithm. This is not normally expected, because
during device probe, we limit the number of channels of a device to
be <= number of online CPUs.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1626459673-17420-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Hyper-V clock/timer code in hyperv_timer.c is mostly independent from
other VMbus drivers, but building for ARM64 without hyperv_timer.c
shows some remaining entanglements. A default implementation of
hv_read_reference_counter can just read a Hyper-V synthetic register
and be independent of hyperv_timer.c, so move this code out and into
hv_common.c. Then it can be used by the timesync driver even if
hyperv_timer.c isn't built on a particular architecture. If
hyperv_timer.c *is* built, it can override with a faster implementation.
Also provide stubs for stimer functions called by the VMbus driver when
hyperv_timer.c isn't built.
No functional changes.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1626220906-22629-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Architecture independent Hyper-V code calls various arch-specific handlers
when needed. To aid in supporting multiple architectures, provide weak
defaults that can be overridden by arch-specific implementations where
appropriate. But when arch-specific overrides aren't needed or haven't
been implemented yet for a particular architecture, these stubs reduce
the amount of clutter under arch/.
No functional change.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1626287687-2045-3-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
The code to allocate and initialize the hv_vp_index array is
architecture neutral. Similarly, the code to allocate and
populate the hypercall input and output arg pages is architecture
neutral. Move both sets of code out from arch/x86 and into
utility functions in drivers/hv/hv_common.c that can be shared
by Hyper-V initialization on ARM64.
No functional changes. However, the allocation of the hypercall
input and output arg pages is done differently so that the
size is always the Hyper-V page size, even if not the same as
the guest page size (such as with ARM64's 64K page size).
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1626287687-2045-2-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Merge more updates from Andrew Morton:
"190 patches.
Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
signals, exec, kcov, selftests, compress/decompress, and ipc"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
ipc/util.c: use binary search for max_idx
ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
ipc: use kmalloc for msg_queue and shmid_kernel
ipc sem: use kvmalloc for sem_undo allocation
lib/decompressors: remove set but not used variabled 'level'
selftests/vm/pkeys: exercise x86 XSAVE init state
selftests/vm/pkeys: refill shadow register after implicit kernel write
selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
kcov: add __no_sanitize_coverage to fix noinstr for all architectures
exec: remove checks in __register_bimfmt()
x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
hfsplus: report create_date to kstat.btime
hfsplus: remove unnecessary oom message
nilfs2: remove redundant continue statement in a while-loop
kprobes: remove duplicated strong free_insn_page in x86 and s390
init: print out unknown kernel parameters
checkpatch: do not complain about positive return values starting with EPOLL
checkpatch: improve the indented label test
checkpatch: scripts/spdxcheck.py now requires python3
...