Pull virtio updates from Michael Tsirkin:
- vdpa generic device type support
- more virtio hardening for broken devices (but on the same theme,
revert some virtio hotplug hardening patches - they were misusing
some interrupt flags and had to be reverted)
- RSS support in virtio-net
- max device MTU support in mlx5 vdpa
- akcipher support in virtio-crypto
- shared IRQ support in ifcvf vdpa
- a minor performance improvement in vhost
- enable virtio mem for ARM64
- beginnings of advance dma support
- cleanups, fixes all over the place
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (33 commits)
vdpa/mlx5: Avoid processing works if workqueue was destroyed
vhost: handle error while adding split ranges to iotlb
vdpa: support exposing the count of vqs to userspace
vdpa: change the type of nvqs to u32
vdpa: support exposing the config size to userspace
vdpa/mlx5: re-create forwarding rules after mac modified
virtio: pci: check bar values read from virtio config space
Revert "virtio_pci: harden MSI-X interrupts"
Revert "virtio-pci: harden INTX interrupts"
drivers/net/virtio_net: Added RSS hash report control.
drivers/net/virtio_net: Added RSS hash report.
drivers/net/virtio_net: Added basic RSS support.
drivers/net/virtio_net: Fixed padded vheader to use v1 with hash.
virtio: use virtio_device_ready() in virtio_device_restore()
tools/virtio: compile with -pthread
tools/virtio: fix after premapped buf support
virtio_ring: remove flags check for unmap packed indirect desc
virtio_ring: remove flags check for unmap split indirect desc
virtio_ring: rename vring_unmap_state_packed() to vring_unmap_extra_packed()
net/mlx5: Add support for configuring max device MTU
...
vhost_iotlb_add_range_ctx() handles the range [0, ULONG_MAX] by
splitting it into two ranges and adding them separately. The return
value of adding the first range to the iotlb is currently ignored.
Check the return value and bail out in case of an error.
Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com>
Link: https://lore.kernel.org/r/20220312141121.4981-1-mail@anirudhrb.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: e2ae38cf3d ("vhost: fix hung thread due to erroneous iotlb entries")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
In vhost_enable_notify() we enable the notifications and we read
the avail index to check if new buffers have become available in
the meantime.
We are not caching the avail index, so when the device will call
vhost_get_vq_desc(), it will find the old value in the cache and
it will read the avail index again.
It would be better to refresh the cache every time we read avail
index, so let's change vhost_enable_notify() caching the value in
`avail_idx` and compare it with `last_avail_idx` to check if there
are new buffers available.
We don't expect a significant performance boost because
the above path is not very common, indeed vhost_enable_notify()
is often called with unlikely(), expecting that avail index has
not been updated.
We ran virtio-test/vhost-test and noticed minimal improvement as
expected. To stress the patch more, we modified vhost_test.ko to
call vhost_enable_notify()/vhost_disable_notify() on every cycle
when calling vhost_get_vq_desc(); in this case we observed a more
evident improvement, with a reduction of the test execution time
of about 3.7%.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220121153108.187291-1-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, ipsec, and wireless.
A few last minute revert / disable and fix patches came down from our
sub-trees. We're not waiting for any fixes at this point.
Current release - regressions:
- Revert "netfilter: nat: force port remap to prevent shadowing
well-known ports", restore working conntrack on asymmetric paths
- Revert "ath10k: drop beacon and probe response which leak from
other channel", restore working AP and mesh mode on QCA9984
- eth: intel: fix hang during reboot/shutdown
Current release - new code bugs:
- netfilter: nf_tables: disable register tracking, it needs more work
to cover all corner cases
Previous releases - regressions:
- ipv6: fix skb_over_panic in __ip6_append_data when (admin-only)
extension headers get specified
- esp6: fix ESP over TCP/UDP, interpret ipv6_skip_exthdr's return
value more selectively
- bnx2x: fix driver load failure when FW not present in initrd
Previous releases - always broken:
- vsock: stop destroying unrelated sockets in nested virtualization
- packet: fix slab-out-of-bounds access in packet_recvmsg()
Misc:
- add Paolo Abeni to networking maintainers!"
* tag 'net-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (26 commits)
iavf: Fix hang during reboot/shutdown
net: mscc: ocelot: fix backwards compatibility with single-chain tc-flower offload
net: bcmgenet: skip invalid partial checksums
bnx2x: fix built-in kernel driver load failure
net: phy: mscc: Add MODULE_FIRMWARE macros
net: dsa: Add missing of_node_put() in dsa_port_parse_of
net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit()
Revert "ath10k: drop beacon and probe response which leak from other channel"
hv_netvsc: Add check for kvmalloc_array
iavf: Fix double free in iavf_reset_task
ice: destroy flow director filter mutex after releasing VSIs
ice: fix NULL pointer dereference in ice_update_vsi_tx_ring_stats()
Add Paolo Abeni to networking maintainers
atm: eni: Add check for dma_map_single
net/packet: fix slab-out-of-bounds access in packet_recvmsg()
net: mdio: mscc-miim: fix duplicate debugfs entry
net: phy: marvell: Fix invalid comparison in the resume and suspend functions
esp6: fix check on ipv6_skip_exthdr's return value
net: dsa: microchip: add spi_device_id tables
netfilter: nf_tables: disable register tracking
...
When iterating over sockets using vsock_for_each_connected_socket, make
sure that a transport filters out sockets that don't belong to the
transport.
There actually was an issue caused by this; in a nested VM
configuration, destroying the nested VM (which often involves the
closing of /dev/vhost-vsock if there was h2g connections to the nested
VM) kills not only the h2g connections, but also all existing g2h
connections to the (outmost) host which are totally unrelated.
Tested: Executed the following steps on Cuttlefish (Android running on a
VM) [1]: (1) Enter into an `adb shell` session - to have a g2h
connection inside the VM, (2) open and then close /dev/vhost-vsock by
`exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb
session is not reset.
[1] https://android.googlesource.com/device/google/cuttlefish/
Fixes: c0cfa2d8a7 ("vsock: add multi-transports support")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jiyong Park <jiyong@google.com>
Link: https://lore.kernel.org/r/20220311020017.1509316-1-jiyong@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit e2ae38cf3d ("vhost: fix hung thread due to erroneous iotlb
entries") tries to reject the IOTLB message whose size is zero. But
the size is not necessarily meaningful, one example is the batching
hint, so the commit breaks that.
Fixing this be reject zero size message only if the message is used to
update/invalidate the IOTLB.
Fixes: e2ae38cf3d ("vhost: fix hung thread due to erroneous iotlb entries")
Reported-by: Eli Cohen <elic@nvidia.com>
Cc: Anirudh Rayabharam <mail@anirudhrb.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220310075211.4801-1-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Eli Cohen <elic@nvidia.com>
In vhost_iotlb_add_range_ctx(), range size can overflow to 0 when
start is 0 and last is ULONG_MAX. One instance where it can happen
is when userspace sends an IOTLB message with iova=size=uaddr=0
(vhost_process_iotlb_msg). So, an entry with size = 0, start = 0,
last = ULONG_MAX ends up in the iotlb. Next time a packet is sent,
iotlb_access_ok() loops indefinitely due to that erroneous entry.
Call Trace:
<TASK>
iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340
vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366
vhost_transport_do_send_pkt+0xe0/0xfd0 drivers/vhost/vsock.c:104
vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
Reported by syzbot at:
https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87
To fix this, do two things:
1. Return -EINVAL in vhost_chr_write_iter() when userspace asks to map
a range with size 0.
2. Fix vhost_iotlb_add_range_ctx() to handle the range [0, ULONG_MAX]
by splitting it into two entries.
Fixes: 0bbe30668d ("vhost: factor out IOTLB")
Reported-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com
Tested-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com
Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com>
Link: https://lore.kernel.org/r/20220305095525.5145-1-mail@anirudhrb.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
ownership. It expects current->mm to be valid.
vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
the user has not done close(), so when we are in do_exit(). In this
case current->mm is invalid and we're releasing the device, so we
should clean it anyway.
Let's check the owner only when vhost_vsock_stop() is called
by an ioctl.
When invoked from release we can not fail so we don't check return
code of vhost_vsock_stop(). We need to stop vsock even if it's not
the owner.
Fixes: 433fc58e6b ("VSOCK: Introduce vhost_vsock.ko")
Cc: stable@vger.kernel.org
Reported-by: syzbot+1e3ea63db39f2b4440e0@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+3140b17cb44a7b174008@syzkaller.appspotmail.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call reset using the wrapper function vdpa_reset() to make sure the
operation is serialized with cf_mutex.
This comes to protect from the following possible scenario:
vhost_vdpa_set_status() could call the reset op. Since the call is not
protected by cf_mutex, a netlink thread calling vdpa_dev_config_fill
could get passed the VIRTIO_CONFIG_S_FEATURES_OK check in
vdpa_dev_config_fill() and end up reporting wrong features.
Fixes: 5f6e85953d8f ("vdpa: Read device configuration only if FEATURES_OK")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20220111183400.38418-3-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu<si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Add netlink support to configure the max virtqueue pairs for a device.
At least one pair is required. The maximum is dictated by the device.
Example:
$ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 max_vqp 4
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20220105114646.577224-6-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Provide an interface to read the negotiated features. This is needed
when building the netlink message in vdpa_dev_net_config_fill().
Also fix the implementation of vdpa_dev_net_config_fill() to use the
negotiated features instead of the device features.
To make APIs clearer, make the following name changes to struct
vdpa_config_ops so they better describe their operations:
get_features -> get_device_features
set_features -> set_driver_features
Finally, add get_driver_features to return the negotiated features and
add implementation to all the upstream drivers.
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20220105114646.577224-2-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The return type of get_config_size is size_t so it makes
sense to change the type of the variable holding its result.
That said, this already got taken care of (differently, and arguably
not as well) by commit 3ed21c1451 ("vdpa: check that offsets are
within bounds").
The added 'c->off > size' test in that commit will be done as an
unsigned comparison on 32-bit (safe due to not being signed).
On a 64-bit platform, it will be done as a signed comparison, but in
that case the comparison will be done in 64-bit, and 'c->off' being an
u32 it will be valid thanks to the extended range (ie both values will
be positive in 64 bits).
So this was a real bug, but it was already addressed and marked for stable.
Signed-off-by: Laura Abbott <labbott@kernel.org>
Reported-by: Luo Likang <luolikang@nsfocus.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>