Commit Graph

808 Commits

Author SHA1 Message Date
Linus Torvalds f4f5d7cfb2 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:

 - vdpa generic device type support

 - more virtio hardening for broken devices (but on the same theme,
   revert some virtio hotplug hardening patches - they were misusing
   some interrupt flags and had to be reverted)

 - RSS support in virtio-net

 - max device MTU support in mlx5 vdpa

 - akcipher support in virtio-crypto

 - shared IRQ support in ifcvf vdpa

 - a minor performance improvement in vhost

 - enable virtio mem for ARM64

 - beginnings of advance dma support

 - cleanups, fixes all over the place

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (33 commits)
  vdpa/mlx5: Avoid processing works if workqueue was destroyed
  vhost: handle error while adding split ranges to iotlb
  vdpa: support exposing the count of vqs to userspace
  vdpa: change the type of nvqs to u32
  vdpa: support exposing the config size to userspace
  vdpa/mlx5: re-create forwarding rules after mac modified
  virtio: pci: check bar values read from virtio config space
  Revert "virtio_pci: harden MSI-X interrupts"
  Revert "virtio-pci: harden INTX interrupts"
  drivers/net/virtio_net: Added RSS hash report control.
  drivers/net/virtio_net: Added RSS hash report.
  drivers/net/virtio_net: Added basic RSS support.
  drivers/net/virtio_net: Fixed padded vheader to use v1 with hash.
  virtio: use virtio_device_ready() in virtio_device_restore()
  tools/virtio: compile with -pthread
  tools/virtio: fix after premapped buf support
  virtio_ring: remove flags check for unmap packed indirect desc
  virtio_ring: remove flags check for unmap split indirect desc
  virtio_ring: rename vring_unmap_state_packed() to vring_unmap_extra_packed()
  net/mlx5: Add support for configuring max device MTU
  ...
2022-03-31 13:57:15 -07:00
Anirudh Rayabharam 03a91c9af2 vhost: handle error while adding split ranges to iotlb
vhost_iotlb_add_range_ctx() handles the range [0, ULONG_MAX] by
splitting it into two ranges and adding them separately. The return
value of adding the first range to the iotlb is currently ignored.
Check the return value and bail out in case of an error.

Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com>
Link: https://lore.kernel.org/r/20220312141121.4981-1-mail@anirudhrb.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: e2ae38cf3d ("vhost: fix hung thread due to erroneous iotlb entries")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-03-28 16:53:00 -04:00
Longpeng b04d910af3 vdpa: support exposing the count of vqs to userspace
- GET_VQS_COUNT: the count of virtqueues that exposed

Signed-off-by: Longpeng <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20220315032553.455-4-longpeng2@huawei.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Longpeng &lt;<a href="mailto:longpeng2@huawei.com" target="_blank">longpeng2@huawei.com</a>&gt;<br>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-03-28 16:53:00 -04:00
Longpeng 81d46d6931 vdpa: change the type of nvqs to u32
Change vdpa_device.nvqs and vhost_vdpa.nvqs to use u32

Signed-off-by: Longpeng <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20220315032553.455-3-longpeng2@huawei.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Longpeng &lt;<a href="mailto:longpeng2@huawei.com" target="_blank">longpeng2@huawei.com</a>&gt;<br></blockquote><div><br></div><div>Acked-by: Jason Wang &lt;<a href="mailto:jasowang@redhat.com">jasowang@redhat.com</a>&gt;</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-03-28 16:53:00 -04:00
Longpeng a61280dddd vdpa: support exposing the config size to userspace
- GET_CONFIG_SIZE: return the size of the virtio config space.

The size contains the fields which are conditional on feature
bits.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Longpeng <longpeng2@huawei.com>
Link: https://lore.kernel.org/r/20220315032553.455-2-longpeng2@huawei.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-03-28 16:53:00 -04:00
Zhu Lingshan cce0ab2b2a vhost_vdpa: don't setup irq offloading when irq_num < 0
When irq number is negative(e.g., -EINVAL), the virtqueue
may be disabled or the virtqueues are sharing a device irq.
In such case, we should not setup irq offloading for a virtqueue.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Link: https://lore.kernel.org/r/20220222115428.998334-3-lingshan.zhu@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-28 16:52:57 -04:00
Stefano Garzarella d3bb267bbd vhost: cache avail index in vhost_enable_notify()
In vhost_enable_notify() we enable the notifications and we read
the avail index to check if new buffers have become available in
the meantime.

We are not caching the avail index, so when the device will call
vhost_get_vq_desc(), it will find the old value in the cache and
it will read the avail index again.

It would be better to refresh the cache every time we read avail
index, so let's change vhost_enable_notify() caching the value in
`avail_idx` and compare it with `last_avail_idx` to check if there
are new buffers available.

We don't expect a significant performance boost because
the above path is not very common, indeed vhost_enable_notify()
is often called with unlikely(), expecting that avail index has
not been updated.

We ran virtio-test/vhost-test and noticed minimal improvement as
expected. To stress the patch more, we modified vhost_test.ko to
call vhost_enable_notify()/vhost_disable_notify() on every cycle
when calling vhost_get_vq_desc(); in this case we observed a more
evident improvement, with a reduction of the test execution time
of about 3.7%.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220121153108.187291-1-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-03-28 16:52:57 -04:00
Jakub Kicinski e243f39685 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17 13:56:58 -07:00
Linus Torvalds 551acdc3c3 Merge tag 'net-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter, ipsec, and wireless.

  A few last minute revert / disable and fix patches came down from our
  sub-trees. We're not waiting for any fixes at this point.

  Current release - regressions:

   - Revert "netfilter: nat: force port remap to prevent shadowing
     well-known ports", restore working conntrack on asymmetric paths

   - Revert "ath10k: drop beacon and probe response which leak from
     other channel", restore working AP and mesh mode on QCA9984

   - eth: intel: fix hang during reboot/shutdown

  Current release - new code bugs:

   - netfilter: nf_tables: disable register tracking, it needs more work
     to cover all corner cases

  Previous releases - regressions:

   - ipv6: fix skb_over_panic in __ip6_append_data when (admin-only)
     extension headers get specified

   - esp6: fix ESP over TCP/UDP, interpret ipv6_skip_exthdr's return
     value more selectively

   - bnx2x: fix driver load failure when FW not present in initrd

  Previous releases - always broken:

   - vsock: stop destroying unrelated sockets in nested virtualization

   - packet: fix slab-out-of-bounds access in packet_recvmsg()

  Misc:

   - add Paolo Abeni to networking maintainers!"

* tag 'net-5.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (26 commits)
  iavf: Fix hang during reboot/shutdown
  net: mscc: ocelot: fix backwards compatibility with single-chain tc-flower offload
  net: bcmgenet: skip invalid partial checksums
  bnx2x: fix built-in kernel driver load failure
  net: phy: mscc: Add MODULE_FIRMWARE macros
  net: dsa: Add missing of_node_put() in dsa_port_parse_of
  net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit()
  Revert "ath10k: drop beacon and probe response which leak from other channel"
  hv_netvsc: Add check for kvmalloc_array
  iavf: Fix double free in iavf_reset_task
  ice: destroy flow director filter mutex after releasing VSIs
  ice: fix NULL pointer dereference in ice_update_vsi_tx_ring_stats()
  Add Paolo Abeni to networking maintainers
  atm: eni: Add check for dma_map_single
  net/packet: fix slab-out-of-bounds access in packet_recvmsg()
  net: mdio: mscc-miim: fix duplicate debugfs entry
  net: phy: marvell: Fix invalid comparison in the resume and suspend functions
  esp6: fix check on ipv6_skip_exthdr's return value
  net: dsa: microchip: add spi_device_id tables
  netfilter: nf_tables: disable register tracking
  ...
2022-03-17 12:55:26 -07:00
Jiyong Park 8e6ed96376 vsock: each transport cycles only on its own sockets
When iterating over sockets using vsock_for_each_connected_socket, make
sure that a transport filters out sockets that don't belong to the
transport.

There actually was an issue caused by this; in a nested VM
configuration, destroying the nested VM (which often involves the
closing of /dev/vhost-vsock if there was h2g connections to the nested
VM) kills not only the h2g connections, but also all existing g2h
connections to the (outmost) host which are totally unrelated.

Tested: Executed the following steps on Cuttlefish (Android running on a
VM) [1]: (1) Enter into an `adb shell` session - to have a g2h
connection inside the VM, (2) open and then close /dev/vhost-vsock by
`exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb
session is not reset.

[1] https://android.googlesource.com/device/google/cuttlefish/

Fixes: c0cfa2d8a7 ("vsock: add multi-transports support")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jiyong Park <jiyong@google.com>
Link: https://lore.kernel.org/r/20220311020017.1509316-1-jiyong@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11 23:14:19 -08:00
Jakub Kicinski 1e8a3f0d2a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
net/dsa/dsa2.c
  commit afb3cc1a39 ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails")
  commit e83d565378 ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master")
https://lore.kernel.org/all/20220307101436.7ae87da0@canb.auug.org.au/

drivers/net/ethernet/intel/ice/ice.h
  commit 97b0129146 ("ice: Fix error with handling of bonding MTU")
  commit 43113ff734 ("ice: add TTY for GNSS module for E810T device")
https://lore.kernel.org/all/20220310112843.3233bcf1@canb.auug.org.au/

drivers/staging/gdm724x/gdm_lte.c
  commit fc7f750dc9 ("staging: gdm724x: fix use after free in gdm_lte_rx()")
  commit 4bcc4249b4 ("staging: Use netif_rx().")
https://lore.kernel.org/all/20220308111043.1018a59d@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10 17:16:56 -08:00
Jason Wang 95932ab2ea vhost: allow batching hint without size
Commit e2ae38cf3d ("vhost: fix hung thread due to erroneous iotlb
entries") tries to reject the IOTLB message whose size is zero. But
the size is not necessarily meaningful, one example is the batching
hint, so the commit breaks that.

Fixing this be reject zero size message only if the message is used to
update/invalidate the IOTLB.

Fixes: e2ae38cf3d ("vhost: fix hung thread due to erroneous iotlb entries")
Reported-by: Eli Cohen <elic@nvidia.com>
Cc: Anirudh Rayabharam <mail@anirudhrb.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220310075211.4801-1-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Eli Cohen <elic@nvidia.com>
2022-03-10 08:12:04 -05:00
Stefano Garzarella 4c8093637b vhost: remove avail_event arg from vhost_update_avail_event()
In vhost_update_avail_event() we never used the `avail_event` argument,
since its introduction in commit 2723feaa8e ("vhost: set log when
updating used flags or avail event").

Let's remove it to clean up the code.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220113141134.186773-1-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-06 06:06:50 -05:00
Anirudh Rayabharam e2ae38cf3d vhost: fix hung thread due to erroneous iotlb entries
In vhost_iotlb_add_range_ctx(), range size can overflow to 0 when
start is 0 and last is ULONG_MAX. One instance where it can happen
is when userspace sends an IOTLB message with iova=size=uaddr=0
(vhost_process_iotlb_msg). So, an entry with size = 0, start = 0,
last = ULONG_MAX ends up in the iotlb. Next time a packet is sent,
iotlb_access_ok() loops indefinitely due to that erroneous entry.

	Call Trace:
	 <TASK>
	 iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340
	 vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366
	 vhost_transport_do_send_pkt+0xe0/0xfd0 drivers/vhost/vsock.c:104
	 vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372
	 kthread+0x2e9/0x3a0 kernel/kthread.c:377
	 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
	 </TASK>

Reported by syzbot at:
	https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87

To fix this, do two things:

1. Return -EINVAL in vhost_chr_write_iter() when userspace asks to map
   a range with size 0.
2. Fix vhost_iotlb_add_range_ctx() to handle the range [0, ULONG_MAX]
   by splitting it into two entries.

Fixes: 0bbe30668d ("vhost: factor out IOTLB")
Reported-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com
Tested-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com
Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com>
Link: https://lore.kernel.org/r/20220305095525.5145-1-mail@anirudhrb.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-06 06:05:45 -05:00
Si-Wei Liu e0077cc13b vdpa: factor out vdpa_set_features_unlocked for vdpa internal use
No functional change introduced. vdpa bus driver such as virtio_vdpa
or vhost_vdpa is not supposed to take care of the locking for core
by its own. The locked API vdpa_set_features should suffice the
bus driver's need.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/1642206481-30721-2-git-send-email-si-wei.liu@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-03-04 11:56:33 -05:00
Harold Huang 74a335a07a tuntap: add sanity checks about msg_controllen in sendmsg
In patch [1], tun_msg_ctl was added to allow pass batched xdp buffers to
tun_sendmsg. Although we donot use msg_controllen in this path, we should
check msg_controllen to make sure the caller pass a valid msg_ctl.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe8dd45bb7556246c6b76277b1ba4296c91c2505

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Harold Huang <baymaxhuang@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20220303022441.383865-1-baymaxhuang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-02 22:00:59 -08:00
Stefano Garzarella a58da53ffd vhost/vsock: don't check owner in vhost_vsock_stop() while releasing
vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
ownership. It expects current->mm to be valid.

vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
the user has not done close(), so when we are in do_exit(). In this
case current->mm is invalid and we're releasing the device, so we
should clean it anyway.

Let's check the owner only when vhost_vsock_stop() is called
by an ioctl.

When invoked from release we can not fail so we don't check return
code of vhost_vsock_stop(). We need to stop vsock even if it's not
the owner.

Fixes: 433fc58e6b ("VSOCK: Introduce vhost_vsock.ko")
Cc: stable@vger.kernel.org
Reported-by: syzbot+1e3ea63db39f2b4440e0@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+3140b17cb44a7b174008@syzkaller.appspotmail.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-23 12:32:33 +00:00
Eli Cohen 680ab9d69a vdpa: Protect vdpa reset with cf_mutex
Call reset using the wrapper function vdpa_reset() to make sure the
operation is serialized with cf_mutex.

This comes to protect from the following possible scenario:

vhost_vdpa_set_status() could call the reset op. Since the call is not
protected by cf_mutex, a netlink thread calling vdpa_dev_config_fill
could get passed the VIRTIO_CONFIG_S_FEATURES_OK check in
vdpa_dev_config_fill() and end up reporting wrong features.

Fixes: 5f6e85953d8f ("vdpa: Read device configuration only if FEATURES_OK")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20220111183400.38418-3-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu<si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14 18:50:54 -05:00
Eli Cohen f6d955d808 vdpa: Avoid taking cf_mutex lock on get status
Avoid the wrapper holding cf_mutex since it is not protecting anything.
To avoid confusion and unnecessary overhead incurred by it, remove.

Fixes: f489f27bc0ab ("vdpa: Sync calls set/get config/status with cf_mutex")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20220111183400.38418-2-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Si-Wei Liu<si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14 18:50:54 -05:00
Eli Cohen aba21aff77 vdpa: Allow to configure max data virtqueues
Add netlink support to configure the max virtqueue pairs for a device.
At least one pair is required. The maximum is dictated by the device.

Example:
$ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 max_vqp 4

Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20220105114646.577224-6-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14 18:50:53 -05:00
Eli Cohen 73bc0dbb59 vdpa: Sync calls set/get config/status with cf_mutex
Add wrappers to get/set status and protect these operations with
cf_mutex to serialize these operations with respect to get/set config
operations.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20220105114646.577224-4-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14 18:50:53 -05:00
Eli Cohen a64917bc2e vdpa: Provide interface to read driver features
Provide an interface to read the negotiated features. This is needed
when building the netlink message in vdpa_dev_net_config_fill().

Also fix the implementation of vdpa_dev_net_config_fill() to use the
negotiated features instead of the device features.

To make APIs clearer, make the following name changes to struct
vdpa_config_ops so they better describe their operations:

get_features -> get_device_features
set_features -> set_driver_features

Finally, add get_driver_features to return the negotiated features and
add implementation to all the upstream drivers.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20220105114646.577224-2-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14 18:50:53 -05:00
Laura Abbott 870aaff92e vdpa: clean up get_config_size ret value handling
The return type of get_config_size is size_t so it makes
sense to change the type of the variable holding its result.

That said, this already got taken care of (differently, and arguably
not as well) by commit 3ed21c1451 ("vdpa: check that offsets are
within bounds").

The added 'c->off > size' test in that commit will be done as an
unsigned comparison on 32-bit (safe due to not being signed).

On a 64-bit platform, it will be done as a signed comparison, but in
that case the comparison will be done in 64-bit, and 'c->off' being an
u32 it will be valid thanks to the extended range (ie both values will
be positive in 64 bits).

So this was a real bug, but it was already addressed and marked for stable.

Signed-off-by: Laura Abbott <labbott@kernel.org>
Reported-by: Luo Likang <luolikang@nsfocus.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14 18:50:53 -05:00
Xianting Tian 0800639207 vhost/test: fix memory leak of vhost virtqueues
We need free the vqs in .release(), which are allocated in .open().

Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Link: https://lore.kernel.org/r/20211228030924.3468439-1-xianting.tian@linux.alibaba.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14 18:50:53 -05:00
Eugenio Pérez 23118b09e6 vdpa: Avoid duplicate call to vp_vdpa get_status
It has no sense to call get_status twice, since we already have a
variable for that.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Link: https://lore.kernel.org/r/20211104195833.2089796-1-eperezma@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-01-14 18:50:52 -05:00