Commit Graph

274 Commits

Author SHA1 Message Date
Linus Torvalds
f56dbdda43 Merge tag 'hyperv-next-signed-20220528' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv updates from Wei Liu:

 - Harden hv_sock driver (Andrea Parri)

 - Harden Hyper-V PCI driver (Andrea Parri)

 - Fix multi-MSI for Hyper-V PCI driver (Jeffrey Hugo)

 - Fix Hyper-V PCI to reduce boot time (Dexuan Cui)

 - Remove code for long EOL'ed Hyper-V versions (Michael Kelley, Saurabh
   Sengar)

 - Fix balloon driver error handling (Shradha Gupta)

 - Fix a typo in vmbus driver (Julia Lawall)

 - Ignore vmbus IMC device (Michael Kelley)

 - Add a new error message to Hyper-V DRM driver (Saurabh Sengar)

* tag 'hyperv-next-signed-20220528' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (28 commits)
  hv_balloon: Fix balloon_probe() and balloon_remove() error handling
  scsi: storvsc: Removing Pre Win8 related logic
  Drivers: hv: vmbus: fix typo in comment
  PCI: hv: Fix synchronization between channel callback and hv_pci_bus_exit()
  PCI: hv: Add validation for untrusted Hyper-V values
  PCI: hv: Fix interrupt mapping for multi-MSI
  PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()
  drm/hyperv: Remove support for Hyper-V 2008 and 2008R2/Win7
  video: hyperv_fb: Remove support for Hyper-V 2008 and 2008R2/Win7
  scsi: storvsc: Remove support for Hyper-V 2008 and 2008R2/Win7
  Drivers: hv: vmbus: Remove support for Hyper-V 2008 and Hyper-V 2008R2/Win7
  x86/hyperv: Disable hardlockup detector by default in Hyper-V guests
  drm/hyperv: Add error message for fb size greater than allocated
  PCI: hv: Do not set PCI_COMMAND_MEMORY to reduce VM boot time
  PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI
  Drivers: hv: vmbus: Refactor the ring-buffer iterator functions
  Drivers: hv: vmbus: Accept hv_sock offers in isolated guests
  hv_sock: Add validation for untrusted Hyper-V values
  hv_sock: Copy packets sent by Hyper-V out of the ring buffer
  hv_sock: Check hv_pkt_iter_first_raw()'s return value
  ...
2022-05-28 11:39:01 -07:00
Stefano Garzarella
bd50c5dc18 vsock/virtio: add support for device suspend/resume
Implement .freeze and .restore callbacks of struct virtio_driver
to support device suspend/resume.

During suspension all connected sockets are reset and VQs deleted.
During resume the VQs are re-initialized.

Reported by: Vilas R K <vilas.r.k@intel.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-02 16:04:34 -07:00
Stefano Garzarella
a103209886 vsock/virtio: factor our the code to initialize and delete VQs
Add virtio_vsock_vqs_init() and virtio_vsock_vqs_del() with the code
that was in virtio_vsock_probe() and virtio_vsock_remove to initialize
and delete VQs.

These new functions will be used in the next commit to support device
suspend/resume

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-02 16:04:34 -07:00
Andrea Parri (Microsoft)
dbde6d0c7a hv_sock: Add validation for untrusted Hyper-V values
For additional robustness in the face of Hyper-V errors or malicious
behavior, validate all values that originate from packets that Hyper-V
has sent to the guest in the host-to-guest ring buffer.  Ensure that
invalid values cannot cause data being copied out of the bounds of the
source buffer in hvs_stream_dequeue().

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220428145107.7878-4-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-04-28 15:01:14 +00:00
Andrea Parri (Microsoft)
066f3377fb hv_sock: Copy packets sent by Hyper-V out of the ring buffer
Pointers to VMbus packets sent by Hyper-V are used by the hv_sock driver
within the guest VM.  Hyper-V can send packets with erroneous values or
modify packet fields after they are processed by the guest.  To defend
against these scenarios, copy the incoming packet after validating its
length and offset fields using hv_pkt_iter_{first,next}().  Use
HVS_PKT_LEN(HVS_MTU_SIZE) to initialize the buffer which holds the
copies of the incoming packets.  In this way, the packet can no longer
be modified by the host.

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220428145107.7878-3-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-04-28 15:01:14 +00:00
Andrea Parri (Microsoft)
71abb94ff6 hv_sock: Check hv_pkt_iter_first_raw()'s return value
The function returns NULL if the ring buffer doesn't contain enough
readable bytes to constitute a packet descriptor.  The ring buffer's
write_index is in memory which is shared with the Hyper-V host, an
erroneous or malicious host could thus change its value and overturn
the result of hvs_stream_has_data().

Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220428145107.7878-2-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-04-28 15:01:14 +00:00
Oliver Hartkopp
f4b41f062c net: remove noblock parameter from skb_recv_datagram()
skb_recv_datagram() has two parameters 'flags' and 'noblock' that are
merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)'

As 'flags' may contain MSG_DONTWAIT as value most callers split the 'flags'
into 'flags' and 'noblock' with finally obsolete bit operations like this:

skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc);

And this is not even done consistently with the 'flags' parameter.

This patch removes the obsolete and costly splitting into two parameters
and only performs bit operations when really needed on the caller side.

One missing conversion thankfully reported by kernel test robot. I missed
to enable kunit tests to build the mctp code.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06 13:45:26 +01:00
Stefano Garzarella
88704454ef vsock/virtio: enable VQs early on probe
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after probe returns, but virtio-vsock
driver uses VQs in the probe function to fill rx and event VQs
with new buffers.

Let's fix this, calling virtio_device_ready() before using VQs
in the probe function.

Fixes: 0ea9e1d3a9 ("VSOCK: Introduce virtio_transport.ko")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-24 18:36:36 -07:00
Stefano Garzarella
c1011c0b3a vsock/virtio: read the negotiated features before using VQs
Complete the driver configuration, reading the negotiated features,
before using the VQs in the virtio_vsock_probe().

Fixes: 53efbba12c ("virtio/vsock: enable SEQPACKET for transport")
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-24 18:36:36 -07:00
Stefano Garzarella
4b5f1ad556 vsock/virtio: initialize vdev->priv before using VQs
When we fill VQs with empty buffers and kick the host, it may send
an interrupt. `vdev->priv` must be initialized before this since it
is used in the virtqueue callbacks.

Fixes: 0deab087b1 ("vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock")
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-24 18:36:35 -07:00
Jiyong Park
8e6ed96376 vsock: each transport cycles only on its own sockets
When iterating over sockets using vsock_for_each_connected_socket, make
sure that a transport filters out sockets that don't belong to the
transport.

There actually was an issue caused by this; in a nested VM
configuration, destroying the nested VM (which often involves the
closing of /dev/vhost-vsock if there was h2g connections to the nested
VM) kills not only the h2g connections, but also all existing g2h
connections to the (outmost) host which are totally unrelated.

Tested: Executed the following steps on Cuttlefish (Android running on a
VM) [1]: (1) Enter into an `adb shell` session - to have a g2h
connection inside the VM, (2) open and then close /dev/vhost-vsock by
`exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb
session is not reset.

[1] https://android.googlesource.com/device/google/cuttlefish/

Fixes: c0cfa2d8a7 ("vsock: add multi-transports support")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jiyong Park <jiyong@google.com>
Link: https://lore.kernel.org/r/20220311020017.1509316-1-jiyong@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11 23:14:19 -08:00
Seth Forshee
b9208492fc vsock: remove vsock from connected table when connect is interrupted by a signal
vsock_connect() expects that the socket could already be in the
TCP_ESTABLISHED state when the connecting task wakes up with a signal
pending. If this happens the socket will be in the connected table, and
it is not removed when the socket state is reset. In this situation it's
common for the process to retry connect(), and if the connection is
successful the socket will be added to the connected table a second
time, corrupting the list.

Prevent this by calling vsock_remove_connected() if a signal is received
while waiting for a connection. This is harmless if the socket is not in
the connected table, and if it is in the table then removing it will
prevent list corruption from a double add.

Note for backporting: this patch requires d5afa82c97 ("vsock: correct
removal of socket from the list"), which is in all current stable trees
except 4.9.y.

Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20220217141312.2297547-1-sforshee@digitalocean.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17 08:56:02 -08:00
Linus Torvalds
3bf6a9e36e Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
 "virtio,vdpa,qemu_fw_cfg: features, cleanups, and fixes.

   - partial support for < MAX_ORDER - 1 granularity for virtio-mem

   - driver_override for vdpa

   - sysfs ABI documentation for vdpa

   - multiqueue config support for mlx5 vdpa

   - and misc fixes, cleanups"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (42 commits)
  vdpa/mlx5: Fix tracking of current number of VQs
  vdpa/mlx5: Fix is_index_valid() to refer to features
  vdpa: Protect vdpa reset with cf_mutex
  vdpa: Avoid taking cf_mutex lock on get status
  vdpa/vdpa_sim_net: Report max device capabilities
  vdpa: Use BIT_ULL for bit operations
  vdpa/vdpa_sim: Configure max supported virtqueues
  vdpa/mlx5: Report max device capabilities
  vdpa: Support reporting max device capabilities
  vdpa/mlx5: Restore cur_num_vqs in case of failure in change_num_qps()
  vdpa: Add support for returning device configuration information
  vdpa/mlx5: Support configuring max data virtqueue
  vdpa/mlx5: Fix config_attr_mask assignment
  vdpa: Allow to configure max data virtqueues
  vdpa: Read device configuration only if FEATURES_OK
  vdpa: Sync calls set/get config/status with cf_mutex
  vdpa/mlx5: Distribute RX virtqueues in RQT object
  vdpa: Provide interface to read driver features
  vdpa: clean up get_config_size ret value handling
  virtio_ring: mark ring unused on error
  ...
2022-01-18 10:05:48 +02:00
Michael S. Tsirkin
d9679d0013 virtio: wrap config->reset calls
This will enable cleanups down the road.
The idea is to disable cbs, then add "flush_queued_cbs" callback
as a parameter, this way drivers can flush any work
queued after callbacks have been disabled.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14 18:50:52 -05:00
David S. Miller
e63a023489 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-12-30

The following pull-request contains BPF updates for your *net-next* tree.

We've added 72 non-merge commits during the last 20 day(s) which contain
a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).

The main changes are:

1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.

2) Beautify and de-verbose verifier logs, from Christy.

3) Composable verifier types, from Hao.

4) bpf_strncmp helper, from Hou.

5) bpf.h header dependency cleanup, from Jakub.

6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.

7) Sleepable local storage, from KP.

8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-31 14:35:40 +00:00
Jakub Kicinski
b6459415b3 net: Don't include filter.h from net/sock.h
sock.h is pretty heavily used (5k objects rebuilt on x86 after
it's touched). We can drop the include of filter.h from it and
add a forward declaration of struct sk_filter instead.
This decreases the number of rebuilt objects when bpf.h
is touched from ~5k to ~1k.

There's a lot of missing includes this was masking. Primarily
in networking tho, this time.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
2021-12-29 08:48:14 -08:00
Jakub Kicinski
7cd2802d74 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16 16:13:19 -08:00
Wei Wang
1db8f5fc2e virtio/vsock: fix the transport to work with VMADDR_CID_ANY
The VMADDR_CID_ANY flag used by a socket means that the socket isn't bound
to any specific CID. For example, a host vsock server may want to be bound
with VMADDR_CID_ANY, so that a guest vsock client can connect to the host
server with CID=VMADDR_CID_HOST (i.e. 2), and meanwhile, a host vsock
client can connect to the same local server with CID=VMADDR_CID_LOCAL
(i.e. 1).

The current implementation sets the destination socket's svm_cid to a
fixed CID value after the first client's connection, which isn't an
expected operation. For example, if the guest client first connects to the
host server, the server's svm_cid gets set to VMADDR_CID_HOST, then other
host clients won't be able to connect to the server anymore.

Reproduce steps:
1. Run the host server:
   socat VSOCK-LISTEN:1234,fork -
2. Run a guest client to connect to the host server:
   socat - VSOCK-CONNECT:2:1234
3. Run a host client to connect to the host server:
   socat - VSOCK-CONNECT:1:1234

Without this patch, step 3. above fails to connect, and socat complains
"socat[1720] E connect(5, AF=40 cid:1 port:1234, 16): Connection
reset by peer".
With this patch, the above works well.

Fixes: c0cfa2d8a7 ("vsock: add multi-transports support")
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Link: https://lore.kernel.org/r/20211126011823.1760-1-wei.w.wang@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2021-12-08 15:41:50 -05:00
Kees Cook
c0e084e342 hv_sock: Extract hvs_send_data() helper that takes only header
When building under -Warray-bounds, the compiler is especially
conservative when faced with casts from a smaller object to a larger
object. While this has found many real bugs, there are some cases that
are currently false positives (like here). With this as one of the last
few instances of the warning in the kernel before -Warray-bounds can be
enabled globally, rearrange the functions so that there is a header-only
version of hvs_send_data(). Silences this warning:

net/vmw_vsock/hyperv_transport.c: In function 'hvs_shutdown_lock_held.constprop':
net/vmw_vsock/hyperv_transport.c:231:32: warning: array subscript 'struct hvs_send_buf[0]' is partly outside array bounds of 'struct vmpipe_proto_header[1]' [-Warray-bounds]
  231 |         send_buf->hdr.pkt_type = 1;
      |         ~~~~~~~~~~~~~~~~~~~~~~~^~~
net/vmw_vsock/hyperv_transport.c:465:36: note: while referencing 'hdr'
  465 |         struct vmpipe_proto_header hdr;
      |                                    ^~~

This change results in no executable instruction differences.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/20211207063217.2591451-1-keescook@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:49:19 -08:00
Eiichi Tsukata
c7cd82b905 vsock: prevent unnecessary refcnt inc for nonblocking connect
Currently vosck_connect() increments sock refcount for nonblocking
socket each time it's called, which can lead to memory leak if
it's called multiple times because connect timeout function decrements
sock refcount only once.

Fixes it by making vsock_connect() return -EALREADY immediately when
sock state is already SS_CONNECTING.

Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-10 14:36:11 +00:00
Richard Palethorpe
4c1e34c0db vsock: Enable y2038 safe timeval for timeout
Reuse the timeval compat code from core/sock to handle 32-bit and
64-bit timeval structures. Also introduce a new socket option define
to allow using y2038 safe timeval under 32-bit.

The existing behavior of sock_set_timeout and vsock's timeout setter
differ when the time value is out of bounds. vsocks current behavior
is retained at the expense of not being able to share the full
implementation.

This allows the LTP test vsock01 to pass under 32-bit compat mode.

Fixes: fe0c72f3db ("socket: move compat timeout handling into sock.c")
Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
Cc: Richard Palethorpe <rpalethorpe@richiejp.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-08 16:21:53 +01:00
Richard Palethorpe
685c3f2fba vsock: Refactor vsock_*_getsockopt to resemble sock_getsockopt
In preparation for sharing the implementation of sock_get_timeout.

Signed-off-by: Richard Palethorpe <rpalethorpe@suse.com>
Cc: Richard Palethorpe <rpalethorpe@richiejp.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-08 16:21:53 +01:00
Arseny Krasnov
8fc92b7c15 af_vsock: rename variables in receive loop
Record is supported via MSG_EOR flag, while current logic operates
with message, so rename variables from 'record' to 'message'.

Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20210903123306.3273757-1-arseny.krasnov@kaspersky.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06 02:25:16 -04:00
Arseny Krasnov
8d5ac871b5 virtio/vsock: support MSG_EOR bit processing
If packet has 'EOR' bit - set MSG_EOR in 'recvmsg()' flags.

Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20210903123251.3273639-1-arseny.krasnov@kaspersky.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-05 16:23:09 -04:00
Arseny Krasnov
9af8f10616 virtio/vsock: rename 'EOR' to 'EOM' bit.
This current implemented bit is used to mark end of messages
('EOM' - end of message), not records('EOR' - end of record).
Also rename 'record' to 'message' in implementation as it is
different things.

Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20210903123109.3273053-1-arseny.krasnov@kaspersky.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-05 16:23:08 -04:00