Commit Graph

345 Commits

Author SHA1 Message Date
Jason Wang 92502fe86c virtio-net: forbid XDP when VIRTIO_NET_F_GUEST_UFO is support
When VIRTIO_NET_F_GUEST_UFO is negotiated, host could still send UFO
packet that exceeds a single page which could not be handled
correctly by XDP. So this patch forbids setting XDP when GUEST_UFO is
supported. While at it, forbid XDP for ECN (which comes only from GRO)
too to prevent user from misconfiguration.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 13:48:55 -05:00
Jason Wang 5c33474d41 virtio-net: make rx buf size estimation works for XDP
We don't update ewma rx buf size in the case of XDP. This will lead
underestimation of rx buf size which causes host to produce more than
one buffers. This will greatly increase the possibility of XDP page
linearization.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 13:48:55 -05:00
Jason Wang b00f70b0da virtio-net: unbreak csumed packets for XDP_PASS
We drop csumed packet when do XDP for packets. This breaks
XDP_PASS when GUEST_CSUM is supported. Fix this by allowing csum flag
to be set. With this patch, simple TCP works for XDP_PASS.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 13:48:54 -05:00
Jason Wang 1830f8935f virtio-net: correctly handle XDP_PASS for linearized packets
When XDP_PASS were determined for linearized packets, we try to get
new buffers in the virtqueue and build skbs from them. This is wrong,
we should create skbs based on existed buffers instead. Fixing them by
creating skb based on xdp_page.

With this patch "ping 192.168.100.4 -s 3900 -M do" works for XDP_PASS.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 13:48:54 -05:00
Jason Wang 56a86f84b8 virtio-net: fix page miscount during XDP linearizing
We don't put page during linearizing, the would cause leaking when
xmit through XDP_TX or the packet exceeds PAGE_SIZE. Fix them by
put page accordingly. Also decrease the number of buffers during
linearizing to make sure caller can free buffers correctly when packet
exceeds PAGE_SIZE. With this patch, we won't get OOM after linearize
huge number of packets.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 13:48:54 -05:00
Jason Wang 275be061b3 virtio-net: correctly xmit linearized page on XDP_TX
After we linearize page, we should xmit this page instead of the page
of first buffer which may lead unexpected result. With this patch, we
can see correct packet during XDP_TX.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 13:48:54 -05:00
Jason Wang 73b62bd085 virtio-net: remove the warning before XDP linearizing
Since we use EWMA to estimate the size of rx buffer. When rx buffer
size is underestimated, it's usual to have a packet with more than one
buffers. Consider this is not a bug, remove the warning and correct
the comment before XDP linearizing.

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 13:48:53 -05:00
John Fastabend 72979a6c35 virtio_net: xdp, add slowpath case for non contiguous buffers
virtio_net XDP support expects receive buffers to be contiguous.
If this is not the case we enable a slowpath to allow connectivity
to continue but at a significan performance overhead associated with
linearizing data. To make it painfully aware to users that XDP is
running in a degraded mode we throw an xdp buffer error.

To linearize packets we allocate a page and copy the segments of
the data, including the header, into it. After this the page can be
handled by XDP code flow as normal.

Then depending on the return code the page is either freed or sent
to the XDP xmit path. There is no attempt to optimize this path.

This case is being handled simple as a precaution in case some
unknown backend were to generate packets in this form. To test this
I had to hack qemu and force it to generate these packets. I do not
expect this case to be generated by "real" backends.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:48:55 -05:00
John Fastabend 56434a01b1 virtio_net: add XDP_TX support
This adds support for the XDP_TX action to virtio_net. When an XDP
program is run and returns the XDP_TX action the virtio_net XDP
implementation will transmit the packet on a TX queue that aligns
with the current CPU that the XDP packet was processed on.

Before sending the packet the header is zeroed.  Also XDP is expected
to handle checksum correctly so no checksum offload  support is
provided.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:48:55 -05:00
John Fastabend 672aafd5d8 virtio_net: add dedicated XDP transmit queues
XDP requires using isolated transmit queues to avoid interference
with normal networking stack (BQL, NETDEV_TX_BUSY, etc). This patch
adds a XDP queue per cpu when a XDP program is loaded and does not
expose the queues to the OS via the normal API call to
netif_set_real_num_tx_queues(). This way the stack will never push
an skb to these queues.

However virtio/vhost/qemu implementation only allows for creating
TX/RX queue pairs at this time so creating only TX queues was not
possible. And because the associated RX queues are being created I
went ahead and exposed these to the stack and let the backend use
them. This creates more RX queues visible to the network stack than
TX queues which is worth mentioning but does not cause any issues as
far as I can tell.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:48:55 -05:00
John Fastabend f600b69050 virtio_net: Add XDP support
This adds XDP support to virtio_net. Some requirements must be
met for XDP to be enabled depending on the mode. First it will
only be supported with LRO disabled so that data is not pushed
across multiple buffers. Second the MTU must be less than a page
size to avoid having to handle XDP across multiple pages.

If mergeable receive is enabled this patch only supports the case
where header and data are in the same buf which we can check when
a packet is received by looking at num_buf. If the num_buf is
greater than 1 and a XDP program is loaded the packet is dropped
and a warning is thrown. When any_header_sg is set this does not
happen and both header and data is put in a single buffer as expected
so we check this when XDP programs are loaded.  Subsequent patches
will process the packet in a degraded mode to ensure connectivity
and correctness is not lost even if backend pushes packets into
multiple buffers.

If big packets mode is enabled and MTU/LRO conditions above are
met then XDP is allowed.

This patch was tested with qemu with vhost=on and vhost=off where
mergeable and big_packet modes were forced via hard coding feature
negotiation. Multiple buffers per packet was forced via a small
test patch to vhost.c in the vhost=on qemu mode.

Suggested-by: Shrijeet Mukherjee <shrijeet@gmail.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:48:55 -05:00
Jason Wang a220871be6 virtio-net: correctly enable multiqueue
Commit 4490001029 ("virtio-net: enable
multiqueue by default") blindly set the affinity instead of queues
during probe which can cause a mismatch of #queues between guest and
host. This patch fixes it by setting queues.

Reported-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Theodore Ts'o <tytso@mit.edu>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Fixes: 49000102901 ("virtio-net: enable multiqueue by default")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-13 10:37:38 -05:00
David S. Miller c63d352f05 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-06 21:33:19 -05:00
Andy Lutomirski e37e2ff350 virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address()
With CONFIG_VMAP_STACK=y, virtnet_set_mac_address() can be passed a
pointer to the stack and it will OOPS.  Copy the address to the heap
to prevent the crash.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Laura Abbott <labbott@redhat.com>
Reported-by: zbyszek@in.waw.pl
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-06 11:38:43 -05:00
Jason Wang 4490001029 virtio-net: enable multiqueue by default
We use single queue even if multiqueue is enabled and let admin to
enable it through ethtool later. This is used to avoid possible
regression (small packet TCP stream transmission). But looks like an
overkill since:

- single queue user can disable multiqueue when launching qemu
- brings extra troubles for the management since it needs extra admin
  tool in guest to enable multiqueue
- multiqueue performs much better than single queue in most of the
  cases

So this patch enables multiqueue by default: if #queues is less than or
equal to #vcpu, enable as much as queue pairs; if #queues is greater
than #vcpu, enable #vcpu queue pairs.

Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: Jeremy Eder <jeder@redhat.com>
Cc: Marko Myllynen <myllynen@redhat.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-28 13:17:40 -05:00
David S. Miller f9aa9dc7d2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All conflicts were simple overlapping changes except perhaps
for the Thunder driver.

That driver has a change_mtu method explicitly for sending
a message to the hardware.  If that fails it returns an
error.

Normally a driver doesn't need an ndo_change_mtu method becuase those
are usually just range changes, which are now handled generically.
But since this extra operation is needed in the Thunder driver, it has
to stay.

However, if the message send fails we have to restore the original
MTU before the change because the entire call chain expects that if
an error is thrown by ndo_change_mtu then the MTU did not change.
Therefore code is added to nicvf_change_mtu to remember the original
MTU, and to restore it upon nicvf_update_hw_max_frs() failue.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-22 13:27:16 -05:00
Eric Dumazet 963abe5c8a virtio-net: add a missing synchronize_net()
It seems many drivers do not respect napi_hash_del() contract.

When napi_hash_del() is used before netif_napi_del(), an RCU grace
period is needed before freeing NAPI object.

Fixes: 91815639d8 ("virtio-net: rx busy polling support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-16 15:09:29 -05:00
David S. Miller bb598c1b8c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of bug fixes in 'net' overlapping other changes in
'net-next-.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 10:54:36 -05:00
Michael S. Tsirkin f3358507c1 virtio-net: drop legacy features in virtio 1 mode
Virtio 1.0 spec says VIRTIO_F_ANY_LAYOUT and VIRTIO_NET_F_GSO are
legacy-only feature bits. Do not negotiate them in virtio 1 mode.  Note
this is a spec violation so we need to backport it to stable/downstream
kernels.

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-07 20:35:46 -05:00
Aaron Conole 93a205ee98 virtio-net: Update the mtu code to match virtio spec
The virtio committee recently ratified a change, VIRTIO-152, which
defines the mtu field to be 'max' MTU, not simply desired MTU.

This commit brings the virtio-net device in compliance with VIRTIO-152.

Additionally, drop the max_mtu branch - it cannot be taken since the u16
returned by virtio_cread16 will never exceed the initial value of
max_mtu.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Acked-by: "Michael S. Tsirkin" <mst@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 11:58:51 -04:00
Jarod Wilson d0c2c9973e net: use core MTU range checking in virt drivers
hyperv_net:
- set min/max_mtu, per Haiyang, after rndis_filter_device_add

virtio_net:
- set min/max_mtu
- remove virtnet_change_mtu

vmxnet3:
- set min/max_mtu

xen-netback:
- min_mtu = 0, max_mtu = 65517

xen-netfront:
- min_mtu = 0, max_mtu = 65535

unisys/visor:
- clean up defines a little to not clash with network core or add
  redundat definitions

CC: netdev@vger.kernel.org
CC: virtualization@lists.linux-foundation.org
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Paul Durrant <paul.durrant@citrix.com>
CC: David Kershner <david.kershner@unisys.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:51:09 -04:00
Sebastian Andrzej Siewior 8017c27919 net/virtio-net: Convert to hotplug state machine
Install the callbacks via the state machine.

The driver supports multiple instances and therefore the new
cpuhp_state_add_instance_nocalls() infrastrucure is used. The driver
currently uses get_online_cpus() to avoid missing a CPU hotplug event while
invoking virtnet_set_affinity(). This could be avoided by using
cpuhp_state_add_instance() variant which holds the hotplug lock and invokes
callback during registration. This is more or less a 1:1 conversion of the
current code.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: netdev@vger.kernel.org
Cc: Will Deacon <will.deacon@arm.com>
Cc: virtualization@lists.linux-foundation.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/1471024183-12666-7-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-02 20:05:06 +02:00
Andy Lutomirski a725ee3e44 virtio-net: Remove more stack DMA
VLAN and MQ control was doing DMA from the stack.  Fix it.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 19:25:43 -07:00
Mike Rapoport d1dc06dcd0 virtio_net: fix csum generation for virtio-net devices
The commit e858fae2b0 ("virtio_net: use common code for virtio_net_hdr
and skb GSO conversion") replaced the tun code for header manipulation
with the generic helpers. While doing so, it implictly moved the
skb_partial_csum_set() invocation after eth_type_trans(), which
invalidate the current gso start/offset values.
Fix it by moving the helper invocation before the mac pulling.

Fixes: e858fae2b0 ("virtio_net: use common code for virtio_net_hdr and
skb GSO conversion")

Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-14 03:03:34 -04:00
Mike Rapoport e858fae2b0 virtio_net: use common code for virtio_net_hdr and skb GSO conversion
Replace open coded conversion between virtio_net_hdr to skb GSO info with
virtio_net_hdr_{from,to}_skb

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-10 23:03:55 -07:00