commit d631b94e7a
virtio: change comment in transmit
started clarifying the logic behind queue state management,
but introduced an inaccuracy: TX_BUSY does not cause
a BUG message.
Clean this up some more, explaining the tradeoffs in detail.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
received is 0, no need to minus it and use "+=" to reassign it
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The original comment was not really informative or funny
as well as sexist. Replace it with a better explanation of
why the driver does stop and what the impacts are.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't delete napi from hash list during module exit. This will
cause the following panic when doing module load and unload:
BUG: unable to handle kernel paging request at 0000004e00000075
IP: [<ffffffff816bd01b>] napi_hash_add+0x6b/0xf0
PGD 3c5d5067 PUD 0
Oops: 0000 [#1] SMP
...
Call Trace:
[<ffffffffa0a5bfb7>] init_vqs+0x107/0x490 [virtio_net]
[<ffffffffa0a5c9f2>] virtnet_probe+0x562/0x791815639d880be [virtio_net]
[<ffffffff8139e667>] virtio_dev_probe+0x137/0x200
[<ffffffff814c7f2a>] driver_probe_device+0x7a/0x250
[<ffffffff814c81d3>] __driver_attach+0x93/0xa0
[<ffffffff814c8140>] ? __device_attach+0x40/0x40
[<ffffffff814c6053>] bus_for_each_dev+0x63/0xa0
[<ffffffff814c7a79>] driver_attach+0x19/0x20
[<ffffffff814c76f0>] bus_add_driver+0x170/0x220
[<ffffffffa0a60000>] ? 0xffffffffa0a60000
[<ffffffff814c894f>] driver_register+0x5f/0xf0
[<ffffffff8139e41b>] register_virtio_driver+0x1b/0x30
[<ffffffffa0a60010>] virtio_net_driver_init+0x10/0x12 [virtio_net]
This patch fixes this by doing this in virtnet_free_queues(). And also
don't delete napi in virtnet_freeze() since it will call
virtnet_free_queues() which has already did this.
Fixes 91815639d8 ("virtio-net: rx busy polling support")
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull virtio updates from Rusty Russell:
"OK, this has the big virtio 1.0 implementation, as specified by OASIS.
On top of tht is the major rework of lguest, to use PCI and virtio
1.0, to double-check the implementation.
Then comes the inevitable fixes and cleanups from that work"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (80 commits)
virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice.
virtio_net: unconditionally define struct virtio_net_hdr_v1.
tools/lguest: don't use legacy definitions for net device in example launcher.
virtio: Don't expose legacy net features when VIRTIO_NET_NO_LEGACY defined.
tools/lguest: use common error macros in the example launcher.
tools/lguest: give virtqueues names for better error messages
tools/lguest: more documentation and checking of virtio 1.0 compliance.
lguest: don't look in console features to find emerg_wr.
tools/lguest: don't start devices until DRIVER_OK status set.
tools/lguest: handle indirect partway through chain.
tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI)
tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI)
tools/lguest: rename virtio_pci_cfg_cap field to match spec.
tools/lguest: fix features_accepted logic in example launcher.
tools/lguest: handle device reset correctly in example launcher.
virtual: Documentation: simplify and generalize paravirt_ops.txt
lguest: remove NOTIFY call and eventfd facility.
lguest: remove NOTIFY facility from demonstration launcher.
lguest: use the PCI console device's emerg_wr for early boot messages.
lguest: always put console in PCI slot #1.
...
Conflicts:
drivers/net/vxlan.c
drivers/vhost/net.c
include/linux/if_vlan.h
net/core/dev.c
The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.
In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.
In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.
In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 3d0ad09412.
Now that GSO functionality can correctly track if the fragment
id has been selected and select a fragment id if necessary,
we can re-enable UFO on tap/macvap and virtio devices.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/net needs config space access so make it
fail gracefully if not there.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There's no need to do header check for virtio-net since:
- Host sets dodgy for all gso packets from guest and check the header.
- Host should be prepared for all kinds of evil packets from guest, since
malicious guest can send any kinds of packet.
So this patch sets NETIF_F_GSO_ROBUST for virtio-net to skip the check.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit d75b1ade56 (net: less
interrupt masking in NAPI) breaks virtio_net in an insidious way.
It is now required that if the entire budget is consumed when poll
returns, the napi poll_list must remain empty. However, like some
other drivers virtio_net tries to do a last-ditch check and if
there is more work it will call napi_schedule and then immediately
process some of this new work. Should the entire budget be consumed
while processing such new work then we will violate the new caller
contract.
This patch fixes this by not touching any work when we reschedule
in virtio_net.
The worst part of this bug is that the list corruption causes other
napi users to be moved off-list. In my case I was chasing a stall
in IPsec (IPsec uses netif_rx) and I only belatedly realised that it
was virtio_net which caused the stall even though the virtio_net
poll was still functioning perfectly after IPsec stalled.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The spec states that mac in config space is only driver-writable in the
legacy case. Fence writing it in virtnet_set_mac_address() in the
virtio 1.0 case.
Suggested-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
With VERSION_1 virtio_net uses same header size
whether mergeable buffers are enabled or not.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Our buffer length check is not strict enough for mergeable
buffers: buffer can still be shorter that header + address
by 2 bytes.
Fix that up.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
virtio 1.0 doesn't use virtio_net_hdr anymore, and in fact, it's not
really useful since virtio_net_hdr_mrg_rxbuf includes that as the first
field anyway.
Let's drop it, precalculate header len and store within vi instead.
This way we can also remove struct skb_vnet_hdr.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Too many places poke at [rs]q->vq->vdev->priv just to get
the vi structure. Let's just pass the pointer around: seems
cleaner, and might even be faster.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Based on patches by Rusty Russell, Cornelia Huck.
Note: more code changes are needed for 1.0 support
(due to different header size).
So we don't advertize support for 1.0 yet.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
IPv6 does not allow fragmentation by routers, so there is no
fragmentation ID in the fixed header. UFO for IPv6 requires the ID to
be passed separately, but there is no provision for this in the virtio
net protocol.
Until recently our software implementation of UFO/IPv6 generated a new
ID, but this was a bug. Now we will use ID=0 for any UFO/IPv6 packet
passed through a tap, which is even worse.
Unfortunately there is no distinction between UFO/IPv4 and v6
features, so disable UFO on taps and virtio_net completely until we
have a proper solution.
We cannot depend on VM managers respecting the tap feature flags, so
keep accepting UFO packets but log a warning the first time we do
this.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf46d ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull virtio updates from Rusty Russell:
"One cc: stable commit, the rest are a series of minor cleanups which
have been sitting in MST's tree during my vacation. I changed a
function name and made one trivial change, then they spent two days in
linux-next"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits)
virtio-rng: refactor probe error handling
virtio_scsi: drop scan callback
virtio_balloon: enable VQs early on restore
virtio_scsi: fix race on device removal
virito_scsi: use freezable WQ for events
virtio_net: enable VQs early on restore
virtio_console: enable VQs early on restore
virtio_scsi: enable VQs early on restore
virtio_blk: enable VQs early on restore
virtio_scsi: move kick event out from virtscsi_init
virtio_net: fix use after free on allocation failure
9p/trans_virtio: enable VQs early
virtio_console: enable VQs early
virtio_blk: enable VQs early
virtio_net: enable VQs early
virtio: add API to enable VQs early
virtio_net: minor cleanup
virtio-net: drop config_mutex
virtio_net: drop config_enable
virtio-blk: drop config_mutex
...
commit 0b725a2ca6
net: Remove ndo_xmit_flush netdev operation, use signalling instead.
added code that looks at skb->xmit_more after the skb has
been put in TX VQ. Since some paths process the ring and free the skb
immediately, this can cause use after free.
Fix by storing xmit_more in a local variable.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after restore returns, virtio net violated this
rule by using receive VQs within restore.
To fix, call virtio_device_ready before using VQs.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In the extremely unlikely event that driver initialization fails after
RX buffers are added, virtio net frees RX buffers while VQs are
still active, potentially causing device to use a freed buffer.
To fix, reset device first - same as we do on device removal.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>