Commit Graph

92 Commits

Author SHA1 Message Date
Vlad Yasevich 40b8fe45d1 macvtap: Fix race between device delete and open.
In macvtap device delete and open calls can race and
this causes a list curruption of the vlan queue_list.

The race intself is triggered by the idr accessors
that located the vlan device.  The device is stored
into and removed from the idr under both an rtnl and
a mutex.  However, when attempting to locate the device
in idr, only a mutex is taken.  As a result, once cpu
perfoming a delete may take an rtnl and wait for the mutex,
while another cput doing an open() will take the idr
mutex first to fetch the device pointer and later take
an rtnl to add a queue for the device which may have
just gotten deleted.

With this patch, we now hold the rtnl for the duration
of the macvtap_open() call thus making sure that
open will not race with delete.

CC: Michael S. Tsirkin <mst@redhat.com>
CC: Jason Wang <jasowang@redhat.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 15:20:26 -04:00
Vlad Yasevich cbdb04279c mactap: Fix checksum errors for non-gso packets in bridge mode
The following is a problematic configuration:

 VM1: virtio-net device connected to macvtap0@eth0
 VM2: e1000 device connect to macvtap1@eth0

The problem is is that virtio-net supports checksum offloading
and thus sends the packets to the host with CHECKSUM_PARTIAL set.
On the other hand, e1000 does not support any acceleration.

For small TCP packets (and this includes the 3-way handshake),
e1000 ends up receiving packets that only have a partial checksum
set.  This causes TCP to fail checksum validation and to drop
packets.  As a result tcp connections can not be established.

Commit 3e4f8b7873
	macvtap: Perform GSO on forwarding path.
fixes this issue for large packets wthat will end up undergoing GSO.
This commit adds a check for the non-GSO case and attempts to
compute the checksum for partially checksummed packets in the
non-GSO case.

CC: Daniel Lezcano <daniel.lezcano@free.fr>
CC: Patrick McHardy <kaber@trash.net>
CC: Andrian Nord <nightnord@gmail.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Jason Wang <jasowang@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-30 16:12:22 -04:00
Paul Gortmaker a81ab36bf5 drivers/net: delete non-required instances of include <linux/init.h>
None of these files are actually using any __init type directives
and hence don't need to include <linux/init.h>.   Most are just a
left over from __devinit and __cpuinit removal, or simply due to
code getting copied from one driver to the next.

This covers everything under drivers/net except for wireless, which
has been submitted separately.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 11:53:26 -08:00
David S. Miller 143c905494 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/i40e/i40e_main.c
	drivers/net/macvtap.c

Both minor merge hassles, simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18 16:42:06 -05:00
Tom Herbert 3958afa1b2 net: Change skb_get_rxhash to skb_get_hash
Changing name of function as part of making the hash in skbuff to be
generic property, not just for receive path.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 16:36:21 -05:00
Vlad Yasevich 2f6a1b6607 macvlan: Remove custom recieve and forward handlers
Since now macvlan and macvtap use the same receive and
forward handlers, we can remove them completely and use
netif_rx and dev_forward_skb() directly.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12 13:38:39 -05:00
Vlad Yasevich 6acf54f1cf macvtap: Add support of packet capture on macvtap device.
Macvtap device currently doesn not allow a user to capture
traffic on due to the fact that it steals the packets
from the network stack before the skb->dev is set correctly
on the receive side, and that use uses macvlan transmit
path directly on the send side.  As a result, we never
get a change to give traffic to the taps while the correct
device is set in the skb.

This patch makes macvtap device behave almost exaclty like
macvlan.  On the send side, we switch to using dev_queue_xmit().
On the receive side, to deliver packets to macvtap, we now
use netif_rx and dev_forward_skb just like macvlan.  The only
differnce now is that macvtap has its own rx_handler which is
attached to the macvtap netdev.  It is here that we now steal
the packet and provide it to the socket.

As a result, we can now capture traffic on the macvtap device:
   tcpdump -i macvtap0

It also gives us the abilit to add tc actions to the macvtap
device and actually utilize different bandwidth management
queues on output.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-12 13:38:39 -05:00
Jason Wang ce232ce01d macvtap: signal truncated packets
macvtap_put_user() never return a value grater than iov length, this in fact
bypasses the truncated checking in macvtap_recvmsg(). Fix this by always
returning the size of packet plus the possible vlan header to let the trunca
checking work.

Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11 15:23:06 -05:00
David S. Miller bbd37626e6 net: Revert macvtap/tun truncation signalling changes.
Jason Wang and Michael S. Tsirkin are still discussing how
to properly fix this.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 22:10:21 -05:00
Jason Wang 730054da38 macvtap: signal truncated packets
macvtap_put_user() never return a value grater than iov length, this in fact
bypasses the truncated checking in macvtap_recvmsg(). Fix this by always
returning the size of packet plus the possible vlan header to let the truncated
checking work.

Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 22:06:49 -05:00
David S. Miller de2aa4760b Revert "macvtap: remove useless codes in macvtap_aio_read() and macvtap_recvmsg()"
This reverts commit 41e4af69a5.

MSG_TRUNC handling was broken and is going to be fixed in the
'net' tree, so revert this.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10 22:06:18 -05:00
Zhi Yong Wu 41e4af69a5 macvtap: remove useless codes in macvtap_aio_read() and macvtap_recvmsg()
By checking related codes, it is impossible that ret > len or total_len,
so we should remove some useless coeds in both above functions.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:35:42 -05:00
David S. Miller 34f9f43710 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge 'net' into 'net-next' to get the AF_PACKET bug fix that
Daniel's direct transmit changes depend upon.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 20:20:14 -05:00
Zhi Yong Wu 55ec8e25cd macvtap: remove unused parameter in macvtap_do_read()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 15:22:05 -05:00
Zhi Yong Wu 359d44d764 macvtap: remove the dead branch
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 15:22:05 -05:00
Zhi Yong Wu e6ebc7f16c macvtap: update file current position
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 12:42:14 -05:00
Vlad Yasevich 006da7b07b macvtap: Do not double-count received packets
Currently macvlan will count received packets after calling each
vlans receive handler.   Macvtap attempts to count the packet
yet again when the user reads the packet from the tap socket.
This code doesn't do this consistently either.  Remove the
counting from macvtap and let only macvlan count received
packets.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-29 16:07:08 -05:00
Jason Wang cd3e22b75c macvtap: fix tx_dropped counting error
After commit 8ffab51b3d
(macvlan: lockless tx path), tx stat counter were converted to percpu stat
structure. So we need use to this also for tx_dropped in macvtap. Otherwise, the
management won't notice the dropping packet in macvtap tx path.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-28 18:32:22 -05:00
Jason Wang 16a3fa2863 macvtap: limit head length of skb allocated
We currently use hdr_len as a hint of head length which is advertised by
guest. But when guest advertise a very big value, it can lead to an 64K+
allocating of kmalloc() which has a very high possibility of failure when host
memory is fragmented or under heavy stress. The huge hdr_len also reduce the
effect of zerocopy or even disable if a gso skb is linearized in guest.

To solves those issues, this patch introduces an upper limit (PAGE_SIZE) of the
head, which guarantees an order 0 allocation each time.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 16:05:27 -05:00
David S. Miller b05930f5d1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/wireless/iwlwifi/pcie/trans.c
	include/linux/inetdevice.h

The inetdevice.h conflict involves moving the IPV4_DEVCONF values
into a UAPI header, overlapping additions of some new entries.

The iwlwifi conflict is a context overlap.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-26 16:37:08 -04:00
Vlad Yasevich e5733321d5 macvtap: Ignore tap features when VNET_HDR is off
When the user turns off VNET_HDR support on the
macvtap device, there is no way to provide any
offload information to the user.  So, it's safer
to ignore offload setting then depend on the user
setting them correctly.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20 13:09:12 -07:00
Vlad Yasevich e558b0188b macvtap: Correctly set tap features when IFF_VNET_HDR is disabled.
When the user turns off IFF_VNET_HDR flag, attempts to change
offload features via TUNSETOFFLOAD do not work.  This could cause
GSO packets to be delivered to the user when the user is
not prepared to handle them.

To solve, allow processing of TUNSETOFFLOAD when IFF_VNET_HDR is
disabled.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20 13:09:11 -07:00
Vlad Yasevich a567dd6252 macvtap: simplify usage of tap_features
In macvtap, tap_features specific the features of that the user
has specified via ioctl().  If we treat macvtap as a macvlan+tap
then we could all the tap a pseudo-device and give it other features
like SG and GSO.  Then we can stop using the features of lower
device (macvlan) when forwarding the traffic the tap.

This solves the issue of possible checksum offload mismatch between
tap feature and macvlan features.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20 13:09:11 -07:00
David S. Miller 2ff1cf12c9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-08-16 15:37:26 -07:00
Eric Dumazet 29d7919692 macvtap: fix two races
Since commit ac4e4af1e5 ("macvtap: Consistently use rcu functions"),
Thomas gets two different warnings :

BUG: using smp_processor_id() in preemptible [00000000] code: vhost-45891/45892
caller is macvtap_do_read+0x45c/0x600 [macvtap]
CPU: 1 PID: 45892 Comm: vhost-45891 Not tainted 3.11.0-bisecttest #13
Call Trace:
([<00000000001126ee>] show_trace+0x126/0x144)
 [<00000000001127d2>] show_stack+0xc6/0xd4
 [<000000000068bcec>] dump_stack+0x74/0xd8
 [<0000000000481066>] debug_smp_processor_id+0xf6/0x114
 [<000003ff802e9a18>] macvtap_do_read+0x45c/0x600 [macvtap]
 [<000003ff802e9c1c>] macvtap_recvmsg+0x60/0x88 [macvtap]
 [<000003ff80318c5e>] handle_rx+0x5b2/0x800 [vhost_net]
 [<000003ff8028f77c>] vhost_worker+0x15c/0x1c4 [vhost]
 [<000000000015f3ac>] kthread+0xd8/0xe4
 [<00000000006934a6>] kernel_thread_starter+0x6/0xc
 [<00000000006934a0>] kernel_thread_starter+0x0/0xc

And

BUG: using smp_processor_id() in preemptible [00000000] code: vhost-45897/45898
caller is macvlan_start_xmit+0x10a/0x1b4 [macvlan]
CPU: 1 PID: 45898 Comm: vhost-45897 Not tainted 3.11.0-bisecttest #16
Call Trace:
([<00000000001126ee>] show_trace+0x126/0x144)
 [<00000000001127d2>] show_stack+0xc6/0xd4
 [<000000000068bdb8>] dump_stack+0x74/0xd4
 [<0000000000481132>] debug_smp_processor_id+0xf6/0x114
 [<000003ff802b72ca>] macvlan_start_xmit+0x10a/0x1b4 [macvlan]
 [<000003ff802ea69a>] macvtap_get_user+0x982/0xbc4 [macvtap]
 [<000003ff802ea92a>] macvtap_sendmsg+0x4e/0x60 [macvtap]
 [<000003ff8031947c>] handle_tx+0x494/0x5ec [vhost_net]
 [<000003ff8028f77c>] vhost_worker+0x15c/0x1c4 [vhost]
 [<000000000015f3ac>] kthread+0xd8/0xe4
 [<000000000069356e>] kernel_thread_starter+0x6/0xc
 [<0000000000693568>] kernel_thread_starter+0x0/0xc
2 locks held by vhost-45897/45898:
 #0:  (&vq->mutex){+.+.+.}, at: [<000003ff8031903c>] handle_tx+0x54/0x5ec [vhost_net]
 #1:  (rcu_read_lock){.+.+..}, at: [<000003ff802ea53c>] macvtap_get_user+0x824/0xbc4 [macvtap]

In the first case, macvtap_put_user() calls macvlan_count_rx()
in a preempt-able context, and this is not allowed.

In the second case, macvtap_get_user() calls
macvlan_start_xmit() with BH enabled, and this is not allowed.

Reported-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Bisected-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-11 21:48:58 -07:00