Commit Graph

181 Commits

Author SHA1 Message Date
Tom Herbert c8cd0989bd net: Eliminate NETIF_F_GEN_CSUM and NETIF_F_V[46]_CSUM
These netif flags are unnecessary convolutions. It is more
straightforward to just use NETIF_F_HW_CSUM, NETIF_F_IP_CSUM,
and NETIF_F_IPV6_CSUM directly.

This patch also:
    - Cleans up can_checksum_protocol
    - Simplifies netdev_intersect_features

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-15 16:50:20 -05:00
Tom Herbert a188222b6e net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK
The name NETIF_F_ALL_CSUM is a misnomer. This does not correspond to the
set of features for offloading all checksums. This is a mask of the
checksum offload related features bits. It is incorrect to set both
NETIF_F_HW_CSUM and NETIF_F_IP_CSUM or NETIF_F_IPV6 at the same time for
features of a device.

This patch:
  - Changes instances of NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK (where
    NETIF_F_ALL_CSUM is being used as a mask).
  - Changes bonding, sfc/efx, ipvlan, macvlan, vlan, and team drivers to
    use NEITF_F_HW_CSUM in features list instead of NETIF_F_ALL_CSUM.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-15 16:50:08 -05:00
Sabrina Dubroca e639b8d8a7 macvlan: fix leak in macvlan_handle_frame
Reset pskb in macvlan_handle_frame in case skb_share_check returned a
clone.

Fixes: 8a4eb5734e ("net: introduce rx_handler results and logic around that")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17 14:39:29 -05:00
Eric W. Biederman 19bcf9f203 ipv4: Pass struct net into ip_defrag and ip_check_defrag
The function ip_defrag is called on both the input and the output
paths of the networking stack.  In particular conntrack when it is
tracking outbound packets from the local machine calls ip_defrag.

So add a struct net parameter and stop making ip_defrag guess which
network namespace it needs to defragment packets in.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12 19:44:16 -07:00
Toshiaki Makita f56e67b515 macvlan: Don't segment multiple tagged packets on macvlan device
Macvlan/macvtap devices don't need to segment multiple tagged packets
since the lower devices can segment them.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-03 14:24:49 -07:00
Vlad Yasevich efdbd2b30c macvlan: Propagate promiscuity setting to lower devices.
When a macvlan device is placed in promiscuous mode, it currently
just sets it's multicast mask to permissive, but doesn't change
the state of the lower device.  As a result, not all multicast
traffic can be received on such device.  Additionally, none of
a vlan traffic can be received on such device as well.
This patch propagates the promiscuous mode setting to lower device
so that lower device may receive all packets that macvlan may
be interested in.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-04 00:14:13 -04:00
Nicolas Dichtel ef5fa6bc46 macvlan: implement ndo_get_iflink
Don't use dev->iflink anymore.

CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 14:05:00 -04:00
Eric W. Biederman d476059e77 net: Kill dev_rebuild_header
Now that there are no more users kill dev_rebuild_header and all of it's
implementations.

This is long overdue.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 16:43:41 -05:00
Nicolas Dichtel eaca400f1d macvlan: advertise link netns via netlink
Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-23 17:51:15 -08:00
Mahesh Bandewar d6b00fec5d macvlan: play well with ipvlan device
If device is already used as an ipvlan port then refuse to
use it as a macvlan port at early stage of port creation.

	thost1:~# ip link add link eth0 ipvl0 type ipvlan
	thost1:~# echo $?
	0
	thost1:~# ip link add link eth0 mvl0 type macvlan
	RTNETLINK answers: Device or resource busy
	thost1:~# echo $?
	2
	thost1:~#

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 16:10:06 -05:00
Michal Kubeček 62dbe83015 macvlan: allow setting LRO independently of lower device
Since commit fbe168ba91 ("net: generic dev_disable_lro() stacked
device handling"), dev_disable_lro() zeroes NETIF_F_LRO feature flag
first for a macvlan device and then for its lower device. As an attempt
to set NETIF_F_LRO to zero is ignored, dev_disable_lro() issues a
warning and taints kernel.

Allowing NETIF_F_LRO to be set independently of the lower device
consists of three parts:

  - add the flag to hw_features to allow toggling it
  - allow setting it to 0 even if lower device has the flag set
  - add the flag to MACVLAN_FEATURES to restore copying from lower
    device on macvlan creation

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 16:00:22 -05:00
Jiri Pirko f6f6424ba7 net: make vid as a parameter for ndo_fdb_add/ndo_fdb_del
Do the work of parsing NDA_VLAN directly in rtnetlink code, pass simple
u16 vid to drivers from there.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:18 -08:00
Jason Wang a523a5ecc8 macvlan: delay the header check for dodgy packets into lower device
We do header check twice for a dodgy packet. One is done before
macvlan_start_xmit(), another is done before lower device's
ndo_start_xmit(). The first one seems redundant so this patch tries to
delay header check until a packet reaches its lower device (or macvtap)
through always enabling NETIF_F_GSO_ROBUST for macvlan device.

Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-29 20:44:27 -08:00
Eric Dumazet fe0ca7328d macvlan: fix a race on port dismantle and possible skb leaks
We need to cancel the work queue after rcu grace period,
otherwise it can be rescheduled by incoming packets.

We need to purge queue if some skbs are still in it.

We can use __skb_queue_head_init() variant in
macvlan_process_broadcast()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 412ca1550c ("macvlan: Move broadcasts into a work queue")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-25 16:24:02 -04:00
jbaron@akamai.com d1dd911930 macvlan: optimize the receive path
The netif_rx() call on the fast path of macvlan_handle_frame() appears to
be there to ensure that we properly throttle incoming packets. However, it
would appear as though the proper throttling is already in place for all
possible ingress paths, and that the call is redundant. If packets are arriving
from the physical NIC, we've already throttled them by this point. Otherwise,
if they are coming via macvlan_queue_xmit(), it calls either
'dev_forward_skb()', which ends up calling netif_rx_internal(), or else in
the broadcast case, we are throttling via macvlan_broadcast_enqueue().

The test results below are from off the box to an lxc instance running macvlan.
Once the tranactions/sec stop increasing, the cpu idle time has gone to 0.
Results are from a quad core Intel E3-1270 V2@3.50GHz box with bnx2x 10G card.

for i in {10,100,200,300,400,500};
do super_netperf $i -H $ip -t TCP_RR; done
Average of 5 runs.

trans/sec 		 trans/sec
(3.17-rc7-net-next)      (3.17-rc7-net-next + this patch)
----------               ----------
208101                   211534 (+1.6%)
839493                   850162 (+1.3%)
845071                   844053 (-.12%)
816330                   819623 (+.4%)
778700                   789938 (+1.4%)
735984                   754408 (+2.5%)

Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-10 15:09:47 -04:00
jbaron@akamai.com 4c9799359b macvlan: pass 'bool' type to macvlan_count_rx()
Pass last argument to macvlan_count_rx() as the correct bool type.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-10 15:09:47 -04:00
Eric Dumazet 0287587884 net: better IFF_XMIT_DST_RELEASE support
Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.

Current handling of IFF_XMIT_DST_RELEASE is not optimal.

Dropping dst in validate_xmit_skb() is certainly too late in case
packet was queued by cpu X but dequeued by cpu Y

The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking qdisc lock.

As Julian Anastasov pointed out, need for skb_dst() might come from some
packet schedulers or classifiers.

This patch adds new helper to cleanly express needs of various drivers
or qdiscs/classifiers.

Drivers that need skb_dst() in their ndo_start_xmit() should call
following helper in their setup instead of the prior :

	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
	netif_keep_dst(dev);

Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers.

The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
rebuilt in bonding/team. Eventually, we could add something
smarter later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 13:22:11 -04:00
Michael Braun 79cf79abce macvlan: add source mode
This patch adds a new mode of operation to macvlan, called "source".
It allows one to set a list of allowed mac address, which is used
to match against source mac address from received frames on underlying
interface.
This enables creating mac based VLAN associations, instead of standard
port or tag based. The feature is useful to deploy 802.1x mac based
behavior, where drivers of underlying interfaces doesn't allows that.

Configuration is done through the netlink interface using e.g.:
 ip link add link eth0 name macvlan0 type macvlan mode source
 ip link add link eth0 name macvlan1 type macvlan mode source
 ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11
 ip link set link dev macvlan0 type macvlan macaddr add 00:22:22:22:22:22
 ip link set link dev macvlan0 type macvlan macaddr add 00:33:33:33:33:33
 ip link set link dev macvlan1 type macvlan macaddr add 00:33:33:33:33:33
 ip link set link dev macvlan1 type macvlan macaddr add 00:44:44:44:44:44

This allows clients with MAC addresses 00:11:11:11:11:11,
00:22:22:22:22:22 to be part of only VLAN associated with macvlan0
interface. Clients with MAC addresses 00:44:44:44:44:44 with only VLAN
associated with macvlan1 interface. And client with MAC address
00:33:33:33:33:33 to be associated with both VLANs.

Based on work of Stefan Gula <steweg@gmail.com>

v8: last version of Stefan Gula for Kernel 3.2.1
v9: rework onto linux-next 2014-03-12 by Michael Braun
    add MACADDR_SET command, enable to configure mac for source mode
    while creating interface
v10:
  - reduce indention level
  - rename source_list to source_entry
  - use aligned 64bit ether address
  - use hash_64 instead of addr[5]
v11:
  - rebase for 3.14 / linux-next 20.04.2014
v12
  - rebase for linux-next 2014-09-25

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 15:37:01 -04:00
Nicolas Dichtel 07d92d5cc9 macvlan: allow to enqueue broadcast pkt on virtual device
Since commit 412ca1550c ("macvlan: Move broadcasts into a work queue"), the
driver uses tx_queue_len of the master device as the limit of packets enqueuing.
Problem is that virtual drivers have this value set to 0, thus all broadcast
packets were rejected.
Because tx_queue_len was arbitrarily chosen, I replace it with a static limit
of 1000 (also arbitrarily chosen).

CC: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: Thibaut Collet <thibaut.collet@6wind.com>
Suggested-by: Thibaut Collet <thibaut.collet@6wind.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 14:10:07 -04:00
Francesco Ruggeri 0d0162e7a3 net: allow macvlans to move to net namespace
I cannot move a macvlan interface created on top of a bonding interface
to a different namespace:

% ip netns add dummy0
% ip link add link bond0 mac0 type macvlan
% ip link set mac0 netns dummy0
RTNETLINK answers: Invalid argument
%

The problem seems to be that commit f939981492 ("bonding: Don't allow
bond devices to change network namespaces.") sets NETIF_F_NETNS_LOCAL
on bonding interfaces, and commit 797f87f83b ("macvlan: fix netdev
feature propagation from lower device") causes macvlan interfaces
to inherit its features from the lower device.

NETIF_F_NETNS_LOCAL should not be inherited from the lower device
by a macvlan.
Patch tested on 3.16.

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:07:20 -04:00
Vlad Yasevich 8a50f11c3b macvlan: Allow setting multicast filter on all macvlan types
Currently, macvlan code restricts multicast and unicast
filter setting only to passthru devices.  As a result,
if a guest using macvtap wants to receive multicast
traffic, it has to set IFF_ALLMULTI or IFF_PROMISC.

This patch makes it possible to use the fdb interface
to add multicast addresses to the filter thus allowing
a guest to receive only targeted multicast traffic.

CC: John Fastabend <john.r.fastabend@intel.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Jason Wang <jasowang@redhat.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-21 16:54:25 -07:00
David S. Miller 5e3c516b51 Revert "macvlan: simplify the structure port"
This reverts commit a188a54d11.

It causes crashes

====================
[   80.643286] BUG: unable to handle kernel NULL pointer dereference at 0000000000000878
[   80.670103] IP: [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0
[   80.691289] PGD 22c102067 PUD 235bf0067 PMD 0
[   80.706611] Oops: 0002 [#1] SMP
[   80.717836] Modules linked in: macvlan nfsd lockd nfs_acl exportfs auth_rpcgss sunrpc oid_registry ioatdma ixgbe(-) mdio igb dca
[   80.757935] CPU: 37 PID: 6724 Comm: rmmod Not tainted 3.16.0-net-next-08-12-2014-FCoE+ #1
[   80.785688] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
[   80.820310] task: ffff880235a9eae0 ti: ffff88022e844000 task.ti: ffff88022e844000
[   80.845770] RIP: 0010:[<ffffffff810832e4>]  [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0
[   80.875326] RSP: 0018:ffff88022e847b28  EFLAGS: 00010046
[   80.893251] RAX: 0000000000037a6a RBX: 0000000000000878 RCX: 0000000000000000
[   80.917187] RDX: ffff880235a9eae0 RSI: 0000000000000001 RDI: ffffffff810832db
[   80.941125] RBP: ffff88022e847b58 R08: 0000000000000000 R09: 0000000000000000
[   80.965056] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022e847b70
[   80.988994] R13: 0000000000000000 R14: ffff88022e847be8 R15: ffffffff81ebe440
[   81.012929] FS:  00007fab90b07700(0000) GS:ffff88043f7a0000(0000) knlGS:0000000000000000
[   81.040400] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   81.059757] CR2: 0000000000000878 CR3: 0000000235a42000 CR4: 00000000001407e0
[   81.083689] Stack:
[   81.090739]  ffff880235a9eae0 0000000000000878 ffff88022e847b70 0000000000000000
[   81.116253]  ffff88022e847be8 ffffffff81ebe440 ffff88022e847b98 ffffffff810847f1
[   81.141766]  ffff88022e847b78 0000000000000286 ffff880234200000 0000000000000000
[   81.167282] Call Trace:
[   81.175768]  [<ffffffff810847f1>] __cancel_work_timer+0x31/0x170
[   81.195985]  [<ffffffff8108494b>] cancel_work_sync+0xb/0x10
[   81.214769]  [<ffffffffa015ae68>] macvlan_port_destroy+0x28/0x60 [macvlan]
[   81.237844]  [<ffffffffa015b930>] macvlan_uninit+0x40/0x50 [macvlan]
[   81.259209]  [<ffffffff816bf6e2>] rollback_registered_many+0x1a2/0x2c0
[   81.281140]  [<ffffffff816bf81a>] unregister_netdevice_many+0x1a/0xb0
[   81.302786]  [<ffffffffa015a4ff>] macvlan_device_event+0x1ef/0x240 [macvlan]
[   81.326439]  [<ffffffff8108a13d>] notifier_call_chain+0x4d/0x70
[   81.346366]  [<ffffffff8108a201>] raw_notifier_call_chain+0x11/0x20
[   81.367439]  [<ffffffff816bf25b>] call_netdevice_notifiers_info+0x3b/0x70
[   81.390228]  [<ffffffff816bf2a1>] call_netdevice_notifiers+0x11/0x20
[   81.411587]  [<ffffffff816bf6bd>] rollback_registered_many+0x17d/0x2c0
[   81.433518]  [<ffffffff816bf925>] unregister_netdevice_queue+0x75/0x110
[   81.455735]  [<ffffffff816bfb2b>] unregister_netdev+0x1b/0x30
[   81.475094]  [<ffffffffa0039b50>] ixgbe_remove+0x170/0x1d0 [ixgbe]
[   81.495886]  [<ffffffff813512a2>] pci_device_remove+0x32/0x60
[   81.515246]  [<ffffffff814c75c4>] __device_release_driver+0x64/0xd0
[   81.536321]  [<ffffffff814c76f8>] driver_detach+0xc8/0xd0
[   81.554530]  [<ffffffff814c656e>] bus_remove_driver+0x4e/0xa0
[   81.573888]  [<ffffffff814c828b>] driver_unregister+0x2b/0x60
[   81.593246]  [<ffffffff8135143e>] pci_unregister_driver+0x1e/0xa0
[   81.613749]  [<ffffffffa005db18>] ixgbe_exit_module+0x1c/0x2e [ixgbe]
[   81.635401]  [<ffffffff810e738b>] SyS_delete_module+0x15b/0x1e0
[   81.655334]  [<ffffffff8187a395>] ? sysret_check+0x22/0x5d
[   81.673833]  [<ffffffff810abd2d>] ? trace_hardirqs_on_caller+0x11d/0x1e0
[   81.696339]  [<ffffffff8132bfde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   81.717985]  [<ffffffff8187a369>] system_call_fastpath+0x16/0x1b
[   81.738199] Code: 00 48 83 3d 6e bb da 00 00 48 89 c2 0f 84 67 01 00 00 fa 66 0f 1f 44 00 00 49 89 14 24 e8 b5 4b 02 00 45 84 ed 0f 85 ac 00 00 00 <f0> 0f ba 2b 00 72 1d 31 c0 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8
[   81.807026] RIP  [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0
[   81.828468]  RSP <ffff88022e847b28>
[   81.840384] CR2: 0000000000000878
[   81.851731] ---[ end trace 9f6c7232e3464e11 ]---
====================

This bug could be triggered by these steps:

modprobe ixgbe ; modprobe macvlan
ip link add link p96p1 address 00:1B:21:6E:06:00 macvlan0 type macvlan
ip link add link p96p1 address 00:1B:21:6E:06:01 macvlan1 type macvlan
ip link add link p96p1 address 00:1B:21:6E:06:02 macvlan2 type macvlan
ip link add link p96p1 address 00:1B:21:6E:06:03 macvlan3 type macvlan
rmmod ixgbe

Reported-by: "Keller, Jacob E" <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-14 14:32:49 -07:00
Vlad Yasevich 081e83a78d macvlan: Initialize vlan_features to turn on offload support.
Macvlan devices do not initialize vlan_features.  As a result,
any vlan devices configured on top of macvlans perform very poorly.
Initialize vlan_features based on the vlan features of the lower-level
device.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31 22:10:01 -07:00
David S. Miller 902455e007 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/core/rtnetlink.c
	net/core/skbuff.c

Both conflicts were very simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 16:02:55 -07:00
Eric Dumazet 87757a917b net: force a list_del() in unregister_netdevice_many()
unregister_netdevice_many() API is error prone and we had too
many bugs because of dangling LIST_HEAD on stacks.

See commit f87e6f4793 ("net: dont leave active on stack LIST_HEAD")

In fact, instead of making sure no caller leaves an active list_head,
just force a list_del() in the callee. No one seems to need to access
the list after unregister_netdevice_many()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-08 14:15:14 -07:00