Commit Graph

533526 Commits

Author SHA1 Message Date
Jon Paul Maloy
d39bbd445d tipc: move link input queue to tipc_node
At present, the link input queue and the name distributor receive
queues are fields aggregated in struct tipc_link. This is a hazard,
because a link might be deleted while a receiving socket still keeps
reference to one of the queues.

This commit fixes this bug. However, rather than adding yet another
reference counter to the critical data path, we move the two queues
to safe ground inside struct tipc_node, which is already protected, and
let the link code only handle references to the queues. This is also
in line with planned later changes in this area.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 20:41:14 -07:00
Jon Paul Maloy
d3a43b907a tipc: move link creation from neighbor discoverer to node
As a step towards turning links into node internal entities, we move the
creation of links from the neighbor discovery logics to the node's link
control logics.

We also create an additional entry for the link's media address in the
newly introduced struct tipc_link_entry, since this is where it is
needed in the upcoming commits. The current copy in struct tipc_link
is kept for now, but will be removed later.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 20:41:14 -07:00
Jon Paul Maloy
9d13ec65ed tipc: introduce link entry structure to struct tipc_node
struct 'tipc_node' currently contains two arrays for link attributes,
one for the link pointers, and one for the usable link MTUs.

We now group those into a new struct 'tipc_link_entry', and intoduce
one single array consisting of such enties. Apart from being a cosmetic
improvement, this is a starting point for the strict master-slave
relation between node and link that we will introduce in the following
commits.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 20:41:14 -07:00
Jiri Benc
6acc232660 net: remove skb_frag_add_head
It's not used anywhere.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 20:38:49 -07:00
David S. Miller
bd265242c1 Merge branch 'offload_fwd_mark'
Scott Feldman says:

====================
switchdev: avoid duplicate packet forwarding

v3:

 - Per Nicolas Dichtel review: remove errant empty union.

v2:

 - Per davem review: in sk_buff, union fwd_mark with secmark to save space
   since features appear to be mutually exclusive.
 - Per Simon Horman review:
   - fix grammar in switchdev.txt wrt fwd_mark
   - remove some unrelated changes that snuck in

v1:

This patchset was previously submitted as RFC.  No changes from the last
version (v2) sent under RFC.  Including RFC version history here for reference.

RFC v2:

 - s/fwd_mark/offload_fwd_mark
 - use consume_skb rather than kfree_skb when dropping pkt on egress.
 - Use Jiri's suggestion to use ifindex of one of the ports in a group
   as the mark for all the ports in the group.  This can be done with
   no additional storage (no hashtable from v1).  To pull it off, we
   need some simple recursive routines to walk the netdev tree ensuring
   all leaves in the tree (ports) in the same group (e.g. bridge)
   belonging to the same switch device will have the same offload fwd mark.
   Maybe someone sees a better design for the recusive routines?  They're
   not too bad, and should cover the stacked driver cases.

RFC v1:

With switchdev support for offloading L2/L3 forwarding data path to a
switch device, we have a general problem where both the device and the
kernel may forward the packet, resulting in duplicate packets on the wire.
Anytime a packet is forwarded by the device and a copy is sent to the CPU,
there is potential for duplicate forwarding, as the kernel may also do a
forwarding lookup and send the packet on the wire.

The specific problem this patch series is interested in solving is avoiding
duplicate packets on bridged ports.  There was a previous RFC from Roopa
(http://marc.info/?l=linux-netdev&m=142687073314252&w=2) to address this
problem, but didn't solve the problem of mixed ports in the bridge from
different devices; there was no way to exclude some ports from forwarding
and include others.  This RFC solves that problem by tagging the ingressing
packet with a unique mark, and then comparing the packet mark with the
egress port mark, and skip forwarding when there is a match.  For the mixed
ports bridge case, only those ports with matching marks are skipped.

The switchdev port driver must do two things:

1) Generate a fwd_mark for each switch port, using some unique key of the
   switch device (and optionally port).  This is done when the port netdev
   is registered or if the port's group membership changes (joins/leaves
   a bridge, for example).

2) On packet ingress from port, mark the skb with the ingress port's
   fwd_mark.  If the device supports it, it's useful to only mark skbs
   which were already forwarded by the device.  If the device does not
   support such indication, all skbs can be marked, even if they're
   local dst.

Two new 32-bit fields are added to struct sk_buff and struct netdevice to
hold the fwd_mark.  I've wrapped these with CONFIG_NET_SWITCHDEV for now. I
tried using skb->mark for this purpose, but ebtables can overwrite the
skb->mark before the bridge gets it, so that will not work.

In general, this fwd_mark can be used for any case where a packet is
forwarded by the device and a copy is sent to the CPU, to avoid the kernel
re-forwarding the packet.  sFlow is another use-case that comes to mind,
but I haven't explored the details.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 18:32:45 -07:00
Scott Feldman
a48037e7c6 switchdev: update documentation for offload_fwd_mark
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 18:32:45 -07:00
Scott Feldman
3f98a8e636 rocker: add offload_fwd_mark support
If device flags ingress packet as "fwd offload", mark the
skb->offlaod_fwd_mark using the ingress port's dev->offlaod_fwd_mark.  This
will be the hint to the kernel that this packet has already been forwarded
by device to egress ports matching skb->offlaod_fwd_mark.

For rocker, derive port dev->offlaod_fwd_mark based on device switch ID and
port ifindex.  If port is bridged, use the bridge ifindex rather than the
port ifindex.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 18:32:45 -07:00
Scott Feldman
1a3b2ec93d switchdev: add offload_fwd_mark generator helper
skb->offload_fwd_mark and dev->offload_fwd_mark are 32-bit and should be
unique for device and may even be unique for a sub-set of ports within
device, so add switchdev helper function to generate unique marks based on
port's switch ID and group_ifindex.  group_ifindex would typically be the
container dev's ifindex, such as the bridge's ifindex.

The generator uses a global hash table to store offload_fwd_marks hashed by
{switch ID, group_ifindex} key.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 18:32:44 -07:00
Scott Feldman
d754f98b50 net: add phys ID compare helper to test if two IDs are the same
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 18:32:44 -07:00
Scott Feldman
0c4f691ff6 net: don't reforward packets already forwarded by offload device
Just before queuing skb for xmit on port, check if skb has been marked by
switchdev port driver as already fordwarded by device.  If so, drop skb.  A
non-zero skb->offload_fwd_mark field is set by the switchdev port
driver/device on ingress to indicate the skb has already been forwarded by
the device to egress ports with matching dev->skb_mark.  The switchdev port
driver would assign a non-zero dev->offload_skb_mark for each device port
netdev during registration, for example.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 18:32:44 -07:00
Simon Horman
8254973fa3 rocker: forward packets to CPU when port is joined to openvswitch
Teach rocker to forward packets to CPU when a port is joined to Open vSwitch.
There is scope to later refine what is passed up as per Open vSwitch flows
on a port.

This does not change the behaviour of rocker ports that are
not joined to Open vSwitch.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 18:26:03 -07:00
Nikolay Aleksandrov
a7ce45a74b bridge: mcast: fix br_multicast_dev_del warn when igmp snooping is not defined
Fix:
net/bridge/br_if.c: In function 'br_dev_delete':
>> net/bridge/br_if.c:284:2: error: implicit declaration of function
>> 'br_multicast_dev_del' [-Werror=implicit-function-declaration]
     br_multicast_dev_del(br);
     ^
   cc1: some warnings being treated as errors

when igmp snooping is not defined.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 16:19:03 -07:00
Phil Sutter
a0a9f33bdf net/ipv6: update flowi6_oif in ip6_dst_lookup_flow if not set
Newly created flows don't have flowi6_oif set (at least if the
associated socket is not interface-bound). This leads to a mismatch in
__xfrm6_selector_match() for policies which specify an interface in the
selector (sel->ifindex != 0).

Backtracing shows this happens in code-paths originating from e.g.
ip6_datagram_connect(), rawv6_sendmsg() or tcp_v6_connect(). (UDP was
not tested for.)

In summary, this patch fixes policy matching on outgoing interface for
locally generated packets.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 12:59:32 -07:00
Nikolay Aleksandrov
22f94e6256 bonding: trivial: remove unused variables
Get rid of these:
drivers/net/bonding//bond_main.c: In function ‘bond_update_slave_arr’:
drivers/net/bonding//bond_main.c:3754:6: warning: variable
‘slaves_in_agg’ set but not used [-Wunused-but-set-variable]
  int slaves_in_agg;
      ^
  CC [M]  drivers/net/bonding//bond_3ad.o
drivers/net/bonding//bond_3ad.c: In function
‘ad_marker_response_received’:
drivers/net/bonding//bond_3ad.c:1870:61: warning: parameter ‘marker’
set but not used [-Wunused-but-set-parameter]
 static void ad_marker_response_received(struct bond_marker *marker,
                                                             ^
drivers/net/bonding//bond_3ad.c:1871:19: warning: parameter ‘port’ set
but not used [-Wunused-but-set-parameter]
      struct port *port)
                   ^

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 12:50:32 -07:00
David S. Miller
5b1d0d8f52 Merge branch 'bridge-temp-and-perm'
Nikolay Aleksandrov says:

====================
bridge: multicast: temp and perm entries behaviour enhancements

Patch 01 adds a notify when a group is deleted via br_multicast_del_pg()
(on expire, on device delete or on device down).
Patch 02 changes how bridge device and bridge port delete and down/up are
handled. Until now on bridge down all groups were flushed, now only the
temp ones are (same for port), perm entries are flushed only on port or
bridge removal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 12:49:11 -07:00
Satish Ashok
e10177abf8 bridge: multicast: fix handling of temp and perm entries
When the bridge (or port) is brought down/up flush only temp entries and
leave the perm ones. Flush perm entries only when deleting the bridge
device or the associated port.

Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 12:49:10 -07:00
Nikolay Aleksandrov
ef8299de7e bridge: multicast: notify on group delete
Group notifications were not sent when a group expired or was deleted
due to bridge/port device being deleted. So add br_mdb_notify() to
br_multicast_del_pg().

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 12:49:10 -07:00
David S. Miller
03b6dc7d17 Merge branch 'bpf_cgroup_classid'
Daniel Borkmann says:

====================
BPF update

This small helper allows for accessing net_cls cgroups classid. Please
see individual patches for more details.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 12:41:30 -07:00
Daniel Borkmann
8d20aabe1c ebpf: add helper to retrieve net_cls's classid cookie
It would be very useful to retrieve the net_cls's classid from an eBPF
program to allow for a more fine-grained classification, it could be
directly used or in conjunction with additional policies. I.e. docker,
but also tooling such as cgexec, can easily run applications via net_cls
cgroups:

  cgcreate -g net_cls:/foo
  echo 42 > foo/net_cls.classid
  cgexec -g net_cls:foo <prog>

Thus, their respecitve classid cookie of foo can then be looked up on
the egress path to apply further policies. The helper is desigend such
that a non-zero value returns the cgroup id.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 12:41:30 -07:00
Daniel Borkmann
b87a173e25 cls_cgroup: factor out classid retrieval
Split out retrieving the cgroups net_cls classid retrieval into its
own function, so that it can be reused later on from other parts of
the traffic control subsystem. If there's no skb->sk, then the small
helper returns 0 as well, which in cls_cgroup terms means 'could not
classify'.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 12:41:30 -07:00
Govindarajulu Varadarajan
d9382bda4e enic: allow adaptive coalesce setting for msi/legacy intr
* Allow setting of adaptive coalescing setting for all types of interrupt.

* In msi & legacy intr, we use single interrupt for rx & tx. In this case
  tx_coalesce_usecs is invalid. We should use only rx_coalesce_usecs.
  Do not display tx_coal values for msi/intx. And do not allow user to set
  this as well.

* Driver supports only tx/rx_coalesce_usec and adaptive coalesce settings.
  For other values, driver does not return error. So ethtool succeeds for
  unsupported values. Introduce enic_coalesce_valid() function to validate
  the coalescing values.

* If user requests for coalesce value greater than what adaptor supports,
  driver uses the max value. We should at least log this.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 12:39:34 -07:00
Govindarajulu Varadarajan
fc865d6b4a enic: add adaptive coalescing intr for intx and msi poll
Adaptive interrupt coalescing is available for msix. This patch adds the support
for msi poll. Interface for adaptive interrupt coalescing is already added in
driver. We just did not enable it for legacy intr & msi.

enic_calc_int_moderation() & enic_set_int_moderation() are defined as static
after enic_poll. Since enic_poll needs it, move both of these function
definitions above enic_poll. No change in functionality.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 12:39:34 -07:00
YOSHIFUJI Hideaki
c15df306fc ipv6: Remove unused arguments for __ipv6_dev_get_saddr().
Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-16 01:00:56 -07:00
David S. Miller
0d0578815b Merge branch 'protodown'
Anuradha Karuppiah says:

====================
net: Introduce protodown flag.

User space daemons can detect errors in the network that need to be
notified to the switch device drivers.

Drivers can react to this error state by doing a phy-down on the
switch-port which would result in a carrier-off locally and on the directly
connected switch. Doing that would prevent loops and black-holes in the
network.

One such use case is the multi-chassis LAG application -

1. The MLAG application runs on peer switches (say Switch0 and Switch1)
   synchronizing states, forwarding entries etc. between the two
   switches over the peer-link (this is a link directly connecting the
   two switches).
2. An MLAG election process designates one of the switches as a primary
   (for e.g. Switch0 is primary and Switch1 is secondary).
3. The peer link plays a critical role in allowing Switch0-Switch1 to
   function as a single LAG partner to the downstream dual-connected
   servers. When the peer-link between the switches goes down we have a
   split-brain situation. Switch0 and Switch1 are no longer in sync and
   are acting independently. This can result in traffic loops and
   traffic black-holing in the network.
4. To prevent these problems the MLAG application on the secondary
   switch phy-downs the MLAG ports on detecting the peer-link down.
   This will be seen as a carrier down on servers that are
   dual-connected to Switch0 and Switch1.
5. Specifically a dual-connected server will see a carrier-down on the
   port connected to the MLAG secondary, Switch1, and will stop using
   that port for traffic TX. So traffic black holing is prevented.

v6 to v7:
   Removed some unnecessary code in response to review comments.

v5 to v6:
   Replaced proto_flags with a simple proto_down boolean attribute in
   response to Dave's comments.

v4 to v5:
   Changed the ip link display format for protodown to match the set as
   recommended by Stephen.

v3 to v4:
   I have moved protodown out of IFF_XXX and introduced a separate
   proto_flags field with IF_PROTOF_DOWN bit being used by apps to notify
   switch port errors. This is in response to Stephen's comments that
   adding a new IFF_XXX may break user space.

   I have used rocker as the sample switch driver. And to test this
   functionality I used the qemu-rocker patch that Scott sent out in
   response to the v3 posting (needed to set link up/down when phy is
   enabled/disabled).

v1 to v2:
   Based on Dave's suggestion I have moved out aggregating of error bits
   across applications to a user space framework. This patch now simply
   notifies an aggregated error bit to drivers enabling them to handle
   the error gracefully.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15 21:39:40 -07:00
Anuradha Karuppiah
c305524617 rocker: Handle protodown notifications.
protodown can be set by user space applications like MLAG on detecting
errors on a switch port. This patch provides sample switch driver changes
for handling protodown. Rocker PHYS disables the port in response to
protodown.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15 21:39:40 -07:00