Commit Graph

533 Commits

Author SHA1 Message Date
Nikolay Aleksandrov 1df6b6aa33 bonding: convert fail_over_mac to use the new option API
This patch adds the necessary changes so fail_over_mac would use
the new bonding option API. Also fixes a trivial copy/paste error in
bond_check_params where the wrong variable was used for the error msg.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 15:38:42 -08:00
Nikolay Aleksandrov edf36b24c5 bonding: convert arp_all_targets to use the new option API
This patch adds the necessary changes so arp_all_targets would use the
new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 15:38:42 -08:00
Nikolay Aleksandrov 162288810c bonding: convert arp_validate to use the new option API
This patch adds the necessary changes so arp_validate would use the
new bonding option API. Also fix some trivial/style errors.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 15:38:42 -08:00
Nikolay Aleksandrov a4b32ce7f8 bonding: convert xmit_hash_policy to use the new option API
This patch adds the necessary changes so xmit_hash_policy would use the
new bonding option API. Also fix some trivial/style errors.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 15:38:41 -08:00
Nikolay Aleksandrov aa59d8517d bonding: convert packets_per_slave to use the new option API
This patch adds the necessary changes so packets_per_slave would use the
new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 15:38:41 -08:00
Nikolay Aleksandrov 2b3798d5e1 bonding: convert mode setting to use the new option API
This patch makes the bond's mode setting use the new option API and
adds support for dependency printing which relies on having an entry for
the mode option in the bond_opts[] array.
Also add the ability to print the mode name when mode dependency fails
and fix some trivial/style errors.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 15:38:41 -08:00
Hannes Frederic Sowa 809fa972fd reciprocal_divide: update/correction of the algorithm
Jakub Zawadzki noticed that some divisions by reciprocal_divide()
were not correct [1][2], which he could also show with BPF code
after divisions are transformed into reciprocal_value() for runtime
invariance which can be passed to reciprocal_divide() later on;
reverse in BPF dump ended up with a different, off-by-one K in
some situations.

This has been fixed by Eric Dumazet in commit aee636c480
("bpf: do not use reciprocal divide"). This follow-up patch
improves reciprocal_value() and reciprocal_divide() to work in
all cases by using Granlund and Montgomery method, so that also
future use is safe and without any non-obvious side-effects.
Known problems with the old implementation were that division by 1
always returned 0 and some off-by-ones when the dividend and divisor
where very large. This seemed to not be problematic with its
current users, as far as we can tell. Eric Dumazet checked for
the slab usage, we cannot surely say so in the case of flex_array.
Still, in order to fix that, we propose an extension from the
original implementation from commit 6a2d7a955d resp. [3][4],
by using the algorithm proposed in "Division by Invariant Integers
Using Multiplication" [5], Torbjörn Granlund and Peter L.
Montgomery, that is, pseudocode for q = n/d where q, n, d is in
u32 universe:

1) Initialization:

  int l = ceil(log_2 d)
  uword m' = floor((1<<32)*((1<<l)-d)/d)+1
  int sh_1 = min(l,1)
  int sh_2 = max(l-1,0)

2) For q = n/d, all uword:

  uword t = (n*m')>>32
  q = (t+((n-t)>>sh_1))>>sh_2

The assembler implementation from Agner Fog [6] also helped a lot
while implementing. We have tested the implementation on x86_64,
ppc64, i686, s390x; on x86_64/haswell we're still half the latency
compared to normal divide.

Joint work with Daniel Borkmann.

  [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
  [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
  [3] https://gmplib.org/~tege/division-paper.pdf
  [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
  [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
  [6] http://www.agner.org/optimize/asmlib.zip

Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: Jesse Gross <jesse@nicira.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 23:17:20 -08:00
sfeldma@cumulusnetworks.com 1d3ee88ae0 bonding: add netlink attributes to slave link dev
If link is IFF_SLAVE, extend link dev netlink attributes to include
slave attributes with new IFLA_SLAVE nest.  Add netlink notification
(RTM_NEWLINK) when slave status changes from backup to active, or
visa-versa.

Adds new ndo_get_slave op to net_device_ops to fill skb with IFLA_SLAVE
attributes.  Currently only used by bonding driver, but could be
used by other aggregating devices with slaves.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:51:58 -08:00
sfeldma@cumulusnetworks.com 07699f9a7c bonding: add sysfs /slave dir for bond slave devices.
Add sub-directory under /sys/class/net/<interface>/slave with
read-only attributes for slave.  Directory only appears when
<interface> is a slave.

$ tree /sys/class/net/eth2/slave/
/sys/class/net/eth2/slave/
├── ad_aggregator_id
├── link_failure_count
├── mii_status
├── perm_hwaddr
├── queue_id
└── state

$ cat /sys/class/net/eth2/slave/*
2
0
up
40:02:10:ef:06:01
0
active

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:51:58 -08:00
Veaceslav Falico 3ec775b9fb bonding: handle slave's name change with primary_slave logic
Currently, if a slave's name change, we just pass it by. However, if the
slave is a current primary_slave, then we end up with using a slave, whose
name != params.primary, for primary_slave. And vice-versa, if we don't have
a primary_slave but have params.primary set - we will not detected a new
primary_slave.

Fix this by catching the NETDEV_CHANGENAME event and setting primary_slave
accordingly. Also, if the primary_slave was changed, issue a reselection of
the active slave, cause the priorities have changed.

Reported-by: Ding Tianhong <dingtianhong@huawei.com>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:26:47 -08:00
Ying Xue 0917b9334b bonding: use __dev_get_by_name instead of dev_get_by_name to find interface
The following call chain indicates that bond_do_ioctl() is protected
under rtnl_lock. If we use __dev_get_by_name() instead of
dev_get_by_name() to find interface handler in it, this would
help us avoid to change reference counter of interface once.

dev_ioctl()
  rtnl_lock()
  dev_ifsioc()
    bond_do_ioctl()
  rtnl_unlock()

Additionally we also change the coding style in bond_do_ioctl(),
letting it more readable for us.

Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:50:46 -08:00
David S. Miller 0a379e21c5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-01-14 14:42:42 -08:00
Jason Wang f663dd9aaf net: core: explicitly select a txq before doing l2 forwarding
Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
will cause several issues:

- NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan
  instead of lower device which misses the necessary txq synchronization for
  lower device such as txq stopping or frozen required by dev watchdog or
  control path.
- dev_hard_start_xmit() was called with NULL txq which bypasses the net device
  watchdog.
- dev_hard_start_xmit() does not check txq everywhere which will lead a crash
  when tso is disabled for lower device.

Fix this by explicitly introducing a new param for .ndo_select_queue() for just
selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also
extended to accept this parameter and dev_queue_xmit_accel() was used to do l2
forwarding transmission.

With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
dev_queue_xmit() to do the transmission.

In the future, it was also required for macvtap l2 forwarding support since it
provides a necessary synchronization method.

Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: e1000-devel@lists.sourceforge.net
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10 13:23:08 -05:00
sfeldma@cumulusnetworks.com ec029fac3e bonding: add ad_select attribute netlink support
Add IFLA_BOND_AD_SELECT to allow get/set of bonding parameter
ad_select via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 21:03:21 -05:00
stephen hemminger 6da67d2608 bonding: make more functions static
More functions in bonding that can be declared static because
they are only used in one file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 23:43:36 -05:00
dingtianhong 844223abe9 bonding: use ether_addr_equal_64bits to instead of ether_addr_equal
The net_device.dev_addr have more than 2 bytes of additional data after
the mac addr, so it is safe to use the ether_addr_equal_64bits().

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 22:58:55 -05:00
dingtianhong d316dedd4d bonding: remove unwanted return value for bond_dev_queue_xmit()
The return value for bond_dev_queue_xmit() will not be used anymore,
so remove the return value.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 22:58:55 -05:00
dingtianhong 3900f29021 bonding: slight optimizztion for bond_slave_override()
When the skb is xmit by the function bond_slave_override(),
it will have duplicate judgement for slave state, and I think it
will consumes a little performance, maybe it is negligible,
so I simplify the function and remove the unwanted judgement.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 22:58:15 -05:00
dingtianhong 834db4bcdf bonding: ust micro BOND_NO_USE_ARP to simplify the mode check
The bond 3ad and TLB/ALB has the same check path, so combine them.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 00:40:31 -05:00
dingtianhong 3a7129e527 bonding: add option lp_interval for loading module
The bond driver could set the lp_interval when loading module.

Suggested-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 00:40:31 -05:00
stephen hemminger da131ddbff bonding: make local function static
bond_xmit_slave_id is only used in main.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-29 16:34:25 -05:00
dingtianhong f23691095b bonding: rebuild the bond_resend_igmp_join_requests_delayed()
The bond_resend_igmp_join_requests_delayed() and
bond_resend_igmp_join_requests() should be integrated,
because the bond_resend_igmp_join_requests_delayed() did
nothing except bond_resend_igmp_join_requests().

The bond igmp_retrans could only be changed in bond_change_active_slave
and here, bond_change_active_slave will be called in RTNL and curr_slave_lock,
the bond_resend_igmp_join_requests already hold RTNL, so no need
to free RTNL and hold curr_slave_lock again, it may be a small optimization,
so move the igmp_retrans in RTNL and remove the curr_slave_lock.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14 01:58:02 -05:00
dingtianhong be79bd048a bonding: add RCU for bond_3ad_state_machine_handler()
The bond_3ad_state_machine_handler() use the bond lock to protect
the bond slave list and slave port together, but it is not enough,
the bond slave list was link and unlink in RTNL, not bond lock,
so I add RCU to protect the slave list from leaving.

The bond lock is still used here, because when the slave has been
removed from the list by the time the state machine runs, it appears
to be possible for both function to manupulate the same aggregator->lag_ports
by finding the aggregator via two different ports that are both members of
that aggregator (i.e., port A of the agg is being unbound, and port B
of the agg is runing its state machine).

If I remove the bond lock, there are nothing to mutex changes
to aggregator->lag_ports between bond_3ad_state_machine_handler and
bond_3ad_unbind_slave, So the bond lock is the simplest way to protect
aggregator->lag_ports.

There was a lot of function need RCU protect, I have two choice
to make the function in RCU-safe, (1) create new similar functions
and make the bond slave list in RCU. (2) modify the existed functions
and make them in read-side critical section, because the RCU
read-side critical sections may be nested.

I choose (2) because it is no need to create more similar functions.

The nots in the function is still too old, clean up the nots.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14 01:58:02 -05:00
dingtianhong c851703544 bonding: remove unwanted lock for bond enslave and release
The bond_change_active_slave() and bond_select_active_slave()
do't need bond lock anymore, so remove the unwanted bond lock
for these two functions.

The bond_select_active_slave() will release and acquire
curr_slave_lock, so the curr_slave_lock need to protect
the function.

In bond enslave and bond release, the bond slave list is also
protected by RTNL, so bond lock is no need to exist, remove
the lock and clean the functions.

Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14 01:58:02 -05:00
dingtianhong eb9fa4b019 bonding: rebuild the lock use for bond_activebackup_arp_mon()
The bond_activebackup_arp_mon() use the bond lock for read to
protect the slave list, it is no effect, and the RTNL is only
called for bond_ab_arp_commit() and peer notify, for the performance
better, use RCU to replace with the bond lock, to the bond slave
list need to called in RCU, add a new bond_first_slave_rcu()
to get the first slave in RCU protection.

In bond_ab_arp_probe(), the bond->current_arp_slave may changd
if bond release slave, just like:

        bond_ab_arp_probe()                     bond_release()
        cpu 0                                   cpu 1
        ...
        if (bond->current_arp_slave...)         ...
        ...                             bond->current_arp_slave = NULl
        bond->current_arp_slave->dev->name      ...

So the current_arp_slave need to dereference in the section.

Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
Suggested-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-14 01:58:02 -05:00