Commit Graph

706005 Commits

Author SHA1 Message Date
Eric Dumazet 64bc17811b ipv4: speedup ipv6 tunnels dismantle
Implement exit_batch() method to dismantle more devices
per round.

(rtnl_lock() ...
 unregister_netdevice_many() ...
 rtnl_unlock())

Tested:
$ cat add_del_unshare.sh
for i in `seq 1 40`
do
 (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
done
wait ; grep net_namespace /proc/slabinfo

Before patch :
$ time ./add_del_unshare.sh
net_namespace        126    282   5504    1    2 : tunables    8    4    0 : slabdata    126    282      0

real    1m38.965s
user    0m0.688s
sys     0m37.017s

After patch:
$ time ./add_del_unshare.sh
net_namespace        135    291   5504    1    2 : tunables    8    4    0 : slabdata    135    291      0

real	0m22.117s
user	0m0.728s
sys	0m35.328s

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:32:24 -07:00
Eric Dumazet bb401caefe ipv6: speedup ipv6 tunnels dismantle
Implement exit_batch() method to dismantle more devices
per round.

(rtnl_lock() ...
 unregister_netdevice_many() ...
 rtnl_unlock())

Tested:
$ cat add_del_unshare.sh
for i in `seq 1 40`
do
 (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) &
done
wait ; grep net_namespace /proc/slabinfo

Before patch :
$ time ./add_del_unshare.sh
net_namespace        110    267   5504    1    2 : tunables    8    4    0 : slabdata    110    267      0

real    3m25.292s
user    0m0.644s
sys     0m40.153s

After patch:

$ time ./add_del_unshare.sh
net_namespace        126    282   5504    1    2 : tunables    8    4    0 : slabdata    126    282      0

real	1m38.965s
user	0m0.688s
sys	0m37.017s

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:32:23 -07:00
Eric Dumazet 789e6ddb0b tcp: batch tcp_net_metrics_exit
When dealing with a list of dismantling netns, we can scan
tcp_metrics once, saving cpu cycles.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:32:23 -07:00
Eric Dumazet a90c9347e9 ipv6: addrlabel: per netns list
Having a global list of labels do not scale to thousands of
netns in the cloud era. This causes quadratic behavior on
netns creation and deletion.

This is time having a per netns list of ~10 labels.

Tested:

$ time perf record (for f in `seq 1 3000` ; do ip netns add tast$f; done)
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 3.637 MB perf.data (~158898 samples) ]

real    0m20.837s # instead of 0m24.227s
user    0m0.328s
sys     0m20.338s # instead of 0m23.753s

    16.17%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered
    12.30%       ip  [kernel.kallsyms]  [k] netlink_has_listeners
     6.76%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
     5.78%       ip  [kernel.kallsyms]  [k] memset_erms
     5.77%       ip  [kernel.kallsyms]  [k] kobject_uevent_env
     5.18%       ip  [kernel.kallsyms]  [k] refcount_sub_and_test
     4.96%       ip  [kernel.kallsyms]  [k] _raw_read_lock
     3.82%       ip  [kernel.kallsyms]  [k] refcount_inc_not_zero
     3.33%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     2.11%       ip  [kernel.kallsyms]  [k] unmap_page_range
     1.77%       ip  [kernel.kallsyms]  [k] __wake_up
     1.69%       ip  [kernel.kallsyms]  [k] strlen
     1.17%       ip  [kernel.kallsyms]  [k] __wake_up_common
     1.09%       ip  [kernel.kallsyms]  [k] insert_header
     1.04%       ip  [kernel.kallsyms]  [k] page_remove_rmap
     1.01%       ip  [kernel.kallsyms]  [k] consume_skb
     0.98%       ip  [kernel.kallsyms]  [k] netlink_trim
     0.51%       ip  [kernel.kallsyms]  [k] kernfs_link_sibling
     0.51%       ip  [kernel.kallsyms]  [k] filemap_map_pages
     0.46%       ip  [kernel.kallsyms]  [k] memcpy_erms

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:32:23 -07:00
Eric Dumazet d464e84eed kobject: factorize skb setup in kobject_uevent_net_broadcast()
We can build one skb and let it be cloned in netlink.

This is much faster, and use less memory (all clones will
share the same skb->head)

Tested:

time perf record (for f in `seq 1 3000` ; do ip netns add tast$f; done)
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 4.110 MB perf.data (~179584 samples) ]

real    0m24.227s # instead of 0m52.554s
user    0m0.329s
sys 0m23.753s # instead of 0m51.375s

    14.77%       ip  [kernel.kallsyms]  [k] __ip6addrlbl_add
    14.56%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered
    11.65%       ip  [kernel.kallsyms]  [k] netlink_has_listeners
     6.19%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
     5.66%       ip  [kernel.kallsyms]  [k] kobject_uevent_env
     4.97%       ip  [kernel.kallsyms]  [k] memset_erms
     4.67%       ip  [kernel.kallsyms]  [k] refcount_sub_and_test
     4.41%       ip  [kernel.kallsyms]  [k] _raw_read_lock
     3.59%       ip  [kernel.kallsyms]  [k] refcount_inc_not_zero
     3.13%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     1.55%       ip  [kernel.kallsyms]  [k] __wake_up
     1.20%       ip  [kernel.kallsyms]  [k] strlen
     1.03%       ip  [kernel.kallsyms]  [k] __wake_up_common
     0.93%       ip  [kernel.kallsyms]  [k] consume_skb
     0.92%       ip  [kernel.kallsyms]  [k] netlink_trim
     0.87%       ip  [kernel.kallsyms]  [k] insert_header
     0.63%       ip  [kernel.kallsyms]  [k] unmap_page_range

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:32:23 -07:00
Eric Dumazet 4a336a23d6 kobject: copy env blob in one go
No need to iterate over strings, just copy in one efficient memcpy() call.

Tested:
time perf record "(for f in `seq 1 3000` ; do ip netns add tast$f; done)"
[ perf record: Woken up 10 times to write data ]
[ perf record: Captured and wrote 8.224 MB perf.data (~359301 samples) ]

real    0m52.554s  # instead of 1m7.492s
user    0m0.309s
sys 0m51.375s # instead of 1m6.875s

     9.88%       ip  [kernel.kallsyms]  [k] netlink_broadcast_filtered
     8.86%       ip  [kernel.kallsyms]  [k] string
     7.37%       ip  [kernel.kallsyms]  [k] __ip6addrlbl_add
     5.68%       ip  [kernel.kallsyms]  [k] netlink_has_listeners
     5.52%       ip  [kernel.kallsyms]  [k] memcpy_erms
     4.76%       ip  [kernel.kallsyms]  [k] __alloc_skb
     4.54%       ip  [kernel.kallsyms]  [k] vsnprintf
     3.94%       ip  [kernel.kallsyms]  [k] format_decode
     3.80%       ip  [kernel.kallsyms]  [k] kmem_cache_alloc_node_trace
     3.71%       ip  [kernel.kallsyms]  [k] kmem_cache_alloc_node
     3.66%       ip  [kernel.kallsyms]  [k] kobject_uevent_env
     3.38%       ip  [kernel.kallsyms]  [k] strlen
     2.65%       ip  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
     2.20%       ip  [kernel.kallsyms]  [k] kfree
     2.09%       ip  [kernel.kallsyms]  [k] memset_erms
     2.07%       ip  [kernel.kallsyms]  [k] ___cache_free
     1.95%       ip  [kernel.kallsyms]  [k] kmem_cache_free
     1.91%       ip  [kernel.kallsyms]  [k] _raw_read_lock
     1.45%       ip  [kernel.kallsyms]  [k] ksize
     1.25%       ip  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     1.00%       ip  [kernel.kallsyms]  [k] widen_string

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:32:23 -07:00
Eric Dumazet 16dff336b3 kobject: add kobject_uevent_net_broadcast()
This removes some #ifdef pollution and will ease follow up patches.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:32:23 -07:00
Cong Wang 752fbcc334 net_sched: no need to free qdisc in RCU callback
gen estimator has been rewritten in commit 1c0d32fde5
("net_sched: gen_estimator: complete rewrite of rate estimators"),
the caller no longer needs to wait for a grace period. So this
patch gets rid of it.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:30:03 -07:00
Jim Hanko bd7d2106b6 team: fall back to hash if table entry is empty
If the hash to port mapping table does not have a valid port (i.e. when
a port goes down), fall back to the simple hashing mechanism to avoid
dropping packets.

Signed-off-by: Jim Hanko <hanko@drivescale.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:18:58 -07:00
David S. Miller d3b557712f Merge branch 'test_rhashtable-dont-allocate-huge-static-array'
Florian Westphal says:

====================
test_rhashtable: don't allocate huge static array

Add a test case for the rhlist interface.
While at it, cleanup current rhashtable test a bit and add a check
for max_size support.

No changes since v1, except in last patch.
kbuild robot complained about large onstack allocation caused by
struct rhltable when lockdep is enabled.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:15:48 -07:00
Florian Westphal cdd4de372e test_rhashtable: add test case for rhl_table interface
also test rhltable.  rhltable remove operations are slow as
deletions require a list walk, thus test with 1/16th of the given
entry count number to get a run duration similar to rhashtable one.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:15:47 -07:00
Florian Westphal a6359bd8dd test_rhashtable: add a check for max_size
add a test that tries to insert more than max_size elements.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:15:47 -07:00
Florian Westphal f651616e79 test_rhashtable: don't use global entries variable
pass the entries to test as an argument instead.
Followup patch will add an rhlist test case; rhlist delete opererations
are slow so we need to use a smaller number to test it.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:15:47 -07:00
Florian Westphal 7e936bd734 test_rhashtable: don't allocate huge static array
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:15:47 -07:00
David S. Miller 3d5cc728c4 Merge branch 'dsa-b53-bcm_sf2-cleanups'
Florian Fainelli says:

====================
net: dsa: b53/bcm_sf2 cleanups

This patch series is a first pass set of clean-ups to reduce the number of LOCs
between b53 and bcm_sf2 and sharing as many functions as possible.

There is a number of additional cleanups queued up locally that require more
thorough testing.

Changes in v3:

- remove one extra argument for the b53_build_io_op macro (David Laight)
- added additional Reviewed-by tags from Vivien

Changes in v2:

- added Reviewed-by tags from Vivien
- added a missing EXPORT_SYMBOL() in patch 8
- fixed a typo in patch 5
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:08:54 -07:00
Florian Fainelli f86ad77faf net: dsa: bcm_sf2: Utilize b53_{enable, disable}_port
Export b53_{enable,disable}_port and use these two functions in
bcm_sf2_port_setup and bcm_sf2_port_disable. The generic functions
cannot be used without wrapping because we need to manage additional
switch integration details (PHY, Broadcom tag etc.).

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:08:54 -07:00
Florian Fainelli 152b6fd60a net: dsa: bcm_sf2: Use SF2_NUM_EGRESS_QUEUES for CFP
The magic number 8 in 3 locations in bcm_sf2_cfp.c actually designates
the number of switch port egress queues, so use that define instead of
open-coding it.

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:08:54 -07:00
Florian Fainelli aac028672c net: dsa: b53: Export b53_imp_vlan_setup()
bcm_sf2 and b53 do exactly the same thing, so share that piece.

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:08:54 -07:00
Florian Fainelli f43a2dbe95 net: dsa: b53: Wire-up EEE
Add support for enabling and disabling EEE, as well as re-negotiating it in
.adjust_link() and in .port_enable().

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:08:54 -07:00
Florian Fainelli 22256b0afb net: dsa: b53: Move EEE functions to b53
Move the bcm_sf2 EEE-related functions to the b53 driver because this is shared
code amongst Gigabit capable switch, only 5325 and 5365 are too old to support
that.

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:08:53 -07:00
Florian Fainelli 909d812a66 net: dsa: b53: Define EEE register page
In preparation for migrating the EEE code from bcm_sf2 to b53, define the full
EEE register page and offsets within that page.

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:08:53 -07:00
Florian Fainelli b409a9efa1 net: dsa: b53: Move Broadcom header setup to b53
The code to enable Broadcom tags/headers is largely switch independent,
and in preparation for enabling it for multiple devices with b53, move
the code we have in bcm_sf2.c to b53_common.c

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:08:53 -07:00
Florian Fainelli 5345862e9a net: dsa: b53: Use a macro to define I/O operations
Instead of repeating the same pattern: acquire mutex, read/write,
release mutex, define a macro: b53_build_op() which takes the type
(read|write), I/O size, and value (scalar or pointer). This helps with
fixing bugs that could exist (e.g: missing barrier, lock etc.).

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:08:53 -07:00
Florian Fainelli e85ec74ace net: dsa: bcm_sf2: Defer port enabling to calling port_enable
There is no need to configure the enabled ports once in bcm_sf2_sw_setup() and
then a second time around when dsa_switch_ops::port_enable is called, just do
it when port_enable is called which is better in terms of power consumption and
correctness.

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:08:53 -07:00
Florian Fainelli 34c8befd13 net: dsa: b53: Defer port enabling to calling port_enable
There is no need to configure the enabled ports once in b53_setup() and then a
second time around when dsa_switch_ops::port_enable is called, just do it when
port_enable is called which is better in terms of power consumption and
correctness.

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 16:08:53 -07:00