Commit Graph

856722 Commits

Author SHA1 Message Date
wenxu
4e481908c5 flow_offload: move tc indirect block to flow offload
move tc indirect block to flow_offload and rename
it to flow indirect block.The nf_tables can use the
indr block architecture.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:30 -07:00
wenxu
e4da910211 cls_api: add flow_indr_block_call function
This patch make indr_block_call don't access struct tc_indr_block_cb
and tc_indr_block_dev directly

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:30 -07:00
wenxu
f843698857 cls_api: remove the tcf_block cache
Remove the tcf_block in the tc_indr_block_dev for muti-subsystem
support.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:30 -07:00
wenxu
242453c227 cls_api: modify the tc_indr_block_ing_cmd parameters.
This patch make tc_indr_block_ing_cmd can't access struct
tc_indr_block_dev and tc_indr_block_cb.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:44:30 -07:00
David S. Miller
61552d2ce8 Merge branch 'net-batched-receive-in-GRO-path'
Edward Cree says:

====================
net: batched receive in GRO path

This series listifies part of GRO processing, in a manner which allows those
 packets which are not GROed (i.e. for which dev_gro_receive returns
 GRO_NORMAL) to be passed on to the listified regular receive path.
dev_gro_receive() itself is not listified, nor the per-protocol GRO
 callback, since GRO's need to hold packets on lists under napi->gro_hash
 makes keeping the packets on other lists awkward, and since the GRO control
 block state of held skbs can refer only to one 'new' skb at a time.
Instead, when napi_frags_finish() handles a GRO_NORMAL result, stash the skb
 onto a list in the napi struct, which is received at the end of the napi
 poll or when its length exceeds the (new) sysctl net.core.gro_normal_batch.

Performance figures with this series, collected on a back-to-back pair of
 Solarflare sfn8522-r2 NICs with 120-second NetPerf tests.  In the stats,
 sample size n for old and new code is 6 runs each; p is from a Welch t-test.
Tests were run both with GRO enabled and disabled, the latter simulating
 uncoalesceable packets (e.g. due to IP or TCP options).  The receive side
 (which was the device under test) had the NetPerf process pinned to one CPU,
 and the device interrupts pinned to a second CPU.  CPU utilisation figures
 (used in cases of line-rate performance) are summed across all CPUs.
net.core.gro_normal_batch was left at its default value of 8.

TCP 4 streams, GRO on: all results line rate (9.415Gbps)
net-next: 210.3% cpu
after #1: 181.5% cpu (-13.7%, p=0.031 vs net-next)
after #3: 196.7% cpu (- 8.4%, p=0.136 vs net-next)
TCP 4 streams, GRO off:
net-next: 8.017 Gbps
after #1: 7.785 Gbps (- 2.9%, p=0.385 vs net-next)
after #3: 7.604 Gbps (- 5.1%, p=0.282 vs net-next.  But note *)
TCP 1 stream, GRO off:
net-next: 6.553 Gbps
after #1: 6.444 Gbps (- 1.7%, p=0.302 vs net-next)
after #3: 6.790 Gbps (+ 3.6%, p=0.169 vs net-next)
TCP 1 stream, GRO on, busy_read = 50: all results line rate
net-next: 156.0% cpu
after #1: 174.5% cpu (+11.9%, p=0.015 vs net-next)
after #3: 165.0% cpu (+ 5.8%, p=0.147 vs net-next)
TCP 1 stream, GRO off, busy_read = 50:
net-next: 6.488 Gbps
after #1: 6.625 Gbps (+ 2.1%, p=0.059 vs net-next)
after #3: 7.351 Gbps (+13.3%, p=0.026 vs net-next)
TCP_RR 100 streams, GRO off, 8000 byte payload
net-next: 995.083 us
after #1: 969.167 us (- 2.6%, p=0.204 vs net-next)
after #3: 976.433 us (- 1.9%, p=0.254 vs net-next)
TCP_RR 100 streams, GRO off, 8000 byte payload, busy_read = 50:
net-next:   2.851 ms
after #1:   2.871 ms (+ 0.7%, p=0.134 vs net-next)
after #3:   2.937 ms (+ 3.0%, p<0.001 vs net-next)
TCP_RR 100 streams, GRO off, 1 byte payload, busy_read = 50:
net-next: 867.317 us
after #1: 865.717 us (- 0.2%, p=0.334 vs net-next)
after #3: 868.517 us (+ 0.1%, p=0.414 vs net-next)

(*) These tests produced a mixture of line-rate and below-line-rate results,
 meaning that statistically speaking the results were 'censored' by the
 upper bound, and were thus not normally distributed, making a Welch t-test
 mathematically invalid.  I therefore also calculated estimators according
 to [1], which gave the following:
net-next: 8.133 Gbps
after #1: 8.130 Gbps (- 0.0%, p=0.499 vs net-next)
after #3: 7.680 Gbps (- 5.6%, p=0.285 vs net-next)
(though my procedure for determining ν wasn't mathematically well-founded
 either, so take that p-value with a grain of salt).
A further check came from dividing the bandwidth figure by the CPU usage for
 each test run, giving:
net-next: 3.461
after #1: 3.198 (- 7.6%, p=0.145 vs net-next)
after #3: 3.641 (+ 5.2%, p=0.280 vs net-next)

The above results are fairly mixed, and in most cases not statistically
 significant.  But I think we can roughly conclude that the series
 marginally improves non-GROable throughput, without hurting latency
 (except in the large-payload busy-polling case, which in any case yields
 horrid performance even on net-next (almost triple the latency without
 busy-poll).  Also, drivers which, unlike sfc, pass UDP traffic to GRO
 would expect to see a benefit from gaining access to batching.

Changed in v3:
 * gro_normal_batch sysctl now uses SYSCTL_ONE instead of &one
 * removed RFC tags (no comments after a week means no-one objects, right?)

Changed in v2:
 * During busy poll, call gro_normal_list() to receive batched packets
   after each cycle of the napi busy loop.  See comments in Patch #3 for
   complications of doing the same in busy_poll_stop().

[1]: Cohen 1959, doi: 10.1080/00401706.1959.10489859
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:22:29 -07:00
Edward Cree
323ebb61e3 net: use listified RX for handling GRO_NORMAL skbs
When GRO decides not to coalesce a packet, in napi_frags_finish(), instead
 of passing it to the stack immediately, place it on a list in the napi
 struct.  Then, at flush time (napi_complete_done(), napi_poll(), or
 napi_busy_loop()), call netif_receive_skb_list_internal() on the list.
We'd like to do that in napi_gro_flush(), but it's not called if
 !napi->gro_bitmask, so we have to do it in the callers instead.  (There are
 a handful of drivers that call napi_gro_flush() themselves, but it's not
 clear why, or whether this will affect them.)
Because a full 64 packets is an inefficiently large batch, also consume the
 list whenever it exceeds gro_normal_batch, a new net/core sysctl that
 defaults to 8.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:22:29 -07:00
Edward Cree
6727013694 sfc: falcon: don't score irq moderation points for GRO
Same rationale as for sfc, except that this wasn't performance-tested.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:22:29 -07:00
Edward Cree
5e040d4b1a sfc: don't score irq moderation points for GRO
We already scored points when handling the RX event, no-one else does this,
 and looking at the history it appears this was originally meant to only
 score on merges, not on GRO_NORMAL.  Moreover, it gets in the way of
 changing GRO to not immediately pass GRO_NORMAL skbs to the stack.
Performance testing with four TCP streams received on a single CPU (where
 throughput was line rate of 9.4Gbps in all tests) showed a 13.7% reduction
 in RX CPU usage (n=6, p=0.03).

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:22:29 -07:00
Rahul Verma
5e6d9fc761 qed: Add new ethtool supported port types based on media.
Supported ports in ethtool <eth1> are displayed based on media type.
For media type fibre and twinaxial, port type is "FIBRE". Media type
Base-T is "TP" and media KR is "Backplane".

V1->V2:
Corrected the subject.

Signed-off-by: Rahul Verma <rahulv@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:14:07 -07:00
Chuhong Yuan
ad2dcba008 cxgb4: smt: Use normal int for refcount
All refcount operations are protected by spinlocks now.
Then the atomic counter can be replaced by a normal int.

This patch depends on PATCH 1/2.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:12:17 -07:00
Chuhong Yuan
4a8937b838 cxgb4: smt: Add lock for atomic_dec_and_test
The atomic_dec_and_test() is not safe because it is
outside of locks.
Move the locks of t4_smte_free() to its caller,
cxgb4_smt_release() to protect the atomic decrement.

Fixes: 3bdb376e69 ("cxgb4: introduce SMT ops to prepare for SMAC rewrite support")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:12:17 -07:00
David Ahern
e858ef1cd4 selftests: Add l2tp tests
Add IPv4 and IPv6 l2tp tests. Current set is over IP and with
IPsec.

v2
- add l2tp.sh to TEST_PROGS in Makefile

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:08:09 -07:00
Alexey Dobriyan
9d2f112383 net: delete "register" keyword
Delete long obsoleted "register" keyword.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 18:03:42 -07:00
Chuhong Yuan
4b4de39850 mkiss: Use refcount_t for refcount
refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 17:58:40 -07:00
Chuhong Yuan
31168a6d12 dpaa_eth: Use refcount_t for refcount
refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 17:57:54 -07:00
David S. Miller
b3a598eb0d Merge tag 'batadv-next-for-davem-20190808' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - Replace usage of strlcpy with strscpy, by Sven Eckelmann

 - Add OGMv2 per-interface queue and aggregations, by Linus Luessing
   (2 patches)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08 11:28:40 -07:00
David S. Miller
13dfb3fa49 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Just minor overlapping changes in the conflicts here.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 18:44:57 -07:00
Linus Torvalds
33920f1ec5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "Yeah I should have sent a pull request last week, so there is a lot
  more here than usual:

   1) Fix memory leak in ebtables compat code, from Wenwen Wang.

   2) Several kTLS bug fixes from Jakub Kicinski (circular close on
      disconnect etc.)

   3) Force slave speed check on link state recovery in bonding 802.3ad
      mode, from Thomas Falcon.

   4) Clear RX descriptor bits before assigning buffers to them in
      stmmac, from Jose Abreu.

   5) Several missing of_node_put() calls, mostly wrt. for_each_*() OF
      loops, from Nishka Dasgupta.

   6) Double kfree_skb() in peak_usb can driver, from Stephane Grosjean.

   7) Need to hold sock across skb->destructor invocation, from Cong
      Wang.

   8) IP header length needs to be validated in ipip tunnel xmit, from
      Haishuang Yan.

   9) Use after free in ip6 tunnel driver, also from Haishuang Yan.

  10) Do not use MSI interrupts on r8169 chips before RTL8168d, from
      Heiner Kallweit.

  11) Upon bridge device init failure, we need to delete the local fdb.
      From Nikolay Aleksandrov.

  12) Handle erros from of_get_mac_address() properly in stmmac, from
      Martin Blumenstingl.

  13) Handle concurrent rename vs. dump in netfilter ipset, from Jozsef
      Kadlecsik.

  14) Setting NETIF_F_LLTX on mac80211 causes complete breakage with
      some devices, so revert. From Johannes Berg.

  15) Fix deadlock in rxrpc, from David Howells.

  16) Fix Kconfig deps of enetc driver, we must have PHYLIB. From Yue
      Haibing.

  17) Fix mvpp2 crash on module removal, from Matteo Croce.

  18) Fix race in genphy_update_link, from Heiner Kallweit.

  19) bpf_xdp_adjust_head() stopped working with generic XDP when we
      fixes generic XDP to support stacked devices properly, fix from
      Jesper Dangaard Brouer.

  20) Unbalanced RCU locking in rt6_update_exception_stamp_rt(), from
      David Ahern.

  21) Several memory leaks in new sja1105 driver, from Vladimir Oltean"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (214 commits)
  net: dsa: sja1105: Fix memory leak on meta state machine error path
  net: dsa: sja1105: Fix memory leak on meta state machine normal path
  net: dsa: sja1105: Really fix panic on unregistering PTP clock
  net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well
  net: dsa: sja1105: Fix broken learning with vlan_filtering disabled
  net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus()
  net: sched: sample: allow accessing psample_group with rtnl
  net: sched: police: allow accessing police->params with rtnl
  net: hisilicon: Fix dma_map_single failed on arm64
  net: hisilicon: fix hip04-xmit never return TX_BUSY
  net: hisilicon: make hip04_tx_reclaim non-reentrant
  tc-testing: updated vlan action tests with batch create/delete
  net sched: update vlan action for batched events operations
  net: stmmac: tc: Do not return a fragment entry
  net: stmmac: Fix issues when number of Queues >= 4
  net: stmmac: xgmac: Fix XGMAC selftests
  be2net: disable bh with spin_lock in be_process_mcc
  net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
  net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs
  net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
  ...
2019-08-06 17:11:59 -07:00
David S. Miller
05bb520376 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2019-08-05

This series contains updates to i40e driver only.

Dmitrii adds missing statistic counters for VEB and VEB TC's.

Slawomir adds support for logging the "Disable Firmware LLDP" flag
option and its current status.

Jake fixes an issue where VF's being notified of their link status
before their queues are enabled which was causing issues.  So always
report link status down when the VF queues are not enabled.  Also adds
future proofing when statistics are added or removed by adding checks to
ensure the data pointer for the strings lines up with the expected
statistics count.

Czeslaw fixes the advertised mode reported in ethtool for FEC, where the
"None BaseR RS" was always being displayed no matter what the mode it
was in.  Also added logging information when the PF is entering or
leaving "allmulti" (or promiscuous) mode.  Fixed up the logging logic
for VF's when leaving multicast mode to not include unicast as well.

v2: drop Aleksandr's patch (previously patch #2 in the series) to
    display the VF MAC address that is set by the VF while community
    feedback is addressed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:41:45 -07:00
Yifeng Sun
aa733660db openvswitch: Print error when ovs_execute_actions() fails
Currently in function ovs_dp_process_packet(), return values of
ovs_execute_actions() are silently discarded. This patch prints out
an debug message when error happens so as to provide helpful hints
for debugging.
Acked-by: Pravin B Shelar <pshelar@ovn.org>

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:38:12 -07:00
David S. Miller
feac1d6802 Merge branch 'sja1105-fixes'
Vladimir Oltean says:

====================
Fixes for SJA1105 DSA: FDBs, Learning and PTP

This is an assortment of functional fixes for the sja1105 switch driver
targeted for the "net" tree (although they apply on net-next just as
well).

Patch 1/5 ("net: dsa: sja1105: Fix broken learning with vlan_filtering
disabled") repairs a breakage introduced in the early development stages
of the driver: support for traffic from the CPU has broken "normal"
frame forwarding (based on DMAC) - there is connectivity through the
switch only because all frames are flooded.
I debated whether this patch qualifies as a fix, since it puts the
switch into a mode it has never operated in before (aka SVL). But
"normal" forwarding did use to work before the "Traffic support for
SJA1105 DSA driver" patchset, and arguably this patch should have been
part of that.
Also, it would be strange for this feature to be broken in the 5.2 LTS.

Patch 2/5 ("net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as
well") is a simplification of a previous FDB-related patch that is
currently in the 5.3 rc's.

Patches 3/5 - 5/5 fix various crashes found while running linuxptp over the
switch ports for extended periods of time, or in conjunction with other
error conditions. The fixed-up commits were all introduced in 5.2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:37:02 -07:00
Vladimir Oltean
93fa8587b2 net: dsa: sja1105: Fix memory leak on meta state machine error path
When RX timestamping is enabled and two link-local (non-meta) frames are
received in a row, this constitutes an error.

The tagger is always caching the last link-local frame, in an attempt to
merge it with the meta follow-up frame when that arrives. To recover
from the above error condition, the initial cached link-local frame is
dropped and the second frame in a row is cached (in expectance of the
second meta frame).

However, when dropping the initial link-local frame, its backing memory
was being leaked.

Fixes: f3097be21b ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:37:02 -07:00
Vladimir Oltean
f163fed276 net: dsa: sja1105: Fix memory leak on meta state machine normal path
After a meta frame is received, it is associated with the cached
sp->data->stampable_skb from the DSA tagger private structure.

Cached means its refcount is incremented with skb_get() in order for
dsa_switch_rcv() to not free it when the tagger .rcv returns NULL.

The mistake is that skb_unref() is not the correct function to use. It
will correctly decrement the refcount (which will go back to zero) but
the skb memory will not be freed.  That is the job of kfree_skb(), which
also calls skb_unref().

But it turns out that freeing the cached stampable_skb is in fact not
necessary.  It is still a perfectly valid skb, and now it is even
annotated with the partial RX timestamp.  So remove the skb_copy()
altogether and simply pass the stampable_skb with a refcount of 1
(incremented by us, decremented by dsa_switch_rcv) up the stack.

Fixes: f3097be21b ("net: dsa: sja1105: Add a state machine for RX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:37:02 -07:00
Vladimir Oltean
6cb0abbdf9 net: dsa: sja1105: Really fix panic on unregistering PTP clock
The IS_ERR_OR_NULL(priv->clock) check inside
sja1105_ptp_clock_unregister() is preventing cancel_delayed_work_sync
from actually being run.

Additionally, sja1105_ptp_clock_unregister() does not actually get run,
when placed in sja1105_remove(). The DSA switch gets torn down, but the
sja1105 module does not get unregistered. So sja1105_ptp_clock_unregister
needs to be moved to sja1105_teardown, to be symmetrical with
sja1105_ptp_clock_register which is called from the DSA sja1105_setup.

It is strange to fix a "fixes" patch, but the probe failure can only be
seen when the attached PHY does not respond to MDIO (issue which I can't
pinpoint the reason to) and it goes away after I power-cycle the board.
This time the patch was validated on a failing board, and the kernel
panic from the fixed commit's message can no longer be seen.

Fixes: 29dd908d35 ("net: dsa: sja1105: Cancel PTP delayed work on unregister")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:37:02 -07:00
Vladimir Oltean
4b7da3d808 net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well
It looks like the FDB dump taken from first-generation switches also
contains information on whether entries are static or not. So use that
instead of searching through the driver's tables.

Fixes: d763778224 ("net: dsa: sja1105: Implement is_static for FDB entries on E/T")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06 14:37:02 -07:00