Commit Graph

242 Commits

Author SHA1 Message Date
Eric Dumazet
f78ed2204d netpoll: accept NULL np argument in netpoll_send_skb()
netpoll_send_skb() callers seem to leak skb if
the np pointer is NULL. While this should not happen, we
can make the code more robust.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07 18:11:07 -07:00
Eric Dumazet
1ddabdfaf7 netpoll: netpoll_send_skb() returns transmit status
Some callers want to know if the packet has been sent or
dropped, to inform upper stacks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07 18:11:07 -07:00
Eric Dumazet
fb1eee476b netpoll: move netpoll_send_skb() out of line
There is no need to inline this helper, as we intend to add more
code in this function.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07 18:11:07 -07:00
Eric Dumazet
307f660d05 netpoll: remove dev argument from netpoll_send_skb_on_dev()
netpoll_send_skb_on_dev() can get the device pointer directly from np->dev

Rename it to __netpoll_send_skb()

Following patch will move netpoll_send_skb() out-of-line.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-07 18:11:07 -07:00
Yunjian Wang
a54776f2c4 netpoll: Fix use correct return type for ndo_start_xmit()
The method ndo_start_xmit() returns a value of type netdev_tx_t. Fix
the ndo function to use the correct type.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-29 11:47:16 -07:00
Feng Sun
2c1644cf6d net: fix skb use after free in netpoll
After commit baeababb5b
("tun: return NET_XMIT_DROP for dropped packets"),
when tun_net_xmit drop packets, it will free skb and return NET_XMIT_DROP,
netpoll_send_skb_on_dev will run into following use after free cases:
1. retry netpoll_start_xmit with freed skb;
2. queue freed skb in npinfo->txq.
queue_process will also run into use after free case.

hit netpoll_send_skb_on_dev first case with following kernel log:

[  117.864773] kernel BUG at mm/slub.c:306!
[  117.864773] invalid opcode: 0000 [#1] SMP PTI
[  117.864774] CPU: 3 PID: 2627 Comm: loop_printmsg Kdump: loaded Tainted: P           OE     5.3.0-050300rc5-generic #201908182231
[  117.864775] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  117.864775] RIP: 0010:kmem_cache_free+0x28d/0x2b0
[  117.864781] Call Trace:
[  117.864781]  ? tun_net_xmit+0x21c/0x460
[  117.864781]  kfree_skbmem+0x4e/0x60
[  117.864782]  kfree_skb+0x3a/0xa0
[  117.864782]  tun_net_xmit+0x21c/0x460
[  117.864782]  netpoll_start_xmit+0x11d/0x1b0
[  117.864788]  netpoll_send_skb_on_dev+0x1b8/0x200
[  117.864789]  __br_forward+0x1b9/0x1e0 [bridge]
[  117.864789]  ? skb_clone+0x53/0xd0
[  117.864790]  ? __skb_clone+0x2e/0x120
[  117.864790]  deliver_clone+0x37/0x50 [bridge]
[  117.864790]  maybe_deliver+0x89/0xc0 [bridge]
[  117.864791]  br_flood+0x6c/0x130 [bridge]
[  117.864791]  br_dev_xmit+0x315/0x3c0 [bridge]
[  117.864792]  netpoll_start_xmit+0x11d/0x1b0
[  117.864792]  netpoll_send_skb_on_dev+0x1b8/0x200
[  117.864792]  netpoll_send_udp+0x2c6/0x3e8
[  117.864793]  write_msg+0xd9/0xf0 [netconsole]
[  117.864793]  console_unlock+0x386/0x4e0
[  117.864793]  vprintk_emit+0x17e/0x280
[  117.864794]  vprintk_default+0x29/0x50
[  117.864794]  vprintk_func+0x4c/0xbc
[  117.864794]  printk+0x58/0x6f
[  117.864795]  loop_fun+0x24/0x41 [printmsg_loop]
[  117.864795]  kthread+0x104/0x140
[  117.864795]  ? 0xffffffffc05b1000
[  117.864796]  ? kthread_park+0x80/0x80
[  117.864796]  ret_from_fork+0x35/0x40

Signed-off-by: Feng Sun <loyou85@gmail.com>
Signed-off-by: Xiaojun Zhao <xiaojunzhao141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27 20:52:02 -07:00
Florian Westphal
2638eb8b50 net: ipv4: provide __rcu annotation for ifa_list
ifa_list is protected by rcu, yet code doesn't reflect this.

Add the __rcu annotations and fix up all places that are now reported by
sparse.

I've done this in the same commit to not add intermediate patches that
result in new warnings.

Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-02 18:08:36 -07:00
Thomas Gleixner
457c899653 treewide: Add SPDX license identifier for missed files
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:45 +02:00
Linus Torvalds
80f232121b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.

   2) Add fib_sync_mem to control the amount of dirty memory we allow to
      queue up between synchronize RCU calls, from David Ahern.

   3) Make flow classifier more lockless, from Vlad Buslov.

   4) Add PHY downshift support to aquantia driver, from Heiner
      Kallweit.

   5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
      contention on SLAB spinlocks in heavy RPC workloads.

   6) Partial GSO offload support in XFRM, from Boris Pismenny.

   7) Add fast link down support to ethtool, from Heiner Kallweit.

   8) Use siphash for IP ID generator, from Eric Dumazet.

   9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
      entries, from David Ahern.

  10) Move skb->xmit_more into a per-cpu variable, from Florian
      Westphal.

  11) Improve eBPF verifier speed and increase maximum program size,
      from Alexei Starovoitov.

  12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
      spinlocks. From Neil Brown.

  13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.

  14) Improve link partner cap detection in generic PHY code, from
      Heiner Kallweit.

  15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
      Maguire.

  16) Remove SKB list implementation assumptions in SCTP, your's truly.

  17) Various cleanups, optimizations, and simplifications in r8169
      driver. From Heiner Kallweit.

  18) Add memory accounting on TX and RX path of SCTP, from Xin Long.

  19) Switch PHY drivers over to use dynamic featue detection, from
      Heiner Kallweit.

  20) Support flow steering without masking in dpaa2-eth, from Ioana
      Ciocoi.

  21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
      Pirko.

  22) Increase the strict parsing of current and future netlink
      attributes, also export such policies to userspace. From Johannes
      Berg.

  23) Allow DSA tag drivers to be modular, from Andrew Lunn.

  24) Remove legacy DSA probing support, also from Andrew Lunn.

  25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
      Haabendal.

  26) Add a generic tracepoint for TX queue timeouts to ease debugging,
      from Cong Wang.

  27) More indirect call optimizations, from Paolo Abeni"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
  cxgb4: Fix error path in cxgb4_init_module
  net: phy: improve pause mode reporting in phy_print_status
  dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
  net: macb: Change interrupt and napi enable order in open
  net: ll_temac: Improve error message on error IRQ
  net/sched: remove block pointer from common offload structure
  net: ethernet: support of_get_mac_address new ERR_PTR error
  net: usb: smsc: fix warning reported by kbuild test robot
  staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
  net: dsa: support of_get_mac_address new ERR_PTR error
  net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
  vrf: sit mtu should not be updated when vrf netdev is the link
  net: dsa: Fix error cleanup path in dsa_init_module
  l2tp: Fix possible NULL pointer dereference
  taprio: add null check on sched_nest to avoid potential null pointer dereference
  net: mvpp2: cls: fix less than zero check on a u32 variable
  net_sched: sch_fq: handle non connected flows
  net_sched: sch_fq: do not assume EDT packets are ordered
  net: hns3: use devm_kcalloc when allocating desc_cb
  net: hns3: some cleanup for struct hns3_enet_ring
  ...
2019-05-07 22:03:58 -07:00
Sakari Ailus
d75f773c86 treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
%pF and %pf are functionally equivalent to %pS and %ps conversion
specifiers. The former are deprecated, therefore switch the current users
to use the preferred variant.

The changes have been produced by the following command:

	git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
	while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done

And verifying the result.

Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: xen-devel@lists.xenproject.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Cc: linux-pci@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-mm@kvack.org
Cc: ceph-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: David Sterba <dsterba@suse.com> (for btrfs)
Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c)
Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci)
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
2019-04-09 14:19:06 +02:00
Paolo Abeni
4bd97d51a5 net: dev: rename queue selection helpers.
With the following patches, we are going to use __netdev_pick_tx() in
many modules. Rename it to netdev_pick_tx(), to make it clear is
a public API.

Also rename the existing netdev_pick_tx() to netdev_core_pick_tx(),
to avoid name clashes.

Suggested-by: Eric Dumazet <edumazet@google.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 11:18:54 -07:00
Linus Torvalds
e0c38a4d1f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) New ipset extensions for matching on destination MAC addresses, from
    Stefano Brivio.

 2) Add ipv4 ttl and tos, plus ipv6 flow label and hop limit offloads to
    nfp driver. From Stefano Brivio.

 3) Implement GRO for plain UDP sockets, from Paolo Abeni.

 4) Lots of work from Michał Mirosław to eliminate the VLAN_TAG_PRESENT
    bit so that we could support the entire vlan_tci value.

 5) Rework the IPSEC policy lookups to better optimize more usecases,
    from Florian Westphal.

 6) Infrastructure changes eliminating direct manipulation of SKB lists
    wherever possible, and to always use the appropriate SKB list
    helpers. This work is still ongoing...

 7) Lots of PHY driver and state machine improvements and
    simplifications, from Heiner Kallweit.

 8) Various TSO deferral refinements, from Eric Dumazet.

 9) Add ntuple filter support to aquantia driver, from Dmitry Bogdanov.

10) Batch dropping of XDP packets in tuntap, from Jason Wang.

11) Lots of cleanups and improvements to the r8169 driver from Heiner
    Kallweit, including support for ->xmit_more. This driver has been
    getting some much needed love since he started working on it.

12) Lots of new forwarding selftests from Petr Machata.

13) Enable VXLAN learning in mlxsw driver, from Ido Schimmel.

14) Packed ring support for virtio, from Tiwei Bie.

15) Add new Aquantia AQtion USB driver, from Dmitry Bezrukov.

16) Add XDP support to dpaa2-eth driver, from Ioana Ciocoi Radulescu.

17) Implement coalescing on TCP backlog queue, from Eric Dumazet.

18) Implement carrier change in tun driver, from Nicolas Dichtel.

19) Support msg_zerocopy in UDP, from Willem de Bruijn.

20) Significantly improve garbage collection of neighbor objects when
    the table has many PERMANENT entries, from David Ahern.

21) Remove egdev usage from nfp and mlx5, and remove the facility
    completely from the tree as it no longer has any users. From Oz
    Shlomo and others.

22) Add a NETDEV_PRE_CHANGEADDR so that drivers can veto the change and
    therefore abort the operation before the commit phase (which is the
    NETDEV_CHANGEADDR event). From Petr Machata.

23) Add indirect call wrappers to avoid retpoline overhead, and use them
    in the GRO code paths. From Paolo Abeni.

24) Add support for netlink FDB get operations, from Roopa Prabhu.

25) Support bloom filter in mlxsw driver, from Nir Dotan.

26) Add SKB extension infrastructure. This consolidates the handling of
    the auxiliary SKB data used by IPSEC and bridge netfilter, and is
    designed to support the needs to MPTCP which could be integrated in
    the future.

27) Lots of XDP TX optimizations in mlx5 from Tariq Toukan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1845 commits)
  net: dccp: fix kernel crash on module load
  drivers/net: appletalk/cops: remove redundant if statement and mask
  bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw
  net/net_namespace: Check the return value of register_pernet_subsys()
  net/netlink_compat: Fix a missing check of nla_parse_nested
  ieee802154: lowpan_header_create check must check daddr
  net/mlx4_core: drop useless LIST_HEAD
  mlxsw: spectrum: drop useless LIST_HEAD
  net/mlx5e: drop useless LIST_HEAD
  iptunnel: Set tun_flags in the iptunnel_metadata_reply from src
  net/mlx5e: fix semicolon.cocci warnings
  staging: octeon: fix build failure with XFRM enabled
  net: Revert recent Spectre-v1 patches.
  can: af_can: Fix Spectre v1 vulnerability
  packet: validate address length if non-zero
  nfc: af_nfc: Fix Spectre v1 vulnerability
  phonet: af_phonet: Fix Spectre v1 vulnerability
  net: core: Fix Spectre v1 vulnerability
  net: minor cleanup in skb_ext_add()
  net: drop the unused helper skb_ext_get()
  ...
2018-12-27 13:04:52 -08:00
Petr Machata
00f54e6892 net: core: dev: Add extack argument to dev_open()
In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. One prominent API through which the notification is
invoked is dev_open().

Therefore extend dev_open() with and extra extack argument and update
all users. Most of the calls end up just encoding NULL, but bond and
team drivers have the extack readily available.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-06 13:26:06 -08:00
Ingo Molnar
4bbfd7467c Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

- Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

- Replace calls of RCU-bh and RCU-sched update-side functions
  to their vanilla RCU counterparts.  This series is a step
  towards complete removal of the RCU-bh and RCU-sched update-side
  functions.

  ( Note that some of these conversions are going upstream via their
    respective maintainers. )

- Documentation updates, including a number of flavor-consolidation
  updates from Joel Fernandes.

- Miscellaneous fixes.

- Automate generation of the initrd filesystem used for
  rcutorture testing.

- Convert spin_is_locked() assertions to instead use lockdep.

  ( Note that some of these conversions are going upstream via their
    respective maintainers. )

- SRCU updates, especially including a fix from Dennis Krein
  for a bag-on-head-class bug.

- RCU torture-test updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-04 07:52:30 +01:00
Paul E. McKenney
5da54c1810 net/core: Replace call_rcu_bh() and synchronize_rcu_bh()
Now that call_rcu()'s callback is not invoked until after all bh-disable
regions of code have completed (in addition to explicitly marked
RCU read-side critical sections), call_rcu() can be used in place of
call_rcu_bh().  Similarly, synchronize_rcu() can be used in place of
synchronize_rcu_bh().  This commit therefore makes these changes.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: <netdev@vger.kernel.org>
2018-12-01 12:38:47 -08:00
Matwey V. Kornilov
d016b4a356 net: core: netpoll: Enable netconsole IPv6 link local address
There is no reason to discard using source link local address when
remote netconsole IPv6 address is set to be link local one.

The patch allows administrators to use IPv6 netconsole without
explicitly configuring source address:

    netconsole=@/,@fe80::5054:ff:fe2f:6012/

Signed-off-by: Matwey V. Kornilov <matwey@sai.msu.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05 17:07:10 -08:00
Debabrata Banerjee
c9fbd71f73 netpoll: allow cleanup to be synchronous
This fixes a problem introduced by:
commit 2cde6acd49 ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock")

When using netconsole on a bond, __netpoll_cleanup can asynchronously
recurse multiple times, each __netpoll_free_async call can result in
more __netpoll_free_async's. This means there is now a race between
cleanup_work queues on multiple netpoll_info's on multiple devices and
the configuration of a new netpoll. For example if a netconsole is set
to enable 0, reconfigured, and enable 1 immediately, this netconsole
will likely not work.

Given the reason for __netpoll_free_async is it can be called when rtnl
is not locked, if it is locked, we should be able to execute
synchronously. It appears to be locked everywhere it's called from.

Generalize the design pattern from the teaming driver for current
callers of __netpoll_free_async.

CC: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19 17:01:43 -07:00
David S. Miller
4899542314 Revert "bond: take rcu lock in netpoll_send_skb_on_dev"
This reverts commit 6fe9487892.

It is causing more serious regressions than the RCU warning
it is fixing.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19 10:45:08 -07:00
Dave Jones
6fe9487892 bond: take rcu lock in netpoll_send_skb_on_dev
The bonding driver lacks the rcu lock when it calls down into
netdev_lower_get_next_private_rcu from bond_poll_controller, which
results in a trace like:

WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
Workqueue: bond0 bond_mii_monitor
RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40
Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8>
RSP: 0018:ffffc9000087fa68 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff880429614560 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 00000000ffffffff RDI: ffffffffa184ada0
RBP: ffffc9000087fa80 R08: 0000000000000001 R09: 0000000000000000
R10: ffffc9000087f9f0 R11: ffff880429798040 R12: ffff8804289d5980
R13: ffffffffa1511f60 R14: 00000000000000c8 R15: 00000000ffffffff
FS:  0000000000000000(0000) GS:ffff88042f880000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4b78fce180 CR3: 000000018180f006 CR4: 00000000001606e0
Call Trace:
 bond_poll_controller+0x52/0x170
 netpoll_poll_dev+0x79/0x290
 netpoll_send_skb_on_dev+0x158/0x2c0
 netpoll_send_udp+0x2d5/0x430
 write_ext_msg+0x1e0/0x210
 console_unlock+0x3c4/0x630
 vprintk_emit+0xfa/0x2f0
 printk+0x52/0x6e
 ? __netdev_printk+0x12b/0x220
 netdev_info+0x64/0x80
 ? bond_3ad_set_carrier+0xe9/0x180
 bond_select_active_slave+0x1fc/0x310
 bond_mii_monitor+0x709/0x9b0
 process_one_work+0x221/0x5e0
 worker_thread+0x4f/0x3b0
 kthread+0x100/0x140
 ? process_one_work+0x5e0/0x5e0
 ? kthread_delayed_work_timer_fn+0x90/0x90
 ret_from_fork+0x24/0x30

We're also doing rcu dereferences a layer up in netpoll_send_skb_on_dev
before we call down into netpoll_poll_dev, so just take the lock there.

Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 23:25:25 -07:00
Eric Dumazet
c24498c682 netpoll: do not test NAPI_STATE_SCHED in poll_one_napi()
Since we do no longer require NAPI drivers to provide
an ndo_poll_controller(), napi_schedule() has not been done
before poll_one_napi() invocation.

So testing NAPI_STATE_SCHED is likely to cause early returns.

While we are at it, remove outdated comment.

Note to future bisections : This change might surface prior
bugs in drivers. See commit 73f21c653f ("bnxt_en: Fix TX
timeout during netpoll.") for one occurrence.

Fixes: ac3d9dd034 ("netpoll: make ndo_poll_controller() optional")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Song Liu <songliubraving@fb.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28 11:12:28 -07:00
Eric Dumazet
ac3d9dd034 netpoll: make ndo_poll_controller() optional
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.

It seems that all networking drivers that do use NAPI
for their TX completions, should not provide a ndo_poll_controller().

NAPI drivers have netpoll support already handled
in core networking stack, since netpoll_poll_dev()
uses poll_napi(dev) to iterate through registered
NAPI contexts for a device.

This patch allows netpoll_poll_dev() to process NAPI
contexts even for drivers not providing ndo_poll_controller(),
allowing for following patches in NAPI drivers.

Also we export netpoll_poll_dev() so that it can be called
by bonding/team drivers in following patches.

Reported-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-23 21:55:24 -07:00
Frederic Weisbecker
af07339373 netpoll: Use lockdep to assert IRQs are disabled/enabled
Use lockdep to check that IRQs are enabled or disabled as expected. This
way the sanity check only shows overhead when concurrency correctness
debug code is enabled.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1509980490-4285-14-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08 11:13:54 +01:00
Matthias Kaehlcke
0c3a8f8b8f netpoll: Fix device name check in netpoll_setup()
Apparently netpoll_setup() assumes that netpoll.dev_name is a pointer
when checking if the device name is set:

if (np->dev_name) {
  ...

However the field is a character array, therefore the condition always
yields true. Check instead whether the first byte of the array has a
non-zero value.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-26 17:01:43 -07:00
WANG Cong
230cd1279d netpoll: shut up a kernel warning on refcount
When we convert atomic_t to refcount_t, a new kernel warning
on "increment on 0" is introduced in the netpoll code,
zap_completion_queue(). In fact for this special case, we know
the refcount is 0 and we just have to set it to 1 to satisfy
the following dev_kfree_skb_any(), so we can just use
refcount_set(..., 1) instead.

Fixes: 633547973f ("net: convert sk_buff.users from atomic_t to refcount_t")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Cc: Reshetova, Elena <elena.reshetova@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-14 08:16:59 -07:00
Reshetova, Elena
433cea4d9b net: convert netpoll_info.refcnt from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 07:39:08 -07:00