Commit Graph

967784 Commits

Author SHA1 Message Date
Magnus Karlsson b8c7aece29 xsk: Introduce padding between more ring pointers
Introduce one cache line worth of padding between the consumer pointer
and the flags field as well as between the flags field and the start
of the descriptors in all the lockless rings. This so that the x86 HW
adjacency prefetcher will not prefetch the adjacent pointer/field when
only one pointer/field is going to be used. This improves throughput
performance for the l2fwd sample app with 1% on my machine with HW
prefetching turned on in the BIOS.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-4-git-send-email-magnus.karlsson@gmail.com
2020-11-17 22:07:40 +01:00
Magnus Karlsson f320460b94 i40e: Remove unnecessary sw_ring access from xsk Tx
Remove the unnecessary access to the software ring for the AF_XDP
zero-copy driver. This was used to record the length of the packet so
that the driver Tx completion code could sum this up to produce the
total bytes sent. This is now performed during the transmission of the
packet, so no need to record this in the software ring.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-3-git-send-email-magnus.karlsson@gmail.com
2020-11-17 22:07:40 +01:00
Magnus Karlsson 90da4b3208 samples/bpf: Increment Tx stats at sending
Increment the statistics over how many Tx packets have been sent at
the time of sending instead of at the time of completion. This as a
completion event means that the buffer has been sent AND returned to
user space. The packet always gets sent shortly after sendto() is
called. The kernel might, for performance reasons, decide to not
return every single buffer to user space immediately after sending,
for example, only after a batch of packets have been
transmitted. Incrementing the number of packets sent at completion,
will in that case be confusing as if you send a single packet, the
counter might show zero for a while even though the packet has been
transmitted.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-2-git-send-email-magnus.karlsson@gmail.com
2020-11-17 22:07:40 +01:00
Alan Maguire de91e631bd libbpf: bpf__find_by_name[_kind] should use btf__get_nr_types()
When operating on split BTF, btf__find_by_name[_kind] will not
iterate over all types since they use btf->nr_types to show
the number of types to iterate over. For split BTF this is
the number of types _on top of base BTF_, so it will
underestimate the number of types to iterate over, especially
for vmlinux + module BTF, where the latter is much smaller.

Use btf__get_nr_types() instead.

Fixes: ba451366bf ("libbpf: Implement basic split BTF support")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1605437195-2175-1-git-send-email-alan.maguire@oracle.com
2020-11-16 20:51:34 -08:00
Martin KaFai Lau b93ef089d3 bpf: Fix the irq and nmi check in bpf_sk_storage for tracing usage
The intention of the current check is to avoid using bpf_sk_storage
in irq and nmi.  Jakub pointed out that the current check cannot
do that.  For example, in_serving_softirq() returns true
if the softirq handling is interrupted by hard irq.

Fixes: 8e4597c627 ("bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201116200113.2868539-1-kafai@fb.com
2020-11-16 16:46:01 -08:00
Santucci Pierpaolo 024cd2cbd1 selftest/bpf: Fix IPV6FR handling in flow dissector
From second fragment on, IPV6FR program must stop the dissection of IPV6
fragmented packet. This is the same approach used for IPV4 fragmentation.
This fixes the flow keys calculation for the upper-layer protocols.
Note that according to RFC8200, the first fragment packet must include
the upper-layer header.

Signed-off-by: Santucci Pierpaolo <santucci@epigenesys.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/X7JUzUj34ceE2wBm@santucci.pierpaolo
2020-11-16 16:23:29 +01:00
Jakub Kicinski 2d38c5802f Merge branch 'ionic-updates'
Shannon Nelson says:

====================
ionic updates

These updates are a bit of code cleaning and a minor
bit of performance tweaking.

v3: convert ionic_lif_quiesce() to void
v2: added void cast on call to ionic_lif_quiesce()
    lowered batching threshold
    added patch to flatten calls to ionic_lif_rx_mode
    added patch to change from_ndo to can_sleep
====================

Link: https://lore.kernel.org/r/20201112182208.46770-1-snelson@pensando.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 13:23:01 -08:00
Shannon Nelson 7c8d008cc0 ionic: useful names for booleans
With a few more uses of true and false in function calls, we
need to give them some useful names so we can tell from the
calling point what we're doing.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 13:22:59 -08:00
Shannon Nelson 81dbc24147 ionic: change set_rx_mode from_ndo to can_sleep
Instead of having two different ways of expressing the same
sleepability concept, using opposite logic, we can rework the
from_ndo to can_sleep for a more consistent usage.

Fixes: 1800eee166 ("net: ionic: Replace in_interrupt() usage.")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 13:22:59 -08:00
Shannon Nelson e94f76bb20 ionic: flatten calls to ionic_lif_rx_mode
The _ionic_lif_rx_mode() is only used once and really doesn't
need to be broken out.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 13:22:59 -08:00
Shannon Nelson e0243e1966 ionic: use mc sync for multicast filters
We should be using the multicast sync routines for the multicast
filters.  Also, let's just flatten the logic a bit and pull
the small unicast routine back into ionic_set_rx_mode().

Fixes: 1800eee166 ("net: ionic: Replace in_interrupt() usage.")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 13:22:58 -08:00
Shannon Nelson a8205ab620 ionic: batch rx buffer refilling
We don't need to refill the rx descriptors on every napi
if only a few were handled.  Waiting until we can batch up
a few together will save us a few Rx cycles.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 13:22:58 -08:00
Shannon Nelson e7e8e087ac ionic: add lif quiesce
After the queues are stopped, expressly quiesce the lif.
This assures that even if the queues were in an odd state,
the firmware will close up everything cleanly.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 13:22:58 -08:00
Shannon Nelson f6e428b27e ionic: check for link after netdev registration
Request a link check as soon as the netdev is registered rather
than waiting for the watchdog to go off in order to get the
interface operational a little more quickly.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 13:22:58 -08:00
Shannon Nelson 8f56bc4dc1 ionic: start queues before announcing link up
Change the order of operations in the link_up handling to be
sure that the queues are up and ready before we announce that
the link is up.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 13:22:58 -08:00
YueHaibing 9e6cad531c net: macb: Fix passing zero to 'PTR_ERR'
Check PTR_ERR with IS_ERR to fix this.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20201112144936.54776-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 12:35:33 -08:00
Lukas Bulwahn 2e793878ae ipv6: remove unused function ipv6_skb_idev()
Commit bdb7cc643f ("ipv6: Count interface receive statistics on the
ingress netdev") removed all callees for ipv6_skb_idev(). Hence, since
then, ipv6_skb_idev() is unused and make CC=clang W=1 warns:

  net/ipv6/exthdrs.c:909:33:
    warning: unused function 'ipv6_skb_idev' [-Wunused-function]

So, remove this unused function and a -Wunused-function warning.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lore.kernel.org/r/20201113135012.32499-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 12:00:27 -08:00
Jakub Kicinski 07cbce2e46 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-11-14

1) Add BTF generation for kernel modules and extend BTF infra in kernel
   e.g. support for split BTF loading and validation, from Andrii Nakryiko.

2) Support for pointers beyond pkt_end to recognize LLVM generated patterns
   on inlined branch conditions, from Alexei Starovoitov.

3) Implements bpf_local_storage for task_struct for BPF LSM, from KP Singh.

4) Enable FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage
   infra, from Martin KaFai Lau.

5) Add XDP bulk APIs that introduce a defer/flush mechanism to optimize the
   XDP_REDIRECT path, from Lorenzo Bianconi.

6) Fix a potential (although rather theoretical) deadlock of hashtab in NMI
   context, from Song Liu.

7) Fixes for cross and out-of-tree build of bpftool and runqslower allowing build
   for different target archs on same source tree, from Jean-Philippe Brucker.

8) Fix error path in htab_map_alloc() triggered from syzbot, from Eric Dumazet.

9) Move functionality from test_tcpbpf_user into the test_progs framework so it
   can run in BPF CI, from Alexander Duyck.

10) Lift hashtab key_size limit to be larger than MAX_BPF_STACK, from Florian Lehner.

Note that for the fix from Song we have seen a sparse report on context
imbalance which requires changes in sparse itself for proper annotation
detection where this is currently being discussed on linux-sparse among
developers [0]. Once we have more clarification/guidance after their fix,
Song will follow-up.

  [0] https://lore.kernel.org/linux-sparse/CAHk-=wh4bx8A8dHnX612MsDO13st6uzAz1mJ1PaHHVevJx_ZCw@mail.gmail.com/T/
      https://lore.kernel.org/linux-sparse/20201109221345.uklbp3lzgq6g42zb@ltop.local/T/

* git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (66 commits)
  net: mlx5: Add xdp tx return bulking support
  net: mvpp2: Add xdp tx return bulking support
  net: mvneta: Add xdp tx return bulking support
  net: page_pool: Add bulk support for ptr_ring
  net: xdp: Introduce bulking for xdp tx return path
  bpf: Expose bpf_d_path helper to sleepable LSM hooks
  bpf: Augment the set of sleepable LSM hooks
  bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP
  bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP
  bpf: Rename some functions in bpf_sk_storage
  bpf: Folding omem_charge() into sk_storage_charge()
  selftests/bpf: Add asm tests for pkt vs pkt_end comparison.
  selftests/bpf: Add skb_pkt_end test
  bpf: Support for pointers beyond pkt_end.
  tools/bpf: Always run the *-clean recipes
  tools/bpf: Add bootstrap/ to .gitignore
  bpf: Fix NULL dereference in bpf_task_storage
  tools/bpftool: Fix build slowdown
  tools/runqslower: Build bpftool using HOSTCC
  tools/runqslower: Enable out-of-tree build
  ...
====================

Link: https://lore.kernel.org/r/20201114020819.29584-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14 09:13:41 -08:00
Steen Hegelund 774626fa44 net: phy: mscc: Add PTP support for 2 more VSC PHYs
Add VSC8572 and VSC8574 in the PTP configuration
as they also support PTP.

The relevant datasheets can be found here:
  - VSC8572: https://www.microchip.com/wwwproducts/en/VSC8572
  - VSC8574: https://www.microchip.com/wwwproducts/en/VSC8574

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Link: https://lore.kernel.org/r/20201112092250.914079-1-steen.hegelund@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13 18:25:34 -08:00
Daniel Borkmann c14d61fca0 Merge branch 'xdp-redirect-bulk'
Lorenzo Bianconi says:

====================
XDP bulk APIs introduce a defer/flush mechanism to return
pages belonging to the same xdp_mem_allocator object
(identified via the mem.id field) in bulk to optimize
I-cache and D-cache since xdp_return_frame is usually
run inside the driver NAPI tx completion loop.

Convert mvneta, mvpp2 and mlx5 drivers to xdp_return_frame_bulk APIs.

More details on benchmarks run on mlx5 can be found here:
https://github.com/xdp-project/xdp-project/blob/master/areas/mem/xdp_bulk_return01.org

Changes since v5:
- do not keep looping over ptr_ring if the cache is full but release leftover
  pages running page_pool_return_page

Changes since v4:
- fix comments
- introduce xdp_frame_bulk_init utility routine
- compiler annotations for I-cache code layout
- move rcu_read_lock outside fast-path
- mlx5 xdp bulking code optimization

Changes since v3:
- align DEV_MAP_BULK_SIZE to XDP_BULK_QUEUE_SIZE
- refactor page_pool_put_page_bulk to avoid code duplication

Changes since v2:
- move mvneta changes in a dedicated patch

Changes since v1:
- improve comments
- rework xdp_return_frame_bulk routine logic
- move count and xa fields at the beginning of xdp_frame_bulk struct
- invert logic in page_pool_put_page_bulk for loop
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
2020-11-14 02:30:03 +01:00
Lorenzo Bianconi b87c57ae12 net: mlx5: Add xdp tx return bulking support
Convert mlx5 driver to xdp_return_frame_bulk APIs.

XDP_REDIRECT (upstream codepath): 8.9Mpps
XDP_REDIRECT (upstream codepath + bulking APIs): 10.2Mpps

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/250460319fd868b7b5668fc1deca74dd42813a90.1605267335.git.lorenzo@kernel.org
2020-11-14 02:29:00 +01:00
Lorenzo Bianconi dbef19ccde net: mvpp2: Add xdp tx return bulking support
Convert mvpp2 driver to xdp_return_frame_bulk APIs.

XDP_REDIRECT (upstream codepath): 1.79Mpps
XDP_REDIRECT (upstream codepath + bulking APIs): 1.93Mpps

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Matteo Croce <mcroce@microsoft.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/0b38c295e58e8ce251ef6b4e2187a2f457f9f7a3.1605267335.git.lorenzo@kernel.org
2020-11-14 02:29:00 +01:00
Lorenzo Bianconi 2f9d09394d net: mvneta: Add xdp tx return bulking support
Convert mvneta driver to xdp_return_frame_bulk APIs.

XDP_REDIRECT (upstream codepath): 275Kpps
XDP_REDIRECT (upstream codepath + bulking APIs): 284Kpps

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/9af8014006d022fc0fec78cdaa71beb56999750d.1605267335.git.lorenzo@kernel.org
2020-11-14 02:29:00 +01:00
Lorenzo Bianconi 7886244736 net: page_pool: Add bulk support for ptr_ring
Introduce the capability to batch page_pool ptr_ring refill since it is
usually run inside the driver NAPI tx completion loop.

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/08dd249c9522c001313f520796faa777c4089e1c.1605267335.git.lorenzo@kernel.org
2020-11-14 02:29:00 +01:00
Lorenzo Bianconi 8965398713 net: xdp: Introduce bulking for xdp tx return path
XDP bulk APIs introduce a defer/flush mechanism to return
pages belonging to the same xdp_mem_allocator object
(identified via the mem.id field) in bulk to optimize
I-cache and D-cache since xdp_return_frame is usually run
inside the driver NAPI tx completion loop.
The bulk queue size is set to 16 to be aligned to how
XDP_REDIRECT bulking works. The bulk is flushed when
it is full or when mem.id changes.
xdp_frame_bulk is usually stored/allocated on the function
call-stack to avoid locking penalties.
Current implementation considers only page_pool memory model.

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/e190c03eac71b20c8407ae0fc2c399eda7835f49.1605267335.git.lorenzo@kernel.org
2020-11-14 02:28:59 +01:00