Commit Graph

703 Commits

Author SHA1 Message Date
Linus Torvalds
6456300356 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Here we go, another merge window full of networking and #ebpf changes:

   1) Snoop DHCPACKS in batman-adv to learn MAC/IP pairs in the DHCP
      range without dealing with floods of ARP traffic, from Linus
      Lüssing.

   2) Throttle buffered multicast packet transmission in mt76, from
      Felix Fietkau.

   3) Support adaptive interrupt moderation in ice, from Brett Creeley.

   4) A lot of struct_size conversions, from Gustavo A. R. Silva.

   5) Add peek/push/pop commands to bpftool, as well as bash completion,
      from Stanislav Fomichev.

   6) Optimize sk_msg_clone(), from Vakul Garg.

   7) Add SO_BINDTOIFINDEX, from David Herrmann.

   8) Be more conservative with local resends due to local congestion,
      from Yuchung Cheng.

   9) Allow vetoing of unsupported VXLAN FDBs, from Petr Machata.

  10) Add health buffer support to devlink, from Eran Ben Elisha.

  11) Add TXQ scheduling API to mac80211, from Toke Høiland-Jørgensen.

  12) Add statistics to basic packet scheduler filter, from Cong Wang.

  13) Add GRE tunnel support for mlxsw Spectrum-2, from Nir Dotan.

  14) Lots of new IP tunneling forwarding tests, also from Nir Dotan.

  15) Add 3ad stats to bonding, from Nikolay Aleksandrov.

  16) Lots of probing improvements for bpftool, from Quentin Monnet.

  17) Various nfp drive #ebpf JIT improvements from Jakub Kicinski.

  18) Allow #ebpf programs to access gso_segs from skb shared info, from
      Eric Dumazet.

  19) Add sock_diag support for AF_XDP sockets, from Björn Töpel.

  20) Support 22260 iwlwifi devices, from Luca Coelho.

  21) Use rbtree for ipv6 defragmentation, from Peter Oskolkov.

  22) Add JMP32 instruction class support to #ebpf, from Jiong Wang.

  23) Add spinlock support to #ebpf, from Alexei Starovoitov.

  24) Support 256-bit keys and TLS 1.3 in ktls, from Dave Watson.

  25) Add device infomation API to devlink, from Jakub Kicinski.

  26) Add new timestamping socket options which are y2038 safe, from
      Deepa Dinamani.

  27) Add RX checksum offloading for various sh_eth chips, from Sergei
      Shtylyov.

  28) Flow offload infrastructure, from Pablo Neira Ayuso.

  29) Numerous cleanups, improvements, and bug fixes to the PHY layer
      and many drivers from Heiner Kallweit.

  30) Lots of changes to try and make packet scheduler classifiers run
      lockless as much as possible, from Vlad Buslov.

  31) Support BCM957504 chip in bnxt_en driver, from Erik Burrows.

  32) Add concurrency tests to tc-tests infrastructure, from Vlad
      Buslov.

  33) Add hwmon support to aquantia, from Heiner Kallweit.

  34) Allow 64-bit values for SO_MAX_PACING_RATE, from Eric Dumazet.

  And I would be remiss if I didn't thank the various major networking
  subsystem maintainers for integrating much of this work before I even
  saw it. Alexei Starovoitov, Daniel Borkmann, Pablo Neira Ayuso,
  Johannes Berg, Kalle Valo, and many others. Thank you!"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2207 commits)
  net/sched: avoid unused-label warning
  net: ignore sysctl_devconf_inherit_init_net without SYSCTL
  phy: mdio-mux: fix Kconfig dependencies
  net: phy: use phy_modify_mmd_changed in genphy_c45_an_config_aneg
  net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework
  selftest/net: Remove duplicate header
  sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
  net/mlx5e: Update tx reporter status in case channels were successfully opened
  devlink: Add support for direct reporter health state update
  devlink: Update reporter state to error even if recover aborted
  sctp: call iov_iter_revert() after sending ABORT
  team: Free BPF filter when unregistering netdev
  ip6mr: Do not call __IP6_INC_STATS() from preemptible context
  isdn: mISDN: Fix potential NULL pointer dereference of kzalloc
  net: dsa: mv88e6xxx: support in-band signalling on SGMII ports with external PHYs
  cxgb4/chtls: Prefix adapter flags with CXGB4
  net-sysfs: Switch to bitmap_zalloc()
  mellanox: Switch to bitmap_zalloc()
  bpf: add test cases for non-pointer sanitiation logic
  mlxsw: i2c: Extend initialization by querying resources data
  ...
2019-03-05 08:26:13 -08:00
Linus Torvalds
a83b04232c Merge tag 'vfio-v5.1-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:

 - Switch mdev to generic UUID API (Andy Shevchenko)

 - Fixup platform reset include paths (Masahiro Yamada)

 - Fix usage of MINORMASK (Chengguang Xu)

 - Remove noise from duplicate spapr table unsets (Alexey Kardashevskiy)

 - Restore device state after PM reset (Alex Williamson)

 - Ensure memory translation enabled for PCI ROM access (Eric Auger)

* tag 'vfio-v5.1-rc1' of git://github.com/awilliam/linux-vfio:
  vfio_pci: Enable memory accesses before calling pci_map_rom
  vfio/pci: Restore device state on PM transition
  vfio/spapr_tce: Skip unsetting already unset table
  samples/vfio-mdev/mtty: expand minor range when registering chrdev region
  samples/vfio-mdev/mdpy: expand minor range when registering chrdev region
  samples/vfio-mdev/mbochs: expand minor range when registering chrdev region
  vfio: expand minor range when registering chrdev region
  vfio: platform: reset: fix up include directives to remove ccflags-y
  vfio-mdev: Switch to use new generic UUID API
2019-03-04 18:56:36 -08:00
brakmo
4ffd44cfd1 bpf: HBM test script
Script for testing HBM (Host Bandwidth Manager) framework.
It creates a cgroup to use for testing and load a BPF program to limit
egress bandwidht. It then uses iperf3 or netperf to create
loads. The output is the goodput in Mbps (unless -D is used).

It can work on a single host using loopback or among two hosts (with netperf).
When using loopback, it is recommended to also introduce a delay of at least
1ms (-d=1), otherwise the assigned bandwidth is likely to be underutilized.

USAGE: $name [out] [-b=<prog>|--bpf=<prog>] [-c=<cc>|--cc=<cc>] [-D]
             [-d=<delay>|--delay=<delay>] [--debug] [-E]
             [-f=<#flows>|--flows=<#flows>] [-h] [-i=<id>|--id=<id >] [-l]
	     [-N] [-p=<port>|--port=<port>] [-P] [-q=<qdisc>]
             [-R] [-s=<server>|--server=<server] [--stats]
	     [-t=<time>|--time=<time>] [-w] [cubic|dctcp]
  Where:
    out               Egress (default egress)
    -b or --bpf       BPF program filename to load and attach.
                      Default is nrm_out_kern.o for egress,
    -c or -cc         TCP congestion control (cubic or dctcp)
    -d or --delay     Add a delay in ms using netem
    -D                In addition to the goodput in Mbps, it also outputs
                      other detailed information. This information is
                      test dependent (i.e. iperf3 or netperf).
    --debug           Print BPF trace buffer
    -E                Enable ECN (not required for dctcp)
    -f or --flows     Number of concurrent flows (default=1)
    -i or --id        cgroup id (an integer, default is 1)
    -l                Do not limit flows using loopback
    -N                Use netperf instead of iperf3
    -h                Help
    -p or --port      iperf3 port (default is 5201)
    -P                Use an iperf3 instance for each flow
    -q                Use the specified qdisc.
    -r or --rate      Rate in Mbps (default 1s 1Gbps)
    -R                Use TCP_RR for netperf. 1st flow has req
                      size of 10KB, rest of 1MB. Reply in all
                      cases is 1 byte.
                      More detailed output for each flow can be found
                      in the files netperf.<cg>.<flow>, where <cg> is the
                      cgroup id as specified with the -i flag, and <flow>
                      is the flow id starting at 1 and increasing by 1 for
                      flow (as specified by -f).
    -s or --server    hostname of netperf server. Used to create netperf
                      test traffic between to hosts (default is within host)
                      netserver must be running on the host.
    --stats           Get HBM stats (marked, dropped, etc.)
    -t or --time      duration of iperf3 in seconds (default=5)
    -w                Work conserving flag. cgroup can increase its
                      bandwidth beyond the rate limit specified
                      while there is available bandwidth. Current
                      implementation assumes there is only one NIC
                      (eth0), but can be extended to support multiple
                      NICs. This is just a proof of concept.
    cubic or dctcp    specify TCP CC to use

Examples:
 ./do_hbm_test.sh -l -d=1 -D --stats
     Runs a 5 second test, using a single iperf3 flow and with the default
     rate limit of 1Gbps and a delay of 1ms (using netem) using the default
     TCP congestion control on the loopback device (hence we use "-l" to
     enforce bandwidth limit on loopback device). Since no direction is
     specified, it defaults to egress. Since no TCP CC algorithm is
     specified it uses the system default (Cubic for this test).
     With no -D flag, only the value of the AGGREGATE OUTPUT would show.
     id refers to the cgroup id and is useful when running multi cgroup
     tests (supported by a future patch).
     This patchset does not support calling TCP's congesion window
     reduction, even when packets are dropped by the BPF program, resulting
     in a large number of packets dropped. It is recommended that the  current
     HBM implemenation only be used with ECN enabled flows. A future patch
     will add support for reducing TCP's cwnd and will increase the
     performance of non-ECN enabled flows.
   Output:
     Details for HBM in cgroup 1
     id:1
     rate_mbps:493
     duration:4.8 secs
     packets:11355
     bytes_MB:590
     pkts_dropped:4497
     bytes_dropped_MB:292
     pkts_marked_percent: 39.60
     bytes_marked_percent: 49.49
     pkts_dropped_percent: 39.60
     bytes_dropped_percent: 49.49
     PING AVG DELAY:2.075
     AGGREGATE_GOODPUT:505

./do_nrm_test.sh -l -d=1 -D --stats dctcp
     Same as above but using dctcp. Note that fewer bytes are dropped
     (0.01% vs. 49%).
   Output:
     Details for HBM in cgroup 1
     id:1
     rate_mbps:945
     duration:4.9 secs
     packets:16859
     bytes_MB:578
     pkts_dropped:1
     bytes_dropped_MB:0
     pkts_marked_percent: 28.74
     bytes_marked_percent: 45.15
     pkts_dropped_percent:  0.01
     bytes_dropped_percent:  0.01
     PING AVG DELAY:2.083
     AGGREGATE_GOODPUT:965

./do_nrm_test.sh -d=1 -D --stats
     As first example, but without limiting loopback device (i.e. no
     "-l" flag). Since there is no bandwidth limiting, no details for
     HBM are printed out.
   Output:
     Details for HBM in cgroup 1
     PING AVG DELAY:2.019
     AGGREGATE_GOODPUT:42655

./do_hbm.sh -l -d=1 -D --stats -f=2
     Uses iper3 and does 2 flows
./do_hbm.sh -l -d=1 -D --stats -f=4 -P
     Uses iperf3 and does 4 flows, each flow as a separate process.
./do_hbm.sh -l -d=1 -D --stats -f=4 -N
     Uses netperf, 4 flows
./do_hbm.sh -f=1 -r=2000 -t=5 -N -D --stats dctcp -s=<server-name>
     Uses netperf between two hosts. The remote host name is specified
     with -s= and you need to start the program netserver manually on
     the remote host. It will use 1 flow, a rate limit of 2Gbps and dctcp.
./do_hbm.sh -f=1 -r=2000 -t=5 -N -D --stats -w dctcp \
     -s=<server-name>
     As previous, but allows use of extra bandwidth. For this test the
     rate is 8Gbps vs. 1Gbps of the previous test.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-02 10:48:27 -08:00
brakmo
a1270fe95b bpf: User program for testing HBM
The program nrm creates a cgroup and attaches a BPF program to the
cgroup for testing HBM (Host Bandwidth Manager) for egress traffic.
One still needs to create network traffic. This can be done through
netesto, netperf or iperf3.
A follow-up patch contains a script to create traffic.

USAGE: hbm [-d] [-l] [-n <id>] [-r <rate>] [-s] [-t <secs>]
           [-w] [-h] [prog]
  Where:
   -d        Print BPF trace debug buffer
   -l        Also limit flows doing loopback
   -n <#>    To create cgroup "/hbm#" and attach prog. Default is /nrm1
             This is convenient when testing HBM in more than 1 cgroup
   -r <rate> Rate limit in Mbps
   -s        Get HBM stats (marked, dropped, etc.)
   -t <time> Exit after specified seconds (deault is 0)
   -w        Work conserving flag. cgroup can increase its bandwidth
             beyond the rate limit specified while there is available
             bandwidth. Current implementation assumes there is only
             NIC (eth0), but can be extended to support multiple NICs.
             Currrently only supported for egress. Note, this is just
	     a proof of concept.
   -h        Print this info
   prog      BPF program file name. Name defaults to hbm_out_kern.o

More information about HBM can be found in the paper "BPF Host Resource
Management" presented at the 2018 Linux Plumbers Conference, Networking Track
(http://vger.kernel.org/lpc_net2018_talks/LPC%20BPF%20Network%20Resource%20Paper.pdf)

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-02 10:48:27 -08:00
brakmo
187d0738ff bpf: Sample HBM BPF program to limit egress bw
A cgroup skb BPF program to limit cgroup output bandwidth.
It uses a modified virtual token bucket queue to limit average
egress bandwidth. The implementation uses credits instead of tokens.
Negative credits imply that queueing would have happened (this is
a virtual queue, so no queueing is done by it. However, queueing may
occur at the actual qdisc (which is not used for rate limiting).

This implementation uses 3 thresholds, one to start marking packets and
the other two to drop packets:
                                 CREDIT
       - <--------------------------|------------------------> +
             |    |          |      0
             |  Large pkt    |
             |  drop thresh  |
  Small pkt drop             Mark threshold
      thresh

The effect of marking depends on the type of packet:
a) If the packet is ECN enabled, then the packet is ECN ce marked.
   The current mark threshold is tuned for DCTCP.
c) Else, it is dropped if it is a large packet.

If the credit is below the drop threshold, the packet is dropped.
Note that dropping a packet through the BPF program does not trigger CWR
(Congestion Window Reduction) in TCP packets. A future patch will add
support for triggering CWR.

This BPF program actually uses 2 drop thresholds, one threshold
for larger packets (>= 120 bytes) and another for smaller packets. This
protects smaller packets such as SYNs, ACKs, etc.

The default bandwidth limit is set at 1Gbps but this can be changed by
a user program through a shared BPF map. In addition, by default this BPF
program does not limit connections using loopback. This behavior can be
overwritten by the user program. There is also an option to calculate
some statistics, such as percent of packets marked or dropped, which
the user program can access.

A latter patch provides such a program (hbm.c)

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-02 10:48:27 -08:00
Yonghong Song
b74e21ab7d samples/bpf: silence compiler warning for xdpsock_user.c
Compiling xdpsock_user.c with 4.8.5, I hit the following
compilation warning:
    HOSTCC  samples/bpf/xdpsock_user.o
  /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c: In function ‘main’:
  /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:449:6: warning: ‘idx_cq’ may be used unini
  tialized in this function [-Wmaybe-uninitialized]
    u32 idx_cq, idx_fq;
        ^
  /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:606:7: warning: ‘idx_rx’ may be used unini
  tialized in this function [-Wmaybe-uninitialized]
     u32 idx_rx, idx_tx = 0;
         ^
  /data/users/yhs/work/net-next/samples/bpf/xdpsock_user.c:506:6: warning: ‘idx_rx’ may be used unini
  tialized in this function [-Wmaybe-uninitialized]
    u32 idx_rx, idx_fq = 0;

As an example, the code pattern looks like:
    u32 idx_cq;
    ...
    ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
    if (ret) {
      ...
    }
    ... idx_fq ...
The compiler warns since it does not know whether &idx_fq is assigned
or not inside the library function xsk_ring_prod__reserve().

Let us assign an initial value 0 to such auto variables to silence
compiler warning.

Fixes: 248c7f9c0e ("samples/bpf: convert xdpsock to use libbpf for AF_XDP access")
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-02 01:07:10 +01:00
Jakub Kicinski
1a9b268c90 samples: bpf: use libbpf where easy
Some samples don't really need the magic of bpf_load,
switch them to libbpf.

v2: - specify program types.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01 00:53:45 +01:00
Jakub Kicinski
ea9b636201 samples: bpf: remove load_sock_ops in favour of bpftool
bpftool can do all the things load_sock_ops used to do, and more.
Point users to bpftool instead of maintaining this sample utility.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01 00:53:45 +01:00
Jakub Kicinski
5c3cf87d47 samples: bpf: force IPv4 in ping
ping localhost may default of IPv6 on modern systems, but
samples are trying to only parse IPv4.  Force IPv4.

samples/bpf/tracex1_user.c doesn't interpret the packet so
we don't care which IP version will be used there.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-01 00:53:45 +01:00
Daniel T. Lee
d2e614cb07 samples: bpf: fix: broken sample regarding removed function
Currently, running sample "task_fd_query" and "tracex3" occurs the
following error. On kernel v5.0-rc* this sample will be unavailable
due to the removal of function 'blk_start_request' at commit "a1ce35f".
(function removed, as "Single Queue IO scheduler" no longer exists)

$ sudo ./task_fd_query
failed to create kprobe 'blk_start_request' error 'No such file or
directory'

This commit will change the function 'blk_start_request' to
'blk_mq_start_request' to fix the broken sample.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-27 17:27:22 +01:00
Magnus Karlsson
248c7f9c0e samples/bpf: convert xdpsock to use libbpf for AF_XDP access
This commit converts the xdpsock sample application to use the AF_XDP
functions present in libbpf. This cuts down the size of it by nearly
300 lines of code.

The default ring sizes plus the batch size has been increased and the
size of the umem area has decreased. This so that the sample application
will provide higher throughput. Note also that the shared umem code
has been removed from the sample as this is not supported by libbpf
at this point in time.

Tested-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-25 23:21:42 +01:00
Toke Høiland-Jørgensen
915654fd71 samples/bpf: Fix dummy program unloading for xdp_redirect samples
The xdp_redirect and xdp_redirect_map sample programs both load a dummy
program onto the egress interfaces. However, the unload code checks these
programs against the wrong id number, and thus refuses to unload them. Fix
the comparison to avoid this.

Fixes: 3b7a8ec2de ("samples/bpf: Check the prog id before exiting")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-22 16:21:59 +01:00
Chengguang Xu
3e4835f7d7 samples/vfio-mdev/mtty: expand minor range when registering chrdev region
Actually, total amount of available minor number
for a single major is MINORMARK + 1. So expand
minor range when registering chrdev region.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-12 13:21:18 -07:00
Chengguang Xu
18bc04bc8a samples/vfio-mdev/mdpy: expand minor range when registering chrdev region
Actually, total amount of available minor number
for a single major is MINORMARK + 1. So expand
minor range when registering chrdev region.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-12 13:21:14 -07:00
Chengguang Xu
16355214a5 samples/vfio-mdev/mbochs: expand minor range when registering chrdev region
Actually, total amount of available minor number
for a single major is MINORMARK + 1. So expand
minor range when registering chrdev region.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-12 13:21:05 -07:00
David S. Miller
a655fe9f19 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
An ipvlan bug fix in 'net' conflicted with the abstraction away
of the IPV6 specific support in 'net-next'.

Similarly, a bug fix for mlx5 in 'net' conflicted with the flow
action conversion in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:00:17 -08:00
Andy Shevchenko
278bca7f31 vfio-mdev: Switch to use new generic UUID API
There are new types and helpers that are supposed to be used in new code.

As a preparation to get rid of legacy types and API functions do
the conversion here.

Cc: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-05 11:51:56 -07:00
Maciej Fijalkowski
3b7a8ec2de samples/bpf: Check the prog id before exiting
Check the program id within the signal handler on polling xdp samples
that were previously converted to libbpf usage. Avoid the situation of
unloading the program that was not attached by sample that is exiting.
Handle also the case where bpf_get_link_xdp_id didn't exit with an error
but the xdp program was not found on an interface.

Reported-by: Michal Papaj <michal.papaj@intel.com>
Reported-by: Jakub Spizewski <jakub.spizewski@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 23:37:51 +01:00
Maciej Fijalkowski
743e568c15 samples/bpf: Add a "force" flag to XDP samples
Make xdp samples consistent with iproute2 behavior and set the
XDP_FLAGS_UPDATE_IF_NOEXIST by default when setting the xdp program on
interface. Provide an option for user to force the program loading,
which as a result will not include the mentioned flag in
bpf_set_link_xdp_fd call.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 23:37:51 +01:00
Maciej Fijalkowski
6a5457618f samples/bpf: Extend RLIMIT_MEMLOCK for xdp_{sample_pkts, router_ipv4}
There is a common problem with xdp samples that happens when user wants
to run a particular sample and some bpf program is already loaded. The
default 64kb RLIMIT_MEMLOCK resource limit will cause a following error
(assuming that xdp sample that is failing was converted to libbpf
usage):

libbpf: Error in bpf_object__probe_name():Operation not permitted(1).
Couldn't load basic 'r0 = 0' BPF program.
libbpf: failed to load object './xdp_sample_pkts_kern.o'

Fix it in xdp_sample_pkts and xdp_router_ipv4 by setting RLIMIT_MEMLOCK
to RLIM_INFINITY.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 23:37:51 +01:00
Maciej Fijalkowski
bbaf6029c4 samples/bpf: Convert XDP samples to libbpf usage
Some of XDP samples that are attaching the bpf program to the interface
via libbpf's bpf_set_link_xdp_fd are still using the bpf_load.c for
loading and manipulating the ebpf program and maps. Convert them to do
this through libbpf usage and remove bpf_load from the picture.

While at it remove what looks like debug leftover in
xdp_redirect_map_user.c

In xdp_redirect_cpu, change the way that the program to be loaded onto
interface is chosen - user now needs to pass the program's section name
instead of the relative number. In case of typo print out the section
names to choose from.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 23:37:51 +01:00
Jesper Dangaard Brouer
7313798b14 samples/bpf: xdp_redirect_cpu have not need for read_trace_pipe
The sample xdp_redirect_cpu is not using helper bpf_trace_printk.
Thus it makes no sense that the --debug option us reading
from /sys/kernel/debug/tracing/trace_pipe via read_trace_pipe.
Simply remove it.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 23:37:51 +01:00
Tomas Winkler
c4a46acf1d samples: mei: use /dev/mei0 instead of /dev/mei
The device was moved from misc device to character devices
to support multiple mei devices.

Cc: <stable@vger.kernel.org> #v4.9+
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-01-30 15:24:45 +01:00
David S. Miller
ec7146db15 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-01-29

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Teach verifier dead code removal, this also allows for optimizing /
   removing conditional branches around dead code and to shrink the
   resulting image. Code store constrained architectures like nfp would
   have hard time doing this at JIT level, from Jakub.

2) Add JMP32 instructions to BPF ISA in order to allow for optimizing
   code generation for 32-bit sub-registers. Evaluation shows that this
   can result in code reduction of ~5-20% compared to 64 bit-only code
   generation. Also add implementation for most JITs, from Jiong.

3) Add support for __int128 types in BTF which is also needed for
   vmlinux's BTF conversion to work, from Yonghong.

4) Add a new command to bpftool in order to dump a list of BPF-related
   parameters from the system or for a specific network device e.g. in
   terms of available prog/map types or helper functions, from Quentin.

5) Add AF_XDP sock_diag interface for querying sockets from user
   space which provides information about the RX/TX/fill/completion
   rings, umem, memory usage etc, from Björn.

6) Add skb context access for skb_shared_info->gso_segs field, from Eric.

7) Add support for testing flow dissector BPF programs by extending
   existing BPF_PROG_TEST_RUN infrastructure, from Stanislav.

8) Split BPF kselftest's test_verifier into various subgroups of tests
   in order better deal with merge conflicts in this area, from Jakub.

9) Add support for queue/stack manipulations in bpftool, from Stanislav.

10) Document BTF, from Yonghong.

11) Dump supported ELF section names in libbpf on program load
    failure, from Taeung.

12) Silence a false positive compiler warning in verifier's BTF
    handling, from Peter.

13) Fix help string in bpftool's feature probing, from Prashant.

14) Remove duplicate includes in BPF kselftests, from Yue.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28 19:38:33 -08:00
Jiong Wang
6ea848b5ce selftests: bpf: functional and min/max reasoning unit tests for JMP32
This patch adds unit tests for new JMP32 instructions.

This patch also added the new BPF_JMP32_REG and BPF_JMP32_IMM macros to
samples/bpf/bpf_insn.h so that JMP32 insn builders are available to tests
under 'samples' directory.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-26 13:33:02 -08:00