The commit c69de58ba8 ("net: erspan: use bitfield instead of
mask and offset") changes the erspan header to use bitfield, and
commit d350a82302 ("net: erspan: create erspan metadata uapi header")
creates a uapi header file. The above two commit breaks the current
erspan test. This patch fixes it by adapting the above two changes.
Fixes: ac80c2a165 ("samples/bpf: add erspan v2 sample code")
Fixes: ef88f89c83 ("samples/bpf: extend test_tunnel_bpf.sh with ERSPAN")
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf 2018-02-02
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) support XDP attach in libbpf, from Eric.
2) minor fixes, from Daniel, Jakub, Yonghong, Alexei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use bpf_set_link_xdp_fd instead of set_link_xdp_fd to remove some
code duplication and benefit of netlink ext ack errors message.
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Parse netlink ext attribute to get the error message returned by
the card. Code is partially take from libnl.
We add netlink.h to the uapi include of tools. And we need to
avoid include of userspace netlink header to have a successful
build of sample so nlattr.h has a define to avoid
the inclusion. Using a direct define could have been an issue
as NLMSGERR_ATTR_MAX can change in the future.
We also define SOL_NETLINK if not defined to avoid to have to
copy socket.h for a fixed value.
Signed-off-by: Eric Leblond <eric@regit.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull driver core updates from Greg KH:
"Here is the set of "big" driver core patches for 4.16-rc1.
The majority of the work here is in the firmware subsystem, with
reworks to try to attempt to make the code easier to handle in the
long run, but no functional change. There's also some tree-wide sysfs
attribute fixups with lots of acks from the various subsystem
maintainers, as well as a handful of other normal fixes and changes.
And finally, some license cleanups for the driver core and sysfs code.
All have been in linux-next for a while with no reported issues"
* tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits)
device property: Define type of PROPERTY_ENRTY_*() macros
device property: Reuse property_entry_free_data()
device property: Move property_entry_free_data() upper
firmware: Fix up docs referring to FIRMWARE_IN_KERNEL
firmware: Drop FIRMWARE_IN_KERNEL Kconfig option
USB: serial: keyspan: Drop firmware Kconfig options
sysfs: remove DEBUG defines
sysfs: use SPDX identifiers
drivers: base: add coredump driver ops
sysfs: add attribute specification for /sysfs/devices/.../coredump
test_firmware: fix missing unlock on error in config_num_requests_store()
test_firmware: make local symbol test_fw_config static
sysfs: turn WARN() into pr_warn()
firmware: Fix a typo in fallback-mechanisms.rst
treewide: Use DEVICE_ATTR_WO
treewide: Use DEVICE_ATTR_RO
treewide: Use DEVICE_ATTR_RW
sysfs.h: Use octal permissions
component: add debugfs support
bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate
...
Pull networking updates from David Miller:
1) Significantly shrink the core networking routing structures. Result
of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf
2) Add netdevsim driver for testing various offloads, from Jakub
Kicinski.
3) Support cross-chip FDB operations in DSA, from Vivien Didelot.
4) Add a 2nd listener hash table for TCP, similar to what was done for
UDP. From Martin KaFai Lau.
5) Add eBPF based queue selection to tun, from Jason Wang.
6) Lockless qdisc support, from John Fastabend.
7) SCTP stream interleave support, from Xin Long.
8) Smoother TCP receive autotuning, from Eric Dumazet.
9) Lots of erspan tunneling enhancements, from William Tu.
10) Add true function call support to BPF, from Alexei Starovoitov.
11) Add explicit support for GRO HW offloading, from Michael Chan.
12) Support extack generation in more netlink subsystems. From Alexander
Aring, Quentin Monnet, and Jakub Kicinski.
13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From
Russell King.
14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso.
15) Many improvements and simplifications to the NFP driver bpf JIT,
from Jakub Kicinski.
16) Support for ipv6 non-equal cost multipath routing, from Ido
Schimmel.
17) Add resource abstration to devlink, from Arkadi Sharshevsky.
18) Packet scheduler classifier shared filter block support, from Jiri
Pirko.
19) Avoid locking in act_csum, from Davide Caratti.
20) devinet_ioctl() simplifications from Al viro.
21) More TCP bpf improvements from Lawrence Brakmo.
22) Add support for onlink ipv6 route flag, similar to ipv4, from David
Ahern.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits)
tls: Add support for encryption using async offload accelerator
ip6mr: fix stale iterator
net/sched: kconfig: Remove blank help texts
openvswitch: meter: Use 64-bit arithmetic instead of 32-bit
tcp_nv: fix potential integer overflow in tcpnv_acked
r8169: fix RTL8168EP take too long to complete driver initialization.
qmi_wwan: Add support for Quectel EP06
rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK
ipmr: Fix ptrdiff_t print formatting
ibmvnic: Wait for device response when changing MAC
qlcnic: fix deadlock bug
tcp: release sk_frag.page in tcp_disconnect
ipv4: Get the address of interface correctly.
net_sched: gen_estimator: fix lockdep splat
net: macb: Handle HRESP error
net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring
ipv6: addrconf: break critical section in addrconf_verify_rtnl()
ipv6: change route cache aging logic
i40e/i40evf: Update DESC_NEEDED value to reflect larger value
bnxt_en: cleanup DIM work on device shutdown
...
Pull livepatching updates from Jiri Kosina:
- handle 'infinitely'-long sleeping tasks, from Miroslav Benes
- remove 'immediate' feature, as it turns out it doesn't provide the
originally expected semantics, and brings more issues than value
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: add locking to force and signal functions
livepatch: Remove immediate feature
livepatch: force transition to finish
livepatch: send a fake signal to all blocking tasks
Do not build lib/bpf/bpf.o with this Makefile but use the one from the
library directory. This avoid making a buggy bpf.o file (e.g. missing
symbols).
This patch is useful if some code (e.g. Landlock tests) needs both the
bpf.o (from tools/lib/bpf) and the bpf_load.o (from samples/bpf).
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Avoid extra step of setting limit from cmdline and do it directly in
the program.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Put client sockets in blocking mode otherwise with sendmsg tests
its easy to overrun the socket buffers which results in the test
being aborted.
The original non-blocking was added to handle listen/accept with
a single thread the client/accepted sockets do not need to be
non-blocking.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Report bytes/sec sent as well as total bytes. Useful to get rough
idea how different configurations and usage patterns perform with
sockmap.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently for SENDMSG tests first send completes then recv runs. This
does not work well for large data sizes and/or many iterations. So
fork the recv and send handler so that we run both send and recv. In
the future we can add a parameter to do more than a single fork of
tx/rx.
With this we can get many GBps of data which helps exercise the
sockmap code.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When testing BPF programs using sockmap I often want to have more
control over how sendmsg is exercised. This becomes even more useful
as new sockmap program types are added.
This adds a test type option to select type of test to run. Currently,
only "ping" and "sendmsg" are supported, but more can be added as
needed.
The new help argument gives the following,
Usage: ./sockmap --cgroup <cgroup_path>
options:
--help -h
--cgroup -c
--rate -r
--verbose -v
--iov_count -i
--length -l
--test -t
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
sockmap sample program takes arguments from cmd line but it reads them
in using offsets into the array. Because we want to add more arguments
in the future lets do proper argument handling.
Also refactor code to pull apart sock init and ping/pong test. This
allows us to add new tests in the future.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The xdp_redirect_cpu sample have some "builtin" monitoring of the
tracepoints for xdp_cpumap_*, but it is practical to have an external
tool that can monitor these transpoint as an easy way to troubleshoot
an application using XDP + cpumap.
Specifically I need such external tool when working on Suricata and
XDP cpumap redirect. Extend the xdp_monitor tool sample with
monitoring of these xdp_cpumap_* tracepoints. Model the output format
like xdp_redirect_cpu.
Given I needed to handle per CPU decoding for cpumap, this patch also
add per CPU info on the existing monitor events. This resembles part
of the builtin monitoring output from sample xdp_rxq_info. Thus, also
covering part of that sample in an external monitoring tool.
Performance wise, the cpumap tracepoints uses bulking, which cause
them to have very little overhead. Thus, they are enabled by default.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Improve the 'unknown reason' comment, with an actual explaination of why
the ctx pkt-data pointers need to be loaded after the helper function
bpf_xdp_adjust_meta(). Based on the explaination Daniel gave.
Fixes: 36e04a2d78 ("samples/bpf: xdp2skb_meta shows transferring info from XDP to SKB")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The trailing semicolon is an empty statement that does no operation.
Removing it since it doesn't do anything.
Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Immediate flag has been used to disable per-task consistency and patch
all tasks immediately. It could be useful if the patch doesn't change any
function or data semantics.
However, it causes problems on its own. The consistency problem is
currently broken with respect to immediate patches.
func a
patches 1i
2i
3
When the patch 3 is applied, only 2i function is checked (by stack
checking facility). There might be a task sleeping in 1i though. Such
task is migrated to 3, because we do not check 1i in
klp_check_stack_func() at all.
Coming atomic replace feature would be easier to implement and more
reliable without immediate.
Thus, remove immediate feature completely and save us from the problems.
Note that force feature has the similar problem. However it is
considered as a last resort. If used, administrator should not apply any
new live patches and should plan for reboot into an updated kernel.
The architectures would now need to provide HAVE_RELIABLE_STACKTRACE to
fully support livepatch.
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Creating a bpf sample that shows howto use the XDP 'data_meta'
infrastructure, created by Daniel Borkmann. Very few drivers support
this feature, but I wanted a functional sample to begin with, when
working on adding driver support.
XDP data_meta is about creating a communication channel between BPF
programs. This can be XDP tail-progs, but also other SKB based BPF
hooks, like in this case the TC clsact hook. In this sample I show
that XDP can store info named "mark", and TC/clsact chooses to use
this info and store it into the skb->mark.
It is a bit annoying that XDP and TC samples uses different tools/libs
when attaching their BPF hooks. As the XDP and TC programs need to
cooperate and agree on a struct-layout, it is best/easiest if the two
programs can be contained within the same BPF restricted-C file.
As the bpf-loader, I choose to not use bpf_load.c (or libbpf), but
instead wrote a bash shell scripted named xdp2skb_meta.sh, which
demonstrate howto use the iproute cmdline tools 'tc' and 'ip' for
loading BPF programs. To make it easy for first time users, the shell
script have command line parsing, and support --verbose and --dry-run
mode, if you just want to see/learn the tc+ip command syntax:
# ./xdp2skb_meta.sh --dev ixgbe2 --dry-run
# Dry-run mode: enable VERBOSE and don't call TC+IP
tc qdisc del dev ixgbe2 clsact
tc qdisc add dev ixgbe2 clsact
tc filter add dev ixgbe2 ingress prio 1 handle 1 bpf da obj ./xdp2skb_meta_kern.o sec tc_mark
# Flush XDP on device: ixgbe2
ip link set dev ixgbe2 xdp off
ip link set dev ixgbe2 xdp obj ./xdp2skb_meta_kern.o sec xdp_mark
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This sample program can be used for monitoring and reporting how many
packets per sec (pps) are received per NIC RX queue index and which
CPU processed the packet. In itself it is a useful tool for quickly
identifying RSS imbalance issues, see below.
The default XDP action is XDP_PASS in-order to provide a monitor
mode. For benchmarking purposes it is possible to specify other XDP
actions on the cmdline --action.
Output below shows an imbalance RSS case where most RXQ's deliver to
CPU-0 while CPU-2 only get packets from a single RXQ. Looking at
things from a CPU level the two CPUs are processing approx the same
amount, BUT looking at the rx_queue_index levels it is clear that
RXQ-2 receive much better service, than other RXQs which all share CPU-0.
Running XDP on dev:i40e1 (ifindex:3) action:XDP_PASS
XDP stats CPU pps issue-pps
XDP-RX CPU 0 900,473 0
XDP-RX CPU 2 906,921 0
XDP-RX CPU total 1,807,395
RXQ stats RXQ:CPU pps issue-pps
rx_queue_index 0:0 180,098 0
rx_queue_index 0:sum 180,098
rx_queue_index 1:0 180,098 0
rx_queue_index 1:sum 180,098
rx_queue_index 2:2 906,921 0
rx_queue_index 2:sum 906,921
rx_queue_index 3:0 180,098 0
rx_queue_index 3:sum 180,098
rx_queue_index 4:0 180,082 0
rx_queue_index 4:sum 180,082
rx_queue_index 5:0 180,093 0
rx_queue_index 5:sum 180,093
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2017-12-18
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Allow arbitrary function calls from one BPF function to another BPF function.
As of today when writing BPF programs, __always_inline had to be used in
the BPF C programs for all functions, unnecessarily causing LLVM to inflate
code size. Handle this more naturally with support for BPF to BPF calls
such that this __always_inline restriction can be overcome. As a result,
it allows for better optimized code and finally enables to introduce core
BPF libraries in the future that can be reused out of different projects.
x86 and arm64 JIT support was added as well, from Alexei.
2) Add infrastructure for tagging functions as error injectable and allow for
BPF to return arbitrary error values when BPF is attached via kprobes on
those. This way of injecting errors generically eases testing and debugging
without having to recompile or restart the kernel. Tags for opting-in for
this facility are added with BPF_ALLOW_ERROR_INJECTION(), from Josef.
3) For BPF offload via nfp JIT, add support for bpf_xdp_adjust_head() helper
call for XDP programs. First part of this work adds handling of BPF
capabilities included in the firmware, and the later patches add support
to the nfp verifier part and JIT as well as some small optimizations,
from Jakub.
4) The bpftool now also gets support for basic cgroup BPF operations such
as attaching, detaching and listing current BPF programs. As a requirement
for the attach part, bpftool can now also load object files through
'bpftool prog load'. This reuses libbpf which we have in the kernel tree
as well. bpftool-cgroup man page is added along with it, from Roman.
5) Back then commit e87c6bc385 ("bpf: permit multiple bpf attachments for
a single perf event") added support for attaching multiple BPF programs
to a single perf event. Given they are configured through perf's ioctl()
interface, the interface has been extended with a PERF_EVENT_IOC_QUERY_BPF
command in this work in order to return an array of one or multiple BPF
prog ids that are currently attached, from Yonghong.
6) Various minor fixes and cleanups to the bpftool's Makefile as well
as a new 'uninstall' and 'doc-uninstall' target for removing bpftool
itself or prior installed documentation related to it, from Quentin.
7) Add CONFIG_CGROUP_BPF=y to the BPF kernel selftest config file which is
required for the test_dev_cgroup test case to run, from Naresh.
8) Fix reporting of XDP prog_flags for nfp driver, from Jakub.
9) Fix libbpf's exit code from the Makefile when libelf was not found in
the system, also from Jakub.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the existing tests for ipv4 ipv6 erspan version 2.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a basic test for bpf_override_return to verify it works. We
override the main function for mounting a btrfs fs so it'll return
-ENOMEM and then make sure that trying to mount a btrfs fs will fail.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>