Pull networking fixes from David Miller:
"First batch of fixes in the new merge window:
1) Double dst_cache free in act_tunnel_key, from Wenxu.
2) Avoid NULL deref in IN_DEV_MFORWARD() by failing early in the
ip_route_input_rcu() path, from Paolo Abeni.
3) Fix appletalk compile regression, from Arnd Bergmann.
4) If SLAB objects reach the TCP sendpage method we are in serious
trouble, so put a debugging check there. From Vasily Averin.
5) Memory leak in hsr layer, from Mao Wenan.
6) Only test GSO type on GSO packets, from Willem de Bruijn.
7) Fix crash in xsk_diag_put_umem(), from Eric Dumazet.
8) Fix VNIC mailbox length in nfp, from Dirk van der Merwe.
9) Fix race in ipv4 route exception handling, from Xin Long.
10) Missing DMA memory barrier in hns3 driver, from Jian Shen.
11) Use after free in __tcf_chain_put(), from Vlad Buslov.
12) Handle inet_csk_reqsk_queue_add() failures, from Guillaume Nault.
13) Return value correction when ip_mc_may_pull() fails, from Eric
Dumazet.
14) Use after free in x25_device_event(), also from Eric"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
gro_cells: make sure device is up in gro_cells_receive()
vxlan: test dev->flags & IFF_UP before calling gro_cells_receive()
net/x25: fix use-after-free in x25_device_event()
isdn: mISDNinfineon: fix potential NULL pointer dereference
net: hns3: fix to stop multiple HNS reset due to the AER changes
ip: fix ip_mc_may_pull() return value
net: keep refcount warning in reqsk_free()
net: stmmac: Avoid one more sometimes uninitialized Clang warning
net: dsa: mv88e6xxx: Set correct interface mode for CPU/DSA ports
rxrpc: Fix client call queueing, waiting for channel
tcp: handle inet_csk_reqsk_queue_add() failures
net: ethernet: sun: Zero initialize class in default case in niu_add_ethtool_tcam_entry
8139too : Add support for U.S. Robotics USR997901A 10/100 Cardbus NIC
fou, fou6: avoid uninit-value in gue_err() and gue6_err()
net: sched: fix potential use-after-free in __tcf_chain_put()
vhost: silence an unused-variable warning
vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock
connector: fix unsafe usage of ->real_parent
vxlan: do not need BH again in vxlan_cleanup()
net: hns3: add dma_rmb() for rx description
...
Pull perf/core changes from Arnaldo Carvalho de Melo:
perf bpf:
Arnaldo Carvalho de Melo:
- Automatically add BTF ELF markers to 'perf trace' BPF programs, so that
tools such as 'bpftool map dump' can pretty print map keys and values.
perf c2c:
Jiri Olsa:
- Fix report for empty NUMA node.
perf diff:
Jin Yao:
- Support --time, --cpu, --pid and --tid filter options.
perf probe:
Arnaldo Carvalho de Melo:
- Clarify error message about not finding kernel modules debuginfo.
perf record:
Jiri Olsa:
- Fixup probing for max attr.precise_ip.
perf trace:
Arnaldo Carvalho de Melo:
- Add missing %s lost in the 'msg_flags' recvmmsg arg when adding prefix suppression logic.
perf annotate:
Arnaldo Carvalho de Melo:
- Calculate the max instruction name, align column to that, removing the
hardcoded max 6 chars and cope with instructions with names longer than that,
such as vpmovmskb, vpcmpeqb, etc.
kernel:
Song Liu:
- Consider events with attr.bpf_event set as side-band.
Gustavo A. R. Silva:
- Mark expected switch fall-through in perf_event_parse_addr_filter().
Libraries:
Jiri Olsa:
- Fix leaks and double frees on error paths.
libtraceevent:
Tony Jones:
- Fix buffer overflow in arg_eval().
python scripting:
Tony Jones:
- More python3 fixes.
Trivial:
Yang Wei:
- Remove needless extra semicolon in clang C++ glue code.
Intel PT/BTS:
Adrian Hunter:
- Improve auxtrace address filter error message when there is no DSO.
- Fix divide by zero when TSC is not available.
- Further improvements to the export to sqlite/posgresql python scripts
and to the GUI sqlviewer, exporting 'parent_id' so that we have enable
the creation of call trees.
Andi Kleen:
- Generalize function to copy from thread addr space from intel-bts code.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We could end up in situation when we have object file w/ all btf
info, but kernel does not support btf yet. In this situation
currently libbpf just set obj->btf to NULL w/o freeing it first.
This patch is fixing it by making sure to run btf__free first.
Fixes: d29d87f7e6 ("btf: separate btf creation and loading")
Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
libbpf targets don't explicitly depend on fixdep target, so when
we do 'make -j$(nproc)', there is a high probability, that some
objects will be built before fixdep binary is available.
Fix this by running sub-make; this makes sure that fixdep dependency
is properly accounted for.
For the same issue in perf, see commit abb26210a3 ("perf tools: Force
fixdep compilation at the start of the build").
Before:
$ rm -rf /tmp/bld; mkdir /tmp/bld; make -j$(nproc) O=/tmp/bld -C tools/lib/bpf/
Auto-detecting system features:
... libelf: [ on ]
... bpf: [ on ]
HOSTCC /tmp/bld/fixdep.o
CC /tmp/bld/libbpf.o
CC /tmp/bld/bpf.o
CC /tmp/bld/btf.o
CC /tmp/bld/nlattr.o
CC /tmp/bld/libbpf_errno.o
CC /tmp/bld/str_error.o
CC /tmp/bld/netlink.o
CC /tmp/bld/bpf_prog_linfo.o
CC /tmp/bld/libbpf_probes.o
CC /tmp/bld/xsk.o
HOSTLD /tmp/bld/fixdep-in.o
LINK /tmp/bld/fixdep
LD /tmp/bld/libbpf-in.o
LINK /tmp/bld/libbpf.a
LINK /tmp/bld/libbpf.so
LINK /tmp/bld/test_libbpf
$ head /tmp/bld/.libbpf.o.cmd
# cannot find fixdep (/usr/local/google/home/sdf/src/linux/xxx//fixdep)
# using basic dep data
/tmp/bld/libbpf.o: libbpf.c /usr/include/stdc-predef.h \
/usr/include/stdlib.h /usr/include/features.h \
/usr/include/x86_64-linux-gnu/sys/cdefs.h \
/usr/include/x86_64-linux-gnu/bits/wordsize.h \
/usr/include/x86_64-linux-gnu/gnu/stubs.h \
/usr/include/x86_64-linux-gnu/gnu/stubs-64.h \
/usr/lib/gcc/x86_64-linux-gnu/7/include/stddef.h \
After:
$ rm -rf /tmp/bld; mkdir /tmp/bld; make -j$(nproc) O=/tmp/bld -C tools/lib/bpf/
Auto-detecting system features:
... libelf: [ on ]
... bpf: [ on ]
HOSTCC /tmp/bld/fixdep.o
HOSTLD /tmp/bld/fixdep-in.o
LINK /tmp/bld/fixdep
CC /tmp/bld/libbpf.o
CC /tmp/bld/bpf.o
CC /tmp/bld/nlattr.o
CC /tmp/bld/btf.o
CC /tmp/bld/libbpf_errno.o
CC /tmp/bld/str_error.o
CC /tmp/bld/netlink.o
CC /tmp/bld/bpf_prog_linfo.o
CC /tmp/bld/libbpf_probes.o
CC /tmp/bld/xsk.o
LD /tmp/bld/libbpf-in.o
LINK /tmp/bld/libbpf.a
LINK /tmp/bld/libbpf.so
LINK /tmp/bld/test_libbpf
$ head /tmp/bld/.libbpf.o.cmd
cmd_/tmp/bld/libbpf.o := gcc -Wp,-MD,/tmp/bld/.libbpf.o.d -Wp,-MT,/tmp/bld/libbpf.o -g -Wall -DHAVE_LIBELF_MMAP_SUPPORT -DCOMPAT_NEED_REALLOCARRAY -Wbad-function-cast -Wdeclaration-after-statement -Wformat-security -Wformat-y2k -Winit-self -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wno-system-headers -Wold-style-definition -Wpacked -Wredundant-decls -Wshadow -Wstrict-prototypes -Wswitch-default -Wswitch-enum -Wundef -Wwrite-strings -Wformat -Wstrict-aliasing=3 -Werror -Wall -fPIC -I. -I/usr/local/google/home/sdf/src/linux/tools/include -I/usr/local/google/home/sdf/src/linux/tools/arch/x86/include/uapi -I/usr/local/google/home/sdf/src/linux/tools/include/uapi -fvisibility=hidden -D"BUILD_STR(s)=$(pound)s" -c -o /tmp/bld/libbpf.o libbpf.c
source_/tmp/bld/libbpf.o := libbpf.c
deps_/tmp/bld/libbpf.o := \
/usr/include/stdc-predef.h \
/usr/include/stdlib.h \
/usr/include/features.h \
/usr/include/x86_64-linux-gnu/sys/cdefs.h \
/usr/include/x86_64-linux-gnu/bits/wordsize.h \
Fixes: 7c422f5572 ("tools build: Build fixdep helper from perf and basic libs")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull perf updates from Ingo Molnar:
"Lots of tooling updates - too many to list, here's a few highlights:
- Various subcommand updates to 'perf trace', 'perf report', 'perf
record', 'perf annotate', 'perf script', 'perf test', etc.
- CPU and NUMA topology and affinity handling improvements,
- HW tracing and HW support updates:
- Intel PT updates
- ARM CoreSight updates
- vendor HW event updates
- BPF updates
- Tons of infrastructure updates, both on the build system and the
library support side
- Documentation updates.
- ... and lots of other changes, see the changelog for details.
Kernel side updates:
- Tighten up kprobes blacklist handling, reduce the number of places
where developers can install a kprobe and hang/crash the system.
- Fix/enhance vma address filter handling.
- Various PMU driver updates, small fixes and additions.
- refcount_t conversions
- BPF updates
- error code propagation enhancements
- misc other changes"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits)
perf script python: Add Python3 support to syscall-counts-by-pid.py
perf script python: Add Python3 support to syscall-counts.py
perf script python: Add Python3 support to stat-cpi.py
perf script python: Add Python3 support to stackcollapse.py
perf script python: Add Python3 support to sctop.py
perf script python: Add Python3 support to powerpc-hcalls.py
perf script python: Add Python3 support to net_dropmonitor.py
perf script python: Add Python3 support to mem-phys-addr.py
perf script python: Add Python3 support to failed-syscalls-by-pid.py
perf script python: Add Python3 support to netdev-times.py
perf tools: Add perf_exe() helper to find perf binary
perf script: Handle missing fields with -F +..
perf data: Add perf_data__open_dir_data function
perf data: Add perf_data__(create_dir|close_dir) functions
perf data: Fail check_backup in case of error
perf data: Make check_backup work over directories
perf tools: Add rm_rf_perf_data function
perf tools: Add pattern name checking to rm_rf
perf tools: Add depth checking to rm_rf
perf data: Add global path holder
...
Pull locking updates from Ingo Molnar:
"The biggest part of this tree is the new auto-generated atomics API
wrappers by Mark Rutland.
The primary motivation was to allow instrumentation without uglifying
the primary source code.
The linecount increase comes from adding the auto-generated files to
the Git space as well:
include/asm-generic/atomic-instrumented.h | 1689 ++++++++++++++++--
include/asm-generic/atomic-long.h | 1174 ++++++++++---
include/linux/atomic-fallback.h | 2295 +++++++++++++++++++++++++
include/linux/atomic.h | 1241 +------------
I preferred this approach, so that the full call stack of the (already
complex) locking APIs is still fully visible in 'git grep'.
But if this is excessive we could certainly hide them.
There's a separate build-time mechanism to determine whether the
headers are out of date (they should never be stale if we do our job
right).
Anyway, nothing from this should be visible to regular kernel
developers.
Other changes:
- Add support for dynamic keys, which removes a source of false
positives in the workqueue code, among other things (Bart Van
Assche)
- Updates to tools/memory-model (Andrea Parri, Paul E. McKenney)
- qspinlock, wake_q and lockdep micro-optimizations (Waiman Long)
- misc other updates and enhancements"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
locking/lockdep: Shrink struct lock_class_key
locking/lockdep: Add module_param to enable consistency checks
lockdep/lib/tests: Test dynamic key registration
lockdep/lib/tests: Fix run_tests.sh
kernel/workqueue: Use dynamic lockdep keys for workqueues
locking/lockdep: Add support for dynamic keys
locking/lockdep: Verify whether lock objects are small enough to be used as class keys
locking/lockdep: Check data structure consistency
locking/lockdep: Reuse lock chains that have been freed
locking/lockdep: Fix a comment in add_chain_cache()
locking/lockdep: Introduce lockdep_next_lockchain() and lock_chain_count()
locking/lockdep: Reuse list entries that are no longer in use
locking/lockdep: Free lock classes that are no longer in use
locking/lockdep: Update two outdated comments
locking/lockdep: Make it easy to detect whether or not inside a selftest
locking/lockdep: Split lockdep_free_key_range() and lockdep_reset_lock()
locking/lockdep: Initialize the locks_before and locks_after lists earlier
locking/lockdep: Make zap_class() remove all matching lock order entries
locking/lockdep: Reorder struct lock_class members
locking/lockdep: Avoid that add_chain_cache() adds an invalid chain to the cache
...
When checking available canonical candidates for struct/union algorithm
utilizes btf_dedup_is_equiv to determine if candidate is suitable. This
check is not enough when candidate is corresponding FWD for that
struct/union, because according to equivalence logic they are
equivalent. When it so happens that FWD and STRUCT/UNION end in hashing
to the same bucket, it's possible to create remapping loop from FWD to
STRUCT and STRUCT to same FWD, which will cause btf_dedup() to loop
forever.
This patch fixes the issue by additionally checking that type and
canonical candidate are strictly equal (utilizing btf_equal_struct).
Fixes: d5caef5b56 ("btf: add BTF types deduplication algorithm")
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Default size of dedup table (16k) is good enough for most binaries, even
typical vmlinux images. But there are cases of binaries with huge amount
of BTF types (e.g., allyesconfig variants of kernel), which benefit from
having bigger dedup table size to lower amount of unnecessary hash
collisions. Tools like pahole, thus, can tune this parameter to reach
optimal performance.
This change also serves double purpose of allowing tests to force hash
collisions to test some corner cases, used in follow up patch.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
readelf truncates its output by default to attempt to make it more
readable. This can lead to function names getting aliased if they
differ late in the string. Use --wide parameter to avoid
truncation.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
For historical reasons the helper to loop over maps in an object
is called bpf_map__for_each while it really should be called
bpf_object__for_each_map. Rename and add a correctly named
define for backward compatibility.
Switch all in-tree users to the correct name (Quentin).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This commit adds AF_XDP support to libbpf. The main reason for this is
to facilitate writing applications that use AF_XDP by offering
higher-level APIs that hide many of the details of the AF_XDP
uapi. This is in the same vein as libbpf facilitates XDP adoption by
offering easy-to-use higher level interfaces of XDP
functionality. Hopefully this will facilitate adoption of AF_XDP, make
applications using it simpler and smaller, and finally also make it
possible for applications to benefit from optimizations in the AF_XDP
user space access code. Previously, people just copied and pasted the
code from the sample application into their application, which is not
desirable.
The interface is composed of two parts:
* Low-level access interface to the four rings and the packet
* High-level control plane interface for creating and setting
up umems and af_xdp sockets as well as a simple XDP program.
Tested-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
While it's understandable why kernel limits number of BTF types to 65535
and size of string section to 64KB, in libbpf as user-space library it's
too restrictive. E.g., pahole converting DWARF to BTF type information
for Linux kernel generates more than 3 million BTF types and more than
3MB of strings, before deduplication. So to allow btf__dedup() to do its
work, we need to be able to load bigger BTF sections using btf__new().
Singed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add new accessor for bpf_object to get opaque struct btf * from it.
struct btf * is needed for all operations with BTF and it's present in
bpf_object. The only thing missing is a way to get it.
Example use-case is to get BTF key_type_id and value_type_id for a map in
bpf_object. It can be done with btf__get_map_kv_tids() but that function
requires struct btf *.
Similar API can be added for struct btf_ext but no use-case for it yet.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add bpf_map__resize() to change max_entries for a map.
Quite often necessary map size is unknown at compile time and can be
calculated only at run time.
Currently the following approach is used to do so:
* bpf_object__open_buffer() to open Elf file from a buffer;
* bpf_object__find_map_by_name() to find relevant map;
* bpf_map__def() to get map attributes and create struct
bpf_create_map_attr from them;
* update max_entries in bpf_create_map_attr;
* bpf_create_map_xattr() to create new map with updated max_entries;
* bpf_map__reuse_fd() to replace the map in bpf_object with newly
created one.
And after all this bpf_object can finally be loaded. The map will have
new size.
It 1) is quite a lot of steps; 2) doesn't take BTF into account.
For "2)" even more steps should be made and some of them require changes
to libbpf (e.g. to get struct btf * from bpf_object).
Instead the whole problem can be solved by introducing simple
bpf_map__resize() API that checks the map and sets new max_entries if
the map is not loaded yet.
So the new steps are:
* bpf_object__open_buffer() to open Elf file from a buffer;
* bpf_object__find_map_by_name() to find relevant map;
* bpf_map__resize() to update max_entries.
That's much simpler and works with BTF.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Now that we have btf__get_raw_data() it's trivial for tests to iterate
over all strings for testing purposes, which eliminates the need for
btf__get_strings() API.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch changes struct btf_ext to retain original data in sequential
block of memory, which makes it possible to expose
btf_ext__get_raw_data() interface similar to btf__get_raw_data(), allowing
users of libbpf to get access to raw representation of .BTF.ext section.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch exposes new API btf__get_raw_data() that allows to get a copy
of raw BTF data out of struct btf. This is useful for external programs
that need to manipulate raw data, e.g., pahole using btf__dedup() to
deduplicate BTF type info and then writing it back to file.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This change splits out previous btf__new functionality of constructing
struct btf and loading it into kernel into two:
- btf__new() just creates and initializes struct btf
- btf__load() attempts to load existing struct btf into kernel
btf__free will still close BTF fd, if it was ever loaded successfully
into kernel.
This change allows users of libbpf to manipulate BTF using its API,
without the need to unnecessarily load it into kernel.
One of the intended use cases is pahole, which will do DWARF to BTF
conversion and then use libbpf to do type deduplication, while then
handling ELF sections overwriting and other concerns on its own.
Fixes: 2d3feca8c4 ("bpf: btf: print map dump and lookup with btf info")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The kernel verifier has three levels of logs:
0: no logs
1: logs mostly useful
> 1: verbose
Current libbpf API functions bpf_load_program_xattr() and
bpf_load_program() cannot specify log_level.
The bcc, however, provides an interface for user to
specify log_level 2 for verbose output.
This patch added log_level into structure
bpf_load_program_attr, so users, including bcc, can use
bpf_load_program_xattr() to change log_level. The
supported log_level is 0, 1, and 2.
The bpf selftest test_sock.c is modified to enable log_level = 2.
If the "verbose" in test_sock.c is changed to true,
the test will output logs like below:
$ ./test_sock
func#0 @0
0: R1=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
0: (bf) r6 = r1
1: R1=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0) R10=fp0,call_-1
1: (61) r7 = *(u32 *)(r6 +28)
invalid bpf_context access off=28 size=4
Test case: bind4 load with invalid access: src_ip6 .. [PASS]
...
Test case: bind6 allow all .. [PASS]
Summary: 16 PASSED, 0 FAILED
Some test_sock tests are negative tests and verbose verifier
log will be printed out as shown in the above.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>