Pull rdma updates from Doug Ledford:
"Initial roundup of 4.5 merge window patches
- Remove usage of ib_query_device and instead store attributes in
ib_device struct
- Move iopoll out of block and into lib, rename to irqpoll, and use
in several places in the rdma stack as our new completion queue
polling library mechanism. Update the other block drivers that
already used iopoll to use the new mechanism too.
- Replace the per-entry GID table locks with a single GID table lock
- IPoIB multicast cleanup
- Cleanups to the IB MR facility
- Add support for 64bit extended IB counters
- Fix for netlink oops while parsing RDMA nl messages
- RoCEv2 support for the core IB code
- mlx4 RoCEv2 support
- mlx5 RoCEv2 support
- Cross Channel support for mlx5
- Timestamp support for mlx5
- Atomic support for mlx5
- Raw QP support for mlx5
- MAINTAINERS update for mlx4/mlx5
- Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates
- Add support for remote invalidate to the iSER driver (pushed
through the RDMA tree due to dependencies, acknowledged by nab)
- Update to NFSoRDMA (pushed through the RDMA tree due to
dependencies, acknowledged by Bruce)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
IB/mlx5: Unify CQ create flags check
IB/mlx5: Expose Raw Packet QP to user space consumers
{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
IB/mlx5: Add Raw Packet QP query functionality
IB/mlx5: Add create and destroy functionality for Raw Packet QP
IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
IB/mlx5: Allocate a Transport Domain for each ucontext
net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
net/mlx5_core: Add RQ and SQ event handling
net/mlx5_core: Export transport objects
IB/mlx5: Expose CQE version to user-space
IB/mlx5: Add CQE version 1 support to user QPs and SRQs
IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
IB/sa: Fix netlink local service GFP crash
IB/srpt: Remove redundant wc array
IB/qib: Improve ipoib UD performance
IB/mlx4: Advertise RoCE v2 support
IB/mlx4: Create and use another QP1 for RoCEv2
IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
...
The new name is irq_poll as iopoll is already taken. Better suggestions
welcome.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Namhyung Kim pointed out a potential problem in original code that it
fetches names of maps from section header string table, which is used
to store section names.
Original code doesn't cause error because of a LLVM behavior that, it
combines shstrtab into strtab. For example:
$ echo 'int func() {return 0;}' | x86_64-oe-linux-clang -x c -o temp.o -c -
$ readelf -h ./temp.o
ELF Header:
Magic: 7f 45 4c 46 02 01 01 03 00 00 00 00 00 00 00 00
...
Section header string table index: 1
$ readelf -S ./temp.o
There are 10 section headers, starting at offset 0x288:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .strtab STRTAB 0000000000000000 00000230
0000000000000051 0000000000000000 0 0 1
...
$ readelf -p .strtab ./temp.o
String dump of section '.strtab':
[ 1] .text
[ 7] .comment
[ 10] .bss
[ 15] .note.GNU-stack
[ 25] .rela.eh_frame
[ 34] func
[ 39] .strtab
[ 41] .symtab
[ 49] .data
[ 4f] -
$ readelf -p .shstrtab ./temp.o
readelf: Warning: Section '.shstrtab' was not dumped because it does not exist!
Where, 'section header string table index' points to '.strtab', and
symbol names are also stored there.
However, in case of gcc:
$ echo 'int func() {return 0;}' | gcc -x c -o temp.o -c -
$ readelf -p .shstrtab ./temp.o
String dump of section '.shstrtab':
[ 1] .symtab
[ 9] .strtab
[ 11] .shstrtab
[ 1b] .text
[ 21] .data
[ 27] .bss
[ 2c] .comment
[ 35] .note.GNU-stack
[ 45] .rela.eh_frame
$ readelf -p .strtab ./temp.o
String dump of section '.strtab':
[ 1] func
They are separated sections.
Although original code doesn't cause error, we'd better use canonical
method for fetching symbol names to avoid potential behavior changing.
This patch learns from readelf's code, fetches string from sh_link
of .symbol section.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Reported-and-Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1449541544-67621-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Before this patch libbpf always do feature check even when cleaning.
For example:
$ cd kernel/tools/lib/bpf
$ make
Auto-detecting system features:
... libelf: [ on ]
... bpf: [ on ]
CC libbpf.o
CC bpf.o
LD libbpf-in.o
LINK libbpf.a
LINK libbpf.so
$ make clean
CLEAN libbpf
CLEAN core-gen
$ make clean
Auto-detecting system features:
... libelf: [ on ]
... bpf: [ on ]
CLEAN libbpf
CLEAN core-gen
$
Although the first 'make clean' doesn't show feature check result, it
still does the check. No output because check result is similar to
FEATURE-DUMP.libbpf.
This patch uses same method as perf to turn off feature checking when
'make clean'.
Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1448372181-151723-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Allows BPF scriptlets specify arguments to be fetched using
DWARF info, using a prologue generated at compile/build time (He Kuang, Wang Nan)
- Allow attaching BPF scriptlets to module symbols (Wang Nan)
- Allow attaching BPF scriptlets to userspace code using uprobe (Wang Nan)
- BPF programs now can specify 'perf probe' tunables via its section name,
separating key=val values using semicolons (Wang Nan)
Testing some of these new BPF features:
Use case: get callchains when receiving SSL packets, filter then in the
kernel, at arbitrary place.
# cat ssl.bpf.c
#define SEC(NAME) __attribute__((section(NAME), used))
struct pt_regs;
SEC("func=__inet_lookup_established hnum")
int func(struct pt_regs *ctx, int err, unsigned short port)
{
return err == 0 && port == 443;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
#
# perf record -a -g -e ssl.bpf.c
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.787 MB perf.data (3 samples) ]
# perf script | head -30
swapper 0 [000] 58783.268118: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
8572a8 process_backlog (/lib/modules/4.3.0+/build/vmlinux)
856b11 net_rx_action (/lib/modules/4.3.0+/build/vmlinux)
2a284b __do_softirq (/lib/modules/4.3.0+/build/vmlinux)
2a2ba3 irq_exit (/lib/modules/4.3.0+/build/vmlinux)
96b7a4 do_IRQ (/lib/modules/4.3.0+/build/vmlinux)
969807 ret_from_intr (/lib/modules/4.3.0+/build/vmlinux)
2dede5 cpu_startup_entry (/lib/modules/4.3.0+/build/vmlinux)
95d5bc rest_init (/lib/modules/4.3.0+/build/vmlinux)
1163ffa start_kernel ([kernel.vmlinux].init.text)
11634d7 x86_64_start_reservations ([kernel.vmlinux].init.text)
1163623 x86_64_start_kernel ([kernel.vmlinux].init.text)
qemu-system-x86 9178 [003] 58785.792417: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
856660 netif_receive_skb_internal (/lib/modules/4.3.0+/build/vmlinux)
8566ec netif_receive_skb_sk (/lib/modules/4.3.0+/build/vmlinux)
430a br_handle_frame_finish ([bridge])
48bc br_handle_frame ([bridge])
855f44 __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
#
Use 'perf probe' various options to list functions, see what variables can
be collected at any given point, experiment first collecting without a filter,
then filter, use it together with 'perf trace', 'perf top', with or without
callchains, if it explodes, please tell us!
- Introduce a new callchain mode: "folded", that will list per line
representations of all callchains for a give histogram entry, facilitating
'perf report' output processing by other tools, such as Brendan Gregg's
flamegraph tools (Namhyung Kim)
E.g:
# perf report | grep -v ^# | head
18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
|
---cpu_startup_entry
|
|--12.07%--start_secondary
|
--6.30%--rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
#
Becomes, in "folded" mode:
# perf report -g folded | grep -v ^# | head -5
18.37% 0.00% swapper [kernel.kallsyms] [k] cpu_startup_entry
12.07% cpu_startup_entry;start_secondary
6.30% cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
16.90% 0.00% swapper [kernel.kallsyms] [k] call_cpuidle
11.23% call_cpuidle;cpu_startup_entry;start_secondary
5.67% call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
16.90% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter
11.23% cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
5.67% cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
15.12% 0.00% swapper [kernel.kallsyms] [k] cpuidle_enter_state
#
The user can also select one of "count", "period" or "percent" as the first column.
Infrastructure changes:
- Fix multiple leaks found with Valgrind and a refcount
debugger (Masami Hiramatsu)
- Add further 'perf test' entries for BPF and LLVM (Wang Nan)
- Improve 'perf test' to suport subtests, so that the series of tests
performed in the LLVM and BPF main tests appear in the default 'perf test'
output (Wang Nan)
- Move memdup() from tools/perf to tools/lib/string.c (Arnaldo Carvalho de Melo)
- Adopt strtobool() from the kernel into tools/lib/ (Wang Nan)
- Fix selftests_install tools/ Makefile rule (Kevin Hilman)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch is a preparation for BPF prologue support which allows
generating a series of BPF bytecode for fetching kernel data before
calling program code. With the newly introduced multiple instances
support, perf is able to create different prologues for different kprobe
points.
Before this patch, a bpf_program can be loaded into kernel only once,
and get the only resulting fd. What this patch does is to allow creating
and loading different variants of one bpf_program, then fetching their
fds.
Here we describe the basic idea in this patch. The detailed description
of the newly introduced APIs can be found in comments in the patch body.
The key of this patch is the new mechanism in bpf_program__load().
Instead of loading BPF program into kernel directly, it calls a
'pre-processor' to generate program instances which would be finally
loaded into the kernel based on the original code. To enable the
generation of multiple instances, libbpf passes an index to the
pre-processor so it know which instance is being loaded.
Pre-processor should be called from libbpf's user (perf) using
bpf_program__set_prep(). The number of instances and the relationship
between indices and the target instance should be clear when calling
bpf_program__set_prep().
To retrieve a fd for a specific instance of a program,
bpf_program__nth_fd() is introduced. It returns the resulting fd
according to index.
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1447675815-166222-8-git-send-email-wangnan0@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
[ Enclosed multi-line if/else blocks with {}, (*func_ptr)() -> func_ptr() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>