Commit e1abf2cc8d ("bpf: Fix the build on
BPF_SYSCALL=y && !CONFIG_TRACING kernels, make it more configurable")
updated the building condition of bpf_trace.o from CONFIG_BPF_SYSCALL
to CONFIG_BPF_EVENTS, but the corresponding #ifdef controller in
trace_events.h for trace_call_bpf() was not changed. Which, in theory,
is incorrect.
With current Kconfigs, we can create a .config with CONFIG_BPF_SYSCALL=y
and CONFIG_BPF_EVENTS=n by unselecting CONFIG_KPROBE_EVENT and
selecting CONFIG_BPF_SYSCALL. With these options, trace_call_bpf() will
be defined as an extern function, but if anyone calls it a symbol missing
error will be triggered since bpf_trace.o was not built.
This patch changes the #ifdef controller for trace_call_bpf() from
CONFIG_BPF_SYSCALL to CONFIG_BPF_EVENTS. I'll show its correctness:
Before this patch:
BPF_SYSCALL BPF_EVENTS trace_call_bpf bpf_trace.o
y y normal compiled
n n inline not compiled
y n normal not compiled (incorrect)
n y impossible (BPF_EVENTS depends on BPF_SYSCALL)
After this patch:
BPF_SYSCALL BPF_EVENTS trace_call_bpf bpf_trace.o
y y normal compiled
n n inline not compiled
y n inline not compiled (fixed)
n y impossible (BPF_EVENTS depends on BPF_SYSCALL)
So this patch doesn't break anything. QED.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1435716878-189507-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
New features:
- Deref sys_enter pointer args with contents from probe:vfs_getname, showing
pathnames instead of pointers in many syscalls in 'perf trace'. (Arnaldo Carvalho de Melo)
- Make 'perf trace' write to stderr by default, just like 'strace'. (Milian Woff)
Infrastructure changes:
- color_vfprintf() fixes. (Andi Kleen, Jiri Olsa)
- Allow enabling/disabling PERF_SAMPLE_TIME per event. (Kan Liang)
- Fix build errors with mipsel-linux-uclibc compiler. (Petri Gynther)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Without this patch, it is cumbersome to read the trace output but
ignoring the normal, potentially verbose, output of the debuggee. One
common example is doing something like the following:
perf trace -s find /tmp > /dev/null
Without this patch, the trace summary will be lost. Now, it will still
be printed at the end. This behavior is also applied by strace.
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/n/tip-tqnks6y2cnvm5f9g2dsfr7zl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To work like strace and dereference syscall pointer args we need to
insert probes (or tracepoints) right after we copy those bytes from
userspace.
Since we're formatting the syscall args at raw_syscalls:sys_enter time,
we need to have a formatter that just stores the position where, later,
when we get the probe:vfs_getname, we can insert the pointer contents.
Now, if a probe:vfs_getname with this format is in place:
# perf probe -l
probe:vfs_getname (on getname_flags:72@/home/git/linux/fs/namei.c with pathname)
That was, in this case, put in place with:
# perf probe 'vfs_getname=getname_flags:72 pathname=filename:string'
Added new event:
probe:vfs_getname (on getname_flags:72 with pathname=filename:string)
You can now use it in all perf tools, such as:
perf record -e probe:vfs_getname -aR sleep 1
#
Then 'perf trace' will notice that and do the pointer -> contents
expansion:
# trace -e open touch /tmp/bla
0.165 (0.010 ms): touch/17752 open(filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
0.195 (0.011 ms): touch/17752 open(filename: /lib64/libc.so.6, flags: CLOEXEC) = 3
0.512 (0.012 ms): touch/17752 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
0.582 (0.012 ms): touch/17752 open(filename: /tmp/bla, flags: CREAT|NOCTTY|NONBLOCK|WRONLY, mode: 438) = 3
#
Roughly equivalent to strace's output:
# strace -rT -e open touch /tmp/bla
0.000000 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.000039>
0.000317 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 <0.000102>
0.001461 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 <0.000072>
0.000405 open("/tmp/bla", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3 <0.000055>
0.000641 +++ exited with 0 +++
#
Now we need to either look for at all syscalls that are marked as
pointers and have some well known names ("filename", "pathname", etc)
and set the arg formatter to the one used for the "open" syscall in this
patch.
This implementation works for syscalls with just a string being copied
from userspace, for matching syscalls with more than one string being
copied via the same probe/trace point (vfs_getname) we need to extend
the vfs_getname probe spec to include the pointer too, but there are
some problems with that in 'perf probe' or the kernel kprobes code, need
to investigate before considering supporting multiple strings per
syscall.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <mail@milianw.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-xvuwx6nuj8cf389kf9s2ue2s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Vince Weaver and Stephane Eranian reported warnings in the PEBS
code when running the perf fuzzer. Stephane wrote:
> I can reproduce the problem on my HSW running the fuzzer.
>
> I can see why this could be happening if you are mixing PEBS and non PEBS events
> in the bottom 4 counters. I suspect:
> for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
> if ((counts[bit] == 0) && (error[bit] == 0))
> continue;
>
> This test is not correct when you have non-PEBS events mixed with
> PEBS events and they overflow at the same time. They will have
> counts[i] != 0 but error[i] == 0, and thus you fall thru the loop
> and hit the assert. Or it is something along those lines.
The only way I can make this work is if ->status only has !PEBS events
set, because if it has both set we'll take that slow path which masks
out the !PEBS bits.
After masking there are 3 options:
- there is one bit set, and its @bit, we increment counts[bit].
- there are multiple bits set, we increment error[] for each set bit,
we do not increment counts[].
- there are no bits set, we do nothing.
The intent was to never increment counts[] for !PEBS events.
Now if we start out with only a single !PEBS event set, we'll pass the
test and increment counts[] for a !PEBS and hit the warn.
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add perf core PMU support for future Intel Skylake CPU cores.
The code is based on Haswell/Broadwell.
There is a new cache event list, based on the updated Haswell
event list.
Skylake has removed most counter constraints on basic
events, so the basic constraints table now only has a single
entry (plus the fixed counters).
TSX support and various other setups are all shared with Haswell.
Skylake has 32 LBR entries. Add a new LBR init function
to set this up. The filters are all the same as Haswell.
It also has a new LBR format with a separate LBR_INFO_* MSR,
but that has been already added earlier.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1431285767-27027-7-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>