Perf records IBS (Instruction Based Sampling) extra sample data when
'perf record --raw-samples' is used with an IBS-compatible event, on a
machine that supports IBS. IBS support is indicated in
CPUID_Fn80000001_ECX bit #10.
Up until now, users have been able to see the extra sample data solely
in raw hex format using 'perf report --dump-raw-trace'. From there,
users could decode the data either manually, or by using an external
script.
Enable the built-in 'perf report --dump-raw-trace' to do the decoding of
the extra sample data bits, so manual or external script decoding isn't
necessary.
Example usage:
$ sudo perf record -c 10000001 -a --raw-samples -e ibs_fetch/rand_en=1/,ibs_op/cnt_ctl=1/ -C 0,1 taskset -c 0,1 7za b -mmt2 | perf report --dump-raw-trace
Stdout contains IBS Fetch samples, e.g.:
ibs_fetch_ctl: 02170007ffffffff MaxCnt 1048560 Cnt 1048560 Lat 7 En 1 Val 1 Comp 1 IcMiss 0 PhyAddrValid 1 L1TlbPgSz 4KB L1TlbMiss 0 L2TlbMiss 0 RandEn 1 L2Miss 0
IbsFetchLinAd: 000056016b2ead40
IbsFetchPhysAd: 000000115cedfd40
c_ibs_ext_ctl: 0000000000000000 IbsItlbRefillLat 0
..and IBS Op samples, e.g.:
ibs_op_ctl: 0000009e009e8968 MaxCnt 10000000 En 1 Val 1 CntCtl 1=uOps CurCnt 158
IbsOpRip: 000056016b2ea73d
ibs_op_data: 00000000000b0002 CompToRetCtr 2 TagToRetCtr 11 BrnRet 0 RipInvalid 0 BrnFuse 0 Microcode 0
ibs_op_data2: 0000000000000002 CacheHitSt 0=M-state RmtNode 0 DataSrc 2=Local node cache
ibs_op_data3: 0000000000c60002 LdOp 0 StOp 1 DcL1TlbMiss 0 DcL2TlbMiss 0 DcL1TlbHit2M 0 DcL1TlbHit1G 0 DcL2TlbHit2M 0 DcMiss 0 DcMisAcc 0 DcWcMemAcc 0 DcUcMemAcc 0 DcLockedOp 0 DcMissNoMabAlloc 0 DcLinAddrValid 1 DcPhyAddrValid 1 DcL2TlbHit1G 0 L2Miss 0 SwPf 0 OpMemWidth 4 bytes OpDcMissOpenMemReqs 0 DcMissLat 0 TlbRefillLat 0
IbsDCLinAd: 00007f133c319ce0
IbsDCPhysAd: 0000000270485ce0
Committer notes:
Fixed up this:
util/amd-sample-raw.c: In function ‘evlist__amd_sample_raw’:
util/amd-sample-raw.c:125:42: error: ‘ bytes’ directive output may be truncated writing 6 bytes into a region of size between 4 and 7 [-Werror=format-truncation=]
125 | " OpMemWidth %2d bytes", 1 << (reg.op_mem_width - 1));
| ^~~~~~
In file included from /usr/include/stdio.h:866,
from util/amd-sample-raw.c:7:
/usr/include/bits/stdio2.h:71:10: note: ‘__builtin___snprintf_chk’ output between 21 and 24 bytes into a destination of size 21
71 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
72 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
73 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
As that %2d won't limit the number of chars to 2, just state that 2 is
the minimal width:
$ cat printf.c
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
char bf[64];
int len = snprintf(bf, sizeof(bf), "%2d", atoi(argv[1]));
printf("strlen(%s): %u\n", bf, len);
return 0;
}
$ ./printf 1
strlen( 1): 2
$ ./printf 12
strlen(12): 2
$ ./printf 123
strlen(123): 3
$ ./printf 1234
strlen(1234): 4
$ ./printf 12345
strlen(12345): 5
$ ./printf 123456
strlen(123456): 6
$
And since we probably don't want that output to be truncated, just
assume the worst case, as the compiler did, and add a few more chars to
that buffer.
Also use sizeof(var) instead of sizeof(dup-of-wanted-format-string) to
avoid bugs when changing one but not the other.
I also had to change this:
-#include <asm/amd-ibs.h>
+#include "../../arch/x86/include/asm/amd-ibs.h"
To make it build on other architectures, just like intel-pt does.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https //lore.kernel.org/r/20210817221509.88391-4-kim.phillips@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Recently bperf was added to use BPF to count perf events for various
purposes. This is an extension for the approach and targetting to
cgroup usages.
Unlike the other bperf, it doesn't share the events with other
processes but it'd reduce unnecessary events (and the overhead of
multiplexing) for each monitored cgroup within the perf session.
When --for-each-cgroup is used with --bpf-counters, it will open
cgroup-switches event per cpu internally and attach the new BPF
program to read given perf_events and to aggregate the results for
cgroups. It's only called when task is switched to a task in a
different cgroup.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210701211227.1403788-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We identify the cpu_core pmu and cpu_atom pmu by explicitly
checking following files:
For cpu_core, checks:
"/sys/bus/event_source/devices/cpu_core/cpus"
For cpu_atom, checks:
"/sys/bus/event_source/devices/cpu_atom/cpus"
If the 'cpus' file exists and it has data, the pmu exists.
But in order not to hardcode the "cpu_core" and "cpu_atom",
and make the code in a generic way.
So if the path "/sys/bus/event_source/devices/cpu_xxx/cpus" exists, the
hybrid pmu exists. All the detected hybrid pmus are linked to a global
list 'perf_pmu__hybrid_pmus' and then next we just need to iterate the
list to get all hybrid pmu by using perf_pmu__for_each_hybrid_pmu.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427070139.25256-6-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Detect symbols generated by the OCaml compiler based on their prefix.
Demangle OCaml symbols, returning a newly allocated string (like the
existing Java demangling functionality).
Move a helper function (hex) from tests/code-reading.c to util/string.c
To test:
echo 'Printf.printf "%d\n" (Random.int 42)' > test.ml
perf record ocamlopt.opt test.ml
perf report -d ocamlopt.opt
Signed-off-by: Fabian Hemmer <copy@copy.sh>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LPU-Reference: 20210203211537.b25ytjb6dq5jfbwx@nyu
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introduce 'perf stat -b' option, which counts events for BPF programs, like:
[root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
1.487903822 115,200 ref-cycles
1.487903822 86,012 cycles
2.489147029 80,560 ref-cycles
2.489147029 73,784 cycles
3.490341825 60,720 ref-cycles
3.490341825 37,797 cycles
4.491540887 37,120 ref-cycles
4.491540887 31,963 cycles
The example above counts 'cycles' and 'ref-cycles' of BPF program of id
254. This is similar to bpftool-prog-profile command, but more
flexible.
'perf stat -b' creates per-cpu perf_event and loads fentry/fexit BPF
programs (monitor-progs) to the target BPF program (target-prog). The
monitor-progs read perf_event before and after the target-prog, and
aggregate the difference in a BPF map. Then the user space reads data
from these maps.
A new 'struct bpf_counter' is introduced to provide a common interface
that uses BPF programs/maps to count perf events.
Committer notes:
Removed all but bpf_counter.h includes from evsel.h, not needed at all.
Also BPF map lookups for PERCPU_ARRAYs need to have as its value receive
buffer passed to the kernel libbpf_num_possible_cpus() entries, not
evsel__nr_cpus(evsel), as the former uses
/sys/devices/system/cpu/possible while the later uses
/sys/devices/system/cpu/online, which may be less than the 'possible'
number making the bpf map lookup overwrite memory and cause hard to
debug memory corruption.
We need to continue using evsel__nr_cpus(evsel) when accessing the
perf_counts array tho, not to overwrite another are of memory :-)
Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/lkml/20210120163031.GU12699@kernel.org/
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Link: http://lore.kernel.org/lkml/20201229214214.3413833-4-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We define a stream as the branch history which is aggregated by the
branch records from perf samples. For example, the callchains aggregated
from the branch records are considered as streams. By browsing the hot
stream, we can understand the hot code path.
Now we only support the callchain for stream. For measuring the hot
level for a stream, we use the callchain_node->hit, higher is hotter.
There may be many callchains sampled so we only focus on the top N
hottest callchains. N is a user defined parameter or predefined default
value (nr_streams_max).
This patch creates an evsel_streams array per event, and saves the top N
hottest streams in a stream array.
So now we can get the per-event top N hottest streams.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20201009022845.13141-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rather than disable all warnings with -w, disable specific warnings.
Predicate enabling the warnings on a recent version of bison.
Tested with GCC 9.3.0 and clang 9.0.1.
Committer testing:
The full set of compilers, gcc and clang that this will be tested on
will be on the signed tag when this change goes upstream.
Had to add -Wno-switch-enum to build on opensuse tumbleweed:
/tmp/build/perf/util/parse-events-bison.c: In function 'yydestruct':
/tmp/build/perf/util/parse-events-bison.c:1200:3: error: enumeration value 'YYSYMBOL_YYEMPTY' not handled in switch [-Werror=switch-enum]
1200 | switch (yykind)
| ^~~~~~
/tmp/build/perf/util/parse-events-bison.c:1200:3: error: enumeration value 'YYSYMBOL_YYEOF' not handled in switch [-Werror=switch-enum]
Also replace -Wno-error=implicit-function-declaration with -Wno-implicit-function-declaration.
Also needed to check just the first two levels of the bison version, as
the patch was assuming that all versions were of the form x.y.z, and
there are several cases where it is just x.y, breaking the build.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200619043356.90024-11-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rather than disable all warnings with -w, disable specific warnings.
Predicate enabling the warnings on more recent flex versions.
Tested with GCC 9.3.0 and clang 9.0.1.
Committer notes:
The full set of compilers, gcc and clang that this will be tested on
will be on the signed tag when this change goes upstream.
Added -Wno-misleading-indentation to the flex_flags to overcome this on
opensuse tumbleweed when building with clang:
CC /tmp/build/perf/util/parse-events-flex.o
CC /tmp/build/perf/util/pmu.o
/tmp/build/perf/util/parse-events-flex.c:5038:13: error: misleading indentation; statement is not part of the previous 'if' [-Werror,-Wmisleading-indentation]
if ( ! yyg->yy_state_buf )
^
/tmp/build/perf/util/parse-events-flex.c:5036:9: note: previous statement is here
if ( ! yyg->yy_state_buf )
^
And we need to use this to redirect stderr to stdin and then grep in a
way that is acceptable for BusyBox shell:
2>&1 |
Previously I was using:
|&
Which seems to be bash specific.
Added -Wno-sign-compare to overcome this on systems such as centos:7:
CC /tmp/build/perf/util/parse-events-flex.o
CC /tmp/build/perf/util/pmu.o
CC /tmp/build/perf/util/pmu-flex.o
util/parse-events.l: In function 'parse_events_lex':
/tmp/build/perf/util/parse-events-flex.c:193:36: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
for ( yyl = n; yyl < yyleng; ++yyl )\
^
/tmp/build/perf/util/parse-events-flex.c:204:9: note: in expansion of macro 'YY_LESS_LINENO'
Added -Wno-unused-parameter to overcome this in systems such as
centos:7:
CC /tmp/build/perf/util/parse-events-flex.o
CC /tmp/build/perf/util/pmu.o
/tmp/build/perf/util/parse-events-flex.c: In function 'yy_fatal_error':
/tmp/build/perf/util/parse-events-flex.c:6265:58: error: unused parameter 'yyscanner' [-Werror=unused-parameter]
static void yy_fatal_error (yyconst char* msg , yyscan_t yyscanner)
^
Added -Wno-missing-declarations to build in systems such as centos:6:
/tmp/build/perf/util/parse-events-flex.c:6313: error: no previous prototype for 'parse_events_get_column'
/tmp/build/perf/util/parse-events-flex.c:6389: error: no previous prototype for 'parse_events_set_column'
And -Wno-missing-prototypes to cover older compilers:
-Wmissing-prototypes (C only)
Warn if a global function is defined without a previous prototype declaration. This warning is issued even if the definition itself provides a prototype. The aim is to detect global functions that fail to be declared in header files.
-Wmissing-declarations (C only)
Warn if a global function is defined without a previous declaration. Do so even if the definition itself provides a prototype. Use this option to detect global functions that are not declared in header files.
Older C compilers lack -Wno-misleading-indentation, check if it is
available before using it.
Also needed to check just the first two levels of the flex version, as
the patch was assuming that all versions were of the form x.y.z, and
there are several cases where it is just x.y, breaking the build.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200619043356.90024-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>