Jiri Olsa
9660e08ee8
perf stat: Add --interval-clear option
...
Add the --interval-clear option to clear the screen before the next interval is printed.
Committer testing:
# perf stat -I 1000 --interval-clear
And, as expected, it behaves almost like:
# watch -n 0 perf stat -a sleep 1
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Andi Kleen <andi@firstfloor.org >
Cc: David Ahern <dsahern@gmail.com >
Cc: Frederic Weisbecker <frederic@kernel.org >
Cc: Milian Wolff <milian.wolff@kdab.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Stephane Eranian <eranian@google.com >
Link: http://lkml.kernel.org/r/20180606221513.11302-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-06-07 15:53:36 -03:00
Jin Yao
ac56aa4549
perf script python: Add dict fields introduction to Documentation
...
Add a brief introduction about fields to perf-script-python.txt.
It should help python script developers in easily finding what fields
are supported.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com >
Reviewed-by: Andi Kleen <ak@linux.intel.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Jin Yao <yao.jin@intel.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kan Liang <kan.liang@linux.intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/1527843663-32288-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-06-06 15:40:10 -03:00
Jiri Olsa
0ce2da1483
perf stat: Display user and system time
...
Add support for reading rusage data once the workload has finished and
displaying the system/user time values:
$ perf stat --null perf bench sched pipe
...
Performance counter stats for 'perf bench sched pipe':
5.342599256 seconds time elapsed
2.544434000 seconds user
4.549691000 seconds sys
It works only in non -r mode and only for a workload target.
So as of now, for workload targets, we display 3 types of timings. The
time we measure in perf stat from enable to disable+period:
5.342599256 seconds time elapsed
The time spent in user and system lands, displayed only for workload
session/target:
2.544434000 seconds user
4.549691000 seconds sys
Those times are the very same as those displayed by the 'time' tool. They are
returned by the wait4 call in a struct rusage, the same structure getrusage uses.
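As a rough illustration of where those two figures come from, the sketch below starts a child process, waits for it with wait4 and reads the user and system times from the returned rusage. It uses Python's os.wait4 rather than perf's C code; the helper name run_and_time and the child command are made up for the example.

```python
import os
import sys


def run_and_time(argv):
    """Run argv as a child and return (user_seconds, sys_seconds).

    Hypothetical helper for illustration only; this is not perf code.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace ourselves with the workload.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed.
            os._exit(127)
    # Parent: wait4() returns the child's resource usage with its status.
    _, _status, rusage = os.wait4(pid, 0)
    return rusage.ru_utime, rusage.ru_stime


if __name__ == "__main__":
    user, system = run_and_time([sys.executable, "-c", "sum(range(10**6))"])
    print(f"{user:.9f} seconds user")
    print(f"{system:.9f} seconds sys")
```

The values are for the waited-for child only, which is consistent with the note that the feature applies to workload targets.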
Committer notes:
Had to rename some variables to avoid this on older systems such as
centos:6:
builtin-stat.c: In function 'print_footer':
builtin-stat.c:1831: warning: declaration of 'stime' shadows a global declaration
/usr/include/time.h:297: warning: shadowed declaration is here
Committer testing:
# perf stat --null time perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 5.526 [sec]
5.526534 usecs/op
180945 ops/sec
1.00user 6.25system 0:05.52elapsed 131%CPU (0avgtext+0avgdata 8056maxresident)k
0inputs+0outputs (0major+606minor)pagefaults 0swaps
Performance counter stats for 'time perf bench sched pipe':
5.530978744 seconds time elapsed
1.004037000 seconds user
6.259937000 seconds sys
#
Suggested-by: Ingo Molnar <mingo@kernel.org >
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: David Ahern <dsahern@gmail.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20180605121313.31337-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-06-06 12:52:04 -03:00
Alexey Budankov
f92da71280
perf record: Enable arbitrary event names thru name= modifier
...
Enable complex event names containing [.:=,] symbols to be encoded into Perf
trace using name= modifier e.g. like this:
perf record -e cpu/name=\'OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM\',\
period=0x3567e0,event=0x3c,cmask=0x1/Duk ./futex
Below is how it looks in the report output. Please note the explicit escaped
quoting of the cmdline string in the header, so that the string can be directly reused
for another collection in the shell:
perf report --header
# ========
...
# cmdline : /root/abudanko/kernel/tip/tools/perf/perf record -v -e cpu/name=\'OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM\',period=0x3567e0,event=0x3c,cmask=0x1/Duk ./futex
# event : name = OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM, , type = 4, size = 112, config = 0x100003c, { sample_period, sample_freq } = 3500000, sample_type = IP|TID|TIME, disabled = 1, inh
...
# ========
#
#
# Total Lost Samples: 0
#
# Samples: 24K of event 'OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM'
# Event count (approx.): 86492000000
#
# Overhead Command Shared Object Symbol
# ........ ....... ................ ..............................................
#
14.75% futex [kernel.vmlinux] [k] __entry_trampoline_start
...
perf stat -e cpu/name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\',period=0x3567e0,event=0x3c,cmask=0x1/Duk ./futex
10000000 process context switches in 16678890291ns (1667.9ns/ctxsw)
Performance counter stats for './futex':
88,095,770,571 CPU_CLK_UNHALTED.THREAD:cmask=0x1
16.679542407 seconds time elapsed
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com >
Acked-by: Andi Kleen <ak@linux.intel.com >
Acked-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/c194b060-761d-0d50-3b21-bb4ed680002d@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-06-06 12:52:04 -03:00
Arnaldo Carvalho de Melo
7869e58894
Merge remote-tracking branch 'tip/perf/urgent' into perf/core
...
To pick up fixes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-06-04 10:28:20 -03:00
Arnaldo Carvalho de Melo
18a7057420
perf tools: Fix perf.data format description of NRCPUS header
...
In the perf.data HEADER_CPUDESC feature header we store first the number
of available CPUs in the system, then the number of CPUs at the time of
writing the header, not the other way around.
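A minimal sketch of reading that pair in the corrected order. It assumes the payload is two consecutive 32-bit unsigned integers in little-endian byte order; the field width, the byte order, the function name and the labels nr_available and nr_online are assumptions of this sketch and are not stated in the message above.

```python
import struct


def parse_nrcpus(payload):
    """Decode (nr_available, nr_online) from a header payload.

    Assumes two little-endian u32 values, with the number of available
    CPUs first, as the corrected description states.
    Hypothetical helper for illustration; this is not perf code.
    """
    nr_available, nr_online = struct.unpack_from("<II", payload, 0)
    return nr_available, nr_online


# Hand-built example payload: 8 CPUs available, 6 present when written.
payload = struct.pack("<II", 8, 6)
print(parse_nrcpus(payload))
```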
Reported-by: Thomas-Mich Richter <tmricht@linux.ibm.com >
Acked-by: Andi Kleen <ak@linux.intel.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: David Ahern <dsahern@gmail.com >
Cc: He Kuang <hekuang@huawei.com >
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com >
Cc: Jin Yao <yao.jin@linux.intel.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Kim Phillips <kim.phillips@arm.com >
Cc: Lakshman Annadorai <lakshmana@google.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Simon Que <sque@chromium.org >
Cc: Stephane Eranian <eranian@google.com >
Cc: Wang Nan <wangnan0@huawei.com >
Link: https://lkml.kernel.org/n/tip-j7o92acm2vnxjv70y4o3swoc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-05-30 15:40:26 -03:00
Thomas Richter
0c711138fa
perf data: Update documentation section on cpu topology
...
Add an explanation of each cpu's core and socket identifier to the
perf.data file format documentation.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com >
Cc: Heiko Carstens <heiko.carstens@de.ibm.com >
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com >
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com >
Link: http://lkml.kernel.org/r/20180528074433.16652-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-05-30 15:39:13 -03:00
Takashi Iwai
ffef80ecf8
perf Documentation: Support for asciidoctor
...
The asciidoc package seems behind the recent big wave of python3
conversion, and we were advised to switch to asciidoctor instead. It's
almost compatible but some extensions used for perf documentation don't
work with it. Here is the patch to cover them, and add the proper
support for asciidoctor.
Pass USE_ASCIIDOCTOR=yes to make for using asciidoctor instead of
asciidoc. The man source and manual attributes are passed via command
options. The support for these attributes has been fixed in the
latest asciidoctor code.
Since asciidoctor can convert to a man page and to HTML directly, we
can omit the dependency on xmlto when USE_ASCIIDOCTOR is set.
Signed-off-by: Takashi Iwai <tiwai@suse.de >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20180424150456.17353-1-tiwai@suse.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-04-26 13:47:10 -03:00
Jiri Olsa
abc60bad00
perf stat: Display length strings of each run for --table option
...
Add support for displaying 'length strings' as a visual aid, to easily spot the
biggest differences in the time table.
$ perf stat -r 10 --table perf bench sched pipe
...
Performance counter stats for './perf bench sched pipe' (5 runs):
# Table of individual measurements:
5.189 (-0.293) #
5.189 (-0.294) #
5.186 (-0.296) #
5.663 (+0.181) ##
6.186 (+0.703) ####
# Final result:
5.483 +- 0.198 seconds time elapsed ( +- 3.62% )
Suggested-by: Ingo Molnar <mingo@kernel.org >
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: David Ahern <dsahern@gmail.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20180423090823.32309-9-jolsa@kernel.org
[ Updated 'perf stat --table' man page entry ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-04-26 09:30:27 -03:00
Jiri Olsa
e55c14af48
perf stat: Add --table option to display time of each run
...
Add --table option to display time for each run (-r option), like:
$ perf stat --null -r 5 --table perf bench sched pipe
Performance counter stats for './perf bench sched pipe' (5 runs):
# Table of individual measurements:
5.379 (-0.176)
5.243 (-0.311)
5.238 (-0.317)
5.536 (-0.019)
6.377 (+0.823)
# Final result:
5.555 +- 0.213 seconds time elapsed ( +- 3.83% )
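The arithmetic behind such a table can be sketched as follows, using the five rounded run times printed above. Treating the "+-" figure as the standard error of the mean (sample standard deviation divided by the square root of the run count) is an assumption of this sketch, not something the message states. Because the inputs are already rounded to three decimals, individual deviations may differ from the table in the last digit.

```python
import math
import statistics

# Per-run elapsed times copied from the table above (seconds).
runs = [5.379, 5.243, 5.238, 5.536, 6.377]

mean = statistics.mean(runs)
# Standard error of the mean, from the sample standard deviation (n - 1).
# Assumption: this is one plausible reading of the "+-" column.
stderr = statistics.stdev(runs) / math.sqrt(len(runs))

for value in runs:
    # Each row: the run time and its signed deviation from the mean.
    print(f"{value:10.3f} ({value - mean:+.3f})")
print(f"{mean:10.3f} +- {stderr:.3f} seconds time elapsed "
      f"( +- {100.0 * stderr / mean:.2f}% )")
```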
Suggested-by: Ingo Molnar <mingo@kernel.org >
Signed-off-by: Jiri Olsa <jolsa@kernel.org >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: David Ahern <dsahern@gmail.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20180423090823.32309-8-jolsa@kernel.org
[ Document the new option in 'perf stat's man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-04-26 09:30:27 -03:00
Ravi Bangoria
9a73c30854
perf buildid-cache: Support --purge-all option
...
Users can remove files from the cache using the --remove/--purge options, but both
need a list of files as an argument. That is not convenient when you want to
flush out the entire cache. Add an option to purge all files from the cache.
Example:
# perf buildid-cache -l
8a86ef73e44067bca52cc3f6cd3e5446c783391c /tmp/a.out
ebe71fdcf4b366518cc154d570a33cd461a51c36 /tmp/a.out.1
# perf buildid-cache -P -v
Removing /tmp/a.out (8a86ef73e44067bca52cc3f6cd3e5446c783391c): Ok
Removing /tmp/a.out.1 (ebe71fdcf4b366518cc154d570a33cd461a51c36): Ok
Purged all: Ok
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com >
Acked-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Kate Stewart <kstewart@linuxfoundation.org >
Cc: Krister Johansen <kjlx@templeofstupid.com >
Cc: Masami Hiramatsu <mhiramat@kernel.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Philippe Ombredanne <pombredanne@nexb.com >
Cc: Sihyeon Jang <uneedsihyeon@gmail.com >
Cc: Thomas Gleixner <tglx@linutronix.de >
Link: http://lkml.kernel.org/r/20180417041346.5617-4-ravi.bangoria@linux.vnet.ibm.com
[ Initialize 'err' in build_id_cache__purge_all(), to fix build on debian:7, as it can be used uninitialized ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-04-26 09:30:26 -03:00
Ravi Bangoria
8e1e0d7467
perf buildid-cache: Support --list option
...
'perf buildid-cache' allows adding files to and removing files from the cache, but there is
no option to list all cached files. Add a --list option to list all
_valid_ cached files.
Example:
# perf buildid-cache --add /tmp/a.out
# perf buildid-cache -l
8a86ef73e44067bca52cc3f6cd3e5446c783391c /tmp/a.out
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com >
Acked-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Kate Stewart <kstewart@linuxfoundation.org >
Cc: Krister Johansen <kjlx@templeofstupid.com >
Cc: Masami Hiramatsu <mhiramat@kernel.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Philippe Ombredanne <pombredanne@nexb.com >
Cc: Sihyeon Jang <uneedsihyeon@gmail.com >
Cc: Thomas Gleixner <tglx@linutronix.de >
Link: http://lkml.kernel.org/r/20180417041346.5617-3-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-04-26 09:30:26 -03:00
Sangwon Hong
3138a2ef62
perf mem: Document incorrect and missing options
...
Several options were incorrectly described, some lacked a description of their
required arguments, and others were simply not documented. Fix it.
Signed-off-by: Sangwon Hong <qpakzk@gmail.com >
Acked-by: Jiri Olsa <jolsa@kernel.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Taeung Song <treeze.taeung@gmail.com >
Link: http://lkml.kernel.org/r/1524382146-19609-1-git-send-email-qpakzk@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-04-23 11:59:18 -03:00
Andi Kleen
a7e9eab3db
perf mem: Allow all record/report options
...
For perf mem report / perf mem record, pass all unknown options
through to the underlying report/record commands. This makes things
like
perf mem record -a sleep 1
work. Matches how c2c and other tools work.
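The pass-through idea can be illustrated with a small wrapper that keeps the options it knows and forwards everything else. This is a Python sketch using argparse's parse_known_args, not the option parser perf itself uses; the function name is made up, and treating only -t/--type as the wrapper's own option is an assumption for the example.

```python
import argparse


def split_known_unknown(argv):
    """Split argv into options the wrapper understands and the rest.

    Hypothetical wrapper: only '-t/--type' is handled here; everything
    else is returned untouched so it can be forwarded.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-t", "--type", default="load,store")
    known, passthrough = parser.parse_known_args(argv)
    return known, passthrough


known, passthrough = split_known_unknown(["-t", "load", "-a", "sleep", "1"])
# Build the argument list for the underlying command from the leftovers.
forwarded = ["record"] + passthrough
print(known.type, forwarded)
```

The wrapper does not need to know what "-a" means; it only needs to avoid rejecting it.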
Signed-off-by: Andi Kleen <ak@linux.intel.com >
Acked-by: Jiri Olsa <jolsa@kernel.org >
Link: http://lkml.kernel.org/r/20180406203812.3087-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-04-18 15:35:48 -03:00
Alexey Budankov
bf30cc1882
perf script: Extend misc field decoding with switch out event type
...
Append a 'p' sign to the 'S' tag designating the type of context switch out event, so
'Sp' means a preemption context switch. The documentation is extended to cover
the new presentation changes.
$ perf script --show-switch-events -F +misc -I -i perf.data:
hdparm 4073 [004] U 762.198265: 380194 cycles:ppp: 7faf727f5a23 strchr (/usr/lib64/ld-2.26.so)
hdparm 4073 [004] K 762.198366: 441572 cycles:ppp: ffffffffb9218435 alloc_set_pte (/lib/modules/4.16.0-rc6+/build/vmlinux)
hdparm 4073 [004] S 762.198391: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0
swapper 0 [004] 762.198392: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 4073/4073
swapper 0 [004] Sp 762.198477: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 4073/4073
hdparm 4073 [004] 762.198478: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0
swapper 0 [007] K 762.198514: 2303073 cycles:ppp: ffffffffb98b0c66 intel_idle (/lib/modules/4.16.0-rc6+/build/vmlinux)
swapper 0 [007] Sp 762.198561: PERF_RECORD_SWITCH_CPU_WIDE OUT preempt next pid/tid: 1134/1134
kworker/u16:18 1134 [007] 762.198562: PERF_RECORD_SWITCH_CPU_WIDE IN prev pid/tid: 0/0
kworker/u16:18 1134 [007] S 762.198567: PERF_RECORD_SWITCH_CPU_WIDE OUT next pid/tid: 0/0
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/5fc65ce7-8ca5-53ae-8858-8ddd27290575@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-04-17 09:47:39 -03:00
Arnaldo Carvalho de Melo
43c4023152
perf annotate: Allow setting the offset level in .perfconfig
...
The default is 1 (jump_target):
# perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
_raw_spin_lock_irqsave() /proc/kcore
0.26 nop
4.61 push %rbx
19.33 pushfq
7.97 pop %rax
0.32 nop
0.06 mov %rax,%rbx
14.63 cli
0.06 nop
xor %eax,%eax
mov $0x1,%edx
49.94 lock cmpxchg %edx,(%rdi)
0.16 test %eax,%eax
↓ jne 2b
2.66 mov %rbx,%rax
pop %rbx
← retq
2b: mov %eax,%esi
→ callq *ffffffffb30eaed0
mov %rbx,%rax
pop %rbx
← retq
#
But one can ask for showing offsets for call instructions by setting
this:
# perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
_raw_spin_lock_irqsave() /proc/kcore
0.26 nop
4.61 push %rbx
19.33 pushfq
7.97 pop %rax
0.32 nop
0.06 mov %rax,%rbx
14.63 cli
0.06 nop
xor %eax,%eax
mov $0x1,%edx
49.94 lock cmpxchg %edx,(%rdi)
0.16 test %eax,%eax
↓ jne 2b
2.66 mov %rbx,%rax
pop %rbx
← retq
2b: mov %eax,%esi
2d: → callq *ffffffffb30eaed0
mov %rbx,%rax
pop %rbx
← retq
#
Or using a big value to ask for all offsets to be shown:
# cat ~/.perfconfig
[annotate]
offset_level = 100
hide_src_code = true
# perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
Samples: 3K of event 'cycles:ppp', 3000 Hz, Event count (approx.): 2766398574
_raw_spin_lock_irqsave() /proc/kcore
0.26 0: nop
4.61 5: push %rbx
19.33 6: pushfq
7.97 7: pop %rax
0.32 8: nop
0.06 d: mov %rax,%rbx
14.63 10: cli
0.06 11: nop
17: xor %eax,%eax
19: mov $0x1,%edx
49.94 1e: lock cmpxchg %edx,(%rdi)
0.16 22: test %eax,%eax
24: ↓ jne 2b
2.66 26: mov %rbx,%rax
29: pop %rbx
2a: ← retq
2b: mov %eax,%esi
2d: → callq *ffffffffb30eaed0
32: mov %rbx,%rax
35: pop %rbx
36: ← retq
#
This also affects the TUI, i.e. the default 'perf annotate' and 'perf
top/report' -> A hotkey -> annotate interfaces, when slang-devel is present
in the build, i.e.:
# perf version --build-options | grep slang
libslang: [ on ] # HAVE_SLANG_SUPPORT
#
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: David Ahern <dsahern@gmail.com >
Cc: Jin Yao <yao.jin@linux.intel.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Martin Liška <mliska@suse.cz >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com >
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com >
Cc: Wang Nan <wangnan0@huawei.com >
Link: https://lkml.kernel.org/n/tip-venm6x5zrt40eu8hxdsmqxz6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-04-13 10:00:05 -03:00
Takuya Yamamoto
e8103e44ce
perf sched: Fix documentation for timehist
...
Fix an incorrect option and usage to match those shown by "perf sched timehist -h",
i.e. the default is really --call-graph, which is equivalent to -g.
Signed-off-by: Takuya Yamamoto <tkydevel@gmail.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: https://lkml.kernel.org/n/tip-8fzo0dlsi1mku5aqx8brep5s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-04-12 10:33:36 -03:00
Alexey Budankov
9dc9a95f03
perf stat: Enable 1ms interval for printing event counters values
...
Currently the print interval for performance counter values is
limited to a minimum of 10ms, so the tool cannot read the values at
frequencies higher than 100Hz.
This change makes perf stat -I possible at frequencies up to 1KHz and,
to some extent, puts perf stat -I on par with perf record
sampling profiling.
When running perf stat -I for monitoring e.g. PCIe uncore counters and
at the same time profiling some I/O workload by perf record e.g. for
cpu-cycles and context switches, it is then possible to observe a
consolidated CPU/OS/IO(Uncore) performance picture for that workload.
The tool overhead warning printed when specifying the -v option can be missed
due to screen scrolling when output goes to the console,
so the message is moved into the help available by running perf stat -h.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com >
Acked-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/b842ad6a-d606-32e4-afe5-974071b5198e@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-04-12 09:29:31 -03:00
Jin Yao
7098467256
perf version: Add man page
...
Since a new option, '--build-options', was created for 'perf version',
we need to document it.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com >
Acked-by: Jiri Olsa <jolsa@kernel.org >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Jin Yao <yao.jin@intel.com >
Cc: Kan Liang <kan.liang@intel.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/1522402036-22915-7-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-04-02 13:52:23 -03:00
Arnaldo Carvalho de Melo
0a6545bda2
perf trace: Show only failing syscalls
...
For instance:
# perf probe "vfs_getname=getname_flags:72 pathname=result->name:string"
Added new event:
probe:vfs_getname (on getname_flags:72 with pathname=result->name:string)
You can now use it in all perf tools, such as:
perf record -e probe:vfs_getname -aR sleep 1
# perf trace --failure sleep 1
0.043 ( 0.010 ms): sleep/10978 access(filename: /etc/ld.so.preload, mode: R) = -1 ENOENT No such file or directory
For reference, here are all the syscalls in this case:
# perf trace sleep 1
? ( ): sleep/10976 ... [continued]: execve()) = 0
0.027 ( 0.001 ms): sleep/10976 brk() = 0x55bdc2d04000
0.044 ( 0.010 ms): sleep/10976 access(filename: /etc/ld.so.preload, mode: R) = -1 ENOENT No such file or directory
0.057 ( 0.006 ms): sleep/10976 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
0.064 ( 0.002 ms): sleep/10976 fstat(fd: 3, statbuf: 0x7fffac22b370) = 0
0.067 ( 0.003 ms): sleep/10976 mmap(len: 111457, prot: READ, flags: PRIVATE, fd: 3) = 0x7feec8615000
0.071 ( 0.001 ms): sleep/10976 close(fd: 3) = 0
0.080 ( 0.007 ms): sleep/10976 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) = 3
0.088 ( 0.002 ms): sleep/10976 read(fd: 3, buf: 0x7fffac22b538, count: 832) = 832
0.092 ( 0.001 ms): sleep/10976 fstat(fd: 3, statbuf: 0x7fffac22b3d0) = 0
0.094 ( 0.002 ms): sleep/10976 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) = 0x7feec8613000
0.099 ( 0.004 ms): sleep/10976 mmap(len: 3889792, prot: EXEC|READ, flags: PRIVATE|DENYWRITE, fd: 3) = 0x7feec8057000
0.104 ( 0.007 ms): sleep/10976 mprotect(start: 0x7feec8203000, len: 2097152) = 0
0.112 ( 0.005 ms): sleep/10976 mmap(addr: 0x7feec8403000, len: 24576, prot: READ|WRITE, flags: PRIVATE|DENYWRITE|FIXED, fd: 3, off: 1753088) = 0x7feec8403000
0.120 ( 0.003 ms): sleep/10976 mmap(addr: 0x7feec8409000, len: 14976, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS|FIXED) = 0x7feec8409000
0.128 ( 0.001 ms): sleep/10976 close(fd: 3) = 0
0.139 ( 0.001 ms): sleep/10976 arch_prctl(option: 4098, arg2: 140663540761856) = 0
0.186 ( 0.004 ms): sleep/10976 mprotect(start: 0x7feec8403000, len: 16384, prot: READ) = 0
0.204 ( 0.003 ms): sleep/10976 mprotect(start: 0x55bdc0ec3000, len: 4096, prot: READ) = 0
0.209 ( 0.004 ms): sleep/10976 mprotect(start: 0x7feec8631000, len: 4096, prot: READ) = 0
0.214 ( 0.010 ms): sleep/10976 munmap(addr: 0x7feec8615000, len: 111457) = 0
0.269 ( 0.001 ms): sleep/10976 brk() = 0x55bdc2d04000
0.271 ( 0.002 ms): sleep/10976 brk(brk: 0x55bdc2d25000) = 0x55bdc2d25000
0.274 ( 0.001 ms): sleep/10976 brk() = 0x55bdc2d25000
0.278 ( 0.007 ms): sleep/10976 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
0.288 ( 0.001 ms): sleep/10976 fstat(fd: 3</usr/lib/locale/locale-archive>, statbuf: 0x7feec8408aa0) = 0
0.290 ( 0.003 ms): sleep/10976 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7feec1488000
0.297 ( 0.001 ms): sleep/10976 close(fd: 3</usr/lib/locale/locale-archive>) = 0
0.325 (1000.193 ms): sleep/10976 nanosleep(rqtp: 0x7fffac22c0b0) = 0
1000.560 ( 0.006 ms): sleep/10976 close(fd: 1) = 0
1000.573 ( 0.005 ms): sleep/10976 close(fd: 2) = 0
1000.596 ( ): sleep/10976 exit_group()
#
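The difference between the two listings above can be approximated by filtering the full output for lines whose return value is negative. The sketch below is a text post-filter written for illustration, using two lines copied from the full listing; it is not how perf trace implements --failure, and the regular expression and function name are assumptions of this example.

```python
import re

# Two lines copied from the full listing above.
trace = [
    "0.027 ( 0.001 ms): sleep/10976 brk() = 0x55bdc2d04000",
    "0.044 ( 0.010 ms): sleep/10976 access(filename: /etc/ld.so.preload, "
    "mode: R) = -1 ENOENT No such file or directory",
]

# Treat a line as a failure when the value after ") = " is negative.
FAILED = re.compile(r"\) = -\d+\b")


def only_failures(lines):
    """Return the lines whose syscall return value is negative."""
    return [line for line in lines if FAILED.search(line)]


for line in only_failures(trace):
    print(line)
```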
And can be done systemwide, etc, with backtraces:
# perf trace --max-stack=16 --failure sleep 1
0.048 ( 0.015 ms): sleep/11092 access(filename: /etc/ld.so.preload, mode: R) = -1 ENOENT No such file or directory
__access (inlined)
dl_main (/usr/lib64/ld-2.26.so)
#
Or for some specific syscalls:
# perf trace --max-stack=16 -e openat --failure cat /tmp/rien
cat: /tmp/rien: No such file or directory
0.251 ( 0.012 ms): cat/11106 openat(dfd: CWD, filename: /tmp/rien) = -1 ENOENT No such file or directory
__libc_open64 (inlined)
main (/usr/bin/cat)
__libc_start_main (/usr/lib64/libc-2.26.so)
_start (/usr/bin/cat)
#
Look for inotify* syscalls that fail, system wide, for 2 seconds, with backtraces:
# perf trace -a --max-stack=16 --failure -e inotify* sleep 2
819.165 ( 0.058 ms): gmain/1724 inotify_add_watch(fd: 8<anon_inode:inotify>, pathname: /home/acme/~, mask: 16789454) = -1 ENOENT No such file or directory
__GI_inotify_add_watch (inlined)
_ik_watch (/usr/lib64/libgio-2.0.so.0.5400.3)
_ip_start_watching (/usr/lib64/libgio-2.0.so.0.5400.3)
im_scan_missing (/usr/lib64/libgio-2.0.so.0.5400.3)
g_timeout_dispatch (/usr/lib64/libglib-2.0.so.0.5400.3)
g_main_context_dispatch (/usr/lib64/libglib-2.0.so.0.5400.3)
g_main_context_iterate.isra.23 (/usr/lib64/libglib-2.0.so.0.5400.3)
g_main_context_iteration (/usr/lib64/libglib-2.0.so.0.5400.3)
glib_worker_main (/usr/lib64/libglib-2.0.so.0.5400.3)
g_thread_proxy (/usr/lib64/libglib-2.0.so.0.5400.3)
start_thread (/usr/lib64/libpthread-2.26.so)
__GI___clone (inlined)
#
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: David Ahern <dsahern@gmail.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Wang Nan <wangnan0@huawei.com >
Link: https://lkml.kernel.org/n/tip-8f7d3mngaxvi7tlzloz3n7cs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-04-02 07:57:37 -03:00
Kim Phillips
b74d12d598
perf tools: Add a "dso_size" sort order
...
Add DSO size to perf report/top sort output list.
This includes adding a map__size fn to map.h, which is
approximately equal to the DSO data file_size:
DSO file size map (end-start) file / (end-start)
libwebkit2gtk-4.0.so.37.24.9 43260072 41295872 95%
libglib-2.0.so.0.5400.1 1125680 1118208 99%
libc-2.26.so 1960656 1925120 101%
libdbus-1.so.3.14.13 309456 303104 102%
Sample output:
$ ./perf report -s dso_size,dso
Samples: 2K of event 'cycles:uppp', Event count (approx.): 128373340
Overhead DSO size Shared Object
90.62% unknown [unknown]
2.87% 1118208 libglib-2.0.so.0.5400.1
1.92% 303104 libdbus-1.so.3.14.13
1.42% 1925120 libc-2.26.so
0.77% 41295872 libwebkit2gtk-4.0.so.37.24.9
0.61% 335872 libgobject-2.0.so.0.5400.1
0.41% 1052672 libgdk-3.so.0.2200.25
0.36% 106496 libpthread-2.26.so
0.29% 221184 dbus-daemon
0.17% 159744 ld-2.26.so
0.13% 49152 libwayland-client.so.0.3.0
0.12% 1642496 libgio-2.0.so.0.5400.1
0.09% 7327744 libgtk-3.so.0.2200.25
0.09% 12324864 libmozjs-52.so.0.0.0
0.05% 4796416 perf
0.04% 843776 libgjs.so.0.0.0
0.03% 1409024 libmutter-clutter-1.so
Committer testing:
To sort by DSO size, use:
# perf report -F dso_size,dso,overhead -s dso_size
<SNIP>
3465216 libdns-export.so.174.0.1 0.00%
3522560 libgc.so.1.0.3 0.00%
3538944 libbfd-2.29-13.fc27.so 0.59%
3670016 libunistring.so.2.1.0 0.00%
3723264 libguile-2.0.so.22.8.1 0.00%
3776512 libgio-2.0.so.0.5400.3 0.00%
3891200 libc-2.26.so 0.96%
3944448 libmozjs-17.0.so 0.00%
4218880 libperl.so.5.26.1 0.18%
4452352 libpython2.7.so.1.0 0.02%
4472832 perf 0.02%
4603904 git 0.01%
4751360 libcrypto.so.1.1.0g 0.00%
5005312 libslang.so.2.3.1 0.00%
7315456 libgtk-3.so.0.2200.26 0.09%
8818688 i965_dri.so 2.46%
8818688 i965_dri.so (deleted) 1.26%
12414976 libmozjs-52.so.0.0.0 0.03%
23642112 cc1 2.02%
27889664 [kernel.kallsyms] 25.41%
80834560 libxul.so (deleted) 15.68%
98078720 chrome 32.03%
1056964608 [kernel.kallsyms] 1.59%
#
Signed-off-by: Kim Phillips <kim.phillips@arm.com >
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com >
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: Jin Yao <yao.jin@linux.intel.com >
Cc: Jiri Olsa <jolsa@redhat.com >
Cc: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org >
Cc: Milian Wolff <milian.wolff@kdab.com >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Link: http://lkml.kernel.org/r/20180327060956.1c01ebe67a2a941bb4468c6f@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-04-02 07:57:37 -03:00
Arnaldo Carvalho de Melo
91340c5184
perf report: Introduce --ignore-vmlinux command line option
...
We've had this in 'perf top' for quite a while, useful if one wishes
to force using /proc/kcore to do annotation using the patched kernel
instead of the ELF image it started from, aka vmlinux.
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: David Ahern <dsahern@gmail.com >
Cc: Jin Yao <yao.jin@linux.intel.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Wang Nan <wangnan0@huawei.com >
Link: https://lkml.kernel.org/n/tip-ircpvox4wzsv7gasrpb28fw9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-03-21 12:53:42 -03:00
Arnaldo Carvalho de Melo
be316409e9
perf annotate: Introduce --ignore-vmlinux command line option
...
This is already present in 'perf top', albeit undocumented (will fix),
and is useful to use /proc/kcore instead of vmlinux and then get what is
really in place, not what the kernel starts with, before alternatives,
ftrace .text patching, etc, see the differences:
# perf annotate --stdio2 _raw_spin_lock_irqsave
_raw_spin_lock_irqsave() /lib/modules/4.16.0-rc4/build/vmlinux
Event: anon group { cycles, instructions }
0.00 3.17 → callq __fentry__
0.00 7.94 push %rbx
7.69 36.51 → callq __page_file_index
mov %rax,%rbx
7.69 3.17 → callq *ffffffff82225cd0
xor %eax,%eax
mov $0x1,%edx
80.77 49.21 lock cmpxchg %edx,(%rdi)
test %eax,%eax
↓ jne 2b
3.85 0.00 mov %rbx,%rax
pop %rbx
← retq
2b: mov %eax,%esi
→ callq queued_spin_lock_slowpath
mov %rbx,%rax
pop %rbx
← retq
[root@jouet ~]# perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
_raw_spin_lock_irqsave() /proc/kcore
Event: anon group { cycles, instructions }
0.00 3.17 nop
0.00 7.94 push %rbx
0.00 23.81 pushfq
7.69 12.70 pop %rax
nop
mov %rax,%rbx
7.69 3.17 cli
nop
xor %eax,%eax
mov $0x1,%edx
80.77 49.21 lock cmpxchg %edx,(%rdi)
test %eax,%eax
↓ jne 2b
3.85 0.00 mov %rbx,%rax
pop %rbx
← retq
2b: mov %eax,%esi
→ callq *ffffffff820e96b0
mov %rbx,%rax
pop %rbx
← retq
#
Diff of the output of those commands:
# perf annotate --stdio2 _raw_spin_lock_irqsave > /tmp/vmlinux
# perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave > /tmp/kcore
# diff -y /tmp/vmlinux /tmp/kcore
_raw_spin_lock_irqsave() vmlinux | _raw_spin_lock_irqsave() /proc/kcore
Event: anon group { cycles, instructions } Event: anon group { cycles, instructions }
0.00 3.17 → callq __fentry__ | 0.00 3.17 nop
0.00 7.94 push %rbx 0.00 7.94 push %rbx
7.69 36.51 → callq __page_file_index | 0.00 23.81 pushfq
> 7.69 12.70 pop %rax
> nop
mov %rax,%rbx mov %rax,%rbx
7.69 3.17 → callq *ffffffff82225cd0 | 7.69 3.17 cli
> nop
xor %eax,%eax xor %eax,%eax
mov $0x1,%edx mov $0x1,%edx
80.77 49.21 lock cmpxchg %edx,(%rdi) 80.77 49.21 lock cmpxchg %edx,(%rdi)
test %eax,%eax test %eax,%eax
↓ jne 2b ↓ jne 2b
3.85 0.00 mov %rbx,%rax 3.85 0.00 mov %rbx,%rax
pop %rbx pop %rbx
← retq ← retq
2b: mov %eax,%esi 2b: mov %eax,%esi
→ callq queued_spin_lock_slowpath| → callq *ffffffff820e96b0
mov %rbx,%rax mov %rbx,%rax
pop %rbx pop %rbx
← retq ← retq
#
This should be further streamlined by doing both annotations and
allowing the TUI to toggle initial/current, and show the patched
instructions in a slightly different color.
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: David Ahern <dsahern@gmail.com >
Cc: Jin Yao <yao.jin@linux.intel.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Wang Nan <wangnan0@huawei.com >
Link: https://lkml.kernel.org/n/tip-wz8d269hxkcwaczr0r4rhyjg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-03-21 12:53:42 -03:00
Arnaldo Carvalho de Melo
befd2a38a6
perf annotate: Introduce the --stdio2 output mode
...
This uses the TUI augmented formatting routines, modulo interactivity.
# perf annotate --ignore-vmlinux --stdio2 _raw_spin_lock_irqsave
_raw_spin_lock_irqsave() /proc/kcore
Event: cycles:ppp
Percent
Disassembly of section load0:
ffffffff9a8734b0 <load0>:
nop
push %rbx
50.00 pushfq
pop %rax
nop
mov %rax,%rbx
cli
nop
xor %eax,%eax
mov $0x1,%edx
50.00 lock cmpxchg %edx,(%rdi)
test %eax,%eax
↓ jne 2b
mov %rbx,%rax
pop %rbx
← retq
2b: mov %eax,%esi
→ callq queued_spin_lock_slowpath
mov %rbx,%rax
pop %rbx
← retq
Tested-by: Jin Yao <yao.jin@linux.intel.com >
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: David Ahern <dsahern@gmail.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Wang Nan <wangnan0@huawei.com >
Link: https://lkml.kernel.org/n/tip-6cte5o8z84mbivbvqlg14uh1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-03-21 12:53:26 -03:00
Arnaldo Carvalho de Melo
a8403912d0
perf top: Document --ignore-vmlinux
...
We've had this since 2013, document it.
Cc: Adrian Hunter <adrian.hunter@intel.com >
Cc: Andi Kleen <ak@linux.intel.com >
Cc: David Ahern <dsahern@gmail.com >
Cc: Jin Yao <yao.jin@linux.intel.com >
Cc: Jiri Olsa <jolsa@kernel.org >
Cc: Namhyung Kim <namhyung@kernel.org >
Cc: Wang Nan <wangnan0@huawei.com >
Cc: Willy Tarreau <w@1wt.eu >
Fixes: fc2be6968e ("perf symbols: Add new option --ignore-vmlinux for perf top")
Link: https://lkml.kernel.org/n/tip-0jwfueooddwfsw9r603belxi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com >
2018-03-19 13:51:52 -03:00