While we can enable the perf sample records per tracepoint
counter, we may also want to enable this option for every
tracepoint counters to open, so that we don't need to add a
:record flag for all of them.
Add the -R, --raw-samples options for this purpose.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1250152039-7284-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a new flag field while opening a tracepoint perf counter:
-e tracepoint_subsystem:tracepoint_name:flags
This is intended to be generic although for now it only supports the
r[e[c[o[r[d]]]]] flag:
./perf record -e workqueue:workqueue_insertion:record
./perf record -e workqueue:workqueue_insertion:r
will have the same effect: enabling the raw samples record for
the given tracepoint counter.
In the future, we may want to support further flags, separated
by commas.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1250152039-7284-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is better than showing the map addr, this way at least we
know that we can't get the symtabs because the DSO was deleted
(system update) while an app still used such DSO.
Yeah, don't do that, but if you do, you'll figure it out
quicker this way.
[acme@doppio linux-2.6-tip]$ perf report | head -15
# Samples: 3796
#
# Overhead Command Shared Object Symbol
# ........ ....... ................................................................... ......
#
23.55% pidgin /lib64/libglib-2.0.so.0.2000.4.#prelink#.Pd98lu (deleted) [.] 0x00000000038844
21.55% pidgin /lib64/libpthread-2.10.1.so.#prelink#.AFwK8Q (deleted) [.] 0x0000000000a42d
10.85% pidgin [kernel] [.] vread_hpet
7.85% pidgin /lib64/libgobject-2.0.so.0.2000.4.#prelink#.o1vpU7 (deleted) [.] 0x00000000014de8
3.35% pidgin /lib64/libc-2.10.1.so (deleted) [.] 0x0000000007a875
3.19% pidgin /lib64/libdbus-1.so.3.4.0.#prelink#.6mwgZP (deleted) [.] 0x0000000001d254
3.06% pidgin /usr/lib64/libgtk-x11-2.0.so.0.1600.5.#prelink#.511hAl (deleted) [.] 0x000000002334e7
2.90% pidgin /usr/lib64/libgdk-x11-2.0.so.0.1600.5.#prelink#.5qlMo1 (deleted) [.] 0x00000000037b2d
1.84% pidgin [kernel] [k] do_sys_poll
1.45% pidgin /usr/lib64/libX11.so.6.2.0.#prelink#.iR59Rx (deleted) [.] 0x0000000004c751
[acme@doppio linux-2.6-tip]$
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Luis Claudio R. Gonçalves <lclaudio@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090811200436.GA3478@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Noticed when trying to record events for a firefox thread. We
were synthesizing both .tid and .pid with the pid passed via
--pid.
Fix it by reading /proc/PID/status and getting the tgid
to use in .pid, .tid gets the specified "pid".
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20090811192200.GF18061@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Sometimes we get callchain branches that have a rate under the
limit given by the user.
Say you launched:
perf record -f -g -a ./hackbench 10
perf report -g fractal,10.0
And you got:
2.33% hackbench [kernel] [k] _spin_lock_irqsave
|
|--78.57%-- remove_wait_queue
| poll_freewait
| do_sys_poll
| sys_poll
| sysenter_dispatch
| 0xf7ffa430
| 0x1ffadea3c
|
|--7.14%-- __up_read
| up_read
| do_page_fault
| page_fault
| 0xf7ffa430
| 0xa0df710000000a
...
It is abnormal to get a 7.14% branch whereas we passed a 10%
filter.
The problem is that we round down the minimum threshold. This
happens mostly when we have very low number of events. If the
total amount of your branch is 4 and you have a subranch of 3
events, filtering to 90% will be computed like follows:
limit = 4 * 0.9;
The result is about 3.6, but the cast to integer will round
down to 3. It means that our filter is actually of 75%
We must then explicitly round up the minimum threshold.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: acme@redhat.com
Cc: peterz@infradead.org
Cc: efault@gmx.de
LKML-Reference: <20090809024235.GA10146@nowhere>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Due to a libz dependency in some distro's binutils package,
C++ demangle support isn't compiled in despite the necessary
libraries being available.
Fix this by adding a -lz link test to the dependency detection
rules.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1249733655.6929.5.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When we filter the callchains below a given percentage, we
ignore them and the end result only shows entries that have an
upper percentage than the filter threshold.
It seems to users then that we have an imbalance in the
percentage, as if the sum inside a profiled branch doesn't
reach 100%.
Since in the past there have been real perf report bugs that
showed the same sypmtom, it would be nice to assure the user
that the data is perfect and trustable and it all sums up to
100.00%.
So fix this by displaying the remaining hits that have been
filtered but without more detail than their amount in each
branches. Example while filtering below 50%:
7.73% [k] delay_tsc
|
|--98.22%-- __const_udelay
| |
| |--86.37%-- ath5k_hw_register_timeout
| | ath5k_hw_noise_floor_calibration
| | ath5k_hw_reset
| | ath5k_reset
| | ath5k_config
| | ieee80211_hw_config
| | |
| | |--88.53%-- ieee80211_scan_work
| | | worker_thread
| | | kthread
| | | child_rip
| | --11.47%-- [...]
| --13.63%-- [...]
--1.78%-- [...]
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1249690585-9145-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If we recorded with -g option to record the callchain, right now
we require a -g option to perf report as well - and people reported
this as unnecessary complication: the user already specified -g
once, no need to require it a second time.
So if the recording includes call-chains, display the callchain by
default from perf report.
( The user can override this default using "-g none" option from
perf report. )
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1249690585-9145-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When the callchain tree comes to insert an empty backtrace, it
raises a spurious warning about the fact we are inserting an
empty. This is spurious because the radix tree assumes it did
something wrong to reach this state. But it didn't, we just met
an empty callchain that has to be ignored.
This happens occasionally with certain types of call-chain
recordings. If it happens it's a big nuisance as perf report
output starts with thousands of warning lines.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1249690585-9145-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1. Ignore the -A argument if there is no perf.data file
2. Treat an empty file like a non existent file.
Else, perf will try to read the perf.data header, and fail with
an error.
Treating an empty file like a non-existent file makes sense,
since an interupted (as in SIGKILLed) perf could leave such
files around, and you don't want to annoy the user with errors
for files with no data in it.
Signed-off-by: Pierre Habouzit <pierre.habouzit@intersec.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While toying with perf, I've noticed that perf record can
easily enter a busy loop when doing something as silly as:
$ perf record -A ls
Yeah, do_read here really wants to read a known size, not being
able to should die(), not busy-loop ;)
That was the cause for the bug.
Signed-off-by: Pierre Habouzit <pierre.habouzit@intersec.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Stop perf list from displaying tracepoints without an id file,
those are special tracepoints that are not interfaced to
perfcounters so listing them is erroneous and passing them as
events will produce no output.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Used with perf report --verbose:
[acme@doppio linux-2.6-tip]$ perf report -v | head -16
5.17% firefox /usr/lib64/xulrunner-1.9.1/libxul.so 0x00000000005d8eee f [.] imgContainer::DrawFrameTo(gfxIImageFrame*, gfxIImageFrame*, nsRect&)
2.56% firefox /lib64/libpthread-2.10.1.so 0x0000000000008e02 d [.] __pthread_mutex_lock_internal
1.94% firefox /usr/lib64/xulrunner-1.9.1/libxul.so 0x0000000000d0af8f f [.] SearchTable
1.75% firefox [kernel] 0xffffffffff60013b k [.] vread_hpet
1.63% firefox /lib64/libpthread-2.10.1.so 0x000000000000a404 d [.] __pthread_mutex_unlock
1.47% firefox /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x00000000000482ea f [.] js_Interpret
1.42% firefox /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x000000000003eda3 f [.] JS_CallTracer
1.24% firefox [kernel] 0xffffffff8102ca4a k [k] read_hpet
1.16% firefox [kernel] 0xffffffff810f3dd4 k [k] fget_light
1.11% firefox /usr/lib64/xulrunner-1.9.1/libmozjs.so 0x00000000000567ff f [.] js_TraceObject
0.98% firefox /usr/lib64/firefox-3.5.2/firefox 0x000000000000dd23 b [.] arena_ralloc
[acme@doppio linux-2.6-tip]$
The new field is just after the symbol address. To help in
figuring out symbol resolution bugs.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Brice Goglin reported:
> I can easily sort them by thread id, but I don't know how to match
> my 4 events with each group of 4 lines.
Also report the counter id and the time running/enabled
stats (in case the counter got time-shared).
Reported-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Brice Goglin reported that only the first result from a
multi-counter perf record --stat run is accurate, the
rest looks bogus.
A silly mistake made us re-read the first attribute for
every recorded attribute.
Reported-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Cc: paulus@samba.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The callchain fractal mode builds each new total hits in a new
branch of profiling by using the parent's hits of the current
branch plus the hits of the children.
This is wrong, the total hits of a branch should be made of the
sum of every children hits, we must ignore the parent hits in
this scope.
This patch also fixes another mistake with the hit counting.
Now the rates are correct.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf_counter tools: update perf top manual page to reflect
current implementation.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pressing any key which is not currently mapped to
functionality, based on startup command line options, displays
currently mapped keys, and prompts for input.
Pressing any unmapped key at the prompt returns the user to
display mode with variables unchanged. eg, pressing ? <SPACE>
<ESC> etc displays currently available keys, the value of the
variable associated with that key, and prompts.
Pressing same again aborts input.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>