* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
perf report: Add "Fractal" mode output - support callchains with relative overhead rate
perf_counter tools: callchains: Manage the cumul hits on the fly
perf report: Change default callchain parameters
perf report: Use a modifiable string for default callchain options
perf report: Warn on callchain output request from non-callchain file
x86: atomic64: Inline atomic64_read() again
x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()
x86: atomic64: Improve atomic64_xchg()
x86: atomic64: Export APIs to modules
x86: atomic64: Improve atomic64_read()
x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP
x86: atomic64: Fix unclean type use in atomic64_xchg()
x86: atomic64: Make atomic_read() type-safe
x86: atomic64: Reduce size of functions
x86: atomic64: Improve atomic64_add_return()
x86: atomic64: Improve cmpxchg8b()
x86: atomic64: Improve atomic64_read()
x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file
x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too
perf report: Annotate variable initialization
...
Add basic P6 PMU support. The P6 uses the EVNTSEL0 EN bit to
enable/disable both its counters. We use this for the
global enable/disable, and clear all config bits (except EN)
to disable individual counters.
Actual ia32 hardware doesn't support lfence, so use a locked
op without side-effect to implement a full barrier.
perf stat and perf record seem to function correctly.
[a.p.zijlstra@chello.nl: cleanups and complete the enable/disable code]
Signed-off-by: Vince Weaver <vince@deater.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <Pine.LNX.4.64.0907081718450.2715@pianoman.cluster.toy>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The cache events contain '$' which will hit shell variable
expansion. To avoid confusion change this to 'cache', ie
L1-d$-loads becomes L1-dcache-loads.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090706120131.GB4391@kryten>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The current callchain displays the overhead rates as absolute:
relative to the total overhead.
This patch provides relative overhead percentage, in which each
branch of the callchain tree is a independant instrumentated object.
This provides a 'fractal' view of the call-chain profile: each
sub-graph looks like a profile in itself - relative to its parent.
You can produce such output by using the "fractal" mode
that you can abbreviate via f, fr, fra, frac, etc...
./perf report -s sym -c fractal
Example:
8.46% [k] copy_user_generic_string
|
|--52.01%-- generic_file_aio_read
| do_sync_read
| vfs_read
| |
| |--97.20%-- sys_pread64
| | system_call_fastpath
| | pread64
| |
| --2.81%-- sys_read
| system_call_fastpath
| __read
|
|--39.85%-- generic_file_buffered_write
| __generic_file_aio_write_nolock
| generic_file_aio_write
| do_sync_write
| reiserfs_file_write
| vfs_write
| |
| |--97.05%-- sys_pwrite64
| | system_call_fastpath
| | __pwrite64
| |
| --2.95%-- sys_write
| system_call_fastpath
| __write_nocancel
[...]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246772361-9960-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The cumul hits are the number of hits of every childs of a node
plus the hits of the current nodes, required for percentage
computing of a branch.
Theses numbers are calculated during the sorting of the branches of
the callchain tree using a depth first postfix traversal, so that
cumulative hits are propagated in the right order.
But if we plan to implement percentages relative to the parent and not
absolute percentages (relative to the whole overhead), we need to know
the cumulative hits of the parent before computing the children
because the relative minimum acceptable number of entries (ie: minimum
rate against the cumulative hits from the parent) is the basis to
filter the children against a given rate.
Then we need to handle the cumul hits on the fly to prepare the
implementation of relative overhead rates.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246772361-9960-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If the user doesn't provide options to tune his callchain output
(ie: if he uses -c without arguments) then the default value passed
in the OPT_CALLBACK_DEFAULT() macro is used.
But it's parsed later by strtok() which will replace comma separators
to a zero. This may segfault as we are using a read-only string.
Use a modifiable one instead, and also fix the "100%" default
minimum threshold value by turning it into a 0 (output every callchains)
as it was intended in the origin.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246772361-9960-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Certain versions of GCC dont see the initialization that is done here:
builtin-report.c: In function ‘__cmd_report’:
builtin-report.c:1038: warning: ‘syms’ may be used uninitialized in this function
So annotate it with a NULL initialization.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar wrote:
> i just bisected a 'perf report' bug that would cause us to not
> resolve all user-space symbols in a 'git gc' run to:
>
> f5812a7a33 is first bad commit
> commit f5812a7a33
> Author: Arnaldo Carvalho de Melo <acme@redhat.com>
> Date: Tue Jun 30 11:43:17 2009 -0300
>
> perf_counter tools: Adjust only prelinked symbol's addresses
Rename ->prelinked to ->adjust_symbols and making what was done
only for prelinked libraries also to ET_EXEC binaries, such as
/usr/bin/git:
[acme@doppio pahole]$ readelf -h /usr/bin/git | grep Type
Type: EXEC (Executable file)
[acme@doppio pahole]$
And after installing the 'git-debuginfo' package, I get correct results:
[acme@doppio linux-2.6-tip]$ perf report --sort comm,dso,symbol -d /usr/bin/git | head -20
#
# (1139614 samples)
#
# Overhead Command Shared Object Symbol
# ........ ................ ......................... ......
#
34.98% git /usr/bin/git [.] send_sideband
33.39% git /usr/bin/git [.] enter_repo
6.81% git /usr/bin/git [.] diff_opt_parse
4.95% git /usr/bin/git [.] is_repository_shallow
3.24% git /usr/bin/git [.] odb_mkstemp
1.39% git /usr/bin/git [.] output
1.34% git /usr/bin/git [.] xmmap
1.25% git /usr/bin/git [.] receive_pack_config
1.16% git /usr/bin/git [.] git_pathdup
0.90% git /usr/bin/git [.] read_object_with_reference
0.86% git /usr/bin/git [.] show_patch_diff
0.85% git /usr/bin/git 0x00000000095e2e
0.69% git /usr/bin/git [.] display
[acme@doppio linux-2.6-tip]$
I'll check what are the last cases where we can't resolve symbols, like
this 0x00000000095e2e later.
And I guess this will fix the problems Mike were seeing too:
[acme@doppio linux-2.6-tip]$ readelf -h ../build/perf/vmlinux | grep Type
Type: EXEC (Executable file)
[acme@doppio linux-2.6-tip]$
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Among perf annotate, perf report and perf top, we can find the
common colored printing of percents according to the following
rules:
High overhead = > 5%, colored in red
Mid overhead = > 0.5%, colored in green
Low overhead = < 0.5%, default color
Factorize these multiple checks in a single function named
percent_color_fprintf() and also provide a get_percent_color()
for sites which print percentages and other things at the same
time.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246558475-10624-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Callchains output may become a burden on a trace because even
rarely hit site are exposed. This can be too much information.
Let the user set a threshold as a minimum percent of hits using
the new pattern for the -c option:
-c mode,min_percent
Example:
$ perf report -s sym -c flat,4
8.25% [k] copy_user_generic_string
4.19%
copy_user_generic_string
generic_file_aio_read
do_sync_read
vfs_read
sys_pread64
system_call_fastpath
pread64
5.39% [k] search_by_key
4.63% 0x00000000009e0a
2.36% [k] memcpy_c
[...]
$ perf report -s sym -c graph,2
8.25% [k] copy_user_generic_string
|
|--4.31%-- generic_file_aio_read
| do_sync_read
| vfs_read
| |
| --4.19%-- sys_pread64
| system_call_fastpath
| pread64
|
--3.24%-- generic_file_buffered_write
__generic_file_aio_write_nolock
generic_file_aio_write
do_sync_write
reiserfs_file_write
vfs_write
|
--3.14%-- sys_pwrite64
system_call_fastpath
__pwrite64
5.39% [k] search_by_key
|
--2.23%-- reiserfs_update_sd_size
4.63% 0x00000000009e0a
2.36% [k] memcpy_c
[...]
You can also omit it and it will default to 0.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246558475-10624-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, the printing of callchains is done in a single
vertical level, this is the "flat" mode:
8.25% [k] copy_user_generic_string
4.19%
copy_user_generic_string
generic_file_aio_read
do_sync_read
vfs_read
sys_pread64
system_call_fastpath
pread64
This patch introduces a new "graph" mode which provides a
hierarchical output of factorized paths recursively sorted:
8.25% [k] copy_user_generic_string
|
|--4.31%-- generic_file_aio_read
| do_sync_read
| vfs_read
| |
| |--4.19%-- sys_pread64
| | system_call_fastpath
| | pread64
| |
| --0.12%-- sys_read
| system_call_fastpath
| __read
|
|--3.24%-- generic_file_buffered_write
| __generic_file_aio_write_nolock
| generic_file_aio_write
| do_sync_write
| reiserfs_file_write
| vfs_write
| |
| |--3.14%-- sys_pwrite64
| | system_call_fastpath
| | __pwrite64
| |
| --0.10%-- sys_write
[...]
The command line has then changed.
By providing the -c option, the callchain will output in the
flat mode by default.
But you can override it:
perf report -c graph
or
perf report -c flat
You can also pass the abreviated mode:
perf report -c g
or
perf report -c gra
will both make use of the graph mode.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246550301-8954-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no predefined macro to create an option that can have
a custom value or a default one if none is given.
This patch provides a new helper OPT_CALLBACK_DEFAULT() which
defines such kind of option.
For example, considering an option -c, we want to get the
default value in the following cases:
perf command -c -d
perf command -d -c
And the foo value when it's given:
perf command -c foo -d
perf command -d -c foo
That's also why PARSE_OPT_LASTARG_DEFAULT is extended here to
support default values whatever the position of the option, not
only in the end.
Should it now be renamed to PARSE_OPT_ARG_DEFAULT ?
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: git@vger.kernel.org
LKML-Reference: <1246550301-8954-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Iterating through children of a node in the callchain tree
shows something that may be quite confusing at a first glance.
The head is the children field of the parent and the list nodes
are in the brothers field of the children.
This is because the childs are linked to the parent as a list
of "brothers" using the "children" list of the parent as a
head:
---------------
| Parent (head) |-------------------------------------
--------------- |
| |
children |
| |
----------- ----------- |
| 1st child |---brother---| 2nd child |---brother-----
----------- -----------
This makes the following strange pattern often occuring:
list_for_each_entry(child, &parent->children, brothers) {
// do something with children
}
Abstract it to chain_for_each_child() to factorize and simplify
this pattern.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246550301-8954-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Building builtin-stat.c reports the following errors:
cc1: warnings being treated as errors
builtin-stat.c: In function ‘run_perf_stat’:
builtin-stat.c:242: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
builtin-stat.c:255: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
make: *** [builtin-stat.o] Erreur 1
This patch handles the possible pipe read failures.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246474930-6088-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The copy we were using came from another copy I did for the dwarves
(pahole) package, that came from the kernel years ago.
The only function that is used by the perf tools and that isn't in the
kernel is list_del_range, that I'm leaving in the perf tools only for
now.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090701174608.GA5823@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>