Commit Graph

33 Commits

Author SHA1 Message Date
Chris Wilson
933da83aa1 perf: Propagate term signal to child
If we launch the child on behalf of the user, ensure that it dies
along with ourselves when we are interrupted.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
LKML-Reference: <1254616502-4728-1-git-send-email-chris@chris-wilson.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-04 19:37:39 +02:00
Ingo Molnar
c7f7fea30b perf stat: Fix zero total printouts
Before:

           0  sched:sched_switch #        nan M/sec

After:

           0  sched:sched_switch #      0.000 M/sec

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-22 15:01:47 +02:00
Ingo Molnar
cdd6c482c9 perf: Do the big rename: Performance Counters -> Performance Events
Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

  FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

  sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

  for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
  done

  FILES=$(find . -name perf_event.*)

  sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
  with hardware registers - and these sed scripts are a bit
  over-eager in renaming them. I've undone some of that, but
  in case there's something left where 'counter' would be
  better than 'event' we can undo that on an individual basis
  instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21 14:28:04 +02:00
Peter Zijlstra
849abde92b perf stat: Clean up statistics calculations a bit more
Remove some, now useless, global storage.
Don't calculate the stddev when not needed.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 20:27:26 +02:00
Peter Zijlstra
8a02631a47 perf stat: More advanced variance computation
Use the more advanced single pass variance algorithm outlined
on the wikipedia page. This is numerically more stable for
larger sample sets.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 17:38:15 +02:00
Peter Zijlstra
63d40deb2e perf stat: Use stddev_mean in stead of stddev
When we're computing the mean by sampling the distribution,
then the std dev of the mean is related to the std dev of the
sample set by:

  stddev_mean = std_dev / sqrt(N)

Which is exactly what we want.

This results in the error on the mean decreasing with
increasing number of samples.

Also fix the scaled == -1, aka not counted case.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 17:38:14 +02:00
Peter Zijlstra
9e9772c458 perf stat: Remove the limit on repeat
Since we don't need all the individual samples to calculate the
error remove both the limit and the storage overhead associated
with that.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 16:33:08 +02:00
Peter Zijlstra
506d4bc8d5 perf stat: Change noise calculation to use stddev
The current noise computation does:

 \Sum abs(n_i - avg(n)) * N^-1.5

Which is (afaik) not a regular noise function, and needs the
complete sample set available to post-process.

Change this to use a regular stddev computation which can be
done by keeping a two sums:

 stddev = sqrt( 1/N (\Sum n_i^2) - avg(n)^2 )

For which we only need to keep \Sum n_i and \Sum n_i^2.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04 16:33:07 +02:00
Frederic Weisbecker
8f28827a16 perf tools: Librarize trace_event() helper
Librarize trace_event() helper so that perf trace can use it
too. Also clean up the debug.h includes a bit.

It's not good to have it included in perf.h because it doesn't
make it flexible against other headers it may need (headers
that can also depend on perf.h and then create a recursive
header dependency).

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1250453149-664-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-16 23:06:45 +02:00
Frederic Weisbecker
cd84c2ac6d perf tools: Factorize high level dso helpers
Factorize multiple definitions of high level dso helpers into the
symbol source file.

The side effect is a general export of the verbose and eprintf
debugging helpers into a new file dedicated to debugging purposes.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
2009-08-12 12:02:38 +02:00
Brice Goglin
b26bc5a7f8 perf stat: Fix tool option consistency: rename -S/--scale to -c/--scale
We want to use a coherent flag for -S/--stat across all tools,
so free up -S in perf stat.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-09 12:54:37 +02:00
Anton Blanchard
a0541234f8 perf_counter: Improve perf stat and perf record option parsing
perf stat and perf record currently look for all options on the command
line. This can lead to some confusion:

# perf stat ls -l
  Error: unknown switch `l'

While we can work around this by adding '--' before the command, the git
option parsing code can stop at the first non option:

# perf stat ls -l
 Performance counter stats for 'ls -l':
....

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090722130412.GD9029@kryten>
2009-07-22 18:05:56 +02:00
Frederic Weisbecker
a92bef0f21 perf stat: Handle pipe read failures in perf stat
Building builtin-stat.c reports the following errors:

cc1: warnings being treated as errors
builtin-stat.c: In function ‘run_perf_stat’:
builtin-stat.c:242: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
builtin-stat.c:255: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
make: *** [builtin-stat.o] Erreur 1

This patch handles the possible pipe read failures.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246474930-6088-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 22:37:24 +02:00
Jaswinder Singh Rajput
b9ebdcc0ce perf stat: Define MATCH_EVENT for easy attr checking
MATCH_EVENT is useful:

 1. for multiple attrs checking
 2. avoid repetition of PERF_TYPE_ and PERF_COUNT_ and save space
 3. avoids line breakage

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1246440909.3403.5.camel@hpdv5.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 13:28:38 +02:00
Ingo Molnar
f37a291c52 perf_counter tools: Add more warnings and fix/annotate them
Enable -Wextra. This found a few real bugs plus a number
of signed/unsigned type mismatches/uncleanlinesses. It
also required a few annotations

All things considered it was still worth it so lets try with
this enabled for now.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 12:49:48 +02:00
Paul Mackerras
57e7986ed1 perf_counter: Provide a way to enable counters on exec
This provides a way to mark a counter to be enabled on the next
exec. This is useful for measuring the total activity of a
program without including overhead from the process that
launches it.

This also changes the perf stat command to use this new
facility.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <19017.43927.838745.689203@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-30 12:00:16 +02:00
Paul Mackerras
051ae7f734 perf_counter tools: Reduce perf stat measurement overhead/skew
Vince Weaver reported a 'perf stat' measurement overhead in the
count of retired instructions, which can amount to a +6000
instructions inflated count in the reported count.

At present, perf stat creates its counters on the perf process.  Thus
the counters count the fork and various other activity in both the
parent and child, such as the resolver overhead for resolving PLT
entries for any libc functions that haven't been called before, such
as execvp.

This reduces the overhead by creating the counters on the child process
after the fork, using a couple of pipes to synchronize so that the
child process waits until the parent has created the counters before
doing the exec.  To eliminate the PLT resolution overhead on calling
execvp, this does a dummy execvp first which will always fail.

With this, the overhead of executing a program goes down from over
4800 instructions to about 90 instructions on powerpc (32-bit).
This was measured with a statically-linked program written in
assembler which only does the 3 instructions needed to call _exit(0).

Before:

$ perf stat -e 0:1:u ./three

 Performance counter stats for './three':

           4858  instructions

    0.001274523  seconds time elapsed

After:

$ perf stat -e 0:1:u ./three

 Performance counter stats for './three':

             92  instructions

    0.000468153  seconds time elapsed

Reported-by: Vince Weaver <vince@deater.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <19016.41425.814043.870352@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-29 22:38:09 +02:00
Ingo Molnar
210ad39fb7 perf stat: Use percentages for scaling output
Peter expressed a strong preference for percentage based
display of scaled values - so revert to that from the
recently introduced multiplication-factor unit.

Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-29 21:50:54 +02:00
Jaswinder Singh Rajput
c3043569dc perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run
Set attrs and nr_counters if no event is selected and !null_run.

Setting of attrs should depend on number of counters,
so we need to memcpy only for sizeof(default_attrs)

Also set nr_counters as ARRAY_SIZE(default_attrs) in place of
hardcoded value.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1246126749.32198.16.camel@hpdv5.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-28 15:22:47 +02:00
Jaswinder Singh Rajput
6e750a8fc0 perf stat: Improve output
Increase size for event name to handle bigger names like
'L1-d$-prefetch-misses'

Changed scaled counters from percentage to a multiplicative
factor because the latter is more expressive.

Also aligned the scaling factor, otherwise sometimes it looks
like:

            384  iTLB-load-misses           (4.74x scaled)
         452029  branch-loads               (8.00x scaled)
           5892  branch-load-misses         (20.39x scaled)
         972315  iTLB-loads                 (3.24x scaled)

Before:
         150708  L1-d$-stores          (scaled from 23.57%)
         428804  L1-d$-prefetches      (scaled from 23.47%)
         314446  L1-d$-prefetch-misses  (scaled from 23.42%)
      252626137  L1-i$-loads           (scaled from 23.24%)
        5297550  dTLB-load-misses      (scaled from 23.96%)
      106992392  branch-loads          (scaled from 23.67%)
        5239561  branch-load-misses    (scaled from 23.43%)

After:
        1731713  L1-d$-loads               (  14.25x scaled)
          44241  L1-d$-prefetches          (   3.88x scaled)
          21076  L1-d$-prefetch-misses     (   3.40x scaled)
        5789421  L1-i$-loads               (   3.78x scaled)
          29645  dTLB-load-misses          (   2.95x scaled)
         461474  branch-loads              (   6.52x scaled)
           7493  branch-load-misses        (  26.57x scaled)

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1246051927.2988.10.camel@hpdv5.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-27 18:39:41 +02:00
Ingo Molnar
566747e629 perf stat: Fix multi-run stats
In multi-run (-r/--repeat) printouts, print out the noise of
the wall-clock average as well.

Also, fix a bug in printing out scaled counters: if it was not
scaled then we should not update the average with -1.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-27 06:34:37 +02:00
Ingo Molnar
0cfb7a13b8 perf stat: Add -n/--null option to run without counters
Allow a no-counters run. This can be useful to measure just
elapsed wall-clock time - or to assess the raw overhead of perf
stat itself, without running any counters.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-27 06:11:24 +02:00
Jaswinder Singh Rajput
3d63259583 perf stat: Remove dead code
Remove dead code and do some code alignment.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1245847774.2681.2.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24 14:55:45 +02:00
Jaswinder Singh Rajput
cca03c0aeb perf stat: Fix verbose for perf stat
Error message should use stderr for verbose (-v), otherwise
message will be lost for:

 $ ./perf stat -v <cmd>  > /dev/null

For example on AMD bus-cycles event is not available so now
it looks like:

 $ ./perf stat -v -e bus-cycles ls > /dev/null
Error: counter 0, sys_perf_counter_open() syscall returned with -1 (Invalid argument)

 Performance counter stats for 'ls':

  <not counted>  bus-cycles

    0.006765877  seconds time elapsed.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1245757369.3776.1.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-23 21:58:44 +02:00
Paul Mackerras
9cffa8d533 perf_counter tools: Define and use our own u64, s64 etc. definitions
On 64-bit powerpc, __u64 is defined to be unsigned long rather than
unsigned long long.  This causes compiler warnings every time we
print a __u64 value with %Lx.

Rather than changing __u64, we define our own u64 to be unsigned long
long on all architectures, and similarly s64 as signed long long.
For consistency we also define u32, s32, u16, s16, u8 and s8.  These
definitions are put in a new header, types.h, because these definitions
are needed in util/string.h and util/symbol.h.

The main change here is the mechanical change of __[us]{64,32,16,8}
to remove the "__".  The other changes are:

* Create types.h
* Include types.h in perf.h, util/string.h and util/symbol.h
* Add types.h to the LIB_H definition in Makefile
* Added (u64) casts in process_overflow_event() and print_sym_table()
  to kill two remaining warnings.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: benh@kernel.crashing.org
LKML-Reference: <19003.33494.495844.956580@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-19 18:25:47 +02:00