Commit Graph

180969 Commits

Author SHA1 Message Date
Robert Richter bb1165d688 perf, x86: rename macro in ARCH_PERFMON_EVENTSEL_ENABLE
For consistency reasons this patch renames
ARCH_PERFMON_EVENTSEL0_ENABLE to ARCH_PERFMON_EVENTSEL_ENABLE.

The following is performed:

 $ sed -i -e s/ARCH_PERFMON_EVENTSEL0_ENABLE/ARCH_PERFMON_EVENTSEL_ENABLE/g \
   arch/x86/include/asm/perf_event.h arch/x86/kernel/cpu/perf_event.c \
   arch/x86/kernel/cpu/perf_event_p6.c \
   arch/x86/kernel/cpu/perfctr-watchdog.c \
   arch/x86/oprofile/op_model_amd.c arch/x86/oprofile/op_model_ppro.c

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-03-01 14:21:23 +01:00
Robert Richter a163b1099d perf, x86: add some IBS macros to perf_event.h
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-03-01 11:23:15 +01:00
Robert Richter 1d6040f17d perf, x86: make IBS macros available in perf_event.h
This patch moves code from oprofile to perf_event.h to make it also
available for usage by perf.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-03-01 11:23:15 +01:00
Robert Richter 86d62b6fa2 Merge remote branch 'tip/oprofile' into tip/perf/core
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-03-01 11:22:45 +01:00
Frederic Weisbecker 3d083407a1 x86/hw-breakpoints: Remove the name field
Remove the name field from the arch_hw_breakpoint. We never deal
with target symbols in the arch level, neither do we need to ever
store it. It's a legacy for the previous version of the x86
breakpoint backend.

Let's remove it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-27 17:24:15 +01:00
Frederic Weisbecker dd8b1cf681 perf: Remove pointless breakpoint union
Remove pointless union in the breakpoint field of hw_perf_event.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
2010-02-27 17:10:39 +01:00
Frederic Weisbecker b67577dfb4 perf lock: Drop the buffers multiplexing dependency
We need to deal with time ordered events to build a correct
state machine of lock events. This is why we multiplex the lock
events buffers. But the ordering is done from the kernel, on
the tracing fast path, leading to high contention between cpus.

Without multiplexing, the events appears in a weak order.
If we have four events, each split per cpu, perf record will
read the events buffers in the following order:

[ CPU0 ev0, CPU0 ev1, CPU0 ev3, CPU0 ev4, CPU1 ev0, CPU1 ev0....]

To handle a post processing reordering, we could just read and sort
the whole in memory, but it just doesn't scale with high amounts
of events: lock events can fill huge amounts in few times.

Basically we need to sort in memory and find a "grace period"
point when we know that a given slice of previously sorted events
can be committed for post-processing, so that we can unload the
memory usage step by step and keep a scalable sorting list.

There is no strong rules about how to define such "grace period".
What does this patch is:

We define a FLUSH_PERIOD value that defines a grace period in
seconds.
We want to have a slice of events covering 2 * FLUSH_PERIOD in our
sorted list.
If FLUSH_PERIOD is big enough, it ensures every events that occured
in the first half of the timeslice have all been buffered and there
are none remaining and there won't be further to put inside this
first timeslice. Then once we reach the 2 * FLUSH_PERIOD
timeslice, we flush the first half to be gentle with the memory
(the second half can still get new events in the middle, so wait
another period to flush it)

FLUSH_PERIOD is defined to 5 seconds. Say the first event started on
time t0. We can safely assume that at the time we are processing
events of t0 + 10 seconds, ther won't be anymore events to read
from perf.data that occured between t0 and t0 + 5 seconds. Hence
we can safely flush the first half.

To point out funky bugs, we have a guardian that checks a new event
timestamp is not below the last event's timestamp flushed and that
displays a warning in this case.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
2010-02-27 17:06:19 +01:00
Hitoshi Mitake 84c6f88fc8 perf lock: Fix and add misc documentally things
I've forgot to add 'perf lock' line to command-list.txt,
so users of perf could not find perf lock when they type 'perf'.

Fixing command-list.txt requires document
(tools/perf/Documentation/perf-lock.txt).
But perf lock is too much "under construction" to write a
stable document, so this is something like pseudo document for now.

And I wrote description of perf lock at help section of
CONFIG_LOCK_STAT, this will navigate users of lock trace events.

Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
LKML-Reference: <1265267295-8388-1-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-02-27 17:05:22 +01:00
Tejun Heo 44ee63587d percpu: Add __percpu sparse annotations to hw_breakpoint
Add __percpu sparse annotations to hw_breakpoint.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors.  This patch doesn't affect normal builds.

In kernel/hw_breakpoint.c, per_cpu(nr_task_bp_pinned, cpu)'s will
trigger spurious noderef related warnings from sparse.  Changing it to
&per_cpu(nr_task_bp_pinned[0], cpu) will work around the problem but
deemed to ugly by the maintainer.  Leave it alone until better
solution can be found.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
LKML-Reference: <4B7B4B7A.9050902@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-02-27 16:23:39 +01:00
Frederic Weisbecker 018cbffe68 Merge commit 'v2.6.33' into perf/core
Merge reason:
	__percpu annotations need the corresponding sparse address
space definition upstream.

Conflicts:
	tools/perf/util/probe-event.c (trivial)
2010-02-27 16:18:46 +01:00
Peter Zijlstra 1dd2980d99 perf_event, amd: Fix spinlock initialization
Avoid kernels from exploding on AMD machines when they have any
lock debugging bits enabled.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-26 17:25:19 +01:00
Peter Zijlstra 24691ea964 perf_event: Fix preempt warning in perf_clock()
A recent commit introduced a preemption warning for
perf_clock(), use raw_smp_processor_id() to avoid this, it
really doesn't matter which cpu we use here.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1267198583.22519.684.camel@laptop>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-26 17:25:00 +01:00
David S. Miller 4385d580f2 perf tools: Flush maps on COMM events
Even though we don't register the counters until the child is right about
to exec(), we're still going to get at least a few events while the
fork()'d child is still executing 'perf' and in particular we're going to
get the MMAP events.

We can't distinguish the ones in the newly executed process because the
PID will be the same.

One way to solve this would be to have a PERF_RECORD_EXEC event, and when
this is seen 'perf' can flush it's map cache.  We can't use
PERF_RECORD_COMM since that's generated by other things, not just exec().

Actually, thinking about it some more, using PERF_RECORD_COMM might be a
good enough approximation.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1267196914-16238-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-26 16:28:45 +01:00
Peter Zijlstra f22f54f449 perf_events, x86: Split PMU definitions into separate files
Split amd,p6,intel into separate files so that we can easily deal with
CONFIG_CPU_SUP_* things, needed to make things build now that perf_event.c
relies on symbols from amd.c

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-26 15:44:04 +01:00
Arnaldo Carvalho de Melo 48fb4fdd6b perf annotate: Handle samples not at objdump output addr boundaries
Without this patch we get this for need_resched:

[root@mica ~]# perf annotate need_resched

------------------------------------------------
 Percent |      Source code & Disassembly of vmlinux
------------------------------------------------
         :
         :
         :      Disassembly of section .text:
         :
         :      ffffffff810095ed <need_resched>:
         :              return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
         :      }
         :
         :      static inline int need_resched(void)
         :      {
    0.00 :      ffffffff810095ed:       55                      push   %rbp
         :              return unlikely(test_thread_flag(TIF_NEED_RESCHED));
    0.00 :      ffffffff810095ee:       be 03 00 00 00          mov    $0x3,%esi
         :
         :      static inline struct thread_info *current_thread_info(void)
         :      {
         :              struct thread_info *ti;
         :              ti = (void *)(percpu_read_stable(kernel_stack) +
    0.00 :      ffffffff810095f3:       65 48 8b 3c 25 48 b5    mov    %gs:0xb548,%rdi
    0.00 :      ffffffff810095fa:       00 00
         :              return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
         :      }
         :
         :      static inline int need_resched(void)
         :      {
    0.00 :      ffffffff810095fc:       48 89 e5                mov    %rsp,%rbp
         :              return unlikely(test_thread_flag(TIF_NEED_RESCHED));
    0.00 :      ffffffff810095ff:       48 81 ef d8 1f 00 00    sub    $0x1fd8,%rdi
    0.00 :      ffffffff81009606:       e8 9d ff ff ff          callq  ffffffff810095a8 <test_ti_thread_flag>
         :      }
    0.00 :      ffffffff8100960b:       c9                      leaveq
    0.00 :      ffffffff8100960c:       85 c0                   test   %eax,%eax
    0.00 :      ffffffff8100960e:       0f 95 c0                setne  %al
    0.00 :      ffffffff81009611:       0f b6 c0                movzbl %al,%eax
         :      Disassembly of section .vsyscall_0:
         :      Disassembly of section .vsyscall_fn:
         :      Disassembly of section .vsyscall_1:
         :      Disassembly of section .vsyscall_2:
         :      Disassembly of section .init.text:
         :      Disassembly of section .altinstr_replacement:
         :      Disassembly of section .exit.text:
[root@mica ~]#

But from the 'perf report' result we know that there are hits
for need_resched on a 4 way machine mostly doing nothing, so
after adding code to show what is in each hist offset and
collapsing IP hits for what happens between objdump lines we
get, for the same perf.data file:

[root@mica ~]# perf annotate -v need_resched

------------------------------------------------
 Percent |      Source code & Disassembly of vmlinux
------------------------------------------------
         :
         :
         :      Disassembly of section .text:
         :
         :      ffffffff810095ed <need_resched>:
         :              return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
         :      }
         :
         :      static inline int need_resched(void)
         :      {
    0.00 :      ffffffff810095ed:       55                      push   %rbp
         :              return unlikely(test_thread_flag(TIF_NEED_RESCHED));
   52.78 :      ffffffff810095ee:       be 03 00 00 00          mov    $0x3,%esi
         :
         :      static inline struct thread_info *current_thread_info(void)
         :      {
         :              struct thread_info *ti;
         :              ti = (void *)(percpu_read_stable(kernel_stack) +
    0.00 :      ffffffff810095f3:       65 48 8b 3c 25 48 b5    mov    %gs:0xb548,%rdi
    0.00 :      ffffffff810095fa:       00 00
         :              return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
         :      }
         :
         :      static inline int need_resched(void)
         :      {
    0.00 :      ffffffff810095fc:       48 89 e5                mov    %rsp,%rbp
         :              return unlikely(test_thread_flag(TIF_NEED_RESCHED));
    9.72 :      ffffffff810095ff:       48 81 ef d8 1f 00 00    sub    $0x1fd8,%rdi
    0.00 :      ffffffff81009606:       e8 9d ff ff ff          callq  ffffffff810095a8 <test_ti_thread_flag>
         :      }
    0.00 :      ffffffff8100960b:       c9                      leaveq
    0.00 :      ffffffff8100960c:       85 c0                   test   %eax,%eax
   37.50 :      ffffffff8100960e:       0f 95 c0                setne  %al
    0.00 :      ffffffff81009611:       0f b6 c0                movzbl %al,%eax
         :      Disassembly of section .vsyscall_0:
         :      Disassembly of section .vsyscall_fn:
         :      Disassembly of section .vsyscall_1:
         :      Disassembly of section .vsyscall_2:
         :      Disassembly of section .init.text:
         :      Disassembly of section .altinstr_replacement:
         :      Disassembly of section .exit.text:
[root@mica ~]#

And now 'perf annotate -v', verbose mode, will show the hits per
precise IP, so that one can make sense of the attribution to
each objdumop line:

[root@mica ~]# perf annotate -v need_resched
Looking at the vmlinux_path (5 entries long)
Using /lib/modules/2.6.33-rc8-tip-00784-g3471df5-dirty/build/vmlinux
for symbols annotate_sym: filename=/lib/modules/2.6.33-rc8-tip-00784-g3471df5-dirty/build/vmlinux, sym=need_resched, start=0xffffffff810095ed, end=0xffffffff81009614

------------------------------------------------
 Percent |      Source code & Disassembly of vmlinux
------------------------------------------------
                ffffffff810095f1: 152
                ffffffff81009603: 28
                ffffffff8100960f: 55
                ffffffff81009610: 53
                          h->sum: 288
<SNIP same annotation>

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1267194194-15670-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-26 15:42:49 +01:00
Robert Richter cfc9c0b450 oprofile/x86: fix msr access to reserved counters
During switching virtual counters there is access to perfctr msrs. If
the counter is not available this fails due to an invalid
address. This patch fixes this.

Cc: stable@kernel.org
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-02-26 15:28:16 +01:00
Robert Richter c17c8fbf34 oprofile/x86: use kzalloc() instead of kmalloc()
Cc: stable@kernel.org
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-02-26 15:20:03 +01:00
Robert Richter 68dc819ce8 oprofile/x86: fix perfctr nmi reservation for mulitplexing
Multiple virtual counters share one physical counter. The reservation
of virtual counters fails due to duplicate allocation of the same
counter. The counters are already reserved. Thus, virtual counter
reservation may removed at all. This also makes the code easier.

Cc: stable@kernel.org
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-02-26 15:19:03 +01:00
Naga Chumbalkar 8588d10671 oprofile/x86: add comment to counter-in-use warning
Currently, oprofile fails silently on platforms where a non-OS entity
such as the system firmware "enables" and uses a performance
counter. There is a warning in the code for this case.

The warning indicates an already running counter. If oprofile doesn't
collect data, then try using a different performance counter on your
platform to monitor the desired event. Delete the counter from the
desired event by editing the

 /usr/share/oprofile/<cpu_type>/<cpu>/events

file. If the event cannot be monitored by any other counter, contact
your hardware or BIOS vendor.

Cc: Shashi Belur <shashi-kiran.belur@hp.com>
Cc: Tony Jones <tonyj@suse.de>
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-02-26 15:14:34 +01:00
Robert Richter 98a2e73a06 oprofile/x86: warn user if a counter is already active
This patch generates a warning if a counter is already active.

Implemented for AMD and P6 models. P4 is not supported.

Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Shashi Belur <shashi-kiran.belur@hp.com>
Cc: Tony Jones <tonyj@suse.de>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-02-26 15:14:03 +01:00
Robert Richter ba52078e19 oprofile/x86: implement randomization for IBS periodic op counter
IBS selects an op (execution operation) for sampling by counting
either cycles or dispatched ops. Better statistical samples can be
produced by adding a software generated random offset to the periodic
op counter value with each sample.

This patch adds software randomization to the IBS periodic op
counter. The lower 12 bits of the 20 bit counter are
randomized. IbsOpCurCnt is initialized with a 12 bit random value.

There is a work around if the hw can not write to IbsOpCurCnt. Then
the lower 8 bits of the 16 bit IbsOpMaxCnt [15:0] value are randomized
in the range of -128 to +127 by adding/subtracting an offset to the
maximum count (IbsOpMaxCnt).

The linear feedback shift register (LFSR) algorithm is used for
pseudo-random number generation to have low impact to the memory
system.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-02-26 15:14:02 +01:00
Suravee Suthikulpanit f125be1469 oprofile/x86: implement lsfr pseudo-random number generator for IBS
This patch implements a linear feedback shift register (LFSR) for
pseudo-random number generation for IBS.

For IBS measurements it would be good to minimize memory traffic in
the interrupt handler since every access pollutes the data
caches. Computing a maximal period LFSR just needs shifts and ORs.

The LFSR method is good enough to randomize the ops at low
overhead. 16 pseudo-random bits are enough for the implementation and
it doesn't matter that the pattern repeats with a fairly short
cycle. It only needs to break up (hard) periodic sampling behavior.

The logic was designed by Paul Drongowski.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-02-26 15:14:02 +01:00
Robert Richter 64683da664 oprofile/x86: implement IBS cpuid feature detection
This patch adds IBS feature detection using cpuid flags. An IBS
capability mask is introduced to test for certain IBS features. The
bit mask is the same as for IBS cpuid feature flags (Fn8000_001B_EAX),
but bit 0 is used to indicate the existence of IBS.

The patch also changes the handling of the IbsOpCntCtl bit (periodic
op counter count control). The oprofilefs file for this feature
(ibs_op/dispatched_ops) will be only exposed if the feature is
available, also the default for the bit is set to count clock cycles.

In general, the userland can detect the availability of a feature by
checking for the corresponding file in oprofilefs. If it exists, the
feature also exists. This may lead to a dynamic file layout depending
on the cpu type with that the userland has to deal with. Current
opcontrol is compatible.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-02-26 15:14:02 +01:00
Robert Richter 89baaaa98a oprofile/x86: remove node check in AMD IBS initialization
Standard AMD systems have the same number of nodes as there are
northbridge devices. However, there may kernel configurations
(especially for 32 bit) or system setups exist, where the node number
is different or it can not be detected properly. Thus the check is not
reliable and may fail though IBS setup was fine. For this reason it is
better to remove the check.

Cc: stable <stable@kernel.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-02-26 15:14:01 +01:00
Robert Richter 013cfc5067 oprofile/x86: remove OPROFILE_IBS config option
OProfile support for IBS is now for several versions in the
kernel. The feature is stable now and the code can be activated
permanently.

As a side effect IBS now works also on nosmp configs.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-02-26 15:13:55 +01:00