Commit Graph

6872 Commits

Author SHA1 Message Date
Ravi Bangoria
99e608b595 perf probe ppc64le: Fix probe location when using DWARF
Powerpc has Global Entry Point and Local Entry Point for functions.  LEP
catches call from both the GEP and the LEP. Symbol table of ELF contains
GEP and Offset from which we can calculate LEP, but debuginfo does not
have LEP info.

Currently, perf prioritize symbol table over dwarf to probe on LEP for
ppc64le. But when user tries to probe with function parameter, we fall
back to using dwarf(i.e. GEP) and when function called via LEP, probe
will never hit.

For example:

  $ objdump -d vmlinux
    ...
    do_sys_open():
    c0000000002eb4a0:       e8 00 4c 3c     addis   r2,r12,232
    c0000000002eb4a4:       60 00 42 38     addi    r2,r2,96
    c0000000002eb4a8:       a6 02 08 7c     mflr    r0
    c0000000002eb4ac:       d0 ff 41 fb     std     r26,-48(r1)

  $ sudo ./perf probe do_sys_open
  $ sudo cat /sys/kernel/debug/tracing/kprobe_events
    p:probe/do_sys_open _text+3060904

  $ sudo ./perf probe 'do_sys_open filename:string'
  $ sudo cat /sys/kernel/debug/tracing/kprobe_events
    p:probe/do_sys_open _text+3060896 filename_string=+0(%gpr4):string

For second case, perf probed on GEP. So when function will be called via
LEP, probe won't hit.

  $ sudo ./perf record -a -e probe:do_sys_open ls
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.195 MB perf.data ]

To resolve this issue, let's not prioritize symbol table, let perf
decide what it wants to use. Perf is already converting GEP to LEP when
it uses symbol table. When perf uses debuginfo, let it find LEP offset
form symbol table. This way we fall back to probe on LEP for all cases.

After patch:

  $ sudo ./perf probe 'do_sys_open filename:string'
  $ sudo cat /sys/kernel/debug/tracing/kprobe_events
    p:probe/do_sys_open _text+3060904 filename_string=+0(%gpr4):string

  $ sudo ./perf record -a -e probe:do_sys_open ls
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.197 MB perf.data (11 samples) ]

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1470723805-5081-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 12:14:29 -03:00
Ravi Bangoria
d820456dc7 perf probe: Add function to post process kernel trace events
Instead of inline code, introduce function to post process kernel
probe trace events.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1470723805-5081-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 12:09:59 -03:00
Naohiro Aota
19f00b0117 perf probe: Support signedness casting
The 'perf probe' tool detects a variable's type and use the detected
type to add a new probe. Then, kprobes prints its variable in
hexadecimal format if the variable is unsigned and prints in decimal if
it is signed.

We sometimes want to see unsigned variable in decimal format (i.e.
sector_t or size_t). In that case, we need to investigate the variable's
size manually to specify just signedness.

This patch add signedness casting support. By specifying "s" or "u" as a
type, perf-probe will investigate variable size as usual and use the
specified signedness.

E.g. without this:

  $ perf probe -a 'submit_bio bio->bi_iter.bi_sector'
  Added new event:
    probe:submit_bio     (on submit_bio with bi_sector=bio->bi_iter.bi_sector)
  You can now use it in all perf tools, such as:
          perf record -e probe:submit_bio -aR sleep 1
  $ cat trace_pipe|head
          dbench-9692  [003] d..1   971.096633: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x3a3d00
          dbench-9692  [003] d..1   971.096685: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x1a3d80
          dbench-9692  [003] d..1   971.096687: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x3a3d80
...
  // need to investigate the variable size
  $ perf probe -a 'submit_bio bio->bi_iter.bi_sector:s64'
  Added new event:
    probe:submit_bio     (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s64)
  You can now use it in all perf tools, such as:
        perf record -e probe:submit_bio -aR sleep 1

  With this:

  // just use "s" to cast its signedness
  $ perf probe -v -a 'submit_bio bio->bi_iter.bi_sector:s'
  Added new event:
    probe:submit_bio     (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s)
  You can now use it in all perf tools, such as:
          perf record -e probe:submit_bio -aR sleep 1
  $ cat trace_pipe|head
          dbench-9689  [001] d..1  1212.391237: submit_bio: (submit_bio+0x0/0x140) bi_sector=128
          dbench-9689  [001] d..1  1212.391252: submit_bio: (submit_bio+0x0/0x140) bi_sector=131072
          dbench-9697  [006] d..1  1212.398611: submit_bio: (submit_bio+0x0/0x140) bi_sector=30208

  This commit also update perf-probe.txt to describe "types". Most parts
  are based on existing documentation: Documentation/trace/kprobetrace.txt

Committer note:

Testing using 'perf trace':

  # perf probe -a 'submit_bio bio->bi_iter.bi_sector'
  Added new event:
    probe:submit_bio     (on submit_bio with bi_sector=bio->bi_iter.bi_sector)

  You can now use it in all perf tools, such as:

	perf record -e probe:submit_bio -aR sleep 1

  # trace --no-syscalls --ev probe:submit_bio
      0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=0xc133c0)
   3181.861 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffb8)
   3181.881 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffc0)
   3184.488 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffc8)
<SNIP>
   4717.927 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x4dc7a88)
   4717.970 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x4dc7880)
  ^C[root@jouet ~]#

Now, using this new feature:

[root@jouet ~]# perf probe -a 'submit_bio bio->bi_iter.bi_sector:s'
Added new event:
  probe:submit_bio     (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s)

You can now use it in all perf tools, such as:

	perf record -e probe:submit_bio -aR sleep 1

  [root@jouet ~]# trace --no-syscalls --ev probe:submit_bio
     0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145704)
     0.017 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145712)
     0.019 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145720)
     2.567 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145728)
  5631.919 probe:submit_bio:(ffffffffac3aee00) bi_sector=0)
  5631.941 probe:submit_bio:(ffffffffac3aee00) bi_sector=8)
  5631.945 probe:submit_bio:(ffffffffac3aee00) bi_sector=16)
  5631.948 probe:submit_bio:(ffffffffac3aee00) bi_sector=24)
  ^C#

With callchains:

  # trace --no-syscalls --ev probe:submit_bio/max-stack=10/
     0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662544)
                                       submit_bio+0xa8200001 ([kernel.kallsyms])
                                       submit_bh+0xa8200013 ([kernel.kallsyms])
                                       jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms])
                                       kjournald2+0xa82000ca ([kernel.kallsyms])
                                       kthread+0xa82000d8 ([kernel.kallsyms])
                                       ret_from_fork+0xa820001f ([kernel.kallsyms])
     0.023 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662552)
                                       submit_bio+0xa8200001 ([kernel.kallsyms])
                                       submit_bh+0xa8200013 ([kernel.kallsyms])
                                       jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms])
                                       kjournald2+0xa82000ca ([kernel.kallsyms])
                                       kthread+0xa82000d8 ([kernel.kallsyms])
                                       ret_from_fork+0xa820001f ([kernel.kallsyms])
     0.027 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662560)
                                       submit_bio+0xa8200001 ([kernel.kallsyms])
                                       submit_bh+0xa8200013 ([kernel.kallsyms])
                                       jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms])
                                       kjournald2+0xa82000ca ([kernel.kallsyms])
                                       kthread+0xa82000d8 ([kernel.kallsyms])
                                       ret_from_fork+0xa820001f ([kernel.kallsyms])
     2.593 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662568)
                                       submit_bio+0xa8200001 ([kernel.kallsyms])
                                       submit_bh+0xa8200013 ([kernel.kallsyms])
                                       journal_submit_commit_record+0xa82001ac ([kernel.kallsyms])
                                       jbd2_journal_commit_transaction+0xa82012e8 ([kernel.kallsyms])
                                       kjournald2+0xa82000ca ([kernel.kallsyms])
                                       kthread+0xa82000d8 ([kernel.kallsyms])
                                       ret_from_fork+0xa820001f ([kernel.kallsyms])
  ^C#

Signed-off-by: Naohiro Aota <naohiro.aota@hgst.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1470710408-23515-1-git-send-email-naohiro.aota@hgst.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 10:52:22 -03:00
Mark Rutland
3df33eff2b perf stat: Avoid skew when reading events
When we don't have a tracee (i.e. we're attaching to a task or CPU),
counters can still be running after our workload finishes, and can still
be running as we read their values. As we read events one-by-one, there
can be arbitrary skew between values of events, even within a group.
This means that ratios within an event group are not reliable.

This skew can be seen if measuring a group of identical events, e.g:

  # perf stat -a -C0 -e '{cycles,cycles}' sleep 1

To avoid this, we must stop groups from counting before we read the
values of any constituent events. This patch adds and makes use of a new
disable_counters() helper, which disables group leaders (and thus each
group as a whole). This mirrors the use of enable_counters() for
starting event groups in the absence of a tracee.

Closing a group leader splits the group, and without a disabled group
leader the newly split events will begin counting. Thus to ensure counts
are reliable we must defer closing group leaders until all counts have
been read. To do so this patch removes the event closing logic from the
read_counters() helper, explicitly closes the events using
perf_evlist__close(), which also aids legibility.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1470747869-3567-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 10:48:32 -03:00
Konstantin Khlebnikov
cb3f3378cd perf probe: Fix module name matching
If module is "module" then dso->short_name is "[module]".  Substring
comparing is't enough: "raid10" matches to "[raid1]".  This patch also
checks terminating zero in module name.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: http://lkml.kernel.org/r/147039975648.715620.12985971832789032159.stgit@buzz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 10:48:09 -03:00
Masami Hiramatsu
8e34189b34 perf probe: Adjust map->reloc offset when finding kernel symbol from map
Adjust map->reloc offset for the unmapped address when finding
alternative symbol address from map, because KASLR can relocate the
kernel symbol address.

The same adjustment has been done when finding appropriate kernel symbol
address from map which was introduced by commit f90acac757 ("perf
probe: Find given address from offline dwarf")

Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu@linaro.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20160806192948.e366f3fbc4b194de600f8326@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 10:47:43 -03:00
Arnaldo Carvalho de Melo
887fa86d6f perf hists: Trim libtraceevent trace_seq buffers
When we use libtraceevent to format trace event fields into printable
strings to use in hist entries it is important to trim it from the
default 4 KiB it starts with to what is really used, to reduce the
memory footprint, so use realloc(seq.buffer, seq.len + 1) when returning
the seq.buffer formatted with the fields contents.

Reported-and-Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/n/tip-t3hl7uxmilrkigzmc90rlhk2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 10:46:56 -03:00
Brendan Gregg
bcdc09af3e perf script: Add 'bpf-output' field to usage message
This adds the 'bpf-output' field to the perf script usage message, and docs.

Signed-off-by: Brendan Gregg <bgregg@netflix.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1470192469-11910-4-git-send-email-bgregg@netflix.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-09 10:46:43 -03:00
Ingo Molnar
f282f7a0ec Merge tag 'perf-core-for-mingo-20160803' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

New features:

- Add --sample-cpu to 'perf record', to explicitely ask for sampling
  the CPU (Jiri Olsa)

Fixes:

- Fix processing of multi byte chunks in objdump output, fixing
  disassemble processing for annotation on at least ARM64 (Jan Stancek)

- Use SyS_epoll_wait in a BPF 'perf test' entry instead of sys_epoll_wait, that
  is not present in the DWARF info in vmlinux files (Arnaldo Carvalho de Melo)

- Add -wno-shadow when processing files using perl headers, fixing
  the build on Fedora Rawhide and Arch Linux (Namhyung Kim)

Infrastructure changes:

- Annotate prep work to better catch and report errors related to
  using objdump to disassemble DSOs (Arnaldo Carvalho de Melo)

- Add 'alloc', 'scnprintf' and 'and' methods for bitmap processing (Jiri Olsa)

- Add nested output resorting callback in hists processing (Jiri Olsa)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-04 11:02:38 +02:00
Arnaldo Carvalho de Melo
c369e0a1a8 perf tests bpf: Use SyS_epoll_wait alias
Something made the sys_epoll_wait() function alias not to be found in
the vmlinux DWARF info, being found only in /proc/kallsyms, which made
the BPF perf tests to fail:

  [root@jouet ~]# perf test BPF
  37: Test BPF filter                                          :
  37.1: Test basic BPF filtering                               : FAILED!
  37.2: Test BPF prologue generation                           : Skip
  37.3: Test BPF relocation checker                            : Skip
  [root@jouet ~]#

Using -v we can see it is failing to find DWARF info for the probed function,
sys_epoll_wait, which we can find in /proc/kallsyms but not in vmlinux with
CONFIG_DEBUG_INFO:

  [root@jouet ~]# grep -w sys_epoll_wait /proc/kallsyms
  ffffffffbd295b50 T sys_epoll_wait
  [root@jouet ~]#

  [root@jouet ~]# readelf -wi /lib/modules/4.7.0+/build/vmlinux | grep -w sys_epoll_wait
  [root@jouet ~]#

If we try to use perf probe:

[root@jouet ~]# perf probe sys_epoll_wait
Failed to find debug information for address ffffffffbd295b50
Probe point 'sys_epoll_wait' not found.
  Error: Failed to add events.
[root@jouet ~]#

It all works if we use SyS_epoll_wait, that is just an alias to the probed
function:

  [root@jouet ~]# grep -i sys_epoll_wait /proc/kallsyms
  ffffffffbd295b50 T SyS_epoll_wait
  ffffffffbd295b50 T sys_epoll_wait
  [root@jouet ~]#

So use it:

  [root@jouet ~]# perf test BPF
  37: Test BPF filter                                          :
  37.1: Test basic BPF filtering                               : Ok
  37.2: Test BPF prologue generation                           : Ok
  37.3: Test BPF relocation checker                            : Ok
  [root@jouet ~]#

Further info:

  [root@jouet ~]# gcc --version
  gcc (GCC) 6.1.1 20160621 (Red Hat 6.1.1-3)
  [acme@jouet linux]$ cat /etc/fedora-release
  Fedora release 24 (Twenty Four)

Investigation as to why it fails is still underway, but it was always
going from sys_epoll_wait to SyS_epoll_wait when looking up the DWARF
info in vmlinux, and this is what is breaking now.

Switching to use SyS_epoll_wait allows this test to proceed and test the
BPF code it was designed for, so lets have this in to allow passing this
test while we fix the root cause.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-7hekjp0bodwjbb419sl2b55h@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-03 19:40:48 -03:00
Jan Stancek
b2d0dbf097 perf tests: objdump output can contain multi byte chunks
objdump's raw insn output can vary across architectures on the number of
bytes per chunk (bpc) displayed and their endianness.

The code-reading test relied on reading objdump output as 1 bpc. Kaixu
Xia reported test failure on ARM64, where objdump displays 4 bpc:

  70c48:        f90027bf         str        xzr, [x29,#72]
  70c4c:        91224000         add        x0, x0, #0x890
  70c50:        f90023a0         str        x0, [x29,#64]

This patch adds support to read raw insn output for any bpc length.
In case of 2+ bpc it also guesses objdump's display endian.

Reported-and-Tested-by: Kaixu Xia <xiakaixu@huawei.com>
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/07f0f7bcbda78deb423298708ef9b6a54d6b92bd.1452592712.git.jstancek@redhat.com
[ Fix up pr_fmt() call to use %zd for size_t variables, fixing the build on Ubuntu cross-compiling to armhf and ppc64 ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-02 16:42:51 -03:00
Jiri Olsa
b6f35ed774 perf record: Add --sample-cpu option
Adding --sample-cpu option to be able to explicitly enable CPU sample
type. Currently it's only enable implicitly in case the target is cpu
related.

It will be useful for following c2c record tool.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1470074555-24889-8-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-02 16:33:29 -03:00
Jiri Olsa
52c5cc363f perf hists: Introduce output_resort_cb method
When dealing with nested hist entries it's helpful to have a way to
resort those nested objects.

Adding optional callback call into output_resort function and following
new interface function:

  typedef int (*hists__resort_cb_t)(struct hist_entry *he);

  void hists__output_resort_cb(struct hists *hists,
                               struct ui_progress *prog,
                               hists__resort_cb_t cb);

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1470074555-24889-7-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-02 16:33:28 -03:00
Jiri Olsa
4842576cd8 perf tools: Move config/Makefile into Makefile.config
There's no reason to keep it in separate directory now when we moved out
the rest of the files.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1470074555-24889-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-02 16:33:28 -03:00
Jiri Olsa
ff3e33b075 perf tests: Add test for bitmap_scnprintf function
Automatically test the bitmap_scnprintf function.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1470074555-24889-5-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-02 16:33:27 -03:00
Namhyung Kim
b581c01fff perf tools: Fix build failure on perl script context
On my Archlinux machine, perf faild to build like below:

    CC       scripts/perl/Perf-Trace-Util/Context.o
  In file included from /usr/lib/perl5/core/perl/CORE/perl.h:3905:0,
                   from Context.xs:23:
  /usr/lib/perl5/core/perl/CORE/inline.h: In function :
  /usr/lib/perl5/core/perl/CORE/cop.h:612:13: warning: declaration of 'av'
                                  shadows a previous local [-Werror-shadow]
             AV *av =3D GvAV(PL_defgv);
                 ^
  /usr/lib/perl5/core/perl/CORE/inline.h:526:5: note: in expansion of
                                  macro 'CX_POP_SAVEARRAY'
         CX_POP_SAVEARRAY(cx);
         ^~~~~~~~~~~~~~~~
  In file included from /usr/lib/perl5/core/perl/CORE/perl.h:5853:0,
                   from Context.xs:23:
  /usr/lib/perl5/core/perl/CORE/inline.h:518:9: note:
                                  shadowed declaration is here
         AV *av;
             ^~

What I did to fix is adding '-Wno-shadow' as the error message said it's
the cause of the failure.  Since it's from the perl (not perf) code
base, we don't have the control so I just wanted to ignore the warning
when compiling perl scripting code.

Committer note:

This also fixes the build on Fedora Rawhide.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20160802024317.31725-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-02 12:11:06 -03:00
Arnaldo Carvalho de Melo
c17c17e8c2 perf annotate: Plug filename string leak
If dso__build_id_filename(..., NULL, ...) returns !NULL its because it
allocated it, so, when reaching the  'if (dso__is_kcore()) test, we
already checked that and were just "fallbacking" to using
dso->long_name, but without freeing filename, thus leaking it.

Fix it by adding the dso__is_kcore() test to the 'or' group just after
it, the one containing the full fallback code, including freeing the
filename.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: ee205503f2 ("perf tools: Fix annotation with kcore")
Link: http://lkml.kernel.org/n/tip-qi4rpjq8yo6myvg99kkgt0xz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-01 18:49:13 -03:00
Arnaldo Carvalho de Melo
ee51d85139 perf annotate: Introduce strerror for handling symbol__disassemble() errors
We were just using pr_error() which makes it difficult for non stdio UIs
to provide errors using its widgets, as they need to somehow catch what
was passed to pr_error().

Fix it by introducing a __strerror() interface like the ones used
elsewhere, for instance target__strerror().

This is just the initial step, more work will be done, but first some
error handling bugs noticed while working on this need to be dealt with.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-dgd22zl2xg7x4vcnoa83jxfb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-01 18:18:16 -03:00
Arnaldo Carvalho de Melo
5cb725a972 perf annotate: Rename symbol__annotate() to symbol__disassemble()
This function will not annotate anything, it will just disassembly the
given map->dso and symbol.

It currently does this by parsing the output of 'objdump --disassemble',
but this could conceivably be done using a library or an offshot of
the kernel's instruction decoder (arch/x86/lib/inat.c), etc.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-2xpfl4bfnrd6x584b390qok7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-01 17:06:46 -03:00
Linus Torvalds
7f7d556496 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "This update contains:

   - a fix for the bpf tools to use the new EM_BPF code

   - a fix for the module parser of perf to retrieve the
     proper text start address

   - add str_error_c to libapi to avoid linking against
     tools/lib/str_error_r.o"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tools lib api: Add str_error_c to libapi
  perf s390: Fix 'start' address of module's map
  tools lib bpf: Use official ELF e_machine value
2016-07-30 12:11:36 -07:00
Arnaldo Carvalho de Melo
ce92834407 perf target: str_error_r() always returns the buffer it receives
So no need for checking if it uses the strerror_r() GNU variant error
reporting mechanism, i.e. if it returns a pointer to a immutable string
internal to glibc.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: c8b5f2c96d ("tools: Introduce str_error_r()")
Link: http://lkml.kernel.org/n/tip-xr83cd4y4r3cn6tq6w4f59jb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-29 11:54:35 -03:00
Arnaldo Carvalho de Melo
9955d0be16 perf annotate: Use pipe + fork instead of popen
We will need to redirect the stderr as well, so open code popen as
a starting point.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-k0zt9svg4bswiglem7ornts4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-29 11:12:39 -03:00
Linus Torvalds
f0c98ebc57 Merge tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:

 - Replace pcommit with ADR / directed-flushing.

   The pcommit instruction, which has not shipped on any product, is
   deprecated.  Instead, the requirement is that platforms implement
   either ADR, or provide one or more flush addresses per nvdimm.

   ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
   to the memory controller on a power-fail event.

   Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
   Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
   A flush hint is an mmio address that when written and fenced assures
   that all previous posted writes targeting a given dimm have been
   flushed to media.

 - On-demand ARS (address range scrub).

   Linux uses the results of the ACPI ARS commands to track bad blocks
   in pmem devices.  When latent errors are detected we re-scrub the
   media to refresh the bad block list, userspace can also request a
   re-scrub at any time.

 - Support for the Microsoft DSM (device specific method) command
   format.

 - Support for EDK2/OVMF virtual disk device memory ranges.

 - Various fixes and cleanups across the subsystem.

* tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
  libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
  nfit: do an ARS scrub on hitting a latent media error
  nfit: move to nfit/ sub-directory
  nfit, libnvdimm: allow an ARS scrub to be triggered on demand
  libnvdimm: register nvdimm_bus devices with an nd_bus driver
  pmem: clarify a debug print in pmem_clear_poison
  x86/insn: remove pcommit
  Revert "KVM: x86: add pcommit support"
  nfit, tools/testing/nvdimm/: unify shutdown paths
  libnvdimm: move ->module to struct nvdimm_bus_descriptor
  nfit: cleanup acpi_nfit_init calling convention
  nfit: fix _FIT evaluation memory leak + use after free
  tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
  tools/testing/nvdimm: add virtual ramdisk range
  acpi, nfit: treat virtual ramdisk SPA as pmem region
  pmem: kill __pmem address space
  pmem: kill wmb_pmem()
  libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
  fs/dax: remove wmb_pmem()
  libnvdimm, pmem: flush posted-write queues on shutdown
  ...
2016-07-28 17:38:16 -07:00
Vlastimil Babka
2516035499 mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
After the previous patch, we can distinguish costly allocations that
should be really lightweight, such as THP page faults, with
__GFP_NORETRY.  This means we don't need to recognize khugepaged
allocations via PF_KTHREAD anymore.  We can also change THP page faults
in areas where madvise(MADV_HUGEPAGE) was used to try as hard as
khugepaged, as the process has indicated that it benefits from THP's and
is willing to pay some initial latency costs.

We can also make the flags handling less cryptic by distinguishing
GFP_TRANSHUGE_LIGHT (no reclaim at all, default mode in page fault) from
GFP_TRANSHUGE (only direct reclaim, khugepaged default).  Adding
__GFP_NORETRY or __GFP_KSWAPD_RECLAIM is done where needed.

The patch effectively changes the current GFP_TRANSHUGE users as
follows:

* get_huge_zero_page() - the zero page lifetime should be relatively
  long and it's shared by multiple users, so it's worth spending some
  effort on it.  We use GFP_TRANSHUGE, and __GFP_NORETRY is not added.
  This also restores direct reclaim to this allocation, which was
  unintentionally removed by commit e4a49efe4e7e ("mm: thp: set THP defrag
  by default to madvise and add a stall-free defrag option")

* alloc_hugepage_khugepaged_gfpmask() - this is khugepaged, so latency
  is not an issue.  So if khugepaged "defrag" is enabled (the default), do
  reclaim via GFP_TRANSHUGE without __GFP_NORETRY.  We can remove the
  PF_KTHREAD check from page alloc.

  As a side-effect, khugepaged will now no longer check if the initial
  compaction was deferred or contended.  This is OK, as khugepaged sleep
  times between collapsion attempts are long enough to prevent noticeable
  disruption, so we should allow it to spend some effort.

* migrate_misplaced_transhuge_page() - already was masking out
  __GFP_RECLAIM, so just convert to GFP_TRANSHUGE_LIGHT which is
  equivalent.

* alloc_hugepage_direct_gfpmask() - vma's with VM_HUGEPAGE (via madvise)
  are now allocating without __GFP_NORETRY.  Other vma's keep using
  __GFP_NORETRY if direct reclaim/compaction is at all allowed (by default
  it's allowed only for madvised vma's).  The rest is conversion to
  GFP_TRANSHUGE(_LIGHT).

[mhocko@suse.com: suggested GFP_TRANSHUGE_LIGHT]
Link: http://lkml.kernel.org/r/20160721073614.24395-7-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Arnaldo Carvalho de Melo
7c48dcfd32 perf evsel: Introduce constructor for cycles event
That is the default used when no events is specified in tools, separate
it so that simpler tools that need no evlist can use it directly.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-67mwuthscwroz88x9pswcqyv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-28 18:33:20 -03:00