Commit Graph

638274 Commits

Author SHA1 Message Date
Masami Hiramatsu eebc509b20 perf probe: Fix --funcs to show correct symbols for offline module
Fix --funcs (-F) option to show correct symbols for offline module.
Since previous perf-probe uses machine__findnew_module_map() for offline
module, even if user passes a module file (with full path) which is for
other architecture, perf-probe always tries to load symbol map for
current kernel module.

This fix uses dso__new_map() to load the map from given binary as same
as a map for user applications.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/148350053478.19001.15435255244512631545.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-04 11:15:09 -03:00
Arnaldo Carvalho de Melo 7934c98a6e perf symbols: Robustify reading of build-id from sysfs
Markus reported that perf segfaults when reading /sys/kernel/notes from
a kernel linked with GNU gold, due to what looks like a gold bug, so do
some bounds checking to avoid crashing in that case.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Report-Link: http://lkml.kernel.org/r/20161219161821.GA294@x4
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-ryhgs6a6jxvz207j2636w31c@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-03 16:11:13 -03:00
Arnaldo Carvalho de Melo 30a9c64448 perf tools: Install tools/lib/traceevent plugins with install-bin
Those are binaries as well, so should be installed by:

  make -C tools/perf install-bin'

too.

Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-3841b37u05evxrs1igkyu6ks@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-03 16:11:13 -03:00
Daniel Bristot de Oliveira 074859184d tools lib traceevent: Fix prev/next_prio for deadline tasks
Currently, the sched:sched_switch tracepoint reports deadline tasks with
priority -1. But when reading the trace via perf script I've got the
following output:

  # ./d & # (d is a deadline task, see [1])
  # perf record -e sched:sched_switch -a sleep 1
  # perf script
      ...
         swapper     0 [000]  2146.962441: sched:sched_switch: swapper/0:0 [120] R ==> d:2593 [4294967295]
               d  2593 [000]  2146.972472: sched:sched_switch: d:2593 [4294967295] R ==> g:2590 [4294967295]

The task d reports the wrong priority [4294967295]. This happens because
the "int prio" is stored in an unsigned long long val. Although it is
set as a %lld, as int is shorter than unsigned long long,
trace_seq_printf prints it as a positive number.

The fix is just to cast the val as an int, and print it as a %d,
as in the sched:sched_switch tracepoint's "format".

The output with the fix is:

  # ./d &
  # perf record -e sched:sched_switch -a sleep 1
  # perf script
      ...
         swapper     0 [000]  4306.374037: sched:sched_switch: swapper/0:0 [120] R ==> d:10941 [-1]
               d 10941 [000]  4306.383823: sched:sched_switch: d:10941 [-1] R ==> swapper/0:0 [120]

[1] d.c

 ---
  #include <stdio.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/types.h>
  #include <linux/sched.h>

  struct sched_attr {
	__u32 size, sched_policy;
	__u64 sched_flags;
	__s32 sched_nice;
	__u32 sched_priority;
	__u64 sched_runtime, sched_deadline, sched_period;
  };

  int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags)
  {
	return syscall(__NR_sched_setattr, pid, attr, flags);
  }

  int main(void)
  {
	struct sched_attr attr = {
		.size		= sizeof(attr),
		.sched_policy	= SCHED_DEADLINE, /* This creates a 10ms/30ms reservation */
		.sched_runtime	= 10 * 1000 * 1000,
		.sched_period	= attr.sched_deadline = 30 * 1000 * 1000,
	};

	if (sched_setattr(0, &attr, 0) < 0) {
		perror("sched_setattr");
		return -1;
	}

	for(;;);
  }
 ---

Committer notes:

Got the program from the provided URL, http://bristot.me/lkml/d.c,
trimmed it and included in the cset log above, so that we have
everything needed to test it in one place.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/866ef75bcebf670ae91c6a96daa63597ba981f0d.1483443552.git.bristot@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-03 16:11:12 -03:00
Jiri Olsa 60437ac02f perf record: Fix --switch-output documentation and comment
There's no --signal-trigger option, also adding the code comment into
record man page.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483431600-19887-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-03 11:11:38 -03:00
Jiri Olsa efd2130711 perf record: Make __record_options static
There's no need for this one to be global.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483431600-19887-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-03 11:11:10 -03:00
Jiri Olsa b66fb1da5a tools lib subcmd: Add OPT_STRING_OPTARG_SET option
To allow string options with a default argument and variable set when
the option is used.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1483431600-19887-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-03 11:10:38 -03:00
Masami Hiramatsu 1f2ed153b9 perf probe: Fix to get correct modname from elf header
Since 'perf probe' supports cross-arch probes, it is possible to analyze
different arch kernel image which has different bits-per-long.

In that case, it fails to get the module name because it uses the
MOD_NAME_OFFSET macro based on the host machine bits-per-long, instead
of the target arch bits-per-long.

This fixes above issue by changing modname-offset based on the target
archs bit width. This is ok because linux kernel uses LP64 model on
64bit arch.

E.g. without this (on x86_64, and target module is arm32):

  $ perf probe -m build-arm/fs/configfs/configfs.ko -D configfs_lookup
  p:probe/configfs_lookup :configfs_lookup+0
                          ^-Here is an empty module name.

With this fix, you can see correct module name:

  $ perf probe -m build-arm/fs/configfs/configfs.ko -D configfs_lookup
  p:probe/configfs_lookup configfs:configfs_lookup+0

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/148337043836.6752.383495516397005695.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-02 14:09:17 -03:00
Arnaldo Carvalho de Melo b6f4c66704 samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@ovn.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-3awp0nv8tpnblatojmwjww7z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-28 10:47:13 -03:00
Arnaldo Carvalho de Melo ee12996c9d samples/bpf sock_example: Avoid getting ethhdr from two includes
To avoid the following build failure on Alpine Linux 3.4, that has
clang-3.8 with the bpf target:

    HOSTCC  samples/bpf/sock_example.o
  In file included from /usr/include/net/ethernet.h:10:0,
                   from /git/linux/samples/bpf/sock_example.h:7,
                   from /git/linux/samples/bpf/sock_example.c:30:
  /usr/include/netinet/if_ether.h:96:8: error: redefinition of 'struct
  ethhdr'
   struct ethhdr {
          ^
  In file included from /git/linux/samples/bpf/sock_example.c:26:0:
  ./usr/include/linux/if_ether.h:144:8: note: originally defined here
   struct ethhdr {
          ^
  scripts/Makefile.host:124: recipe for target
  'samples/bpf/sock_example.o' failed
  make[2]: *** [samples/bpf/sock_example.o] Error 1
  /git/linux/Makefile:1658: recipe for target 'samples/bpf/' failed

So include net/if_ether.h for the needs of sock_example.h, using the
same include that sock_example.c uses.

Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@ovn.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-m9avekl1b651qe1r1zd5tzz9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-27 21:49:17 -03:00
Namhyung Kim 9396c9cb0d perf sched timehist: Show total scheduling time
Show length of analyzed sample time and rate of idle task running.
This also takes care of time range given by --time option.

  $ perf sched timehist -sI | tail
  Samples do not have callchains.
  Idle stats:
      CPU  0 idle for    930.316  msec  ( 92.93%)
      CPU  1 idle for    963.614  msec  ( 96.25%)
      CPU  2 idle for    885.482  msec  ( 88.45%)
      CPU  3 idle for    938.635  msec  ( 93.76%)

      Total number of unique tasks: 118
  Total number of context switches: 2337
             Total run time (msec): 3718.048
      Total scheduling time (msec): 1001.131  (x 4)

Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20161222060350.17655-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-27 21:47:57 -03:00
Ingo Molnar 3705b97505 Merge tag 'perf-urgent-for-mingo-20161222' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

Fixes for 'perf sched timehist': (Namhyung Kim)

 - Define a larger initial alignment value for the COMM column and
   make it be more consistently honoured, for instance in the header.

 - Fix invalid period calculation when using the --time option to
   select a time slice, when events outside that slice were being
   considered for the per cpu idle stats summary.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-23 20:23:29 +01:00
Namhyung Kim bdd75729e5 perf sched timehist: Fix invalid period calculation
When --time option is given with a value outside recorded time, the last
sample time (tprev) was set to that value and run time calculation might
be incorrect.  This is a problem of the first samples for each cpus
since it would skip the runtime update when tprev is 0.  But with --time
option it had non-zero (which is invalid) value so the calculation is
also incorrect.

For example, let's see the followging:

  $ perf sched timehist
             time    cpu  task name                       wait time  sch delay   run time
                          [tid/pid]                          (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  ---------  ---------  ---------
      3195.968367 [0003]  <idle>                              0.000      0.000      0.000
      3195.968386 [0002]  Timer[4306/4277]                    0.000      0.000      0.018
      3195.968397 [0002]  Web Content[4277]                   0.000      0.000      0.000
      3195.968595 [0001]  JS Helper[4302/4277]                0.000      0.000      0.000
      3195.969217 [0000]  <idle>                              0.000      0.000      0.621
      3195.969251 [0001]  kworker/1:1H[291]                   0.000      0.000      0.033

The sample starts at 3195.968367 but when I gave a time interval from
3194 to 3196 (in sec) it will calculate the whole 2 second as runtime.
In below, 2 cpus accounted it as runtime, other 2 cpus accounted it as
idle time.

Before:

  $ perf sched timehist --time 3194,3196 -s | tail
  Idle stats:
      CPU  0 idle for   1995.991  msec
      CPU  1 idle for     20.793  msec
      CPU  2 idle for     30.191  msec
      CPU  3 idle for   1999.852  msec

      Total number of unique tasks: 23
  Total number of context switches: 128
             Total run time (msec): 3724.940

After:

  $ perf sched timehist --time 3194,3196 -s | tail
  Idle stats:
      CPU  0 idle for     10.811  msec
      CPU  1 idle for     20.793  msec
      CPU  2 idle for     30.191  msec
      CPU  3 idle for     18.337  msec

      Total number of unique tasks: 23
  Total number of context switches: 128
             Total run time (msec): 18.139

Committer notes:

Further testing:

Before:

  Idle stats:
      CPU  0 idle for    229.785  msec
      CPU  1 idle for    937.944  msec
      CPU  2 idle for    188.931  msec
      CPU  3 idle for    986.185  msec

  After:

  # perf sched timehist --time 40602,40603 -s | tail

  Idle stats:
      CPU  0 idle for    229.785  msec
      CPU  1 idle for    175.407  msec
      CPU  2 idle for    188.931  msec
      CPU  3 idle for    223.657  msec

      Total number of unique tasks: 68
  Total number of context switches: 814
             Total run time (msec): 97.688

  # for cpu in `seq 0 3` ; do echo -n "CPU $cpu idle for " ; perf sched timehist --time 40602,40603 | grep "\[000${cpu}\].*\<idle\>" | tr -s ' ' | cut -d' ' -f7 | awk '{entries++ ; s+=$1} END {print s " msec (entries: " entries ")"}' ; done
  CPU 0 idle for 229.721 msec (entries: 123)
  CPU 1 idle for 175.381 msec (entries: 65)
  CPU 2 idle for 188.903 msec (entries: 56)
  CPU 3 idle for 223.61 msec (entries: 102)

Difference due to the idle stats being accounted at nanoseconds precision while
the <idle> entries in 'perf sched timehist' are trucated at msec.usec.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixes: 853b740711 ("perf sched timehist: Add option to specify time window of interest")
Link: http://lkml.kernel.org/r/20161222060350.17655-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-22 16:35:46 -03:00
Namhyung Kim 4fa0d1aa27 perf sched timehist: Remove hardcoded 'comm_width' check at print_summary
Now that the default 'comm_width' value is 30, no need to check that at
print_summary,

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-22 16:35:46 -03:00
Namhyung Kim 9b8087d720 perf sched timehist: Enlarge default 'comm_width'
Current default value is 20 but it's easily changed to a bigger value as
task has a long name and different tid and pid.  And it makes the output
not aligned.  So change it to have a large value as summary shows.

Committer notes:

Before:

  # perf sched record
  ^C
  # perf sched timehist
  <SNIP>
    40602.770537 [0001]  rcuos/2[29]               7.970      0.002      0.020
    40602.771512 [0003]  <idle>                    0.003      0.000      0.986
    40602.771586 [0001]  <idle>                    0.020      0.000      1.049
    40602.771606 [0001]  qemu-system-x86[3593/3510]      0.000      0.002      0.020
    40602.771629 [0003]  qemu-system-x86[3510]           0.000      0.003      0.116
    40602.771776 [0000]  <idle>                          0.001      0.000      1.892
  <SNIP>

After:

  # perf sched timehist
  <SNIP>
   40602.770537 [0001]  rcuos/2[29]                         7.970      0.002      0.020
   40602.771512 [0003]  <idle>                              0.003      0.000      0.986
   40602.771586 [0001]  <idle>                              0.020      0.000      1.049
   40602.771606 [0001]  qemu-system-x86[3593/3510]          0.000      0.002      0.020
   40602.771629 [0003]  qemu-system-x86[3510]               0.000      0.003      0.116
  <SNIP>

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-22 16:35:45 -03:00
Namhyung Kim 0e6758e823 perf sched timehist: Honour 'comm_width' when aligning the headers
Current default value is 20, but that may change in the future, so make
places where we have 20 hardcoded use 'comm_width'.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20161222060350.17655-1-namhyung@kernel.org
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-22 16:35:45 -03:00
Peter Zijlstra 1134c2b5cb perf/x86: Fix overlap counter scheduling bug
Jiri reported the overlap scheduling exceeding its max stack.

Looking at the constraint that triggered this, it turns out the
overlap marker isn't needed.

The comment with EVENT_CONSTRAINT_OVERLAP states: "This is the case if
the counter mask of such an event is not a subset of any other counter
mask of a constraint with an equal or higher weight".

Esp. that latter part is of interest here I think, our overlapping mask
is 0x0e, that has 3 bits set and is the highest weight mask in on the
PMU, therefore it will be placed last. Can we still create a scenario
where we would need to rewind that?

The scenario for AMD Fam15h is we're having masks like:

	0x3F -- 111111
	0x38 -- 111000
	0x07 -- 000111

	0x09 -- 001001

And we mark 0x09 as overlapping, because it is not a direct subset of
0x38 or 0x07 and has less weight than either of those. This means we'll
first try and place the 0x09 event, then try and place 0x38/0x07 events.
Now imagine we have:

	3 * 0x07 + 0x09

and the initial pick for the 0x09 event is counter 0, then we'll fail to
place all 0x07 events. So we'll pop back, try counter 4 for the 0x09
event, and then re-try all 0x07 events, which will now work.

The masks on the PMU in question are:

  0x01 - 0001
  0x03 - 0011
  0x0e - 1110
  0x0c - 1100

But since all the masks that have overlap (0xe -> {0xc,0x3}) and (0x3 ->
0x1) are of heavier weight, it should all work out.

Reported-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Liang Kan <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20161109155153.GQ3142@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-22 17:45:43 +01:00
Stephane Eranian daa864b8f8 perf/x86/pebs: Fix handling of PEBS buffer overflows
This patch solves a race condition between PEBS and the PMU handler.

In case multiple PEBS events are sampled at the same time,
it is possible to have GLOBAL_STATUS bit 62 set indicating
PEBS buffer overflow and also seeing at most 3 PEBS counters
having their bits set in the status register. This is a sign
that there was at least one PEBS record pending at the time
of the PMU interrupt. PEBS counters must only be processed
via the drain_pebs() calls, and not via the regular sample
processing loop coming after that the function, otherwise
phony regular samples may be generated in the sampling buffer
not marked with the EXACT tag.

Another possibility is to have one PEBS event and at least
one non-PEBS event whic hoverflows while PEBS has armed. In this
case, bit 62 of GLOBAL_STATUS will not be set, yet the overflow
status bit for the PEBS counter will be on Skylake.

To avoid this problem, we systematically ignore the PEBS-enabled
counters from the GLOBAL_STATUS mask and we always process PEBS
events via drain_pebs().

The problem manifested itself by having non-exact samples when
sampling only PEBS events, i.e., the PERF_SAMPLE_RECORD would
not have the EXACT flag set.

Note that this problem is only present on Skylake processor.
This fix is harmless on older processors.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1482395366-8992-1-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-22 17:45:36 +01:00
Ingo Molnar 0375691720 Merge tag 'perf-core-for-mingo-20161220' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/core improvements and fixes:

New features:

 - Introduce 'perf sched timehist --idle', to analyse processes
   going to/from idle state (Namhyung Kim)

Fixes:

 - Allow 'perf record -u user' to continue when facing races with threads
   going away after having scanned them via /proc (Jiri Olsa)

 - Fix 'perf mem' --all-user/--all-kernel options (Jiri Olsa)

 - Support jumps with multiple arguments (Ravi Bangoria)

 - Fix jumps to before the function where they are located (Ravi Bangoria)

 - Fix lock-pi help string (Davidlohr Bueso)

 - Fix build of 'perf trace' in odd systems such as a RHEL PPC one (Jiri Olsa)

 - Do not overwrite valid build id in 'perf diff' (Kan Liang)

 - Don't throw error for zero length symbols, allowing the use of the TUI
   in PowerPC, where such symbols became more common recently (Ravi Bangoria)

Infrastructure changes:

 - Switch of samples/bpf/ to use tools/lib/bpf, removing libbpf
   duplication (Joe Stringer)

 - Move headers check into bash script (Jiri Olsa)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 20:13:37 +01:00
Joe Stringer 9899694a7f samples/bpf: Move open_raw_sock to separate header
This function was declared in libbpf.c and was the only remaining
function in this library, but has nothing to do with BPF. Shift it out
into a new header, sock_example.h, and include it from the relevant
samples.

Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20161209024620.31660-8-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:40 -03:00
Joe Stringer 205c8ada31 samples/bpf: Remove perf_event_open() declaration
This declaration was made in samples/bpf/libbpf.c for convenience, but
there's already one in tools/perf/perf-sys.h. Reuse that one.

Committer notes:

Testing it:

  $ make -j4 O=../build/v4.9.0-rc8+ samples/bpf/
  make[1]: Entering directory '/home/build/v4.9.0-rc8+'
    CHK     include/config/kernel.release
    GEN     ./Makefile
    CHK     include/generated/uapi/linux/version.h
    Using /home/acme/git/linux as source for kernel
    CHK     include/generated/utsrelease.h
    CHK     include/generated/timeconst.h
    CHK     include/generated/bounds.h
    CHK     include/generated/asm-offsets.h
    CALL    /home/acme/git/linux/scripts/checksyscalls.sh
    HOSTCC  samples/bpf/test_verifier.o
    HOSTCC  samples/bpf/libbpf.o
    HOSTCC  samples/bpf/../../tools/lib/bpf/bpf.o
    HOSTCC  samples/bpf/test_maps.o
    HOSTCC  samples/bpf/sock_example.o
    HOSTCC  samples/bpf/bpf_load.o
<SNIP>
    HOSTLD  samples/bpf/trace_event
    HOSTLD  samples/bpf/sampleip
    HOSTLD  samples/bpf/tc_l2_redirect
  make[1]: Leaving directory '/home/build/v4.9.0-rc8+'
  $

Also tested the offwaketime resulting from the rebuild, seems to work as
before.

Signed-off-by: Joe Stringer <joe@ovn.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20161209024620.31660-7-joe@ovn.org
[ Use -I$(srctree)/tools/lib/ to support out of source code tree builds ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:40 -03:00
Arnaldo Carvalho de Melo 811b4f0d78 samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
Only one of the examples declare the bpf_insn bpf proggie as a const:

  $ grep 'struct bpf_insn [a-z]' samples/bpf/*.c
  samples/bpf/fds_example.c:	static const struct bpf_insn insns[] = {
  samples/bpf/sock_example.c:	struct bpf_insn prog[] = {
  samples/bpf/test_cgrp2_attach2.c:	struct bpf_insn prog[] = {
  samples/bpf/test_cgrp2_attach.c:	struct bpf_insn prog[] = {
  samples/bpf/test_cgrp2_sock.c:	struct bpf_insn prog[] = {
  $

Which causes this warning:

  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
  <SNIP>
     HOSTCC  samples/bpf/fds_example.o
  /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
  /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        insns, insns_cnt, "GPL", 0,
        ^~~~~
  In file included from /git/linux/samples/bpf/libbpf.h:5:0,
                   from /git/linux/samples/bpf/bpf_load.h:4,
                   from /git/linux/samples/bpf/fds_example.c:15:
  /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
   int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
       ^~~~~~~~~~~~~~~~
    HOSTCC  samples/bpf/sockex1_user.o

So just ditch that 'const' to reduce build noise, leaving changing the
bpf_load_program() bpf_insn parameter to const to a later patch, if deemed
adequate.

Cc: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-1z5xee8n3oa66jf62bpv16ed@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:39 -03:00
Joe Stringer 5dc880de6e tools lib bpf: Add bpf_prog_{attach,detach}
Commit d8c5b17f2b ("samples: bpf: add userspace example for attaching
eBPF programs to cgroups") added these functions to samples/libbpf, but
during this merge all of the samples libbpf functionality is shifting to
tools/lib/bpf. Shift these functions there.

Committer notes:

Use bzero + attr.FIELD = value instead of 'attr = { .FIELD = value, just
like the other wrapper calls to sys_bpf with bpf_attr to make this build
in older toolchais, such as the ones in CentOS 5 and 6.

Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-au2zvtsh55vqeo3v3uw7jr4c@git.kernel.org
Link: https://github.com/joestringer/linux/commit/353e6f298c3d0a92fa8bfa61ff898c5050261a12.patch
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:39 -03:00
Joe Stringer 43371c83f3 samples/bpf: Switch over to libbpf
Now that libbpf under tools/lib/bpf/* is synced with the version from
samples/bpf, we can get rid most of the libbpf library here.

Committer notes:

Built it in a docker fedora rawhide container and ran it in the f25 host, seems
to work just like it did before this patch, i.e. the switch to tools/lib/bpf/
doesn't seem to have introduced problems and Joe said he tested it with
all the entries in samples/bpf/ and other code he found:

  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install
  <SNIP>
  [root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/
  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
  make[1]: Entering directory '/tmp/build/linux'
    CHK     include/config/kernel.release
    HOSTCC  scripts/basic/fixdep
    GEN     ./Makefile
    CHK     include/generated/uapi/linux/version.h
    Using /git/linux as source for kernel
    CHK     include/generated/utsrelease.h
    HOSTCC  scripts/basic/bin2c
    HOSTCC  arch/x86/tools/relocs_32.o
    HOSTCC  arch/x86/tools/relocs_64.o
    LD      samples/bpf/built-in.o
  <SNIP>
    HOSTCC  samples/bpf/fds_example.o
    HOSTCC  samples/bpf/sockex1_user.o
  /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
  /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        insns, insns_cnt, "GPL", 0,
        ^~~~~
  In file included from /git/linux/samples/bpf/libbpf.h:5:0,
                   from /git/linux/samples/bpf/bpf_load.h:4,
                   from /git/linux/samples/bpf/fds_example.c:15:
  /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
   int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
       ^~~~~~~~~~~~~~~~
    HOSTCC  samples/bpf/sockex2_user.o
  <SNIP>
    HOSTCC  samples/bpf/xdp_tx_iptunnel_user.o
  clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include -I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I/git/linux/include -I./include -I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi -I./include/generated/uapi -include /git/linux/include/linux/kconfig.h  \
	  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
	  -Wno-compare-distinct-pointer-types \
	  -Wno-gnu-variable-sized-type-not-at-end \
	  -Wno-address-of-packed-member -Wno-tautological-compare \
	  -O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o
    HOSTLD  samples/bpf/tc_l2_redirect
  <SNIP>
    HOSTLD  samples/bpf/lwt_len_hist
    HOSTLD  samples/bpf/xdp_tx_iptunnel
  make[1]: Leaving directory '/tmp/build/linux'
  [root@f5065a7d6272 linux]#

And then, in the host:

  [root@jouet bpf]# mount | grep "docker.*devicemapper\/"
  /dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 on /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 type xfs (rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota)
  [root@jouet bpf]# cd /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/
  [root@jouet bpf]# file offwaketime
  offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped
  [root@jouet bpf]# readelf -SW offwaketime
  offwaketime         offwaketime_kern.o  offwaketime_user.o
  [root@jouet bpf]# readelf -SW offwaketime_kern.o
  There are 11 section headers, starting at offset 0x700:

  Section Headers:
    [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
    [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
    [ 1] .strtab           STRTAB          0000000000000000 000658 0000a8 00      0   0  1
    [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
    [ 3] kprobe/try_to_wake_up PROGBITS        0000000000000000 000040 0000d8 00  AX  0   0  8
    [ 4] .relkprobe/try_to_wake_up REL             0000000000000000 0005a8 000020 10     10   3  8
    [ 5] tracepoint/sched/sched_switch PROGBITS        0000000000000000 000118 000318 00  AX  0   0  8
    [ 6] .reltracepoint/sched/sched_switch REL             0000000000000000 0005c8 000090 10     10   5  8
    [ 7] maps              PROGBITS        0000000000000000 000430 000050 00  WA  0   0  4
    [ 8] license           PROGBITS        0000000000000000 000480 000004 00  WA  0   0  1
    [ 9] version           PROGBITS        0000000000000000 000484 000004 00  WA  0   0  4
    [10] .symtab           SYMTAB          0000000000000000 000488 000120 18      1   4  8
  Key to Flags:
    W (write), A (alloc), X (execute), M (merge), S (strings)
    I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
    O (extra OS processing required) o (OS specific), p (processor specific)
    [root@jouet bpf]# ./offwaketime | head -3
  qemu-system-x86;entry_SYSCALL_64_fastpath;sys_ppoll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;hrtimer_wakeup;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel;start_cpu;;swapper/0 4
  firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 1
  swapper/2;start_cpu;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 61
  [root@jouet bpf]#

Signed-off-by: Joe Stringer <joe@ovn.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: https://github.com/joestringer/linux/commit/5c40f54a52b1f437123c81e21873f4b4b1f9bd55.patch
Link: http://lkml.kernel.org/n/tip-xr8twtx7sjh5821g8qw47yxk@git.kernel.org
[ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:38 -03:00
Kan Liang ed6c166cc7 perf diff: Do not overwrite valid build id
Fixes a perf diff regression issue which was introduced by commit
5baecbcd9c ("perf symbols: we can now read separate debug-info files
based on a build ID")

The binary name could be same when perf diff different binaries. Build
id is used to distinguish between them.
However, the previous patch assumes the same binary name has same build
id. So it overwrites the build id according to the binary name,
regardless of whether the build id is set or not.

Check the has_build_id in dso__load. If the build id is already set, use
it.

Before the fix:

  $ perf diff 1.perf.data 2.perf.data
  # Event 'cycles'
  #
  # Baseline    Delta  Shared Object     Symbol
  # ........  .......  ................  .............................
  #
    99.83%  -99.80%  tchain_edit       [.] f2
     0.12%  +99.81%  tchain_edit       [.] f3
     0.02%   -0.01%  [ixgbe]           [k] ixgbe_read_reg

  After the fix:
  $ perf diff 1.perf.data 2.perf.data
  # Event 'cycles'
  #
  # Baseline    Delta  Shared Object     Symbol
  # ........  .......  ................  .............................
  #
    99.83%   +0.10%  tchain_edit       [.] f3
     0.12%   -0.08%  tchain_edit       [.] f2

Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
CC: Dima Kogan <dima@secretsauce.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: 5baecbcd9c ("perf symbols: we can now read separate debug-info files based on a build ID")
Link: http://lkml.kernel.org/r/1481642984-13593-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:38 -03:00