Pull perf updates and fixes from Ingo Molnar:
"It's mostly fixes, but there's also two late items:
- preliminary GTK GUI support for perf report
- PMU raw event format descriptors in sysfs, to be parsed by tooling
The raw event format in sysfs is a new ABI. For example for the 'CPU'
PMU we have:
aldebaran:~> ll /sys/bus/event_source/devices/cpu/format/*
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/any
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/cmask
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/edge
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/event
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/inv
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/offcore_rsp
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/pc
-r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/umask
those lists of fields contain a specific format:
aldebaran:~> cat /sys/bus/event_source/devices/cpu/format/offcore_rsp
config1:0-63
So, those who wish to specify raw events can now use the following
event format:
-e cpu/cmask=1,event=2,umask=3
Most people will not want to specify any events (let alone raw
events), they'll just use whatever default event the tools use.
But for more obscure PMU events that have no cross-architecture
generic events the above syntax is more usable and a bit more
structured than specifying hex numbers."
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
perf tools: Remove auto-generated bison/flex files
perf annotate: Fix off by one symbol hist size allocation and hit accounting
perf tools: Add missing ref-cycles event back to event parser
perf annotate: addr2line wants addresses in same format as objdump
perf probe: Finder fails to resolve function name to address
tracing: Fix ent_size in trace output
perf symbols: Handle NULL dso in dso__name_len
perf symbols: Do not include libgen.h
perf tools: Fix bug in raw sample parsing
perf tools: Fix display of first level of callchains
perf tools: Switch module.h into export.h
perf: Move mmap page data_head offset assertion out of header
perf: Fix mmap_page capabilities and docs
perf diff: Fix to work with new hists design
perf tools: Fix modifier to be applied on correct events
perf tools: Fix various casting issues for 32 bits
perf tools: Simplify event_read_id exit path
tracing: Fix ftrace stack trace entries
tracing: Move the tracing_on/off() declarations into CONFIG_TRACING
perf report: Add a simple GTK2-based 'perf report' browser
...
Pull ACPI & Power Management changes from Len Brown:
- ACPI 5.0 after-ripples, ACPICA/Linux divergence cleanup
- cpuidle evolving, more ARM use
- thermal sub-system evolving, ditto
- assorted other PM bits
Fix up conflicts in various cpuidle implementations due to ARM cpuidle
cleanups (ARM at91 self-refresh and cpu idle code rewritten into
"standby" in asm conflicting with the consolidation of cpuidle time
keeping), trivial SH include file context conflict and RCU tracing fixes
in generic code.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (77 commits)
ACPI throttling: fix endian bug in acpi_read_throttling_status()
Disable MCP limit exceeded messages from Intel IPS driver
ACPI video: Don't start video device until its associated input device has been allocated
ACPI video: Harden video bus adding.
ACPI: Add support for exposing BGRT data
ACPI: export acpi_kobj
ACPI: Fix logic for removing mappings in 'acpi_unmap'
CPER failed to handle generic error records with multiple sections
ACPI: Clean redundant codes in scan.c
ACPI: Fix unprotected smp_processor_id() in acpi_processor_cst_has_changed()
ACPI: consistently use should_use_kmap()
PNPACPI: Fix device ref leaking in acpi_pnp_match
ACPI: Fix use-after-free in acpi_map_lsapic
ACPI: processor_driver: add missing kfree
ACPI, APEI: Fix incorrect APEI register bit width check and usage
Update documentation for parameter *notrigger* in einj.txt
ACPI, APEI, EINJ, new parameter to control trigger action
ACPI, APEI, EINJ, limit the range of einj_param
ACPI, APEI, Fix ERST header length check
cpuidle: power_usage should be declared signed integer
...
Sometimes users have turbostat running in interval mode
when they take processors offline/online.
Previously, turbostat would survive, but not gracefully.
Tighten up the error checking so turbostat notices
changesn sooner, and print just 1 line on change:
turbostat: re-initialized with num_cpus %d
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat uses /dev/cpu/*/msr interface to read MSRs.
For modern systems, it reads 10 MSR/CPU. This can
be observed as 10 "Function Call Interrupts"
per CPU per sample added to /proc/interrupts.
This overhead is measurable on large idle systems,
and as Yoquan Song pointed out, it can even trick
cpuidle into thinking the system is busy.
Here turbostat re-schedules itself in-turn to each
CPU so that its MSR reads will always be local.
This replaces the 10 "Function Call Interrupts"
with a single "Rescheduling interrupt" per sample
per CPU.
On an idle 32-CPU system, this shifts some residency from
the shallow c1 state to the deeper c7 state:
# ./turbostat.old -s
%c0 GHz TSC %c1 %c3 %c6 %c7 %pc2 %pc3 %pc6 %pc7
0.27 1.29 2.29 0.95 0.02 0.00 98.77 20.23 0.00 77.41 0.00
0.25 1.24 2.29 0.98 0.02 0.00 98.75 20.34 0.03 77.74 0.00
0.27 1.22 2.29 0.54 0.00 0.00 99.18 20.64 0.00 77.70 0.00
0.26 1.22 2.29 1.22 0.00 0.00 98.52 20.22 0.00 77.74 0.00
0.26 1.38 2.29 0.78 0.02 0.00 98.95 20.51 0.05 77.56 0.00
^C
i# ./turbostat.new -s
%c0 GHz TSC %c1 %c3 %c6 %c7 %pc2 %pc3 %pc6 %pc7
0.27 1.20 2.29 0.24 0.01 0.00 99.49 20.58 0.00 78.20 0.00
0.27 1.22 2.29 0.25 0.00 0.00 99.48 20.79 0.00 77.85 0.00
0.27 1.20 2.29 0.25 0.02 0.00 99.46 20.71 0.03 77.89 0.00
0.28 1.26 2.29 0.25 0.01 0.00 99.46 20.89 0.02 77.67 0.00
0.27 1.20 2.29 0.24 0.01 0.00 99.48 20.65 0.00 78.04 0.00
cc: Youquan Song <youquan.song@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Pull cpupower updates from Dominik Brodowski.
* git://git.kernel.org/pub/scm/linux/kernel/git/brodo/cpupowerutils:
cpupower tools: add install target to the debug tools' makefiles
cpupower tools: allow to build debug tools in a separate directory too
cpupower: Fix broken mask values
cpupower tool: allow to build in a separate directory
cpupower tool: makefile: simplify the recipe used to generate cpupower.pot target
cpupower tool: remove use of undefined variables from the clean target of the top makefile
cpupower: Fix linking with --as-needed
cpupower: Remove unneeded code and by that fix a memleak
cpupower: Fix number of idle states
cpupower: Unify cpupower-frequency-* manpages
cpupower: Add cpupower-idle-info manpage
cpupower: AMD fam14h/Ontario monitor can also be used by fam12h cpus
cpupower: Better interface for accessing AMD pci registers
turbostat -s
cuts down on the amount of output, per user request.
also treak some output whitespace and the man page.
Signed-off-by: Len Brown <len.brown@intel.com>
hugepage-mmap.c, hugepage-shm.c and map_hugetlb.c in Documentation/vm are
simple pass/fail tests, It's better to promote them to
tools/testing/selftests.
Thanks suggestion of Andrew Morton about this. They all need firstly
setting up proper nr_hugepages and hugepage-mmap need to mount hugetlbfs.
So I add a shell script run_vmtests to do such work which will call the
three test programs and check the return value of them.
Changes to original code including below:
a. add run_vmtests script
b. return error when read_bytes mismatch with writed bytes.
c. coding style fixes: do not use assignment in if condition
[akpm@linux-foundation.org: build the targets before trying to execute them]
[akpm@linux-foundation.org: Documentation/vm/ no longer has a Makefile. Fixes "make clean"]
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
tools/ is the better place for vm tools which are used by many people.
Moving them to tools also make them open to more users instead of hide in
Documentation folder.
This patch moves page-types.c to tools/vm/page-types.c. Also add a
Makefile in tools/vm and fix two coding style problems: a) change const
arrary to 'const char * const', b) change a space to tab for indent.
Signed-off-by: Dave Young <dyoung@redhat.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the run_tests script and launch the selftests by calling "make
run_tests" from the selftests top directory instead. This delegates to
the Makefile in each selftest directory, where it is decided how to launch
the local test.
This removes the need to add each selftest directory to the now removed
"run_tests" top script.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) Name string overrun fix in gianfar driver from Joe Perches.
2) VHOST bug fixes from Michael S. Tsirkin and Nadav Har'El
3) Fix dependencies on xt_LOG netfilter module, from Pablo Neira Ayuso.
4) Fix RCU locking in xt_CT, also from Pablo Neira Ayuso.
5) Add a parameter to skb_add_rx_frag() so we can fix the truesize
adjustments in the drivers that use it. The individual drivers
aren't fixed by this commit, but will be dealt with using follow-on
commits. From Eric Dumazet.
6) Add some device IDs to qmi_wwan driver, from Andrew Bird.
7) Fix a potential rcu_read_lock() imbalancein rt6_fill_node(). From
Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: fix a potential rcu_read_lock() imbalance in rt6_fill_node()
net: add a truesize parameter to skb_add_rx_frag()
gianfar: Fix possible overrun and simplify interrupt name field creation
USB: qmi_wwan: Add ZTE (Vodafone) K3570-Z and K3571-Z net interfaces
USB: option: Ignore ZTE (Vodafone) K3570/71 net interfaces
USB: qmi_wwan: Add ZTE (Vodafone) K3565-Z and K4505-Z net interfaces
qlcnic: Bug fix for LRO
netfilter: nf_conntrack: permanently attach timeout policy to conntrack
netfilter: xt_CT: fix assignation of the generic protocol tracker
netfilter: xt_CT: missing rcu_read_lock section in timeout assignment
netfilter: cttimeout: fix dependency with l4protocol conntrack module
netfilter: xt_LOG: use CONFIG_IP6_NF_IPTABLES instead of CONFIG_IPV6
vhost: fix release path lockdep checks
vhost: don't forget to schedule()
tools/virtio: stub out strong barriers
tools/virtio: add linux/hrtimer.h stub
tools/virtio: add linux/module.h stub
In perf_event__parse_sample(), the array variable was not incremented
by the amount of data used by the raw_data.
That was okay until we added PERF_SAMPLE_BRANCH_STACK which depends on
the array variable pointing to the beginning of the branch stack data.
But that was not the case if branch stack was combined with raw mode
sampling. That led to bogus branch stack addresses and count.
The bug would show up with:
$ perf record -R -b foo
This patch fixes the problem by correctly moving the array pointer
forward for RAW samples.
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120317222317.GA8803@quad
[ committer note: Fix also later submitted by Jiri Olsa ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The callchain stdio mode display was written using a sorted by symbol
report. In this mode we have only one callchain root per hist so we
forgot to handle cases where we have multiple callchain root, as in per
dso sorting for example.
Fix this by handling these roots like any other branch, with the hist as
the parent.
Before:
1.97% libpthread-2.12.1.so
|
--- __libc_write
create_worker
bench_sched_messaging
cmd_bench
run_builtin
main
__libc_start_main
|
--- __libc_read
create_worker
bench_sched_messaging
cmd_bench
run_builtin
main
__libc_start_main
After:
1.97% libpthread-2.12.1.so
|
|--36.97%-- __libc_write
| create_worker
| bench_sched_messaging
| cmd_bench
| run_builtin
| main
| __libc_start_main
|
|--31.47%-- __libc_read
| create_worker
| bench_sched_messaging
| cmd_bench
| run_builtin
| main
| __libc_start_main
...
Single roots keep their entry without percentage because they have
the same overhead than the hist they refer to. ie: 100% in fractal
mode and the percentage of the hist in graph mode:
0.00% [k] reschedule_interrupt
|
--- default_idle
amd_e400_idle
cpu_idle
start_secondary
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1332526010-15400-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>