Commit Graph

23810 Commits

Author SHA1 Message Date
Ingo Molnar
210f400d68 Merge tag 'v4.10-rc8' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-14 07:29:14 +01:00
Linus Torvalds
fdb0ee7c65 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
 "Fix a sporadic missed timer hw reprogramming bug that can result in
  random delays"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/nohz: Fix possible missing clock reprog after tick soft restart
2017-02-11 10:24:16 -08:00
Linus Torvalds
d5b76bef01 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "A kernel crash fix plus three tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix crash in perf_event_read()
  perf callchain: Reference count maps
  perf diff: Fix -o/--order option behavior (again)
  perf diff: Fix segfault on 'perf diff -o N' option
2017-02-11 10:20:06 -08:00
Linus Torvalds
4e4f74a7ee Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull lockdep fix from Ingo Molnar:
 "This fixes an ugly lockdep stack trace output regression. (But also
  affects other stacktrace users such as kmemleak, KASAN, etc)"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  stacktrace, lockdep: Fix address, newline ugliness
2017-02-11 10:16:05 -08:00
Frederic Weisbecker
7bdb59f1ad tick/nohz: Fix possible missing clock reprog after tick soft restart
ts->next_tick keeps track of the next tick deadline in order to optimize
clock programmation on irq exit and avoid redundant clock device writes.

Now if ts->next_tick missed an update, we may spuriously miss a clock
reprog later as the nohz code is fooled by an obsolete next_tick value.

This is what happens here on a specific path: when we observe an
expired timer from the nohz update code on irq exit, we perform a soft
tick restart which simply fires the closest possible tick without
actually exiting the nohz mode and restoring a periodic state. But we
forget to update ts->next_tick accordingly.

As a result, after the next tick resulting from such soft tick restart,
the nohz code sees a stale value on ts->next_tick which doesn't match
the clock deadline that just expired. If that obsolete ts->next_tick
value happens to collide with the actual next tick deadline to be
scheduled, we may spuriously bypass the clock reprogramming. In the
worst case, the tick may never fire again.

Fix this with a ts->next_tick reset on soft tick restart.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1486485894-29173-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10 09:43:48 +01:00
Alexander Shishkin
6ce77bfd6c perf/core: Allow kernel filters on CPU events
While supporting file-based address filters for CPU events requires some
extra context switch handling, kernel address filters are easy, since the
kernel mapping is preserved across address spaces. It is also useful as
it permits tracing scheduling paths of the kernel.

This patch allows setting up kernel filters for CPU events.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20170126094057.13805-4-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10 09:08:09 +01:00
Alexander Shishkin
9ccbfbb157 perf/core: Do error out on a kernel filter on an exclude_filter event
It is currently possible to configure a kernel address filter for a
event that excludes kernel from its traces (attr.exclude_kernel==1).

While in reality this doesn't make sense, the SET_FILTER ioctl() should
return a error in such case, currently it does not. Furthermore, it
will still silently discard the filter and any potentially valid filters
that came with it.

This patch makes the SET_FILTER ioctl() error out in such cases.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20170126094057.13805-3-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10 09:08:09 +01:00
Peter Zijlstra
451d24d1e5 perf/core: Fix crash in perf_event_read()
Alexei had his box explode because doing read() on a package
(rapl/uncore) event that isn't currently scheduled in ends up doing an
out-of-bounds load.

Rework the code to more explicitly deal with event->oncpu being -1.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Fixes: d6a2f9035b ("perf/core: Introduce PMU_EV_CAP_READ_ACTIVE_PKG")
Link: http://lkml.kernel.org/r/20170131102710.GL6515@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10 09:04:50 +01:00
Luis R. Rodriguez
ed5bd7dc88 kernel/ucount.c: mark user_header with kmemleak_ignore()
The user_header gets caught by kmemleak with the following splat as
missing a free:

  unreferenced object 0xffff99667a733d80 (size 96):
  comm "swapper/0", pid 1, jiffies 4294892317 (age 62191.468s)
  hex dump (first 32 bytes):
    a0 b6 92 b4 ff ff ff ff 00 00 00 00 01 00 00 00  ................
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
     kmemleak_alloc+0x4a/0xa0
     __kmalloc+0x144/0x260
     __register_sysctl_table+0x54/0x5e0
     register_sysctl+0x1b/0x20
     user_namespace_sysctl_init+0x17/0x34
     do_one_initcall+0x52/0x1a0
     kernel_init_freeable+0x173/0x200
     kernel_init+0xe/0x100
     ret_from_fork+0x2c/0x40

The BUG_ON()s are intended to crash so no need to clean up after
ourselves on error there.  This is also a kernel/ subsys_init() we don't
need a respective exit call here as this is never modular, so just white
list it.

Link: http://lkml.kernel.org/r/20170203211404.31458-1-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nikolay Borisov <n.borisov.lkml@gmail.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-08 15:41:43 -08:00
Omar Sandoval
bfeda41d06 stacktrace, lockdep: Fix address, newline ugliness
Since KERN_CONT became meaningful again, lockdep stack traces have had
annoying extra newlines, like this:

[    5.561122] -> #1 (B){+.+...}:
[    5.561528]
[    5.561532] [<ffffffff810d8873>] lock_acquire+0xc3/0x210
[    5.562178]
[    5.562181] [<ffffffff816f6414>] mutex_lock_nested+0x74/0x6d0
[    5.562861]
[    5.562880] [<ffffffffa01aa3c3>] init_btrfs_fs+0x21/0x196 [btrfs]
[    5.563717]
[    5.563721] [<ffffffff81000472>] do_one_initcall+0x52/0x1b0
[    5.564554]
[    5.564559] [<ffffffff811a3af6>] do_init_module+0x5f/0x209
[    5.565357]
[    5.565361] [<ffffffff81122f4d>] load_module+0x218d/0x2b80
[    5.566020]
[    5.566021] [<ffffffff81123beb>] SyS_finit_module+0xeb/0x120
[    5.566694]
[    5.566696] [<ffffffff816fd241>] entry_SYSCALL_64_fastpath+0x1f/0xc2

That's happening because each printk() call now gets printed on its own
line, and we do a separate call to print the spaces before the symbol.
Fix it by doing the printk() directly instead of using the
print_ip_sym() helper.

Additionally, the symbol address isn't very helpful, so let's get rid of
that, too. The final result looks like this:

[    5.194518] -> #1 (B){+.+...}:
[    5.195002]        lock_acquire+0xc3/0x210
[    5.195439]        mutex_lock_nested+0x74/0x6d0
[    5.196491]        do_one_initcall+0x52/0x1b0
[    5.196939]        do_init_module+0x5f/0x209
[    5.197355]        load_module+0x218d/0x2b80
[    5.197792]        SyS_finit_module+0xeb/0x120
[    5.198251]        entry_SYSCALL_64_fastpath+0x1f/0xc2

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@fb.com
Fixes: 4bcc595ccd ("printk: reinstate KERN_CONT for printing continuation lines")
Link: http://lkml.kernel.org/r/43b4e114724b2bdb0308fa86cb33aa07d3d67fad.1486510315.git.osandov@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-08 08:21:31 +01:00
Ingo Molnar
0fc6d1ca14 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-06 11:04:39 +01:00
Linus Torvalds
a572a1b999 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:

 - Prevent double activation of interrupt lines, which causes problems
   on certain interrupt controllers

 - Handle the fallout of the above because x86 (ab)uses the activation
   function to reconfigure interrupts under the hood.

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Make irq activate operations symmetric
  irqdomain: Avoid activating interrupts more than once
2017-02-04 12:18:01 -08:00
Linus Torvalds
2d47b8aac7 Merge tag 'trace-v4.10-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
 "Simple fix of s/static struct __init/static __init struct/"

* tag 'trace-v4.10-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/kprobes: Fix __init annotation
2017-02-03 11:06:59 -08:00
Ard Biesheuvel
71810db27c modversions: treat symbol CRCs as 32 bit quantities
The modversion symbol CRCs are emitted as ELF symbols, which allows us
to easily populate the kcrctab sections by relying on the linker to
associate each kcrctab slot with the correct value.

This has a couple of downsides:

 - Given that the CRCs are treated as memory addresses, we waste 4 bytes
   for each CRC on 64 bit architectures,

 - On architectures that support runtime relocation, a R_<arch>_RELATIVE
   relocation entry is emitted for each CRC value, which identifies it
   as a quantity that requires fixing up based on the actual runtime
   load offset of the kernel. This results in corrupted CRCs unless we
   explicitly undo the fixup (and this is currently being handled in the
   core module code)

 - Such runtime relocation entries take up 24 bytes of __init space
   each, resulting in a x8 overhead in [uncompressed] kernel size for
   CRCs.

Switching to explicit 32 bit values on 64 bit architectures fixes most
of these issues, given that 32 bit values are not treated as quantities
that require fixing up based on the actual runtime load offset.  Note
that on some ELF64 architectures [such as PPC64], these 32-bit values
are still emitted as [absolute] runtime relocatable quantities, even if
the value resolves to a build time constant.  Since relative relocations
are always resolved at build time, this patch enables MODULE_REL_CRCS on
powerpc when CONFIG_RELOCATABLE=y, which turns the absolute CRC
references into relative references into .rodata where the actual CRC
value is stored.

So redefine all CRC fields and variables as u32, and redefine the
__CRC_SYMBOL() macro for 64 bit builds to emit the CRC reference using
inline assembler (which is necessary since 64-bit C code cannot use
32-bit types to hold memory addresses, even if they are ultimately
resolved using values that do not exceed 0xffffffff).  To avoid
potential problems with legacy 32-bit architectures using legacy
toolchains, the equivalent C definition of the kcrctab entry is retained
for 32-bit architectures.

Note that this mostly reverts commit d4703aefdb ("module: handle ppc64
relocating kcrctabs when CONFIG_RELOCATABLE=y")

Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 08:28:25 -08:00
Linus Torvalds
891aa1e0f1 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Five kernel fixes:

   - an mmap tracing ABI fix for certain mappings

   - a use-after-free fix, found via KASAN

   - three CPU hotplug related x86 PMU driver fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Make package handling more robust
  perf/x86/intel/uncore: Clean up hotplug conversion fallout
  perf/x86/intel/rapl: Make package handling more robust
  perf/core: Fix PERF_RECORD_MMAP2 prot/flags for anonymous memory
  perf/core: Fix use-after-free bug
2017-02-02 13:30:19 -08:00
Arnd Bergmann
26a346f23c tracing/kprobes: Fix __init annotation
clang complains about "__init" being attached to a struct name:

kernel/trace/trace_kprobe.c:1375:15: error: '__section__' attribute only applies to functions and global variables

The intention must have been to mark the function as __init instead of
the type, so move the attribute there.

Link: http://lkml.kernel.org/r/20170201165826.2625888-1-arnd@arndb.de

Fixes: f18f97ac43 ("tracing/kprobes: Add a helper method to return number of probe hits")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-02-02 10:48:35 -05:00
Linus Torvalds
a2ca3d6179 Merge tag 'trace-4.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
 "It was reported to me that the thread created by the hwlat tracer does
  not migrate after the first instance. I found that there was as small
  bug in the logic, and fixed it. It's minor, but should be fixed
  regardless. There's not much impact outside the hwlat tracer"

* tag 'trace-4.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix hwlat kthread migration
2017-01-31 16:32:40 -08:00
Linus Torvalds
f1774f46d4 Merge branch 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
 "The cgroup creation path was getting the order of operations wrong and
  exposing cgroups which don't have their names set yet to controllers
  which can lead to NULL derefs.

  This contains the fix for the bug"

* 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: don't online subsystems before cgroup_name/path() are operational
2017-01-31 13:54:41 -08:00
Steven Rostedt (VMware)
79c6f448c8 tracing: Fix hwlat kthread migration
The hwlat tracer creates a kernel thread at start of the tracer. It is
pinned to a single CPU and will move to the next CPU after each period of
running. If the user modifies the migration thread's affinity, it will not
change after that happens.

The original code created the thread at the first instance it was called,
but later was changed to destroy the thread after the tracer was finished,
and would not be created until the next instance of the tracer was
established. The code that initialized the affinity was only called on the
initial instantiation of the tracer. After that, it was not initialized, and
the previous affinity did not match the current newly created one, making
it appear that the user modified the thread's affinity when it did not, and
the thread failed to migrate again.

Cc: stable@vger.kernel.org
Fixes: 0330f7aa8e ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-01-31 09:13:49 -05:00
Marc Zyngier
08d85f3ea9 irqdomain: Avoid activating interrupts more than once
Since commit f3b0946d62 ("genirq/msi: Make sure PCI MSIs are
activated early"), we can end-up activating a PCI/MSI twice (once
at allocation time, and once at startup time).

This is normally of no consequences, except that there is some
HW out there that may misbehave if activate is used more than once
(the GICv3 ITS, for example, uses the activate callback
to issue the MAPVI command, and the architecture spec says that
"If there is an existing mapping for the EventID-DeviceID
combination, behavior is UNPREDICTABLE").

While this could be worked around in each individual driver, it may
make more sense to tackle the issue at the core level. In order to
avoid getting in that situation, let's have a per-interrupt flag
to remember if we have already activated that interrupt or not.

Fixes: f3b0946d62 ("genirq/msi: Make sure PCI MSIs are activated early")
Reported-and-tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1484668848-24361-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-30 15:18:56 +01:00
Kan Liang
40999312c7 perf/core: Try parent PMU first when initializing a child event
perf has additional overhead when monitoring the task which
frequently generates child tasks.

perf_init_event() is one of the hotspots for the additional overhead:

Currently, to get the PMU, it tries to search the type in pmu_idr at
first. But it is not always successful, especially for the widely used
PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE events. So it has to go to the
slow path which go through the whole PMUs list.

It will be a big performance issue, if the PMUs list is long (e.g. server
with many uncore boxes) and the task frequently generates child tasks.

The child event inherits its parent event. So the child event should
try its parent PMU first.

Here is some data from the overhead test on Broadwell server:

  perf record -e $TEST_EVENTS -- ./loop.sh 50000

  loop.sh
    start=$(date +%s%N)
    i=0
    while [ "$i" -le "$1" ]
    do
            date > /dev/null
            i=`expr $i + 1`
    done
    end=$(date +%s%N)
    elapsed=`expr $end - $start`

  Event#	Original elapsed time	Elapsed time with patch		delta
  1		196,573,192,397		189,162,029,998			-3.77%
  2		257,567,753,013		241,620,788,683			-6.19%
  4		398,730,726,971		370,518,938,714			-7.08%
  8		824,983,761,120		740,702,489,329			-10.22%
  16		1,883,411,923,498	1,672,027,508,355		-11.22%

... which shows a nice performance improvement.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1484745662-15928-2-git-send-email-kan.liang@intel.com
[ Tidied up the changelog and the code comment. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-30 12:01:16 +01:00
Alexander Shishkin
487f05e18a perf/core: Optimize event rescheduling on active contexts
When new events are added to an active context, we go and reschedule
all cpu groups and all task groups in order to preserve the priority
(cpu pinned, task pinned, cpu flexible, task flexible), but in
reality we only need to reschedule groups of the same priority as
that of the events being added, and below.

This patch changes the behavior so that only groups that need to be
rescheduled are rescheduled.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20170119164330.22887-3-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-30 12:01:15 +01:00
Alexander Shishkin
fe45bafbd0 perf/core: Don't re-schedule CPU flexible events needlessly
In the sched-in path, we first remove a CPU's flexible events in order to
give priority to the task's pinned events. However, this step can be safely
skipped if the task doesn't have its own pinned events.

This patch implements this skipping.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20170119164330.22887-2-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-30 12:01:14 +01:00
David Carrillo-Cisneros
1fd7e41699 perf/core: Remove perf_cpu_context::unique_pmu
cpuctx->unique_pmu was originally introduced as a way to identify cpuctxs
with shared pmus in order to avoid visiting the same cpuctx more than once
in a for_each_pmu loop.

cpuctx->unique_pmu == cpuctx->pmu in non-software task contexts since they
have only one pmu per cpuctx. Since perf_pmu_sched_task() is only called in
hw contexts, this patch replaces cpuctx->unique_pmu by cpuctx->pmu in it.

The change above, together with the previous patch in this series, removed
the remaining uses of cpuctx->unique_pmu, so we remove it altogether.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170118192454.58008-3-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-30 12:01:14 +01:00
David Carrillo-Cisneros
058fe1c044 perf/core: Make cgroup switch visit only cpuctxs with cgroup events
This patch follows from a conversation in CQM/CMT's last series about
speeding up the context switch for cgroup events:

  https://patchwork.kernel.org/patch/9478617/

This is a low-hanging fruit optimization. It replaces the iteration over
the "pmus" list in cgroup switch by an iteration over a new list that
contains only cpuctxs with at least one cgroup event.

This is necessary because the number of PMUs have increased over the years
e.g modern x86 server systems have well above 50 PMUs.

The iteration over the full PMU list is unneccessary and can be costly in
heavy cache contention scenarios.

Below are some instrumentation measurements with 10, 50 and 90 percentiles
of the total cost of context switch before and after this optimization for
a simple array read/write microbenchark.

  Contention
    Level    Nr events      Before (us)            After (us)       Median
  L2    L3     types      (10%, 50%, 90%)       (10%, 50%, 90%     Speedup
  --------------------------------------------------------------------------
  Low   Low       1       (1.72, 2.42, 5.85)    (1.35, 1.64, 5.46)     29%
  High  Low       1       (2.08, 4.56, 19.8)    (1720, 2.20, 13.7)     51%
  High  High      1       (2.86, 10.4, 12.7)    (2.54, 4.32, 12.1)     58%

  Low   Low       2       (1.98, 3.20, 6.89)    (1.68, 2.41, 8.89)     24%
  High  Low       2       (2.48, 5.28, 22.4)    (2150, 3.69, 14.6)     30%
  High  High      2       (3.32, 8.09, 13.9)    (2.80, 5.15, 13.7)     36%

where:

  1 event type  = cycles
  2 event types = cycles,intel_cqm/llc_occupancy/

   Contention L2 Low: workset  <  L2 cache size.
                 High:  "     >>  L2   "     " .
   Contention L3 Low: workset of task on all sockets  <  L3 cache size.
                 High:   "     "   "   "   "    "    >>  L3   "     " .

   Median Speedup is (50%ile Before - 50%ile After) /  50%ile Before

Unsurprisingly, the benefits of this optimization decrease with the number
of cpuctxs with a cgroup events, yet, is never detrimental.

Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170118192454.58008-2-davidcc@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-30 12:01:13 +01:00