Commit Graph

16589 Commits

Author SHA1 Message Date
Ingo Molnar 2a3ede8cb2 Merge branch 'perf/urgent' into perf/core to fix conflicts
Conflicts:
	tools/perf/bench/numa.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-04 07:49:35 +01:00
Oleg Nesterov 3ab6796617 uprobes: Teach uprobe_copy_process() to handle CLONE_VFORK
uprobe_copy_process() does nothing if the child shares ->mm with
the forking process, but there is a special case: CLONE_VFORK.
In this case it would be more correct to do dup_utask() but avoid
dup_xol(). This is not that important, the child should not unwind
its stack too much, this can corrupt the parent's stack, but at
least we need this to allow to ret-probe __vfork() itself.

Note: in theory, it would be better to check task_pt_regs(p)->sp
instead of CLONE_VFORK, we need to dup_utask() if and only if the
child can return from the function called by the parent. But this
needs the arch-dependant helper, and I think that nobody actually
does clone(same_stack, CLONE_VM).

Reported-by: Martin Cermak <mcermak@redhat.com>
Reported-by: David Smith <dsmith@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-10-29 18:02:55 +01:00
Oleg Nesterov aa59c53fd4 uprobes: Change uprobe_copy_process() to dup xol_area
This finally fixes the serious bug in uretprobes: a forked child
crashes if the parent called fork() with the pending ret probe.

Trivial test-case:

	# perf probe -x /lib/libc.so.6 __fork%return
	# perf record -e probe_libc:__fork perl -le 'fork || print "OK"'

(the child doesn't print "OK", it is killed by SIGSEGV)

If the child returns from the probed function it actually returns
to trampoline_vaddr, because it got the copy of parent's stack
mangled by prepare_uretprobe() when the parent entered this func.

It crashes because a) this address is not mapped and b) until the
previous change it doesn't have the proper->return_instances info.

This means that uprobe_copy_process() has to create xol_area which
has the trampoline slot, and its vaddr should be equal to parent's
xol_area->vaddr.

Unfortunately, uprobe_copy_process() can not simply do
__create_xol_area(child, xol_area->vaddr). This could actually work
but perf_event_mmap() doesn't expect the usage of foreign ->mm. So
we offload this to task_work_run(), and pass the argument via not
yet used utask->vaddr.

We know that this vaddr is fine for install_special_mapping(), the
necessary hole was recently "created" by dup_mmap() which skips the
parent's VM_DONTCOPY area, and nobody else could use the new mm.

Unfortunately, this also means that we can not handle the errors
properly, we obviously can not abort the already completed fork().
So we simply print the warning if GFP_KERNEL allocation (the only
possible reason) fails.

Reported-by: Martin Cermak <mcermak@redhat.com>
Reported-by: David Smith <dsmith@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29 18:02:54 +01:00
Oleg Nesterov 248d3a7b2f uprobes: Change uprobe_copy_process() to dup return_instances
uprobe_copy_process() assumes that the new child doesn't need
->utask, it should be allocated by demand.

But this is not true if the forking task has the pending ret-
probes, the child should report them as well and thus it needs
the copy of parent's ->return_instances chain. Otherwise the
child crashes when it returns from the probed function.

Alternatively we could cleanup the child's stack, but this needs
per-arch changes and this is not what we want. At least systemtap
expects a .return in the child too.

Note: this change alone doesn't fix the problem, see the next
change.

Reported-by: Martin Cermak <mcermak@redhat.com>
Reported-by: David Smith <dsmith@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29 18:02:53 +01:00
Oleg Nesterov af0d95af79 uprobes: Teach __create_xol_area() to accept the predefined vaddr
Currently xol_add_vma() uses get_unmapped_area() for area->vaddr,
but the next patches need to use the fixed address. So this patch
adds the new "vaddr" argument to __create_xol_area() which should
be used as area->vaddr if it is nonzero.

xol_add_vma() doesn't bother to verify that the predefined addr is
not used, insert_vm_struct() should fail if find_vma_links() detects
the overlap with the existing vma.

Also, __create_xol_area() doesn't need __GFP_ZERO to allocate area.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29 18:02:51 +01:00
Oleg Nesterov 6441ec8b7c uprobes: Introduce __create_xol_area()
No functional changes, preparation.

Extract the code which actually allocates/installs the new area
into the new helper, __create_xol_area().

While at it remove the unnecessary "ret = ENOMEM" and "ret = 0"
in xol_add_vma(), they both have no effect.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29 18:02:50 +01:00
Oleg Nesterov b68e074910 uprobes: Change the callsite of uprobe_copy_process()
Preparation for the next patches.

Move the callsite of uprobe_copy_process() in copy_process() down
to the succesfull return. We do not care if copy_process() fails,
uprobe_free_utask() won't be called in this case so the wrong
->utask != NULL doesn't matter.

OTOH, with this change we know that copy_process() can't fail when
uprobe_copy_process() is called, the new task should either return
to user-mode or call do_exit(). This way uprobe_copy_process() can:

	1. setup p->utask != NULL if necessary

	2. setup uprobes_state.xol_area

	3. use task_work_add(p)

Also, move the definition of uprobe_copy_process() down so that it
can see get_utask().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-10-29 18:02:48 +01:00
Peter Zijlstra 5a3126d4fe perf: Fix the perf context switch optimization
Currently we only optimize the context switch between two
contexts that have the same parent; this forgoes the
optimization between parent and child context, even though these
contexts could be equivalent too.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Shishkin, Alexander <alexander.shishkin@intel.com>
Link: http://lkml.kernel.org/r/20131007164257.GH3081@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 14:13:01 +01:00
Peter Zijlstra 2c42cfbfe1 perf: Change zero-padding of strings in perf_event_mmap_event()
Oleg complained about the excessive 0-ing in perf_event_mmap_event(),
so try and be smarter about it while keeping it fairly fool proof and
avoid leaking random bits out to userspace.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-8jirlm99m6if2z13wd6rbyu6@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:53 +01:00
Oleg Nesterov 3ea2f2b96f perf: Do not waste PAGE_SIZE bytes for ALIGN(8) in perf_event_mmap_event()
perf_event_mmap_event() does kzalloc(PATH_MAX + sizeof(u64)) to
ensure we can align the size later. However this means that we
actually allocate PAGE_SIZE * 2 buffer, seems too much.

Change this code to allocate PATH_MAX==PAGE_SIZE bytes, but tell
d_path() to not use the last sizeof(u64) bytes.

Note: it is not clear why do we need __GFP_ZERO, see the next patch.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016201004.GC23214@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:52 +01:00
Oleg Nesterov 32c5fb7e7d perf: Kill the dead !vma->vm_mm code in perf_event_mmap_event()
1. perf_event_mmap(vma) is never called with a gate_vma-like arg,
   remove the "if (!vma->vm_mm)" code.

2. arch_vma_name() can use the chached value of mmap_event->vma.

3. Change the code to not call arch_vma_name() twice.

4. Purely cosmetic, but since we use "goto got_name" all the time
   remove "else" from "[stack]" branch just for symmetry.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016200945.GB23214@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:51 +01:00
Peter Zijlstra d9494cb429 perf: Remove useless atomic_t
There's nothing atomic about atomic_set vs atomic_read; so remove the
atomic_t usage.

Also, make running_sample_length static as it really is (and should
be) local to this translation unit.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: eranian@google.com
Cc: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Link: http://lkml.kernel.org/n/tip-vw9lg588x1ic248whybjon0c@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:02:49 +01:00
Peter Zijlstra bf378d341e perf: Fix perf ring buffer memory ordering
The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.

When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.

Reported-by: Victor Kaplansky <victork@il.ibm.com>
Tested-by: Victor Kaplansky <victork@il.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: michael@ellerman.id.au
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-29 12:01:19 +01:00
Ingo Molnar aac898548d Merge branch 'perf/urgent' into perf/core
Conflicts:
	tools/perf/builtin-record.c
	tools/perf/builtin-top.c
	tools/perf/util/hist.h
2013-10-29 11:23:32 +01:00
Linus Torvalds aff22d3f1a Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
 "This tree contains a clockevents regression fix for certain ARM
  subarchitectures"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clockevents: Sanitize ticks to nsec conversion
2013-10-27 10:29:25 -07:00
Linus Torvalds e2756f5e0f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "The tree contains three fixes:

   - Two tooling fixes

   - Reversal of the new 'MMAP2' extended mmap record ABI, introduced in
     this merge window.  (Patches were proposed to fix it but it was all
     a bit late and we felt it's safer to just delay the ABI one more
     kernel release and do it right)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Disable PERF_RECORD_MMAP2 support
  perf scripting perl: Fix build error on Fedora 12
  perf probe: Fix to initialize fname always before use it
2013-10-27 10:28:35 -07:00
Linus Torvalds 1c99ca43a4 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
 "This tree fixes a boot crash in CONFIG_DEBUG_MUTEXES=y kernels, on
  kernels built with GCC 3.x (there are still such distros)"

Side note: it's not just a fix for old gcc versions, it's also removing
an incredibly broken/subtle check that LLVM had issues with, and that
made no sense.

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mutex: Avoid gcc version dependent __builtin_constant_p() usage
2013-10-27 10:18:15 -07:00
Linus Torvalds 20582e34c8 Merge tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from
 "These fix two bugs in the intel_pstate driver, a hibernate bug leading
  to nasty resume failures sometimes and acpi-cpufreq initialization bug
  that causes problems to happen during module unload when intel_pstate
  is in use.

  Specifics:

   - Fix for rounding errors in intel_pstate causing CPU utilization to
     be underestimated from Brennan Shacklett.

   - intel_pstate fix to always use the correct max pstate value when
     computing the min pstate from Dirk Brandewie.

   - Hibernation fix for deadlocking resume in cases when the probing of
     the device containing the image is deferred from Russ Dill.

   - acpi-cpufreq fix to prevent the module from staying in memory when
     the driver cannot be registered and then attempting to unregister
     things that have never been registered on exit"

* tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  acpi-cpufreq: Fail initialization if driver cannot be registered
  PM / hibernate: Move software_resume to late_initcall_sync
  intel_pstate: Correct calculation of min pstate value
  intel_pstate: Improve accuracy by not truncating until final result
2013-10-26 04:38:47 +01:00
Russ Dill d3c345dbc7 PM / hibernate: Move software_resume to late_initcall_sync
software_resume is being called after deferred_probe_initcall in
drivers base. If the probing of the device that contains the resume
image is deferred, and the system has been instructed to wait for
it to show up, this wait will occur in software_resume. This causes
a deadlock.

Move software_resume into late_initcall_sync so that it happens
after all the other late_initcalls.

Signed-off-by: Russ Dill <Russ.Dill@ti.com>
Acked-by: Pavel Machek <Pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-25 01:58:49 +02:00
Thomas Gleixner 97b9410643 clockevents: Sanitize ticks to nsec conversion
Marc Kleine-Budde pointed out, that commit 77cc982 "clocksource: use
clockevents_config_and_register() where possible" caused a regression
for some of the converted subarchs.

The reason is, that the clockevents core code converts the minimal
hardware tick delta to a nanosecond value for core internal
usage. This conversion is affected by integer math rounding loss, so
the backwards conversion to hardware ticks will likely result in a
value which is less than the configured hardware limitation. The
affected subarchs used their own workaround (SIGH!) which got lost in
the conversion.

The solution for the issue at hand is simple: adding evt->mult - 1 to
the shifted value before the integer divison in the core conversion
function takes care of it. But this only works for the case where for
the scaled math mult/shift pair "mult <= 1 << shift" is true. For the
case where "mult > 1 << shift" we can apply the rounding add only for
the minimum delta value to make sure that the backward conversion is
not less than the given hardware limit. For the upper bound we need to
omit the rounding add, because the backwards conversion is always
larger than the original latch value. That would violate the upper
bound of the hardware device.

Though looking closer at the details of that function reveals another
bogosity: The upper bounds check is broken as well. Checking for a
resulting "clc" value greater than KTIME_MAX after the conversion is
pointless. The conversion does:

      u64 clc = (latch << evt->shift) / evt->mult;

So there is no sanity check for (latch << evt->shift) exceeding the
64bit boundary. The latch argument is "unsigned long", so on a 64bit
arch the handed in argument could easily lead to an unnoticed shift
overflow. With the above rounding fix applied the calculation before
the divison is:

       u64 clc = (latch << evt->shift) + evt->mult - 1;

So we need to make sure, that neither the shift nor the rounding add
is overflowing the u64 boundary.

[ukl: move assignment to rnd after eventually changing mult, fix build
 issue and correct comment with the right math]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: nicolas.ferre@atmel.com
Cc: Marc Pignat <marc.pignat@hevs.ch>
Cc: john.stultz@linaro.org
Cc: kernel@pengutronix.de
Cc: Ronald Wahl <ronald.wahl@raritan.com>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1380052223-24139-1-git-send-email-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
2013-10-23 12:51:21 +02:00
Linus Torvalds ee7eafc907 Merge branch 'for-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "Two late fixes for cgroup.

  One fixes descendant walk introduced during this rc1 cycle.  The other
  fixes a post 3.9 bug during task attach which can lead to hang.  Both
  fixes are critical and the fixes are relatively straight-forward"

* 'for-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix to break the while loop in cgroup_attach_task() correctly
  cgroup: fix cgroup post-order descendant walk of empty subtree
2013-10-22 08:20:34 +01:00
Ingo Molnar e4f8eaad70 Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

" * Fix build error on Fedora 12.

  * Fix to initialize fname always before use it, bug introduced
    during this merge window, from Masami Hiramatsu.

  * Disable PERF_RECORD_MMAP2 support, from Stephane Eranian. "

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-20 10:51:35 +02:00
Tetsuo Handa b0267507df mutex: Avoid gcc version dependent __builtin_constant_p() usage
Commit 040a0a37 ("mutex: Add support for wound/wait style locks")
used "!__builtin_constant_p(p == NULL)" but gcc 3.x cannot
handle such expression correctly, leading to boot failure when
built with CONFIG_DEBUG_MUTEXES=y.

Fix it by explicitly passing a bool which tells whether p != NULL
or not.

[ PeterZ: This is a sad patch, but provided it actually generates
          similar code I suppose its the best we can do bar whole
	  sale deprecating gcc-3. ]

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: peterz@infradead.org
Cc: imirkin@alum.mit.edu
Cc: daniel.vetter@ffwll.ch
Cc: robdclark@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/201310171945.AGB17114.FSQVtHOJFOOFML@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-18 21:58:54 +02:00
Stephane Eranian 3090ffb5a2 perf: Disable PERF_RECORD_MMAP2 support
For now, we disable the extended MMAP record support (MMAP2).

We have identified cases where it would not report the correct mapping
information, clone(VM_CLONE) but with separate pids.  We will revisit
the support once we find a solution for this case.

The patch changes the kernel to return EINVAL if attr->mmap2 is set. The
patch also modifies the perf tool to use regular PERF_RECORD_MMAP for
synthetic events and it also prevents the tool from requesting
attr->mmap2 mode because the kernel would reject it.

The support will be revisited once the kenrel interface is updated.

In V2, we reduce the patch to the strict minimum.

In V3, we avoid calling perf_event_open() with mmap2 set because we know
it will fail and require fallback retry.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131017173215.GA8820@quad
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-10-17 16:27:14 -03:00
Ingo Molnar 426ee9e3bb Merge tag 'v3.12-rc5' into perf/core
Merge Linux v3.12-rc5, to pick up the latest fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-15 07:05:18 +02:00