Commit Graph

14314 Commits

Author SHA1 Message Date
Steven Rostedt
7bcfaf54f5 tracing: Add trace_options kernel command line parameter
Add trace_options to the kernel command line parameter to be able to
set options at early boot. For example, to enable stack dumps of
events, add the following:

  trace_options=stacktrace

This along with the trace_event option, you can get not only
traces of the events but also the stack dumps with them.

Requested-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02 10:21:53 -04:00
Steven Rostedt
0d5c6e1c19 tracing: Use irq_work for wake ups and remove *_nowake_*() functions
Have the ring buffer commit function use the irq_work infrastructure to
wake up any waiters waiting on the ring buffer for new data. The irq_work
was created for such a purpose, where doing the actual wake up at the
time of adding data is too dangerous, as an event or function trace may
be in the midst of the work queue locks and cause deadlocks. The irq_work
will either delay the action to the next timer interrupt, or trigger an IPI
to itself forcing an interrupt to do the work (in a safe location).

With irq_work, all ring buffer commits can safely do wakeups, removing
the need for the ring buffer commit "nowake" variants, which were used
by events and function tracing. All commits can now safely use the
normal commit, and the "nowake" variants can be removed.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02 10:21:52 -04:00
Steven Rostedt
02404baf1b tracing: Remove deprecated tracing_enabled file
The tracing_enabled file was used as a quick way to stop
tracers, and try to bring down overhead for things like
the latency tracers (irqsoff, wakeup, etc). But it didn't
work that well.

The tracing_on file was created as a really fast way to
stop recording into the ftrace ring buffer and can interact
with the kernel. That is a tracing_off() call in the kernel
can disable recording of events, and then from userspace one
could echo 1 into the tracing_on file to continue it. The
tracing_enabled function did too much to allow for this.

The tracing_on has taken over as a way to start and stop tracing
and the tracing_enabled file should not be used. But because of
its existance, it still confuses people. Over a year ago the
following commit was added:

 commit 6752ab4a9c
 Author: Steven Rostedt <srostedt@redhat.com>
 Date:   Tue Feb 8 13:54:06 2011 -0500

    tracing: Deprecate tracing_enabled for tracing_on

This commit added a WARN_ON() if the tracing_enabled file's variable
was changed. After this was added, only LatencyTop complained, and
they soon fixed their tool as there was no reason that LatencyTop
should touch this file as it was using the perf ring buffers which
this file does not interact with. But since that time no one else
has complained about this WARN_ON(). Thus it is safe to assume that
this file is no longer needed. Time to get rid of it.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02 10:21:51 -04:00
Steven Rostedt
0fb9656d95 tracing: Make tracing_enabled be equal to tracing_on
The tracing_enabled file has been deprecated as it never was able
to serve its purpose well. The tracing_on file has taken over.
Instead of having code to keep tracing_enabled, have the tracing_enabled
file just set tracing_on, and remove the tracing_enabled variable.

This allows us to remove the tracing_enabled file. The reason that
the remove is in a different change set and not removed here is
in case we find some lonely userspace tool that requires the file
to exist. Then the removal patch will get reverted, but this one
will not.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02 10:21:50 -04:00
Steven Rostedt
c7b84ecada tracing: Remove unused function unregister_tracer()
The function register_tracer() is only used by kernel core code,
that never needs to remove the tracer. As trace_events have become
the main way to add new tracing to the kernel, the need to
unregister a tracer has diminished. Remove the unused function
unregister_tracer(). If a need arises where we need it, then we
can always add it back.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02 10:21:50 -04:00
Steven Rostedt
15075cac42 tracing: Separate open function from set_event and available_events
The open function used by available_events is the same as set_event even
though it uses different seq functions. This causes a side effect of
writing into available_events clearing all events, even though
available_events is suppose to be read only.

There's no reason to keep a single function for just the open and have
both use different functions for everything else. It is a little
confusing and causes strange behavior. Just have each have their own
function.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02 10:21:49 -04:00
Yoshihiro YUNOMAE
50ecf2c3af ring-buffer: Change unsigned long type of ring_buffer_oldest_event_ts() to u64
ring_buffer_oldest_event_ts() should return a value of u64 type, because
ring_buffer_per_cpu->buffer_page->buffer_data_page->time_stamp is u64 type.

Link: http://lkml.kernel.org/r/1349998076-15495-5-git-send-email-dhsharp@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: David Sharp <dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02 10:21:48 -04:00
David Sharp
60303ed3f4 tracing: Reset ring buffer when changing trace_clocks
Because the "tsc" clock isn't in nanoseconds, the ring buffer must be
reset when changing clocks so that incomparable timestamps don't end up
in the same trace.

Tested: Confirmed switching clocks resets the trace buffer.

Google-Bug-Id: 6980623
Link: http://lkml.kernel.org/r/1349998076-15495-3-git-send-email-dhsharp@google.com

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: David Sharp <dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02 10:21:47 -04:00
Vaibhav Nagarnaik
6f86ab9fca tracing: Cleanup unnecessary function declarations
The functions defined in include/trace/syscalls.h are not used directly
since struct ftrace_event_class was introduced. Remove them from the
header file and rearrange the ftrace_event_class declarations in
trace_syscalls.c.

Link: http://lkml.kernel.org/r/1339112785-21806-2-git-send-email-vnagarnaik@google.com

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31 16:45:34 -04:00
David Sharp
01e3e710a9 tracing: Trivial cleanup
Remove ftrace_format_syscall() declaration; it is neither defined nor
used. Also update a comment and formatting.

Link: http://lkml.kernel.org/r/1339112785-21806-1-git-send-email-vnagarnaik@google.com

Signed-off-by: David Sharp <dhsharp@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31 16:45:33 -04:00
Steven Rostedt
7ffbd48d5c tracing: Cache comms only after an event occurred
Whenever an event is registered, the comm of tasks are saved at
every task switch instead of saving them at every event. But if
an event isn't executed much, the comm cache will be filled up
by tasks that did not record the event and you lose out on the comms
that did.

Here's an example, if you enable the following events:

echo 1 > /debug/tracing/events/kvm/kvm_cr/enable
echo 1 > /debug/tracing/events/net/net_dev_xmit/enable

Note, there's no kvm running on this machine so the first event will
never be triggered, but because it is enabled, the storing of comms
will continue. If we now disable the network event:

echo 0 > /debug/tracing/events/net/net_dev_xmit/enable

and look at the trace:

cat /debug/tracing/trace
            sshd-2672  [001] ..s2   375.731616: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
            sshd-2672  [001] ..s1   375.731617: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
            sshd-2672  [001] ..s2   375.859356: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
            sshd-2672  [001] ..s1   375.859357: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
            sshd-2672  [001] ..s2   375.947351: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
            sshd-2672  [001] ..s1   375.947352: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
            sshd-2672  [001] ..s2   376.035383: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
            sshd-2672  [001] ..s1   376.035383: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
            sshd-2672  [001] ..s2   377.563806: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=226 rc=0
            sshd-2672  [001] ..s1   377.563807: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=226 rc=0
            sshd-2672  [001] ..s2   377.563834: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6be0 len=114 rc=0
            sshd-2672  [001] ..s1   377.563842: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6be0 len=114 rc=0

We see that process 2672 which triggered the events has the comm "sshd".
But if we run hackbench for a bit and look again:

cat /debug/tracing/trace
           <...>-2672  [001] ..s2   375.731616: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
           <...>-2672  [001] ..s1   375.731617: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
           <...>-2672  [001] ..s2   375.859356: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
           <...>-2672  [001] ..s1   375.859357: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
           <...>-2672  [001] ..s2   375.947351: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
           <...>-2672  [001] ..s1   375.947352: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
           <...>-2672  [001] ..s2   376.035383: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
           <...>-2672  [001] ..s1   376.035383: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
           <...>-2672  [001] ..s2   377.563806: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=226 rc=0
           <...>-2672  [001] ..s1   377.563807: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=226 rc=0
           <...>-2672  [001] ..s2   377.563834: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6be0 len=114 rc=0
           <...>-2672  [001] ..s1   377.563842: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6be0 len=114 rc=0

The stored "sshd" comm has been flushed out and we get a useless "<...>".

But by only storing comms after a trace event occurred, we can run
hackbench all day and still get the same output.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31 16:45:31 -04:00
Steven Rostedt
2b70e59043 tracing: Have tracing_sched_wakeup_trace() use standard unlock_commit
The functon tracing_sched_wakeup_trace() does an open coded unlock
commit and save stack. This is what the trace_nowake_buffer_unlock_commit()
is for.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31 16:45:30 -04:00
Steven Rostedt
81698831bc tracing: Enable comm recording if trace_printk() is used
If comm recording is not enabled when trace_printk() is used then
you just get this type of output:

[ adding trace_printk("hello! %d", irq); in do_IRQ ]

           <...>-2843  [001] d.h.    80.812300: do_IRQ: hello! 14
           <...>-2734  [002] d.h2    80.824664: do_IRQ: hello! 14
           <...>-2713  [003] d.h.    80.829971: do_IRQ: hello! 14
           <...>-2814  [000] d.h.    80.833026: do_IRQ: hello! 14

By enabling the comm recorder when trace_printk is enabled:

       hackbench-6715  [001] d.h.   193.233776: do_IRQ: hello! 21
            sshd-2659  [001] d.h.   193.665862: do_IRQ: hello! 21
          <idle>-0     [001] d.h1   193.665996: do_IRQ: hello! 21

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31 16:45:29 -04:00
Steven Rostedt
b382ede6b5 tracing: Expand ring buffer when trace_printk() is used
Since tracing is not used by 99% of Linux users, even though tracing
may be configured in, it does not make sense to allocate 1.4 Megs
per CPU for the ring buffers if they are not used. Thus, on boot up
the ring buffers are set to a minimal size until something needs the
and they are expanded.

This works well for events and tracers (function, etc), but for the
asynchronous use of trace_printk() which can write to the ring buffer
at any time, does not expand the buffers.

On boot up a check is made to see if any trace_printk() is used to
see if the trace_printk() temp buffer pages should be allocated. This
same code can be used to expand the buffers as well.

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31 16:45:28 -04:00
Slava Pestov
884bfe89a4 ring-buffer: Add a 'dropped events' counter
The existing 'overrun' counter is incremented when the ring
buffer wraps around, with overflow on (the default). We wanted
a way to count requests lost from the buffer filling up with
overflow off, too. I decided to add a new counter instead
of retro-fitting the existing one because it seems like a
different statistic to count conceptually, and also because
of how the code was structured.

Link: http://lkml.kernel.org/r/1310765038-26399-1-git-send-email-slavapestov@google.com

Signed-off-by: Slava Pestov <slavapestov@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31 16:45:27 -04:00
Hiraku Toyooka
f43c738bfa tracing: Change tracer's integer flags to bool
print_max and use_max_tr in struct tracer are "int" variables and
used like flags. This is wasteful, so change the type to "bool".

Link: http://lkml.kernel.org/r/20121002082710.9807.86393.stgit@falsita

Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31 16:45:25 -04:00
Steven Rostedt
6f4156723c tracing: Allow tracers to start at core initcall
There's times during debugging that it is helpful to see traces of early
boot functions. But the tracers are initialized at device_initcall()
which is quite late during the boot process. Setting the kernel command
line parameter ftrace=function will not show anything until the function
tracer is initialized. This prevents being able to trace functions before
device_initcall().

There's no reason that the tracers need to be initialized so late in the
boot process. Move them up to core_initcall() as they still need to come
after early_initcall() which initializes the tracing buffers.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31 16:45:24 -04:00
Daniel Walter
bcd83ea6cb tracing: Replace strict_strto* with kstrto*
* remove old string conversions with kstrto*

Link: http://lkml.kernel.org/r/20120926200838.GC1244@0x90.at

Signed-off-by: Daniel Walter <sahne@0x90.at>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31 16:45:23 -04:00
Linus Torvalds
e17b131583 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Most of these are uprobes race fixes from Oleg, and their preparatory
  cleanups.  (It's larger than what I'd normally send for an -rc kernel,
  but they looked significant enough to not delay them.)

  There's also an oprofile fix and an uncore PMU fix."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  perf/x86: Disable uncore on virtualized CPUs
  oprofile, x86: Fix wrapping bug in op_x86_get_ctrl()
  ring-buffer: Check for uninitialized cpu buffer before resizing
  uprobes: Fix the racy uprobe->flags manipulation
  uprobes: Fix prepare_uprobe() race with itself
  uprobes: Introduce prepare_uprobe()
  uprobes: Fix handle_swbp() vs unregister() + register() race
  uprobes: Do not delete uprobe if uprobe_unregister() fails
  uprobes: Don't return success if alloc_uprobe() fails
  uprobes/x86: Only rep+nop can be emulated correctly
  uprobes: Simplify is_swbp_at_addr(), remove stale comments
  uprobes: Kill set_orig_insn()->is_swbp_at_addr()
  uprobes: Introduce copy_opcode(), kill read_opcode()
  uprobes: Kill set_swbp()->is_swbp_at_addr()
  uprobes: Restrict valid_vma(false) to skip VM_SHARED vmas
  uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC
  uprobes: Change write_opcode() to use FOLL_FORCE
  uprobes: Move clear_thread_flag(TIF_UPROBE) to uprobe_notify_resume()
  uprobes: Kill UTASK_BP_HIT state
  uprobes: Fix UPROBE_SKIP_SSTEP checks in handle_swbp()
  ...
2012-10-24 04:07:51 +03:00
Randy Dunlap
0390c88356 module_signing: fix printk format warning
Fix the warning:

  kernel/module_signing.c:195:2: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'size_t'

by using the proper 'z' modifier for printing a size_t.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-22 08:56:34 +03:00
Ingo Molnar
ef8ff74ed8 Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent
Pull ftrace ring-buffer resizing fix from Steve Rostedt.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-21 19:53:34 +02:00
Ingo Molnar
f38787f4f9 Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/urgent
Pull various uprobes bugfixes from Oleg Nesterov - mostly race and
failure path fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-10-21 18:18:17 +02:00
Kees Cook
31fd84b95e use clamp_t in UNAME26 fix
The min/max call needed to have explicit types on some architectures
(e.g. mn10300). Use clamp_t instead to avoid the warning:

  kernel/sys.c: In function 'override_release':
  kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks a cast [enabled by default]

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-19 18:51:17 -07:00
David Howells
caabe24057 MODSIGN: Move the magic string to the end of a module and eliminate the search
Emit the magic string that indicates a module has a signature after the
signature data instead of before it.  This allows module_sig_check() to
be made simpler and faster by the elimination of the search for the
magic string.  Instead we just need to do a single memcmp().

This works because at the end of the signature data there is the
fixed-length signature information block.  This block then falls
immediately prior to the magic number.

From the contents of the information block, it is trivial to calculate
the size of the signature data and thus the size of the actual module
data.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-19 17:30:40 -07:00
Cyrill Gorcunov
bbc2e3ef87 pidns: remove recursion from free_pid_ns()
free_pid_ns() operates in a recursive fashion:

free_pid_ns(parent)
  put_pid_ns(parent)
    kref_put(&ns->kref, free_pid_ns);
      free_pid_ns

thus if there was a huge nesting of namespaces the userspace may trigger
avalanche calling of free_pid_ns leading to kernel stack exhausting and a
panic eventually.

This patch turns the recursion into an iterative loop.

Based on a patch by Andrew Vagin.

[akpm@linux-foundation.org: export put_pid_ns() to modules]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-19 14:07:47 -07:00