Commit Graph

49 Commits

Author SHA1 Message Date
Johannes Berg
d184b31c0e tracing: Add full state to trace_seq
The trace_seq buffer might fill up, and right now one needs to check the
return value of each printf into the buffer to check for that.

Instead, have the buffer keep track of whether it is full or not, and
reject more input if it is full or would have overflowed with an input
that wasn't added.

Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-09 14:05:49 -05:00
Steven Rostedt
a63ce5b306 tracing: Buffer the output of seq_file in case of filled buffer
If the seq_read fills the buffer it will call s_start again on the next
itertation with the same position. This causes a problem with the
function_graph tracer because it consumes the iteration in order to
determine leaf functions.

What happens is that the iterator stores the entry, and the function
graph plugin will look at the next entry. If that next entry is a return
of the same function and task, then the function is a leaf and the
function_graph plugin calls ring_buffer_read which moves the ring buffer
iterator forward (the trace iterator still points to the function start
entry).

The copying of the trace_seq to the seq_file buffer will fail if the
seq_file buffer is full. The seq_read will not show this entry.
The next read by userspace will cause seq_read to again call s_start
which will reuse the trace iterator entry (the function start entry).
But the function return entry was already consumed. The function graph
plugin will think that this entry is a nested function and not a leaf.

To solve this, the trace code now checks the return status of the
seq_printf (trace_print_seq). If the writing to the seq_file buffer
fails, we set a flag in the iterator (leftover) and we do not reset
the trace_seq buffer. On the next call to s_start, we check the leftover
flag, and if it is set, we just reuse the trace_seq buffer and do not
call into the plugin print functions.

Before this patch:

 2)               |      fput() {
 2)               |        __fput() {
 2)   0.550 us    |          inotify_inode_queue_event();
 2)               |          __fsnotify_parent() {
 2)   0.540 us    |          inotify_dentry_parent_queue_event();

After the patch:

 2)               |      fput() {
 2)               |        __fput() {
 2)   0.550 us    |          inotify_inode_queue_event();
 2)   0.548 us    |          __fsnotify_parent();
 2)   0.540 us    |          inotify_dentry_parent_queue_event();

[
  Updated the patch to fix a missing return 0 from the trace_print_seq()
  stub when CONFIG_TRACING is disabled.

  Reported-by: Ingo Molnar <mingo@elte.hu>
]

Reported-by: Jiri Olsa <jolsa@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-09 13:55:26 -05:00
Jiri Olsa
3e69533b51 tracing: Fix trace_seq_printf() return value
trace_seq_printf() return value is a little ambiguous. It
currently returns the length of the space available in the
buffer. printf usually returns the amount written. This is not
adequate here, because:

  trace_seq_printf(s, "");

is perfectly legal, and returning 0 would indicate that it
failed.

We can always see the amount written by looking at the before
and after values of s->len. This is not quite the same use as
printf. We only care if the string was successfully written to
the buffer or not.

Make trace_seq_printf() return 0 if the trace oversizes the
buffer's free space, 1 otherwise.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.631787612@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-24 11:07:50 +02:00
Steven Rostedt
829b876dfc tracing: fix transposed numbers of lock_depth and preempt_count
The lock_depth and preempt_count numbers in the latency format is
transposed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-10-07 14:05:04 -04:00
Hiroshi Shimamoto
b0f56f1a63 trace: Fix missing assignment in trace_ctxwake_*
The state char variable S should be reassigned, if S == 0.

We are missing the state of the task that is going to sleep for the
context switch events (in the raw mode).

Fortunately the problem arises with the sched_switch/wake_up
tracers, not the sched trace events.

The formers are legacy now. But still, that was buggy.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4AC43118.6050409@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-06 14:28:24 +02:00
Steven Rostedt
f81c972d27 tracing: consolidate code between trace_output.c and trace_function_graph.c
Both trace_output.c and trace_function_graph.c do basically the same
thing to handle the printing of the latency-format. This patch moves
the code into one function that both can use.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-11 14:24:13 -04:00
Steven Rostedt
637e7e8641 tracing: add lock depth to entries
This patch adds the lock depth of the big kernel lock to the generic
entry header. This way we can see the depth of the lock and help
in removing the BKL.

Example:

 #                  _------=> CPU#
 #                 / _-----=> irqs-off
 #                | / _----=> need-resched
 #                || / _---=> hardirq/softirq
 #                ||| / _--=> preempt-depth
 #                |||| /_--=> lock-depth
 #                |||||/     delay
 #  cmd     pid   |||||| time  |   caller
 #     \   /      ||||||   \   |   /
   <idle>-0       2.N..3 5902255250us+: lock_acquire: read rcu_read_lock
   <idle>-0       2.N..3 5902255253us+: lock_release: rcu_read_lock
   <idle>-0       2dN..3 5902255257us+: lock_acquire: xtime_lock
   <idle>-0       2dN..4 5902255259us : lock_acquire: clocksource_lock
   <idle>-0       2dN..4 5902255261us+: lock_release: clocksource_lock

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-11 13:55:35 -04:00
Steven Rostedt
48659d3119 tracing: move tgid out of generic entry and into userstack
The userstack trace required the recording of the tgid entry.
Unfortunately, it was added to the generic entry where it wasted
4 bytes of every entry and was only used by one entry.

This patch moves it out of the generic field and moves it into the
only user (userstack_entry).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-11 11:36:23 -04:00
Xiao Guangrong
e1af3aec3e tracing: Fix trace_print_seq()
We will lose something if trace_seq->buffer[0] is 0, because the copy length
is calculated by strlen() in seq_puts(), so using seq_write() instead of
seq_puts().

There have a example:
after reboot:

 # echo kmemtrace > current_tracer
 # echo 0 > options/kmem_minimalistic
 # cat trace
 # tracer: kmemtrace
 #
 #

Nothing is exported, because the first byte of trace_seq->buffer[ ]
is KMEMTRACE_USER_ALLOC.

( the value of KMEMTRACE_USER_ALLOC is zero, seeing
  kmemtrace_print_alloc_user() in kernel/trace/kmemtrace.c)

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4A4B2351.5010300@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 08:51:13 +02:00
Steven Rostedt
110bf2b764 tracing: add protection around module events unload
When reading the trace buffer, there is a race that when a module
is unloaded it removes events that is stilled referenced in the buffers.
This patch adds the protection around the unloading of the events
from modules and the reading of the trace buffers.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-09 17:29:07 -04:00
Steven Rostedt
725c624a58 tracing: add trace_seq_vprint interface
The code to update the print formats for events requires a vprintf
format in the trace_seq. This patch adds that interface.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-09 15:17:32 -04:00
Steven Rostedt
563af16c30 tracing: add annotation to what type of stack trace is recorded
The current method of printing out a stack trace is to add a new line
and print out the trace:

    yum-updatesd-3120  [002]   573.691303:
 => do_softirq
 => irq_exit
 => smp_apic_timer_interrupt
 => apic_timer_interrupt

This looks a bit awkward, and if we have both stack and user stack traces
running, it would be nice to have a title to tell them apart, although
it is easy to tell by the output.

This patch adds an annotation to the start of the stack traces:

            init-1     [003]   929.304979: <stack trace>
 => user_path_at
 => vfs_fstatat
 => vfs_stat
 => sys_newstat
 => system_call_fastpath

             cat-3459  [002]  1016.824040: <user stack trace>
 =>  <0000003aae6c0250>
 =>  <00007ffff4b06ae4>
 =>  <69636172742f6775>

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-03 11:10:44 -04:00
Steven Whitehouse
56d8bd3f0b tracing: fix multiple use of __print_flags and __print_symbolic
Here is an updated patch to include the extra call to
trace_seq_init() as requested. This is vs. the latest
-tip tree and fixes the use of multiple __print_flags
and __print_symbolic in a single tracer. Also tested
to ensure its working now:

mount.gfs2-2534  [000]   235.850587: gfs2_glock_queue: 8.7 glock 1:2 dequeue PR
mount.gfs2-2534  [000]   235.850591: gfs2_demote_rq: 8.7 glock 1:0 demote EX to NL flags:DI
mount.gfs2-2534  [000]   235.850591: gfs2_glock_queue: 8.7 glock 1:0 dequeue EX
glock_workqueue-2529  [000]   235.850666: gfs2_glock_state_change: 8.7 glock 1:0 state EX => NL tgt:NL dmt:NL flags:lDpI
glock_workqueue-2529  [000]   235.850672: gfs2_glock_put: 8.7 glock 1:0 state NL => IV flags:I

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
LKML-Reference: <1244037123.29604.603.camel@localhost.localdomain>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-03 10:29:48 -04:00
walimis
048dc50c5e tracing/events: fix output format of user stack
According to "events/ftrace/user_stack/format", fix the output of
user stack.

before fix:

  sh-1073  [000]    31.137561:  <b7f274fe> <-  <0804e33c> <-  <080835c1>

after fix:

  sh-1072  [000]    37.039329:
 =>  <b7f8a4fe>
 =>  <0804e33c>
 =>  <080835c1>

Signed-off-by: walimis <walimisdev@gmail.com>
LKML-Reference: <1244016090-7814-3-git-send-email-walimisdev@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-03 10:25:30 -04:00
walimis
f11b3f4e29 tracing/events: fix output format of kernel stack
According to "events/ftrace/kernel_stack/format", output format of
kernel stack should use "=>" instead of "<=".

The second problem is that we shouldn't skip the first entry in the stack,
although it seems to be duplicated when used in the "function" tracer,
but events also use it. If we skip the first one, we will drop the topmost
entry of the stack.

The last problem is that if the last entry is ULONG_MAX(0xffffffff), we should
drop it, otherwise it will print a NULL name line.

before fix:

      sh-1072  [000]   26.957239: sched_process_fork: parent sh:1072 child sh:1073
      sh-1072  [000]   26.957262:
 <= syscall_call
 <=
      sh-1072  [000]   26.957744: sched_switch: task sh:1072 [120] (R) ==> sh:1073 [120]
      sh-1072  [000]   26.957752:
 <= preempt_schedule
 <= wake_up_new_task
 <= do_fork
 <= sys_clone
 <= syscall_call
 <=

After fix:

      sh-1075  [000]    39.791848: sched_process_fork: parent sh:1075  child sh:1076
      sh-1075  [000]    39.791871:
 => sys_clone
 => syscall_call
      sh-1075  [000]    39.792713: sched_switch: task sh:1075 [120] (R) ==> sh:1076 [120]
      sh-1075  [000]    39.792722:
 => schedule
 => preempt_schedule
 => wake_up_new_task
 => do_fork
 => sys_clone
 => syscall_call

Signed-off-by: walimis <walimisdev@gmail.com>
LKML-Reference: <1244016090-7814-2-git-send-email-walimisdev@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-03 10:25:15 -04:00
Steven Whitehouse
ec081ddc3d tracing: add exports to use __print_symbolic and __print_flags from a module
A patch to allow the use of __print_symbolic and __print_flags
from a module. This allows the current GFS2 tracing patch to
build.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
LKML-Reference: <1243868015.29604.542.camel@localhost.localdomain>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-01 23:25:29 -04:00
Steven Rostedt
0f4fc29dd6 tracing: add __print_symbolic to trace events
This patch adds __print_symbolic which is similar to __print_flags but
works for an enumeration type instead. That is, there is only a one to one
mapping between the values and the symbols. When a match is made, then
it is printed, otherwise the hex value is outputed.

[ Impact: add interface for showing symbol names in events ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-26 20:31:50 +02:00
Steven Rostedt
be74b73a57 tracing: add __print_flags for events
Developers have been asking for the ability in the ftrace event tracer
to display names of bits in a flags variable.

Instead of printing out c2, it would be easier to read FOO|BAR|GOO,
assuming that FOO is bit 1, BAR is bit 6 and GOO is bit 7.

Some examples where this would be useful are the state flags in a context
switch, kmalloc flags, and even permision flags in accessing files.

[
  v2 changes include:

  Frederic Weisbecker's idea of using a mask instead of bits,
  thus we can output GFP_KERNEL instead of GPF_WAIT|GFP_IO|GFP_FS.

  Li Zefan's idea of allowing the caller of __print_flags to add their
  own delimiter (or no delimiter) where we can get for file permissions
  rwx instead of r|w|x.
]

[
  v3 changes:

   Christoph Hellwig's idea of using an array instead of va_args.
]

[ Impact: better displaying of flags in trace output ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-26 20:25:22 +02:00
Lai Jiangshan
4f5359685a tracing: add trace_event_read_lock()
I found that there is nothing to protect event_hash in
ftrace_find_event(). Rcu protects the event hashlist
but not the event itself while we use it after its extraction
through ftrace_find_event().

This lack of a proper locking in this spot opens a race
window between any event dereferencing and module removal.

Eg:

--Task A--

print_trace_line(trace) {
  event = find_ftrace_event(trace)

--Task B--

trace_module_remove_events(mod) {
  list_trace_events_module(ev, mod) {
    unregister_ftrace_event(ev->event) {
      hlist_del(ev->event->node)
        list_del(....)
    }
  }
}
|--> module removed, the event has been dropped

--Task A--

  event->print(trace); // Dereferencing freed memory

If the event retrieved belongs to a module and this module
is concurrently removed, we may end up dereferencing a data
from a freed module.

RCU could solve this, but it would add latency to the kernel and
forbid tracers output callbacks to call any sleepable code.
So this fix converts 'trace_event_mutex' to a read/write semaphore,
and adds trace_event_read_lock() to protect ftrace_find_event().

[ Impact: fix possible freed memory dereference in ftrace ]

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4A114806.7090302@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-25 23:53:41 +02:00
Steven Rostedt
1ec7c4849c tracing: stop stack trace on first empty entry
The stack tracer stores eight entries in the ring buffer when an event
traces the stack. The output outputs all eight entries regardless of
how many entries were recorded.

This patch breaks out of the loop when a null entry is discovered.

[ Impact: only print the stack that is recorded ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-14 23:40:06 -04:00
Jaswinder Singh Rajput
48dd0fed90 tracing: trace_output.c, fix false positive compiler warning
This compiler warning:

  CC      kernel/trace/trace_output.o
 kernel/trace/trace_output.c: In function ‘register_ftrace_event’:
 kernel/trace/trace_output.c:544: warning: ‘list’ may be used uninitialized in this function

Is wrong as 'list' is always initialized - but GCC (4.3.2) does not
recognize this relationship properly.

Work around the warning by initializing the variable to NULL.

[ Impact: fix false positive compiler warning ]

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-06 14:19:16 +02:00
Steven Rostedt
060fa5c83e tracing/events: reuse trace event ids after overflow
With modules being able to add trace events, and the max trace event
counter is 16 bits (65536) we can overflow the counter easily
with a simple while loop adding and removing modules that contain
trace events.

This patch links together the registered trace events and on overflow
searches for available trace event ids. It will still fail if
over 65536 events are registered, but considering that a typical
kernel only has 22000 functions, 65000 events should be sufficient.

Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-24 23:06:00 -04:00
Steven Rostedt
89ec0dee9e tracing: increase size of number of possible events
With the new event tracing registration, we must increase the number
of events that can be registered. Currently the type field is only
one byte, which leaves us only 256 possible events.

Since we do not save the CPU number in the tracer anymore (it is determined
by the per cpu ring buffer that is used) we have an extra byte to use.

This patch increases the size of type from 1 byte (256 events) to
2 bytes (65,536 events).

It also adds a WARN_ON_ONCE if we exceed that limit.

[ Impact: allow more than 255 events ]

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-04-23 23:03:19 -04:00
Steven Rostedt
17c873ec28 tracing/events: add export symbols for trace events in modules
Impact: let modules add trace events

The trace event code requires some functions to be exported to allow
modules to use TRACE_EVENT. This patch adds EXPORT_SYMBOL_GPL to the
necessary functions.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-04-14 12:58:01 -04:00
Ingo Molnar
86665c75da Merge branch 'tracing/urgent' into tracing/ftrace 2009-04-07 14:41:17 +02:00