Commit Graph

2070 Commits

Author SHA1 Message Date
Li Zefan
16620e0f19 ksym_tracer: Fix bad cast
Fix this warning:

kernel/trace/trace_ksym.c: In function 'ksym_trace_filter_read':
kernel/trace/trace_ksym.c:239: warning: cast to pointer from integer of different size

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: "K.Prasad" <prasad@linux.vnet.ibm.com>
LKML-Reference: <4B1DC578.9020909@cn.fujitsu.com>
[remove the strstrip fix as tglx already fixed that]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:46:54 +01:00
Li Zefan
472bbe02c9 tracing/power: Remove two exports
trace_power_start and trace_power_end are used in
arch/x86/kernel/power.c, and this file can't be compiled
as a module, so these two tracepoints don't need to be
exported.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B1DC55F.7060305@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:37:28 +01:00
Li Zefan
e00bf2ec60 tracing: Change event->profile_count to be int type
Like total_profile_count, struct ftrace_event_call::profile_count
is protected by event_mutex, so it doesn't need to be atomic_t.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <4B1DC549.5010705@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:37:28 +01:00
Li Zefan
8d18eaaff5 tracing: Simplify trace_option_write()
- remove duplicate code inside trace_options_write()
- extract duplicate code in trace_options_write() and set_tracer_option()

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B1DC532.9010802@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:37:28 +01:00
Li Zefan
2cbafd68b8 tracing: Remove useless trace option
Since commit 4d9493c90f
("ftrace: remove add-hoc code"), option "sched-tree"
has become useless.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B1DC50A.7040402@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:37:27 +01:00
Li Zefan
13f16d2091 tracing: Use seq file for trace_clock
The buffer for the output is as small as 64 bytes, so it'll
overflow if we add more clock type. Use seq file instead.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B1DC4FB.5030407@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:37:27 +01:00
Li Zefan
fdb372ed4c tracing: Use seq file for trace_options
Code simplification for reading trace_options.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-reference: <4B1DC4EF.3090106@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:37:27 +01:00
Li Zefan
91baf6285b function-graph: Allow writing the same val to set_graph_function
# echo 'do_open' > set_graph_function
 # echo 'do_open' >> set_graph_function
 bash: echo: write error: Invalid argument

Make it valid to write the same value to set_graph_function,
which is consistent with set_ftrace_filter interface.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-reference: <4B1DC4E1.1060303@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:37:26 +01:00
Li Zefan
313254a940 ftrace: Call trace_parser_clear() properly
I found a weird behavior:

  # echo 'fuse:*' > set_ftrace_filter
  bash: echo: write error: Invalid argument
  # cat set_ftrace_filter
  fuse_dev_fasync
  fuse_dev_poll
  fuse_copy_do

We should call trace_parser_clear() no matter ftrace_process_regex()
returns 0 or -errno, otherwise we will actually take the unaccepted
records from ftrace_regex_release().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B1DC4D2.3000406@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:37:26 +01:00
Li Zefan
311d16da57 ftrace: Return EINVAL when writing invalid val to set_ftrace_filter
Currently it doesn't warn user on invald value:

 # echo nonexist_symbol > set_ftrace_filter
or:
 # echo 'nonexist_symbol:mod:fuse' > set_ftrace_filter

Better make it return failure.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B1DC4BF.2070003@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:37:25 +01:00
Li Zefan
3b8e427381 tracing: Move a printk out of ftrace_raw_reg_event_foo()
Move the printk from each ftrace_raw_reg_event_foo() to
its caller ftrace_event_enable_disable(). This avoids each
regfunc trace event callbacks to handle a same error report
that can be carried from the caller.

See how much space this saves:

   text    data     bss     dec     hex filename
5345151 1961864 7103260 14410275         dbe223 vmlinux.o.old
5331487 1961864 7103260 14396611         dbacc3 vmlinux.o

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@redhat.com>
LKML-Reference: <4B1DC4AC.802@cn.fujitsu.com>
[start cmdline record before calling regfunc to avoid lost
window of pid to comm resolution]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:37:25 +01:00
Li Zefan
614a71a26b tracing: Pull up calls to trace_define_common_fields()
Call trace_define_common_fields() in event_create_dir() only.
This avoids trace events to handle it from their define_fields
callbacks and shrinks the kernel code size:

   text    data     bss     dec     hex filename
5346802 1961864 7103260 14411926         dbe896 vmlinux.o.old
5345151 1961864 7103260 14410275         dbe223 vmlinux.o

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
LKML-Reference: <4B1DC49C.8000107@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:34:23 +01:00
Li Zefan
87d9b4e1c5 tracing: Extract duplicate ftrace_raw_init_event_foo()
Use a generic trace_event_raw_init() function for all event's raw_init
callbacks (but kprobes) instead of defining the same version for each
of these.
This shrinks the kernel code:

   text    data     bss     dec     hex filename
5355293 1961928 7103260 14420481         dc0a01 vmlinux.o.old
5346802 1961864 7103260 14411926         dbe896 vmlinux.o

raw_init can't be removed, because ftrace events and kprobe events
use different raw_init callbacks. Though it's possible to totally
remove raw_init, I choose to leave it as it is for now.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <4B1DC48C.7080603@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-12-13 18:34:23 +01:00
Sam Ravnborg
273b281fa2 kbuild: move utsrelease.h to include/generated
Fix up all users of utsrelease.h

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2009-12-12 13:08:15 +01:00
Linus Torvalds
df7147b3c3 Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Remove comparing of NULL to va_list in trace_array_vprintk()
  tracing: Fix function graph trace_pipe to properly display failed entries
  tracing: Add full state to trace_seq
  tracing: Buffer the output of seq_file in case of filled buffer
  tracing: Only call pipe_close if pipe_close is defined
  tracing: Add pipe_close interface
2009-12-11 20:47:44 -08:00
Linus Torvalds
6f696eb17b Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (57 commits)
  x86, perf events: Check if we have APIC enabled
  perf_event: Fix variable initialization in other codepaths
  perf kmem: Fix unused argument build warning
  perf symbols: perf_header__read_build_ids() offset'n'size should be u64
  perf symbols: dsos__read_build_ids() should read both user and kernel buildids
  perf tools: Align long options which have no short forms
  perf kmem: Show usage if no option is specified
  sched: Mark sched_clock() as notrace
  perf sched: Add max delay time snapshot
  perf tools: Correct size given to memset
  perf_event: Fix perf_swevent_hrtimer() variable initialization
  perf sched: Fix for getting task's execution time
  tracing/kprobes: Fix field creation's bad error handling
  perf_event: Cleanup for cpu_clock_perf_event_update()
  perf_event: Allocate children's perf_event_ctxp at the right time
  perf_event: Clean up __perf_event_init_context()
  hw-breakpoints: Modify breakpoints without unregistering them
  perf probe: Update perf-probe document
  perf probe: Support --del option
  trace-kprobe: Support delete probe syntax
  ...
2009-12-11 20:47:30 -08:00
Steven Rostedt
cc51a0fca6 tracing: Add stack trace to irqsoff tracer
The irqsoff and friends tracers help in finding causes of latency in the
kernel. The also work with the function tracer to show what was happening
when interrupts or preemption are disabled. But the function tracer has
a bit of an overhead and can cause exagerated readings.

Currently, when tracing with /proc/sys/kernel/ftrace_enabled = 0, where the
function tracer is disabled, the information that is provided can end up
being useless. For example, a 2 and a half millisecond latency only showed:

 # tracer: preemptirqsoff
 #
 # preemptirqsoff latency trace v1.1.5 on 2.6.32
 # --------------------------------------------------------------------
 # latency: 2463 us, #4/4, CPU#2 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
 #    -----------------
 #    | task: -4242 (uid:0 nice:0 policy:0 rt_prio:0)
 #    -----------------
 #  => started at: _spin_lock_irqsave
 #  => ended at:   remove_wait_queue
 #
 #
 #                  _------=> CPU#
 #                 / _-----=> irqs-off
 #                | / _----=> need-resched
 #                || / _---=> hardirq/softirq
 #                ||| / _--=> preempt-depth
 #                |||| /_--=> lock-depth
 #                |||||/     delay
 #  cmd     pid   |||||| time  |   caller
 #     \   /      ||||||   \   |   /
 hackbenc-4242    2d....    0us!: trace_hardirqs_off <-_spin_lock_irqsave
 hackbenc-4242    2...1. 2463us+: _spin_unlock_irqrestore <-remove_wait_queue
 hackbenc-4242    2...1. 2466us : trace_preempt_on <-remove_wait_queue

The above lets us know that hackbench with pid 2463 grabbed a spin lock
somewhere and enabled preemption at remove_wait_queue. This helps a little
but where this actually happened is not informative.

This patch adds the stack dump to the end of the irqsoff tracer. This provides
the following output:

 hackbenc-4242    2d....    0us!: trace_hardirqs_off <-_spin_lock_irqsave
 hackbenc-4242    2...1. 2463us+: _spin_unlock_irqrestore <-remove_wait_queue
 hackbenc-4242    2...1. 2466us : trace_preempt_on <-remove_wait_queue
 hackbenc-4242    2...1. 2467us : <stack trace>
  => sub_preempt_count
  => _spin_unlock_irqrestore
  => remove_wait_queue
  => free_poll_entry
  => poll_freewait
  => do_sys_poll
  => sys_poll
  => system_call_fastpath

Now we see that the culprit of this latency was the free_poll_entry code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-11 13:19:51 -05:00
Steven Rostedt
03889384ce tracing: Add trace_dump_stack()
I've been asked a few times about how to find out what is calling
some location in the kernel. One way is to use dynamic function tracing
and implement the func_stack_trace. But this only finds out who is
calling a particular function. It does not tell you who is calling
that function and entering a specific if conditional.

I have myself implemented a quick version of trace_dump_stack() for
this purpose a few times, and just needed it now. This is when I realized
that this would be a good tool to have in the kernel like trace_printk().

Using trace_dump_stack() is similar to dump_stack() except that it
writes to the trace buffer instead and can be used in critical locations.

For example:

@@ -5485,8 +5485,12 @@ need_resched_nonpreemptible:
 	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
 		if (unlikely(signal_pending_state(prev->state, prev)))
 			prev->state = TASK_RUNNING;
-		else
+		else {
 			deactivate_task(rq, prev, 1);
+			trace_printk("Deactivating task %s:%d\n",
+				     prev->comm, prev->pid);
+			trace_dump_stack();
+		}
 		switch_count = &prev->nvcsw;
 	}

Produces:

           <...>-3249  [001]   296.105269: schedule: Deactivating task ntpd:3249
           <...>-3249  [001]   296.105270: <stack trace>
 => schedule
 => schedule_hrtimeout_range
 => poll_schedule_timeout
 => do_select
 => core_sys_select
 => sys_select
 => system_call_fastpath

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-11 10:38:47 -05:00
Steven Rostedt
dd7f594357 ring-buffer: Move resize integrity check under reader lock
While using an application that does splice on the ftrace ring
buffer at start up, I triggered an integrity check failure.

Looking into this, I discovered that resizing the buffer performs
an integrity check after the buffer is resized. This check unfortunately
is preformed after it releases the reader lock. If a reader is
reading the buffer it may cause the integrity check to trigger a
false failure.

This patch simply moves the integrity checker under the protection
of the ring buffer reader lock.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-10 23:20:52 -05:00
Steven Rostedt
184210154b ring-buffer: Use sync sched protection on ring buffer resizing
There was a comment in the ring buffer code that says the calling
layers should prevent tracing or reading of the ring buffer while
resizing. I have discovered that the tracers do not honor this
arrangement.

This patch moves the disabling and synchronizing the ring buffer to
a higher layer during resizing. This guarantees that no writes
are occurring while the resize takes place.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-10 22:54:27 -05:00
Thomas Gleixner
d954fbf0ff tracing: Fix wrong usage of strstrip in trace_ksyms
strstrip returns a pointer to the first non space character, but the
code in parse_ksym_trace_str() ignores that.

strstrip is now must_check and therefor we get the correct warning:
kernel/trace/trace_ksym.c:294: warning:
ignoring return value of ‘strstrip’, declared with attribute warn_unused_result

We are really not interested in leading whitespace here.

Fix that and cleanup the dozen kfree() exit pathes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2009-12-11 00:01:36 +01:00
Carsten Emde
f2942487ff tracing: Remove comparing of NULL to va_list in trace_array_vprintk()
Olof Johansson stated the following:

  Comparing a va_list with NULL is bogus. It's supposed to be treated like
  an opaque type and only be manipulated with va_* accessors.

Olof noticed that this code broke the ARM builds:

    kernel/trace/trace.c: In function 'trace_array_vprintk':
    kernel/trace/trace.c:1364: error: invalid operands to binary == (have 'va_list' and 'void *')
    kernel/trace/trace.c: In function 'tracing_mark_write':
    kernel/trace/trace.c:3349: error: incompatible type for argument 3 of 'trace_vprintk'

This patch partly reverts c13d2f7c32 and
re-installs the original mark_printk() mechanism.

Reported-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
LKML-Reference: <4B1BAB74.104@osadl.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-09 14:20:08 -05:00
Jiri Olsa
be1eca3931 tracing: Fix function graph trace_pipe to properly display failed entries
There is a case where the graph tracer might get confused and omits
displaying of a single record.  This applies mostly with the trace_pipe
since it is unlikely that the trace_seq buffer will overflow with the
trace file.

As the function_graph tracer goes through the trace entries keeping a
pointer to the current record:

current ->  func1 ENTRY
            func2 ENTRY
            func2 RETURN
            func1 RETURN

When an function ENTRY is encountered, it moves the pointer to the
next entry to check if the function is a nested or leaf function.

            func1 ENTRY
current ->  func2 ENTRY
            func2 RETURN
            func1 RETURN

If the rest of the writing of the function fills the trace_seq buffer,
then the trace_pipe read will ignore this entry. The next read will
Now start at the current location, but the first entry (func1) will
be discarded.

This patch keeps a copy of the current entry in the iterator private
storage and will keep track of when the trace_seq buffer fills. When
the trace_seq buffer fills, it will reuse the copy of the entry in the
next iteration.

[
  This patch has been largely modified by Steven Rostedt in order to
  clean it up and simplify it. The original idea and concept was from
  Jirka and for that, this patch will go under his name to give him
  the credit he deserves. But because this was modify by Steven Rostedt
  anything wrong with the patch should be blamed on Steven.
]

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1259067458-27143-1-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-09 14:09:06 -05:00
Johannes Berg
d184b31c0e tracing: Add full state to trace_seq
The trace_seq buffer might fill up, and right now one needs to check the
return value of each printf into the buffer to check for that.

Instead, have the buffer keep track of whether it is full or not, and
reject more input if it is full or would have overflowed with an input
that wasn't added.

Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-09 14:05:49 -05:00
Steven Rostedt
a63ce5b306 tracing: Buffer the output of seq_file in case of filled buffer
If the seq_read fills the buffer it will call s_start again on the next
itertation with the same position. This causes a problem with the
function_graph tracer because it consumes the iteration in order to
determine leaf functions.

What happens is that the iterator stores the entry, and the function
graph plugin will look at the next entry. If that next entry is a return
of the same function and task, then the function is a leaf and the
function_graph plugin calls ring_buffer_read which moves the ring buffer
iterator forward (the trace iterator still points to the function start
entry).

The copying of the trace_seq to the seq_file buffer will fail if the
seq_file buffer is full. The seq_read will not show this entry.
The next read by userspace will cause seq_read to again call s_start
which will reuse the trace iterator entry (the function start entry).
But the function return entry was already consumed. The function graph
plugin will think that this entry is a nested function and not a leaf.

To solve this, the trace code now checks the return status of the
seq_printf (trace_print_seq). If the writing to the seq_file buffer
fails, we set a flag in the iterator (leftover) and we do not reset
the trace_seq buffer. On the next call to s_start, we check the leftover
flag, and if it is set, we just reuse the trace_seq buffer and do not
call into the plugin print functions.

Before this patch:

 2)               |      fput() {
 2)               |        __fput() {
 2)   0.550 us    |          inotify_inode_queue_event();
 2)               |          __fsnotify_parent() {
 2)   0.540 us    |          inotify_dentry_parent_queue_event();

After the patch:

 2)               |      fput() {
 2)               |        __fput() {
 2)   0.550 us    |          inotify_inode_queue_event();
 2)   0.548 us    |          __fsnotify_parent();
 2)   0.540 us    |          inotify_dentry_parent_queue_event();

[
  Updated the patch to fix a missing return 0 from the trace_print_seq()
  stub when CONFIG_TRACING is disabled.

  Reported-by: Ingo Molnar <mingo@elte.hu>
]

Reported-by: Jiri Olsa <jolsa@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-12-09 13:55:26 -05:00