trace_printk can be called from any context, including NMIs.
If this happens, then we must test for for recursion before
grabbing any spinlocks.
This patch prevents trace_printk from being called recursively.
[ Impact: prevent hard lockup in lockdep event tracer ]
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Not all the necessary symbols were exported to allow for tracing
by modules. This patch adds them in.
[ Impact: allow modules to commit data to the ring buffer ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Export the cached task comms to userspace. This allows user apps to translate
the pids from a trace into their respective task command lines.
[ Impact: let userspace apps reading binary buffer know comm's of pids ]
Signed-off-by: Avadh Patel <avadh4all@gmail.com>
[ added error checking and use of buf pointer to index file_buf ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Impact: let modules add trace events
The trace event code requires some functions to be exported to allow
modules to use TRACE_EVENT. This patch adds EXPORT_SYMBOL_GPL to the
necessary functions.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This patch changes filter_check_discard() to make use of the new
ring_buffer_discard_commit() function and modifies the current users to
call the old commit function in the non-discard case.
It also introduces a version of filter_check_discard() that uses the
global trace buffer (filter_current_check_discard()) for those cases.
v2 changes:
- fix compile error noticed by Ingo Molnar
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1239178554.10295.36.camel@tropicana>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The ring_buffer_discard_commit makes better usage of the ring_buffer
when an event has been discarded. It tries to remove it completely if
possible.
This patch converts the trace event filtering to use
ring_buffer_discard_commit instead of the ring_buffer_event_discard.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Frederic Weisbecker suggested that the trace_special event shouldn't be
filterable; this patch adds a TRACE_EVENT_FORMAT_NOFILTER event macro
that allows an event format to be exported without having a filter
attached, and removes filtering from the trace_special event.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds run-time field descriptions to all the event formats
exported using TRACE_EVENT_FORMAT. It also hooks up all the tracers
that use them (i.e. the tracers in the 'ftrace subsystem') so they can
also have their output filtered by the event-filtering mechanism.
When I was testing this, there were a couple of things that fooled me
into thinking the filters weren't working, when actually they were -
I'll mention them here so others don't make the same mistakes (and file
bug reports. ;-)
One is that some of the tracers trace multiple events e.g. the
sched_switch tracer uses the context_switch and wakeup events, and if
you don't set filters on all of the traced events, the unfiltered output
from the events without filters on them can make it look like the
filtering as a whole isn't working properly, when actually it is doing
what it was asked to do - it just wasn't asked to do the right thing.
The other is that for the really high-volume tracers e.g. the function
tracer, the volume of filtered events can be so high that it pushes the
unfiltered events out of the ring buffer before they can be read so e.g.
cat'ing the trace file repeatedly shows either no output, or once in
awhile some output but that isn't there the next time you read the
trace, which isn't what you normally expect when reading the trace file.
If you read from the trace_pipe file though, you can catch them before
they disappear.
Changes from v1:
As suggested by Frederic Weisbecker:
- get rid of externs in functions
- added unlikely() to filter_check_discard()
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge reason: pick up both v2.6.30-rc1 [which includes tracing/urgent fixes]
and pick up the current lineup of tracing/urgent fixes as well
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I got these from strace:
splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 16384
splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
I wanted to splice_read 4096 bytes, but it returns 8192 or larger.
It is because the return value of tracing_buffers_splice_read()
does not include "zero out any left over data" bytes.
But tracing_buffers_read() includes these bytes, we make them
consistent.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D46674.9030804@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Cleanup
These two lines:
if (unlikely(*ppos))
return -ESPIPE;
in tracing_buffers_splice_read() are not needed, VFS layer
has disabled seek(2).
We remove these two lines, and then we can update file->f_pos.
And tracing_buffers_read() updates file->f_pos, this fix
make tracing_buffers_splice_read() updates file->f_pos too.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D46670.4010503@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: disable pread()
We set tracing_buffers_fops.llseek to no_llseek,
but we can still perform pread() to read this file.
That is not expected.
This fix uses nonseekable_open() to disable it.
tracing_buffers_fops.llseek is still set to no_llseek,
it mark this file is a "non-seekable device" and is used by
sys_splice(). See also do_splice() or manual of splice(2):
ERRORS
EINVAL Target file system doesn't support splicing;
neither of the descriptors refers to a pipe;
or offset given for non-seekable device.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D46668.8030806@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix a crash while cat trace file
Currently we are using a cpumask to remind each cpu where a
trace occured. It lets us notice the user that a cpu just had
its first trace.
But on latest -tip we have the following crash once we cat the trace
file:
IP: [<c0270c4a>] print_trace_fmt+0x45/0xe7
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/class/net/eth0/carrier
Pid: 3897, comm: cat Not tainted (2.6.29-tip-02825-g0f22972-dirty #81)
EIP: 0060:[<c0270c4a>] EFLAGS: 00010297 CPU: 0
EIP is at print_trace_fmt+0x45/0xe7
EAX: 00000000 EBX: 00000000 ECX: c12d9e98 EDX: ccdb7010
ESI: d31f4000 EDI: 00322401 EBP: d31f3f10 ESP: d31f3efc
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process cat (pid: 3897, ti=d31f2000 task=d3b3cf20 task.ti=d31f2000)
Stack:
d31f4080 ccdb7010 d31f4000 d691fe70 ccdb7010 d31f3f24 c0270e5c d31f4000
d691fe70 d31f4000 d31f3f34 c02718e8 c12d9e98 d691fe70 d31f3f70 c02bfc33
00001000 09130000 d3b46e00 d691fe98 00000000 00000079 00000001 00000000
Call Trace:
[<c0270e5c>] ? print_trace_line+0x170/0x17c
[<c02718e8>] ? s_show+0xa7/0xbd
[<c02bfc33>] ? seq_read+0x24a/0x327
[<c02bf9e9>] ? seq_read+0x0/0x327
[<c02ab18b>] ? vfs_read+0x86/0xe1
[<c02ab289>] ? sys_read+0x40/0x65
[<c0202d8f>] ? sysenter_do_call+0x12/0x3c
Code: 00 00 00 89 45 ec f7 c7 00 20 00 00 89 55 f0 74 4e f6 86 98 10 00 00 02 74 45 8b 86 8c 10 00 00 8b 9e a8 10 00 00 e8 52 f3 ff ff <0f> a3 03 19 c0 85 c0 75 2b 8b 86 8c 10 00 00 8b 9e a8 10 00 00
EIP: [<c0270c4a>] print_trace_fmt+0x45/0xe7 SS:ESP 0068:d31f3efc
CR2: 0000000000000000
---[ end trace aa9cf38e5ebed9dd ]---
This is because we alloc the iter->started cpumask on tracing_pipe_open but
not on tracing_open.
It hadn't been noticed until now because we need to have ring buffer overruns
to activate the starting of cpu buffer detection.
Also, we need a check to not print the messagge for the first trace on the file.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1238619188-6109-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Building a kernel with tracing can raise the following warning on
tip/master:
kernel/trace/trace.c:1249: error: implicit declaration of function 'vbin_printf'
We are missing an include to string.h
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1238160130-7437-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix time output bug in 32bits system
ns2usecs() returns 'long', it's incorrect.
(In i386)
...
<idle>-0 [000] 521.442100: _spin_lock <-tick_do_update_jiffies64
<idle>-0 [000] 521.442101: do_timer <-tick_do_update_jiffies64
<idle>-0 [000] 521.442102: update_wall_time <-do_timer
<idle>-0 [000] 521.442102: update_xtime_cache <-update_wall_time
....
(It always print the time less than 2200 seconds besides ...)
Because 'long' is 32bits in i386. ( (1<<31) useconds is about 2200 seconds)
...
<idle>-0 [001] 4154502640.134759: rcu_bh_qsctr_inc <-__do_softirq
<idle>-0 [001] 4154502640.134760: _local_bh_enable <-__do_softirq
<idle>-0 [001] 4154502640.134761: idle_cpu <-irq_exit
...
(very large value)
Because 'long' is a signed type and it is 32bits in i386.
Changes in v2:
return 'unsigned long long' instead of 'cycle_t'
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <49D05D10.4030009@cn.fujitsu.com>
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
graph time is the time that a function is executing another function.
Thus if function A calls B, if graph-time is set, then the time for
A includes B. This is the default behavior. But if graph-time is off,
then the time spent executing B is subtracted from A.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
If the function graph trace is enabled, the function profiler will
use it to take the timing of the functions.
cat /debug/tracing/trace_stat/functions
Function Hit Time
-------- --- ----
mwait_idle 127 183028.4 us
schedule 26 151997.7 us
__schedule 31 151975.1 us
sys_wait4 2 74080.53 us
do_wait 2 74077.80 us
sys_newlstat 138 39929.16 us
do_path_lookup 179 39845.79 us
vfs_lstat_fd 138 39761.97 us
user_path_at 153 39469.58 us
path_walk 179 39435.76 us
__link_path_walk 189 39143.73 us
[...]
Note the times are skewed due to the function graph tracer not taking
into account schedules.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: give user a choice to show times spent while sleeping
The user may want to see the time a function spent sleeping.
This patch adds the trace option "sleep-time" to allow that.
The "sleep-time" option is default on.
echo sleep-time > /debug/tracing/trace_options
produces:
------------------------------------------
2) avahi-d-3428 => <idle>-0
------------------------------------------
2) | finish_task_switch() {
2) 0.621 us | _spin_unlock_irq();
2) 2.202 us | }
2) ! 1002.197 us | }
2) ! 1003.521 us | }
where as,
echo nosleep-time > /debug/tracing/trace_options
produces:
0) <idle>-0 => yum-upd-3416
------------------------------------------
0) | finish_task_switch() {
0) 0.643 us | _spin_unlock_irq();
0) 2.342 us | }
0) + 41.302 us | }
0) + 42.453 us | }
Signed-off-by: Steven Rostedt <srostedt@redhat.com>