Impact: prevent possible memory leak
The reader page of the ring buffer is special. Although it points
into the ring buffer, it is not part of the actual buffer. It is
a page used by the reader to swap with a page in the ring buffer.
Once the swap is made, the new reader page is again outside the
buffer.
Even though the reader page points into the buffer, it is really
pointing to residual data. Note, this data is used by the reader.
reader page
|
v
(prev) +---+ (next)
+----------| |----------+
| +---+ |
v v
+---+ +---+ +---+
-->| |------->| |------->| |--->
<--| |<-------| |<-------| |<---
+---+ +---+ +---+
^ ^ ^
\ | /
------- Buffer---------
If we perform a list_del_init() on the reader page we will actually remove
the last page the reader swapped with and not the reader page itself.
This will cause that page to not be freed, and thus is a memory leak.
Luckily, the only user of the ring buffer so far is ftrace. And ftrace
will not free its ring buffer after it allocates it. There is no current
possible memory leak. But once there are other users, or if ftrace
dynamically creates and frees its ring buffer, then this would be a
memory leak.
This patch fixes the leak for future cases.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix to permanent disabling of function graph tracer
There should be nothing to prevent a tracer from unregistering a
function graph callback more than once. This can simplify error paths.
But currently, the counter does not account for mulitple unregistering
of the function graph callback. If it happens, the function graph
tracer will be permanently disabled.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Many declarations within trace_output.h are missing the 'extern' keyword
in an inconsistent manner. This adds 'extern' where it should be.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
trace_seq_reserve() allows a caller to reserve space in a trace_seq and
write directly into it. This makes it easier to export binary data to
userspace via the tracing interface, by simply filling in a struct.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
blk_trace_event_print() and blk_tracer_print_line() share most of the code.
text data bss dec hex filename
8605 393 12 9010 2332 kernel/trace/blktrace.o.orig
text data bss dec hex filename
8555 393 12 8960 2300 kernel/trace/blktrace.o
This patch also prepares for the next patch, that prints out BLK_TN_MESSAGE.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix mixed ioctl and ftrace-plugin blktrace use memory leak
When mixing the use of ioctl-based blktrace and ftrace-based blktrace,
we can leak memory in this way:
# btrace /dev/sda > /dev/null &
# echo 0 > /sys/block/sda/sda1/trace/enable
now we leak bt->dropped_file, bt->msg_file, bt->rchan...
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix mixed ioctl and ftrace-plugin blktrace use refcount bugs
ioctl-based blktrace allocates bt and registers tracepoints when
ioctl(BLKTRACESETUP), and do all cleanups when ioctl(BLKTRACETEARDOWN).
while ftrace-based blktrace allocates/frees bt when:
# echo 1/0 > /sys/block/sda/sda1/trace/enable
and registers/unregisters tracepoints when:
# echo blk/nop > /debugfs/tracing/current_tracer
or
# echo 1/0 > /debugfs/tracing/tracing_enable
The separatation of allocation and registeration causes 2 problems:
1. current user-space blktrace still calls ioctl(TEARDOWN) when
ioctl(SETUP) failed:
# echo 1 > /sys/block/sda/sda1/trace/enable
# blktrace /dev/sda
BLKTRACESETUP: Device or resource busy
^C
and now blk_probes_ref == -1
2. Another way to make blk_probes_ref == -1:
# plugin sdb && mount sdb1
# echo 1 > /sys/block/sdb/sdb1/trace/enable
# remove sdb
This patch does the allocation and registeration when writing
sdaX/trace/enable.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently the original blktrace, which is using relay and is used via
ioctl, is broken. You can use ftrace to see the output of blktrace,
but user-space blktrace is unusable.
It's broken by "blktrace: add ftrace plugin"
(c71a896154)
- if (unlikely(bt->trace_state != Blktrace_running))
+ if (unlikely(bt->trace_state != Blktrace_running || !blk_tracer_enabled))
return;
With this patch, both ioctl and ftrace can be used, but of course you
can't use both of them at the same time.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
t1 t2
------ ------
do_blk_trace_setup() do_blk_trace_setup()
if (!blk_tree_root) {
if (!blk_tree_root)
blk_tree_root = create_dir()
blk_tree_root = create_dir();
(now blk_tree_root == NULL)
...
dir = create_dir(name, blk_tree_root);
Due to this race, t1 will create 'dir' in /debugfs but not /debugfs/block.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I found the timestamp is wrong:
# echo bin > trace_option
# echo blk > current_tracer
# cat trace_pipe | blkparse -i -
8,0 0 0 0.000000000 504 A W ...
...
8,7 1 0 0.008534097 0 C R ...
(should be 8.534097xxx)
user-space blkparse expects the timestamp to be nanosecond.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix crash (hang) when using TRACE_EVENT_FORMAT filter files
filters are only hooked up to the tracepoint events defined using
TRACE_EVENT but not the tracers that use TRACE_EVENT_FORMAT, such
as ftrace.
Do not display the filter files at all for TRACE_EVENT_FORMAT events
for the time being.
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: =?ISO-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1237878882.8339.61.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Show the average time in the function (Time / Hit)
Function Hit Time Avg
-------- --- ---- ---
mwait_idle 51 140326.6 us 2751.503 us
smp_apic_timer_interrupt 47 3517.735 us 74.845 us
schedule 10 2738.754 us 273.875 us
__schedule 10 2732.857 us 273.285 us
hrtimer_interrupt 47 1896.104 us 40.342 us
irq_exit 56 1711.833 us 30.568 us
__run_hrtimer 47 1315.589 us 27.991 us
tick_sched_timer 47 1138.690 us 24.227 us
do_softirq 56 1116.829 us 19.943 us
__do_softirq 56 1066.932 us 19.052 us
do_IRQ 9 926.153 us 102.905 us
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: safer code
The on the fly allocator for the function profiler was to save
memory. But at the expense of stability. Although it survived several
tests, allocating from the function tracer is just too risky, just
to save space.
This patch removes the allocator and simply allocates enough entries
at start up.
Each function gets a profiling structure of 40 bytes. With an average
of 20K functions, and this is for each CPU, we have 800K per online
CPU. This is not too bad, at least for non-embedded.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
"Because when we call ftrace_free_rec we change the rec->ip to point to the
next record in the chain. Something is very wrong if rec->ip >= s &&
rec->ip < e and the record is already free."
"Note, use FTRACE_WARN_ON() macro. This way it shuts down ftrace if it is
hit and helps to avoid further damage later."
-- Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: make trace_stat files show items with the original order
trace_stat tracer reverse the items, it makes the output
looks a little ugly.
Example, when we read trace_stat/workqueues, we get cpu#7's stat.
at first, and then cpu#6... cpu#0.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <49C9F23F.5040307@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar suggested clean ups for the profiling code. This patch
makes those updates.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>