Impact: clean up, extend PID filtering to PID namespaces
Eric Biederman suggested using the struct pid for filtering on
pids in the kernel. This patch is based off of a demonstration
of an implementation that Eric sent me in an email.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: New feature
This patch makes the changes to set_ftrace_pid apply to the function
graph tracer.
# echo $$ > /debugfs/tracing/set_ftrace_pid
# echo function_graph > /debugfs/tracing/current_tracer
Will cause only the current task to be traced. Note, the trace flags are
also inherited by child processes, so the children of the shell
will also be traced.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: clean up
Use the new task struct trace flags to determine if a process should be
traced or not.
Note: this moves the searching of the pid to the slow path of setting
the pid field. This needs to be converted to the pid name space.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the file:
/debugfs/tracing/set_graph_function
which can be used along with the function graph tracer.
When this file is empty, the function graph tracer will act as
usual. When the file has a function in it, the function graph
tracer will only trace that function.
For example:
# echo blk_unplug > /debugfs/tracing/set_graph_function
# cat /debugfs/tracing/trace
[...]
------------------------------------------
| 2) make-19003 => kjournald-2219
------------------------------------------
2) | blk_unplug() {
2) | dm_unplug_all() {
2) | dm_get_table() {
2) 1.381 us | _read_lock();
2) 0.911 us | dm_table_get();
2) 1. 76 us | _read_unlock();
2) + 12.912 us | }
2) | dm_table_unplug_all() {
2) | blk_unplug() {
2) 0.778 us | generic_unplug_device();
2) 2.409 us | }
2) 5.992 us | }
2) 0.813 us | dm_table_put();
2) + 29. 90 us | }
2) + 34.532 us | }
You can add up to 32 functions into this file. Currently we limit it
to 32, but this may change with later improvements.
To add another function, use the append '>>':
# echo sys_read >> /debugfs/tracing/set_graph_function
# cat /debugfs/tracing/set_graph_function
blk_unplug
sys_read
Using the '>' will clear out the function and write anew:
# echo sys_write > /debug/tracing/set_graph_function
# cat /debug/tracing/set_graph_function
sys_write
Note, if you have function graph running while doing this, the small
time between clearing it and updating it will cause the graph to
record all functions. This should not be an issue because after
it sets the filter, only those functions will be recorded from then on.
If you need to only record a particular function then set this
file first before starting the function graph tracer. In the future
this side effect may be corrected.
The set_graph_function file is similar to the set_ftrace_filter but
it does not take wild cards nor does it allow for more than one
function to be set with a single write. There is no technical reason why
this is the case, I just do not have the time yet to implement that.
Note, dynamic ftrace must be enabled for this to appear because it
uses the dynamic ftrace records to match the name to the mcount
call sites.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: feature, let entry function decide to trace or not
This patch lets the graph tracer entry function decide if the tracing
should be done at the end as well. This requires all function graph
entry functions return 1 if it should trace, or 0 if the return should
not be traced.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: new ftrace_graph_stop function
While developing more features of function graph, I hit a bug that
caused the WARN_ON to trigger in the prepare_ftrace_return function.
Well, it was hard for me to find out that was happening because the
bug would not print, it would just cause a hard lockup or reboot.
The reason is that it is not safe to call printk from this function.
Looking further, I also found that it calls unregister_ftrace_graph,
which grabs a mutex and calls kstop machine. This would definitely
lock the box up if it were to trigger.
This patch adds a fast and safe ftrace_graph_stop() which will
stop the function tracer. Then it is safe to call the WARN ON.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: extend and enable the function graph tracer to 64-bit x86
This patch implements the support for function graph tracer under x86-64.
Both static and dynamic tracing are supported.
This causes some small CPP conditional asm on arch/x86/kernel/ftrace.c I
wanted to use probe_kernel_read/write to make the return address
saving/patching code more generic but it causes tracing recursion.
That would be perhaps useful to implement a notrace version of these
function for other archs ports.
Note that arch/x86/process_64.c is not traced, as in X86-32. I first
thought __switch_to() was responsible of crashes during tracing because I
believed current task were changed inside but that's actually not the
case (actually yes, but not the "current" pointer).
So I will have to investigate to find the functions that harm here, to
enable tracing of the other functions inside (but there is no issue at
this time, while process_64.c stays out of -pg flags).
A little possible race condition is fixed inside this patch too. When the
tracer allocate a return stack dynamically, the current depth is not
initialized before but after. An interrupt could occur at this time and,
after seeing that the return stack is allocated, the tracer could try to
trace it with a random uninitialized depth. It's a prevention, even if I
hadn't problems with it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix "no output from tracer" bug caused by ftrace_update_pid_func()
When disabling single thread function trace using
"echo -1 > set_ftrace_pid", the normal function trace
has to restore to original function, otherwise the normal
function trace will not work well.
Without this commit, something like below:
$ ps |grep 850
850 root 2556 S -/bin/sh
$ echo 850 > /debug/tracing/set_ftrace_pid
$ echo function > /debug/tracing/current_tracer
$ echo 1 > /debug/tracing/tracing_enabled
$ sleep 1
$ echo 0 > /debug/tracing/tracing_enabled
$ cat /debug/tracing/trace_pipe |wc -l
59704
$ echo -1 > /debug/tracing/set_ftrace_pid
$ echo 1 > /debug/tracing/tracing_enabled
$ sleep 1
$ echo 0 > /debug/tracing/tracing_enabled
$ more /debug/tracing/trace_pipe
<====== nothing output now!
it should output trace record.
Signed-off-by: Liming Wang <liming.wang@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: make ftrace position computing more sane
First remove useless ->pos field. Then we needn't check seq_printf
in .show like other place.
Signed-off-by: Liming Wang <liming.wang@windriver.com>
Reviewed-by: Bruce Ashfield <bruce.ashfield@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: feature
This patch enables function tracing and function return to run together.
I've tested this by enabling the stack tracer and return tracer, where
both the function entry and function return are used together with
dynamic ftrace.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: more efficient code for ftrace graph tracer
This patch uses the dynamic patching, when available, to patch
the function graph code into the kernel.
This patch will ease the way for letting both function tracing
and function graph tracing run together.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: feature to function trace a single thread
This patch adds the ability to function trace a single thread.
The file:
/debugfs/tracing/set_ftrace_pid
contains the pid to trace. Valid pids are any positive integer.
Writing any negative number to this file will disable the pid
tracing and the function tracer will go back to tracing all of
threads.
This feature works with both static and dynamic function tracing.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: feature
This patch sets a C-like output for the function graph tracing.
For this aim, we now call two handler for each function: one on the entry
and one other on return. This way we can draw a well-ordered call stack.
The pid of the previous trace is loosely stored to be compared against
the one of the current trace to see if there were a context switch.
Without this little feature, the call tree would seem broken at
some locations.
We could use the sched_tracer to capture these sched_events but this
way of processing is much more simpler.
2 spaces have been chosen for indentation to fit the screen while deep
calls. The time of execution in nanosecs is printed just after closed
braces, it seems more easy this way to find the corresponding function.
If the time was printed as a first column, it would be not so easy to
find the corresponding function if it is called on a deep depth.
I plan to output the return value but on 32 bits CPU, the return value
can be 32 or 64, and its difficult to guess on which case we are.
I don't know what would be the better solution on X86-32: only print
eax (low-part) or even edx (high-part).
Actually it's thee same problem when a function return a 8 bits value, the
high part of eax could contain junk values...
Here is an example of trace:
sys_read() {
fget_light() {
} 526
vfs_read() {
rw_verify_area() {
security_file_permission() {
cap_file_permission() {
} 519
} 1564
} 2640
do_sync_read() {
pipe_read() {
__might_sleep() {
} 511
pipe_wait() {
prepare_to_wait() {
} 760
deactivate_task() {
dequeue_task() {
dequeue_task_fair() {
dequeue_entity() {
update_curr() {
update_min_vruntime() {
} 504
} 1587
clear_buddies() {
} 512
add_cfs_task_weight() {
} 519
update_min_vruntime() {
} 511
} 5602
dequeue_entity() {
update_curr() {
update_min_vruntime() {
} 496
} 1631
clear_buddies() {
} 496
update_min_vruntime() {
} 527
} 4580
hrtick_update() {
hrtick_start_fair() {
} 488
} 1489
} 13700
} 14949
} 16016
msecs_to_jiffies() {
} 496
put_prev_task_fair() {
} 504
pick_next_task_fair() {
} 489
pick_next_task_rt() {
} 496
pick_next_task_fair() {
} 489
pick_next_task_idle() {
} 489
------------8<---------- thread 4 ------------8<----------
finish_task_switch() {
} 1203
do_softirq() {
__do_softirq() {
__local_bh_disable() {
} 669
rcu_process_callbacks() {
__rcu_process_callbacks() {
cpu_quiet() {
rcu_start_batch() {
} 503
} 1647
} 3128
__rcu_process_callbacks() {
} 542
} 5362
_local_bh_enable() {
} 587
} 8880
} 9986
kthread_should_stop() {
} 669
deactivate_task() {
dequeue_task() {
dequeue_task_fair() {
dequeue_entity() {
update_curr() {
calc_delta_mine() {
} 511
update_min_vruntime() {
} 511
} 2813
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
This patch changes the name of the "return function tracer" into
function-graph-tracer which is a more suitable name for a tracing
which makes one able to retrieve the ordered call stack during
the code flow.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: use deeper function tracing depth safely
Some tests showed that function return tracing needed a more deeper depth
of function calls. But it could be unsafe to store these return addresses
to the stack.
So these arrays will now be allocated dynamically into task_struct of current
only when the tracer is activated.
Typical scheme when tracer is activated:
- allocate a return stack for each task in global list.
- fork: allocate the return stack for the newly created task
- exit: free return stack of current
- idle init: same as fork
I chose a default depth of 50. I don't have overruns anymore.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: make output of available_filter_functions complete
phenomenon:
The first value of dyn_ftrace_total_info is not equal with
`cat available_filter_functions | wc -l`, but they should be equal.
root cause:
When printing functions with seq_printf in t_show, if the read buffer
is just overflowed by current function record, then this function
won't be printed to user space through read buffer, it will
just be dropped. So we can't see this function printing.
So, every time the last function to fill the read buffer, if overflowed,
will be dropped.
This also applies to set_ftrace_filter if set_ftrace_filter has
more bytes than read buffer.
fix:
Through checking return value of seq_printf, if less than 0, we know
this function doesn't be printed. Then we decrease position to force
this function to be printed next time, in next read buffer.
Another little fix is to show correct allocating pages count.
Signed-off-by: walimis <walimisdev@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the support for dynamic tracing on the function return tracer.
The whole difference with normal dynamic function tracing is that we don't need
to hook on a particular callback. The only pro that we want is to nop or set
dynamically the calls to ftrace_caller (which is ftrace_return_caller here).
Some security checks ensure that we are not trying to launch dynamic tracing for
return tracing while normal function tracing is already running.
An example of trace with getnstimeofday set as a filter:
ktime_get_ts+0x22/0x50 -> getnstimeofday (2283 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1396 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1825 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1426 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1524 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1434 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1502 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1404 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1397 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1051 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1314 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1344 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1163 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1390 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1374 ns)
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: set filtered functions at time the filter is set
It can be confusing when the set_filter_functions is set (or cleared)
and the functions being recorded by the dynamic tracer does not
match.
This patch causes the code to be updated if the function tracer is
enabled and the filter is changed.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: correct implementation of dyn ftrace filter
The old decisions made by the filter algorithm was complex and incorrect.
This lead to inconsistent enabling or disabling of functions when
the filter was used.
This patch simplifies that code and in doing so, corrects the usage
of the filters.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: make ftrace_convert_nops() more permissive
Due to the way different architecture linkers combine the data sections
of the mcount_loc (the section that lists all the locations that
call mcount), there may be zeros added in that section. This is usually
due to strange alignments that the linker performs, that pads in zeros.
This patch makes the conversion code to nops skip any pointer in
the mcount_loc section that is NULL.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: allow archs more flexibility on dynamic ftrace implementations
Dynamic ftrace has largly been developed on x86. Since x86 does not
have the same limitations as other architectures, the ftrace interaction
between the generic code and the architecture specific code was not
flexible enough to handle some of the issues that other architectures
have.
Most notably, module trampolines. Due to the limited branch distance
that archs make in calling kernel core code from modules, the module
load code must create a trampoline to jump to what will make the
larger jump into core kernel code.
The problem arises when this happens to a call to mcount. Ftrace checks
all code before modifying it and makes sure the current code is what
it expects. Right now, there is not enough information to handle modifying
module trampolines.
This patch changes the API between generic dynamic ftrace code and
the arch dependent code. There is now two functions for modifying code:
ftrace_make_nop(mod, rec, addr) - convert the code at rec->ip into
a nop, where the original text is calling addr. (mod is the
module struct if called by module init)
ftrace_make_caller(rec, addr) - convert the code rec->ip that should
be a nop into a caller to addr.
The record "rec" now has a new field called "arch" where the architecture
can add any special attributes to each call site record.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: keep from converting freed records
When the tracer is started or stopped, it converts all code pointed
to by the saved records into callers to ftrace or nops. When modules
are unloaded, their records are freed, but they still exist within
the record pages.
This patch changes the code to skip over freed records.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>