The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The ftrace_trigger_soft_disabled() tests if a
trace_event is soft disabled (called but not traced), and returns true if
it is. It has nothing to do with function tracing and should be renamed.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structures ftrace_event_call and
ftrace_event_class have nothing to do with the function hooks, and are
really trace_event structures. Rename ftrace_event_* to trace_event_*.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The name "ftrace" really refers to the function hook infrastructure. It
is not about the trace_events. The structure ftrace_event_file is really
about trace events and not "ftrace". Rename it to trace_event_file.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Both Linus (most recent) and Steve (a while ago) reported that perf
related callbacks have massive stack bloat.
The problem is that software events need a pt_regs in order to
properly report the event location and unwind stack. And because we
could not assume one was present we allocated one on stack and filled
it with minimal bits required for operation.
Now, pt_regs is quite large, so this is undesirable. Furthermore it
turns out that most sites actually have a pt_regs pointer available,
making this even more onerous, as the stack space is pointless waste.
This patch addresses the problem by observing that software events
have well defined nesting semantics, therefore we can use static
per-cpu storage instead of on-stack.
Linus made the further observation that all but the scheduler callers
of perf_sw_event() have a pt_regs available, so we change the regular
perf_sw_event() to require a valid pt_regs (where it used to be
optional) and add perf_sw_event_sched() for the scheduler.
We have a scheduler specific call instead of a more generic _noregs()
like construct because we can assume non-recursion from the scheduler
and thereby simplify the code further (_noregs would have to put the
recursion context call inline in order to assertain which __perf_regs
element to use).
One last note on the implementation of perf_trace_buf_prepare(); we
allow .regs = NULL for those cases where we already have a pt_regs
pointer available and do not need another.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull tracing updates from Steven Rostedt:
"As the merge window is still open, and this code was not as complex as
I thought it might be. I'm pushing this in now.
This will allow Thomas to debug his irq work for 3.20.
This adds two new features:
1) Allow traceopoints to be enabled right after mm_init().
By passing in the trace_event= kernel command line parameter,
tracepoints can be enabled at boot up. For debugging things like
the initialization of interrupts, it is needed to have tracepoints
enabled very early. People have asked about this before and this
has been on my todo list. As it can be helpful for Thomas to debug
his upcoming 3.20 IRQ work, I'm pushing this now. This way he can
add tracepoints into the IRQ set up and have users enable them when
things go wrong.
2) Have the tracepoints printed via printk() (the console) when they
are triggered.
If the irq code locks up or reboots the box, having the tracepoint
output go into the kernel ring buffer is useless for debugging.
But being able to add the tp_printk kernel command line option
along with the trace_event= option will have these tracepoints
printed as they occur, and that can be really useful for debugging
early lock up or reboot problems.
This code is not that intrusive and it passed all my tests. Thomas
tried them out too and it works for his needs.
Link: http://lkml.kernel.org/r/20141214201609.126831471@goodmis.org"
* tag 'trace-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Add tp_printk cmdline to have tracepoints go to printk()
tracing: Move enabling tracepoints to just after rcu_init()
Pull tracing updates from Steven Rostedt:
"There was a lot of clean ups and minor fixes. One of those clean ups
was to the trace_seq code. It also removed the return values to the
trace_seq_*() functions and use trace_seq_has_overflowed() to see if
the buffer filled up or not. This is similar to work being done to
the seq_file code as well in another tree.
Some of the other goodies include:
- Added some "!" (NOT) logic to the tracing filter.
- Fixed the frame pointer logic to the x86_64 mcount trampolines
- Added the logic for dynamic trampolines on !CONFIG_PREEMPT systems.
That is, the ftrace trampoline can be dynamically allocated and be
called directly by functions that only have a single hook to them"
* tag 'trace-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (55 commits)
tracing: Truncated output is better than nothing
tracing: Add additional marks to signal very large time deltas
Documentation: describe trace_buf_size parameter more accurately
tracing: Allow NOT to filter AND and OR clauses
tracing: Add NOT to filtering logic
ftrace/fgraph/x86: Have prepare_ftrace_return() take ip as first parameter
ftrace/x86: Get rid of ftrace_caller_setup
ftrace/x86: Have save_mcount_regs macro also save stack frames if needed
ftrace/x86: Add macro MCOUNT_REG_SIZE for amount of stack used to save mcount regs
ftrace/x86: Simplify save_mcount_regs on getting RIP
ftrace/x86: Have save_mcount_regs store RIP in %rdi for first parameter
ftrace/x86: Rename MCOUNT_SAVE_FRAME and add more detailed comments
ftrace/x86: Move MCOUNT_SAVE_FRAME out of header file
ftrace/x86: Have static tracing also use ftrace_caller_setup
ftrace/x86: Have static function tracing always test for function graph
kprobes: Add IPMODIFY flag to kprobe_ftrace_ops
ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict
kprobes/ftrace: Recover original IP if pre_handler doesn't change it
tracing/trivial: Fix typos and make an int into a bool
tracing: Deletion of an unnecessary check before iput()
...
The functions trace_seq_printf() and friends will not be returning values
soon and will be void functions. To know if they succeeded or not, the
functions trace_seq_has_overflowed() and trace_handle_return() should be
used instead.
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ARM has some private syscalls (for example, set_tls(2)) which lie
outside the range of NR_syscalls. If any of these are called while
syscall tracing is being performed, out-of-bounds array access will
occur in the ftrace and perf sys_{enter,exit} handlers.
# trace-cmd record -e raw_syscalls:* true && trace-cmd report
...
true-653 [000] 384.675777: sys_enter: NR 192 (0, 1000, 3, 4000022, ffffffff, 0)
true-653 [000] 384.675812: sys_exit: NR 192 = 1995915264
true-653 [000] 384.675971: sys_enter: NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1)
true-653 [000] 384.675988: sys_exit: NR 983045 = 0
...
# trace-cmd record -e syscalls:* true
[ 17.289329] Unable to handle kernel paging request at virtual address aaaaaace
[ 17.289590] pgd = 9e71c000
[ 17.289696] [aaaaaace] *pgd=00000000
[ 17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 17.290169] Modules linked in:
[ 17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
[ 17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000
[ 17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
[ 17.290866] LR is at syscall_trace_enter+0x124/0x184
Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.
Commit cd0980fc8a "tracing: Check invalid syscall nr while tracing syscalls"
added the check for less than zero, but it should have also checked
for greater than NR_syscalls.
Link: http://lkml.kernel.org/p/1414620418-29472-1-git-send-email-rabin@rab.in
Fixes: cd0980fc8a "tracing: Check invalid syscall nr while tracing syscalls"
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The uses of "rcu_assign_pointer()" are NULLing out the pointers.
According to RCU_INIT_POINTER()'s block comment:
"1. This use of RCU_INIT_POINTER() is NULLing out the pointer"
it is better to use it instead of rcu_assign_pointer() because it has a
smaller overhead.
The following Coccinelle semantic patch was used:
@@
@@
- rcu_assign_pointer
+ RCU_INIT_POINTER
(..., NULL)
Link: http://lkml.kernel.org/p/20140822142822.GA32391@ada
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The event trigger code that checks for callback triggers before and
after recording of an event has lots of flags checks. This code is
duplicated throughout the ftrace events, kprobes and system calls.
They all do the exact same checks against the event flags.
Added helper functions ftrace_trigger_soft_disabled(),
event_trigger_unlock_commit() and event_trigger_unlock_commit_regs()
that consolidated the code and these are used instead.
Link: http://lkml.kernel.org/r/20140106222703.5e7dbba2@gandalf.local.home
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add a generic event_command.set_trigger_filter() op implementation and
have the current set of trigger commands use it - this essentially
gives them all support for filters.
Syntactically, filters are supported by adding 'if <filter>' just
after the command, in which case only events matching the filter will
invoke the trigger. For example, to add a filter to an
enable/disable_event command:
echo 'enable_event:system:event if common_pid == 999' > \
.../othersys/otherevent/trigger
The above command will only enable the system:event event if the
common_pid field in the othersys:otherevent event is 999.
As another example, to add a filter to a stacktrace command:
echo 'stacktrace if common_pid == 999' > \
.../somesys/someevent/trigger
The above command will only trigger a stacktrace if the common_pid
field in the event is 999.
The filter syntax is the same as that described in the 'Event
filtering' section of Documentation/trace/events.txt.
Because triggers can now use filters, the trigger-invoking logic needs
to be moved in those cases - e.g. for ftrace_raw_event_calls, if a
trigger has a filter associated with it, the trigger invocation now
needs to happen after the { assign; } part of the call, in order for
the trigger condition to be tested.
There's still a SOFT_DISABLED-only check at the top of e.g. the
ftrace_raw_events function, so when an event is soft disabled but not
because of the presence of a trigger, the original SOFT_DISABLED
behavior remains unchanged.
There's also a bit of trickiness in that some triggers need to avoid
being invoked while an event is currently in the process of being
logged, since the trigger may itself log data into the trace buffer.
Thus we make sure the current event is committed before invoking those
triggers. To do that, we split the trigger invocation in two - the
first part (event_triggers_call()) checks the filter using the current
trace record; if a command has the post_trigger flag set, it sets a
bit for itself in the return value, otherwise it directly invoks the
trigger. Once all commands have been either invoked or set their
return flag, event_triggers_call() returns. The current record is
then either committed or discarded; if any commands have deferred
their triggers, those commands are finally invoked following the close
of the current event by event_triggers_post_call().
To simplify the above and make it more efficient, the TRIGGER_COND bit
is introduced, which is set only if a soft-disabled trigger needs to
use the log record for filter testing or needs to wait until the
current log record is closed.
The syscall event invocation code is also changed in analogous ways.
Because event triggers need to be able to create and free filters,
this also adds a couple external wrappers for the existing
create_filter and free_filter functions, which are too generic to be
made extern functions themselves.
Link: http://lkml.kernel.org/r/7164930759d8719ef460357f143d995406e4eead.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add a 'trigger' file for each trace event, enabling 'trace event
triggers' to be set for trace events.
'trace event triggers' are patterned after the existing 'ftrace
function triggers' implementation except that triggers are written to
per-event 'trigger' files instead of to a single file such as the
'set_ftrace_filter' used for ftrace function triggers.
The implementation is meant to be entirely separate from ftrace
function triggers, in order to keep the respective implementations
relatively simple and to allow them to diverge.
The event trigger functionality is built on top of SOFT_DISABLE
functionality. It adds a TRIGGER_MODE bit to the ftrace_event_file
flags which is checked when any trace event fires. Triggers set for a
particular event need to be checked regardless of whether that event
is actually enabled or not - getting an event to fire even if it's not
enabled is what's already implemented by SOFT_DISABLE mode, so trigger
mode directly reuses that. Event trigger essentially inherit the soft
disable logic in __ftrace_event_enable_disable() while adding a bit of
logic and trigger reference counting via tm_ref on top of that in a
new trace_event_trigger_enable_disable() function. Because the base
__ftrace_event_enable_disable() code now needs to be invoked from
outside trace_events.c, a wrapper is also added for those usages.
The triggers for an event are actually invoked via a new function,
event_triggers_call(), and code is also added to invoke them for
ftrace_raw_event calls as well as syscall events.
The main part of the patch creates a new trace_events_trigger.c file
to contain the trace event triggers implementation.
The standard open, read, and release file operations are implemented
here.
The open() implementation sets up for the various open modes of the
'trigger' file. It creates and attaches the trigger iterator and sets
up the command parser. If opened for reading set up the trigger
seq_ops.
The read() implementation parses the event trigger written to the
'trigger' file, looks up the trigger command, and passes it along to
that event_command's func() implementation for command-specific
processing.
The release() implementation does whatever cleanup is needed to
release the 'trigger' file, like releasing the parser and trigger
iterator, etc.
A couple of functions for event command registration and
unregistration are added, along with a list to add them to and a mutex
to protect them, as well as an (initially empty) registration function
to add the set of commands that will be added by future commits, and
call to it from the trace event initialization code.
also added are a couple trigger-specific data structures needed for
these implementations such as a trigger iterator and a struct for
trigger-specific data.
A couple structs consisting mostly of function meant to be implemented
in command-specific ways, event_command and event_trigger_ops, are
used by the generic event trigger command implementations. They're
being put into trace.h alongside the other trace_event data structures
and functions, in the expectation that they'll be needed in several
trace_event-related files such as trace_events_trigger.c and
trace_events.c.
The event_command.func() function is meant to be called by the trigger
parsing code in order to add a trigger instance to the corresponding
event. It essentially coordinates adding a live trigger instance to
the event, and arming the triggering the event.
Every event_command func() implementation essentially does the
same thing for any command:
- choose ops - use the value of param to choose either a number or
count version of event_trigger_ops specific to the command
- do the register or unregister of those ops
- associate a filter, if specified, with the triggering event
The reg() and unreg() ops allow command-specific implementations for
event_trigger_op registration and unregistration, and the
get_trigger_ops() op allows command-specific event_trigger_ops
selection to be parameterized. When a trigger instance is added, the
reg() op essentially adds that trigger to the triggering event and
arms it, while unreg() does the opposite. The set_filter() function
is used to associate a filter with the trigger - if the command
doesn't specify a set_filter() implementation, the command will ignore
filters.
Each command has an associated trigger_type, which serves double duty,
both as a unique identifier for the command as well as a value that
can be used for setting a trigger mode bit during trigger invocation.
The signature of func() adds a pointer to the event_command struct,
used to invoke those functions, along with a command_data param that
can be passed to the reg/unreg functions. This allows func()
implementations to use command-specific blobs and supports code
re-use.
The event_trigger_ops.func() command corrsponds to the trigger 'probe'
function that gets called when the triggering event is actually
invoked. The other functions are used to list the trigger when
needed, along with a couple mundane book-keeping functions.
This also moves event_file_data() into trace.h so it can be used
outside of trace_events.c.
Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Idea-by: Steve Rostedt <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
It has been reported that boot up with FTRACE_SELFTEST enabled can take a
very long time. There can be stalls of over a minute.
This was tracked down to the synchronize_sched() called when a system call
event is disabled. As the self tests enable and disable thousands of events,
this makes the synchronize_sched() get called thousands of times.
The synchornize_sched() was added with d562aff93b "tracing: Add support
for SOFT_DISABLE to syscall events" which caused this regression (added
in 3.13-rc1).
The synchronize_sched() is to protect against the events being accessed
when a tracer instance is being deleted. When an instance is being deleted
all the events associated to it are unregistered. The synchronize_sched()
makes sure that no more users are running when it finishes.
Instead of calling synchronize_sched() for all syscall events, we only
need to call it once, after the events are unregistered and before the
instance is deleted. The event_mutex is held during this action to
prevent new users from enabling events.
Link: http://lkml.kernel.org/r/20131203124120.427b9661@gandalf.local.home
Reported-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Acked-by: Petr Mladek <pmladek@suse.cz>
Tested-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The original SOFT_DISABLE patches didn't add support for soft disable
of syscall events; this adds it.
Add an array of ftrace_event_file pointers indexed by syscall number
to the trace array and remove the existing enabled bitmaps, which as a
result are now redundant. The ftrace_event_file structs in turn
contain the soft disable flags we need for per-syscall soft disable
accounting.
Adding ftrace_event_files also means we can remove the USE_CALL_FILTER
bit, thus enabling multibuffer filter support for syscall events.
Link: http://lkml.kernel.org/r/6e72b566e85d8df8042f133efbc6c30e21fb017e.1382620672.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The trace event filters are still tied to event calls rather than
event files, which means you don't get what you'd expect when using
filters in the multibuffer case:
Before:
# echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 8192
# mkdir /sys/kernel/debug/tracing/instances/test1
# echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 2048
# cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
bytes_alloc > 2048
Setting the filter in tracing/instances/test1/events shouldn't affect
the same event in tracing/events as it does above.
After:
# echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 8192
# mkdir /sys/kernel/debug/tracing/instances/test1
# echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 8192
# cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
bytes_alloc > 2048
We'd like to just move the filter directly from ftrace_event_call to
ftrace_event_file, but there are a couple cases that don't yet have
multibuffer support and therefore have to continue using the current
event_call-based filters. For those cases, a new USE_CALL_FILTER bit
is added to the event_call flags, whose main purpose is to keep the
old behavior for those cases until they can be updated with
multibuffer support; at that point, the USE_CALL_FILTER flag (and the
new associated call_filter_check_discard() function) can go away.
The multibuffer support also made filter_current_check_discard()
redundant, so this change removes that function as well and replaces
it with filter_check_discard() (or call_filter_check_discard() as
appropriate).
Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Every perf_trace_buf_prepare() caller does
WARN_ONCE(size > PERF_MAX_TRACE_SIZE, message) and "message" is
almost the same.
Shift this WARN_ONCE() into perf_trace_buf_prepare(). This changes
the meaning of _ONCE, but I think this is fine.
- 4947014 2932448 10104832 17984294 1126b26 vmlinux
+ 4948422 2932448 10104832 17985702 11270a6 vmlinux
on my build.
Link: http://lkml.kernel.org/r/20130617170211.GA19813@redhat.com
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently, the way the latency tracers and snapshot feature works
is to have a separate trace_array called "max_tr" that holds the
snapshot buffer. For latency tracers, this snapshot buffer is used
to swap the running buffer with this buffer to save the current max
latency.
The only items needed for the max_tr is really just a copy of the buffer
itself, the per_cpu data pointers, the time_start timestamp that states
when the max latency was triggered, and the cpu that the max latency
was triggered on. All other fields in trace_array are unused by the
max_tr, making the max_tr mostly bloat.
This change removes the max_tr completely, and adds a new structure
called trace_buffer, that holds the buffer pointer, the per_cpu data
pointers, the time_start timestamp, and the cpu where the latency occurred.
The trace_array, now has two trace_buffers, one for the normal trace and
one for the max trace or snapshot. By doing this, not only do we remove
the bloat from the max_trace but the instances of traces can now use
their own snapshot feature and not have just the top level global_trace have
the snapshot feature and latency tracers for itself.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>