Commit Graph

178 Commits

Author SHA1 Message Date
Masami Hiramatsu 6ed0106667 tracing: Return -EBUSY when event_enable_func() fails to get module
Since try_module_get() returns false( = 0) when it fails to
pindown a module, event_enable_func() returns 0 which means
"succeed". This can cause a kernel panic when the entry
is removed, because the event is already released.

This fixes the bug by returning -EBUSY, because the reason
why it fails is that the module is being removed at that time.

Link: http://lkml.kernel.org/r/20130516114848.13508.97899.stgit@mhiramat-M0-7522

Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-16 11:01:16 -04:00
Steven Rostedt (Red Hat) da511bf33e tracing: Add helper function trace_create_new_event() to remove duplicate code
Both __trace_add_new_event() and __trace_early_add_new_event() do
basically the same thing, except that __trace_add_new_event() does
a little more.

Instead of having duplicate code between the two functions, add
a helper function trace_create_new_event() that both can use.
This will help against having bugs fixed in one function but not
the other.

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09 20:14:49 -04:00
Masami Hiramatsu 1cf4c0732d tracing: Modify soft-mode only if there's no other referrer
Modify soft-mode flag only if no other soft-mode referrer
(currently only the ftrace triggers) by using a reference
counter in each ftrace_event_file.

Without this fix, adding and removing several different
enable/disable_event triggers on the same event clear
soft-mode bit from the ftrace_event_file. This also
happens with a typo of glob on setting triggers.

e.g.

 # echo vfs_symlink:enable_event:net:netif_rx > set_ftrace_filter
 # cat events/net/netif_rx/enable
 0*
 # echo typo_func:enable_event:net:netif_rx > set_ftrace_filter
 # cat events/net/netif_rx/enable
 0
 # cat set_ftrace_filter
 #### all functions enabled ####
 vfs_symlink:enable_event:net:netif_rx:unlimited

As above, we still have a trigger, but soft-mode is gone.

Link: http://lkml.kernel.org/r/20130509054429.30398.7464.stgit@mhiramat-M0-7522

Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09 20:14:25 -04:00
Masami Hiramatsu 30052170dc tracing: Indicate enabled soft-mode in enable file
Indicate enabled soft-mode event as "1*" in "enable" file
for each event, because it can be soft-disabled when disable_event
trigger is hit.

Link: http://lkml.kernel.org/r/20130509054426.30398.28202.stgit@mhiramat-M0-7522

Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09 20:14:07 -04:00
Steven Rostedt (Red Hat) ff305ded9f tracing: Return error if register_ftrace_function_probe() fails for event_enable_func()
register_ftrace_function_probe() returns the number of functions
it registered, which can be zero, it can also return a negative number
if something went wrong. But event_enable_func() only checks for
the case that it didn't register anything, it needs to also check
for the case that something went wrong and return that error code
as well.

Added some comments about the code as well, to make it more
understandable.

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09 11:30:26 -04:00
Masami Hiramatsu a5b85bd155 tracing: Don't succeed if event_enable_func did not register anything
Return 0 instead of the number of activated ftrace function probes if
event_enable_func succeeded and return an error code if it failed or
did not register any functions. But it currently returns the number
of registered functions and if it didn't register anything, it returns 0,
but that is considered success.

This also fixes the return value. As if it succeeds, it returns the
number of functions that were enabled, which is returned back to
the user in ftrace_regex_write (the write() return code). If only
one function is enabled, then the return code of the write is one,
and this can confuse the user program in thinking it only wrote 1
byte.

Link: http://lkml.kernel.org/r/20130509054413.30398.55650.stgit@mhiramat-M0-7522

Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ Rewrote change log to reflect that this fixes two bugs - SR ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09 11:26:01 -04:00
zhangwei(Jovi) 52f6ad6dc3 tracing: Rename trace_event_mutex to trace_event_sem
trace_event_mutex is an rw semaphore now, not a mutex, change the name.

Link: http://lkml.kernel.org/r/513D843B.40109@huawei.com

Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
[ Forward ported to my new code ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 13:22:10 -04:00
zhangwei(Jovi) ad7067cebf tracing: Convert trace_destroy_fields() to static
trace_destroy_fields() is not used outside of the file. It can be
a static function.

Link: http://lkml.kernel.org/r/513D842A.2000907@huawei.com

Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 13:22:08 -04:00
zhangwei(Jovi) b3a8c6fd7b tracing: Move find_event_field() into trace_events.c
By moving find_event_field() and trace_find_field() into trace_events.c,
the ftrace_common_fields list and trace_get_fields() can become local to
the trace_events.c file.

find_event_field() is renamed to trace_find_event_field() to conform to
the tracing global function names.

Link: http://lkml.kernel.org/r/513D8426.9070109@huawei.com

Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
[ rostedt: Modified trace_find_field() to trace_find_event_field() ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 13:22:07 -04:00
Steven Rostedt (Red Hat) 3cd715de26 tracing: Add function probe triggers to enable/disable events
Add triggers to function tracer that lets an event get enabled or
disabled when a function is called:

format is:

 <function>:enable_event:<system>:<event>[:<count>]
 <function>:disable_event:<system>:<event>[:<count>]

 echo 'schedule:enable_event:sched:sched_switch' > /debug/tracing/set_ftrace_filter

Every time schedule is called, it will enable the sched_switch event.

 echo 'schedule:disable_event:sched:sched_switch:2' > /debug/tracing/set_ftrace_filter

The first two times schedule is called while the sched_switch
event is enabled, it will disable it. It will not count for a time
that the event is already disabled (or enabled for enable_event).

[ fixed return without mutex_unlock() - thanks to Dan Carpenter and smatch ]

Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:36:04 -04:00
Steven Rostedt (Red Hat) 417944c4c7 tracing: Add a way to soft disable trace events
In order to let triggers enable or disable events, we need a 'soft'
method for doing so. For example, if a function probe is added that
lets a user enable or disable events when a function is called, that
change must be done without taking locks or a mutex, and definitely
it can't sleep. But the full enabling of a tracepoint is expensive.

By adding a 'SOFT_DISABLE' flag, and converting the flags to be updated
without the protection of a mutex (using set/clear_bit()), this soft
disable flag can be used to allow critical sections to enable or disable
events from being traced (after the event has been placed into "SOFT_MODE").

Some caveats though: The comm recorder (to map pids with a comm) can not
be soft disabled (yet). If you disable an event with with a "soft"
disable and wait a while before reading the trace, the comm cache may be
replaced and you'll get a bunch of <...> for comms in the trace.

Reading the "enable" file for an event that is disabled will now give
you "0*" where the '*' denotes that the tracepoint is still active but
the event itself is "disabled".

[ fixed _BIT used in & operation : thanks to Dan Carpenter and smatch ]

Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:36:03 -04:00
Steven Rostedt (Red Hat) 55034cd6e6 tracing: Add alloc_snapshot kernel command line parameter
If debugging the kernel, and the developer wants to use
tracing_snapshot() in places where tracing_snapshot_alloc() may
be difficult (or more likely, the developer is lazy and doesn't
want to bother with tracing_snapshot_alloc() at all), then adding

  alloc_snapshot

to the kernel command line parameter will tell ftrace to allocate
the snapshot buffer (if configured) when it allocates the main
tracing buffer.

I also noticed that ring_buffer_expanded and tracing_selftest_disabled
had inconsistent use of boolean "true" and "false" with "0" and "1".
I cleaned that up too.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:35:53 -04:00
Steven Rostedt (Red Hat) 873c642f59 tracing: Clear all trace buffers when unloaded module event was used
Currently we do not know what buffer a module event was enabled in.
On unload, it is safest to clear all buffer instances, not just the
top level buffer.

Todo: Clear only the buffer that the event was used in. The
infrastructure is there to do this, but it makes the code a bit
more complex. Lets get the current code vetted before we add that.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:57 -04:00
Steven Rostedt (Red Hat) 575380da8b tracing: Only clear trace buffer on module unload if event was traced
Currently, when a module with events is unloaded, the trace buffer is
cleared. This is just a safety net in case the module might have some
strange callback when its event is outputted. But there's no reason
to reset the buffer if the module didn't have any of its events traced.

Add a flag to the event "call" structure called WAS_ENABLED and gets set
when the event is ever enabled, and this flag never gets cleared. When a
module gets unloaded, if any of its events have this flag set, then the
trace buffer will get cleared.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:56 -04:00
Steven Rostedt (Red Hat) 315326c16a tracing: Fix trace events build without modules
The new multi-buffers added a descriptor that kept track of module
events, and the directories they use, with struct ftace_module_file_ops.
This is used to add a ref count to keep modules from unloading while
their files are being accessed.

As the descriptor is only needed when CONFIG_MODULES is enabled, it
is only declared when the config is enabled. But that struct is
dereferenced in a few areas outside the #ifdef CONFIG_MODULES.

By adding some helper routines and moving code around a little,
events can be compiled again without modules.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:53 -04:00
Steven Rostedt 92edca073c tracing: Use direct field, type and system names
The names used to display the field and type in the event format
files are copied, as well as the system name that is displayed.

All these names are created by constant values passed in.
If one of theses values were to be removed by a module, the module
would also be required to remove any event it created.

By using the strings directly, we can save over 100K of memory.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:47 -04:00
Steven Rostedt d1a291437f tracing: Use kmem_cache_alloc instead of kmalloc in trace_events.c
The event structures used by the trace events are mostly persistent,
but they are also allocated by kmalloc, which is not the best at
allocating space for what is used. By converting these kmallocs
into kmem_cache_allocs, we can save over 50K of space that is
permanently allocated.

After boot we have:

 slab name          active allocated size
 ---------          ------ --------- ----
ftrace_event_file    979   1005     56   67    1
ftrace_event_field   2301   2310     48   77    1

The ftrace_event_file has at boot up 979 active objects out of
1005 allocated in the slabs. Each object is 56 bytes. In a normal
kmalloc, that would allocate 64 bytes for each object.

 1005 - 979  = 26 objects not used
 26 * 56 = 1456 bytes wasted

But if we used kmalloc:

 64 - 56 = 8 bytes unused per allocation
 8 * 979 = 7832 bytes wasted

 7832 - 1456 = 6376 bytes in savings

Doing the same for ftrace_event_field where there's 2301 objects
allocated in a slab that can hold 2310 with 48 bytes each we have:

 2310 - 2301 = 9 objects not used
 9 * 48 = 432 bytes wasted

A kmalloc would also use 64 bytes per object:

 64 - 48 = 16 bytes unused per allocation
 16 * 2301 = 36816 bytes wasted!

 36816 - 432 = 36384 bytes in savings

This change gives us a total of 42760 bytes in savings. At least
on my machine, but as there's a lot of these persistent objects
for all configurations that use trace points, this is a net win.

Thanks to Ezequiel Garcia for his trace_analyze presentation which
pointed out the wasted space in my code.

Cc: Ezequiel Garcia <elezegarcia@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:46 -04:00
Steven Rostedt 772482216f tracing: Get trace_events kernel command line working again
With the new descriptors used to allow multiple buffers in the
tracing directory added, the kernel command line parameter
trace_events=... no longer works. This is because the top level
(global) trace array now has a list of descriptors associated
with the events and the files in the debugfs directory. But in
early bootup, when the command line is processed and the events
enabled, the trace array list of events has not been set up yet.

Without the list of events in the trace array, the setting of
events to record will fail because it would not match any events.

The solution is to set up the top level array in two stages.
The first is to just add the ftrace file descriptors that just point
to the events. This will allow events to be enabled and start tracing.
The second stage is called after the filesystem is set up, and this
stage will create the debugfs event files and directories associated
with the trace array events.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:46 -04:00
Steven Rostedt 0c8916c342 tracing: Add rmdir to remove multibuffer instances
Add a method to the hijacked dentry descriptor of the
"instances" directory to allow for rmdir to remove an
instance of a multibuffer.

Example:

  cd /debug/tracing/instances
  mkdir hello
  ls
hello/
  rmdir hello
  ls

Like the mkdir method, the i_mutex is dropped for the instances
directory. The instances directory is created at boot up and can
not be renamed or removed. The trace_types_lock mutex is used to
synchronize adding and removing of instances.

I've run several stress tests with different threads trying to
create and delete directories of the same name, and it has stood
up fine.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:45 -04:00
Steven Rostedt 277ba04461 tracing: Add interface to allow multiple trace buffers
Add the interface ("instances" directory) to add multiple buffers
to ftrace. To create a new instance, simply do a mkdir in the
instances directory:

This will create a directory with the following:

 # cd instances
 # mkdir foo
 # ls foo
buffer_size_kb        free_buffer  trace_clock    trace_pipe
buffer_total_size_kb  set_event    trace_marker   tracing_enabled
events/               trace        trace_options  tracing_on

Currently only events are able to be set, and there isn't a way
to delete a buffer when one is created (yet).

Note, the i_mutex lock is dropped from the parent "instances"
directory during the mkdir operation. As the "instances" directory
can not be renamed or deleted (created on boot), I do not see
any harm in dropping the lock. The creation of the sub directories
is protected by trace_types_lock mutex, which only lets one
instance get into the code path at a time. If two tasks try to
create or delete directories of the same name, only one will occur
and the other will fail with -EEXIST.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:44 -04:00
Steven Rostedt ae63b31e4d tracing: Separate out trace events from global variables
The trace events for ftrace are all defined via global variables.
The arrays of events and event systems are linked to a global list.
This prevents multiple users of the event system (what to enable and
what not to).

By adding descriptors to represent the event/file relation, as well
as to which trace_array descriptor they are associated with, allows
for more than one set of events to be defined. Once the trace events
files have a link between the trace event and the trace_array they
are associated with, we can create multiple trace_arrays that can
record separate events in separate buffers.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-03-15 00:34:40 -04:00
Steven Rostedt b000c8065a tracing: Remove the extra 4 bytes of padding in events
Due to a userspace issue with PowerTop v2beta, which hardcoded
the offset of event fields that it was using, it broke when
we removed the Big Kernel Lock counter from the event header.

 (commit e6e1e2593 "tracing: Remove lock_depth from event entry")

Because this broke userspace, it was determined that we must
keep those 4 bytes around.

 (commit a3a4a5acd "Regression: partial revert "tracing: Remove lock_depth from event entry"")

This unfortunately wastes space in the ring buffer. 4 bytes per
event, where a lot of events are just 24 bytes. That's 16% of the
buffer wasted. A million events will add 4 megs of white space
into the buffer.

It was later noticed that PowerTop v2beta could not work on systems
where the kernel was 64 bit but the userspace was 32 bits.
The reason was because the offsets are different between the
two and the hard coded offset of one would not work with the other.

With PowerTop v2 final, it implemented the same interface that both
perf and trace-cmd use. That is, it reads the format file of
the event to find the offsets of the fields it needs. This fixes
the problem with running powertop on a 32 bit userspace running
on a 64 bit kernel. It also no longer requires the 4 byte padding.

As PowerTop v2 has been out for a while, and is included in all
major distributions, it is time that we can safely remove the
4 bytes of padding. Users of PowerTop v2beta should upgrade to
PowerTop v2 final.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21 21:05:41 -05:00
Steven Rostedt 0d5c6e1c19 tracing: Use irq_work for wake ups and remove *_nowake_*() functions
Have the ring buffer commit function use the irq_work infrastructure to
wake up any waiters waiting on the ring buffer for new data. The irq_work
was created for such a purpose, where doing the actual wake up at the
time of adding data is too dangerous, as an event or function trace may
be in the midst of the work queue locks and cause deadlocks. The irq_work
will either delay the action to the next timer interrupt, or trigger an IPI
to itself forcing an interrupt to do the work (in a safe location).

With irq_work, all ring buffer commits can safely do wakeups, removing
the need for the ring buffer commit "nowake" variants, which were used
by events and function tracing. All commits can now safely use the
normal commit, and the "nowake" variants can be removed.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02 10:21:52 -04:00
Steven Rostedt 15075cac42 tracing: Separate open function from set_event and available_events
The open function used by available_events is the same as set_event even
though it uses different seq functions. This causes a side effect of
writing into available_events clearing all events, even though
available_events is suppose to be read only.

There's no reason to keep a single function for just the open and have
both use different functions for everything else. It is a little
confusing and causes strange behavior. Just have each have their own
function.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-02 10:21:49 -04:00
Steven Rostedt 81698831bc tracing: Enable comm recording if trace_printk() is used
If comm recording is not enabled when trace_printk() is used then
you just get this type of output:

[ adding trace_printk("hello! %d", irq); in do_IRQ ]

           <...>-2843  [001] d.h.    80.812300: do_IRQ: hello! 14
           <...>-2734  [002] d.h2    80.824664: do_IRQ: hello! 14
           <...>-2713  [003] d.h.    80.829971: do_IRQ: hello! 14
           <...>-2814  [000] d.h.    80.833026: do_IRQ: hello! 14

By enabling the comm recorder when trace_printk is enabled:

       hackbench-6715  [001] d.h.   193.233776: do_IRQ: hello! 21
            sshd-2659  [001] d.h.   193.665862: do_IRQ: hello! 21
          <idle>-0     [001] d.h1   193.665996: do_IRQ: hello! 21

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-31 16:45:29 -04:00