Commit Graph

350 Commits

Author SHA1 Message Date
Linus Torvalds
d2d8b14604 Merge tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
 "The major changes in this tracing update includes:

   - Removal of non-DYNAMIC_FTRACE from 32bit x86

   - Removal of mcount support from x86

   - Emulating a call from int3 on x86_64, fixes live kernel patching

   - Consolidated Tracing Error logs file

  Minor updates:

   - Removal of klp_check_compiler_support()

   - kdb ftrace dumping output changes

   - Accessing and creating ftrace instances from inside the kernel

   - Clean up of #define if macro

   - Introduction of TRACE_EVENT_NOP() to disable trace events based on
     config options

  And other minor fixes and clean ups"

* tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
  x86: Hide the int3_emulate_call/jmp functions from UML
  livepatch: Remove klp_check_compiler_support()
  ftrace/x86: Remove mcount support
  ftrace/x86_32: Remove support for non DYNAMIC_FTRACE
  tracing: Simplify "if" macro code
  tracing: Fix documentation about disabling options using trace_options
  tracing: Replace kzalloc with kcalloc
  tracing: Fix partial reading of trace event's id file
  tracing: Allow RCU to run between postponed startup tests
  tracing: Fix white space issues in parse_pred() function
  tracing: Eliminate const char[] auto variables
  ring-buffer: Fix mispelling of Calculate
  tracing: probeevent: Fix to make the type of $comm string
  tracing: probeevent: Do not accumulate on ret variable
  tracing: uprobes: Re-enable $comm support for uprobe events
  ftrace/x86_64: Emulate call function while updating in breakpoint handler
  x86_64: Allow breakpoints to emulate call instructions
  x86_64: Add gap to int3 to allow for call emulation
  tracing: kdb: Allow ftdump to skip all but the last few entries
  tracing: Add trace_total_entries() / trace_total_entries_cpu()
  ...
2019-05-15 16:05:47 -07:00
Peter Zijlstra
d6097c9e44 trace: Fix preempt_enable_no_resched() abuse
Unless the very next line is schedule(), or implies it, one must not use
preempt_enable_no_resched(). It can cause a preemption to go missing and
thereby cause arbitrary delays, breaking the PREEMPT=y invariant.

Link: http://lkml.kernel.org/r/20190423200318.GY14281@hirez.programming.kicks-ass.net

Cc: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: the arch/x86 maintainers <x86@kernel.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: stable@vger.kernel.org
Fixes: 2c2d7329d8 ("tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-26 11:45:03 -04:00
YueHaibing
40ed29b373 ring-buffer: Fix ring buffer size in rb_write_something()
'cnt' should be used to calculate ring buffer size rather than data->cnt

Link: http://lkml.kernel.org/r/1537704693-184237-1-git-send-email-yuehaibing@huawei.com

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-04-02 18:24:06 -04:00
Douglas Anderson
31b265b3ba tracing: kdb: Fix ftdump to not sleep
As reported back in 2016-11 [1], the "ftdump" kdb command triggers a
BUG for "sleeping function called from invalid context".

kdb's "ftdump" command wants to call ring_buffer_read_prepare() in
atomic context.  A very simple solution for this is to add allocation
flags to ring_buffer_read_prepare() so kdb can call it without
triggering the allocation error.  This patch does that.

Note that in the original email thread about this, it was suggested
that perhaps the solution for kdb was to either preallocate the buffer
ahead of time or create our own iterator.  I'm hoping that this
alternative of adding allocation flags to ring_buffer_read_prepare()
can be considered since it means I don't need to duplicate more of the
core trace code into "trace_kdb.c" (for either creating my own
iterator or re-preparing a ring allocator whose memory was already
allocated).

NOTE: another option for kdb is to actually figure out how to make it
reuse the existing ftrace_dump() function and totally eliminate the
duplication.  This sounds very appealing and actually works (the "sr
z" command can be seen to properly dump the ftrace buffer).  The
downside here is that ftrace_dump() fully consumes the trace buffer.
Unless that is changed I'd rather not use it because it means "ftdump
| grep xyz" won't be very useful to search the ftrace buffer since it
will throw away the whole trace on the first grep.  A future patch to
dump only the last few lines of the buffer will also be hard to
implement.

[1] https://lkml.kernel.org/r/20161117191605.GA21459@google.com

Link: http://lkml.kernel.org/r/20190308193205.213659-1-dianders@chromium.org

Reported-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-03-13 09:46:10 -04:00
Miroslav Benes
d325c40296 ring-buffer: Remove unused function ring_buffer_page_len()
Commit 6b7e633fe9 ("tracing: Remove extra zeroing out of the ring
buffer page") removed the only caller of ring_buffer_page_len(). The
function is now unused and may be removed.

Link: http://lkml.kernel.org/r/20181228133847.106177-1-mbenes@suse.cz

Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-02-06 11:58:33 -05:00
Linus Torvalds
495d714ad1 Merge tag 'trace-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:

 - Rework of the kprobe/uprobe and synthetic events to consolidate all
   the dynamic event code. This will make changes in the future easier.

 - Partial rewrite of the function graph tracing infrastructure. This
   will allow for multiple users of hooking onto functions to get the
   callback (return) of the function. This is the ground work for having
   kprobes and function graph tracer using one code base.

 - Clean up of the histogram code that will facilitate adding more
   features to the histograms in the future.

 - Addition of str_has_prefix() and a few use cases. There currently is
   a similar function strstart() that is used in a few places, but only
   returns a bool and not a length. These instances will be removed in
   the future to use str_has_prefix() instead.

 - A few other various clean ups as well.

* tag 'trace-v4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits)
  tracing: Use the return of str_has_prefix() to remove open coded numbers
  tracing: Have the historgram use the result of str_has_prefix() for len of prefix
  tracing: Use str_has_prefix() instead of using fixed sizes
  tracing: Use str_has_prefix() helper for histogram code
  string.h: Add str_has_prefix() helper function
  tracing: Make function ‘ftrace_exports’ static
  tracing: Simplify printf'ing in seq_print_sym
  tracing: Avoid -Wformat-nonliteral warning
  tracing: Merge seq_print_sym_short() and seq_print_sym_offset()
  tracing: Add hist trigger comments for variable-related fields
  tracing: Remove hist trigger synth_var_refs
  tracing: Use hist trigger's var_ref array to destroy var_refs
  tracing: Remove open-coding of hist trigger var_ref management
  tracing: Use var_refs[] for hist trigger reference checking
  tracing: Change strlen to sizeof for hist trigger static strings
  tracing: Remove unnecessary hist trigger struct field
  tracing: Fix ftrace_graph_get_ret_stack() to use task and not current
  seq_buf: Use size_t for len in seq_buf_puts()
  seq_buf: Make seq_buf_puts() null-terminate the buffer
  arm64: Use ftrace_graph_get_ret_stack() instead of curr_ret_stack
  ...
2018-12-31 11:46:59 -08:00
Steven Rostedt (VMware)
03329f9939 tracing: Add tracefs file buffer_percentage
Add a "buffer_percentage" file, that allows users to specify how much of the
buffer (percentage of pages) need to be filled before waking up a task
blocked on a per cpu trace_pipe_raw file.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08 20:54:08 -05:00
Steven Rostedt (VMware)
2c2b0a78b3 ring-buffer: Add percentage of ring buffer full to wake up reader
Instead of just waiting for a page to be full before waking up a pending
reader, allow the reader to pass in a "percentage" of pages that have
content before waking up a reader. This should help keep the process of
reading the events not cause wake ups that constantly cause reading of the
buffer.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-12-08 20:54:08 -05:00
Paul E. McKenney
7440172974 tracing: Replace synchronize_sched() and call_rcu_sched()
Now that synchronize_rcu() waits for preempt-disable regions of code
as well as RCU read-side critical sections, synchronize_sched() can
be replaced by synchronize_rcu().  Similarly, call_rcu_sched() can be
replaced by call_rcu().  This commit therefore makes these changes.

Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <linux-kernel@vger.kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-11-27 09:21:41 -08:00
Vaibhav Nagarnaik
83f365554e ring-buffer: Allow for rescheduling when removing pages
When reducing ring buffer size, pages are removed by scheduling a work
item on each CPU for the corresponding CPU ring buffer. After the pages
are removed from ring buffer linked list, the pages are free()d in a
tight loop. The loop does not give up CPU until all pages are removed.
In a worst case behavior, when lot of pages are to be freed, it can
cause system stall.

After the pages are removed from the list, the free() can happen while
the work is rescheduled. Call cond_resched() in the loop to prevent the
system hangup.

Link: http://lkml.kernel.org/r/20180907223129.71994-1-vnagarnaik@google.com

Cc: stable@vger.kernel.org
Fixes: 83f40318da ("ring-buffer: Make removal of ring buffer pages atomic")
Reported-by: Jason Behmer <jbehmer@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-09-17 18:15:11 -04:00
Steven Rostedt (VMware)
bcea3f96e1 tracing: Add SPDX License format tags to tracing files
Add the SPDX License header to ease license compliance management.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-16 19:08:06 -04:00
Steven Rostedt (VMware)
d7224c0e12 ring-buffer: Make ring_buffer_record_is_set_on() return bool
The value of ring_buffer_record_is_set_on() is either true or false, so have
its return value be bool.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-01 21:09:50 -04:00
Steven Rostedt (VMware)
3ebea280d7 ring-buffer: Make ring_buffer_record_is_on() return bool
The value of ring_buffer_record_is_on() is either true or false, so have its
return value be bool.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-08-01 21:08:30 -04:00
Masami Hiramatsu
73c8d89455 ring_buffer: tracing: Inherit the tracing setting to next ring buffer
Maintain the tracing on/off setting of the ring_buffer when switching
to the trace buffer snapshot.

Taking a snapshot is done by swapping the backup ring buffer
(max_tr_buffer). But since the tracing on/off setting is defined
by the ring buffer, when swapping it, the tracing on/off setting
can also be changed. This causes a strange result like below:

  /sys/kernel/debug/tracing # cat tracing_on
  1
  /sys/kernel/debug/tracing # echo 0 > tracing_on
  /sys/kernel/debug/tracing # cat tracing_on
  0
  /sys/kernel/debug/tracing # echo 1 > snapshot
  /sys/kernel/debug/tracing # cat tracing_on
  1
  /sys/kernel/debug/tracing # echo 1 > snapshot
  /sys/kernel/debug/tracing # cat tracing_on
  0

We don't touch tracing_on, but snapshot changes tracing_on
setting each time. This is an anomaly, because user doesn't know
that each "ring_buffer" stores its own tracing-enable state and
the snapshot is done by swapping ring buffers.

Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
Cc: stable@vger.kernel.org
Fixes: debdd57f51 ("tracing: Make a snapshot feature available from userspace")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[ Updated commit log and comment in the code ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-07-25 10:29:41 -04:00
Steven Rostedt (VMware)
6167c205ca ring-buffer: Fix a bunch of typos in comments
An anonymous source sent me a bunch of typo fixes in the comments of
ring_buffer.c file. That source did not want to be associated to this patch
because they don't want to be known as "one of those" commiters (you know who
you are!). They gave me permission to sign this off in my own name.

Suggested-by: One-of-those-commiters@YouKnowWhoYouAre.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-06-04 17:28:19 -04:00
Steven Rostedt (VMware)
927e56db62 ring-buffer: Add set/clear_current_oom_origin() during allocations
As si_mem_available() can say there is enough memory even though the memory
available is not useable by the ring buffer, it is best to not kill innocent
applications because the ring buffer is taking up all the memory while it is
trying to allocate a great deal of memory.

If the allocator is user space (because kernel threads can also increase the
size of the kernel ring buffer on boot up), then after si_mem_available()
says there is enough memory, set the OOM killer to kill the current task if
an OOM triggers during the allocation.

Link: http://lkml.kernel.org/r/20180404062340.GD6312@dhcp22.suse.cz

Suggested-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06 08:56:52 -04:00
Steven Rostedt (VMware)
2a872fa4e9 ring-buffer: Check if memory is available before allocation
The ring buffer is made up of a link list of pages. When making the ring
buffer bigger, it will allocate all the pages it needs before adding to the
ring buffer, and if it fails, it frees them and returns an error. This makes
increasing the ring buffer size an all or nothing action. When this was
first created, the pages were allocated with "NORETRY". This was to not
cause any Out-Of-Memory (OOM) actions from allocating the ring buffer. But
NORETRY was too strict, as the ring buffer would fail to expand even when
there's memory available, but was taken up in the page cache.

Commit 848618857d ("tracing/ring_buffer: Try harder to allocate") changed
the allocating from NORETRY to RETRY_MAYFAIL. The RETRY_MAYFAIL would
allocate from the page cache, but if there was no memory available, it would
simple fail the allocation and not trigger an OOM.

This worked fine, but had one problem. As the ring buffer would allocate one
page at a time, it could take up all memory in the system before it failed
to allocate and free that memory. If the allocation is happening and the
ring buffer allocates all memory and then tries to take more than available,
its allocation will not trigger an OOM, but if there's any allocation that
happens someplace else, that could trigger an OOM, even though once the ring
buffer's allocation fails, it would free up all the previous memory it tried
to allocate, and allow other memory allocations to succeed.

Commit d02bd27bd3 ("mm/page_alloc.c: calculate 'available' memory in a
separate function") separated out si_mem_availble() as a separate function
that could be used to see how much memory is available in the system. Using
this function to make sure that the ring buffer could be allocated before it
tries to allocate pages we can avoid allocating all memory in the system and
making it vulnerable to OOMs if other allocations are taking place.

Link: http://lkml.kernel.org/r/1522320104-6573-1-git-send-email-zhaoyang.huang@spreadtrum.com

CC: stable@vger.kernel.org
Cc: linux-mm@kvack.org
Fixes: 848618857d ("tracing/ring_buffer: Try harder to allocate")
Requires: d02bd27bd3 ("mm/page_alloc.c: calculate 'available' memory in a separate function")
Reported-by: Zhaoyang Huang <huangzhaoyang@gmail.com>
Tested-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06 08:56:52 -04:00
Chris Wilson
913ea4d0b1 tracing: Mention trace_clock=global when warning about unstable clocks
Mention the alternative of adding trace_clock=global to the kernel
command line when we detect that we've used an unstable clock across a
suspend/resume cycle.

Link: http://lkml.kernel.org/r/20180330150132.16903-2-chris@chris-wilson.co.uk

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-06 08:56:47 -04:00
Steven Rostedt (VMware)
8e012066fe ring-buffer: Add nesting for adding events within events
The ring-buffer code has recusion protection in case tracing ends up tracing
itself, the ring-buffer will detect that it was called at the same context
(normal, softirq, interrupt or NMI), and not continue to record the event.

With the histogram synthetic events, they are called while tracing another
event at the same context. The recusion protection triggers because it
detects tracing at the same context and stops it.

Add ring_buffer_nest_start() and ring_buffer_nest_end() that will notify the
ring buffer that a trace is about to happen within another trace and that it
is intended, and not to trigger the recursion blocking.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10 16:06:04 -05:00
Tom Zanussi
dc4e2801d4 ring-buffer: Redefine the unimplemented RINGBUF_TYPE_TIME_STAMP
RINGBUF_TYPE_TIME_STAMP is defined but not used, and from what I can
gather was reserved for something like an absolute timestamp feature
for the ring buffer, if not a complete replacement of the current
time_delta scheme.

This code redefines RINGBUF_TYPE_TIME_STAMP to implement absolute time
stamps.  Another way to look at it is that it essentially forces
extended time_deltas for all events.

The motivation for doing this is to enable time_deltas that aren't
dependent on previous events in the ring buffer, making it feasible to
use the ring_buffer_event timetamps in a more random-access way, for
purposes other than serial event printing.

To set/reset this mode, use tracing_set_timestamp_abs() from the
previous interface patch.

Link: http://lkml.kernel.org/r/477b362dba1ce7fab9889a1a8e885a62c472f041.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10 16:05:50 -05:00
Tom Zanussi
00b4145298 ring-buffer: Add interface for setting absolute time stamps
Define a new function, tracing_set_time_stamp_abs(), which can be used
to enable or disable the use of absolute timestamps rather than time
deltas for a trace array.

Only the interface is added here; a subsequent patch will add the
underlying implementation.

Link: http://lkml.kernel.org/r/ce96119de44c7fe0ee44786d15254e9b493040d3.1516069914.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Baohong Liu <baohong.liu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-03-10 16:05:50 -05:00
Linus Torvalds
a9a08845e9 vfs: do bulk POLL* -> EPOLL* replacement
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
        L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
        for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do.  But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.

Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-11 14:34:03 -08:00
Linus Torvalds
168fe32a07 Merge branch 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull poll annotations from Al Viro:
 "This introduces a __bitwise type for POLL### bitmap, and propagates
  the annotations through the tree. Most of that stuff is as simple as
  'make ->poll() instances return __poll_t and do the same to local
  variables used to hold the future return value'.

  Some of the obvious brainos found in process are fixed (e.g. POLLIN
  misspelled as POLL_IN). At that point the amount of sparse warnings is
  low and most of them are for genuine bugs - e.g. ->poll() instance
  deciding to return -EINVAL instead of a bitmap. I hadn't touched those
  in this series - it's large enough as it is.

  Another problem it has caught was eventpoll() ABI mess; select.c and
  eventpoll.c assumed that corresponding POLL### and EPOLL### were
  equal. That's true for some, but not all of them - EPOLL### are
  arch-independent, but POLL### are not.

  The last commit in this series separates userland POLL### values from
  the (now arch-independent) kernel-side ones, converting between them
  in the few places where they are copied to/from userland. AFAICS, this
  is the least disruptive fix preserving poll(2) ABI and making epoll()
  work on all architectures.

  As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
  it will trigger only on what would've triggered EPOLLWRBAND on other
  architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
  at all on sparc. With this patch they should work consistently on all
  architectures"

* 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
  make kernel-side POLL... arch-independent
  eventpoll: no need to mask the result of epi_item_poll() again
  eventpoll: constify struct epoll_event pointers
  debugging printk in sg_poll() uses %x to print POLL... bitmap
  annotate poll(2) guts
  9p: untangle ->poll() mess
  ->si_band gets POLL... bitmap stored into a user-visible long field
  ring_buffer_poll_wait() return value used as return value of ->poll()
  the rest of drivers/*: annotate ->poll() instances
  media: annotate ->poll() instances
  fs: annotate ->poll() instances
  ipc, kernel, mm: annotate ->poll() instances
  net: annotate ->poll() instances
  apparmor: annotate ->poll() instances
  tomoyo: annotate ->poll() instances
  sound: annotate ->poll() instances
  acpi: annotate ->poll() instances
  crypto: annotate ->poll() instances
  block: annotate ->poll() instances
  x86: annotate ->poll() instances
  ...
2018-01-30 17:58:07 -08:00
Steven Rostedt (VMware)
0164e0d7e8 ring-buffer: Fix duplicate results in mapping context to bits in recursive lock
In bringing back the context checks, the code checks first if its normal
(non-interrupt) context, and then for NMI then IRQ then softirq. The final
check is redundant. Since the if branch is only hit if the context is one of
NMI, IRQ, or SOFTIRQ, if it's not NMI or IRQ there's no reason to check if
it is SOFTIRQ. The current code returns the same result even if its not a
SOFTIRQ. Which is confusing.

  pc & SOFTIRQ_OFFSET ? 2 : RB_CTX_SOFTIRQ

Is redundant as RB_CTX_SOFTIRQ *is* 2!

Fixes: a0e3a18f4b ("ring-buffer: Bring back context level recursive checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-18 15:45:48 -05:00
Steven Rostedt (VMware)
a0e3a18f4b ring-buffer: Bring back context level recursive checks
Commit 1a149d7d3f ("ring-buffer: Rewrite trace_recursive_(un)lock() to be
simpler") replaced the context level recursion checks with a simple counter.
This would prevent the ring buffer code from recursively calling itself more
than the max number of contexts that exist (Normal, softirq, irq, nmi). But
this change caused a lockup in a specific case, which was during suspend and
resume using a global clock. Adding a stack dump to see where this occurred,
the issue was in the trace global clock itself:

  trace_buffer_lock_reserve+0x1c/0x50
  __trace_graph_entry+0x2d/0x90
  trace_graph_entry+0xe8/0x200
  prepare_ftrace_return+0x69/0xc0
  ftrace_graph_caller+0x78/0xa8
  queued_spin_lock_slowpath+0x5/0x1d0
  trace_clock_global+0xb0/0xc0
  ring_buffer_lock_reserve+0xf9/0x390

The function graph tracer traced queued_spin_lock_slowpath that was called
by trace_clock_global. This pointed out that the trace_clock_global() is not
reentrant, as it takes a spin lock. It depended on the ring buffer recursive
lock from letting that happen.

By removing the context detection and adding just a max number of allowable
recursions, it allowed the trace_clock_global() to be entered again and try
to retake the spinlock it already held, causing a deadlock.

Fixes: 1a149d7d3f ("ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler")
Reported-by: David Weinehall <david.weinehall@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-01-15 12:28:06 -05:00