Commit Graph

305 Commits

Author SHA1 Message Date
Steven Rostedt (Red Hat) babe3fce95 ring-buffer: Froce rb_update_write_stamp() to be inlined
The function rb_update_write_stamp() is in the hotpath of the ring buffer
recording. Make sure that it is inlined as well. There's not many places
that call it.

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23 20:38:39 -05:00
Steven Rostedt (Red Hat) 2289d5672f ring-buffer: Force inline of hotpath helper functions
There's several small helper functions in ring_buffer.c that are used in the
hot path. For some reason, even though they are marked inline, gcc tends not
to enforce it. Make sure these functions are always inlined.

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23 20:35:32 -05:00
Steven Rostedt (Red Hat) 929ddbf3ef ring-buffer: Always inline rb_event_data()
The rb_event_data() is the fast path of getting the ring buffer data from an
event. Externally, ring_buffer_event_data() is used to access this function.
But unfortunately, rb_event_data() is not inlined, and calling
ring_buffer_event_data() causes that function to be called again. Force
rb_event_data() to be inlined to lower the number of operations needed when
calling ring_buffer_event_data().

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23 11:40:34 -05:00
Steven Rostedt (Red Hat) fa7ffb39ef ring-buffer: Make rb_reserve_next_event() always inlined
The function rb_reserved_next_event() is called by two functions:
ring_buffer_lock_reserve() and ring_buffer_write(). This is in a very hot
path of the tracing code, and it is best that they are not functions. The
two callers are basically wrapers for rb_reserver_next_event(). Removing the
function calls can save execution time in the hotpath of tracing.

Link: http://lkml.kernel.org/r/20161121183700.GW26852@two.firstfloor.org

Reported-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-11-23 11:36:30 -05:00
Steven Rostedt (Red Hat) 59643d1535 ring-buffer: Prevent overflow of size in ring_buffer_resize()
If the size passed to ring_buffer_resize() is greater than MAX_LONG - BUF_PAGE_SIZE
then the DIV_ROUND_UP() will return zero.

Here's the details:

  # echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb

tracing_entries_write() processes this and converts kb to bytes.

 18014398509481980 << 10 = 18446744073709547520

and this is passed to ring_buffer_resize() as unsigned long size.

 size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);

Where DIV_ROUND_UP(a, b) is (a + b - 1)/b

BUF_PAGE_SIZE is 4080 and here

 18446744073709547520 + 4080 - 1 = 18446744073709551599

where 18446744073709551599 is still smaller than 2^64

 2^64 - 18446744073709551599 = 17

But now 18446744073709551599 / 4080 = 4521260802379792

and size = size * 4080 = 18446744073709551360

This is checked to make sure its still greater than 2 * 4080,
which it is.

Then we convert to the number of buffer pages needed.

 nr_page = DIV_ROUND_UP(size, BUF_PAGE_SIZE)

but this time size is 18446744073709551360 and

 2^64 - (18446744073709551360 + 4080 - 1) = -3823

Thus it overflows and the resulting number is less than 4080, which makes

  3823 / 4080 = 0

an nr_pages is set to this. As we already checked against the minimum that
nr_pages may be, this causes the logic to fail as well, and we crash the
kernel.

There's no reason to have the two DIV_ROUND_UP() (that's just result of
historical code changes), clean up the code and fix this bug.

Cc: stable@vger.kernel.org # 3.5+
Fixes: 83f40318da ("ring-buffer: Make removal of ring buffer pages atomic")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-13 16:44:20 -04:00
Steven Rostedt (Red Hat) 9b94a8fba5 ring-buffer: Use long for nr_pages to avoid overflow failures
The size variable to change the ring buffer in ftrace is a long. The
nr_pages used to update the ring buffer based on the size is int. On 64 bit
machines this can cause an overflow problem.

For example, the following will cause the ring buffer to crash:

 # cd /sys/kernel/debug/tracing
 # echo 10 > buffer_size_kb
 # echo 8556384240 > buffer_size_kb

Then you get the warning of:

 WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260

Which is:

  RB_WARN_ON(cpu_buffer, nr_removed);

Note each ring buffer page holds 4080 bytes.

This is because:

 1) 10 causes the ring buffer to have 3 pages.
    (10kb requires 3 * 4080 pages to hold)

 2) (2^31 / 2^10  + 1) * 4080 = 8556384240
    The value written into buffer_size_kb is shifted by 10 and then passed
    to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760

 3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE
    which is 4080. 8761737461760 / 4080 = 2147484672

 4) nr_pages is subtracted from the current nr_pages (3) and we get:
    2147484669. This value is saved in a signed integer nr_pages_to_update

 5) 2147484669 is greater than 2^31 but smaller than 2^32, a signed int
    turns into the value of -2147482627

 6) As the value is a negative number, in update_pages_handler() it is
    negated and passed to rb_remove_pages() and 2147482627 pages will
    be removed, which is much larger than 3 and it causes the warning
    because not all the pages asked to be removed were removed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001

Cc: stable@vger.kernel.org # 2.6.28+
Fixes: 7a8e76a382 ("tracing: unified trace buffer")
Reported-by: Hao Qin <QEver.cn@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-13 11:12:20 -04:00
Steven Rostedt (Red Hat) 4239c38fe0 ring-buffer: Process commits whenever moving to a new page.
When crossing over to a new page, commit the current work. This will allow
readers to get data with less latency, and also simplifies the work to get
timestamps working for interrupted events.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-25 15:24:05 -05:00
Steven Rostedt (Red Hat) 70004986ff ring-buffer: Remove redundant update of page timestamp
The first commit of a buffer page updates the timestamp of that page. No
need to have the update to the next page add the timestamp too. It will only
be replaced by the first commit on that page anyway.

Only update to a page if it contains an event.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-24 09:29:16 -05:00
Steven Rostedt (Red Hat) 8573636ea7 ring-buffer: Use READ_ONCE() for most tail_page access
As cpu_buffer->tail_page may be modified by interrupts at almost any time,
the flow of logic is very important. Do not let gcc get smart with
re-reading cpu_buffer->tail_page by adding READ_ONCE() around most of its
accesses.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-24 09:29:15 -05:00
Steven Rostedt (Red Hat) bd1b7cd360 ring-buffer: Put back the length if crossed page with add_timestamp
Commit fcc742eaad "ring-buffer: Add event descriptor to simplify passing
data" added a descriptor that holds various data instead of passing around
several variables through parameters. The problem was that one of the
parameters was modified in a function and the code was designed not to have
an effect on that modified  parameter. Now that the parameter is a
descriptor and any modifications to it are non-volatile, the size of the
data could be unnecessarily expanded.

Remove the extra space added if a timestamp was added and the event went
across the page.

Cc: stable@vger.kernel.org # 4.3+
Fixes: fcc742eaad "ring-buffer: Add event descriptor to simplify passing data"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-24 09:27:25 -05:00
Steven Rostedt (Red Hat) b81f472a20 ring-buffer: Update read stamp with first real commit on page
Do not update the read stamp after swapping out the reader page from the
write buffer. If the reader page is swapped out of the buffer before an
event is written to it, then the read_stamp may get an out of date
timestamp, as the page timestamp is updated on the first commit to that
page.

rb_get_reader_page() only returns a page if it has an event on it, otherwise
it will return NULL. At that point, check if the page being returned has
events and has not been read yet. Then at that point update the read_stamp
to match the time stamp of the reader page.

Cc: stable@vger.kernel.org # 2.6.30+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-24 09:23:17 -05:00
Yaowei Bai cdb2a0a915 ring-buffer: rb_event_is_commit() can return boolean
Make rb_event_is_commit() return bool to improve readability
due to this particular function only using either one or zero as its
return value.

No functional change.

Link: http://lkml.kernel.org/r/1443537816-5788-7-git-send-email-bywxiaobai@163.com

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02 14:25:29 -05:00
Yaowei Bai da58834cf2 ring-buffer: rb_per_cpu_empty() can return boolean
Makes rb_per_cpu_empty() return bool to improve readability.

No functional change.

Link: http://lkml.kernel.org/r/1443537816-5788-6-git-send-email-bywxiaobai@163.com

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02 14:24:27 -05:00
Yaowei Bai 3d4e204d81 ring_buffer: ring_buffer_empty{cpu}() can return boolean
Make ring_buffer_empty() and ring_buffer_empty_cpu() return bool.

No functional change.

Link: http://lkml.kernel.org/r/1443537816-5788-5-git-send-email-bywxiaobai@163.com

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02 14:23:38 -05:00
Yaowei Bai 06ca320952 ring-buffer: rb_is_reader_page() can return boolean
Make rb_is_reader_page() return bool to improve readability due to this
particular function only using either true or false as its return value.

No functional change.

Link: http://lkml.kernel.org/r/1443537816-5788-4-git-send-email-bywxiaobai@163.com

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-02 14:23:20 -05:00
Steven Rostedt (Red Hat) b7dc42fd79 ring-buffer: Revert "ring-buffer: Get timestamp after event is allocated"
The commit a4543a2fa9 "ring-buffer: Get timestamp after event is
allocated" is needed for some future work. But after adding it, there is a
race somewhere that causes the saved timestamp to have a slight shift, and
get ahead of the actual timestamp and make it look like time goes backwards.

I'm still looking into why this happens, but in the mean time, this is
holding up other work to get in. I'm reverting the change for now (which
makes the problem go away), and will add it back after I know what is wrong
and fix it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-03 08:57:12 -04:00
Steven Rostedt (Red Hat) d90fd77402 ring-buffer: Reorganize function locations
Functions in ring-buffer.c have gotten interleaved between different
use cases. Move the functions around to get like functions closer
together. This may or may not help gcc keep cache locality, but it
makes it a little easier to work with the code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:49 -04:00
Steven Rostedt (Red Hat) 7d75e6833b ring-buffer: Make sure event has enough room for extend and padding
Now that events only add time extends after it is committed, in case
an event comes in before it can discard the allocated event, the time
extend needs to be stored within the event. If the event is bigger
than then size needed for the time extend, padding must be added.
The minimum padding size is 8 bytes. Thus if the event is 12 bytes
(size of time extend + 4), there will not be enough room to add both
the time extend and padding. Make sure all events are either 8 bytes
or 16 or more bytes.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:49 -04:00
Steven Rostedt (Red Hat) a4543a2fa9 ring-buffer: Get timestamp after event is allocated
Move the capturing of the timestamp to after an event is allocated.
If the event is not a commit (where it is an event that preempted
another event), then no timestamp is needed, because the delta of
nested events is always zero.

If the event starts on a new page, no delta needs to be calculated
as the full timestamp will be added to the page header, and the
event will have a delta of zero.

Now if the event requires a time extend (the delta does not fit
in the 27 bit delta slot in the header), then the event is discarded,
the length is extended to hold the TIME_EXTEND event that allows for
a 59 bit delta, and the commit is tried again.

If the event can't be discarded (another event came in after it),
then the TIME_EXTEND is added directly to the allocated event and
the rest of the event is given padding.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:48 -04:00
Steven Rostedt (Red Hat) 9826b2733a ring-buffer: Move the adding of the extended timestamp out of line
Requiring a extended time stamp is an uncommon occurrence, and it is
best to do it out of line when needed.

Add a noinline function that handles the extended timestamp and
have it called with an unlikely to completely move it out of the
fast path.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:47 -04:00
Steven Rostedt (Red Hat) fcc742eaad ring-buffer: Add event descriptor to simplify passing data
Add rb_event_info descriptor to pass event info to functions a bit
easier than using a bunch of parameters. This will also allow for
changing the code around a bit to find better fast paths.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-07-20 22:30:46 -04:00
Steven Rostedt (Red Hat) a497adb45b ring-buffer: Add enum names for the context levels
Instead of having hard coded numbers for the context levels, use
enums to describe them more.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-29 10:39:08 -04:00
Steven Rostedt (Red Hat) 3c6296f716 ring-buffer: Remove useless unused tracing_off_permanent()
The tracing_off_permanent() call is a way to disable all ring_buffers.
Nothing uses it and nothing should use it, as tracing_off() and
friends are better, as they disable the ring buffers related to
tracing. The tracing_off_permanent() even disabled non tracing
ring buffers. This is a bit drastic, and was added to handle NMIs
doing outputs that could corrupt the ring buffer when only tracing
used them. It is now obsolete and adds a little overhead, it should
be removed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-28 16:47:39 -04:00
Steven Rostedt (Red Hat) 289a5a25c5 ring-buffer: Give NMIs a chance to lock the reader_lock
Currently, if an NMI does a dump of a ring buffer, it disables
all ring buffers from ever doing any writes again. This is because
it wont take the locks for the cpu_buffer and this can cause
corruption if it preempted a read, or a read happens on another
CPU for the current cpu buffer. This is a bit overkill.

First, it should at least try to take the lock, and if it fails
then disable it. Also, there's no need to disable all ring
buffers, even those that are unrelated to what is being read.
Only disable the per cpu ring buffer that is being read if
it can not get the lock for it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-28 16:47:01 -04:00
Steven Rostedt (Red Hat) 985e871b28 ring-buffer: Add trace_recursive checks to ring_buffer_write()
The ring_buffer_write() function isn't protected by the trace recursive
writes. Luckily, this function is not used as much and is unlikely
to ever recurse. But it should still have the protection, because
even a call to ring_buffer_lock_reserve() could cause ring buffer
corruption if called when ring_buffer_write() is being used.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-05-27 10:48:56 -04:00