We get a lot of harmless warnings about this header file at W=1 level
because of an unusual function declaration:
kernel/trace/trace.h:766:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration]
This moves the inline statement where it normally belongs, avoiding the
warning.
Link: http://lkml.kernel.org/r/20170123122521.3389010-1-arnd@arndb.de
Fixes: 4046bf023b ("ftrace: Expose ftrace_hash_empty and ftrace_lookup_ip")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The static_key->next field goes mostly unused. The field is used for
associating module uses with a static key. Most uses of struct static_key
define a static key in the core kernel and make use of it entirely within
the core kernel, or define the static key in a module and make use of it
only from within that module. In fact, of the ~3,000 static keys defined,
I found only about 5 or so that did not fit this pattern.
Thus, we can remove the static_key->next field entirely and overload
the static_key->entries field. That is, when all the static_key uses
are contained within the same module, static_key->entries continues
to point to those uses. However, if the static_key uses are not contained
within the module where the static_key is defined, then we allocate a
struct static_key_mod, store a pointer to the uses within that
struct static_key_mod, and have the static key point at the static_key_mod.
This does incur some extra memory usage when a static_key is used in a
module that does not define it, but since there are only a handful of such
cases there is a net savings.
In order to identify if the static_key->entries pointer contains a
struct static_key_mod or a struct jump_entry pointer, bit 1 of
static_key->entries is set to 1 if it points to a struct static_key_mod and
is 0 if it points to a struct jump_entry. We were already using bit 0 in a
similar way to store the initial value of the static_key. This does mean
that allocations of struct static_key_mod and that the struct jump_entry
tables need to be at least 4-byte aligned in memory. As far as I can tell
all arches meet this criteria.
For my .config, the patch increased the text by 778 bytes, but reduced
the data + bss size by 14912, for a net savings of 14,134 bytes.
text data bss dec hex filename
8092427 5016512 790528 13899467 d416cb vmlinux.pre
8093205 5001600 790528 13885333 d3df95 vmlinux.post
Link: http://lkml.kernel.org/r/1486154544-4321-1-git-send-email-jbaron@akamai.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The timer flags in the timer_start trace event contain lots of useful
information, but the meaning is not clear in the trace output. Making tools
rely on the bit positions is bad as they might change over time.
Decode the flags in the print out. Tools can retrieve the bits and their
meaning from the trace format file.
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1702101639290.4036@nanos
Requested-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The code in traceprobe_probes_write() reads up to 4096 bytes from userpace
for each line. If userspace passes in several lines to execute, the code
will do a large read for each line, even though, it is highly likely that
the first read from userspace received all of the lines at once.
I changed the logic to do a single read from userspace, and to only read
from userspace again if not all of the read from userspace made it in.
I tested this by adding printk()s and writing files that would test -1, ==,
and +1 the buffer size, to make sure that there's no overflows and that if a
single line is written with +1 the buffer size, that it fails properly.
Link: http://lkml.kernel.org/r/20170209180458.5c829ab2@gandalf.local.home
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The GLOB operation "~" should be able to work with the COMM filter key in
order to trace programs with a glob. For example
echo 'COMM ~ "systemd*"' > events/syscalls/filter
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently, only one function can be written to set_graph_function and
set_graph_notrace. The last function in the list will have saved, even
though other functions will be added then removed.
Change the behavior to be the same as set_ftrace_function as to allow
multiple functions to be written. If any one fails, none of them will be
added. The addition of the functions are done at the end when the file is
closed.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The hashs ftrace_graph_hash and ftrace_graph_notrace_hash are modified
within the graph_lock being held. Holding a pointer to them and passing them
along can lead to a use of a stale pointer (fgd->hash). Move assigning the
pointer and its use to within the holding of the lock. Note, it's an
rcu_sched protected data, and other instances of referencing them are done
with preemption disabled. But the file manipuation code must be protected by
the lock.
The fgd->hash pointer is set to NULL when the lock is being released.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
trace_parser_put() simply frees the allocated parser buffer. But it does not
reset the pointer that was freed. This means that if trace_parser_put() is
called on the same parser more than once, it will corrupt the allocation
system. Setting parser->buffer to NULL after free allows it to be called
more than once without any ill effect.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Since reading the set_graph_functions uses seq functions, which sets the
file->private_data pointer to a seq_file descriptor. On writes the
ftrace_graph_data descriptor is set to file->private_data. But if the file
is opened for RDWR, the ftrace_graph_write() will incorrectly use the
file->private_data descriptor instead of
((struct seq_file *)file->private_data)->private pointer, and this can crash
the kernel.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When the set_graph_function or set_graph_notrace contains no records, a
banner is displayed of either "#### all functions enabled ####" or
"#### all functions disabled ####" respectively. To tell the seq operations
to do this, (void *)1 is passed as a return value. Instead of using a
hardcoded meaningless variable, define it as a macro.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This is a micro-optimization, but as it has to deal with a fast path of the
function tracer, these optimizations can be noticed.
The ftrace_lookup_ip() returns true if the given ip is found in the hash. If
it's not found or the hash is NULL, it returns false. But there's some cases
that a NULL hash is a true, and the ftrace_hash_empty() is tested before
calling ftrace_lookup_ip() in those cases. But as ftrace_lookup_ip() tests
that first, that adds a few extra unneeded instructions in those cases.
A new static "always_inlined" function is created that does not perform the
hash empty test. This most only be used by callers that do the check first
anyway, as an empty or NULL hash could cause a crash if a lookup is
performed on it.
Also add kernel doc for the ftrace_lookup_ip() main function.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Replace the couple of use cases that has small logic to produce the ftrace
function key id with a helper function. No need for duplicate code.
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Use ftrace_hash instead of a static array of a fixed size. This is
useful when a graph filter pattern matches to a large number of
functions. Now hash lookup is done with preemption disabled to protect
from the hash being changed/freed.
Link: http://lkml.kernel.org/r/20170120024447.26097-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The unlikely/likely branch profiler now gets called even if the if statement
is a constant (always goes in one direction without a compare). Add a value
to denote this in the likely/unlikely tracer as well.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Now that constants are traced, it is useful to see the number of constants
that are traced in the likely/unlikely profiler in order to know if they
should be ignored or not.
The likely/unlikely will display a number after the "correct" number if a
"constant" count exists.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When running the likely/unlikely profiler, one of the results did not look
accurate. It noted that the unlikely() in link_path_walk() was 100%
incorrect. When I added a trace_printk() to see what was happening there, it
became 80% correct! Looking deeper into what whas happening, I found that
gcc split that if statement into two paths. One where the if statement
became a constant, the other path a variable. The other path had the if
statement always hit (making the unlikely there, always false), but since
the #define unlikely() has:
#define unlikely() (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
Where constants are ignored by the branch profiler, the "constant" path
made by the compiler was ignored, even though it was hit 80% of the time.
By just passing the constant value to the __branch_check__() function and
tracing it out of line (as always correct, as likely/unlikely isn't a factor
for constants), then we get back the accurate readings of branches that were
optimized by gcc causing part of the execution to become constant.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Previously, `create_trace_uprobe` found the *first* occurence
of the ':' character when parsing `PATH:OFFSET` for a uprobe.
However, if the path contains a ':' character, then the function
would parse the path incorrectly. Even worse, if the path does not
exist, the subsequent call to `kern_path()` would set `ret` to
`ENOENT`, leading to very cryptic errno values in user space.
The fix is to find the *last* occurence of ':'.
How to repro:: The write fails with "No such file or directory", suggesting
incorrectly that the `uprobe_events` file does not exist.
$ mkdir testing && cd testing
$ cp /bin/bash .
$ cp /bin/bash ./bash:with:colon
$ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events # this works
$ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events # this doesn't
-bash: echo: write error: No such file or directory
With the patch:
$ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events # this still works
$ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events # this works now too!
$ cat /sys/kernel/debug/tracing/uprobe_events
p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x0000000000000006
p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x0000000000000006
Link: http://lkml.kernel.org/r/20170113165834.4081016-1-kennyyu@fb.com
Signed-off-by: Kenny Yu <kennyyu@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull DAX updates from Dan Williams:
"The completion of Jan's DAX work for 4.10.
As I mentioned in the libnvdimm-for-4.10 pull request, these are some
final fixes for the DAX dirty-cacheline-tracking invalidation work
that was merged through the -mm, ext4, and xfs trees in -rc1. These
patches were prepared prior to the merge window, but we waited for
4.10-rc1 to have a stable merge base after all the prerequisites were
merged.
Quoting Jan on the overall changes in these patches:
"So I'd like all these 6 patches to go for rc2. The first three
patches fix invalidation of exceptional DAX entries (a bug which
is there for a long time) - without these patches data loss can
occur on power failure even though user called fsync(2). The other
three patches change locking of DAX faults so that ->iomap_begin()
is called in a more relaxed locking context and we are safe to
start a transaction there for ext4"
These have received a build success notification from the kbuild
robot, and pass the latest libnvdimm unit tests. There have not been
any -next releases since -rc1, so they have not appeared there"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
ext4: Simplify DAX fault path
dax: Call ->iomap_begin without entry lock during dax fault
dax: Finish fault completely when loading holes
dax: Avoid page invalidation races and unnecessary radix tree traversals
mm: Invalidate DAX radix tree entries only if appropriate
ext2: Return BH_New buffers for zeroed blocks
Pull documentation fixes from Jonathan Corbet:
"Two small fixes:
- A merge error on my part broke the DocBook build. I've
requisitioned one of tglx's frozen sharks for appropriate
disciplinary action and resolved to be more careful about testing
the DocBook stuff as long as it's still around.
- Fix an error in unaligned-memory-access.txt"
* tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux:
Documentation/unaligned-memory-access.txt: fix incorrect comparison operator
docs: Fix build failure