mirror of
https://github.com/Dasharo/linux.git
synced 2026-03-06 15:25:10 -08:00
d27cb32e00eff60d8fa94de265a8d84c63ef059c
1331 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
bb32500fb9 |
tracing: Have trace_event_file have ref counters
The following can crash the kernel:
# cd /sys/kernel/tracing
# echo 'p:sched schedule' > kprobe_events
# exec 5>>events/kprobes/sched/enable
# > kprobe_events
# exec 5>&-
The above commands:
1. Change directory to the tracefs directory
2. Create a kprobe event (doesn't matter what one)
3. Open bash file descriptor 5 on the enable file of the kprobe event
4. Delete the kprobe event (removes the files too)
5. Close the bash file descriptor 5
The above causes a crash!
BUG: kernel NULL pointer dereference, address: 0000000000000028
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:tracing_release_file_tr+0xc/0x50
What happens here is that the kprobe event creates a trace_event_file
"file" descriptor that represents the file in tracefs to the event. It
maintains state of the event (is it enabled for the given instance?).
Opening the "enable" file gets a reference to the event "file" descriptor
via the open file descriptor. When the kprobe event is deleted, the file is
also deleted from the tracefs system which also frees the event "file"
descriptor.
But as the tracefs file is still opened by user space, it will not be
totally removed until the final dput() is called on it. But this is not
true with the event "file" descriptor that is already freed. If the user
does a write to or simply closes the file descriptor it will reference the
event "file" descriptor that was just freed, causing a use-after-free bug.
To solve this, add a ref count to the event "file" descriptor as well as a
new flag called "FREED". The "file" will not be freed until the last
reference is released. But the FREE flag will be set when the event is
removed to prevent any more modifications to that event from happening,
even if there's still a reference to the event "file" descriptor.
Link: https://lore.kernel.org/linux-trace-kernel/20231031000031.1e705592@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes:
|
||
|
|
dcc4e5728e |
seq_buf: Introduce DECLARE_SEQ_BUF and seq_buf_str()
Solve two ergonomic issues with struct seq_buf; 1) Too much boilerplate is required to initialize: struct seq_buf s; char buf[32]; seq_buf_init(s, buf, sizeof(buf)); Instead, we can build this directly on the stack. Provide DECLARE_SEQ_BUF() macro to do this: DECLARE_SEQ_BUF(s, 32); 2) %NUL termination is fragile and requires 2 steps to get a valid C String (and is a layering violation exposing the "internals" of seq_buf): seq_buf_terminate(s); do_something(s->buffer); Instead, we can just return s->buffer directly after terminating it in the refactored seq_buf_terminate(), now known as seq_buf_str(): do_something(seq_buf_str(s)); Link: https://lore.kernel.org/linux-trace-kernel/20231027155634.make.260-kees@kernel.org Link: https://lore.kernel.org/linux-trace-kernel/20231026194033.it.702-kees@kernel.org/ Cc: Yosry Ahmed <yosryahmed@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Justin Stitt <justinstitt@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Petr Mladek <pmladek@suse.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Yun Zhou <yun.zhou@windriver.com> Cc: Jacob Keller <jacob.e.keller@intel.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
|
|
d0ed46b603 |
tracing: Move readpos from seq_buf to trace_seq
To make seq_buf more lightweight as a string buf, move the readpos member from seq_buf to its container, trace_seq. That puts the responsibility of maintaining the readpos entirely in the tracing code. If some future users want to package up the readpos with a seq_buf, we can define a new struct then. Link: https://lore.kernel.org/linux-trace-kernel/20231020033545.2587554-2-willy@infradead.org Cc: Kees Cook <keescook@chromium.org> Cc: Justin Stitt <justinstitt@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Petr Mladek <pmladek@suse.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
|
|
5790b1fb3d |
eventfs: Remove eventfs_file and just use eventfs_inode
Instead of having a descriptor for every file represented in the eventfs
directory, only have the directory itself represented. Change the API to
send in a list of entries that represent all the files in the directory
(but not other directories). The entry list contains a name and a callback
function that will be used to create the files when they are accessed.
struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry *parent,
const struct eventfs_entry *entries,
int size, void *data);
is used for the top level eventfs directory, and returns an eventfs_inode
that will be used by:
struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode *parent,
const struct eventfs_entry *entries,
int size, void *data);
where both of the above take an array of struct eventfs_entry entries for
every file that is in the directory.
The entries are defined by:
typedef int (*eventfs_callback)(const char *name, umode_t *mode, void **data,
const struct file_operations **fops);
struct eventfs_entry {
const char *name;
eventfs_callback callback;
};
Where the name is the name of the file and the callback gets called when
the file is being created. The callback passes in the name (in case the
same callback is used for multiple files), a pointer to the mode, data and
fops. The data will be pointing to the data that was passed in
eventfs_create_dir() or eventfs_create_events_dir() but may be overridden
to point to something else, as it will be used to point to the
inode->i_private that is created. The information passed back from the
callback is used to create the dentry/inode.
If the callback fills the data and the file should be created, it must
return a positive number. On zero or negative, the file is ignored.
This logic may also be used as a prototype to convert entire pseudo file
systems into just-in-time allocation.
The "show_events_dentry" file has been updated to show the directories,
and any files they have.
With just the eventfs_file allocations:
Before after deltas for meminfo (in kB):
MemFree: -14360
MemAvailable: -14260
Buffers: 40
Cached: 24
Active: 44
Inactive: 48
Inactive(anon): 28
Active(file): 44
Inactive(file): 20
Dirty: -4
AnonPages: 28
Mapped: 4
KReclaimable: 132
Slab: 1604
SReclaimable: 132
SUnreclaim: 1472
Committed_AS: 12
Before after deltas for slabinfo:
<slab>: <objects> [ * <size> = <total>]
ext4_inode_cache 27 [* 1184 = 31968 ]
extent_status 102 [* 40 = 4080 ]
tracefs_inode_cache 144 [* 656 = 94464 ]
buffer_head 39 [* 104 = 4056 ]
shmem_inode_cache 49 [* 800 = 39200 ]
filp -53 [* 256 = -13568 ]
dentry 251 [* 192 = 48192 ]
lsm_file_cache 277 [* 32 = 8864 ]
vm_area_struct -14 [* 184 = -2576 ]
trace_event_file 1748 [* 88 = 153824 ]
kmalloc-1k 35 [* 1024 = 35840 ]
kmalloc-256 49 [* 256 = 12544 ]
kmalloc-192 -28 [* 192 = -5376 ]
kmalloc-128 -30 [* 128 = -3840 ]
kmalloc-96 10581 [* 96 = 1015776 ]
kmalloc-64 3056 [* 64 = 195584 ]
kmalloc-32 1291 [* 32 = 41312 ]
kmalloc-16 2310 [* 16 = 36960 ]
kmalloc-8 9216 [* 8 = 73728 ]
Free memory dropped by 14,360 kB
Available memory dropped by 14,260 kB
Total slab additions in size: 1,771,032 bytes
With this change:
Before after deltas for meminfo (in kB):
MemFree: -12084
MemAvailable: -11976
Buffers: 32
Cached: 32
Active: 72
Inactive: 168
Inactive(anon): 176
Active(file): 72
Inactive(file): -8
Dirty: 24
AnonPages: 196
Mapped: 8
KReclaimable: 148
Slab: 836
SReclaimable: 148
SUnreclaim: 688
Committed_AS: 324
Before after deltas for slabinfo:
<slab>: <objects> [ * <size> = <total>]
tracefs_inode_cache 144 [* 656 = 94464 ]
shmem_inode_cache -23 [* 800 = -18400 ]
filp -92 [* 256 = -23552 ]
dentry 179 [* 192 = 34368 ]
lsm_file_cache -3 [* 32 = -96 ]
vm_area_struct -13 [* 184 = -2392 ]
trace_event_file 1748 [* 88 = 153824 ]
kmalloc-1k -49 [* 1024 = -50176 ]
kmalloc-256 -27 [* 256 = -6912 ]
kmalloc-128 1864 [* 128 = 238592 ]
kmalloc-64 4685 [* 64 = 299840 ]
kmalloc-32 -72 [* 32 = -2304 ]
kmalloc-16 256 [* 16 = 4096 ]
total = 721352
Free memory dropped by 12,084 kB
Available memory dropped by 11,976 kB
Total slab additions in size: 721,352 bytes
That's over 2 MB in savings per instance for free and available memory,
and over 1 MB in savings per instance of slab memory.
Link: https://lore.kernel.org/linux-trace-kernel/20231003184059.4924468e@gandalf.local.home
Link: https://lore.kernel.org/linux-trace-kernel/20231004165007.43d79161@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ajay Kaher <akaher@vmware.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
||
|
|
a1f157c7a3 |
tracing: Expand all ring buffers individually
The ring buffer of global_trace is set to the minimum size in
order to save memory on boot up and then it will be expand when
some trace feature enabled.
However currently operations under an instance can also cause
global_trace ring buffer being expanded, and the expanded memory
would be wasted if global_trace then not being used.
See following case, we enable 'sched_switch' event in instance 'A', then
ring buffer of global_trace is unexpectedly expanded to be 1410KB, also
the '(expanded: 1408)' from 'buffer_size_kb' of instance is confusing.
# cd /sys/kernel/tracing
# mkdir instances/A
# cat buffer_size_kb
7 (expanded: 1408)
# cat instances/A/buffer_size_kb
1410 (expanded: 1408)
# echo sched:sched_switch > instances/A/set_event
# cat buffer_size_kb
1410
# cat instances/A/buffer_size_kb
1410
To fix it, we can:
- Make 'ring_buffer_expanded' as a member of 'struct trace_array';
- Make 'ring_buffer_expanded' of instance is defaultly true,
global_trace is defaultly false;
- In order not to expose 'global_trace' outside of file
'kernel/trace/trace.c', introduce trace_set_ring_buffer_expanded()
to set 'ring_buffer_expanded' as 'true';
- Pass the expected trace_array to tracing_update_buffers().
Link: https://lore.kernel.org/linux-trace-kernel/20230906091837.3998020-1-zhengyejian1@huawei.com
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
||
|
|
99214f6778 |
Merge tag 'trace-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt: - Add missing LOCKDOWN checks for eventfs callers When LOCKDOWN is active for tracing, it causes inconsistent state when some functions succeed and others fail. - Use dput() to free the top level eventfs descriptor There was a race between accesses and freeing it. - Fix a long standing bug that eventfs exposed due to changing timings by dynamically creating files. That is, If a event file is opened for an instance, there's nothing preventing the instance from being removed which will make accessing the files cause use-after-free bugs. - Fix a ring buffer race that happens when iterating over the ring buffer while writers are active. Check to make sure not to read the event meta data if it's beyond the end of the ring buffer sub buffer. - Fix the print trigger that disappeared because the test to create it was looking for the event dir field being filled, but now it has the "ef" field filled for the eventfs structure. - Remove the unused "dir" field from the event structure. - Fix the order of the trace_dynamic_info as it had it backwards for the offset and len fields for which one was for which endianess. - Fix NULL pointer dereference with eventfs_remove_rec() If an allocation fails in one of the eventfs_add_*() functions, the caller of it in event_subsystem_dir() or event_create_dir() assigns the result to the structure. But it's assigning the ERR_PTR and not NULL. This was passed to eventfs_remove_rec() which expects either a good pointer or a NULL, not ERR_PTR. The fix is to not assign the ERR_PTR to the structure, but to keep it NULL on error. - Fix list_for_each_rcu() to use list_for_each_srcu() in dcache_dir_open_wrapper(). One iteration of the code used RCU but because it had to call sleepable code, it had to be changed to use SRCU, but one of the iterations was missed. - Fix synthetic event print function to use "as_u64" instead of passing in a pointer to the union. To fix big/little endian issues, the u64 that represented several types was turned into a union to define the types properly. * tag 'trace-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec() tracefs/eventfs: Use list_for_each_srcu() in dcache_dir_open_wrapper() tracing/synthetic: Print out u64 values properly tracing/synthetic: Fix order of struct trace_dynamic_info selftests/ftrace: Fix dependencies for some of the synthetic event tests tracing: Remove unused trace_event_file dir field tracing: Use the new eventfs descriptor for print trigger ring-buffer: Do not attempt to read past "commit" tracefs/eventfs: Free top level files on removal ring-buffer: Avoid softlockup in ring_buffer_resize() tracing: Have event inject files inc the trace array ref count tracing: Have option files inc the trace array ref count tracing: Have current_trace inc the trace array ref count tracing: Have tracing_max_latency inc the trace array ref count tracing: Increase trace array ref count on enable and filter files tracefs/eventfs: Use dput to free the toplevel events directory tracefs/eventfs: Add missing lockdown checks tracefs: Add missing lockdown check to tracefs_create_dir() |
||
|
|
1ef26d8b2c |
tracing: Use the new eventfs descriptor for print trigger
The check to create the print event "trigger" was using the obsolete "dir"
value of the trace_event_file to determine if it should create the trigger
or not. But that value will now be NULL because it uses the event file
descriptor.
Change it to test the "ef" field of the trace_event_file structure so that
the trace_marker "trigger" file appears again.
Link: https://lkml.kernel.org/r/20230908022001.371815239@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ajay Kaher <akaher@vmware.com>
Fixes:
|
||
|
|
7e2cfbd2d3 |
tracing: Have option files inc the trace array ref count
The option files update the options for a given trace array. For an
instance, if the file is opened and the instance is deleted, reading or
writing to the file will cause a use after free.
Up the ref count of the trace_array when an option file is opened.
Link: https://lkml.kernel.org/r/20230907024804.086679464@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Zheng Yejian <zhengyejian1@huawei.com>
Fixes:
|
||
|
|
9b37febc57 |
tracing: Have current_trace inc the trace array ref count
The current_trace updates the trace array tracer. For an instance, if the
file is opened and the instance is deleted, reading or writing to the file
will cause a use after free.
Up the ref count of the trace array when current_trace is opened.
Link: https://lkml.kernel.org/r/20230907024803.877687227@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Zheng Yejian <zhengyejian1@huawei.com>
Fixes:
|
||
|
|
7d660c9b2b |
tracing: Have tracing_max_latency inc the trace array ref count
The tracing_max_latency file points to the trace_array max_latency field.
For an instance, if the file is opened and the instance is deleted,
reading or writing to the file will cause a use after free.
Up the ref count of the trace_array when tracing_max_latency is opened.
Link: https://lkml.kernel.org/r/20230907024803.666889383@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Zheng Yejian <zhengyejian1@huawei.com>
Fixes:
|
||
|
|
f5ca233e2e |
tracing: Increase trace array ref count on enable and filter files
When the trace event enable and filter files are opened, increment the
trace array ref counter, otherwise they can be accessed when the trace
array is being deleted. The ref counter keeps the trace array from being
deleted while those files are opened.
Link: https://lkml.kernel.org/r/20230907024803.456187066@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes:
|
||
|
|
b70100f2e6 |
Merge tag 'probes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
- kprobes: use struct_size() for variable size kretprobe_instance data
structure.
- eprobe: Simplify trace_eprobe list iteration.
- probe events: Data structure field access support on BTF argument.
- Update BTF argument support on the functions in the kernel
loadable modules (only loaded modules are supported).
- Move generic BTF access function (search function prototype and
get function parameters) to a separated file.
- Add a function to search a member of data structure in BTF.
- Support accessing BTF data structure member from probe args by
C-like arrow('->') and dot('.') operators. e.g.
't sched_switch next=next->pid vruntime=next->se.vruntime'
- Support accessing BTF data structure member from $retval. e.g.
'f getname_flags%return +0($retval->name):string'
- Add string type checking if BTF type info is available. This will
reject if user specify ":string" type for non "char pointer"
type.
- Automatically assume the fprobe event as a function return event
if $retval is used.
- selftests/ftrace: Add BTF data field access test cases.
- Documentation: Update fprobe event example with BTF data field.
* tag 'probes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
Documentation: tracing: Update fprobe event example with BTF field
selftests/ftrace: Add BTF fields access testcases
tracing/fprobe-event: Assume fprobe is a return event by $retval
tracing/probes: Add string type check with BTF
tracing/probes: Support BTF field access from $retval
tracing/probes: Support BTF based data structure field access
tracing/probes: Add a function to search a member of a struct/union
tracing/probes: Move finding func-proto API and getting func-param API to trace_btf
tracing/probes: Support BTF argument on module functions
tracing/eprobe: Iterate trace_eprobe directly
kernel: kprobes: Use struct_size()
|
||
|
|
3d07fa1dd1 |
tracing: Zero the pipe cpumask on alloc to avoid spurious -EBUSY
The pipe cpumask used to serialize opens between the main and percpu
trace pipes is not zeroed or initialized. This can result in
spurious -EBUSY returns if underlying memory is not fully zeroed.
This has been observed by immediate failure to read the main
trace_pipe file on an otherwise newly booted and idle system:
# cat /sys/kernel/debug/tracing/trace_pipe
cat: /sys/kernel/debug/tracing/trace_pipe: Device or resource busy
Zero the allocation of pipe_cpumask to avoid the problem.
Link: https://lore.kernel.org/linux-trace-kernel/20230831125500.986862-1-bfoster@redhat.com
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
3163f635b2 |
tracing: Fix race issue between cpu buffer write and swap
Warning happened in rb_end_commit() at code:
if (RB_WARN_ON(cpu_buffer, !local_read(&cpu_buffer->committing)))
WARNING: CPU: 0 PID: 139 at kernel/trace/ring_buffer.c:3142
rb_commit+0x402/0x4a0
Call Trace:
ring_buffer_unlock_commit+0x42/0x250
trace_buffer_unlock_commit_regs+0x3b/0x250
trace_event_buffer_commit+0xe5/0x440
trace_event_buffer_reserve+0x11c/0x150
trace_event_raw_event_sched_switch+0x23c/0x2c0
__traceiter_sched_switch+0x59/0x80
__schedule+0x72b/0x1580
schedule+0x92/0x120
worker_thread+0xa0/0x6f0
It is because the race between writing event into cpu buffer and swapping
cpu buffer through file per_cpu/cpu0/snapshot:
Write on CPU 0 Swap buffer by per_cpu/cpu0/snapshot on CPU 1
-------- --------
tracing_snapshot_write()
[...]
ring_buffer_lock_reserve()
cpu_buffer = buffer->buffers[cpu]; // 1. Suppose find 'cpu_buffer_a';
[...]
rb_reserve_next_event()
[...]
ring_buffer_swap_cpu()
if (local_read(&cpu_buffer_a->committing))
goto out_dec;
if (local_read(&cpu_buffer_b->committing))
goto out_dec;
buffer_a->buffers[cpu] = cpu_buffer_b;
buffer_b->buffers[cpu] = cpu_buffer_a;
// 2. cpu_buffer has swapped here.
rb_start_commit(cpu_buffer);
if (unlikely(READ_ONCE(cpu_buffer->buffer)
!= buffer)) { // 3. This check passed due to 'cpu_buffer->buffer'
[...] // has not changed here.
return NULL;
}
cpu_buffer_b->buffer = buffer_a;
cpu_buffer_a->buffer = buffer_b;
[...]
// 4. Reserve event from 'cpu_buffer_a'.
ring_buffer_unlock_commit()
[...]
cpu_buffer = buffer->buffers[cpu]; // 5. Now find 'cpu_buffer_b' !!!
rb_commit(cpu_buffer)
rb_end_commit() // 6. WARN for the wrong 'committing' state !!!
Based on above analysis, we can easily reproduce by following testcase:
``` bash
#!/bin/bash
dmesg -n 7
sysctl -w kernel.panic_on_warn=1
TR=/sys/kernel/tracing
echo 7 > ${TR}/buffer_size_kb
echo "sched:sched_switch" > ${TR}/set_event
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
```
To fix it, IIUC, we can use smp_call_function_single() to do the swap on
the target cpu where the buffer is located, so that above race would be
avoided.
Link: https://lore.kernel.org/linux-trace-kernel/20230831132739.4070878-1-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Fixes:
|
||
|
|
34232fcfe9 |
Merge tag 'trace-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
"User visible changes:
- Added a way to easier filter with cpumasks:
# echo 'cpumask & CPUS{17-42}' > /sys/kernel/tracing/events/ipi_send_cpumask/filter
- Show actual size of ring buffer after modifying the ring buffer
size via buffer_size_kb.
Currently it just returns what was written, but the actual size
rounds up to the sub buffer size. Show that real size instead.
Major changes:
- Added "eventfs". This is the code that handles the inodes and
dentries of tracefs/events directory. As there are thousands of
events, and each event has several inodes and dentries that
currently exist even when tracing is never used, they take up
precious memory. Instead, eventfs will allocate the inodes and
dentries in a JIT way (similar to what procfs does). There is now
metadata that handles the events and subdirectories, and will
create the inodes and dentries when they are used.
Note, I also have patches that remove the subdirectory meta data,
but will wait till the next merge window before applying them. It's
a little more complex, and I want to make sure the dynamic code
works properly before adding more complexity, making it easier to
revert if need be.
Minor changes:
- Optimization to user event list traversal
- Remove intermediate permission of tracefs files (note the
intermediate permission removes all access to the files so it is
not a security concern, but just a clean up)
- Add the complex fix to FORTIFY_SOURCE to the kernel stack event
logic
- Other minor cleanups"
* tag 'trace-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (29 commits)
tracefs: Remove kerneldoc from struct eventfs_file
tracefs: Avoid changing i_mode to a temp value
tracing/user_events: Optimize safe list traversals
ftrace: Remove empty declaration ftrace_enable_daemon() and ftrace_disable_daemon()
tracing: Remove unused function declarations
tracing/filters: Document cpumask filtering
tracing/filters: Further optimise scalar vs cpumask comparison
tracing/filters: Optimise CPU vs cpumask filtering when the user mask is a single CPU
tracing/filters: Optimise scalar vs cpumask filtering when the user mask is a single CPU
tracing/filters: Optimise cpumask vs cpumask filtering when user mask is a single CPU
tracing/filters: Enable filtering the CPU common field by a cpumask
tracing/filters: Enable filtering a scalar field by a cpumask
tracing/filters: Enable filtering a cpumask field by another cpumask
tracing/filters: Dynamically allocate filter_pred.regex
test: ftrace: Fix kprobe test for eventfs
eventfs: Move tracing/events to eventfs
eventfs: Implement removal of meta data from eventfs
eventfs: Implement functions to create files and dirs when accessed
eventfs: Implement eventfs lookup, read, open functions
eventfs: Implement eventfs file add functions
...
|
||
|
|
c440adfbe3 |
tracing/probes: Support BTF based data structure field access
Using BTF to access the fields of a data structure. You can use this
for accessing the field with '->' or '.' operation with BTF argument.
# echo 't sched_switch next=next->pid vruntime=next->se.vruntime' \
> dynamic_events
# echo 1 > events/tracepoints/sched_switch/enable
# head -n 40 trace | tail
<idle>-0 [000] d..3. 272.565382: sched_switch: (__probestub_sched_switch+0x4/0x10) next=26 vruntime=956533179
kcompactd0-26 [000] d..3. 272.565406: sched_switch: (__probestub_sched_switch+0x4/0x10) next=0 vruntime=0
<idle>-0 [000] d..3. 273.069441: sched_switch: (__probestub_sched_switch+0x4/0x10) next=9 vruntime=956533179
kworker/0:1-9 [000] d..3. 273.069464: sched_switch: (__probestub_sched_switch+0x4/0x10) next=26 vruntime=956579181
kcompactd0-26 [000] d..3. 273.069480: sched_switch: (__probestub_sched_switch+0x4/0x10) next=0 vruntime=0
<idle>-0 [000] d..3. 273.141434: sched_switch: (__probestub_sched_switch+0x4/0x10) next=22 vruntime=956533179
kworker/u2:1-22 [000] d..3. 273.141461: sched_switch: (__probestub_sched_switch+0x4/0x10) next=0 vruntime=0
<idle>-0 [000] d..3. 273.480872: sched_switch: (__probestub_sched_switch+0x4/0x10) next=22 vruntime=956585857
kworker/u2:1-22 [000] d..3. 273.480905: sched_switch: (__probestub_sched_switch+0x4/0x10) next=70 vruntime=959533179
sh-70 [000] d..3. 273.481102: sched_switch: (__probestub_sched_switch+0x4/0x10) next=0 vruntime=0
Link: https://lore.kernel.org/all/169272157251.160970.9318175874130965571.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
||
|
|
c2489bb7e6 |
tracing: Introduce pipe_cpumask to avoid race on trace_pipes
There is race issue when concurrently splice_read main trace_pipe and per_cpu trace_pipes which will result in data read out being different from what actually writen. As suggested by Steven: > I believe we should add a ref count to trace_pipe and the per_cpu > trace_pipes, where if they are opened, nothing else can read it. > > Opening trace_pipe locks all per_cpu ref counts, if any of them are > open, then the trace_pipe open will fail (and releases any ref counts > it had taken). > > Opening a per_cpu trace_pipe will up the ref count for just that > CPU buffer. This will allow multiple tasks to read different per_cpu > trace_pipe files, but will prevent the main trace_pipe file from > being opened. But because we only need to know whether per_cpu trace_pipe is open or not, using a cpumask instead of using ref count may be easier. After this patch, users will find that: - Main trace_pipe can be opened by only one user, and if it is opened, all per_cpu trace_pipes cannot be opened; - Per_cpu trace_pipes can be opened by multiple users, but each per_cpu trace_pipe can only be opened by one user. And if one of them is opened, main trace_pipe cannot be opened. Link: https://lore.kernel.org/linux-trace-kernel/20230818022645.1948314-1-zhengyejian1@huawei.com Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
|
|
eecb91b9f9 |
tracing: Fix memleak due to race between current_tracer and trace
Kmemleak report a leak in graph_trace_open():
unreferenced object 0xffff0040b95f4a00 (size 128):
comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s)
hex dump (first 32 bytes):
e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}..........
f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e...
backtrace:
[<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0
[<000000007df90faa>] graph_trace_open+0xb0/0x344
[<00000000737524cd>] __tracing_open+0x450/0xb10
[<0000000098043327>] tracing_open+0x1a0/0x2a0
[<00000000291c3876>] do_dentry_open+0x3c0/0xdc0
[<000000004015bcd6>] vfs_open+0x98/0xd0
[<000000002b5f60c9>] do_open+0x520/0x8d0
[<00000000376c7820>] path_openat+0x1c0/0x3e0
[<00000000336a54b5>] do_filp_open+0x14c/0x324
[<000000002802df13>] do_sys_openat2+0x2c4/0x530
[<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4
[<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394
[<00000000313647bf>] do_el0_svc+0xac/0xec
[<000000002ef1c651>] el0_svc+0x20/0x30
[<000000002fd4692a>] el0_sync_handler+0xb0/0xb4
[<000000000c309c35>] el0_sync+0x160/0x180
The root cause is descripted as follows:
__tracing_open() { // 1. File 'trace' is being opened;
...
*iter->trace = *tr->current_trace; // 2. Tracer 'function_graph' is
// currently set;
...
iter->trace->open(iter); // 3. Call graph_trace_open() here,
// and memory are allocated in it;
...
}
s_start() { // 4. The opened file is being read;
...
*iter->trace = *tr->current_trace; // 5. If tracer is switched to
// 'nop' or others, then memory
// in step 3 are leaked!!!
...
}
To fix it, in s_start(), close tracer before switching then reopen the
new tracer after switching. And some tracers like 'wakeup' may not update
'iter->private' in some cases when reopen, then it should be cleared
to avoid being mistakenly closed again.
Link: https://lore.kernel.org/linux-trace-kernel/20230817125539.1646321-1-zhengyejian1@huawei.com
Fixes:
|
||
|
|
b71645d6af |
tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
Trace ring buffer can no longer record anything after executing
following commands at the shell prompt:
# cd /sys/kernel/tracing
# cat tracing_cpumask
fff
# echo 0 > tracing_cpumask
# echo 1 > snapshot
# echo fff > tracing_cpumask
# echo 1 > tracing_on
# echo "hello world" > trace_marker
-bash: echo: write error: Bad file descriptor
The root cause is that:
1. After `echo 0 > tracing_cpumask`, 'record_disabled' of cpu buffers
in 'tr->array_buffer.buffer' became 1 (see tracing_set_cpumask());
2. After `echo 1 > snapshot`, 'tr->array_buffer.buffer' is swapped
with 'tr->max_buffer.buffer', then the 'record_disabled' became 0
(see update_max_tr());
3. After `echo fff > tracing_cpumask`, the 'record_disabled' become -1;
Then array_buffer and max_buffer are both unavailable due to value of
'record_disabled' is not 0.
To fix it, enable or disable both array_buffer and max_buffer at the same
time in tracing_set_cpumask().
Link: https://lkml.kernel.org/r/20230805033816.3284594-2-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <vnagarnaik@google.com>
Cc: <shuah@kernel.org>
Fixes:
|
||
|
|
6d98a0f2ac |
tracing: Set actual size after ring buffer resize
Currently we can resize trace ringbuffer by writing a value into file 'buffer_size_kb', then by reading the file, we get the value that is usually what we wrote. However, this value may be not actual size of trace ring buffer because of the round up when doing resize in kernel, and the actual size would be more useful. Link: https://lore.kernel.org/linux-trace-kernel/20230705002705.576633-1-zhengyejian1@huawei.com Cc: <mhiramat@kernel.org> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
|
|
6bba92881d |
tracing: Add free_trace_iter_content() helper function
As the trace iterator is created and used by various interfaces, the clean up of it needs to be consistent. Create a free_trace_iter_content() helper function that frees the content of the iterator and use that to clean it up in all places that it is used. Link: https://lkml.kernel.org/r/20230715141348.341887497@goodmis.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
|
|
9182b519b8 |
tracing: Remove unnecessary copying of tr->current_trace
The iterator allocated a descriptor to copy the current_trace. This was done
with the assumption that the function pointers might change. But this was a
false assuption, as it does not change. There's no reason to make a copy of the
current_trace and just use the pointer it points to. This removes needing to
manage freeing the descriptor. Worse yet, there's locations that the iterator
is used but does make a copy and just uses the pointer. This could cause the
actual pointer to the trace descriptor to be freed and not the allocated copy.
This is more of a clean up than a fix.
Link: https://lkml.kernel.org/r/20230715141348.135792275@goodmis.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes:
|
||
|
|
e7186af7fb |
tracing: Add back FORTIFY_SOURCE logic to kernel_stack event structure
For backward compatibility, older tooling expects to see the kernel_stack event with a "caller" field that is a fixed size array of 8 addresses. The code now supports more than 8 with an added "size" field that states the real number of entries. But the "caller" field still just looks like a fixed size to user space. Since the tracing macros that create the user space format files also creates the structures that those files represent, the kernel_stack event structure had its "caller" field a fixed size of 8, but in reality, when it is allocated on the ring buffer, it can hold more if the stack trace is bigger that 8 functions. The copying of these entries was simply done with a memcpy(): size = nr_entries * sizeof(unsigned long); memcpy(entry->caller, fstack->calls, size); The FORTIFY_SOURCE logic noticed at runtime that when the nr_entries was larger than 8, that the memcpy() was writing more than what the structure stated it can hold and it complained about it. This is because the FORTIFY_SOURCE code is unaware that the amount allocated is actually enough to hold the size. It does not expect that a fixed size field will hold more than the fixed size. This was originally solved by hiding the caller assignment with some pointer arithmetic. ptr = ring_buffer_data(); entry = ptr; ptr += offsetof(typeof(*entry), caller); memcpy(ptr, fstack->calls, size); But it is considered bad form to hide from kernel hardening. Instead, make it work nicely with FORTIFY_SOURCE by adding a new __stack_array() macro that is specific for this one special use case. The macro will take 4 arguments: type, item, len, field (whereas the __array() macro takes just the first three). This macro will act just like the __array() macro when creating the code to deal with the format file that is exposed to user space. But for the kernel, it will turn the caller field into: type item[] __counted_by(field); or for this instance: unsigned long caller[] __counted_by(size); Now the kernel code can expose the assignment of the caller to the FORTIFY_SOURCE and everyone is happy! Link: https://lore.kernel.org/linux-trace-kernel/20230712105235.5fc441aa@gandalf.local.home/ Link: https://lore.kernel.org/linux-trace-kernel/20230713092605.2ddb9788@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Kees Cook <keescook@chromium.org> |
||
|
|
8a96c0288d |
ring-buffer: Do not swap cpu_buffer during resize process
When ring_buffer_swap_cpu was called during resize process,
the cpu buffer was swapped in the middle, resulting in incorrect state.
Continuing to run in the wrong state will result in oops.
This issue can be easily reproduced using the following two scripts:
/tmp # cat test1.sh
//#! /bin/sh
for i in `seq 0 100000`
do
echo 2000 > /sys/kernel/debug/tracing/buffer_size_kb
sleep 0.5
echo 5000 > /sys/kernel/debug/tracing/buffer_size_kb
sleep 0.5
done
/tmp # cat test2.sh
//#! /bin/sh
for i in `seq 0 100000`
do
echo irqsoff > /sys/kernel/debug/tracing/current_tracer
sleep 1
echo nop > /sys/kernel/debug/tracing/current_tracer
sleep 1
done
/tmp # ./test1.sh &
/tmp # ./test2.sh &
A typical oops log is as follows, sometimes with other different oops logs.
[ 231.711293] WARNING: CPU: 0 PID: 9 at kernel/trace/ring_buffer.c:2026 rb_update_pages+0x378/0x3f8
[ 231.713375] Modules linked in:
[ 231.714735] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G W 6.5.0-rc1-00276-g20edcec23f92 #15
[ 231.716750] Hardware name: linux,dummy-virt (DT)
[ 231.718152] Workqueue: events update_pages_handler
[ 231.719714] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 231.721171] pc : rb_update_pages+0x378/0x3f8
[ 231.722212] lr : rb_update_pages+0x25c/0x3f8
[ 231.723248] sp : ffff800082b9bd50
[ 231.724169] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
[ 231.726102] x26: 0000000000000001 x25: fffffffffffff010 x24: 0000000000000ff0
[ 231.728122] x23: ffff0000c3a0b600 x22: ffff0000c3a0b5c0 x21: fffffffffffffe0a
[ 231.730203] x20: ffff0000c3a0b600 x19: ffff0000c0102400 x18: 0000000000000000
[ 231.732329] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffe7aa8510
[ 231.734212] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
[ 231.736291] x11: ffff8000826998a8 x10: ffff800082b9baf0 x9 : ffff800081137558
[ 231.738195] x8 : fffffc00030e82c8 x7 : 0000000000000000 x6 : 0000000000000001
[ 231.740192] x5 : ffff0000ffbafe00 x4 : 0000000000000000 x3 : 0000000000000000
[ 231.742118] x2 : 00000000000006aa x1 : 0000000000000001 x0 : ffff0000c0007208
[ 231.744196] Call trace:
[ 231.744892] rb_update_pages+0x378/0x3f8
[ 231.745893] update_pages_handler+0x1c/0x38
[ 231.746893] process_one_work+0x1f0/0x468
[ 231.747852] worker_thread+0x54/0x410
[ 231.748737] kthread+0x124/0x138
[ 231.749549] ret_from_fork+0x10/0x20
[ 231.750434] ---[ end trace 0000000000000000 ]---
[ 233.720486] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 233.721696] Mem abort info:
[ 233.721935] ESR = 0x0000000096000004
[ 233.722283] EC = 0x25: DABT (current EL), IL = 32 bits
[ 233.722596] SET = 0, FnV = 0
[ 233.722805] EA = 0, S1PTW = 0
[ 233.723026] FSC = 0x04: level 0 translation fault
[ 233.723458] Data abort info:
[ 233.723734] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 233.724176] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 233.724589] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 233.725075] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000104943000
[ 233.725592] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[ 233.726231] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 233.726720] Modules linked in:
[ 233.727007] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G W 6.5.0-rc1-00276-g20edcec23f92 #15
[ 233.727777] Hardware name: linux,dummy-virt (DT)
[ 233.728225] Workqueue: events update_pages_handler
[ 233.728655] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 233.729054] pc : rb_update_pages+0x1a8/0x3f8
[ 233.729334] lr : rb_update_pages+0x154/0x3f8
[ 233.729592] sp : ffff800082b9bd50
[ 233.729792] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
[ 233.730220] x26: 0000000000000000 x25: ffff800082a8b840 x24: ffff0000c0102418
[ 233.730653] x23: 0000000000000000 x22: fffffc000304c880 x21: 0000000000000003
[ 233.731105] x20: 00000000000001f4 x19: ffff0000c0102400 x18: ffff800082fcbc58
[ 233.731727] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000001
[ 233.732282] x14: ffff8000825fe0c8 x13: 0000000000000001 x12: 0000000000000000
[ 233.732709] x11: ffff8000826998a8 x10: 0000000000000ae0 x9 : ffff8000801b760c
[ 233.733148] x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffff0000c03298c0
[ 233.733553] x5 : 0000000000000002 x4 : 0000000000000000 x3 : 0000000000000000
[ 233.733972] x2 : ffff0000c3a0b600 x1 : 0000000000000000 x0 : 0000000000000000
[ 233.734418] Call trace:
[ 233.734593] rb_update_pages+0x1a8/0x3f8
[ 233.734853] update_pages_handler+0x1c/0x38
[ 233.735148] process_one_work+0x1f0/0x468
[ 233.735525] worker_thread+0x54/0x410
[ 233.735852] kthread+0x124/0x138
[ 233.736064] ret_from_fork+0x10/0x20
[ 233.736387] Code: 92400000 910006b5 aa000021 aa0303f7 (f9400060)
[ 233.736959] ---[ end trace 0000000000000000 ]---
After analysis, the seq of the error is as follows [1-5]:
int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
int cpu_id)
{
for_each_buffer_cpu(buffer, cpu) {
cpu_buffer = buffer->buffers[cpu];
//1. get cpu_buffer, aka cpu_buffer(A)
...
...
schedule_work_on(cpu,
&cpu_buffer->update_pages_work);
//2. 'update_pages_work' is queue on 'cpu', cpu_buffer(A) is passed to
// update_pages_handler, do the update process, set 'update_done' in
// complete(&cpu_buffer->update_done) and to wakeup resize process.
//---->
//3. Just at this moment, ring_buffer_swap_cpu is triggered,
//cpu_buffer(A) be swaped to cpu_buffer(B), the max_buffer.
//ring_buffer_swap_cpu is called as the 'Call trace' below.
Call trace:
dump_backtrace+0x0/0x2f8
show_stack+0x18/0x28
dump_stack+0x12c/0x188
ring_buffer_swap_cpu+0x2f8/0x328
update_max_tr_single+0x180/0x210
check_critical_timing+0x2b4/0x2c8
tracer_hardirqs_on+0x1c0/0x200
trace_hardirqs_on+0xec/0x378
el0_svc_common+0x64/0x260
do_el0_svc+0x90/0xf8
el0_svc+0x20/0x30
el0_sync_handler+0xb0/0xb8
el0_sync+0x180/0x1c0
//<----
/* wait for all the updates to complete */
for_each_buffer_cpu(buffer, cpu) {
cpu_buffer = buffer->buffers[cpu];
//4. get cpu_buffer, cpu_buffer(B) is used in the following process,
//the state of cpu_buffer(A) and cpu_buffer(B) is totally wrong.
//for example, cpu_buffer(A)->update_done will leave be set 1, and will
//not 'wait_for_completion' at the next resize round.
if (!cpu_buffer->nr_pages_to_update)
continue;
if (cpu_online(cpu))
wait_for_completion(&cpu_buffer->update_done);
cpu_buffer->nr_pages_to_update = 0;
}
...
}
//5. the state of cpu_buffer(A) and cpu_buffer(B) is totally wrong,
//Continuing to run in the wrong state, then oops occurs.
Link: https://lore.kernel.org/linux-trace-kernel/202307191558478409990@zte.com.cn
Signed-off-by: Chen Lin <chen.lin5@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
||
|
|
d5a8218963 |
tracing: Fix memory leak of iter->temp when reading trace_pipe
kmemleak reports:
unreferenced object 0xffff88814d14e200 (size 256):
comm "cat", pid 336, jiffies 4294871818 (age 779.490s)
hex dump (first 32 bytes):
04 00 01 03 00 00 00 00 08 00 00 00 00 00 00 00 ................
0c d8 c8 9b ff ff ff ff 04 5a ca 9b ff ff ff ff .........Z......
backtrace:
[<ffffffff9bdff18f>] __kmalloc+0x4f/0x140
[<ffffffff9bc9238b>] trace_find_next_entry+0xbb/0x1d0
[<ffffffff9bc9caef>] trace_print_lat_context+0xaf/0x4e0
[<ffffffff9bc94490>] print_trace_line+0x3e0/0x950
[<ffffffff9bc95499>] tracing_read_pipe+0x2d9/0x5a0
[<ffffffff9bf03a43>] vfs_read+0x143/0x520
[<ffffffff9bf04c2d>] ksys_read+0xbd/0x160
[<ffffffff9d0f0edf>] do_syscall_64+0x3f/0x90
[<ffffffff9d2000aa>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
when reading file 'trace_pipe', 'iter->temp' is allocated or relocated
in trace_find_next_entry() but not freed before 'trace_pipe' is closed.
To fix it, free 'iter->temp' in tracing_release_pipe().
Link: https://lore.kernel.org/linux-trace-kernel/20230713141435.1133021-1-zhengyejian1@huawei.com
Cc: stable@vger.kernel.org
Fixes:
|