Joel Fernandes reported that the function tracing of preempt disabled
sections was not being reported when running either the preemptirqsoff or
preemptoff tracers. This was due to the fact that the function tracer
callback for those tracers checked if irqs were disabled before tracing. But
this fails when we want to trace preempt off locations as well.
Joel explained that he wanted to see funcitons where interrupts are enabled
but preemption was disabled. The expected output he wanted:
<...>-2265 1d.h1 3419us : preempt_count_sub <-irq_exit
<...>-2265 1d..1 3419us : __do_softirq <-irq_exit
<...>-2265 1d..1 3419us : msecs_to_jiffies <-__do_softirq
<...>-2265 1d..1 3420us : irqtime_account_irq <-__do_softirq
<...>-2265 1d..1 3420us : __local_bh_disable_ip <-__do_softirq
<...>-2265 1..s1 3421us : run_timer_softirq <-__do_softirq
<...>-2265 1..s1 3421us : hrtimer_run_pending <-run_timer_softirq
<...>-2265 1..s1 3421us : _raw_spin_lock_irq <-run_timer_softirq
<...>-2265 1d.s1 3422us : preempt_count_add <-_raw_spin_lock_irq
<...>-2265 1d.s2 3422us : _raw_spin_unlock_irq <-run_timer_softirq
<...>-2265 1..s2 3422us : preempt_count_sub <-_raw_spin_unlock_irq
<...>-2265 1..s1 3423us : rcu_bh_qs <-__do_softirq
<...>-2265 1d.s1 3423us : irqtime_account_irq <-__do_softirq
<...>-2265 1d.s1 3423us : __local_bh_enable <-__do_softirq
There's a comment saying that the irq disabled check is because there's a
possible race that tracing_cpu may be set when the function is executed. But
I don't remember that race. For now, I added a check for preemption being
enabled too to not record the function, as there would be no race if that
was the case. I need to re-investigate this, as I'm now thinking that the
tracing_cpu will always be correct. But no harm in keeping the check for
now, except for the slight performance hit.
Link: http://lkml.kernel.org/r/1457770386-88717-1-git-send-email-agnel.joel@gmail.com
Fixes: 5e6d2b9cfa "tracing: Use one prologue for the preempt irqs off tracer function tracers"
Cc: stable@vget.kernel.org # 2.6.37+
Reported-by: Joel Fernandes <agnel.joel@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently dynamic ftrace calls are updated any time
the ftrace_ops is un/registered. If we do this update
only when it's needed, we save lot of time for perf
system wide ftrace function sampling/counting.
The reason is that for system wide sampling/counting,
perf creates event for each cpu in the system.
Each event then registers separate copy of ftrace_ops,
which ends up in FTRACE_UPDATE_CALLS updates. On servers
with many cpus that means serious stall (240 cpus server):
Counting:
# time ./perf stat -e ftrace:function -a sleep 1
Performance counter stats for 'system wide':
370,663 ftrace:function
1.401427505 seconds time elapsed
real 3m51.743s
user 0m0.023s
sys 3m48.569s
Sampling:
# time ./perf record -e ftrace:function -a sleep 1
[ perf record: Woken up 0 times to write data ]
Warning:
Processed 141200 events and lost 5 chunks!
[ perf record: Captured and wrote 10.703 MB perf.data (135950 samples) ]
real 2m31.429s
user 0m0.213s
sys 2m29.494s
There's no reason to do the FTRACE_UPDATE_CALLS update
for each event in perf case, because all the ftrace_ops
always share the same filter, so the updated calls are
always the same.
It's required that only first ftrace_ops registration
does the FTRACE_UPDATE_CALLS update (also sometimes
the second if the first one used the trampoline), but
the rest can be only cheaply linked into the ftrace_ops
list.
Counting:
# time ./perf stat -e ftrace:function -a sleep 1
Performance counter stats for 'system wide':
398,571 ftrace:function
1.377503733 seconds time elapsed
real 0m2.787s
user 0m0.005s
sys 0m1.883s
Sampling:
# time ./perf record -e ftrace:function -a sleep 1
[ perf record: Woken up 0 times to write data ]
Warning:
Processed 261730 events and lost 9 chunks!
[ perf record: Captured and wrote 19.907 MB perf.data (256293 samples) ]
real 1m31.948s
user 0m0.309s
sys 1m32.051s
Link: http://lkml.kernel.org/r/1458138873-1553-6-git-send-email-jolsa@kernel.org
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
echo nop > /sys/kernel/debug/tracing/options/current_tracer
echo 1 > /sys/kernel/debug/tracing/options/test_nop_accept
echo 0 > /sys/kernel/debug/tracing/options/test_nop_accept
echo 1 > /sys/kernel/debug/tracing/options/test_nop_refuse
Before the fix, the dmesg is a bit ugly since a align issue.
[ 191.973081] nop_test_accept flag set to 1: we accept. Now cat trace_options to see the result
[ 195.156942] nop_test_refuse flag set to 1: we refuse.Now cat trace_options to see the result
After the fix, the dmesg will show aligned log for nop_test_refuse and nop_test_accept.
[ 2718.032413] nop_test_refuse flag set to 1: we refuse. Now cat trace_options to see the result
[ 2734.253360] nop_test_accept flag set to 1: we accept. Now cat trace_options to see the result
Link: http://lkml.kernel.org/r/1457444222-8654-2-git-send-email-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
commit 5634cc2aa9 ("writeback: update writeback
tracepoints to report cgroup") made writeback tracepoints print out cgroup
path when CGROUP_WRITEBACK is enabled, but it may trigger the below bug on -rt
kernel since kernfs_path and kernfs_path_len are called by tracepoints, which
acquire spin lock that is sleepable on -rt kernel.
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:930
in_atomic(): 1, irqs_disabled(): 0, pid: 625, name: kworker/u16:3
INFO: lockdep is turned off.
Preemption disabled at:[<ffffffc000374a5c>] wb_writeback+0xec/0x830
CPU: 7 PID: 625 Comm: kworker/u16:3 Not tainted 4.4.1-rt5 #20
Hardware name: Freescale Layerscape 2085a RDB Board (DT)
Workqueue: writeback wb_workfn (flush-7:0)
Call trace:
[<ffffffc00008d708>] dump_backtrace+0x0/0x200
[<ffffffc00008d92c>] show_stack+0x24/0x30
[<ffffffc0007b0f40>] dump_stack+0x88/0xa8
[<ffffffc000127d74>] ___might_sleep+0x2ec/0x300
[<ffffffc000d5d550>] rt_spin_lock+0x38/0xb8
[<ffffffc0003e0548>] kernfs_path_len+0x30/0x90
[<ffffffc00036b360>] trace_event_raw_event_writeback_work_class+0xe8/0x2e8
[<ffffffc000374f90>] wb_writeback+0x620/0x830
[<ffffffc000376224>] wb_workfn+0x61c/0x950
[<ffffffc000110adc>] process_one_work+0x3ac/0xb30
[<ffffffc0001112fc>] worker_thread+0x9c/0x7a8
[<ffffffc00011a9e8>] kthread+0x190/0x1b0
[<ffffffc000086ca0>] ret_from_fork+0x10/0x30
With unlocked kernfs_* functions, synchronize_sched() has to be called in
kernfs_rename which could be called in syscall path, but it is problematic.
So, print out cgroup ino instead of path name, which could be converted to
path name by userland.
Withouth CGROUP_WRITEBACK enabled, it just prints out root dir. But, root
dir ino vary from different filesystems, so printing out -1U to indicate
an invalid cgroup ino.
Link: http://lkml.kernel.org/r/1456996137-8354-1-git-send-email-yang.shi@linaro.org
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
gcc isn't known for handling bool in structures. Instead of using bool, use
an integer mask and use bit flags instead.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Some trace events have conditions that check if the current CPU is online or
not before recording the tracepoint. That's because certain trace events are
in locations that can be called as the CPU is going offline and when RCU no
longer monitors it (like kfree and friends). The check was added because
trace events require RCU to be active.
This is a trace event infrastructure issue and not something that individual
trace events should worry about. The tracepoint.h code now has added a check
to see if the current CPU is considered online, and it only does the
tracepoint if it is. There's no more need for individual trace events to
also include this check. It is now redundant.
Cc: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When I was updating the ftrace_stress test of ltp. I encountered
a strange phenomemon, excute following steps:
echo nop > /sys/kernel/debug/tracing/current_tracer
echo 0 > /sys/kernel/debug/tracing/options/funcgraph-cpu
bash: echo: write error: Invalid argument
check dmesg:
[ 1024.903855] nop_test_refuse flag set to 0: we refuse.Now cat trace_options to see the result
The reason is that the trace option test will randomly setup trace
option under tracing/options no matter what the current_tracer is.
but the set_tracer_option is always using the set_flag callback
from the current_tracer. This patch adds a pointer to tracer_flags
and make it point to the tracer it belongs to. When the option is
setup, the set_flag of the right tracer will be used no matter
what the the current_tracer is.
And the old dummy_tracer_flags is used for all the tracers which
doesn't have a tracer_flags, having issue to use it to save the
pointer of a tracer. So remove it and use dynamic dummy tracer_flags
for tracers needing a dummy tracer_flags, as a result, there are no
tracers sharing tracer_flags, so remove the check code.
And save the current tracer to trace_option_dentry seems not good as
it may waste mem space when mount the debug/trace fs more than one time.
Link: http://lkml.kernel.org/r/1457444222-8654-1-git-send-email-chuhu@redhat.com
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
[ Fixed up function tracer options to work with the change ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Pull ARM SoC fix from Olof Johansson:
"Tiny fixes branch this week, in fact only one patch.
Turns out the USB support for a Renesas board was developed on a
pre-release board that ended up being changed before shipping. To
avoid breakage on those boards, and avoid confusion, it's a reasonable
idea to patch now instead of later. There are no known users of the
pre-release variant any more"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: porter: remove enable prop from HS-USB device node
Pull ARM fixes from Russell King:
"Just two ARM fixes this time: one to fix the hyp-stub for older ARM
CPUs, and another to fix the set_memory_xx() permission functions to
deal with zero sizes correctly"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8544/1: set_memory_xx fixes
ARM: 8534/1: virt: fix hyp-stub build for pre-ARMv7 CPUs
Pull ceph fix from Sage Weil:
"This is a final commit we missed to align the protocol compatibility
with the feature bits.
It decodes a few extra fields in two different messages and reports
EIO when they are used (not yet supported)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: initial CEPH_FEATURE_FS_FILE_LAYOUT_V2 support
Pull UBI fix from Richard Weinberger:
"This contains a single bug fix for UBI"
* tag 'upstream-4.5-rc7' of git://git.infradead.org/linux-ubifs:
ubi: Fix out of bounds write in volume update code
Pull UML fixes from Richard Weinberger:
"This contains three bug/build fixes"
* 'for-linus-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: use %lx format specifiers for unsigned longs
um: Export pm_power_off
Revert "um: Fix get_signal() usage"
Pull MIPS fixes from Ralf Baechle:
"Another round of fixes for 4.5:
- Fix the use of an undocumented syntactial variant of the .type
pseudo op which is not supported by the LLVM assembler.
- Fix invalid initialization on S-cache-less systems.
- Fix possible information leak from the kernel stack for SIGFPE.
- Fix handling of copy_{from,to}_user() return value in KVM
- Fix the last instance of irq_to_gpio() which now was causing build
errors"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: traps: Fix SIGFPE information leak from `do_ov' and `do_trap_or_bp'
MIPS: kvm: Fix ioctl error handling.
MIPS: scache: Fix scache init with invalid line size.
MIPS: Avoid variant of .type unsupported by LLVM Assembler
MIPS: jz4740: Fix surviving instance of irq_to_gpio()
Pull powerpc fixes from Michael Ellerman:
- cxl: Fix PSL timebase synchronization detection from Frederic Barrat
- Fix oops when destroying hw_breakpoint event from Ravi Bangoria
- Avoid lbarx on e5500 from Scott Wood
* tag 'powerpc-4.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/fsl-book3e: Avoid lbarx on e5500
powerpc/hw_breakpoint: Fix oops when destroying hw_breakpoint event
cxl: Fix PSL timebase synchronization detection