Prior to commit 5d5fc33ce5 ("riscv: Improve exception and system call
latency"), backtrace through exception worked since ra was filled with
ret_from_exception symbol address and the stacktrace code checked 'pc' to
be equal to that symbol. Now that handle_exception uses regular 'call'
instructions, this isn't working anymore and backtrace stops at
handle_exception(). Since there are multiple call site to C code in the
exception handling path, rather than checking multiple potential return
addresses, add a new symbol at the end of exception handling and check pc
to be in that range.
Fixes: 5d5fc33ce5 ("riscv: Improve exception and system call latency")
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20241209155714.1239665-1-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Currently, userstacktrace is unsupported for riscv. So use the
perf_callchain_user() code as blueprint to implement the
arch_stack_walk_user() which add userstacktrace support on riscv.
Meanwhile, we can use arch_stack_walk_user() to simplify the implementation
of perf_callchain_user().
A ftrace test case is shown as below:
# cd /sys/kernel/debug/tracing
# echo 1 > options/userstacktrace
# echo 1 > options/sym-userobj
# echo 1 > events/sched/sched_process_fork/enable
# cat trace
......
bash-178 [000] ...1. 97.968395: sched_process_fork: comm=bash pid=178 child_comm=bash child_pid=231
bash-178 [000] ...1. 97.970075: <user stack trace>
=> /lib/libc.so.6[+0xb5090]
Also a simple perf test is ok as below:
# perf record -e cpu-clock --call-graph fp top
# perf report --call-graph
.....
[[31m 66.54%[[m 0.00% top [kernel.kallsyms] [k] ret_from_exception
|
---ret_from_exception
|
|--[[31m58.97%[[m--do_trap_ecall_u
| |
| |--[[31m17.34%[[m--__riscv_sys_read
| | ksys_read
| | |
| | --[[31m16.88%[[m--vfs_read
| | |
| | |--[[31m10.90%[[m--seq_read
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Tested-by: Jinjie Ruan <ruanjinjie@huawei.com>
Cc: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240708032847.2998158-3-ruanjinjie@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Many CPUs implement return address branch prediction as a stack. The
RISCV architecture refers to this as a return address stack (RAS). If
this gets corrupted then the CPU will mispredict at least one but
potentally many function returns.
There are two issues with the current RISCV exception code:
- We are using the alternate link stack (x5/t0) for the indirect branch
which makes the hardware think this is a function return. This will
corrupt the RAS.
- We modify the return address of handle_exception to point to
ret_from_exception. This will also corrupt the RAS.
Testing the null system call latency before and after the patch:
Visionfive2 (StarFive JH7110 / U74)
baseline: 189.87 ns
patched: 176.76 ns
Lichee pi 4a (T-Head TH1520 / C910)
baseline: 666.58 ns
patched: 636.90 ns
Just over 7% on the U74 and just over 4% on the C910.
Signed-off-by: Anton Blanchard <antonb@tenstorrent.com>
Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
Tested-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20240607061335.2197383-1-cyrilbur@tenstorrent.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
ftrace_graph_ret_addr() takes an `idx` integer pointer that is used to
optimize the stack unwinding. Pass it a valid pointer to utilize the
optimizations that might be available in the future.
The commit is making riscv's usage of ftrace_graph_ret_addr() match
x86_64.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240618145820.62112-1-puranjay@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
arch_stack_walk() is called intensively in function_graph when the
kernel is compiled with CONFIG_TRACE_IRQFLAGS. As a result, the kernel
logs a lot of arch_stack_walk and its sub-functions into the ftrace
buffer. However, these functions should not appear on the trace log
because they are part of the ftrace itself. This patch references what
arm64 does for the smae function. So it further prevent the re-enter
kprobe issue, which is also possible on riscv.
Related-to: commit 0fbcd8abf3 ("arm64: Prohibit instrumentation on arch_stack_walk()")
Fixes: 680341382d ("riscv: add CALLER_ADDRx support")
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240613-dev-andyc-dyn-ftrace-v4-v1-1-1a538e12c01e@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
If the load access fault occures in a leaf function (with
CONFIG_FRAME_POINTER=y), when wrong stack trace will be displayed:
[<ffffffff804853c2>] regmap_mmio_read32le+0xe/0x1c
---[ end trace 0000000000000000 ]---
Registers dump:
ra 0xffffffff80485758 <regmap_mmio_read+36>
sp 0xffffffc80200b9a0
fp 0xffffffc80200b9b0
pc 0xffffffff804853ba <regmap_mmio_read32le+6>
Stack dump:
0xffffffc80200b9a0: 0xffffffc80200b9e0 0xffffffc80200b9e0
0xffffffc80200b9b0: 0xffffffff8116d7e8 0x0000000000000100
0xffffffc80200b9c0: 0xffffffd8055b9400 0xffffffd8055b9400
0xffffffc80200b9d0: 0xffffffc80200b9f0 0xffffffff8047c526
0xffffffc80200b9e0: 0xffffffc80200ba30 0xffffffff8047fe9a
The assembler dump of the function preambula:
add sp,sp,-16
sd s0,8(sp)
add s0,sp,16
In the fist stack frame, where ra is not stored on the stack we can
observe:
0(sp) 8(sp)
.---------------------------------------------.
sp->| frame->fp | frame->ra (saved fp) |
|---------------------------------------------|
fp->| .... | .... |
|---------------------------------------------|
| | |
and in the code check is performed:
if (regs && (regs->epc == pc) && (frame->fp & 0x7))
I see no reason to check frame->fp value at all, because it is can be
uninitialized value on the stack. A better way is to check frame->ra to
be an address on the stack. After the stacktrace shows as expect:
[<ffffffff804853c2>] regmap_mmio_read32le+0xe/0x1c
[<ffffffff80485758>] regmap_mmio_read+0x24/0x52
[<ffffffff8047c526>] _regmap_bus_reg_read+0x1a/0x22
[<ffffffff8047fe9a>] _regmap_read+0x5c/0xea
[<ffffffff80480376>] _regmap_update_bits+0x76/0xc0
...
---[ end trace 0000000000000000 ]---
As pointed by Samuel Holland it is incorrect to remove check of the stackframe
entirely.
Changes since v2 [2]:
- Add accidentally forgotten curly brace
Changes since v1 [1]:
- Instead of just dropping frame->fp check, replace it with validation of
frame->ra, which should be a stack address.
- Move frame pointer validation into the separate function.
[1] https://lore.kernel.org/linux-riscv/20240426072701.6463-1-dev.mbstr@gmail.com/
[2] https://lore.kernel.org/linux-riscv/20240521131314.48895-1-dev.mbstr@gmail.com/
Fixes: f766f77a74 ("riscv/stacktrace: Fix stack output without ra on the stack top")
Signed-off-by: Matthew Bystrin <dev.mbstr@gmail.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240521191727.62012-1-dev.mbstr@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
When running kfence_test, I found some testcases failed like this:
# test_out_of_bounds_read: EXPECTATION FAILED at mm/kfence/kfence_test.c:346
Expected report_matches(&expect) to be true, but is false
not ok 1 - test_out_of_bounds_read
The corresponding call-trace is:
BUG: KFENCE: out-of-bounds read in kunit_try_run_case+0x38/0x84
Out-of-bounds read at 0x(____ptrval____) (32B right of kfence-#10):
kunit_try_run_case+0x38/0x84
kunit_generic_run_threadfn_adapter+0x12/0x1e
kthread+0xc8/0xde
ret_from_exception+0x0/0xc
The kfence_test using the first frame of call trace to check whether the
testcase is succeed or not. Commit 6a00ef4493 ("riscv: eliminate
unreliable __builtin_frame_address(1)") skip first frame for all
case, which results the kfence_test failed. Indeed, we only need to skip
the first frame for case (task==NULL || task==current).
With this patch, the call-trace will be:
BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0x88/0x19e
Out-of-bounds read at 0x(____ptrval____) (1B left of kfence-#7):
test_out_of_bounds_read+0x88/0x19e
kunit_try_run_case+0x38/0x84
kunit_generic_run_threadfn_adapter+0x12/0x1e
kthread+0xc8/0xde
ret_from_exception+0x0/0xc
Fixes: 6a00ef4493 ("riscv: eliminate unreliable __builtin_frame_address(1)")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Tested-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20221207025038.1022045-1-liushixin2@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
To follow the existing per-arch conventions, rename "sp_in_global" to
"current_stack_pointer". This will let it be used in non-arch places
(like HARDENED_USERCOPY).
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
I tried different pieces of code which uses __builtin_frame_address(1)
(with both gcc version 7.5.0 and 10.3.0) to verify whether it works as
expected on riscv64. The result is negative.
What the compiler had generated is as below:
31 fp = (unsigned long)__builtin_frame_address(1);
0xffffffff80006024 <+200>: ld s1,0(s0)
It takes '0(s0)' as the address of frame 1 (caller), but the actual address
should be '-16(s0)'.
| ... | <-+
+-----------------+ |
| return address | |
| previous fp | |
| saved registers | |
| local variables | |
$fp --> | ... | |
+-----------------+ |
| return address | |
| previous fp --------+
| saved registers |
$sp --> | local variables |
+-----------------+
This leads the kernel can not dump the full stack trace on riscv.
[ 7.222126][ T1] Call Trace:
[ 7.222804][ T1] [<ffffffff80006058>] dump_backtrace+0x2c/0x3a
This problem is not exposed on most riscv builds just because the '0(s0)'
occasionally is the address frame 2 (caller's caller), if only ra and fp
are stored in frame 1 (caller).
| ... | <-+
+-----------------+ |
| return address | |
$fp --> | previous fp | |
+-----------------+ |
| return address | |
| previous fp --------+
| saved registers |
$sp --> | local variables |
+-----------------+
This could be a *bug* of gcc that should be fixed. But as noted in gcc
manual "Calling this function with a nonzero argument can have
unpredictable effects, including crashing the calling program.", let's
remove the '__builtin_frame_address(1)' in backtrace code.
With this fix now it can show full stack trace:
[ 10.444838][ T1] Call Trace:
[ 10.446199][ T1] [<ffffffff8000606c>] dump_backtrace+0x2c/0x3a
[ 10.447711][ T1] [<ffffffff800060ac>] show_stack+0x32/0x3e
[ 10.448710][ T1] [<ffffffff80a005c0>] dump_stack_lvl+0x58/0x7a
[ 10.449941][ T1] [<ffffffff80a005f6>] dump_stack+0x14/0x1c
[ 10.450929][ T1] [<ffffffff804c04ee>] ubsan_epilogue+0x10/0x5a
[ 10.451869][ T1] [<ffffffff804c092e>] __ubsan_handle_load_invalid_value+0x6c/0x78
[ 10.453049][ T1] [<ffffffff8018f834>] __pagevec_release+0x62/0x64
[ 10.455476][ T1] [<ffffffff80190830>] truncate_inode_pages_range+0x132/0x5be
[ 10.456798][ T1] [<ffffffff80190ce0>] truncate_inode_pages+0x24/0x30
[ 10.457853][ T1] [<ffffffff8045bb04>] kill_bdev+0x32/0x3c
...
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Fixes: eac2f3059e ("riscv: stacktrace: fix the riscv stacktrace when CONFIG_FRAME_POINTER enabled")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
As [1] and [2] said, the arch_stack_walk should not to trace itself, or it will
leave the trace unexpectedly when called. The example is when we do "cat
/sys/kernel/debug/page_owner", all pages' stack is the same.
arch_stack_walk+0x18/0x20
stack_trace_save+0x40/0x60
register_dummy_stack+0x24/0x5e
init_page_owner+0x2e
So we use __builtin_frame_address(1) as the first frame to be walked. And mark
the arch_stack_walk() noinline.
We found that pr_cont will affact pages' stack whose task state is RUNNING when
testing "echo t > /proc/sysrq-trigger". So move the place of pr_cont and mark
the function dump_backtrace() noinline.
Also we move the case when task == NULL into else branch, and test for it in
"echo c > /proc/sysrq-trigger".
[1] https://lore.kernel.org/lkml/20210319184106.5688-1-mark.rutland@arm.com/
[2] https://lore.kernel.org/lkml/20210317142050.57712-1-chenjun102@huawei.com/
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Fixes: 5d8544e2d0 ("RISC-V: Generic library routines and assembly")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
The const annotation should not be used for 'sp', or it will
become read only and lead to bad stack output.
Fixes: dec822771b ("riscv: stacktrace: Move register keyword to beginning of declaration")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Pull RISC-V updates from Palmer Dabbelt:
"A handful of new RISC-V related patches for this merge window:
- A check to ensure drivers are properly using uaccess. This isn't
manifesting with any of the drivers I'm currently using, but may
catch errors in new drivers.
- Some preliminary support for the FU740, along with the HiFive
Unleashed it will appear on.
- NUMA support for RISC-V, which involves making the arm64 code
generic.
- Support for kasan on the vmalloc region.
- A handful of new drivers for the Kendryte K210, along with the DT
plumbing required to boot on a handful of K210-based boards.
- Support for allocating ASIDs.
- Preliminary support for kernels larger than 128MiB.
- Various other improvements to our KASAN support, including the
utilization of huge pages when allocating the KASAN regions.
We may have already found a bug with the KASAN_VMALLOC code, but it's
passing my tests. There's a fix in the works, but that will probably
miss the merge window.
* tag 'riscv-for-linus-5.12-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (75 commits)
riscv: Improve kasan population by using hugepages when possible
riscv: Improve kasan population function
riscv: Use KASAN_SHADOW_INIT define for kasan memory initialization
riscv: Improve kasan definitions
riscv: Get rid of MAX_EARLY_MAPPING_SIZE
soc: canaan: Sort the Makefile alphabetically
riscv: Disable KSAN_SANITIZE for vDSO
riscv: Remove unnecessary declaration
riscv: Add Canaan Kendryte K210 SD card defconfig
riscv: Update Canaan Kendryte K210 defconfig
riscv: Add Kendryte KD233 board device tree
riscv: Add SiPeed MAIXDUINO board device tree
riscv: Add SiPeed MAIX GO board device tree
riscv: Add SiPeed MAIX DOCK board device tree
riscv: Add SiPeed MAIX BiT board device tree
riscv: Update Canaan Kendryte K210 device tree
dt-bindings: add resets property to dw-apb-timer
dt-bindings: fix sifive gpio properties
dt-bindings: update sifive uart compatible string
dt-bindings: update sifive clint compatible string
...
Using global sp_in_global directly to fix the following warning,
arch/riscv/kernel/stacktrace.c:31:3: warning: ‘register’ is not at beginning of declaration [-Wold-style-declaration]
31 | const register unsigned long current_sp = sp_in_global;
| ^~~~~
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
When a function doesn't have a callee, then it will not
push ra into the stack, such as lkdtm_BUG() function,
addi sp,sp,-16
sd s0,8(sp)
addi s0,sp,16
ebreak
The struct stackframe use {fp,ra} to get information from
stack, if walk_stackframe() with pr_regs, we will obtain
wrong value and bad stacktrace,
[<ffffffe00066c56c>] lkdtm_BUG+0x6/0x8
---[ end trace 18da3fbdf08e25d5 ]---
Correct the next fp and pc, after that, full stacktrace
shown as expects,
[<ffffffe00066c56c>] lkdtm_BUG+0x6/0x8
[<ffffffe0008b24a4>] lkdtm_do_action+0x14/0x1c
[<ffffffe00066c372>] direct_entry+0xc0/0x10a
[<ffffffe000439f86>] full_proxy_write+0x42/0x6a
[<ffffffe000309626>] vfs_write+0x7e/0x214
[<ffffffe00030992a>] ksys_write+0x98/0xc0
[<ffffffe000309960>] sys_write+0xe/0x16
[<ffffffe0002014bc>] ret_from_syscall+0x0/0x2
---[ end trace 61917f3d9a9fadcd ]---
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
In order to use generic arch_stack_walk() code, make stack walk
callback consistent with it.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>