Many architectures do not include asm-generic/cacheflush.h, so turn
the includes on their head and add linux/cacheflush.h which includes
asm/cacheflush.h.
Move the flush_dcache_folio() declaration from asm-generic/cacheflush.h
to linux/cacheflush.h and change linux/highmem.h to include
linux/cacheflush.h instead of asm/cacheflush.h so that all necessary
places will see flush_dcache_folio().
More functions should have their default implementations moved in the
future, but those are for follow-on patches. This fixes csky, sparc and
sparc64 which were missed in the commit which added flush_dcache_folio().
Fixes: 08b0b0059b ("mm: Add flush_dcache_folio()")
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Pull asm-generic cleanup from Arnd Bergmann:
"This is a single cleanup from Peter Collingbourne, removing some dead
code"
* tag 'asm-generic-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
arch: remove unused function syscall_set_arguments()
Pull tracing updates from Steven Rostedt:
- kprobes: Restructured stack unwinder to show properly on x86 when a
stack dump happens from a kretprobe callback.
- Fix to bootconfig parsing
- Have tracefs allow owner and group permissions by default (only
denying others). There's been pressure to allow non root to tracefs
in a controlled fashion, and using groups is probably the safest.
- Bootconfig memory managament updates.
- Bootconfig clean up to have the tools directory be less dependent on
changes in the kernel tree.
- Allow perf to be traced by function tracer.
- Rewrite of function graph tracer to be a callback from the function
tracer instead of having its own trampoline (this change will happen
on an arch by arch basis, and currently only x86_64 implements it).
- Allow multiple direct trampolines (bpf hooks to functions) be batched
together in one synchronization.
- Allow histogram triggers to add variables that can perform
calculations against the event's fields.
- Use the linker to determine architecture callbacks from the ftrace
trampoline to allow for proper parameter prototypes and prevent
warnings from the compiler.
- Extend histogram triggers to key off of variables.
- Have trace recursion use bit magic to determine preempt context over
if branches.
- Have trace recursion disable preemption as all use cases do anyway.
- Added testing for verification of tracing utilities.
- Various small clean ups and fixes.
* tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits)
tracing/histogram: Fix semicolon.cocci warnings
tracing/histogram: Fix documentation inline emphasis warning
tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together
tracing: Show size of requested perf buffer
bootconfig: Initialize ret in xbc_parse_tree()
ftrace: do CPU checking after preemption disabled
ftrace: disable preemption when recursion locked
tracing/histogram: Document expression arithmetic and constants
tracing/histogram: Optimize division by a power of 2
tracing/histogram: Covert expr to const if both operands are constants
tracing/histogram: Simplify handling of .sym-offset in expressions
tracing: Fix operator precedence for hist triggers expression
tracing: Add division and multiplication support for hist triggers
tracing: Add support for creating hist trigger variables from literal
selftests/ftrace: Stop tracing while reading the trace file by default
MAINTAINERS: Update KPROBES and TRACING entries
test_kprobes: Move it from kernel/ to lib/
docs, kprobes: Remove invalid URL and add new reference
samples/kretprobes: Fix return value if register_kretprobe() failed
lib/bootconfig: Fix the xbc_get_info kerneldoc
...
Pull scheduler updates from Thomas Gleixner:
- Revert the printk format based wchan() symbol resolution as it can
leak the raw value in case that the symbol is not resolvable.
- Make wchan() more robust and work with all kind of unwinders by
enforcing that the task stays blocked while unwinding is in progress.
- Prevent sched_fork() from accessing an invalid sched_task_group
- Improve asymmetric packing logic
- Extend scheduler statistics to RT and DL scheduling classes and add
statistics for bandwith burst to the SCHED_FAIR class.
- Properly account SCHED_IDLE entities
- Prevent a potential deadlock when initial priority is assigned to a
newly created kthread. A recent change to plug a race between cpuset
and __sched_setscheduler() introduced a new lock dependency which is
now triggered. Break the lock dependency chain by moving the priority
assignment to the thread function.
- Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
- Improve idle balancing in general and especially for NOHZ enabled
systems.
- Provide proper interfaces for live patching so it does not have to
fiddle with scheduler internals.
- Add cluster aware scheduling support.
- A small set of tweaks for RT (irqwork, wait_task_inactive(), various
scheduler options and delaying mmdrop)
- The usual small tweaks and improvements all over the place
* tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
sched/fair: Cleanup newidle_balance
sched/fair: Remove sysctl_sched_migration_cost condition
sched/fair: Wait before decaying max_newidle_lb_cost
sched/fair: Skip update_blocked_averages if we are defering load balance
sched/fair: Account update_blocked_averages in newidle_balance cost
x86: Fix __get_wchan() for !STACKTRACE
sched,x86: Fix L2 cache mask
sched/core: Remove rq_relock()
sched: Improve wake_up_all_idle_cpus() take #2
irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT
irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ
sched: Add cluster scheduler level for x86
sched: Add cluster scheduler level in core and related Kconfig for ARM64
topology: Represent clusters of CPUs within a die
sched: Disable -Wunused-but-set-variable
sched: Add wrapper for get_wchan() to keep task blocked
x86: Fix get_wchan() to support the ORC unwinder
proc: Use task_is_running() for wchan in /proc/$pid/stat
...
Pull timer updates from Thomas Gleixner:
"Time, timers and timekeeping updates:
- No core updates
- No new clocksource/event driver
- A large rework of the ARM architected timer driver to prepare for
the support of the upcoming ARMv8.6 support
- Fix Kconfig options for Exynos MCT, Samsung PWM and TI DM timers
- Address a namespace collison in the ARC sp804 timer driver"
* tag 'timers-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/timer-ti-dm: Select TIMER_OF
clocksource/drivers/exynosy: Depend on sub-architecture for Exynos MCT and Samsung PWM
clocksource/drivers/arch_arm_timer: Move workaround synchronisation around
clocksource/drivers/arm_arch_timer: Fix masking for high freq counters
clocksource/drivers/arm_arch_timer: Drop unnecessary ISB on CVAL programming
clocksource/drivers/arm_arch_timer: Remove any trace of the TVAL programming interface
clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations
clocksource/drivers/arm_arch_timer: Advertise 56bit timer to the core code
clocksource/drivers/arm_arch_timer: Move MMIO timer programming over to CVAL
clocksource/drivers/arm_arch_timer: Fix MMIO base address vs callback ordering issue
clocksource/drivers/arm_arch_timer: Move drop _tval from erratum function names
clocksource/drivers/arm_arch_timer: Move system register timer programming over to CVAL
clocksource/drivers/arm_arch_timer: Extend write side of timer register accessors to u64
clocksource/drivers/arm_arch_timer: Drop CNT*_TVAL read accessors
clocksource/arm_arch_timer: Add build-time guards for unhandled register accesses
clocksource/drivers/arc_timer: Eliminate redefined macro error
Pull memory folios from Matthew Wilcox:
"Add memory folios, a new type to represent either order-0 pages or the
head page of a compound page. This should be enough infrastructure to
support filesystems converting from pages to folios.
The point of all this churn is to allow filesystems and the page cache
to manage memory in larger chunks than PAGE_SIZE. The original plan
was to use compound pages like THP does, but I ran into problems with
some functions expecting only a head page while others expect the
precise page containing a particular byte.
The folio type allows a function to declare that it's expecting only a
head page. Almost incidentally, this allows us to remove various calls
to VM_BUG_ON(PageTail(page)) and compound_head().
This converts just parts of the core MM and the page cache. For 5.17,
we intend to convert various filesystems (XFS and AFS are ready; other
filesystems may make it) and also convert more of the MM and page
cache to folios. For 5.18, multi-page folios should be ready.
The multi-page folios offer some improvement to some workloads. The
80% win is real, but appears to be an artificial benchmark (postgres
startup, which isn't a serious workload). Real workloads (eg building
the kernel, running postgres in a steady state, etc) seem to benefit
between 0-10%. I haven't heard of any performance losses as a result
of this series. Nobody has done any serious performance tuning; I
imagine that tweaking the readahead algorithm could provide some more
interesting wins. There are also other places where we could choose to
create large folios and currently do not, such as writes that are
larger than PAGE_SIZE.
I'd like to thank all my reviewers who've offered review/ack tags:
Christoph Hellwig, David Howells, Jan Kara, Jeff Layton, Johannes
Weiner, Kirill A. Shutemov, Michal Hocko, Mike Rapoport, Vlastimil
Babka, William Kucharski, Yu Zhao and Zi Yan.
I'd also like to thank those who gave feedback I incorporated but
haven't offered up review tags for this part of the series: Nick
Piggin, Mel Gorman, Ming Lei, Darrick Wong, Ted Ts'o, John Hubbard,
Hugh Dickins, and probably a few others who I forget"
* tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecache: (90 commits)
mm/writeback: Add folio_write_one
mm/filemap: Add FGP_STABLE
mm/filemap: Add filemap_get_folio
mm/filemap: Convert mapping_get_entry to return a folio
mm/filemap: Add filemap_add_folio()
mm/filemap: Add filemap_alloc_folio
mm/page_alloc: Add folio allocation functions
mm/lru: Add folio_add_lru()
mm/lru: Convert __pagevec_lru_add_fn to take a folio
mm: Add folio_evictable()
mm/workingset: Convert workingset_refault() to take a folio
mm/filemap: Add readahead_folio()
mm/filemap: Add folio_mkwrite_check_truncate()
mm/filemap: Add i_blocks_per_folio()
mm/writeback: Add folio_redirty_for_writepage()
mm/writeback: Add folio_account_redirty()
mm/writeback: Add folio_clear_dirty_for_io()
mm/writeback: Add folio_cancel_dirty()
mm/writeback: Add folio_account_cleaned()
mm/writeback: Add filemap_dirty_folio()
...
If available, we use the __builtin_thread_pointer() helper to get the
value of the TLS register, to help the compiler understand that it
doesn't need to reload it every time we access 'current'.
Unfortunately, Clang fails to emit the MRC system register read
directly when building for Thumb2, and instead, it issues a call to the
__aeabi_read_tp helper, which the kernel does not provide, and so this
result in link failures at build time.
So create a special case for this, and emit the MRC directly using an
asm() block, just like we do when the helper is not available to begin
with.
Link: https://github.com/ClangBuiltLinux/linux/issues/1485
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
The code that implements the rarely used PID_IN_CONTEXTIDR feature
dereferences the 'task' field of struct thread_info directly, and this
is no longer possible when THREAD_INFO_IN_TASK=y, as the 'task' field is
omitted from the struct definition in that case. Instead, we should just
cast the thread_info pointer to a task_struct pointer, given that the
former is now the first member of the latter.
So use a helper that abstracts this, and provide implementations for
both cases.
Reported by: Arnd Bergmann <arnd@arndb.de>
Fixes: 18ed1c01a7 ("ARM: smp: Enable THREAD_INFO_IN_TASK")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
On BE32 kernels, the __opcode_to_mem_thumb32() interface is intentionally
not defined, but it is referenced whenever runtime patching is enabled
for the kernel, which may be for ftrace, jump label, kprobes or kgdb:
arch/arm/kernel/patch.c: In function '__patch_text_real':
arch/arm/kernel/patch.c:94:32: error: implicit declaration of function '__opcode_to_mem_thumb32' [-Werror=implicit-function-declaration]
94 | insn = __opcode_to_mem_thumb32(insn);
| ^~~~~~~~~~~~~~~~~~~~~~~
Since BE32 kernels never run Thumb2 code, we never end up using the
result of this call, so providing an extern declaration without
a definition makes it build correctly.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Since the kretprobe replaces the function return address with
the kretprobe_trampoline on the stack, arm unwinder shows it
instead of the correct return address.
This finds the correct return address from the per-task
kretprobe_instances list and verify it is in between the
caller fp and callee fp.
Note that this supports both GCC and clang if CONFIG_FRAME_POINTER=y
and CONFIG_ARM_UNWIND=n. For the ARM unwinder, this is still
not working correctly.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
ARM: kasan: Fix __get_user_check failure with kasan
In macro __get_user_check defined in arch/arm/include/asm/uaccess.h,
error code is store in register int __e(r0). When kasan is
enabled, assigning value to kernel address might trigger kasan check,
which unexpectedly overwrites r0 and causes undefined behavior on arm
kasan images.
One example is failure in do_futex and results in process soft lockup.
Log:
watchdog: BUG: soft lockup - CPU#0 stuck for 62946ms! [rs:main
Q:Reg:1151]
...
(__asan_store4) from (futex_wait_setup+0xf8/0x2b4)
(futex_wait_setup) from (futex_wait+0x138/0x394)
(futex_wait) from (do_futex+0x164/0xe40)
(do_futex) from (sys_futex_time32+0x178/0x230)
(sys_futex_time32) from (ret_fast_syscall+0x0/0x50)
The soft lockup happens in function futex_wait_setup. The reason is
function get_futex_value_locked always return EINVAL, thus pc jump
back to retry label and causes looping.
This line in function get_futex_value_locked
ret = __get_user(*dest, from);
is expanded to
*dest = (typeof(*(p))) __r2; ,
in macro __get_user_check. Writing to pointer dest triggers kasan check
and overwrites the return value of __get_user_x function.
The assembly code of get_futex_value_locked in kernel/futex.c:
...
c01f6dc8: eb0b020e bl c04b7608 <__get_user_4>
// "x = (typeof(*(p))) __r2;" triggers kasan check and r0 is overwritten
c01f6dCc: e1a00007 mov r0, r7
c01f6dd0: e1a05002 mov r5, r2
c01f6dd4: eb04f1e6 bl c0333574 <__asan_store4>
c01f6dd8: e5875000 str r5, [r7]
// save ret value of __get_user(*dest, from), which is dest address now
c01f6ddc: e1a05000 mov r5, r0
...
// checking return value of __get_user failed
c01f6e00: e3550000 cmp r5, #0
...
c01f6e0c: 01a00005 moveq r0, r5
// assign return value to EINVAL
c01f6e10: 13e0000d mvnne r0, #13
Return value is the destination address of get_user thus certainly
non-zero, so get_futex_value_locked always return EINVAL.
Fix it by using a tmp vairable to store the error code before the
assignment. This fix has no effects to non-kasan images thanks to compiler
optimization. It only affects cases that overwrite r0 due to kasan check.
This should fix bug discussed in Link:
[1] https://lore.kernel.org/linux-arm-kernel/0ef7c2a5-5d8b-c5e0-63fa-31693fd4495c@gmail.com/
Fixes: 421015713b ("ARM: 9017/2: Enable KASan for ARM")
Signed-off-by: Lexi Shao <shaolexi@huawei.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
__arm_iomem_set_ro() marks an ioremapped area read-only. This is
intended for use with __arm_ioremap_exec() to allow the kernel to
write some code into e.g. SRAM and then write-protect it so the
kernel doesn't complain about W+X mappings.
Tested-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
This is a default implementation which calls flush_dcache_page() on
each page in the folio. If architectures can do better, they should
implement their own version of it.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Switching from TVAL to CVAL has a small drawback: we need an ISB
before reading the counter. We cannot get rid of it, but we can
instead remove the one that comes just after writing to CVAL.
This reduces the number of ISBs from 3 to 2 when programming
the timer.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211017124225.3018098-12-maz@kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Similarily to the sysreg-based timer, move the MMIO over to using
the CVAL registers instead of TVAL. Note that there is no warranty
that the 64bit MMIO access will be atomic, but the timer is always
disabled at the point where we program CVAL.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211017124225.3018098-8-maz@kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
In order to cope better with high frequency counters, move the
programming of the timers from the countdown timer (TVAL) over
to the comparator (CVAL).
The programming model is slightly different, as we now need to
read the current counter value to have an absolute deadline
instead of a relative one.
There is a small overhead to this change, which we will address
in the following patches.
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211017124225.3018098-5-maz@kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
The various accessors for the timer sysreg and MMIO registers are
currently hardwired to 32bit. However, we are about to introduce
the use of the CVAL registers, which require a 64bit access.
Upgrade the write side of the accessors to take a 64bit value
(the read side is left untouched as we don't plan to ever read
back any of these registers).
No functional change expected.
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211017124225.3018098-4-maz@kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Now that we no longer rely on thread_info living at the base of the task
stack to be able to access the 'current' pointer, we can wire up the
generic support for moving thread_info into the task struct itself.
Note that this requires us to update the cpu field in thread_info
explicitly, now that the core code no longer does so. Ideally, we would
switch the percpu code to access the cpu field in task_struct instead,
but this unleashes #include circular dependency hell.
Co-developed-by: Keith Packard <keithpac@amazon.com>
Signed-off-by: Keith Packard <keithpac@amazon.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Now that the user space TLS register is assigned on every return to user
space, we can use it to keep the 'current' pointer while running in the
kernel. This removes the need to access it via thread_info, which is
located at the base of the stack, but will be moved out of there in a
subsequent patch.
Use the __builtin_thread_pointer() helper when available - this will
help GCC understand that reloading the value within the same function is
not necessary, even when using the per-task stack protector (which also
generates accesses via the TLS register). For example, the generated
code below loads TPIDRURO only once, and uses it to access both the
stack canary and the preempt_count fields.
<do_one_initcall>:
e92d 41f0 stmdb sp!, {r4, r5, r6, r7, r8, lr}
ee1d 4f70 mrc 15, 0, r4, cr13, cr0, {3}
4606 mov r6, r0
b094 sub sp, #80 ; 0x50
f8d4 34e8 ldr.w r3, [r4, #1256] ; 0x4e8 <- stack canary
9313 str r3, [sp, #76] ; 0x4c
f8d4 8004 ldr.w r8, [r4, #4] <- preempt count
Co-developed-by: Keith Packard <keithpac@amazon.com>
Signed-off-by: Keith Packard <keithpac@amazon.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Amit Daniel Kachhap <amit.kachhap@arm.com>