Pull Kbuild fixes from Masahiro Yamada:
- Fix CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y compile error for the
combination of Clang >= 14 and GAS <= 2.35.
- Drop vmlinux.bz2 from the rpm package as it just annoyingly increased
the package size.
- Fix modpost error under build environments using musl.
- Make *.ll files keep value names for easier debugging
- Fix single directory build
- Prevent RISC-V from selecting the broken DWARF5 support when Clang
and GAS are used together.
* tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5
kbuild: fix single directory build
kbuild: add -fno-discard-value-names to cmd_cc_ll_c
scripts/clang-tools: Convert clang-tidy args to list
modpost: put modpost options before argument
kbuild: Stop including vmlinux.bz2 in the rpm's
Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5
When building with a RISC-V kernel with DWARF5 debug info using clang
and the GNU assembler, several instances of the following error appear:
/tmp/vgettimeofday-48aa35.s:2963: Error: non-constant .uleb128 is not supported
Dumping the .s file reveals these .uleb128 directives come from
.debug_loc and .debug_ranges:
.Ldebug_loc0:
.byte 4 # DW_LLE_offset_pair
.uleb128 .Lfunc_begin0-.Lfunc_begin0 # starting offset
.uleb128 .Ltmp1-.Lfunc_begin0 # ending offset
.byte 1 # Loc expr size
.byte 90 # DW_OP_reg10
.byte 0 # DW_LLE_end_of_list
.Ldebug_ranges0:
.byte 4 # DW_RLE_offset_pair
.uleb128 .Ltmp6-.Lfunc_begin0 # starting offset
.uleb128 .Ltmp27-.Lfunc_begin0 # ending offset
.byte 4 # DW_RLE_offset_pair
.uleb128 .Ltmp28-.Lfunc_begin0 # starting offset
.uleb128 .Ltmp30-.Lfunc_begin0 # ending offset
.byte 0 # DW_RLE_end_of_list
There is an outstanding binutils issue to support a non-constant operand
to .sleb128 and .uleb128 in GAS for RISC-V but there does not appear to
be any movement on it, due to concerns over how it would work with
linker relaxation.
To avoid these build errors, prevent DWARF5 from being selected when
using clang and an assembler that does not have support for these symbol
deltas, which can be easily checked in Kconfig with as-instr plus the
small test program from the dwz test suite from the binutils issue.
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27215
Link: https://github.com/ClangBuiltLinux/linux/issues/1719
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT does not give explicit
-gdwarf-* flag. The actual DWARF version is up to the toolchain.
The combination of GCC and GAS works fine, and Clang with the integrated
assembler is good too.
The combination of Clang and GAS is tricky, but at least, the -g flag
works for Clang <=13, which defaults to DWARF v4.
Clang 14 switched its default to DWARF v5.
Now, CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT has the same issue as
addressed by commit 98cd6f521f ("Kconfig: allow explicit opt in to
DWARF v5").
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y for Clang >= 14 and
GAS < 2.35 produces a ton of errors like follows:
/tmp/main-c2741c.s: Assembler messages:
/tmp/main-c2741c.s:109: Error: junk at end of line, first unrecognized character is `"'
/tmp/main-c2741c.s:109: Error: file number less than one
Add 'depends on' to check toolchains.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Commit c0a5c81ca9 ("Kconfig.debug: drop GCC 5+ version check for
DWARF5") could have cleaned up the code a bit more.
"CC_IS_CLANG &&" is unneeded. No functional change is intended.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Pull MM updates from Andrew Morton:
- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any
negative reports (or any positive ones, come to that).
- Also the Maple Tree from Liam Howlett. An overlapping range-based
tree for vmas. It it apparently slightly more efficient in its own
right, but is mainly targeted at enabling work to reduce mmap_lock
contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
at [1]. This has yet to be addressed due to Liam's unfortunately
timed vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down
to the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
support file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging
activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]
* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
hugetlb: allocate vma lock for all sharable vmas
hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
hugetlb: fix vma lock handling during split vma and range unmapping
mglru: mm/vmscan.c: fix imprecise comments
mm/mglru: don't sync disk for each aging cycle
mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
mm: memcontrol: use do_memsw_account() in a few more places
mm: memcontrol: deprecate swapaccounting=0 mode
mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
mm/secretmem: remove reduntant return value
mm/hugetlb: add available_huge_pages() func
mm: remove unused inline functions from include/linux/mm_inline.h
selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
selftests/vm: add thp collapse shmem testing
selftests/vm: add thp collapse file and tmpfs testing
selftests/vm: modularize thp collapse memory operations
selftests/vm: dedup THP helpers
mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
mm/madvise: add file and shmem support to MADV_COLLAPSE
...
Pull preempt RT updates from Thomas Gleixner:
"Introduce preempt_[dis|enable_nested() and use it to clean up various
places which have open coded PREEMPT_RT conditionals.
On PREEMPT_RT enabled kernels, spinlocks and rwlocks are neither
disabling preemption nor interrupts. Though there are a few places
which depend on the implicit preemption/interrupt disable of those
locks, e.g. seqcount write sections, per CPU statistics updates etc.
PREEMPT_RT added open coded CONFIG_PREEMPT_RT conditionals to
disable/enable preemption in the related code parts all over the
place. That's hard to read and does not really explain why this is
necessary.
Linus suggested to use helper functions (preempt_disable_nested() and
preempt_enable_nested()) and use those in the affected places. On !RT
enabled kernels these functions are NOPs, but contain a lockdep assert
to validate that preemption is actually disabled to catch call sites
which do not have preemption disabled.
Clean up the affected code paths in mm, dentry and lib"
* tag 'sched-rt-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
u64_stats: Streamline the implementation
flex_proportions: Disable preemption entering the write section.
mm/compaction: Get rid of RT ifdeffery
mm/memcontrol: Replace the PREEMPT_RT conditionals
mm/debug: Provide VM_WARN_ON_IRQS_ENABLED()
mm/vmstat: Use preempt_[dis|en]able_nested()
dentry: Use preempt_[dis|en]able_nested()
preempt: Provide preempt_[dis|en]able_nested()
Pull perf events updates from Ingo Molnar:
"PMU driver updates:
- Add AMD Last Branch Record Extension Version 2 (LbrExtV2) feature
support for Zen 4 processors.
- Extend the perf ABI to provide branch speculation information, if
available, and use this on CPUs that have it (eg. LbrExtV2).
- Improve Intel PEBS TSC timestamp handling & integration.
- Add Intel Raptor Lake S CPU support.
- Add 'perf mem' and 'perf c2c' memory profiling support on AMD CPUs
by utilizing IBS tagged load/store samples.
- Clean up & optimize various x86 PMU details.
HW breakpoints:
- Big rework to optimize the code for systems with hundreds of CPUs
and thousands of breakpoints:
- Replace the nr_bp_mutex global mutex with the bp_cpuinfo_sem
per-CPU rwsem that is read-locked during most of the key
operations.
- Improve the O(#cpus * #tasks) logic in toggle_bp_slot() and
fetch_bp_busy_slots().
- Apply micro-optimizations & cleanups.
- Misc cleanups & enhancements"
* tag 'perf-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
perf/hw_breakpoint: Annotate tsk->perf_event_mutex vs ctx->mutex
perf: Fix pmu_filter_match()
perf: Fix lockdep_assert_event_ctx()
perf/x86/amd/lbr: Adjust LBR regardless of filtering
perf/x86/utils: Fix uninitialized var in get_branch_type()
perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file
perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR
perf/x86/amd: Support PERF_SAMPLE_ADDR
perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT}
perf/x86/amd: Support PERF_SAMPLE_DATA_SRC
perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions
perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO}
perf/x86/uncore: Add new Raptor Lake S support
perf/x86/cstate: Add new Raptor Lake S support
perf/x86/msr: Add new Raptor Lake S support
perf/x86: Add new Raptor Lake S support
bpf: Check flags for branch stack in bpf_read_branch_records helper
perf, hw_breakpoint: Fix use-after-free if perf_event_open() fails
perf: Use sample_flags for raw_data
perf: Use sample_flags for addr
...
Pull driver core updates from Greg KH:
"Here is the big set of driver core and debug printk changes for
6.1-rc1. Included in here is:
- dynamic debug updates for the core and the drm subsystem. The drm
changes have all been acked by the relevant maintainers
- kernfs fixes for syzbot reported problems
- kernfs refactors and updates for cgroup requirements
- magic number cleanups and removals from the kernel tree (they were
not being used and they really did not actually do anything)
- other tiny cleanups
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (74 commits)
docs: filesystems: sysfs: Make text and code for ->show() consistent
Documentation: NBD_REQUEST_MAGIC isn't a magic number
a.out: restore CMAGIC
device property: Add const qualifier to device_get_match_data() parameter
drm_print: add _ddebug descriptor to drm_*dbg prototypes
drm_print: prefer bare printk KERN_DEBUG on generic fn
drm_print: optimize drm_debug_enabled for jump-label
drm-print: add drm_dbg_driver to improve namespace symmetry
drm-print.h: include dyndbg header
drm_print: wrap drm_*_dbg in dyndbg descriptor factory macro
drm_print: interpose drm_*dbg with forwarding macros
drm: POC drm on dyndbg - use in core, 2 helpers, 3 drivers.
drm_print: condense enum drm_debug_category
debugfs: use DEFINE_SHOW_ATTRIBUTE to define debugfs_regset32_fops
driver core: use IS_ERR_OR_NULL() helper in device_create_groups_vargs()
Documentation: ENI155_MAGIC isn't a magic number
Documentation: NBD_REPLY_MAGIC isn't a magic number
nbd: remove define-only NBD_MAGIC, previously magic number
Documentation: FW_HEADER_MAGIC isn't a magic number
Documentation: EEPROM_MAGIC_VALUE isn't a magic number
...
Pull kernel hardening updates from Kees Cook:
"Most of the collected changes here are fixes across the tree for
various hardening features (details noted below).
The most notable new feature here is the addition of the memcpy()
overflow warning (under CONFIG_FORTIFY_SOURCE), which is the next step
on the path to killing the common class of "trivially detectable"
buffer overflow conditions (i.e. on arrays with sizes known at compile
time) that have resulted in many exploitable vulnerabilities over the
years (e.g. BleedingTooth).
This feature is expected to still have some undiscovered false
positives. It's been in -next for a full development cycle and all the
reported false positives have been fixed in their respective trees.
All the known-bad code patterns we could find with Coccinelle are also
either fixed in their respective trees or in flight.
The commit message in commit 54d9469bc5 ("fortify: Add run-time WARN
for cross-field memcpy()") for the feature has extensive details, but
I'll repeat here that this is a warning _only_, and is not intended to
actually block overflows (yet). The many patches fixing array sizes
and struct members have been landing for several years now, and we're
finally able to turn this on to find any remaining stragglers.
Summary:
Various fixes across several hardening areas:
- loadpin: Fix verity target enforcement (Matthias Kaehlcke).
- zero-call-used-regs: Add missing clobbers in paravirt (Bill
Wendling).
- CFI: clean up sparc function pointer type mismatches (Bart Van
Assche).
- Clang: Adjust compiler flag detection for various Clang changes
(Sami Tolvanen, Kees Cook).
- fortify: Fix warnings in arch-specific code in sh, ARM, and xen.
Improvements to existing features:
- testing: improve overflow KUnit test, introduce fortify KUnit test,
add more coverage to LKDTM tests (Bart Van Assche, Kees Cook).
- overflow: Relax overflow type checking for wider utility.
New features:
- string: Introduce strtomem() and strtomem_pad() to fill a gap in
strncpy() replacement needs.
- um: Enable FORTIFY_SOURCE support.
- fortify: Enable run-time struct member memcpy() overflow warning"
* tag 'hardening-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (27 commits)
Makefile.extrawarn: Move -Wcast-function-type-strict to W=1
hardening: Remove Clang's enable flag for -ftrivial-auto-var-init=zero
sparc: Unbreak the build
x86/paravirt: add extra clobbers with ZERO_CALL_USED_REGS enabled
x86/paravirt: clean up typos and grammaros
fortify: Convert to struct vs member helpers
fortify: Explicitly check bounds are compile-time constants
x86/entry: Work around Clang __bdos() bug
ARM: decompressor: Include .data.rel.ro.local
fortify: Adjust KUnit test for modular build
sh: machvec: Use char[] for section boundaries
kunit/memcpy: Avoid pathological compile-time string size
lib: Improve the is_signed_type() kunit test
LoadPin: Require file with verity root digests to have a header
dm: verity-loadpin: Only trust verity targets with enforcement
LoadPin: Fix Kconfig doc about format of file with verity digests
um: Enable FORTIFY_SOURCE
lkdtm: Update tests for memcpy() run-time warnings
fortify: Add run-time WARN for cross-field memcpy()
fortify: Use SIZE_MAX instead of (size_t)-1
...
For each memory location KernelMemorySanitizer maintains two types of
metadata:
1. The so-called shadow of that location - а byte:byte mapping describing
whether or not individual bits of memory are initialized (shadow is 0)
or not (shadow is 1).
2. The origins of that location - а 4-byte:4-byte mapping containing
4-byte IDs of the stack traces where uninitialized values were
created.
Each struct page now contains pointers to two struct pages holding KMSAN
metadata (shadow and origins) for the original struct page. Utility
routines in mm/kmsan/core.c and mm/kmsan/shadow.c handle the metadata
creation, addressing, copying and checking. mm/kmsan/report.c performs
error reporting in the cases an uninitialized value is used in a way that
leads to undefined behavior.
KMSAN compiler instrumentation is responsible for tracking the metadata
along with the kernel memory. mm/kmsan/instrumentation.c provides the
implementation for instrumentation hooks that are called from files
compiled with -fsanitize=kernel-memory.
To aid parameter passing (also done at instrumentation level), each
task_struct now contains a struct kmsan_task_state used to track the
metadata of function parameters and return values for that task.
Finally, this patch provides CONFIG_KMSAN that enables KMSAN, and declares
CFLAGS_KMSAN, which are applied to files compiled with KMSAN. The
KMSAN_SANITIZE:=n Makefile directive can be used to completely disable
KMSAN instrumentation for certain files.
Similarly, KMSAN_ENABLE_CHECKS:=n disables KMSAN checks and makes newly
created stack memory initialized.
Users can also use functions from include/linux/kmsan-checks.h to mark
certain memory regions as uninitialized or initialized (this is called
"poisoning" and "unpoisoning") or check that a particular region is
initialized.
Link: https://lkml.kernel.org/r/20220915150417.722975-12-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Introducing the Maple Tree"
The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently. There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface. If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.
The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes. With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses. The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.
The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The
long term goal is to reduce or remove the mmap_lock contention.
The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers. A single write operation will be
allowed at a time. A reader re-walks if stale data is encountered. VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.
Davidlor said
: Yes I like the maple tree, and at this stage I don't think we can ask for
: more from this series wrt the MM - albeit there seems to still be some
: folks reporting breakage. Fundamentally I see Liam's work to (re)move
: complexity out of the MM (not to say that the actual maple tree is not
: complex) by consolidating the three complimentary data structures very
: much worth it considering performance does not take a hit. This was very
: much a turn off with the range locking approach, which worst case scenario
: incurred in prohibitive overhead. Also as Liam and Matthew have
: mentioned, RCU opens up a lot of nice performance opportunities, and in
: addition academia[1] has shown outstanding scalability of address spaces
: with the foundation of replacing the locked rbtree with RCU aware trees.
A similar work has been discovered in the academic press
https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf
Sheer coincidence. We designed our tree with the intention of solving the
hardest problem first. Upon settling on a b-tree variant and a rough
outline, we researched ranged based b-trees and RCU b-trees and did find
that article. So it was nice to find reassurances that we were on the
right path, but our design choice of using ranges made that paper unusable
for us.
This patch (of 70):
The maple tree is an RCU-safe range based B-tree designed to use modern
processor cache efficiently. There are a number of places in the kernel
that a non-overlapping range-based tree would be beneficial, especially
one with a simple interface. If you use an rbtree with other data
structures to improve performance or an interval tree to track
non-overlapping ranges, then this is for you.
The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes. With the increased branching factor, it is significantly shorter
than the rbtree so it has fewer cache misses. The removal of the linked
list between subsequent entries also reduces the cache misses and the need
to pull in the previous and next VMA during many tree alterations.
The first user that is covered in this patch set is the vm_area_struct,
where three data structures are replaced by the maple tree: the augmented
rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The
long term goal is to reduce or remove the mmap_lock contention.
The plan is to get to the point where we use the maple tree in RCU mode.
Readers will not block for writers. A single write operation will be
allowed at a time. A reader re-walks if stale data is encountered. VMAs
would be RCU enabled and this mode would be entered once multiple tasks
are using the mm_struct.
There is additional BUG_ON() calls added within the tree, most of which
are in debug code. These will be replaced with a WARN_ON() call in the
future. There is also additional BUG_ON() calls within the code which
will also be reduced in number at a later date. These exist to catch
things such as out-of-range accesses which would crash anyways.
Link: https://lkml.kernel.org/r/20220906194824.2110408-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20220906194824.2110408-2-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Some places in the VM code expect interrupts disabled, which is a valid
expectation on non-PREEMPT_RT kernels, but does not hold on RT kernels in
some places because the RT spinlock substitution does not disable
interrupts.
To avoid sprinkling CONFIG_PREEMPT_RT conditionals into those places,
provide VM_WARN_ON_IRQS_ENABLED() which is only enabled when VM_DEBUG=y and
PREEMPT_RT=n.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Link: https://lore.kernel.org/r/20220825164131.402717-5-bigeasy@linutronix.de
Add KUnit test for hw_breakpoint constraints accounting, with various
interesting mixes of breakpoint targets (some care was taken to catch
interesting corner cases via bug-injection).
The test cannot be built as a module because it requires access to
hw_breakpoint_slots(), which is not inlinable or exported on all
architectures.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20220829124719.675715-2-elver@google.com
Pull misc updates from Andrew Morton:
"Updates to various subsystems which I help look after. lib, ocfs2,
fatfs, autofs, squashfs, procfs, etc. A relatively small amount of
material this time"
* tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
scripts/gdb: ensure the absolute path is generated on initial source
MAINTAINERS: kunit: add David Gow as a maintainer of KUnit
mailmap: add linux.dev alias for Brendan Higgins
mailmap: update Kirill's email
profile: setup_profiling_timer() is moslty not implemented
ocfs2: fix a typo in a comment
ocfs2: use the bitmap API to simplify code
ocfs2: remove some useless functions
lib/mpi: fix typo 'the the' in comment
proc: add some (hopefully) insightful comments
bdi: remove enum wb_congested_state
kernel/hung_task: fix address space of proc_dohung_task_timeout_secs
lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t()
squashfs: support reading fragments in readahead call
squashfs: implement readahead
squashfs: always build "file direct" version of page actor
Revert "squashfs: provide backing_dev_info in order to disable read-ahead"
fs/ocfs2: Fix spelling typo in comment
ia64: old_rr4 added under CONFIG_HUGETLB_PAGE
proc: fix test for "vsyscall=xonly" boot option
...
Pull MM updates from Andrew Morton:
"Most of the MM queue. A few things are still pending.
Liam's maple tree rework didn't make it. This has resulted in a few
other minor patch series being held over for next time.
Multi-gen LRU still isn't merged as we were waiting for mapletree to
stabilize. The current plan is to merge MGLRU into -mm soon and to
later reintroduce mapletree, with a view to hopefully getting both
into 6.1-rc1.
Summary:
- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe
Lin, Yang Shi, Anshuman Khandual and Mike Rapoport
- Some kmemleak fixes from Patrick Wang and Waiman Long
- DAMON updates from SeongJae Park
- memcg debug/visibility work from Roman Gushchin
- vmalloc speedup from Uladzislau Rezki
- more folio conversion work from Matthew Wilcox
- enhancements for coherent device memory mapping from Alex Sierra
- addition of shared pages tracking and CoW support for fsdax, from
Shiyang Ruan
- hugetlb optimizations from Mike Kravetz
- Mel Gorman has contributed some pagealloc changes to improve
latency and realtime behaviour.
- mprotect soft-dirty checking has been improved by Peter Xu
- Many other singleton patches all over the place"
[ XFS merge from hell as per Darrick Wong in
https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ]
* tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits)
tools/testing/selftests/vm/hmm-tests.c: fix build
mm: Kconfig: fix typo
mm: memory-failure: convert to pr_fmt()
mm: use is_zone_movable_page() helper
hugetlbfs: fix inaccurate comment in hugetlbfs_statfs()
hugetlbfs: cleanup some comments in inode.c
hugetlbfs: remove unneeded header file
hugetlbfs: remove unneeded hugetlbfs_ops forward declaration
hugetlbfs: use helper macro SZ_1{K,M}
mm: cleanup is_highmem()
mm/hmm: add a test for cross device private faults
selftests: add soft-dirty into run_vmtests.sh
selftests: soft-dirty: add test for mprotect
mm/mprotect: fix soft-dirty check in can_change_pte_writable()
mm: memcontrol: fix potential oom_lock recursion deadlock
mm/gup.c: fix formatting in check_and_migrate_movable_page()
xfs: fail dax mount if reflink is enabled on a partition
mm/memcontrol.c: remove the redundant updating of stats_flush_threshold
userfaultfd: don't fail on unrecognized features
hugetlb_cgroup: fix wrong hugetlb cgroup numa stat
...
Pull driver core / kernfs updates from Greg KH:
"Here is the set of driver core and kernfs changes for 6.0-rc1.
The "biggest" thing in here is some scalability improvements for
kernfs for large systems. Other than that, included in here are:
- arch topology and cache info changes that have been reviewed and
discussed a lot.
- potential error path cleanup fixes
- deferred driver probe cleanups
- firmware loader cleanups and tweaks
- documentation updates
- other small things
All of these have been in the linux-next tree for a while with no
reported problems"
* tag 'driver-core-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (63 commits)
docs: embargoed-hardware-issues: fix invalid AMD contact email
firmware_loader: Replace kmap() with kmap_local_page()
sysfs docs: ABI: Fix typo in comment
kobject: fix Kconfig.debug "its" grammar
kernfs: Fix typo 'the the' in comment
docs: driver-api: firmware: add driver firmware guidelines. (v3)
arch_topology: Fix cache attributes detection in the CPU hotplug path
ACPI: PPTT: Leave the table mapped for the runtime usage
cacheinfo: Use atomic allocation for percpu cache attributes
drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist
MAINTAINERS: Change mentions of mpm to olivia
docs: ABI: sysfs-devices-soc: Update Lee Jones' email address
docs: ABI: sysfs-class-pwm: Update Lee Jones' email address
Documentation/process: Add embargoed HW contact for LLVM
Revert "kernfs: Change kernfs_notify_list to llist."
ACPI: Remove the unused find_acpi_cpu_cache_topology()
arch_topology: Warn that topology for nested clusters is not supported
arch_topology: Add support for parsing sockets in /cpu-map
arch_topology: Set cluster identifier in each core/thread from /cpu-map
arch_topology: Limit span of cpu_clustergroup_mask()
...