Pull MM updates from Andrew Morton:
- In the series "mm: Avoid possible overflows in dirty throttling" Jan
Kara addresses a couple of issues in the writeback throttling code.
These fixes are also targetted at -stable kernels.
- Ryusuke Konishi's series "nilfs2: fix potential issues related to
reserved inodes" does that. This should actually be in the
mm-nonmm-stable tree, along with the many other nilfs2 patches. My
bad.
- More folio conversions from Kefeng Wang in the series "mm: convert to
folio_alloc_mpol()"
- Kemeng Shi has sent some cleanups to the writeback code in the series
"Add helper functions to remove repeated code and improve readability
of cgroup writeback"
- Kairui Song has made the swap code a little smaller and a little
faster in the series "mm/swap: clean up and optimize swap cache
index".
- In the series "mm/memory: cleanly support zeropage in
vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
Hildenbrand has reworked the rather sketchy handling of the use of
the zeropage in MAP_SHARED mappings. I don't see any runtime effects
here - more a cleanup/understandability/maintainablity thing.
- Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling
of higher addresses, for aarch64. The (poorly named) series is
"Restructure va_high_addr_switch".
- The core TLB handling code gets some cleanups and possible slight
optimizations in Bang Li's series "Add update_mmu_tlb_range() to
simplify code".
- Jane Chu has improved the handling of our
fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in
the series "Enhance soft hwpoison handling and injection".
- Jeff Johnson has sent a billion patches everywhere to add
MODULE_DESCRIPTION() to everything. Some landed in this pull.
- In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang
has simplified migration's use of hardware-offload memory copying.
- Yosry Ahmed performs more folio API conversions in his series "mm:
zswap: trivial folio conversions".
- In the series "large folios swap-in: handle refault cases first",
Chuanhua Han inches us forward in the handling of large pages in the
swap code. This is a cleanup and optimization, working toward the end
objective of full support of large folio swapin/out.
- In the series "mm,swap: cleanup VMA based swap readahead window
calculation", Huang Ying has contributed some cleanups and a possible
fixlet to his VMA based swap readahead code.
- In the series "add mTHP support for anonymous shmem" Baolin Wang has
taught anonymous shmem mappings to use multisize THP. By default this
is a no-op - users must opt in vis sysfs controls. Dramatic
improvements in pagefault latency are realized.
- David Hildenbrand has some cleanups to our remaining use of
page_mapcount() in the series "fs/proc: move page_mapcount() to
fs/proc/internal.h".
- David also has some highmem accounting cleanups in the series
"mm/highmem: don't track highmem pages manually".
- Build-time fixes and cleanups from John Hubbard in the series
"cleanups, fixes, and progress towards avoiding "make headers"".
- Cleanups and consolidation of the core pagemap handling from Barry
Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
and utilize them".
- Lance Yang's series "Reclaim lazyfree THP without splitting" has
reduced the latency of the reclaim of pmd-mapped THPs under fairly
common circumstances. A 10x speedup is seen in a microbenchmark.
It does this by punting to aother CPU but I guess that's a win unless
all CPUs are pegged.
- hugetlb_cgroup cleanups from Xiu Jianfeng in the series
"mm/hugetlb_cgroup: rework on cftypes".
- Miaohe Lin's series "Some cleanups for memory-failure" does just that
thing.
- Someone other than SeongJae has developed a DAMON feature in Honggyu
Kim's series "DAMON based tiered memory management for CXL memory".
This adds DAMON features which may be used to help determine the
efficiency of our placement of CXL/PCIe attached DRAM.
- DAMON user API centralization and simplificatio work in SeongJae
Park's series "mm/damon: introduce DAMON parameters online commit
function".
- In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
David Hildenbrand does some maintenance work on zsmalloc - partially
modernizing its use of pageframe fields.
- Kefeng Wang provides more folio conversions in the series "mm: remove
page_maybe_dma_pinned() and page_mkclean()".
- More cleanup from David Hildenbrand, this time in the series
"mm/memory_hotplug: use PageOffline() instead of PageReserved() for
!ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline()
pages" and permits the removal of some virtio-mem hacks.
- Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
__folio_add_anon_rmap()" is a cleanup to the anon folio handling in
preparation for mTHP (multisize THP) swapin.
- Kefeng Wang's series "mm: improve clear and copy user folio"
implements more folio conversions, this time in the area of large
folio userspace copying.
- The series "Docs/mm/damon/maintaier-profile: document a mailing tool
and community meetup series" tells people how to get better involved
with other DAMON developers. From SeongJae Park.
- A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
that.
- David Hildenbrand sends along more cleanups, this time against the
migration code. The series is "mm/migrate: move NUMA hinting fault
folio isolation + checks under PTL".
- Jan Kara has found quite a lot of strangenesses and minor errors in
the readahead code. He addresses this in the series "mm: Fix various
readahead quirks".
- SeongJae Park's series "selftests/damon: test DAMOS tried regions and
{min,max}_nr_regions" adds features and addresses errors in DAMON's
self testing code.
- Gavin Shan has found a userspace-triggerable WARN in the pagecache
code. The series "mm/filemap: Limit page cache size to that supported
by xarray" addresses this. The series is marked cc:stable.
- Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
and cleanup" cleans up and slightly optimizes KSM.
- Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
code motion. The series (which also makes the memcg-v1 code
Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put
under config option" and "mm: memcg: put cgroup v1-specific memcg
data under CONFIG_MEMCG_V1"
- Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
adds an additional feature to this cgroup-v2 control file.
- The series "Userspace controls soft-offline pages" from Jiaqi Yan
permits userspace to stop the kernel's automatic treatment of
excessive correctable memory errors. In order to permit userspace to
monitor and handle this situation.
- Kefeng Wang's series "mm: migrate: support poison recover from
migrate folio" teaches the kernel to appropriately handle migration
from poisoned source folios rather than simply panicing.
- SeongJae Park's series "Docs/damon: minor fixups and improvements"
does those things.
- In the series "mm/zsmalloc: change back to per-size_class lock"
Chengming Zhou improves zsmalloc's scalability and memory
utilization.
- Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
pinning memfd folios" makes the GUP code use FOLL_PIN rather than
bare refcount increments. So these paes can first be moved aside if
they reside in the movable zone or a CMA block.
- Andrii Nakryiko has added a binary ioctl()-based API to
/proc/pid/maps for much faster reading of vma information. The series
is "query VMAs from /proc/<pid>/maps".
- In the series "mm: introduce per-order mTHP split counters" Lance
Yang improves the kernel's presentation of developer information
related to multisize THP splitting.
- Michael Ellerman has developed the series "Reimplement huge pages
without hugepd on powerpc (8xx, e500, book3s/64)". This permits
userspace to use all available huge page sizes.
- In the series "revert unconditional slab and page allocator fault
injection calls" Vlastimil Babka removes a performance-affecting and
not very useful feature from slab fault injection.
* tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits)
mm/mglru: fix ineffective protection calculation
mm/zswap: fix a white space issue
mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
mm/hugetlb: fix possible recursive locking detected warning
mm/gup: clear the LRU flag of a page before adding to LRU batch
mm/numa_balancing: teach mpol_to_str about the balancing mode
mm: memcg1: convert charge move flags to unsigned long long
alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting
lib: reuse page_ext_data() to obtain codetag_ref
lib: add missing newline character in the warning message
mm/mglru: fix overshooting shrinker memory
mm/mglru: fix div-by-zero in vmpressure_calc_level()
mm/kmemleak: replace strncpy() with strscpy()
mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC
mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB
mm: ignore data-race in __swap_writepage
hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr
mm: shmem: rename mTHP shmem counters
mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async()
mm/migrate: putback split folios when numa hint migration fails
...
The p_align values in PT_LOAD were ignored for static PIE executables
(i.e. ET_DYN without PT_INTERP). This is because there is no way to
request a non-fixed mmap region with a specific alignment. ET_DYN with
PT_INTERP uses a separate base address (ELF_ET_DYN_BASE) and binfmt_elf
performs the ASLR itself, which means it can also apply alignment. For
the mmap region, the address selection happens deep within the vm_mmap()
implementation (when the requested address is 0).
The earlier attempt to implement this:
commit 9630f0d60f ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE")
commit 925346c129 ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders")
did not take into account the different base address origins, and were
eventually reverted:
aeb7923733 ("revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"")
In order to get the correct alignment from an mmap base, binfmt_elf must
perform a 0-address load first, then tear down the mapping and perform
alignment on the resulting address. Since this is slightly more overhead,
only do this when it is needed (i.e. the alignment is not the default
ELF alignment). This does, however, have the benefit of being able to
use MAP_FIXED_NOREPLACE, to avoid potential collisions.
With this fixed, enable the static PIE self tests again.
Reported-by: H.J. Lu <hjl.tools@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215275
Link: https://lore.kernel.org/r/20240508173149.677910-3-keescook@chromium.org
Signed-off-by: Kees Cook <kees@kernel.org>
After commit 4d1cd3b2c5 ("tools/testing/selftests/exec: fix link
error"), the load address alignment tests tried to build statically.
This was silently ignored in some cases. However, after attempting to
further fix the build by switching to "-static-pie", the test started
failing. This appears to be due to non-PT_INTERP ET_DYN execs ("static
PIE") not doing alignment correctly, which remains unfixed[1]. See commit
aeb7923733 ("revert "fs/binfmt_elf: use PT_LOAD p_align values for
static PIE"") for more details.
Provide rules to build both static and non-static PIE binaries, improve
debug reporting, and perform several test steps instead of a single
all-or-nothing test. However, do not actually enable static-pie tests;
alignment specification is only supported for ET_DYN with PT_INTERP
("regular PIE").
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215275 [1]
Link: https://lore.kernel.org/r/20240508173149.677910-1-keescook@chromium.org
Signed-off-by: Kees Cook <kees@kernel.org>
Use ksft_exit_fail_perror() to print the value of errno and its string
form. This is the first user of the ksft_exit_fail_perror() and proves
the usefulness of this API.
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
It seems some shells linked to /bin/sh don't have consistent behavior
with error codes on execution failures. Explicitly use /bin/bash so that
"not found" errors are correctly generated. Repeating the comment from
the test:
/*
* Execute as a long pathname relative to "/". If this is a script,
* the interpreter will launch but fail to open the script because its
* name ("/dev/fd/5/xxx....") is bigger than PATH_MAX.
*
* The failure code is usually 127 (POSIX: "If a command is not found,
* the exit status shall be 127."), but some systems give 126 (POSIX:
* "If the command name is found, but it is not an executable utility,
* the exit status shall be 126."), so allow either.
*/
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Closes: https://lore.kernel.org/lkml/02c8bf8e-1934-44ab-a886-e065b37366a7@collabora.com/
Signed-off-by: Kees Cook <keescook@chromium.org>
---
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kselftest@vger.kernel.org
Currently the execveat test does not produce KTAP output but rather a
custom format. This means that we only get a pass/fail for the suite, not
for each individual test that the suite does. Convert to using the standard
kselftest output functions which result in KTAP output being generated.
The main trick with this is that, being an exec() related test, the
program executes itself and returns specific exit codes to verify
success meaning that we need to only use the top level kselftest
header/summary functions when invoked directly rather than when run as
part of a test.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Pull Kselftest updates from Shuah Khan:
"Several build and cleanup fixes:
- removing obsolete config options
- removing dependency on internal kernel macros
- adding config options
- several build fixes related to headers and install paths"
* tag 'linux-kselftest-next-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (22 commits)
selftests: Fix build when $(O) points to a relative path
selftests: netfilter: fix a build error on openSUSE
selftests: kvm: add generated file to the .gitignore
selftests/exec: add generated files to .gitignore
selftests: add kselftest_install to .gitignore
selftests/rtc: continuously read RTC in a loop for 30s
selftests/lkdtm: Add UBSAN config
selftests/lkdtm: Remove dead config option
selftests/exec: Rename file binfmt_script to binfmt_script.py
selftests: Use -isystem instead of -I to include headers
selftests: vm: remove dependecy from internal kernel macros
selftests: vm: Add the uapi headers include variable
selftests: mptcp: Add the uapi headers include variable
selftests: net: Add the uapi headers include variable
selftests: landlock: Add the uapi headers include variable
selftests: kvm: Add the uapi headers include variable
selftests: futex: Add the uapi headers include variable
selftests: Correct the headers install path
selftests: Add and export a kernel uapi headers path
selftests: set the BUILD variable to absolute path
...
Pull execve updates from Kees Cook:
"Execve and binfmt updates.
Eric and I have stepped up to be the active maintainers of this area,
so here's our first collection. The bulk of the work was in coredump
handling fixes; additional details are noted below:
- Handle unusual AT_PHDR offsets (Akira Kawata)
- Fix initial mapping size when PT_LOADs are not ordered (Alexey
Dobriyan)
- Move more code under CONFIG_COREDUMP (Alexey Dobriyan)
- Fix missing mmap_lock in file_files_note (Eric W. Biederman)
- Remove a.out support for alpha and m68k (Eric W. Biederman)
- Include first pages of non-exec ELF libraries in coredump (Jann
Horn)
- Don't write past end of notes for regset gap in coredump (Rick
Edgecombe)
- Comment clean-ups (Tom Rix)
- Force single empty string when argv is empty (Kees Cook)
- Add NULL argv selftest (Kees Cook)
- Properly redefine PT_GNU_* in terms of PT_LOOS (Kees Cook)
- MAINTAINERS: Update execve entry with tree (Kees Cook)
- Introduce initial KUnit testing for binfmt_elf (Kees Cook)"
* tag 'execve-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf: Don't write past end of notes for regset gap
a.out: Stop building a.out/osf1 support on alpha and m68k
coredump: Don't compile flat_core_dump when coredumps are disabled
coredump: Use the vma snapshot in fill_files_note
coredump/elf: Pass coredump_params into fill_note_info
coredump: Remove the WARN_ON in dump_vma_snapshot
coredump: Snapshot the vmas in do_coredump
coredump: Move definition of struct coredump_params into coredump.h
binfmt_elf: Introduce KUnit test
ELF: Properly redefine PT_GNU_* in terms of PT_LOOS
MAINTAINERS: Update execve entry with more details
exec: cleanup comments
fs/binfmt_elf: Refactor load_elf_binary function
fs/binfmt_elf: Fix AT_PHDR for unusual ELF files
binfmt: move more stuff undef CONFIG_COREDUMP
selftests/exec: Test for empty string on NULL argv
exec: Force single empty string when argv is empty
coredump: Also dump first pages of non-executable ELF libraries
ELF: fix overflow in total mapping size calculation
non-regular file needs to be compiled and then copied to the output
directory. Remove it from TEST_PROGS and add it to TEST_GEN_PROGS. This
removes error thrown by rsync when non-regular object isn't found:
rsync: [sender] link_stat "/linux/tools/testing/selftests/exec/non-regular" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
Fixes: 0f71241a8e ("selftests/exec: add file type errno tests")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
pipe named FIFO special file is being created in execveat.c to perform
some tests. Makefile doesn't need to do anything with the pipe. When it
isn't found, Makefile generates the following build error:
make: *** No rule to make target
'../tools/testing/selftests/exec/pipe', needed by 'all'. Stop.
pipe is created and removed during test run-time.
Amended change log to add pipe remove info:
Shuah Khan <skhan@linuxfoundation.org>
Fixes: 61016db15b ("selftests/exec: Verify execve of non-regular files fail")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Fix the link error by adding '-static':
gcc -Wall -Wl,-z,max-page-size=0x1000 -pie load_address.c -o /home/yang/linux/tools/testing/selftests/exec/load_address_4096
/usr/bin/ld: /tmp/ccopEGun.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `stderr@@GLIBC_2.17' which may bind externally can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /tmp/ccopEGun.o(.text+0x158): unresolvable R_AARCH64_ADR_PREL_PG_HI21 relocation against symbol `stderr@@GLIBC_2.17'
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
make: *** [Makefile:25: tools/testing/selftests/exec/load_address_4096] Error 1
Link: https://lkml.kernel.org/r/20210514092422.2367367-1-yangyingliang@huawei.com
Fixes: 206e22f019 ("tools/testing/selftests: add self-test for verifying load alignment")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Cc: Chris Kennelly <ckennelly@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull kselftest updates from Shuah Khan:
"This consists of:
- Several fixes from Masami Hiramatsu to improve coverage for lib and
sysctl tests.
- Clean up to vdso test and a new test for getcpu() from Mark Brown.
- Add new gen_tar selftests Makefile target generate selftest package
running "make gen_tar" in selftests directory from Veronika
Kabatova.
- Other miscellaneous fixes to timens, exec, tpm2 tests"
* tag 'linux-kselftest-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/sysctl: Make sysctl test driver as a module
selftests/sysctl: Fix to load test_sysctl module
lib: Make test_sysctl initialized as module
lib: Make prime number generator independently selectable
selftests/ftrace: Return unsupported if no error_log file
selftests/ftrace: Use printf for backslash included command
selftests/timens: handle a case when alarm clocks are not supported
Kernel selftests: Add check if TPM devices are supported
selftests: vdso: Add a selftest for vDSO getcpu()
selftests: vdso: Use a header file to prototype parse_vdso API
selftests: vdso: Rename vdso_test to vdso_test_gettimeofday
selftests/exec: Verify execve of non-regular files fail
selftests: introduce gen_tar Makefile target
Add a named pipe as an exec target to make sure that non-regular
files are rejected by execve() with EACCES. This can help verify
commit 73601ea5b7 ("fs/open.c: allow opening only regular files
during execve()").
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
While working on commit b5372fe5dc ("exec: load_script: Do not exec
truncated interpreter path"), I wrote a series of test scripts to verify
corner cases. However, soon after, commit 6eb3c3d0a5 ("exec: increase
BINPRM_BUF_SIZE to 256") landed, resulting in the tests needing to be
refactored for the larger BINPRM_BUF_SIZE, which got lost on my TODO
list. During the recent exec refactoring work[1], the need for these tests
resurfaced, so I've finished them up for addition to the kernel selftests.
[1] https://lore.kernel.org/lkml/202005191144.E3112135@keescook/
Link: https://lkml.kernel.org/r/202005200204.D07DF079@keescook
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>