linux

mirror of https://github.com/Dasharo/linux.git synced 2026-03-06 15:25:10 -08:00

Author	SHA1	Message	Date
Dev Jain	5bb6345cd2	mm: remove redundant condition for THP folio folio_test_pmd_mappable() implies folio_test_large(), therefore, simplify the expression for is_thp. Link: https://lkml.kernel.org/r/20241018094151.3458-1-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: "Huang, Ying" <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:16 -08:00
Liam R. Howlett	4b6b0a5188	mm/mremap: remove goto from mremap_to() mremap_to() has a goto label at the end that doesn't unwind anything. Removing the label makes the code cleaner. This commit also adds documentation to the function. Link: https://lkml.kernel.org/r/20241018174114.2871880-3-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Pedro Falcato <pedro.falcato@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:16 -08:00
Liam R. Howlett	58f1069311	mm/mremap: cleanup vma_to_resize() Patch series "mm/mremap: Remove extra vma tree walk", v2. An extra vma tree walk was discovered in some mremap call paths during the discussion on mseal() changes. This patch set removes the extra vma tree walk and further cleans up mremap_to(). This patch (of 2): vma_to_resize() is used in two locations to find and validate the vma for the mremap location. One of the two locations already has the vma, which is then re-found to validate the same vma. This code can be simplified by moving the vma_lookup() from vma_to_resize() to mremap_to() and changing the return type to an int error. Since the function now just validates the vma, the function is renamed to resize_is_valid() to better reflect what it is doing. This commit also adds documentation about the function. Link: https://lkml.kernel.org/r/20241018174114.2871880-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20241018174114.2871880-2-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Pedro Falcato <pedro.falcato@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:16 -08:00
Wei Yang	38dc8f4952	maple_tree: remove sanity check from mas_wr_slot_store() After commit `5d659bbb52` ("maple_tree: introduce mas_wr_store_type()"), the check here is redundant. Let's remove it. Link: https://lkml.kernel.org/r/20241017015809.23392-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:16 -08:00
Wei Yang	61e9df7085	maple_tree: calculate new_end when needed Patch series "Following cleanup after introduce mas_wr_store_type()", v2. Patch 1 postpone new_end calculation when needed. Patch 2 removes a unnecessary sanity check in mas_wr_slot_store(). This patch (of 2): For wr_exact_fit/wr_new_root, we don't need to calculate new_end. Let's postpone it until necessary. Link: https://lkml.kernel.org/r/20241017015809.23392-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20241017015809.23392-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:15 -08:00
Pankaj Raghav	0938b16146	mm: don't set readahead flag on a folio when lookahead_size > nr_to_read The readahead flag is set on a folio based on the lookahead_size and nr_to_read. For example, when the readahead happens from index to index + nr_to_read, then the readahead `mark` offset from index is set at nr_to_read - lookahead_size. There are some scenarios where the lookahead_size > nr_to_read. For example, readahead window was created, but the file was truncated before the readahead starts. do_page_cache_ra() will clamp the nr_to_read if the readahead window extends beyond EOF after truncation. If this happens, readahead flag should not be set on any folio on the current readahead window. The current calculation for `mark` with mapping_min_order > 0 gives incorrect results when lookahead_size > nr_to_read due to rounding up operation: index = 128 nr_to_read = 16 lookahead_size = 28 mapping_min_order = 4 (16 pages) ra_folio_index = round_up(128 + 16 - 28, 16) = 128; mark = 128 - 128 = 0; # offset from index to set RA flag In the above example, the lookahead_size is actually lying outside the current readahead window. Without this patch, RA flag will be set incorrectly on the folio at index 128. This can lead to marking the readahead flag on the wrong folio, therefore, triggering a readahead when it is not necessary. Explicitly initialize `mark` to be ULONG_MAX and only calculate it when lookahead_size is within the readahead window. Link: https://lkml.kernel.org/r/20241017062342.478973-1-kernel@pankajraghav.com Fixes: `26cfdb395e` ("readahead: allocate folios with mapping_min_order in readahead") Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:15 -08:00
Kefeng Wang	4a9a27fdf7	mm: shmem: remove __shmem_huge_global_enabled() Remove __shmem_huge_global_enabled() since it has only one caller, and remove repeated check of VM_NOHUGEPAGE/MMF_DISABLE_THP as they are checked in shmem_allowable_huge_orders(), also remove unnecessary vma parameter. Link: https://lkml.kernel.org/r/20241017141457.1169092-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:15 -08:00
Kefeng Wang	9884efd795	mm: huge_memory: move file_thp_enabled() into huge_memory.c file_thp_enabled() is only used in __thp_vma_allowable_orders(), so move it into huge_memory.c, also check READ_ONLY_THP_FOR_FS ahead to avoid unnecessary code if config disabled. Link: https://lkml.kernel.org/r/20241017141457.1169092-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:15 -08:00
Kefeng Wang	5a90c155de	tmpfs: don't enable large folios if not supported tmpfs can support large folios, but there are some configurable options (mount options and runtime deny/force) to enable/disable large folio allocation, so there is a performance issue when performing writes without large folios. The issue is similar to commit `4e527d5841` ("iomap: fault in smaller chunks for non-large folio mappings"). Since 'deny' is for emergencies and 'force' is for testing, performance issues should not be a problem in real production environments, so don't call mapping_set_large_folios() in __shmem_get_inode() when large folio is disabled with mount huge=never option (default policy). Link: https://lkml.kernel.org/r/20241017141742.1169404-1-wangkefeng.wang@huawei.com Fixes: `9aac777aaf` ("filemap: Convert generic_perform_write() to support large folios") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:15 -08:00
Lorenzo Stoakes	7146de5ff5	tools: testing: fix phys_addr_t size on 64-bit systems The phys_addr_t size is predicated on whether CONFIG_PHYS_ADDR_T_64BIT is set or not. In the VMA tests, virt_to_phys() from tools/include/linux casts a volatile void * pointer to phys_addr_t, if CONFIG_PHYS_ADDR_T_64BIT is not set, this will be 32-bit and trigger a warning. Obviously this might also lead to truncation, which we would rather avoid. Fix this by adjusting the generation of generated/bit-length.h to generate a CONFIG_PHYS_ADDR_T{bits}BIT define. This does result in the generation of the useless CONFIG_PHYS_ADDR_T_32BIT define for 32-bit systems, but this should have no effect, and makes implementation of this easier. This resolves the issue and the warning. [lorenzo.stoakes@oracle.com: VMA tests not properly importing bit-length.h] Link: https://lkml.kernel.org/r/a6183df9-3108-4d59-8128-4fc6c14e22a5@lucifer.local Link: https://lkml.kernel.org/r/20241017165638.95602-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Tested-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:15 -08:00
Wei Xu	f1001f3d3b	mm/mglru: reset page lru tier bits when activating When a folio is activated, lru_gen_add_folio() moves the folio to the youngest generation. But unlike folio_update_gen()/folio_inc_gen(), lru_gen_add_folio() doesn't reset the folio lru tier bits (LRU_REFS_MASK \| LRU_REFS_FLAGS). This inconsistency can affect how pages are aged via folio_mark_accessed() (e.g. fd accesses), though no user visible impact related to this has been detected yet. Note that lru_gen_add_folio() cannot clear PG_workingset if the activation is due to workingset refault, otherwise PSI accounting will be skipped. So fix lru_gen_add_folio() to clear the lru tier bits other than PG_workingset when activating a folio, and also clear all the lru tier bits when a folio is activated via folio_activate() in lru_gen_look_around(). Link: https://lkml.kernel.org/r/20241017181528.3358821-1-weixugc@google.com Fixes: `018ee47f14` ("mm: multi-gen LRU: exploit locality in rmap") Signed-off-by: Wei Xu <weixugc@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Jan Alexander Steffens <heftig@archlinux.org> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:15 -08:00
Thorsten Blum	d3ea85c6c5	mm: swap: use str_true_false() helper function Remove hard-coded strings by using the helper function str_true_false(). Link: https://lkml.kernel.org/r/20241016141040.79168-2-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:14 -08:00
Andy Shevchenko	4a7bba1df0	percpu: add a test case for the specific 64-bit value addition It might be a corner case when we add UINT_MAX as 64-bit unsigned value to the percpu variable as it's not the same as -1 (ULONG_LONG_MAX). Add a test case for that. Link: https://lkml.kernel.org/r/20241016182635.1156168-3-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:14 -08:00
Andy Shevchenko	6c2625e9c2	x86/percpu: fix clang warning when dealing with unsigned types Patch series "percpu: Add a test case and fix for clang", v2. Add a test case to percpu to check a corner case with the specific 64-bit unsigned value. This test case shows why the first patch is done in the way it's done. The before and after has been tested with binary comparison of the percpu_test module and running it on the real Intel system. This patch (of 2): When percpu_add_op() is used with an unsigned argument, it prevents kernel builds with clang, `make W=1` and CONFIG_WERROR=y: net/ipv4/tcp_output.c:187:3: error: result of comparison of constant -1 with expression of type 'u8' (aka 'unsigned char') is always false [-Werror,-Wtautological-constant-out-of-range-compare] 187 \| NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPACKCOMPRESSED, \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 188 \| tp->compressed_ack); \| ~~~~~~~~~~~~~~~~~~~ ... arch/x86/include/asm/percpu.h:238:31: note: expanded from macro 'percpu_add_op' 238 \| ((val) == 1 \|\| (val) == -1)) ? \ \| ~~~~~ ^ ~~ Fix this by casting -1 to the type of the parameter and then compare. Link: https://lkml.kernel.org/r/20241016182635.1156168-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20241016182635.1156168-2-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:14 -08:00
Sabyrzhan Tasbolatov	e4137f0881	mm, kasan, kmsan: instrument copy_from/to_kernel_nofault Instrument copy_from_kernel_nofault() with KMSAN for uninitialized kernel memory check and copy_to_kernel_nofault() with KASAN, KCSAN to detect the memory corruption. syzbot reported that bpf_probe_read_kernel() kernel helper triggered KASAN report via kasan_check_range() which is not the expected behaviour as copy_from_kernel_nofault() is meant to be a non-faulting helper. Solution is, suggested by Marco Elver, to replace KASAN, KCSAN check in copy_from_kernel_nofault() with KMSAN detection of copying uninitialized kernel memory. In copy_to_kernel_nofault() we can retain instrument_write() explicitly for the memory corruption instrumentation. copy_to_kernel_nofault() is tested on x86_64 and arm64 with CONFIG_KASAN_SW_TAGS. On arm64 with CONFIG_KASAN_HW_TAGS, kunit test currently fails. Need more clarification on it. [akpm@linux-foundation.org: fix comment layout, per checkpatch] Link: https://lore.kernel.org/linux-mm/CANpmjNMAVFzqnCZhEity9cjiqQ9CVN1X7qeeeAp_6yKjwKo8iw@mail.gmail.com/ Link: https://lkml.kernel.org/r/20241011035310.2982017-1-snovitoll@gmail.com Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com> Reviewed-by: Marco Elver <elver@google.com> Reported-by: syzbot+61123a5daeb9f7454599@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=61123a5daeb9f7454599 Reported-by: Andrey Konovalov <andreyknvl@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=210505 Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> [KASAN] Tested-by: Andrey Konovalov <andreyknvl@gmail.com> [KASAN] Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:14 -08:00
Wei Yang	908378a30b	maple_tree: simplify mas_push_node() When count is not 0, we know head is valid. So we can put the assignment in if (count) instead of checking the head pointer again. Also count represents current total, we can assign the new total by increasing the count by one. Link: https://lkml.kernel.org/r/20241015120746.15850-4-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:14 -08:00
Wei Yang	4223dd93bf	maple_tree: total is not changed for nomem_one case If it jumps to nomem_one, the total allocated number is not changed. So we don't need to adjust it. For the nomem_bulk case, we know there is a valid mas->alloc. So we don't need to do the check. Link: https://lkml.kernel.org/r/20241015120746.15850-3-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:14 -08:00
Wei Yang	e852cb1d00	maple_tree: clear request_count for new allocated one Patch series "maple_tree: simplify mas_push_node()", v2. When count is not 0, we know head is valid. So we can put the assignment in if (count) instead of checking the head pointer again. Also count represents current total, we can assign the new total by increasing the count by one. This patch (of 3): If this is not a new allocated one, the request_count has already been cleared in mas_set_alloc_req(). Link: https://lkml.kernel.org/r/20241015120746.15850-1-richard.weiyang@gmail.com Link: https://lkml.kernel.org/r/20241015120746.15850-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:14 -08:00
Wei Yang	0cc8d68abe	maple_tree: root node could be handled by !p_slot too For a root node, mte_parent_slot() return 0, this exactly fits the following !p_slot check. So we can remove the special handling for root node. Link: https://lkml.kernel.org/r/20240913063128.27391-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:13 -08:00
Jiazi Li	0f85eb3395	maple_tree: add some alloc node test case Add some maple_tree alloc node test cases. Link: https://lkml.kernel.org/r/20240626160631.3636515-2-Liam.Howlett@oracle.com Signed-off-by: Jiazi Li <jqqlijiazi@gmail.com> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Suggested-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:13 -08:00
Jiazi Li	5b2100f723	maple_tree: fix alloc node fail issue In the following code, the second call to the mas_node_count will return -ENOMEM: mas_node_count(mas, MAPLE_ALLOC_SLOTS + 1); mas_node_count(mas, MAPLE_ALLOC_SLOTS * 2 + 2); This is because there may be some full maple_alloc node in current maple state. Use full maple_alloc node will make max_req equal to 0. And it leads to mt_alloc_bulk return 0. As a result, mas_node_count set mas.node to MA_ERROR(-ENOMEM). Find a non-full maple_alloc node, and if necessary, use this non-full node in the next while loop. Link: https://lkml.kernel.org/r/20240626160631.3636515-1-Liam.Howlett@oracle.com Fixes: `54a611b605` ("Maple Tree: add new data structure") Signed-off-by: Jiazi Li <jqqlijiazi@gmail.com> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Suggested-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:13 -08:00
Saurabh Sengar	f69c2e4dc6	mm/vmstat: defer the refresh_zone_stat_thresholds after all CPUs bringup refresh_zone_stat_thresholds function has two loops which is expensive for higher number of CPUs and NUMA nodes. Below is the rough estimation of total iterations done by these loops based on number of NUMA and CPUs. Total number of iterations: nCPU * 2 * Numa * mCPU Where: nCPU = total number of CPUs Numa = total number of NUMA nodes mCPU = mean value of total CPUs (e.g., 512 for 1024 total CPUs) For the system under test with 16 NUMA nodes and 1024 CPUs, this results in a substantial increase in the number of loop iterations during boot-up when NUMA is enabled: No NUMA = 1024 * 2 * 1 * 512 = 1,048,576 : Here refresh_zone_stat_thresholds takes around 224 ms total for all the CPUs in the system under test. 16 NUMA = 1024 * 2 * 16 * 512 = 16,777,216 : Here refresh_zone_stat_thresholds takes around 4.5 seconds total for all the CPUs in the system under test. Calling this for each CPU is expensive when there are large number of CPUs along with multiple NUMAs. Fix this by deferring refresh_zone_stat_thresholds to be called later at once when all the secondary CPUs are up. Also, register the DYN hooks to keep the existing hotplug functionality intact. Link: https://lkml.kernel.org/r/1723443220-20623-1-git-send-email-ssengar@linux.microsoft.com Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Acked-by: Christoph Lameter <cl@linux.com> Reviewed-by: Srivatsa S. Bhat (Microsoft) <srivatsa@csail.mit.edu> Cc: Saurabh Singh Sengar <ssengar@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:13 -08:00
Jaewon Kim	1f2d03cc53	vmscan: add a vmscan event for reclaim_pages reclaim_folio_list uses a dummy reclaim_stat and is not being used. To know the memory stat, add a new trace event. This is useful to know how many pages are not reclaimed or why. This is an example: mm_vmscan_reclaim_pages: nid=0 nr_scanned=112 nr_reclaimed=112 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0 nr_ref_keep=0 nr_unmap_fail=0 Currently reclaim_folio_list is only called by reclaim_pages, and reclaim_pages is used by damon and madvise. In the latest Android, reclaim_pages is also used by shmem to reclaim all pages in an address_space. [jaewon31.kim@samsung.com: use sc.nr_scanned rather than new counting] Link: https://lkml.kernel.org/r/20241016143227.961162-1-jaewon31.kim@samsung.com Link: https://lkml.kernel.org/r/20241011124928.1224813-1-jaewon31.kim@samsung.com Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jaewon Kim <jaewon31.kim@samsung.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:13 -08:00
Zi Yan	5708d96da2	mm: avoid zeroing user movable page twice with init_on_alloc=1 Commit `6471384af2` ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options") forces allocated page to be zeroed in post_alloc_hook() when init_on_alloc=1. For order-0 folios, if arch does not define vma_alloc_zeroed_movable_folio(), the default implementation again zeros the page return from the buddy allocator. So the page is zeroed twice. Fix it by passing __GFP_ZERO instead to avoid double page zeroing. At the moment, s390,arm64,x86,alpha,m68k are not impacted since they define their own vma_alloc_zeroed_movable_folio(). For >0 order folios (mTHP and PMD THP), folio_zero_user() is called to zero the folio again. Fix it by calling folio_zero_user() only if init_on_alloc is set. All arch are impacted. Add alloc_zeroed() helper to encapsulate the init_on_alloc check. [ziy@nvidia.com: comment fixes, per David] Link: https://lkml.kernel.org/r/97DB52E1-C594-49B5-9736-89AC302FAB01@nvidia.com Link: https://lkml.kernel.org/r/20241011150304.709590-1-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:13 -08:00
Kairui Song	773ee2cda5	mm/zswap: avoid touching XArray for unnecessary invalidation zswap_invalidation simply calls xa_erase, which acquires the Xarray lock first, then does a look up. This has a higher overhead even if zswap is not used or the tree is empty. So instead, do a very lightweight xa_empty check first, if there is nothing to erase, don't touch the lock or the tree. Using xa_empty rather than zswap_never_enabled is more helpful as it covers both cases where zswap was never used or the particular range doesn't have any zswap entry. And it's safe as the swap slot should be currently pinned by caller with HAS_CACHE. Sequential SWAP in/out tests with zswap disabled showed a minor performance gain, SWAP in of zero page with zswap enabled also showed a performance gain. (swapout is basically unchanged so only test one case): Swapout of 2G zero page using brd as SWAP, zswap disabled (total time, 4 testrun, +0.1%): Before: 1705013 us 1703119 us 1704335 us 1705848 us. After: 1703579 us 1710640 us 1703625 us 1708699 us. Swapin of 2G zero page using brd as SWAP, zswap disabled (total time, 4 testrun, -3.5%): Before: 1912312 us 1915692 us 1905837 us 1912706 us. After: 1845354 us 1849691 us 1845868 us 1841828 us. Swapin of 2G zero page using brd as SWAP, zswap enabled (total time, 4 testrun, -3.3%): Before: 1897994 us 1894681 us 1899982 us 1898333 us After: 1835894 us 1834113 us 1832047 us 1833125 us Swapin of 2G random page using brd as SWAP, zswap enabled (total time, 4 testrun, -0.1%): Before: 4519747 us 4431078 us 4430185 us 4439999 us After: 4492176 us 4437796 us 4434612 us 4434289 us And the performance is very slightly better or unchanged for build kernel test with zswap enabled or disabled. 
Build Linux Kernel with defconfig and -j32 in 1G memory cgroup, using brd SWAP, zswap disabled (sys time in seconds, 6 testrun, -0.1%): Before: 1648.83 1653.52 1666.34 1665.95 1663.06 1656.67 After: 1651.36 1661.89 1645.70 1657.45 1662.07 1652.83 Build Linux Kernel with defconfig and -j32 in 2G memory cgroup, using brd SWAP zswap enabled (sys time in seconds, 6 testrun, -0.3%): Before: 1240.25 1254.06 1246.77 1265.92 1244.23 1227.74 After: 1226.41 1218.21 1249.12 1249.13 1244.39 1233.01 Link: https://lkml.kernel.org/r/20241011171950.62684-1-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Acked-by: Yosry Ahmed <yosryahmed@google.com> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Chris Li <chrisl@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-11-06 20:11:13 -08:00
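
The `mark` arithmetic described in commit 0938b16146 can be checked in user space. The following is a minimal sketch, not the kernel code: the helper names `round_up_to`, `mark_unguarded` and `mark_guarded` are hypothetical, and the exact comparison used for the guard (`lookahead_size <= nr_to_read`) is an assumption based on the commit message's wording "only calculate it when lookahead_size is within the readahead window".

```c
#include <limits.h>

/* Round x up to the next multiple of align (align must be > 0). */
static inline unsigned long round_up_to(unsigned long x, unsigned long align)
{
	return ((x + align - 1) / align) * align;
}

/*
 * Calculation as described for the pre-patch behaviour: the mark offset is
 * always computed, even when lookahead_size exceeds nr_to_read.
 */
static inline unsigned long mark_unguarded(unsigned long index,
					   unsigned long nr_to_read,
					   unsigned long lookahead_size,
					   unsigned long min_nrpages)
{
	unsigned long ra_folio_index =
		round_up_to(index + nr_to_read - lookahead_size, min_nrpages);

	return ra_folio_index - index;
}

/*
 * Guarded variant (assumed shape of the fix): mark starts as ULONG_MAX,
 * meaning "no folio gets the readahead flag", and is only computed when
 * the lookahead fits inside the readahead window.
 */
static inline unsigned long mark_guarded(unsigned long index,
					 unsigned long nr_to_read,
					 unsigned long lookahead_size,
					 unsigned long min_nrpages)
{
	unsigned long mark = ULONG_MAX;

	if (lookahead_size <= nr_to_read)
		mark = mark_unguarded(index, nr_to_read, lookahead_size,
				      min_nrpages);

	return mark;
}
```

With the values from the commit message (index = 128, nr_to_read = 16, lookahead_size = 28, 16 pages per minimum-order folio), `mark_unguarded()` returns 0, which would flag the folio at index 128, while `mark_guarded()` returns `ULONG_MAX`, so no offset in the window matches.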


1310816 Commits
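
The comparison change described in commit 6c2625e9c2 ("casting -1 to the type of the parameter and then compare") can be illustrated outside the kernel. This is a sketch under assumptions: the macro names `IS_ONE_OR_MINUS_ONE_OLD` and `IS_ONE_OR_MINUS_ONE_NEW` are hypothetical stand-ins for the expression inside `percpu_add_op()`, and `__typeof__` is a GCC/clang extension rather than standard C.

```c
#include <stdint.h>

/*
 * Shape of the original check: for a narrow unsigned type such as u8 the
 * value is promoted to int, so it can never equal -1, and clang reports
 * -Wtautological-constant-out-of-range-compare.
 */
#define IS_ONE_OR_MINUS_ONE_OLD(val) \
	((val) == 1 || (val) == -1)

/*
 * Shape of the fix as the commit message describes it: cast -1 to the
 * type of the argument first, so both sides of the comparison have the
 * same type and range.
 */
#define IS_ONE_OR_MINUS_ONE_NEW(val) \
	((val) == 1 || (val) == (__typeof__(val))-1)
```

For a signed `int` holding -1 both forms agree. For a `uint64_t` holding `UINT32_MAX` the new form is false, since `(uint64_t)-1` is all 64 bits set; this matches the corner case that commit 4a7bba1df0 adds a test for, where UINT_MAX added as a 64-bit unsigned value is not the same as -1.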