Commit Graph

1296163 Commits

Author SHA1 Message Date
Roman Gushchin
f77bd4b14c mm: memcg: don't call propagate_protected_usage() needlessly
Patch series "mm: memcg: page counters optimizations", v3.

This patchset contains 3 independent small optimizations of page counters.


This patch (of 3):

Memory protection (min/low) requires a constant tracking of protected
memory usage.  propagate_protected_usage() is called on each page counters
update and does a number of operations even in cases when the actual
memory protection functionality is not supported (e.g.  hugetlb cgroups or
memcg swap counters).

It's obviously inefficient and leads to a waste of CPU cycles.  It can be
addressed by calling propagate_protected_usage() only for the counters
which do support memory guarantees.  As of now it's only memcg->memory -
the unified memory memcg counter.

Link: https://lkml.kernel.org/r/20240726203110.1577216-2-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:50 -07:00
Kefeng Wang
6c469957cd mm: hugetlb: remove left over comment about follow_huge_foo()
The comment is useless after commit 57a196a584 ("hugetlb: simplify
hugetlb handling in follow_page_mask") since all follow_huge_foo() are
killed.

Link: https://lkml.kernel.org/r/20240725021643.1358536-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:50 -07:00
Pavel Tikhomirov
e0b2fdb352 kmemleak-test: add percpu leak
Add a per-CPU memory leak, which will be reported like:

unreferenced object 0x3efa840195f8 (size 64):
  comm "modprobe", pid 4667, jiffies 4294688677
  hex dump (first 32 bytes on cpu 0):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 0):
    [<ffffffffa7fa87af>] pcpu_alloc+0x3df/0x840
    [<ffffffffc11642d9>] kmemleak_test_init+0x2c9/0x2f0 [kmemleak_test]
    [<ffffffffa7c02264>] do_one_initcall+0x44/0x300
    [<ffffffffa7de9e10>] do_init_module+0x60/0x240
    [<ffffffffa7deb946>] init_module_from_file+0x86/0xc0
    [<ffffffffa7deba99>] idempotent_init_module+0x109/0x2a0
    [<ffffffffa7debd2a>] __x64_sys_finit_module+0x5a/0xb0
    [<ffffffffa88f4f3a>] do_syscall_64+0x7a/0x160
    [<ffffffffa8a0012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e

Link: https://lkml.kernel.org/r/20240725041223.872472-3-ptikhomirov@virtuozzo.com
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Chen Jun <chenjun102@huawei.com>
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:50 -07:00
Pavel Tikhomirov
6c99d4eb7c kmemleak: enable tracking for percpu pointers
Patch series "kmemleak: support for percpu memory leak detect'.

This is a rework of this series:
https://lore.kernel.org/lkml/20200921020007.35803-1-chenjun102@huawei.com/

Originally I was investigating a percpu leak on our customer nodes and
having this functionality was a huge help, which lead to this fix [1].

So probably it's a good idea to have it in mainstream too, especially as
after [2] it became much easier to implement (we already have a separate
tree for percpu pointers).

[1] commit 0af8c09c89 ("netfilter: x_tables: fix percpu counter block leak on error path when creating new netns")
[2] commit 39042079a0 ("kmemleak: avoid RCU stalls when freeing metadata for per-CPU pointers")


This patch (of 2):

This basically does:

- Add min_percpu_addr and max_percpu_addr to filter out unrelated data
  similar to min_addr and max_addr;

- Set min_count for percpu pointers to 1 to start tracking them;

- Calculate checksum of percpu area as xor of crc32 for each cpu;

- Split pointer lookup and update refs code into separate helper and use
  it twice: once as if the pointer is a virtual pointer and once as if
  it's percpu.

[ptikhomirov@virtuozzo.com: v2]
  Link: https://lkml.kernel.org/r/20240731025526.157529-2-ptikhomirov@virtuozzo.com
Link: https://lkml.kernel.org/r/20240725041223.872472-1-ptikhomirov@virtuozzo.com
Link: https://lkml.kernel.org/r/20240725041223.872472-2-ptikhomirov@virtuozzo.com
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Cc: Chen Jun <chenjun102@huawei.com>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:49 -07:00
Pasha Tatashin
fbe76a6557 task_stack: uninline stack_not_used
Given that stack_not_used() is not performance critical function
uninline it.

Link: https://lkml.kernel.org/r/20240730150158.832783-4-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20240724203322.2765486-4-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Li Zhijian <lizhijian@fujitsu.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:49 -07:00
Pasha Tatashin
c4a6fce856 vmstat: kernel stack usage histogram
As part of the dynamic kernel stack project, we need to know the amount of
data that can be saved by reducing the default kernel stack size [1].

Provide a kernel stack usage histogram to aid in optimizing kernel stack
sizes and minimizing memory waste in large-scale environments.  The
histogram divides stack usage into power-of-two buckets and reports the
results in /proc/vmstat.  This information is especially valuable in
environments with millions of machines, where even small optimizations can
have a significant impact.

The histogram data is presented in /proc/vmstat with entries like
"kstack_1k", "kstack_2k", and so on, indicating the number of threads that
exited with stack usage falling within each respective bucket.

Example outputs:
Intel:
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0

ARM with 64K page_size:
$ grep kstack /proc/vmstat
kstack_1k 1
kstack_2k 340
kstack_4k 25212
kstack_8k 1659
kstack_16k 0
kstack_32k 0
kstack_64k 0

Note: once the dynamic kernel stack is implemented it will depend on the
implementation the usability of this feature: On hardware that supports
faults on kernel stacks, we will have other metrics that show the total
number of pages allocated for stacks.  On hardware where faults are not
supported, we will most likely have some optimization where only some
threads are extended, and for those, these metrics will still be very
useful.

[1] https://lwn.net/Articles/974367

Link: https://lkml.kernel.org/r/20240730150158.832783-3-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20240724203322.2765486-3-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Li Zhijian <lizhijian@fujitsu.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:49 -07:00
Shakeel Butt
9db298a439 memcg: increase the valid index range for memcg stats
Patch series "Kernel stack usage histogram", v6.

Provide histogram of stack sizes for the exited threads:
Example outputs:
Intel:
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0

ARM with 64K page_size:
$ grep kstack /proc/vmstat
kstack_1k 1
kstack_2k 340
kstack_4k 25212
kstack_8k 1659
kstack_16k 0
kstack_32k 0
kstack_64k 0


This patch (of 3):

At the moment the valid index for the indirection tables for memcg stats
and events is < S8_MAX.  These indirection tables are used in performance
critical codepaths.  With the latest addition to the vm_events, the
NR_VM_EVENT_ITEMS has gone over S8_MAX.  One way to resolve is to increase
the entry size of the indirection table from int8_t to int16_t but this
will increase the potential number of cachelines needed to access the
indirection table.

This patch took a different approach and make the valid index < U8_MAX. 
In this way the size of the indirection tables will remain same and we
only need to invalid index check from less than 0 to equal to U8_MAX.  In
this approach we have also removed a subtraction from the performance
critical codepaths.

[pasha.tatashin@soleen.com: v6]
Link: https://lkml.kernel.org/r/20240730150158.832783-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20240724203322.2765486-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20240724203322.2765486-2-pasha.tatashin@soleen.com
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Co-developed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Li Zhijian <lizhijian@fujitsu.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:49 -07:00
Zhiguo Jiang
c495b97624 mm: shrink skip folio mapped by an exiting process
The releasing process of the non-shared anonymous folio mapped solely by
an exiting process may go through two flows: 1) the anonymous folio is
firstly is swaped-out into swapspace and transformed into a swp_entry in
shrink_folio_list; 2) then the swp_entry is released in the process
exiting flow.  This will result in the high cpu load of releasing a
non-shared anonymous folio mapped solely by an exiting process.

When the low system memory and the exiting process exist at the same time,
it will be likely to happen, because the non-shared anonymous folio mapped
solely by an exiting process may be reclaimed by shrink_folio_list.

This patch is that shrink skips the non-shared anonymous folio solely
mapped by an exting process and this folio is only released directly in
the process exiting flow, which will save swap-out time and alleviate the
load of the process exiting.

Barry provided some effectiveness testing in [1].  "I observed that
this patch effectively skipped 6114 folios (either 4KB or 64KB mTHP),
potentially reducing the swap-out by up to 92MB (97,300,480 bytes)
during the process exit.  The working set size is 256MB."

Link: https://lkml.kernel.org/r/20240710083641.546-1-justinjiang@vivo.com
Link: https://lore.kernel.org/linux-mm/20240710033212.36497-1-21cnbao@gmail.com/ [1]
Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
Acked-by: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:48 -07:00
Yu Zhao
afb6d780b9 mm/swap: remove boilerplate
Remove boilerplate by using a macro to choose the corresponding lock and
handler for each folio_batch in cpu_fbatches.

[yuzhao@google.com: handle zero-length local_lock_t]
  Link: https://lkml.kernel.org/r/Zq_0X04WsqgUnz30@google.com
[yuzhao@google.com: fix "BUG: using smp_processor_id() in preemptible"]
  Link: https://lkml.kernel.org/r/ZqNHHMiHn-9vy_II@google.com
Link: https://lkml.kernel.org/r/20240711021317.596178-6-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Tested-by: Hugh Dickins <hughd@google.com>
Cc: Barry Song <21cnbao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:48 -07:00
Yu Zhao
bed71b50b0 mm/swap: remove remaining _fn suffix
Remove remaining _fn suffix from cpu_fbatches handlers, which are already
self-explanatory.

Link: https://lkml.kernel.org/r/20240711021317.596178-5-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:48 -07:00
Yu Zhao
2f52c77128 mm/swap: fold lru_rotate into cpu_fbatches
Fold lru_rotate into cpu_fbatches, and rename the folio_batch and the lock
protecting it to lru_move_tail and lock_irq respectively so that all the
boilerplate can be removed at the end of this series.

Also remove data_race() around folio_batch_count(), which is out of place:
all folio_batch_count() calls on remote cpu_fbatches are subject to
data_race(), and therefore data_race() should be inside
folio_batch_count().

Link: https://lkml.kernel.org/r/20240711021317.596178-4-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:48 -07:00
Yu Zhao
380d705493 mm/swap: rename cpu_fbatches->activate
Rename cpu_fbatches->activate to cpu_fbatches->lru_activate, and its
handler folio_activate_fn() to lru_activate() so that all the boilerplate
can be removed at the end of this series.

Link: https://lkml.kernel.org/r/20240711021317.596178-3-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:48 -07:00
Yu Zhao
b03484c2a7 mm/swap: reduce indentation level
Patch series "mm/swap: remove boilerplate".


This patch (of 5):

Use folio_activate() as an example:

Before this series
------------------
    if (!folio_test_active(folio) && !folio_test_unevictable(folio)) {
      struct folio_batch *fbatch;

      folio_get(folio);
      if (!folio_test_clear_lru(folio)) {
        folio_put(folio);
        return;
      }

      local_lock(&cpu_fbatches.lock);
      fbatch = this_cpu_ptr(&cpu_fbatches.activate);
      folio_batch_add_and_move(fbatch, folio, folio_activate_fn);
      local_unlock(&cpu_fbatches.lock);
    }
  }

After this series
-----------------
  void folio_activate(struct folio *folio)
  {
    if (folio_test_active(folio) || folio_test_unevictable(folio))
      return;
  
    folio_batch_add_and_move(folio, lru_activate, true);
  }

And this is applied to all 6 folio_batch handlers in mm/swap.c.

bloat-o-meter
-------------
  add/remove: 12/13 grow/shrink: 3/2 up/down: 4653/-4721 (-68)
  ...
  Total: Before=28083019, After=28082951, chg -0.00%


This patch (of 5):

Reduce indentation level by returning directly when there is no cleanup
needed, i.e.,

  if (condition) {    |    if (condition) {
    do_this();        |      do_this();
    return;           |      return;
  } else {            |    }
    do_that();        |
  }                   |    do_that();

and

  if (condition) {    |    if (!condition)
    do_this();        |      return;
    do_that();        |
  }                   |    do_this();
  return;             |    do_that();

Presumably the old style became repetitive as the result of copy and
paste.

Link: https://lkml.kernel.org/r/20240711021317.596178-1-yuzhao@google.com
Link: https://lkml.kernel.org/r/20240711021317.596178-2-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:47 -07:00
Zi Yan
ac59a1f014 memory tiering: count PGPROMOTE_SUCCESS when mem tiering is enabled.
memory tiering can be enabled/disabled at runtime and
sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING is used to
check it.  In migrate_misplaced_folio(), the check is missing when
PGPROMOTE_SUCCESS is incremented.  Add the missing check.

Link: https://lkml.kernel.org/r/20240724130115.793641-4-ziy@nvidia.com
Fixes: 33024536ba ("memory tiering: hot page selection with hint page fault latency")
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Closes: https://lore.kernel.org/linux-mm/f4ae2c9c-fe40-4807-bdb2-64cf2d716c1a@huawei.com/
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:47 -07:00
Zi Yan
2a28713a67 memory tiering: introduce folio_use_access_time() check
If memory tiering mode is on and a folio is not in the top tier memory,
folio's cpupid field is repurposed to store page access time.  Instead of
an open coded check, use a function to encapsulate the check.

Link: https://lkml.kernel.org/r/20240724130115.793641-3-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:47 -07:00
Zi Yan
3eb2091c65 memory tiering: read last_cpupid correctly in do_huge_pmd_numa_page()
Patch series "Various memory tiering fixes", v3.


This patch (of 3):

last_cpupid is only available when memory tiering is off or the folio is
in toptier node.  Complete the check to read last_cpupid when it is
available.

Before the fix, the default last_cpupid will be used even if memory
tiering mode is turned off at runtime instead of the actual value.  This
can prevent task_numa_fault() from getting right numa fault stats, but
should not cause any crash.  User might see performance changes after the
fix.

Link: https://lkml.kernel.org/r/20240724130115.793641-1-ziy@nvidia.com
Link: https://lkml.kernel.org/r/20240724130115.793641-2-ziy@nvidia.com
Fixes: 33024536ba ("memory tiering: hot page selection with hint page fault latency")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reported-by: David Hildenbrand <david@redhat.com>
Closes: https://lore.kernel.org/linux-mm/9af34a6b-ca56-4a64-8aa6-ade65f109288@redhat.com/
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:47 -07:00
Barry Song
d2539ed7ee mm: extend 'usage' parameter so that cluster_swap_free_nr() can be reused
Extend a usage parameter so that cluster_swap_free_nr() can be reused by
both swapcache_clear() and swap_free().  __swap_entry_free() is quite
similar but more tricky as it requires the return value of
__swap_entry_free_locked() which cluster_swap_free_nr() doesn't support.

Link: https://lkml.kernel.org/r/20240724020056.65838-1-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Chris Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Chuanhua Han <hanchuanhua@oppo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:46 -07:00
Muchun Song
4fd568faf6 mm: kmem: remove mem_cgroup_from_obj()
There is no user of mem_cgroup_from_obj(), remove it.

Link: https://lkml.kernel.org/r/20240718091821.44740-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:46 -07:00
Josef Bacik
dc21e70079 mm: remove foll_flags in __get_user_pages
Now that we're not passing around a pointer to the flags, there's no
reason to have an extra variable for the gup_flags, simply pass the
gup_flags directly everywhere.

Link: https://lkml.kernel.org/r/1e79b84bd30287cc9847f2aeb002374e6e60a10f.1721337845.git.josef@toxicpanda.com
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:46 -07:00
Josef Bacik
478729533e mm: cleanup flags usage in faultin_page
Patch series "mm: some small page fault cleanups".

I was recently wreaking havoc in the page fault code and I noticed some
things that could be cleaned up.  We no longer modify the gup flags in
faultin_page, so we can clean up how we pass the flags in and remove the
extra variable in __get_user_pages.


This patch (of 2):

We're passing a pointer to the foll_flags for faultin_page, however we
never modify the flags in this call.  Change this to just take the flags
value instead.

Link: https://lkml.kernel.org/r/2df51a54c06bdf93e1cb09a19a9ef1df6557b59e.1721337845.git.josef@toxicpanda.com
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:46 -07:00
Peng Hao
c39542732a mm/damon/lru_sort: adjust local variable to dynamic allocation
When KASAN is enabled and built with clang:
    mm/damon/lru_sort.c:199:12: error: stack frame size (2328) exceeds
limit (2048) in 'damon_lru_sort_apply_parameters' [-Werror,-Wframe-larger-than]
    static int damon_lru_sort_apply_parameters(void)
               ^
    1 error generated.

This is because damon_lru_sort_quota contains a large array, and
assigning this variable to a local variable causes a large amount of
stack space to be occupied.

So adjust local variable to dynamic allocation.

Link: https://lkml.kernel.org/r/20240723035513.20153-1-flyingpeng@tencent.com
Signed-off-by: Peng Hao <flyingpeng@tencent.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:45 -07:00
Yu Zhao
c2a967f6ab mm/hugetlb_vmemmap: don't synchronize_rcu() without HVO
hugetlb_vmemmap_optimize_folio() and hugetlb_vmemmap_restore_folio() are
wrappers meant to be called regardless of whether HVO is enabled. 
Therefore, they should not call synchronize_rcu().  Otherwise, it
regresses use cases not enabling HVO.

So move synchronize_rcu() to __hugetlb_vmemmap_optimize_folio() and
__hugetlb_vmemmap_restore_folio(), and call it once for each batch of
folios when HVO is enabled.

Link: https://lkml.kernel.org/r/20240719042503.2752316-1-yuzhao@google.com
Fixes: bd225530a4 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202407091001.1250ad4a-oliver.sang@intel.com
Reported-by: Janosch Frank <frankja@linux.ibm.com>
Tested-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:45 -07:00
Carlos Maiolino
9eace7e8e6 shmem_quota: build the object file conditionally to the config option
Initially I added shmem-quota to obj-y, move it to the correct place and
remove the unneeded full file #ifdef

Link: https://lkml.kernel.org/r/20240717063737.910840-1-cem@kernel.org
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Suggested-by: Aristeu Rozanski <aris@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:45 -07:00
Valdis Kletnieks
fcb4824b26 mm: fix typo in Kconfig
Fix typo in Kconfig help

Link: https://lkml.kernel.org/r/78656.1720853990@turing-police
Fixes: e93d4166b4 ("mm: memcg: put cgroup v1-specific code under a config option")
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:45 -07:00
Baolin Wang
6beeab870e mm: shmem: move shmem_huge_global_enabled() into shmem_allowable_huge_orders()
Move shmem_huge_global_enabled() into shmem_allowable_huge_orders(), so
that shmem_allowable_huge_orders() can also help to find the allowable
huge orders for tmpfs.  Moreover the shmem_huge_global_enabled() can
become static.  While we are at it, passing the vma instead of mm for
shmem_huge_global_enabled() makes code cleaner.

No functional changes.

Link: https://lkml.kernel.org/r/8e825146bb29ee1a1c7bd64d2968ff3e19be7815.1721626645.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01 20:25:44 -07:00