mirror of
https://github.com/armbian/linux-cix.git
synced 2026-01-06 12:30:45 -08:00
Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Nick Piggin's "shoot lazy tlbs" series, to improve the performance of
switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the
alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
- removal of most of the callers of write_one_page()
- make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap
backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing
some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its
operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultfd,
permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
rather than exclusive, and has fixed a bunch of errors which were
caused by its unintuitive meaning.
- Axel Rasmussen gives userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge():
cleanups to the kernel code and improvements to our userspace test
harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various
mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removed vmf_insert_mixed_prot(), which was added for
DRM, but DRM doesn't use it any more.
- Lorenzo has also converted read_kcore() and vread() to use iterators
and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping
locks in ->map_pages(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to
per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it
no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a
chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics
flushing.
- David Stevens has fixed several issues involving khugepaged,
userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related
code paths.
- David Hildenbrand has fixed up some issues in the selftest code's
testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd
selftests.
- Yosry Ahmed has fixed an issue around memcg's page reclaim
accounting.
- Chaitanya Prakash has fixed some arm-related issues in the
selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned
pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a
per-process and per-cgroup basis.
* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
shmem: restrict noswap option to initial user namespace
mm/khugepaged: fix conflicting mods to collapse_file()
sparse: remove unnecessary 0 values from rc
mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
maple_tree: fix allocation in mas_sparse_area()
mm: do not increment pgfault stats when page fault handler retries
zsmalloc: allow only one active pool compaction context
selftests/mm: add new selftests for KSM
mm: add new KSM process and sysfs knobs
mm: add new api to enable ksm per process
mm: shrinkers: fix debugfs file permissions
mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
migrate_pages_batch: fix statistics for longterm pin retry
userfaultfd: use helper function range_in_vma()
lib/show_mem.c: use for_each_populated_zone() simplify code
mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
fs/buffer: convert create_page_buffers to folio_create_buffers
fs/buffer: add folio_create_empty_buffers helper
...
@@ -51,3 +51,11 @@ Description: Control merging pages across different NUMA nodes.
 
 When it is set to 0 only pages from the same node are merged,
 otherwise pages from all nodes can be merged together (default).
+
+What: /sys/kernel/mm/ksm/general_profit
+Date: April 2023
+KernelVersion: 6.4
+Contact: Linux memory management mailing list <linux-mm@kvack.org>
+Description: Measure how effective KSM is.
+general_profit: how effective is KSM. The formula for the
+calculation is in Documentation/admin-guide/mm/ksm.rst.
@@ -172,7 +172,7 @@ variables.
 Offset of the free_list's member. This value is used to compute the number
 of free pages.
 
-Each zone has a free_area structure array called free_area[MAX_ORDER].
+Each zone has a free_area structure array called free_area[MAX_ORDER + 1].
 The free_list represents a linked list of free page blocks.
 
 (list_head, next|prev)
@@ -189,8 +189,8 @@ Offsets of the vmap_area's members. They carry vmalloc-specific
 information. Makedumpfile gets the start address of the vmalloc region
 from this.
 
-(zone.free_area, MAX_ORDER)
----------------------------
+(zone.free_area, MAX_ORDER + 1)
+-------------------------------
 
 Free areas descriptor. User-space tools use this value to iterate the
 free_area ranges. MAX_ORDER is used by the zone buddy allocator.
@@ -4012,7 +4012,7 @@
 [KNL] Minimal page reporting order
 Format: <integer>
 Adjust the minimal page reporting order. The page
-reporting is disabled when it exceeds (MAX_ORDER-1).
+reporting is disabled when it exceeds MAX_ORDER.
 
 panic= [KNL] Kernel behaviour on panic: delay <timeout>
 timeout > 0: seconds before rebooting
@@ -157,6 +157,8 @@ stable_node_chains_prune_millisecs
 
 The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:
 
+general_profit
+how effective is KSM. The calculation is explained below.
 pages_shared
 how many shared pages are being used
 pages_sharing
@@ -207,7 +209,8 @@ several times, which are unprofitable memory consumed.
 ksm_rmap_items * sizeof(rmap_item).
 
 where ksm_merging_pages is shown under the directory ``/proc/<pid>/``,
-and ksm_rmap_items is shown in ``/proc/<pid>/ksm_stat``.
+and ksm_rmap_items is shown in ``/proc/<pid>/ksm_stat``. The process profit
+is also shown in ``/proc/<pid>/ksm_stat`` as ksm_process_profit.
 
 From the perspective of application, a high ratio of ``ksm_rmap_items`` to
 ``ksm_merging_pages`` means a bad madvise-applied policy, so developers or
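The per-process profit referred to in the hunk above is, per the kernel's ksm.rst, roughly ksm_merging_pages * sizeof(page) - ksm_rmap_items * sizeof(rmap_item). The following is a minimal user-space sketch of that arithmetic, not the kernel's implementation. It assumes that ``/proc/<pid>/ksm_stat`` consists of whitespace-separated "name value" lines, a 4096-byte page, and a 64-byte rmap_item (the real size depends on the kernel build). The helper names `parse_ksm_stat` and `process_profit` and the sample numbers are hypothetical.

```python
def parse_ksm_stat(text):
    """Parse 'name value' lines (assumed format) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            # Tolerate an optional trailing colon after the field name.
            stats[parts[0].rstrip(":")] = int(parts[1])
    return stats


def process_profit(stats, page_size=4096, rmap_item_size=64):
    """Approximate KSM profit in bytes for one process.

    page_size and rmap_item_size are assumptions; adjust them for the
    target system.
    """
    saved = stats["ksm_merging_pages"] * page_size
    overhead = stats["ksm_rmap_items"] * rmap_item_size
    return saved - overhead


# Made-up sample, not output captured from a real system.
SAMPLE = "ksm_rmap_items 1200\nksm_merging_pages 1000\n"

if __name__ == "__main__":
    print(process_profit(parse_ksm_stat(SAMPLE)))  # 4019200
```

On a kernel that carries this change, the same figure should be readable directly as ksm_process_profit, so the sketch is mainly useful for understanding what the number means.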
@@ -219,6 +219,31 @@ former will have ``UFFD_PAGEFAULT_FLAG_WP`` set, the latter
 you still need to supply a page when ``UFFDIO_REGISTER_MODE_MISSING`` was
 used.
 
+Userfaultfd write-protect mode currently behave differently on none ptes
+(when e.g. page is missing) over different types of memories.
+
+For anonymous memory, ``ioctl(UFFDIO_WRITEPROTECT)`` will ignore none ptes
+(e.g. when pages are missing and not populated). For file-backed memories
+like shmem and hugetlbfs, none ptes will be write protected just like a
+present pte. In other words, there will be a userfaultfd write fault
+message generated when writing to a missing page on file typed memories,
+as long as the page range was write-protected before. Such a message will
+not be generated on anonymous memories by default.
+
+If the application wants to be able to write protect none ptes on anonymous
+memory, one can pre-populate the memory with e.g. MADV_POPULATE_READ. On
+newer kernels, one can also detect the feature UFFD_FEATURE_WP_UNPOPULATED
+and set the feature bit in advance to make sure none ptes will also be
+write protected even upon anonymous memory.
+
+When using ``UFFDIO_REGISTER_MODE_WP`` in combination with either
+``UFFDIO_REGISTER_MODE_MISSING`` or ``UFFDIO_REGISTER_MODE_MINOR``, when
+resolving missing / minor faults with ``UFFDIO_COPY`` or ``UFFDIO_CONTINUE``
+respectively, it may be desirable for the new page / mapping to be
+write-protected (so future writes will also result in a WP fault). These ioctls
+support a mode flag (``UFFDIO_COPY_MODE_WP`` or ``UFFDIO_CONTINUE_MODE_WP``
+respectively) to configure the mapping this way.
+
 QEMU/KVM
 ========
 
@@ -575,20 +575,26 @@ The field width is passed by value, the bitmap is passed by reference.
 Helper macros cpumask_pr_args() and nodemask_pr_args() are available to ease
 printing cpumask and nodemask.
 
-Flags bitfields such as page flags, gfp_flags
----------------------------------------------
+Flags bitfields such as page flags, page_type, gfp_flags
+--------------------------------------------------------
 
 ::
 
 %pGp 0x17ffffc0002036(referenced|uptodate|lru|active|private|node=0|zone=2|lastcpupid=0x1fffff)
+%pGt 0xffffff7f(buddy)
 %pGg GFP_USER|GFP_DMA32|GFP_NOWARN
 %pGv read|exec|mayread|maywrite|mayexec|denywrite
 
 For printing flags bitfields as a collection of symbolic constants that
 would construct the value. The type of flags is given by the third
-character. Currently supported are [p]age flags, [v]ma_flags (both
-expect ``unsigned long *``) and [g]fp_flags (expects ``gfp_t *``). The flag
-names and print order depends on the particular type.
+character. Currently supported are:
+
+- p - [p]age flags, expects value of type (``unsigned long *``)
+- t - page [t]ype, expects value of type (``unsigned int *``)
+- v - [v]ma_flags, expects value of type (``unsigned long *``)
+- g - [g]fp_flags, expects value of type (``gfp_t *``)
+
+The flag names and print order depends on the particular type.
 
 Note that this format should not be used directly in the
 :c:func:`TP_printk()` part of a tracepoint. Instead, use the show_*_flags()
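The idea behind the %pG family in the hunk above is to print a bitfield as the symbolic constants that would construct the value. The following is a minimal sketch of that idea in Python, not the kernel's vsprintf code. The function name `format_flags`, the flag table, and the choice to show leftover bits in hex are all assumptions made for illustration.

```python
def format_flags(value, names):
    """Render a bitfield as 'a|b|c' using (mask, name) pairs in print order.

    Bits that have no entry in the table are appended in hex; that is a
    choice of this sketch.
    """
    parts = []
    remaining = value
    for mask, name in names:
        if value & mask:
            parts.append(name)
            remaining &= ~mask
    if remaining:
        parts.append(hex(remaining))
    return "|".join(parts) if parts else "0"


# Illustrative table only; the bit assignments here are assumptions.
VMA_FLAG_NAMES = [
    (0x1, "read"),
    (0x2, "write"),
    (0x4, "exec"),
]

if __name__ == "__main__":
    print(format_flags(0x5, VMA_FLAG_NAMES))  # read|exec
```

Keeping the table as an ordered list rather than a dict mirrors the remark in the text that the print order depends on the particular flag type.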
@@ -645,7 +645,7 @@ ops mmap_lock PageLocked(page)
 open: yes
 close: yes
 fault: yes can return with page locked
-map_pages: yes
+map_pages: read
 page_mkwrite: yes can return with page locked
 pfn_mkwrite: yes
 access: yes
@@ -661,7 +661,7 @@ locked. The VM will unlock the page.
 
 ->map_pages() is called when VM asks to map easy accessible pages.
 Filesystem should find and map pages associated with offsets from "start_pgoff"
-till "end_pgoff". ->map_pages() is called with page table locked and must
+till "end_pgoff". ->map_pages() is called with the RCU lock held and must
 not block. If it's not possible to reach a page without blocking,
 filesystem should skip it. Filesystem should use do_set_pte() to setup
 page table entry. Pointer to entry associated with the page is passed in
@@ -996,6 +996,7 @@ Example output. You may not have all of these fields.
 VmallocUsed: 40444 kB
 VmallocChunk: 0 kB
 Percpu: 29312 kB
+EarlyMemtestBad: 0 kB
 HardwareCorrupted: 0 kB
 AnonHugePages: 4149248 kB
 ShmemHugePages: 0 kB
@@ -1146,6 +1147,13 @@ VmallocChunk
 Percpu
 Memory allocated to the percpu allocator used to back percpu
 allocations. This stat excludes the cost of metadata.
+EarlyMemtestBad
+The amount of RAM/memory in kB, that was identified as corrupted
+by early memtest. If memtest was not run, this field will not
+be displayed at all. Size is never rounded down to 0 kB.
+That means if 0 kB is reported, you can safely assume
+there was at least one pass of memtest and none of the passes
+found a single faulty byte of RAM.
 HardwareCorrupted
 The amount of RAM/memory in KB, the kernel identifies as
 corrupted.
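The EarlyMemtestBad description above distinguishes three cases: the field is absent (memtest did not run), it is 0 kB (memtest ran and found nothing), or it is positive. The following is a minimal Python sketch of how a tool might tell these apart. The helper names `parse_meminfo` and `early_memtest_status` are hypothetical, and the sample text is adapted from the example output in the previous hunk rather than read from a live system.

```python
def parse_meminfo(text):
    """Parse 'Name: <number> kB' lines into a dict of integer values."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        tokens = rest.split()
        if tokens and tokens[0].isdigit():
            fields[name.strip()] = int(tokens[0])
    return fields


def early_memtest_status(fields):
    """Classify the EarlyMemtestBad field as described in the text."""
    bad_kb = fields.get("EarlyMemtestBad")
    if bad_kb is None:
        return "memtest not run"
    if bad_kb == 0:
        return "memtest ran, no bad memory found"
    return f"memtest found {bad_kb} kB of bad memory"


SAMPLE = "Percpu: 29312 kB\nEarlyMemtestBad: 0 kB\nHardwareCorrupted: 0 kB\n"

if __name__ == "__main__":
    print(early_memtest_status(parse_meminfo(SAMPLE)))
```

To inspect a real system, pass the contents of /proc/meminfo to `parse_meminfo` instead of the sample string.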
@@ -13,17 +13,29 @@ everything stored therein is lost.
 
 tmpfs puts everything into the kernel internal caches and grows and
 shrinks to accommodate the files it contains and is able to swap
-unneeded pages out to swap space. It has maximum size limits which can
-be adjusted on the fly via 'mount -o remount ...'
+unneeded pages out to swap space, if swap was enabled for the tmpfs
+mount. tmpfs also supports THP.
 
-If you compare it to ramfs (which was the template to create tmpfs)
-you gain swapping and limit checking. Another similar thing is the RAM
-disk (/dev/ram*), which simulates a fixed size hard disk in physical
-RAM, where you have to create an ordinary filesystem on top. Ramdisks
-cannot swap and you do not have the possibility to resize them.
+tmpfs extends ramfs with a few userspace configurable options listed and
+explained further below, some of which can be reconfigured dynamically on the
+fly using a remount ('mount -o remount ...') of the filesystem. A tmpfs
+filesystem can be resized but it cannot be resized to a size below its current
+usage. tmpfs also supports POSIX ACLs, and extended attributes for the
+trusted.* and security.* namespaces. ramfs does not use swap and you cannot
+modify any parameter for a ramfs filesystem. The size limit of a ramfs
+filesystem is how much memory you have available, and so care must be taken if
+used so to not run out of memory.
 
-Since tmpfs lives completely in the page cache and on swap, all tmpfs
-pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
+An alternative to tmpfs and ramfs is to use brd to create RAM disks
+(/dev/ram*), which allows you to simulate a block device disk in physical RAM.
+To write data you would just then need to create an regular filesystem on top
+this ramdisk. As with ramfs, brd ramdisks cannot swap. brd ramdisks are also
+configured in size at initialization and you cannot dynamically resize them.
+Contrary to brd ramdisks, tmpfs has its own filesystem, it does not rely on the
+block layer at all.
+
+Since tmpfs lives completely in the page cache and optionally on swap,
+all tmpfs pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
 free(1). Notice that these counters also include shared memory
 (shmem, see ipcs(1)). The most reliable way to get the count is
 using df(1) and du(1).
@@ -72,6 +84,8 @@ nr_inodes The maximum number of inodes for this instance. The default
 is half of the number of your physical RAM pages, or (on a
 machine with highmem) the number of lowmem RAM pages,
 whichever is the lower.
+noswap Disables swap. Remounts must respect the original settings.
+By default swap is enabled.
 ========= ============================================================
 
 These parameters accept a suffix k, m or g for kilo, mega and giga and
@@ -85,6 +99,36 @@ mount with such options, since it allows any user with write access to
 use up all the memory on the machine; but enhances the scalability of
 that instance in a system with many CPUs making intensive use of it.
 
+tmpfs also supports Transparent Huge Pages which requires a kernel
+configured with CONFIG_TRANSPARENT_HUGEPAGE and with huge supported for
+your system (has_transparent_hugepage(), which is architecture specific).
+The mount options for this are:
+
+====== ============================================================
+huge=0 never: disables huge pages for the mount
+huge=1 always: enables huge pages for the mount
+huge=2 within_size: only allocate huge pages if the page will be
+fully within i_size, also respect fadvise()/madvise() hints.
+huge=3 advise: only allocate huge pages if requested with
+fadvise()/madvise()
+====== ============================================================
+
+There is a sysfs file which you can also use to control system wide THP
+configuration for all tmpfs mounts, the file is:
+
+/sys/kernel/mm/transparent_hugepage/shmem_enabled
+
+This sysfs file is placed on top of THP sysfs directory and so is registered
+by THP code. It is however only used to control all tmpfs mounts with one
+single knob. Since it controls all tmpfs mounts it should only be used either
+for emergency or testing purposes. The values you can set for shmem_enabled are:
+
+== ============================================================
+-1 deny: disables huge on shm_mnt and all mounts, for
+emergency use
+-2 force: enables huge on shm_mnt and all mounts, w/o needing
+option, for testing
+== ============================================================
 
 tmpfs has a mount option to set the NUMA memory allocation policy for
 all files in that instance (if CONFIG_NUMA is enabled) - which can be
@@ -2,6 +2,12 @@
 Active MM
 =========
 
+Note, the mm_count refcount may no longer include the "lazy" users
+(running tasks with ->active_mm == mm && ->mm == NULL) on kernels
+with CONFIG_MMU_LAZY_TLB_REFCOUNT=n. Taking and releasing these lazy
+references must be done with mmgrab_lazy_tlb() and mmdrop_lazy_tlb()
+helpers, which abstract this config option.
+
 ::
 
 List: linux-kernel
@@ -214,7 +214,7 @@ HugeTLB Page Table Helpers
 +---------------------------+--------------------------------------------------+
 | pte_huge | Tests a HugeTLB |
 +---------------------------+--------------------------------------------------+
-| pte_mkhuge | Creates a HugeTLB |
+| arch_make_huge_pte | Creates a HugeTLB |
 +---------------------------+--------------------------------------------------+
 | huge_pte_dirty | Tests a dirty HugeTLB |
 +---------------------------+--------------------------------------------------+
@@ -103,7 +103,8 @@ moving across tiers only involves atomic operations on
 ``folio->flags`` and therefore has a negligible cost. A feedback loop
 modeled after the PID controller monitors refaults over all the tiers
 from anon and file types and decides which tiers from which types to
-evict or protect.
+evict or protect. The desired effect is to balance refault percentages
+between anon and file types proportional to the swappiness level.
 
 There are two conceptually independent procedures: the aging and the
 eviction. They form a closed-loop system, i.e., the page reclaim.
@@ -156,6 +157,27 @@ This time-based approach has the following advantages:
 and memory sizes.
 2. It is more reliable because it is directly wired to the OOM killer.
 
+``mm_struct`` list
+------------------
+An ``mm_struct`` list is maintained for each memcg, and an
+``mm_struct`` follows its owner task to the new memcg when this task
+is migrated.
+
+A page table walker iterates ``lruvec_memcg()->mm_list`` and calls
+``walk_page_range()`` with each ``mm_struct`` on this list to scan
+PTEs. When multiple page table walkers iterate the same list, each of
+them gets a unique ``mm_struct``, and therefore they can run in
+parallel.
+
+Page table walkers ignore any misplaced pages, e.g., if an
+``mm_struct`` was migrated, pages left in the previous memcg will be
+ignored when the current memcg is under reclaim. Similarly, page table
+walkers will ignore pages from nodes other than the one under reclaim.
+
+This infrastructure also tracks the usage of ``mm_struct`` between
+context switches so that page table walkers can skip processes that
+have been sleeping since the last iteration.
+
 Rmap/PT walk feedback
 ---------------------
 Searching the rmap for PTEs mapping each page on an LRU list (to test
@@ -170,7 +192,7 @@ promotes hot pages. If the scan was done cacheline efficiently, it
 adds the PMD entry pointing to the PTE table to the Bloom filter. This
 forms a feedback loop between the eviction and the aging.
 
-Bloom Filters
+Bloom filters
 -------------
 Bloom filters are a space and memory efficient data structure for set
 membership test, i.e., test if an element is not in the set or may be
@@ -186,6 +208,18 @@ is false positive, the cost is an additional scan of a range of PTEs,
 which may yield hot pages anyway. Parameters of the filter itself can
 control the false positive rate in the limit.
 
+PID controller
+--------------
+A feedback loop modeled after the Proportional-Integral-Derivative
+(PID) controller monitors refaults over anon and file types and
+decides which type to evict when both types are available from the
+same generation.
+
+The PID controller uses generations rather than the wall clock as the
+time domain because a CPU can scan pages at different rates under
+varying memory pressure. It calculates a moving average for each new
+generation to avoid being permanently locked in a suboptimal state.
+
 Memcg LRU
 ---------
 An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
@@ -223,9 +257,9 @@ parts:
 
 * Generations
 * Rmap walks
-* Page table walks
-* Bloom filters
-* PID controller
+* Page table walks via ``mm_struct`` list
+* Bloom filters for rmap/PT walk feedback
+* PID controller for refault feedback
 
 The aging and the eviction form a producer-consumer model;
 specifically, the latter drives the former by the sliding window over
@@ -42,6 +42,8 @@ The unevictable list addresses the following classes of unevictable pages:
 
 * Those owned by ramfs.
 
+* Those owned by tmpfs with the noswap mount option.
+
 * Those mapped into SHM_LOCK'd shared memory regions.
 
 * Those mapped into VM_LOCKED [mlock()ed] VMAs.
@@ -13457,13 +13457,14 @@ F: arch/powerpc/include/asm/membarrier.h
 F: include/uapi/linux/membarrier.h
 F: kernel/sched/membarrier.c
 
-MEMBLOCK
+MEMBLOCK AND MEMORY MANAGEMENT INITIALIZATION
 M: Mike Rapoport <rppt@kernel.org>
 L: linux-mm@kvack.org
 S: Maintained
 F: Documentation/core-api/boot-time-mm.rst
 F: include/linux/memblock.h
 F: mm/memblock.c
+F: mm/mm_init.c
 F: tools/testing/memblock/
 
 MEMORY CONTROLLER DRIVERS
@@ -13498,6 +13499,7 @@ F: include/linux/memory_hotplug.h
 F: include/linux/mm.h
 F: include/linux/mmzone.h
 F: include/linux/pagewalk.h
+F: include/trace/events/ksm.h
 F: mm/
 F: tools/mm/
 F: tools/testing/selftests/mm/
@@ -13506,6 +13508,7 @@ VMALLOC
 M: Andrew Morton <akpm@linux-foundation.org>
 R: Uladzislau Rezki <urezki@gmail.com>
 R: Christoph Hellwig <hch@infradead.org>
+R: Lorenzo Stoakes <lstoakes@gmail.com>
 L: linux-mm@kvack.org
 S: Maintained
 W: http://www.linux-mm.org
32
arch/Kconfig
@@ -465,6 +465,38 @@ config ARCH_WANT_IRQS_OFF_ACTIVATE_MM
 irqs disabled over activate_mm. Architectures that do IPI based TLB
 shootdowns should enable this.
 
+# Use normal mm refcounting for MMU_LAZY_TLB kernel thread references.
+# MMU_LAZY_TLB_REFCOUNT=n can improve the scalability of context switching
+# to/from kernel threads when the same mm is running on a lot of CPUs (a large
+# multi-threaded application), by reducing contention on the mm refcount.
+#
+# This can be disabled if the architecture ensures no CPUs are using an mm as a
+# "lazy tlb" beyond its final refcount (i.e., by the time __mmdrop frees the mm
+# or its kernel page tables). This could be arranged by arch_exit_mmap(), or
+# final exit(2) TLB flush, for example.
+#
+# To implement this, an arch *must*:
+# Ensure the _lazy_tlb variants of mmgrab/mmdrop are used when manipulating
+# the lazy tlb reference of a kthread's ->active_mm (non-arch code has been
+# converted already).
+config MMU_LAZY_TLB_REFCOUNT
+def_bool y
+depends on !MMU_LAZY_TLB_SHOOTDOWN
+
+# This option allows MMU_LAZY_TLB_REFCOUNT=n. It ensures no CPUs are using an
+# mm as a lazy tlb beyond its last reference count, by shooting down these
+# users before the mm is deallocated. __mmdrop() first IPIs all CPUs that may
+# be using the mm as a lazy tlb, so that they may switch themselves to using
+# init_mm for their active mm. mm_cpumask(mm) is used to determine which CPUs
+# may be using mm as a lazy tlb mm.
+#
+# To implement this, an arch *must*:
+# - At the time of the final mmdrop of the mm, ensure mm_cpumask(mm) contains
+# at least all possible CPUs in which the mm is lazy.
+# - It must meet the requirements for MMU_LAZY_TLB_REFCOUNT=n (see above).
+config MMU_LAZY_TLB_SHOOTDOWN
+bool
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 bool
 
@@ -556,7 +556,7 @@ endmenu # "ARC Architecture Configuration"
 
 config ARCH_FORCE_MAX_ORDER
 int "Maximum zone order"
-default "12" if ARC_HUGEPAGE_16M
-default "11"
+default "11" if ARC_HUGEPAGE_16M
+default "10"
 
 source "kernel/power/Kconfig"
@@ -74,11 +74,6 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
 base, TO_MB(size), !in_use ? "Not used":"");
 }
 
-bool arch_has_descending_max_zone_pfns(void)
-{
-return !IS_ENABLED(CONFIG_ARC_HAS_PAE40);
-}
-
 /*
 * First memory setup routine called from setup_arch()
 * 1. setup swapper's mm @init_mm
@@ -1352,20 +1352,19 @@ config ARM_MODULE_PLTS
 configurations. If unsure, say y.
 
 config ARCH_FORCE_MAX_ORDER
-int "Maximum zone order"
-default "12" if SOC_AM33XX
-default "9" if SA1111
-default "11"
+int "Order of maximal physically contiguous allocations"
+default "11" if SOC_AM33XX
+default "8" if SA1111
+default "10"
 help
-The kernel memory allocator divides physically contiguous memory
-blocks into "zones", where each zone is a power of two number of
-pages. This option selects the largest power of two that the kernel
-keeps in the memory allocator. If you need to allocate very large
-blocks of physically contiguous memory, then you may need to
-increase this value.
+The kernel page allocator limits the size of maximal physically
+contiguous allocations. The limit is called MAX_ORDER and it
+defines the maximal power of two of number of pages that can be
+allocated as a single contiguous block. This option allows
+overriding the default setting when ability to allocate very
+large blocks of physically contiguous memory is required.
 
-This config option is actually maximum order plus one. For example,
-a value of 11 means that the largest free memory block is 2^10 pages.
+Don't change if unsure.
 
 config ALIGNMENT_TRAP
 def_bool CPU_CP15_MMU
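The Kconfig changes above follow from MAX_ORDER becoming inclusive: the old help text said a value of 11 meant a largest block of 2^10 pages, while under the new meaning a value of 10 expresses the same limit. The following is a minimal Python sketch of that arithmetic, assuming a 4096-byte page. The function names are hypothetical and exist only for this illustration.

```python
def old_to_new_max_order(old_value):
    """Convert an exclusive (pre-change) setting to the inclusive one."""
    return old_value - 1


def max_block_pages(max_order):
    """Largest contiguous block, in pages, under the inclusive meaning."""
    return 1 << max_order


def max_block_bytes(max_order, page_size=4096):
    """Largest contiguous block in bytes; page_size is an assumption."""
    return max_block_pages(max_order) * page_size


def free_area_entries(max_order):
    """Orders 0..max_order inclusive, hence free_area[MAX_ORDER + 1]."""
    return max_order + 1


if __name__ == "__main__":
    new_default = old_to_new_max_order(11)
    print(new_default)                     # 10
    print(max_block_pages(new_default))    # 1024
    print(max_block_bytes(new_default))    # 4194304
    print(free_area_entries(new_default))  # 11
```

The defconfig hunks apply the same minus-one conversion to existing settings (14 becomes 13, 12 becomes 11), so the effective allocation limit is unchanged.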
@@ -31,7 +31,7 @@ CONFIG_SOC_VF610=y
 CONFIG_SMP=y
 CONFIG_ARM_PSCI=y
 CONFIG_HIGHMEM=y
-CONFIG_ARCH_FORCE_MAX_ORDER=14
+CONFIG_ARCH_FORCE_MAX_ORDER=13
 CONFIG_CMDLINE="noinitrd console=ttymxc0,115200"
 CONFIG_KEXEC=y
 CONFIG_CPU_FREQ=y
@@ -26,7 +26,7 @@ CONFIG_THUMB2_KERNEL=y
 # CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11 is not set
 # CONFIG_ARM_PATCH_IDIV is not set
 CONFIG_HIGHMEM=y
-CONFIG_ARCH_FORCE_MAX_ORDER=12
+CONFIG_ARCH_FORCE_MAX_ORDER=11
 CONFIG_SECCOMP=y
 CONFIG_KEXEC=y
 CONFIG_EFI=y
Some files were not shown because too many files have changed in this diff