mirror of
https://github.com/armbian/linux-cix.git
synced 2026-01-06 12:30:45 -08:00
Merge tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - More userfaultfd work from Peter Xu

 - Several convert-to-folios series from Sidhartha Kumar and Huang Ying

 - Some filemap cleanups from Vishal Moola

 - David Hildenbrand added the ability to selftest anon memory COW handling

 - Some cpuset simplifications from Liu Shixin

 - Addition of vmalloc tracing support by Uladzislau Rezki

 - Some pagecache folioifications and simplifications from Matthew Wilcox

 - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it

 - Miguel Ojeda contributed some cleanups for our use of the
   __no_sanitize_thread__ gcc keyword. This series should have been in
   the non-MM tree, my bad

 - Naoya Horiguchi improved the interaction between memory poisoning and
   memory section removal for huge pages

 - DAMON cleanups and tuneups from SeongJae Park

 - Tony Luck fixed the handling of COW faults against poisoned pages

 - Peter Xu utilized the PTE marker code for handling swapin errors

 - Hugh Dickins reworked compound page mapcount handling, simplifying it
   and making it more efficient

 - Removal of the autonuma savedwrite infrastructure from Nadav Amit and
   David Hildenbrand

 - zram support for multiple compression streams from Sergey Senozhatsky

 - David Hildenbrand reworked the GUP code's R/O long-term pinning so
   that drivers no longer need to use the FOLL_FORCE workaround which
   didn't work very well anyway

 - Mel Gorman altered the page allocator so that local IRQs can remain
   enabled during per-cpu page allocations

 - Vishal Moola removed the try_to_release_page() wrapper

 - Stefan Roesch added some per-BDI sysfs tunables which are used to
   prevent network block devices from dirtying excessive amounts of
   pagecache

 - David Hildenbrand did some cleanup and repair work on KSM COW breaking

 - Nhat Pham and Johannes Weiner have implemented writeback in zswap's
   zsmalloc backend

 - Brian Foster has fixed a longstanding corner-case oddity in
   file[map]_write_and_wait_range()

 - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang Chen

 - Shiyang Ruan has done some work on fsdax, to make its reflink mode
   work better under xfstests. Better, but still not perfect

 - Christoph Hellwig has removed the .writepage() method from several
   filesystems. They only need .writepages()

 - Yosry Ahmed wrote a series which fixes the memcg reclaim target
   beancounting

 - David Hildenbrand has fixed some of our MM selftests for 32-bit
   machines

 - Many singleton patches, as usual

* tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits)
  mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio
  mm: mmu_gather: allow more than one batch of delayed rmaps
  mm: fix typo in struct pglist_data code comment
  kmsan: fix memcpy tests
  mm: add cond_resched() in swapin_walk_pmd_entry()
  mm: do not show fs mm pc for VM_LOCKONFAULT pages
  selftests/vm: ksm_functional_tests: fixes for 32bit
  selftests/vm: cow: fix compile warning on 32bit
  selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions
  mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem
  mm,thp,rmap: fix races between updates of subpages_mapcount
  mm: memcg: fix swapcached stat accounting
  mm: add nodes= arg to memory.reclaim
  mm: disable top-tier fallback to reclaim on proactive reclaim
  selftests: cgroup: make sure reclaim target memcg is unprotected
  selftests: cgroup: refactor proactive reclaim code to reclaim_until()
  mm: memcg: fix stale protection of reclaim target memcg
  mm/mmap: properly unaccount memory on mas_preallocate() failure
  omfs: remove ->writepage
  jfs: remove ->writepage
  ...
@@ -137,3 +137,17 @@ Description:
		The writeback_limit file is read-write and specifies the maximum
		amount of writeback ZRAM can do. The limit could be changed
		in run time.

What:		/sys/block/zram<id>/recomp_algorithm
Date:		November 2022
Contact:	Sergey Senozhatsky <senozhatsky@chromium.org>
Description:
		The recomp_algorithm file is read-write and allows to set
		or show secondary compression algorithms.

What:		/sys/block/zram<id>/recompress
Date:		November 2022
Contact:	Sergey Senozhatsky <senozhatsky@chromium.org>
Description:
		The recompress file is write-only and triggers re-compression
		with secondary compression algorithms.
@@ -44,6 +44,21 @@ Description:

		(read-write)

What:		/sys/class/bdi/<bdi>/min_ratio_fine
Date:		November 2022
Contact:	Stefan Roesch <shr@devkernel.io>
Description:
		Under normal circumstances each device is given a part of the
		total write-back cache that relates to its current average
		writeout speed in relation to the other devices.

		The 'min_ratio_fine' parameter allows assigning a minimum reserve
		of the write-back cache to a particular device. The value is
		expressed as part of 1 million. For example, this is useful for
		providing a minimum QoS.

		(read-write)
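Since the value is expressed as part of 1 million, a desired percentage has to be scaled by 10,000 before it is written. A minimal sketch of the conversion (the write itself is commented out; the `<bdi>` path is a placeholder, not a real name):

```shell
# Reserve 0.5% of the total write-back cache for one device.
# min_ratio_fine takes parts per million: 0.5% == 5000 ppm.
half_percent_ppm=$(( 1000000 * 5 / 1000 ))
echo "$half_percent_ppm"
# Writing it requires root and an existing bdi (path illustrative):
# echo "$half_percent_ppm" > /sys/class/bdi/<bdi>/min_ratio_fine
```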
What:		/sys/class/bdi/<bdi>/max_ratio
Date:		January 2008
Contact:	Peter Zijlstra <a.p.zijlstra@chello.nl>
@@ -55,6 +70,59 @@ Description:
		mount that is prone to get stuck, or a FUSE mount which cannot
		be trusted to play fair.

		(read-write)

What:		/sys/class/bdi/<bdi>/max_ratio_fine
Date:		November 2022
Contact:	Stefan Roesch <shr@devkernel.io>
Description:
		Allows limiting a particular device to use not more than the
		given value of the write-back cache. The value is given as part
		of 1 million. This is useful in situations where we want to avoid
		one device taking all or most of the write-back cache. For example
		in case of an NFS mount that is prone to get stuck, or a FUSE mount
		which cannot be trusted to play fair.

		(read-write)

What:		/sys/class/bdi/<bdi>/min_bytes
Date:		October 2022
Contact:	Stefan Roesch <shr@devkernel.io>
Description:
		Under normal circumstances each device is given a part of the
		total write-back cache that relates to its current average
		writeout speed in relation to the other devices.

		The 'min_bytes' parameter allows assigning a minimum
		percentage of the write-back cache to a particular device
		expressed in bytes.
		For example, this is useful for providing a minimum QoS.

		(read-write)

What:		/sys/class/bdi/<bdi>/max_bytes
Date:		October 2022
Contact:	Stefan Roesch <shr@devkernel.io>
Description:
		Allows limiting a particular device to use not more than the
		given 'max_bytes' of the write-back cache. This is useful in
		situations where we want to avoid one device taking all or
		most of the write-back cache. For example in case of an NFS
		mount that is prone to get stuck, a FUSE mount which cannot be
		trusted to play fair, or an nbd device.

		(read-write)

What:		/sys/class/bdi/<bdi>/strict_limit
Date:		October 2022
Contact:	Stefan Roesch <shr@devkernel.io>
Description:
		Forces per-BDI checks for the share of given device in the write-back
		cache even before the global background dirty limit is reached. This
		is useful in situations where the global limit is much higher than
		affordable for given relatively slow (or untrusted) device. Turning
		strictlimit on has no visible effect if max_ratio is equal to 100%.

		(read-write)
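The byte-granular cap combines naturally with 'strict_limit' for a slow or untrusted device. A hedged sketch (the bdi name "43:0" is illustrative, not taken from the text above; the writes are commented out because they require root and a live bdi):

```shell
# Cap an nbd-like device at 64 MiB of write-back cache, and enforce
# the cap even before the global dirty limit is reached.
max_bytes=$(( 64 * 1024 * 1024 ))
echo "$max_bytes"
# echo "$max_bytes" > /sys/class/bdi/43:0/max_bytes
# echo 1 > /sys/class/bdi/43:0/strict_limit
```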
What:		/sys/class/bdi/<bdi>/stable_pages_required
Date:		January 2008
@@ -27,6 +27,10 @@ Description: Writing 'on' or 'off' to this file makes the kdamond starts or
		makes the kdamond reads the user inputs in the sysfs files
		except 'state' again. Writing 'update_schemes_stats' to the
		file updates contents of schemes stats files of the kdamond.
		Writing 'update_schemes_tried_regions' to the file updates
		contents of 'tried_regions' directory of every scheme directory
		of this kdamond. Writing 'clear_schemes_tried_regions' to the
		file removes contents of the 'tried_regions' directory.

What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/pid
Date:		Mar 2022
@@ -283,3 +287,31 @@ Date: Mar 2022
Contact:	SeongJae Park <sj@kernel.org>
Description:	Reading this file returns the number of the exceed events of
		the scheme's quotas.

What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/tried_regions/<R>/start
Date:		Oct 2022
Contact:	SeongJae Park <sj@kernel.org>
Description:	Reading this file returns the start address of a memory region
		that corresponding DAMON-based Operation Scheme's action has
		tried to be applied.

What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/tried_regions/<R>/end
Date:		Oct 2022
Contact:	SeongJae Park <sj@kernel.org>
Description:	Reading this file returns the end address of a memory region
		that corresponding DAMON-based Operation Scheme's action has
		tried to be applied.

What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/tried_regions/<R>/nr_accesses
Date:		Oct 2022
Contact:	SeongJae Park <sj@kernel.org>
Description:	Reading this file returns the 'nr_accesses' of a memory region
		that corresponding DAMON-based Operation Scheme's action has
		tried to be applied.

What:		/sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/tried_regions/<R>/age
Date:		Oct 2022
Contact:	SeongJae Park <sj@kernel.org>
Description:	Reading this file returns the 'age' of a memory region that
		corresponding DAMON-based Operation Scheme's action has tried
		to be applied.
@@ -348,8 +348,13 @@ this can be accomplished with::

	echo huge_idle > /sys/block/zramX/writeback

If a user chooses to writeback only incompressible pages (pages that none of
algorithms can compress) this can be accomplished with::

	echo incompressible > /sys/block/zramX/writeback

If an admin wants to write a specific page in zram device to the backing device,
they could write a page index into the interface.
they could write a page index into the interface::

	echo "page_index=1251" > /sys/block/zramX/writeback

@@ -401,6 +406,87 @@ budget in next setting is user's job.
If admin wants to measure writeback count in a certain period, they could
know it via /sys/block/zram0/bd_stat's 3rd column.
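That measurement can be scripted; a small sketch (the `bd_writes` helper name and the `ZRAM_BD_STAT` override are ours, not part of the interface):

```shell
# Print the writeback count: the 3rd column of bd_stat.
# ZRAM_BD_STAT is only an override; the default is the real sysfs file.
bd_writes() {
    awk '{ print $3 }' "${ZRAM_BD_STAT:-/sys/block/zram0/bd_stat}"
}
```

Sampling this value at the start and end of a period gives the writeback count for that period.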

recompression
-------------

With CONFIG_ZRAM_MULTI_COMP, zram can recompress pages using alternative
(secondary) compression algorithms. The basic idea is that alternative
compression algorithm can provide better compression ratio at a price of
(potentially) slower compression/decompression speeds. Alternative compression
algorithm can, for example, be more successful compressing huge pages (those
that default algorithm failed to compress). Another application is idle pages
recompression - pages that are cold and sit in the memory can be recompressed
using more effective algorithm and, hence, reduce zsmalloc memory usage.

With CONFIG_ZRAM_MULTI_COMP, zram supports up to 4 compression algorithms:
one primary and up to 3 secondary ones. Primary zram compressor is explained
in "3) Select compression algorithm", secondary algorithms are configured
using recomp_algorithm device attribute.

Example:::

	#show supported recompression algorithms
	cat /sys/block/zramX/recomp_algorithm
	#1: lzo lzo-rle lz4 lz4hc [zstd]
	#2: lzo lzo-rle lz4 [lz4hc] zstd

Alternative compression algorithms are sorted by priority. In the example
above, zstd is used as the first alternative algorithm, which has priority
of 1, while lz4hc is configured as a compression algorithm with priority 2.
Alternative compression algorithm's priority is provided during algorithms
configuration:::

	#select zstd recompression algorithm, priority 1
	echo "algo=zstd priority=1" > /sys/block/zramX/recomp_algorithm

	#select deflate recompression algorithm, priority 2
	echo "algo=deflate priority=2" > /sys/block/zramX/recomp_algorithm

Another device attribute that CONFIG_ZRAM_MULTI_COMP enables is recompress,
which controls recompression.

Examples:::

	#IDLE pages recompression is activated by `idle` mode
	echo "type=idle" > /sys/block/zramX/recompress

	#HUGE pages recompression is activated by `huge` mode
	echo "type=huge" > /sys/block/zram0/recompress

	#HUGE_IDLE pages recompression is activated by `huge_idle` mode
	echo "type=huge_idle" > /sys/block/zramX/recompress

The number of idle pages can be significant, so user-space can pass a size
threshold (in bytes) to the recompress knob: zram will recompress only pages
of equal or greater size:::

	#recompress all pages larger than 3000 bytes
	echo "threshold=3000" > /sys/block/zramX/recompress

	#recompress idle pages larger than 2000 bytes
	echo "type=idle threshold=2000" > /sys/block/zramX/recompress

Recompression of idle pages requires memory tracking.

During re-compression for every page, that matches re-compression criteria,
ZRAM iterates the list of registered alternative compression algorithms in
order of their priorities. ZRAM stops either when re-compression was
successful (re-compressed object is smaller in size than the original one)
and matches re-compression criteria (e.g. size threshold) or when there are
no secondary algorithms left to try. If none of the secondary algorithms can
successfully re-compress the page, such a page is marked as incompressible,
so ZRAM will not attempt to re-compress it in the future.

This re-compression behaviour, when it iterates through the list of
registered compression algorithms, increases our chances of finding the
algorithm that successfully compresses a particular page. Sometimes, however,
it is convenient (and sometimes even necessary) to limit recompression to
only one particular algorithm so that it will not try any other algorithms.
This can be achieved by providing an algo=NAME parameter:::

	#use zstd algorithm only (if registered)
	echo "type=huge algo=zstd" > /sys/block/zramX/recompress

memory tracking
===============

@@ -411,9 +497,11 @@ pages of the process with *pagemap.
If you enable the feature, you could see block state via
/sys/kernel/debug/zram/zram0/block_state. The output is as follows::

	  300    75.033841 .wh.
	  301    63.806904 s...
	  302    63.806919 ..hi
	  300    75.033841 .wh...
	  301    63.806904 s.....
	  302    63.806919 ..hi..
	  303    62.801919 ....r.
	  304   146.781902 ..hi.n

First column
	zram's block index.
@@ -430,6 +518,10 @@ Third column
	huge page
	i:
		idle page
	r:
		recompressed page (secondary compression algorithm)
	n:
		none (including secondary) of algorithms could compress it

First line of above example says 300th block is accessed at 75.033841sec
and the block's state is huge so it is written back to the backing
@@ -543,7 +543,8 @@ inactive_anon # of bytes of anonymous and swap cache memory on inactive
		LRU list.
active_anon	# of bytes of anonymous and swap cache memory on active
		LRU list.
inactive_file	# of bytes of file-backed memory on inactive LRU list.
inactive_file	# of bytes of file-backed memory and MADV_FREE anonymous memory
		(LazyFree pages) on inactive LRU list.
active_file	# of bytes of file-backed memory on active LRU list.
unevictable	# of bytes of memory that cannot be reclaimed (mlocked etc).
=============== ===============================================================
@@ -1245,17 +1245,13 @@ PAGE_SIZE multiple when read back.
	This is a simple interface to trigger memory reclaim in the
	target cgroup.

	This file accepts a single key, the number of bytes to reclaim.
	No nested keys are currently supported.
	This file accepts a string which contains the number of bytes to
	reclaim.

	Example::

	  echo "1G" > memory.reclaim

	The interface can be later extended with nested keys to
	configure the reclaim behavior. For example, specify the
	type of memory to reclaim from (anon, file, ..).

	Please note that the kernel can over or under reclaim from
	the target cgroup. If less bytes are reclaimed than the
	specified amount, -EAGAIN is returned.
@@ -1267,6 +1263,13 @@ PAGE_SIZE multiple when read back.
	This means that the networking layer will not adapt based on
	reclaim induced by memory.reclaim.

	This file also allows the user to specify the nodes to reclaim from,
	via the 'nodes=' key, for example::

	  echo "1G nodes=0,1" > memory.reclaim

	The above instructs the kernel to reclaim memory from nodes 0,1.
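The two examples above can be combined into a small helper that tolerates partial reclaim, since the write fails with -EAGAIN when less than the requested amount was reclaimed. The cgroup path and the `CG` override are illustrative:

```shell
# Ask a cgroup to proactively reclaim 1G from nodes 0 and 1.
# CG is only an override; the path below is a hypothetical cgroup.
reclaim() {
    if ! echo "1G nodes=0,1" > "${CG:-/sys/fs/cgroup/workload}/memory.reclaim"; then
        echo "partial reclaim (EAGAIN)" >&2
    fi
}
```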

  memory.peak
	A read-only single value file which exists on non-root
	cgroups.
@@ -1488,12 +1491,18 @@ PAGE_SIZE multiple when read back.
	  pgscan_direct (npn)
		Amount of scanned pages directly (in an inactive LRU list)

	  pgscan_khugepaged (npn)
		Amount of scanned pages by khugepaged (in an inactive LRU list)

	  pgsteal_kswapd (npn)
		Amount of reclaimed pages by kswapd

	  pgsteal_direct (npn)
		Amount of reclaimed pages directly

	  pgsteal_khugepaged (npn)
		Amount of reclaimed pages by khugepaged

	  pgfault (npn)
		Total number of page faults incurred
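The pgscan_*/pgsteal_* counters above allow a rough reclaim-efficiency figure: reclaimed pages over scanned pages. A hedged sketch (the helper name and the `MEMSTAT` override are ours; the default path is a hypothetical cgroup):

```shell
# Sum all pgscan_* and pgsteal_* counters from memory.stat and print
# their ratio with two decimals.
reclaim_efficiency() {
    awk '/^pgscan_/ { scan += $2 } /^pgsteal_/ { steal += $2 }
         END { if (scan) printf "%.2f\n", steal / scan }' \
        "${MEMSTAT:-/sys/fs/cgroup/workload/memory.stat}"
}
```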
@@ -88,6 +88,9 @@ comma (","). ::
    │ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
    │ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low
    │ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
    │ │ │ │ │ │ │ tried_regions/
    │ │ │ │ │ │ │ │ 0/start,end,nr_accesses,age
    │ │ │ │ │ │ │ │ ...
    │ │ │ │ │ │ ...
    │ │ │ │ ...
    │ │ ...
@@ -125,7 +128,14 @@ in the state. Writing ``commit`` to the ``state`` file makes kdamond reads the
user inputs in the sysfs files except ``state`` file again. Writing
``update_schemes_stats`` to ``state`` file updates the contents of stats files
for each DAMON-based operation scheme of the kdamond. For details of the
stats, please refer to :ref:`stats section <sysfs_schemes_stats>`.
stats, please refer to :ref:`stats section <sysfs_schemes_stats>`. Writing
``update_schemes_tried_regions`` to ``state`` file updates the DAMON-based
operation scheme action tried regions directory for each DAMON-based operation
scheme of the kdamond. Writing ``clear_schemes_tried_regions`` to ``state``
file clears the DAMON-based operation scheme action tried regions directory for
each DAMON-based operation scheme of the kdamond. For details of the
DAMON-based operation scheme action tried regions directory, please refer to
:ref:`tried_regions section <sysfs_schemes_tried_regions>`.

If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread.
@@ -166,6 +176,8 @@ You can set and get what type of monitoring operations DAMON will use for the
context by writing one of the keywords listed in ``avail_operations`` file and
reading from the ``operations`` file.

.. _sysfs_monitoring_attrs:

contexts/<N>/monitoring_attrs/
------------------------------

@@ -235,6 +247,9 @@ In each region directory, you will find two files (``start`` and ``end``). You
can set and get the start and end addresses of the initial monitoring target
region by writing to and reading from the files, respectively.

Each region should not overlap with others. ``end`` of directory ``N`` should
be equal or smaller than ``start`` of directory ``N+1``.
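The non-overlap rule above can be checked before turning monitoring on. A sketch that reads one "start end" pair per line, in directory order (the sample input is inline; on a live system the pairs come from the ``regions/<N>/start`` and ``regions/<N>/end`` files):

```shell
# Fail (non-zero exit) if any region's start falls below the previous
# region's end, i.e. end of N must be <= start of N+1.
check_overlap() {
    awk 'NR > 1 && $1 < prev { print "overlap at region " NR - 1; bad = 1 }
         { prev = $2 } END { exit bad }'
}

# Illustrative input: two touching, non-overlapping regions.
printf '100 200\n200 300\n' | check_overlap && echo "ok"
```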

contexts/<N>/schemes/
---------------------

@@ -252,8 +267,9 @@ to ``N-1``. Each directory represents each DAMON-based operation scheme.

schemes/<N>/
------------

In each scheme directory, four directories (``access_pattern``, ``quotas``,
``watermarks``, and ``stats``) and one file (``action``) exist.
In each scheme directory, five directories (``access_pattern``, ``quotas``,
``watermarks``, ``stats``, and ``tried_regions``) and one file (``action``)
exist.

The ``action`` file is for setting and getting what action you want to apply to
memory regions having specific access pattern of the interest. The keywords
@@ -348,6 +364,32 @@ should ask DAMON sysfs interface to update the content of the files for the
stats by writing a special keyword, ``update_schemes_stats`` to the relevant
``kdamonds/<N>/state`` file.

.. _sysfs_schemes_tried_regions:

schemes/<N>/tried_regions/
--------------------------

When a special keyword, ``update_schemes_tried_regions``, is written to the
relevant ``kdamonds/<N>/state`` file, DAMON creates directories named integer
starting from ``0`` under this directory. Each directory contains files
exposing detailed information about each of the memory regions that the
corresponding scheme's ``action`` has tried to be applied under this directory,
during next :ref:`aggregation interval <sysfs_monitoring_attrs>`. The
information includes address range, ``nr_accesses``, and ``age`` of the
region.

The directories will be removed when another special keyword,
``clear_schemes_tried_regions``, is written to the relevant
``kdamonds/<N>/state`` file.

tried_regions/<N>/
------------------

In each region directory, you will find four files (``start``, ``end``,
``nr_accesses``, and ``age``). Reading the files will show the start and end
addresses, ``nr_accesses``, and ``age`` of the region that corresponding
DAMON-based operation scheme ``action`` has tried to be applied.

Example
~~~~~~~

@@ -465,8 +507,9 @@ regions in case of physical memory monitoring. Therefore, users should set the
monitoring target regions by themselves.

In such cases, users can explicitly set the initial monitoring target regions
as they want, by writing proper values to the ``init_regions`` file. Each line
of the input should represent one region in below form.::
as they want, by writing proper values to the ``init_regions`` file. The input
should be a sequence of three integers separated by white spaces that represent
one region in below form.::

	<target idx> <start address> <end address>
@@ -481,9 +524,9 @@ ranges, ``20-40`` and ``50-100`` as that of pid 4242, which is the second one

    # cd <debugfs>/damon
    # cat target_ids
    42 4242
    # echo "0 1 100
    0 100 200
    1 20 40
    # echo "0 1 100 \
    0 100 200 \
    1 20 40 \
    1 50 100" > init_regions

Note that this sets the initial monitoring target regions only. In case of
@@ -428,14 +428,16 @@ with the memory region, as the case would be with BSS (uninitialized data).
The "pathname" shows the name associated file for this mapping. If the mapping
is not associated with a file:

 =============        ====================================
 ===================  ===========================================
 [heap]               the heap of the program
 [stack]              the stack of the main process
 [vdso]               the "virtual dynamic shared object",
                      the kernel system call handler
 [anon:<name>]        an anonymous mapping that has been
 [anon:<name>]        a private anonymous mapping that has been
                      named by userspace
 =============        ====================================
 [anon_shmem:<name>]  an anonymous shared memory mapping that has
                      been named by userspace
 ===================  ===========================================

or if empty, the mapping is anonymous.
@@ -94,7 +94,7 @@ PMD Page Table Helpers
+---------------------------+--------------------------------------------------+
| pmd_trans_huge            | Tests a Transparent Huge Page (THP) at PMD       |
+---------------------------+--------------------------------------------------+
| pmd_present               | Tests a valid mapped PMD                         |
| pmd_present               | Tests whether pmd_page() points to valid memory  |
+---------------------------+--------------------------------------------------+
| pmd_young                 | Tests a young PMD                                |
+---------------------------+--------------------------------------------------+
@@ -117,31 +117,15 @@ pages:

 - ->_refcount in tail pages is always zero: get_page_unless_zero() never
   succeeds on tail pages.

 - map/unmap of the pages with PTE entry increment/decrement ->_mapcount
   on relevant sub-page of the compound page.
 - map/unmap of PMD entry for the whole compound page increment/decrement
   ->compound_mapcount, stored in the first tail page of the compound page;
   and also increment/decrement ->subpages_mapcount (also in the first tail)
   by COMPOUND_MAPPED when compound_mapcount goes from -1 to 0 or 0 to -1.

 - map/unmap of the whole compound page is accounted for in compound_mapcount
   (stored in first tail page). For file huge pages, we also increment
   ->_mapcount of all sub-pages in order to have race-free detection of
   last unmap of subpages.

 PageDoubleMap() indicates that the page is *possibly* mapped with PTEs.

 For anonymous pages, PageDoubleMap() also indicates ->_mapcount in all
 subpages is offset up by one. This additional reference is required to
 get race-free detection of unmap of subpages when we have them mapped with
 both PMDs and PTEs.

 This optimization is required to lower the overhead of per-subpage mapcount
 tracking. The alternative is to alter ->_mapcount in all subpages on each
 map/unmap of the whole compound page.

 For anonymous pages, we set PG_double_map when a PMD of the page is split
 for the first time, but still have a PMD mapping. The additional references
 go away with the last compound_mapcount.

 File pages get PG_double_map set on the first map of the page with PTE and
 goes away when the page gets evicted from the page cache.
 - map/unmap of sub-pages with PTE entry increment/decrement ->_mapcount
   on relevant sub-page of the compound page, and also increment/decrement
   ->subpages_mapcount, stored in first tail page of the compound page, when
   _mapcount goes from -1 to 0 or 0 to -1: counting sub-pages mapped by PTE.

split_huge_page internally has to distribute the refcounts in the head
page to the tail pages before clearing all PG_head/tail bits from the page
12
MAINTAINERS
@@ -13399,10 +13399,20 @@ F: include/linux/memory_hotplug.h
F:	include/linux/mm.h
F:	include/linux/mmzone.h
F:	include/linux/pagewalk.h
F:	include/linux/vmalloc.h
F:	mm/
F:	tools/testing/selftests/vm/

VMALLOC
M:	Andrew Morton <akpm@linux-foundation.org>
R:	Uladzislau Rezki <urezki@gmail.com>
R:	Christoph Hellwig <hch@infradead.org>
L:	linux-mm@kvack.org
S:	Maintained
W:	http://www.linux-mm.org
T:	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
F:	include/linux/vmalloc.h
F:	mm/vmalloc.c

MEMORY HOT(UN)PLUG
M:	David Hildenbrand <david@redhat.com>
M:	Oscar Salvador <osalvador@suse.de>
@@ -313,8 +313,6 @@ extern inline pte_t mk_swap_pte(unsigned long type, unsigned long offset)
#define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x)	((pte_t) { (x).val })

#define kern_addr_valid(addr)	(1)

#define pte_ERROR(e) \
	printk("%s:%d: bad pte %016lx.\n", __FILE__, __LINE__, pte_val(e))
#define pmd_ERROR(e) \
@@ -120,8 +120,6 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
#define __pte_to_swp_entry(pte)	((swp_entry_t) { pte_val(pte) })
#define __swp_entry_to_pte(x)	((pte_t) { (x).val })

#define kern_addr_valid(addr)	(1)

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#include <asm/hugepage.h>
#endif
@@ -21,8 +21,6 @@
#define pgd_none(pgd)		(0)
#define pgd_bad(pgd)		(0)
#define pgd_clear(pgdp)
#define kern_addr_valid(addr)	(1)
/* FIXME */
/*
 * PMD_SHIFT determines the size of the area a second-level page table can map
 * PGDIR_SHIFT determines what a third-level page table entry can map
@@ -300,10 +300,6 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 */
#define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > __SWP_TYPE_BITS)

/* Needs to be defined here and not in linux/mm.h, as it is arch dependent */
/* FIXME: this is not correct */
#define kern_addr_valid(addr)	(1)

/*
 * We provide our own arch_get_unmapped_area to cope with VIPT caches.
 */
@@ -1020,8 +1020,6 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 */
#define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > __SWP_TYPE_BITS)

extern int kern_addr_valid(unsigned long addr);

#ifdef CONFIG_ARM64_MTE

#define __HAVE_ARCH_PREPARE_TO_SWAP
@@ -814,53 +814,6 @@ void __init paging_init(void)
 	create_idmap();
 }
 
-/*
- * Check whether a kernel address is valid (derived from arch/x86/).
- */
-int kern_addr_valid(unsigned long addr)
-{
-	pgd_t *pgdp;
-	p4d_t *p4dp;
-	pud_t *pudp, pud;
-	pmd_t *pmdp, pmd;
-	pte_t *ptep, pte;
-
-	addr = arch_kasan_reset_tag(addr);
-	if ((((long)addr) >> VA_BITS) != -1UL)
-		return 0;
-
-	pgdp = pgd_offset_k(addr);
-	if (pgd_none(READ_ONCE(*pgdp)))
-		return 0;
-
-	p4dp = p4d_offset(pgdp, addr);
-	if (p4d_none(READ_ONCE(*p4dp)))
-		return 0;
-
-	pudp = pud_offset(p4dp, addr);
-	pud = READ_ONCE(*pudp);
-	if (pud_none(pud))
-		return 0;
-
-	if (pud_sect(pud))
-		return pfn_valid(pud_pfn(pud));
-
-	pmdp = pmd_offset(pudp, addr);
-	pmd = READ_ONCE(*pmdp);
-	if (pmd_none(pmd))
-		return 0;
-
-	if (pmd_sect(pmd))
-		return pfn_valid(pmd_pfn(pmd));
-
-	ptep = pte_offset_kernel(pmdp, addr);
-	pte = READ_ONCE(*ptep);
-	if (pte_none(pte))
-		return 0;
-
-	return pfn_valid(pte_pfn(pte));
-}
-
 #ifdef CONFIG_MEMORY_HOTPLUG
 static void free_hotplug_page_range(struct page *page, size_t size,
 				    struct vmem_altmap *altmap)

@@ -1184,53 +1137,28 @@ static void free_empty_tables(unsigned long addr, unsigned long end,
 }
 #endif
 
+void __meminit vmemmap_set_pmd(pmd_t *pmdp, void *p, int node,
+			       unsigned long addr, unsigned long next)
+{
+	pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
+}
+
+int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
+				unsigned long addr, unsigned long next)
+{
+	vmemmap_verify((pte_t *)pmdp, node, addr, next);
+	return 1;
+}
+
 int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 		struct vmem_altmap *altmap)
 {
-	unsigned long addr = start;
-	unsigned long next;
-	pgd_t *pgdp;
-	p4d_t *p4dp;
-	pud_t *pudp;
-	pmd_t *pmdp;
-
 	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
 
 	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
 		return vmemmap_populate_basepages(start, end, node, altmap);
-
-	do {
-		next = pmd_addr_end(addr, end);
-
-		pgdp = vmemmap_pgd_populate(addr, node);
-		if (!pgdp)
-			return -ENOMEM;
-
-		p4dp = vmemmap_p4d_populate(pgdp, addr, node);
-		if (!p4dp)
-			return -ENOMEM;
-
-		pudp = vmemmap_pud_populate(p4dp, addr, node);
-		if (!pudp)
-			return -ENOMEM;
-
-		pmdp = pmd_offset(pudp, addr);
-		if (pmd_none(READ_ONCE(*pmdp))) {
-			void *p = NULL;
-
-			p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
-			if (!p) {
-				if (vmemmap_populate_basepages(addr, next, node, altmap))
-					return -ENOMEM;
-				continue;
-			}
-
-			pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
-		} else
-			vmemmap_verify((pte_t *)pmdp, node, addr, next);
-	} while (addr = next, addr != end);
-
-	return 0;
+	else
+		return vmemmap_populate_hugepages(start, end, node, altmap);
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG

@@ -202,8 +202,7 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
 
 /*
  * This function is used to determine if a linear map page has been marked as
- * not-valid. Walk the page table and check the PTE_VALID bit. This is based
- * on kern_addr_valid(), which almost does what we need.
+ * not-valid. Walk the page table and check the PTE_VALID bit.
  *
  * Because this is only called on the kernel linear map, p?d_sect() implies
  * p?d_present(). When debug_pagealloc is enabled, sections mappings are

@@ -249,9 +249,6 @@ extern void paging_init(void);
 void update_mmu_cache(struct vm_area_struct *vma, unsigned long address,
 		      pte_t *pte);
 
-/* Needs to be defined here and not in linux/mm.h, as it is arch dependent */
-#define kern_addr_valid(addr)	(1)
-
 #define io_remap_pfn_range(vma, vaddr, pfn, size, prot) \
 	remap_pfn_range(vma, vaddr, pfn, size, prot)
 

@@ -131,13 +131,6 @@ static inline void clear_page(void *page)
 
 #define page_to_virt(page)	__va(page_to_phys(page))
 
-/*
- * For port to Hexagon Virtual Machine, MAYBE we check for attempts
- * to reference reserved HVM space, but in any case, the VM will be
- * protected.
- */
-#define kern_addr_valid(addr)   (1)
-
 #include <asm/mem-layout.h>
 #include <asm-generic/memory_model.h>
 /* XXX Todo: implement assembly-optimized version of getorder. */
Some files were not shown because too many files have changed in this diff.