Pull memblock updates from Mike Rapoport:
- new memblock_estimated_nr_free_pages() helper to replace
totalram_pages() which is less accurate when
CONFIG_DEFERRED_STRUCT_PAGE_INIT is set
- fixes for memblock tests
* tag 'memblock-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
s390/mm: get estimated free pages by memblock api
kernel/fork.c: get estimated free pages by memblock api
mm/memblock: introduce a new helper memblock_estimated_nr_free_pages()
memblock test: fix implicit declaration of function 'strscpy'
memblock test: fix implicit declaration of function 'isspace'
memblock test: fix implicit declaration of function 'memparse'
memblock test: add the definition of __setup()
memblock test: fix implicit declaration of function 'virt_to_phys'
tools/testing: abstract two init.h into common include directory
memblock tests: include export.h in linkage.h as kernel dose
memblock tests: include memory_hotplug.h in mmzone.h as kernel dose
The dummy entry is introduced in the initial implementation of lmb in
commit 7c8c6b9776 ("powerpc: Merge lmb.c and make MM initialization
use it.").
As the comment says the empty dummy entry is to simplify the code.
/* Create a dummy zero size LMB which will get coalesced away later.
* This simplifies the lmb_add() code below...
*/
While current code is reimplemented by Tejun in commit 784656f9c6
("memblock: Reimplement memblock_add_region()"). This empty dummy entry
seems not benefit the code any more.
Let's remove it.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Paul Mackerras <paulus@ozlabs.org>
CC: Tejun Heo <tj@kernel.org>
CC: Mike Rapoport <rppt@kernel.org>
Link: https://lore.kernel.org/r/20240405015821.13411-1-richard.weiyang@gmail.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Building memblock tests produces the following warning:
cc -I. -I../../include -Wall -O2 -fsanitize=address -fsanitize=undefined -D CONFIG_PHYS_ADDR_T_64BIT -c -o main.o main.c
In file included from tests/common.h:9,
from tests/basic_api.h:5,
from main.c:2:
./linux/memblock.h:601:50: warning: ‘struct seq_file’ declared inside parameter list will not be visible outside of this definition or declaration
601 | static inline void memtest_report_meminfo(struct seq_file *m) { }
| ^~~~~~~~
Add declaration of 'struct seq_file' to tools/include/linux/seq_file.h
to fix it.
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
This patch fix the follow errors.
commit 61167ad5fe ("mm: pass nid to reserve_bootmem_region()") pass nid
parameter to reserve_bootmem_region(),
$ make -C tools/testing/memblock/
...
memblock.c: In function ‘memmap_init_reserved_pages’:
memblock.c:2111:25: error: too many arguments to function ‘reserve_bootmem_region’
2111 | reserve_bootmem_region(start, end, nid);
| ^~~~~~~~~~~~~~~~~~~~~~
../../include/linux/mm.h:32:6: note: declared here
32 | void reserve_bootmem_region(phys_addr_t start, phys_addr_t end);
| ^~~~~~~~~~~~~~~~~~~~~~
memblock.c:2122:17: error: too many arguments to function ‘reserve_bootmem_region’
2122 | reserve_bootmem_region(start, end, nid);
| ^~~~~~~~~~~~~~~~~~~~~~
commit dcdfdd40fa ("mm: Add support for unaccepted memory") call
accept_memory() in memblock.c
$ make -C tools/testing/memblock/
...
cc -fsanitize=address -fsanitize=undefined main.o memblock.o \
lib/slab.o mmzone.o slab.o tests/alloc_nid_api.o \
tests/alloc_helpers_api.o tests/alloc_api.o tests/basic_api.o \
tests/common.o tests/alloc_exact_nid_api.o -o main
/usr/bin/ld: memblock.o: in function `memblock_alloc_range_nid':
memblock.c:(.text+0x7ae4): undefined reference to `accept_memory'
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Fixes: dcdfdd40fa ("mm: Add support for unaccepted memory")
Fixes: 61167ad5fe ("mm: pass nid to reserve_bootmem_region()")
Link: https://lore.kernel.org/r/tencent_6F19BC082167F15DF2A8D8BEFE8EF220F60A@qq.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
This test is aimed at verifying the memblock_alloc_node() to work as
expected, so setting the correct NUMA node for the new allocated
region. The memblock_alloc_node() is called directly without using any
stub. The core check is between the requested NUMA node and the `nid`
field inside the memblock_region structure. These two are supposed to
be equal for the test to succeed.
Signed-off-by: Claudio Migliorelli <claudio.migliorelli@mail.polimi.it>
Link: https://lore.kernel.org/r/ea5e938e-6b74-b188-af59-4b94b18bc0@mail.polimi.it
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
This reverts commit 115d9d77bb.
The pages being freed by memblock_free_late() have already been
initialized, but if they are in the deferred init range,
__free_one_page() might access nearby uninitialized pages when trying to
coalesce buddies. This can, for example, trigger this BUG:
BUG: unable to handle page fault for address: ffffe964c02580c8
RIP: 0010:__list_del_entry_valid+0x3f/0x70
<TASK>
__free_one_page+0x139/0x410
__free_pages_ok+0x21d/0x450
memblock_free_late+0x8c/0xb9
efi_free_boot_services+0x16b/0x25c
efi_enter_virtual_mode+0x403/0x446
start_kernel+0x678/0x714
secondary_startup_64_no_verify+0xd2/0xdb
</TASK>
A proper fix will be more involved so revert this change for the time
being.
Fixes: 115d9d77bb ("mm: Always release pages to the buddy allocator in memblock_free_late().")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://lore.kernel.org/r/20230207082151.1303-1-dev@aaront.org
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
only releases pages to the buddy allocator if they are not in the
deferred range. This is correct for free pages (as defined by
for_each_free_mem_pfn_range_in_zone()) because free pages in the
deferred range will be initialized and released as part of the deferred
init process. memblock_free_pages() is called by memblock_free_late(),
which is used to free reserved ranges after memblock_free_all() has
run. All pages in reserved ranges have been initialized at that point,
and accordingly, those pages are not touched by the deferred init
process. This means that currently, if the pages that
memblock_free_late() intends to release are in the deferred range, they
will never be released to the buddy allocator. They will forever be
reserved.
In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
which is also correct for free pages but is not correct for reserved
pages. KMSAN metadata for reserved pages is initialized by
kmsan_init_shadow(), which runs shortly before memblock_free_all().
For both of these reasons, memblock_free_pages() should only be called
for free pages, and memblock_free_late() should call __free_pages_core()
directly instead.
One case where this issue can occur in the wild is EFI boot on
x86_64. The x86 EFI code reserves all EFI boot services memory ranges
via memblock_reserve() and frees them later via memblock_free_late()
(efi_reserve_boot_services() and efi_free_boot_services(),
respectively). If any of those ranges happens to fall within the
deferred init range, the pages will not be released and that memory will
be unavailable.
For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:
v6.2-rc2:
# grep -E 'Node|spanned|present|managed' /proc/zoneinfo
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 178867
v6.2-rc2 + patch:
# grep -E 'Node|spanned|present|managed' /proc/zoneinfo
Node 0, zone DMA
spanned 4095
present 3999
managed 3840
Node 0, zone DMA32
spanned 246652
present 245868
managed 222816 # +43,949 pages
Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://lore.kernel.org/r/01010185892de53e-e379acfb-7044-4b24-b30a-e2657c1ba989-000000@us-west-2.amazonses.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Commit cf4694be2b ("tools: Add atomic_test_and_set_bit()") changed
tools/arch/x86/include/asm/atomic.h to include <asm/asm.h>, which causes
'make -C tools/testing/memblock' to fail with:
In file included from ../../include/asm/atomic.h:6,
from ../../include/linux/atomic.h:5,
from ./linux/mmzone.h:5,
from ../../include/linux/mm.h:5,
from ../../include/linux/pfn.h:5,
from ./linux/memory_hotplug.h:6,
from ./linux/init.h:7,
from ./linux/memblock.h:11,
from tests/common.h:8,
from tests/basic_api.h:5,
from main.c:2:
../../include/asm/../../arch/x86/include/asm/atomic.h:11:10: fatal error: asm/asm.h: No such file or directory
11 | #include <asm/asm.h>
| ^~~~~~~~~~~
Create a symlink to asm/asm.h in the same manner as the existing one to
asm/cmpxchg.h.
Signed-off-by: Aaron Thompson <dev@aaront.org>
Link: https://lore.kernel.org/r/010101857c402765-96e2dbc6-b82b-47e2-a437-4834dbe0b96b-000000@us-west-2.amazonses.com
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Add tests for memblock_alloc_exact_nid_raw() where the simulated physical
memory is set up with multiple NUMA nodes. Additionally, all but one of
these tests set nid != NUMA_NO_NODE. All tests are run for both top-down
and bottom-up allocation directions.
The tested scenarios are:
Range unrestricted:
- region cannot be allocated:
+ there are no previously reserved regions, but requested node is
too small
+ the requested node is fully reserved
+ the requested node is partially reserved and does not have
enough space
+ none of the nodes have enough memory to allocate the region
Range restricted:
- region can be allocated in the specific node requested without
dropping min_addr:
+ the range fully overlaps with the node, and there are adjacent
reserved regions
- region cannot be allocated:
+ range partially overlaps with two different nodes, where the
second node is the requested node
+ range overlaps with multiple nodes along node boundaries, and
the requested node starts after max_addr
+ nid is set to NUMA_NO_NODE and the total range can fit the
region, but the range is split between two nodes and everything
else is reserved
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/51b14da46e6591428df3aefc5acc7dca9341a541.1667802195.git.remckee0@gmail.com
Add tests for memblock_alloc_exact_nid_raw() where the simulated physical
memory is set up with multiple NUMA nodes. Additionally, all of these
tests set nid != NUMA_NO_NODE. These tests are run with a bottom-up
allocation direction.
The tested scenarios are:
Range unrestricted:
- region can be allocated in the specific node requested:
+ there are no previously reserved regions
+ the requested node is partially reserved but has enough space
Range restricted:
- region can be allocated in the specific node requested after dropping
min_addr:
+ range partially overlaps with two different nodes, where the
first node is the requested node
+ range partially overlaps with two different nodes, where the
requested node ends before min_addr
+ range overlaps with multiple nodes along node boundaries, and
the requested node ends before min_addr
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/935f0eed5e06fd44dc67d9f49b277923d7896bd3.1667802195.git.remckee0@gmail.com
Add tests for memblock_alloc_exact_nid_raw() where the simulated physical
memory is set up with multiple NUMA nodes. Additionally, all of these
tests set nid != NUMA_NO_NODE. These tests are run with a top-down
allocation direction.
The tested scenarios are:
Range unrestricted:
- region can be allocated in the specific node requested:
+ there are no previously reserved regions
+ the requested node is partially reserved but has enough space
Range restricted:
- region can be allocated in the specific node requested after dropping
min_addr:
+ range partially overlaps with two different nodes, where the
first node is the requested node
+ range partially overlaps with two different nodes, where the
requested node ends before min_addr
+ range overlaps with multiple nodes along node boundaries, and
the requested node ends before min_addr
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/2cc0883243d68ddc3faf833d2d9e86f48534c1d7.1667802195.git.remckee0@gmail.com
Add TEST_F_EXACT flag, which specifies that tests should run
memblock_alloc_exact_nid_raw(). Introduce range tests for
memblock_alloc_exact_nid_raw() by using the TEST_F_EXACT flag to run the
range tests in alloc_nid_api.c, since memblock_alloc_exact_nid_raw() and
memblock_alloc_try_nid_raw() behave the same way when nid = NUMA_NO_NODE.
Rename tests and other functions in alloc_nid_api.c by removing "_try".
Since the test names will be displayed in verbose output, they need to
be general enough to refer to any of the memblock functions that the
tests may run.
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Rebecca Mckeever <remckee0@gmail.com>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Link: https://lore.kernel.org/r/5a4b6d1b6130ab7375314e1c45a6d5813dfdabbd.1667802195.git.remckee0@gmail.com