Reincarnate get_mapsize as bootmap_bytes and implement
bootmem_bootmap_pages on top of it.
Adjust users of these helpers and make free_all_bootmem_core use
bootmem_bootmap_pages instead of open-coding it.
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce the bootmem_debug kernel parameter that enables very verbose
diagnostics regarding all range operations of bootmem as well as the
initialization and release of nodes.
[akpm@linux-foundation.org: fix printk warnings]
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Straight forward variant of the existing __alloc_bootmem_node, only
subsequent patch when allocating giant hugepages at boot -- don't want to
panic if we can't allocate as many as the user asked for.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are a number of different views to how much memory is currently active.
There is the arch-independent zone-sizing view, the bootmem allocator and
memory models view.
Architectures register this information at different times and is not
necessarily in sync particularly with respect to some SPARSEMEM limitations.
This patch introduces mminit_validate_memmodel_limits() which is able to
validate and correct PFN ranges with respect to the memory model. It is only
SPARSEMEM that currently validates itself.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch changes the function reserve_bootmem_node() from void to int,
returning -ENOMEM if the allocation fails.
This fixes a build problem on x86 with CONFIG_KEXEC=y and
CONFIG_NEED_MULTIPLE_NODES=y
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch set is to free pages which is allocated by bootmem for
memory-hotremove. Some structures of memory management are allocated by
bootmem. ex) memmap, etc.
To remove memory physically, some of them must be freed according to
circumstance. This patch set makes basis to free those pages, and free
memmaps.
Basic my idea is using remain members of struct page to remember information
of users of bootmem (section number or node id). When the section is
removing, kernel can confirm it. By this information, some issues can be
solved.
1) When the memmap of removing section is allocated on other
section by bootmem, it should/can be free.
2) When the memmap of removing section is allocated on the
same section, it shouldn't be freed. Because the section has to be
logical memory offlined already and all pages must be isolated against
page allocater. If it is freed, page allocator may use it which will
be removed physically soon.
3) When removing section has other section's memmap,
kernel will be able to show easily which section should be removed
before it for user. (Not implemented yet)
4) When the above case 2), the page isolation will be able to check and skip
memmap's page when logical memory offline (offline_pages()).
Current page isolation code fails in this case because this page is
just reserved page and it can't distinguish this pages can be
removed or not. But, it will be able to do by this patch.
(Not implemented yet.)
5) The node information like pgdat has similar issues. But, this
will be able to be solved too by this.
(Not implemented yet, but, remembering node id in the pages.)
Fortunately, current bootmem allocator just keeps PageReserved flags,
and doesn't use any other members of page struct. The users of
bootmem doesn't use them too.
This patch:
This is to register information which is node or section's id. Kernel can
distinguish which node/section uses the pages allcated by bootmem. This is
basis for hot-remove sections or nodes.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
split reserve_bootmem_core() into two functions, one which checks
conflicts, and one which sets the bits.
and make reserve_bootmem to loop bdata_list to cross the nodes.
user could be crashkernel and ramdisk..., in case the range provided
by those externalities crosses the nodes.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
need offset alignment when node_boot_start's alignment is less than
the alignment required.
use local node_boot_start to match alignment - so don't add extra operation
in search loop.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make the nodes other than node 0 use bdata->last_success for fast
search too.
We need to use __alloc_bootmem_core() for vmemmap allocation for other
nodes when numa and sparsemem/vmemmap are enabled.
Also, make fail_block path increase i with incr only after ALIGN
to avoid extra increase when size is larger than align.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With numa enabled, some callers could have a range of memory on one node
but try to free that on other node. This can cause some pages to be
freed wrongly.
For example: when we try to allocate 128g boot ram early for
gart/swiotlb, and free that range later so gart/swiotlb can get some
range afterwards.
With this patch, we don't need to care which node holds the range, just
loop to call free_bootmem_node for all online nodes.
This patch makes free_bootmem_core() more robust by trimming the sidx
and eidx according the ram range that the node has.
And make the free_bootmem_core handle this out of range case. We could
use bdata_list to make sure the range can be freed for sure. So next
time, we don't need to loop online nodes and could use free_bootmem
directly.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset adds a flags variable to reserve_bootmem() and uses the
BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
between crashkernel area and already used memory.
This patch:
Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
If that flag is set, the function returns with -EBUSY if the memory already
has been reserved in the past. This is to avoid conflicts.
Because that code runs before SMP initialisation, there's no race condition
inside reserve_bootmem_core().
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: <linux-arch@vger.kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In time for 2.6.20, we can get rid of this junk.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When booting a NUMA system with nodes that have no memory (eg by limiting
memory), bootmem_alloc_core tried to find pages in an uninitialized
bootmem_map. This caused a null pointer access. This fix adds a check, so
that NULL is returned. That will enable the caller (bootmem_alloc_nopanic)
to alloc memory on other without a panic.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Martin Bligh <mbligh@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce ARCH_LOW_ADDRESS_LIMIT which can be set per architecture to
override the 4GB default limit used by the bootmem allocater within
__alloc_bootmem_low() and __alloc_bootmem_low_node(). E.g. s390 needs a
2GB limit instead of 4GB.
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It fixes various coding style issues, specially when spaces are useless. For
example '*' go next to the function name.
Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It also creates get_mapsize() helper in order to make the code more readable
when it calculates the boot bitmap size.
Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>