memblock_reserve() would add a new range to memblock.reserved in case
the new range is not totally covered by any of the current
memblock.reserved range. If the memblock.reserved is full and can't
resize, memblock_reserve() would fail.
This doesn't happen in real world now, I observed this during code
review. While theoretically, it has the chance to happen. And if it
happens, others would think this range of memory is still available and
may corrupt the memory.
This patch checks the return value and goto "done" after it succeeds.
Link: http://lkml.kernel.org/r/1482363033-24754-3-git-send-email-richard.weiyang@gmail.com
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When using a sparse memory model memmap_init_zone() when invoked with
the MEMMAP_EARLY context will skip over pages which aren't valid - ie.
which aren't in a populated region of the sparse memory map. However if
the memory map is extremely sparse then it can spend a long time
linearly checking each PFN in a large non-populated region of the memory
map & skipping it in turn.
When CONFIG_HAVE_MEMBLOCK_NODE_MAP is enabled, we have sufficient
information to quickly discover the next valid PFN given an invalid one
by searching through the list of memory regions & skipping forwards to
the first PFN covered by the memory region to the right of the
non-populated region. Implement this in order to speed up
memmap_init_zone() for systems with extremely sparse memory maps.
James said "I have tested this patch on a virtual model of a Samurai CPU
with a sparse memory map. The kernel boot time drops from 109 to
62 seconds. "
Link: http://lkml.kernel.org/r/20161125185518.29885-1-paul.burton@imgtec.com
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Tested-by: James Hartley <james.hartley@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some of the kmemleak_*() callbacks in memblock, bootmem, CMA convert a
physical address to a virtual one using __va(). However, such physical
addresses may sometimes be located in highmem and using __va() is
incorrect, leading to inconsistent object tracking in kmemleak.
The following functions have been added to the kmemleak API and they take
a physical address as the object pointer. They only perform the
corresponding action if the address has a lowmem mapping:
kmemleak_alloc_phys
kmemleak_free_part_phys
kmemleak_not_leak_phys
kmemleak_ignore_phys
The affected calling places have been updated to use the new kmemleak
API.
Link: http://lkml.kernel.org/r/1471531432-16503-1-git-send-email-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Vignesh R <vigneshr@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It causes NULL dereference error and failure to get type_a->regions[0]
info if parameter type_b of __next_mem_range_rev() == NULL
Fix this by checking before dereferring and initializing idx_b to 0
The approach is tested by dumping all types of region via
__memblock_dump_all() and __next_mem_range_rev() fixed to UART
separately the result is okay after checking the logs.
Link: http://lkml.kernel.org/r/57A0320D.6070102@zoho.com
Signed-off-by: zijun_hu <zijun_hu@htc.com>
Tested-by: zijun_hu <zijun_hu@htc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Comparing an u64 variable to >= 0 returns always true and can therefore
be removed. This issue was detected using the -Wtype-limits gcc flag.
This patch fixes following type-limits warning:
mm/memblock.c: In function `__next_reserved_mem_region':
mm/memblock.c:843:11: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
if (*idx >= 0 && *idx < type->cnt) {
Link: http://lkml.kernel.org/r/20160510103625.3a7f8f32@g0hl1n.net
Signed-off-by: Richard Leitner <dev@g0hl1n.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memblock_add_region() and memblock_reserve_region() do nothing specific
before the call of memblock_add_range(), only print debug output.
We can do the same in memblock_add() and memblock_reserve() since both
memblock_add_region() and memblock_reserve_region() are not used by
anybody outside of memblock.c and memblock_{add,reserve}() have the same
set of flags and nids.
Since memblock_add_region() and memblock_reserve_region() will be
inlined, there will not be functional changes, but will improve code
readability a little.
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kernel style prefers a single string over split strings when the string is
'user-visible'.
Miscellanea:
- Add a missing newline
- Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Tejun Heo <tj@kernel.org> [percpu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We define struct memblock_type *type in the memblock_add_region() and
memblock_reserve_region() functions only for passing it to the
memlock_add_range() and memblock_reserve_range() functions. Let's
remove these variables and will pass a type directly.
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At the moment memblock_phys_mem_size() is marked as __init, and so is
discarded after boot. This is different from most of the memblock
functions which are marked __init_memblock, and are only discarded after
boot if memory hotplug is not configured.
To allow for upcoming code which will need memblock_phys_mem_size() in
the hotplug path, change it from __init to __init_memblock.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We already have the for_each_memblock() macro in <linux/memblock.h>
which provides ability to iterate over memblock regions of a known type.
The for_each_memblock() macro allows us to pass the pointer to the
struct memblock_type, instead we need to pass name of the type.
This patch introduces a new macro for_each_memblock_type() which allows
us iterate over memblock regions with the given type when the type is
unknown.
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove rgnbase and rgnsize variables from memblock_overlaps_region().
We use these variables only for passing to the memblock_addrs_overlap()
function and that's all. Let's remove them.
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make memblock_is_memory() and memblock_is_reserved return bool to
improve readability due to these particular functions only using either
one or zero as their return value.
No functional change.
Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This introduces the MEMBLOCK_NOMAP attribute and the required plumbing
to make it usable as an indicator that some parts of normal memory
should not be covered by the kernel direct mapping. It is up to the
arch to actually honor the attribute when laying out this mapping,
but the memblock code itself is modified to disregard these regions
for allocations and other general use.
Cc: linux-mm@kvack.org
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since commit e3239ff92a ("memblock: Rename memblock_region to
memblock_type and memblock_property to memblock_region"), all local
variables of the membock_type type were renamed to 'type'. This commit
renames all remaining local variables with the memblock_type type to the
same view.
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When parsing SRAT, all memory ranges are added into numa_meminfo. In
numa_init(), before entering numa_cleanup_meminfo(), all possible memory
ranges are in numa_meminfo. And numa_cleanup_meminfo() removes all
ranges over max_pfn or empty.
But, this only works if the nodes are continuous. Let's have a look at
the following example:
We have an SRAT like this:
SRAT: Node 0 PXM 0 [mem 0x00000000-0x5fffffff]
SRAT: Node 0 PXM 0 [mem 0x100000000-0x1ffffffffff]
SRAT: Node 1 PXM 1 [mem 0x20000000000-0x3ffffffffff]
SRAT: Node 4 PXM 2 [mem 0x40000000000-0x5ffffffffff] hotplug
SRAT: Node 5 PXM 3 [mem 0x60000000000-0x7ffffffffff] hotplug
SRAT: Node 2 PXM 4 [mem 0x80000000000-0x9ffffffffff] hotplug
SRAT: Node 3 PXM 5 [mem 0xa0000000000-0xbffffffffff] hotplug
SRAT: Node 6 PXM 6 [mem 0xc0000000000-0xdffffffffff] hotplug
SRAT: Node 7 PXM 7 [mem 0xe0000000000-0xfffffffffff] hotplug
On boot, only node 0,1,2,3 exist.
And the numa_meminfo will look like this:
numa_meminfo.nr_blks = 9
1. on node 0: [0, 60000000]
2. on node 0: [100000000, 20000000000]
3. on node 1: [20000000000, 40000000000]
4. on node 4: [40000000000, 60000000000]
5. on node 5: [60000000000, 80000000000]
6. on node 2: [80000000000, a0000000000]
7. on node 3: [a0000000000, a0800000000]
8. on node 6: [c0000000000, a0800000000]
9. on node 7: [e0000000000, a0800000000]
And numa_cleanup_meminfo() will merge 1 and 2, and remove 8,9 because the
end address is over max_pfn, which is a0800000000. But 4 and 5 are not
removed because their end addresses are less then max_pfn. But in fact,
node 4 and 5 don't exist.
In a word, numa_cleanup_meminfo() is not able to handle holes between nodes.
Since memory ranges in node 4 and 5 are in numa_meminfo, in
numa_register_memblks(), node 4 and 5 will be mistakenly set to online.
If you run lscpu, it will show:
NUMA node0 CPU(s): 0-14,128-142
NUMA node1 CPU(s): 15-29,143-157
NUMA node2 CPU(s):
NUMA node3 CPU(s):
NUMA node4 CPU(s): 62-76,190-204
NUMA node5 CPU(s): 78-92,206-220
In this patch, we use memblock_overlaps_region() to check if ranges in
numa_meminfo overlap with ranges in memory_block. Since memory_block
contains all available memory at boot time, if they overlap, it means the
ranges exist. If not, then remove them from numa_meminfo.
After this patch, lscpu will show:
NUMA node0 CPU(s): 0-14,128-142
NUMA node1 CPU(s): 15-29,143-157
NUMA node4 CPU(s): 62-76,190-204
NUMA node5 CPU(s): 78-92,206-220
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>