Commit 248ac0e194 ("mm/vmalloc: remove guard page from between vmap
blocks") had the side effect of making vmap_area.va_end member point to
the next vmap_area.va_start. This was creating an artificial reference
to vmalloc'ed objects and kmemleak was rarely reporting vmalloc() leaks.
This patch marks the vmap_area containing pointers explicitly and
reduces the min ref_count to 2 as vm_struct still contains a reference
to the vmalloc'ed object. The kmemleak add_scan_area() function has
been improved to allow a SIZE_MAX argument covering the rest of the
object (for simpler calling sites).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a race window between vmap_area tear down and show vmap_area
information.
A B
remove_vm_area
spin_lock(&vmap_area_lock);
va->vm = NULL;
va->flags &= ~VM_VM_AREA;
spin_unlock(&vmap_area_lock);
spin_lock(&vmap_area_lock);
if (va->flags & (VM_LAZY_FREE | VM_LAZY_FREEZING))
return 0;
if (!(va->flags & VM_VM_AREA)) {
seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
(void *)va->va_start, (void *)va->va_end,
va->va_end - va->va_start);
return 0;
}
free_unmap_vmap_area(va);
flush_cache_vunmap
free_unmap_vmap_area_noflush
unmap_vmap_area
free_vmap_area_noflush
va->flags |= VM_LAZY_FREE
The assumption !VM_VM_AREA represents vm_map_ram allocation is
introduced by d4033afdf8 ("mm, vmalloc: iterate vmap_area_list,
instead of vmlist, in vmallocinfo()").
However, !VM_VM_AREA also represents vmap_area is being tear down in
race window mentioned above. This patch fix it by don't dump any
information for !VM_VM_AREA case and also remove (VM_LAZY_FREE |
VM_LAZY_FREEING) check since they are not possible for !VM_VM_AREA case.
Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When searching a vmap area in the vmalloc space, we use (addr + size -
1) to check if the value is less than addr, which is an overflow. But
we assign (addr + size) to vmap_area->va_end.
So if we come across the below case:
(addr + size - 1) : not overflow
(addr + size) : overflow
we will assign an overflow value (e.g 0) to vmap_area->va_end, And this
will trigger BUG in __insert_vmap_area, causing system panic.
So using (addr + size) to check the overflow should be the correct
behaviour, not (addr + size - 1).
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Reported-by: Ghennadi Procopciuc <unix140@gmail.com>
Tested-by: Daniel Baluta <dbaluta@ixiacom.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should check the VM_UNITIALIZED flag in s_show(). If this flag is
set, that said, the vm_struct is not fully initialized. So it is
unnecessary to try to show the information contained in vm_struct.
We checked this flag in show_numa_info(), but I think it's better to
check it earlier.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
VM_UNLIST was used to indicate that the vm_struct is not listed in
vmlist.
But after commit 4341fa4547 ("mm, vmalloc: remove list management of
vmlist after initializing vmalloc"), the meaning of this flag changed.
It now means the vm_struct is not fully initialized. So renaming it to
VM_UNINITIALIZED seems more reasonable.
Also change clear_vm_unlist to clear_vm_uninitialized_flag.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use goto to jump to the fail label to give a failure message before
returning NULL. This makes the failure handling in this function
consistent.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Space in a vmap block that was once allocated is considered dirty and
not made available for allocation again before the whole block is
recycled. The result is that free space within a vmap block is always
contiguous.
So if a vmap block has enough free space for allocation, the allocation
is impossible to fail. Thus, the fragmented block purging was never
invoked from vb_alloc(). So remove this dead code.
[ Same patches also sent by:
Chanho Min <chanho.min@lge.com>
Johannes Weiner <hannes@cmpxchg.org>
but git doesn't do "multiple authors" ]
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now for insert_vmalloc_vm, it only calls the two functions:
- setup_vmalloc_vm: fill vm_struct and vmap_area instances
- clear_vm_unlist: clear VM_UNLIST bit in vm_struct->flags
So in __get_vm_area_node(), if VM_UNLIST bit unset in flags, that is the
else branch here, we don't need to clear VM_UNLIST bit for vm->flags since
this bit is obviously not set. That is to say, we could only call
setup_vmalloc_vm instead of insert_vmalloc_vm here. And then we could
even remove the if test here.
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We want to allocate ELF note segment buffer on the 2nd kernel in vmalloc
space and remap it to user-space in order to reduce the risk that memory
allocation fails on system with huge number of CPUs and so with huge ELF
note segment that exceeds 11-order block size.
Although there's already remap_vmalloc_range for the purpose of
remapping vmalloc memory to user-space, we need to specify user-space
range via vma.
Mmap on /proc/vmcore needs to remap range across multiple objects, so
the interface that requires vma to cover full range is problematic.
This patch introduces remap_vmalloc_range_partial that receives user-space
range as a pair of base address and size and can be used for mmap on
/proc/vmcore case.
remap_vmalloc_range is rewritten using remap_vmalloc_range_partial.
[akpm@linux-foundation.org: use PAGE_ALIGNED()]
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Lisa Mitchell <lisa.mitchell@hp.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, __find_vmap_area searches for the kernel VM area starting at
a given address. This patch changes this behavior so that it searches
for the kernel VM area to which the address belongs. This change is
needed by remap_vmalloc_range_partial to be introduced in later patch
that receives any position of kernel VM area as target address.
This patch changes the condition (addr > va->va_start) to the equivalent
(addr >= va->va_end) by taking advantage of the fact that each kernel VM
area is non-overlapping.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Lisa Mitchell <lisa.mitchell@hp.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>