A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:
| effect | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
This patch removes reserved_vm counter from mm_struct. Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.
Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.
remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.
[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If someone calls vb_alloc() (or vm_map_ram() for that matter) to allocate
0 bytes (0 pages), get_order() returns BITS_PER_LONG - PAGE_CACHE_SHIFT
and interesting stuff happens. So make debugging such problems easier and
warn about 0-size allocation.
[akpm@linux-foundation.org: use WARN_ON-return-value feature]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's a walk by repeating rb_next to find a suitable hole. Could be
simply replaced by walk on the sorted vmap_area_list. More simpler and
efficient.
Mutation of the list and tree only happens in pair within
__insert_vmap_area and __free_vmap_area, under protection of
vmap_area_lock. The patch code is also under vmap_area_lock, so the list
walk is safe, and consistent with the tree walk.
Tested on SMP by repeating batch of vmalloc anf vfree for random sizes and
rounds for hours.
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull DMA-mapping updates from Marek Szyprowski:
"Those patches are continuation of my earlier work.
They contains extensions to DMA-mapping framework to remove limitation
of the current ARM implementation (like limited total size of DMA
coherent/write combine buffers), improve performance of buffer sharing
between devices (attributes to skip cpu cache operations or creation
of additional kernel mapping for some specific use cases) as well as
some unification of the common code for dma_mmap_attrs() and
dma_mmap_coherent() functions. All extensions have been implemented
and tested for ARM architecture."
* 'for-linus-for-3.6-rc1' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
ARM: dma-mapping: add support for DMA_ATTR_SKIP_CPU_SYNC attribute
common: DMA-mapping: add DMA_ATTR_SKIP_CPU_SYNC attribute
ARM: dma-mapping: add support for dma_get_sgtable()
common: dma-mapping: introduce dma_get_sgtable() function
ARM: dma-mapping: add support for DMA_ATTR_NO_KERNEL_MAPPING attribute
common: DMA-mapping: add DMA_ATTR_NO_KERNEL_MAPPING attribute
common: dma-mapping: add support for generic dma_mmap_* calls
ARM: dma-mapping: fix error path for memory allocation failure
ARM: dma-mapping: add more sanity checks in arm_dma_mmap()
ARM: dma-mapping: remove custom consistent dma region
mm: vmalloc: use const void * for caller argument
scatterlist: add sg_alloc_table_from_pages function
This patch changes dma-mapping subsystem to use generic vmalloc areas
for all consistent dma allocations. This increases the total size limit
of the consistent allocations and removes platform hacks and a lot of
duplicated code.
Atomic allocations are served from special pool preallocated on boot,
because vmalloc areas cannot be reliably created in atomic context.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
'const void *' is a safer type for caller function type. This patch
updates all references to caller function type.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
The transfer of ->flags causes some of the static mapping virtual
addresses to be prematurely freed (before the mapping is removed) because
VM_LAZY_FREE gets "set" if tmp->flags has VM_IOREMAP set. This might
cause subsequent vmalloc/ioremap calls to fail because it might allocate
one of the freed virtual address ranges that aren't unmapped.
va->flags has different types of flags from tmp->flags. If a region with
VM_IOREMAP set is registered with vm_area_add_early(), it will be removed
by __purge_vmap_area_lazy().
Fix vmalloc_init() to correctly initialize vmap_area for the given
vm_struct.
Also initialise va->vm. If it is not set, find_vm_area() for the early
vm regions will always fail.
Signed-off-by: KyongHo Cho <pullip.cho@samsung.com>
Cc: "Olav Haugan" <ohaugan@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The advantage of kcalloc is, that will prevent integer overflows which
could result from the multiplication of number of elements and size and
it is also a bit nicer to read.
The semantic patch that makes this change is available in
https://lkml.org/lkml/2011/11/25/107
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If either of the vas or vms arrays are not properly kzalloced, then the
code jumps to the err_free label.
The err_free label runs a loop to check and free each of the array members
of the vas and vms arrays which is not required for this situation as none
of the array members have been allocated till this point.
Eliminate the extra loop we have to go through by introducing a new label
err_free2 and then jumping to it.
[akpm@linux-foundation.org: remove now-unneeded tests]
Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
vmap_area->private is void* but we don't use the field for various purpose
but use only for vm_struct. So change it to a vm_struct* with naming to
improve for readability and type checking.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit f5252e00 ("mm: avoid null pointer access in vm_struct via
/proc/vmallocinfo") adds newly allocated vm_structs to the vmlist after
it is fully initialised. Unfortunately, it did not check that
__vmalloc_area_node() successfully populated the area. In the event of
allocation failure, the vmalloc area is freed but the pointer to freed
memory is inserted into the vmlist leading to a a crash later in
get_vmalloc_info().
This patch adds a check for ____vmalloc_area_node() failure within
__vmalloc_node_range. It does not use "goto fail" as in the previous
error path as a warning was already displayed by __vmalloc_area_node()
before it called vfree in its failure path.
Credit goes to Luciano Chavez for doing all the real work of identifying
exactly where the problem was.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com>
Tested-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [3.1.x+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The existing vm_area_register_early() allows for early vmalloc space
allocation. However upcoming cleanups in the ARM architecture require
that some fixed locations in the vmalloc area be reserved also very early.
The name "vm_area_register_early" would have been a good name for the
reservation part without the allocation. Since it is already in use with
different semantics, let's create vm_area_add_early() instead.
Both vm_area_register_early() and vm_area_add_early() can be used together
meaning that the former is now implemented using the later where it is
ensured that no conflicting areas are added, but no attempt is made to
make the allocation scheme in vm_area_register_early() more sophisticated.
After all, you must know what you're doing when using those functions.
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
When mapping a foreign page with xenbus_map_ring_valloc() with the
GNTTABOP_map_grant_ref hypercall, set the GNTMAP_contains_pte flag and
pass a pointer to the PTE (in init_mm).
After the page is mapped, the usual fault mechanism can be used to
update additional MMs. This allows the vmalloc_sync_all() to be
removed from alloc_vm_area().
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
[v1: Squashed fix by Michal for no-mmu case]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Some vmalloc failure paths do not report OOM conditions.
Add warn_alloc_failed, which also does a dump_stack, to those failure
paths.
This allows more site specific vmalloc failure logging message printks to
be removed.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add __attribute__((format (printf...) to the function to validate format
and arguments. Use vsprintf extension %pV to avoid any possible message
interleaving. Coalesce format string. Convert printks/pr_warning to
pr_warn.
[akpm@linux-foundation.org: use the __printf() macro]
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The /proc/vmallocinfo shows information about vmalloc allocations in
vmlist that is a linklist of vm_struct. It, however, may access pages
field of vm_struct where a page was not allocated. This results in a null
pointer access and leads to a kernel panic.
Why this happens: In __vmalloc_node_range() called from vmalloc(), newly
allocated vm_struct is added to vmlist at __get_vm_area_node() and then,
some fields of vm_struct such as nr_pages and pages are set at
__vmalloc_area_node(). In other words, it is added to vmlist before it is
fully initialized. At the same time, when the /proc/vmallocinfo is read,
it accesses the pages field of vm_struct according to the nr_pages field
at show_numa_info(). Thus, a null pointer access happens.
The patch adds the newly allocated vm_struct to the vmlist *after* it is
fully initialized. So, it can avoid accessing the pages field with
unallocated page when show_numa_info() is called.
Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xen backend drivers (e.g., blkback and netback) would sometimes fail to
map grant pages into the vmalloc address space allocated with
alloc_vm_area(). The GNTTABOP_map_grant_ref would fail because Xen could
not find the page (in the L2 table) containing the PTEs it needed to
update.
(XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
netback and blkback were making the hypercall from a kernel thread where
task->active_mm != &init_mm and alloc_vm_area() was only updating the page
tables for init_mm. The usual method of deferring the update to the page
tables of other processes (i.e., after taking a fault) doesn't work as a
fault cannot occur during the hypercall.
This would work on some systems depending on what else was using vmalloc.
Fix this by reverting ef691947d8 ("vmalloc: remove vmalloc_sync_all()
from alloc_vm_area()") and add a comment to explain why it's needed.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Keir Fraser <keir.xen@gmail.com>
Cc: <stable@kernel.org> [3.0.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit db64fe0225 ("mm: rewrite vmap layer") introduced code that does
address calculations under the assumption that VMAP_BLOCK_SIZE is a
power of two. However, this might not be true if CONFIG_NR_CPUS is not
set to a power of two.
Wrong vmap_block index/offset values could lead to memory corruption.
However, this has never been observed in practice (or never been
diagnosed correctly); what caught this was the BUG_ON in vb_alloc() that
checks for inconsistent vmap_block indices.
To fix this, ensure that VMAP_BLOCK_SIZE always is a power of two.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=31572
Reported-by: Pavel Kysilka <goldenfish@linuxsoft.cz>
Reported-by: Matias A. Fonzo <selk@dragora.org>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: 2.6.28+ <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>