Fix oom_kill_task() so it doesn't call mmput() (which may sleep) while
holding tasklist_lock.
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Peterson <dsp@llnl.gov> points out that badness() is playing with
mm_structs without taking a reference on them.
mmput() can sleep, so taking a reference here (inside tasklist_lock) is
hard. Fix it up via task_lock() instead.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert for-loops that explicitly reference "NR_CPUS" into the
potentially more efficient for_each_possible_cpu() construct.
Signed-off-by: John Hawkes <hawkes@sgi.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
madvise_remove needs to respect file and mmap protections.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
[ Will the real CVE-2006-1524 stand up, please.. ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is an enhancement of OVERCOMMIT_GUESS algorithm in
__vm_enough_memory() in mm/nommu.c.
When the OVERCOMMIT_GUESS algorithm calculates the number of free pages,
the algorithm subtracts the number of reserved pages from the result
nr_free_pages().
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is an enhancement of OVERCOMMIT_GUESS algorithm in
__vm_enough_memory() in mm/mmap.c.
When the OVERCOMMIT_GUESS algorithm calculates the number of free pages,
the algorithm subtracts the number of reserved pages from the result
nr_free_pages().
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These patches are an enhancement of OVERCOMMIT_GUESS algorithm in
__vm_enough_memory().
- why the kernel needed patching
When the kernel can't allocate anonymous pages in practice, currnet
OVERCOMMIT_GUESS could return success. This implementation might be
the cause of oom kill in memory pressure situation.
If the Linux runs with page reservation features like
/proc/sys/vm/lowmem_reserve_ratio and without swap region, I think
the oom kill occurs easily.
- the overall design approach in the patch
When the OVERCOMMET_GUESS algorithm calculates number of free pages,
the reserved free pages are regarded as non-free pages.
This change helps to avoid the pitfall that the number of free pages
become less than the number which the kernel tries to keep free.
- testing results
I tested the patches using my test kernel module.
If the patches aren't applied to the kernel, __vm_enough_memory()
returns success in the situation but autual page allocation is
failed.
On the other hand, if the patches are applied to the kernel, memory
allocation failure is avoided since __vm_enough_memory() returns
failure in the situation.
I checked that on i386 SMP 16GB memory machine. I haven't tested on
nommu environment currently.
This patch adds totalreserve_pages for __vm_enough_memory().
Calculate_totalreserve_pages() checks maximum lowmem_reserve pages and
pages_high in each zone. Finally, the function stores the sum of each
zone to totalreserve_pages.
The totalreserve_pages is calculated when the VM is initilized.
And the variable is updated when /proc/sys/vm/lowmem_reserve_raito
or /proc/sys/vm/min_free_kbytes are changed.
Signed-off-by: Hideo Aoki <haoki@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The code checks for newbrk with oldbrk which are page aligned before making
a check for the memory limit set of data segment. If the memory limit is
not page aligned in that case it bypasses the test for the limit if the
memory allocation is still for the same page.
Signed-off-by: Ram Gupta <ram.gupta5@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The earlier patch to consolidate mmu and nommu page allocation and
refcounting by using compound pages for nommu allocations had a bug:
kmalloc slabs who's pages were initially allocated by a non-__GFP_COMP
allocator could be passed into mm/nommu.c kmalloc allocations which really
wanted __GFP_COMP underlying pages. Fix that by having nommu pass
__GFP_COMP to all higher order slab allocations.
Signed-off-by: Luke Yang <luke.adi@gmail.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a statistics counter which is incremented everytime the alien cache
overflows. alien_cache limit is hardcoded to 12 right now. We can use
this statistics to tune alien cache if needed in the future.
Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rohit found an obscure bug causing buddy list corruption.
page_is_buddy is using a non-atomic test (PagePrivate && page_count == 0)
to determine whether or not a free page's buddy is itself free and in the
buddy lists.
Each of the conjuncts may be true at different times due to unrelated
conditions, so the non-atomic page_is_buddy test may find each conjunct to
be true even if they were not both true at the same time (ie. the page was
not on the buddy lists).
Signed-off-by: Martin Bligh <mbligh@google.com>
Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The node setup code would try to allocate the node metadata in the node
itself, but that fails if there is no memory in there.
This can happen with memory hotplug when the hotplug area defines an so
far empty node.
Now use bootmem to try to allocate the mem_map in other nodes.
And if it fails don't panic, but just ignore the node.
To make this work I added a new __alloc_bootmem_nopanic function that
does what its name implies.
TBD should try to use nearby nodes here. Currently we just use any.
It's hard to do it better because bootmem doesn't have proper fallback
lists yet.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch updates the comments to match the actual code.
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
fadvise() additions, do it in a new sys_sync_file_range() syscall instead.
Reasons:
- It's more flexible. Things which would require two or three syscalls with
fadvise() can be done in a single syscall.
- Using fadvise() in this manner is something not covered by POSIX.
The patch wires up the syscall for x86.
The sycall is implemented in the new fs/sync.c. The intention is that we can
move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.
Documentation for the syscall is in fs/sync.c.
A test app (sync_file_range.c) is in
http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.
The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
say NFS_DATA_SYNC or NFS_FILE_SYNC. I can skip the ->fsync call for
NFS_DATA_SYNC which is hopefully the more common."
Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
the queue is congested. This is trivial to fix: add a new flag bit, set
wbc->nonblocking. But I'm not sure that we want to expose implementation
details down to that level.
Note: it's notable that we can sync an fd which wasn't opened for writing.
Same with fsync() and fdatasync()).
Note: the code takes some care to handle attempts to sync file contents
outside the 16TB offset on 32-bit machines. It makes such attempts appear to
succeed, for best 32-bit/64-bit compatibility. Perhaps it should make such
requests fail...
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>