atomic_add_unless (atomic_inc_not_zero) no longer requires an offset refcount
to function correctly.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The VM has an interesting race where a page refcount can drop to zero, but it
is still on the LRU lists for a short time. This was solved by testing a 0->1
refcount transition when picking up pages from the LRU, and dropping the
refcount in that case.
Instead, use atomic_add_unless to ensure we never pick up a 0 refcount page
from the LRU, thus a 0 refcount page will never have its refcount elevated
until it is allocated again.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution
installation it's easiest if it's a boot time option.
Also I moved the variable to a more appropiate place and make
it independent from sysctl
And marked __read_mostly which it is.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Compound pages on SMP systems can now often be freed from pagetables via
the release_pages path. This uses put_page_testzero which does not handle
compound pages at all. Releasing constituent pages from process mappings
decrements their count to a large negative number and leaks the reference
at the head page - net result is a memory leak.
The problem was hidden because the debug check in put_page_testzero itself
actually did take compound pages into consideration.
Fix the bug and the debug check.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Arjan van de Ven <arjan@infradead.org>
Mark a number of functions as 'must inline'. The functions affected by this
patch need to be inlined because they use knowledge that their arguments are
constant so that most of the function optimizes away. At this point this
patch does not change behavior, it's for documentation only (and for future
patches in the inline series)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Move capable() from sched.h to capability.h;
- Use <linux/capability.h> where capable() is used
(in include/, block/, ipc/, kernel/, a few drivers/,
mm/, security/, & sound/;
many more drivers/ to go)
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Let's switch mutex_debug_check_no_locks_freed() to take (addr, len) as
arguments instead, since all its callers were just calculating the 'to'
address for themselves anyway... (and sometimes doing so badly).
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
more mutex debugging: check for held locks during memory freeing,
task exit, enable sysrq printouts, etc.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Reduce the size of the pageframe for NR_CPUS>4, CONFIG_PREEMPT back to the
minimal size by unionising both ->private and ->mapping with the pagetable
lock.
It uses an anonymous struct and hence requires gcc-3.x.
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add /proc/sys/vm/drop_caches. When written to, this will cause the kernel to
discard as much pagecache and/or reclaimable slab objects as it can. THis
operation requires root permissions.
It won't drop dirty data, so the user should run `sync' first.
Caveats:
a) Holds inode_lock for exorbitant amounts of time.
b) Needs to be taught about NUMA nodes: propagate these all the way through
so the discarding can be controlled on a per-node basis.
This is a debugging feature: useful for getting consistent results between
filesystem benchmarks. We could possibly put it under a config option, but
it's less than 300 bytes.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make the futex code compilable and usable on NOMMU by making the attempt to
handle page faults conditional on CONFIG_MMU. If this is not enabled, then
we can assume that EFAULT returned from futex_atomic_op_inuser() is not
recoverable, and that the address lies outside of valid memory.
handle_mm_fault() is made to BUG if called on NOMMU without attempting to
invoke the actual handler (__handle_mm_fault).
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch makes the SYSV IPC shared memory facilities use the new
ramfs facilities on a no-MMU kernel.
The following changes are made:
(1) There are now shmem_mmap() and shmem_get_unmapped_area() functions to
allow the IPC SHM facilities to commune with the tiny-shmem and shmem
code.
(2) ramfs files now need resizing using do_truncate() rather than by modifying
the inode size directly (see shmem_file_setup()). This causes ramfs to
attempt to bind a block of pages of sufficient size to the inode.
(3) CONFIG_SYSVIPC is no longer contingent on CONFIG_MMU.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix two warnings in ipc/shm.c
ipc/shm.c:122: warning: statement with no effect
ipc/shm.c:560: warning: statement with no effect
by converting the macros to empty inline functions. For safety, let's do
all three. This also has the advantage that typechecking gets performed
even without CONFIG_SHMEM enabled.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is the patch to implement madvise(MADV_REMOVE) - which frees up a
given range of pages & its associated backing store. Current
implementation supports only shmfs/tmpfs and other filesystems return
-ENOSYS.
"Some app allocates large tmpfs files, then when some task quits and some
client disconnect, some memory can be released. However the only way to
release tmpfs-swap is to MADV_REMOVE". - Andrea Arcangeli
Databases want to use this feature to drop a section of their bufferpool
(shared memory segments) - without writing back to disk/swap space.
This feature is also useful for supporting hot-plug memory on UML.
Concerns raised by Andrew Morton:
- "We have no plan for holepunching! If we _do_ have such a plan (or
might in the future) then what would the API look like? I think
sys_holepunch(fd, start, len), so we should start out with that."
- Using madvise is very weird, because people will ask "why do I need to
mmap my file before I can stick a hole in it?"
- None of the other madvise operations call into the filesystem in this
manner. A broad question is: is this capability an MM operation or a
filesytem operation? truncate, for example, is a filesystem operation
which sometimes has MM side-effects. madvise is an mm operation and with
this patch, it gains FS side-effects, only they're really, really
significant ones."
Comments:
- Andrea suggested the fs operation too but then it's more efficient to
have it as a mm operation with fs side effects, because they don't
immediatly know fd and physical offset of the range. It's possible to
fixup in userland and to use the fs operation but it's more expensive,
the vmas are already in the kernel and we can use them.
Short term plan & Future Direction:
- We seem to need this interface only for shmfs/tmpfs files in the short
term. We have to add hooks into the filesystem for correctness and
completeness. This is what this patch does.
- In the future, plan is to support both fs and mmap apis also. This
also involves (other) filesystem specific functions to be implemented.
- Current patch doesn't support VM_NONLINEAR - which can be addressed in
the future.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes truncate_inode_pages_range from truncate_inode_pages.
truncate_inode_pages became a one-liner call to truncate_inode_pages_range.
Reiser4 needs truncate_inode_pages_ranges because it tries to keep
correspondence between existences of metadata pointing to data pages and pages
to which those metadata point to. So, when metadata of certain part of file
is removed from filesystem tree, only pages of corresponding range are to be
truncated.
(Needed by the madvise(MADV_REMOVE) patch)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The logic that decides that a fork() might be able to avoid copying a VM
area when it can be re-created by page faults didn't know about the new
vm_insert_page() case.
Also make some things a bit more anal wrt VM_PFNMAP.
Pointed out by Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With the previous commit, we can handle arbitrary shared re-mappings
even without this complexity, and since the only known private mappings
are for strange users of /dev/mem (which never create an incomplete one),
there seems to be no reason to support it.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is what a lot of drivers will actually want to use to insert
individual pages into a user VMA. It doesn't have the old PageReserved
restrictions of remap_pfn_range(), and it doesn't complain about partial
remappings.
The page you insert needs to be a nice clean kernel allocation, so you
can't insert arbitrary page mappings with this, but that's not what
people want.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These get created by some drivers that don't generally even want a pfn
remapping at all, but would really mostly prefer to just map pages
they've allocated individually instead.
For now, create a helper function that turns such an incomplete PFN
remapping call into a loop that does that explicit mapping. In the long
run we almost certainly want to export a totally different interface for
that, though.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This replaces the (in my opinion horrible) VM_UNMAPPED logic with very
explicit support for a "remapped page range" aka VM_PFNMAP. It allows a
VM area to contain an arbitrary range of page table entries that the VM
never touches, and never considers to be normal pages.
Any user of "remap_pfn_range()" automatically gets this new
functionality, and doesn't even have to mark the pages reserved or
indeed mark them any other way. It just works. As a side effect, doing
mmap() on /dev/mem works for arbitrary ranges.
Sparc update from David in the next commit.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Although we tend to associate VM_RESERVED with remap_pfn_range, quite a few
drivers set VM_RESERVED on areas which are then populated by nopage. The
PageReserved removal in 2.6.15-rc1 changed VM_RESERVED not to free pages in
zap_pte_range, without changing those drivers not to set it: so their pages
just leak away.
Let's not change miscellaneous drivers now: introduce VM_UNPAGED at the core,
to flag the special areas where the ptes may have no struct page, or if they
have then it's not to be touched. Replace most instances of VM_RESERVED in
core mm by VM_UNPAGED. Force it on in remap_pfn_range, and the sparc and
sparc64 io_remap_pfn_range.
Revert addition of VM_RESERVED to powerpc vdso, it's not needed there. Is it
needed anywhere? It still governs the mm->reserved_vm statistic, and special
vmas not to be merged, and areas not to be core dumped; but could probably be
eliminated later (the drivers are probably specifying it because in 2.4 it
kept swapout off the vma, but in 2.6 we work from the LRU, which these pages
don't get on).
Use the VM_SHM slot for VM_UNPAGED, and define VM_SHM to 0: it serves no
purpose whatsoever, and should be removed from drivers when we clean up.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It looks like snd_xxx is not the only nopage to be using PageReserved as a way
of holding a high-order page together: which no longer works, but is masked by
our failure to free from VM_RESERVED areas. We cannot fix that bug without
first substituting another way to hold the high-order page together, while
farming out the 0-order pages from within it.
That's just what PageCompound is designed for, but it's been kept under
CONFIG_HUGETLB_PAGE. Remove the #ifdefs: which saves some space (out- of-line
put_page), doesn't slow down what most needs to be fast (already using
hugetlb), and unifies the way we handle high-order pages.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>