Commit Graph

2180 Commits

Author SHA1 Message Date
Hugh Dickins
d98c7a0984 [PATCH] compound page: default destructor
Somehow I imagined that calling a NULL destructor would free a compound page
rather than oopsing.  No, we must supply a default destructor, __free_pages_ok
using the order noted by prep_compound_page.  hugetlb can still replace this
as before with its own free_huge_page pointer.

The case that needs this is not common: rarely does put_compound_page's
put_page_testzero bring the count down to 0.  But if get_user_pages is applied
to some part of a compound page, without immediate release (e.g.  AIO or
Infiniband), then it's possible for its put_page to come after the containing
vma has been unmapped and the driver done its free_pages.

That's just the kind of case compound pages are supposed to be guarding
against (but Nick points out, nor did PageReserved handle this right).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-14 16:09:33 -08:00
Hugh Dickins
41d78ba550 [PATCH] compound page: use page[1].lru
If a compound page has its own put_page_testzero destructor (the only current
example is free_huge_page), that is noted in page[1].mapping of the compound
page.  But that's rather a poor place to keep it: functions which call
set_page_dirty_lock after get_user_pages (e.g.  Infiniband's
__ib_umem_release) ought to be checking first, otherwise set_page_dirty is
liable to crash on what's not the address of a struct address_space.

And now I'm about to make that worse: it turns out that every compound page
needs a destructor, so we can no longer rely on hugetlb pages going their own
special way, to avoid further problems of page->mapping reuse.  For example,
not many people know that: on 50% of i386 -Os builds, the first tail page of a
compound page purports to be PageAnon (when its destructor has an odd
address), which surprises page_add_file_rmap.

Keep the compound page destructor in page[1].lru.next instead.  And to free up
the common pairing of mapping and index, also move compound page order from
index to lru.prev.  Slab reuses page->lru too: but if we ever need slab to use
compound pages, it can easily stack its use above this.

(akpm: decoded version of the above: the tail pages of a compound page now
have ->mapping==NULL, so there's no need for the set_page_dirty[_lock]()
caller to check that they're not compund pages before doing the dirty).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-14 16:09:33 -08:00
Christoph Lameter
2903fb1694 [PATCH] vmscan: skip reclaim_mapped determination if we do not swap
This puts the variables and the way to get to reclaim_mapped in one block.
And allows zone_reclaim or other things to skip the determination (maybe
this whole block of code does not belong into refill_inactive_zone()?)

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-11 21:41:11 -08:00
Christoph Lameter
072eaa5d9c [PATCH] vmscan: remove duplicate increment of reclaim_in_progress
shrink_zone() already increments reclaim_in_progress.  No need to do it in
balance_pgdat.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-11 21:41:11 -08:00
Christoph Lameter
80e4342601 [PATCH] zone reclaim: do not check references to a page during zone reclaim
shrink_list() and refill_inactive() check all ptes pointing to a page for
reference bits in order to decide if the page should be put on the active
list.  This is not necessary for zone_reclaim since we are only interested
in removing unmapped pages.  Skip the checks in both functions.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-11 21:41:11 -08:00
Christoph Lameter
418aade459 [PATCH] Updates for page migration
This adds some additional comments in order to help others figure out how
exactly the code works.  And fix a variable name.

Also swap_page does need to ignore all reference bits when unmapping a
page.  Otherwise we may have to repeatedly unmap a frequently touched page.
So change the try_to_unmap parameter to 1.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-10 08:13:13 -08:00
Ravikiran G Thirumalai
f0188f4748 [PATCH] slab: Avoid deadlock at kmem_cache_create/kmem_cache_destroy
Prevents deadlock situation between
kmem_cache_create()/kmem_cache_destory(), and kmem_cache_create() /cpu
hotplug.  The locking order probably got moved over time.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-10 08:13:12 -08:00
Ingo Molnar
9934a7939e [PATCH] SLOB=y && SMP=y fix
fix CONFIG_SLOB=y (when CONFIG_SMP=y): get rid of the 'align' parameter
from its __alloc_percpu() implementation. Boot-tested on x86.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-08 07:52:58 -08:00
Nick Piggin
8519fb30e4 [PATCH] mm: compound release fix
Compound pages on SMP systems can now often be freed from pagetables via
the release_pages path.  This uses put_page_testzero which does not handle
compound pages at all.  Releasing constituent pages from process mappings
decrements their count to a large negative number and leaks the reference
at the head page - net result is a memory leak.

The problem was hidden because the debug check in put_page_testzero itself
actually did take compound pages into consideration.

Fix the bug and the debug check.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-07 16:12:33 -08:00
Christoph Lameter
0df420d8b6 [PATCH] hugetlbpage: return VM_FAULT_OOM on oom
Remove wrong and misleading comments.

Return VM_FAULT_OOM if the hugetlbpage fault handler cannot allocate a
page.  do_no_page will end up doing do_exit(SIGKILL).

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-07 16:12:31 -08:00
David Gibson
a2dfef6947 [PATCH] Hugepages need clear_user_highpage() not clear_highpage()
When hugepages are newly allocated to a file in mm/hugetlb.c, we clear them
with a call to clear_highpage() on each of the subpages.  We should be
using clear_user_highpage(): on powerpc, at least, clear_highpage() doesn't
correctly mark the page as icache dirty so if the page is executed shortly
after it's possible to get strange results.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-07 16:12:31 -08:00
Linus Torvalds
7a21ef6fe9 mm/slab.c (non-NUMA): Fix compile warning and clean up code
The non-NUMA case would do an unmatched "free_alien_cache()" on an alien
pointer that had never been allocated.

It might not matter from a code generation standpoint (since in the
non-NUMA case, the code doesn't actually _do_ anything), but it not only
results in a compiler warning, it's really really ugly too.

Fix the compiler warning by just having a matching dummy allocation.
That also avoids an unnecessary #ifdef in the code.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-05 11:26:38 -08:00
Ravikiran G Thirumalai
4484ebf12b [PATCH] NUMA slab locking fixes: fix cpu down and up locking
This fixes locking and bugs in cpu_down and cpu_up paths of the NUMA slab
allocator.  Sonny Rao <sonny@burdell.org> reported problems sometime back on
POWER5 boxes, when the last cpu on the nodes were being offlined.  We could
not reproduce the same on x86_64 because the cpumask (node_to_cpumask) was not
being updated on cpu down.  Since that issue is now fixed, we can reproduce
Sonny's problems on x86_64 NUMA, and here is the fix.

The problem earlier was on CPU_DOWN, if it was the last cpu on the node to go
down, the array_caches (shared, alien) and the kmem_list3 of the node were
being freed (kfree) with the kmem_list3 lock held.  If the l3 or the
array_caches were to come from the same cache being cleared, we hit on
badness.

This patch cleans up the locking in cpu_up and cpu_down path.  We cannot
really free l3 on cpu down because, there is no node offlining yet and even
though a cpu is not yet up, node local memory can be allocated for it.  So l3s
are usually allocated at keme_cache_create and destroyed at
kmem_cache_destroy.  Hence, we don't need cachep->spinlock protection to get
to the cachep->nodelist[nodeid] either.

Patch survived onlining and offlining on a 4 core 2 node Tyan box with a 4
dbench process running all the time.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-05 11:06:53 -08:00
Ravikiran G Thirumalai
ca3b9b9173 [PATCH] NUMA slab locking fixes: irq disabling from cahep->spinlock to l3 lock
Earlier, we had to disable on chip interrupts while taking the
cachep->spinlock because, at cache_grow, on every addition of a slab to a slab
cache, we incremented colour_next which was protected by the cachep->spinlock,
and cache_grow could occur at interrupt context.  Since, now we protect the
per-node colour_next with the node's list_lock, we do not need to disable on
chip interrupts while taking the per-cache spinlock, but we just need to
disable interrupts when taking the per-node kmem_list3 list_lock.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-05 11:06:53 -08:00
Ravikiran G Thirumalai
2e1217cf96 [PATCH] NUMA slab locking fixes: move color_next to l3
colour_next is used as an index to add a colouring offset to a new slab in the
cache (colour_off * colour_next).  Now with the NUMA aware slab allocator, it
makes sense to colour slabs added on the same node sequentially with
colour_next.

This patch moves the colouring index "colour_next" per-node by placing it on
kmem_list3 rather than kmem_cache.

This also helps simplify locking for CPU up and down paths.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-05 11:06:53 -08:00
Christoph Lameter
64b4a954b0 [PATCH] hugetlb: add comment explaining reasons for Bus Errors
I just spent some time researching a Bus Error.  Turns out that the huge
page fault handler can return VM_FAULT_SIGBUS for various conditions where
no huge page is available.

Add a note explaining the reasoning in the source.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-05 11:06:53 -08:00
Eric Dumazet
88a2a4ac6b [PATCH] percpu data: only iterate over possible CPUs
percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
cpudata, instead of allocating memory only for possible cpus.

As a preparation for changing that, we need to convert various 0 -> NR_CPUS
loops to use for_each_cpu().

(The above only applies to users of asm-generic/percpu.h.  powerpc has gone it
alone and is presently only allocating memory for present CPUs, so it's
currently corrupting memory).

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-05 11:06:51 -08:00
Chen, Kenneth W
00ac59adfc [PATCH] x86_64: Fix memory policy build without CONFIG_HUGETLBFS
> mm/mempolicy.c: In function `huge_zonelist':
> mm/mempolicy.c:1045: error: `HPAGE_SHIFT' undeclared (first use in this function)
> mm/mempolicy.c:1045: error: (Each undeclared identifier is reported only once
> mm/mempolicy.c:1045: error: for each function it appears in.)
> make[1]: *** [mm/mempolicy.o] Error 1

Need to wrap huge_zonelist function with CONFIG_HUGETLBFS.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04 16:43:14 -08:00
Randy Dunlap
ee13d785ea [PATCH] slab: fix sparse warning
mm/slab.c:1522:13: error: incompatible types for operation (&)

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01 08:53:18 -08:00
Randy.Dunlap
a70773ddb9 [PATCH] mm/slab: add kernel-doc for one function
Fix kernel-doc for calculate_slab_order().

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01 08:53:18 -08:00
Pekka Enberg
7fd6b14130 [PATCH] slab: fix kzalloc and kstrdup caller report for CONFIG_DEBUG_SLAB
Fix kzalloc() and kstrdup() caller report for CONFIG_DEBUG_SLAB.  We must
pass the caller to __cache_alloc() instead of directly doing
__builtin_return_address(0) there; otherwise kzalloc() and kstrdup() are
reported as the allocation site instead of the real one.

Thanks to Valdis Kletnieks for reporting the problem and Steven Rostedt for
the original idea.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01 08:53:18 -08:00
Andrew Morton
b958f7d9f3 [PATCH] dump_stack() in oom handler
Sometimes it's nice to know who's calling.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01 08:53:18 -08:00
Pekka Enberg
343e0d7a93 [PATCH] slab: replace kmem_cache_t with struct kmem_cache
Replace uses of kmem_cache_t with proper struct kmem_cache in mm/slab.c.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01 08:53:18 -08:00
Pekka Enberg
9a2dba4b49 [PATCH] slab: rename ac_data to cpu_cache_get
Rename the ac_data() function to more descriptive cpu_cache_get().

Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01 08:53:18 -08:00
Pekka Enberg
6ed5eb2211 [PATCH] slab: extract virt_to_{cache|slab}
Introduce virt_to_cache() and virt_to_slab() functions to reduce duplicate
code and introduce a proper abstraction should we want to support other kind
of mapping for address to slab and cache (eg.  for vmalloc() or I/O memory).

Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01 08:53:18 -08:00