Commit Graph

2180 Commits

Author SHA1 Message Date
Pekka Enberg
8fea4e96a8 [PATCH] slab: object to index mapping cleanup
Clean up the object to index mapping that has been spread around mm/slab.c.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:58 -08:00
Nick Piggin
a482289d46 [PATCH] hugepage allocator cleanup
Insert "fresh" huge pages into the hugepage allocator by the same means as
they are freed back into it.  This reduces code size and allows
enqueue_huge_page to be inlined into the hugepage free fastpath.

Eliminate occurances of hugepages on the free list with non-zero refcount.
This can allow stricter refcount checks in future.  Also required for
lockless pagecache.

Signed-off-by: Nick Piggin <npiggin@suse.de>

"This patch also eliminates a leak "cleaned up" by re-clobbering the
refcount on every allocation from the hugepage freelists.  With respect to
the lockless pagecache, the crucial aspect is to eliminate unconditional
set_page_count() to 0 on pages with potentially nonzero refcounts, though
closer inspection suggests the assignments removed are entirely spurious."

Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:58 -08:00
Nick Piggin
545b1ea9bf [PATCH] mm: cleanup bootmem
The bootmem code added to page_alloc.c duplicated some page freeing code
that it really doesn't need to because it is not so performance critical.

While we're here, make prefetching work properly by actually prefetching
the page we're about to use before prefetching ahead to the next one (ie.
get the most important transaction started first).  Also prefetch just a
single page ahead rather than leaving a gap of 16.

Jack Steiner reported no problems with SGI's ia64 simulator.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:58 -08:00
Nick Piggin
8dfcc9ba27 [PATCH] mm: split highorder pages
Have an explicit mm call to split higher order pages into individual pages.
 Should help to avoid bugs and be more explicit about the code's intention.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
7c8ee9a863 [PATCH] mm: simplify vmscan vs release refcounting
The VM has an interesting race where a page refcount can drop to zero, but it
is still on the LRU lists for a short time.  This was solved by testing a 0->1
refcount transition when picking up pages from the LRU, and dropping the
refcount in that case.

Instead, use atomic_add_unless to ensure we never pick up a 0 refcount page
from the LRU, thus a 0 refcount page will never have its refcount elevated
until it is allocated again.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
f205b2fe62 [PATCH] mm: slab less atomics
Atomic operation removal from slab

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
5e9dace8d3 [PATCH] mm: page_alloc less atomics
More atomic operation removal from page allocator

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
674539115c [PATCH] mm: less atomic ops
In the page release paths, we can be sure that nobody will mess with our
page->flags because the refcount has dropped to 0.  So no need for atomic
operations here.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
4c84cacfa4 [PATCH] mm: PageActive no testset
PG_active is protected by zone->lru_lock, it does not need TestSet/TestClear
operations.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:57 -08:00
Nick Piggin
8d438f96d2 [PATCH] mm: PageLRU no testset
PG_lru is protected by zone->lru_lock. It does not need TestSet/TestClear
operations.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:56 -08:00
Nick Piggin
46453a6e19 [PATCH] mm: never ClearPageLRU released pages
If vmscan finds a zero refcount page on the lru list, never ClearPageLRU
it.  This means the release code need not hold ->lru_lock to stabilise
PageLRU, so that lock may be skipped entirely when releasing !PageLRU pages
(because we know PageLRU won't have been temporarily cleared by vmscan,
which was previously guaranteed by holding the lock to synchronise against
vmscan).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:56 -08:00
Andrew Morton
b40607fc02 [PATCH] __get_page_state() cpumask cleanup and fix
__get_page_state() has an open-coded for_each_cpu_mask() loop in it.

Tidy that up, then notice that the code was buggy:

	while (cpu < NR_CPUS) {
		unsigned long *in, *out, off;

		if (!cpu_isset(cpu, *cpumask))
			continue;

an obvious infinite loop.  I guess we just never call it with a holey cpu
mask.

Even after my cpumask size-reduction work, this patch increases code size :(

Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:55 -08:00
Steven Whitehouse
9a21247181 Merge branch 'master' 2006-03-20 12:47:40 -05:00
Hugh Dickins
6f5e6b9e69 [PATCH] fix free swap cache latency
Lee Revell reported 28ms latency when process with lots of swapped memory
exits.

2.6.15 introduced a latency regression when unmapping: in accounting the
zap_work latency breaker, pte_none counted 1, pte_present PAGE_SIZE, but a
swap entry counted nothing at all.  We think of pages present as the slow
case, but Lee's trace shows that free_swap_and_cache's radix tree lookup
can make a lot of work - and we could have been doing it many thousands of
times without a latency break.

Move the zap_work update up to account swap entries like pages present.
This does account non-linear pte_file entries, and unmap_mapping_range
skipping over swap entries, by the same amount even though they're quick:
but neither of those cases deserves complicating the code (and they're
treated no worse than they were in 2.6.14).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-17 07:51:26 -08:00
Christoph Lameter
5b40dc780e [PATCH] fix race in pagevec_strip?
We can call try_to_release_page() with PagePrivate off and a valid
page->mapping This may cause all sorts of trouble for the filesystem
*_releasepage() handlers.  XFS bombs out in that case.

Lock the page before checking for page private.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-17 07:51:25 -08:00
Christoph Lameter
90036ee593 [PATCH] page migration: Fail with error if swap not setup
Currently the migration of anonymous pages will silently fail if no swap is
setup.  This patch makes page migration functions check for available swap
and fail with -ENODEV if no swap space is available.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-17 07:51:25 -08:00
Christoph Lameter
74c0024105 [PATCH] Consistent capabilites associated with MPOL_MOVE_ALL
It seems that setting scheduling policy and priorities is also the kind of
thing that might be performed in apps that also use the NUMA API, so it
would seem consistent to use CAP_SYS_NICE for NUMA also.

So use CAP_SYS_NICE for controlling migration permissions.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-14 21:43:02 -08:00
Christoph Lameter
4983da07f1 [PATCH] page migration: fail if page is in a vma flagged VM_LOCKED
page migration currently simply retries a couple of times if try_to_unmap()
fails without inspecting the return code.

However, SWAP_FAIL indicates that the page is in a vma that has the
VM_LOCKED flag set (if ignore_refs ==1).  We can check for that return code
and avoid retrying the migration.

migrate_page_remove_references() now needs to return a reason why the
failure occured.  So switch migrate_page_remove_references to use -Exx
style error messages.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-14 21:43:02 -08:00
Christoph Lameter
8fce4d8e3b [PATCH] slab: Node rotor for freeing alien caches and remote per cpu pages.
The cache reaper currently tries to free all alien caches and all remote
per cpu pages in each pass of cache_reap.  For a machines with large number
of nodes (such as Altix) this may lead to sporadic delays of around ~10ms.
Interrupts are disabled while reclaiming creating unacceptable delays.

This patch changes that behavior by adding a per cpu reap_node variable.
Instead of attempting to free all caches, we free only one alien cache and
the per cpu pages from one remote node.  That reduces the time spend in
cache_reap.  However, doing so will lengthen the time it takes to
completely drain all remote per cpu pagesets and all alien caches.  The
time needed will grow with the number of nodes in the system.  All caches
are drained when they overflow their respective capacity.  So the drawback
here is only that a bit of memory may be wasted for awhile longer.

Details:

1. Rename drain_remote_pages to drain_node_pages to allow the specification
   of the node to drain of pcp pages.

2. Add additional functions init_reap_node, next_reap_node for NUMA
   that manage a per cpu reap_node counter.

3. Add a reap_alien function that reaps only from the current reap_node.

For us this seems to be a critical issue.  Holdoffs of an average of ~7ms
cause some HPC benchmarks to slow down significantly.  F.e.  NAS parallel
slows down dramatically.  NAS parallel has a 12-16 seconds runtime w/o rotor
compared to 5.8 secs with the rotor patches.  It gets down to 5.05 secs with
the additional interrupt holdoff reductions.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09 19:47:38 -08:00
Yasunori Goto
f2937be589 [PATCH] memory hotadd: pgdat->node_present_pages fix
When pages are onlined, not only zone->present_pages but also
pgdat->node_present_pages should be refreshed.

This parameter is used to show information at
/sys/device/system/node/nodeX/meminfo via si_meminfo_node().

So, it shows strange value for MemUsed which is calculated
(node_present_pages - all zones free pages).

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09 19:47:38 -08:00
Christoph Lameter
a6bf527091 [PATCH] vmscan: no zone_reclaim if PF_MALLOC is set
If the process has already set PF_MALLOC and is already using
current->reclaim_state then do not try to reclaim memory from the zone.
This is set by kswapd and/or synchrononous global reclaim which will not
take it lightly if we zap the reclaim_state.

Signed-off-by: Christoph Lameter <clameter@sig.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09 19:47:37 -08:00
Hugh Dickins
85a6cd03a9 [PATCH] page_add_file_rmap(): remove BUG_ON()s
Remove two early-development BUG_ONs from page_add_file_rmap.

The pfn_valid test (originally useful for checking that nobody passed an
artificial struct page) comes too late, since we already have the struct
page.

The PageAnon test (useful when anon was first distinguished from file rmap)
prevents ->nopage implementations from reusing ->mapping, which would
otherwise be available.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-09 19:47:36 -08:00
Jack Steiner
07ed76b2a0 [PATCH] slab: allocate larger cache_cache if order 0 fails
kmem_cache_init() incorrectly assumes that the cache_cache object will fit
in an order 0 allocation.  On very large systems, this is not true.  Change
the code to try larger order allocations if order 0 fails.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 14:15:04 -08:00
Andrew Morton
e2bab3d924 [PATCH] percpu_counter_sum()
Implement percpu_counter_sum().  This is a more accurate but slower version of
percpu_counter_read_positive().

We need this for Alex's speedup-ext3_statfs patch and for the nr_file
accounting fix.  Otherwise these things would be too inaccurate on large CPU
counts.

Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
Cc: Alex Tomas <alex@clusterfs.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 14:14:01 -08:00
Andrew Morton
7f709ed0e3 [PATCH] numa_maps-update fix
Fix the mm/mempolicy.c build for !CONFIG_HUGETLB_PAGE.

Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Martin Bligh <mbligh@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 14:14:00 -08:00