Commit Graph

208 Commits

Author SHA1 Message Date
Al Viro
260b23674f [PATCH] gfp_t: the rest
zone handling, mapping->flags handling

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28 08:16:51 -07:00
Al Viro
6daa0e2862 [PATCH] gfp_t: mm/* (easy parts)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28 08:16:47 -07:00
Al Viro
af4ca457ea [PATCH] gfp_t: infrastructure
Beginning of gfp_t annotations:

 - -Wbitwise added to CHECKFLAGS
 - old __bitwise renamed to __bitwise__
 - __bitwise defined to either __bitwise__ or nothing, depending on
   __CHECK_ENDIAN__ being defined
 - gfp_t switched from __nocast to __bitwise__
 - force cast to gfp_t added to __GFP_... constants
 - new helper - gfp_zone(); extracts zone bits out of gfp_t value and casts
   the result to int

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28 08:16:46 -07:00
Magnus Damm
1c6fe94659 [PATCH] NUMA: broken per cpu pageset counters
The NUMA counters in struct per_cpu_pageset (linux/mmzone.h) are never
cleared today.  This works ok for CPU 0 on NUMA machines because
boot_pageset[] is already zero, but for other CPU:s this results in
uninitialized counters.

Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-26 10:39:43 -07:00
Hugh Dickins
ac9b9c667c [PATCH] Fix handling spurious page fault for hugetlb region
This reverts commit 3359b54c8c and
replaces it with a cleaner version that is purely based on page table
operations, so that the synchronization between inode size and hugetlb
mappings becomes moot.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-20 09:02:07 -07:00
Yasunori Goto
281dd25cdc [PATCH] swiotlb: make sure initial DMA allocations really are in DMA memory
This introduces a limit parameter to the core bootmem allocator; The new
parameter indicates that physical memory allocated by the bootmem
allocator should be within the requested limit.

We also introduce alloc_bootmem_low_pages_limit, alloc_bootmem_node_limit,
alloc_bootmem_low_pages_node_limit apis, but alloc_bootmem_low_pages_limit
is the only api used for swiotlb.

The existing alloc_bootmem_low_pages() api could instead have been
changed and made to pass right limit to the core allocator.  But that
would make the patch more intrusive for 2.6.14, as other arches use
alloc_bootmem_low_pages().  We may be done that post 2.6.14 as a
cleanup.

With this, swiotlb gets memory within 4G for both x86_64 and ia64
arches.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-19 23:11:33 -07:00
Hugh Dickins
1c59827d1d [PATCH] mm: hugetlb truncation fixes
hugetlbfs allows truncation of its files (should it?), but hugetlb.c often
forgets that: crashes and misaccounting ensue.

copy_hugetlb_page_range better grab the src page_table_lock since we don't
want to guess what happens if concurrently truncated.  unmap_hugepage_range
rss accounting must not assume the full range was mapped.  follow_hugetlb_page
must guard with page_table_lock and be prepared to exit early.

Restyle copy_hugetlb_page_range with a for loop like the others there.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-19 23:04:30 -07:00
Seth, Rohit
3359b54c8c [PATCH] Handle spurious page fault for hugetlb region
The hugetlb pages are currently pre-faulted.  At the time of mmap of
hugepages, we populate the new PTEs.  It is possible that HW has already
cached some of the unused PTEs internally.  These stale entries never
get a chance to be purged in existing control flow.

This patch extends the check in page fault code for hugepages.  Check if
a faulted address falls with in size for the hugetlb file backing it.
We return VM_FAULT_MINOR for these cases (assuming that the arch
specific page-faulting code purges the stale entry for the archs that
need it).

Signed-off-by: Rohit Seth <rohit.seth@intel.com>

[ This is apparently arguably an ia64 port bug. But the code won't
  hurt, and for now it fixes a real problem on some ia64 machines ]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-19 13:56:27 -07:00
Linus Torvalds
3d80636a0d Fix memory ordering bug in page reclaim
As noticed by Nick Piggin, we need to make sure that we check the page
count before we check for PageDirty, since the dirty check is only valid
if the count implies that we're the only possible ones holding the page.

We always did do this, but the code needs a read-memory-barrier to make
sure that the orderign is also honored by the CPU.

(The writer side is ordered due to the atomic decrement and test on the
page count, see the discussion on linux-kernel)

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-16 17:36:06 -07:00
Hugh Dickins
f5154a98a1 [PATCH] Don't map the same page too much
Refuse to install a page into a mapping if the mapping count is already
ridiculously large.

You probably cannot trigger this on 32-bit architectures, but on a
64-bit setup we should protect against it.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-11 12:03:47 -07:00
Suzuki
1bef400329 [PATCH] madvise: Avoid returning error code -EBADF for anonymous mappings
Revert this recent correctness change: Douglas Crosher <dcrosher@scieneer.com>
reported that it broke an existing application, and that madvise() works
without error on anonymous mappings on Solaris.

This means that madvise() will remain non-standards-compliant: we should
return -EBADF for all requests against non-file-backed vma's, but Linux only
does this for MADV_WILLNEED requests.

Signed-off-by: Suzuki K P <suzuki@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-11 09:46:54 -07:00
Al Viro
dd0fc66fb3 [PATCH] gfp flags annotations - part 1
- added typedef unsigned int __nocast gfp_t;

 - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
   the same warnings as far as sparse is concerned, doesn't change
   generated code (from gcc point of view we replaced unsigned int with
   typedef) and documents what's going on far better.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-08 15:00:57 -07:00
Linus Torvalds
6e3254c4e2 Revert "x86-64: Reverse order of bootmem lists"
As requested by Thomas Gleixner <tglx@linutronix.de>:

  "5d3d0f7704ed0bc7eaca0501eeae3e5da1ea6c87 breaks a couple of ARM
   boards, which depend on the historical bootmem allocation order.
   There is a cleaner solution around to remove the pgdat list
   completely, but this is a topic for post 2.6.14

   Andi signalled ACK already."

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-30 12:38:27 -07:00
Alok N Kataria
5c38230087 [PATCH] kmalloc_node IRQ safety fix
In kmalloc_node we are checking if the allocation is for the same node when
interrupts are "on".  This may lead to an allocation on another node than
intended.

This patch just shifts the check for the current node in __cache_alloc_node
when interrupts are disabled.

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:42 -07:00
Nick Piggin
8b1f312461 [PATCH] mm: move_pte to remap ZERO_PAGE
Move the ZERO_PAGE remapping complexity to the move_pte macro in
asm-generic, have it conditionally depend on
__HAVE_ARCH_MULTIPLE_ZERO_PAGE, which gets defined for MIPS.

For architectures without __HAVE_ARCH_MULTIPLE_ZERO_PAGE, move_pte becomes
a noop.

From: Hugh Dickins <hugh@veritas.com>

Fix nasty little bug we've missed in Nick's mremap move ZERO_PAGE patch.
The "pte" at that point may be a swap entry or a pte_file entry: we must
check pte_present before perhaps corrupting such an entry.

Patch below against 2.6.14-rc2-mm1, but the same bug is in 2.6.14-rc2's
mm/mremap.c, and more dangerous there since it's affecting all arches: I
think the safest course is to send Nick's patch and Yoichi's build fix and
this fix (build tested) on to Linus - so only MIPS can be affected.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:40 -07:00
Andrew Morton
dbdb904500 [PATCH] revert oversized kmalloc check
As davem points out, this wasn't such a great idea.  There may be some code
which does:

	size = 1024*1024;
	while (kmalloc(size, ...) == 0)
		size /= 2;

which will now explode.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-23 13:35:37 -07:00
Rob Landley
f7b3a4359b [PATCH] Fix bd_claim() error code.
Problem: In some circumstances, bd_claim() is returning the wrong error
code.

If we try to swapon an unused block device that isn't swap formatted, we
get -EINVAL.  But if that same block device is already mounted, we instead
get -EBUSY, even though it still isn't a valid swap device.

This issue came up on the busybox list trying to get the error message
from "swapon -a" right.  If a swap device is already enabled, we get -EBUSY,
and we shouldn't report this as an error.  But we can't distinguish the two
-EBUSY conditions, which are very different errors.

In the code, bd_claim() returns either 0 or -EBUSY, but in this case busy
means "somebody other than sys_swapon has already claimed this", and
_that_ means this block device can't be a valid swap device.  So return
-EINVAL there.

Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:37 -07:00
Christoph Lameter
eafb42707b [PATCH] __kmalloc: Generate BUG if size requested is too large.
I had an issue on ia64 where I got a bug in kernel/workqueue because
kzalloc returned a NULL pointer due to the task structure getting too big
for the slab allocator.  Usually these cases are caught by the kmalloc
macro in include/linux/slab.h.

Compilation will fail if a too big value is passed to kmalloc.

However, kzalloc uses __kmalloc which has no check for that.  This patch
makes __kmalloc bug if a too large entity is requested.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:36 -07:00
Christoph Lameter
ff69416e63 [PATCH] slab: fix handling of pages from foreign NUMA nodes
The numa slab allocator may allocate pages from foreign nodes onto the
lists for a particular node if a node runs out of memory.  Inspecting the
slab->nodeid field will not reflect that the page is now in use for the
slabs of another node.

This patch fixes that issue by adding a node field to free_block so that
the caller can indicate which node currently uses a slab.

Also removes the check for the current node from kmalloc_cache_node since
the process may shift later to another node which may lead to an allocation
on another node than intended.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:35 -07:00
Ivan Kokshaysky
7243cc05ba [PATCH] slab: alpha inlining fix
It is essential that index_of() be inlined.  But alpha undoes the gcc
inlining hackery and index_of() ends up out-of-line.  So fiddle with things
to make that function inline again.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-22 22:17:34 -07:00
Paolo 'Blaisorblade' Giarrusso
7e2cff42cf [PATCH] mm: add a note about partially hardcoded VM_* flags
Hugh made me note this line for permission checking in mprotect():

		if ((newflags & ~(newflags >> 4)) & 0xf) {

after figuring out what's that about, I decided it's nasty enough.  Btw
Hugh itself didn't like the 0xf.

We can safely change it to VM_READ|VM_WRITE|VM_EXEC because we never change
VM_SHARED, so no need to check that.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-21 10:11:55 -07:00
Paolo 'Blaisorblade' Giarrusso
f10df68604 [PATCH] fix locking comment in unmap_region()
That comment is plain wrong (we even take the pagetable lock inside
unmap_region()).

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-21 10:11:55 -07:00
Dave Hansen
f3519f9194 [PATCH] fix mm/Kconfig spelling
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17 11:50:01 -07:00
Alok Kataria
c7e43c78ae [PATCH] Fix slab BUG_ON() triggered by change in array cache size
With the new changes that we made in the initialization of the slab
allocator, we first setup the cache from which array caches are allocated,
and then the cache, from which kmem_list3's are allocated.

Now if the array cache comes from a cache in which objsize > 32, (in this
instance size-64) then, first size-64 cache will be allocated and then the
size-128 (if this is the cache from which kmem_list3's are going to be
allocated).

So with these new changes, we are not guaranteed that we will be
initializing the malloc_sizes array in a serialized order. Thus there is
a bug in __find_general_cachep, as we are checking whether the first
cache_sizes ptr is NULL.

This is replaced by checking whether the array-cache cache is initialized.
Attached is a patch which does that.  Boots fine on a x86-64, with
DEBUG_SPIN, DEBUG_SLAB, and preempt.

Attached is a patch which does that.  Boots fine on a x86-64, with
DEBUG_SPIN, DEBUG_SLAB, and preempt.Thanks & Regards, Alok

Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Shobhit Dayal <shobhitdayal.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-14 12:31:45 -07:00
Hugh Dickins
2fd4ef85e0 [PATCH] error path in setup_arg_pages() misses vm_unacct_memory()
Pavel Emelianov and Kirill Korotaev observe that fs and arch users of
security_vm_enough_memory tend to forget to vm_unacct_memory when a
failure occurs further down (typically in setup_arg_pages variants).

These are all users of insert_vm_struct, and that reservation will only
be unaccounted on exit if the vma is marked VM_ACCOUNT: which in some
cases it is (hidden inside VM_STACK_FLAGS) and in some cases it isn't.

So x86_64 32-bit and ppc64 vDSO ELFs have been leaking memory into
Committed_AS each time they're run.  But don't add VM_ACCOUNT to them,
it's inappropriate to reserve against the very unlikely case that gdb
be used to COW a vDSO page - we ought to do something about that in
do_wp_page, but there are yet other inconsistencies to be resolved.

The safe and economical way to fix this is to let insert_vm_struct do
the security_vm_enough_memory check when it finds VM_ACCOUNT is set.

And the MIPS irix_brk has been calling security_vm_enough_memory before
calling do_brk which repeats it, doubly accounting and so also leaking.
Remove that, and all the fs and arch calls to security_vm_enough_memory:
give it a less misleading name later on.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-Off-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-14 11:18:13 -07:00