1935 Commits

Author SHA1 Message Date
Christoph Lameter b20a35035f [PATCH] page migration reorg
Centralize the page migration functions in anticipation of additional
tinkering.  Creates a new file mm/migrate.c

1. Extract buffer_migrate_page() from fs/buffer.c

2. Extract central migration code from vmscan.c

3. Extract some components from mempolicy.c

4. Export pageout() and remove_from_swap() from vmscan.c

5. Make it possible to configure NUMA systems without page migration
   and non-NUMA systems with page migration.

I had to do some #ifdeffing in mempolicy.c that may need a cleanup.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:06 -08:00
Chen, Kenneth W bba1e9b211 [PATCH] convert hugetlbfs_counter to atomic
Implementation of hugetlbfs_counter() is functionally equivalent to
atomic_inc_return().  Use the simpler atomic form.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:04 -08:00
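
The equivalence claimed in the hugetlbfs_counter commit above can be illustrated in user space with C11 atomics. This is only a sketch: it assumes the old helper was a lock-protected increment-and-return, both function names are hypothetical, and the kernel's atomic_inc_return() is modeled with atomic_fetch_add() plus one.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the old helper: a counter guarded by a spinlock. */
static atomic_flag counter_lock = ATOMIC_FLAG_INIT;
static int locked_counter;

int next_counter_locked(void)
{
    int ret;

    while (atomic_flag_test_and_set(&counter_lock))
        ;                       /* spin until the lock is free */
    ret = ++locked_counter;     /* increment and remember the new value */
    atomic_flag_clear(&counter_lock);
    return ret;
}

/* User-space analogue of atomic_inc_return(): increment, return new value. */
static atomic_int atomic_counter;

int next_counter_atomic(void)
{
    /* atomic_fetch_add() returns the old value, so add one to match. */
    return atomic_fetch_add(&atomic_counter, 1) + 1;
}
```

Both functions hand out the same sequence of values; the atomic form simply needs no separate lock.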
David Gibson b45b5bd65f [PATCH] hugepage: Strict page reservation for hugepage inodes
These days, hugepages are demand-allocated at first fault time.  There's a
somewhat dubious (and racy) heuristic when making a new mmap() to check if
there are enough available hugepages to fully satisfy that mapping.

A particularly obvious case where the heuristic breaks down is where a
process maps its hugepages not as a single chunk, but as a bunch of
individually mmap()ed (or shmat()ed) blocks without touching and
instantiating the pages in between allocations.  In this case the size of
each block is compared against the total number of available hugepages.
It's thus easy for the process to become overcommitted, because each block
mapping will succeed, although the total number of hugepages required by
all blocks exceeds the number available.  In particular, this defeats any
program that detects a mapping failure and adjusts its hugepage usage
downward accordingly.

The patch below addresses this problem, by strictly reserving a number of
physical hugepages for hugepage inodes which have been mapped, but not
instantiated.  MAP_SHARED mappings are thus "safe" - they will fail on
mmap(), not later with an OOM SIGKILL.  MAP_PRIVATE mappings can still
trigger an OOM.  (Actually SHARED mappings can technically still OOM, but
only if the sysadmin explicitly reduces the hugepage pool between mapping
and instantiation)

This patch appears to address the problem at hand - it allows DB2 to start
correctly, for instance, which previously suffered the failure described
above.

This patch causes no regressions on the libhugetlbfs testsuite, and makes a
test (designed to catch this problem) pass which previously failed (ppc64,
POWER5).

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:03 -08:00
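
The difference between the per-block heuristic and strict reservation described in the hugepage commit above can be sketched as simple accounting in user-space C. This is a model under stated assumptions, not the kernel implementation: struct hugepage_pool and all three functions are hypothetical names, and locking is omitted.

```c
#include <errno.h>

/* Hypothetical model of the hugepage pool. */
struct hugepage_pool {
    long free_pages;   /* hugepages not yet instantiated */
    long reserved;     /* promised to mappings, not yet faulted in */
};

/* Old-style heuristic: each block is compared against the whole pool. */
int map_heuristic(const struct hugepage_pool *pool, long pages)
{
    return pages <= pool->free_pages ? 0 : -ENOMEM;
}

/* Strict reservation: succeed only if unreserved pages cover the block. */
int map_reserved(struct hugepage_pool *pool, long pages)
{
    if (pages > pool->free_pages - pool->reserved)
        return -ENOMEM;
    pool->reserved += pages;
    return 0;
}

/* First touch of a reserved page uses one free page and one reservation. */
void fault_reserved(struct hugepage_pool *pool)
{
    pool->free_pages--;
    pool->reserved--;
}
```

With a pool of four pages, two three-page blocks both pass map_heuristic(), while the second map_reserved() call fails up front with -ENOMEM instead of overcommitting.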
Nick Piggin 84097518d1 [PATCH] mm: nommu use compound pages
Now that compound page handling is properly fixed in the VM, move nommu
over to using compound pages rather than rolling their own refcounting.

nommu vm page refcounting is broken anyway, but there is no need to have
divergent code in the core VM now, nor when it gets fixed.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: David Howells <dhowells@redhat.com>

(Needs testing, please).
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:54:01 -08:00
Christoph Lameter ac2b898ca6 [PATCH] slab: Remove SLAB_NO_REAP option
SLAB_NO_REAP is documented as an option that will cause this slab not to be
reaped under memory pressure.  However, that is not what happens.  The only
thing that SLAB_NO_REAP controls at the moment is the reclaim of the unused
slab elements that were allocated in batch in cache_reap().  Cache_reap()
is run every few seconds independently of memory pressure.

Could we remove the whole thing?  It's only used by three slabs anyway and
I cannot find a reason for having this option.

There is an additional problem with SLAB_NO_REAP.  If set then the recovery
of objects from alien caches is switched off.  Objects not freed on the
same node where they were initially allocated will only be reused if a
certain amount of objects accumulates from one alien node (not very likely)
or if the cache is explicitly shrunk.  (Strangely __cache_shrink does not
check for SLAB_NO_REAP)

Getting rid of SLAB_NO_REAP fixes the problems with alien cache freeing.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:59 -08:00
Latchesar Ionkov 5e7a99ac45 [PATCH] v9fs: assign dentry ops to negative dentries
If a file is not found in v9fs_vfs_lookup, the function creates a negative
dentry, but doesn't assign any dentry ops.  This leaves the negative entry
in the cache (there is no d_delete to mark it for removal).  If the file is
created outside of the mounted v9fs filesystem, the file shows up in the
directory with weird permissions.

This patch assigns the default v9fs dentry ops to the negative dentry.

Signed-off-by: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-22 07:53:55 -08:00
Greg Kroah-Hartman b3229087c5 [PATCH] sysfs: fix a kobject leak in sysfs_add_link on the error path
As pointed out by Oliver Neukum.

Cc: Maneesh Soni <maneesh@in.ibm.com>
Cc: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:59 -08:00
Greg Kroah-Hartman 832c57e9af [PATCH] sysfs: don't export dir symbols
These functions should only be used by the kobject core, and if any
driver tries to use them, bad things happen.  Unexport them to try to
prevent this from happening.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:59 -08:00
Michael Ellerman dd308bc355 [PATCH] debugfs: Add debugfs_create_blob() helper for exporting binary data
I wanted to export a binary blob via debugfs, and although it was pretty easy
it seems like it'd be easier if there was a helper for it. It's a pity we need
the wrapper struct but I can't see a cleaner way to do it.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:59 -08:00
Maneesh Soni c516865cfb [PATCH] sysfs: fix problem with duplicate sysfs directories and files
The following patch checks for existing sysfs_dirent before
preparing new one while creating sysfs directories and files.

Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:59 -08:00
Eric Sesterhenn 58d49283b8 [PATCH] sysfs: kzalloc conversion
This converts fs/sysfs to kzalloc() usage.
Compile tested with make allyesconfig.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:58 -08:00
Jes Sorensen 58383af629 [PATCH] kobj_map semaphore to mutex conversion
Convert the kobj_map code to use a mutex instead of a semaphore.  It
converts the only two users as well, genhd.c and char_dev.c.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:58 -08:00
Greg Kroah-Hartman 641e6f30a0 [PATCH] sysfs: sysfs_remove_dir() needs to invalidate the dentry
When calling sysfs_remove_dir() don't allow any further sysfs functions
to work for this kobject anymore.  This fixes a nasty USB cdc-acm oops
on disconnect.

Many thanks to Bob Copeland and Paul Fulghum for taking the time to
track this down.

Cc: Bob Copeland <email@bobcopeland.com>
Cc: Paul Fulghum <paulkf@microgate.com>
Cc: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-20 13:42:57 -08:00
Linus Torvalds 88dcb91177 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  JFS: add uid, gid, and umask mount options
  JFS: Take logsync lock before testing mp->lsn
  JFS: kzalloc conversion
  JFS: Add missing file from fa3241d24c
  JFS: Use the kthread_ API
  JFS: Fix regression.  fsck complains if symlinks do not have INLINEEA attribute
  JFS: ext2 inode attributes for jfs
  JFS: semaphore to mutex conversion.
  JFS: make buddy table static
  JFS: Add back directory i_size calculations for legacy partitions
2006-03-20 10:32:33 -08:00
Peter Staubach 85c6932ef0 [PATCH] nfsservctl(): remove user-triggerable printk
A user can use nfsservctl() to spam the logs.

This can happen because the arguments to the nfsservctl() system call are
versioned.  This is a good thing.  However, when a bad version is detected,
the kernel prints a message and then returns an error.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-17 07:51:25 -08:00
Eric Van Hensbergen 8532159f55 [PATCH] v9fs: fix overzealous dropping of dentry which breaks dcache
There is a d_drop in dir_release which caused problems as it invalidates
dcache entries too soon.  This was likely a part of the weird cwd behavior
folks were seeing.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-17 07:51:25 -08:00
Al Viro 2d7f2ea9c9 [PATCH] Fix ext2 readdir f_pos re-validation logic
This fixes not one, but _two_, silly (but admittedly hard to hit) bugs
in the ext2 filesystem "readdir()" function.  It also cleans up the code
to avoid the unnecessary goto mess.

The bugs were related to re-validating the f_pos value after somebody had
either done an "lseek()" on the directory to an invalid offset, or when
the offset had become invalid due to a file being unlinked in the
directory.  The code would not only set the f_version too eagerly, it
would also not update f_pos appropriately for when the offset fixup took
place.

When that happened, we'd occasionally subsequently fail the readdir()
even when we shouldn't (no real harm done, but an ugly printk, and
obviously you would end up not necessarily seeing all entries).

Thanks to Masoud Sharbiani <masouds@google.com> who noticed the problem
and had a test-case for it, and also fixed up a thinko in the first
version of this patch.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Masoud Sharbiani <masouds@google.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-15 16:31:51 -08:00
Adrian Bunk f13b83580a [PATCH] fs/namespace.c:dup_namespace(): fix a use after free
The Coverity checker spotted the following bug in dup_namespace():

<--  snip  -->

        if (!new_ns->root) {
                up_write(&namespace_sem);
                kfree(new_ns);
                goto out;
        }
...
out:
        return new_ns;

<--  snip  -->

Callers expect a non-NULL result to not be freed.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-15 09:37:34 -08:00
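
One way to satisfy the caller contract stated in the dup_namespace() commit above is to clear the pointer after freeing it, so the error path returns NULL. The user-space sketch below shows that pattern. It is an assumption about the shape of the fix, not the committed patch: struct ns, dup_ns_sketch() and the copy_root_fails knob are hypothetical, and malloc()/free() stand in for the kernel allocators.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the namespace structure. */
struct ns {
    void *root;
};

/*
 * copy_root_fails is a hypothetical knob that simulates the failure
 * that leaves new_ns->root NULL.
 */
struct ns *dup_ns_sketch(int copy_root_fails)
{
    static int dummy_root;
    struct ns *new_ns = malloc(sizeof(*new_ns));

    if (!new_ns)
        goto out;

    new_ns->root = copy_root_fails ? NULL : &dummy_root;
    if (!new_ns->root) {
        free(new_ns);
        new_ns = NULL;      /* never hand a freed pointer back */
        goto out;
    }
out:
    return new_ns;
}
```

A caller can then treat any non-NULL result as live and owned, which is what the commit message says callers expect.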
Christoph Lameter 4983da07f1 [PATCH] page migration: fail if page is in a vma flagged VM_LOCKED
page migration currently simply retries a couple of times if try_to_unmap()
fails without inspecting the return code.

However, SWAP_FAIL indicates that the page is in a vma that has the
VM_LOCKED flag set (if ignore_refs == 1).  We can check for that return code
and avoid retrying the migration.

migrate_page_remove_references() now needs to return a reason why the
failure occurred.  So switch migrate_page_remove_references to use -Exx
style error messages.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-14 21:43:02 -08:00
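
The retry logic described in the VM_LOCKED commit above can be sketched in user-space C: the reference-removal step reports why it failed, and the caller retries only on a transient error. Everything here is a hypothetical model, including struct fake_page, the unmap_result enum and the specific errno values chosen (-EPERM for permanent, -EAGAIN for transient).

```c
#include <errno.h>

/* Hypothetical analogue of the try_to_unmap() return codes. */
enum unmap_result { UNMAP_SUCCESS, UNMAP_AGAIN, UNMAP_FAIL };

/* Hypothetical page model. */
struct fake_page {
    int locked_vma;      /* nonzero: mapped in a VM_LOCKED-style vma */
    int busy_attempts;   /* number of transient failures still to come */
};

static enum unmap_result fake_try_to_unmap(struct fake_page *page)
{
    if (page->locked_vma)
        return UNMAP_FAIL;
    if (page->busy_attempts > 0) {
        page->busy_attempts--;
        return UNMAP_AGAIN;
    }
    return UNMAP_SUCCESS;
}

/* Return 0 on success or a negative errno saying why unmapping failed. */
int remove_references_sketch(struct fake_page *page)
{
    switch (fake_try_to_unmap(page)) {
    case UNMAP_SUCCESS:
        return 0;
    case UNMAP_FAIL:
        return -EPERM;   /* permanent: retrying cannot help */
    default:
        return -EAGAIN;  /* transient: worth another attempt */
    }
}

/* Retry only while the failure is transient; count the attempts made. */
int migrate_sketch(struct fake_page *page, int max_retries, int *attempts)
{
    int rc = -EAGAIN;

    *attempts = 0;
    while (*attempts < max_retries) {
        (*attempts)++;
        rc = remove_references_sketch(page);
        if (rc != -EAGAIN)
            break;
    }
    return rc;
}
```

A page in a locked vma fails after a single attempt, whereas a briefly busy page succeeds once its transient failures are used up.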
Linus Torvalds 0ee10a4423 Merge git://oss.sgi.com:8090/oss/git/rc-fixes
* git://oss.sgi.com:8090/oss/git/rc-fixes:
  Fix a direct I/O locking issue revealed by the new mutex code.
2006-03-14 20:50:45 -08:00
Nathan Scott 3fb962bde4 Fix a direct I/O locking issue revealed by the new mutex code.
Affects only XFS (i.e. DIO_OWN_LOCKING case) - currently it is
not possible to get i_mutex locking correct when using DIO_OWN
direct I/O locking in a filesystem due to indeterminism in the
possible return code/lock/unlock combinations.  This can cause
a direct read to attempt a double i_mutex unlock inside XFS.

We're now ensuring __blockdev_direct_IO always exits with the
inode i_mutex (still) held for a direct reader.

Tested with the three different locking modes (via direct block
device access, ext3 and XFS) - both reading and writing; cannot
find any regressions resulting from this change, and it clearly
fixes the mutex_unlock warning originally reported here:
http://marc.theaimsgroup.com/?l=linux-kernel&m=114189068126253&w=2

Signed-off-by: Nathan Scott <nathans@sgi.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2006-03-15 15:14:45 +11:00
Dave Kleikamp c5111f504d Merge with /home/shaggy/git/linus-clean/
2006-03-14 17:05:45 -06:00
Dave Kleikamp a488edc914 [PATCH] JFS: Take logsync lock before testing mp->lsn
This fixes a race where lsn could be cleared before taking the lock

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-14 14:00:48 -08:00
Trond Myklebust 30f4e20a0d [PATCH] NLM: Ensure we do not Oops in the case of an unlock
In theory, NLM specs assure us that the server will only reply LCK_GRANTED or
LCK_DENIED_GRACE_PERIOD to our NLM_UNLOCK request.

In practice, we should not assume this to be the case, and the code will
currently Oops if the server replies with anything else.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-14 07:57:18 -08:00
Trond Myklebust c12e87f465 [PATCH] NFSv4: fix mount segfault on errors returned that are < -1000
It turns out that nfs4_proc_get_root() may return raw NFSv4 errors instead of
mapping them to kernel errors.  Problem spotted by Neil Horman
<nhorman@tuxdriver.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-14 07:57:18 -08:00
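
The mapping step that the NFSv4 commit above says was missing can be sketched as a small filter applied before an error leaves the NFSv4 code. This is a user-space illustration under assumptions: map_nfs4_error_sketch() is a hypothetical name, and collapsing every out-of-range code to -EIO is an assumed policy rather than a description of the patch.

```c
#include <errno.h>

/*
 * Hypothetical sketch: raw NFSv4 status codes are large (below -1000
 * when negated), while callers expect ordinary kernel errno values.
 */
int map_nfs4_error_sketch(int err)
{
    if (err < -1000)
        return -EIO;    /* assumed fallback for unmapped protocol errors */
    return err;         /* already a kernel-style error, or success */
}
```

With this filter in place, a caller checking for ordinary errno values never sees a raw protocol status such as -10008.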