Commit Graph

64 Commits

Author SHA1 Message Date
Steven Whitehouse b9af7ca6d3 [GFS2] Fix setting of inherit jdata attr
Due to a mix up between the jdata attribute and inherit jdata attribute
it has not been possible to set the inherit jdata attribute on
directories. This is now fixed and the ioctl will report the inherit
jdata attribute for directories rather than the jdata attribute as it
did previously. This stems from our need to have the one bit in the
ioctl attr flags mean two different things according to whether the
underlying inode is a directory or not.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-08-14 10:34:11 +01:00
Christoph Hellwig 0af1a45046 rename setlease to generic_setlease
Make it a little more clear that this is the default implementation for
the setleast operation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:43 -07:00
Nick Piggin d0217ac04c mm: fault feedback #1
Change ->fault prototype.  We now return an int, which contains
VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
 FAULT_RET_ code tells the VM whether a page was found, whether it has been
locked, and potentially other things.  This is not quite the way he wanted
it yet, but that's changed in the next patch (which requires changes to
arch code).

This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
that a page is locked which requires filemap_nopage to go away (because we
can no longer remain backward compatible without that flag), but we were
going to do that anyway.

struct fault_data is renamed to struct vm_fault as Linus asked. address
is now a void __user * that we should firmly encourage drivers not to use
without really good reason.

The page is now returned via a page pointer in the vm_fault struct.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Nick Piggin 54cb8821de mm: merge populate and nopage into fault (fixes nonlinear)
Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
the virtual address -> file offset differently from linear mappings.

->populate is a layering violation because the filesystem/pagecache code
should need to know anything about the virtual memory mapping.  The hitch here
is that the ->nopage handler didn't pass down enough information (ie.  pgoff).
 But it is more logical to pass pgoff rather than have the ->nopage function
calculate it itself anyway (because that's a similar layering violation).

Having the populate handler install the pte itself is likewise a nasty thing
to be doing.

This patch introduces a new fault handler that replaces ->nopage and
->populate and (later) ->nopfn.  Most of the old mechanism is still in place
so there is a lot of duplication and nice cleanups that can be removed if
everyone switches over.

The rationale for doing this in the first place is that nonlinear mappings are
subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
to duplicate the synchronisation logic rather than just consolidate the two.

After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
pagecache.  Seems like a fringe functionality anyway.

NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
users have hit mainline yet.

[akpm@linux-foundation.org: cleanup]
[randy.dunlap@oracle.com: doc. fixes for readahead]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Nick Piggin d00806b183 mm: fix fault vs invalidate race for linear mappings
Fix the race between invalidate_inode_pages and do_no_page.

Andrea Arcangeli identified a subtle race between invalidation of pages from
pagecache with userspace mappings, and do_no_page.

The issue is that invalidation has to shoot down all mappings to the page,
before it can be discarded from the pagecache.  Between shooting down ptes to
a particular page, and actually dropping the struct page from the pagecache,
do_no_page from any process might fault on that page and establish a new
mapping to the page just before it gets discarded from the pagecache.

The most common case where such invalidation is used is in file truncation.
This case was catered for by doing a sort of open-coded seqlock between the
file's i_size, and its truncate_count.

Truncation will decrease i_size, then increment truncate_count before
unmapping userspace pages; do_no_page will read truncate_count, then find the
page if it is within i_size, and then check truncate_count under the page
table lock and back out and retry if it had subsequently been changed (ptl
will serialise against unmapping, and ensure a potentially updated
truncate_count is actually visible).

Complexity and documentation issues aside, the locking protocol fails in the
case where we would like to invalidate pagecache inside i_size.  do_no_page
can come in anytime and filemap_nopage is not aware of the invalidation in
progress (as it is when it is outside i_size).  The end result is that
dangling (->mapping == NULL) pages that appear to be from a particular file
may be mapped into userspace with nonsense data.  Valid mappings to the same
place will see a different page.

Andrea implemented two working fixes, one using a real seqlock, another using
a page->flags bit.  He also proposed using the page lock in do_no_page, but
that was initially considered too heavyweight.  However, it is not a global or
per-file lock, and the page cacheline is modified in do_no_page to increment
_count and _mapcount anyway, so a further modification should not be a large
performance hit.  Scalability is not an issue.

This patch implements this latter approach.  ->nopage implementations return
with the page locked if it is possible for their underlying file to be
invalidated (in that case, they must set a special vm_flags bit to indicate
so).  do_no_page only unlocks the page after setting up the mapping
completely.  invalidation is excluded because it holds the page lock during
invalidation of each page (and ensures that the page is not mapped while
holding the lock).

This also allows significant simplifications in do_no_page, because we have
the page locked in the right place in the pagecache from the start.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:41 -07:00
Marc Eshel 60446067ba gfs2: stop giving out non-cluster-coherent leases
Since gfs2 can't prevent conflicting opens or leases on other nodes, we
probably shouldn't allow it to give out leases at all.

Put the newly defined lease operation into use in gfs2 by turning off
lease, unless we're using the "nolock' locking module (in which case all
locking is local anyway).

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
2007-07-18 19:17:19 -04:00
Linus Torvalds 1b21f458dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (57 commits)
  [GFS2] Accept old format NFS filehandles
  [GFS2] Small fixes to logging code
  [DLM] dump more lock values
  [GFS2] Remove i_mode passing from NFS File Handle
  [GFS2] Obtaining no_formal_ino from directory entry
  [GFS2] git-gfs2-nmw-build-fix
  [GFS2] System won't suspend with GFS2 file system mounted
  [GFS2] remounting w/o acl option leaves acls enabled
  [GFS2] inode size inconsistency
  [DLM] Telnet to port 21064 can stop all lockspaces
  [GFS2] Fix gfs2_block_truncate_page err return
  [GFS2] Addendum to the journaled file/unmount patch
  [GFS2] Simplify multiple glock aquisition
  [GFS2] assertion failure after writing to journaled file, umount
  [GFS2] Use zero_user_page() in stuffed_readpage()
  [GFS2] Remove bogus '\0' in rgrp.c
  [GFS2] Journaled file write/unstuff bug
  [DLM] don't require FS flag on all nodes
  [GFS2] Fix deallocation issues
  [GFS2] return conflicts for GETLK
  ...
2007-07-10 13:56:13 -07:00
Jens Axboe 5ffc4ef45b sendfile: remove .sendfile from filesystems that use generic_file_sendfile()
They can use generic_file_splice_read() instead. Since sys_sendfile() now
prefers that, there should be no change in behaviour.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:13 +02:00
Steven Whitehouse dbb7cae2a3 [GFS2] Clean up inode number handling
This patch cleans up the inode number handling code. The main difference
is that instead of looking up the inodes using a struct gfs2_inum_host
we now use just the no_addr member of this structure. The tests relating
to no_formal_ino can then be done by the calling code. This has
advantages in that we want to do different things in different code
paths if the no_formal_ino doesn't match. In the NFS patch we want to
return -ESTALE, but in the ->lookup() path, its a bug in the fs if the
no_formal_ino doesn't match and thus we can withdraw in this case.

In order to later fix bz #201012, we need to be able to look up an inode
without knowing no_formal_ino, as the only information that is known to
us is the on-disk location of the inode in question.

This patch will also help us to fix bz #236099 at a later date by
cleaning up a lot of the code in that area.

There are no user visible changes as a result of this patch and there
are no changes to the on-disk format either.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09 08:22:24 +01:00
Randy Dunlap e63340ae6b header cleaning: don't include smp_lock.h when not used
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:07 -07:00
Marc Eshel 586759f03e gfs2: nfs lock support for gfs2
Add NFS lock support to GFS2.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-06 20:38:50 -04:00
Marc Eshel 9d6a8c5c21 locks: give posix_test_lock same interface as ->lock
posix_test_lock() and ->lock() do the same job but have gratuitously
different interfaces.  Modify posix_test_lock() so the two agree,
simplifying some code in the process.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-05-06 17:39:00 -04:00
Tim Schmielau cd354f1ae7 [PATCH] remove many unneeded #includes of sched.h
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there.  Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm.  I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:54 -08:00
Steven Whitehouse 3699e3a44b [GFS2] Clean up/speed up readdir
This removes the extra filldir callback which gfs2 was using to
enclose an attempt at readahead for inodes during readdir. The
code was too complicated and also hurts performance badly in the
case that the getdents64/readdir call isn't being followed by
stat() and it wasn't even getting it right all the time when it
was.

As a result, on my test box an "ls" of a directory containing 250000
files fell from about 7mins (freshly mounted, so nothing cached) to
between about 15 to 25 seconds. When the directory content was cached,
the time taken fell from about 3mins to about 4 or 5 seconds.

Interestingly in the cached case, running "ls -l" once reduced the time
taken for subsequent runs of "ls" to about 6 secs even without this
patch. Now it turns out that there was a special case of glocks being
used for prefetching the metadata, but because of the timeouts for these
locks (set to 10 secs) the metadata was being timed out before it was
being used and this the prefetch code was constantly trying to prefetch
the same data over and over.

Calling "ls -l" meant that the inodes were brought into memory and once
the inodes are cached, the glocks are not disposed of until the inodes
are pushed out of the cache, thus extending the lifetime of the glocks,
and thus bringing down the time for subsequent runs of "ls"
considerably.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:04 -05:00
Josef Sipek 81454098f7 [PATCH] struct path: convert gfs2
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08 08:28:45 -08:00
Steven Whitehouse 34126f9f41 [GFS2] Change gfs2_fsync() to use write_inode_now()
This is a bit better than the previous version of gfs2_fsync()
although it would be better still if we were able to call a
function which only wrote the inode & metadata. Its no big deal
though that this will potentially write the data as well since
the VFS has already done that before calling gfs2_fsync(). I've
also added a comment to explain whats going on here.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@osdl.org>
2006-12-07 09:13:14 -05:00
Steven Whitehouse 33c3de3287 [GFS2] Don't flush everything on fdatasync
The gfs2_fsync() function was doing a journal flush on each
and every call. While this is correct, its also a lot of
overhead. This patch means that on fdatasync flushes we
rely on the VFS to flush the data for us and we don't do
a journal flush unless we really need to.

We have to do a journal flush for stuffed files though because
they have the data and the inode metadata in the same block.
Journaled files also need a journal flush too of course.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:37:44 -05:00
Steven Whitehouse dcd2479959 [GFS2] Remove unused function from inode.c
The gfs2_glock_nq_m_atime function is unused in so far as its only
ever called with num_gh = 1, and this falls through to the
gfs2_glock_nq_atime function, so we might as well call that directly.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:35:57 -05:00
Steven Whitehouse 6b124d8dba [GFS2] Only set inode flags when required
We were setting the inode flags from GFS2's flags far too often, even when they
couldn't possibly have changed. This patch reduces the amount of flag
setting going on so that we do it only when the inode is read in or
when the flags have changed. The create case is covered by the "when
the inode is read in" case.

This also fixes a bug where we didn't set S_SYNC correctly.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:45 -05:00
Steven Whitehouse b60623c238 [GFS2] Shrink gfs2_inode (3) - di_mode
This removes the duplicate di_mode field in favour of using the
inode->i_mode field. This saves 4 bytes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:34:14 -05:00
Steven Whitehouse 539e5d6b7a [GFS2] Change argument of gfs2_dinode_out
Everywhere this was called, a struct gfs2_inode was available,
but despite that, it was always called with a struct gfs2_dinode
as an argument. By making this change it paves the way to start
eliminating fields duplicated between the kernel's struct inode
and the struct gfs2_dinode.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:33:54 -05:00
Al Viro 9c9ab3d541 [GFS2] gfs2 __user misannotation fix
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:33:49 -05:00
Al Viro 629a21e7ec [GFS2] split and annotate gfs2_inum
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-11-30 10:33:32 -05:00
Steven Whitehouse 128e5ebaf8 [GFS2] Remove iflags.h, use FS_
Update GFS2 in the light of David Howells' patch:

[PATCH] BLOCK: Move common FS-specific ioctls to linux/fs.h [try #6]
36695673b0

which calls the filesystem independant flags FS_..._FL. As a result
we no longer need the flags.h file and the conversion routine is
moved into the GFS2 source code.

Userland programs which used to include iflags.h should now include
fs.h and use the new flag names.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 11:24:43 -04:00
Steven Whitehouse d00223f169 [GFS2] Fix code style/indent in ops_file.c
Fix a couple of minor issues.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-10-02 10:28:05 -04:00