Commit Graph

209 Commits

Author SHA1 Message Date
Linus Torvalds
541010e4b8 Merge branch 'locks' of git://linux-nfs.org/~bfields/linux
* 'locks' of git://linux-nfs.org/~bfields/linux:
  nfsd: remove IS_ISMNDLCK macro
  Rework /proc/locks via seq_files and seq_list helpers
  fs/locks.c: use list_for_each_entry() instead of list_for_each()
  NFS: clean up explicit check for mandatory locks
  AFS: clean up explicit check for mandatory locks
  9PFS: clean up explicit check for mandatory locks
  GFS2: clean up explicit check for mandatory locks
  Cleanup macros for distinguishing mandatory locks
  Documentation: move locks.txt in filesystems/
  locks: add warning about mandatory locking races
  Documentation: move mandatory locking documentation to filesystems/
  locks: Fix potential OOPS in generic_setlease()
  Use list_first_entry in locks_wake_up_blocks
  locks: fix flock_lock_file() comment
  Memory shortage can result in inconsistent flocks state
  locks: kill redundant local variable
  locks: reverse order of posix_locks_conflict() arguments
2007-10-15 16:07:40 -07:00
Peter Zijlstra
14358e6dda lockdep: annotate dir vs file i_mutex
On Mon, 2007-09-24 at 22:13 -0400, Steven Rostedt wrote:
> The circular lock seems to be this:
> 
> #1:
> 
>   sys_mmap2:              down_write(&mm->mmap_sem);
>   nfs_revalidate_mapping: mutex_lock(&inode->i_mutex);
> 
> 
> #0:
> 
>   vfs_readdir:     mutex_lock(&inode->i_mutex);
>    - during the readdir (filldir64), we take a user fault (missing page?)
>     and call do_page_fault -
>   do_page_fault:   down_read(&mm->mmap_sem);
> 
> 
> So it does indeed look like a circular locking. Now the question is, "is
> this a bug?".  Looking like the inode of #1 must be a file or something
> else that you can mmap and the inode of #0 seems it must be a directory.
> I would say "no".
> 
> Now if you can readdir on a file or mmap a directory, then this could be
> an issue.
> 
> Otherwise, I'd love to see someone teach lockdep about this issue! ;-)

Make a distinction between file and dir usage of i_mutex.
The inode should be complete and unused at unlock_new_inode(), re-init
i_mutex depending on its type.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-14 01:38:33 +02:00
Peter Zijlstra
d475fd428c lockdep: per filesystem inode lock class
Give each filesystem its own inode lock class. The various filesystems have
different locking order wrt the inode locks; esp. the pseudo filesystems differ
from the rest.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15 14:51:31 +02:00
Pavel Emelyanov
7f8ada98d9 Rework /proc/locks via seq_files and seq_list helpers
Currently /proc/locks is shown with a proc_read function, but its behavior
is rather complex as it has to manually handle current offset and buffer
length.  On the other hand, files that show objects from lists can be
easily reimplemented using the sequential files and the seq_list_XXX()
helpers.

This saves (as usually) 16 lines of code and more than 200 from
the .text section.

[akpm@linux-foundation.org: no externs in C]
[akpm@linux-foundation.org: warning fixes]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-09 18:32:46 -04:00
Pavel Emelyanov
a16877ca9c Cleanup macros for distinguishing mandatory locks
The combination of S_ISGID bit set and S_IXGRP bit unset is used to mark the
inode as "mandatory lockable" and there's a macro for this check called
MANDATORY_LOCK(inode).  However, fs/locks.c and some filesystems still perform
the explicit i_mode checking.  Besides, Andrew pointed out, that this macro is
buggy itself, as it dereferences the inode arg twice.

Convert this macro into static inline function and switch its users to it,
making the code shorter and more readable.

The __mandatory_lock() helper is to be used in places where the IS_MANDLOCK()
for superblock is already known to be true.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-09 18:32:46 -04:00
Adrian Bunk
ec05b297f9 [PATCH] remove mm/filemap.c:file_send_actor()
This patch removes the no longer used file_send_actor().

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-08-11 22:34:47 +02:00
Christoph Hellwig
0af1a45046 rename setlease to generic_setlease
Make it a little more clear that this is the default implementation for
the setleast operation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31 15:39:43 -07:00
Fengguang Wu
f9acc8c7b3 readahead: sanify file_ra_state names
Rename some file_ra_state variables and remove some accessors.

It results in much simpler code.
Kudos to Rusty!

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:44 -07:00
Fengguang Wu
c743d96b6d readahead: remove the old algorithm
Remove the old readahead algorithm.

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Steven Pratt <slpratt@austin.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:44 -07:00
Fengguang Wu
5ce1110b92 readahead: data structure and routines
Extend struct file_ra_state to support the on-demand readahead logic.  Also
define some helpers for it.

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Steven Pratt <slpratt@austin.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:44 -07:00
Akinobu Mita
e53252d97e unregister_chrdev() return void
unregister_chrdev() does not return meaningful value.  This patch makes it
return void like most unregister_* functions.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:43 -07:00
Linus Torvalds
a8dcf12f9e Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux
* 'for-linus' of git://linux-nfs.org/~bfields/linux:
  locks: fix vfs_test_lock() comment
  locks: make posix_test_lock() interface more consistent
  nfs: disable leases over NFS
  gfs2: stop giving out non-cluster-coherent leases
  locks: export setlease to filesystems
  locks: provide a file lease method enabling cluster-coherent leases
  locks: rename lease functions to reflect locks.c conventions
  locks: share more common lease code
  locks: clean up lease_alloc()
  locks: convert an -EINVAL return to a BUG
  leases: minor break_lease() comment clarification
2007-07-18 18:27:00 -07:00
J. Bruce Fields
6d34ac199a locks: make posix_test_lock() interface more consistent
Since posix_test_lock(), like fcntl() and ->lock(), indicates absence or
presence of a conflict lock by setting fl_type to, respectively, F_UNLCK
or something other than F_UNLCK, the return value is no longer needed.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-07-18 19:17:19 -04:00
J. Bruce Fields
4698afe8e3 locks: export setlease to filesystems
Export setlease so it can used by filesystems to implement their lease
methods.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-07-18 19:17:06 -04:00
J. Bruce Fields
f9ffed26d6 locks: provide a file lease method enabling cluster-coherent leases
Currently leases are only kept locally, so there's no way for a distributed
filesystem to enforce them against multiple clients.  We're particularly
interested in the case of nfsd exporting a cluster filesystem, in which
case nfsd needs cluster-coherent leases in order to implement delegations
correctly.

Also add some documentation.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-07-18 19:14:47 -04:00
J. Bruce Fields
a9933cea7a locks: rename lease functions to reflect locks.c conventions
We've been using the convention that vfs_foo is the function that calls
a filesystem-specific foo method if it exists, or falls back on a
generic method if it doesn't; thus vfs_foo is what is called when some
other part of the kernel (normally lockd or nfsd) wants to get a lock,
whereas foo is what filesystems call to use the underlying local
functionality as part of their lock implementation.

So rename setlease to vfs_setlease (which will call a
filesystem-specific setlease after a later patch) and __setlease to
setlease.

Also, vfs_setlease need only be GPL-exported as long as it's only needed
by lockd and nfsd.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-07-18 19:14:12 -04:00
Amit Arora
97ac73506c sys_fallocate() implementation on i386, x86_64 and powerpc
fallocate() is a new system call being proposed here which will allow
applications to preallocate space to any file(s) in a file system.
Each file system implementation that wants to use this feature will need
to support an inode operation called ->fallocate().
Applications can use this feature to avoid fragmentation to certain
level and thus get faster access speed. With preallocation, applications
also get a guarantee of space for particular file(s) - even if later the
the system becomes full.

Currently, glibc provides an interface called posix_fallocate() which
can be used for similar cause. Though this has the advantage of working
on all file systems, but it is quite slow (since it writes zeroes to
each block that has to be preallocated). Without a doubt, file systems
can do this more efficiently within the kernel, by implementing
the proposed fallocate() system call. It is expected that
posix_fallocate() will be modified to call this new system call first
and incase the kernel/filesystem does not implement it, it should fall
back to the current implementation of writing zeroes to the new blocks.
ToDos:
1. Implementation on other architectures (other than i386, x86_64,
   and ppc). Patches for s390(x) and ia64 are already available from
   previous posts, but it was decided that they should be added later
   once fallocate is in the mainline. Hence not including those patches
   in this take.
2. Changes to glibc,
   a) to support fallocate() system call
   b) to make posix_fallocate() and posix_fallocate64() call fallocate()

Signed-off-by: Amit Arora <aarora@in.ibm.com>
2007-07-17 21:42:44 -04:00
Satyam Sharma
3bd858ab1c Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check
Introduce is_owner_or_cap() macro in fs.h, and convert over relevant
users to it. This is done because we want to avoid bugs in the future
where we check for only effective fsuid of the current task against a
file's owning uid, without simultaneously checking for CAP_FOWNER as
well, thus violating its semantics.
[ XFS uses special macros and structures, and in general looked ...
untouchable, so we leave it alone -- but it has been looked over. ]

The (current->fsuid != inode->i_uid) check in generic_permission() and
exec_permission_lite() is left alone, because those operations are
covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations
falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone.

Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Cc: Al Viro <viro@ftp.linux.org.uk>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 12:00:03 -07:00
Christoph Hellwig
a569425512 knfsd: exportfs: add exportfs.h header
currently the export_operation structure and helpers related to it are in
fs.h.  fs.h is already far too large and there are very few places needing the
export bits, so split them off into a separate header.

[akpm@linux-foundation.org: fix cifs build]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:06 -07:00
Akinobu Mita
f4480240f7 unregister_blkdev(): return void
Put WARN_ON and fixed all callers of unregister_blkdev().  Now we can make
unregister_blkdev return void.

Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:03 -07:00
Adrian Bunk
62239ac2b3 proper prototype for proc_nr_files()
Add a proper prototype for proc_nr_files() in include/linux/fs.h

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:03 -07:00
Stefan Richter
9e7bf24b1b fs: clarify "dummy" member in struct inodes_stat_t
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:45 -07:00
David Howells
e8d6c55412 AFS: implement file locking
Implement file locking for AFS.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:43 -07:00
Andrew Morton
fc9a07e7bf invalidate_mapping_pages(): add cond_resched
invalidate_mapping_pages() can sometimes take a long time (millions of pages
to free).  Long enough for the softlockup detector to trigger.

We used to have a cond_resched() in there but I took it out because the
drop_caches code calls invalidate_mapping_pages() under inode_lock.

The patch adds a nasty flag and puts the cond_resched() back.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:36 -07:00
Jens Axboe
d96e6e7164 Remove remnants of sendfile()
There are now zero users of .sendfile() in the kernel, so kill
it from the file_operations structure and in do_sendfile().

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:15 +02:00