Currently /proc/locks is shown with a proc_read function, but its behavior
is rather complex as it has to manually handle current offset and buffer
length. On the other hand, files that show objects from lists can be
easily reimplemented using the sequential files and the seq_list_XXX()
helpers.
This saves (as usually) 16 lines of code and more than 200 from
the .text section.
[akpm@linux-foundation.org: no externs in C]
[akpm@linux-foundation.org: warning fixes]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fs/locks.c: use list_for_each_entry() instead of list_for_each() in
posix_locks_deadlock() and get_locks_status()
Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The combination of S_ISGID bit set and S_IXGRP bit unset is used to mark the
inode as "mandatory lockable" and there's a macro for this check called
MANDATORY_LOCK(inode). However, fs/locks.c and some filesystems still perform
the explicit i_mode checking. Besides, Andrew pointed out, that this macro is
buggy itself, as it dereferences the inode arg twice.
Convert this macro into static inline function and switch its users to it,
making the code shorter and more readable.
The __mandatory_lock() helper is to be used in places where the IS_MANDLOCK()
for superblock is already known to be true.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This code is run under lock_kernel(), which is dropped during
sleeping operations, so the following race is possible:
CPU1: CPU2:
vfs_setlease(); vfs_setlease();
lock_kernel();
lock_kernel(); /* spin */
generic_setlease():
...
for (before = ...)
/* here we found some lease after
* which we will insert the new one
*/
fl = locks_alloc_lock();
/* go to sleep in this allocation and
* drop the BKL
*/
generic_setlease():
...
for (before = ...)
/* here we find the "before" pointing
* at the one we found on CPU1
*/
->fl_change(my_before, arg);
lease_modify();
locks_free_lock();
/* and we freed it */
...
unlock_kernel();
locks_insert_lock(before, fl);
/* OOPS! We have just tried to add the lease
* at the tail of already removed one
*/
The similar races are already handled in other code - all the
allocations are performed before any checks/updates.
Thanks to Kamalesh Babulal for testing and for a bug report on an
earlier version.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
This routine deletes all the elements from the list
with the "while (!list_empty())" loop, and we already
have a list_first_entry() macro to help it look nicer :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
This comment wasn't updated when lease support was added, and it makes
essentially the same mistake that the code made before a recent bugfix.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
When the flock_lock_file() is called to change the flock
from F_RDLCK to F_WRLCK or vice versa the existing flock
can be removed without appropriate warning.
Look:
for_each_lock(inode, before) {
struct file_lock *fl = *before;
if (IS_POSIX(fl))
break;
if (IS_LEASE(fl))
continue;
if (filp != fl->fl_file)
continue;
if (request->fl_type == fl->fl_type)
goto out;
found = 1;
locks_delete_lock(before); <<<<<< !
break;
}
if after this point the subsequent locks_alloc_lock() will
fail the return code will be -ENOMEM, but the existing lock
is already removed.
This is a known feature that such "re-locking" is not atomic,
but in the racy case the file should stay locked (although by
some other process), but in this case the file will be unlocked.
The proposal is to prepare the lock in advance keeping no chance
to fail in the future code.
Found during making the flocks pid-namespaces aware.
(Note: Thanks to Reuben Farrelly for finding a bug in an earlier version
of this patch.)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Reuben Farrelly <reuben-linuxkernel@reub.net>
There's no need for another variable local to this loop; we can use the
variable (of the same name!) already declared at the top of the function,
and not used till later (at which point it's initialized, so this is safe).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The first argument to posix_locks_conflict() is meant to be a lock request,
and the second a lock from an inode's lock request. It doesn't really
make a difference which order you call them in, since the only
asymmetric test in posix_lock_conflict() is the check whether the second
argument is a posix lock--and every caller already does that check for
some reason.
But may as well fix posix_test_lock() to call posix_locks_conflict()
with the arguments in the same order as everywhere else.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
The inode->i_flock list contains the leases, flocks and posix
locks in the specified order. However, the flocks are added in
the head of this list thus hiding the leases from F_GETLEASE
command, from time_out_leases() and other code that expects
the leases to come first.
The following example will demonstrate this:
#define _GNU_SOURCE
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
static void show_lease(int fd)
{
int res;
res = fcntl(fd, F_GETLEASE);
switch (res) {
case F_RDLCK:
printf("Read lease\n");
break;
case F_WRLCK:
printf("Write lease\n");
break;
case F_UNLCK:
printf("No leases\n");
break;
default:
printf("Some shit\n");
break;
}
}
int main(int argc, char **argv)
{
int fd, res;
fd = open(argv[1], O_RDONLY);
if (fd == -1) {
perror("Can't open file");
return 1;
}
res = fcntl(fd, F_SETLEASE, F_WRLCK);
if (res == -1) {
perror("Can't set lease");
return 1;
}
show_lease(fd);
if (flock(fd, LOCK_SH) == -1) {
perror("Can't flock shared");
return 1;
}
show_lease(fd);
return 0;
}
The first call to show_lease() will show the write lease set, but
the second will show no leases.
Fix the flock adding so that the leases always stay in the head
of this list.
Found during making the flocks pid-namespaces aware.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Thanks to Doug Chapman for pointing out that the comment here is
inconsistent with the function prototype.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Since posix_test_lock(), like fcntl() and ->lock(), indicates absence or
presence of a conflict lock by setting fl_type to, respectively, F_UNLCK
or something other than F_UNLCK, the return value is no longer needed.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Currently leases are only kept locally, so there's no way for a distributed
filesystem to enforce them against multiple clients. We're particularly
interested in the case of nfsd exporting a cluster filesystem, in which
case nfsd needs cluster-coherent leases in order to implement delegations
correctly.
Also add some documentation.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We've been using the convention that vfs_foo is the function that calls
a filesystem-specific foo method if it exists, or falls back on a
generic method if it doesn't; thus vfs_foo is what is called when some
other part of the kernel (normally lockd or nfsd) wants to get a lock,
whereas foo is what filesystems call to use the underlying local
functionality as part of their lock implementation.
So rename setlease to vfs_setlease (which will call a
filesystem-specific setlease after a later patch) and __setlease to
setlease.
Also, vfs_setlease need only be GPL-exported as long as it's only needed
by lockd and nfsd.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Share more code between setlease (used by nfsd) and fcntl.
Also some minor cleanup.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Christoph Hellwig <hch@infradead.org>
Return the newly allocated structure as the return value instead of
using a struct ** parameter.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
There's no point trying to return an error in these cases, which all represent
bugs in the callers.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
clarify that break_lease() checks for presence of any lock, not just leases.
Signed-off-by: David M. Richter <richterd@citi.umich.edu>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
In 9d6a8c5c21 we changed posix_test_lock
to modify its single file_lock argument instead of taking separate input
and output arguments. This makes it no longer safe to set the output
lock's fl_type to F_UNLCK before looking for a conflict, since that
means searching for a conflict against a lock with type F_UNLCK.
This fixes a regression which causes F_GETLK to incorrectly report no
conflict on most filesystems (including any filesystem that doesn't do
its own locking).
Also fix posix_lock_to_flock() to copy the lock type. This isn't
strictly necessary, since the caller already does this; but it seems
less likely to cause confusion in the future.
Thanks to Doug Chapman for the bug report.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Doug Chapman <doug.chapman@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux:
gfs2: nfs lock support for gfs2
lockd: add code to handle deferred lock requests
lockd: always preallocate block in nlmsvc_lock()
lockd: handle test_lock deferrals
lockd: pass cookie in nlmsvc_testlock
lockd: handle fl_grant callbacks
lockd: save lock state on deferral
locks: add fl_grant callback for asynchronous lock return
nfsd4: Convert NFSv4 to new lock interface
locks: add lock cancel command
locks: allow {vfs,posix}_lock_file to return conflicting lock
locks: factor out generic/filesystem switch from setlock code
locks: factor out generic/filesystem switch from test_lock
locks: give posix_test_lock same interface as ->lock
locks: make ->lock release private data before returning in GETLK case
locks: create posix-to-flock helper functions
locks: trivial removal of unnecessary parentheses
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.
I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.
I would think that it is much easier to check the object state manually
before the free. That also places the check near the code object
manipulation of the object.
Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code. But there is no such code
in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).
There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.
This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>