Move fuse_copy_finish() to before calling fuse_notify_poll_wakeup().
This is not a big issue because fuse_notify_poll_wakeup() should be
atomic, but it's cleaner this way, and later uses of notification will
need to be able to finish the copying before performing some actions.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
If a fuse filesystem is unmounted but the device file descriptor
remains open and a new mount reuses the old device number, then the
mount fails with EEXIST and the following warning is printed in the
kernel log:
WARNING: at fs/sysfs/dir.c:462 sysfs_add_one+0x35/0x3d()
sysfs: duplicate filename '0:15' can not be created
The cause is that the bdi belonging to the fuse filesystem was
destoryed only after the device file was released. Fix this by
calling bdi_destroy() from fuse_put_super() instead.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: clean up annotations of fc->lock
fuse: fix sparse warning in ioctl
fuse: update interface version
fuse: add fuse_conn->release()
fuse: separate out fuse_conn_init() from new_conn()
fuse: add fuse_ prefix to several functions
fuse: implement poll support
fuse: implement unsolicited notification
fuse: add file kernel handle
fuse: implement ioctl support
fuse: don't let fuse_req->end() put the base reference
fuse: move FUSE_MINOR to miscdevice.h
fuse: style fixes
Makes the existing annotations match the more common one per line style
and adds a few missing annotations.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Add fuse_ prefix to request_send*() and get_root_inode() as some of
those functions will be exported for CUSE. With or without CUSE
export, having the function names scoped is a good idea for
debuggability.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Implement poll support. Polled files are indexed using kh in a RB
tree rooted at fuse_conn->polled_files.
Client should send FUSE_NOTIFY_POLL notification once after processing
FUSE_POLL which has FUSE_POLL_SCHEDULE_NOTIFY set. Sending
notification unconditionally after the latest poll or everytime file
content might have changed is inefficient but won't cause malfunction.
fuse_file_poll() can sleep and requires patches from the following
thread which allows f_op->poll() to sleep.
http://thread.gmane.org/gmane.linux.kernel/726176
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Clients always used to write only in response to read requests. To
implement poll efficiently, clients should be able to issue
unsolicited notifications. This patch implements basic notification
support.
Zero fuse_out_header.unique is now accepted and considered unsolicited
notification and the error field contains notification code. This
patch doesn't implement any actual notification.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
fuse_req->end() was supposed to be put the base reference but there's
no reason why it should. It only makes things more complex. Move it
out of ->end() and make it the responsibility of request_end().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: fuse-devel@lists.sourceforge.net
Signed-off-by: James Morris <jmorris@namei.org>
As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.
So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set. And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Quoting Linus (3 years ago, FUSE inclusion discussions):
"User-space filesystems are hard to get right. I'd claim that they
are almost impossible, unless you limit them somehow (shared
writable mappings are the nastiest part - if you don't have those,
you can reasonably limit your problems by limiting the number of
dirty pages you accept through normal "write()" calls)."
Instead of attempting the impossible, I've just waited for the dirty page
accounting infrastructure to materialize (thanks to Peter Zijlstra and
others). This nicely solved the biggest problem: limiting the number of pages
used for write caching.
Some small details remained, however, which this largish patch attempts to
address. It provides a page writeback implementation for fuse, which is
completely safe against VM related deadlocks. Performance may not be very
good for certain usage patterns, but generally it should be acceptable.
It has been tested extensively with fsx-linux and bash-shared-mapping.
Fuse page writeback design
--------------------------
fuse_writepage() allocates a new temporary page with GFP_NOFS|__GFP_HIGHMEM.
It copies the contents of the original page, and queues a WRITE request to the
userspace filesystem using this temp page.
The writeback is finished instantly from the MM's point of view: the page is
removed from the radix trees, and the PageDirty and PageWriteback flags are
cleared.
For the duration of the actual write, the NR_WRITEBACK_TEMP counter is
incremented. The per-bdi writeback count is not decremented until the actual
write completes.
On dirtying the page, fuse waits for a previous write to finish before
proceeding. This makes sure, there can only be one temporary page used at a
time for one cached page.
This approach is wasteful in both memory and CPU bandwidth, so why is this
complication needed?
The basic problem is that there can be no guarantee about the time in which
the userspace filesystem will complete a write. It may be buggy or even
malicious, and fail to complete WRITE requests. We don't want unrelated parts
of the system to grind to a halt in such cases.
Also a filesystem may need additional resources (particularly memory) to
complete a WRITE request. There's a great danger of a deadlock if that
allocation may wait for the writepage to finish.
Currently there are several cases where the kernel can block on page
writeback:
- allocation order is larger than PAGE_ALLOC_COSTLY_ORDER
- page migration
- throttle_vm_writeout (through NR_WRITEBACK)
- sync(2)
Of course in some cases (fsync, msync) we explicitly want to allow blocking.
So for these cases new code has to be added to fuse, since the VM is not
tracking writeback pages for us any more.
As an extra safetly measure, the maximum dirty ratio allocated to a single
fuse filesystem is set to 1% by default. This way one (or several) buggy or
malicious fuse filesystems cannot slow down the rest of the system by hogging
dirty memory.
With appropriate privileges, this limit can be raised through
'/sys/class/bdi/<bdi>/max_ratio'.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Libfuse basically creates a new thread for each new request. This is fine for
synchronous requests, which are naturally limited. However background
requests (especially writepage) can cause a thread creation storm.
To avoid this, limit the number of background requests available to userspace.
This is done by introducing another queue for background requests, and a
counter for the number of "active" requests, which are currently available for
userspace.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don't return -ENOENT for a read() on the fuse device when the request was
aborted. Instead return -ENODEV, meaning the filesystem has been
force-umounted or aborted.
Previously ENOENT meant that the request was interrupted, but now the
'aborted' flag is not set in case of interrupts.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don't set 'aborted' flag on a request if it's interrupted. We have to wait
for the answer anyway, and this would only a very little time while copying
the reply.
This means, that write() on the fuse device will not return -ENOENT during
normal operation, only if the filesystem is aborted by a forced umount or
through the fusectl interface.
This could simplify userspace code somewhat when backward compatibility with
earlier kernel versions is not required.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make lifetime of 'struct fuse_file' independent from 'struct file' by adding a
reference counter and destructor.
This will enable asynchronous page writeback, where it cannot be guaranteed,
that the file is not released while a request with this file handle is being
served.
The actual RELEASE request is only sent when there are no more references to
the fuse_file.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use wake_up_all instead of wake_up in put_reserved_req(), otherwise it is
possible that the right task is not woken up.
Also create a separate reserved_req_waitq in addition to the blocked_waitq,
since they fulfill totally separate functions.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Set the read and write congestion state if the request queue is close to
blocking, and clear it when it's not.
This prevents unnecessary blocking in readahead and (when writable mmaps are
allowed) writeback.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#
set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done
The script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_KERNEL is an alias of GFP_KERNEL.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
request_end and fuse_read_interrupt release fc->lock. Add lock annotations
to these two functions so that sparse can check callers for lock pairing,
and so that sparse will not complain about these functions since they
intentionally use locks in this manner.
Signed-off-by: Josh Triplett <josh@freedesktop.org>
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>