Commit Graph

78 Commits

Author SHA1 Message Date
Linus Torvalds
5fec8bdbf9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: clean up annotations of fc->lock
  fuse: fix sparse warning in ioctl
  fuse: update interface version
  fuse: add fuse_conn->release()
  fuse: separate out fuse_conn_init() from new_conn()
  fuse: add fuse_ prefix to several functions
  fuse: implement poll support
  fuse: implement unsolicited notification
  fuse: add file kernel handle
  fuse: implement ioctl support
  fuse: don't let fuse_req->end() put the base reference
  fuse: move FUSE_MINOR to miscdevice.h
  fuse: style fixes
2009-01-06 17:01:20 -08:00
Tejun Heo
b93f858ab2 fuse: add fuse_ prefix to several functions
Add fuse_ prefix to request_send*() and get_root_inode() as some of
those functions will be exported for CUSE.  With or without CUSE
export, having the function names scoped is a good idea for
debuggability.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:55 +01:00
Tejun Heo
acf99433d9 fuse: add file kernel handle
The file handle, fuse_file->fh, is opaque value supplied by userland
FUSE server and uniqueness is not guaranteed.  Add file kernel handle,
fuse_file->kh, which is allocated by the kernel on file allocation and
guaranteed to be unique.

This will be used by poll to match notification to the respective file
but can be used for other purposes where unique file handle is
necessary.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:55 +01:00
Miklos Szeredi
1729a16c2c fuse: style fixes
Fix coding style errors reported by checkpatch and others.  Uptdate
copyright date to 2008.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-11-26 12:03:54 +01:00
David Howells
c69e8d9c01 CRED: Use RCU to access another task's creds and to release a task's own creds
Use RCU to access another task's creds and to release a task's own creds.
This means that it will be possible for the credentials of a task to be
replaced without another task (a) requiring a full lock to read them, and (b)
seeing deallocated memory.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:19 +11:00
David Howells
b6dff3ec5e CRED: Separate task security context from task_struct
Separate the task security context from task_struct.  At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.

Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.

With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:16 +11:00
Al Viro
a110343f0d [PATCH] fix MAY_CHDIR/MAY_ACCESS/LOOKUP_ACCESS mess
* MAY_CHDIR is redundant - it's an equivalent of MAY_ACCESS
* MAY_ACCESS on fuse should affect only the last step of pathname resolution
* fchdir() and chroot() should pass MAY_ACCESS, for the same reason why
  chdir() needs that.
* now that we pass MAY_ACCESS explicitly in all cases, LOOKUP_ACCESS can be
  removed; it has no business being in nameidata.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:21 -04:00
Al Viro
e6305c43ed [PATCH] sanitize ->permission() prototype
* kill nameidata * argument; map the 3 bits in ->flags anybody cares
  about to new MAY_... ones and pass with the mask.
* kill redundant gfs2_iop_permission()
* sanitize ecryptfs_permission()
* fix remaining places where ->permission() instances might barf on new
  MAY_... found in mask.

The obvious next target in that direction is permission(9)

folded fix for nfs_permission() breakage from Miklos Szeredi <mszeredi@suse.cz>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:14 -04:00
Miklos Szeredi
c180eebe13 fuse: add fuse_lookup_name() helper
Add a new helper function which sends a LOOKUP request with the supplied
name.  This will be used by the next patch to send special LOOKUP requests
with "." and ".." as the name.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:48 -07:00
Miklos Szeredi
dbd561d236 fuse: add export operations
Implement export_operations, to allow fuse filesystems to be exported to
NFS.  This feature has been in the out-of-tree fuse module, and is widely
used and tested.

It has not been originally merged into mainline, because doing the NFS
export in userspace was thought to be a cleaner and more efficient way of
doing it, than through the kernel.

While that is true, it would also have involved a lot of duplicated effort
at reimplementing NFS exporting (all the different versions of the
protocol).  This effort was unfortunately not undertaken by anyone, so we
are left with doing it the easy but less efficient way.

If this feature goes in, the out-of-tree fuse module can go away,
which would have several advantages:

  - not having to maintain two versions
  - less confusion for users
  - no bugs due to kernel API changes

Comment from hch:
 - Use the same fh_type values as XFS, since we use the same fh encoding.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:48 -07:00
Miklos Szeredi
0de6256daa fuse: prepare lookup for nfs export
Use d_splice_alias() instead of d_add() in fuse lookup code, to allow NFS
exporting.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:48 -07:00
Miklos Szeredi
5c5c5e51b2 fuse: update file size on short read
If the READ request returned a short count, then either

  - cached size is incorrect
  - filesystem is buggy, as short reads are only allowed on EOF

So assume that the size is wrong and refresh it, so that cached read() doesn't
zero fill the missing chunk.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Miklos Szeredi
3be5a52b30 fuse: support writable mmap
Quoting Linus (3 years ago, FUSE inclusion discussions):

  "User-space filesystems are hard to get right. I'd claim that they
   are almost impossible, unless you limit them somehow (shared
   writable mappings are the nastiest part - if you don't have those,
   you can reasonably limit your problems by limiting the number of
   dirty pages you accept through normal "write()" calls)."

Instead of attempting the impossible, I've just waited for the dirty page
accounting infrastructure to materialize (thanks to Peter Zijlstra and
others).  This nicely solved the biggest problem: limiting the number of pages
used for write caching.

Some small details remained, however, which this largish patch attempts to
address.  It provides a page writeback implementation for fuse, which is
completely safe against VM related deadlocks.  Performance may not be very
good for certain usage patterns, but generally it should be acceptable.

It has been tested extensively with fsx-linux and bash-shared-mapping.

Fuse page writeback design
--------------------------

fuse_writepage() allocates a new temporary page with GFP_NOFS|__GFP_HIGHMEM.
It copies the contents of the original page, and queues a WRITE request to the
userspace filesystem using this temp page.

The writeback is finished instantly from the MM's point of view: the page is
removed from the radix trees, and the PageDirty and PageWriteback flags are
cleared.

For the duration of the actual write, the NR_WRITEBACK_TEMP counter is
incremented.  The per-bdi writeback count is not decremented until the actual
write completes.

On dirtying the page, fuse waits for a previous write to finish before
proceeding.  This makes sure, there can only be one temporary page used at a
time for one cached page.

This approach is wasteful in both memory and CPU bandwidth, so why is this
complication needed?

The basic problem is that there can be no guarantee about the time in which
the userspace filesystem will complete a write.  It may be buggy or even
malicious, and fail to complete WRITE requests.  We don't want unrelated parts
of the system to grind to a halt in such cases.

Also a filesystem may need additional resources (particularly memory) to
complete a WRITE request.  There's a great danger of a deadlock if that
allocation may wait for the writepage to finish.

Currently there are several cases where the kernel can block on page
writeback:

  - allocation order is larger than PAGE_ALLOC_COSTLY_ORDER
  - page migration
  - throttle_vm_writeout (through NR_WRITEBACK)
  - sync(2)

Of course in some cases (fsync, msync) we explicitly want to allow blocking.
So for these cases new code has to be added to fuse, since the VM is not
tracking writeback pages for us any more.

As an extra safetly measure, the maximum dirty ratio allocated to a single
fuse filesystem is set to 1% by default.  This way one (or several) buggy or
malicious fuse filesystems cannot slow down the rest of the system by hogging
dirty memory.

With appropriate privileges, this limit can be raised through
'/sys/class/bdi/<bdi>/max_ratio'.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:50 -07:00
Miklos Szeredi
1a823ac9ff fuse: fix permission checking
I added a nasty local variable shadowing bug to fuse in 2.6.24, with the
result, that the 'default_permissions' mount option is basically ignored.

How did this happen?

 - old err declaration in inner scope
 - new err getting declared in outer scope
 - 'return err' from inner scope getting removed
 - old declaration not being noticed

-Wshadow would have saved us, but it doesn't seem practical for
the kernel :(

More testing would have also saved us :((

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-23 17:12:13 -08:00
David Howells
e231c2ee64 Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p)
Convert instances of ERR_PTR(PTR_ERR(p)) to ERR_CAST(p) using:

perl -spi -e 's/ERR_PTR[(]PTR_ERR[(](.*)[)][)]/ERR_CAST(\1)/' `grep -rl 'ERR_PTR[(]*PTR_ERR' fs crypto net security`

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:26 -08:00
Miklos Szeredi
0952b2a4a8 fuse: fix attribute caching after create
Invalidate attributes on create, since st_ctime is updated.  Reported by
Szabolcs Szakacsits.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:13 -08:00
Miklos Szeredi
08b633070a fuse: fix attribute caching after rename
Invalidate attributes on rename, since some filesystems may update
st_ctime.  Reported by Szabolcs Szakacsits

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
Miklos Szeredi
a6643094e7 fuse: pass open flags to read and write
Some open flags (O_APPEND, O_DIRECT) can be changed with fcntl(F_SETFL, ...)
after open, but fuse currently only sends the flags to userspace in open.

To make it possible to correcly handle changing flags, send the
current value to userspace in each read and write.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
Miklos Szeredi
7dca9fd39f fuse: cleanup: add fuse_get_attr_version()
Extract repeated code into helper function, as suggested by Akpm.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
Miklos Szeredi
bcb4be809d fuse: fix reading past EOF
Currently reading a fuse file will stop at cached i_size and return
EOF, even though the file might have grown since the attributes were
last updated.

So detect if trying to read past EOF, and refresh the attributes
before continuing with the read.

Thanks to mpb for the report.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:54 -08:00
Miklos Szeredi
0e9663ee45 fuse: add blksize field to fuse_attr
There are cases when the filesystem will be passed the buffer from a single
read or write call, namely:

 1) in 'direct-io' mode (not O_DIRECT), read/write requests don't go
    through the page cache, but go directly to the userspace fs

 2) currently buffered writes are done with single page requests, but
    if Nick's ->perform_write() patch goes it, it will be possible to
    do larger write requests.  But only if the original write() was
    also bigger than a page.

In these cases the filesystem might want to give a hint to the app
about the optimal I/O size.

Allow the userspace filesystem to supply a blksize value to be returned by
stat() and friends.  If the field is zero, it defaults to the old
PAGE_CACHE_SIZE value.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:31 -07:00
Miklos Szeredi
f33321141b fuse: add support for mandatory locking
For mandatory locking the userspace filesystem needs to know the lock
ownership for read, write and truncate operations.

This patch adds the necessary fields to the protocol.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:31 -07:00
Miklos Szeredi
6ff958edbf fuse: add atomic open+truncate support
This patch allows fuse filesystems to implement open(..., O_TRUNC) as a single
request, instead of separate truncate and open requests.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:31 -07:00
Miklos Szeredi
17637cbaba fuse: improve utimes support
Add two new flags for setattr: FATTR_ATIME_NOW and FATTR_MTIME_NOW.  These
mean, that atime or mtime should be changed to the current time.

Also it is now possible to update atime or mtime individually, not just
together.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:30 -07:00
Miklos Szeredi
49d4914fd7 fuse: clean up open file passing in setattr
Clean up supplying open file to the setattr operation.  In addition to being a
cleanup it prepares for the changes in the way the open file is passed to the
setattr method.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:30 -07:00