Commit Graph

453 Commits

Author SHA1 Message Date
Linus Torvalds 9bc9ccd7db Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "All kinds of stuff this time around; some more notable parts:

   - RCU'd vfsmounts handling
   - new primitives for coredump handling
   - files_lock is gone
   - Bruce's delegations handling series
   - exportfs fixes

  plus misc stuff all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
  ecryptfs: ->f_op is never NULL
  locks: break delegations on any attribute modification
  locks: break delegations on link
  locks: break delegations on rename
  locks: helper functions for delegation breaking
  locks: break delegations on unlink
  namei: minor vfs_unlink cleanup
  locks: implement delegations
  locks: introduce new FL_DELEG lock flag
  vfs: take i_mutex on renamed file
  vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
  vfs: don't use PARENT/CHILD lock classes for non-directories
  vfs: pull ext4's double-i_mutex-locking into common code
  exportfs: fix quadratic behavior in filehandle lookup
  exportfs: better variable name
  exportfs: move most of reconnect_path to helper function
  exportfs: eliminate unused "noprogress" counter
  exportfs: stop retrying once we race with rename/remove
  exportfs: clear DISCONNECTED on all parents sooner
  exportfs: more detailed comment for path_reconnect
  ...
2013-11-13 15:34:18 +09:00
Al Viro ceaec15d49 9p: make v9fs_cache_inode_{get,put,set}_cookie empty inlines for !9P_CACHEFS
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-10-24 23:34:47 -04:00
David Howells 94d30ae90a FS-Cache: Provide the ability to enable/disable cookies
Provide the ability to enable and disable fscache cookies.  A disabled cookie
will reject or ignore further requests to:

	Acquire a child cookie
	Invalidate and update backing objects
	Check the consistency of a backing object
	Allocate storage for backing page
	Read backing pages
	Write to backing pages

but still allows:

	Checks/waits on the completion of already in-progress objects
	Uncaching of pages
	Relinquishment of cookies

Two new operations are provided:

 (1) Disable a cookie:

	void fscache_disable_cookie(struct fscache_cookie *cookie,
				    bool invalidate);

     If the cookie is not already disabled, this locks the cookie against other
     dis/enablement ops, marks the cookie as being disabled, discards or
     invalidates any backing objects and waits for cessation of activity on any
     associated object.

     This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
     but it reinitialises the cookie such that it can be reenabled.

     All possible failures are handled internally.  The caller should consider
     calling fscache_uncache_all_inode_pages() afterwards to make sure all page
     markings are cleared up.

 (2) Enable a cookie:

	void fscache_enable_cookie(struct fscache_cookie *cookie,
				   bool (*can_enable)(void *data),
				   void *data)

     If the cookie is not already enabled, this locks the cookie against other
     dis/enablement ops, invokes can_enable() and, if the cookie is not an
     index cookie, will begin the procedure of acquiring backing objects.

     The optional can_enable() function is passed the data argument and returns
     a ruling as to whether or not enablement should actually be permitted to
     begin.

     All possible failures are handled internally.  The cookie will only be
     marked as enabled if provisional backing objects are allocated.

A later patch will introduce these to NFS.  Cookie enablement during nfs_open()
is then contingent on i_writecount <= 0.  can_enable() checks for a race
between open(O_RDONLY) and open(O_WRONLY/O_RDWR).  This simplifies NFS's cookie
handling and allows us to get rid of open(O_RDONLY) accidentally introducing
caching to an inode that's open for writing already.

One operation has its API modified:

 (3) Acquire a cookie.

	struct fscache_cookie *fscache_acquire_cookie(
		struct fscache_cookie *parent,
		const struct fscache_cookie_def *def,
		void *netfs_data,
		bool enable);

     This now has an additional argument that indicates whether the requested
     cookie should be enabled by default.  It doesn't need the can_enable()
     function because the caller must prevent multiple calls for the same netfs
     object and it doesn't need to take the enablement lock because no one else
     can get at the cookie before this returns.

Signed-off-by: David Howells <dhowells@redhat.com
2013-09-27 18:40:25 +01:00
Al Viro 8061a6fa56 9p: don't forget to destroy inode cache if fscache registration fails
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-17 22:31:01 -04:00
Al Viro 03da633aa7 atomic_open: take care of EEXIST in no-open case with O_CREAT|O_EXCL in fs/namei.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-09-17 17:08:50 -04:00
Will Deacon 50192abe02 fs/9p: avoid accessing utsname after namespace has been torn down
During trinity fuzzing in a kvmtool guest, I stumbled across the
following:

Unable to handle kernel NULL pointer dereference at virtual address 00000004
PC is at v9fs_file_do_lock+0xc8/0x1a0
LR is at v9fs_file_do_lock+0x48/0x1a0
[<c01e2ed0>] (v9fs_file_do_lock+0xc8/0x1a0) from [<c0119154>] (locks_remove_flock+0x8c/0x124)
[<c0119154>] (locks_remove_flock+0x8c/0x124) from [<c00d9bf0>] (__fput+0x58/0x1e4)
[<c00d9bf0>] (__fput+0x58/0x1e4) from [<c0044340>] (task_work_run+0xac/0xe8)
[<c0044340>] (task_work_run+0xac/0xe8) from [<c002e36c>] (do_exit+0x6bc/0x8d8)
[<c002e36c>] (do_exit+0x6bc/0x8d8) from [<c002e674>] (do_group_exit+0x3c/0xb0)
[<c002e674>] (do_group_exit+0x3c/0xb0) from [<c002e6f8>] (__wake_up_parent+0x0/0x18)

I believe this is due to an attempt to access utsname()->nodename, after
exit_task_namespaces() has been called, leaving current->nsproxy->uts_ns
as NULL and causing the above dereference.

A similar issue was fixed for lockd in 9a1b6bf818 ("LOCKD: Don't call
utsname()->nodename from nlmclnt_setlockargs"), so this patch attempts
something similar for 9pfs.

Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-08-26 10:28:46 -05:00
Chen Gang b3f8ab4b79 fs: 9p: use strlcpy instead of strncpy
For 'NULL' terminated string, recommend always to be ended by zero.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-30 10:42:56 -05:00
Linus Torvalds 19d2f8e0fb Merge tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull second round of 9p patches from Eric Van Hensbergen:
 "Several of these patches were rebased in order to correct style
  issues.  Only stylistic changes were made versus the patches which
  were in linux-next for two weeks.  The rebases have been in linux-next
  for 3 days and have passed my regressions.

  The bulk of these are RDMA fixes and improvements.  There's also some
  additions on the extended attributes front to support some additional
  namespaces and a new option for TCP to force allocation of mount
  requests from a priviledged port"

* tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: Remove the unused variable "err" in v9fs_vfs_getattr()
  9P: Add cancelled() to the transport functions.
  9P/RDMA: count posted buffers without a pending request
  9P/RDMA: Improve error handling in rdma_request
  9P/RDMA: Do not free req->rc in error handling in rdma_request()
  9P/RDMA: Use a semaphore to protect the RQ
  9P/RDMA: Protect against duplicate replies
  9P/RDMA: increase P9_RDMA_MAXSIZE to 1MB
  9pnet: refactor struct p9_fcall alloc code
  9P/RDMA: rdma_request() needs not allocate req->rc
  9P: Fix fcall allocation for rdma
  fs/9p: xattr: add trusted and security namespaces
  net/9p: add privport option to 9p tcp transport
2013-07-11 10:21:23 -07:00
Gu Zheng f2692ea8d5 fs/9p: Remove the unused variable "err" in v9fs_vfs_getattr()
Delete the unused variable "err" in v9fs_vfs_getattr()

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 22:18:31 -05:00
Jim Garlick d9a738597f fs/9p: xattr: add trusted and security namespaces
Allow requests for security.* and trusted.* xattr name spaces
to pass through to server.

The new files are 99% cut and paste from fs/9p/xattr_user.c with the
namespaces changed.  It has the intended effect in superficial testing.
I do not know much detail about how these namespaces are used, but passing
them through to the server, which can decide whether to handle them or not,
seems reasonable.

I want to support a use case where an ext4 file system is mounted via 9P,
then re-exported via samba to windows clients in a cluster.  Windows wants
to store xattrs such as security.NTACL.  This works when ext4 directly
backs samba, but not when 9P is inserted.  This use case is documented here:
   http://code.google.com/p/diod/issues/detail?id=95

Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2013-07-07 22:02:18 -05:00
Linus Torvalds 9e239bb939 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 update from Ted Ts'o:
 "Lots of bug fixes, cleanups and optimizations.  In the bug fixes
  category, of note is a fix for on-line resizing file systems where the
  block size is smaller than the page size (i.e., file systems 1k blocks
  on x86, or more interestingly file systems with 4k blocks on Power or
  ia64 systems.)

  In the cleanup category, the ext4's punch hole implementation was
  significantly improved by Lukas Czerner, and now supports bigalloc
  file systems.  In addition, Jan Kara significantly cleaned up the
  write submission code path.  We also improved error checking and added
  a few sanity checks.

  In the optimizations category, two major optimizations deserve
  mention.  The first is that ext4_writepages() is now used for
  nodelalloc and ext3 compatibility mode.  This allows writes to be
  submitted much more efficiently as a single bio request, instead of
  being sent as individual 4k writes into the block layer (which then
  relied on the elevator code to coalesce the requests in the block
  queue).  Secondly, the extent cache shrink mechanism, which was
  introduce in 3.9, no longer has a scalability bottleneck caused by the
  i_es_lru spinlock.  Other optimizations include some changes to reduce
  CPU usage and to avoid issuing empty commits unnecessarily."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits)
  ext4: optimize starting extent in ext4_ext_rm_leaf()
  jbd2: invalidate handle if jbd2_journal_restart() fails
  ext4: translate flag bits to strings in tracepoints
  ext4: fix up error handling for mpage_map_and_submit_extent()
  jbd2: fix theoretical race in jbd2__journal_restart
  ext4: only zero partial blocks in ext4_zero_partial_blocks()
  ext4: check error return from ext4_write_inline_data_end()
  ext4: delete unnecessary C statements
  ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree()
  jbd2: move superblock checksum calculation to jbd2_write_superblock()
  ext4: pass inode pointer instead of file pointer to punch hole
  ext4: improve free space calculation for inline_data
  ext4: reduce object size when !CONFIG_PRINTK
  ext4: improve extent cache shrink mechanism to avoid to burn CPU time
  ext4: implement error handling of ext4_mb_new_preallocation()
  ext4: fix corruption when online resizing a fs with 1K block size
  ext4: delete unused variables
  ext4: return FIEMAP_EXTENT_UNKNOWN for delalloc extents
  jbd2: remove debug dependency on debug_fs and update Kconfig help text
  jbd2: use a single printk for jbd_debug()
  ...
2013-07-02 09:39:34 -07:00
Al Viro 8f29843a51 [readdir] convert 9p
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:45 +04:00
Lukas Czerner d47992f86b mm: change invalidatepage prototype to accept length
Currently there is no way to truncate partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the functionality was enough for file system truncate
operation to work properly. However more file systems now support punch
hole feature and it can benefit from mm supporting truncating page just
up to the certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the invalidatepage() address space operation
prototype to accept range to be invalidated and update all the instances
for it.

We also change the block_invalidatepage() in the same way and actually
make a use of the new length argument implementing range invalidation.

Actual file system implementations will follow except the file systems
where the changes are really simple and should not change the behaviour
in any way .Implementation for truncate_page_range() which will be able
to accept page unaligned ranges will follow as well.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
2013-05-21 23:17:23 -04:00
Kent Overstreet a27bb332c0 aio: don't include aio.h in sched.h
Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 20:16:25 -07:00
Eric W. Biederman 7f78e03513 fs: Limit sys_mount to only request filesystem modules.
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives.  Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regards to the users permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-03 19:36:31 -08:00
Linus Torvalds 56a79b7b02 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull  more VFS bits from Al Viro:
 "Unfortunately, it looks like xattr series will have to wait until the
  next cycle ;-/

  This pile contains 9p cleanups and fixes (races in v9fs_fid_add()
  etc), fixup for nommu breakage in shmem.c, several cleanups and a bit
  more file_inode() work"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  constify path_get/path_put and fs_struct.c stuff
  fix nommu breakage in shmem.c
  cache the value of file_inode() in struct file
  9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry
  9p: make sure ->lookup() adds fid to the right dentry
  9p: untangle ->lookup() a bit
  9p: double iput() in ->lookup() if d_materialise_unique() fails
  9p: v9fs_fid_add() can't fail now
  v9fs: get rid of v9fs_dentry
  9p: turn fid->dlist into hlist
  9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine
  more file_inode() open-coded instances
  selinux: opened file can't have NULL or negative ->f_path.dentry

(In the meantime, the hlist traversal macros have changed, so this
required a semantic conflict fixup for the newly hlistified fid->dlist)
2013-03-03 13:23:03 -08:00
Al Viro 5e608671df 9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry
... otherwise the path we'd built isn't worth much.  Don't accept such fids
obtained from paths unless dentry is still alived by the end of the work.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-28 01:50:20 -05:00
Al Viro a3b2157e72 9p: make sure ->lookup() adds fid to the right dentry
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-28 01:29:48 -05:00
Al Viro 7b5be62199 9p: untangle ->lookup() a bit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-28 01:28:21 -05:00
Al Viro 3509b678a6 9p: double iput() in ->lookup() if d_materialise_unique() fails
d_materialise_unique() does iput() itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-28 01:21:38 -05:00
Al Viro 2ea03e1d62 9p: v9fs_fid_add() can't fail now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-28 01:18:14 -05:00
Al Viro aaeb7ecfb4 v9fs: get rid of v9fs_dentry
->d_fsdata can act as hlist_head...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-28 01:13:19 -05:00
Al Viro c4d30967f3 9p: turn fid->dlist into hlist
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-27 22:51:08 -05:00
Al Viro 634095dab2 9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-27 22:37:21 -05:00
Linus Torvalds d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00