Commit Graph

122 Commits

Author SHA1 Message Date
Eric W. Biederman 7f78e03513 fs: Limit sys_mount to only request filesystem modules.
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives.  Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regards to the users permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-03 19:36:31 -08:00
Wang Shilong 2b0542a4a0 Ext2: use unlikely to improve the efficiency of the kernel
Because the function 'sb_getblk' seldomly fails to return
NULL value. It will be better to use unlikely to optimize it.

Signed-off-by: Wang shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:56 +01:00
Zhao Hongjiang ae2cf4284e ext2: fix return values on parse_options() failure
parse_options() in ext2 should return 0 when parse the mount options fails.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-10-09 23:23:53 +02:00
Kirill A. Shutemov 8c0a853770 fs: push rcu_barrier() from deactivate_locked_super() to filesystems
There's no reason to call rcu_barrier() on every
deactivate_locked_super().  We only need to make sure that all delayed rcu
free inodes are flushed before we destroy related cache.

Removing rcu_barrier() from deactivate_locked_super() affects some fast
paths.  E.g.  on my machine exit_group() of a last process in IPC
namespace takes 0.07538s.  rcu_barrier() takes 0.05188s of that time.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-02 21:35:55 -04:00
Linus Torvalds a0e881b7c1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second vfs pile from Al Viro:
 "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
  deadlock reproduced by xfstests 068), symlink and hardlink restriction
  patches, plus assorted cleanups and fixes.

  Note that another fsfreeze deadlock (emergency thaw one) is *not*
  dealt with - the series by Fernando conflicts a lot with Jan's, breaks
  userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
  for massive vfsmount leak; this is going to be handled next cycle.
  There probably will be another pull request, but that stuff won't be
  in it."

Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  delousing target_core_file a bit
  Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
  fs: Remove old freezing mechanism
  ext2: Implement freezing
  btrfs: Convert to new freezing mechanism
  nilfs2: Convert to new freezing mechanism
  ntfs: Convert to new freezing mechanism
  fuse: Convert to new freezing mechanism
  gfs2: Convert to new freezing mechanism
  ocfs2: Convert to new freezing mechanism
  xfs: Convert to new freezing code
  ext4: Convert to new freezing mechanism
  fs: Protect write paths by sb_start_write - sb_end_write
  fs: Skip atime update on frozen filesystem
  fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
  fs: Improve filesystem freezing handling
  switch the protection of percpu_counter list to spinlock
  nfsd: Push mnt_want_write() outside of i_mutex
  btrfs: Push mnt_want_write() outside of i_mutex
  fat: Push mnt_want_write() outside of i_mutex
  ...
2012-08-01 10:26:23 -07:00
Jan Kara 1e8b212fe5 ext2: Implement freezing
The only missing piece to make freezing work reliably with ext2 is to
stop iput() of unlinked inode from deleting the inode on frozen filesystem.
So add a necessary protection to ext2_evict_inode().

We also provide appropriate ->freeze_fs and ->unfreeze_fs functions.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 09:45:53 +04:00
Linus Torvalds 08d9329c29 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull misc udf, ext2, ext3, and isofs fixes from Jan Kara:
 "Assorted, mostly trivial, fixes for udf, ext2, ext3, and isofs.  I'm
  on vacation and scarcely checking email since we are expecting baby
  any day now but these fixes should be safe to go in and I don't want
  to delay them unnecessarily."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: avoid info leak on export
  isofs: avoid info leak on export
  udf: Improve table length check to avoid possible overflow
  ext3: Check return value of blkdev_issue_flush()
  jbd: Check return value of blkdev_issue_flush()
  udf: Do not decrement i_blocks when freeing indirect extent block
  udf: Fix memory leak when mounting
  ext2: cleanup the confused goto label
  UDF: Remove unnecessary variable "offset" from udf_fill_inode
  udf: stop using s_dirt
  ext3: force ro mount if ext3_setup_super() fails
  quota: fix checkpatch.pl warning by replacing <asm/uaccess.h> with <linux/uaccess.h>
2012-07-24 17:40:44 -07:00
Jan Kara a117782571 quota: Move quota syncing to ->sync_fs method
Since the moment writes to quota files are using block device page cache and
space for quota structures is reserved at the moment they are first accessed we
have no reason to sync quota before inode writeback. In fact this order is now
only harmful since quota information can easily change during inode writeback
(either because conversion of delayed-allocated extents or simply because of
allocation of new blocks for simple filesystems not using page_mkwrite).

So move syncing of quota information after writeback of inodes into ->sync_fs
method. This way we do not have to use ->quota_sync callback which is primarily
intended for use by quotactl syscall anyway and we get rid of calling
->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
also support legacy non-journalled quotas) and thus there are no dirty quota
structures.

CC: "Theodore Ts'o" <tytso@mit.edu>
CC: Joel Becker <jlbec@evilplan.org>
CC: reiserfs-devel@vger.kernel.org
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Dave Kleikamp <shaggy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22 23:58:34 +04:00
Wanlong Gao e124a32043 ext2: cleanup the confused goto label
Cleanup the confused goto label, since the big lock has been removed.

Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-07-09 12:03:12 +02:00
Linus Torvalds ece78b7df7 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, ext3 and quota fixes from Jan Kara:
 "Interesting bits are:
   - removal of a special i_mutex locking subclass (I_MUTEX_QUOTA) since
     quota code does not need i_mutex anymore in any unusual way.
   - backport (from ext4) of a fix of a checkpointing bug (missing cache
     flush) that could lead to fs corruption on power failure

  The rest are just random small fixes & cleanups."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: trivial fix to comment for ext2_free_blocks
  ext2: remove the redundant comment for ext2_export_ops
  ext3: return 32/64-bit dir name hash according to usage type
  quota: Get rid of nested I_MUTEX_QUOTA locking subclass
  quota: Use precomputed value of sb_dqopt in dquot_quota_sync
  ext2: Remove i_mutex use from ext2_quota_write()
  reiserfs: Remove i_mutex use from reiserfs_quota_write()
  ext4: Remove i_mutex use from ext4_quota_write()
  ext3: Remove i_mutex use from ext3_quota_write()
  quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG
  jbd: Write journal superblock with WRITE_FUA after checkpointing
  jbd: protect all log tail updates with j_checkpoint_mutex
  jbd: Split updating of journal superblock and marking journal empty
  ext2: do not register write_super within VFS
  ext2: Remove s_dirt handling
  ext2: write superblock only once on unmount
  ext3: update documentation with barrier=1 default
  ext3: remove max_debt in find_group_orlov()
  jbd: Refine commit writeout logic
2012-05-25 08:14:59 -07:00
Eric W. Biederman b8a9f9e183 userns: Convert ext2 to use kuid/kgid where appropriate.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-15 14:59:26 -07:00
Wang Sheng-Hui aa9e939d52 ext2: remove the redundant comment for ext2_export_ops
The comment is outdated and isn't particularly informative anyway - NULL
meaning the default behavior is very common in kernel. And we really set about
half of entries. So remove the whole comment for ext2_export_ops.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:39 +02:00
Jan Kara e2a3fde750 ext2: Remove i_mutex use from ext2_quota_write()
We don't need i_mutex in ext2_quota_write() because writes to quota file
are serialized by dqio_mutex anyway. Changes to quota files outside of quota
code are forbidded and enforced by NOATIME and IMMUTABLE bits.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-05-15 23:34:38 +02:00
Artem Bityutskiy f72cf5e223 ext2: do not register write_super within VFS
Jan Kara removed 'sb->s_dirt' VFS flag references, so we do not need to
register the ext2 'ext2_write_super()' method in the VFS superblock operations,
because 'sb->s_dirt' won't be ever set to 1 and VFS won't ever call
'->write_super()' anyway. Thus, remove the method.

Tested using xfstests.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-04-11 11:12:46 +02:00
Jan Kara b838ec2232 ext2: Remove s_dirt handling
Places which modify superblock feature / state fields mark the superblock
buffer dirty so it is written out by flusher thread. Thus there's no need to
set s_dirt there.

The only other fields changing in the superblock are the numbers of free
blocks, free inodes and s_wtime. There's no real need to write (or even
compute) these periodically. Free blocks / inodes counters are recomputed on
every mount from group counters anyway and value of s_wtime is only
informational and imprecise anyway. So it should be enough to write these
opportunistically on mount, remount, umount, and sync_fs times.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-04-11 11:12:45 +02:00
Artem Bityutskiy f2b2242081 ext2: write superblock only once on unmount
Currently on unmount if we are mounted R/W, we first write the superblock to
the media if it is dirty, and then write it again, which is not optimal. This
patch makes ext2 write the superblock on unmount less times.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-04-11 11:12:45 +02:00
Al Viro 48fde701af switch open-coded instances of d_make_root() to new helper
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:35 -04:00
Al Viro 8de5277879 vfs: check i_nlink limits in vfs_{mkdir,rename_dir,link}
New field of struct super_block - ->s_max_links.  Maximal allowed
value of ->i_nlink or 0; in the latter case all checks still need
to be done in ->link/->mkdir/->rename instances.  Note that this
limit applies both to directoris and to non-directories.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:32 -04:00
Linus Torvalds ac69e09280 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2/3/4: delete unneeded includes of module.h
  ext{3,4}: Fix potential race when setversion ioctl updates inode
  udf: Mark LVID buffer as uptodate before marking it dirty
  ext3: Don't warn from writepage when readonly inode is spotted after error
  jbd: Remove j_barrier mutex
  reiserfs: Force inode evictions before umount to avoid crash
  reiserfs: Fix quota mount option parsing
  udf: Treat symlink component of type 2 as /
  udf: Fix deadlock when converting file from in-ICB one to normal one
  udf: Cleanup calling convention of inode_getblk()
  ext2: Fix error handling on inode bitmap corruption
  ext3: Fix error handling on inode bitmap corruption
  ext3: replace ll_rw_block with other functions
  ext3: NULL dereference in ext3_evict_inode()
  jbd: clear revoked flag on buffers before a new transaction started
  ext3: call ext3_mark_recovery_complete() when recovery is really needed
2012-01-09 12:51:21 -08:00
Paul Gortmaker 302bf2f325 ext2/3/4: delete unneeded includes of module.h
Delete any instances of include module.h that were not strictly
required.  In the case of ext2, the declaration of MODULE_LICENSE
etc. were in inode.c but the module_init/exit were in super.c, so
relocate the MODULE_LICENCE/AUTHOR block to super.c which makes it
consistent with ext3 and ext4 at the same time.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-01-09 13:52:10 +01:00
Al Viro 34c80b1d93 vfs: switch ->show_options() to struct dentry *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:19:54 -05:00
Al Viro 6b520e0565 vfs: fix the stupidity with i_dentry in inode destructors
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else.  Not to mention the removal of
boilerplate code from ->destroy_inode() instances...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:40 -05:00
Li Haifeng f32948ddd1 ext2: fix the outdated comment in ext2_nfs_get_inode()
Signed-off-by: Li Haifeng <omycle@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-08-30 17:38:47 +02:00
Robin Dong 4e299c1d91 ext2: fix error msg when mounting fs with too-large blocksize
When ext2 mounts a filesystem, it attempts to set the block device
blocksize with a call to sb_set_blocksize, which can fail for
several reasons.  The current failure message in ext2 prints:

  EXT2-fs (loop1): error: blocksize is too small

which is not correct in all cases.  This can be demonstrated
by creating a filesystem with

  # mkfs.ext2 -b 8192

on a 4k page system, and attempting to mount it.

Change the error message to a more generic:

  EXT2-fs (loop1): bad blocksize 8192

to match the error message in ext3.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Reviewed-by: Coly Li <bosong.ly@taobao.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-05-17 13:47:42 +02:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00