Commit Graph

582 Commits

Author SHA1 Message Date
Linus Torvalds 14a9e5c09d Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3/jbd fixes from Jan Kara:
 "A couple of ext3/jbd fixes"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  jbd: use kmem_cache_zalloc for allocating journal head
  jbd: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
  jbd: don't wait (forever) for stale tid caused by wraparound
  ext3: fix data=journal fast mount/umount hang
2013-05-03 09:56:25 -07:00
Darrick J. Wong 7136851117 mm: make snapshotting pages for stable writes a per-bio operation
Walking a bio's page mappings has proved problematic, so create a new
bio flag to indicate that a bio's data needs to be snapshotted in order
to guarantee stable pages during writeback.  Next, for the one user
(ext3/jbd) of snapshotting, hook all the places where writes can be
initiated without PG_writeback set, and set BIO_SNAP_STABLE there.

We must also flag journal "metadata" bios for stable writeout, since
file data can be written through the journal.  Finally, the
MS_SNAP_STABLE mount flag (only used by ext3) is now superfluous, so get
rid of it.

[akpm@linux-foundation.org: rename _submit_bh()'s `flags' to `bio_flags', delobotomize the _submit_bh declaration]
[akpm@linux-foundation.org: teeny cleanup]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:33 -07:00
Jan Kara e643692138 ext3: fix data=journal fast mount/umount hang
In data=journal mode, if we unmount the file system before a
transaction has a chance to complete, when the journal inode is being
evicted, we can end up calling into log_wait_commit() for the
last transaction, after the journalling machinery has been shut down.
That triggers the WARN_ONCE in __log_start_commit().

Arguably we should adjust ext3_should_journal_data() to return FALSE
for the journal inode, but the only place it matters is
ext3_evict_inode(), and so it's to save a bit of CPU time, and to make
the patch much more obviously correct by inspection(tm), we'll fix it
by explicitly not trying to waiting for a journal commit when we are
evicting the journal inode, since it's guaranteed to never succeed in
this case.

This can be easily replicated via:

     mount -t ext3 -o data=journal /dev/vdb /vdb ; umount /vdb

This is a port of ext4 fix from Ted Ts'o.

Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-20 14:49:04 +01:00
Lars-Peter Clausen 8d0c2d10dd ext3: Fix format string issues
ext3_msg() takes the printk prefix as the second parameter and the
format string as the third parameter. Two callers of ext3_msg omit the
prefix and pass the format string as the second parameter and the first
parameter to the format string as the third parameter. In both cases
this string comes from an arbitrary source. Which means the string may
contain format string characters, which will
lead to undefined and potentially harmful behavior.

The issue was introduced in commit 4cf46b67eb("ext3: Unify log messages
in ext3") and is fixed by this patch.

CC: stable@vger.kernel.org
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-11 22:05:56 +01:00
Eric W. Biederman 7f78e03513 fs: Limit sys_mount to only request filesystem modules.
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives.  Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regards to the users permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-03 19:36:31 -08:00
Linus Torvalds d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Linus Torvalds bbbd27e694 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, ext3, udf updates from Jan Kara:
 "Several UDF fixes, a support for UDF extent cache, and couple of ext2
  and ext3 cleanups and minor fixes"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  Ext2: remove the static function release_blocks to optimize the kernel
  Ext2: mark inode dirty after the function dquot_free_block_nodirty is called
  Ext2: remove the overhead check about sb in the function ext2_new_blocks
  udf: Remove unused s_extLength from udf_bitmap
  udf: Make s_block_bitmap standard array
  udf: Fix bitmap overflow on large filesystems with small block size
  udf: add extent cache support in case of file reading
  udf: Write LVID to disk after opening / closing
  Ext3: return ENOMEM rather than EIO if sb_getblk fails
  Ext2: return ENOMEM rather than EIO if sb_getblk fails
  Ext3: use unlikely to improve the efficiency of the kernel
  Ext2: use unlikely to improve the efficiency of the kernel
  Ext3: add necessary check in case IO error happens
  Ext2: free memory allocated and forget buffer head when io error happens
  ext3: Fix memory leak when quota options are specified multiple times
  ext3, ext4, ocfs2: remove unused macro NAMEI_RA_INDEX
2013-02-26 14:51:52 -08:00
Al Viro 496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Darrick J. Wong ffecfd1a72 block: optionally snapshot page contents to provide stable pages during write
This provides a band-aid to provide stable page writes on jbd without
needing to backport the fixed locking and page writeback bit handling
schemes of jbd2.  The band-aid works by using bounce buffers to snapshot
page contents instead of waiting.

For those wondering about the ext3 bandage -- fixing the jbd locking
(which was done as part of ext4dev years ago) is a lot of surgery, and
setting PG_writeback on data pages when we actually hold the page lock
dropped ext3 performance by nearly an order of magnitude.  If we're
going to migrate iscsi and raid to use stable page writes, the
complaints about high latency will likely return.  We might as well
centralize their page snapshotting thing to one place.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21 17:22:20 -08:00
Wang Shilong c04e88e271 Ext3: return ENOMEM rather than EIO if sb_getblk fails
It will be better to use ENOMEM rather than EIO, because the only
reason that sb_getblk fails is that allocation fails.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:57 +01:00
Wang Shilong 1b7d76e9b1 Ext3: use unlikely to improve the efficiency of the kernel
Because the function 'sb_getblk' seldomly fails to return
NULL value,it will be better to use unlikely to check it.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:57 +01:00
Wang Shilong 61f43e6880 Ext3: add necessary check in case IO error happens
As we know io error may happen when the function 'sb_getblk'
is called.Add necessary check for it

The patch also fix a coding style problem.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:56 +01:00
Jan Kara f56426ae4d ext3: Fix memory leak when quota options are specified multiple times
When usrjquota or grpjquota mount options are specified several times,
we leak memory storing the names. Free the memory correctly.

Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:55 +01:00
Guo Chao 306a74920b ext3, ext4, ocfs2: remove unused macro NAMEI_RA_INDEX
This macro, initially introduced by ext2 in v0.99.15, does not
have any users from the beginning. It has been removed in later
ext2 version but still remains in the code of ext3, ext4, ocfs2.
Remove this macro there.

Cc: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: ocfs2-devel@oss.oracle.com
Acked-by: Mark Fasheh <mfasheh@suse.de>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:55 +01:00
Andrew Morton 965c8e59cf lseek: the "whence" argument is called "whence"
But the kernel decided to call it "origin" instead.  Fix most of the
sites.

Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:12 -08:00
Julia Lawall 4789775477 ext3: drop if around WARN_ON
Just use WARN_ON rather than an if containing only WARN_ON(1).

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression e;
@@
- if (e) WARN_ON(1);
+ WARN_ON(e);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-12-13 16:33:24 +01:00
Zhao Hongjiang 195c0f96f0 ext3: get rid of the duplicate code on ext3_fill_super
Setting s_mount_opt to 0 is unnecessary because we use kzalloc() for sb
allocation. s_resuid and s_resgid are set again few lines below based on
values in on disk superblock.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-12-13 16:33:24 +01:00
Lukas Czerner ae49eeec78 ext3: Avoid underflow of in ext3_trim_fs()
Currently if len argument in ext3_trim_fs() is smaller than one block,
the 'end' variable underflow. Avoid that by returning EINVAL if len is
smaller than file system block.

Also remove useless unlikely().

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-11-19 21:36:12 +01:00
Linus Torvalds 5d5c5dca9c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, ext3, quota fixes from Jan Kara:
 "Fix three regressions caused by user namespace conversions (ext2,
  ext3, quota) and minor ext3 fix and cleanup."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: Silence warning about PRJQUOTA not being handled in need_print_warning()
  ext3: fix return values on parse_options() failure
  ext2: fix return values on parse_options() failure
  ext3: ext3_bread usage audit
  ext3: fix possible non-initialized variable on htree_dirblock_to_tree()
2012-10-16 18:12:38 -07:00
Marco Stornelli 67e2c19a3b ext3: drop lock/unlock super
Removed lock/unlock super.

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-09 23:33:38 -04:00
Zhao Hongjiang 66415f6b50 ext3: fix return values on parse_options() failure
parse_options() in ext3 should return 0 when parse the mount options fails.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-10-09 23:24:03 +02:00
Carlos Maiolino c3d59ad6ab ext3: ext3_bread usage audit
This is the ext3 version of the same patch applied to Ext4, where such goal is
to audit the usage of ext3_bread() due a possible misinterpretion of its return
value.

Focused on directory blocks, a NULL value returned from ext3_bread() means a
hole, which cannot exist into a directory inode. It can pass undetected after a
fix in an uninitialized error variable.

The (now) initialized variable into ext3_getblk() may lead to a zero'ed return
value of ext3_bread() to its callers, which can make the caller do not detect
the hole in the directory inode.

This patch creates a new wrapper function ext3_dir_bread() which checks for
holes properly, reports error, and returns EIO in that case.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-10-09 23:21:42 +02:00
Carlos Maiolino aa9660196b ext3: fix possible non-initialized variable on htree_dirblock_to_tree()
This is a backport of ext4 commit 90b0a9732 which fixes a possible
non-initialized variable on htree_dirblock_to_tree().
Ext3 has the same non initialized variable, but, in any case it will be
initialized by ext3_get_blocks_handle(), which will avoid the bug to be
triggered, but, the non-initialized variable by htree_dirblock_to_tree() is
still a bug.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-10-09 23:21:42 +02:00
Linus Torvalds e1cc485262 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3 & udf fixes from Jan Kara:
 "Shortlog pretty much says it all.

  The interesting bits are UDF support for direct IO and ext3 fix for a
  long standing oops in data=journal mode."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  jbd: Fix assertion failure in commit code due to lacking transaction credits
  UDF: Add support for O_DIRECT
  ext3: Replace 0 with NULL for pointer in super.c file
  udf: add writepages support for udf
  ext3: don't clear orphan list on ro mount with errors
  reiserfs: Make reiserfs_xattr_handlers static
2012-10-04 09:14:01 -07:00
Linus Torvalds aab174f0df Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs update from Al Viro:

 - big one - consolidation of descriptor-related logics; almost all of
   that is moved to fs/file.c

   (BTW, I'm seriously tempted to rename the result to fd.c.  As it is,
   we have a situation when file_table.c is about handling of struct
   file and file.c is about handling of descriptor tables; the reasons
   are historical - file_table.c used to be about a static array of
   struct file we used to have way back).

   A lot of stray ends got cleaned up and converted to saner primitives,
   disgusting mess in android/binder.c is still disgusting, but at least
   doesn't poke so much in descriptor table guts anymore.  A bunch of
   relatively minor races got fixed in process, plus an ext4 struct file
   leak.

 - related thing - fget_light() partially unuglified; see fdget() in
   there (and yes, it generates the code as good as we used to have).

 - also related - bits of Cyrill's procfs stuff that got entangled into
   that work; _not_ all of it, just the initial move to fs/proc/fd.c and
   switch of fdinfo to seq_file.

 - Alex's fs/coredump.c spiltoff - the same story, had been easier to
   take that commit than mess with conflicts.  The rest is a separate
   pile, this was just a mechanical code movement.

 - a few misc patches all over the place.  Not all for this cycle,
   there'll be more (and quite a few currently sit in akpm's tree)."

Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
  MAX_LFS_FILESIZE should be a loff_t
  compat: fs: Generic compat_sys_sendfile implementation
  fs: push rcu_barrier() from deactivate_locked_super() to filesystems
  btrfs: reada_extent doesn't need kref for refcount
  coredump: move core dump functionality into its own file
  coredump: prevent double-free on an error path in core dumper
  usb/gadget: fix misannotations
  fcntl: fix misannotations
  ceph: don't abuse d_delete() on failure exits
  hypfs: ->d_parent is never NULL or negative
  vfs: delete surplus inode NULL check
  switch simple cases of fget_light to fdget
  new helpers: fdget()/fdput()
  switch o2hb_region_dev_write() to fget_light()
  proc_map_files_readdir(): don't bother with grabbing files
  make get_file() return its argument
  vhost_set_vring(): turn pollstart/pollstop into bool
  switch prctl_set_mm_exe_file() to fget_light()
  switch xfs_find_handle() to fget_light()
  switch xfs_swapext() to fget_light()
  ...
2012-10-02 20:25:04 -07:00