Commit Graph

585 Commits

Author SHA1 Message Date
Linus Torvalds
5af43c24ca Merge branch 'akpm' (incoming from Andrew)
Merge more incoming from Andrew Morton:

 - Various fixes which were stalled or which I picked up recently

 - A large rotorooting of the AIO code.  Allegedly to improve
   performance but I don't really have good performance numbers (I might
   have lost the email) and I can't raise Kent today.  I held this out
   of 3.9 and we could give it another cycle if it's all too late/scary.

I ended up taking only the first two thirds of the AIO rotorooting.  I
left the percpu parts and the batch completion for later.  - Linus

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (33 commits)
  aio: don't include aio.h in sched.h
  aio: kill ki_retry
  aio: kill ki_key
  aio: give shared kioctx fields their own cachelines
  aio: kill struct aio_ring_info
  aio: kill batch allocation
  aio: change reqs_active to include unreaped completions
  aio: use cancellation list lazily
  aio: use flush_dcache_page()
  aio: make aio_read_evt() more efficient, convert to hrtimers
  wait: add wait_event_hrtimeout()
  aio: refcounting cleanup
  aio: make aio_put_req() lockless
  aio: do fget() after aio_get_req()
  aio: dprintk() -> pr_debug()
  aio: move private stuff out of aio.h
  aio: add kiocb_cancel()
  aio: kill return value of aio_complete()
  char: add aio_{read,write} to /dev/{null,zero}
  aio: remove retry-based AIO
  ...
2013-05-07 20:49:51 -07:00
Kent Overstreet
a27bb332c0 aio: don't include aio.h in sched.h
Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 20:16:25 -07:00
Al Viro
4385bab128 make blkdev_put() return void
same story as with the previous patches - note that return
value of blkdev_close() is lost, since there's nowhere the
caller (__fput()) could return it to.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-07 02:16:31 -04:00
Linus Torvalds
14a9e5c09d Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3/jbd fixes from Jan Kara:
 "A couple of ext3/jbd fixes"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  jbd: use kmem_cache_zalloc for allocating journal head
  jbd: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
  jbd: don't wait (forever) for stale tid caused by wraparound
  ext3: fix data=journal fast mount/umount hang
2013-05-03 09:56:25 -07:00
Darrick J. Wong
7136851117 mm: make snapshotting pages for stable writes a per-bio operation
Walking a bio's page mappings has proved problematic, so create a new
bio flag to indicate that a bio's data needs to be snapshotted in order
to guarantee stable pages during writeback.  Next, for the one user
(ext3/jbd) of snapshotting, hook all the places where writes can be
initiated without PG_writeback set, and set BIO_SNAP_STABLE there.

We must also flag journal "metadata" bios for stable writeout, since
file data can be written through the journal.  Finally, the
MS_SNAP_STABLE mount flag (only used by ext3) is now superfluous, so get
rid of it.

[akpm@linux-foundation.org: rename _submit_bh()'s `flags' to `bio_flags', delobotomize the _submit_bh declaration]
[akpm@linux-foundation.org: teeny cleanup]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:33 -07:00
Jan Kara
e643692138 ext3: fix data=journal fast mount/umount hang
In data=journal mode, if we unmount the file system before a
transaction has a chance to complete, when the journal inode is being
evicted, we can end up calling into log_wait_commit() for the
last transaction, after the journalling machinery has been shut down.
That triggers the WARN_ONCE in __log_start_commit().

Arguably we should adjust ext3_should_journal_data() to return FALSE
for the journal inode, but the only place it matters is
ext3_evict_inode(), and so it's to save a bit of CPU time, and to make
the patch much more obviously correct by inspection(tm), we'll fix it
by explicitly not trying to waiting for a journal commit when we are
evicting the journal inode, since it's guaranteed to never succeed in
this case.

This can be easily replicated via:

     mount -t ext3 -o data=journal /dev/vdb /vdb ; umount /vdb

This is a port of ext4 fix from Ted Ts'o.

Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-20 14:49:04 +01:00
Lars-Peter Clausen
8d0c2d10dd ext3: Fix format string issues
ext3_msg() takes the printk prefix as the second parameter and the
format string as the third parameter. Two callers of ext3_msg omit the
prefix and pass the format string as the second parameter and the first
parameter to the format string as the third parameter. In both cases
this string comes from an arbitrary source. Which means the string may
contain format string characters, which will
lead to undefined and potentially harmful behavior.

The issue was introduced in commit 4cf46b67eb("ext3: Unify log messages
in ext3") and is fixed by this patch.

CC: stable@vger.kernel.org
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-03-11 22:05:56 +01:00
Eric W. Biederman
7f78e03513 fs: Limit sys_mount to only request filesystem modules.
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives.  Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regards to the users permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-03 19:36:31 -08:00
Linus Torvalds
d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Linus Torvalds
bbbd27e694 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, ext3, udf updates from Jan Kara:
 "Several UDF fixes, a support for UDF extent cache, and couple of ext2
  and ext3 cleanups and minor fixes"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  Ext2: remove the static function release_blocks to optimize the kernel
  Ext2: mark inode dirty after the function dquot_free_block_nodirty is called
  Ext2: remove the overhead check about sb in the function ext2_new_blocks
  udf: Remove unused s_extLength from udf_bitmap
  udf: Make s_block_bitmap standard array
  udf: Fix bitmap overflow on large filesystems with small block size
  udf: add extent cache support in case of file reading
  udf: Write LVID to disk after opening / closing
  Ext3: return ENOMEM rather than EIO if sb_getblk fails
  Ext2: return ENOMEM rather than EIO if sb_getblk fails
  Ext3: use unlikely to improve the efficiency of the kernel
  Ext2: use unlikely to improve the efficiency of the kernel
  Ext3: add necessary check in case IO error happens
  Ext2: free memory allocated and forget buffer head when io error happens
  ext3: Fix memory leak when quota options are specified multiple times
  ext3, ext4, ocfs2: remove unused macro NAMEI_RA_INDEX
2013-02-26 14:51:52 -08:00
Al Viro
496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Darrick J. Wong
ffecfd1a72 block: optionally snapshot page contents to provide stable pages during write
This provides a band-aid to provide stable page writes on jbd without
needing to backport the fixed locking and page writeback bit handling
schemes of jbd2.  The band-aid works by using bounce buffers to snapshot
page contents instead of waiting.

For those wondering about the ext3 bandage -- fixing the jbd locking
(which was done as part of ext4dev years ago) is a lot of surgery, and
setting PG_writeback on data pages when we actually hold the page lock
dropped ext3 performance by nearly an order of magnitude.  If we're
going to migrate iscsi and raid to use stable page writes, the
complaints about high latency will likely return.  We might as well
centralize their page snapshotting thing to one place.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21 17:22:20 -08:00
Wang Shilong
c04e88e271 Ext3: return ENOMEM rather than EIO if sb_getblk fails
It will be better to use ENOMEM rather than EIO, because the only
reason that sb_getblk fails is that allocation fails.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:57 +01:00
Wang Shilong
1b7d76e9b1 Ext3: use unlikely to improve the efficiency of the kernel
Because the function 'sb_getblk' seldomly fails to return
NULL value,it will be better to use unlikely to check it.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:57 +01:00
Wang Shilong
61f43e6880 Ext3: add necessary check in case IO error happens
As we know io error may happen when the function 'sb_getblk'
is called.Add necessary check for it

The patch also fix a coding style problem.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:56 +01:00
Jan Kara
f56426ae4d ext3: Fix memory leak when quota options are specified multiple times
When usrjquota or grpjquota mount options are specified several times,
we leak memory storing the names. Free the memory correctly.

Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:55 +01:00
Guo Chao
306a74920b ext3, ext4, ocfs2: remove unused macro NAMEI_RA_INDEX
This macro, initially introduced by ext2 in v0.99.15, does not
have any users from the beginning. It has been removed in later
ext2 version but still remains in the code of ext3, ext4, ocfs2.
Remove this macro there.

Cc: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: ocfs2-devel@oss.oracle.com
Acked-by: Mark Fasheh <mfasheh@suse.de>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-01-21 11:19:55 +01:00
Andrew Morton
965c8e59cf lseek: the "whence" argument is called "whence"
But the kernel decided to call it "origin" instead.  Fix most of the
sites.

Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:12 -08:00
Julia Lawall
4789775477 ext3: drop if around WARN_ON
Just use WARN_ON rather than an if containing only WARN_ON(1).

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression e;
@@
- if (e) WARN_ON(1);
+ WARN_ON(e);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-12-13 16:33:24 +01:00
Zhao Hongjiang
195c0f96f0 ext3: get rid of the duplicate code on ext3_fill_super
Setting s_mount_opt to 0 is unnecessary because we use kzalloc() for sb
allocation. s_resuid and s_resgid are set again few lines below based on
values in on disk superblock.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-12-13 16:33:24 +01:00
Lukas Czerner
ae49eeec78 ext3: Avoid underflow of in ext3_trim_fs()
Currently if len argument in ext3_trim_fs() is smaller than one block,
the 'end' variable underflow. Avoid that by returning EINVAL if len is
smaller than file system block.

Also remove useless unlikely().

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-11-19 21:36:12 +01:00
Linus Torvalds
5d5c5dca9c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, ext3, quota fixes from Jan Kara:
 "Fix three regressions caused by user namespace conversions (ext2,
  ext3, quota) and minor ext3 fix and cleanup."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: Silence warning about PRJQUOTA not being handled in need_print_warning()
  ext3: fix return values on parse_options() failure
  ext2: fix return values on parse_options() failure
  ext3: ext3_bread usage audit
  ext3: fix possible non-initialized variable on htree_dirblock_to_tree()
2012-10-16 18:12:38 -07:00
Marco Stornelli
67e2c19a3b ext3: drop lock/unlock super
Removed lock/unlock super.

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-09 23:33:38 -04:00
Zhao Hongjiang
66415f6b50 ext3: fix return values on parse_options() failure
parse_options() in ext3 should return 0 when parse the mount options fails.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-10-09 23:24:03 +02:00
Carlos Maiolino
c3d59ad6ab ext3: ext3_bread usage audit
This is the ext3 version of the same patch applied to Ext4, where such goal is
to audit the usage of ext3_bread() due a possible misinterpretion of its return
value.

Focused on directory blocks, a NULL value returned from ext3_bread() means a
hole, which cannot exist into a directory inode. It can pass undetected after a
fix in an uninitialized error variable.

The (now) initialized variable into ext3_getblk() may lead to a zero'ed return
value of ext3_bread() to its callers, which can make the caller do not detect
the hole in the directory inode.

This patch creates a new wrapper function ext3_dir_bread() which checks for
holes properly, reports error, and returns EIO in that case.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-10-09 23:21:42 +02:00