Commit Graph

34288 Commits

Author SHA1 Message Date
Linus Torvalds
70b23ce347 Merge tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
Pull writeback fix from Wu Fengguang:
 "Fix data corruption on NFS writeback.

  It has been in linux-next for one month"

* tag 'writeback-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Fix data corruption on NFS
2014-01-16 08:23:34 +07:00
Andreas Rohner
70f2fe3a26 nilfs2: fix segctor bug that causes file system corruption
There is a bug in the function nilfs_segctor_collect, which results in
active data being written to a segment, that is marked as clean.  It is
possible, that this segment is selected for a later segment
construction, whereby the old data is overwritten.

The problem shows itself with the following kernel log message:

  nilfs_sufile_do_cancel_free: segment 6533 must be clean

Usually a few hours later the file system gets corrupted:

  NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
  NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)

The issue can be reproduced with a file system that is nearly full and
with the cleaner running, while some IO intensive task is running.
Although it is quite hard to reproduce.

This is what happens:

 1. The cleaner starts the segment construction
 2. nilfs_segctor_collect is called
 3. sc_stage is on NILFS_ST_SUFILE and segments are freed
 4. sc_stage is on NILFS_ST_DAT current segment is full
 5. nilfs_segctor_extend_segments is called, which
    allocates a new segment
 6. The new segment is one of the segments freed in step 3
 7. nilfs_sufile_cancel_freev is called and produces an error message
 8. Loop around and the collection starts again
 9. sc_stage is on NILFS_ST_SUFILE and segments are freed
    including the newly allocated segment, which will contain active
    data and can be allocated at a later time
10. A few hours later another segment construction allocates the
    segment and causes file system corruption

This can be prevented by simply reordering the statements.  If
nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
the freed segments are marked as dirty and cannot be allocated any more.

Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-15 14:19:42 +07:00
Linus Torvalds
e2bc44706f Merge tag 'xfs-for-linus-v3.13-rc8' of git://oss.sgi.com/xfs/xfs
Pull xfs bugfixes from Ben Myers:
 "Here we have a bugfix for an off-by-one in the remote attribute
  verifier that results in a forced shutdown which you can hit with v5
  superblock by creating a 64k xattr, and a fix for a missing
  destroy_work_on_stack() in the allocation worker.

  It's a bit late, but they are both fairly straightforward"

* tag 'xfs-for-linus-v3.13-rc8' of git://oss.sgi.com/xfs/xfs:
  xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
  xfs: fix off-by-one error in xfs_attr3_rmt_verify
2014-01-11 06:33:03 +07:00
Chuansheng Liu
1f4a63bf01 xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to
call destroy_work_on_stack() which frees the debug object to pair
with INIT_WORK_ONSTACK().

Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 6f96b3063c)
2014-01-10 12:39:38 -06:00
Jie Liu
bba719b500 xfs: fix off-by-one error in xfs_attr3_rmt_verify
With CRC check is enabled, if trying to set an attributes value just
equal to the maximum size of XATTR_SIZE_MAX would cause the v3 remote
attr write verification procedure failure, which would yield the back
trace like below:

<snip>
XFS (sda7): Internal error xfs_attr3_rmt_write_verify at line 191 of file fs/xfs/xfs_attr_remote.c
<snip>
Call Trace:
[<ffffffff816f0042>] dump_stack+0x45/0x56
[<ffffffffa0d99c8b>] xfs_error_report+0x3b/0x40 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d99ce5>] xfs_corruption_error+0x55/0x80 [xfs]
[<ffffffffa0dbef6b>] xfs_attr3_rmt_write_verify+0x14b/0x1a0 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d96edd>] _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffff81184cda>] ? vm_map_ram+0x31a/0x460
[<ffffffff81097230>] ? wake_up_state+0x20/0x20
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d9726b>] xfs_buf_iorequest+0x6b/0xc0 [xfs]
[<ffffffffa0d97315>] xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d97906>] xfs_bwrite+0x46/0x80 [xfs]
[<ffffffffa0dbfa94>] xfs_attr_rmtval_set+0x334/0x490 [xfs]
[<ffffffffa0db84aa>] xfs_attr_leaf_addname+0x24a/0x410 [xfs]
[<ffffffffa0db8893>] xfs_attr_set_int+0x223/0x470 [xfs]
[<ffffffffa0db8b76>] xfs_attr_set+0x96/0xb0 [xfs]
[<ffffffffa0db13b2>] xfs_xattr_set+0x42/0x70 [xfs]
[<ffffffff811df9b2>] generic_setxattr+0x62/0x80
[<ffffffff811e0213>] __vfs_setxattr_noperm+0x63/0x1b0
[<ffffffff81307afe>] ? evm_inode_setxattr+0xe/0x10
[<ffffffff811e0415>] vfs_setxattr+0xb5/0xc0
[<ffffffff811e054e>] setxattr+0x12e/0x1c0
[<ffffffff811c6e82>] ? final_putname+0x22/0x50
[<ffffffff811c708b>] ? putname+0x2b/0x40
[<ffffffff811cc4bf>] ? user_path_at_empty+0x5f/0x90
[<ffffffff811bdfd9>] ? __sb_start_write+0x49/0xe0
[<ffffffff81168589>] ? vm_mmap_pgoff+0x99/0xc0
[<ffffffff811e07df>] SyS_setxattr+0x8f/0xe0
[<ffffffff81700c2d>] system_call_fastpath+0x1a/0x1f

Tests:
    setfattr -n user.longxattr -v `perl -e 'print "A"x65536'` testfile

This patch fix it to check the remote EA size is greater than the
XATTR_SIZE_MAX rather than more than or equal to it, because it's
valid if the specified EA value size is equal to the limitation as
per VFS setxattr interface.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 85dd0707f0)
2014-01-10 12:38:41 -06:00
Linus Torvalds
ef350bb7c5 Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bugfix from Ted Ts'o:
 "Fix a regression introduced in v3.13-rc6"

* tag 'ext4_for_linus_stable' of http://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix bigalloc regression
2014-01-07 08:22:42 +08:00
Eric Whitney
d0abafac8c ext4: fix bigalloc regression
Commit f5a44db5d2 introduced a regression on filesystems created with
the bigalloc feature (cluster size > blocksize).  It causes xfstests
generic/006 and /013 to fail with an unexpected JBD2 failure and
transaction abort that leaves the test file system in a read only state.
Other xfstests run on bigalloc file systems are likely to fail as well.

The cause is the accidental use of a cluster mask where a cluster
offset was needed in ext4_ext_map_blocks().

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
2014-01-06 14:00:23 -05:00
Linus Torvalds
06f055f394 Merge branch 'akpm' (incoming from Andrew)
Merge patches from Andrew Morton:
 "Ten fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL
  sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c
  drivers/dma/ioat/dma.c: check DMA mapping error in ioat_dma_self_test()
  mm/memory-failure.c: transfer page count from head page to tail page after split thp
  MAINTAINERS: set up proper record for Xilinx Zynq
  mm: remove bogus warning in copy_huge_pmd()
  memcg: fix memcg_size() calculation
  mm: fix use-after-free in sys_remap_file_pages
  mm: munlock: fix deadlock in __munlock_pagevec()
  mm: munlock: fix a bug where THP tail page is encountered
2014-01-02 14:40:38 -08:00
Jason Baron
4ff36ee94d epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL
The EPOLL_CTL_DEL path of epoll contains a classic, ab-ba deadlock.
That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x), will deadlock with
epoll_ctl(b, EPOLL_CTL_DEL, a, x).  The deadlock was introduced with
commmit 67347fe4e6 ("epoll: do not take global 'epmutex' for simple
topologies").

The acquistion of the ep->mtx for the destination 'ep' was added such
that a concurrent EPOLL_CTL_ADD operation would see the correct state of
the ep (Specifically, the check for '!list_empty(&f.file->f_ep_links')

However, by simply not acquiring the lock, we do not serialize behind
the ep->mtx from the add path, and thus may perform a full path check
when if we had waited a little longer it may not have been necessary.
However, this is a transient state, and performing the full loop
checking in this case is not harmful.

The important point is that we wouldn't miss doing the full loop
checking when required, since EPOLL_CTL_ADD always locks any 'ep's that
its operating upon.  The reason we don't need to do lock ordering in the
add path, is that we are already are holding the global 'epmutex'
whenever we do the double lock.  Further, the original posting of this
patch, which was tested for the intended performance gains, did not
perform this additional locking.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Nathan Zimmer <nzimmer@sgi.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Nelson Elhage <nelhage@nelhage.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-02 14:40:30 -08:00
Linus Torvalds
152b734a9e Merge tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes
Pull GFS2 fixes from Steven Whitehouse:
 "Here is a set of small fixes for GFS2.  There is a fix to drop
  s_umount which is copied in from the core vfs, two patches relate to a
  hard to hit "use after free" and memory leak.  Two patches related to
  using DIO and buffered I/O on the same file to ensure correct
  operation in relation to glock state changes.  The final patch adds an
  RCU read lock to ensure correct locking on an error path"

* tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
  GFS2: Fix unsafe dereference in dump_holder()
  GFS2: Wait for async DIO in glock state changes
  GFS2: Fix incorrect invalidation for DIO/buffered I/O
  GFS2: Fix slab memory leak in gfs2_bufdata
  GFS2: Fix use-after-free race when calling gfs2_remove_from_ail
  GFS2: don't hold s_umount over blkdev_put
2014-01-02 12:45:47 -08:00
Tetsuo Handa
0b3a2c9968 GFS2: Fix unsafe dereference in dump_holder()
GLOCK_BUG_ON() might call this function without RCU read lock. Make sure that
RCU read lock is held when using task_struct returned from pid_task().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-01-02 12:18:04 +00:00
Shirish Pargaonkar
f1e3268126 cifs: set FILE_CREATED
Set FILE_CREATED on O_CREAT|O_EXCL.

cifs code didn't change during commit 116cc02253

Kernel bugzilla 66251

Signed-off-by: Shirish Pargaonkar <spargaonkar@suse.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
CC: Stable <stable@kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-12-27 15:14:45 -06:00
Sachin Prabhu
750b8de6c4 cifs: We do not drop reference to tlink in CIFSCheckMFSymlink()
When we obtain tcon from cifs_sb, we use cifs_sb_tlink() to first obtain
tlink which also grabs a reference to it. We do not drop this reference
to tlink once we are done with the call.

The patch fixes this issue by instead passing tcon as a parameter and
avoids having to obtain a reference to the tlink. A lookup for the tcon
is already made in the calling functions and this way we avoid having to
re-run the lookup. This is also consistent with the argument list for
other similar calls for M-F symlinks.

We should also return an ENOSYS when we do not find a protocol specific
function to lookup the MF Symlink data.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
CC: Stable <stable@kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2013-12-27 15:14:44 -06:00
Steve French
ebcc943c11 Add missing end of line termination to some cifs messages
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Gregor Beck <gbeck@sernet.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2013-12-27 15:14:44 -06:00
Linus Torvalds
f41bfc9423 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
 "A collection of bug fixes destined for stable and some printk cleanups
  and a patch so that instead of BUG'ing we use the ext4_error()
  framework to mark the file system is corrupted"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: add explicit casts when masking cluster sizes
  ext4: fix deadlock when writing in ENOSPC conditions
  jbd2: rename obsoleted msg JBD->JBD2
  jbd2: revise KERN_EMERG error messages
  jbd2: don't BUG but return ENOSPC if a handle runs out of space
  ext4: Do not reserve clusters when fs doesn't support extents
  ext4: fix del_timer() misuse for ->s_err_report
  ext4: check for overlapping extents in ext4_valid_extent_entries()
  ext4: fix use-after-free in ext4_mb_new_blocks
  ext4: call ext4_error_inode() if jbd2_journal_dirty_metadata() fails
2013-12-26 09:26:12 -08:00
Linus Torvalds
dc0a6b4fee Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2 fix from Jan Kara:
 "One simple fix of oops in ext2 which was recently hit by Christoph"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2: Fix oops in ext2_get_block() called from ext2_quota_write()
2013-12-23 17:24:38 -08:00
Linus Torvalds
a8472b4bb1 Merge git://git.kvack.org/~bcrl/aio-next
Pull AIO leak fixes from Ben LaHaise:
 "I've put these two patches plus Linus's change through a round of
  tests, and it passes millions of iterations of the aio numa
  migratepage test, as well as a number of repetitions of a few simple
  read and write tests.

  The first patch fixes the memory leak Kent introduced, while the
  second patch makes aio_migratepage() much more paranoid and robust"

* git://git.kvack.org/~bcrl/aio-next:
  aio/migratepages: make aio migrate pages sane
  aio: fix kioctx leak introduced by "aio: Fix a trinity splat"
2013-12-22 11:03:49 -08:00
Linus Torvalds
3dc9acb676 aio: clean up and fix aio_setup_ring page mapping
Since commit 36bc08cc01 ("fs/aio: Add support to aio ring pages
migration") the aio ring setup code has used a special per-ring backing
inode for the page allocations, rather than just using random anonymous
pages.

However, rather than remembering the pages as it allocated them, it
would allocate the pages, insert them into the file mapping (dirty, so
that they couldn't be free'd), and then forget about them.  And then to
look them up again, it would mmap the mapping, and then use
"get_user_pages()" to get back an array of the pages we just created.

Now, not only is that incredibly inefficient, it also leaked all the
pages if the mmap failed (which could happen due to excessive number of
mappings, for example).

So clean it all up, making it much more straightforward.  Also remove
some left-overs of the previous (broken) mm_populate() usage that was
removed in commit d6c355c7da ("aio: fix race in ring buffer page
lookup introduced by page migration support") but left the pointless and
now misleading MAP_POPULATE flag around.

Tested-and-acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-22 11:03:08 -08:00
Benjamin LaHaise
8e321fefb0 aio/migratepages: make aio migrate pages sane
The arbitrary restriction on page counts offered by the core
migrate_page_move_mapping() code results in rather suspicious looking
fiddling with page reference counts in the aio_migratepage() operation.
To fix this, make migrate_page_move_mapping() take an extra_count parameter
that allows aio to tell the code about its own reference count on the page
being migrated.

While cleaning up aio_migratepage(), make it validate that the old page
being passed in is actually what aio_migratepage() expects to prevent
misbehaviour in the case of races.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2013-12-21 17:56:08 -05:00
Benjamin LaHaise
1881686f84 aio: fix kioctx leak introduced by "aio: Fix a trinity splat"
e34ecee2ae reworked the percpu reference
counting to correct a bug trinity found.  Unfortunately, the change lead
to kioctxes being leaked because there was no final reference count to
put.  Add that reference count back in to fix things.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org
2013-12-21 15:57:09 -05:00
Linus Torvalds
a6ddeee32d Merge tag 'xfs-for-linus-v3.13-rc5' of git://oss.sgi.com/xfs/xfs
Pull xfs bugfixes from Ben Myers:
 "This contains fixes for some asserts
   related to project quotas, a memory leak, a hang when disabling group or
   project quotas before disabling user quotas, Dave's email address, several
   fixes for the alignment of file allocation to stripe unit/width geometry, a
   fix for an assertion with xfs_zero_remaining_bytes, and the behavior of
   metadata writeback in the face of IO errors.

   Details:
   - fix memory leak in xfs_dir2_node_removename
   - fix quota assertion in xfs_setattr_size
   - fix quota assertions in xfs_qm_vop_create_dqattach
   - fix for hang when disabling group and project quotas before
     disabling user quotas
   - fix Dave Chinner's email address in MAINTAINERS
   - fix for file allocation alignment
   - fix for assertion in xfs_buf_stale by removing xfsbdstrat
   - fix for alignment with swalloc mount option
   - fix for "retry forever" semantics on IO errors"

* tag 'xfs-for-linus-v3.13-rc5' of git://oss.sgi.com/xfs/xfs:
  xfs: abort metadata writeback on permanent errors
  xfs: swalloc doesn't align allocations properly
  xfs: remove xfsbdstrat error
  xfs: align initial file allocations correctly
  MAINTAINERS: fix incorrect mail address of XFS maintainer
  xfs: fix infinite loop by detaching the group/project hints from user dquot
  xfs: fix assertion failure at xfs_setattr_nonsize
  xfs: fix false assertion at xfs_qm_vop_create_dqattach
  xfs: fix memory leak in xfs_dir2_node_removename
2013-12-20 15:48:45 -08:00
Luck, Tony
df36ac1bc2 pstore: Don't allow high traffic options on fragile devices
Some pstore backing devices use on board flash as persistent
storage. These have limited numbers of write cycles so it
is a poor idea to use them from high frequency operations.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-20 13:12:01 -08:00
Theodore Ts'o
f5a44db5d2 ext4: add explicit casts when masking cluster sizes
The missing casts can cause the high 64-bits of the physical blocks to
be lost.  Set up new macros which allows us to make sure the right
thing happen, even if at some point we end up supporting larger
logical block numbers.

Thanks to the Emese Revfy and the PaX security team for reporting this
issue.

Reported-by: PaX Team <pageexec@freemail.hu>
Reported-by: Emese Revfy <re.emese@gmail.com>                                 
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2013-12-20 09:29:35 -05:00
Steven Whitehouse
582d2f7aed GFS2: Wait for async DIO in glock state changes
We need to wait for any outstanding DIO to complete in a couple
of situations. Firstly, in case we are changing out of deferred
mode (in inode_go_sync) where GLF_DIRTY will not be set. That
call could be prefixed with a test for gl_state == LM_ST_DEFERRED
but it doesn't seem worth it bearing in mind that the test for
outstanding DIO is very quick anyway, in the usual case that there
is none.

The second case is in inode_go_lock which will catch the cases
where we have a cached EX lock, but where we grant deferred locks
against it so that there is no glock state transistion. We only
need to wait if the state is not deferred, since DIO is valid
anyway in that state.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-12-20 10:42:08 +00:00
Steven Whitehouse
dfd11184d8 GFS2: Fix incorrect invalidation for DIO/buffered I/O
In patch 209806aba9 we allowed
local deferred locks to be granted against a cached exclusive
lock. That opened up a corner case which this patch now
fixes.

The solution to the problem is to check whether we have cached
pages each time we do direct I/O and if so to unmap, flush
and invalidate those pages. Since the glock state machine
normally does that for us, mostly the code will be a no-op.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-12-20 10:41:21 +00:00