Commit Graph

376385 Commits

Author SHA1 Message Date
Jaegeuk Kim 77888c1e42 f2fs: add f2fs_readonly()
Introduce a simple macro function for readability.
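
A minimal userspace sketch of what such a readability helper might look like. The struct, the flag value, and the names prefixed with sketch_ are hypothetical stand-ins, not the kernel's definitions:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's read-only mount flag. */
#define SKETCH_RDONLY 0x1UL

/* Hypothetical stand-in for the kernel's struct super_block. */
struct sketch_super_block {
    unsigned long s_flags;
};

/* Readability helper: true when the filesystem is mounted read-only. */
static inline bool sketch_readonly(const struct sketch_super_block *sb)
{
    return (sb->s_flags & SKETCH_RDONLY) != 0;
}
```

Callers can then write `if (sketch_readonly(sb))` instead of repeating the flag test inline.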

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:03 +09:00
Jaegeuk Kim 6f85b35203 f2fs: avoid RECLAIM_FS-ON-W: deadlock
This patch tries to avoid the following deadlock condition, in which the reclaim
path can trigger f2fs_balance_fs again.

=================================
[ INFO: inconsistent lock state ]
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
kswapd0/41 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&sbi->gc_mutex){+.+.?.}, at: f2fs_balance_fs+0xe6/0x100 [f2fs]
{RECLAIM_FS-ON-W} state was registered at:
  [<ffffffff810aa5a9>] mark_held_locks+0xb9/0x140
  [<ffffffff810aae85>] lockdep_trace_alloc+0x85/0xf0
  [<ffffffff8113ab2c>] __alloc_pages_nodemask+0x7c/0x9b0
  [<ffffffff81175aa8>] alloc_pages_current+0xb8/0x180
  [<ffffffff811319cf>] __page_cache_alloc+0xaf/0xd0
  [<ffffffff8113225c>] find_or_create_page+0x4c/0xb0
  [<ffffffffa021359e>] find_data_page+0x14e/0x210 [f2fs]
  [<ffffffffa021161b>] f2fs_gc+0x9eb/0xd90 [f2fs]
  [<ffffffffa0218fae>] f2fs_balance_fs+0xee/0x100 [f2fs]
  [<ffffffffa020848c>] f2fs_setattr+0x6c/0x200 [f2fs]
  [<ffffffff811ae51b>] notify_change+0x1db/0x3a0
  [<ffffffff8118fbd0>] do_truncate+0x60/0xa0
  [<ffffffff8118fd95>] vfs_truncate+0x185/0x1b0
  [<ffffffff8118fe1c>] do_sys_truncate+0x5c/0xa0
  [<ffffffff8118ffee>] SyS_truncate+0xe/0x10
  [<ffffffff816e2b42>] system_call_fastpath+0x16/0x1b

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:03 +09:00
Jaegeuk Kim 2c2c149f7d f2fs: don't do checkpoint if error is occurred
If we hit an error during dentry recovery, we should not write a checkpoint.
Otherwise, some erroneous dentry blocks overwrite the existing blocks that
contain the remaining recovery information.
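
The rule above can be sketched in userspace as follows. This is only an illustration under stated assumptions: the struct and function names are hypothetical, and the recovery and checkpoint steps are simulated by plain fields rather than real f2fs calls:

```c
#include <stdbool.h>

/* Hypothetical state for a simulated recovery run. */
struct sketch_recovery {
    int dentry_err;        /* simulated result of the dentry recovery step */
    bool checkpoint_done;  /* set when the simulated checkpoint is written */
};

/* Write the checkpoint only if recovery finished without an error. */
static int sketch_recover(struct sketch_recovery *r)
{
    int err = r->dentry_err;        /* stand-in for the real recovery work */

    if (!err)
        r->checkpoint_done = true;  /* stand-in for writing a checkpoint */

    return err;                     /* on error, on-disk state is left as-is */
}
```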

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:03 +09:00
Jaegeuk Kim 45856aff0d f2fs: fix to unlock page before exit
If we got an error after lock_page, we should unlock it before exit.
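
A minimal sketch of the usual goto-based cleanup pattern for this kind of fix. The page type, the lock helpers, and the error value are hypothetical stand-ins, and whether the real function also unlocks on success is an assumption made here for simplicity:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a lockable page. */
struct sketch_page {
    bool locked;
};

static void sketch_lock_page(struct sketch_page *p)   { p->locked = true; }
static void sketch_unlock_page(struct sketch_page *p) { p->locked = false; }

/* fail_step simulates an error that occurs after the page has been locked. */
static int sketch_process_page(struct sketch_page *p, bool fail_step)
{
    int err = 0;

    sketch_lock_page(p);
    if (fail_step) {
        err = -5;   /* arbitrary error code for the sketch */
        goto out;   /* do not return directly: the page is still locked */
    }
    /* ... work on the locked page would go here ... */
out:
    sketch_unlock_page(p);  /* single exit point releases the lock */
    return err;
}
```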

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:03 +09:00
Jaegeuk Kim 9a55ed656c f2fs: remove unnecessary kmap/kunmap operations
The page allocated for recovery is not in HIGHMEM, so we don't
need to use kmap/kunmap.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:03 +09:00
Namjae Jeon 9851e6e189 f2fs: reorganize f2fs_vm_page_mkwrite
A few things can be changed in the default mkwrite function:
1) Call file_update_time at the start, before acquiring any lock.
2) Change the condition page_offset(page) >= i_size_read(inode) to
 page_offset(page) > i_size_read(inode).
3) Move wait_on_page_writeback.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:02 +09:00
majianpeng 145b04e5ed f2fs: use list_for_each_entry rather than list_for_each_entry_safe
We can do this because a global mutex, f2fs_stat_mutex, now protects the
list operations.
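
As general background, the "safe" iteration style saves the next pointer before the loop body runs, which only matters when the body may unlink the current entry. A simplified sketch with a hand-rolled singly linked list (not the kernel's list_head macros; all names are hypothetical):

```c
#include <stddef.h>

struct sketch_node {
    int value;
    struct sketch_node *next;
};

/* Plain traversal: sufficient when the body never unlinks the current node. */
static int sketch_sum(const struct sketch_node *head)
{
    int sum = 0;

    for (const struct sketch_node *cur = head; cur; cur = cur->next)
        sum += cur->value;
    return sum;
}

/* "Safe" traversal: next is saved first because the body may unlink cur. */
static struct sketch_node *sketch_remove_negative(struct sketch_node *head)
{
    struct sketch_node **link = &head;
    struct sketch_node *cur, *next;

    for (cur = head; cur; cur = next) {
        next = cur->next;        /* remember the successor up front */
        if (cur->value < 0) {
            *link = next;        /* unlink cur from the list */
            cur->next = NULL;
        } else {
            link = &cur->next;
        }
    }
    return head;
}
```

When the loop only reads entries, as in sketch_sum, the plain form is enough and is easier to read.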

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
[Jaegeuk Kim: add description]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:02 +09:00
Haicheng Li 81fb5e8746 f2fs: remove unecessary variable and code
Code cleanup with no change in behavior.

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:02 +09:00
Peter Zijlstra bfe35965ec f2fs, lockdep: annotate mutex_lock_all()
Majianpeng reported a lockdep splat for f2fs. It turns out mutex_lock_all()
acquires an array of locks (in global/local lock style).

Any such operation is always serialized using cp_mutex, therefore there is no
fs_lock[] lock-order issue; tell lockdep about this using the
mutex_lock_nest_lock() primitive.

Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:02 +09:00
Jaegeuk Kim f356fe0cba f2fs: add debug msgs in the recovery routine
This patch adds some trivial debugging messages in the recovery process.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:02 +09:00
Jaegeuk Kim 44a83ff6a8 f2fs: update inode page after creation
I found a bug when testing power-off-recovery as follows.

[Bug Scenario]
1. create a file
2. fsync the file
3. reboot w/o any sync
4. try to recover the file
 - found its fsync mark
 - found its dentry mark
   : try to recover its dentry
    - get its file name
    - get its parent inode number
     : here we got zero value

The reason we get the wrong parent inode number is that we didn't fully
synchronize the inode page with its newly created inode information.

In particular, f2fs previously stored fi->i_pino and wrote it to the cached
node page in the wrong order, which led to a zero-valued i_pino during
recovery.

So, this patch modifies the creation flow to fix the synchronization order of
inode page with its inode.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:02 +09:00
Jaegeuk Kim 64aa7ed98d f2fs: change get_new_data_page to pass a locked node page
This patch is for passing a locked node page to get_dnode_of_data.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:01 +09:00
Jaegeuk Kim 1646cfac95 f2fs: skip get_node_page if locked node page is passed
If get_dnode_of_data gets a locked node page, let's skip redundant
get_node_page calls.
This is for further enhancement.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:01 +09:00
Jaegeuk Kim 0a364af18f f2fs: remove unnecessary por_doing check
This por_doing check is entirely unrelated to the recovery process.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:01 +09:00
Jaegeuk Kim 74d0b917ef f2fs: fix BUG_ON during f2fs_evict_inode(dir)
During the dentry recovery routine, recover_inode() triggers __f2fs_add_link
with its directory inode.

In the following scenario, a bug is triggered.
 1. dir = f2fs_iget(pino)
 2. __f2fs_add_link(dir, name)
 3. iput(dir)
  -> f2fs_evict_inode() hits BUG_ON(atomic_read(fi->dirty_dents))

Kernel BUG at ffffffffa01c0676 [verbose debug info unavailable]
[<ffffffffa01c0676>] f2fs_evict_inode+0x276/0x300 [f2fs]
Call Trace:
 [<ffffffff8118ea00>] evict+0xb0/0x1b0
 [<ffffffff8118f1c5>] iput+0x105/0x190
 [<ffffffffa01d2dac>] recover_fsync_data+0x3bc/0x1070 [f2fs]
 [<ffffffff81692e8a>] ? io_schedule+0xaa/0xd0
 [<ffffffff81690acb>] ? __wait_on_bit_lock+0x7b/0xc0
 [<ffffffff8111a0e7>] ? __lock_page+0x67/0x70
 [<ffffffff81165e21>] ? kmem_cache_alloc+0x31/0x140
 [<ffffffff8118a502>] ? __d_instantiate+0x92/0xf0
 [<ffffffff812a949b>] ? security_d_instantiate+0x1b/0x30
 [<ffffffff8118a5b4>] ? d_instantiate+0x54/0x70

This means that we should flush all the dentry pages between iget() and iput().
But during the recovery routine that is not allowed for consistency reasons, so
we have to wait until the whole recovery process finishes.
After that, write_checkpoint flushes all the dirty dentry blocks, and we can
safely put the stale dir inodes from the dirty_dir_inode_list.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:01 +09:00
Jaegeuk Kim 8c26d7d571 f2fs: fix por_doing variable coverage
The reason for using sbi->por_doing is to alleviate data writes during
recovery.
find_fsync_dnodes() produces some dirty dentry pages, so it should be
covered by sbi->por_doing too.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:01 +09:00
Jaegeuk Kim addbe45b00 f2fs: remove redundant assignment
We don't need to assign a value redundantly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:01 +09:00
Jaegeuk Kim 650495dedc f2fs: fix the inconsistent state of data pages
In get_lock_data_page, if there is a data race between get_dnode_of_data for
the node and grab_cache_page for the data, f2fs can hit the following
BUG_ON(dn.data_blkaddr == NEW_ADDR).

kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/data.c:251!
 [<ffffffffa044966c>] get_lock_data_page+0x1ec/0x210 [f2fs]
Call Trace:
 [<ffffffffa043b089>] f2fs_readdir+0x89/0x210 [f2fs]
 [<ffffffff811a0920>] ? fillonedir+0x100/0x100
 [<ffffffff811a0920>] ? fillonedir+0x100/0x100
 [<ffffffff811a07f8>] vfs_readdir+0xb8/0xe0
 [<ffffffff811a0b4f>] sys_getdents+0x8f/0x110
 [<ffffffff816d7999>] system_call_fastpath+0x16/0x1b

This bug can occur when the block address of the data block is
changed after f2fs_put_dnode().
To avoid that, this patch fixes the lock order of node and data
blocks so that the node block lock is covered by the data block lock.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:00 +09:00
Jaegeuk Kim 65e5cd0a15 f2fs: fix inconsistency of block count during recovery
Currently f2fs recovers the dentry of fsynced files.
When power-off recovery is conducted, the newly recovered inode should increase
the node block count as well as the inode block count.

This patch resolves the inconsistency that arises as follows:

1. create a file
2. write data
3. fsync
4. reboot without sync
5. mount and recover the file
6. node block count is 1 and inode block count is 2
 : fall into the inconsistent state
7. unlink the file
 : trigger the following BUG_ON

------------[ cut here ]------------
kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/f2fs.h:716!
Call Trace:
 [<ffffffffa0344100>] ? get_node_page+0x50/0x1a0 [f2fs]
 [<ffffffffa0344bfc>] remove_inode_page+0x8c/0x100 [f2fs]
 [<ffffffffa03380f0>] ? f2fs_evict_inode+0x180/0x2d0 [f2fs]
 [<ffffffffa033812e>] f2fs_evict_inode+0x1be/0x2d0 [f2fs]
 [<ffffffff811c7a67>] evict+0xa7/0x1a0
 [<ffffffff811c82b5>] iput+0x105/0x190
 [<ffffffff811c2b30>] d_kill+0xe0/0x120
 [<ffffffff811c2c57>] dput+0xe7/0x1e0
 [<ffffffff811acc3d>] __fput+0x19d/0x2d0
 [<ffffffff811acd7e>] ____fput+0xe/0x10
 [<ffffffff81070645>] task_work_run+0xb5/0xe0
 [<ffffffff81002941>] do_notify_resume+0x71/0xb0
 [<ffffffff8175f14a>] int_signal+0x12/0x17

Reported-and-Tested-by: Chris Fries <C.Fries@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:00 +09:00
Linus Torvalds e4aa937ec7 Linux 3.10-rc3
2013-05-26 16:00:47 -07:00
Manfred Spraul ab465df9dd ipc/sem.c: Fix missing wakeups in do_smart_update_queue()
do_smart_update_queue() is called when an operation (semop,
semctl(SETVAL), semctl(SETALL), ...) modified the array.  It must check
which of the sleeping tasks can proceed.

do_smart_update_queue() missed a few wakeups:
 - if a sleeping complex op was completed, then all per-semaphore queues
   must be scanned - not only those that were modified by *sops
 - if a sleeping simple op proceeded, then the global queue must be
   scanned again

And:
 - the test for "|sops == NULL) before scanning the global queue is not
   required: If the global queue is empty, then it doesn't need to be
   scanned - regardless of the reason for calling do_smart_update_queue()

The patch is not optimized, i.e.  even completing a wait-for-zero
operation causes a rescan.  This is done to keep the patch as simple as
possible.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-26 15:14:51 -07:00
Linus Torvalds 89ff77837a Merge tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:

 - Stable fix to prevent an rpc_task wakeup race
 - Fix a NFSv4.1 session drain deadlock
 - Fix a NFSv4/v4.1 mount regression when not running rpc.gssd
 - Ensure auth_gss pipe detection works in namespaces
 - Fix SETCLIENTID fallback if rpcsec_gss is not available

* tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix SETCLIENTID fallback if GSS is not available
  SUNRPC: Prevent an rpc_task wakeup race
  NFSv4.1 Fix a pNFS session draining deadlock
  SUNRPC: Convert auth_gss pipe detection to work in namespaces
  SUNRPC: Faster detection if gssd is actually running
  SUNRPC: Fix a bug in gss_create_upcall
2013-05-26 12:33:05 -07:00
Linus Torvalds 932ff06b2a Merge tag 'edac_fixes_for_3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull amd64 edac fix from Borislav Petkov:
 "A sysfs file permissions correction"

* tag 'edac_fixes_for_3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  amd64_edac: Fix bogus sysfs file permissions
2013-05-26 09:52:26 -07:00
Linus Torvalds 95f4838e21 Merge branch 'parisc-for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
 "This time we made the kernel- and interruption stack allocation
  reentrant which fixed some strange kernel crashes (specifically
  protection ID traps).

  Furthermore, this patchset fixes the interrupt stack in UP and SMP
  configurations by using native locking instructions.  And finally,
  usage of floating point calculations on parisc was disabled in the
  MPILIB."

* 'parisc-for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: fix irq stack on UP and SMP
  parisc/superio: Use module_pci_driver to register driver
  parisc: make interrupt and interruption stack allocation reentrant
  parisc: show number of FPE and unaligned access handler calls in /proc/interrupts
  parisc: add additional parisc git tree to MAINTAINERS file
  parisc: use PAGE_SHIFT instead of hardcoded value 12 in pacache.S
  parisc: add rp5470 entry to machine database
  MPILIB: disable usage of floating point registers on parisc
2013-05-26 09:36:31 -07:00
Linus Torvalds 088d812fe9 Merge tag 'for-linus-v3.10-rc3' of git://oss.sgi.com/xfs/xfs
Pull xfs fixes from Ben Myers:
 "Here are fixes for corruption on 512 byte filesystems, a rounding
  error, a use-after-free, some flags to fix lockdep reports, and
  several fixes related to CRCs.  We have a somewhat larger post -rc1
  queue than usual due to fixes related to the CRC feature we merged for
  3.10:

   - Fix for corruption with FSX on 512 byte blocksize filesystems
   - Fix rounding error in xfs_free_file_space
   - Fix use-after-free with extent free intents
   - Add several missing KM_NOFS flags to fix lockdep reports
   - Several fixes for CRC related code"

* tag 'for-linus-v3.10-rc3' of git://oss.sgi.com/xfs/xfs:
  xfs: remote attribute lookups require the value length
  xfs: xfs_attr_shortform_allfit() does not handle attr3 format.
  xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC
  xfs: fix missing KM_NOFS tags to keep lockdep happy
  xfs: Don't reference the EFI after it is freed
  xfs: fix rounding in xfs_free_file_space
  xfs: fix sub-page blocksize data integrity writes
2013-05-26 09:35:02 -07:00