Commit Graph

39 Commits

Author SHA1 Message Date
Jaegeuk Kim f0947e5cce f2fs: fix i_name during f2fs_sync_file
As similar as the i_pino fix, i_name also should be fixed when i_nlink is 1.

The errorneous scenario is like this.

1. touch test1
2. link test1 test2
3. unlink test2
4. fsync test1

After this, i_name should be test1.

CC: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30 15:17:03 +09:00
Gu Zheng 4559071063 f2fs: introduce help function F2FS_NODE()
Introduce help function F2FS_NODE() to simplify the conversion of node_page to
f2fs_node.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30 15:17:02 +09:00
Jaegeuk Kim e5d2385ed6 f2fs: recover date requested by fdatasync
In order to support SQLite that uses fdatasync instead of fsync, we should
guarantee the data requested by fdatasync can be recovered after sudden-power-
off.

So, let's remove the fdatasync condition in f2fs_sync_file.
Otherwise, we can restore the data after sudden-power-off due to nonexistence
of any fsync mark'ed node blocks.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30 15:17:02 +09:00
Jaegeuk Kim 354a3399dc f2fs: recover wrong pino after checkpoint during fsync
If a file is linked, f2fs loose its parent inode number so that fsync calls
for the linked file should do checkpoint all the time.
But, if we can recover its parent inode number after the checkpoint, we can
adjust roll-forward mechanism for the further fsync calls, which is able to
improve the fsync performance significatly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14 09:04:45 +09:00
Jaegeuk Kim b3783873cc f2fs: avoid freqeunt write_inode calls
If update_inode is called, we don't need to do write_inode.
So, let's use a *dirty* flag for each inode.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14 09:04:43 +09:00
Namjae Jeon d7cc950b4c f2fs: optimise the truncate_data_blocks_range() range
The function truncate_data_blocks_range() decrements the valid
block count of inode via dec_valid_block_count(). Since this
function updates the i_blocks field of inode, we can update this
field once we have calculated total the number of blocks
to be freed.

Therefore we can decrement valid blocks outside of the for loop.

	if (nr_free) {
+		dec_valid_block_count(sbi, dn->inode, nr_free);
 		set_page_dirty(dn->node_page);
 		sync_inode_page(dn);
 	}

'nr_free' tells the total number of blocks freed. So, we can
just directly pass this value to dec_valid_block_count() and update
the i_blocks.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14 09:04:42 +09:00
Namjae Jeon 6a3e8ef0de f2fs: use the F2FS specific flags in f2fs_ioctl()
In f2fs_ioctl() function, it is using generic flags.
Since F2FS specific flags are defined. So lets use
those flags.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14 09:04:26 +09:00
Jaegeuk Kim 2d4d9fb591 f2fs: fix i_blocks translation on various types of files
Basically an inode manages the number of allocated blocks with inode->i_blocks
which is represented in a unit of sectors, not file system blocks.
But, f2fs has used i_blocks in a unit of file system blocks, and f2fs_getattr
translates it to the number of sectors when fstat is called.

However, previously f2fs_file_inode_operations only has this, so this patch adds
it to all the types of inode_operations.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-11 16:01:09 +09:00
Jaegeuk Kim b292dcab06 f2fs: reuse the locked dnode page and its inode
This patch fixes the following deadlock bug during the recovery.

INFO: task mount:1322 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount           D ffffffff81125870     0  1322   1266 0x00000000
 ffff8801207e39d8 0000000000000046 ffff88012ab1dee0 0000000000000046
 ffff8801207e3a08 ffff880115903f40 ffff8801207e3fd8 ffff8801207e3fd8
 ffff8801207e3fd8 ffff880115903f40 ffff8801207e39d8 ffff88012fc94520
Call Trace:
[<ffffffff81125870>] ? __lock_page+0x70/0x70
[<ffffffff816a92d9>] schedule+0x29/0x70
[<ffffffff816a93af>] io_schedule+0x8f/0xd0
[<ffffffff8112587e>] sleep_on_page+0xe/0x20
[<ffffffff816a649a>] __wait_on_bit_lock+0x5a/0xc0
[<ffffffff81125867>] __lock_page+0x67/0x70
[<ffffffff8106c7b0>] ? autoremove_wake_function+0x40/0x40
[<ffffffff81126857>] find_lock_page+0x67/0x80
[<ffffffff8112698f>] find_or_create_page+0x3f/0xb0
[<ffffffffa03901a8>] ? sync_inode_page+0xa8/0xd0 [f2fs]
[<ffffffffa038fdf7>] get_node_page+0x67/0x180 [f2fs]
[<ffffffffa039818b>] recover_fsync_data+0xacb/0xff0 [f2fs]
[<ffffffff816aaa1e>] ? _raw_spin_unlock+0x3e/0x40
[<ffffffffa0389634>] f2fs_fill_super+0x7d4/0x850 [f2fs]
[<ffffffff81184cf9>] mount_bdev+0x1c9/0x210
[<ffffffffa0388e60>] ? validate_superblock+0x180/0x180 [f2fs]
[<ffffffffa0387635>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffff81185a13>] mount_fs+0x43/0x1b0
[<ffffffff81145ba0>] ? __alloc_percpu+0x10/0x20
[<ffffffff811a0796>] vfs_kern_mount+0x76/0x120
[<ffffffff811a2cb7>] do_mount+0x237/0xa10
[<ffffffff81140b9b>] ? strndup_user+0x5b/0x80
[<ffffffff811a3520>] SyS_mount+0x90/0xe0
[<ffffffff816b3502>] system_call_fastpath+0x16/0x1b

The bug is triggered when check_index_in_prev_nodes tries to get the direct
node page by calling get_node_page.
At this point, if the direct node page is already locked by get_dnode_of_data,
its caller, we got a deadlock condition.

This patch adds additional condition check for the reuse of locked direct node
pages prior to the get_node_page call.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:04 +09:00
Jaegeuk Kim 77888c1e42 f2fs: add f2fs_readonly()
Introduce a simple macro function for readability.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:03 +09:00
Namjae Jeon 9851e6e189 f2fs: reorganize f2fs_vm_page_mkwrite
Few things can be changed in the default mkwrite function
1) Make file_update_time at the start before acquiring any lock
2) the condition page_offset(page) >= i_size_read(inode) should be
 changed to page_offset(page) > i_size_read
3) Move wait_on_page_writeback.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:02 +09:00
Jaegeuk Kim 64aa7ed98d f2fs: change get_new_data_page to pass a locked node page
This patch is for passing a locked node page to get_dnode_of_data.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28 15:03:01 +09:00
Linus Torvalds 942d33da99 Merge tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "This patch-set includes the following major enhancement patches.
   - introduce a new gloabl lock scheme
   - add tracepoints on several major functions
   - fix the overall cleaning process focused on victim selection
   - apply the block plugging to merge IOs as much as possible
   - enhance management of free nids and its list
   - enhance the readahead mode for node pages
   - address several cretical deadlock conditions
   - reduce lock_page calls

  The other minor bug fixes and enhancements are as follows.
   - calculation mistakes: overflow
   - bio types: READ, READA, and READ_SYNC
   - fix the recovery flow, data races, and null pointer errors"

* tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
  f2fs: cover free_nid management with spin_lock
  f2fs: optimize scan_nat_page()
  f2fs: code cleanup for scan_nat_page() and build_free_nids()
  f2fs: bugfix for alloc_nid_failed()
  f2fs: recover when journal contains deleted files
  f2fs: continue to mount after failing recovery
  f2fs: avoid deadlock during evict after f2fs_gc
  f2fs: modify the number of issued pages to merge IOs
  f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as debug entry.
  f2fs: fix inconsistent using of NM_WOUT_THRESHOLD
  f2fs: check truncation of mapping after lock_page
  f2fs: enhance alloc_nid and build_free_nids flows
  f2fs: add a tracepoint on f2fs_new_inode
  f2fs: check nid == 0 in add_free_nid
  f2fs: add REQ_META about metadata requests for submit
  f2fs: give a chance to merge IOs by IO scheduler
  f2fs: avoid frequent background GC
  f2fs: add tracepoints to debug checkpoint request
  f2fs: add tracepoints for write page operations
  f2fs: add tracepoints to debug the block allocation
  ...
2013-05-08 15:11:48 -07:00
Jaegeuk Kim afcb7ca01f f2fs: check truncation of mapping after lock_page
We call lock_page when we need to update a page after readpage.
Between grab and lock page, the page can be truncated by other thread.
So, we should check the page after lock_page whether it was truncated or not.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-29 11:19:32 +09:00
Jaegeuk Kim c718379b6b f2fs: give a chance to merge IOs by IO scheduler
Previously, background GC submits many 4KB read requests to load victim blocks
and/or its (i)node blocks.

...
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
...

However, by the fact that many IOs are sequential, we can give a chance to merge
the IOs by IO scheduler.
In order to do that, let's use blk_plug.

...
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
<idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
<idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
<idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
<idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
<idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
<idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
<idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
...

Note that this issue should be addressed in checkpoint, and some readahead
flows too.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-26 10:35:10 +09:00
Namjae Jeon c01e285324 f2fs: add tracepoints to debug the block allocation
Add tracepoints to debug the block allocation & fallocate.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: enhance information]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-23 18:15:16 +09:00
Namjae Jeon 51dd624934 f2fs: add tracepoints for truncate operation
add tracepoints for tracing the truncate operations
like truncate node/data blocks, f2fs_truncate etc.

Tracepoints are added at entry and exit of operation
to trace the success & failure of operation.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-23 16:40:38 +09:00
Namjae Jeon a2a4a7e4ab f2fs: add tracepoints for sync & inode operations
Add tracepoints in f2fs for tracing the syncing
operations like filesystem sync, file sync enter/exit.
It will helf to trace the code under debugging scenarios.

Also add tracepoints for tracing the various inode operations
like building inode, eviction of inode, link/unlike of
inodes.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-23 15:30:27 +09:00
Al Viro bdaec334bb f2fs: use mnt_want_write_file() in ioctl
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:12:56 -04:00
Jaegeuk Kim 399368372e f2fs: introduce a new global lock scheme
In the previous version, f2fs uses global locks according to the usage types,
such as directory operations, block allocation, block write, and so on.

Reference the following lock types in f2fs.h.
enum lock_type {
	RENAME,		/* for renaming operations */
	DENTRY_OPS,	/* for directory operations */
	DATA_WRITE,	/* for data write */
	DATA_NEW,	/* for data allocation */
	DATA_TRUNC,	/* for data truncate */
	NODE_NEW,	/* for node allocation */
	NODE_TRUNC,	/* for node truncate */
	NODE_WRITE,	/* for node write */
	NR_LOCK_TYPE,
};

In that case, we lose the performance under the multi-threading environment,
since every types of operations must be conducted one at a time.

In order to address the problem, let's share the locks globally with a mutex
array regardless of any types.
So, let users grab a mutex and perform their jobs in parallel as much as
possbile.

For this, I propose a new global lock scheme as follows.

0. Data structure
 - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
 - f2fs_sb_info -> node_write

1. mutex_lock_op(sbi)
 - try to get an avaiable lock from the array.
 - returns the index of the gottern lock variable.

2. mutex_unlock_op(sbi, index of the lock)
 - unlock the given index of the lock.

3. mutex_lock_all(sbi)
 - grab all the locks in the array before the checkpoint.

4. mutex_unlock_all(sbi)
 - release all the locks in the array after checkpoint.

5. block_operations()
 - call mutex_lock_all()
 - sync_dirty_dir_inodes()
 - grab node_write
 - sync_node_pages()

Note that,
 the pairs of mutex_lock_op()/mutex_unlock_op() and
 mutex_lock_all()/mutex_unlock_all() should be used together.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-09 18:21:18 +09:00
Jason Hrycay 1127a3d448 f2fs: move f2fs_balance_fs from truncate to punch_hole
Move the f2fs_balance_fs out of the truncate_hole function and only
perform that in punch_hole use case.  The commit:

  ed60b1644e7f7e5dd67d21caf7e4425dff05dad0

intended to do this but moved it into truncate_hole to cover more
cases.  However, a deadlock scenario is possible when deleting an inode
entry under specific conditions:

 f2fs_delete_entry()
     mutex_lock_op(sbi, DENTRY_OPS);
     truncate_hole()
         f2fs_balance_fs()
             mutex_lock(&sbi->gc_mutex);
             f2fs_gc()
                 write_checkpoint()
                     block_operations()
                         mutex_lock_op(sbi, DENTRY_OPS);

Lets move it into the punch_hole case to cover the original intent of
avoiding it during fallocate's expand_inode_data case.

Change-Id: I29f8ea1056b0b88b70ba8652d901b6e8431bb27e
Signed-off-by: Jason Hrycay <jason.hrycay@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-09 17:22:45 +09:00
Jaegeuk Kim 953a3e27e1 f2fs: fix to give correct parent inode number for roll forward
When we recover fsync'ed data after power-off-recovery, we should guarantee
that any parent inode number should be correct for each direct inode blocks.

So, let's make the following rules.

- The fsync should do checkpoint to all the inodes that were experienced hard
links.

- So, the only normal files can be recovered by roll-forward.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-27 09:16:25 +09:00
Jaegeuk Kim 0ff153a2f1 f2fs: do not skip writing file meta during fsync
This patch removes data_version check flow during the fsync call.
The original purpose for the use of data_version was to avoid writng inode
pages redundantly by the fsync calls repeatedly.
However, when user can modify file meta and then call fsync, we should not
skip fsync procedure.
So, let's remove this condition check and hope that user triggers in right
manner.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-27 09:16:16 +09:00
Jaegeuk Kim ae51fb31b8 f2fs: fix to call WRITE_FLUSH at the end of fsync
The fsync call should be ended after flushing the in-device caches.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-20 18:30:14 +09:00
Jaegeuk Kim 266e97a81c f2fs: introduce readahead mode of node pages
Previously, f2fs reads several node pages ahead when get_dnode_of_data is called
with RDONLY_NODE flag.
And, this flag is set by the following functions.
- get_data_block_ro
- get_lock_data_page
- do_write_data_page
- truncate_blocks
- truncate_hole

However, this readahead mechanism is initially introduced for the use of
get_data_block_ro to enhance the sequential read performance.

So, let's clarify all the cases with the additional modes as follows.

enum {
	ALLOC_NODE,	/* allocate a new node page if needed */
	LOOKUP_NODE,	/* look up a node without readahead */
	LOOKUP_NODE_RA,	/*
			 * look up a node with readahead called
			 * by get_datablock_ro.
			 */
}

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
2013-03-18 21:00:33 +09:00