Commit Graph

280 Commits

Author SHA1 Message Date
Jin Xu
a26b7c8a01 f2fs: optimize gc for better performance
This patch improves the gc efficiency by optimizing the victim
selection policy. With this optimization, the random re-write
performance could increase up to 20%.

For f2fs, when disk is in shortage of free spaces, gc will selects
dirty segments and moves valid blocks around for making more space
available. The gc cost of a segment is determined by the valid blocks
in the segment. The less the valid blocks, the higher the efficiency.
The ideal victim segment is the one that has the most garbage blocks.

Currently, it searches up to 20 dirty segments for a victim segment.
The selected victim is not likely the best victim for gc when there
are much more dirty segments. Why not searching more dirty segments
for a better victim? The cost of searching dirty segments is
negligible in comparison to moving blocks.

In this patch, it enlarges the MAX_VICTIM_SEARCH to 4096 to make
the search more aggressively for a possible better victim. Since
it also applies to victim selection for SSR, it will likely improve
the SSR efficiency as well.

The test case is simple. It creates as many files until the disk full.
The size for each file is 32KB. Then it writes as many as 100000
records of 4KB size to random offsets of random files in sync mode.
The testing was done on a 2GB partition of a SDHC card. Let's see the
test result of f2fs without and with the patch.

---------------------------------------
2GB partition, SDHC
create 52023 files of size 32768 bytes
random re-write 100000 records of 4KB
---------------------------------------
| file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
[no patch]  341         4227             1174          174840
[patched]   324         2958             645           106682

It's obvious that, with the patch, f2fs finishes the test in 20+% less
time than without the patch. And internally it does much less gc with
higher efficiency than before.

Since the performance improvement is related to gc, it might not be so
obvious for other tests that do not trigger gc as often as this one (
This is because f2fs selects dirty segments for SSR use most of the
time when free space is in shortage). The well-known iozone test tool
was not used for benchmarking the patch becuase it seems do not have
a test case that performs random re-write on a full disk.

This patch is the revised version based on the suggestion from
Jaegeuk Kim.

Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
[Jaegeuk Kim: suggested simpler solution]
Reviewed-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-09-05 13:50:32 +09:00
Jaegeuk Kim
423e95ccbe f2fs: merge more bios of node block writes
Previously, we experience bio traces as follows when running simple sequential
write test.

 f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500104928, size = 4K
 f2fs_do_submit_bio: type = NODE, io = no sync, sector = 499922208, size = 368K
 f2fs_do_submit_bio: type = NODE, io = no sync, sector = 499914752, size = 140K

 -> total 512K

The first one is to write an indirect node block, and the others are to write
direct node blocks.

The reason why there are two separate bios for direct node blocks is:
0. initial state
------------------    ------------------
|                |    |xxxxxxxx        |
------------------    ------------------

1. write 368K
------------------    ------------------
|                |    |xxxxxxxxWWWWWWWW|
------------------    ------------------

2. write 140K
------------------    ------------------
|WWWWWWW         |    |xxxxxxxxWWWWWWWW|
------------------    ------------------

This is because f2fs_write_node_pages tries to write just 512K totally, so that
we can lose the chance to merge more bios nicely.

After this patch is applied, we can get the following bio traces.

  f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500103168, size = 8K
  f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500111368, size = 4K
  f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500107272, size = 512K
  f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500108296, size = 512K
  f2fs_do_submit_bio: type = NODE, io = no sync, sector = 500109320, size = 500K

And finally, we can improve the sequential write performance,
    from 458.775 MB/s to 479.945 MB/s on SSD.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-09-05 10:17:19 +09:00
Jaegeuk Kim
222cbdc483 f2fs: avoid an overflow during utilization calculation
The current f2fs uses all the block counts with 32 bit numbers, which is able to
cover about 15TB volume.

But in calculation of utilization, f2fs multiplies the count by 100 which can
induce overflow.
This patch fixes this.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-09-03 13:41:37 +09:00
Jaegeuk Kim
c34e333fd5 f2fs: trigger GC when there are prefree segments
Previously, f2fs conducts SSR when free_sections() < overprovision_sections.
But, even though there are a lot of prefree segments, it can consider SSR only.
So, let's consider the number of prefree segments too for triggering SSR.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-09-03 10:11:20 +09:00
Gu Zheng
749ebfd174 f2fs: use strncasecmp() simplify the string comparison
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-27 21:50:12 +09:00
Jaegeuk Kim
8cb8268809 f2fs: fix omitting to update inode page
The f2fs_set_link updates its parent inode number, so we should sync this to
the inode block.
Otherwise, the data can be lost after sudden-power-off.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-27 21:49:04 +09:00
Jaegeuk Kim
65985d935d f2fs: support the inline xattrs
0. modified inode structure
--------------------------------------
metadata (e.g., i_mtime, i_ctime, etc)
--------------------------------------
direct pointers [0 ~ 873]

inline xattrs (200 bytes by default)

indirect pointers [0 ~ 4]
--------------------------------------
node footer
--------------------------------------

1. setxattr flow
 - read_all_xattrs copies all the xattrs from inline and xattr node block.
 - handle xattr entries
 - write_all_xattrs copies modified xattrs into inline and xattr node block.

2. getxattr flow
 - read_all_xattrs copies all the xattrs from inline and xattr node block.
 - check target entries

3. Usage
 # mount -t f2fs -o inline_xattr $DEV $MNT

 Once mounted with the inline_xattr option, f2fs marks all the newly created
 files to reserve an amount of inline xattr space explicitly inside the inode
 block. Without the mount option, f2fs will not touch any existing files and
 newly created files as well.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-26 20:15:23 +09:00
Jaegeuk Kim
4f16fb0f9b f2fs: add the truncate_xattr_node function
The truncate_xattr_node function will be used by inline xattr.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-26 20:15:06 +09:00
Jaegeuk Kim
dd9cfe236f f2fs: introduce __find_xattr for readability
The __find_xattr is to search the wanted xattr entry starting from the
base_addr.

If not found, the returned entry is the last empty xattr entry that can be
allocated newly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-26 20:15:06 +09:00
Jaegeuk Kim
de93653fe3 f2fs: reserve the xattr space dynamically
This patch enables the number of direct pointers inside on-disk inode block to
be changed dynamically according to the size of inline xattr space.

The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file
has inline xattr flag.

The number of direct pointers that will be used by inline xattrs is defined as
F2FS_INLINE_XATTR_ADDRS.
Current patch assigns F2FS_INLINE_XATTR_ADDRS to 0 temporarily.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-26 20:15:01 +09:00
Jaegeuk Kim
444c580f7e f2fs: add flags for inline xattrs
This patch adds basic inode flags for inline xattrs, F2FS_INLINE_XATTR,
and add a mount option, inline_xattr, which is enabled when xattr is set.

If the mount option is enabled, all the files are marked with the inline_xattrs
flag.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-26 20:02:12 +09:00
Wei Yongjun
6e6b978c32 f2fs: fix error return code in init_f2fs_fs()
Fix to return -ENOMEM in the kset create and add error handling
case instead of 0, as done elsewhere in this function.

Introduced by commit b59d0bae6c.
(f2fs: add sysfs support for controlling the gc_thread)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Namjae Jeon <namjae.jeon@samsung.com>
[Jaegeuk Kim: merge the patch with previous modification]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-26 19:36:46 +09:00
Jaegeuk Kim
d59ff4df7b f2fs: fix wrong BUG_ON condition
This patch removes a false-alaramed BUG_ON.
The previous BUG_ON condition didn't cover the following true scenario.

In f2fs_add_link, 1) get_new_data_page gives an uptodate page successfully,
and then, 2) init_inode_metadata returns -ENOSPC.
At this moment, a new clean data page is remained in the page cache, but its
block address still indicates NEW_ADDR.
After then, even if sync is called, this clean data page cannot be written to
the disk due to the clean state.

So this means that get_lock_data_page should make a new empty page when its
block address is NEW_ADDR and its page is not uptodated.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-20 19:32:48 +09:00
Zhao Hongjiang
9890ff3f23 f2fs: fix memory leak when init f2fs filesystem fail
When any of the caches create fails in init_f2fs_fs(), the other caches which are
create successful should be free.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-20 18:58:44 +09:00
Gu Zheng
7b40527508 f2fs: fix a compound statement label error
An error "label at end of compound statement" will occur if CONFIG_F2FS_STAT_FS
disabled.
fs/f2fs/segment.c:556:1: error: label at end of compound statement
So clean up the 'out' label to fix it.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-19 11:51:08 +09:00
Jin Xu
92c4342fb7 f2fs: avoid writing inode redundantly when creating a file
In f2fs_write_inode, updating inode after f2fs_balance_fs is not
a optimized way in the case that f2fs_gc is performed ahead. The
inode page will be unnecessarily written out twice, one of which
is in f2fs_gc->...->sync_node_pages and the other is in
update_inode_page.

Let's update the inode page in prior to f2fs_balance_fs to avoid
this.

To reproduce it,
$ touch file (before this step, should make the device need f2fs_gc)
$ sync (or wait the bdi to write dirty inode)

Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-19 09:43:25 +09:00
Dan Carpenter
e27dae4d66 f2fs: alloc_page() doesn't return an ERR_PTR
alloc_page() returns a NULL on failure, it never returns an ERR_PTR.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-19 09:42:29 +09:00
Jaegeuk Kim
479bd73ac4 f2fs: should cover i_xattr_nid with its xattr node page lock
Previously, f2fs_setxattr assigns i_xattr_nid in the inode page inconsistently.

The scenario is:

= Thread 1 =         = Thread 2 =     = fi->i_xattr_nid =  = on-disk nid =

f2fs_setxattr                                   0                 0
  new_node_page                                 X                 0
                   sync_inode_page              X                 X
                   checkpoint                   X                 X -.
    grab_cache_page                             X                 X  |
--> allocate a new xattr node block or -ENOSPC      <----------------'

At this moment, the checkpoint stores inconsistent data where the inode has
i_xattr_nid but actual xattr node block is not allocated yet.

So, we should assign the real i_xattr_nid only after its xattr node block is
allocated.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-12 16:04:53 +09:00
Jaegeuk Kim
9c02740c01 f2fs: check the free space first in new_node_page
Let's check the free space in prior to the main process of allocating a new node
page.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-12 16:00:46 +09:00
Gu Zheng
41dfde135f f2fs: clean up the needless end 'return' of void function
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-12 11:49:22 +09:00
Jaegeuk Kim
d71b5564c0 f2fs: introduce cur_cp_version function to reduce code size
This patch introduces a new inline function, cur_cp_version, to reduce redundant
codes.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-09 15:25:37 +09:00
Jaegeuk Kim
e518ff81c3 f2fs: fix inconsistency between xattr node blocks and its inode
Previously xattr node blocks are stored to the COLD_NODE log, which means that
our roll-forward mechanism doesn't recover the xattr node blocks at all.
Only the direct node blocks in the WARM_NODE log can be recovered.

So, let's resolve the issue simply by conducting checkpoint during fsync when a
file has a modified xattr node block.

This approach is able to degrade the performance, but normally the checkpoint
overhead is shown at the initial fsync call after the xattr entry changes.
Once the checkpoint is done, no additional overhead would be occurred.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-09 15:25:24 +09:00
Jaegeuk Kim
dbe6a5ff4f f2fs: fix the use of XATTR_NODE_OFFSET
This patch fixes the use of XATTR_NODE_OFFSET.

o The offset should not use several MSB bits which are used by marking node
blocks.

o IS_DNODE should handle XATTR_NODE_OFFSET to avoid potential abnormality
during the fsync call.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-09 14:57:56 +09:00
Jaegeuk Kim
c2d715d144 f2fs: fix a build failure due to missing the kobject header
This patch should resolve the following error reported by kbuild test robot.

All error/warnings:

   In file included from fs/f2fs/dir.c:13:0:
   >> fs/f2fs/f2fs.h:435:17: error: field 's_kobj' has incomplete type
        struct kobject s_kobj;

The failure was caused by missing the kobject header file in dir.c.
So, this patch move the header file to the right location, f2fs.h.

CC: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-08 15:56:49 +09:00
Jin Xu
a569469e96 f2fs: fix a deadlock in fsync
This patch fixes a deadlock bug that occurs quite often when there are
concurrent write and fsync on a same file.

Following is the simplified call trace when tasks get hung.

fsync thread:
- f2fs_sync_file
 ...
 - f2fs_write_data_pages
 ...
  - update_extent_cache
  ...
   - update_inode
    - wait_on_page_writeback

bdi writeback thread
- __writeback_single_inode
 - f2fs_write_data_pages
  - mutex_lock(sbi->writepages)

The deadlock happens when the fsync thread waits on a inode page that has
been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately,
no one else could be able to submit the cached bio to block layer for
writeback. This is because the fsync thread already hold a sbi->fs_lock and
the sbi->writepages lock, causing the bdi thread being blocked when attempt
to write data pages for the same inode. At the same time, f2fs_gc thread
does not notice the situation and could not help. Even the sync syscall
gets blocked.

To fix it, we could submit the cached bio first before waiting on a inode page
that is being written back.

Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
[Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-06 22:00:36 +09:00