Commit Graph

93 Commits

Author SHA1 Message Date
Namjae Jeon b1f1daf8c7 f2fs: optimize the return condition for has_not_enough_free_secs
Instead of evaluating the free_sections and then deciding to return
true/false from that path. We can directly use the evaluation condition
for returning proper value.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:02 +09:00
Namjae Jeon 5ac206cf4f f2fs: make an accessor to get sections for particular block type
Introduce accessor to get the sections based upon the block type
(node,dents...) and modify the functions : should_do_checkpoint,
has_not_enough_free_secs to use this accessor function to get
the node sections and dent sections.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:02 +09:00
Namjae Jeon 25718423ea f2fs: mark gc_thread as NULL when thread creation is failed
When gc thread creation is failed, mark gc_thread as NULL to avoid
crash while trying to stop invalid thread in stop_gc_thread->kthread_stop.
Instead make it return from:
	if (!gc_th)
       return;

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:02 +09:00
Namjae Jeon ec7b1f2dd1 f2fs: name gc task as per the block device
Currently GC task is started for each f2fs formatted/mounted device.
But, when we check the task list, using 'ps', there is no distinguishing
factor between the tasks. So, name the task as per the block device just
like the flusher threads.
Also, remove the macro GC_THREAD_NAME and instead use the name: f2fs_gc
to avoid name length truncation, as the command length is 16
-> TASK_COMM_LEN 16 and example name like:
f2fs_gc_task:8:16 -> this exceeds name length

Before Patch for 2 F2FS formatted partitions:
root  28061  0.0  0.0  0 0 ? S 10:31   0:00 [f2fs_gc_task]
root  28087  0.0  0.0  0 0 ? S 10:32   0:00 [f2fs_gc_task]

After Patch:
root  16756  0.0  0.0  0  0 ?  S  14:57   0:00 [f2fs_gc-8:18]
root  16765  0.0  0.0  0  0 ?  S  14:57   0:00 [f2fs_gc-8:19]

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:02 +09:00
Changman Lee 48600e44c1 f2fs: remove unnecessary gc option check and balance_fs
1. If f2fs is mounted with background_gc_off option, checking
    BG_GC is not redundant.
 2. f2fs_balance_fs is checked in f2fs_gc, so this is also redundant.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:02 +09:00
Changman Lee 94787d91cb f2fs: remove repeated F2FS_SET_SB_DIRT call
F2FS_SET_SB_DIRT is called in inc_page_count and
it is directly called one more time in the next line.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:01 +09:00
majianpeng 14d7e9de05 f2fs: when check superblock failed, try to check another superblock
In f2fs, there are two superblocks. So when the first superblock was
invalidate, it should try to check another.

By Jaegeuk Kim:
 o Remove a white space for coding style
 o Clean up for code readability
 o Fix a typo

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:01 +09:00
majianpeng 5c9b469295 f2fs: use F2FS_BLKSIZE to judge bloksize and page_cache_size
In some system PAGE_CACHE_SIZE isn't 4K. So using F2FS_BLKSIZE to judge.

By Jaegeuk Kim:
 o f2fs does not support no other 4KB page cache size.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:01 +09:00
majianpeng f83759e283 f2fs: add device name in debugfs
In file status, it can't distinguish between different devices.
So add device name to do this function.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:01 +09:00
Changman Lee facb020540 f2fs: stop repeated checking if cp is needed
If it is decided that f2fs should do checkpoint, skip next comparison.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
2013-02-12 07:15:01 +09:00
Jaegeuk Kim d4686d56ec f2fs: avoid balanc_fs during evict_inode
1. Background

Previously, if f2fs tries to move data blocks of an *evicting* inode during the
cleaning process, it stops the process incompletely and then restarts the whole
process, since it needs a locked inode to grab victim data pages in its address
space. In order to get a locked inode, iget_locked() by f2fs_iget() is normally
used, but, it waits if the inode is on freeing.

So, here is a deadlock scenario.
1. f2fs_evict_inode()       <- inode "A"
  2. f2fs_balance_fs()
    3. f2fs_gc()
      4. gc_data_segment()
        5. f2fs_iget()      <- inode "A" too!

If step #1 and #5 treat a same inode "A", step #5 would fall into deadlock since
the inode "A" is on freeing. In order to resolve this, f2fs_iget_nowait() which
skips __wait_on_freeing_inode() was introduced in step #5, and stops f2fs_gc()
to complete f2fs_evict_inode().

1. f2fs_evict_inode()           <- inode "A"
  2. f2fs_balance_fs()
    3. f2fs_gc()
      4. gc_data_segment()
        5. f2fs_iget_nowait()   <- inode "A", then stop f2fs_gc() w/ -ENOENT

2. Problem and Solution

In the above scenario, however, f2fs cannot finish f2fs_evict_inode() only if:
 o there are not enough free sections, and
 o f2fs_gc() tries to move data blocks of the *evicting* inode repeatedly.

So, the final solution is to use f2fs_iget() and remove f2fs_balance_fs() in
f2fs_evict_inode().
The f2fs_evict_inode() actually truncates all the data and node blocks, which
means that it doesn't produce any dirty node pages accordingly.
So, we don't need to do f2fs_balance_fs() in practical.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:01 +09:00
Jaegeuk Kim 369a708c2a f2fs: remove the use of page_cache_release
Let's remove the use of page_cache_release() in f2fs, and instead, use
f2fs_put_page(page, 0) which is exactly same but for code readability.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:01 +09:00
Namjae Jeon 324ddc702e f2fs: fix typo mistake for data_version description
In f2fs_inode_info structure, the description for data_version
has a typo mistake. It should be latest instead of lastes.
So, correcting that.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:01 +09:00
Namjae Jeon a2b52a598a f2fs: reorganize code for ra_node_page
We can remove unneeded label unlock_out, avoid unnecessary jump
and reorganize the returning conditions in this function.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:01 +09:00
Namjae Jeon 3786dfdf4f f2fs: avoid redundant call to has_not_enough_free_secs in f2fs_gc
After doing a write_checkpoint from garbage collection path if there is still
need to do more garbage collection, gc_more label is used to jump and start
the process again. And in that process, first step before getting victim is to
check if there are not enough free sections, which is already done before
doing a jump to gc_more. We can avoid the redundant call to check free
sections, by checking the gc_type flag which will remain FG_GC(value 1) under
this condition.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:00 +09:00
Changman Lee d6212a5f18 f2fs: add un/freeze_fs into super_operations
This patch supports ioctl FIFREEZE and FITHAW to snapshot filesystem.
Before calling f2fs_freeze, all writers would be suspended and sync_fs
would be completed. So no f2fs has to do something.
Just background gc operation should be skipped due to generate dirty
nodes and data until unfreeze.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:00 +09:00
majianpeng a2617dc686 f2fs: clean up the add_orphan_inode func
For the code
> prev = list_entry(orphan->list.prev, typeof(*prev), list);
if orphan->list.prev == head, it can't get the right prev.
And we can use the parameter 'this' to add.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:00 +09:00
Alejandro Martinez Ruiz aa43507f68 f2fs: fix disable_ext_identify option spelling
There is a typo in the ->show_options function for disable_ext_identify.
Fix it to match the spelling from the documentation.

Signed-off-by: Alejandro Martinez Ruiz <alex@nowcomputing.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:00 +09:00
Jaegeuk Kim bd43df021a f2fs: cover global locks for reserve_new_block
The fill_zero() from fallocate() calls get_new_data_page() in which calls
reserve_new_block().
The reserve_new_block() should be covered by *DATA_NEW*, one of global locks.
And also, before getting the lock, we should check free sections by calling
f2fs_balance_fs().

If we break this rule, f2fs is able to face with out-of-control free space
management and fall into infinite loop like the following scenario as well.

[f2fs_sync_fs()]             [fallocate()]
 - write_checkpoint()        - fill_zero()
  - block_operations()        - get_new_data_page()
   : grab NODE_NEW             - get_dnode_of_data()
                                : get locked dirty node page
    - sync_node_pages()
                                : try to grab NODE_NEW for data allocation
     : trylock and skip the dirty node page
   : call sync_node_pages() repeatedly in order to flush all the dirty node
     pages!

In order to avoid this, we should grab another global lock such as DATA_NEW
before calling get_new_data_page() in fill_zero().

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:00 +09:00
Jaegeuk Kim 577e349514 f2fs: prevent checkpoint once any IO failure is detected
This patch enhances the checkpoint routine to cope with IO errors.

Basically f2fs detects IO errors from end_io_write, and the errors are able to
be occurred during one of data, node, and meta page writes.

In the previous code, when an IO error is occurred during writes, f2fs sets a
flag, CP_ERROR_FLAG, in the raw ckeckpoint buffer which will be written to disk.
Afterwards, write_checkpoint() will check the flag and remount f2fs as a
read-only (ro) mode.

However, even once f2fs is remounted as a ro mode, dirty checkpoint pages are
freely able to be written to disk by flusher or kswapd in background.
In such a case, after cold reboot, f2fs would restore the checkpoint data having
CP_ERROR_FLAG, resulting in disabling write_checkpoint and remounting f2fs as
a ro mode again.

Therefore, let's prevent any checkpoint page (meta) writes once an IO error is
occurred, and remount f2fs as a ro mode right away at that moment.

Reported-by: Oliver Winker <oliver@oli1170.net>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
2013-02-12 07:15:00 +09:00
Changman Lee 7d79e75f64 f2fs: save device node number into f2fs_inode
This patch stores inode->i_rdev into on-disk inode structure.

Alun reported that:
 aspire tmp # mount -t f2fs /dev/sdb mnt
 aspire tmp # mknod mnt/sda1 b 8 1
 aspire tmp # mknod mnt/null c 1 3
 aspire tmp # mknod mnt/console c 5 1
 aspire tmp # ls -l mnt
 total 2
 crw-r--r-- 1 root root 5, 1 Jan 22 18:44 console
 crw-r--r-- 1 root root 1, 3 Jan 22 18:44 null
 brw-r--r-- 1 root root 8, 1 Jan 22 18:44 sda1
 aspire tmp # umount mnt
 aspire tmp # mount -t f2fs /dev/sdb mnt
 aspire tmp # ls -l mnt
 total 2
 crw-r--r-- 1 root root 0, 0 Jan 22 18:44 console
 crw-r--r-- 1 root root 0, 0 Jan 22 18:44 null
 brw-r--r-- 1 root root 0, 0 Jan 22 18:44 sda1

In this report, f2fs lost the major/minor numbers of device files after umount.
The reason was revealed that f2fs does not store the inode->i_rdev to the
on-disk inode data structure.

So, as the other file systems do, f2fs also stores i_rdev into the i_addr fields
in on-disk inode structure without any on-disk layout changes.
Note that, this bug is limited to device files made by mknod().

Reported-and-Tested-by: Alun Jones <alun.linux@ty-penguin.org.uk>
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12 07:15:00 +09:00
Dan Carpenter d8b79b2f94 f2fs: use _safe() version of list_for_each
This is calling list_del() inside a loop which is a problem when we try
move to the next item on the list.  I've converted it to use the _safe
version.  And also, as a cleanup, I've converted it to use
list_for_each_entry instead of list_for_each.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:49:00 +09:00
Jaegeuk Kim 9af45ef5ab f2fs: add comments of start_bidx_of_node
The caller of start_bidx_of_node() should give proper node offsets which
point only direct node blocks. Otherwise, it is a caller's bug.
This patch adds comments to make it clear.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:48:59 +09:00
Jaegeuk Kim a7fdffbd3e f2fs: avoid issuing small bios due to several dirty node pages
If some small bios of dirty node pages are supposed to be issued during the
sequential data writes, there-in well-produced consecutive data bios are able
to be split by the small node bios, resulting in performance degradation.
So, let's collect a number of dirty node pages until reaching a threshold.
And, by default, I set the threshold as 2MB, a segment size.

This improves sequential write performance on i5, 512GB SSD (830 w/ SATA2) as
follows.
Before: 231 MB/s -> After: 255 MB/s

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
2013-01-22 10:48:59 +09:00
Jaegeuk Kim c01e54b770 f2fs: support swapfile
This patch adds f2fs_bmap operation to the data address space.
This enables f2fs to support swapfile.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-22 10:48:58 +09:00