Commit Graph

202 Commits

Author SHA1 Message Date
Jaegeuk Kim 86531d6b84 f2fs: callers take care of the page from bio error
This patch changes for a caller to handle the page after its bio gets an error.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-05 08:08:07 -07:00
Chao Yu 8f46dcaea8 f2fs: expose f2fs_write_cache_pages
If there are gced dirty pages and normal dirty pages in the mapping
of one inode, we might writeback them alternately with discontinuous
block address, resulting in low performance.

This patch introduces f2fs_write_cache_pages with codes copied from
write_cache_pages in mm/page-writeback.c.

In this function, we refactor flow with two steps:
1) writeback all cold type pages.
2) writeback all non-cold type pages.

By using this method, f2fs will writeback dirty pages with the same
temperature in bunch mode, it makes writeouted block being with
more continuous address, so they can be merged as much as possible
in f2fs bio cache, and also it will reduce the chance of submiting
small IO from block layer.

Test environment: 8g nokia sd card (very old sd card, but it shows
better effect when testing with this patch, and with a 32g kingston
sd card, I didn't see much more improvement).

Test step:
1. touch testfile;
2. truncate -s 512K testfile;
3. write all pages with odd index;
4. trigger gc by ioctl;
5. write all pages with even index;
6. time fsync testfile.

before:
real	0m0.402s
user	0m0.000s
sys	0m0.000s

after:
real	0m0.143s
user	0m0.004s
sys	0m0.004s

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:59 -07:00
Chao Yu a28ef1f5ae f2fs: maintain extent cache in separated file
This patch moves extent cache related code from data.c into extent_cache.c
since extent cache is independent feature, and its codes are not relate to
others in data.c, it's better for us to maintain them in separated place.

There is no functionality change, but several small coding style fixes
including:
* rename __drop_largest_extent to f2fs_drop_largest_extent for exporting;
* rename misspelled word 'untill' to 'until';
* remove unneeded 'return' in the end of f2fs_destroy_extent_tree().

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:58 -07:00
Fan Li 3c7df87dad f2fs: don't try to split extents shorter than F2FS_MIN_EXTENT_LEN
Since only parts of extents longer than F2FS_MIN_EXTENT_LEN will
be kept in extent cache after split, extents already shorter than
F2FS_MIN_EXTENT_LEN don't need to try split at all.

Signed-off-by: Fan Li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:58 -07:00
Chao Yu 90d4388ac2 f2fs: fix to update page flag
This patch fixes to update page flag (e.g. Uptodate/cold flag) in
->write_begin.

Otherwise, page will be non-uptodate when we try to write entire
page, and cold data flag in page will not be clean when gced page
is being rewritten.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:57 -07:00
Jaegeuk Kim 7023a1ad17 f2fs: shrink unreferenced extent_caches first
If an extent_tree entry has a zero reference count, we can drop it from the
cache in higher priority rather than currently referencing entries.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:57 -07:00
Chao Yu bb96a8d51e f2fs: enhance multithread performance
In ->writepages, we use writepages mutex lock to serialize all block
address allocation and page submitting pairs from different inodes.
This method makes our delayed dirty pages of one inode being written
continously as many as possible.

But there is one problem that we did not submit current cached bio in
protection region of writepages mutex lock, so there is a small chance
that we submit the one of other thread's as below, resulting in
splitting more bios.

thread 1			thread 2
->writepages
  lock(writepages)
  ->write_cache_pages
  unlock(writepages)
				  lock(writepages)
				  ->write_cache_pages
  ->f2fs_submit_merged_bio
				    ->writepage
				  unlock(writepages)

fs_mark-6535  [002] ....  2242.270230: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5766152, size = 524288
fs_mark-6536  [000] ....  2242.270361: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767176, size = 4096
fs_mark-6536  [000] ....  2242.270370: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, NODE, sector = 8138112, size = 4096
fs_mark-6535  [002] ....  2242.270776: f2fs_submit_write_bio: dev = (1,0), WRITE_SYNC, DATA, sector = 5767184, size = 516096

This may really increase time of block layer works, and may cause
larger IO lantency.

This patch moves the submitting operation into region of writepages
mutex lock to avoid bio splits when concurrently writebacking is
intensive.

my test environment: virtual machine,
intel cpu i5 2500, 8GB size memory, 4GB size ramdisk

time fs_mark  -t  16  -L  1  -s  524288  -S  1  -d  /mnt/f2fs/

before:
real	0m4.244s
user	0m0.088s
sys	0m12.336s

after:
real	0m3.822s
user	0m0.072s
sys	0m10.760s

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:57 -07:00
Jaegeuk Kim 84bc926c07 f2fs: check the largest extent at look-up time
Because of the extent shrinker or other -ENOMEM scenarios, it cannot guarantee
that the largest extent would be cached in the tree all the time.

Instead of relying on extent_tree, we can simply check the cached one in extent
tree accordingly.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:56 -07:00
Jaegeuk Kim 3e72f72139 f2fs: use extent_cache by default
We don't need to handle the duplicate extent information.

The integrated rule is:
 - update on-disk extent with largest one tracked by in-memory extent_cache
 - destroy extent_tree for the truncation case
 - drop per-inode extent_cache by shrinker

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:56 -07:00
Jaegeuk Kim 554df79e52 f2fs: shrink extent_cache entries
This patch registers shrinking extent_caches.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:55 -07:00
Jaegeuk Kim 244f4fc1c5 f2fs: set cached_en after checking finally
This patch relocates cached_en not only to be covered by spin_lock, but also
to set once after checking out completely.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:54 -07:00
Jaegeuk Kim cbe91923a9 f2fs: update on-disk extents even under extent_cache
Previously, f2fs_update_extent_cache() updates in-memory extent_cache all the
time, and then finally preserves its up-to-date extent into on-disk one during
f2fs_evict_inode.

But, in the following scenario:

1. mount
2. open & write an extent X
3. f2fs_evict_inode; on-disk extent is X
4. open & update the extent X with Y
5. sync; trigger checkpoint
6. power-cut

after power-on, f2fs should serve extent Y, but we have an on-disk extent X.

This causes a failure on xfstests/311.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:54 -07:00
Jaegeuk Kim 7a2cb67867 f2fs: fix wrong block address calculation for a split extent
This patch fixes wrong calculation on block address field when an extent is
split.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-04 14:09:54 -07:00
Jaegeuk Kim 6282adbf93 f2fs: call set_page_dirty to attach i_wb for cgroup
The cgroup attaches inode->i_wb via mark_inode_dirty and when set_page_writeback
is called, __inc_wb_stat() updates i_wb's stat.

So, we need to explicitly call set_page_dirty->__mark_inode_dirty in prior to
any writebacking pages.

This patch should resolve the following kernel panic reported by Andreas Reis.

https://bugzilla.kernel.org/show_bug.cgi?id=101801

--- Comment #2 from Andreas Reis <andreas.reis@gmail.com> ---
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
PGD 2951ff067 PUD 2df43f067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in:
CPU: 7 PID: 10356 Comm: gcc Tainted: G        W       4.2.0-1-cu #1
Hardware name: Gigabyte Technology Co., Ltd. G1.Sniper M5/G1.Sniper M5, BIOS
T01 02/03/2015
task: ffff880295044f80 ti: ffff880295140000 task.ti: ffff880295140000
RIP: 0010:[<ffffffff8149deea>]  [<ffffffff8149deea>]
__percpu_counter_add+0x1a/0x90
RSP: 0018:ffff880295143ac8  EFLAGS: 00010082
RAX: 0000000000000003 RBX: ffffea000a526d40 RCX: 0000000000000001
RDX: 0000000000000020 RSI: 0000000000000001 RDI: 0000000000000088
RBP: ffff880295143ae8 R08: 0000000000000000 R09: ffff88008f69bb30
R10: 00000000fffffffa R11: 0000000000000000 R12: 0000000000000088
R13: 0000000000000001 R14: ffff88041d099000 R15: ffff880084a205d0
FS:  00007f8549374700(0000) GS:ffff88042f3c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a8 CR3: 000000033e1d5000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 0000000000000000 ffffea000a526d40 ffff880084a20738 ffff880084a20750
 ffff880295143b48 ffffffff811cc91e ffff880000000000 0000000000000296
 0000000000000000 ffff880417090198 0000000000000000 ffffea000a526d40
Call Trace:
 [<ffffffff811cc91e>] __test_set_page_writeback+0xde/0x1d0
 [<ffffffff813fee87>] do_write_data_page+0xe7/0x3a0
 [<ffffffff813faeea>] gc_data_segment+0x5aa/0x640
 [<ffffffff813fb0b8>] do_garbage_collect+0x138/0x150
 [<ffffffff813fb3fe>] f2fs_gc+0x1be/0x3e0
 [<ffffffff81405541>] f2fs_balance_fs+0x81/0x90
 [<ffffffff813ee357>] f2fs_unlink+0x47/0x1d0
 [<ffffffff81239329>] vfs_unlink+0x109/0x1b0
 [<ffffffff8123e3d7>] do_unlinkat+0x287/0x2c0
 [<ffffffff8123ebc6>] SyS_unlink+0x16/0x20
 [<ffffffff81942e2e>] entry_SYSCALL_64_fastpath+0x12/0x71
Code: 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49
89 f5 41 54 49 89 fc 53 48 83 ec 08 65 ff 05 e6 d9 b6 7e <48> 8b 47 20 48 63 ca
65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a
RIP  [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
 RSP <ffff880295143ac8>
CR2: 00000000000000a8
---[ end trace 5132449a58ed93a3 ]---
note: gcc[10356] exited with preempt_count 2

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-07-25 08:54:26 -07:00
Chao Yu 1237702471 f2fs: avoid duplicated code by reusing f2fs_read_end_io
This patch tries to clean up code because part code of f2fs_read_end_io
and mpage_end_io are the same, so it's better to merge and reuse them.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-06-01 16:21:04 -07:00
Jaegeuk Kim 4375a33664 f2fs crypto: add encryption support in read/write paths
This patch adds encryption support in read and write paths.

Note that, in f2fs, we need to consider cleaning operation.
In cleaning procedure, we must avoid encrypting and decrypting written blocks.
So, this patch implements move_encrypted_block().

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:52 -07:00
Jaegeuk Kim fcc85a4d86 f2fs crypto: activate encryption support for fs APIs
This patch activates the following APIs for encryption support.

The rules quoted by ext4 are:
 - An unencrypted directory may contain encrypted or unencrypted files
   or directories.
 - All files or directories in a directory must be protected using the
   same key as their containing directory.
 - Encrypted inode for regular file should not have inline_data.
 - Encrypted symlink and directory may have inline_data and inline_dentry.

This patch activates the following APIs.
1. f2fs_link              : validate context
2. f2fs_lookup            :      ''
3. f2fs_rename            :      ''
4. f2fs_create/f2fs_mkdir : inherit its dir's context
5. f2fs_direct_IO         : do buffered io for regular files
6. f2fs_open              : check encryption info
7. f2fs_file_mmap         :      ''
8. f2fs_setattr           :      ''
9. f2fs_file_write_iter   :      ''           (Called by sys_io_submit)
10. f2fs_fallocate        : do not support fcollapse
11. f2fs_evict_inode      : free_encryption_info

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:51 -07:00
Jaegeuk Kim 7f63eb77af f2fs: report unwritten area in f2fs_fiemap
This patch slightly changes f2fs_fiemap function to report unwritten area.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:45 -07:00
Jaegeuk Kim 43f3eae1d3 f2fs: split find_data_page according to specific purposes
This patch splits find_data_page as follows.

1. f2fs_gc
 - use get_read_data_page() with read only

2. find_in_level
 - use find_data_page without locked page

3. truncate_partial_page
 - In the case cache_only mode, just drop cached page.
 - Ohterwise, use get_lock_data_page() and guarantee to truncate

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:37 -07:00
Jaegeuk Kim 01f28610a1 f2fs: fix race on allocating and deallocating a dentry block
There are two threads:
 f2fs_delete_entry()              get_new_data_page()
                                  f2fs_reserve_block()
				  dn.blkaddr = XXX
 lock_page(dentry_block)
 truncate_hole()
 dn.blkaddr = NULL
 unlock_page(dentry_block)
                                  lock_page(dentry_block)
                                  fill the block from XXX address
                                  add new dentries
                                  unlock_page(dentry_block)

Later, f2fs_write_data_page() will truncate the dentry_block, since
its block address is NULL.

The reason for this was due to the wrong lock order.
In this case, we should do f2fs_reserve_block() after locking its dentry block.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:35 -07:00
Jaegeuk Kim 05ca3632e5 f2fs: add sbi and page pointer in f2fs_io_info
This patch adds f2fs_sb_info and page pointers in f2fs_io_info structure.
With this change, we can reduce a lot of parameters for IO functions.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:32 -07:00
Jaegeuk Kim f1e8866016 f2fs: expose f2fs_mpage_readpages
This patch implements f2fs_mpage_readpages for further optimization on
encryption support.

The basic code was taken from fs/mpage.c, and changed to be simple by adjusting
that block_size is equal to page_size in f2fs.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:30 -07:00
Jaegeuk Kim 003a3e1d60 f2fs: add f2fs_map_blocks
This patch introduces f2fs_map_blocks structure likewise ext4_map_blocks.
Now, f2fs uses f2fs_map_blocks when handling get_block.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-05-28 15:41:29 -07:00
Jaegeuk Kim 5463e7c18e Revert "f2fs: enhance multi-threads performance"
This reports performance regression by Yuanhan Liu.
The basic idea was to reduce one-point mutex, but it turns out this causes
another contention like context swithes.

https://lkml.org/lkml/2015/4/21/11

Until finishing the analysis on this issue, I'd like to revert this for a while.

This reverts commit 78373b7319.
2015-05-04 14:15:15 -07:00
Linus Torvalds 06a60deca8 Merge tag 'for-f2fs-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "New features:
   - in-memory extent_cache
   - fs_shutdown to test power-off-recovery
   - use inline_data to store symlink path
   - show f2fs as a non-misc filesystem

  Major fixes:
   - avoid CPU stalls on sync_dirty_dir_inodes
   - fix some power-off-recovery procedure
   - fix handling of broken symlink correctly
   - fix missing dot and dotdot made by sudden power cuts
   - handle wrong data index during roll-forward recovery
   - preallocate data blocks for direct_io

  ... and a bunch of minor bug fixes and cleanups"

* tag 'for-f2fs-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (71 commits)
  f2fs: pass checkpoint reason on roll-forward recovery
  f2fs: avoid abnormal behavior on broken symlink
  f2fs: flush symlink path to avoid broken symlink after POR
  f2fs: change 0 to false for bool type
  f2fs: do not recover wrong data index
  f2fs: do not increase link count during recovery
  f2fs: assign parent's i_mode for empty dir
  f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries
  f2fs: fix mismatching lock and unlock pages for roll-forward recovery
  f2fs: fix sparse warnings
  f2fs: limit b_size of mapped bh in f2fs_map_bh
  f2fs: persist system.advise into on-disk inode
  f2fs: avoid NULL pointer dereference in f2fs_xattr_advise_get
  f2fs: preallocate fallocated blocks for direct IO
  f2fs: enable inline data by default
  f2fs: preserve extent info for extent cache
  f2fs: initialize extent tree with on-disk extent info of inode
  f2fs: introduce __{find,grab}_extent_tree
  f2fs: split set_data_blkaddr from f2fs_update_extent_cache
  f2fs: enable fast symlink by utilizing inline data
  ...
2015-04-18 11:17:20 -04:00