With below serials, we will lose parts of dirents:
1) mount f2fs with inline_dentry option
2) echo 1 > /sys/fs/f2fs/sdX/dir_level
3) mkdir dir
4) touch 180 files named [1-180] in dir
5) touch 181 in dir
6) echo 3 > /proc/sys/vm/drop_caches
7) ll dir
ls: cannot access 2: No such file or directory
ls: cannot access 4: No such file or directory
ls: cannot access 5: No such file or directory
ls: cannot access 6: No such file or directory
ls: cannot access 8: No such file or directory
ls: cannot access 9: No such file or directory
...
total 360
drwxr-xr-x 2 root root 4096 Feb 19 15:12 ./
drwxr-xr-x 3 root root 4096 Feb 19 15:11 ../
-rw-r--r-- 1 root root 0 Feb 19 15:12 1
-rw-r--r-- 1 root root 0 Feb 19 15:12 10
-rw-r--r-- 1 root root 0 Feb 19 15:12 100
-????????? ? ? ? ? ? 101
-????????? ? ? ? ? ? 102
-????????? ? ? ? ? ? 103
...
The reason is: when doing the inline dir conversion, we didn't consider
that directory has hierarchical hash structure which can be configured
through sysfs interface 'dir_level'.
By default, dir_level of directory inode is 0, it means we have one bucket
in hash table located in first level, all dirents will be hashed in this
bucket, so it has no problem for us to do the duplication simply between
inline dentry page and converted normal dentry page.
However, if we configured dir_level with the value N (greater than 0), it
will expand the bucket number of first level hash table by 2^N - 1, it
hashs dirents into different buckets according their hash value, if we
still move all dirents to first bucket, it makes incorrent locating for
inline dirents, the result is, although we can iterate all dirents through
->readdir, we can't stat some of them in ->lookup which based on hash
table searching.
This patch fixes this issue by rehashing dirents into correct position
when converting inline directory.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch preallocates data blocks for buffered aio writes.
With this patch, we can avoid redundant locking and unlocking of node pages
given consecutive aio request.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There are redundant pointer conversion in following call stack:
- at position a, inode was been converted to f2fs_file_info.
- at position b, f2fs_file_info was been converted to inode again.
- truncate_blocks(inode,..)
- fi = F2FS_I(inode) ---a
- ADDRS_PER_PAGE(node_page, fi)
- addrs_per_inode(fi)
- inode = &fi->vfs_inode ---b
- f2fs_has_inline_xattr(inode)
- fi = F2FS_I(inode)
- is_inode_flag_set(fi,..)
In order to avoid unneeded conversion, alter ADDRS_PER_PAGE and
addrs_per_inode to acept parameter with type of inode pointer.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces lifetime IO write statistics exposed to the sysfs interface.
The write IO amount is obtained from block layer, accumulated in the file system and
stored in the hot node summary of checkpoint.
Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Pengyang Hou <houpengyang@huawei.com>
[Jaegeuk Kim: add sysfs documentation]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Add annotation to let us know more clearly about space utilization
information of regular dentry and inline dentry.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If f2fs was corrupted with missing dot dentries, it needs to recover them after
fsck.f2fs detection.
The underlying precedure is:
1. The fsck.f2fs remains F2FS_INLINE_DOTS flag in directory inode, if it detects
missing dot dentries.
2. When f2fs looks up the corrupted directory, it triggers f2fs_add_link with
proper inode numbers and their dot and dotdot names.
3. Once f2fs recovers the directory without errors, it removes F2FS_INLINE_DOTS
finally.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Rename a filed name from 'blk_addr' to 'blk' in struct {f2fs_extent,extent_info}
as annotation of this field descripts its meaning well to us.
By this way, we can avoid long statement in code of following patches.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds two macros for transition between byte and block offsets.
Currently, f2fs only supports 4KB blocks, so use the default size for now.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds FASTBOOT flag into checkpoint as follows.
- CP_UMOUNT_FLAG is set when system is umounted.
- CP_FASTBOOT_FLAG is set when intermediate checkpoint having node summaries
was done.
So, if you get CP_UMOUNT_FLAG from checkpoint, the system was umounted cleanly.
Instead, if there was sudden-power-off, you can get CP_FASTBOOT_FLAG or nothing.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In do_recover_data, we find and update previous node pages after updating
its new block addresses.
After then, we call fill_node_footer without reset field, we erase its
cold bit so that this new cold node block is written to wrong log area.
This patch fixes not to miss its old flag.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch simplifies the inline_data usage with the following rule.
1. inline_data is set during the file creation.
2. If new data is requested to be written ranges out of inline_data,
f2fs converts that inode permanently.
3. There is no cases which converts non-inline_data inode to inline_data.
4. The inline_data flag should be changed under inode page lock.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch defines macro/inline dentry structure, and adds some helpers for
inline dir infrastructure.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Block size in f2fs is 4096 bytes, so theoretically, f2fs can support 4096 bytes
sector device at maximum. But now f2fs only support 512 bytes size sector, so
block device such as zRAM which uses page cache as its block storage space will
not be mounted successfully as mismatch between sector size of zRAM and sector
size of f2fs supported.
In this patch we support large sector size in f2fs, so block device with sector
size of 512/1024/2048/4096 bytes can be supported in f2fs.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces DEF_NIDS_PER_INODE/GET_ORPHAN_BLOCKS/F2FS_CP_PACKS macro
instead of numbers in code for readability.
change log from v1:
o fix typo pointed out by Jaegeuk Kim.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Theoretically, our total inodes number is the same as total node number, but
there are three node ids are reserved in f2fs, they are 0, 1 (node nid), and 2
(meta nid), and they should never be used by user, so our total/free inode
number calculated in ->statfs is wrong.
This patch indroduces F2FS_RESERVED_NODE_NUM and then fixes this issue by
recalculating total/free inode number with the macro.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>