Commit Graph

68 Commits

Author SHA1 Message Date
Chao Yu af033b2aa8 f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.

The implementation is as below:

1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
 a) flush dquot metadata into quota file.
 b) flush quota file to storage to keep file usage be consistent.

2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
 a) checkpoint will skip syncing dquot metadata.
 b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
    hint for fsck repairing.

3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:47 -07:00
Daniel Rosenberg 4354994f09 f2fs: checkpoint disabling
Note that, it requires "f2fs: return correct errno in f2fs_gc".

This adds a lightweight non-persistent snapshotting scheme to f2fs.

To use, mount with the option checkpoint=disable, and to return to
normal operation, remount with checkpoint=enable. If the filesystem
is shut down before remounting with checkpoint=enable, it will revert
back to its apparent state when it was first mounted with
checkpoint=disable. This is useful for situations where you wish to be
able to roll back the state of the disk in case of some critical
failure.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
[Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:39 -07:00
Junling Zheng d440c52d31 f2fs: support superblock checksum
Now we support crc32 checksum for superblock.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-30 18:34:18 -07:00
Chao Yu 7c1a000d46 f2fs: add SPDX license identifiers
Remove the verbose license text from f2fs files and replace them with
SPDX tags.  This does not change the license of any of the code.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-12 13:07:10 -07:00
Yunlei He e15d54d500 f2fs: Allocate and stat mem used by free nid bitmap more accurately
This patch used f2fs_bitmap_size macro to calculate mem used by
free nid bitmap, and stat used mem including aligned part.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-28 18:23:26 -07:00
Jaegeuk Kim bb1105e479 f2fs: align memory boundary for bitops
For example, in arm64, free_nid_bitmap should be aligned to word size in order
to use bit operations.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:39 +09:00
Chao Yu 63189b7859 f2fs: wrap all options with f2fs_sb_info.mount_opt
This patch merges miscellaneous mount options into struct f2fs_mount_info,
After this patch, once we add new mount option, we don't need to worry
about recovery of it in remount_fs(), since we will recover the
f2fs_sb_info.mount_opt including all options.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:30 +09:00
Chao Yu b6a06cbbb5 f2fs: support hot file extension
This patch supports to recognize hot file extension in f2fs, so that we
can allocate proper hot segment location for its data, which can lead to
better hot/cold seperation in filesystem.

In addition, we changes a bit on query/add/del operation method for
extension_list sysfs entry as below:

- Query: cat /sys/fs/f2fs/<disk>/extension_list
- Add: echo 'extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '!extension' > /sys/fs/f2fs/<disk>/extension_list
- Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
- [h] means add/del hot file extension
- [c] means add/del cold file extension

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:57 +09:00
Chao Yu 846ae671ad f2fs: expose extension_list sysfs entry
This patch adds a sysfs entry 'extension_list' to support
query/add/del item in extension list.

Query:
cat /sys/fs/f2fs/<device>/extension_list

Add:
echo 'extension' > /sys/fs/f2fs/<device>/extension_list

Del:
echo '!extension' > /sys/fs/f2fs/<device>/extension_list

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:48 +09:00
Chao Yu 199bc3fef2 f2fs: support large nat bitmap
Previously, we will store all nat version bitmap in checkpoint pack block,
so our total node entry number has a limitation which caused total node
number can not exceed (3900 * 8) block * 455 node/block = 14196000. So
that once user wants to create more nodes in large size image, it becomes
a bottleneck, that's unreasonable.

This patch detects the new layout of nat/sit version bitmap in image in
order to enable supporting large nat bitmap.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:33 +09:00
Yunlong Song bdbc90fa55 f2fs: don't put dentry page in pagecache into highmem
Previous dentry page uses highmem, which will cause panic in platforms
using highmem (such as arm), since the address space of dentry pages
from highmem directly goes into the decryption path via the function
fscrypt_fname_disk_to_usr. But sg_init_one assumes the address is not
from highmem, and then cause panic since it doesn't call kmap_high but
kunmap_high is triggered at the end. To fix this problem in a simple
way, this patch avoids to put dentry page in pagecache into highmem.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:03 +09:00
Chao Yu 1c1d35df71 f2fs: support inode creation time
This patch adds creation time field in inode layout to support showing
kstat.btime in ->statx.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-25 14:10:39 -08:00
Jaegeuk Kim f236792311 f2fs: allow to recover node blocks given updated checkpoint
If fsck.f2fs changes crc, we have no way to recover some inode blocks by roll-
forward recovery. Let's relax the condition to recover them.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-22 14:56:59 -08:00
Jaegeuk Kim 1ad71a2712 f2fs: add an ioctl to disable GC for specific file
This patch gives a flag to disable GC on given file, which would be useful, when
user wants to keep its block map. It also conducts in-place-update for dontmove
file.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-01-22 14:56:35 -08:00
Jaegeuk Kim ea6767337f f2fs: support quota sys files
This patch supports hidden quota files in the system, which will be used for
Android. It requires up-to-date f2fs-tools later than v1.9.0.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-05 16:42:02 -08:00
Jaegeuk Kim 234a968961 f2fs: add quota_ino feature infra
This patch adds quota_ino feature infra to be used for quota files.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-05 16:42:01 -08:00
Chao Yu 6afc662e68 f2fs: support flexible inline xattr size
Now, in product, more and more features based on file encryption were
introduced, their demand of xattr space is increasing, however, inline
xattr has fixed-size of 200 bytes, once inline xattr space is full, new
increased xattr data would occupy additional xattr block which may bring
us more space usage and performance regression during persisting.

In order to resolve above issue, it's better to expand inline xattr size
flexibly according to user's requirement.

So this patch introduces new filesystem feature 'flexible inline xattr',
and new mount option 'inline_xattr_size=%u', once mkfs enables the
feature, we can use the option to make f2fs supporting flexible inline
xattr size.

To support this feature, we add extra attribute i_inline_xattr_size in
inode layout, indicating that how many space inline xattr borrows from
block address mapping space in inode layout, by this, we can easily
locate and store flexible-sized inline xattr data in inode.

Inode disk layout:
  +----------------------+
  | .i_mode              |
  | ...                  |
  | .i_ext               |
  +----------------------+
  | .i_extra_isize       |
  | .i_inline_xattr_size |-----------+
  | ...                  |           |
  +----------------------+           |
  | .i_addr              |           |
  |  - block address or  |           |
  |  - inline data       |           |
  +----------------------+<---+      v
  |    inline xattr      |    +---inline xattr range
  +----------------------+<---+
  | .i_nid               |
  +----------------------+
  |   node_footer        |
  | (nid, ino, offset)   |
  +----------------------+

Note that, we have to cnosider backward compatibility which reserved
inline_data space, 200 bytes, all the time, reported by Sheng Yong.

Previous inline data or directory always reserved 200 bytes in inode layout,
even if inline_xattr is disabled. In order to keep inline_dentry's structure
for backward compatibility, we get the space back only from inline_data.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-11-05 16:41:50 -08:00
Chao Yu 704956ecf5 f2fs: support inode checksum
This patch adds to support inode checksum in f2fs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix verification flow]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-08-03 19:09:26 -07:00
Chao Yu 5c57132eaf f2fs: support project quota
This patch adds to support plain project quota.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-31 16:48:32 -07:00
Chao Yu 7a2af766af f2fs: enhance on-disk inode structure scalability
This patch add new flag F2FS_EXTRA_ATTR storing in inode.i_inline
to indicate that on-disk structure of current inode is extended.

In order to extend, we changed the inode structure a bit:

Original one:

struct f2fs_inode {
	...
	struct f2fs_extent i_ext;
	__le32 i_addr[DEF_ADDRS_PER_INODE];
	__le32 i_nid[DEF_NIDS_PER_INODE];
}

Extended one:

struct f2fs_inode {
        ...
        struct f2fs_extent i_ext;
	union {
		struct {
			__le16 i_extra_isize;
			__le16 i_padding;
			__le32 i_extra_end[0];
		};
		__le32 i_addr[DEF_ADDRS_PER_INODE];
	};
        __le32 i_nid[DEF_NIDS_PER_INODE];
}

Once F2FS_EXTRA_ATTR is set, we will steal four bytes in the head of
i_addr field for storing i_extra_isize and i_padding. with i_extra_isize,
we can calculate actual size of reserved space in i_addr, available
attribute fields included in total extra attribute fields for current
inode can be described as below:

  +--------------------+
  | .i_mode            |
  | ...                |
  | .i_ext             |
  +--------------------+
  | .i_extra_isize     |-----+
  | .i_padding         |     |
  | .i_prjid           |     |
  | .i_atime_extra     |     |
  | .i_ctime_extra     |     |
  | .i_mtime_extra     |<----+
  | .i_inode_cs        |<----- store blkaddr/inline from here
  | .i_xattr_cs        |
  | ...                |
  +--------------------+
  |                    |
  |    block address   |
  |                    |
  +--------------------+
  | .i_nid             |
  +--------------------+
  |   node_footer      |
  | (nid, ino, offset) |
  +--------------------+

Hence, with this patch, we would enhance scalability of f2fs inode for
storing more newly added attribute.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-31 16:48:30 -07:00
Chao Yu f247037120 f2fs: make max inline size changeable
This patch tries to make below macros calculating max inline size,
inline dentry field size considerring reserving size-changeable
space:
- MAX_INLINE_DATA
- NR_INLINE_DENTRY
- INLINE_DENTRY_BITMAP_SIZE
- INLINE_RESERVED_SIZE

Then, when inline_{data,dentry} options is enabled, it allows us to
reserve inline space with different size flexibly for adding newly
introduced inode attribute.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-07-31 16:48:29 -07:00
Chao Yu 1f43e2ad7b f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard
Introduce CP_TRIMMED_FLAG to indicate all invalid block were trimmed
before umount, so once we do mount with image which contain the flag,
we don't record invalid blocks as undiscard one, when fstrim is being
triggered, we can avoid issuing redundant discard commands.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-03 10:04:56 -07:00
Jin Qian b9dd46188e f2fs: sanity check segment count
F2FS uses 4 bytes to represent block address. As a result, supported
size of disk is 16 TB and it equals to 16 * 1024 * 1024 / 2 segments.

Signed-off-by: Jin Qian <jinqian@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02 21:19:48 -07:00
Jaegeuk Kim 29fa6c5678 f2fs: add parentheses for macro variables more
This patch adds parentheses for macro variables more in include/linux/f2fs_fs.h.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-24 13:13:23 -07:00
Chao Yu 4ac912427c f2fs: introduce free nid bitmap
In scenario of intensively node allocation, free nids will be ran out
soon, then it needs to stop to load free nids by traversing NAT blocks,
in worse case, if NAT blocks does not be cached in memory, it generates
IOs which slows down our foreground operations.

In order to speed up node allocation, in this patch we introduce a new
free_nid_bitmap array, so there is an bitmap table for each NAT block,
Once the NAT block is loaded, related bitmap cache will be switched on,
and bitmap will be set during traversing nat entries in NAT block, later
we can query and update nid usage status in memory completely.

With such implementation, I expect performance of node allocation can be
improved in the long-term after filesystem image is mounted.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-02-27 10:07:47 -08:00