Ye Bin
1b45cc5c7b
ext4: fix potential out of bound read in ext4_fc_replay_scan()
The scan loop must ensure that at least EXT4_FC_TAG_BASE_LEN bytes of space
remain. If less than EXT4_FC_TAG_BASE_LEN remains, the scan performs an
out-of-bounds read when mounting a corrupt file system image.
The ADD_RANGE, HEAD, and TAIL tags need an extra check during the journal scan:
the scan reads data for these three tags, so the tag length must not be less
than the length of the data that is read.
Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20220924075233.2315259-4-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
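The two checks described above can be illustrated with a small user-space sketch. It is not the kernel code: `struct tl`, `TAG_BASE_LEN`, the tag values, and `HEAD_PAYLOAD_LEN` are hypothetical stand-ins chosen for illustration, and only a HEAD-like tag gets the extra payload-length check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag header, standing in for the on-disk tag/length pair. */
struct tl {
	uint16_t tag;
	uint16_t len;
};
#define TAG_BASE_LEN sizeof(struct tl)

enum { TAG_PAD = 1, TAG_HEAD = 2, TAG_TAIL = 3 };	/* illustrative values */
#define HEAD_PAYLOAD_LEN 8u	/* assumed number of payload bytes read for TAG_HEAD */

/* Decode a little-endian header into host byte order. */
static void get_tl(struct tl *out, const uint8_t *p)
{
	out->tag = (uint16_t)(p[0] | (p[1] << 8));
	out->len = (uint16_t)(p[2] | (p[3] << 8));
}

/* Returns the number of well-formed tags seen, or -1 if the buffer is corrupt. */
int scan_tags(const uint8_t *buf, size_t size)
{
	const uint8_t *cur = buf;
	const uint8_t *end = buf + size;
	int count = 0;

	assert(buf != NULL);

	/* Check 1: never read a header unless a whole header remains. */
	while ((size_t)(end - cur) >= TAG_BASE_LEN) {
		struct tl tl;
		const uint8_t *val = cur + TAG_BASE_LEN;

		get_tl(&tl, cur);

		/* The declared payload must fit in the remaining space. */
		if ((size_t)(end - val) < tl.len)
			return -1;

		/* Check 2: a tag whose payload is read during the scan must be
		 * at least as long as the data that will be read. */
		if (tl.tag == TAG_HEAD && tl.len < HEAD_PAYLOAD_LEN)
			return -1;

		count++;
		if (tl.tag == TAG_TAIL)
			break;
		cur = val + tl.len;
	}
	return count;
}
```

Comparing remaining lengths (`end - cur`) rather than computing `end - TAG_BASE_LEN` keeps the check safe even when the buffer is shorter than one header.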
Ye Bin
dcc5827484
ext4: factor out ext4_fc_get_tl()
Factor out ext4_fc_get_tl(), which fills 'tl' with values in host byte order.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20220924075233.2315259-3-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Ye Bin
fdc2a3c75d
ext4: introduce EXT4_FC_TAG_BASE_LEN helper
Introduce the EXT4_FC_TAG_BASE_LEN helper to calculate the length of
struct ext4_fc_tl.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20220924075233.2315259-2-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Ye Bin
7ff5fddadd
ext4: factor out ext4_free_ext_path()
Factor out ext4_free_ext_path() to free an extent path. After the previous
patch, 'ext4_ext_drop_refs()' is used only in 'extents.c', so make it static.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220924021211.3831551-3-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Ye Bin
27cd497803
ext4: update 'state->fc_regions_size' after successful memory allocation
To avoid a mismatch between 'state->fc_regions_size' and 'state->fc_regions'
when reallocating 'fc_regions' fails, update 'state->fc_regions_size' only
after 'state->fc_regions' has been allocated successfully.
Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220921064040.3693255-4-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Ye Bin
7069d105c1
ext4: fix potential memory leak in ext4_fc_record_regions()
krealloc() may return NULL. In that case it does not free
'state->fc_regions', but 'state->fc_regions' has already been set to NULL,
so the original 'state->fc_regions' memory is leaked.
Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220921064040.3693255-3-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
Ye Bin
9305721a30
ext4: fix potential memory leak in ext4_fc_record_modified_inode()
krealloc() may return NULL. In that case it does not free
'state->fc_modified_inodes', but 'state->fc_modified_inodes' has already
been set to NULL, so the original 'state->fc_modified_inodes' memory is leaked.
Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220921064040.3693255-2-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:54 -04:00
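The three allocation fixes above follow one pattern: keep the old pointer until the reallocation succeeds, and update the recorded size only afterwards. A minimal user-space sketch, using `realloc()` in place of `krealloc()`; `struct replay_state`, `struct region`, `REGION_GROW`, and `record_region()` are hypothetical names for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdlib.h>

struct region {
	long lblk;
	long pblk;
	int len;
};

struct replay_state {
	struct region *regions;
	int regions_size;	/* allocated capacity, in entries */
	int regions_used;	/* entries currently in use */
};

#define REGION_GROW 4	/* hypothetical growth step */

/* Append a region, growing the array when it is full.
 * Returns 0 on success, -1 if the reallocation failed. */
int record_region(struct replay_state *st, struct region r)
{
	assert(st != NULL);

	if (st->regions_used == st->regions_size) {
		int new_size = st->regions_size + REGION_GROW;
		struct region *tmp;

		/* Assign to a temporary: on failure realloc() returns NULL
		 * and leaves the old block allocated, so overwriting
		 * st->regions directly would leak it. */
		tmp = realloc(st->regions, (size_t)new_size * sizeof(*tmp));
		if (!tmp)
			return -1;	/* st->regions and st->regions_size still agree */

		st->regions = tmp;
		st->regions_size = new_size;	/* updated only after success */
	}
	st->regions[st->regions_used++] = r;
	return 0;
}
```

On failure the caller still owns a consistent state and can free `st->regions` during cleanup.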
Ye Bin
e64e6ca909
ext4: adjust fast commit disable judgement order in ext4_fc_track_inode
If fast commit is already disabled, there is no need to mark the inode
ineligible. Move the 'ext4_fc_disabled()' check before the
'ext4_should_journal_data(inode)' check to avoid a meaningless check.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220916083836.388347-3-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:52 -04:00
Ye Bin
b7b80a35fb
ext4: factor out ext4_fc_disabled()
Factor out ext4_fc_disabled(). No functional change.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220916083836.388347-2-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:52 -04:00
Ye Bin
ccbf8eeb39
ext4: fix miss release buffer head in ext4_fc_write_inode
'ext4_fc_write_inode' first calls 'ext4_get_inode_loc' to get 'iloc', but
never releases 'iloc.bh' after using it.
Release 'iloc.bh' before 'ext4_fc_write_inode' returns.
Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220914100859.1415196-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-30 23:46:52 -04:00
Linus Torvalds
9daee913dc
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Add new ioctls to set and get the file system UUID in the ext4
superblock and improved the performance of the online resizing of file
systems with bigalloc enabled.
Fixed a lot of bugs, in particular for the inline data feature,
potential races when creating and deleting inodes with shared extended
attribute blocks, and the handling of directory blocks which are
corrupted"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (37 commits)
ext4: add ioctls to get/set the ext4 superblock uuid
ext4: avoid resizing to a partial cluster size
ext4: reduce computation of overhead during resize
jbd2: fix assertion 'jh->b_frozen_data == NULL' failure when journal aborted
ext4: block range must be validated before use in ext4_mb_clear_bb()
mbcache: automatically delete entries from cache on freeing
mbcache: Remove mb_cache_entry_delete()
ext2: avoid deleting xattr block that is being reused
ext2: unindent codeblock in ext2_xattr_set()
ext2: factor our freeing of xattr block reference
ext4: fix race when reusing xattr blocks
ext4: unindent codeblock in ext4_xattr_block_set()
ext4: remove EA inode entry from mbcache on inode eviction
mbcache: add functions to delete entry if unused
mbcache: don't reclaim used entries
ext4: make sure ext4_append() always allocates new block
ext4: check if directory block is within i_size
ext4: reflect mb_optimize_scan value in options file
ext4: avoid remove directory when directory is corrupted
ext4: aligned '*' in comments
...
2022-08-04 20:13:46 -07:00
Jan Kara
4978c659e7
ext4: use ext4_debug() instead of jbd_debug()
We use jbd_debug() in some places in ext4. It seems a bit strange to use
jbd2 debugging output function for ext4 code. Also these days
ext4_debug() uses dynamic printk so each debug message can be enabled /
disabled on its own so the time when it made some sense to have these
combined (to allow easier common selecting of messages to report) has
passed. Just convert all jbd_debug() uses in ext4 to ext4_debug().
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Link: https://lore.kernel.org/r/20220608112355.4397-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-08-02 23:52:19 -04:00
Bart Van Assche
67c0f55630
fs/ext4: Use the new blk_opf_t type
Improve static type checking by using the new blk_opf_t type for
variables that represent request flags.
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Baokun Li <libaokun1@huawei.com>
Cc: Ye Bin <yebin10@huawei.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-52-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:32 -06:00
Bart Van Assche
1420c4a549
fs/buffer: Combine two submit_bh() and ll_rw_block() arguments
Both submit_bh() and ll_rw_block() accept a request operation type and
request flags as their first two arguments. Micro-optimize these two
functions by combining these first two arguments into a single argument.
This patch does not change the behavior of any of the modified code.
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Song Liu <song@kernel.org> (for the md changes)
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-48-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14 12:14:32 -06:00
Lv Ruyi
784a09951c
ext4: remove unnecessary conditionals
iput() already handles both NULL and non-NULL arguments, so there is no
need to guard the call with if().
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Link: https://lore.kernel.org/r/20220411032337.2517465-1-lv.ruyi@zte.com.cn
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-13 16:27:24 -04:00
Yu Zhe
c30365b90a
ext4: remove unnecessary type castings
Remove unnecessary void * type casts.
Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Link: https://lore.kernel.org/r/20220401081321.73735-1-yuzhe@nfschina.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-05-11 15:19:06 -04:00
Ritesh Harjani
5641ace544
ext4: add commit tid info in ext4_fc_commit_start/stop trace events
This adds commit_tid info to ext4_fc_commit_start/stop, which is helpful
in debugging fast_commit issues.
For example, issues where FC misses committing updates to a file because of
a jbd2 full journal commit.
It also improves the TP_printk format string: all ext4 and jbd2 trace events
start with "dev MAJOR,MINOR", so follow the same convention here while we
are at it.
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/ebcd6b9ab5b718db30f90854497886801ce38c63.1647057583.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-15 17:45:36 -04:00
Ritesh Harjani
d9bf099cb9
ext4: add commit_tid info in jbd debug log
This adds a commit_tid argument to ext4_fc_update_stats()
so that this information can also be added to the jbd_debug logs.
It is also required by a later patch to pass the commit_tid info to the
ext4_fc_commit_start/stop() trace events.
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/dabda3f2919a60e01887e798bf5915216b451733.1647057583.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-15 17:45:36 -04:00
Ritesh Harjani
1d2e2440c5
ext4: add transaction tid info in fc_track events
This patch adds the transaction and inode tid info to the trace events for
callers of ext4_fc_track_template(). This is helpful in debugging race
conditions where an inode could belong to two different transaction tids.
It also fixes the checkpatch warnings that ask for tabs instead of
spaces.
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/c203c09dc11bb372803c430f621f25a4b8c2c8b4.1647057583.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-15 17:45:36 -04:00
Ritesh Harjani
08f4c42aba
ext4: add new trace event in ext4_fc_cleanup
This adds a new trace event in ext4_fc_cleanup() which is helpful in debugging
some fast_commit issues.
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/794cdb1d5d3622f3f80d30c222ff6652ea68c375.1647057583.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-15 17:45:36 -04:00
Ritesh Harjani
78be0471da
ext4: return early for non-eligible fast_commit track events
Currently ext4_fc_track_template() checks whether the trace event
path belongs to replay or whether the sb has ineligible set; if so, it simply
returns. This patch moves those checks into the callers of
ext4_fc_track_template(), ahead of the call.
[ Add checks to ext4_rename() which calls the __ext4_fc_track_*()
functions directly. -- TYT ]
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/3cd025d9c490218a92e6d8fb30b6123e693373e3.1647057583.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-15 17:44:46 -04:00
Ritesh Harjani
7f14244084
ext4: do not call FC trace event in ext4_fc_commit() if FS does not support FC
This moves trace_ext4_fc_commit_start(sb) and the ktime_get() call
used for measuring FC commit time to after the check of whether the sb
supports JOURNAL_FAST_COMMIT.
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/d53cf3e535924ec0a1eb41a560e96561b0727e7a.1647057583.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-12 21:26:08 -05:00
Ritesh Harjani
b3998b3bc6
ext4: improve fast_commit performance and scalability
Currently ext4_fc_commit_dentry_updates() has quadratic time
complexity, which causes performance bottlenecks at high
thread/file/dir counts with fs_mark.
This patch makes the commit dentry updates path (and hence ext4_fc_commit())
linear in time complexity, which improves the performance of workloads
that fsync on multiple threads/open files one by one.
Absolute numbers in avg file creates per sec (from fs_mark in 1K order)
=======================================================================
no.  Order  without-patch(K)  with-patch(K)  Diff(%)
1    1      16.90             17.51          +3.60
2    2,2    32.08             31.80          -0.87
3    3,3    53.97             55.01          +1.92
4    4,4    78.94             76.90          -2.58
5    5,5    95.82             95.37          -0.46
6    6,6    87.92             103.38         +17.58
7    6,10   0.73              126.13         +17178.08
8    6,14   2.33              143.19         +6045.49
workload type
==============
For example, the 7th row has an order of 6,10 (2^6 == 64 and 2^10 == 1024):
echo /run/riteshh/mnt/{1..64} |sed -E 's/[[:space:]]+/ -d /g' \
| xargs -I {} bash -c "sudo fs_mark -L 100 -D 1024 -n 1024 -s0 -S5 -d {}"
Perf profile
(w/o patches)
=============================
87.15% [kernel] [k] ext4_fc_commit --> Heavy contention/bottleneck
1.98% [kernel] [k] perf_event_interrupt
0.96% [kernel] [k] power_pmu_enable
0.91% [kernel] [k] update_sd_lb_stats.constprop.0
0.67% [kernel] [k] ktime_get
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/930f35d4fd5f83e2673c868781d9ebf15e91bf4e.1645426817.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-03-02 23:11:15 -05:00
Ritesh Harjani
dbaafbadc5
ext4: use in_range() for range checking in ext4_fc_replay_check_excluded
Use the in_range() function instead of open coding the range check.
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/8e5526ef14150778871ac7c937c8993c6a09cd3e.1644992610.git.riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-02-25 21:34:56 -05:00
Xin Yin
8fca8a2b0a
ext4: fix incorrect type issue during replay_del_range
Fast commit log data should not be used directly; add le32_to_cpu().
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 0b5b5a62b9 ("ext4: use ext4_ext_remove_space() for fast commit replay delete range")
Cc: stable@kernel.org
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20220126063146.2302-1-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-02-03 10:57:53 -05:00