linux

mirror of https://github.com/armbian/linux.git synced 2026-01-06 10:13:00 -08:00

Author	SHA1	Message	Date
Linus Torvalds	9bfccec24e	Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Lots of bugs fixes, including Zheng and Jan's extent status shrinker fixes, which should improve CPU utilization and potential soft lockups under heavy memory pressure, and Eric Whitney's bigalloc fixes" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits) ext4: ext4_da_convert_inline_data_to_extent drop locked page after error ext4: fix suboptimal seek_{data,hole} extents traversial ext4: ext4_inline_data_fiemap should respect callers argument ext4: prevent fsreentrance deadlock for inline_data ext4: forbid journal_async_commit in data=ordered mode jbd2: remove unnecessary NULL check before iput() ext4: Remove an unnecessary check for NULL before iput() ext4: remove unneeded code in ext4_unlink ext4: don't count external journal blocks as overhead ext4: remove never taken branch from ext4_ext_shift_path_extents() ext4: create nojournal_checksum mount option ext4: update comments regarding ext4_delete_inode() ext4: cleanup GFP flags inside resize path ext4: introduce aging to extent status tree ext4: cleanup flag definitions for extent status tree ext4: limit number of scanned extents in status tree shrinker ext4: move handling of list of shrinkable inodes into extent status code ext4: change LRU to round-robin in extent status tree shrinker ext4: cache extent hole in extent status tree for ext4_da_map_blocks() ext4: fix block reservation for bigalloc filesystems ...	2014-12-12 09:28:03 -08:00
Darrick J. Wong	32f3869184	jbd2: fix regression where we fail to initialize checksum seed when loading When we're enabling journal features, we cannot use the predicate jbd2_journal_has_csum_v2or3() because we haven't yet set the sb feature flag fields! Moreover, we just finished loading the shash driver, so the test is unnecessary; calculate the seed always. Without this patch, we fail to initialize the checksum seed the first time we turn on journal_checksum, which means that all journal blocks written during that first mount are corrupt. Transactions written after the second mount will be fine, since the feature flag will be set in the journal superblock. xfstests generic/{034,321,322} are the regression tests. (This is important for 3.18.) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.coM> Reported-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-12-01 21:57:06 -05:00
Theodore Ts'o	d9f39d1e44	jbd2: remove unnecessary NULL check before iput() Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-11-25 20:02:37 -05:00
Theodore Ts'o	d48458d4a7	jbd2: use a better hash function for the revoke table The old hash function didn't work well for 64-bit block numbers, and used undefined (negative) shift right behavior. Use the generic 64-bit hash function instead. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Andrey Ryabinin <a.ryabinin@samsung.com>	2014-10-30 10:53:17 -04:00
Jan Kara	50849db32a	jbd2: simplify calling convention around __jbd2_journal_clean_checkpoint_list __jbd2_journal_clean_checkpoint_list() returns number of buffers it freed but noone was using the value so just stop doing that. This also allows for simplifying the calling convention for journal_clean_once_cp_list(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-09-18 00:58:12 -04:00
Jan Kara	cc97f1a7c7	jbd2: avoid pointless scanning of checkpoint lists Yuanhan has reported that when he is running fsync(2) heavy workload creating new files over ramdisk, significant amount of time is spent in __jbd2_journal_clean_checkpoint_list() trying to clean old transactions (but they cannot be cleaned up because flusher hasn't yet checkpointed those buffers). The workload can be generated by: fs_mark -d /fs/ram0/1 -D 2 -N 2560 -n 1000000 -L 1 -S 1 -s 4096 Reduce the amount of scanning by stopping to scan the transaction list once we find a transaction that cannot be checkpointed. Note that this way of cleaning is still enough to keep freeing space in the journal after fully checkpointed transactions. Reported-and-tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-09-18 00:42:16 -04:00
Dmitry Monakhov	1245799f75	jbd2: jbd2_log_wait_for_space improve error detetcion If EIO happens after we have dropped j_state_lock, we won't notice that the journal has been aborted. So it is reasonable to move this check after we have grabbed the j_checkpoint_mutex and re-grabbed the j_state_lock. This patch helps to prevent false positive complain after EIO. #DMESG: __jbd2_log_wait_for_space: needed 8448 blocks and only had 8386 space available __jbd2_log_wait_for_space: no way to get more journal space in ram1-8 ------------[ cut here ]------------ WARNING: CPU: 15 PID: 6739 at fs/jbd2/checkpoint.c:168 __jbd2_log_wait_for_space+0x188/0x200() Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod CPU: 15 PID: 6739 Comm: fsstress Tainted: G W 3.17.0-rc2-00429-g684de57 #139 Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011 00000000000000a8 ffff88077aaab878 ffffffff815c1a8c 00000000000000a8 0000000000000000 ffff88077aaab8b8 ffffffff8106ce8c ffff88077aaab898 ffff8807c57e6000 ffff8807c57e6028 0000000000002100 ffff8807c57e62f0 Call Trace: [<ffffffff815c1a8c>] dump_stack+0x51/0x6d [<ffffffff8106ce8c>] warn_slowpath_common+0x8c/0xc0 [<ffffffff8106ceda>] warn_slowpath_null+0x1a/0x20 [<ffffffff812419f8>] __jbd2_log_wait_for_space+0x188/0x200 [<ffffffff8123be9a>] start_this_handle+0x4da/0x7b0 [<ffffffff810990e5>] ? local_clock+0x25/0x30 [<ffffffff810aba87>] ? lockdep_init_map+0xe7/0x180 [<ffffffff8123c5bc>] jbd2__journal_start+0xdc/0x1d0 [<ffffffff811f2414>] ? __ext4_new_inode+0x7f4/0x1330 [<ffffffff81222a38>] __ext4_journal_start_sb+0xf8/0x110 [<ffffffff811f2414>] __ext4_new_inode+0x7f4/0x1330 [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190 [<ffffffff812025bb>] ext4_create+0x8b/0x150 [<ffffffff8117fe3b>] vfs_create+0x7b/0xb0 [<ffffffff8118097b>] do_last+0x7db/0xcf0 [<ffffffff8117e31d>] ? inode_permission+0x4d/0x50 [<ffffffff811845d2>] path_openat+0x242/0x590 [<ffffffff81191a76>] ? __alloc_fd+0x36/0x140 [<ffffffff81184a6a>] do_filp_open+0x4a/0xb0 [<ffffffff81191b61>] ? __alloc_fd+0x121/0x140 [<ffffffff81172f20>] do_sys_open+0x170/0x220 [<ffffffff8117300e>] SyS_open+0x1e/0x20 [<ffffffff811715d6>] SyS_creat+0x16/0x20 [<ffffffff815c7e12>] system_call_fastpath+0x16/0x1b ---[ end trace cd71c831f82059db ]--- Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-09-16 14:50:50 -04:00
Darrick J. Wong	064d83892e	jbd2: free bh when descriptor block checksum fails Free the buffer head if the journal descriptor block fails checksum verification. This is the jbd2 port of the e2fsprogs patch "e2fsck: free bh on csum verify error in do_one_pass". Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Cc: stable@vger.kernel.org	2014-09-16 14:43:09 -04:00
Darrick J. Wong	feb8c6d3dd	jbd2: fix journal checksum feature flag handling Clear all three journal checksum feature flags before turning on whichever journal checksum options we want. Rearrange the error checking so that newer flags get complained about first. Reported-by: TR Reardon <thomas_reardon@hotmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-09-11 11:38:21 -04:00
Gioh Kim	a49058fab2	jbd/jbd2: use non-movable memory for the jbd superblock Sicne the jbd/jbd2 superblock is not released until the file system is unmounted, allocate the buffer cache from the non-moveable area to allow page migration and CMA allocations to more easily succeed. Signed-off-by: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-09-04 22:36:35 -04:00
Jan Kara	0e5ecf0a76	jbd2: optimize jbd2_log_do_checkpoint() a bit When we discover written out buffer in transaction checkpoint list we don't have to recheck validity of a transaction. Either this is the last buffer in a transaction - and then we are done - or this isn't and then we can just take another buffer from the checkpoint list without dropping j_list_lock. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-09-04 18:09:29 -04:00
Theodore Ts'o	dc6e8d669c	jbd2: don't call get_bh() before calling __jbd2_journal_remove_checkpoint() The __jbd2_journal_remove_checkpoint() doesn't require an elevated b_count; indeed, until the jh structure gets released by the call to jbd2_journal_put_journal_head(), the bh's b_count is elevated by virtue of the existence of the jh structure. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-09-04 18:09:22 -04:00
Theodore Ts'o	88fe1acb5b	jbd2: fold __wait_cp_io into jbd2_log_do_checkpoint() __wait_cp_io() is only called by jbd2_log_do_checkpoint(). Fold it in to make it a bit easier to understand. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2014-09-01 21:26:09 -04:00
Theodore Ts'o	be1158cc61	jbd2: fold __process_buffer() into jbd2_log_do_checkpoint() __process_buffer() is only called by jbd2_log_do_checkpoint(), and it had a very complex locking protocol where it would be called with the j_list_lock, and sometimes exit with the lock held (if the return code was 0), or release the lock. This was confusing both to humans and to smatch (which erronously complained that the lock was taken twice). Folding __process_buffer() to the caller allows us to simplify the control flow, making the resulting function easier to read and reason about, and dropping the compiled size of fs/jbd2/checkpoint.c by 150 bytes (over 4% of the text size). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-09-01 21:19:01 -04:00
Darrick J. Wong	db9ee22036	jbd2: fix descriptor block size handling errors with journal_csum It turns out that there are some serious problems with the on-disk format of journal checksum v2. The foremost is that the function to calculate descriptor tag size returns sizes that are too big. This causes alignment issues on some architectures and is compounded by the fact that some parts of jbd2 use the structure size (incorrectly) to determine the presence of a 64bit journal instead of checking the feature flags. Therefore, introduce journal checksum v3, which enlarges the descriptor block tag format to allow for full 32-bit checksums of journal blocks, fix the journal tag function to return the correct sizes, and fix the jbd2 recovery code to use feature flags to determine 64bitness. Add a few function helpers so we don't have to open-code quite so many pieces. Switching to a 16-byte block size was found to increase journal size overhead by a maximum of 0.1%, to convert a 32-bit journal with no checksumming to a 32-bit journal with checksum v3 enabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: TR Reardon <thomas_reardon@hotmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-08-28 22:22:29 -04:00
Darrick J. Wong	022eaa7517	jbd2: fix infinite loop when recovering corrupt journal blocks When recovering the journal, don't fall into an infinite loop if we encounter a corrupt journal block. Instead, just skip the block and return an error, which fails the mount and thus forces the user to run a full filesystem fsck. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-08-28 22:22:28 -04:00
NeilBrown	743162013d	sched: Remove proliferation of wait_on_bit() action functions The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions, one which uses io_schedule() and one which just uses schedule(). So: Rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function. Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are not given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function. David Howells confirms this should be uniformly "uninterruptible" The main remaining user of wait_on_bit{,_lock}_action is NFS which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps'. (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack. So the distinction will still be visible, only with different function names (gds2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case). Since first version of this patch (against 3.15) two new action functions appeared, on in NFS and one in CIFS. CIFS also now uses an action function that makes the same freezer aware schedule call as NFS. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: David Howells <dhowells@redhat.com> (fscache, keys) Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2) Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-16 15:10:39 +02:00
Eric Sandeen	5dd214248f	ext4: disable synchronous transaction batching if max_batch_time==0 The mount manpage says of the max_batch_time option, This optimization can be turned off entirely by setting max_batch_time to 0. But the code doesn't do that. So fix the code to do that. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-07-05 19:18:22 -04:00
Peter Zijlstra	4e857c58ef	arch: Mass conversion of smp_mb__() Mostly scripted conversion of the smp_mb__ barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-04-18 14:20:48 +02:00
Theodore Ts'o	66a4cb187b	jbd2: improve error messages for inconsistent journal heads Fix up error messages printed when the transaction pointers in a journal head are inconsistent. This improves the error messages which are printed when running xfstests generic/068 in data=journal mode. See the bug report at: https://bugzilla.kernel.org/show_bug.cgi?id=60786 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-12 16:38:03 -04:00
Theodore Ts'o	0bfea8118d	jbd2: minimize region locked by j_list_lock in jbd2_journal_forget() It's not needed until we start trying to modifying fields in the journal_head which are protected by j_list_lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-09 00:56:58 -05:00
Theodore Ts'o	6e4862a5bb	jbd2: minimize region locked by j_list_lock in journal_get_create_access() It's not needed until we start trying to modifying fields in the journal_head which are protected by j_list_lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-09 00:46:23 -05:00
Theodore Ts'o	d2eb0b9989	jbd2: check jh->b_transaction without taking j_list_lock jh->b_transaction is adequately protected for reading by the jbd_lock_bh_state(bh), so we don't need to take j_list_lock in __journal_try_to_free_buffer(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-09 00:07:19 -05:00
Theodore Ts'o	d4e839d4a9	jbd2: add transaction to checkpoint list earlier We don't otherwise need j_list_lock during the rest of commit phase #7, so add the transaction to the checkpoint list at the very end of commit phase #6. This allows us to drop j_list_lock earlier, which is a good thing since it is a super hot lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-08 22:34:10 -05:00
Theodore Ts'o	42cf3452d5	jbd2: calculate statistics without holding j_state_lock and j_list_lock The two hottest locks, and thus the biggest scalability bottlenecks, in the jbd2 layer, are the j_list_lock and j_state_lock. This has inspired some people to do some truly unnatural things[1]. [1] https://www.usenix.org/system/files/conference/fast14/fast14-paper_kang.pdf We don't need to be holding both j_state_lock and j_list_lock while calculating the journal statistics, so move those calculations to the very end of jbd2_journal_commit_transaction. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-08 19:51:16 -05:00

1 2 3 4 5 ...

318 Commits