Commit Graph

116 Commits

Author SHA1 Message Date
Tobin C. Harding
f69120ce6c jbd2: fix sphinx kernel-doc build warnings
Sphinx emits various (26) warnings when building make target 'htmldocs'.
Currently struct definitions contain duplicate documentation, some as
kernel-docs and some as standard c89 comments.  We can reduce
duplication while cleaning up the kernel docs.

Move all kernel-docs to right above each struct member.  Use the set of
all existing comments (kernel-doc and c89).  Add documentation for
missing struct members and function arguments.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2018-01-10 00:27:29 -05:00
Jan Kara
b8a6176c21 ext4: Support for synchronous DAX faults
We return IOMAP_F_DIRTY flag from ext4_iomap_begin() when asked to
prepare blocks for writing and the inode has some uncommitted metadata
changes. In the fault handler ext4_dax_fault() we then detect this case
(through VM_FAULT_NEEDDSYNC return value) and call helper
dax_finish_sync_fault() to flush metadata changes and insert page table
entry. Note that this will also dirty corresponding radix tree entry
which is what we want - fsync(2) will still provide data integrity
guarantees for applications not using userspace flushing. And
applications using userspace flushing can avoid calling fsync(2) and
thus avoid the performance overhead.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-11-03 06:26:26 -07:00
Michal Hocko
81378da64d jbd2: mark the transaction context with the scope GFP_NOFS context
now that we have memalloc_nofs_{save,restore} api we can mark the whole
transaction context as implicitly GFP_NOFS.  All allocations will
automatically inherit GFP_NOFS this way.  This means that we do not have
to mark any of those requests with GFP_NOFS and moreover all the
ext4_kv[mz]alloc(GFP_NOFS) are also safe now because even the hardcoded
GFP_KERNEL allocations deep inside the vmalloc will be NOFS now.

[akpm@linux-foundation.org: tweak comments]
Link: http://lkml.kernel.org/r/20170306131408.9828-7-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03 15:52:09 -07:00
Jan Kara
1eaa566d36 jbd2: track more dependencies on transaction commit
So far we were tracking only dependency on transaction commit due to
starting a new handle (which may require commit to start a new
transaction). Now add tracking also for other cases where we wait for
transaction commit. This way lockdep can catch deadlocks e. g. because we
call jbd2_journal_stop() for a synchronous handle with some locks held
which rank below transaction start.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-06-30 11:40:54 -04:00
Jan Kara
ab714aff4f jbd2: move lockdep tracking to journal_s
Currently lockdep map is tracked in each journal handle. To be able to
expand lockdep support to cover also other cases where we depend on
transaction commit and where handle is not available, move lockdep map
into struct journal_s. Since this makes the lockdep map shared for all
handles, we have to use rwsem_acquire_read() for acquisitions now.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-06-30 11:39:38 -04:00
Luis de Bethencourt
466c3fb618 jbd2: remove excess descriptions for handle_s
Commit bf6993276f ("jbd2: Use tracepoints for history file")
removed the members j_history, j_history_max and j_history_cur from struct
handle_s but the descriptions stayed lingering. Removing them.

Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2016-05-05 22:35:54 -04:00
Jan Kara
41617e1a8d jbd2: add support for avoiding data writes during transaction commits
Currently when filesystem needs to make sure data is on permanent
storage before committing a transaction it adds inode to transaction's
inode list. During transaction commit, jbd2 writes back all dirty
buffers that have allocated underlying blocks and waits for the IO to
finish. However when doing writeback for delayed allocated data, we
allocate blocks and immediately submit the data. Thus asking jbd2 to
write dirty pages just unnecessarily adds more work to jbd2 possibly
writing back other redirtied blocks.

Add support to jbd2 to allow filesystem to ask jbd2 to only wait for
outstanding data writes before committing a transaction and thus avoid
unnecessary writes.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-04-24 00:56:07 -04:00
Jan Kara
1101cd4d13 jbd2: unify revoke and tag block checksum handling
Revoke and tag descriptor blocks are just different kinds of descriptor
blocks and thus have checksum in the same place. Unify computation and
checking of checksums for these.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-22 23:19:09 -05:00
Jan Kara
32ab671599 jbd2: factor out common descriptor block initialization
Descriptor block header is initialized in several places. Factor out the
common code into jbd2_journal_get_descriptor_buffer().

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-22 23:17:15 -05:00
Jan Kara
9bcf976cb8 jbd2: remove unnecessary arguments of jbd2_journal_write_revoke_records
jbd2_journal_write_revoke_records() takes journal pointer and write_op,
although journal can be obtained from the passed transaction and
write_op is always WRITE_SYNC. Remove these superfluous arguments.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-22 23:07:30 -05:00
Daeho Jeong
4327ba52af ext4, jbd2: ensure entering into panic after recording an error in superblock
If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the
journaling will be aborted first and the error number will be recorded
into JBD2 superblock and, finally, the system will enter into the
panic state in "errors=panic" option.  But, in the rare case, this
sequence is little twisted like the below figure and it will happen
that the system enters into panic state, which means the system reset
in mobile environment, before completion of recording an error in the
journal superblock. In this case, e2fsck cannot recognize that the
filesystem failure occurred in the previous run and the corruption
wouldn't be fixed.

Task A                        Task B
ext4_handle_error()
-> jbd2_journal_abort()
  -> __journal_abort_soft()
    -> __jbd2_journal_abort_hard()
    | -> journal->j_flags |= JBD2_ABORT;
    |
    |                         __ext4_abort()
    |                         -> jbd2_journal_abort()
    |                         | -> __journal_abort_soft()
    |                         |   -> if (journal->j_flags & JBD2_ABORT)
    |                         |           return;
    |                         -> panic()
    |
    -> jbd2_journal_update_sb_errno()

Tested-by: Hobin Woo <hobin.woo@samsung.com>
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2015-10-18 17:02:56 -04:00
Darrick J. Wong
56316a0d28 jbd2: clean up feature test macros with predicate functions
Create separate predicate functions to test/set/clear feature flags,
thereby replacing the wordy old macros.  Furthermore, clean out the
places where we open-coded feature tests.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-10-17 16:18:45 -04:00
Darrick J. Wong
6a797d2737 ext4: call out CRC and corruption errors with specific error codes
Instead of overloading EIO for CRC errors and corrupt structures,
return the same error codes that XFS returns for the same issues.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-10-17 16:16:04 -04:00
Darrick J. Wong
8595798ca3 jbd2: gate checksum calculations on crc driver presence, not sb flags
Change the journal's checksum functions to gate on whether or not the
crc32c driver is loaded, and gate the loading on the superblock bits.
This prevents a journal crash if someone loads a journal in no-csum
mode and then randomizes the superblock, thus flipping on the feature
bits.

Tested-By: Nikolay Borisov <kernel@kyup.com>
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-10-15 10:30:36 -04:00
Linus Torvalds
ea814ab9aa Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "Pretty much all bug fixes and clean ups for 4.3, after a lot of
  features and other churn going into 4.2"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  Revert "ext4: remove block_device_ejected"
  ext4: ratelimit the file system mounted message
  ext4: silence a format string false positive
  ext4: simplify some code in read_mmp_block()
  ext4: don't manipulate recovery flag when freezing no-journal fs
  jbd2: limit number of reserved credits
  ext4 crypto: remove duplicate header file
  ext4: update c/mtime on truncate up
  jbd2: avoid infinite loop when destroying aborted journal
  ext4, jbd2: add REQ_FUA flag when recording an error in the superblock
  ext4 crypto: fix spelling typo in comment
  ext4 crypto: exit cleanly if ext4_derive_key_aes() fails
  ext4: reject journal options for ext2 mounts
  ext4: implement cgroup writeback support
  ext4: replace ext4_io_submit->io_op with ->io_wbc
  ext4 crypto: check for too-short encrypted file names
  ext4 crypto: use a jbd2 transaction when adding a crypto policy
  jbd2: speedup jbd2_journal_dirty_metadata()
2015-09-03 12:52:19 -07:00
Jan Kara
841df7df19 jbd2: avoid infinite loop when destroying aborted journal
Commit 6f6a6fda29 "jbd2: fix ocfs2 corrupt when updating journal
superblock fails" changed jbd2_cleanup_journal_tail() to return EIO
when the journal is aborted. That makes logic in
jbd2_log_do_checkpoint() bail out which is fine, except that
jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make
a progress in cleaning the journal. Without it jbd2_journal_destroy()
just loops in an infinite loop.

Fix jbd2_journal_destroy() to cleanup journal checkpoint lists of
jbd2_log_do_checkpoint() fails with error.

Reported-by: Eryu Guan <guaneryu@gmail.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Fixes: 6f6a6fda29
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-07-28 14:57:14 -04:00
Jan Kara
c290ea01ab fs: Remove ext3 filesystem driver
The functionality of ext3 is fully supported by ext4 driver. Major
distributions (SUSE, RedHat) already use ext4 driver to handle ext3
filesystems for quite some time. There is some ugliness in mm resulting
from jbd cleaning buffers in a dirty page without cleaning page dirty
bit and also support for buffer bouncing in the block layer when stable
pages are required is there only because of jbd. So let's remove the
ext3 driver. This saves us some 28k lines of duplicated code.

Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-07-23 20:59:40 +02:00
Joseph Qi
6f6a6fda29 jbd2: fix ocfs2 corrupt when updating journal superblock fails
If updating journal superblock fails after journal data has been
flushed, the error is omitted and this will mislead the caller as a
normal case.  In ocfs2, the checkpoint will be treated successfully
and the other node can get the lock to update. Since the sb_start is
still pointing to the old log block, it will rewrite the journal data
during journal recovery by the other node. Thus the new updates will
be overwritten and ocfs2 corrupts.  So in above case we have to return
the error, and ocfs2_commit_cache will take care of the error and
prevent the other node to do update first.  And only after recovering
journal it can do the new updates.

The issue discussion mail can be found at:
https://oss.oracle.com/pipermail/ocfs2-devel/2015-June/010856.html
http://comments.gmane.org/gmane.comp.file-systems.ext4/48841

[ Fixed bug in patch which allowed a non-negative error return from
  jbd2_cleanup_journal_tail() to leak out of jbd2_fjournal_flush(); this
  was causing xfstests ext4/306 to fail. -- Ted ]

Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: stable@vger.kernel.org
2015-06-15 14:36:01 -04:00
Davidlohr Bueso
c38fda3fe8 jbd: drop jbd_ENOSYS debug
A quick search shows that there are no users, drop the
macro for both jbd and jbd2.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
2015-01-15 10:34:54 +01:00
Jan Kara
50849db32a jbd2: simplify calling convention around __jbd2_journal_clean_checkpoint_list
__jbd2_journal_clean_checkpoint_list() returns number of buffers it
freed but noone was using the value so just stop doing that. This
also allows for simplifying the calling convention for
journal_clean_once_cp_list().

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-09-18 00:58:12 -04:00
Darrick J. Wong
db9ee22036 jbd2: fix descriptor block size handling errors with journal_csum
It turns out that there are some serious problems with the on-disk
format of journal checksum v2.  The foremost is that the function to
calculate descriptor tag size returns sizes that are too big.  This
causes alignment issues on some architectures and is compounded by the
fact that some parts of jbd2 use the structure size (incorrectly) to
determine the presence of a 64bit journal instead of checking the
feature flags.

Therefore, introduce journal checksum v3, which enlarges the
descriptor block tag format to allow for full 32-bit checksums of
journal blocks, fix the journal tag function to return the correct
sizes, and fix the jbd2 recovery code to use feature flags to
determine 64bitness.

Add a few function helpers so we don't have to open-code quite so
many pieces.

Switching to a 16-byte block size was found to increase journal size
overhead by a maximum of 0.1%, to convert a 32-bit journal with no
checksumming to a 32-bit journal with checksum v3 enabled.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reported-by: TR Reardon <thomas_reardon@hotmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
2014-08-28 22:22:29 -04:00
Theodore Ts'o
41a5b91319 jbd2: invalidate handle if jbd2_journal_restart() fails
If jbd2_journal_restart() fails the handle will have been disconnected
from the current transaction.  In this situation, the handle must not
be used for for any jbd2 function other than jbd2_journal_stop().
Enforce this with by treating a handle which has a NULL transaction
pointer as an aborted handle, and issue a kernel warning if
jbd2_journal_extent(), jbd2_journal_get_write_access(),
jbd2_journal_dirty_metadata(), etc. is called with an invalid handle.

This commit also fixes a bug where jbd2_journal_stop() would trip over
a kernel jbd2 assertion check when trying to free an invalid handle.

Also move the responsibility of setting current->journal_info to
start_this_handle(), simplifying the three users of this function.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Younger Liu <younger.liu@huawei.com>
Cc: Jan Kara <jack@suse.cz>
2013-07-01 08:12:41 -04:00
Paul Gortmaker
169f1a2a87 jbd2: use a single printk for jbd_debug()
Since the jbd_debug() is implemented with two separate printk()
calls, it can lead to corrupted and misleading debug output like
the following (see lines marked with "*"):

[  290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
[  290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
[  290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
[* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
[* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
[* 290.339382] JBD2: starting commit of transaction 42104
[  290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
[  290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079

i.e. the debug output from log_wait_commit and journal_commit_transaction
have become interleaved.  The output should have been:

(fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
(fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104

It is expected that this is not easy to replicate -- I was only able
to cause it on preempt-rt kernels, and even then only under heavy
I/O load.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Suggested-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12 23:04:04 -04:00
Paul Gortmaker
c9b3a8ccb7 jbd/jbd2: relocate bit_spinlock header to jbd_common
The bit_spinlock functions are only used for the jbd_lock_bh_state
functions (and friends) in jbd_common.h and are not directly used
by either of jbd.h or jbd2.h content.

The jbd_common file is new as of commit 446066724c ("jdb/jbd2: factor
out common functions from the jbd[2] header files") but common
(and isolated) headers were not considered for factoring at that time.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12 23:02:35 -04:00
Dmitry Monakhov
06a407f13d ext4: fix data integrity for ext4_sync_fs
Inode's data or non journaled quota may be written w/o jounral so we
_must_ send a barrier at the end of ext4_sync_fs. But it can be
skipped if journal commit will do it for us.

Also fix data integrity for nojournal mode.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12 22:25:07 -04:00