Commit Graph

290 Commits

Author SHA1 Message Date
Dan Carpenter
92e3b40537 jbd2: fix use after free in jbd2_journal_start_reserved()
If start_this_handle() fails then it leads to a use after free of
"handle".

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2014-02-17 20:33:01 -05:00
Dmitry Monakhov
a67c848a8b jbd2: rename obsoleted msg JBD->JBD2
Rename performed via: perl -pi -e 's/JBD:/JBD2:/g' fs/jbd2/*.c

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
2013-12-08 21:14:59 -05:00
Jan Kara
75685071cd jbd2: revise KERN_EMERG error messages
Some of KERN_EMERG printk messages do not really deserve this log
level and the one in log_wait_commit() is even rather useless (the
journal has been previously aborted and *that* is where we should have
been complaining). So make some messages just KERN_ERR and remove the
useless message.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-12-08 21:13:59 -05:00
Theodore Ts'o
f6c07cad08 jbd2: don't BUG but return ENOSPC if a handle runs out of space
If a handle runs out of space, we currently stop the kernel with a BUG
in jbd2_journal_dirty_metadata().  This makes it hard to figure out
what might be going on.  So return an error of ENOSPC, so we can let
the file system layer figure out what is going on, to make it more
likely we can get useful debugging information).  This should make it
easier to debug problems such as the one which was reported by:

    https://bugzilla.kernel.org/show_bug.cgi?id=44731

The only two callers of this function are ext4_handle_dirty_metadata()
and ocfs2_journal_dirty().  The ocfs2 function will trigger a
BUG_ON(), which means there will be no change in behavior.  The ext4
function will call ext4_error_inode() which will print the useful
debugging information and then handle the situation using ext4's error
handling mechanisms (i.e., which might mean halting the kernel or
remounting the file system read-only).

Also, since both file systems already call WARN_ON(), drop the WARN_ON
from jbd2_journal_dirty_metadata() to avoid two stack traces from
being displayed.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: ocfs2-devel@oss.oracle.com
Acked-by: Joel Becker <jlbec@evilplan.org>
2013-12-08 21:12:59 -05:00
Darrick J. Wong
18a6ea1e5c jbd2: Fix endian mixing problems in the checksumming code
In the jbd2 checksumming code, explicitly declare separate variables with
endianness information so that we don't get confused and screw things up again.
Also fixes sparse warnings.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-08-28 14:59:58 -04:00
Theodore Ts'o
41a5b91319 jbd2: invalidate handle if jbd2_journal_restart() fails
If jbd2_journal_restart() fails the handle will have been disconnected
from the current transaction.  In this situation, the handle must not
be used for for any jbd2 function other than jbd2_journal_stop().
Enforce this with by treating a handle which has a NULL transaction
pointer as an aborted handle, and issue a kernel warning if
jbd2_journal_extent(), jbd2_journal_get_write_access(),
jbd2_journal_dirty_metadata(), etc. is called with an invalid handle.

This commit also fixes a bug where jbd2_journal_stop() would trip over
a kernel jbd2 assertion check when trying to free an invalid handle.

Also move the responsibility of setting current->journal_info to
start_this_handle(), simplifying the three users of this function.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Younger Liu <younger.liu@huawei.com>
Cc: Jan Kara <jack@suse.cz>
2013-07-01 08:12:41 -04:00
Theodore Ts'o
39c04153fd jbd2: fix theoretical race in jbd2__journal_restart
Once we decrement transaction->t_updates, if this is the last handle
holding the transaction from closing, and once we release the
t_handle_lock spinlock, it's possible for the transaction to commit
and be released.  In practice with normal kernels, this probably won't
happen, since the commit happens in a separate kernel thread and it's
unlikely this could all happen within the space of a few CPU cycles.

On the other hand, with a real-time kernel, this could potentially
happen, so save the tid found in transaction->t_tid before we release
t_handle_lock.  It would require an insane configuration, such as one
where the jbd2 thread was set to a very high real-time priority,
perhaps because a high priority real-time thread is trying to read or
write to a file system.  But some people who use real-time kernels
have been known to do insane things, including controlling
laser-wielding industrial robots.  :-)

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2013-07-01 08:12:40 -04:00
Theodore Ts'o
fe52d17cdd jbd2: move superblock checksum calculation to jbd2_write_superblock()
Some of the functions which modify the jbd2 superblock were not
updating the checksum before calling jbd2_write_superblock().  Move
the call to jbd2_superblock_csum_set() to jbd2_write_superblock(), so
that the checksum is calculated consistently.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: stable@vger.kernel.org
2013-07-01 08:12:38 -04:00
Paul Gortmaker
75497d0607 jbd2: remove debug dependency on debug_fs and update Kconfig help text
Commit b6e96d0067 ("jbd2: use module parameters instead of debugfs
for jbd_debug") removed any need for a dependency on DEBUG_FS.  It
also moved the /sys variables out from underneath the typical debugfs
mount point.  Delete the dependency and update the /sys path to where
the debug settings are currently.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12 23:07:51 -04:00
Paul Gortmaker
169f1a2a87 jbd2: use a single printk for jbd_debug()
Since the jbd_debug() is implemented with two separate printk()
calls, it can lead to corrupted and misleading debug output like
the following (see lines marked with "*"):

[  290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes
[  290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104
[  290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ
[* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit:
[* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103
[* 290.339382] JBD2: starting commit of transaction 42104
[  290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records
[  290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079

i.e. the debug output from log_wait_commit and journal_commit_transaction
have become interleaved.  The output should have been:

(fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103
(fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104

It is expected that this is not easy to replicate -- I was only able
to cause it on preempt-rt kernels, and even then only under heavy
I/O load.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Suggested-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12 23:04:04 -04:00
Paul Gortmaker
cfc7bc896f jbd2: fix duplicate debug label for phase 2
Currently we see this output:

  $git grep phase fs/jbd2
  fs/jbd2/commit.c:       jbd_debug(3, "JBD2: commit phase 1\n");
  fs/jbd2/commit.c:       jbd_debug(3, "JBD2: commit phase 2\n");
  fs/jbd2/commit.c:       jbd_debug(3, "JBD2: commit phase 2\n");
  fs/jbd2/commit.c:       jbd_debug(3, "JBD2: commit phase 3\n");
  fs/jbd2/commit.c:       jbd_debug(3, "JBD2: commit phase 4\n");
  [...]

There is clearly a duplicate label for phase 2, and they are
both active (i.e. not in #if ... #else block).  Rename them to
be "2a" and "2b" so the debug output is unambiguous.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12 22:56:35 -04:00
Paul Gortmaker
0ef54180e0 jbd2: drop checkpoint mutex when waiting in __jbd2_log_wait_for_space()
While trying to debug an an issue under extreme I/O loading
on preempt-rt kernels, the following backtrace was observed
via SysRQ output:

rm              D ffff8802203afbc0  4600  4878   4748 0x00000000
 ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80
 ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8
 ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000
Call Trace:
 [<ffffffff8172dc34>] schedule+0x24/0x70
 [<ffffffff81225b5d>] jbd2_log_wait_commit+0xbd/0x140
 [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
 [<ffffffff81223635>] jbd2_log_do_checkpoint+0xf5/0x520
 [<ffffffff81223b09>] __jbd2_log_wait_for_space+0xa9/0x1f0
 [<ffffffff8121dc40>] start_this_handle.isra.10+0x2e0/0x530
 [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
 [<ffffffff8121e0a3>] jbd2__journal_start+0xc3/0x110
 [<ffffffff811de7ce>] ? ext4_rmdir+0x6e/0x230
 [<ffffffff8121e0fe>] jbd2_journal_start+0xe/0x10
 [<ffffffff811f308b>] ext4_journal_start_sb+0x5b/0x160
 [<ffffffff811de7ce>] ext4_rmdir+0x6e/0x230
 [<ffffffff811435c5>] vfs_rmdir+0xd5/0x140
 [<ffffffff8114370f>] do_rmdir+0xdf/0x120
 [<ffffffff8105c6b4>] ? task_work_run+0x44/0x80
 [<ffffffff81002889>] ? do_notify_resume+0x89/0x100
 [<ffffffff817361ae>] ? int_signal+0x12/0x17
 [<ffffffff81145d85>] sys_unlinkat+0x25/0x40
 [<ffffffff81735f22>] system_call_fastpath+0x16/0x1b

What is interesting here, is that we call log_wait_commit, from
within wait_for_space, but we are still holding the checkpoint_mutex
as it surrounds mostly the whole of wait_for_space.  And then, as we
are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
bit is set, then we will also try to take the same checkpoint_mutex.

It seems that we need to drop the checkpoint_mutex while sitting in
jbd2_log_wait_commit, if we want to guarantee that progress can be made
by jbd2_journal_commit_transaction().  There does not seem to be
anything preempt-rt specific about this, other then perhaps increasing
the odds of it happening.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12 22:47:35 -04:00
Paul Gortmaker
3ca841c106 jbd2: relocate assert after state lock in journal_commit_transaction()
The state lock is taken after we are doing an assert on the state
value, not before.  So we might in fact be doing an assert on a
transient value.  Ensure the state check is within the scope of
the state lock being taken.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12 22:46:35 -04:00
Dmitry Monakhov
9ff8644624 jbd2: optimize jbd2_journal_force_commit
Current implementation of jbd2_journal_force_commit() is suboptimal because
result in empty and useless commits. But callers just want to force and wait
any unfinished commits. We already have jbd2_journal_force_commit_nested()
which does exactly what we want, except we are guaranteed that we do not hold
journal transaction open.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-12 22:24:07 -04:00
Jan Kara
8f7d89f368 jbd2: transaction reservation support
In some cases we cannot start a transaction because of locking
constraints and passing started transaction into those places is not
handy either because we could block transaction commit for too long.
Transaction reservation is designed to solve these issues.  It
reserves a handle with given number of credits in the journal and the
handle can be later attached to the running transaction without
blocking on commit or checkpointing.  Reserved handles do not block
transaction commit in any way, they only reduce maximum size of the
running transaction (because we have to always be prepared to
accomodate request for attaching reserved handle).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04 12:35:11 -04:00
Jan Kara
f29fad7210 jbd2: remove unused waitqueues
j_wait_logspace and j_wait_checkpoint are unused.  Remove them.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04 12:24:11 -04:00
Jan Kara
fe1e8db598 jbd2: fix race in t_outstanding_credits update in jbd2_journal_extend()
jbd2_journal_extend() first checked whether transaction can accept
extending handle with more credits and then added credits to
t_outstanding_credits.  This can race with start_this_handle() adding
another handle to a transaction and thus overbooking a transaction.
Make jbd2_journal_extend() use atomic_add_return() to close the race.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04 12:22:15 -04:00
Jan Kara
76c3990456 jbd2: cleanup needed free block estimates when starting a transaction
__jbd2_log_space_left() and jbd_space_needed() were kind of odd.
jbd_space_needed() accounted also credits needed for currently
committing transaction while it didn't account for credits needed for
control blocks.  __jbd2_log_space_left() then accounted for control
blocks as a fraction of free space.  Since results of these two
functions are always only compared against each other, this works
correct but is somewhat strange.  Move the estimates so that
jbd_space_needed() returns number of blocks needed for a transaction
including control blocks and __jbd2_log_space_left() returns free
space in the journal (with the committing transaction already
subtracted).  Rename functions to jbd2_log_space_left() and
jbd2_space_needed() while we are changing them.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04 12:12:57 -04:00
Jan Kara
2f387f849b jbd2: remove outdated comment
The comment about credit estimates isn't true anymore. We do what the
comment describes now.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04 12:10:11 -04:00
Jan Kara
b34090e5e2 jbd2: refine waiting for shadow buffers
Currently when we add a buffer to a transaction, we wait until the
buffer is removed from BJ_Shadow list (so that we prevent any changes
to the buffer that is just written to the journal).  This can take
unnecessarily long as a lot happens between the time the buffer is
submitted to the journal and the time when we remove the buffer from
BJ_Shadow list.  (e.g.  We wait for all data buffers in the
transaction, we issue a cache flush, etc.)  Also this creates a
dependency of do_get_write_access() on transaction commit (namely
waiting for data IO to complete) which we want to avoid when
implementing transaction reservation.

So we modify commit code to set new BH_Shadow flag when temporary
shadowing buffer is created and we clear that flag once IO on that
buffer is complete.  This allows do_get_write_access() to wait only
for BH_Shadow bit and thus removes the dependency on data IO
completion.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04 12:08:56 -04:00
Jan Kara
e5a120aeb5 jbd2: remove journal_head from descriptor buffers
Similarly as for metadata buffers, also log descriptor buffers don't
really need the journal head. So strip it and remove BJ_LogCtl list.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04 12:06:01 -04:00
Jan Kara
f5113effc2 jbd2: don't create journal_head for temporary journal buffers
When writing metadata to the journal, we create temporary buffer heads
for that task.  We also attach journal heads to these buffer heads but
the only purpose of the journal heads is to keep buffers linked in
transaction's BJ_IO list.  We remove the need for journal heads by
reusing buffer_head's b_assoc_buffers list for that purpose.  Also
since BJ_IO list is just a temporary list for transaction commit, we
use a private list in jbd2_journal_commit_transaction() for that thus
removing BJ_IO list from transaction completely.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2013-06-04 12:01:45 -04:00
Darrick J. Wong
eee06c5678 jbd2: fix block tag checksum verification brokenness
Al Viro complained of a ton of bogosity with regards to the jbd2 block
tag header checksum.  This one checksum is 16 bits, so cut off the
upper 16 bits and treat it as a 16-bit value and don't mess around
with be32* conversions.  Fortunately metadata checksumming is still
"experimental" and not in a shipping e2fsprogs, so there should be few
users affected by this.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2013-05-28 07:31:59 -04:00
Zheng Liu
5d9cf9c625 jbd2: use kmem_cache_zalloc for allocating journal head
This commit tries to use kmem_cache_zalloc instead of kmem_cache_alloc/
memset when a new journal head is alloctated.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
2013-05-28 07:27:11 -04:00
Lukas Czerner
259709b07d jbd2: change jbd2_journal_invalidatepage to accept length
invalidatepage now accepts range to invalidate and there are two file
system using jbd2 also implementing punch hole feature which can benefit
from this. We need to implement the same thing for jbd2 layer in order to
allow those file system take benefit of this functionality.

This commit adds length argument to the jbd2_journal_invalidatepage()
and updates all instances in ext4 and ocfs2.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
2013-05-21 23:20:03 -04:00