Commit Graph

41 Commits

Author SHA1 Message Date
Dave Kleikamp
c4e35e07af JBD2: Clear buffer_ordered flag for barried IO request on success
In JBD2 jbd2_journal_write_commit_record(), clear the buffer_ordered
flag for the bh after barried IO has succeed. This prevents later, if
the same buffer head were submitted to the underlying device, which has
been reconfigured to not support barrier request, the JBD2 commit code
could treat it as a normal IO (without barrier).

This is a port from JBD/ext3 fix from Neil Brown.

More details from Neil:

Some devices - notably dm and md - can change their behaviour in
response to BIO_RW_BARRIER requests.  They might start out accepting
such requests but on reconfiguration, they find out that they cannot
any more. JBD2 deal with this by always testing if BIO_RW_BARRIER
requests fail with EOPNOTSUPP, and retrying the write
requests without the barrier (probably after waiting for any pending
writes to complete).

However there is a bug in the handling this in JBD2 for ext4 .

When ext4/JBD2 to submit a BIO_RW_BARRIER request,
it sets the buffer_ordered flag on the buffer head.
If the request completes successfully, the flag STAYS SET.

Other code might then write the same buffer_head after the device has
been reconfigured to not accept barriers.  This write will then fail,
but the "other code" is not ready to handle EOPNOTSUPP errors and the
error will be treated as fatal.

Cc:  Neil Brown <neilb@suse.de>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-10 01:09:32 -05:00
Aneesh Kumar K.V
4d60517972 JBD2: Use the incompat macro for testing the incompat feature.
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT needs to be checked with
JBD2_HAS_INCOMPAT_FEATURE

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-05 10:56:15 -05:00
Aneesh Kumar K.V
c4b8e635f5 jbd2: Fix reference counting on the journal commit block's buffer head
With journal checksum patch we added asynchronous commits of journal
commit headers, and accidentally dropped taking a reference on the
buffer head.

(Before the change, sync_dirty_buffer did the get_bh(). The associative
put_bh is done by journal_wait_on_commit_record().)

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-05 10:55:26 -05:00
Mingming Cao
b048d84626 jbd2: Add error check to journal_wait_on_commit_record to avoid oops
The buffer head pointer passed to journal_wait_on_commit_record() could
be NULL if the previous journal_submit_commit_record() failed or journal
has already aborted.

Looking at the jbd2 debug messages, before the oops happened, the jbd2
is aborted due to trying to access the next log block beyond the end
of device. This might be caused by using a corrupted image.

We need to check the error returns from journal_submit_commit_record()
and avoid calling journal_wait_on_commit_record() in the failure case.

This addresses Kernel Bugzilla #9849

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-05 08:52:45 -05:00
Andi Kleen
e86e14385d BKL-removal: remove incorrect comment refering to lock_kernel() from jbd/jbd2
None of the callers of this function does actually take the BKL as far as I
can see.  So remove the comment refering to the BKL.

Signed-off-by: Andi Kleen <ak@suse.de>
Cc: <linux-ext4@vger.kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:20 -08:00
Nick Piggin
95c354fe9f spinlock: lockbreak cleanup
The break_lock data structure and code for spinlocks is quite nasty.
Not only does it double the size of a spinlock but it changes locking to
a potentially less optimal trylock.

Put all of that under CONFIG_GENERIC_LOCKBREAK, and introduce a
__raw_spin_is_contended that uses the lock data itself to determine whether
there are waiters on the lock, to be used if CONFIG_GENERIC_LOCKBREAK is
not set.

Rename need_lockbreak to spin_needbreak, make it use spin_is_contended to
decouple it from the spinlock implementation, and make it typesafe (rwlocks
do not have any need_lockbreak sites -- why do they even get bloated up
with that break_lock then?).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 13:31:20 +01:00
Mingming Cao
4019191be7 jbd2: sparse pointer use of zero as null
Get rid of sparse related warnings from places that use integer as NULL
pointer.  (Ported from upstream ext3/jbd changes.)

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-01-28 23:58:27 -05:00
Mingming Cao
db857da336 jbd2: Use round-jiffies() function for the "5 second" ext4/jbd2 wakeup
While "every 5 seconds" doesn't sound as a problem, there can be many
of these (and these timers do add up over all the kernel).  The "5
second" wakeup isn't really timing sensitive; in addition even with
rounding it'll still happen every 5 seconds (with the exception of the
very first time, which is likely to be rounded up to somewhere closer
to 6 seconds)

(Ported from similar JBD patch made by Arjan van de Ven to
fs/jbd/transaction.c)

Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-01-28 23:58:27 -05:00
Mingming Cao
77160957e2 jbd2: Mark jbd2 slabs as SLAB_TEMPORARY
This patch marks slab allocations by jbd2 as short-lived in support of
Mel Gorman's "Group short-lived and reclaimable kernel allocations"
patch.  (Ported from similar changes made to fs/jbd/journal.c and
fs/jbd/revoke.c in Mel's patch.)

Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-01-28 23:58:27 -05:00
Mingming Cao
7b7510662f jbd2: add lockdep support
Ported from similar patch for the jbd layer.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-01-28 23:58:27 -05:00
Girish Shilamkar
818d276ceb ext4: Add the journal checksum feature
The journal checksum feature adds two new flags i.e
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM.

JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the
checksum for the blocks described by the descriptor blocks.
Due to checksums, writing of the commit record no longer needs to be
synchronous. Now commit record can be sent to disk without waiting for
descriptor blocks to be written to disk. This behavior is controlled
using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be
able to recover the journal with _ASYNC_COMMIT hence it is made
incompat.
The commit header has been extended to hold the checksum along with the
type of the checksum.

For recovery in pass scan checksums are verified to ensure the sanity
and completeness(in case of _ASYNC_COMMIT) of every transaction.

Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Girish Shilamkar <girish@clusterfs.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2008-01-28 23:58:27 -05:00
Johann Lombardi
8e85fb3f30 jbd2: jbd2 stats through procfs
The patch below updates the jbd stats patch to 2.6.20/jbd2.
The initial patch was posted by Alex Tomas in December 2005
(http://marc.info/?l=linux-ext4&m=113538565128617&w=2).
It provides statistics via procfs such as transaction lifetime and size.

Sometimes, investigating performance problems, i find useful to have
stats from jbd about transaction's lifetime, size, etc. here is a
patch for review and inclusion probably.

for example, stats after creation of 3M files in htree directory:

[root@bob ~]# cat /proc/fs/jbd/sda/history
R/C  tid   wait  run   lock  flush log   hndls  block inlog ctime write drop  close
R    261   8260  2720  0     0     750   9892   8170  8187
C    259                                                    750   0     4885  1
R    262   20    2200  10    0     770   9836   8170  8187
R    263   30    2200  10    0     3070  9812   8170  8187
R    264   0     5000  10    0     1340  0      0     0
C    261                                                    8240  3212  4957  0
R    265   8260  1470  0     0     4640  9854   8170  8187
R    266   0     5000  10    0     1460  0      0     0
C    262                                                    8210  2989  4868  0
R    267   8230  1490  10    0     4440  9875   8171  8188
R    268   0     5000  10    0     1260  0      0     0
C    263                                                    7710  2937  4908  0
R    269   7730  1470  10    0     3330  9841   8170  8187
R    270   0     5000  10    0     830   0      0     0
C    265                                                    8140  3234  4898  0
C    267                                                    720   0     4849  1
R    271   8630  2740  20    0     740   9819   8170  8187
C    269                                                    800   0     4214  1
R    272   40    2170  10    0     830   9716   8170  8187
R    273   40    2280  0     0     3530  9799   8170  8187
R    274   0     5000  10    0     990   0      0     0


where,

R     - line for transaction's life from T_RUNNING to T_FINISHED
C     - line for transaction's checkpointing
tid   - transaction's id
wait  - for how long we were waiting for new transaction to start
         (the longest period journal_start() took in this transaction)
run   - real transaction's lifetime (from T_RUNNING to T_LOCKED
lock  - how long we were waiting for all handles to close
         (time the transaction was in T_LOCKED)
flush - how long it took to flush all data (data=ordered)
log   - how long it took to write the transaction to the log
hndls - how many handles got to the transaction
block - how many blocks got to the transaction
inlog - how many blocks are written to the log (block + descriptors)
ctime - how long it took to checkpoint the transaction
write - how many blocks have been written during checkpointing
drop  - how many blocks have been dropped during checkpointing
close - how many running transactions have been closed to checkpoint this one

all times are in msec.


[root@bob ~]# cat /proc/fs/jbd/sda/info
280 transaction, each upto 8192 blocks
average:
  1633ms waiting for transaction
  3616ms running transaction
  5ms transaction was being locked
  1ms flushing data (in ordered mode)
  1799ms logging transaction
  11781 handles per transaction
  5629 blocks per transaction
  5641 logged blocks per transaction

Signed-off-by: Johann Lombardi <johann.lombardi@bull.net>
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
2008-01-28 23:58:27 -05:00
Jan Kara
f5a7a6b0d9 jbd2: Fix assertion failure in fs/jbd2/checkpoint.c
Before we start committing a transaction, we call
__journal_clean_checkpoint_list() to cleanup transaction's written-back
buffers.

If this call happens to remove all of them (and there were already some
buffers), __journal_remove_checkpoint() will decide to free the transaction
because it isn't (yet) a committing transaction and soon we fail some
assertion - the transaction really isn't ready to be freed :).

We change the check in __journal_remove_checkpoint() to free only a
transaction in T_FINISHED state.  The locking there is subtle though (as
everywhere in JBD ;().  We use j_list_lock to protect the check and a
subsequent call to __journal_drop_transaction() and do the same in the end
of journal_commit_transaction() which is the only place where a transaction
can get to T_FINISHED state.

Probably I'm too paranoid here and such locking is not really necessary -
checkpoint lists are processed only from log_do_checkpoint() where a
transaction must be already committed to be processed or from
__journal_clean_checkpoint_list() where kjournald itself calls it and thus
transaction cannot change state either.  Better be safe if something
changes in future...

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-01-28 23:58:27 -05:00
Jose R. Santos
6f38c74f5a JBD2: debug code cleanup.
Mostly stolen from akpm's JBD cleanup patch.

- use `#ifdef foo' instead of `#if defined(foo)'

- Make journal_enable_debug __read_mostly just for the heck of it

- Make jbd_debugfs_dir and jbd_debug static

- debugfs_remove(NULL) is legal: remove unneeded tests

- remove unnecessary empty loops

Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-17 18:49:59 -04:00
Jan Kara
a7fa2baf8e jbd2: fix commit code to properly abort journal
We should really call journal_abort() and not __journal_abort_hard() in
case of errors.  The latter call does not record the error in the journal
superblock and thus filesystem won't be marked as with errors later (and
user could happily mount it without any warning).

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-17 18:49:58 -04:00
Mingming Cao
cd02ff0b14 jbd2: JBD_XXX to JBD2_XXX naming cleanup
change JBD_XXX macros to JBD2_XXX in JBD2/Ext4

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-10-17 18:49:58 -04:00
Mingming Cao
d802ffa885 JBD2/Ext4: Convert kmalloc to kzalloc in jbd2/ext4
Convert kmalloc to kzalloc() and get rid of the memset().

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2007-10-17 18:49:57 -04:00
Mingming Cao
2d917969bc JBD2: replace jbd_kmalloc with kmalloc directly.
This patch cleans up jbd_kmalloc and replace it with kmalloc directly

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2007-10-17 18:49:57 -04:00
Mingming Cao
af1e76d6b3 JBD2: jbd2 slab allocation cleanups
JBD2: Replace slab allocations with page allocations

JBD2 allocate memory for committed_data and frozen_data from slab. However
JBD2 should not pass slab pages down to the block layer. Use page allocator
pages instead. This will also prepare JBD for the large blocksize patchset.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2007-10-17 18:49:56 -04:00
Paul Mundt
20c2df83d2 mm: Remove slab destructors from kmem_cache_create().
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-20 10:11:58 +09:00
Mingming Cao
b38bd33a6b fix ext4/JBD2 build warnings
Looking at the current linus-git tree jbd_debug() define in
include/linux/jbd2.h

extern u8 journal_enable_debug;

#define jbd_debug(n, f, a...)                                           \
        do {                                                            \
                if ((n) <= journal_enable_debug) {                      \
                        printk (KERN_DEBUG "(%s, %d): %s: ",            \
                                __FILE__, __LINE__, __FUNCTION__);      \
                        printk (f, ## a);                               \
                }                                                       \
        } while (0)
> fs/ext4/inode.c: In function ‘ext4_write_inode’:
> fs/ext4/inode.c:2906: warning: comparison is always true due to limited
> range of data type
>
> fs/jbd2/recovery.c: In function ‘jbd2_journal_recover’:
> fs/jbd2/recovery.c:254: warning: comparison is always true due to
> limited range of data type
> fs/jbd2/recovery.c:257: warning: comparison is always true due to
> limited range of data type
>
> fs/jbd2/recovery.c: In function ‘jbd2_journal_skip_recovery’:
> fs/jbd2/recovery.c:301: warning: comparison is always true due to
> limited range of data type
>
Noticed all warnings are occurs when the debug level is 0. Then found
the "jbd2: Move jbd2-debug file to debugfs" patch
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0f49d5d019afa4e94253bfc92f0daca3badb990b

changed the jbd2_journal_enable_debug from int type to u8, makes the
jbd_debug comparision is always true when the debugging level is 0. Thus
the compile warning occurs.

Thought about changing the jbd2_journal_enable_debug data type back to
int, but can't, because the jbd2-debug is moved to debug fs, where
calling debugfs_create_u8() to create the debugfs entry needs the value
to be u8 type.

Even if we changed the data type back to int, the code is still buggy,
kernel should not print jbd2 debug message if the
jbd2_journal_enable_debug is set to 0. But this is not the case.

The fix is change the level of debugging to 1. The same should fixed in
ext3/JBD, but currently ext3 jbd-debug via /proc fs is broken, so we
probably should fix it all together.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Theodore Tso <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:47 -07:00
Jose R. Santos
0f49d5d019 jbd2: Move jbd2-debug file to debugfs
The jbd2-debug file used to be located in /proc/sys/fs/jbd2-debug, but it
incorrectly used create_proc_entry() instead of the sysctl routines, and
no proc entry was ever created.

Instead of fixing this we might as well move the jbd2-debug file to
debugfs which would be the preferred location for this kind of tunable.
The new location is now /sys/kernel/debug/jbd2/jbd2-debug.

Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-07-18 08:50:18 -04:00
Jose R. Santos
e23291b912 jbd2: Fix CONFIG_JBD_DEBUG ifdef to be CONFIG_JBD2_DEBUG
When the JBD code was forked to create the new JBD2 code base, the
references to CONFIG_JBD_DEBUG where never changed to
CONFIG_JBD2_DEBUG.  This patch fixes that.

Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-07-18 08:57:06 -04:00
vignesh babu
f482394ccb is_power_of_2(): jbd
Replace (n & (n-1)) in the context of power of 2 checks with
is_power_of_2().

Signed-off-by: vignesh babu <vignesh.babu@wipro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:48 -07:00
Jan Kara
f89b779508 jbd2 commit: fix transaction dropping
We have to check that also the second checkpoint list is non-empty before
dropping the transaction.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:34 -07:00