Commit Graph

15143 Commits

Author SHA1 Message Date
Aneesh Kumar K.V b6a758ec3a ext4: move ext4_mb_init_group() function earlier in the mballoc.c
This moves the function around so that it can be called from
ext4_mb_load_buddy().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-09 23:47:46 -04:00
Frank Mayhar 91ac6f4331 ext4: Make non-journal fsync work properly
Teach ext4_write_inode() and ext4_do_update_inode() about non-journal
mode:  If we're not using a journal, ext4_write_inode() now calls
ext4_do_update_inode() (after getting the iloc via ext4_get_inode_loc())
with a new "do_sync" parameter.  If that parameter is nonzero _and_ we're
not using a journal, ext4_do_update_inode() calls sync_dirty_buffer()
instead of ext4_handle_dirty_metadata().

This problem was found in power-fail testing, checking the amount of
loss of files and blocks after a power failure when using fsync() and
when not using fsync().  It turned out that using fsync() was actually
worse than not doing so, possibly because it increased the likelihood
that the inodes would remain unflushed and would therefore be lost at
the power failure.

Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-09 22:33:47 -04:00
Theodore Ts'o fe188c0e08 ext4: Assure that metadata blocks are written during fsync in no journal mode
When there is no journal present, we must attach buffer heads
associated with extent tree and indirect blocks to the inode's
mapping->private_list via mark_buffer_dirty_inode() so that
ext4_sync_file() --- which is called to service fsync() and
fdatasync() system calls --- can write out the inode's metadata blocks
by calling sync_mapping_buffers().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-12 13:41:55 -04:00
Theodore Ts'o c7acb4c166 ext4: Use bforget() in no journal mode for ext4_journal_{forget,revoke}()
When ext4 is using a journal, a metadata block which is deallocated
must be passed into the journal layer so it can be dropped from the
current transaction and/or revoked.  This is done by calling the
functions ext4_journal_forget() and ext4_journal_revoke(), which call
jbd2_journal_forget(), and jbd2_journal_revoke(), respectively.

Since the jbd2_journal_forget() and jbd2_journal_revoke() call
bforget(), if ext4 is not using a journal, ext4_journal_forget() and
ext4_journal_revoke() must call bforget() to avoid a dirty metadata
block overwriting a block after it has been reallocated and reused for
another inode's data block.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-09 21:32:41 -04:00
Theodore Ts'o 80e42468d6 ext4: print more sysadmin-friendly message in check_block_validity()
Drop the WARN_ON(1), as he stack trace is not appropriate, since it is
triggered by file system corruption, and it misleads users into
thinking there is a kernel bug.  In addition, change the message
displayed by ext4_error() to make it clear that this is a file system
corruption problem.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-08 08:21:26 -04:00
Aneesh Kumar K.V a827eaffff ext4: Take page lock before looking at attached buffer_heads flags
In order to check whether the buffer_heads are mapped we need to hold
page lock. Otherwise a reclaim can cleanup the attached buffer_heads.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-09 22:36:03 -04:00
Akira Fujita 44fc48f704 ext4: Fix small typo for move_extent_per_page()
This function means moving extents every page, so change its name from
move_exgtent_par_page().

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.co.jp>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-05 23:12:41 -04:00
Akira Fujita 8d6669133d ext4: Return exchanged blocks count to user space in failure
Return exchanged blocks count (moved_len) to user space,
if ext4_move_extents() failed on the way.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-05 22:46:29 -04:00
Akira Fujita daea696dba ext4: Remove unneeded BUG_ON() in ext4_move_extents()
The ext4_move_extents() functions checks with BUG_ON() whether the
exchanged blocks count accords with request blocks count.  But, if the
target range (orig_start + len) includes sparse block(s), 'moved_len'
(exchanged blocks count) does not agree with 'len' (request blocks
count), since sparse block is not counted in 'moved_len'.  This causes
us to hit the BUG_ON(), even though the function succeeded.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-05 22:11:55 -04:00
Akira Fujita 70d5d3dcea ext4: Fix wrong comparisons in mext_check_arguments()
The mext_check_arguments() function in move_extents.c has wrong
comparisons.  orig_start which is passed from user-space is block
unit, but i_size of inode is byte unit, therefore the checks do not
work fine.  This mis-check leads to the overflow of 'len' and then
hits BUG_ON() in ext4_move_extents().  The patch fixes this issue.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Reviewed-by: Greg Freemyer <greg.freemyer@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-16 14:28:22 -04:00
Christoph Hellwig 5f3481e9a8 ext4: fix cache flush in ext4_sync_file
We need to flush the write cache unconditionally in ->fsync, otherwise
writes into already allocated blocks can get lost.  Writes into fully
allocated files are very common when using disk images for
virtualization, and without this fix can easily lose data after
an fdatasync, which is the typical implementation for a cache flush on
the virtual drive.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-05 21:42:42 -04:00
Theodore Ts'o d0646f7b63 ext4: Remove journal_checksum mount option and enable it by default
There's no real cost for the journal checksum feature, and we should
make sure it is enabled all the time.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-05 12:50:43 -04:00
Tobias Klauser 7f1346a9de ext4: Declare seq_operations and file_operations structures as const
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-05 09:28:54 -04:00
Theodore Ts'o b3a3ca8ca0 ext4: Add new tracepoint: trace_ext4_da_write_pages()
Add a new tracepoint which shows the pages that will be written using
write_cache_pages() by ext4_da_writepages().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-31 23:13:11 -04:00
Theodore Ts'o de89de6e0c ext4: Restore wbc->range_start in ext4_da_writepages()
To solve a lock inversion problem, we implement part of the
range_cyclic algorithm in ext4_da_writepages().  (See commit 2acf2c26
for more details.)

As part of that change wbc->range_start was modified by ext4's
writepages function, which causes its callers to get confused since
they aren't expecting the filesystem to modify it.  The simplest fix
is to save and restore wbc->range_start in ext4_da_writepages.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-31 17:00:59 -04:00
Theodore Ts'o b05ab1dc37 ext4: Limit number of links that can be created by ext4_link()
In ext4_link we need to check using EXT4_LINK_MAX, and not
EXT4_DIR_LINK_MAX(), since ext4_link() is creating hard links of
regular files, and not directories.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-29 21:08:08 -04:00
Aneesh Kumar K.V 2c94eb86c6 ext4: Allow rename to create more than EXT4_LINK_MAX subdirectories
Use EXT4_DIR_LINK_MAX so that rename() can move a directory into new
parent directory without running into the EXT4_LINK_MAX limit.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-28 21:43:15 -04:00
Theodore Ts'o 55ad63bf3a ext4: fix extent sanity checking code with AGGRESSIVE_TEST
The extents sanity-checking code depends on the ext4_ext_space_*()
functions returning the maximum alloable size for eh_max; however,
when the debugging #ifdef AGGRESSIVE_TEST is enabled to test the
extent tree handling code, this prevents a normally created ext4
filesystem from being mounted with the errors:

Aug 26 15:43:50 bsd086 kernel: [   96.070277] EXT4-fs error (device sda8): ext4_ext_check_inode: bad header/extent in inode #8: too large eh_max - magic f30a, entries 1, max 4(3), depth 0(0)
Aug 26 15:43:50 bsd086 kernel: [   96.070526] EXT4-fs (sda8): no journal found

Bug reported by Akira Fujita.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-28 10:40:33 -04:00
Eric Sandeen a36b44988c ext4: use ext4_grpblk_t more extensively
unsigned  short is potentially too small to track blocks within
a group; today it is safe due to restrictions in e2fsprogs but
we have _lo / _hi bits for group blocks with the intent to go
up to 32 bits, so clean this up now.

There are many more places where we use unsigned/int/unsigned int
to contain a group block but this should at least fix all the
short types.

I added a few comments to the struct ext4_group_info definition
as well.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-25 22:36:45 -04:00
Eric Sandeen 1927805e65 ext4: use variables not types in sizeofs() for allocations
Precursor to changing some types; to keep things in sync, it 
seems better to allocate/memset based on the size of the 
variables we are using rather than on some disconnected 
basic type like "unsigned short"

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
2009-08-25 22:36:25 -04:00
Aneesh Kumar K.V a8526e84ac ext4: Add missing unlock_new_inode() call in extent migration code
We need to unlock the new inode before iput.  This patch fixes the
following warning when calling chattr +e to migrate a file to use
extents.  It also fixes problems in when e4defrag attempts to
defragment an inode.

[  470.400044] ------------[ cut here ]------------
[  470.400065] WARNING: at fs/inode.c:1210 generic_delete_inode+0x65/0x16a()
[  470.400072] Hardware name: N/A
.....
...
[  470.400353] Pid: 4451, comm: chattr Not tainted 2.6.31-rc7-red-debug #4
[  470.400359] Call Trace:
[  470.400372]  [<ffffffff81037771>] warn_slowpath_common+0x77/0x8f
[  470.400385]  [<ffffffff81037798>] warn_slowpath_null+0xf/0x11
[  470.400395]  [<ffffffff810b7f28>] generic_delete_inode+0x65/0x16a
[  470.400405]  [<ffffffff810b8044>] generic_drop_inode+0x17/0x1bd
[  470.400413]  [<ffffffff810b7083>] iput+0x61/0x65
[  470.400455]  [<ffffffffa003b229>] ext4_ext_migrate+0x5eb/0x66a [ext4]
[  470.400492]  [<ffffffffa002b1f8>] ext4_ioctl+0x340/0x756 [ext4]
[  470.400507]  [<ffffffff810b1a91>] vfs_ioctl+0x1d/0x82
[  470.400517]  [<ffffffff810b1ff0>] do_vfs_ioctl+0x483/0x4c9
[  470.400527]  [<ffffffff81059c30>] ? trace_hardirqs_on+0xd/0xf
[  470.400537]  [<ffffffff810b2087>] sys_ioctl+0x51/0x74
[  470.400549]  [<ffffffff8100ba6b>] system_call_fastpath+0x16/0x1b
[  470.400557] ---[ end trace ab85723542352dac ]---

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-25 22:36:05 -04:00
Eric Sandeen a13fb1a453 ext4: Add feature set check helper for mount & remount paths
A user reported that although his root ext4 filesystem was mounting
fine, other filesystems would not mount, with the:

"Filesystem with huge files cannot be mounted RDWR without CONFIG_LBDAF"

error on his 32-bit box built without CONFIG_LBDAF.  This is because
the test at mount time for this situation was not being re-checked
on remount, and the normal boot process makes an ro->rw transition,
so this was being missed.

Refactor to make a common helper function to test the filesystem
features against the type of mount request (RO vs. RW) so that we 
stay consistent.

Addresses Red-Hat-Bugzilla: #517650

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-18 00:20:23 -04:00
Eric Sandeen 38877f4e8d simplify some logic in ext4_mb_normalize_request
While reading through some of the mballoc code it seems that a couple
spots in the size normalization function could be streamlined.

The test for non-overlapping PAs can be or'd for the start & end
conditions, and the tests for adjacent PAs can be else-if'd - 
it's essentially independently testing:

	if (A + B <= C)
		...
	if (A > C)
		...

These cannot both be true so it seems like the else-if might
be slightly more efficient and/or informative.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-17 23:55:24 -04:00
Eric Sandeen 0373130d5b ext4: open-code ext4_mb_update_group_info
ext4_mb_update_group_info is only called in one place, and it's
extremely simple.  There's no reason to have it in a separate function
in a separate file as far as I can tell, it just obfuscates what's
really going on.

Perhaps it was intended to keep the grp->bb_* manipulation local to
mballoc.c but we're already accessing other grp-> fields in balloc.c
directly so this seems ok.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-17 23:51:29 -04:00
Eric Sandeen bf43d84b18 ext4: reject too-large filesystems on 32-bit kernels
ext4 will happily mount a > 16T filesystem on a 32-bit box, but
this is not safe; writes to the block device will wrap past 16T
and the page cache can't index past 16T (232 index * 4k pages).

Adding another test to the existing "too many sectors" test
should do the trick.

Add a comment, a relevant return value, and fix the reference
to the CONFIG_LBD(AF) option as well.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-17 23:48:51 -04:00