Currently all quota block reservation macros contains hard-coded "2"
aka MAXQUOTAS value. This is no good because in some places it is not
obvious to understand what does this digit represent. Let's introduce
new macro with self descriptive name.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Acked-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This fixes a leak of blocks in an inode prealloc list if device failures
cause ext4_mb_mark_diskspace_used() to fail.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
There is a potential race when a transaction is committing right when
the file system is being umounting. This could reduce in a race
because EXT4_SB(sb)->s_group_info could be freed in ext4_put_super
before the commit code calls a callback so the mballoc code can
release freed blocks in the transaction, resulting in a panic trying
to access the freed s_group_info.
The fix is to wait for the transaction to finish committing before we
shutdown the multiblock allocator.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When ext4_write_begin fails after allocating some blocks or
generic_perform_write fails to copy data to write, we truncate blocks
already instantiated beyond i_size. Although these blocks were never
inside i_size, we have to truncate the pagecache of these blocks so
that corresponding buffers get unmapped. Otherwise subsequent
__block_prepare_write (called because we are retrying the write) will
find the buffers mapped, not call ->get_block, and thus the page will
be backed by already freed blocks leading to filesystem and data
corruption.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add a new config option, CONFIG_EXT4_USE_FOR_EXT23 which if enabled,
will cause ext4 to be used for either ext2 or ext3 file system mounts
when ext2 or ext3 is not enabled in the configuration.
This allows minimalist kernel fanatics to drop to file system drivers
from their compiled kernel with out losing functionality.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The move_extent.moved_len is used to pass back the number of exchanged
blocks count to user space. Currently the caller must clear this
field; but we spend more code space checking for this requirement than
simply zeroing the field ourselves, so let's just make life easier for
everyone all around.
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
At the beginning of ext4_move_extent(), we call
ext4_discard_preallocations() to discard inode PAs of orig and donor
inodes. But in the following case, blocks can be double freed, so
move ext4_discard_preallocations() to the end of ext4_move_extents().
1. Discard inode PAs of orig and donor inodes with
ext4_discard_preallocations() in ext4_move_extents().
orig : [ DATA1 ]
donor: [ DATA2 ]
2. While data blocks are exchanging between orig and donor inodes, new
inode PAs is created to orig by other process's block allocation.
(Since there are semaphore gaps in ext4_move_extents().) And new
inode PAs is used partially (2-1).
2-1 Create new inode PAs to orig inode
orig : [ DATA1 | used PA1 | free PA1 ]
donor: [ DATA2 ]
3. Donor inode which has old orig inode's blocks is deleted after
EXT4_IOC_MOVE_EXT finished (3-1, 3-2). So the block bitmap
corresponds to old orig inode's blocks are freed.
3-1 After EXT4_IOC_MOVE_EXT finished
orig : [ DATA2 | free PA1 ]
donor: [ DATA1 | used PA1 ]
3-2 Delete donor inode
orig : [ DATA2 | free PA1 ]
donor: [ FREE SPACE(DATA1) | FREE SPACE(used PA1) ]
4. The double-free of blocks is occurred, when close() is called to
orig inode. Because ext4_discard_preallocations() for orig inode
frees used PA1 and free PA1, though used PA1 is already freed in 3.
4-1 Double-free of blocks is occurred
orig : [ DATA2 | FREE SPACE(free PA1) ]
donor: [ FREE SPACE(DATA1) | DOUBLE FREE(used PA1) ]
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The block validity framework does a more comprehensive set of checks,
and it saves object code space to use the ext4_data_block_valid() than
the limited open-coded version that had been in ext4_free_blocks().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add the facility for ext4_forget() to be called from
ext4_free_blocks(). This simplifies the code in a large number of
places, and centralizes most of the work of calling ext4_forget() into
a single place.
Also fix a bug in the extents migration code; it wasn't calling
ext4_forget() when releasing the indirect blocks during the
conversion. As a result, if the system cashed during or shortly after
the extents migration, and the released indirect blocks get reused as
data blocks, the journal replay would corrupt the data blocks. With
this new patch, fixing this bug was as simple as adding the
EXT4_FREE_BLOCKS_FORGET flags to the call to ext4_free_blocks().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
ext4_mb_free_blocks() is only called by ext4_free_blocks(), and the
latter function doesn't really do much. So merge the two functions
together, such that ext4_free_blocks() is now found in
fs/ext4/mballoc.c. This saves about 200 bytes of compiled text space.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Convert the last two callers of ext4_journal_forget() to use
ext4_forget() instead, and then fold ext4_journal_forget() into
ext4_forget(). This reduces are code complexity and shortens our call
stack.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The only caller of ext4_journal_revoke() is ext4_forget(), so we can
fold ext4_journal_revoke() into ext4_forget() to simplify the code and
shorten the call stack.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The ext4_forget() function better belongs in ext4_jbd2.c. This will
allow us to do some cleanup of the ext4_journal_revoke() and
ext4_journal_forget() functions, as well as giving us better error
reporting since we can report the caller of ext4_forget() when things
go wrong.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Users on the linux-ext4 list recently complained about differences
across filesystems w.r.t. how to mount without a journal replay.
In the discussion it was noted that xfs's "norecovery" option is
perhaps more descriptively accurate than "noload," so let's make
that an alias for ext4.
Also show this status in /proc/mounts
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
It is anticipated that when sb_issue_discard starts doing
real work on trim-capable devices, we may see issues. Make
this mount-time optional, and default it to off until we know
that things are working out OK.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When an error happened in ext4_splice_branch we failed to notice that
in ext4_ind_get_blocks and mapped the buffer anyway. Fix the problem
by checking for error properly.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
We don't to issue an I/O barrier on an error or if we force commit
because we are doing data journaling.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
The block validity checks used by ext4_data_block_valid() wasn't
correctly written to check file systems with the meta_bg feature. Fix
this.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
The number of old-style block group descriptor blocks is
s_meta_first_bg when the meta_bg feature flag is set.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org