This patch let add_dir_entry handle the inline data case. So the
dir is initialized as inline dir first and then we can try to add
some files to it, when the inline space can't hold all the entries,
a dir block will be created and the dir entry will be moved to it.
Also for an inlined dir, "." and ".." are removed and we only use
4 bytes to store the parent inode number. These 2 entries will be
added when we convert an inline dir to a block-based one.
[ Folded in patch from Dan Carpenter to remove an unused variable. ]
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The old add_dirent_to_buf handles all the work related to the
work of adding dir entry to a dir block. Now we have inline data,
so create 2 new function __ext4_find_dest_de and __ext4_insert_dentry
that do the real work and let add_dirent_to_buf call them.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The __ext4_check_dir_entry() function() is used to check whether the
de is over the block boundary. Now with inline data, it could be
within the block boundary while exceeds the inode size. So check this
function to check the overflow more precisely.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Currently, the initialization of dot and dotdot are encapsulated in
ext4_mkdir and also bond with dir_block. So create a new function
named ext4_init_new_dir and the initialization is moved to
ext4_init_dot_dotdot. Now it will called either in the normal non-inline
case(rec_len of ".." will cover the whole block) or when we converting an
inline dir to a block(rec len of ".." will be the real length). The start
of the next entry is also returned for inline dir usage.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
During a directory entry lookup of a hashed directory, if the
hash-based lookup functions fail and we fall back to a linear scan,
don't try to verify the dirent checksum on the internal nodes of the
hash tree because they don't store a checksum in a hidden dirent like
the leaf nodes do.
Reported-by: George Spelvin <linux@horizon.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If there is no space for a checksum in a directory leaf node,
previously we would use EXT4_ERROR_INODE() which would mark the file
system as inconsistent. While it would be nice to use e2fsck -D, it
certainly isn't required, so just print a warning using
ext4_warning().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
When ext4_bread() returns NULL and err is set to zero, this means
there is no phyical block mapped to the specified logical block
number. (Previous to commit 90b0a97323, err was uninitialized in this
case, which caused other problems.)
The directory handling routines use ext4_bread() in many places, the
fact that ext4_bread() now returns NULL with err set to zero could
cause problems since a number of these functions will simply return
the value of err if the result of ext4_bread() was the NULL pointer,
causing the caller of the function to think that the function was
successful.
Since directories should never contain holes, this case can only
happen if the file system is corrupted. This commit audits all of the
callers of ext4_bread(), and makes sure they do the right thing if a
hole in a directory is found by ext4_bread().
Some ext4_bread() callers did not need any changes either because they
already had its own hole detector paths.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR
to mask those methods. And ext4_iget also always sets it, so there is
an inconsistency.
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Instead of checking whether the handle is valid, we check if journal
is enabled. This avoids taking the s_orphan_lock mutex in all cases
when there is no journal in use, including the error paths where
ext4_orphan_del() is called with a handle set to NULL.
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
htree_dirblock_to_tree() declares a non-initialized 'err' variable,
which is passed as a reference to another functions expecting them to
set this variable with their error codes.
It's passed to ext4_bread(), which then passes it to ext4_getblk(). If
ext4_map_blocks() returns 0 due to a lookup failure, leaving the
ext4_getblk() buffer_head uninitialized, it will make ext4_getblk()
return to ext4_bread() without initialize the 'err' variable, and
ext4_bread() will return to htree_dirblock_to_tree() with this variable
still uninitialized. htree_dirblock_to_tree() will pass this variable
with garbage back to ext4_htree_fill_tree(), which expects a number of
directory entries added to the rb-tree. which, in case, might return a
fake non-zero value due the garbage left in the 'err' variable, leading
the kernel to an Oops in ext4_dx_readdir(), once this is expecting a
filled rb-tree node, when in turn it will have a NULL-ed one, causing an
invalid page request when trying to get a fname struct from this NULL-ed
rb-tree node in this line:
fname = rb_entry(info->curr_node, struct fname, rb_hash);
The patch itself initializes the err variable in
htree_dirblock_to_tree() to avoid usage mistakes by the called
functions, and also fix ext4_getblk() to return a initialized 'err'
variable when ext4_map_blocks() fails a lookup.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Very large directories can cause significant performance problems, or
perhaps even invoke the OOM killer, if the process is running in a
highly constrained memory environment (whether it is VM's with a small
amount of memory or in a small memory cgroup).
So it is useful, in cloud server/data center environments, to be able
to set a filesystem-wide cap on the maximum size of a directory, to
ensure that directories never get larger than a sane size. We do this
via a new mount option, max_dir_size_kb. If there is an attempt to
grow the directory larger than max_dir_size_kb, the system call will
return ENOSPC instead.
Google-Bug-Id: 6863013
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Pull ext4 updates from Ted Ts'o:
"The usual collection of bug fixes and optimizations. Perhaps of
greatest note is a speed up for parallel, non-allocating DIO writes,
since we no longer take the i_mutex lock in that case.
For bug fixes, we fix an incorrect overhead calculation which caused
slightly incorrect results for df(1) and statfs(2). We also fixed
bugs in the metadata checksum feature."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
ext4: undo ext4_calc_metadata_amount if we fail to claim space
ext4: don't let i_reserved_meta_blocks go negative
ext4: fix hole punch failure when depth is greater than 0
ext4: remove unnecessary argument from __ext4_handle_dirty_metadata()
ext4: weed out ext4_write_super
ext4: remove unnecessary superblock dirtying
ext4: convert last user of ext4_mark_super_dirty() to ext4_handle_dirty_super()
ext4: remove useless marking of superblock dirty
ext4: fix ext4 mismerge back in January
ext4: remove dynamic array size in ext4_chksum()
ext4: remove unused variable in ext4_update_super()
ext4: make quota as first class supported feature
ext4: don't take the i_mutex lock when doing DIO overwrites
ext4: add a new nolock flag in ext4_map_blocks
ext4: split ext4_file_write into buffered IO and direct IO
ext4: remove an unused statement in ext4_mb_get_buddy_page_lock()
ext4: fix out-of-date comments in extents.c
ext4: use s_csum_seed instead of i_csum_seed for xattr block
ext4: use proper csum calculation in ext4_rename
ext4: fix overhead calculation used by ext4_statfs()
...
The '__ext4_handle_dirty_metadata()' does not need the 'now' argument
anymore and we can kill it.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument. And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In ext4_rename, when the old name is a dir, we need to
change ".." to its new parent and journal the change, so
with metadata_csum enabled, we have to re-calc the csum.
As the first block of the dir can be either a htree root
or a normal directory block and we have different csum
calculation for these 2 types, we have to choose the right
one in ext4_rename.
btw, it is found by xfstests 013.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
Pull Ext4 updates from Theodore Ts'o:
"The major new feature added in this update is Darrick J Wong's
metadata checksum feature, which adds crc32 checksums to ext4's
metadata fields.
There is also the usual set of cleanups and bug fixes."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits)
ext4: hole-punch use truncate_pagecache_range
jbd2: use kmem_cache_zalloc wrapper instead of flag
ext4: remove mb_groups before tearing down the buddy_cache
ext4: add ext4_mb_unload_buddy in the error path
ext4: don't trash state flags in EXT4_IOC_SETFLAGS
ext4: let getattr report the right blocks in delalloc+bigalloc
ext4: add missing save_error_info() to ext4_error()
ext4: add debugging trigger for ext4_error()
ext4: protect group inode free counting with group lock
ext4: use consistent ssize_t type in ext4_file_write()
ext4: fix format flag in ext4_ext_binsearch_idx()
ext4: cleanup in ext4_discard_allocated_blocks()
ext4: return ENOMEM when mounts fail due to lack of memory
ext4: remove redundundant "(char *) bh->b_data" casts
ext4: disallow hard-linked directory in ext4_lookup
ext4: fix potential integer overflow in alloc_flex_gd()
ext4: remove needs_recovery in ext4_mb_init()
ext4: force ro mount if ext4_setup_super() fails
ext4: fix potential NULL dereference in ext4_free_inodes_counts()
ext4/jbd2: add metadata checksumming to the list of supported features
...
A hard-linked directory to its parent can cause the VFS to deadlock,
and is a sign of a corrupted file system. So detect this case in
ext4_lookup(), before the rmdir() lockup scenario can take place.
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
This allows comparing hash and len in one operation on 64-bit
architectures. Right now only __d_lookup_rcu() takes advantage of this,
since that is the case we care most about.
The use of anonymous struct/unions hides the alternate 64-bit approach
from most users, the exception being a few cases where we initialize a
'struct qstr' with a static initializer. This makes the problematic
cases use a new QSTR_INIT() helper function for that (but initializing
just the name pointer with a "{ .name = xyzzy }" initializer remains
valid, as does just copying another qstr structure).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
None of this function callers ever pass in a NULL inode pointer, so
this check is unnecessary, and the else clause is dead code. (This
change should make the code coverage people a little happier. :-)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Calculate and verify the checksums for directory leaf blocks
(i.e. blocks that only contain actual directory entries). The
checksum lives in what looks to be an unused directory entry with a 0
name_len at the end of the block. This scheme is not used for
internal htree nodes because the mechanism in place there only costs
one dx_entry, whereas the "empty" directory entry would cost two
dx_entries.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Calculate and verify the checksum for directory index tree (htree)
node blocks. The checksum is stored in the last 4 bytes of the htree
block and requires the dx_entry array to stop 1 dx_entry short of the
end of the block.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Calculate and verify the superblock checksum. Since the UUID and
block group number are embedded in each copy of the superblock, we
need only checksum the entire block. Refactor some of the code to
eliminate open-coding of the checksum update call.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Define flags and change structure definitions to allow checksumming of
ext4 metadata.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>