Commit Graph

148 Commits

Author SHA1 Message Date
Tao Ma 7335cd3b41 ext4: create a new function search_dir
search_dirblock is used to search a dir block, but the code is almost
the same for searching an inline dir.

So create a new fuction search_dir and let search_dirblock call it.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10 14:05:59 -05:00
Tao Ma 3c47d54170 ext4: let add_dir_entry handle inline data properly
This patch let add_dir_entry handle the inline data case. So the
dir is initialized as inline dir first and then we can try to add
some files to it, when the inline space can't hold all the entries,
a dir block will be created and the dir entry will be moved to it.

Also for an inlined dir, "." and ".." are removed and we only use
4 bytes to store the parent inode number. These 2 entries will be
added when we convert an inline dir to a block-based one.

[ Folded in patch from Dan Carpenter to remove an unused variable. ]

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10 14:05:59 -05:00
Tao Ma 978fef914a ext4: create __ext4_insert_dentry for dir entry insertion
The old add_dirent_to_buf handles all the work related to the
work of adding dir entry to a dir block. Now we have inline data,
so create 2 new function __ext4_find_dest_de and __ext4_insert_dentry
that do the real work and let add_dirent_to_buf call them.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10 14:05:58 -05:00
Tao Ma 226ba972b0 ext4: refactor __ext4_check_dir_entry() to accept start and size
The __ext4_check_dir_entry() function() is used to check whether the
de is over the block boundary.  Now with inline data, it could be
within the block boundary while exceeds the inode size.  So check this
function to check the overflow more precisely.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10 14:05:58 -05:00
Tao Ma a774f9c20e ext4: make ext4_init_dot_dotdot for inline dir usage
Currently, the initialization of dot and dotdot are encapsulated in
ext4_mkdir and also bond with dir_block. So create a new function
named ext4_init_new_dir and the initialization is moved to
ext4_init_dot_dotdot. Now it will called either in the normal non-inline
case(rec_len of ".." will cover the whole block) or when we converting an
inline dir to a block(rec len of ".." will be the real length). The start
of the next entry is also returned for inline dir usage.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10 14:05:57 -05:00
Darrick J. Wong c6af8803cd ext4: don't verify checksums of dx non-leaf nodes during fallback scan
During a directory entry lookup of a hashed directory, if the
hash-based lookup functions fail and we fall back to a linear scan,
don't try to verify the dirent checksum on the internal nodes of the
hash tree because they don't store a checksum in a hidden dirent like
the leaf nodes do.

Reported-by: George Spelvin <linux@horizon.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-12 23:51:02 -05:00
Theodore Ts'o dffe9d8da7 ext4: do not use ext4_error() when there is no space in dir leaf for csum
If there is no space for a checksum in a directory leaf node,
previously we would use EXT4_ERROR_INODE() which would mark the file
system as inconsistent.  While it would be nice to use e2fsck -D, it
certainly isn't required, so just print a warning using
ext4_warning().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
2012-11-10 22:20:05 -05:00
Carlos Maiolino 6d1ab10e69 ext4: ext4_bread usage audit
When ext4_bread() returns NULL and err is set to zero, this means
there is no phyical block mapped to the specified logical block
number.  (Previous to commit 90b0a97323, err was uninitialized in this
case, which caused other problems.)

The directory handling routines use ext4_bread() in many places, the
fact that ext4_bread() now returns NULL with err set to zero could
cause problems since a number of these functions will simply return
the value of err if the result of ext4_bread() was the NULL pointer,
causing the caller of the function to think that the function was
successful.

Since directories should never contain holes, this case can only
happen if the file system is corrupted.  This commit audits all of the
callers of ext4_bread(), and makes sure they do the right thing if a
hole in a directory is found by ext4_bread().

Some ext4_bread() callers did not need any changes either because they
already had its own hole detector paths.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-27 09:31:33 -04:00
Bernd Schubert 6a08f447fa ext4: always set i_op in ext4_mknod()
ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR
to mask those methods. And ext4_iget also always sets it, so there is
an inconsistency.

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2012-09-26 21:24:57 -04:00
Anatol Pomozov c9b92530a7 ext4: make orphan functions be no-op in no-journal mode
Instead of checking whether the handle is valid, we check if journal
is enabled. This avoids taking the s_orphan_lock mutex in all cases
when there is no journal in use, including the error paths where
ext4_orphan_del() is called with a handle set to NULL.

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-18 13:38:59 -04:00
Carlos Maiolino 90b0a97323 ext4: fix possible non-initialized variable in htree_dirblock_to_tree()
htree_dirblock_to_tree() declares a non-initialized 'err' variable,
which is passed as a reference to another functions expecting them to
set this variable with their error codes.

It's passed to ext4_bread(), which then passes it to ext4_getblk(). If
ext4_map_blocks() returns 0 due to a lookup failure, leaving the
ext4_getblk() buffer_head uninitialized, it will make ext4_getblk()
return to ext4_bread() without initialize the 'err' variable, and
ext4_bread() will return to htree_dirblock_to_tree() with this variable
still uninitialized.  htree_dirblock_to_tree() will pass this variable
with garbage back to ext4_htree_fill_tree(), which expects a number of
directory entries added to the rb-tree. which, in case, might return a
fake non-zero value due the garbage left in the 'err' variable, leading
the kernel to an Oops in ext4_dx_readdir(), once this is expecting a
filled rb-tree node, when in turn it will have a NULL-ed one, causing an
invalid page request when trying to get a fname struct from this NULL-ed
rb-tree node in this line:

fname = rb_entry(info->curr_node, struct fname, rb_hash);

The patch itself initializes the err variable in
htree_dirblock_to_tree() to avoid usage mistakes by the called
functions, and also fix ext4_getblk() to return a initialized 'err'
variable when ext4_map_blocks() fails a lookup.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-17 23:39:12 -04:00
Theodore Ts'o df981d03ee ext4: add max_dir_size_kb mount option
Very large directories can cause significant performance problems, or
perhaps even invoke the OOM killer, if the process is running in a
highly constrained memory environment (whether it is VM's with a small
amount of memory or in a small memory cgroup).

So it is useful, in cloud server/data center environments, to be able
to set a filesystem-wide cap on the maximum size of a directory, to
ensure that directories never get larger than a sane size.  We do this
via a new mount option, max_dir_size_kb.  If there is an attempt to
grow the directory larger than max_dir_size_kb, the system call will
return ENOSPC instead.

Google-Bug-Id: 6863013

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-08-17 09:48:17 -04:00
Linus Torvalds 173f865474 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
 "The usual collection of bug fixes and optimizations.  Perhaps of
  greatest note is a speed up for parallel, non-allocating DIO writes,
  since we no longer take the i_mutex lock in that case.

  For bug fixes, we fix an incorrect overhead calculation which caused
  slightly incorrect results for df(1) and statfs(2).  We also fixed
  bugs in the metadata checksum feature."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
  ext4: undo ext4_calc_metadata_amount if we fail to claim space
  ext4: don't let i_reserved_meta_blocks go negative
  ext4: fix hole punch failure when depth is greater than 0
  ext4: remove unnecessary argument from __ext4_handle_dirty_metadata()
  ext4: weed out ext4_write_super
  ext4: remove unnecessary superblock dirtying
  ext4: convert last user of ext4_mark_super_dirty() to ext4_handle_dirty_super()
  ext4: remove useless marking of superblock dirty
  ext4: fix ext4 mismerge back in January
  ext4: remove dynamic array size in ext4_chksum()
  ext4: remove unused variable in ext4_update_super()
  ext4: make quota as first class supported feature
  ext4: don't take the i_mutex lock when doing DIO overwrites
  ext4: add a new nolock flag in ext4_map_blocks
  ext4: split ext4_file_write into buffered IO and direct IO
  ext4: remove an unused statement in ext4_mb_get_buddy_page_lock()
  ext4: fix out-of-date comments in extents.c
  ext4: use s_csum_seed instead of i_csum_seed for xattr block
  ext4: use proper csum calculation in ext4_rename
  ext4: fix overhead calculation used by ext4_statfs()
  ...
2012-07-27 20:52:25 -07:00
Artem Bityutskiy b50924c2c6 ext4: remove unnecessary argument from __ext4_handle_dirty_metadata()
The '__ext4_handle_dirty_metadata()' does not need the 'now' argument
anymore and we can kill it.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2012-07-22 20:37:31 -04:00
Al Viro 8fc37ec54c don't expose I_NEW inodes via dentry->d_inode
d_instantiate(dentry, inode);
	unlock_new_inode(inode);

is a bad idea; do it the other way round...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-23 00:00:58 +04:00
Al Viro ebfc3b49a7 don't pass nameidata to ->create()
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:47 +04:00
Al Viro 00cd8dd3bf stop passing nameidata to ->lookup()
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument.  And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:32 +04:00
Tao Ma ef58f69c3c ext4: use proper csum calculation in ext4_rename
In ext4_rename, when the old name is a dir, we need to
change ".." to its new parent and journal the change, so
with metadata_csum enabled, we have to re-calc the csum.

As the first block of the dir can be either a htree root
or a normal directory block and we have different csum
calculation for these 2 types, we have to choose the right
one in ext4_rename.

btw, it is found by xfstests 013.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
2012-07-09 16:29:05 -04:00
Linus Torvalds 4edebed866 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull Ext4 updates from Theodore Ts'o:
 "The major new feature added in this update is Darrick J Wong's
  metadata checksum feature, which adds crc32 checksums to ext4's
  metadata fields.

  There is also the usual set of cleanups and bug fixes."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits)
  ext4: hole-punch use truncate_pagecache_range
  jbd2: use kmem_cache_zalloc wrapper instead of flag
  ext4: remove mb_groups before tearing down the buddy_cache
  ext4: add ext4_mb_unload_buddy in the error path
  ext4: don't trash state flags in EXT4_IOC_SETFLAGS
  ext4: let getattr report the right blocks in delalloc+bigalloc
  ext4: add missing save_error_info() to ext4_error()
  ext4: add debugging trigger for ext4_error()
  ext4: protect group inode free counting with group lock
  ext4: use consistent ssize_t type in ext4_file_write()
  ext4: fix format flag in ext4_ext_binsearch_idx()
  ext4: cleanup in ext4_discard_allocated_blocks()
  ext4: return ENOMEM when mounts fail due to lack of memory
  ext4: remove redundundant "(char *) bh->b_data" casts
  ext4: disallow hard-linked directory in ext4_lookup
  ext4: fix potential integer overflow in alloc_flex_gd()
  ext4: remove needs_recovery in ext4_mb_init()
  ext4: force ro mount if ext4_setup_super() fails
  ext4: fix potential NULL dereference in ext4_free_inodes_counts()
  ext4/jbd2: add metadata checksumming to the list of supported features
  ...
2012-06-01 10:12:15 -07:00
Andreas Dilger 7e936b7372 ext4: disallow hard-linked directory in ext4_lookup
A hard-linked directory to its parent can cause the VFS to deadlock,
and is a sign of a corrupted file system.  So detect this case in
ext4_lookup(), before the rmdir() lockup scenario can take place.

Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-05-28 17:02:25 -04:00
Linus Torvalds 26fe575028 vfs: make it possible to access the dentry hash/len as one 64-bit entry
This allows comparing hash and len in one operation on 64-bit
architectures.  Right now only __d_lookup_rcu() takes advantage of this,
since that is the case we care most about.

The use of anonymous struct/unions hides the alternate 64-bit approach
from most users, the exception being a few cases where we initialize a
'struct qstr' with a static initializer.  This makes the problematic
cases use a new QSTR_INIT() helper function for that (but initializing
just the name pointer with a "{ .name = xyzzy }" initializer remains
valid, as does just copying another qstr structure).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10 19:54:35 -07:00
Theodore Ts'o b09de7fa52 ext4: remove unnecessary check in add_dirent_to_buf()
None of this function callers ever pass in a NULL inode pointer, so
this check is unnecessary, and the else clause is dead code.  (This
change should make the code coverage people a little happier.  :-)

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-04-30 07:40:00 -04:00
Darrick J. Wong b0336e8d21 ext4: calculate and verify checksums of directory leaf blocks
Calculate and verify the checksums for directory leaf blocks
(i.e. blocks that only contain actual directory entries).  The
checksum lives in what looks to be an unused directory entry with a 0
name_len at the end of the block.  This scheme is not used for
internal htree nodes because the mechanism in place there only costs
one dx_entry, whereas the "empty" directory entry would cost two
dx_entries.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-04-29 18:41:10 -04:00
Darrick J. Wong dbe8944404 ext4: Calculate and verify checksums for htree nodes
Calculate and verify the checksum for directory index tree (htree)
node blocks.  The checksum is stored in the last 4 bytes of the htree
block and requires the dx_entry array to stop 1 dx_entry short of the
end of the block.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-04-29 18:39:10 -04:00
Darrick J. Wong a9c4731780 ext4: calculate and verify superblock checksum
Calculate and verify the superblock checksum.  Since the UUID and
block group number are embedded in each copy of the superblock, we
need only checksum the entire block.  Refactor some of the code to
eliminate open-coding of the checksum update call.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-04-29 18:29:10 -04:00