Pull ext2, ext3 and quota fixes from Jan Kara:
"Interesting bits are:
- removal of a special i_mutex locking subclass (I_MUTEX_QUOTA) since
quota code does not need i_mutex anymore in any unusual way.
- backport (from ext4) of a fix of a checkpointing bug (missing cache
flush) that could lead to fs corruption on power failure
The rest are just random small fixes & cleanups."
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: trivial fix to comment for ext2_free_blocks
ext2: remove the redundant comment for ext2_export_ops
ext3: return 32/64-bit dir name hash according to usage type
quota: Get rid of nested I_MUTEX_QUOTA locking subclass
quota: Use precomputed value of sb_dqopt in dquot_quota_sync
ext2: Remove i_mutex use from ext2_quota_write()
reiserfs: Remove i_mutex use from reiserfs_quota_write()
ext4: Remove i_mutex use from ext4_quota_write()
ext3: Remove i_mutex use from ext3_quota_write()
quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG
jbd: Write journal superblock with WRITE_FUA after checkpointing
jbd: protect all log tail updates with j_checkpoint_mutex
jbd: Split updating of journal superblock and marking journal empty
ext2: do not register write_super within VFS
ext2: Remove s_dirt handling
ext2: write superblock only once on unmount
ext3: update documentation with barrier=1 default
ext3: remove max_debt in find_group_orlov()
jbd: Refine commit writeout logic
The comment is outdated and isn't particularly informative anyway - NULL
meaning the default behavior is very common in kernel. And we really set about
half of entries. So remove the whole comment for ext2_export_ops.
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
We don't need i_mutex in ext2_quota_write() because writes to quota file
are serialized by dqio_mutex anyway. Changes to quota files outside of quota
code are forbidded and enforced by NOATIME and IMMUTABLE bits.
Signed-off-by: Jan Kara <jack@suse.cz>
Jan Kara removed 'sb->s_dirt' VFS flag references, so we do not need to
register the ext2 'ext2_write_super()' method in the VFS superblock operations,
because 'sb->s_dirt' won't be ever set to 1 and VFS won't ever call
'->write_super()' anyway. Thus, remove the method.
Tested using xfstests.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Places which modify superblock feature / state fields mark the superblock
buffer dirty so it is written out by flusher thread. Thus there's no need to
set s_dirt there.
The only other fields changing in the superblock are the numbers of free
blocks, free inodes and s_wtime. There's no real need to write (or even
compute) these periodically. Free blocks / inodes counters are recomputed on
every mount from group counters anyway and value of s_wtime is only
informational and imprecise anyway. So it should be enough to write these
opportunistically on mount, remount, umount, and sync_fs times.
Signed-off-by: Jan Kara <jack@suse.cz>
Currently on unmount if we are mounted R/W, we first write the superblock to
the media if it is dirty, and then write it again, which is not optimal. This
patch makes ext2 write the superblock on unmount less times.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
New field of struct super_block - ->s_max_links. Maximal allowed
value of ->i_nlink or 0; in the latter case all checks still need
to be done in ->link/->mkdir/->rename instances. Note that this
limit applies both to directoris and to non-directories.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2/3/4: delete unneeded includes of module.h
ext{3,4}: Fix potential race when setversion ioctl updates inode
udf: Mark LVID buffer as uptodate before marking it dirty
ext3: Don't warn from writepage when readonly inode is spotted after error
jbd: Remove j_barrier mutex
reiserfs: Force inode evictions before umount to avoid crash
reiserfs: Fix quota mount option parsing
udf: Treat symlink component of type 2 as /
udf: Fix deadlock when converting file from in-ICB one to normal one
udf: Cleanup calling convention of inode_getblk()
ext2: Fix error handling on inode bitmap corruption
ext3: Fix error handling on inode bitmap corruption
ext3: replace ll_rw_block with other functions
ext3: NULL dereference in ext3_evict_inode()
jbd: clear revoked flag on buffers before a new transaction started
ext3: call ext3_mark_recovery_complete() when recovery is really needed
Delete any instances of include module.h that were not strictly
required. In the case of ext2, the declaration of MODULE_LICENSE
etc. were in inode.c but the module_init/exit were in super.c, so
relocate the MODULE_LICENCE/AUTHOR block to super.c which makes it
consistent with ext3 and ext4 at the same time.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else. Not to mention the removal of
boilerplate code from ->destroy_inode() instances...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When ext2 mounts a filesystem, it attempts to set the block device
blocksize with a call to sb_set_blocksize, which can fail for
several reasons. The current failure message in ext2 prints:
EXT2-fs (loop1): error: blocksize is too small
which is not correct in all cases. This can be demonstrated
by creating a filesystem with
# mkfs.ext2 -b 8192
on a 4k page system, and attempting to mount it.
Change the error message to a more generic:
EXT2-fs (loop1): bad blocksize 8192
to match the error message in ext3.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Reviewed-by: Coly Li <bosong.ly@taobao.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
ext2: Resolve 'dereferencing pointer to incomplete type' when enabling EXT2_XATTR_DEBUG
ext3: Remove redundant unlikely()
ext2: Remove redundant unlikely()
ext3: speed up file creates by optimizing rec_len functions
ext2: speed up file creates by optimizing rec_len functions
ext3: Add more journal error check
ext3: Add journal error check in resize.c
quota: Use %pV and __attribute__((format (printf in __quota_error and fix fallout
ext3: Add FITRIM handling
ext3: Add batched discard support for ext3
ext3: Add journal error check into ext3_rename()
ext3: Use search_dirblock() in ext3_dx_find_entry()
ext3: Avoid uninitialized memory references with a corrupted htree directory
ext3: Return error code from generic_check_addressable
ext3: Add journal error check into ext3_delete_entry()
ext3: Add error check in ext3_mkdir()
fs/ext3/super.c: Use printf extension %pV
fs/ext2/super.c: Use printf extension %pV
ext3: don't update sb journal_devnum when RO dev
RCU free the struct inode. This will allow:
- Subsequent store-free path walking patch. The inode must be consulted for
permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
page lock to follow page->mapping.
The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.
In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.
The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Using %pV reduces the number of printk calls and
eliminates any possible message interleaving from
other printk calls.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jan Kara <jack@suse.cz>
It's pointless - we *do* have busy inodes (root directory,
for one), so that call will fail and attempt to change
XIP flag will be ignored.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The BKL is still used in ext2_put_super(), ext2_fill_super(), ext2_sync_fs()
ext2_remount() and ext2_write_inode(). From these calls ext2_put_super(),
ext2_fill_super() and ext2_remount() are protected against each other by
the struct super_block s_umount rw semaphore. The call in ext2_write_inode()
could only protect the modification of the ext2_sb_info through
ext2_update_dynamic_rev() against concurrent ext2_sync_fs() or ext2_remount().
ext2_fill_super() and ext2_put_super() can be left out because you need a
valid filesystem reference in all three cases, which you do not have when
you are one of these functions.
If the BKL is only protecting the modification of the ext2_sb_info it can
safely be removed since this is protected by the struct ext2_sb_info s_lock.
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This patch is a preparation necessary to remove the BKL from do_new_mount().
It explicitly adds calls to lock_kernel()/unlock_kernel() around
get_sb/fill_super operations for filesystems that still uses the BKL.
I've read through all the code formerly covered by the BKL inside
do_kern_mount() and have satisfied myself that it doesn't need the BKL
any more.
do_kern_mount() is already called without the BKL when mounting the rootfs
and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
afs_mntpt_follow_link(). Both later functions are actually the filesystems
follow_link inode operation. vfs_kern_mount() is calling the specified
get_sb function and lets the filesystem do its job by calling the given
fill_super function.
Therefore I think it is safe to push down the BKL from the VFS to the
low-level filesystems get_sb/fill_super operation.
[arnd: do not add the BKL to those file systems that already
don't use it elsewhere]
Signed-off-by: Jan Blunck <jblunck@infradead.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Christoph Hellwig <hch@infradead.org>
Follow the dquot_* style used elsewhere in dquot.c.
[Jan Kara: Fixed up missing conversion of ext2]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>