This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where its required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.
This was posted for review some time ago and I believe its been in -mm
since then.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fs/ext4/super.c: In function 'ext4_fill_super':
fs/ext4/super.c:2226: error: 'ext4_ui_proc_fops' undeclared (first use
in this function)
fs/ext4/super.c:2226: error: (Each undeclared identifier is reported
only once
fs/ext4/super.c:2226: error: for each function it appears in.)
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the journal doesn't abort when it gets an IO error in file data
blocks, the file data corruption will spread silently. Because
most of applications and commands do buffered writes without fsync(),
they don't notice the IO error. It's scary for mission critical
systems. On the other hand, if the journal aborts whenever it gets
an IO error in file data blocks, the system will easily become
inoperable. So this patch introduces a filesystem option to
determine whether it aborts the journal or just call printk() when
it gets an IO error in file data.
If you mount an ext4 fs with data_err=abort option, it aborts on file
data write error. If you mount it with data_err=ignore, it doesn't
abort, just call printk(). data_err=ignore is the default.
Here is the corresponding patch of the ext3 version:
http://kerneltrap.org/mailarchive/linux-kernel/2008/9/9/3239374
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If the journal has aborted due to a checkpointing failure, we
have to keep the contents of the journal space. Otherwise, the
filesystem will lose uncheckpointed metadata completely and
become inconsistent. To avoid this, we need to keep needs_recovery
flag if checkpoint has failed.
With this patch, ext4_put_super() detects a checkpointing failure
from the return value of journal_destroy(), then it invokes
ext4_abort() to make the filesystem read only and keep
needs_recovery flag. Errors from jbd2_journal_flush() are also
handled by this patch in some places.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The ext4 filesystem is getting stable enough that it's time to drop
the "dev" prefix. Also remove the requirement for the TEST_FILESYS
flag.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
While reading code I noticed that ext4_put_super() dirties the
superblock bh twice. It is always done in ext4_commit_super()
too. Remove the redundant dirty operation.
Should be a nop semantically.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
ext4_xattr_set_handle() eventually ends up calling
ext4_mark_inode_dirty() which tries to expand the inode by shifting
the EAs. This leads to the xattr_sem being downed again and leading
to a deadlock.
This patch makes sure that if ext4_xattr_set_handle() is in the
call-chain, ext4_mark_inode_dirty() will not expand the inode.
Signed-off-by: Kalpak Shah <kalpak.shah@sun.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This debugging markers are designed to debug problems such as the
random filesystem latency problems reported by Arjan.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This fixes a bug which caused on-line resizing of filesystems with a
1k blocksize to fail. The root cause of this bug was the fact that if
an uninitalized bitmap block gets read in by userspace (which
e2fsprogs does try to avoid, but can happen when the blocksize is less
than the pagesize and an adjacent blocks is read into memory)
ext4_read_block_bitmap() was erroneously depending on the buffer
uptodate flag to decide whether it needed to initialize the bitmap
block in memory --- i.e., to set the standard set of blocks in use by
a block group (superblock, bitmaps, inode table, etc.). Essentially,
ext4_read_block_bitmap() assumed it was the only routine that might
try to read a block containing a block bitmap, which is simply not
true.
To fix this, ext4_read_block_bitmap() and ext4_read_inode_bitmap()
must always initialize uninitialized bitmap blocks. Once a block or
inode is allocated out of that bitmap, it will be marked as
initialized in the block group descriptor, so in general this won't
result any extra unnecessary work.
Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
With modern hard drives, reading 64k takes roughly the same time as
reading a 4k block. So request readahead for adjacent inode table
blocks to reduce the time it takes when iterating over directories
(especially when doing this in htree sort order) in a cold cache case.
With this patch, the time it takes to run "git status" on a kernel
tree after flushing the caches via "echo 3 > /proc/sys/vm/drop_caches"
is reduced by 21%.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Previously mballoc created a separate set of functions for each proc
file. This combines the tunables into a single set of functions which
gets used for all of the per-superblock proc files, saving
approximately 2k of compiled object code.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
...and into the core setup/teardown code in fs/ext4/super.c so that
other parts of ext4 can define tuning parameters.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This is a port of a patch from Linus which fixes a 200+ byte stack
usage problem in ext4_get_parent().
It's more efficient to pass down only the actual parts of the dentry
that matter: the parent inode and the name, instead of allocating a
struct dentry on the stack.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
lg_prealloc_list seems to cry out for a per-cpu data structure; on a large
smp system I think this should be better. I've lightly tested this change
on a 4-cpu system.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Pick an ioctl number for EXT4_IOC_MIGRATE that won't conflict with
other ext4 ioctl's. Since there haven't been any major userspace
users of this ioctl, we can afford to change this now, to avoid
potential problems later.
Also, reorder the ioctl numbers in ext4.h to avoid this sort of
mistake in the future.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch hooks the ext3 to ext4 migrate interface to
EXT4_IOC_SETFLAGS ioctl. The userspace interface is via chattr +e. We
only allow setting extent flags. Clearing extent flag (migrating from
ext4 to ext3) is not supported.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The migrate ioctl writes to the filsystem, so we need to elevate the
write count.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If there group descriptors are corrupted we need unlock the block
group lock before returning from the function; else we will oops when
freeing a spinlock which is still being held.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Calculate the journal device name once and stash it away in the
journal_s structure. This avoids needing to call bdevname()
everywhere and reduces stack usage by not needing to allocate an
on-stack buffer. In addition, we eliminate the '/' that can appear in
device names (e.g. "cciss/c0d0p9" --- see kernel bugzilla #11321) that
can cause problems when creating proc directory names, and include the
inode number to support ocfs2 which creates multiple journals with
different inode numbers.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
ext4 creates per-suberblock directory in /proc/ext4/ . Name used as
basis is taken from bdevname, which, surprise, can contain slash.
However, proc while allowing to use proc_create("a/b", parent) form of
PDE creation, assumes that parent/a was already created.
bdevname in question is 'cciss/c0d0p9', directory is not created and all
this stuff goes directly into /proc (which is real bug).
Warning comes when _second_ partition is mounted.
http://bugzilla.kernel.org/show_bug.cgi?id=11321
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This fixes a bug which prevented the newly created inodes after a
resize from being used on filesystems with flex_bg.
Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Note: some people thinks this represents a security bug, since it
might make the system go away while it is printing a large number of
console messages, especially if a serial console is involved. Hence,
it has been assigned CVE-2008-3528, but it requires that the attacker
either has physical access to your machine to insert a USB disk with a
corrupted filesystem image (at which point why not just hit the power
button), or is otherwise able to convince the system administrator to
mount an arbitrary filesystem image (at which point why not just
include a setuid shell or world-writable hard disk device file or some
such). Me, I think they're just being silly. --tytso
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Cc: Eugene Teo <eugeneteo@kernel.sg>