Commit Graph

190758 Commits

Author SHA1 Message Date
Dmitry Monakhov 84061e07c5 ext4: init statistics after journal recovery
Currently block/inode/dir counters initialized before journal was
recovered. In fact after journal recovery this info will probably
change. And freeblocks it critical for correct delalloc mode
accounting.

https://bugzilla.kernel.org/show_bug.cgi?id=15768

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-16 08:00:00 -04:00
Dmitry Monakhov d17413c08c ext4: clean up inode bitmaps manipulation in ext4_free_inode
- Reorganize locking scheme to batch two atomic operation in to one.
  This also allow us to state what healthy group must obey following rule
  ext4_free_inodes_count(sb, gdp) == ext4_count_free(inode_bitmap, NUM);
- Fix possible undefined pointer dereference.
- Even if group descriptor stats aren't accessible we have to update
  inode bitmaps.
- Move non-group members update out of group_lock.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-16 07:00:00 -04:00
Dmitry Monakhov 21ca087a38 ext4: Do not zero out uninitialized extents beyond i_size
The extents code will sometimes zero out blocks and mark them as
initialized instead of splitting an extent into several smaller ones.
This optimization however, causes problems if the extent is beyond
i_size because fsck will complain if there are uninitialized blocks
after i_size as this can not be distinguished from an inode that has
an incorrect i_size field.

https://bugzilla.kernel.org/show_bug.cgi?id=15742

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-16 06:00:00 -04:00
Theodore Ts'o c35a56a090 jbd2: Improve scalability by not taking j_state_lock in jbd2_journal_stop()
One of the most contended locks in the jbd2 layer is j_state_lock when
running dbench.  This is especially true if using the real-time kernel
with its "sleeping spinlocks" patch that replaces spinlocks with
priority inheriting mutexes --- but it also shows up on large SMP
benchmarks.

Thanks to John Stultz for pointing this out.

Reviewed by Mingming Cao and Jan Kara.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-16 05:00:00 -04:00
Eric Sandeen c445e3e0a5 ext4: don't scan/accumulate more pages than mballoc will allocate
There was a bug reported on RHEL5 that a 10G dd on a 12G box
had a very, very slow sync after that.

At issue was the loop in write_cache_pages scanning all the way
to the end of the 10G file, even though the subsequent call
to mpage_da_submit_io would only actually write a smallish amt; then
we went back to the write_cache_pages loop ... wasting tons of time
in calling __mpage_da_writepage for thousands of pages we would
just revisit (many times) later.

Upstream it's not such a big issue for sys_sync because we get
to the loop with a much smaller nr_to_write, which limits the loop.

However, talking with Aneesh he realized that fsync upstream still
gets here with a very large nr_to_write and we face the same problem.

This patch makes mpage_add_bh_to_extent stop the loop after we've
accumulated 2048 pages, by setting mpd->io_done = 1; which ultimately
causes the write_cache_pages loop to break.

Repeating the test with a dirty_ratio of 80 (to leave something for
fsync to do), I don't see huge IO performance gains, but the reduction
in cpu usage is striking: 80% usage with stock, and 2% with the
below patch.  Instrumenting the loop in write_cache_pages clearly
shows that we are wasting time here.

Eventually we need to change mpage_da_map_pages() also submit its I/O
to the block layer, subsuming mpage_da_submit_io(), and then change it
call ext4_get_blocks() multiple times.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-16 04:00:00 -04:00
Eric Sandeen a30eec2a86 ext4: stop issuing discards if not supported by device
Turn off issuance of discard requests if the device does
not support it - similar to the action we take for barriers.
This will save a little computation time if a non-discardable
device is mounted with -o discard, and also makes it obvious
that it's not doing what was asked at mount time ...

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-16 03:00:00 -04:00
Eric Sandeen 6b0310fbf0 ext4: don't return to userspace after freezing the fs with a mutex held
ext4_freeze() used jbd2_journal_lock_updates() which takes
the j_barrier mutex, and then returns to userspace.  The
kernel does not like this:

================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
lvcreate/1075 is leaving the kernel with locks still held!
1 lock held by lvcreate/1075:
 #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff811c6214>]
jbd2_journal_lock_updates+0xe1/0xf0

Use vfs_check_frozen() added to ext4_journal_start_sb() and
ext4_force_commit() instead.

Addresses-Red-Hat-Bugzilla: #568503

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-16 02:00:00 -04:00
Dmitry Monakhov 256a453546 ext4: symlink must be handled via filesystem specific operation
generic setattr implementation is no longer responsible for
quota transfer so synlinks must be handled via ext4_setattr.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-16 02:00:00 -04:00
Eric Sandeen 42007efd56 ext4: check s_log_groups_per_flex in online resize code
If groups_per_flex < 2, sbi->s_flex_groups[] doesn't get filled out,
and every other access to this first tests s_log_groups_per_flex;
same thing needs to happen in resize or we'll wander off into
a null pointer when doing an online resize of the file system.

Thanks to Christoph Biedl, who came up with the trivial testcase:

# truncate --size 128M fsfile
# mkfs.ext3 -F fsfile
# tune2fs -O extents,uninit_bg,dir_index,flex_bg,huge_file,dir_nlink,extra_isize fsfile
# e2fsck -yDf -C0 fsfile
# truncate --size 132M fsfile
# losetup /dev/loop0 fsfile
# mount /dev/loop0 mnt
# resize2fs -p /dev/loop0

	https://bugzilla.kernel.org/show_bug.cgi?id=13549

Reported-by: Alessandro Polverini <alex@nibbles.it>
Test-case-by: Christoph Biedl  <bugzilla.kernel.bpeb@manchmal.in-ulm.de>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-16 01:00:00 -04:00
Dmitry Monakhov 35121c9860 ext4: fix quota accounting in case of fallocate
allocated_meta_data is already included in 'used' variable.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-16 00:00:00 -04:00
Christian Borntraeger b684b2ee94 ext4: allow defrag (EXT4_IOC_MOVE_EXT) in 32bit compat mode
I have an x86_64 kernel with i386 userspace. e4defrag fails on the
EXT4_IOC_MOVE_EXT ioctl because it is not wired up for the compat
case. It seems that struct move_extent is compat save, only types
with fixed widths are used:
{
        __u32 reserved;         /* should be zero */
        __u32 donor_fd;         /* donor file descriptor */
        __u64 orig_start;       /* logical start offset in block for orig */
        __u64 donor_start;      /* logical start offset in block for donor */
        __u64 len;              /* block length to be moved */
        __u64 moved_len;        /* moved block length */
};

Lets just wire up EXT4_IOC_MOVE_EXT for the compat case.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
CC: Akira Fujita <a-fujita@rs.jp.nec.com>
2010-05-15 00:00:00 -04:00
Jing Zhang e39e07fdfd ext4: rename ext4_mb_release_desc() to ext4_mb_unload_buddy()
This function cleans up after ext4_mb_load_buddy(), so the renaming
makes the code clearer.

Signed-off-by: Jing Zhang <zj.barak@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-14 00:00:00 -04:00
Jing Zhang 62e823a2cb ext4: Remove unnecessary call to ext4_get_group_desc() in mballoc
Signed-off-by: Jing Zhang <zj.barak@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-13 00:00:00 -04:00
Jing Zhang b720303df7 ext4: fix memory leaks in error path handling of ext4_ext_zeroout()
When EIO occurs after bio is submitted, there is no memory free
operation for bio, which results in memory leakage. And there is also
no check against bio_alloc() for bio.

Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Jing Zhang <zj.barak@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-12 00:00:00 -04:00
Steven Liu c26d0bad3d ext4: Fix coding style in fs/ext4/move_extent.c
Making sure ee_block is initialized to zero to prevent gcc from
kvetching.  It's harmless (although it's not obvious that it's
harmless) from code inspection:

fs/ext4/move_extent.c:478: warning: 'start_ext.ee_block' may be used
uninitialized in this function

Thanks to Stefan Richter for first bringing this to the attention of
linux-ext4@vger.kernel.org.

Signed-off-by: LiuQi <lingjiujianke@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
2010-05-11 00:00:00 -04:00
Dmitry Monakhov 0671e70465 ext4: check missed return value in ext4_sync_file()
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-05-10 00:00:00 -04:00
Linus Torvalds b57f95a382 Linux 2.6.34-rc7 2010-05-09 18:36:28 -07:00
Linus Torvalds 93cb463141 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] Retry commands with UNIT_ATTENTION sense codes to fix ext3/ext4 I/O error
  [SCSI] Enable retries for SYNCRONIZE_CACHE commands to fix I/O error
  [SCSI] scsi_debug: virtual_gb ignores sector_size
  [SCSI] libiscsi: regression: fix header digest errors
  [SCSI] fix locking around blk_abort_request()
  [SCSI] advansys: fix narrow board error path
2010-05-09 18:35:53 -07:00
Arjan van de Ven 1c6fe0364f cpuidle: Fix incorrect optimization
commit 672917dcc7 ("cpuidle: menu governor: reduce latency on exit")
added an optimization, where the analysis on the past idle period moved
from the end of idle, to the beginning of the new idle.

Unfortunately, this optimization had a bug where it zeroed one key
variable for new use, that is needed for the analysis.  The fix is
simple, zero the variable after doing the work from the previous idle.

During the audit of the code that found this issue, another issue was
also found; the ->measured_us data structure member is never set, a
local variable is always used instead.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Corrado Zoccolo <czoccolo@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-09 18:35:36 -07:00
Linus Torvalds f1c448e0a9 Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
  md: restore ability of spare drives to spin down.
  md/raid6: Fix raid-6 read-error correction in degraded state
2010-05-07 14:11:40 -07:00
Linus Torvalds 2c32b1dab5 Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6
* 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6:
  pcmcia: fix compilation after 16bit state locking changes
  pcmcia: order userspace suspend and resume requests
  pcmcia: avoid pccard_validate_cis failure in resume callpath
2010-05-07 14:11:09 -07:00
Linus Torvalds 48fe37cb53 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  blk-cgroup: Fix an RCU warning in blkiocg_create()
  blk-cgroup: Fix RCU correctness warning in cfq_init_queue()
  drbd: don't expose failed local READ to upper layers
2010-05-07 14:07:20 -07:00
Linus Torvalds e33b3e7567 Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/ttm: Remove the ttm_bo_block_reservation() function.
  drm/ttm: Remove some leftover debug messages.
  drm/radeon: async event synchronization for drmWaitVblank
2010-05-07 14:02:01 -07:00
Stijn Tintel e2dbe06c27 virtio: initialize earlier
Move initialization of the virtio framework before the initialization of
mtd, so that block2mtd can be used on virtio-based block devices.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=15644

Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-07 14:01:17 -07:00
Linus Torvalds 9167746716 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix RCU issues in the NFSv4 delegation code
  NFSv4: Fix the locking in nfs_inode_reclaim_delegation()
2010-05-07 13:59:48 -07:00