Commit Graph

59 Commits

Author SHA1 Message Date
Aneesh Kumar K.V ae2d9fb18e ext4: fix missing ext4_unlock_group in error path
If we try to free a block which is already freed, the code was
returning without first unlocking the group.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-11-04 09:10:50 -05:00
Theodore Ts'o 3e624fc72f ext4: Replace hackish ext4_mb_poll_new_transaction with commit callback
The multiblock allocator needs to be able to release blocks (and issue
a blkdev discard request) when the transaction which freed those
blocks is committed.  Previously this was done via a polling mechanism
when blocks are allocated or freed.  A much better way of doing things
is to create a jbd2 callback function and attaching the list of blocks
to be freed directly to the transaction structure.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-16 20:00:24 -04:00
Manish Katiyar 0b09923eab ext4: Remove compile warnings when building w/o CONFIG_PROC_FS
Signed-off-by: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-17 14:58:45 -04:00
Theodore Ts'o 8a0aba733d ext4: let the block device know when unused blocks can be discarded
Let the block device know when unused blocks can be discarded, using
the new sb_issue_discard() interface.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-16 10:06:27 -04:00
Aneesh Kumar K.V c894058d66 ext4: Use an rbtree for tracking blocks freed during transaction.
With this patch we track the block freed during a transaction using
red-black tree.  We also make sure contiguous blocks freed are collected
in one node in the tree.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-10-16 10:14:27 -04:00
Aneesh Kumar K.V 688f05a019 ext4: Free ext4_prealloc_space using kmem_cache_free
We should use kmem_cache_free to free memory allocated
via kmem_cache_alloc

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-10-13 12:14:14 -04:00
Frederic Bohe c806e68f56 ext4: fix initialization of UNINIT bitmap blocks
This fixes a bug which caused on-line resizing of filesystems with a
1k blocksize to fail.  The root cause of this bug was the fact that if
an uninitalized bitmap block gets read in by userspace (which
e2fsprogs does try to avoid, but can happen when the blocksize is less
than the pagesize and an adjacent blocks is read into memory)
ext4_read_block_bitmap() was erroneously depending on the buffer
uptodate flag to decide whether it needed to initialize the bitmap
block in memory --- i.e., to set the standard set of blocks in use by
a block group (superblock, bitmaps, inode table, etc.).  Essentially,
ext4_read_block_bitmap() assumed it was the only routine that might
try to read a block containing a block bitmap, which is simply not
true.  

To fix this, ext4_read_block_bitmap() and ext4_read_inode_bitmap()
must always initialize uninitialized bitmap blocks.  Once a block or
inode is allocated out of that bitmap, it will be marked as
initialized in the block group descriptor, so in general this won't
result any extra unnecessary work.

Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-10 08:09:18 -04:00
Theodore Ts'o c2ea3fde61 ext4: Remove old legacy block allocator
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-10 09:40:52 -04:00
Theodore Ts'o 5e8814f2f7 ext4: Combine proc file handling into a single set of functions
Previously mballoc created a separate set of functions for each proc
file.  This combines the tunables into a single set of functions which
gets used for all of the per-superblock proc files, saving
approximately 2k of compiled object code.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-09-23 18:07:35 -04:00
Theodore Ts'o 9f6200bbfc ext4: move /proc setup and teardown out of mballoc.c
...and into the core setup/teardown code in fs/ext4/super.c so that
other parts of ext4 can define tuning parameters.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-09-23 09:18:24 -04:00
Eric Sandeen 730c213c79 ext4: use percpu data structures for lg_prealloc_list
lg_prealloc_list seems to cry out for a per-cpu data structure; on a large
smp system I think this should be better.  I've lightly tested this change
on a 4-cpu system.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-09-13 15:23:29 -04:00
Alexey Dobriyan 899fc1a4cf ext4: fix #11321: create /proc/ext4/*/stats more carefully
ext4 creates per-suberblock directory in /proc/ext4/ . Name used as
basis is taken from bdevname, which, surprise, can contain slash.

However, proc while allowing to use proc_create("a/b", parent) form of
PDE creation, assumes that parent/a was already created.

bdevname in question is 'cciss/c0d0p9', directory is not created and all
this stuff goes directly into /proc (which is real bug).

Warning comes when _second_ partition is mounted.

http://bugzilla.kernel.org/show_bug.cgi?id=11321

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-09-14 10:21:33 -04:00
Aneesh Kumar K.V 6bc6e63fcd ext4: Add percpu dirty block accounting.
This patch adds dirty block accounting using percpu_counters.  Delayed
allocation block reservation is now done by updating dirty block
counter.  In a later patch we switch to non delalloc mode if the
filesystem free blocks is greater than 150% of total filesystem dirty
blocks

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao<cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-10 09:39:00 -04:00
Aneesh Kumar K.V 030ba6bc67 ext4: Retry block reservation
During block reservation if we don't have enough blocks left, retry
block reservation with smaller block counts.  This makes sure we try
fallocate and DIO with smaller request size and don't fail early.  The
delayed allocation reservation cannot try with smaller block count. So
retry block reservation to handle temporary disk full conditions.  Also
print free blocks details if we fail block allocation during writepages.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-09-08 23:14:50 -04:00
Aneesh Kumar K.V a30d542a00 ext4: Make sure all the block allocation paths reserve blocks
With delayed allocation we need to make sure block are reserved before
we attempt to allocate them. Otherwise we get block allocation failure
(ENOSPC) during writepages which cannot be handled. This would mean
silent data loss (We do a printk stating data will be lost). This patch
updates the DIO and fallocate code path to do block reservation before
block allocation. This is needed to make sure parallel DIO and fallocate
request doesn't take block out of delayed reserve space.

When free blocks count go below a threshold we switch to a slow patch
which looks at other CPU's accumulated percpu counter values.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-09 10:56:23 -04:00
Theodore Ts'o 4776004f54 ext4: Add printk priority levels to clean up checkpatch warnings
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-09-08 23:00:52 -04:00
Aneesh Kumar K.V 5e745b041f ext4: Fix small file fragmentation
For small file block allocations, mballoc uses per cpu prealloc
space.  Use goal block when searching for the right prealloc
space.  Also make sure ext4_da_writepages tries to write
all the pages for small files in single attempt

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-08-18 18:00:57 -04:00
Aneesh Kumar K.V 6be2ded1d7 ext4: Don't allow lg prealloc list to be grow large.
Currently, the locality group prealloc list is freed only when there
is a block allocation failure. This can result in large number of
entries in the preallocation list making ext4_mb_use_preallocated()
expensive.

To fix this, we convert the locality group prealloc list to a hash
list. The hash index is the order of number of blocks in the prealloc
space with a max order of 9. When adding prealloc space to the list we
make sure total entries for each order does not exceed 8. If it is
more than 8 we discard few entries and make sure the we have only <= 5
entries.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-07-23 14:14:05 -04:00
Aneesh Kumar K.V 1320cbcf77 ext4: Convert the usage of NR_CPUS to nr_cpu_ids.
NR_CPUS can be really large. We should be using nr_cpu_ids instead.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-07-23 14:09:26 -04:00
Aneesh Kumar K.V ce89f46cb8 ext4: Improve error handling in mballoc
Don't call BUG_ON on file system failures. Instead use ext4_error and
also handle the continue case properly.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-07-23 14:09:29 -04:00
Eric Sandeen b5f10eed81 ext4: lock block groups when initializing
I noticed when filling a 1T filesystem with 4 threads using the
fs_mark benchmark:

fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0

that I occasionally got checksum mismatch errors:

EXT4-fs error (device sdb): ext4_init_inode_bitmap: Checksum bad for group 6935

etc.  I'd reliably get 4-5 of them during the run.

It appears that the problem is likely a race to init the bg's
when the uninit_bg feature is enabled.

With the patch below, which adds sb_bgl_locking around initialization,
I was able to complete several runs with no errors or warnings.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-08-02 21:21:08 -04:00
Mingming Cao d2a1763791 ext4: delayed allocation ENOSPC handling
This patch does block reservation for delayed
allocation, to avoid ENOSPC later at page flush time.

Blocks(data and metadata) are reserved at da_write_begin()
time, the freeblocks counter is updated by then, and the number of
reserved blocks is store in per inode counter.
        
At the writepage time, the unused reserved meta blocks are returned
back. At unlink/truncate time, reserved blocks are properly released.

Updated fix from  Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
to fix the oldallocator block reservation accounting with delalloc, added
lock to guard the counters and also fix the reservation for meta blocks.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-07-14 17:52:37 -04:00
Frederic Bohe 5f21b0e642 ext4: fix online resize with mballoc
Update group infos when updating a group's descriptor.
Add group infos when adding a group's descriptor.
Refresh cache pages used by mb_alloc when changes occur.
This will probably need modifications when META_BG resizing will be allowed.

Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2008-07-11 19:27:31 -04:00
Mingming Cao 0703143107 ext4: mballoc avoid use root reserved blocks for non root allocation
mballoc allocation missed check for blocks reserved for root users. Add
ext4_has_free_blocks() check before allocation. Also modified
ext4_has_free_blocks() to support multiple block allocation request.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11 19:27:31 -04:00
Aneesh Kumar K.V 654b4908bc ext4: cleanup block allocator
Move the code related to block allocation to a single function and add helper
funtions to differient allocation for data and meta data blocks

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11 19:27:31 -04:00