Commit Graph

349948 Commits

Author SHA1 Message Date
Miao Xie 39847c4d3d Btrfs: fix wrong reservation of csums
We reserve the space for csums only when we write data into a file, in
the other cases, such as tree log, log replay, we don't do reservation,
so we can use the reservation of the transaction handle just for the former.
And for the latter, we should use the tree's own reservation. But the
function - btrfs_csum_file_blocks() didn't differentiate between these
two types of the cases, fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-28 09:51:30 -04:00
Wang Shilong a7975026ff Btrfs: fix double free in the btrfs_qgroup_account_ref()
The function btrfs_find_all_roots is responsible to allocate
memory for 'roots' and free it if errors happen,so the caller should not
free it again since the work has been done.

Besides,'tmp' is allocated after the function btrfs_find_all_roots,
so we can return directly if btrfs_find_all_roots() fails.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-28 09:51:29 -04:00
Josef Bacik fdf30d1c1b Btrfs: limit the global reserve to 512mb
A user reported a problem where he was getting early ENOSPC with hundreds of
gigs of free data space and 6 gigs of free metadata space.  This is because the
global block reserve was taking up the entire free metadata space.  This is
ridiculous, we have infrastructure in place to throttle if we start using too
much of the global reserve, so instead of letting it get this huge just limit it
to 512mb so that users can still get work done.  This allowed the user to
complete his rsync without issues.  Thanks

Cc: stable@vger.kernel.org
Reported-and-tested-by: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-28 09:51:29 -04:00
Josef Bacik db1d607d3c Btrfs: hold the ordered operations mutex when waiting on ordered extents
We need to hold the ordered_operations mutex while waiting on ordered extents
since we splice and run the ordered extents list.  We need to make sure anybody
else who wants to wait on ordered extents does actually wait for them to be
completed.  This will keep us from bailing out of flushing in case somebody is
already waiting on ordered extents to complete.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-28 09:51:28 -04:00
Josef Bacik 6e137ed3f3 Btrfs: fix space accounting for unlink and rename
We are way over-reserving for unlink and rename.  Rename is just some random
huge number and unlink accounts for tree log operations that don't actually
happen during unlink, not to mention the tree log doesn't take from the trans
block rsv anyway so it's completely useless.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-28 09:51:27 -04:00
Josef Bacik f4881bc7a8 Btrfs: fix space leak when we fail to reserve metadata space
Dave reported a warning when running xfstest 275.  We have been leaking delalloc
metadata space when our reservations fail.  This is because we were improperly
calculating how much space to free for our checksum reservations.  The problem
is we would sometimes free up space that had already been freed in another
thread and we would end up with negative usage for the delalloc space.  This
patch fixes the problem by calculating how much space the other threads would
have already freed, and then calculate how much space we need to free had we not
done the reservation at all, and then freeing any excess space.  This makes
xfstests 275 no longer have leaked space.  Thanks

Cc: stable@vger.kernel.org
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-28 09:51:26 -04:00
Jan Schmidt adaa4b8e4d Btrfs: fix EIO from btrfs send in is_extent_unchanged for punched holes
When you take a snapshot, punch a hole where there has been data, then take
another snapshot and try to send an incremental stream, btrfs send would
give you EIO. That is because is_extent_unchanged had no support for holes
being punched. With this patch, instead of returning EIO we just return
0 (== the extent is not unchanged) and we're good.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Cc: Alexander Block <ablock84@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-28 09:51:26 -04:00
Chris Mason 4adaa61102 Btrfs: fix race between mmap writes and compression
Btrfs uses page_mkwrite to ensure stable pages during
crc calculations and mmap workloads.  We call clear_page_dirty_for_io
before we do any crcs, and this forces any application with the file
mapped to wait for the crc to finish before it is allowed to change
the file.

With compression on, the clear_page_dirty_for_io step is happening after
we've compressed the pages.  This means the applications might be
changing the pages while we are compressing them, and some of those
modifications might not hit the disk.

This commit adds the clear_page_dirty_for_io before compression starts
and makes sure to redirty the page if we have to fallback to
uncompressed IO as well.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Reported-by: Alexandre Oliva <oliva@gnu.org>
cc: stable@vger.kernel.org
2013-03-26 13:19:14 -04:00
Tsutomu Itoh 1dd05682b3 Btrfs: fix memory leak in btrfs_create_tree()
We should free leaf and root before returning from the error
handling code.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-21 19:31:52 -04:00
Jan Schmidt d9abbf1c31 Btrfs: fix locking on ROOT_REPLACE operations in tree mod log
To resolve backrefs, ROOT_REPLACE operations in the tree mod log are
required to be tied to at least one KEY_REMOVE_WHILE_FREEING operation.
Therefore, those operations must be enclosed by tree_mod_log_write_lock()
and tree_mod_log_write_unlock() calls.

Those calls are private to the tree_mod_log_* functions, which means that
removal of the elements of an old root node must be logged from
tree_mod_log_insert_root. This partly reverts and corrects commit ba1bfbd5
(Btrfs: fix a tree mod logging issue for root replacement operations).

This fixes the brand-new version of xfstest 276 as of commit cfe73f71.

Cc: stable@vger.kernel.org
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-21 19:31:52 -04:00
Wang Shilong 6113077cd3 Btrfs: fix missing qgroup reservation before fallocating
Steps to reproduce:
	mkfs.btrfs <disk>
	mount <disk> <mnt>
	btrfs quota enable <mnt>
	btrfs sub create <mnt>/subv
	btrfs qgroup limit 10M <mnt>/subv
	fallocate --length 20M <mnt>/subv/data

For the above example, fallocating will return successfully which
is not expected, we try to fix it by doing qgroup reservation before
fallocating.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-21 19:24:32 -04:00
Josef Bacik 835d974fab Btrfs: handle a bogus chunk tree nicely
If you restore a btrfs-image file system and try to mount that file system we'll
panic.  That's because btrfs-image restores and just makes one big chunk to
envelope the whole disk, since they are really only meant to be messed with by
our btrfs-progs.  So fix up btrfs_rmap_block and the callers of it for mount so
that we no longer panic but instead just return an error and fail to mount.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-21 19:24:31 -04:00
Liu Bo d763448286 Btrfs: update to use fs_state bit
Now that we use bit operation to check fs_state, update
btrfs_free_fs_root()'s checker, otherwise we get back to
memory leak case.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-21 19:24:31 -04:00
Liu Bo 3b2775942d Btrfs: fix warning of free_extent_map
Users report that an extent map's list is still linked when it's actually
going to be freed from cache.

The story is that

a) when we're going to drop an extent map and may split this large one into
smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means
that it's on the list to be logged, then the smaller ems split from it will also
be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected.

b) we'll keep ems from unlinking the list and freeing when they are flagged with
EXTENT_FLAG_LOGGING, because the log code holds one reference.

The end result is the warning, but the truth is that we set the flag
EXTENT_FLAG_LOGGING only during fsync.

So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one.

Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-15 21:51:49 -04:00
Liu Bo 7c2ec3f073 Btrfs: fix warning when creating snapshots
Creating snapshot passes extent_root to commit its transaction,
but it can lead to the warning of checking root for quota in
the __btrfs_end_transaction() when someone else is committing
the current transaction.  Since we've recorded the needed root
in trans_handle, just use it to get rid of the warning.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 14:57:30 -04:00
Wang Shilong 720f1e2060 Btrfs: return as soon as possible when edquot happens
If one of qgroup fails to reserve firstly, we should return immediately,
it is unnecessary to continue check.

Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 14:57:29 -04:00
Josef Bacik 492104c866 Btrfs: return EIO if we have extent tree corruption
The callers of lookup_inline_extent_info all handle getting an error back
properly, so return an error if we have corruption instead of being a jerk and
panicing.  Still WARN_ON() since this is kind of crucial and I've been seeing it
a bit too much recently for my taste, I think we're doing something wrong
somewhere.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 14:57:29 -04:00
Eric Sandeen bc178622d4 btrfs: use rcu_barrier() to wait for bdev puts at unmount
Doing this would reliably fail with -EBUSY for me:

# mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
...
unable to open /dev/sdb2: Device or resource busy

because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.

Using systemtap to track bdev gets & puts shows a kworker thread doing a
blkdev put after mkfs attempts a get; this is left over from the unmount
path:

btrfs_close_devices
	__btrfs_close_devices
		call_rcu(&device->rcu, free_device);
			free_device
				INIT_WORK(&device->rcu_work, __free_device);
				schedule_work(&device->rcu_work);

so unmount might complete before __free_device fires & does its blkdev_put.

Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait
until all blkdev_put()s are done, and the device is truly free once
unmount completes.

Cc: stable@vger.kernel.org
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 14:57:29 -04:00
Liu Bo d340d2475c Btrfs: remove btrfs_try_spin_lock
Remove a useless function declaration

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 14:57:10 -04:00
Liu Bo a09a0a705d Btrfs: get better concurrency for snapshot-aware defrag work
Using spinning case instead of blocking will result in better concurrency
overall.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-14 14:50:19 -04:00
Chris Mason de3cb945db Btrfs: improve the delayed inode throttling
The delayed inode code batches up changes to the btree in hopes of doing
them in bulk.  As the changes build up, processes kick off worker
threads and wait for them to make progress.

The current code kicks off an async work queue item for each delayed
node, which creates a lot of churn.  It also uses a fixed 1 HZ waiting
period for the throttle, which allows us to build a lot of pending
work and can slow down the commit.

This changes us to watch a sequence counter as it is bumped during the
operations.  We kick off fewer work items and have each work item do
more work.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-07 07:52:40 -05:00
Ilya Dryomov 3a01aa7a25 Btrfs: fix a mismerge in btrfs_balance()
Raid56 merge (merge commit e942f88) had mistakenly removed a call to
__cancel_balance(), which resulted in balance not cleaning up after itself
after a successful finish.  (Cleanup includes switching the state, removing
the balance item and releasing mut_ex_op testnset lock.)  Bring it back.

Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-03-06 22:03:16 -05:00
Chris Mason 2cc65e3e57 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next into for-linus-3.9 2013-03-06 19:46:29 -05:00
Chris Mason 154ea28930 Btrfs: enforce min_bytes parameter during extent allocation
Commit 24542bf7ea changed preallocation of
extents to cap the max size we try to allocate.  It's a valid change,
but the extent reservation code is also used by balance, and that
can't tolerate a smaller extent being allocated.

__btrfs_prealloc_file_range already has a min_size parameter, which is
used by relocation to request a specific extent size.  This commit
adds an extra check to enforce that minimum extent size.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Reported-by: Stefan Behrens <sbehrens@giantdisaster.de>
2013-03-05 11:30:16 -05:00
Stefan Behrens 9b53157aac Btrfs: allow running defrag in parallel to administrative tasks
Commit 5ac00add added a testnset mutex and code that disallows
running administrative tasks in parallel. It is prevented that
the device add/delete/balance/replace/resize operations are
started in parallel. By mistake, the defragmentation operation
was included in the check for mutually exclusiveness as well.
This is fixed with this commit.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-03-04 16:33:24 -05:00