This patch contains following things.
1) Limit the max size of btrfs_ordered_sum structure to PAGE_SIZE. This
struct is kmalloced so we want to keep it reasonable.
2) Replace copy_extent_csums by btrfs_lookup_csums_range. This was
duplicated code in tree-log.c
3) Remove replay_one_csum. csum items are replayed at the same time as
replaying file extents. This guarantees we only replay useful csums.
4) nbytes accounting fix.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
This is a patch to fix discard semantic to make Btrfs work with FTL and SSD.
We can improve FTL's performance by telling it which sectors are freed by file
system. But if we don't tell FTL the information of free sectors in proper
time, the transaction mechanism of Btrfs will be destroyed and Btrfs could not
roll back the previous transaction under the power loss condition.
There are some problems in the old implementation:
1, In __free_extent(), the pinned down extents should not be discarded.
2, In free_extents(), the free extents are all pinned, so they need to
be discarded in transaction committing time instead of free_extents().
3, The reserved extent used by log tree should be discard too.
This patch change discard behavior as follows:
1, For the extents which need to be free at once,
we discard them in update_block_group().
2, Delay discarding the pinned extent in btrfs_finish_extent_commit()
when committing transaction.
3, Remove discarding from free_extents() and __free_extent()
4, Add discard interface into btrfs_free_reserved_extent()
5, Discard sectors before updating the free space cache, otherwise,
FTL will destroy file system data.
There is a race in relocate_inode_pages, it happens when
find_delalloc_range finds the delalloc extent before the
boundary bit is set. Thank you,
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
This adds the missing block accounting code to finish_current_insert and makes
block accounting for root item properly protected by the delalloc spin lock.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Btrfs maintains a cache of blocks available for allocation in ram. The
code that frees extents was marking the extents free and then deleting
the checksum items.
This meant it was possible the extent would be reallocated before the
checksum item was actually deleted, leading to races and other
problems as the checksums were updated for the newly allocated extent.
The fix is to delete the checksum before marking the extent free.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The delalloc lock doesn't need to have irqs disabled, nobody that
changes the number of delalloc bytes in the FS is running with irqs off.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Checksums on data can be disabled by mount option, so it's
possible some data extents don't have checksums or have
invalid checksums. This causes trouble for data relocation.
This patch contains following things to make data relocation
work.
1) make nodatasum/nodatacow mount option only affects new
files. Checksums and COW on data are only controlled by the
inode flags.
2) check the existence of checksum in the nodatacow checker.
If checksums exist, force COW the data extent. This ensure that
checksum for a given block is either valid or does not exist.
3) update data relocation code to properly handle the case
of checksum missing.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
This patch makes seed device possible to be shared by
multiple mounted file systems. The sharing is achieved
by cloning seed device's btrfs_fs_devices structure.
Thanks you,
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
The block group structs are referenced in many different
places, and it's not safe to free while balancing. So, those block
group structs were simply leaked instead.
This patch replaces the block group pointer in the inode with the starting byte
offset of the block group and adds reference counting to the block group
struct.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
This finishes off the new checksumming code by removing csum items
for extents that are no longer in use.
The trick is doing it without racing because a single csum item may
hold csums for more than one extent. Extra checks are added to
btrfs_csum_file_blocks to make sure that we are using the correct
csum item after dropping locks.
A new btrfs_split_item is added to split a single csum item so it
can be split without dropping the leaf lock. This is used to
remove csum bytes from the middle of an item.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch implements superblock duplication. Superblocks
are stored at offset 16K, 64M and 256G on every devices.
Spaces used by superblocks are preserved by the allocator,
which uses a reverse mapping function to find the logical
addresses that correspond to superblocks. Thank you,
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Shut up various sparse warnings about symbols that should be either
static or have their declarations in scope.
Signed-off-by: Christoph Hellwig <hch@lst.de>
This the lockdep complaint by having a different mutex to gaurd caching the
block group, so you don't end up with this backwards dependancy. Thank you,
Signed-off-by: Josef Bacik <jbacik@redhat.com>
The btrfs git kernel trees is used to build a standalone tree for
compiling against older kernels. This commit makes the standalone tree
work with 2.6.27
Signed-off-by: Chris Mason <chris.mason@oracle.com>
* open/close_bdev_excl -> open/close_bdev_exclusive
* blkdev_issue_discard takes a GFP mask now
* Fix blkdev_issue_discard usage now that it is enabled
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch fixes what I hope is the last early ENOSPC bug left. I did not know
that pinned extents would merge into one big extent when inserted on to the
pinned extent tree, so I was adding free space to a block group that could
possibly span multiple block groups.
This is a big issue because first that space doesn't exist in that block group,
and second we won't actually use that space because there are a bunch of other
checks to make sure we're allocating within the constraints of the block group.
This patch fixes the problem by adding the btrfs_add_free_space to
btrfs_update_pinned_extents which makes sure we are adding the appropriate
amount of free space to the appropriate block group. Thanks much to Lee Trager
for running my myriad of debug patches to help me track this problem down.
Thank you,
Signed-off-by: Josef Bacik <jbacik@redhat.com>
In insert_extents(), when ret==1 and last is not zero, it should
check if the current inserted item is the last item in this batching
inserts. If so, it should just break from loop. If not, 'cur =
insert_list->next' will make no sense because the list is empty now,
and 'op' will point to an unexpectable place.
There are also some trivial fixs in this patch including one comment
typo error and deleting two redundant lines.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
In my batch delete/update/insert patch I introduced a free space leak. The
extent that we do the original search on in free_extents is never pinned, so we
always update the block saying that it has free space, but the free space never
actually gets added to the free space tree, since op->del will always be 0 and
it's never actually added to the pinned extents tree.
This patch fixes this problem by making sure we call pin_down_bytes on the
pending extent op and set op->del to the return value of pin_down_bytes so
update_block_group is called with the right value. This seems to fix the case
where we were getting ENOSPC when there was plenty of space available.
Signed-off-by: Josef Bacik <jbacik@redhat.com>