Merge branch 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs updates from Chris Mason:
 "We have a good sized cleanup of our internal read ahead code, and the
  first series of commits from Chandan to enable PAGE_SIZE > sectorsize

  Otherwise, it's a normal series of cleanups and fixes, with many
  thanks to Dave Sterba for doing most of the patch wrangling this time"

* 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (82 commits)
  btrfs: make sure we stay inside the bvec during __btrfs_lookup_bio_sums
  btrfs: Fix misspellings in comments.
  btrfs: Print Warning only if ENOSPC_DEBUG is enabled
  btrfs: scrub: silence an uninitialized variable warning
  btrfs: move btrfs_compression_type to compression.h
  btrfs: rename btrfs_print_info to btrfs_print_mod_info
  Btrfs: Show a warning message if one of objectid reaches its highest value
  Documentation: btrfs: remove usage specific information
  btrfs: use kbasename in btrfsic_mount
  Btrfs: do not collect ordered extents when logging that inode exists
  Btrfs: fix race when checking if we can skip fsync'ing an inode
  Btrfs: fix listxattrs not listing all xattrs packed in the same item
  Btrfs: fix deadlock between direct IO reads and buffered writes
  Btrfs: fix extent_same allowing destination offset beyond i_size
  Btrfs: fix file loss on log replay after renaming a file and fsync
  Btrfs: fix unreplayable log after snapshot delete + parent dir fsync
  Btrfs: fix lockdep deadlock warning due to dev_replace
  btrfs: drop unused argument in btrfs_ioctl_get_supported_features
  btrfs: add GET_SUPPORTED_FEATURES to the control device ioctls
  btrfs: change max_inline default to 2048
  ...
Committed by Linus Torvalds on 2016-03-21 18:12:42 -07:00
36 changed files with 1102 additions and 931 deletions
+11 -250
@@ -1,20 +1,10 @@
BTRFS
=====
Btrfs is a copy on write filesystem for Linux aimed at
implementing advanced features while focusing on fault tolerance,
repair and easy administration. Initially developed by Oracle, Btrfs
is licensed under the GPL and open for contribution from anyone.
Linux has a wealth of filesystems to choose from, but we are facing a
number of challenges with scaling to the large storage subsystems that
are becoming common in today's data centers. Filesystems need to scale
in their ability to address and manage large storage, and also in
their ability to detect, repair and tolerate errors in the data stored
on disk. Btrfs is under heavy development, and is not suitable for
any uses other than benchmarking and review. The Btrfs disk format is
not yet finalized.
Btrfs is a copy on write filesystem for Linux aimed at implementing advanced
features while focusing on fault tolerance, repair and easy administration.
Jointly developed by several companies, licensed under the GPL and open for
contribution from anyone.
The main Btrfs features include:
@@ -28,243 +18,14 @@ The main Btrfs features include:
* Checksums on data and metadata (multiple algorithms available)
* Compression
* Integrated multiple device support, with several raid algorithms
* Online filesystem check (not yet implemented)
* Very fast offline filesystem check
* Efficient incremental backup and FS mirroring (not yet implemented)
* Offline filesystem check
* Efficient incremental backup and FS mirroring
* Online filesystem defragmentation
For more information please refer to the wiki
Mount Options
=============
https://btrfs.wiki.kernel.org
When mounting a btrfs filesystem, the following options are accepted.
Options with (*) are default options and will not show in the mount options.
alloc_start=<bytes>
Debugging option to force all block allocations above a certain
byte threshold on each block device. The value is specified in
bytes, optionally with a K, M, or G suffix, case insensitive.
Default is 1MB.
noautodefrag(*)
autodefrag
Disable/enable auto defragmentation.
Auto defragmentation detects small random writes into files and queues
them up for the defrag process. Works best for small files; not well
suited for large database workloads.
check_int
check_int_data
check_int_print_mask=<value>
These debugging options control the behavior of the integrity checking
module (requires the BTRFS_FS_CHECK_INTEGRITY config option).
check_int enables the integrity checker module, which examines all
block write requests to ensure on-disk consistency, at a large
memory and CPU cost.
check_int_data includes extent data in the integrity checks, and
implies the check_int option.
check_int_print_mask takes a bitmask of BTRFSIC_PRINT_MASK_* values
as defined in fs/btrfs/check-integrity.c, to control the integrity
checker module behavior.
See comments at the top of fs/btrfs/check-integrity.c for more info.
commit=<seconds>
Set the interval of periodic commit, 30 seconds by default. Higher
values defer data being synced to permanent storage with obvious
consequences when the system crashes. The upper bound is not forced,
but a warning is printed if it's more than 300 seconds (5 minutes).
compress
compress=<type>
compress-force
compress-force=<type>
Control BTRFS file data compression. Type may be specified as "zlib"
"lzo" or "no" (for no compression, used for remounting). If no type
is specified, zlib is used. If compress-force is specified,
all files will be compressed, whether or not they compress well.
If compression is enabled, nodatacow and nodatasum are disabled.
degraded
Allow mounts to continue with missing devices. A read-write mount may
fail with too many devices missing, for example if a stripe member
is completely missing.
device=<devicepath>
Specify a device during mount so that ioctls on the control device
can be avoided. Especially useful when trying to mount a multi-device
setup as root. May be specified multiple times for multiple devices.
nodiscard(*)
discard
Disable/enable discard mount option.
Discard issues frequent commands to let the block device reclaim space
freed by the filesystem.
This is useful for SSD devices, thinly provisioned
LUNs and virtual machine images, but may have a significant
performance impact. (The fstrim command is also available to
initiate batch trims from userspace).
noenospc_debug(*)
enospc_debug
Disable/enable debugging option to be more verbose in some ENOSPC conditions.
fatal_errors=<action>
Action to take when encountering a fatal error:
"bug" - BUG() on a fatal error. This is the default.
"panic" - panic() on a fatal error.
noflushoncommit(*)
flushoncommit
The 'flushoncommit' mount option forces any data dirtied by a write in a
prior transaction to commit as part of the current commit. This makes
the committed state a fully consistent view of the file system from the
application's perspective (i.e., it includes all completed file system
operations). This was previously the behavior only when a snapshot is
created.
inode_cache
Enable free inode number caching. Defaults to off due to an overflow
problem when the free space crcs don't fit inside a single page.
max_inline=<bytes>
Specify the maximum amount of space, in bytes, that can be inlined in
a metadata B-tree leaf. The value is specified in bytes, optionally
with a K, M, or G suffix, case insensitive. In practice, this value
is limited by the root sector size, with some space unavailable due
to leaf headers. For a 4k sector size, max inline data is ~3900 bytes.
metadata_ratio=<value>
Specify that 1 metadata chunk should be allocated after every <value>
data chunks. Off by default.
acl(*)
noacl
Enable/disable support for Posix Access Control Lists (ACLs). See the
acl(5) manual page for more information about ACLs.
barrier(*)
nobarrier
Enable/disable the use of block layer write barriers. Write barriers
ensure that certain IOs make it through the device cache and are on
persistent storage. If disabled on a device with a volatile
(non-battery-backed) write-back cache, the nobarrier option will lead to
filesystem corruption on a system crash or power loss.
datacow(*)
nodatacow
Enable/disable data copy-on-write for newly created files.
Nodatacow implies nodatasum, and disables all compression.
datasum(*)
nodatasum
Enable/disable data checksumming for newly created files.
Datasum implies datacow.
treelog(*)
notreelog
Enable/disable the tree logging used for fsync and O_SYNC writes.
recovery
Enable autorecovery attempts if a bad tree root is found at mount time.
Currently this scans a list of several previous tree roots and tries to
use the first readable.
rescan_uuid_tree
Force check and rebuild procedure of the UUID tree. This should not
normally be needed.
skip_balance
Skip automatic resume of interrupted balance operation after mount.
May be resumed with "btrfs balance resume."
space_cache (*)
Enable the on-disk freespace cache.
nospace_cache
Disable freespace cache loading without clearing the cache.
clear_cache
Force clearing and rebuilding of the disk space cache if something
has gone wrong.
ssd
nossd
ssd_spread
Options to control ssd allocation schemes. By default, BTRFS will
enable or disable ssd allocation heuristics depending on whether a
rotational or non-rotational disk is in use. The ssd and nossd options
can override this autodetection.
The ssd_spread mount option attempts to allocate into big chunks
of unused space, and may perform better on low-end ssds. ssd_spread
implies ssd, enabling all other ssd heuristics as well.
subvol=<path>
Mount subvolume at <path> rather than the root subvolume. <path> is
relative to the top level subvolume.
subvolid=<ID>
Mount subvolume specified by an ID number rather than the root subvolume.
This allows mounting of subvolumes which are not in the root of the mounted
filesystem.
You can use "btrfs subvolume list" to see subvolume ID numbers.
subvolrootid=<objectid> (deprecated)
Mount subvolume specified by <objectid> rather than the root subvolume.
This allows mounting of subvolumes which are not in the root of the mounted
filesystem.
You can use "btrfs subvolume show " to see the object ID for a subvolume.
thread_pool=<number>
The number of worker threads to allocate. The default number is equal
to the number of CPUs + 2, or 8, whichever is smaller.
user_subvol_rm_allowed
Allow subvolumes to be deleted by a non-root user. Use with caution.
MAILING LIST
============
There is a Btrfs mailing list hosted on vger.kernel.org. You can
find details on how to subscribe here:
http://vger.kernel.org/vger-lists.html#linux-btrfs
Mailing list archives are available from gmane:
http://dir.gmane.org/gmane.comp.file-systems.btrfs
IRC
===
Discussion of Btrfs also occurs on the #btrfs channel of the Freenode
IRC network.
UTILITIES
=========
Userspace tools for creating and manipulating Btrfs file systems are
available from the git repository at the following location:
http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
These include the following tools:
* mkfs.btrfs: create a filesystem
* btrfs: a single tool to manage the filesystems, refer to the manpage for more details
* 'btrfsck' or 'btrfs check': do a consistency check of the filesystem
Other tools for specific tasks:
* btrfs-convert: in-place conversion from ext2/3/4 filesystems
* btrfs-image: dump filesystem metadata for debugging
that maintains information about administration tasks, frequently asked
questions, use cases, mount options, comprehensible changelogs, features,
manual pages, source code repositories, contacts etc.
+4 -8
@@ -148,8 +148,7 @@ int __init btrfs_prelim_ref_init(void)
void btrfs_prelim_ref_exit(void)
{
if (btrfs_prelim_ref_cache)
kmem_cache_destroy(btrfs_prelim_ref_cache);
kmem_cache_destroy(btrfs_prelim_ref_cache);
}
/*
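Several hunks in this pull drop the NULL guard before kmem_cache_destroy().
A minimal sketch of the resulting idiom, assuming a module-lifetime cache
(kmem_cache_destroy() itself returns early on a NULL pointer, so the
call-site check is redundant):

	static struct kmem_cache *example_cache;	/* hypothetical cache */

	void example_exit(void)
	{
		/* safe even if example_cache was never created */
		kmem_cache_destroy(example_cache);
	}
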
@@ -566,17 +565,14 @@ static void __merge_refs(struct list_head *head, int mode)
struct __prelim_ref *pos2 = pos1, *tmp;
list_for_each_entry_safe_continue(pos2, tmp, head, list) {
struct __prelim_ref *xchg, *ref1 = pos1, *ref2 = pos2;
struct __prelim_ref *ref1 = pos1, *ref2 = pos2;
struct extent_inode_elem *eie;
if (!ref_for_same_block(ref1, ref2))
continue;
if (mode == 1) {
if (!ref1->parent && ref2->parent) {
xchg = ref1;
ref1 = ref2;
ref2 = xchg;
}
if (!ref1->parent && ref2->parent)
swap(ref1, ref2);
} else {
if (ref1->parent != ref2->parent)
continue;
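The removed three-variable exchange is replaced by the kernel's swap()
helper, which include/linux/kernel.h defines (in this era) essentially as:

	#define swap(a, b) \
		do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)
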
+5 -7
@@ -95,6 +95,7 @@
#include <linux/genhd.h>
#include <linux/blkdev.h>
#include <linux/vmalloc.h>
#include <linux/string.h>
#include "ctree.h"
#include "disk-io.h"
#include "hash.h"
@@ -105,6 +106,7 @@
#include "locking.h"
#include "check-integrity.h"
#include "rcu-string.h"
#include "compression.h"
#define BTRFSIC_BLOCK_HASHTABLE_SIZE 0x10000
#define BTRFSIC_BLOCK_LINK_HASHTABLE_SIZE 0x10000
@@ -176,7 +178,7 @@ struct btrfsic_block {
* Elements of this type are allocated dynamically and required because
* each block object can refer to and can be ref from multiple blocks.
* The key to lookup them in the hashtable is the dev_bytenr of
* the block ref to plus the one from the block refered from.
* the block ref to plus the one from the block referred from.
* The fact that they are searchable via a hashtable and that a
* ref_cnt is maintained is not required for the btrfs integrity
* check algorithm itself, it is only used to make the output more
@@ -3076,7 +3078,7 @@ int btrfsic_mount(struct btrfs_root *root,
list_for_each_entry(device, dev_head, dev_list) {
struct btrfsic_dev_state *ds;
char *p;
const char *p;
if (!device->bdev || !device->name)
continue;
@@ -3092,11 +3094,7 @@ int btrfsic_mount(struct btrfs_root *root,
ds->state = state;
bdevname(ds->bdev, ds->name);
ds->name[BDEVNAME_SIZE - 1] = '\0';
for (p = ds->name; *p != '\0'; p++);
while (p > ds->name && *p != '/')
p--;
if (*p == '/')
p++;
p = kbasename(ds->name);
strlcpy(ds->name, p, sizeof(ds->name));
btrfsic_dev_state_hashtable_add(ds,
&btrfsic_dev_state_hashtable);
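kbasename() from include/linux/string.h performs the same final-path-component
scan the removed loop did by hand; its implementation is essentially:

	static inline const char *kbasename(const char *path)
	{
		const char *tail = strrchr(path, '/');
		return tail ? tail + 1 : path;
	}
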
+9
@@ -48,6 +48,15 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
void btrfs_clear_biovec_end(struct bio_vec *bvec, int vcnt,
unsigned long pg_index,
unsigned long pg_offset);
enum btrfs_compression_type {
BTRFS_COMPRESS_NONE = 0,
BTRFS_COMPRESS_ZLIB = 1,
BTRFS_COMPRESS_LZO = 2,
BTRFS_COMPRESS_TYPES = 2,
BTRFS_COMPRESS_LAST = 3,
};
struct btrfs_compress_op {
struct list_head *(*alloc_workspace)(void);
+18 -18
@@ -311,7 +311,7 @@ struct tree_mod_root {
struct tree_mod_elem {
struct rb_node node;
u64 index; /* shifted logical */
u64 logical;
u64 seq;
enum mod_log_op op;
@@ -435,11 +435,11 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
/*
* key order of the log:
* index -> sequence
* node/leaf start address -> sequence
*
* the index is the shifted logical of the *new* root node for root replace
* operations, or the shifted logical of the affected block for all other
* operations.
* The 'start address' is the logical address of the *new* root node
* for root replace operations, or the logical address of the affected
* block for all other operations.
*
* Note: must be called with write lock (tree_mod_log_write_lock).
*/
@@ -460,9 +460,9 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
while (*new) {
cur = container_of(*new, struct tree_mod_elem, node);
parent = *new;
if (cur->index < tm->index)
if (cur->logical < tm->logical)
new = &((*new)->rb_left);
else if (cur->index > tm->index)
else if (cur->logical > tm->logical)
new = &((*new)->rb_right);
else if (cur->seq < tm->seq)
new = &((*new)->rb_left);
@@ -523,7 +523,7 @@ alloc_tree_mod_elem(struct extent_buffer *eb, int slot,
if (!tm)
return NULL;
tm->index = eb->start >> PAGE_CACHE_SHIFT;
tm->logical = eb->start;
if (op != MOD_LOG_KEY_ADD) {
btrfs_node_key(eb, &tm->key, slot);
tm->blockptr = btrfs_node_blockptr(eb, slot);
@@ -588,7 +588,7 @@ tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
goto free_tms;
}
tm->index = eb->start >> PAGE_CACHE_SHIFT;
tm->logical = eb->start;
tm->slot = src_slot;
tm->move.dst_slot = dst_slot;
tm->move.nr_items = nr_items;
@@ -699,7 +699,7 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
goto free_tms;
}
tm->index = new_root->start >> PAGE_CACHE_SHIFT;
tm->logical = new_root->start;
tm->old_root.logical = old_root->start;
tm->old_root.level = btrfs_header_level(old_root);
tm->generation = btrfs_header_generation(old_root);
@@ -739,16 +739,15 @@ __tree_mod_log_search(struct btrfs_fs_info *fs_info, u64 start, u64 min_seq,
struct rb_node *node;
struct tree_mod_elem *cur = NULL;
struct tree_mod_elem *found = NULL;
u64 index = start >> PAGE_CACHE_SHIFT;
tree_mod_log_read_lock(fs_info);
tm_root = &fs_info->tree_mod_log;
node = tm_root->rb_node;
while (node) {
cur = container_of(node, struct tree_mod_elem, node);
if (cur->index < index) {
if (cur->logical < start) {
node = node->rb_left;
} else if (cur->index > index) {
} else if (cur->logical > start) {
node = node->rb_right;
} else if (cur->seq < min_seq) {
node = node->rb_left;
@@ -1230,9 +1229,10 @@ __tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info,
return NULL;
/*
* the very last operation that's logged for a root is the replacement
* operation (if it is replaced at all). this has the index of the *new*
* root, making it the very first operation that's logged for this root.
* the very last operation that's logged for a root is the
* replacement operation (if it is replaced at all). this has
* the logical address of the *new* root, making it the very
* first operation that's logged for this root.
*/
while (1) {
tm = tree_mod_log_search_oldest(fs_info, root_logical,
@@ -1336,7 +1336,7 @@ __tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
if (!next)
break;
tm = container_of(next, struct tree_mod_elem, node);
if (tm->index != first_tm->index)
if (tm->logical != first_tm->logical)
break;
}
tree_mod_log_read_unlock(fs_info);
@@ -5361,7 +5361,7 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
goto out;
}
tmp_buf = kmalloc(left_root->nodesize, GFP_NOFS);
tmp_buf = kmalloc(left_root->nodesize, GFP_KERNEL);
if (!tmp_buf) {
ret = -ENOMEM;
goto out;
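A sketch of why the index -> logical switch matters for the
PAGE_SIZE > sectorsize series mentioned in the pull message (the assumed
rationale; the numbers below are illustrative): a page-shifted key cannot
tell apart two tree blocks that share one page, while the raw logical
address can.

	/* 64K pages (PAGE_CACHE_SHIFT == 16) with a 16K nodesize,
	 * e.g. ppc64: two tree blocks live in the same 64K page. */
	u64 a = 16384, b = 32768;
	/* old key: a >> 16 == 0 and b >> 16 == 0 -- the entries collide */
	/* new key: 16384 vs 32768 -- each block is unambiguous          */
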
+61 -26
@@ -100,6 +100,9 @@ struct btrfs_ordered_sum;
/* tracks free space in block groups. */
#define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL
/* device stats in the device tree */
#define BTRFS_DEV_STATS_OBJECTID 0ULL
/* for storing balance parameters in the root tree */
#define BTRFS_BALANCE_OBJECTID -4ULL
@@ -715,14 +718,6 @@ struct btrfs_timespec {
__le32 nsec;
} __attribute__ ((__packed__));
enum btrfs_compression_type {
BTRFS_COMPRESS_NONE = 0,
BTRFS_COMPRESS_ZLIB = 1,
BTRFS_COMPRESS_LZO = 2,
BTRFS_COMPRESS_TYPES = 2,
BTRFS_COMPRESS_LAST = 3,
};
struct btrfs_inode_item {
/* nfs style generation number */
__le64 generation;
@@ -793,7 +788,7 @@ struct btrfs_root_item {
/*
* This generation number is used to test if the new fields are valid
* and up to date while reading the root item. Everytime the root item
* and up to date while reading the root item. Every time the root item
* is written out, the "generation" field is copied into this field. If
* anyone ever mounted the fs with an older kernel, we will have
* mismatching generation values here and thus must invalidate the
@@ -1002,8 +997,10 @@ struct btrfs_dev_replace {
pid_t lock_owner;
atomic_t nesting_level;
struct mutex lock_finishing_cancel_unmount;
struct mutex lock_management_lock;
struct mutex lock;
rwlock_t lock;
atomic_t read_locks;
atomic_t blocking_readers;
wait_queue_head_t read_lock_wq;
struct btrfs_scrub_progress scrub_progress;
};
@@ -1222,10 +1219,10 @@ struct btrfs_space_info {
* we've called update_block_group and dropped the bytes_used counter
* and increased the bytes_pinned counter. However this means that
* bytes_pinned does not reflect the bytes that will be pinned once the
* delayed refs are flushed, so this counter is inc'ed everytime we call
* btrfs_free_extent so it is a realtime count of what will be freed
* once the transaction is committed. It will be zero'ed everytime the
* transaction commits.
* delayed refs are flushed, so this counter is inc'ed every time we
* call btrfs_free_extent so it is a realtime count of what will be
* freed once the transaction is committed. It will be zero'ed every
* time the transaction commits.
*/
struct percpu_counter total_bytes_pinned;
@@ -1822,6 +1819,9 @@ struct btrfs_fs_info {
spinlock_t reada_lock;
struct radix_tree_root reada_tree;
/* readahead works cnt */
atomic_t reada_works_cnt;
/* Extent buffer radix tree */
spinlock_t buffer_lock;
struct radix_tree_root buffer_radix;
@@ -2185,13 +2185,43 @@ struct btrfs_ioctl_defrag_range_args {
*/
#define BTRFS_QGROUP_RELATION_KEY 246
/*
* Obsolete name, see BTRFS_TEMPORARY_ITEM_KEY.
*/
#define BTRFS_BALANCE_ITEM_KEY 248
/*
* Persistantly stores the io stats in the device tree.
* One key for all stats, (0, BTRFS_DEV_STATS_KEY, devid).
* The key type for tree items that are stored persistently, but do not need to
* exist for extended period of time. The items can exist in any tree.
*
* [subtype, BTRFS_TEMPORARY_ITEM_KEY, data]
*
* Existing items:
*
* - balance status item
* (BTRFS_BALANCE_OBJECTID, BTRFS_TEMPORARY_ITEM_KEY, 0)
*/
#define BTRFS_DEV_STATS_KEY 249
#define BTRFS_TEMPORARY_ITEM_KEY 248
/*
* Obsolete name, see BTRFS_PERSISTENT_ITEM_KEY
*/
#define BTRFS_DEV_STATS_KEY 249
/*
* The key type for tree items that are stored persistently and usually exist
* for a long period, eg. filesystem lifetime. The item kinds can be status
* information, stats or preference values. The item can exist in any tree.
*
* [subtype, BTRFS_PERSISTENT_ITEM_KEY, data]
*
* Existing items:
*
* - device statistics, store IO stats in the device tree, one key for all
* stats
* (BTRFS_DEV_STATS_OBJECTID, BTRFS_DEV_STATS_KEY, 0)
*/
#define BTRFS_PERSISTENT_ITEM_KEY 249
/*
* Persistantly stores the device replace state in the device tree.
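As a concrete reading of the new scheme, the device-stats item described
above is addressed by a key like this (a sketch; the values are taken from
the comments in the hunk):

	struct btrfs_key key;

	key.objectid = BTRFS_DEV_STATS_OBJECTID;	/* 0 */
	key.type     = BTRFS_PERSISTENT_ITEM_KEY;	/* 249, formerly BTRFS_DEV_STATS_KEY */
	key.offset   = 0;
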
@@ -2241,7 +2271,7 @@ struct btrfs_ioctl_defrag_range_args {
#define BTRFS_MOUNT_ENOSPC_DEBUG (1 << 15)
#define BTRFS_MOUNT_AUTO_DEFRAG (1 << 16)
#define BTRFS_MOUNT_INODE_MAP_CACHE (1 << 17)
#define BTRFS_MOUNT_RECOVERY (1 << 18)
#define BTRFS_MOUNT_USEBACKUPROOT (1 << 18)
#define BTRFS_MOUNT_SKIP_BALANCE (1 << 19)
#define BTRFS_MOUNT_CHECK_INTEGRITY (1 << 20)
#define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
@@ -2250,9 +2280,10 @@ struct btrfs_ioctl_defrag_range_args {
#define BTRFS_MOUNT_FRAGMENT_DATA (1 << 24)
#define BTRFS_MOUNT_FRAGMENT_METADATA (1 << 25)
#define BTRFS_MOUNT_FREE_SPACE_TREE (1 << 26)
#define BTRFS_MOUNT_NOLOGREPLAY (1 << 27)
#define BTRFS_DEFAULT_COMMIT_INTERVAL (30)
#define BTRFS_DEFAULT_MAX_INLINE (8192)
#define BTRFS_DEFAULT_MAX_INLINE (2048)
#define btrfs_clear_opt(o, opt) ((o) &= ~BTRFS_MOUNT_##opt)
#define btrfs_set_opt(o, opt) ((o) |= BTRFS_MOUNT_##opt)
@@ -2353,6 +2384,9 @@ struct btrfs_map_token {
unsigned long offset;
};
#define BTRFS_BYTES_TO_BLKS(fs_info, bytes) \
((bytes) >> (fs_info)->sb->s_blocksize_bits)
static inline void btrfs_init_map_token (struct btrfs_map_token *token)
{
token->kaddr = NULL;
@@ -3448,8 +3482,7 @@ u64 btrfs_csum_bytes_to_leaves(struct btrfs_root *root, u64 csum_bytes);
static inline u64 btrfs_calc_trans_metadata_size(struct btrfs_root *root,
unsigned num_items)
{
return (root->nodesize + root->nodesize * (BTRFS_MAX_LEVEL - 1)) *
2 * num_items;
return root->nodesize * BTRFS_MAX_LEVEL * 2 * num_items;
}
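The btrfs_calc_trans_metadata_size() rewrite is pure algebra rather than a
behavior change:

	/*
	 *   (nodesize + nodesize * (BTRFS_MAX_LEVEL - 1)) * 2 * num_items
	 * = nodesize * BTRFS_MAX_LEVEL * 2 * num_items
	 *
	 * e.g. nodesize = 16384, BTRFS_MAX_LEVEL = 8, num_items = 1:
	 *   (16384 + 16384 * 7) * 2 = 262144 = 16384 * 8 * 2
	 */
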
/*
@@ -4027,7 +4060,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct inode *dir, u64 objectid,
const char *name, int name_len);
int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len,
int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
int front);
int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
@@ -4089,6 +4122,7 @@ void btrfs_test_inode_set_ops(struct inode *inode);
/* ioctl.c */
long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
int btrfs_ioctl_get_supported_features(void __user *arg);
void btrfs_update_iflags(struct inode *inode);
void btrfs_inherit_iflags(struct inode *inode, struct inode *dir);
int btrfs_is_empty_uuid(u8 *uuid);
@@ -4151,7 +4185,8 @@ void btrfs_sysfs_remove_mounted(struct btrfs_fs_info *fs_info);
ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size);
/* super.c */
int btrfs_parse_options(struct btrfs_root *root, char *options);
int btrfs_parse_options(struct btrfs_root *root, char *options,
unsigned long new_flags);
int btrfs_sync_fs(struct super_block *sb, int wait);
#ifdef CONFIG_PRINTK
@@ -4525,8 +4560,8 @@ struct reada_control *btrfs_reada_add(struct btrfs_root *root,
struct btrfs_key *start, struct btrfs_key *end);
int btrfs_reada_wait(void *handle);
void btrfs_reada_detach(void *handle);
int btree_readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
u64 start, int err);
int btree_readahead_hook(struct btrfs_fs_info *fs_info,
struct extent_buffer *eb, u64 start, int err);
static inline int is_fstree(u64 rootid)
{
+7 -3
@@ -43,8 +43,7 @@ int __init btrfs_delayed_inode_init(void)
void btrfs_delayed_inode_exit(void)
{
if (delayed_node_cache)
kmem_cache_destroy(delayed_node_cache);
kmem_cache_destroy(delayed_node_cache);
}
static inline void btrfs_init_delayed_node(
@@ -651,9 +650,14 @@ static int btrfs_delayed_inode_reserve_metadata(
goto out;
ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes);
if (!WARN_ON(ret))
if (!ret)
goto out;
if (btrfs_test_opt(root, ENOSPC_DEBUG)) {
btrfs_debug(root->fs_info,
"block rsv migrate returned %d", ret);
WARN_ON(1);
}
/*
* Ok this is a problem, let's just steal from the global rsv
* since this really shouldn't happen that often.
+4 -8
@@ -929,14 +929,10 @@ btrfs_find_delayed_ref_head(struct btrfs_trans_handle *trans, u64 bytenr)
void btrfs_delayed_ref_exit(void)
{
if (btrfs_delayed_ref_head_cachep)
kmem_cache_destroy(btrfs_delayed_ref_head_cachep);
if (btrfs_delayed_tree_ref_cachep)
kmem_cache_destroy(btrfs_delayed_tree_ref_cachep);
if (btrfs_delayed_data_ref_cachep)
kmem_cache_destroy(btrfs_delayed_data_ref_cachep);
if (btrfs_delayed_extent_op_cachep)
kmem_cache_destroy(btrfs_delayed_extent_op_cachep);
kmem_cache_destroy(btrfs_delayed_ref_head_cachep);
kmem_cache_destroy(btrfs_delayed_tree_ref_cachep);
kmem_cache_destroy(btrfs_delayed_data_ref_cachep);
kmem_cache_destroy(btrfs_delayed_extent_op_cachep);
}
int btrfs_delayed_ref_init(void)
+73 -63
@@ -202,13 +202,13 @@ int btrfs_run_dev_replace(struct btrfs_trans_handle *trans,
struct btrfs_dev_replace_item *ptr;
struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
btrfs_dev_replace_lock(dev_replace);
btrfs_dev_replace_lock(dev_replace, 0);
if (!dev_replace->is_valid ||
!dev_replace->item_needs_writeback) {
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 0);
return 0;
}
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 0);
key.objectid = 0;
key.type = BTRFS_DEV_REPLACE_KEY;
@@ -264,7 +264,7 @@ int btrfs_run_dev_replace(struct btrfs_trans_handle *trans,
ptr = btrfs_item_ptr(eb, path->slots[0],
struct btrfs_dev_replace_item);
btrfs_dev_replace_lock(dev_replace);
btrfs_dev_replace_lock(dev_replace, 1);
if (dev_replace->srcdev)
btrfs_set_dev_replace_src_devid(eb, ptr,
dev_replace->srcdev->devid);
@@ -287,7 +287,7 @@ int btrfs_run_dev_replace(struct btrfs_trans_handle *trans,
btrfs_set_dev_replace_cursor_right(eb, ptr,
dev_replace->cursor_right);
dev_replace->item_needs_writeback = 0;
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 1);
btrfs_mark_buffer_dirty(eb);
@@ -356,7 +356,7 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
return PTR_ERR(trans);
}
btrfs_dev_replace_lock(dev_replace);
btrfs_dev_replace_lock(dev_replace, 1);
switch (dev_replace->replace_state) {
case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED:
@@ -395,7 +395,7 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
dev_replace->is_valid = 1;
dev_replace->item_needs_writeback = 1;
args->result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_ERROR;
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 1);
ret = btrfs_sysfs_add_device_link(tgt_device->fs_devices, tgt_device);
if (ret)
@@ -407,7 +407,7 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
trans = btrfs_start_transaction(root, 0);
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
btrfs_dev_replace_lock(dev_replace);
btrfs_dev_replace_lock(dev_replace, 1);
goto leave;
}
@@ -433,7 +433,7 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
leave:
dev_replace->srcdev = NULL;
dev_replace->tgtdev = NULL;
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 1);
btrfs_destroy_dev_replace_tgtdev(fs_info, tgt_device);
return ret;
}
@@ -471,18 +471,18 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
/* don't allow cancel or unmount to disturb the finishing procedure */
mutex_lock(&dev_replace->lock_finishing_cancel_unmount);
btrfs_dev_replace_lock(dev_replace);
btrfs_dev_replace_lock(dev_replace, 0);
/* was the operation canceled, or is it finished? */
if (dev_replace->replace_state !=
BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED) {
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 0);
mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
return 0;
}
tgt_device = dev_replace->tgtdev;
src_device = dev_replace->srcdev;
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 0);
/*
* flush all outstanding I/O and inode extent mappings before the
@@ -507,7 +507,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
/* keep away write_all_supers() during the finishing procedure */
mutex_lock(&root->fs_info->fs_devices->device_list_mutex);
mutex_lock(&root->fs_info->chunk_mutex);
btrfs_dev_replace_lock(dev_replace);
btrfs_dev_replace_lock(dev_replace, 1);
dev_replace->replace_state =
scrub_ret ? BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED
: BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED;
@@ -528,7 +528,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
rcu_str_deref(src_device->name),
src_device->devid,
rcu_str_deref(tgt_device->name), scrub_ret);
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 1);
mutex_unlock(&root->fs_info->chunk_mutex);
mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
mutex_unlock(&uuid_mutex);
@@ -565,7 +565,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
list_add(&tgt_device->dev_alloc_list, &fs_info->fs_devices->alloc_list);
fs_info->fs_devices->rw_devices++;
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 1);
btrfs_rm_dev_replace_blocked(fs_info);
@@ -649,7 +649,7 @@ void btrfs_dev_replace_status(struct btrfs_fs_info *fs_info,
struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
struct btrfs_device *srcdev;
btrfs_dev_replace_lock(dev_replace);
btrfs_dev_replace_lock(dev_replace, 0);
/* even if !dev_replace_is_valid, the values are good enough for
* the replace_status ioctl */
args->result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_ERROR;
@@ -675,7 +675,7 @@ void btrfs_dev_replace_status(struct btrfs_fs_info *fs_info,
div_u64(btrfs_device_get_total_bytes(srcdev), 1000));
break;
}
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 0);
}
int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info,
@@ -698,13 +698,13 @@ static u64 __btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
return -EROFS;
mutex_lock(&dev_replace->lock_finishing_cancel_unmount);
btrfs_dev_replace_lock(dev_replace);
btrfs_dev_replace_lock(dev_replace, 1);
switch (dev_replace->replace_state) {
case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED:
result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NOT_STARTED;
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 1);
goto leave;
case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED:
@@ -717,7 +717,7 @@ static u64 __btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
dev_replace->replace_state = BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED;
dev_replace->time_stopped = get_seconds();
dev_replace->item_needs_writeback = 1;
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 1);
btrfs_scrub_cancel(fs_info);
trans = btrfs_start_transaction(root, 0);
@@ -740,7 +740,7 @@ void btrfs_dev_replace_suspend_for_unmount(struct btrfs_fs_info *fs_info)
struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
mutex_lock(&dev_replace->lock_finishing_cancel_unmount);
btrfs_dev_replace_lock(dev_replace);
btrfs_dev_replace_lock(dev_replace, 1);
switch (dev_replace->replace_state) {
case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED:
@@ -756,7 +756,7 @@ void btrfs_dev_replace_suspend_for_unmount(struct btrfs_fs_info *fs_info)
break;
}
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 1);
mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
}
@@ -766,12 +766,12 @@ int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info)
struct task_struct *task;
struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
btrfs_dev_replace_lock(dev_replace);
btrfs_dev_replace_lock(dev_replace, 1);
switch (dev_replace->replace_state) {
case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED:
case BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED:
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 1);
return 0;
case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED:
break;
@@ -784,10 +784,10 @@ int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info)
btrfs_info(fs_info, "cannot continue dev_replace, tgtdev is missing");
btrfs_info(fs_info,
"you may cancel the operation after 'mount -o degraded'");
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 1);
return 0;
}
btrfs_dev_replace_unlock(dev_replace);
btrfs_dev_replace_unlock(dev_replace, 1);
WARN_ON(atomic_xchg(
&fs_info->mutually_exclusive_operation_running, 1));
@@ -802,7 +802,7 @@ static int btrfs_dev_replace_kthread(void *data)
struct btrfs_ioctl_dev_replace_args *status_args;
u64 progress;
status_args = kzalloc(sizeof(*status_args), GFP_NOFS);
status_args = kzalloc(sizeof(*status_args), GFP_KERNEL);
if (status_args) {
btrfs_dev_replace_status(fs_info, status_args);
progress = status_args->status.progress_1000;
@@ -858,57 +858,67 @@ int btrfs_dev_replace_is_ongoing(struct btrfs_dev_replace *dev_replace)
* not called and the filesystem is remounted
* in degraded state. This does not stop the
* dev_replace procedure. It needs to be canceled
* manually if the cancelation is wanted.
* manually if the cancellation is wanted.
*/
break;
}
return 1;
}
void btrfs_dev_replace_lock(struct btrfs_dev_replace *dev_replace)
void btrfs_dev_replace_lock(struct btrfs_dev_replace *dev_replace, int rw)
{
/* the beginning is just an optimization for the typical case */
if (atomic_read(&dev_replace->nesting_level) == 0) {
acquire_lock:
/* this is not a nested case where the same thread
* is trying to acqurire the same lock twice */
mutex_lock(&dev_replace->lock);
mutex_lock(&dev_replace->lock_management_lock);
dev_replace->lock_owner = current->pid;
atomic_inc(&dev_replace->nesting_level);
mutex_unlock(&dev_replace->lock_management_lock);
return;
if (rw == 1) {
/* write */
again:
wait_event(dev_replace->read_lock_wq,
atomic_read(&dev_replace->blocking_readers) == 0);
write_lock(&dev_replace->lock);
if (atomic_read(&dev_replace->blocking_readers)) {
write_unlock(&dev_replace->lock);
goto again;
}
} else {
read_lock(&dev_replace->lock);
atomic_inc(&dev_replace->read_locks);
}
mutex_lock(&dev_replace->lock_management_lock);
if (atomic_read(&dev_replace->nesting_level) > 0 &&
dev_replace->lock_owner == current->pid) {
WARN_ON(!mutex_is_locked(&dev_replace->lock));
atomic_inc(&dev_replace->nesting_level);
mutex_unlock(&dev_replace->lock_management_lock);
return;
}
mutex_unlock(&dev_replace->lock_management_lock);
goto acquire_lock;
}
void btrfs_dev_replace_unlock(struct btrfs_dev_replace *dev_replace)
void btrfs_dev_replace_unlock(struct btrfs_dev_replace *dev_replace, int rw)
{
WARN_ON(!mutex_is_locked(&dev_replace->lock));
mutex_lock(&dev_replace->lock_management_lock);
WARN_ON(atomic_read(&dev_replace->nesting_level) < 1);
WARN_ON(dev_replace->lock_owner != current->pid);
atomic_dec(&dev_replace->nesting_level);
if (atomic_read(&dev_replace->nesting_level) == 0) {
dev_replace->lock_owner = 0;
mutex_unlock(&dev_replace->lock_management_lock);
mutex_unlock(&dev_replace->lock);
if (rw == 1) {
/* write */
ASSERT(atomic_read(&dev_replace->blocking_readers) == 0);
write_unlock(&dev_replace->lock);
} else {
mutex_unlock(&dev_replace->lock_management_lock);
ASSERT(atomic_read(&dev_replace->read_locks) > 0);
atomic_dec(&dev_replace->read_locks);
read_unlock(&dev_replace->lock);
}
}
/* inc blocking cnt and release read lock */
void btrfs_dev_replace_set_lock_blocking(
struct btrfs_dev_replace *dev_replace)
{
/* only set blocking for read lock */
ASSERT(atomic_read(&dev_replace->read_locks) > 0);
atomic_inc(&dev_replace->blocking_readers);
read_unlock(&dev_replace->lock);
}
/* acquire read lock and dec blocking cnt */
void btrfs_dev_replace_clear_lock_blocking(
struct btrfs_dev_replace *dev_replace)
{
/* only set blocking for read lock */
ASSERT(atomic_read(&dev_replace->read_locks) > 0);
ASSERT(atomic_read(&dev_replace->blocking_readers) > 0);
read_lock(&dev_replace->lock);
if (atomic_dec_and_test(&dev_replace->blocking_readers) &&
waitqueue_active(&dev_replace->read_lock_wq))
wake_up(&dev_replace->read_lock_wq);
}
void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
{
percpu_counter_inc(&fs_info->bio_counter);
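Taken together, the hunks above imply the following calling convention for
the reworked lock (a sketch; the sleeping-work placeholder is hypothetical):

	btrfs_dev_replace_lock(dev_replace, 0);		/* shared (read) side */
	btrfs_dev_replace_set_lock_blocking(dev_replace); /* drop the spinning
							     rwlock, stay counted
							     as a reader */
	do_sleeping_work();				/* hypothetical */
	btrfs_dev_replace_clear_lock_blocking(dev_replace); /* re-take read lock */
	btrfs_dev_replace_unlock(dev_replace, 0);

	btrfs_dev_replace_lock(dev_replace, 1);		/* exclusive (write) side,
							     waits out blocking
							     readers */
	/* ... modify replace state ... */
	btrfs_dev_replace_unlock(dev_replace, 1);
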
+5 -2
@@ -34,8 +34,11 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info,
void btrfs_dev_replace_suspend_for_unmount(struct btrfs_fs_info *fs_info);
int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info);
int btrfs_dev_replace_is_ongoing(struct btrfs_dev_replace *dev_replace);
void btrfs_dev_replace_lock(struct btrfs_dev_replace *dev_replace);
void btrfs_dev_replace_unlock(struct btrfs_dev_replace *dev_replace);
void btrfs_dev_replace_lock(struct btrfs_dev_replace *dev_replace, int rw);
void btrfs_dev_replace_unlock(struct btrfs_dev_replace *dev_replace, int rw);
void btrfs_dev_replace_set_lock_blocking(struct btrfs_dev_replace *dev_replace);
void btrfs_dev_replace_clear_lock_blocking(
struct btrfs_dev_replace *dev_replace);
static inline void btrfs_dev_replace_stats_inc(atomic64_t *stat_value)
{
+41 -30
@@ -50,6 +50,7 @@
#include "raid56.h"
#include "sysfs.h"
#include "qgroup.h"
#include "compression.h"
#ifdef CONFIG_X86
#include <asm/cpufeature.h>
@@ -110,8 +111,7 @@ int __init btrfs_end_io_wq_init(void)
void btrfs_end_io_wq_exit(void)
{
if (btrfs_end_io_wq_cache)
kmem_cache_destroy(btrfs_end_io_wq_cache);
kmem_cache_destroy(btrfs_end_io_wq_cache);
}
/*
@@ -612,6 +612,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
int found_level;
struct extent_buffer *eb;
struct btrfs_root *root = BTRFS_I(page->mapping->host)->root;
struct btrfs_fs_info *fs_info = root->fs_info;
int ret = 0;
int reads_done;
@@ -637,21 +638,21 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
found_start = btrfs_header_bytenr(eb);
if (found_start != eb->start) {
btrfs_err_rl(eb->fs_info, "bad tree block start %llu %llu",
found_start, eb->start);
btrfs_err_rl(fs_info, "bad tree block start %llu %llu",
found_start, eb->start);
ret = -EIO;
goto err;
}
if (check_tree_block_fsid(root->fs_info, eb)) {
btrfs_err_rl(eb->fs_info, "bad fsid on block %llu",
eb->start);
if (check_tree_block_fsid(fs_info, eb)) {
btrfs_err_rl(fs_info, "bad fsid on block %llu",
eb->start);
ret = -EIO;
goto err;
}
found_level = btrfs_header_level(eb);
if (found_level >= BTRFS_MAX_LEVEL) {
btrfs_err(root->fs_info, "bad tree block level %d",
(int)btrfs_header_level(eb));
btrfs_err(fs_info, "bad tree block level %d",
(int)btrfs_header_level(eb));
ret = -EIO;
goto err;
}
@@ -659,7 +660,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
btrfs_set_buffer_lockdep_class(btrfs_header_owner(eb),
eb, found_level);
ret = csum_tree_block(root->fs_info, eb, 1);
ret = csum_tree_block(fs_info, eb, 1);
if (ret) {
ret = -EIO;
goto err;
@@ -680,7 +681,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
err:
if (reads_done &&
test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
btree_readahead_hook(root, eb, eb->start, ret);
btree_readahead_hook(fs_info, eb, eb->start, ret);
if (ret) {
/*
@@ -699,14 +700,13 @@ out:
static int btree_io_failed_hook(struct page *page, int failed_mirror)
{
struct extent_buffer *eb;
struct btrfs_root *root = BTRFS_I(page->mapping->host)->root;
eb = (struct extent_buffer *)page->private;
set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
eb->read_mirror = failed_mirror;
atomic_dec(&eb->io_pages);
if (test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
btree_readahead_hook(root, eb, eb->start, -EIO);
btree_readahead_hook(eb->fs_info, eb, eb->start, -EIO);
return -EIO; /* we fixed nothing */
}
@@ -816,7 +816,7 @@ static void run_one_async_done(struct btrfs_work *work)
waitqueue_active(&fs_info->async_submit_wait))
wake_up(&fs_info->async_submit_wait);
/* If an error occured we just want to clean up the bio and move on */
/* If an error occurred we just want to clean up the bio and move on */
if (async->error) {
async->bio->bi_error = async->error;
bio_endio(async->bio);
@@ -1296,9 +1296,10 @@ static void __setup_root(u32 nodesize, u32 sectorsize, u32 stripesize,
spin_lock_init(&root->root_item_lock);
}
static struct btrfs_root *btrfs_alloc_root(struct btrfs_fs_info *fs_info)
static struct btrfs_root *btrfs_alloc_root(struct btrfs_fs_info *fs_info,
gfp_t flags)
{
struct btrfs_root *root = kzalloc(sizeof(*root), GFP_NOFS);
struct btrfs_root *root = kzalloc(sizeof(*root), flags);
if (root)
root->fs_info = fs_info;
return root;
@@ -1310,7 +1311,7 @@ struct btrfs_root *btrfs_alloc_dummy_root(void)
{
struct btrfs_root *root;
root = btrfs_alloc_root(NULL);
root = btrfs_alloc_root(NULL, GFP_KERNEL);
if (!root)
return ERR_PTR(-ENOMEM);
__setup_root(4096, 4096, 4096, root, NULL, 1);
@@ -1332,7 +1333,7 @@ struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
int ret = 0;
uuid_le uuid;
root = btrfs_alloc_root(fs_info);
root = btrfs_alloc_root(fs_info, GFP_KERNEL);
if (!root)
return ERR_PTR(-ENOMEM);
@@ -1408,7 +1409,7 @@ static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans,
struct btrfs_root *tree_root = fs_info->tree_root;
struct extent_buffer *leaf;
root = btrfs_alloc_root(fs_info);
root = btrfs_alloc_root(fs_info, GFP_NOFS);
if (!root)
return ERR_PTR(-ENOMEM);
@@ -1506,7 +1507,7 @@ static struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
if (!path)
return ERR_PTR(-ENOMEM);
root = btrfs_alloc_root(fs_info);
root = btrfs_alloc_root(fs_info, GFP_NOFS);
if (!root) {
ret = -ENOMEM;
goto alloc_fail;
@@ -2272,9 +2273,11 @@ static void btrfs_init_dev_replace_locks(struct btrfs_fs_info *fs_info)
fs_info->dev_replace.lock_owner = 0;
atomic_set(&fs_info->dev_replace.nesting_level, 0);
mutex_init(&fs_info->dev_replace.lock_finishing_cancel_unmount);
mutex_init(&fs_info->dev_replace.lock_management_lock);
mutex_init(&fs_info->dev_replace.lock);
rwlock_init(&fs_info->dev_replace.lock);
atomic_set(&fs_info->dev_replace.read_locks, 0);
atomic_set(&fs_info->dev_replace.blocking_readers, 0);
init_waitqueue_head(&fs_info->replace_wait);
init_waitqueue_head(&fs_info->dev_replace.read_lock_wq);
}
static void btrfs_init_qgroup(struct btrfs_fs_info *fs_info)
@@ -2385,7 +2388,7 @@ static int btrfs_replay_log(struct btrfs_fs_info *fs_info,
return -EIO;
}
log_tree_root = btrfs_alloc_root(fs_info);
log_tree_root = btrfs_alloc_root(fs_info, GFP_KERNEL);
if (!log_tree_root)
return -ENOMEM;
@@ -2510,8 +2513,8 @@ int open_ctree(struct super_block *sb,
int backup_index = 0;
int max_active;
tree_root = fs_info->tree_root = btrfs_alloc_root(fs_info);
chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info);
tree_root = fs_info->tree_root = btrfs_alloc_root(fs_info, GFP_KERNEL);
chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info, GFP_KERNEL);
if (!tree_root || !chunk_root) {
err = -ENOMEM;
goto fail;
@@ -2603,6 +2606,7 @@ int open_ctree(struct super_block *sb,
atomic_set(&fs_info->nr_async_bios, 0);
atomic_set(&fs_info->defrag_running, 0);
atomic_set(&fs_info->qgroup_op_seq, 0);
atomic_set(&fs_info->reada_works_cnt, 0);
atomic64_set(&fs_info->tree_mod_seq, 0);
fs_info->sb = sb;
fs_info->max_inline = BTRFS_DEFAULT_MAX_INLINE;
@@ -2622,7 +2626,7 @@ int open_ctree(struct super_block *sb,
INIT_LIST_HEAD(&fs_info->ordered_roots);
spin_lock_init(&fs_info->ordered_root_lock);
fs_info->delayed_root = kmalloc(sizeof(struct btrfs_delayed_root),
GFP_NOFS);
GFP_KERNEL);
if (!fs_info->delayed_root) {
err = -ENOMEM;
goto fail_iput;
@@ -2750,7 +2754,7 @@ int open_ctree(struct super_block *sb,
*/
fs_info->compress_type = BTRFS_COMPRESS_ZLIB;
ret = btrfs_parse_options(tree_root, options);
ret = btrfs_parse_options(tree_root, options, sb->s_flags);
if (ret) {
err = ret;
goto fail_alloc;
@@ -3029,8 +3033,9 @@ retry_root_backup:
if (ret)
goto fail_trans_kthread;
/* do not make disk changes in broken FS */
if (btrfs_super_log_root(disk_super) != 0) {
/* do not make disk changes in broken FS or nologreplay is given */
if (btrfs_super_log_root(disk_super) != 0 &&
!btrfs_test_opt(tree_root, NOLOGREPLAY)) {
ret = btrfs_replay_log(fs_info, fs_devices);
if (ret) {
err = ret;
@@ -3146,6 +3151,12 @@ retry_root_backup:
fs_info->open = 1;
/*
* backuproot only affect mount behavior, and if open_ctree succeeded,
* no need to keep the flag
*/
btrfs_clear_opt(fs_info->mount_opt, USEBACKUPROOT);
return 0;
fail_qgroup:
@@ -3200,7 +3211,7 @@ fail:
return err;
recovery_tree_root:
if (!btrfs_test_opt(tree_root, RECOVERY))
if (!btrfs_test_opt(tree_root, USEBACKUPROOT))
goto fail_tree_roots;
free_root_pointers(fs_info, 0);
+23 -17
@@ -4838,7 +4838,7 @@ static inline int need_do_async_reclaim(struct btrfs_space_info *space_info,
u64 thresh = div_factor_fine(space_info->total_bytes, 98);
/* If we're just plain full then async reclaim just slows us down. */
if (space_info->bytes_used >= thresh)
if ((space_info->bytes_used + space_info->bytes_reserved) >= thresh)
return 0;
return (used >= thresh && !btrfs_fs_closing(fs_info) &&
@@ -5373,27 +5373,33 @@ static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
block_rsv->size = min_t(u64, num_bytes, SZ_512M);
num_bytes = sinfo->bytes_used + sinfo->bytes_pinned +
sinfo->bytes_reserved + sinfo->bytes_readonly +
sinfo->bytes_may_use;
if (sinfo->total_bytes > num_bytes) {
num_bytes = sinfo->total_bytes - num_bytes;
block_rsv->reserved += num_bytes;
sinfo->bytes_may_use += num_bytes;
trace_btrfs_space_reservation(fs_info, "space_info",
sinfo->flags, num_bytes, 1);
}
if (block_rsv->reserved >= block_rsv->size) {
if (block_rsv->reserved < block_rsv->size) {
num_bytes = sinfo->bytes_used + sinfo->bytes_pinned +
sinfo->bytes_reserved + sinfo->bytes_readonly +
sinfo->bytes_may_use;
if (sinfo->total_bytes > num_bytes) {
num_bytes = sinfo->total_bytes - num_bytes;
num_bytes = min(num_bytes,
block_rsv->size - block_rsv->reserved);
block_rsv->reserved += num_bytes;
sinfo->bytes_may_use += num_bytes;
trace_btrfs_space_reservation(fs_info, "space_info",
sinfo->flags, num_bytes,
1);
}
} else if (block_rsv->reserved > block_rsv->size) {
num_bytes = block_rsv->reserved - block_rsv->size;
sinfo->bytes_may_use -= num_bytes;
trace_btrfs_space_reservation(fs_info, "space_info",
sinfo->flags, num_bytes, 0);
block_rsv->reserved = block_rsv->size;
block_rsv->full = 1;
}
if (block_rsv->reserved == block_rsv->size)
block_rsv->full = 1;
else
block_rsv->full = 0;
spin_unlock(&block_rsv->lock);
spin_unlock(&sinfo->lock);
}
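A worked example of the refill change, with illustrative numbers (size 512M,
reserved 400M, 300M of the space_info still unaccounted):

	/* old: reserved += 300M -> 700M, then the overflow branch hands
	 *      188M back        -> 512M  (two accounting events, with a
	 *      transient overshoot of bytes_may_use)
	 * new: top-up = min(300M, 512M - 400M) = 112M
	 *      reserved -> 512M  (one event, no overshoot)
	 */
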
@@ -5752,7 +5758,7 @@ out_fail:
/*
* This is tricky, but first we need to figure out how much we
* free'd from any free-ers that occured during this
* free'd from any free-ers that occurred during this
* reservation, so we reset ->csum_bytes to the csum_bytes
* before we dropped our lock, and then call the free for the
* number of bytes that were freed while we were trying our
@@ -7018,7 +7024,7 @@ btrfs_lock_cluster(struct btrfs_block_group_cache *block_group,
struct btrfs_free_cluster *cluster,
int delalloc)
{
struct btrfs_block_group_cache *used_bg;
struct btrfs_block_group_cache *used_bg = NULL;
bool locked = false;
again:
spin_lock(&cluster->refill_lock);
+18 -22
@@ -206,10 +206,8 @@ void extent_io_exit(void)
* destroy caches.
*/
rcu_barrier();
if (extent_state_cache)
kmem_cache_destroy(extent_state_cache);
if (extent_buffer_cache)
kmem_cache_destroy(extent_buffer_cache);
kmem_cache_destroy(extent_state_cache);
kmem_cache_destroy(extent_buffer_cache);
if (btrfs_bioset)
bioset_free(btrfs_bioset);
}
@@ -232,7 +230,7 @@ static struct extent_state *alloc_extent_state(gfp_t mask)
if (!state)
return state;
state->state = 0;
state->private = 0;
state->failrec = NULL;
RB_CLEAR_NODE(&state->rb_node);
btrfs_leak_debug_add(&state->leak_list, &states);
atomic_set(&state->refs, 1);
@@ -1844,7 +1842,8 @@ out:
* set the private field for a given byte offset in the tree. If there isn't
* an extent_state there already, this does nothing.
*/
static int set_state_private(struct extent_io_tree *tree, u64 start, u64 private)
static noinline int set_state_failrec(struct extent_io_tree *tree, u64 start,
struct io_failure_record *failrec)
{
struct rb_node *node;
struct extent_state *state;
@@ -1865,13 +1864,14 @@ static int set_state_private(struct extent_io_tree *tree, u64 start, u64 private
ret = -ENOENT;
goto out;
}
state->private = private;
state->failrec = failrec;
out:
spin_unlock(&tree->lock);
return ret;
}
int get_state_private(struct extent_io_tree *tree, u64 start, u64 *private)
static noinline int get_state_failrec(struct extent_io_tree *tree, u64 start,
struct io_failure_record **failrec)
{
struct rb_node *node;
struct extent_state *state;
@@ -1892,7 +1892,7 @@ int get_state_private(struct extent_io_tree *tree, u64 start, u64 *private)
ret = -ENOENT;
goto out;
}
*private = state->private;
*failrec = state->failrec;
out:
spin_unlock(&tree->lock);
return ret;
@@ -1972,7 +1972,7 @@ int free_io_failure(struct inode *inode, struct io_failure_record *rec)
int err = 0;
struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
set_state_private(failure_tree, rec->start, 0);
set_state_failrec(failure_tree, rec->start, NULL);
ret = clear_extent_bits(failure_tree, rec->start,
rec->start + rec->len - 1,
EXTENT_LOCKED | EXTENT_DIRTY, GFP_NOFS);
@@ -2089,7 +2089,6 @@ int clean_io_failure(struct inode *inode, u64 start, struct page *page,
unsigned int pg_offset)
{
u64 private;
u64 private_failure;
struct io_failure_record *failrec;
struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
struct extent_state *state;
@@ -2102,12 +2101,11 @@ int clean_io_failure(struct inode *inode, u64 start, struct page *page,
if (!ret)
return 0;
ret = get_state_private(&BTRFS_I(inode)->io_failure_tree, start,
&private_failure);
ret = get_state_failrec(&BTRFS_I(inode)->io_failure_tree, start,
&failrec);
if (ret)
return 0;
failrec = (struct io_failure_record *)(unsigned long) private_failure;
BUG_ON(!failrec->this_mirror);
if (failrec->in_validation) {
@@ -2167,7 +2165,7 @@ void btrfs_free_io_failure_record(struct inode *inode, u64 start, u64 end)
next = next_state(state);
failrec = (struct io_failure_record *)(unsigned long)state->private;
failrec = state->failrec;
free_extent_state(state);
kfree(failrec);
@@ -2177,10 +2175,9 @@ void btrfs_free_io_failure_record(struct inode *inode, u64 start, u64 end)
}
int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end,
struct io_failure_record **failrec_ret)
struct io_failure_record **failrec_ret)
{
struct io_failure_record *failrec;
u64 private;
struct extent_map *em;
struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
@@ -2188,7 +2185,7 @@ int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end,
int ret;
u64 logical;
ret = get_state_private(failure_tree, start, &private);
ret = get_state_failrec(failure_tree, start, &failrec);
if (ret) {
failrec = kzalloc(sizeof(*failrec), GFP_NOFS);
if (!failrec)
@@ -2237,8 +2234,7 @@ int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end,
ret = set_extent_bits(failure_tree, start, end,
EXTENT_LOCKED | EXTENT_DIRTY, GFP_NOFS);
if (ret >= 0)
ret = set_state_private(failure_tree, start,
(u64)(unsigned long)failrec);
ret = set_state_failrec(failure_tree, start, failrec);
/* set the bits in the inode's tree */
if (ret >= 0)
ret = set_extent_bits(tree, start, end, EXTENT_DAMAGED,
@@ -2248,7 +2244,6 @@ int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end,
return ret;
}
} else {
failrec = (struct io_failure_record *)(unsigned long)private;
pr_debug("Get IO Failure Record: (found) logical=%llu, start=%llu, len=%llu, validation=%d\n",
failrec->logical, failrec->start, failrec->len,
failrec->in_validation);
@@ -3177,7 +3172,8 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
while (1) {
lock_extent(tree, start, end);
ordered = btrfs_lookup_ordered_extent(inode, start);
ordered = btrfs_lookup_ordered_range(inode, start,
PAGE_CACHE_SIZE);
if (!ordered)
break;
unlock_extent(tree, start, end);
+2 -3
@@ -61,6 +61,7 @@
struct extent_state;
struct btrfs_root;
struct btrfs_io_bio;
struct io_failure_record;
typedef int (extent_submit_bio_hook_t)(struct inode *inode, int rw,
struct bio *bio, int mirror_num,
@@ -111,8 +112,7 @@ struct extent_state {
atomic_t refs;
unsigned state;
/* for use by the FS */
u64 private;
struct io_failure_record *failrec;
#ifdef CONFIG_BTRFS_DEBUG
struct list_head leak_list;
@@ -342,7 +342,6 @@ int extent_readpages(struct extent_io_tree *tree,
get_extent_t get_extent);
int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
__u64 start, __u64 len, get_extent_t *get_extent);
int get_state_private(struct extent_io_tree *tree, u64 start, u64 *private);
void set_page_extent_mapped(struct page *page);
struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
+4 -4
@@ -4,6 +4,7 @@
#include <linux/hardirq.h>
#include "ctree.h"
#include "extent_map.h"
#include "compression.h"
static struct kmem_cache *extent_map_cache;
@@ -20,8 +21,7 @@ int __init extent_map_init(void)
void extent_map_exit(void)
{
if (extent_map_cache)
kmem_cache_destroy(extent_map_cache);
kmem_cache_destroy(extent_map_cache);
}
/**
@@ -62,7 +62,7 @@ struct extent_map *alloc_extent_map(void)
/**
* free_extent_map - drop reference count of an extent_map
* @em: extent map beeing releasead
* @em: extent map being releasead
*
* Drops the reference out on @em by one and free the structure
* if the reference count hits zero.
@@ -422,7 +422,7 @@ struct extent_map *search_extent_mapping(struct extent_map_tree *tree,
/**
* remove_extent_mapping - removes an extent_map from the extent tree
* @tree: extent tree to remove from
* @em: extent map beeing removed
* @em: extent map being removed
*
* Removes @em from @tree. No reference counts are dropped, and no checks
* are done to see if the range is in use
+70 -33
@@ -25,6 +25,7 @@
#include "transaction.h"
#include "volumes.h"
#include "print-tree.h"
#include "compression.h"
#define __MAX_CSUM_ITEMS(r, size) ((unsigned long)(((BTRFS_LEAF_DATA_SIZE(r) - \
sizeof(struct btrfs_item) * 2) / \
@@ -172,6 +173,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
u64 item_start_offset = 0;
u64 item_last_offset = 0;
u64 disk_bytenr;
u64 page_bytes_left;
u32 diff;
int nblocks;
int bio_index = 0;
@@ -220,6 +222,8 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
if (dio)
offset = logical_offset;
page_bytes_left = bvec->bv_len;
while (bio_index < bio->bi_vcnt) {
if (!dio)
offset = page_offset(bvec->bv_page) + bvec->bv_offset;
@@ -243,7 +247,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
if (BTRFS_I(inode)->root->root_key.objectid ==
BTRFS_DATA_RELOC_TREE_OBJECTID) {
set_extent_bits(io_tree, offset,
offset + bvec->bv_len - 1,
offset + root->sectorsize - 1,
EXTENT_NODATASUM, GFP_NOFS);
} else {
btrfs_info(BTRFS_I(inode)->root->fs_info,
@@ -281,13 +285,29 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
found:
csum += count * csum_size;
nblocks -= count;
bio_index += count;
while (count--) {
disk_bytenr += bvec->bv_len;
offset += bvec->bv_len;
bvec++;
disk_bytenr += root->sectorsize;
offset += root->sectorsize;
page_bytes_left -= root->sectorsize;
if (!page_bytes_left) {
bio_index++;
/*
* make sure we're still inside the
* bio before we update page_bytes_left
*/
if (bio_index >= bio->bi_vcnt) {
WARN_ON_ONCE(count);
goto done;
}
bvec++;
page_bytes_left = bvec->bv_len;
}
}
}
done:
btrfs_free_path(path);
return 0;
}
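The reworked loop in __btrfs_lookup_bio_sums() walks the bio one sector at a time: page_bytes_left tracks how much of the current bio_vec remains, the walk advances to the next bvec only when that hits zero, and it now bails out (warning if checksums are still pending) instead of stepping past bi_vcnt. A user-space model of that bookkeeping, under the assumption that every bv_len is sector-aligned:

    #include <stdio.h>

    struct vec { unsigned int len; };      /* stand-in for struct bio_vec */

    static void walk_sectors(const struct vec *vecs, int vcnt,
                             unsigned int sectorsize)
    {
            int idx = 0;
            unsigned int page_bytes_left = vecs[0].len;
            unsigned long long offset = 0;

            for (;;) {
                    printf("checksum sector at byte offset %llu\n", offset);
                    offset += sectorsize;
                    page_bytes_left -= sectorsize;
                    if (!page_bytes_left) {
                            if (++idx >= vcnt)
                                    break;  /* stay inside the bio */
                            page_bytes_left = vecs[idx].len;
                    }
            }
    }

    int main(void)
    {
            const struct vec bio[] = { { 4096 }, { 8192 } };

            walk_sectors(bio, 2, 4096);     /* prints three sectors */
            return 0;
    }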
@@ -432,6 +452,8 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
struct bio_vec *bvec = bio->bi_io_vec;
int bio_index = 0;
int index;
int nr_sectors;
int i;
unsigned long total_bytes = 0;
unsigned long this_sum_bytes = 0;
u64 offset;
@@ -459,41 +481,56 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
if (!contig)
offset = page_offset(bvec->bv_page) + bvec->bv_offset;
if (offset >= ordered->file_offset + ordered->len ||
offset < ordered->file_offset) {
unsigned long bytes_left;
sums->len = this_sum_bytes;
this_sum_bytes = 0;
btrfs_add_ordered_sum(inode, ordered, sums);
btrfs_put_ordered_extent(ordered);
data = kmap_atomic(bvec->bv_page);
bytes_left = bio->bi_iter.bi_size - total_bytes;
nr_sectors = BTRFS_BYTES_TO_BLKS(root->fs_info,
bvec->bv_len + root->sectorsize
- 1);
sums = kzalloc(btrfs_ordered_sum_size(root, bytes_left),
GFP_NOFS);
BUG_ON(!sums); /* -ENOMEM */
sums->len = bytes_left;
ordered = btrfs_lookup_ordered_extent(inode, offset);
BUG_ON(!ordered); /* Logic error */
sums->bytenr = ((u64)bio->bi_iter.bi_sector << 9) +
total_bytes;
index = 0;
for (i = 0; i < nr_sectors; i++) {
if (offset >= ordered->file_offset + ordered->len ||
offset < ordered->file_offset) {
unsigned long bytes_left;
kunmap_atomic(data);
sums->len = this_sum_bytes;
this_sum_bytes = 0;
btrfs_add_ordered_sum(inode, ordered, sums);
btrfs_put_ordered_extent(ordered);
bytes_left = bio->bi_iter.bi_size - total_bytes;
sums = kzalloc(btrfs_ordered_sum_size(root, bytes_left),
GFP_NOFS);
BUG_ON(!sums); /* -ENOMEM */
sums->len = bytes_left;
ordered = btrfs_lookup_ordered_extent(inode,
offset);
ASSERT(ordered); /* Logic error */
sums->bytenr = ((u64)bio->bi_iter.bi_sector << 9)
+ total_bytes;
index = 0;
data = kmap_atomic(bvec->bv_page);
}
sums->sums[index] = ~(u32)0;
sums->sums[index]
= btrfs_csum_data(data + bvec->bv_offset
+ (i * root->sectorsize),
sums->sums[index],
root->sectorsize);
btrfs_csum_final(sums->sums[index],
(char *)(sums->sums + index));
index++;
offset += root->sectorsize;
this_sum_bytes += root->sectorsize;
total_bytes += root->sectorsize;
}
data = kmap_atomic(bvec->bv_page);
sums->sums[index] = ~(u32)0;
sums->sums[index] = btrfs_csum_data(data + bvec->bv_offset,
sums->sums[index],
bvec->bv_len);
kunmap_atomic(data);
btrfs_csum_final(sums->sums[index],
(char *)(sums->sums + index));
bio_index++;
index++;
total_bytes += bvec->bv_len;
this_sum_bytes += bvec->bv_len;
offset += bvec->bv_len;
bvec++;
}
this_sum_bytes = 0;
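btrfs_csum_one_bio() now checksums root->sectorsize bytes at a time instead of a whole bio_vec, computing how many sectors a bvec spans via BTRFS_BYTES_TO_BLKS() on the rounded-up length. A hedged model of that conversion, assuming a power-of-two sector size:

    /* bits = log2(sectorsize); mirrors the BTRFS_BYTES_TO_BLKS() shift */
    static inline unsigned long bytes_to_blks(unsigned long bytes,
                                              unsigned int blocksize_bits)
    {
            return bytes >> blocksize_bits;
    }

    /* sectors spanned by one bvec, rounding up as the hunk above does:
     * nr_sectors = bytes_to_blks(bv_len + sectorsize - 1, bits); */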
fs/btrfs/file.c (+90 -68)
@@ -41,6 +41,7 @@
#include "locking.h"
#include "volumes.h"
#include "qgroup.h"
#include "compression.h"
static struct kmem_cache *btrfs_inode_defrag_cachep;
/*
@@ -498,7 +499,7 @@ int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
loff_t isize = i_size_read(inode);
start_pos = pos & ~((u64)root->sectorsize - 1);
num_bytes = ALIGN(write_bytes + pos - start_pos, root->sectorsize);
num_bytes = round_up(write_bytes + pos - start_pos, root->sectorsize);
end_of_last_block = start_pos + num_bytes - 1;
err = btrfs_set_extent_delalloc(inode, start_pos, end_of_last_block,
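ALIGN() and round_up() are interchangeable for power-of-two alignment; this change simply settles file.c on the round_up()/round_down() pair used by the rest of the sectorsize work. For a power-of-two y, the kernel helpers reduce to roughly:

    /* hedged restatement; the real macros are more careful about types */
    #define my_round_down(x, y)  ((x) & ~((y) - 1))
    #define my_round_up(x, y)    my_round_down((x) + (y) - 1, (y))

    /* with y = 4096: my_round_down(5000, 4096) == 4096,
     *                my_round_up(5000, 4096)   == 8192 */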
@@ -1379,16 +1380,19 @@ fail:
static noinline int
lock_and_cleanup_extent_if_need(struct inode *inode, struct page **pages,
size_t num_pages, loff_t pos,
size_t write_bytes,
u64 *lockstart, u64 *lockend,
struct extent_state **cached_state)
{
struct btrfs_root *root = BTRFS_I(inode)->root;
u64 start_pos;
u64 last_pos;
int i;
int ret = 0;
start_pos = pos & ~((u64)PAGE_CACHE_SIZE - 1);
last_pos = start_pos + ((u64)num_pages << PAGE_CACHE_SHIFT) - 1;
start_pos = round_down(pos, root->sectorsize);
last_pos = start_pos
+ round_up(pos + write_bytes - start_pos, root->sectorsize) - 1;
if (start_pos < inode->i_size) {
struct btrfs_ordered_extent *ordered;
@@ -1503,6 +1507,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
while (iov_iter_count(i) > 0) {
size_t offset = pos & (PAGE_CACHE_SIZE - 1);
size_t sector_offset;
size_t write_bytes = min(iov_iter_count(i),
nrptrs * (size_t)PAGE_CACHE_SIZE -
offset);
@@ -1511,6 +1516,8 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
size_t reserve_bytes;
size_t dirty_pages;
size_t copied;
size_t dirty_sectors;
size_t num_sectors;
WARN_ON(num_pages > nrptrs);
@@ -1523,29 +1530,29 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
break;
}
reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
sector_offset = pos & (root->sectorsize - 1);
reserve_bytes = round_up(write_bytes + sector_offset,
root->sectorsize);
if (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
BTRFS_INODE_PREALLOC)) {
ret = check_can_nocow(inode, pos, &write_bytes);
if (ret < 0)
break;
if (ret > 0) {
/*
* For nodata cow case, no need to reserve
* data space.
*/
only_release_metadata = true;
/*
* our prealloc extent may be smaller than
* write_bytes, so scale down.
*/
num_pages = DIV_ROUND_UP(write_bytes + offset,
PAGE_CACHE_SIZE);
reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
goto reserve_metadata;
}
if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
BTRFS_INODE_PREALLOC)) &&
check_can_nocow(inode, pos, &write_bytes) > 0) {
/*
* For nodata cow case, no need to reserve
* data space.
*/
only_release_metadata = true;
/*
* our prealloc extent may be smaller than
* write_bytes, so scale down.
*/
num_pages = DIV_ROUND_UP(write_bytes + offset,
PAGE_CACHE_SIZE);
reserve_bytes = round_up(write_bytes + sector_offset,
root->sectorsize);
goto reserve_metadata;
}
ret = btrfs_check_data_free_space(inode, pos, write_bytes);
if (ret < 0)
break;
@@ -1576,8 +1583,8 @@ again:
break;
ret = lock_and_cleanup_extent_if_need(inode, pages, num_pages,
pos, &lockstart, &lockend,
&cached_state);
pos, write_bytes, &lockstart,
&lockend, &cached_state);
if (ret < 0) {
if (ret == -EAGAIN)
goto again;
@@ -1612,9 +1619,16 @@ again:
* we still have an outstanding extent for the chunk we actually
* managed to copy.
*/
if (num_pages > dirty_pages) {
release_bytes = (num_pages - dirty_pages) <<
PAGE_CACHE_SHIFT;
num_sectors = BTRFS_BYTES_TO_BLKS(root->fs_info,
reserve_bytes);
dirty_sectors = round_up(copied + sector_offset,
root->sectorsize);
dirty_sectors = BTRFS_BYTES_TO_BLKS(root->fs_info,
dirty_sectors);
if (num_sectors > dirty_sectors) {
release_bytes = (write_bytes - copied)
& ~((u64)root->sectorsize - 1);
if (copied > 0) {
spin_lock(&BTRFS_I(inode)->lock);
BTRFS_I(inode)->outstanding_extents++;
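Space accounting in __btrfs_buffered_write() now works in sectors: the reservation covers round_up(write_bytes + sector_offset), and when the copy falls short only whole unused sectors are handed back. A worked user-space example with an assumed 4K sector size:

    #include <stdio.h>

    #define SS 4096ULL                       /* assumed sectorsize */
    #define RUP(x) (((x) + SS - 1) & ~(SS - 1))

    int main(void)
    {
            unsigned long long pos = 6144, write_bytes = 10240, copied = 4096;
            unsigned long long sector_offset = pos & (SS - 1);   /* 2048 */
            unsigned long long reserved = RUP(write_bytes + sector_offset);
            unsigned long long dirtied  = RUP(copied + sector_offset);
            unsigned long long released = (write_bytes - copied) & ~(SS - 1);

            /* reserved = 12288, dirtied = 8192, released = 4096:
             * dirtied + released == reserved, nothing leaks */
            printf("reserved=%llu dirtied=%llu released=%llu\n",
                   reserved, dirtied, released);
            return 0;
    }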
@@ -1633,7 +1647,8 @@ again:
}
}
release_bytes = dirty_pages << PAGE_CACHE_SHIFT;
release_bytes = round_up(copied + sector_offset,
root->sectorsize);
if (copied > 0)
ret = btrfs_dirty_pages(root, inode, pages,
@@ -1654,8 +1669,7 @@ again:
if (only_release_metadata && copied > 0) {
lockstart = round_down(pos, root->sectorsize);
lockend = lockstart +
(dirty_pages << PAGE_CACHE_SHIFT) - 1;
lockend = round_up(pos + copied, root->sectorsize) - 1;
set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
lockend, EXTENT_NORESERVE, NULL,
@@ -1761,6 +1775,8 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
ssize_t err;
loff_t pos;
size_t count;
loff_t oldsize;
int clean_page = 0;
inode_lock(inode);
err = generic_write_checks(iocb, from);
@@ -1799,14 +1815,17 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
pos = iocb->ki_pos;
count = iov_iter_count(from);
start_pos = round_down(pos, root->sectorsize);
if (start_pos > i_size_read(inode)) {
oldsize = i_size_read(inode);
if (start_pos > oldsize) {
/* Expand hole size to cover write data, preventing empty gap */
end_pos = round_up(pos + count, root->sectorsize);
err = btrfs_cont_expand(inode, i_size_read(inode), end_pos);
err = btrfs_cont_expand(inode, oldsize, end_pos);
if (err) {
inode_unlock(inode);
goto out;
}
if (start_pos > round_up(oldsize, root->sectorsize))
clean_page = 1;
}
if (sync)
@@ -1818,6 +1837,9 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
num_written = __btrfs_buffered_write(file, from, pos);
if (num_written > 0)
iocb->ki_pos = pos + num_written;
if (clean_page)
pagecache_isize_extended(inode, oldsize,
i_size_read(inode));
}
inode_unlock(inode);
@@ -1825,7 +1847,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
/*
* We also have to set last_sub_trans to the current log transid,
* otherwise subsequent syncs to a file that's been synced in this
* transaction will appear to have already occured.
* transaction will appear to have already occurred.
*/
spin_lock(&BTRFS_I(inode)->lock);
BTRFS_I(inode)->last_sub_trans = root->log_transid;
@@ -1996,10 +2018,11 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
*/
smp_mb();
if (btrfs_inode_in_log(inode, root->fs_info->generation) ||
(BTRFS_I(inode)->last_trans <=
root->fs_info->last_trans_committed &&
(full_sync ||
!btrfs_have_ordered_extents_in_range(inode, start, len)))) {
(full_sync && BTRFS_I(inode)->last_trans <=
root->fs_info->last_trans_committed) ||
(!btrfs_have_ordered_extents_in_range(inode, start, len) &&
BTRFS_I(inode)->last_trans
<= root->fs_info->last_trans_committed)) {
/*
* We'v had everything committed since the last time we were
* modified so clear this flag in case it was set for whatever
@@ -2293,10 +2316,10 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
int ret = 0;
int err = 0;
unsigned int rsv_count;
bool same_page;
bool same_block;
bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES);
u64 ino_size;
bool truncated_page = false;
bool truncated_block = false;
bool updated_inode = false;
ret = btrfs_wait_ordered_range(inode, offset, len);
@@ -2304,7 +2327,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
return ret;
inode_lock(inode);
ino_size = round_up(inode->i_size, PAGE_CACHE_SIZE);
ino_size = round_up(inode->i_size, root->sectorsize);
ret = find_first_non_hole(inode, &offset, &len);
if (ret < 0)
goto out_only_mutex;
@@ -2317,31 +2340,30 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
lockstart = round_up(offset, BTRFS_I(inode)->root->sectorsize);
lockend = round_down(offset + len,
BTRFS_I(inode)->root->sectorsize) - 1;
same_page = ((offset >> PAGE_CACHE_SHIFT) ==
((offset + len - 1) >> PAGE_CACHE_SHIFT));
same_block = (BTRFS_BYTES_TO_BLKS(root->fs_info, offset))
== (BTRFS_BYTES_TO_BLKS(root->fs_info, offset + len - 1));
/*
* We needn't truncate any page which is beyond the end of the file
* We needn't truncate any block which is beyond the end of the file
* because we are sure there is no data there.
*/
/*
* Only do this if we are in the same page and we aren't doing the
* entire page.
* Only do this if we are in the same block and we aren't doing the
* entire block.
*/
if (same_page && len < PAGE_CACHE_SIZE) {
if (same_block && len < root->sectorsize) {
if (offset < ino_size) {
truncated_page = true;
ret = btrfs_truncate_page(inode, offset, len, 0);
truncated_block = true;
ret = btrfs_truncate_block(inode, offset, len, 0);
} else {
ret = 0;
}
goto out_only_mutex;
}
/* zero back part of the first page */
/* zero back part of the first block */
if (offset < ino_size) {
truncated_page = true;
ret = btrfs_truncate_page(inode, offset, 0, 0);
truncated_block = true;
ret = btrfs_truncate_block(inode, offset, 0, 0);
if (ret) {
inode_unlock(inode);
return ret;
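The hole-punching fast path now asks whether the punched range stays within one block rather than one page, which is what comparing BTRFS_BYTES_TO_BLKS() of the two endpoints computes. The test reduces to comparing block indices:

    /* hedged restatement, power-of-two block size */
    static inline int in_same_block(unsigned long long offset,
                                    unsigned long long len,
                                    unsigned int blocksize)
    {
            return (offset / blocksize) == ((offset + len - 1) / blocksize);
    }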
@@ -2376,9 +2398,10 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
if (!ret) {
/* zero the front end of the last page */
if (tail_start + tail_len < ino_size) {
truncated_page = true;
ret = btrfs_truncate_page(inode,
tail_start + tail_len, 0, 1);
truncated_block = true;
ret = btrfs_truncate_block(inode,
tail_start + tail_len,
0, 1);
if (ret)
goto out_only_mutex;
}
@@ -2544,7 +2567,7 @@ out_trans:
goto out_free;
inode_inc_iversion(inode);
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
inode->i_mtime = inode->i_ctime = current_fs_time(inode->i_sb);
trans->block_rsv = &root->fs_info->trans_block_rsv;
ret = btrfs_update_inode(trans, root, inode);
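The CURRENT_TIME conversions throughout this pull matter because current_fs_time() truncates the timestamp to the superblock's s_time_gran, so the value stored in the inode matches what the filesystem can actually persist. A hedged helper (hypothetical name) showing the call pattern:

    #include <linux/fs.h>

    static void stamp_mc_times(struct inode *inode)
    {
            /* one read of the clock, truncated to fs granularity */
            struct timespec now = current_fs_time(inode->i_sb);

            inode->i_mtime = now;
            inode->i_ctime = now;
    }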
@@ -2558,7 +2581,7 @@ out:
unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart, lockend,
&cached_state, GFP_NOFS);
out_only_mutex:
if (!updated_inode && truncated_page && !ret && !err) {
if (!updated_inode && truncated_block && !ret && !err) {
/*
* If we only end up zeroing part of a page, we still need to
* update the inode item, so that all the time fields are
@@ -2611,7 +2634,7 @@ static int add_falloc_range(struct list_head *head, u64 start, u64 len)
return 0;
}
insert:
range = kmalloc(sizeof(*range), GFP_NOFS);
range = kmalloc(sizeof(*range), GFP_KERNEL);
if (!range)
return -ENOMEM;
range->start = start;
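GFP_NOFS exists to stop memory reclaim from re-entering the filesystem where that could deadlock (inside a transaction, under extent locks); the fallocate-path allocations above run outside such contexts, so GFP_KERNEL is presumably safe and gives the allocator more latitude. A hedged sketch of the rule of thumb:

    #include <linux/slab.h>
    #include <linux/types.h>

    static void *fs_alloc(size_t size, bool inside_critical_section)
    {
            /* GFP_NOFS only where reclaim recursing into the FS could
             * deadlock; GFP_KERNEL everywhere else */
            return kmalloc(size,
                           inside_critical_section ? GFP_NOFS : GFP_KERNEL);
    }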
@@ -2678,10 +2701,10 @@ static long btrfs_fallocate(struct file *file, int mode,
} else if (offset + len > inode->i_size) {
/*
* If we are fallocating from the end of the file onward we
* need to zero out the end of the page if i_size lands in the
* middle of a page.
* need to zero out the end of the block if i_size lands in the
* middle of a block.
*/
ret = btrfs_truncate_page(inode, inode->i_size, 0, 0);
ret = btrfs_truncate_block(inode, inode->i_size, 0, 0);
if (ret)
goto out;
}
@@ -2712,7 +2735,7 @@ static long btrfs_fallocate(struct file *file, int mode,
btrfs_put_ordered_extent(ordered);
unlock_extent_cached(&BTRFS_I(inode)->io_tree,
alloc_start, locked_end,
&cached_state, GFP_NOFS);
&cached_state, GFP_KERNEL);
/*
* we can't wait on the range with the transaction
* running or with the extent lock held
@@ -2794,7 +2817,7 @@ static long btrfs_fallocate(struct file *file, int mode,
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
} else {
inode->i_ctime = CURRENT_TIME;
inode->i_ctime = current_fs_time(inode->i_sb);
i_size_write(inode, actual_end);
btrfs_ordered_update_i_size(inode, actual_end, NULL);
ret = btrfs_update_inode(trans, root, inode);
@@ -2806,7 +2829,7 @@ static long btrfs_fallocate(struct file *file, int mode,
}
out_unlock:
unlock_extent_cached(&BTRFS_I(inode)->io_tree, alloc_start, locked_end,
&cached_state, GFP_NOFS);
&cached_state, GFP_KERNEL);
out:
/*
* As we waited the extent range, the data_rsv_map must be empty
@@ -2939,8 +2962,7 @@ const struct file_operations btrfs_file_operations = {
void btrfs_auto_defrag_exit(void)
{
if (btrfs_inode_defrag_cachep)
kmem_cache_destroy(btrfs_inode_defrag_cachep);
kmem_cache_destroy(btrfs_inode_defrag_cachep);
}
int btrfs_auto_defrag_init(void)
fs/btrfs/inode-map.c (+3)
@@ -556,6 +556,9 @@ int btrfs_find_free_objectid(struct btrfs_root *root, u64 *objectid)
mutex_lock(&root->objectid_mutex);
if (unlikely(root->highest_objectid >= BTRFS_LAST_FREE_OBJECTID)) {
btrfs_warn(root->fs_info,
"the objectid of root %llu reaches its highest value",
root->root_key.objectid);
ret = -ENOSPC;
goto out;
}
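The new warning fires when a root's objectid counter reaches BTRFS_LAST_FREE_OBJECTID; rather than wrapping, allocation fails with -ENOSPC. A hedged sketch of the guard pattern, with hypothetical names:

    #include <linux/errno.h>
    #include <linux/types.h>

    static int next_objectid(u64 *counter, u64 cap, u64 *out)
    {
            if (*counter >= cap)
                    return -ENOSPC;      /* warn and refuse; never wrap */
            *out = ++*counter;
            return 0;
    }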
fs/btrfs/inode.c (+226 -100)
File diff suppressed because it is too large.
fs/btrfs/ioctl.c (+18 -17)
@@ -59,6 +59,8 @@
#include "props.h"
#include "sysfs.h"
#include "qgroup.h"
#include "tree-log.h"
#include "compression.h"
#ifdef CONFIG_64BIT
/* If we have a 32-bit userspace and 64-bit kernel, then the UAPI
@@ -347,7 +349,7 @@ static int btrfs_ioctl_setflags(struct file *file, void __user *arg)
btrfs_update_iflags(inode);
inode_inc_iversion(inode);
inode->i_ctime = CURRENT_TIME;
inode->i_ctime = current_fs_time(inode->i_sb);
ret = btrfs_update_inode(trans, root, inode);
btrfs_end_transaction(trans, root);
@@ -443,7 +445,7 @@ static noinline int create_subvol(struct inode *dir,
struct btrfs_root *root = BTRFS_I(dir)->root;
struct btrfs_root *new_root;
struct btrfs_block_rsv block_rsv;
struct timespec cur_time = CURRENT_TIME;
struct timespec cur_time = current_fs_time(dir->i_sb);
struct inode *inode;
int ret;
int err;
@@ -844,10 +846,6 @@ static noinline int btrfs_mksubvol(struct path *parent,
if (IS_ERR(dentry))
goto out_unlock;
error = -EEXIST;
if (d_really_is_positive(dentry))
goto out_dput;
error = btrfs_may_create(dir, dentry);
if (error)
goto out_dput;
@@ -2097,8 +2095,6 @@ static noinline int search_ioctl(struct inode *inode,
key.offset = (u64)-1;
root = btrfs_read_fs_root_no_name(info, &key);
if (IS_ERR(root)) {
btrfs_err(info, "could not find root %llu",
sk->tree_id);
btrfs_free_path(path);
return -ENOENT;
}
@@ -2476,6 +2472,8 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
trans->block_rsv = &block_rsv;
trans->bytes_reserved = block_rsv.size;
btrfs_record_snapshot_destroy(trans, dir);
ret = btrfs_unlink_subvol(trans, root, dir,
dest->root_key.objectid,
dentry->d_name.name,
@@ -2960,8 +2958,8 @@ static int btrfs_cmp_data_prepare(struct inode *src, u64 loff,
* of the array is bounded by len, which is in turn bounded by
* BTRFS_MAX_DEDUPE_LEN.
*/
src_pgarr = kzalloc(num_pages * sizeof(struct page *), GFP_NOFS);
dst_pgarr = kzalloc(num_pages * sizeof(struct page *), GFP_NOFS);
src_pgarr = kcalloc(num_pages, sizeof(struct page *), GFP_KERNEL);
dst_pgarr = kcalloc(num_pages, sizeof(struct page *), GFP_KERNEL);
if (!src_pgarr || !dst_pgarr) {
kfree(src_pgarr);
kfree(dst_pgarr);
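kcalloc() checks the num_pages * sizeof(struct page *) multiplication for overflow and returns NULL instead of silently allocating a short buffer, which the open-coded kzalloc(num_pages * sizeof(...)) cannot do; the GFP_KERNEL switch follows the same no-deadlock-risk reasoning as in file.c. A minimal sketch:

    #include <linux/slab.h>

    struct page;

    static struct page **alloc_page_array(unsigned long num_pages)
    {
            /* NULL on multiplication overflow, never a short buffer */
            return kcalloc(num_pages, sizeof(struct page *), GFP_KERNEL);
    }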
@@ -3066,6 +3064,9 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
inode_lock(src);
ret = extent_same_check_offsets(src, loff, &len, olen);
if (ret)
goto out_unlock;
ret = extent_same_check_offsets(src, dst_loff, &len, olen);
if (ret)
goto out_unlock;
@@ -3217,7 +3218,7 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
inode_inc_iversion(inode);
if (!no_time_update)
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
inode->i_mtime = inode->i_ctime = current_fs_time(inode->i_sb);
/*
* We round up to the block size at eof when determining which
* extents to clone above, but shouldn't round up the file size.
@@ -3889,8 +3890,9 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src,
* Truncate page cache pages so that future reads will see the cloned
* data immediately and not the previous data.
*/
truncate_inode_pages_range(&inode->i_data, destoff,
PAGE_CACHE_ALIGN(destoff + len) - 1);
truncate_inode_pages_range(&inode->i_data,
round_down(destoff, PAGE_CACHE_SIZE),
round_up(destoff + len, PAGE_CACHE_SIZE) - 1);
out_unlock:
if (!same_inode)
btrfs_double_inode_unlock(src, inode);
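PAGE_CACHE_ALIGN() only rounded the end of the truncation window up, leaving its start at the possibly unaligned destoff; rounding both ends means every page cache page overlapping the cloned range is dropped. A worked user-space example with assumed 4K pages:

    #include <stdio.h>

    #define PS 4096ULL                       /* assumed page size */
    #define RDOWN(x) ((x) & ~(PS - 1))
    #define RUP(x)   (((x) + PS - 1) & ~(PS - 1))

    int main(void)
    {
            unsigned long long destoff = 6144, len = 4096;

            /* old window started mid-page: [6144, 12287] */
            printf("old: [%llu, %llu]\n", destoff, RUP(destoff + len) - 1);
            /* new window covers whole pages: [4096, 12287] */
            printf("new: [%llu, %llu]\n", RDOWN(destoff),
                   RUP(destoff + len) - 1);
            return 0;
    }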
@@ -5031,7 +5033,7 @@ static long _btrfs_ioctl_set_received_subvol(struct file *file,
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_root_item *root_item = &root->root_item;
struct btrfs_trans_handle *trans;
struct timespec ct = CURRENT_TIME;
struct timespec ct = current_fs_time(inode->i_sb);
int ret = 0;
int received_uuid_changed;
@@ -5262,8 +5264,7 @@ out_unlock:
.compat_ro_flags = BTRFS_FEATURE_COMPAT_RO_##suffix, \
.incompat_flags = BTRFS_FEATURE_INCOMPAT_##suffix }
static int btrfs_ioctl_get_supported_features(struct file *file,
void __user *arg)
int btrfs_ioctl_get_supported_features(void __user *arg)
{
static const struct btrfs_ioctl_feature_flags features[3] = {
INIT_FEATURE_FLAGS(SUPP),
@@ -5542,7 +5543,7 @@ long btrfs_ioctl(struct file *file, unsigned int
case BTRFS_IOC_SET_FSLABEL:
return btrfs_ioctl_set_fslabel(file, argp);
case BTRFS_IOC_GET_SUPPORTED_FEATURES:
return btrfs_ioctl_get_supported_features(file, argp);
return btrfs_ioctl_get_supported_features(argp);
case BTRFS_IOC_GET_FEATURES:
return btrfs_ioctl_get_features(file, argp);
case BTRFS_IOC_SET_FEATURES:

Some files were not shown because too many files have changed in this diff.