Merge tag 'xfs-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull xfs and iomap updates from Dave Chinner:
 "The main things in this update are the iomap-based DAX infrastructure,
  an XFS delalloc rework, and a chunk of fixes to how log recovery
  schedules writeback to prevent spurious corruption detections when
  recovery of certain items was not required.

  The other main chunk of code is some preparation for the upcoming
  reflink functionality. Most of it is generic and cleanups that stand
  alone, but they were ready and reviewed so are in this pull request.

  Speaking of reflink, I'm currently planning to send you another pull
  request next week containing all the new reflink functionality. I'm
  working through a similar process to the last cycle, where I sent the
  reverse mapping code in a separate request because of how large it
  was. The reflink code merge is even bigger than reverse mapping, so
  I'll be doing the same thing again....

  Summary for this update:

   - change of XFS mailing list to linux-xfs@vger.kernel.org

   - iomap-based DAX infrastructure w/ XFS and ext2 support

   - small iomap fixes and additions

   - more efficient XFS delayed allocation infrastructure based on iomap

   - a rework of log recovery writeback scheduling to ensure we don't
     fail recovery when trying to replay items that are already on disk

   - some preparation patches for upcoming reflink support

   - configurable error handling fixes and documentation

   - aio access time update race fixes for XFS and
     generic_file_read_iter"

* tag 'xfs-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (40 commits)
  fs: update atime before I/O in generic_file_read_iter
  xfs: update atime before I/O in xfs_file_dio_aio_read
  ext2: fix possible integer truncation in ext2_iomap_begin
  xfs: log recovery tracepoints to track current lsn and buffer submission
  xfs: update metadata LSN in buffers during log recovery
  xfs: don't warn on buffers not being recovered due to LSN
  xfs: pass current lsn to log recovery buffer validation
  xfs: rework log recovery to submit buffers on LSN boundaries
  xfs: quiesce the filesystem after recovery on readonly mount
  xfs: remote attribute blocks aren't really userdata
  ext2: use iomap to implement DAX
  ext2: stop passing buffer_head to ext2_get_blocks
  xfs: use iomap to implement DAX
  xfs: refactor xfs_setfilesize
  xfs: take the ilock shared if possible in xfs_file_iomap_begin
  xfs: fix locking for DAX writes
  dax: provide an iomap based fault handler
  dax: provide an iomap based dax read/write path
  dax: don't pass buffer_head to copy_user_dax
  dax: don't pass buffer_head to dax_insert_mapping
  ...
This commit is contained in:
Linus Torvalds
2016-10-06 08:18:10 -07:00
49 changed files with 1949 additions and 721 deletions
+123
View File
@@ -348,3 +348,126 @@ Removed Sysctls
---- -------
fs.xfs.xfsbufd_centisec v4.0
fs.xfs.age_buffer_centisecs v4.0
Error handling
==============
XFS can act differently according to the type of error found during its
operation. The implementation introduces the following concepts to the error
handler:
-failure speed:
Defines how fast XFS should propagate an error upwards when a specific
error is found during the filesystem operation. It can propagate
immediately, after a defined number of retries, after a set time period,
or simply retry forever.
-error classes:
Specifies the subsystem the error configuration will apply to, such as
metadata IO or memory allocation. Different subsystems will have
different error handlers for which behaviour can be configured.
-error handlers:
Defines the behavior for a specific error.
The filesystem behavior during an error can be set via sysfs files. Each
error handler works independently - the first condition met by an error handler
for a specific class will cause the error to be propagated rather than reset and
retried.
The action taken by the filesystem when the error is propagated is context
dependent - it may cause a shut down in the case of an unrecoverable error,
it may be reported back to userspace, or it may even be ignored because
there's nothing useful we can with the error or anyone we can report it to (e.g.
during unmount).
The configuration files are organized into the following hierarchy for each
mounted filesystem:
/sys/fs/xfs/<dev>/error/<class>/<error>/
Where:
<dev>
The short device name of the mounted filesystem. This is the same device
name that shows up in XFS kernel error messages as "XFS(<dev>): ..."
<class>
The subsystem the error configuration belongs to. As of 4.9, the defined
classes are:
- "metadata": applies metadata buffer write IO
<error>
The individual error handler configurations.
Each filesystem has "global" error configuration options defined in their top
level directory:
/sys/fs/xfs/<dev>/error/
fail_at_unmount (Min: 0 Default: 1 Max: 1)
Defines the filesystem error behavior at unmount time.
If set to a value of 1, XFS will override all other error configurations
during unmount and replace them with "immediate fail" characteristics.
i.e. no retries, no retry timeout. This will always allow unmount to
succeed when there are persistent errors present.
If set to 0, the configured retry behaviour will continue until all
retries and/or timeouts have been exhausted. This will delay unmount
completion when there are persistent errors, and it may prevent the
filesystem from ever unmounting fully in the case of "retry forever"
handler configurations.
Note: there is no guarantee that fail_at_unmount can be set whilst an
unmount is in progress. It is possible that the sysfs entries are
removed by the unmounting filesystem before a "retry forever" error
handler configuration causes unmount to hang, and hence the filesystem
must be configured appropriately before unmount begins to prevent
unmount hangs.
Each filesystem has specific error class handlers that define the error
propagation behaviour for specific errors. There is also a "default" error
handler defined, which defines the behaviour for all errors that don't have
specific handlers defined. Where multiple retry constraints are configuredi for
a single error, the first retry configuration that expires will cause the error
to be propagated. The handler configurations are found in the directory:
/sys/fs/xfs/<dev>/error/<class>/<error>/
max_retries (Min: -1 Default: Varies Max: INTMAX)
Defines the allowed number of retries of a specific error before
the filesystem will propagate the error. The retry count for a given
error context (e.g. a specific metadata buffer) is reset every time
there is a successful completion of the operation.
Setting the value to "-1" will cause XFS to retry forever for this
specific error.
Setting the value to "0" will cause XFS to fail immediately when the
specific error is reported.
Setting the value to "N" (where 0 < N < Max) will make XFS retry the
operation "N" times before propagating the error.
retry_timeout_seconds (Min: -1 Default: Varies Max: 1 day)
Define the amount of time (in seconds) that the filesystem is
allowed to retry its operations when the specific error is
found.
Setting the value to "-1" will allow XFS to retry forever for this
specific error.
Setting the value to "0" will cause XFS to fail immediately when the
specific error is reported.
Setting the value to "N" (where 0 < N < Max) will allow XFS to retry the
operation for up to "N" seconds before propagating the error.
Note: The default behaviour for a specific error handler is dependent on both
the class and error context. For example, the default values for
"metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults
to "fail immediately" behaviour. This is done because ENODEV is a fatal,
unrecoverable error no matter how many times the metadata IO is retried.
+3 -4
View File
@@ -13099,11 +13099,10 @@ F: arch/x86/xen/*swiotlb*
F: drivers/xen/*swiotlb*
XFS FILESYSTEM
P: Silicon Graphics Inc
M: Dave Chinner <david@fromorbit.com>
M: xfs@oss.sgi.com
L: xfs@oss.sgi.com
W: http://oss.sgi.com/projects/xfs
M: linux-xfs@vger.kernel.org
L: linux-xfs@vger.kernel.org
W: http://xfs.org/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git
S: Supported
F: Documentation/filesystems/xfs.txt
+240 -12
View File
@@ -31,6 +31,8 @@
#include <linux/vmstat.h>
#include <linux/pfn_t.h>
#include <linux/sizes.h>
#include <linux/iomap.h>
#include "internal.h"
/*
* We use lowest available bit in exceptional entry for locking, other two
@@ -580,14 +582,13 @@ static int dax_load_hole(struct address_space *mapping, void *entry,
return VM_FAULT_LOCKED;
}
static int copy_user_bh(struct page *to, struct inode *inode,
struct buffer_head *bh, unsigned long vaddr)
static int copy_user_dax(struct block_device *bdev, sector_t sector, size_t size,
struct page *to, unsigned long vaddr)
{
struct blk_dax_ctl dax = {
.sector = to_sector(bh, inode),
.size = bh->b_size,
.sector = sector,
.size = size,
};
struct block_device *bdev = bh->b_bdev;
void *vto;
if (dax_map_atomic(bdev, &dax) < 0)
@@ -790,14 +791,13 @@ int dax_writeback_mapping_range(struct address_space *mapping,
EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
static int dax_insert_mapping(struct address_space *mapping,
struct buffer_head *bh, void **entryp,
struct vm_area_struct *vma, struct vm_fault *vmf)
struct block_device *bdev, sector_t sector, size_t size,
void **entryp, struct vm_area_struct *vma, struct vm_fault *vmf)
{
unsigned long vaddr = (unsigned long)vmf->virtual_address;
struct block_device *bdev = bh->b_bdev;
struct blk_dax_ctl dax = {
.sector = to_sector(bh, mapping->host),
.size = bh->b_size,
.sector = sector,
.size = size,
};
void *ret;
void *entry = *entryp;
@@ -868,7 +868,8 @@ int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
if (vmf->cow_page) {
struct page *new_page = vmf->cow_page;
if (buffer_written(&bh))
error = copy_user_bh(new_page, inode, &bh, vaddr);
error = copy_user_dax(bh.b_bdev, to_sector(&bh, inode),
bh.b_size, new_page, vaddr);
else
clear_user_highpage(new_page, vaddr);
if (error)
@@ -898,7 +899,8 @@ int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
/* Filesystem should not return unwritten buffers to us! */
WARN_ON_ONCE(buffer_unwritten(&bh) || buffer_new(&bh));
error = dax_insert_mapping(mapping, &bh, &entry, vma, vmf);
error = dax_insert_mapping(mapping, bh.b_bdev, to_sector(&bh, inode),
bh.b_size, &entry, vma, vmf);
unlock_entry:
put_locked_mapping_entry(mapping, vmf->pgoff, entry);
out:
@@ -1241,3 +1243,229 @@ int dax_truncate_page(struct inode *inode, loff_t from, get_block_t get_block)
return dax_zero_page_range(inode, from, length, get_block);
}
EXPORT_SYMBOL_GPL(dax_truncate_page);
#ifdef CONFIG_FS_IOMAP
static loff_t
iomap_dax_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct iomap *iomap)
{
struct iov_iter *iter = data;
loff_t end = pos + length, done = 0;
ssize_t ret = 0;
if (iov_iter_rw(iter) == READ) {
end = min(end, i_size_read(inode));
if (pos >= end)
return 0;
if (iomap->type == IOMAP_HOLE || iomap->type == IOMAP_UNWRITTEN)
return iov_iter_zero(min(length, end - pos), iter);
}
if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED))
return -EIO;
while (pos < end) {
unsigned offset = pos & (PAGE_SIZE - 1);
struct blk_dax_ctl dax = { 0 };
ssize_t map_len;
dax.sector = iomap->blkno +
(((pos & PAGE_MASK) - iomap->offset) >> 9);
dax.size = (length + offset + PAGE_SIZE - 1) & PAGE_MASK;
map_len = dax_map_atomic(iomap->bdev, &dax);
if (map_len < 0) {
ret = map_len;
break;
}
dax.addr += offset;
map_len -= offset;
if (map_len > end - pos)
map_len = end - pos;
if (iov_iter_rw(iter) == WRITE)
map_len = copy_from_iter_pmem(dax.addr, map_len, iter);
else
map_len = copy_to_iter(dax.addr, map_len, iter);
dax_unmap_atomic(iomap->bdev, &dax);
if (map_len <= 0) {
ret = map_len ? map_len : -EFAULT;
break;
}
pos += map_len;
length -= map_len;
done += map_len;
}
return done ? done : ret;
}
/**
* iomap_dax_rw - Perform I/O to a DAX file
* @iocb: The control block for this I/O
* @iter: The addresses to do I/O from or to
* @ops: iomap ops passed from the file system
*
* This function performs read and write operations to directly mapped
* persistent memory. The callers needs to take care of read/write exclusion
* and evicting any page cache pages in the region under I/O.
*/
ssize_t
iomap_dax_rw(struct kiocb *iocb, struct iov_iter *iter,
struct iomap_ops *ops)
{
struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = mapping->host;
loff_t pos = iocb->ki_pos, ret = 0, done = 0;
unsigned flags = 0;
if (iov_iter_rw(iter) == WRITE)
flags |= IOMAP_WRITE;
/*
* Yes, even DAX files can have page cache attached to them: A zeroed
* page is inserted into the pagecache when we have to serve a write
* fault on a hole. It should never be dirtied and can simply be
* dropped from the pagecache once we get real data for the page.
*
* XXX: This is racy against mmap, and there's nothing we can do about
* it. We'll eventually need to shift this down even further so that
* we can check if we allocated blocks over a hole first.
*/
if (mapping->nrpages) {
ret = invalidate_inode_pages2_range(mapping,
pos >> PAGE_SHIFT,
(pos + iov_iter_count(iter) - 1) >> PAGE_SHIFT);
WARN_ON_ONCE(ret);
}
while (iov_iter_count(iter)) {
ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops,
iter, iomap_dax_actor);
if (ret <= 0)
break;
pos += ret;
done += ret;
}
iocb->ki_pos += done;
return done ? done : ret;
}
EXPORT_SYMBOL_GPL(iomap_dax_rw);
/**
* iomap_dax_fault - handle a page fault on a DAX file
* @vma: The virtual memory area where the fault occurred
* @vmf: The description of the fault
* @ops: iomap ops passed from the file system
*
* When a page fault occurs, filesystems may call this helper in their fault
* or mkwrite handler for DAX files. Assumes the caller has done all the
* necessary locking for the page fault to proceed successfully.
*/
int iomap_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
struct iomap_ops *ops)
{
struct address_space *mapping = vma->vm_file->f_mapping;
struct inode *inode = mapping->host;
unsigned long vaddr = (unsigned long)vmf->virtual_address;
loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT;
sector_t sector;
struct iomap iomap = { 0 };
unsigned flags = 0;
int error, major = 0;
void *entry;
/*
* Check whether offset isn't beyond end of file now. Caller is supposed
* to hold locks serializing us with truncate / punch hole so this is
* a reliable test.
*/
if (pos >= i_size_read(inode))
return VM_FAULT_SIGBUS;
entry = grab_mapping_entry(mapping, vmf->pgoff);
if (IS_ERR(entry)) {
error = PTR_ERR(entry);
goto out;
}
if ((vmf->flags & FAULT_FLAG_WRITE) && !vmf->cow_page)
flags |= IOMAP_WRITE;
/*
* Note that we don't bother to use iomap_apply here: DAX required
* the file system block size to be equal the page size, which means
* that we never have to deal with more than a single extent here.
*/
error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap);
if (error)
goto unlock_entry;
if (WARN_ON_ONCE(iomap.offset + iomap.length < pos + PAGE_SIZE)) {
error = -EIO; /* fs corruption? */
goto unlock_entry;
}
sector = iomap.blkno + (((pos & PAGE_MASK) - iomap.offset) >> 9);
if (vmf->cow_page) {
switch (iomap.type) {
case IOMAP_HOLE:
case IOMAP_UNWRITTEN:
clear_user_highpage(vmf->cow_page, vaddr);
break;
case IOMAP_MAPPED:
error = copy_user_dax(iomap.bdev, sector, PAGE_SIZE,
vmf->cow_page, vaddr);
break;
default:
WARN_ON_ONCE(1);
error = -EIO;
break;
}
if (error)
goto unlock_entry;
if (!radix_tree_exceptional_entry(entry)) {
vmf->page = entry;
return VM_FAULT_LOCKED;
}
vmf->entry = entry;
return VM_FAULT_DAX_LOCKED;
}
switch (iomap.type) {
case IOMAP_MAPPED:
if (iomap.flags & IOMAP_F_NEW) {
count_vm_event(PGMAJFAULT);
mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
major = VM_FAULT_MAJOR;
}
error = dax_insert_mapping(mapping, iomap.bdev, sector,
PAGE_SIZE, &entry, vma, vmf);
break;
case IOMAP_UNWRITTEN:
case IOMAP_HOLE:
if (!(vmf->flags & FAULT_FLAG_WRITE))
return dax_load_hole(mapping, entry, vmf);
/*FALLTHRU*/
default:
WARN_ON_ONCE(1);
error = -EIO;
break;
}
unlock_entry:
put_locked_mapping_entry(mapping, vmf->pgoff, entry);
out:
if (error == -ENOMEM)
return VM_FAULT_OOM | major;
/* -EBUSY is fine, somebody else faulted on the same PTE */
if (error < 0 && error != -EBUSY)
return VM_FAULT_SIGBUS | major;
return VM_FAULT_NOPAGE | major;
}
EXPORT_SYMBOL_GPL(iomap_dax_fault);
#endif /* CONFIG_FS_IOMAP */
+1
View File
@@ -1,5 +1,6 @@
config EXT2_FS
tristate "Second extended fs support"
select FS_IOMAP if FS_DAX
help
Ext2 is a standard Linux file system for hard disks.
+1
View File
@@ -814,6 +814,7 @@ extern const struct file_operations ext2_file_operations;
/* inode.c */
extern const struct address_space_operations ext2_aops;
extern const struct address_space_operations ext2_nobh_aops;
extern struct iomap_ops ext2_iomap_ops;
/* namei.c */
extern const struct inode_operations ext2_dir_inode_operations;
+69 -7
View File
@@ -22,11 +22,59 @@
#include <linux/pagemap.h>
#include <linux/dax.h>
#include <linux/quotaops.h>
#include <linux/iomap.h>
#include <linux/uio.h>
#include "ext2.h"
#include "xattr.h"
#include "acl.h"
#ifdef CONFIG_FS_DAX
static ssize_t ext2_dax_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
struct inode *inode = iocb->ki_filp->f_mapping->host;
ssize_t ret;
if (!iov_iter_count(to))
return 0; /* skip atime */
inode_lock_shared(inode);
ret = iomap_dax_rw(iocb, to, &ext2_iomap_ops);
inode_unlock_shared(inode);
file_accessed(iocb->ki_filp);
return ret;
}
static ssize_t ext2_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
struct inode *inode = file->f_mapping->host;
ssize_t ret;
inode_lock(inode);
ret = generic_write_checks(iocb, from);
if (ret <= 0)
goto out_unlock;
ret = file_remove_privs(file);
if (ret)
goto out_unlock;
ret = file_update_time(file);
if (ret)
goto out_unlock;
ret = iomap_dax_rw(iocb, from, &ext2_iomap_ops);
if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
i_size_write(inode, iocb->ki_pos);
mark_inode_dirty(inode);
}
out_unlock:
inode_unlock(inode);
if (ret > 0)
ret = generic_write_sync(iocb, ret);
return ret;
}
/*
* The lock ordering for ext2 DAX fault paths is:
*
@@ -51,7 +99,7 @@ static int ext2_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
}
down_read(&ei->dax_sem);
ret = dax_fault(vma, vmf, ext2_get_block);
ret = iomap_dax_fault(vma, vmf, &ext2_iomap_ops);
up_read(&ei->dax_sem);
if (vmf->flags & FAULT_FLAG_WRITE)
@@ -156,14 +204,28 @@ int ext2_fsync(struct file *file, loff_t start, loff_t end, int datasync)
return ret;
}
/*
* We have mostly NULL's here: the current defaults are ok for
* the ext2 filesystem.
*/
static ssize_t ext2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
#ifdef CONFIG_FS_DAX
if (IS_DAX(iocb->ki_filp->f_mapping->host))
return ext2_dax_read_iter(iocb, to);
#endif
return generic_file_read_iter(iocb, to);
}
static ssize_t ext2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
#ifdef CONFIG_FS_DAX
if (IS_DAX(iocb->ki_filp->f_mapping->host))
return ext2_dax_write_iter(iocb, from);
#endif
return generic_file_write_iter(iocb, from);
}
const struct file_operations ext2_file_operations = {
.llseek = generic_file_llseek,
.read_iter = generic_file_read_iter,
.write_iter = generic_file_write_iter,
.read_iter = ext2_file_read_iter,
.write_iter = ext2_file_write_iter,
.unlocked_ioctl = ext2_ioctl,
#ifdef CONFIG_COMPAT
.compat_ioctl = ext2_compat_ioctl,
+82 -20
View File
@@ -32,6 +32,7 @@
#include <linux/buffer_head.h>
#include <linux/mpage.h>
#include <linux/fiemap.h>
#include <linux/iomap.h>
#include <linux/namei.h>
#include <linux/uio.h>
#include "ext2.h"
@@ -618,7 +619,7 @@ static void ext2_splice_branch(struct inode *inode,
*/
static int ext2_get_blocks(struct inode *inode,
sector_t iblock, unsigned long maxblocks,
struct buffer_head *bh_result,
u32 *bno, bool *new, bool *boundary,
int create)
{
int err = -EIO;
@@ -644,7 +645,6 @@ static int ext2_get_blocks(struct inode *inode,
/* Simplest case - block found, no allocation needed */
if (!partial) {
first_block = le32_to_cpu(chain[depth - 1].key);
clear_buffer_new(bh_result); /* What's this do? */
count++;
/*map more blocks*/
while (count < maxblocks && count <= blocks_to_boundary) {
@@ -699,7 +699,6 @@ static int ext2_get_blocks(struct inode *inode,
mutex_unlock(&ei->truncate_mutex);
if (err)
goto cleanup;
clear_buffer_new(bh_result);
goto got_it;
}
}
@@ -755,15 +754,16 @@ static int ext2_get_blocks(struct inode *inode,
mutex_unlock(&ei->truncate_mutex);
goto cleanup;
}
} else
set_buffer_new(bh_result);
} else {
*new = true;
}
ext2_splice_branch(inode, iblock, partial, indirect_blks, count);
mutex_unlock(&ei->truncate_mutex);
got_it:
map_bh(bh_result, inode->i_sb, le32_to_cpu(chain[depth-1].key));
*bno = le32_to_cpu(chain[depth-1].key);
if (count > blocks_to_boundary)
set_buffer_boundary(bh_result);
*boundary = true;
err = count;
/* Clean up and exit */
partial = chain + depth - 1; /* the whole chain */
@@ -775,19 +775,82 @@ cleanup:
return err;
}
int ext2_get_block(struct inode *inode, sector_t iblock, struct buffer_head *bh_result, int create)
int ext2_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
unsigned max_blocks = bh_result->b_size >> inode->i_blkbits;
int ret = ext2_get_blocks(inode, iblock, max_blocks,
bh_result, create);
if (ret > 0) {
bh_result->b_size = (ret << inode->i_blkbits);
ret = 0;
}
return ret;
bool new = false, boundary = false;
u32 bno;
int ret;
ret = ext2_get_blocks(inode, iblock, max_blocks, &bno, &new, &boundary,
create);
if (ret <= 0)
return ret;
map_bh(bh_result, inode->i_sb, bno);
bh_result->b_size = (ret << inode->i_blkbits);
if (new)
set_buffer_new(bh_result);
if (boundary)
set_buffer_boundary(bh_result);
return 0;
}
#ifdef CONFIG_FS_DAX
static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
unsigned flags, struct iomap *iomap)
{
unsigned int blkbits = inode->i_blkbits;
unsigned long first_block = offset >> blkbits;
unsigned long max_blocks = (length + (1 << blkbits) - 1) >> blkbits;
bool new = false, boundary = false;
u32 bno;
int ret;
ret = ext2_get_blocks(inode, first_block, max_blocks,
&bno, &new, &boundary, flags & IOMAP_WRITE);
if (ret < 0)
return ret;
iomap->flags = 0;
iomap->bdev = inode->i_sb->s_bdev;
iomap->offset = (u64)first_block << blkbits;
if (ret == 0) {
iomap->type = IOMAP_HOLE;
iomap->blkno = IOMAP_NULL_BLOCK;
iomap->length = 1 << blkbits;
} else {
iomap->type = IOMAP_MAPPED;
iomap->blkno = (sector_t)bno << (blkbits - 9);
iomap->length = (u64)ret << blkbits;
iomap->flags |= IOMAP_F_MERGED;
}
if (new)
iomap->flags |= IOMAP_F_NEW;
return 0;
}
static int
ext2_iomap_end(struct inode *inode, loff_t offset, loff_t length,
ssize_t written, unsigned flags, struct iomap *iomap)
{
if (iomap->type == IOMAP_MAPPED &&
written < length &&
(flags & IOMAP_WRITE))
ext2_write_failed(inode->i_mapping, offset + length);
return 0;
}
struct iomap_ops ext2_iomap_ops = {
.iomap_begin = ext2_iomap_begin,
.iomap_end = ext2_iomap_end,
};
#endif /* CONFIG_FS_DAX */
int ext2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len)
{
@@ -873,11 +936,10 @@ ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
loff_t offset = iocb->ki_pos;
ssize_t ret;
if (IS_DAX(inode))
ret = dax_do_io(iocb, inode, iter, ext2_get_block, NULL,
DIO_LOCKING);
else
ret = blockdev_direct_IO(iocb, inode, iter, ext2_get_block);
if (WARN_ON_ONCE(IS_DAX(inode)))
return -EIO;
ret = blockdev_direct_IO(iocb, inode, iter, ext2_get_block);
if (ret < 0 && iov_iter_rw(iter) == WRITE)
ext2_write_failed(mapping, offset + count);
return ret;
+11
View File
@@ -12,6 +12,7 @@
struct super_block;
struct file_system_type;
struct iomap;
struct iomap_ops;
struct linux_binprm;
struct path;
struct mount;
@@ -164,3 +165,13 @@ extern struct dentry_operations ns_dentry_operations;
extern int do_vfs_ioctl(struct file *file, unsigned int fd, unsigned int cmd,
unsigned long arg);
extern long vfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
/*
* iomap support:
*/
typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
void *data, struct iomap *iomap);
loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
unsigned flags, struct iomap_ops *ops, void *data,
iomap_actor_t actor);
+85 -4
View File
@@ -27,9 +27,6 @@
#include <linux/dax.h>
#include "internal.h"
typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
void *data, struct iomap *iomap);
/*
* Execute a iomap write on a segment of the mapping that spans a
* contiguous range of pages that have identical block mapping state.
@@ -41,7 +38,7 @@ typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len,
* resources they require in the iomap_begin call, and release them in the
* iomap_end call.
*/
static loff_t
loff_t
iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
struct iomap_ops *ops, void *data, iomap_actor_t actor)
{
@@ -252,6 +249,88 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
}
EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
static struct page *
__iomap_read_page(struct inode *inode, loff_t offset)
{
struct address_space *mapping = inode->i_mapping;
struct page *page;
page = read_mapping_page(mapping, offset >> PAGE_SHIFT, NULL);
if (IS_ERR(page))
return page;
if (!PageUptodate(page)) {
put_page(page);
return ERR_PTR(-EIO);
}
return page;
}
static loff_t
iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct iomap *iomap)
{
long status = 0;
ssize_t written = 0;
do {
struct page *page, *rpage;
unsigned long offset; /* Offset into pagecache page */
unsigned long bytes; /* Bytes to write to page */
offset = (pos & (PAGE_SIZE - 1));
bytes = min_t(unsigned long, PAGE_SIZE - offset, length);
rpage = __iomap_read_page(inode, pos);
if (IS_ERR(rpage))
return PTR_ERR(rpage);
status = iomap_write_begin(inode, pos, bytes,
AOP_FLAG_NOFS | AOP_FLAG_UNINTERRUPTIBLE,
&page, iomap);
put_page(rpage);
if (unlikely(status))
return status;
WARN_ON_ONCE(!PageUptodate(page));
status = iomap_write_end(inode, pos, bytes, bytes, page);
if (unlikely(status <= 0)) {
if (WARN_ON_ONCE(status == 0))
return -EIO;
return status;
}
cond_resched();
pos += status;
written += status;
length -= status;
balance_dirty_pages_ratelimited(inode->i_mapping);
} while (length);
return written;
}
int
iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
struct iomap_ops *ops)
{
loff_t ret;
while (len) {
ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
iomap_dirty_actor);
if (ret <= 0)
return ret;
pos += ret;
len -= ret;
}
return 0;
}
EXPORT_SYMBOL_GPL(iomap_file_dirty);
static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
unsigned bytes, struct iomap *iomap)
{
@@ -430,6 +509,8 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
if (iomap->flags & IOMAP_F_MERGED)
flags |= FIEMAP_EXTENT_MERGED;
if (iomap->flags & IOMAP_F_SHARED)
flags |= FIEMAP_EXTENT_SHARED;
return fiemap_fill_next_extent(fi, iomap->offset,
iomap->blkno != IOMAP_NULL_BLOCK ? iomap->blkno << 9: 0,
+1
View File
@@ -52,6 +52,7 @@ xfs-y += $(addprefix libxfs/, \
xfs_inode_fork.o \
xfs_inode_buf.o \
xfs_log_rlimit.o \
xfs_ag_resv.o \
xfs_rmap.o \
xfs_rmap_btree.o \
xfs_sb.o \
+325
View File
@@ -0,0 +1,325 @@
/*
* Copyright (C) 2016 Oracle. All Rights Reserved.
*
* Author: Darrick J. Wong <darrick.wong@oracle.com>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it would be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
*/
#include "xfs.h"
#include "xfs_fs.h"
#include "xfs_shared.h"
#include "xfs_format.h"
#include "xfs_log_format.h"
#include "xfs_trans_resv.h"
#include "xfs_sb.h"
#include "xfs_mount.h"
#include "xfs_defer.h"
#include "xfs_alloc.h"
#include "xfs_error.h"
#include "xfs_trace.h"
#include "xfs_cksum.h"
#include "xfs_trans.h"
#include "xfs_bit.h"
#include "xfs_bmap.h"
#include "xfs_bmap_btree.h"
#include "xfs_ag_resv.h"
#include "xfs_trans_space.h"
#include "xfs_rmap_btree.h"
#include "xfs_btree.h"
/*
* Per-AG Block Reservations
*
* For some kinds of allocation group metadata structures, it is advantageous
* to reserve a small number of blocks in each AG so that future expansions of
* that data structure do not encounter ENOSPC because errors during a btree
* split cause the filesystem to go offline.
*
* Prior to the introduction of reflink, this wasn't an issue because the free
* space btrees maintain a reserve of space (the AGFL) to handle any expansion
* that may be necessary; and allocations of other metadata (inodes, BMBT,
* dir/attr) aren't restricted to a single AG. However, with reflink it is
* possible to allocate all the space in an AG, have subsequent reflink/CoW
* activity expand the refcount btree, and discover that there's no space left
* to handle that expansion. Since we can calculate the maximum size of the
* refcount btree, we can reserve space for it and avoid ENOSPC.
*
* Handling per-AG reservations consists of three changes to the allocator's
* behavior: First, because these reservations are always needed, we decrease
* the ag_max_usable counter to reflect the size of the AG after the reserved
* blocks are taken. Second, the reservations must be reflected in the
* fdblocks count to maintain proper accounting. Third, each AG must maintain
* its own reserved block counter so that we can calculate the amount of space
* that must remain free to maintain the reservations. Fourth, the "remaining
* reserved blocks" count must be used when calculating the length of the
* longest free extent in an AG and to clamp maxlen in the per-AG allocation
* functions. In other words, we maintain a virtual allocation via in-core
* accounting tricks so that we don't have to clean up after a crash. :)
*
* Reserved blocks can be managed by passing one of the enum xfs_ag_resv_type
* values via struct xfs_alloc_arg or directly to the xfs_free_extent
* function. It might seem a little funny to maintain a reservoir of blocks
* to feed another reservoir, but the AGFL only holds enough blocks to get
* through the next transaction. The per-AG reservation is to ensure (we
* hope) that each AG never runs out of blocks. Each data structure wanting
* to use the reservation system should update ask/used in xfs_ag_resv_init.
*/
/*
* Are we critically low on blocks? For now we'll define that as the number
* of blocks we can get our hands on being less than 10% of what we reserved
* or less than some arbitrary number (maximum btree height).
*/
bool
xfs_ag_resv_critical(
struct xfs_perag *pag,
enum xfs_ag_resv_type type)
{
xfs_extlen_t avail;
xfs_extlen_t orig;
switch (type) {
case XFS_AG_RESV_METADATA:
avail = pag->pagf_freeblks - pag->pag_agfl_resv.ar_reserved;
orig = pag->pag_meta_resv.ar_asked;
break;
case XFS_AG_RESV_AGFL:
avail = pag->pagf_freeblks + pag->pagf_flcount -
pag->pag_meta_resv.ar_reserved;
orig = pag->pag_agfl_resv.ar_asked;
break;
default:
ASSERT(0);
return false;
}
trace_xfs_ag_resv_critical(pag, type, avail);
/* Critically low if less than 10% or max btree height remains. */
return avail < orig / 10 || avail < XFS_BTREE_MAXLEVELS;
}
/*
* How many blocks are reserved but not used, and therefore must not be
* allocated away?
*/
xfs_extlen_t
xfs_ag_resv_needed(
struct xfs_perag *pag,
enum xfs_ag_resv_type type)
{
xfs_extlen_t len;
len = pag->pag_meta_resv.ar_reserved + pag->pag_agfl_resv.ar_reserved;
switch (type) {
case XFS_AG_RESV_METADATA:
case XFS_AG_RESV_AGFL:
len -= xfs_perag_resv(pag, type)->ar_reserved;
break;
case XFS_AG_RESV_NONE:
/* empty */
break;
default:
ASSERT(0);
}
trace_xfs_ag_resv_needed(pag, type, len);
return len;
}
/* Clean out a reservation */
static int
__xfs_ag_resv_free(
struct xfs_perag *pag,
enum xfs_ag_resv_type type)
{
struct xfs_ag_resv *resv;
xfs_extlen_t oldresv;
int error;
trace_xfs_ag_resv_free(pag, type, 0);
resv = xfs_perag_resv(pag, type);
pag->pag_mount->m_ag_max_usable += resv->ar_asked;
/*
* AGFL blocks are always considered "free", so whatever
* was reserved at mount time must be given back at umount.
*/
if (type == XFS_AG_RESV_AGFL)
oldresv = resv->ar_orig_reserved;
else
oldresv = resv->ar_reserved;
error = xfs_mod_fdblocks(pag->pag_mount, oldresv, true);
resv->ar_reserved = 0;
resv->ar_asked = 0;
if (error)
trace_xfs_ag_resv_free_error(pag->pag_mount, pag->pag_agno,
error, _RET_IP_);
return error;
}
/* Free a per-AG reservation. */
int
xfs_ag_resv_free(
struct xfs_perag *pag)
{
int error;
int err2;
error = __xfs_ag_resv_free(pag, XFS_AG_RESV_AGFL);
err2 = __xfs_ag_resv_free(pag, XFS_AG_RESV_METADATA);
if (err2 && !error)
error = err2;
return error;
}
static int
__xfs_ag_resv_init(
struct xfs_perag *pag,
enum xfs_ag_resv_type type,
xfs_extlen_t ask,
xfs_extlen_t used)
{
struct xfs_mount *mp = pag->pag_mount;
struct xfs_ag_resv *resv;
int error;
resv = xfs_perag_resv(pag, type);
if (used > ask)
ask = used;
resv->ar_asked = ask;
resv->ar_reserved = resv->ar_orig_reserved = ask - used;
mp->m_ag_max_usable -= ask;
trace_xfs_ag_resv_init(pag, type, ask);
error = xfs_mod_fdblocks(mp, -(int64_t)resv->ar_reserved, true);
if (error)
trace_xfs_ag_resv_init_error(pag->pag_mount, pag->pag_agno,
error, _RET_IP_);
return error;
}
/* Create a per-AG block reservation. */
int
xfs_ag_resv_init(
struct xfs_perag *pag)
{
xfs_extlen_t ask;
xfs_extlen_t used;
int error = 0;
/* Create the metadata reservation. */
if (pag->pag_meta_resv.ar_asked == 0) {
ask = used = 0;
error = __xfs_ag_resv_init(pag, XFS_AG_RESV_METADATA,
ask, used);
if (error)
goto out;
}
/* Create the AGFL metadata reservation */
if (pag->pag_agfl_resv.ar_asked == 0) {
ask = used = 0;
error = __xfs_ag_resv_init(pag, XFS_AG_RESV_AGFL, ask, used);
if (error)
goto out;
}
out:
return error;
}
/* Allocate a block from the reservation. */
void
xfs_ag_resv_alloc_extent(
struct xfs_perag *pag,
enum xfs_ag_resv_type type,
struct xfs_alloc_arg *args)
{
struct xfs_ag_resv *resv;
xfs_extlen_t len;
uint field;
trace_xfs_ag_resv_alloc_extent(pag, type, args->len);
switch (type) {
case XFS_AG_RESV_METADATA:
case XFS_AG_RESV_AGFL:
resv = xfs_perag_resv(pag, type);
break;
default:
ASSERT(0);
/* fall through */
case XFS_AG_RESV_NONE:
field = args->wasdel ? XFS_TRANS_SB_RES_FDBLOCKS :
XFS_TRANS_SB_FDBLOCKS;
xfs_trans_mod_sb(args->tp, field, -(int64_t)args->len);
return;
}
len = min_t(xfs_extlen_t, args->len, resv->ar_reserved);
resv->ar_reserved -= len;
if (type == XFS_AG_RESV_AGFL)
return;
/* Allocations of reserved blocks only need on-disk sb updates... */
xfs_trans_mod_sb(args->tp, XFS_TRANS_SB_RES_FDBLOCKS, -(int64_t)len);
/* ...but non-reserved blocks need in-core and on-disk updates. */
if (args->len > len)
xfs_trans_mod_sb(args->tp, XFS_TRANS_SB_FDBLOCKS,
-((int64_t)args->len - len));
}
/* Free a block to the reservation. */
void
xfs_ag_resv_free_extent(
struct xfs_perag *pag,
enum xfs_ag_resv_type type,
struct xfs_trans *tp,
xfs_extlen_t len)
{
xfs_extlen_t leftover;
struct xfs_ag_resv *resv;
trace_xfs_ag_resv_free_extent(pag, type, len);
switch (type) {
case XFS_AG_RESV_METADATA:
case XFS_AG_RESV_AGFL:
resv = xfs_perag_resv(pag, type);
break;
default:
ASSERT(0);
/* fall through */
case XFS_AG_RESV_NONE:
xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, (int64_t)len);
return;
}
leftover = min_t(xfs_extlen_t, len, resv->ar_asked - resv->ar_reserved);
resv->ar_reserved += leftover;
if (type == XFS_AG_RESV_AGFL)
return;
/* Freeing into the reserved pool only requires on-disk update... */
xfs_trans_mod_sb(tp, XFS_TRANS_SB_RES_FDBLOCKS, len);
/* ...but freeing beyond that requires in-core and on-disk update. */
if (len > leftover)
xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, len - leftover);
}
+35
View File
@@ -0,0 +1,35 @@
/*
* Copyright (C) 2016 Oracle. All Rights Reserved.
*
* Author: Darrick J. Wong <darrick.wong@oracle.com>
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it would be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
*/
#ifndef __XFS_AG_RESV_H__
#define __XFS_AG_RESV_H__
int xfs_ag_resv_free(struct xfs_perag *pag);
int xfs_ag_resv_init(struct xfs_perag *pag);
bool xfs_ag_resv_critical(struct xfs_perag *pag, enum xfs_ag_resv_type type);
xfs_extlen_t xfs_ag_resv_needed(struct xfs_perag *pag,
enum xfs_ag_resv_type type);
void xfs_ag_resv_alloc_extent(struct xfs_perag *pag, enum xfs_ag_resv_type type,
struct xfs_alloc_arg *args);
void xfs_ag_resv_free_extent(struct xfs_perag *pag, enum xfs_ag_resv_type type,
struct xfs_trans *tp, xfs_extlen_t len);
#endif /* __XFS_AG_RESV_H__ */
+90 -45
View File
@@ -37,6 +37,7 @@
#include "xfs_trans.h"
#include "xfs_buf_item.h"
#include "xfs_log.h"
#include "xfs_ag_resv.h"
struct workqueue_struct *xfs_alloc_wq;
@@ -74,14 +75,8 @@ xfs_prealloc_blocks(
* extents need to be actually allocated. To get around this, we explicitly set
* aside a few blocks which will not be reserved in delayed allocation.
*
* When rmap is disabled, we need to reserve 4 fsbs _per AG_ for the freelist
* and 4 more to handle a potential split of the file's bmap btree.
*
* When rmap is enabled, we must also be able to handle two rmap btree inserts
* to record both the file data extent and a new bmbt block. The bmbt block
* might not be in the same AG as the file data extent. In the worst case
* the bmap btree splits multiple levels and all the new blocks come from
* different AGs, so set aside enough to handle rmap btree splits in all AGs.
* We need to reserve 4 fsbs _per AG_ for the freelist and 4 more to handle a
* potential split of the file's bmap btree.
*/
unsigned int
xfs_alloc_set_aside(
@@ -90,8 +85,6 @@ xfs_alloc_set_aside(
unsigned int blocks;
blocks = 4 + (mp->m_sb.sb_agcount * XFS_ALLOC_AGFL_RESERVE);
if (xfs_sb_version_hasrmapbt(&mp->m_sb))
blocks += mp->m_sb.sb_agcount * mp->m_rmap_maxlevels;
return blocks;
}
@@ -265,7 +258,7 @@ xfs_alloc_compute_diff(
xfs_agblock_t wantbno, /* target starting block */
xfs_extlen_t wantlen, /* target length */
xfs_extlen_t alignment, /* target alignment */
char userdata, /* are we allocating data? */
int datatype, /* are we allocating data? */
xfs_agblock_t freebno, /* freespace's starting block */
xfs_extlen_t freelen, /* freespace's length */
xfs_agblock_t *newbnop) /* result: best start block from free */
@@ -276,6 +269,7 @@ xfs_alloc_compute_diff(
xfs_extlen_t newlen1=0; /* length with newbno1 */
xfs_extlen_t newlen2=0; /* length with newbno2 */
xfs_agblock_t wantend; /* end of target extent */
bool userdata = xfs_alloc_is_userdata(datatype);
ASSERT(freelen >= wantlen);
freeend = freebno + freelen;
@@ -680,12 +674,29 @@ xfs_alloc_ag_vextent(
xfs_alloc_arg_t *args) /* argument structure for allocation */
{
int error=0;
xfs_extlen_t reservation;
xfs_extlen_t oldmax;
ASSERT(args->minlen > 0);
ASSERT(args->maxlen > 0);
ASSERT(args->minlen <= args->maxlen);
ASSERT(args->mod < args->prod);
ASSERT(args->alignment > 0);
/*
* Clamp maxlen to the amount of free space minus any reservations
* that have been made.
*/
oldmax = args->maxlen;
reservation = xfs_ag_resv_needed(args->pag, args->resv);
if (args->maxlen > args->pag->pagf_freeblks - reservation)
args->maxlen = args->pag->pagf_freeblks - reservation;
if (args->maxlen == 0) {
args->agbno = NULLAGBLOCK;
args->maxlen = oldmax;
return 0;
}
/*
* Branch to correct routine based on the type.
*/
@@ -705,12 +716,14 @@ xfs_alloc_ag_vextent(
/* NOTREACHED */
}
args->maxlen = oldmax;
if (error || args->agbno == NULLAGBLOCK)
return error;
ASSERT(args->len >= args->minlen);
ASSERT(args->len <= args->maxlen);
ASSERT(!args->wasfromfl || !args->isfl);
ASSERT(!args->wasfromfl || args->resv != XFS_AG_RESV_AGFL);
ASSERT(args->agbno % args->alignment == 0);
/* if not file data, insert new block into the reverse map btree */
@@ -732,12 +745,7 @@ xfs_alloc_ag_vextent(
args->agbno, args->len));
}
if (!args->isfl) {
xfs_trans_mod_sb(args->tp, args->wasdel ?
XFS_TRANS_SB_RES_FDBLOCKS :
XFS_TRANS_SB_FDBLOCKS,
-((long)(args->len)));
}
xfs_ag_resv_alloc_extent(args->pag, args->resv, args);
XFS_STATS_INC(args->mp, xs_allocx);
XFS_STATS_ADD(args->mp, xs_allocb, args->len);
@@ -917,7 +925,7 @@ xfs_alloc_find_best_extent(
sdiff = xfs_alloc_compute_diff(args->agbno, args->len,
args->alignment,
args->userdata, *sbnoa,
args->datatype, *sbnoa,
*slena, &new);
/*
@@ -1101,7 +1109,7 @@ restart:
if (args->len < blen)
continue;
ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
args->alignment, args->userdata, ltbnoa,
args->alignment, args->datatype, ltbnoa,
ltlena, &ltnew);
if (ltnew != NULLAGBLOCK &&
(args->len > blen || ltdiff < bdiff)) {
@@ -1254,7 +1262,7 @@ restart:
args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
xfs_alloc_fix_len(args);
ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
args->alignment, args->userdata, ltbnoa,
args->alignment, args->datatype, ltbnoa,
ltlena, &ltnew);
error = xfs_alloc_find_best_extent(args,
@@ -1271,7 +1279,7 @@ restart:
args->len = XFS_EXTLEN_MIN(gtlena, args->maxlen);
xfs_alloc_fix_len(args);
gtdiff = xfs_alloc_compute_diff(args->agbno, args->len,
args->alignment, args->userdata, gtbnoa,
args->alignment, args->datatype, gtbnoa,
gtlena, &gtnew);
error = xfs_alloc_find_best_extent(args,
@@ -1331,7 +1339,7 @@ restart:
}
rlen = args->len;
(void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
args->userdata, ltbnoa, ltlena, &ltnew);
args->datatype, ltbnoa, ltlena, &ltnew);
ASSERT(ltnew >= ltbno);
ASSERT(ltnew + rlen <= ltbnoa + ltlena);
ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
@@ -1583,6 +1591,7 @@ xfs_alloc_ag_vextent_small(
int *stat) /* status: 0-freelist, 1-normal/none */
{
struct xfs_owner_info oinfo;
struct xfs_perag *pag;
int error;
xfs_agblock_t fbno;
xfs_extlen_t flen;
@@ -1600,7 +1609,8 @@ xfs_alloc_ag_vextent_small(
* to respect minleft even when pulling from the
* freelist.
*/
else if (args->minlen == 1 && args->alignment == 1 && !args->isfl &&
else if (args->minlen == 1 && args->alignment == 1 &&
args->resv != XFS_AG_RESV_AGFL &&
(be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_flcount)
> args->minleft)) {
error = xfs_alloc_get_freelist(args->tp, args->agbp, &fbno, 0);
@@ -1608,9 +1618,9 @@ xfs_alloc_ag_vextent_small(
goto error0;
if (fbno != NULLAGBLOCK) {
xfs_extent_busy_reuse(args->mp, args->agno, fbno, 1,
args->userdata);
xfs_alloc_allow_busy_reuse(args->datatype));
if (args->userdata) {
if (xfs_alloc_is_userdata(args->datatype)) {
xfs_buf_t *bp;
bp = xfs_btree_get_bufs(args->mp, args->tp,
@@ -1629,13 +1639,18 @@ xfs_alloc_ag_vextent_small(
/*
* If we're feeding an AGFL block to something that
* doesn't live in the free space, we need to clear
* out the OWN_AG rmap.
* out the OWN_AG rmap and add the block back to
* the AGFL per-AG reservation.
*/
xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_AG);
error = xfs_rmap_free(args->tp, args->agbp, args->agno,
fbno, 1, &oinfo);
if (error)
goto error0;
pag = xfs_perag_get(args->mp, args->agno);
xfs_ag_resv_free_extent(pag, XFS_AG_RESV_AGFL,
args->tp, 1);
xfs_perag_put(pag);
*stat = 0;
return 0;
@@ -1683,7 +1698,7 @@ xfs_free_ag_extent(
xfs_agblock_t bno,
xfs_extlen_t len,
struct xfs_owner_info *oinfo,
int isfl)
enum xfs_ag_resv_type type)
{
xfs_btree_cur_t *bno_cur; /* cursor for by-block btree */
xfs_btree_cur_t *cnt_cur; /* cursor for by-size btree */
@@ -1911,21 +1926,22 @@ xfs_free_ag_extent(
*/
pag = xfs_perag_get(mp, agno);
error = xfs_alloc_update_counters(tp, pag, agbp, len);
xfs_ag_resv_free_extent(pag, type, tp, len);
xfs_perag_put(pag);
if (error)
goto error0;
if (!isfl)
xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, (long)len);
XFS_STATS_INC(mp, xs_freex);
XFS_STATS_ADD(mp, xs_freeb, len);
trace_xfs_free_extent(mp, agno, bno, len, isfl, haveleft, haveright);
trace_xfs_free_extent(mp, agno, bno, len, type == XFS_AG_RESV_AGFL,
haveleft, haveright);
return 0;
error0:
trace_xfs_free_extent(mp, agno, bno, len, isfl, -1, -1);
trace_xfs_free_extent(mp, agno, bno, len, type == XFS_AG_RESV_AGFL,
-1, -1);
if (bno_cur)
xfs_btree_del_cursor(bno_cur, XFS_BTREE_ERROR);
if (cnt_cur)
@@ -1950,21 +1966,43 @@ xfs_alloc_compute_maxlevels(
}
/*
* Find the length of the longest extent in an AG.
* Find the length of the longest extent in an AG. The 'need' parameter
* specifies how much space we're going to need for the AGFL and the
* 'reserved' parameter tells us how many blocks in this AG are reserved for
* other callers.
*/
xfs_extlen_t
xfs_alloc_longest_free_extent(
struct xfs_mount *mp,
struct xfs_perag *pag,
xfs_extlen_t need)
xfs_extlen_t need,
xfs_extlen_t reserved)
{
xfs_extlen_t delta = 0;
/*
* If the AGFL needs a recharge, we'll have to subtract that from the
* longest extent.
*/
if (need > pag->pagf_flcount)
delta = need - pag->pagf_flcount;
/*
* If we cannot maintain others' reservations with space from the
* not-longest freesp extents, we'll have to subtract /that/ from
* the longest extent too.
*/
if (pag->pagf_freeblks - pag->pagf_longest < reserved)
delta += reserved - (pag->pagf_freeblks - pag->pagf_longest);
/*
* If the longest extent is long enough to satisfy all the
* reservations and AGFL rules in place, we can return this extent.
*/
if (pag->pagf_longest > delta)
return pag->pagf_longest - delta;
/* Otherwise, let the caller try for 1 block if there's space. */
return pag->pagf_flcount > 0 || pag->pagf_longest > 0;
}
@@ -2004,20 +2042,24 @@ xfs_alloc_space_available(
{
struct xfs_perag *pag = args->pag;
xfs_extlen_t longest;
xfs_extlen_t reservation; /* blocks that are still reserved */
int available;
if (flags & XFS_ALLOC_FLAG_FREEING)
return true;
reservation = xfs_ag_resv_needed(pag, args->resv);
/* do we have enough contiguous free space for the allocation? */
longest = xfs_alloc_longest_free_extent(args->mp, pag, min_free);
longest = xfs_alloc_longest_free_extent(args->mp, pag, min_free,
reservation);
if ((args->minlen + args->alignment + args->minalignslop - 1) > longest)
return false;
/* do have enough free space remaining for the allocation? */
/* do we have enough free space remaining for the allocation? */
available = (int)(pag->pagf_freeblks + pag->pagf_flcount -
min_free - args->total);
if (available < (int)args->minleft)
reservation - min_free - args->total);
if (available < (int)args->minleft || available <= 0)
return false;
return true;
@@ -2058,7 +2100,7 @@ xfs_alloc_fix_freelist(
* somewhere else if we are not being asked to try harder at this
* point
*/
if (pag->pagf_metadata && args->userdata &&
if (pag->pagf_metadata && xfs_alloc_is_userdata(args->datatype) &&
(flags & XFS_ALLOC_FLAG_TRYLOCK)) {
ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
goto out_agbp_relse;
@@ -2124,7 +2166,7 @@ xfs_alloc_fix_freelist(
if (error)
goto out_agbp_relse;
error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
&targs.oinfo, 1);
&targs.oinfo, XFS_AG_RESV_AGFL);
if (error)
goto out_agbp_relse;
bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
@@ -2135,7 +2177,7 @@ xfs_alloc_fix_freelist(
targs.mp = mp;
targs.agbp = agbp;
targs.agno = args->agno;
targs.alignment = targs.minlen = targs.prod = targs.isfl = 1;
targs.alignment = targs.minlen = targs.prod = 1;
targs.type = XFS_ALLOCTYPE_THIS_AG;
targs.pag = pag;
error = xfs_alloc_read_agfl(mp, tp, targs.agno, &agflbp);
@@ -2146,6 +2188,7 @@ xfs_alloc_fix_freelist(
while (pag->pagf_flcount < need) {
targs.agbno = 0;
targs.maxlen = need - pag->pagf_flcount;
targs.resv = XFS_AG_RESV_AGFL;
/* Allocate as many blocks as possible at once. */
error = xfs_alloc_ag_vextent(&targs);
@@ -2633,7 +2676,7 @@ xfs_alloc_vextent(
* Try near allocation first, then anywhere-in-ag after
* the first a.g. fails.
*/
if ((args->userdata & XFS_ALLOC_INITIAL_USER_DATA) &&
if ((args->datatype & XFS_ALLOC_INITIAL_USER_DATA) &&
(mp->m_flags & XFS_MOUNT_32BITINODES)) {
args->fsbno = XFS_AGB_TO_FSB(mp,
((mp->m_agfrotor / rotorstep) %
@@ -2766,7 +2809,7 @@ xfs_alloc_vextent(
#endif
/* Zero the extent if we were asked to do so */
if (args->userdata & XFS_ALLOC_USERDATA_ZERO) {
if (args->datatype & XFS_ALLOC_USERDATA_ZERO) {
error = xfs_zero_extent(args->ip, args->fsbno, args->len);
if (error)
goto error0;
@@ -2825,7 +2868,8 @@ xfs_free_extent(
struct xfs_trans *tp, /* transaction pointer */
xfs_fsblock_t bno, /* starting block number of extent */
xfs_extlen_t len, /* length of extent */
struct xfs_owner_info *oinfo) /* extent owner */
struct xfs_owner_info *oinfo, /* extent owner */
enum xfs_ag_resv_type type) /* block reservation type */
{
struct xfs_mount *mp = tp->t_mountp;
struct xfs_buf *agbp;
@@ -2834,6 +2878,7 @@ xfs_free_extent(
int error;
ASSERT(len != 0);
ASSERT(type != XFS_AG_RESV_AGFL);
if (XFS_TEST_ERROR(false, mp,
XFS_ERRTAG_FREE_EXTENT,
@@ -2851,7 +2896,7 @@ xfs_free_extent(
agbno + len <= be32_to_cpu(XFS_BUF_TO_AGF(agbp)->agf_length),
err);
error = xfs_free_ag_extent(tp, agbp, agno, agbno, len, oinfo, 0);
error = xfs_free_ag_extent(tp, agbp, agno, agbno, len, oinfo, type);
if (error)
goto err;
+20 -5
View File
@@ -85,20 +85,33 @@ typedef struct xfs_alloc_arg {
xfs_extlen_t len; /* output: actual size of extent */
xfs_alloctype_t type; /* allocation type XFS_ALLOCTYPE_... */
xfs_alloctype_t otype; /* original allocation type */
int datatype; /* mask defining data type treatment */
char wasdel; /* set if allocation was prev delayed */
char wasfromfl; /* set if allocation is from freelist */
char isfl; /* set if is freelist blocks - !acctg */
char userdata; /* mask defining userdata treatment */
xfs_fsblock_t firstblock; /* io first block allocated */
struct xfs_owner_info oinfo; /* owner of blocks being allocated */
enum xfs_ag_resv_type resv; /* block reservation to use */
} xfs_alloc_arg_t;
/*
* Defines for userdata
* Defines for datatype
*/
#define XFS_ALLOC_USERDATA (1 << 0)/* allocation is for user data*/
#define XFS_ALLOC_INITIAL_USER_DATA (1 << 1)/* special case start of file */
#define XFS_ALLOC_USERDATA_ZERO (1 << 2)/* zero extent on allocation */
#define XFS_ALLOC_NOBUSY (1 << 3)/* Busy extents not allowed */
static inline bool
xfs_alloc_is_userdata(int datatype)
{
return (datatype & ~XFS_ALLOC_NOBUSY) != 0;
}
static inline bool
xfs_alloc_allow_busy_reuse(int datatype)
{
return (datatype & XFS_ALLOC_NOBUSY) == 0;
}
/* freespace limit calculations */
#define XFS_ALLOC_AGFL_RESERVE 4
@@ -106,7 +119,8 @@ unsigned int xfs_alloc_set_aside(struct xfs_mount *mp);
unsigned int xfs_alloc_ag_max_usable(struct xfs_mount *mp);
xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
struct xfs_perag *pag, xfs_extlen_t need);
struct xfs_perag *pag, xfs_extlen_t need,
xfs_extlen_t reserved);
unsigned int xfs_alloc_min_freelist(struct xfs_mount *mp,
struct xfs_perag *pag);
@@ -184,7 +198,8 @@ xfs_free_extent(
struct xfs_trans *tp, /* transaction pointer */
xfs_fsblock_t bno, /* starting block number of extent */
xfs_extlen_t len, /* length of extent */
struct xfs_owner_info *oinfo);/* extent owner */
struct xfs_owner_info *oinfo, /* extent owner */
enum xfs_ag_resv_type type); /* block reservation type */
int /* error */
xfs_alloc_lookup_ge(
+32 -104
View File
@@ -47,6 +47,7 @@
#include "xfs_attr_leaf.h"
#include "xfs_filestream.h"
#include "xfs_rmap.h"
#include "xfs_ag_resv.h"
kmem_zone_t *xfs_bmap_free_item_zone;
@@ -1388,7 +1389,7 @@ xfs_bmap_search_multi_extents(
* Else, *lastxp will be set to the index of the found
* entry; *gotp will contain the entry.
*/
STATIC xfs_bmbt_rec_host_t * /* pointer to found extent entry */
xfs_bmbt_rec_host_t * /* pointer to found extent entry */
xfs_bmap_search_extents(
xfs_inode_t *ip, /* incore inode pointer */
xfs_fileoff_t bno, /* block number searched for */
@@ -3347,7 +3348,8 @@ xfs_bmap_adjacent(
mp = ap->ip->i_mount;
nullfb = *ap->firstblock == NULLFSBLOCK;
rt = XFS_IS_REALTIME_INODE(ap->ip) && ap->userdata;
rt = XFS_IS_REALTIME_INODE(ap->ip) &&
xfs_alloc_is_userdata(ap->datatype);
fb_agno = nullfb ? NULLAGNUMBER : XFS_FSB_TO_AGNO(mp, *ap->firstblock);
/*
* If allocating at eof, and there's a previous real block,
@@ -3501,7 +3503,8 @@ xfs_bmap_longest_free_extent(
}
longest = xfs_alloc_longest_free_extent(mp, pag,
xfs_alloc_min_freelist(mp, pag));
xfs_alloc_min_freelist(mp, pag),
xfs_ag_resv_needed(pag, XFS_AG_RESV_NONE));
if (*blen < longest)
*blen = longest;
@@ -3622,7 +3625,7 @@ xfs_bmap_btalloc(
{
xfs_mount_t *mp; /* mount point structure */
xfs_alloctype_t atype = 0; /* type for allocation routines */
xfs_extlen_t align; /* minimum allocation alignment */
xfs_extlen_t align = 0; /* minimum allocation alignment */
xfs_agnumber_t fb_agno; /* ag number of ap->firstblock */
xfs_agnumber_t ag;
xfs_alloc_arg_t args;
@@ -3645,7 +3648,8 @@ xfs_bmap_btalloc(
else if (mp->m_dalign)
stripe_align = mp->m_dalign;
align = ap->userdata ? xfs_get_extsz_hint(ap->ip) : 0;
if (xfs_alloc_is_userdata(ap->datatype))
align = xfs_get_extsz_hint(ap->ip);
if (unlikely(align)) {
error = xfs_bmap_extsize_align(mp, &ap->got, &ap->prev,
align, 0, ap->eof, 0, ap->conv,
@@ -3658,7 +3662,8 @@ xfs_bmap_btalloc(
nullfb = *ap->firstblock == NULLFSBLOCK;
fb_agno = nullfb ? NULLAGNUMBER : XFS_FSB_TO_AGNO(mp, *ap->firstblock);
if (nullfb) {
if (ap->userdata && xfs_inode_is_filestream(ap->ip)) {
if (xfs_alloc_is_userdata(ap->datatype) &&
xfs_inode_is_filestream(ap->ip)) {
ag = xfs_filestream_lookup_ag(ap->ip);
ag = (ag != NULLAGNUMBER) ? ag : 0;
ap->blkno = XFS_AGB_TO_FSB(mp, ag, 0);
@@ -3698,7 +3703,8 @@ xfs_bmap_btalloc(
* enough for the request. If one isn't found, then adjust
* the minimum allocation size to the largest space found.
*/
if (ap->userdata && xfs_inode_is_filestream(ap->ip))
if (xfs_alloc_is_userdata(ap->datatype) &&
xfs_inode_is_filestream(ap->ip))
error = xfs_bmap_btalloc_filestreams(ap, &args, &blen);
else
error = xfs_bmap_btalloc_nullfb(ap, &args, &blen);
@@ -3781,9 +3787,9 @@ xfs_bmap_btalloc(
}
args.minleft = ap->minleft;
args.wasdel = ap->wasdel;
args.isfl = 0;
args.userdata = ap->userdata;
if (ap->userdata & XFS_ALLOC_USERDATA_ZERO)
args.resv = XFS_AG_RESV_NONE;
args.datatype = ap->datatype;
if (ap->datatype & XFS_ALLOC_USERDATA_ZERO)
args.ip = ap->ip;
error = xfs_alloc_vextent(&args);
@@ -3877,7 +3883,8 @@ STATIC int
xfs_bmap_alloc(
struct xfs_bmalloca *ap) /* bmap alloc argument struct */
{
if (XFS_IS_REALTIME_INODE(ap->ip) && ap->userdata)
if (XFS_IS_REALTIME_INODE(ap->ip) &&
xfs_alloc_is_userdata(ap->datatype))
return xfs_bmap_rtalloc(ap);
return xfs_bmap_btalloc(ap);
}
@@ -4074,7 +4081,7 @@ xfs_bmapi_read(
return 0;
}
STATIC int
int
xfs_bmapi_reserve_delalloc(
struct xfs_inode *ip,
xfs_fileoff_t aoff,
@@ -4170,91 +4177,6 @@ out_unreserve_quota:
return error;
}
/*
* Map file blocks to filesystem blocks, adding delayed allocations as needed.
*/
int
xfs_bmapi_delay(
struct xfs_inode *ip, /* incore inode */
xfs_fileoff_t bno, /* starting file offs. mapped */
xfs_filblks_t len, /* length to map in file */
struct xfs_bmbt_irec *mval, /* output: map values */
int *nmap, /* i/o: mval size/count */
int flags) /* XFS_BMAPI_... */
{
struct xfs_mount *mp = ip->i_mount;
struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
struct xfs_bmbt_irec got; /* current file extent record */
struct xfs_bmbt_irec prev; /* previous file extent record */
xfs_fileoff_t obno; /* old block number (offset) */
xfs_fileoff_t end; /* end of mapped file region */
xfs_extnum_t lastx; /* last useful extent number */
int eof; /* we've hit the end of extents */
int n = 0; /* current extent index */
int error = 0;
ASSERT(*nmap >= 1);
ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
ASSERT(!(flags & ~XFS_BMAPI_ENTIRE));
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
if (unlikely(XFS_TEST_ERROR(
(XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_EXTENTS &&
XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE),
mp, XFS_ERRTAG_BMAPIFORMAT, XFS_RANDOM_BMAPIFORMAT))) {
XFS_ERROR_REPORT("xfs_bmapi_delay", XFS_ERRLEVEL_LOW, mp);
return -EFSCORRUPTED;
}
if (XFS_FORCED_SHUTDOWN(mp))
return -EIO;
XFS_STATS_INC(mp, xs_blk_mapw);
if (!(ifp->if_flags & XFS_IFEXTENTS)) {
error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
if (error)
return error;
}
xfs_bmap_search_extents(ip, bno, XFS_DATA_FORK, &eof, &lastx, &got, &prev);
end = bno + len;
obno = bno;
while (bno < end && n < *nmap) {
if (eof || got.br_startoff > bno) {
error = xfs_bmapi_reserve_delalloc(ip, bno, len, &got,
&prev, &lastx, eof);
if (error) {
if (n == 0) {
*nmap = 0;
return error;
}
break;
}
}
/* set up the extent map to return. */
xfs_bmapi_trim_map(mval, &got, &bno, len, obno, end, n, flags);
xfs_bmapi_update_map(&mval, &bno, &len, obno, end, &n, flags);
/* If we're done, stop now. */
if (bno >= end || n >= *nmap)
break;
/* Else go on to the next record. */
prev = got;
if (++lastx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t))
xfs_bmbt_get_all(xfs_iext_get_ext(ifp, lastx), &got);
else
eof = 1;
}
*nmap = n;
return 0;
}
static int
xfs_bmapi_allocate(
struct xfs_bmalloca *bma)
@@ -4287,15 +4209,21 @@ xfs_bmapi_allocate(
}
/*
* Indicate if this is the first user data in the file, or just any
* user data. And if it is userdata, indicate whether it needs to
* be initialised to zero during allocation.
* Set the data type being allocated. For the data fork, the first data
* in the file is treated differently to all other allocations. For the
* attribute fork, we only need to ensure the allocated range is not on
* the busy list.
*/
if (!(bma->flags & XFS_BMAPI_METADATA)) {
bma->userdata = (bma->offset == 0) ?
XFS_ALLOC_INITIAL_USER_DATA : XFS_ALLOC_USERDATA;
bma->datatype = XFS_ALLOC_NOBUSY;
if (whichfork == XFS_DATA_FORK) {
if (bma->offset == 0)
bma->datatype |= XFS_ALLOC_INITIAL_USER_DATA;
else
bma->datatype |= XFS_ALLOC_USERDATA;
}
if (bma->flags & XFS_BMAPI_ZERO)
bma->userdata |= XFS_ALLOC_USERDATA_ZERO;
bma->datatype |= XFS_ALLOC_USERDATA_ZERO;
}
bma->minlen = (bma->flags & XFS_BMAPI_CONTIG) ? bma->length : 1;
@@ -4565,7 +4493,7 @@ xfs_bmapi_write(
bma.tp = tp;
bma.ip = ip;
bma.total = total;
bma.userdata = 0;
bma.datatype = 0;
bma.dfops = dfops;
bma.firstblock = firstblock;
+8 -4
View File
@@ -54,7 +54,7 @@ struct xfs_bmalloca {
bool wasdel; /* replacing a delayed allocation */
bool aeof; /* allocated space at eof */
bool conv; /* overwriting unwritten extents */
char userdata;/* userdata mask */
int datatype;/* data type being allocated */
int flags;
};
@@ -181,9 +181,6 @@ int xfs_bmap_read_extents(struct xfs_trans *tp, struct xfs_inode *ip,
int xfs_bmapi_read(struct xfs_inode *ip, xfs_fileoff_t bno,
xfs_filblks_t len, struct xfs_bmbt_irec *mval,
int *nmap, int flags);
int xfs_bmapi_delay(struct xfs_inode *ip, xfs_fileoff_t bno,
xfs_filblks_t len, struct xfs_bmbt_irec *mval,
int *nmap, int flags);
int xfs_bmapi_write(struct xfs_trans *tp, struct xfs_inode *ip,
xfs_fileoff_t bno, xfs_filblks_t len, int flags,
xfs_fsblock_t *firstblock, xfs_extlen_t total,
@@ -202,5 +199,12 @@ int xfs_bmap_shift_extents(struct xfs_trans *tp, struct xfs_inode *ip,
struct xfs_defer_ops *dfops, enum shift_direction direction,
int num_exts);
int xfs_bmap_split_extent(struct xfs_inode *ip, xfs_fileoff_t split_offset);
struct xfs_bmbt_rec_host *
xfs_bmap_search_extents(struct xfs_inode *ip, xfs_fileoff_t bno,
int fork, int *eofp, xfs_extnum_t *lastxp,
struct xfs_bmbt_irec *gotp, struct xfs_bmbt_irec *prevp);
int xfs_bmapi_reserve_delalloc(struct xfs_inode *ip, xfs_fileoff_t aoff,
xfs_filblks_t len, struct xfs_bmbt_irec *got,
struct xfs_bmbt_irec *prev, xfs_extnum_t *lastx, int eof);
#endif /* __XFS_BMAP_H__ */
+53 -6
View File
@@ -2070,7 +2070,7 @@ __xfs_btree_updkeys(
struct xfs_buf *bp0,
bool force_all)
{
union xfs_btree_bigkey key; /* keys from current level */
union xfs_btree_key key; /* keys from current level */
union xfs_btree_key *lkey; /* keys from the next level up */
union xfs_btree_key *hkey;
union xfs_btree_key *nlkey; /* keys from the next level up */
@@ -2086,7 +2086,7 @@ __xfs_btree_updkeys(
trace_xfs_btree_updkeys(cur, level, bp0);
lkey = (union xfs_btree_key *)&key;
lkey = &key;
hkey = xfs_btree_high_key_from_key(cur, lkey);
xfs_btree_get_keys(cur, block, lkey);
for (level++; level < cur->bc_nlevels; level++) {
@@ -3226,7 +3226,7 @@ xfs_btree_insrec(
struct xfs_buf *bp; /* buffer for block */
union xfs_btree_ptr nptr; /* new block ptr */
struct xfs_btree_cur *ncur; /* new btree cursor */
union xfs_btree_bigkey nkey; /* new block key */
union xfs_btree_key nkey; /* new block key */
union xfs_btree_key *lkey;
int optr; /* old key/record index */
int ptr; /* key/record index */
@@ -3241,7 +3241,7 @@ xfs_btree_insrec(
XFS_BTREE_TRACE_ARGIPR(cur, level, *ptrp, &rec);
ncur = NULL;
lkey = (union xfs_btree_key *)&nkey;
lkey = &nkey;
/*
* If we have an external root pointer, and we've made it to the
@@ -3444,14 +3444,14 @@ xfs_btree_insert(
union xfs_btree_ptr nptr; /* new block number (split result) */
struct xfs_btree_cur *ncur; /* new cursor (split result) */
struct xfs_btree_cur *pcur; /* previous level's cursor */
union xfs_btree_bigkey bkey; /* key of block to insert */
union xfs_btree_key bkey; /* key of block to insert */
union xfs_btree_key *key;
union xfs_btree_rec rec; /* record to insert */
level = 0;
ncur = NULL;
pcur = cur;
key = (union xfs_btree_key *)&bkey;
key = &bkey;
xfs_btree_set_ptr_null(cur, &nptr);
@@ -4797,3 +4797,50 @@ xfs_btree_query_range(
return xfs_btree_overlapped_query_range(cur, &low_key, &high_key,
fn, priv);
}
/*
* Calculate the number of blocks needed to store a given number of records
* in a short-format (per-AG metadata) btree.
*/
xfs_extlen_t
xfs_btree_calc_size(
struct xfs_mount *mp,
uint *limits,
unsigned long long len)
{
int level;
int maxrecs;
xfs_extlen_t rval;
maxrecs = limits[0];
for (level = 0, rval = 0; len > 1; level++) {
len += maxrecs - 1;
do_div(len, maxrecs);
maxrecs = limits[1];
rval += len;
}
return rval;
}
int
xfs_btree_count_blocks_helper(
struct xfs_btree_cur *cur,
int level,
void *data)
{
xfs_extlen_t *blocks = data;
(*blocks)++;
return 0;
}
/* Count the blocks in a btree and return the result in *blocks. */
int
xfs_btree_count_blocks(
struct xfs_btree_cur *cur,
xfs_extlen_t *blocks)
{
*blocks = 0;
return xfs_btree_visit_blocks(cur, xfs_btree_count_blocks_helper,
blocks);
}
+10 -18
View File
@@ -37,30 +37,18 @@ union xfs_btree_ptr {
__be64 l; /* long form ptr */
};
/*
* The in-core btree key. Overlapping btrees actually store two keys
* per pointer, so we reserve enough memory to hold both. The __*bigkey
* items should never be accessed directly.
*/
union xfs_btree_key {
struct xfs_bmbt_key bmbt;
xfs_bmdr_key_t bmbr; /* bmbt root block */
xfs_alloc_key_t alloc;
struct xfs_inobt_key inobt;
struct xfs_rmap_key rmap;
};
/*
* In-core key that holds both low and high keys for overlapped btrees.
* The two keys are packed next to each other on disk, so do the same
* in memory. Preserve the existing xfs_btree_key as a single key to
* avoid the mental model breakage that would happen if we passed a
* bigkey into a function that operates on a single key.
*/
union xfs_btree_bigkey {
struct xfs_bmbt_key bmbt;
xfs_bmdr_key_t bmbr; /* bmbt root block */
xfs_alloc_key_t alloc;
struct xfs_inobt_key inobt;
struct {
struct xfs_rmap_key rmap;
struct xfs_rmap_key rmap_hi;
};
struct xfs_rmap_key __rmap_bigkey[2];
};
union xfs_btree_rec {
@@ -513,6 +501,8 @@ bool xfs_btree_sblock_v5hdr_verify(struct xfs_buf *bp);
bool xfs_btree_sblock_verify(struct xfs_buf *bp, unsigned int max_recs);
uint xfs_btree_compute_maxlevels(struct xfs_mount *mp, uint *limits,
unsigned long len);
xfs_extlen_t xfs_btree_calc_size(struct xfs_mount *mp, uint *limits,
unsigned long long len);
/* return codes */
#define XFS_BTREE_QUERY_RANGE_CONTINUE 0 /* keep iterating */
@@ -529,4 +519,6 @@ typedef int (*xfs_btree_visit_blocks_fn)(struct xfs_btree_cur *cur, int level,
int xfs_btree_visit_blocks(struct xfs_btree_cur *cur,
xfs_btree_visit_blocks_fn fn, void *data);
int xfs_btree_count_blocks(struct xfs_btree_cur *cur, xfs_extlen_t *blocks);
#endif /* __XFS_BTREE_H__ */
+72 -7
View File
@@ -81,6 +81,10 @@
* - For each work item attached to the log intent item,
* * Perform the described action.
* * Attach the work item to the log done item.
* * If the result of doing the work was -EAGAIN, ->finish work
* wants a new transaction. See the "Requesting a Fresh
* Transaction while Finishing Deferred Work" section below for
* details.
*
* The key here is that we must log an intent item for all pending
* work items every time we roll the transaction, and that we must log
@@ -88,6 +92,34 @@
* we can perform complex remapping operations, chaining intent items
* as needed.
*
* Requesting a Fresh Transaction while Finishing Deferred Work
*
* If ->finish_item decides that it needs a fresh transaction to
* finish the work, it must ask its caller (xfs_defer_finish) for a
* continuation. The most likely cause of this circumstance are the
* refcount adjust functions deciding that they've logged enough items
* to be at risk of exceeding the transaction reservation.
*
* To get a fresh transaction, we want to log the existing log done
* item to prevent the log intent item from replaying, immediately log
* a new log intent item with the unfinished work items, roll the
* transaction, and re-call ->finish_item wherever it left off. The
* log done item and the new log intent item must be in the same
* transaction or atomicity cannot be guaranteed; defer_finish ensures
* that this happens.
*
* This requires some coordination between ->finish_item and
* defer_finish. Upon deciding to request a new transaction,
* ->finish_item should update the current work item to reflect the
* unfinished work. Next, it should reset the log done item's list
* count to the number of items finished, and return -EAGAIN.
* defer_finish sees the -EAGAIN, logs the new log intent item
* with the remaining work items, and leaves the xfs_defer_pending
* item at the head of the dop_work queue. Then it rolls the
* transaction and picks up processing where it left off. It is
* required that ->finish_item must be careful to leave enough
* transaction reservation to fit the new log intent item.
*
* This is an example of remapping the extent (E, E+B) into file X at
* offset A and dealing with the extent (C, C+B) already being mapped
* there:
@@ -104,21 +136,26 @@
* | Intent to add rmap (X, E, A, B) |
* +-------------------------------------------------+
* | Reduce refcount for extent (C, B) | t2
* | Done reducing refcount for extent (C, B) |
* | Done reducing refcount for extent (C, 9) |
* | Intent to reduce refcount for extent (C+9, B-9) |
* | (ran out of space after 9 refcount updates) |
* +-------------------------------------------------+
* | Reduce refcount for extent (C+9, B+9) | t3
* | Done reducing refcount for extent (C+9, B-9) |
* | Increase refcount for extent (E, B) |
* | Done increasing refcount for extent (E, B) |
* | Intent to free extent (C, B) |
* | Intent to free extent (F, 1) (refcountbt block) |
* | Intent to remove rmap (F, 1, REFC) |
* +-------------------------------------------------+
* | Remove rmap (X, C, A, B) | t3
* | Remove rmap (X, C, A, B) | t4
* | Done removing rmap (X, C, A, B) |
* | Add rmap (X, E, A, B) |
* | Done adding rmap (X, E, A, B) |
* | Remove rmap (F, 1, REFC) |
* | Done removing rmap (F, 1, REFC) |
* +-------------------------------------------------+
* | Free extent (C, B) | t4
* | Free extent (C, B) | t5
* | Done freeing extent (C, B) |
* | Free extent (D, 1) |
* | Done freeing extent (D, 1) |
@@ -141,6 +178,9 @@
* - Intent to free extent (C, B)
* - Intent to free extent (F, 1) (refcountbt block)
* - Intent to remove rmap (F, 1, REFC)
*
* Note that the continuation requested between t2 and t3 is likely to
* reoccur.
*/
static const struct xfs_defer_op_type *defer_op_types[XFS_DEFER_OPS_TYPE_MAX];
@@ -323,7 +363,16 @@ xfs_defer_finish(
dfp->dfp_count--;
error = dfp->dfp_type->finish_item(*tp, dop, li,
dfp->dfp_done, &state);
if (error) {
if (error == -EAGAIN) {
/*
* Caller wants a fresh transaction;
* put the work item back on the list
* and jump out.
*/
list_add(li, &dfp->dfp_work);
dfp->dfp_count++;
break;
} else if (error) {
/*
* Clean up after ourselves and jump out.
* xfs_defer_cancel will take care of freeing
@@ -335,9 +384,25 @@ xfs_defer_finish(
goto out;
}
}
/* Done with the dfp, free it. */
list_del(&dfp->dfp_list);
kmem_free(dfp);
if (error == -EAGAIN) {
/*
* Caller wants a fresh transaction, so log a
* new log intent item to replace the old one
* and roll the transaction. See "Requesting
* a Fresh Transaction while Finishing
* Deferred Work" above.
*/
dfp->dfp_intent = dfp->dfp_type->create_intent(*tp,
dfp->dfp_count);
dfp->dfp_done = NULL;
list_for_each(li, &dfp->dfp_work)
dfp->dfp_type->log_item(*tp, dfp->dfp_intent,
li);
} else {
/* Done with the dfp, free it. */
list_del(&dfp->dfp_list);
kmem_free(dfp);
}
if (cleanup_fn)
cleanup_fn(*tp, state, error);
+1 -1
View File
@@ -132,7 +132,7 @@ xfs_inobt_free_block(
xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_INOBT);
return xfs_free_extent(cur->bc_tp,
XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp)), 1,
&oinfo);
&oinfo, XFS_AG_RESV_NONE);
}
STATIC int

Some files were not shown because too many files have changed in this diff Show More