mirror of https://github.com/linux-apfs/linux-apfs.git
Merge tag 'dm-4.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - based on Jens' 'for-4.7/core' to have DM thinp's discard support use
   bio_inc_remaining() and the block core's new async
   __blkdev_issue_discard() interface

 - make DM multipath's fast code-paths lockless, using
   lockless_dereference(), to significantly improve large NUMA
   performance when using blk-mq.  The m->lock spinlock contention was
   a serious bottleneck.

 - a few other small code cleanups and Documentation fixes

* tag 'dm-4.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm thin: unroll issue_discard() to create longer discard bio chains
  dm thin: use __blkdev_issue_discard for async discard support
  dm thin: remove __bio_inc_remaining() and switch to using bio_inc_remaining()
  dm raid: make sure no feature flags are set in metadata
  dm ioctl: drop use of __GFP_REPEAT in copy_params()'s __vmalloc() call
  dm stats: fix spelling mistake in Documentation
  dm cache: update cache-policies.txt now that mq is an alias for smq
  dm mpath: eliminate use of spinlock in IO fast-paths
  dm mpath: move trigger_event member to the end of 'struct multipath'
  dm mpath: use atomic_t for counting members of 'struct multipath'
  dm mpath: switch to using bitops for state flags
  dm thin: Remove return statement from void function
  dm: remove unused mapped_device argument from free_tio()
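For context on the multipath bullet: lockless_dereference() (from
<linux/compiler.h>) is a dependency-ordered pointer read that lets the IO
fast-path follow a pointer published by the slow path without taking
m->lock. A minimal sketch of the idiom, with invented structure and field
names rather than the real dm-mpath code:

	struct example_pgpath {
		struct dm_dev *path_dev;
	};

	struct example_multipath {
		/* written by the slow path under m->lock, read locklessly */
		struct example_pgpath *current_pgpath;
	};

	static struct example_pgpath *choose_path(struct example_multipath *m)
	{
		/* dependency-ordered load, paired with the publishing store */
		return lockless_dereference(m->current_pgpath);
	}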
Documentation/device-mapper/cache-policies.txt
@@ -11,7 +11,7 @@ Every bio that is mapped by the target is referred to the policy.
 The policy can return a simple HIT or MISS or issue a migration.
 
 Currently there's no way for the policy to issue background work,
-e.g. to start writing back dirty blocks that are going to be evicte
+e.g. to start writing back dirty blocks that are going to be evicted
 soon.
 
 Because we map bios, rather than requests it's easy for the policy
@@ -48,7 +48,7 @@ with the multiqueue (mq) policy.
 
 The smq policy (vs mq) offers the promise of less memory utilization,
 improved performance and increased adaptability in the face of changing
-workloads. SMQ also does not have any cumbersome tuning knobs.
+workloads. smq also does not have any cumbersome tuning knobs.
 
 Users may switch from "mq" to "smq" simply by appropriately reloading a
 DM table that is using the cache target. Doing so will cause all of the
@@ -57,47 +57,45 @@ degrade slightly until smq recalculates the origin device's hotspots
 that should be cached.
 
 Memory usage:
-The mq policy uses a lot of memory; 88 bytes per cache block on a 64
+The mq policy used a lot of memory; 88 bytes per cache block on a 64
 bit machine.
 
-SMQ uses 28bit indexes to implement it's data structures rather than
+smq uses 28bit indexes to implement it's data structures rather than
 pointers. It avoids storing an explicit hit count for each block. It
-has a 'hotspot' queue rather than a pre cache which uses a quarter of
+has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
 the entries (each hotspot block covers a larger area than a single
 cache block).
 
-All these mean smq uses ~25bytes per cache block. Still a lot of
+All this means smq uses ~25bytes per cache block. Still a lot of
 memory, but a substantial improvement nontheless.
 
 Level balancing:
-MQ places entries in different levels of the multiqueue structures
-based on their hit count (~ln(hit count)). This means the bottom
-levels generally have the most entries, and the top ones have very
-few. Having unbalanced levels like this reduces the efficacy of the
+mq placed entries in different levels of the multiqueue structures
+based on their hit count (~ln(hit count)). This meant the bottom
+levels generally had the most entries, and the top ones had very
+few. Having unbalanced levels like this reduced the efficacy of the
 multiqueue.
 
-SMQ does not maintain a hit count, instead it swaps hit entries with
-the least recently used entry from the level above. The over all
+smq does not maintain a hit count, instead it swaps hit entries with
+the least recently used entry from the level above. The overall
 ordering being a side effect of this stochastic process. With this
 scheme we can decide how many entries occupy each multiqueue level,
 resulting in better promotion/demotion decisions.
 
 Adaptability:
-The MQ policy maintains a hit count for each cache block. For a
+The mq policy maintained a hit count for each cache block. For a
 different block to get promoted to the cache it's hit count has to
-exceed the lowest currently in the cache. This means it can take a
+exceed the lowest currently in the cache. This meant it could take a
 long time for the cache to adapt between varying IO patterns.
-Periodically degrading the hit counts could help with this, but I
-haven't found a nice general solution.
 
-SMQ doesn't maintain hit counts, so a lot of this problem just goes
+smq doesn't maintain hit counts, so a lot of this problem just goes
 away. In addition it tracks performance of the hotspot queue, which
 is used to decide which blocks to promote. If the hotspot queue is
 performing badly then it starts moving entries more quickly between
 levels. This lets it adapt to new IO patterns very quickly.
 
 Performance:
-Testing SMQ shows substantially better performance than MQ.
+Testing smq shows substantially better performance than mq.
 
 cleaner
 -------
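The "Level balancing" paragraph above is terse, so here is a toy model of
the swap it describes: one LRU list per level, promotion on hit by trading
places with the least recently used entry of the level above. All names
here are invented for illustration; the real structures live in
drivers/md/dm-cache-policy-smq.c and are considerably more involved:

	#include <linux/list.h>

	#define TOY_NR_LEVELS 64

	struct toy_entry {
		struct list_head list;
		unsigned level;
	};

	/*
	 * levels[i] is an LRU list (assume INIT_LIST_HEAD() was called
	 * on each); the head's first element is the coldest entry.
	 */
	static struct list_head levels[TOY_NR_LEVELS];

	static void toy_hit(struct toy_entry *e)
	{
		struct toy_entry *cold;

		if (e->level == TOY_NR_LEVELS - 1) {
			/* already at the top: just mark most recently used */
			list_move_tail(&e->list, &levels[e->level]);
			return;
		}

		if (!list_empty(&levels[e->level + 1])) {
			/* demote the LRU entry of the level above... */
			cold = list_first_entry(&levels[e->level + 1],
						struct toy_entry, list);
			cold->level = e->level;
			list_move_tail(&cold->list, &levels[e->level]);
		}

		/* ...and promote the hit entry into its place */
		e->level++;
		list_move_tail(&e->list, &levels[e->level]);
	}

No hit counters are stored, and level occupancy stays fixed because every
promotion is paired with a demotion, which is what lets smq dictate how
many entries occupy each level.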
Documentation/device-mapper/statistics.txt
@@ -205,7 +205,7 @@ statistics on them:
 
   dmsetup message vol 0 @stats_create - /100
 
-Set the auxillary data string to "foo bar baz" (the escape for each
+Set the auxiliary data string to "foo bar baz" (the escape for each
 space must also be escaped, otherwise the shell will consume them):
 
   dmsetup message vol 0 @stats_set_aux 0 foo\\ bar\\ baz
drivers/md/dm-ioctl.c
@@ -1723,7 +1723,7 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
 	if (!dmi) {
 		unsigned noio_flag;
 		noio_flag = memalloc_noio_save();
-		dmi = __vmalloc(param_kernel->data_size, GFP_NOIO | __GFP_REPEAT | __GFP_HIGH | __GFP_HIGHMEM, PAGE_KERNEL);
+		dmi = __vmalloc(param_kernel->data_size, GFP_NOIO | __GFP_HIGH | __GFP_HIGHMEM, PAGE_KERNEL);
 		memalloc_noio_restore(noio_flag);
 		if (dmi)
 			*param_flags |= DM_PARAMS_VMALLOC;
(+195 -156: file diff suppressed because it is too large)
drivers/md/dm-raid.c
@@ -1037,6 +1037,11 @@ static int super_validate(struct raid_set *rs, struct md_rdev *rdev)
 	if (!mddev->events && super_init_validation(mddev, rdev))
 		return -EINVAL;
 
+	if (le32_to_cpu(sb->features)) {
+		rs->ti->error = "Unable to assemble array: No feature flags supported yet";
+		return -EINVAL;
+	}
+
 	/* Enable bitmap creation for RAID levels != 0 */
 	mddev->bitmap_info.offset = (rs->raid_type->level) ? to_sector(4096) : 0;
 	rdev->mddev->bitmap_info.default_offset = mddev->bitmap_info.offset;
@@ -1718,7 +1723,7 @@ static void raid_resume(struct dm_target *ti)
 
 static struct target_type raid_target = {
 	.name = "raid",
-	.version = {1, 7, 0},
+	.version = {1, 8, 0},
 	.module = THIS_MODULE,
 	.ctr = raid_ctr,
 	.dtr = raid_dtr,
drivers/md/dm-thin.c (+75 -90)
@@ -322,56 +322,6 @@ struct thin_c {
 
 /*----------------------------------------------------------------*/
 
-/**
- * __blkdev_issue_discard_async - queue a discard with async completion
- * @bdev:	blockdev to issue discard for
- * @sector:	start sector
- * @nr_sects:	number of sectors to discard
- * @gfp_mask:	memory allocation flags (for bio_alloc)
- * @flags:	BLKDEV_IFL_* flags to control behaviour
- * @parent_bio: parent discard bio that all sub discards get chained to
- *
- * Description:
- *    Asynchronously issue a discard request for the sectors in question.
- */
-static int __blkdev_issue_discard_async(struct block_device *bdev, sector_t sector,
-					sector_t nr_sects, gfp_t gfp_mask, unsigned long flags,
-					struct bio *parent_bio)
-{
-	struct request_queue *q = bdev_get_queue(bdev);
-	int type = REQ_WRITE | REQ_DISCARD;
-	struct bio *bio;
-
-	if (!q || !nr_sects)
-		return -ENXIO;
-
-	if (!blk_queue_discard(q))
-		return -EOPNOTSUPP;
-
-	if (flags & BLKDEV_DISCARD_SECURE) {
-		if (!blk_queue_secdiscard(q))
-			return -EOPNOTSUPP;
-		type |= REQ_SECURE;
-	}
-
-	/*
-	 * Required bio_put occurs in bio_endio thanks to bio_chain below
-	 */
-	bio = bio_alloc(gfp_mask, 1);
-	if (!bio)
-		return -ENOMEM;
-
-	bio_chain(bio, parent_bio);
-
-	bio->bi_iter.bi_sector = sector;
-	bio->bi_bdev = bdev;
-	bio->bi_iter.bi_size = nr_sects << 9;
-
-	submit_bio(type, bio);
-
-	return 0;
-}
-
 static bool block_size_is_power_of_two(struct pool *pool)
 {
 	return pool->sectors_per_block_shift >= 0;
@@ -384,14 +334,55 @@ static sector_t block_to_sectors(struct pool *pool, dm_block_t b)
 		(b * pool->sectors_per_block);
 }
 
-static int issue_discard(struct thin_c *tc, dm_block_t data_b, dm_block_t data_e,
-			 struct bio *parent_bio)
+/*----------------------------------------------------------------*/
+
+struct discard_op {
+	struct thin_c *tc;
+	struct blk_plug plug;
+	struct bio *parent_bio;
+	struct bio *bio;
+};
+
+static void begin_discard(struct discard_op *op, struct thin_c *tc, struct bio *parent)
+{
+	BUG_ON(!parent);
+
+	op->tc = tc;
+	blk_start_plug(&op->plug);
+	op->parent_bio = parent;
+	op->bio = NULL;
+}
+
+static int issue_discard(struct discard_op *op, dm_block_t data_b, dm_block_t data_e)
 {
+	struct thin_c *tc = op->tc;
 	sector_t s = block_to_sectors(tc->pool, data_b);
 	sector_t len = block_to_sectors(tc->pool, data_e - data_b);
 
-	return __blkdev_issue_discard_async(tc->pool_dev->bdev, s, len,
-					    GFP_NOWAIT, 0, parent_bio);
+	return __blkdev_issue_discard(tc->pool_dev->bdev, s, len,
+				      GFP_NOWAIT, REQ_WRITE | REQ_DISCARD, &op->bio);
+}
+
+static void end_discard(struct discard_op *op, int r)
+{
+	if (op->bio) {
+		/*
+		 * Even if one of the calls to issue_discard failed, we
+		 * need to wait for the chain to complete.
+		 */
+		bio_chain(op->bio, op->parent_bio);
+		submit_bio(REQ_WRITE | REQ_DISCARD, op->bio);
+	}
+
+	blk_finish_plug(&op->plug);
+
+	/*
+	 * Even if r is set, there could be sub discards in flight that we
+	 * need to wait for.
+	 */
+	if (r && !op->parent_bio->bi_error)
+		op->parent_bio->bi_error = r;
+	bio_endio(op->parent_bio);
 }
 
 /*----------------------------------------------------------------*/
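The three helpers added above form a small bracketed API: begin_discard()
plugs the queue and records the parent bio, issue_discard() may be called
repeatedly to extend the chain, and end_discard() submits the final
chained bio and completes the parent. Condensed from the passdown hunk
further down (error handling elided):

	struct discard_op op;
	int r;

	begin_discard(&op, tc, m->bio);
	r = issue_discard(&op, m->data_block,
			  m->data_block + (m->virt_end - m->virt_begin));
	end_discard(&op, r);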
@@ -632,7 +623,7 @@ static void error_retry_list(struct pool *pool)
 {
 	int error = get_pool_io_error_code(pool);
 
-	return error_retry_list_with_code(pool, error);
+	error_retry_list_with_code(pool, error);
 }
 
 /*
@@ -1006,24 +997,28 @@ static void process_prepared_discard_no_passdown(struct dm_thin_new_mapping *m)
 	mempool_free(m, tc->pool->mapping_pool);
 }
 
-static int passdown_double_checking_shared_status(struct dm_thin_new_mapping *m)
+/*----------------------------------------------------------------*/
+
+static void passdown_double_checking_shared_status(struct dm_thin_new_mapping *m)
 {
 	/*
 	 * We've already unmapped this range of blocks, but before we
 	 * passdown we have to check that these blocks are now unused.
 	 */
-	int r;
+	int r = 0;
 	bool used = true;
 	struct thin_c *tc = m->tc;
 	struct pool *pool = tc->pool;
 	dm_block_t b = m->data_block, e, end = m->data_block + m->virt_end - m->virt_begin;
+	struct discard_op op;
 
+	begin_discard(&op, tc, m->bio);
 	while (b != end) {
 		/* find start of unmapped run */
 		for (; b < end; b++) {
 			r = dm_pool_block_is_used(pool->pmd, b, &used);
 			if (r)
-				return r;
+				goto out;
 
 			if (!used)
 				break;
@@ -1036,20 +1031,20 @@ static int passdown_double_checking_shared_status(struct dm_thin_new_mapping *m)
 		for (e = b + 1; e != end; e++) {
 			r = dm_pool_block_is_used(pool->pmd, e, &used);
 			if (r)
-				return r;
+				goto out;
 
 			if (used)
 				break;
 		}
 
-		r = issue_discard(tc, b, e, m->bio);
+		r = issue_discard(&op, b, e);
 		if (r)
-			return r;
+			goto out;
 
 		b = e;
 	}
 
-	return 0;
+out:
+	end_discard(&op, r);
 }
 
 static void process_prepared_discard_passdown(struct dm_thin_new_mapping *m)
@@ -1059,20 +1054,21 @@ static void process_prepared_discard_passdown(struct dm_thin_new_mapping *m)
 	struct pool *pool = tc->pool;
 
 	r = dm_thin_remove_range(tc->td, m->virt_begin, m->virt_end);
-	if (r)
+	if (r) {
 		metadata_operation_failed(pool, "dm_thin_remove_range", r);
-	else if (m->maybe_shared)
-		r = passdown_double_checking_shared_status(m);
-	else
-		r = issue_discard(tc, m->data_block, m->data_block + (m->virt_end - m->virt_begin), m->bio);
+		bio_io_error(m->bio);
 
-	/*
-	 * Even if r is set, there could be sub discards in flight that we
-	 * need to wait for.
-	 */
-	m->bio->bi_error = r;
-	bio_endio(m->bio);
+	} else if (m->maybe_shared) {
+		passdown_double_checking_shared_status(m);
+
+	} else {
+		struct discard_op op;
+		begin_discard(&op, tc, m->bio);
+		r = issue_discard(&op, m->data_block,
+				  m->data_block + (m->virt_end - m->virt_begin));
+		end_discard(&op, r);
+	}
+
 	cell_defer_no_holder(tc, m->cell);
 	mempool_free(m, pool->mapping_pool);
 }
@@ -1494,17 +1490,6 @@ static void process_discard_cell_no_passdown(struct thin_c *tc,
 		pool->process_prepared_discard(m);
 }
 
-/*
- * __bio_inc_remaining() is used to defer parent bios's end_io until
- * we _know_ all chained sub range discard bios have completed.
- */
-static inline void __bio_inc_remaining(struct bio *bio)
-{
-	bio->bi_flags |= (1 << BIO_CHAIN);
-	smp_mb__before_atomic();
-	atomic_inc(&bio->__bi_remaining);
-}
-
 static void break_up_discard_bio(struct thin_c *tc, dm_block_t begin, dm_block_t end,
 				 struct bio *bio)
 {
@@ -1554,13 +1539,13 @@ static void break_up_discard_bio(struct thin_c *tc, dm_block_t begin, dm_block_t
 
 		/*
 		 * The parent bio must not complete before sub discard bios are
-		 * chained to it (see __blkdev_issue_discard_async's bio_chain)!
+		 * chained to it (see end_discard's bio_chain)!
 		 *
 		 * This per-mapping bi_remaining increment is paired with
 		 * the implicit decrement that occurs via bio_endio() in
-		 * process_prepared_discard_{passdown,no_passdown}.
+		 * end_discard().
 		 */
-		__bio_inc_remaining(bio);
+		bio_inc_remaining(bio);
 		if (!dm_deferred_set_add_work(pool->all_io_ds, &m->list))
 			pool->process_prepared_discard(m);
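The pairing described in that comment is the block core's bio remaining
count: bio_chain() and bio_inc_remaining() each take a reference on the
parent bio, and the parent's bi_end_io only runs once bio_endio() has
dropped them all. A hypothetical sketch of the pattern (not code from
this commit; submit_bio()'s two-argument form matches this kernel's API):

	#include <linux/bio.h>

	/* Defer @parent's completion until @n chained sub-bios finish. */
	static void sketch_issue_subs(struct bio *parent, struct bio **subs,
				      unsigned n)
	{
		unsigned i;

		for (i = 0; i < n; i++) {
			bio_chain(subs[i], parent);	/* +1 on parent's count */
			submit_bio(REQ_WRITE | REQ_DISCARD, subs[i]);
		}

		/*
		 * Drop our own reference; parent->bi_end_io runs only after
		 * every chained sub-bio has completed as well.
		 */
		bio_endio(parent);
	}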
@@ -3899,7 +3884,7 @@ static struct target_type pool_target = {
 	.name = "thin-pool",
 	.features = DM_TARGET_SINGLETON | DM_TARGET_ALWAYS_WRITEABLE |
 		    DM_TARGET_IMMUTABLE,
-	.version = {1, 18, 0},
+	.version = {1, 19, 0},
 	.module = THIS_MODULE,
 	.ctr = pool_ctr,
 	.dtr = pool_dtr,
@@ -4273,7 +4258,7 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
 
 static struct target_type thin_target = {
 	.name = "thin",
-	.version = {1, 18, 0},
+	.version = {1, 19, 0},
 	.module = THIS_MODULE,
 	.ctr = thin_ctr,
 	.dtr = thin_dtr,
drivers/md/dm.c (+4 -6)
@@ -674,7 +674,7 @@ static void free_io(struct mapped_device *md, struct dm_io *io)
 	mempool_free(io, md->io_pool);
 }
 
-static void free_tio(struct mapped_device *md, struct dm_target_io *tio)
+static void free_tio(struct dm_target_io *tio)
 {
 	bio_put(&tio->clone);
 }
@@ -1055,7 +1055,7 @@ static void clone_endio(struct bio *bio)
 		    !bdev_get_queue(bio->bi_bdev)->limits.max_write_same_sectors))
 			disable_write_same(md);
 
-	free_tio(md, tio);
+	free_tio(tio);
 	dec_pending(io, error);
 }
@@ -1517,7 +1517,6 @@ static void __map_bio(struct dm_target_io *tio)
 {
 	int r;
 	sector_t sector;
-	struct mapped_device *md;
 	struct bio *clone = &tio->clone;
 	struct dm_target *ti = tio->ti;
@@ -1540,9 +1539,8 @@ static void __map_bio(struct dm_target_io *tio)
 		generic_make_request(clone);
 	} else if (r < 0 || r == DM_MAPIO_REQUEUE) {
 		/* error the io and bail out, or requeue it if needed */
-		md = tio->io->md;
 		dec_pending(tio->io, r);
-		free_tio(md, tio);
+		free_tio(tio);
 	} else if (r != DM_MAPIO_SUBMITTED) {
 		DMWARN("unimplemented target map return value: %d", r);
 		BUG();
@@ -1663,7 +1661,7 @@ static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
 	tio->len_ptr = len;
 	r = clone_bio(tio, bio, sector, *len);
 	if (r < 0) {
-		free_tio(ci->md, tio);
+		free_tio(tio);
 		break;
 	}
 	__map_bio(tio);