Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block

Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_ve series from Kent, the
  rest is fairly minor.  It was supposed to go in last round, but
  various issues pushed it to this release instead.  The pull request
  contains:

   - Various smaller blk-mq fixes from different folks.  Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet.  This
     enables some nice future work on making arbitrarily sized bios
     possible, and splitting more efficient.  Related fixes to immutable
     bio_vecs:

        - dm-cache immutable fixup from Mike Snitzer.
        - btrfs immutable fixup from Muthu Kumar.

  - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...
This commit is contained in:
Linus Torvalds
2014-01-30 11:19:05 -08:00
139 changed files with 2137 additions and 2676 deletions
+3 -4
View File
@@ -447,14 +447,13 @@ struct bio_vec {
* main unit of I/O for the block layer and lower layers (ie drivers)
*/
struct bio {
sector_t bi_sector;
struct bio *bi_next; /* request queue link */
struct block_device *bi_bdev; /* target device */
unsigned long bi_flags; /* status, command, etc */
unsigned long bi_rw; /* low bits: r/w, high: priority */
unsigned int bi_vcnt; /* how may bio_vec's */
unsigned int bi_idx; /* current index into bio_vec array */
struct bvec_iter bi_iter; /* current index into bio_vec array */
unsigned int bi_size; /* total size in bytes */
unsigned short bi_phys_segments; /* segments after physaddr coalesce*/
@@ -480,7 +479,7 @@ With this multipage bio design:
- Code that traverses the req list can find all the segments of a bio
by using rq_for_each_segment. This handles the fact that a request
has multiple bios, each of which can have multiple segments.
- Drivers which can't process a large bio in one shot can use the bi_idx
- Drivers which can't process a large bio in one shot can use the bi_iter
field to keep track of the next bio_vec entry to process.
(e.g a 1MB bio_vec needs to be handled in max 128kB chunks for IDE)
[TBD: Should preferably also have a bi_voffset and bi_vlen to avoid modifying
@@ -589,7 +588,7 @@ driver should not modify these values. The block layer sets up the
nr_sectors and current_nr_sectors fields (based on the corresponding
hard_xxx values and the number of bytes transferred) and updates it on
every transfer that invokes end_that_request_first. It does the same for the
buffer, bio, bio->bi_idx fields too.
buffer, bio, bio->bi_iter fields too.
The buffer field is just a virtual address mapping of the current segment
of the i/o buffer in cases where the buffer resides in low-memory. For high
+111
View File
@@ -0,0 +1,111 @@
Immutable biovecs and biovec iterators:
=======================================
Kent Overstreet <kmo@daterainc.com>
As of 3.13, biovecs should never be modified after a bio has been submitted.
Instead, we have a new struct bvec_iter which represents a range of a biovec -
the iterator will be modified as the bio is completed, not the biovec.
More specifically, old code that needed to partially complete a bio would
update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
ended up partway through a biovec, it would increment bv_offset and decrement
bv_len by the number of bytes completed in that biovec.
In the new scheme of things, everything that must be mutated in order to
partially complete a bio is segregated into struct bvec_iter: bi_sector,
bi_size and bi_idx have been moved there; and instead of modifying bv_offset
and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
bytes completed in the current bvec.
There are a bunch of new helper macros for hiding the gory details - in
particular, presenting the illusion of partially completed biovecs so that
normal code doesn't have to deal with bi_bvec_done.
* Driver code should no longer refer to biovecs directly; we now have
bio_iovec() and bio_iovec_iter() macros that return literal struct biovecs,
constructed from the raw biovecs but taking into account bi_bvec_done and
bi_size.
bio_for_each_segment() has been updated to take a bvec_iter argument
instead of an integer (that corresponded to bi_idx); for a lot of code the
conversion just required changing the types of the arguments to
bio_for_each_segment().
* Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
advances the bio integrity's iter if present.
There is a lower level advance function - bvec_iter_advance() - which takes
a pointer to a biovec, not a bio; this is used by the bio integrity code.
What's all this get us?
=======================
Having a real iterator, and making biovecs immutable, has a number of
advantages:
* Before, iterating over bios was very awkward when you weren't processing
exactly one bvec at a time - for example, bio_copy_data() in fs/bio.c,
which copies the contents of one bio into another. Because the biovecs
wouldn't necessarily be the same size, the old code was tricky convoluted -
it had to walk two different bios at the same time, keeping both bi_idx and
and offset into the current biovec for each.
The new code is much more straightforward - have a look. This sort of
pattern comes up in a lot of places; a lot of drivers were essentially open
coding bvec iterators before, and having common implementation considerably
simplifies a lot of code.
* Before, any code that might need to use the biovec after the bio had been
completed (perhaps to copy the data somewhere else, or perhaps to resubmit
it somewhere else if there was an error) had to save the entire bvec array
- again, this was being done in a fair number of places.
* Biovecs can be shared between multiple bios - a bvec iter can represent an
arbitrary range of an existing biovec, both starting and ending midway
through biovecs. This is what enables efficient splitting of arbitrary
bios. Note that this means we _only_ use bi_size to determine when we've
reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
bi_size into account when constructing biovecs.
* Splitting bios is now much simpler. The old bio_split() didn't even work on
bios with more than a single bvec! Now, we can efficiently split arbitrary
size bios - because the new bio can share the old bio's biovec.
Care must be taken to ensure the biovec isn't freed while the split bio is
still using it, in case the original bio completes first, though. Using
bio_chain() when splitting bios helps with this.
* Submitting partially completed bios is now perfectly fine - this comes up
occasionally in stacking block drivers and various code (e.g. md and
bcache) had some ugly workarounds for this.
It used to be the case that submitting a partially completed bio would work
fine to _most_ devices, but since accessing the raw bvec array was the
norm, not all drivers would respect bi_idx and those would break. Now,
since all drivers _must_ go through the bvec iterator - and have been
audited to make sure they are - submitting partially completed bios is
perfectly fine.
Other implications:
===================
* Almost all usage of bi_idx is now incorrect and has been removed; instead,
where previously you would have used bi_idx you'd now use a bvec_iter,
probably passing it to one of the helper macros.
I.e. instead of using bio_iovec_idx() (or bio->bi_iovec[bio->bi_idx]), you
now use bio_iter_iovec(), which takes a bvec_iter and returns a
literal struct bio_vec - constructed on the fly from the raw biovec but
taking into account bi_bvec_done (and bi_size).
* bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
doesn't actually own the bio. The reason is twofold: firstly, it's not
actually needed for iterating over the bio anymore - we only use bi_size.
Secondly, when cloning a bio and reusing (a portion of) the original bio's
biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
over all the biovecs in the new bio - which is silly as it's not needed.
So, don't use bi_vcnt anymore.
+7 -6
View File
@@ -62,17 +62,18 @@ struct nfhd_device {
static void nfhd_make_request(struct request_queue *queue, struct bio *bio)
{
struct nfhd_device *dev = queue->queuedata;
struct bio_vec *bvec;
int i, dir, len, shift;
sector_t sec = bio->bi_sector;
struct bio_vec bvec;
struct bvec_iter iter;
int dir, len, shift;
sector_t sec = bio->bi_iter.bi_sector;
dir = bio_data_dir(bio);
shift = dev->bshift;
bio_for_each_segment(bvec, bio, i) {
len = bvec->bv_len;
bio_for_each_segment(bvec, bio, iter) {
len = bvec.bv_len;
len >>= 9;
nfhd_read_write(dev->id, 0, dir, sec >> shift, len >> shift,
bvec_to_phys(bvec));
bvec_to_phys(&bvec));
sec += len;
}
bio_endio(bio, 0);
+11 -10
View File
@@ -109,27 +109,28 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
struct axon_ram_bank *bank = bio->bi_bdev->bd_disk->private_data;
unsigned long phys_mem, phys_end;
void *user_mem;
struct bio_vec *vec;
struct bio_vec vec;
unsigned int transfered;
unsigned short idx;
struct bvec_iter iter;
phys_mem = bank->io_addr + (bio->bi_sector << AXON_RAM_SECTOR_SHIFT);
phys_mem = bank->io_addr + (bio->bi_iter.bi_sector <<
AXON_RAM_SECTOR_SHIFT);
phys_end = bank->io_addr + bank->size;
transfered = 0;
bio_for_each_segment(vec, bio, idx) {
if (unlikely(phys_mem + vec->bv_len > phys_end)) {
bio_for_each_segment(vec, bio, iter) {
if (unlikely(phys_mem + vec.bv_len > phys_end)) {
bio_io_error(bio);
return;
}
user_mem = page_address(vec->bv_page) + vec->bv_offset;
user_mem = page_address(vec.bv_page) + vec.bv_offset;
if (bio_data_dir(bio) == READ)
memcpy(user_mem, (void *) phys_mem, vec->bv_len);
memcpy(user_mem, (void *) phys_mem, vec.bv_len);
else
memcpy((void *) phys_mem, user_mem, vec->bv_len);
memcpy((void *) phys_mem, user_mem, vec.bv_len);
phys_mem += vec->bv_len;
transfered += vec->bv_len;
phys_mem += vec.bv_len;
transfered += vec.bv_len;
}
bio_endio(bio, 0);
}
+7 -7
View File
@@ -103,18 +103,18 @@ static void simdisk_transfer(struct simdisk *dev, unsigned long sector,
static int simdisk_xfer_bio(struct simdisk *dev, struct bio *bio)
{
int i;
struct bio_vec *bvec;
sector_t sector = bio->bi_sector;
struct bio_vec bvec;
struct bvec_iter iter;
sector_t sector = bio->bi_iter.bi_sector;
bio_for_each_segment(bvec, bio, i) {
char *buffer = __bio_kmap_atomic(bio, i);
unsigned len = bvec->bv_len >> SECTOR_SHIFT;
bio_for_each_segment(bvec, bio, iter) {
char *buffer = __bio_kmap_atomic(bio, iter);
unsigned len = bvec.bv_len >> SECTOR_SHIFT;
simdisk_transfer(dev, sector, len, buffer,
bio_data_dir(bio) == WRITE);
sector += len;
__bio_kunmap_atomic(bio);
__bio_kunmap_atomic(buffer);
}
return 0;
}
+38 -23
View File
@@ -38,6 +38,7 @@
#include "blk.h"
#include "blk-cgroup.h"
#include "blk-mq.h"
EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap);
EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap);
@@ -130,7 +131,7 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
bio_advance(bio, nbytes);
/* don't actually finish bio if it's part of flush sequence */
if (bio->bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ))
if (bio->bi_iter.bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ))
bio_endio(bio, error);
}
@@ -245,7 +246,16 @@ EXPORT_SYMBOL(blk_stop_queue);
void blk_sync_queue(struct request_queue *q)
{
del_timer_sync(&q->timeout);
cancel_delayed_work_sync(&q->delay_work);
if (q->mq_ops) {
struct blk_mq_hw_ctx *hctx;
int i;
queue_for_each_hw_ctx(q, hctx, i)
cancel_delayed_work_sync(&hctx->delayed_work);
} else {
cancel_delayed_work_sync(&q->delay_work);
}
}
EXPORT_SYMBOL(blk_sync_queue);
@@ -497,8 +507,13 @@ void blk_cleanup_queue(struct request_queue *q)
* Drain all requests queued before DYING marking. Set DEAD flag to
* prevent that q->request_fn() gets invoked after draining finished.
*/
spin_lock_irq(lock);
__blk_drain_queue(q, true);
if (q->mq_ops) {
blk_mq_drain_queue(q);
spin_lock_irq(lock);
} else {
spin_lock_irq(lock);
__blk_drain_queue(q, true);
}
queue_flag_set(QUEUE_FLAG_DEAD, q);
spin_unlock_irq(lock);
@@ -1326,7 +1341,7 @@ void blk_add_request_payload(struct request *rq, struct page *page,
bio->bi_io_vec->bv_offset = 0;
bio->bi_io_vec->bv_len = len;
bio->bi_size = len;
bio->bi_iter.bi_size = len;
bio->bi_vcnt = 1;
bio->bi_phys_segments = 1;
@@ -1351,7 +1366,7 @@ bool bio_attempt_back_merge(struct request_queue *q, struct request *req,
req->biotail->bi_next = bio;
req->biotail = bio;
req->__data_len += bio->bi_size;
req->__data_len += bio->bi_iter.bi_size;
req->ioprio = ioprio_best(req->ioprio, bio_prio(bio));
blk_account_io_start(req, false);
@@ -1380,8 +1395,8 @@ bool bio_attempt_front_merge(struct request_queue *q, struct request *req,
* not touch req->buffer either...
*/
req->buffer = bio_data(bio);
req->__sector = bio->bi_sector;
req->__data_len += bio->bi_size;
req->__sector = bio->bi_iter.bi_sector;
req->__data_len += bio->bi_iter.bi_size;
req->ioprio = ioprio_best(req->ioprio, bio_prio(bio));
blk_account_io_start(req, false);
@@ -1459,7 +1474,7 @@ void init_request_from_bio(struct request *req, struct bio *bio)
req->cmd_flags |= REQ_FAILFAST_MASK;
req->errors = 0;
req->__sector = bio->bi_sector;
req->__sector = bio->bi_iter.bi_sector;
req->ioprio = bio_prio(bio);
blk_rq_bio_prep(req->q, req, bio);
}
@@ -1583,12 +1598,12 @@ static inline void blk_partition_remap(struct bio *bio)
if (bio_sectors(bio) && bdev != bdev->bd_contains) {
struct hd_struct *p = bdev->bd_part;
bio->bi_sector += p->start_sect;
bio->bi_iter.bi_sector += p->start_sect;
bio->bi_bdev = bdev->bd_contains;
trace_block_bio_remap(bdev_get_queue(bio->bi_bdev), bio,
bdev->bd_dev,
bio->bi_sector - p->start_sect);
bio->bi_iter.bi_sector - p->start_sect);
}
}
@@ -1654,7 +1669,7 @@ static inline int bio_check_eod(struct bio *bio, unsigned int nr_sectors)
/* Test device or partition size, when known. */
maxsector = i_size_read(bio->bi_bdev->bd_inode) >> 9;
if (maxsector) {
sector_t sector = bio->bi_sector;
sector_t sector = bio->bi_iter.bi_sector;
if (maxsector < nr_sectors || maxsector - nr_sectors < sector) {
/*
@@ -1690,7 +1705,7 @@ generic_make_request_checks(struct bio *bio)
"generic_make_request: Trying to access "
"nonexistent block-device %s (%Lu)\n",
bdevname(bio->bi_bdev, b),
(long long) bio->bi_sector);
(long long) bio->bi_iter.bi_sector);
goto end_io;
}
@@ -1704,9 +1719,9 @@ generic_make_request_checks(struct bio *bio)
}
part = bio->bi_bdev->bd_part;
if (should_fail_request(part, bio->bi_size) ||
if (should_fail_request(part, bio->bi_iter.bi_size) ||
should_fail_request(&part_to_disk(part)->part0,
bio->bi_size))
bio->bi_iter.bi_size))
goto end_io;
/*
@@ -1865,7 +1880,7 @@ void submit_bio(int rw, struct bio *bio)
if (rw & WRITE) {
count_vm_events(PGPGOUT, count);
} else {
task_io_account_read(bio->bi_size);
task_io_account_read(bio->bi_iter.bi_size);
count_vm_events(PGPGIN, count);
}
@@ -1874,7 +1889,7 @@ void submit_bio(int rw, struct bio *bio)
printk(KERN_DEBUG "%s(%d): %s block %Lu on %s (%u sectors)\n",
current->comm, task_pid_nr(current),
(rw & WRITE) ? "WRITE" : "READ",
(unsigned long long)bio->bi_sector,
(unsigned long long)bio->bi_iter.bi_sector,
bdevname(bio->bi_bdev, b),
count);
}
@@ -2007,7 +2022,7 @@ unsigned int blk_rq_err_bytes(const struct request *rq)
for (bio = rq->bio; bio; bio = bio->bi_next) {
if ((bio->bi_rw & ff) != ff)
break;
bytes += bio->bi_size;
bytes += bio->bi_iter.bi_size;
}
/* this could lead to infinite loop */
@@ -2378,9 +2393,9 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
total_bytes = 0;
while (req->bio) {
struct bio *bio = req->bio;
unsigned bio_bytes = min(bio->bi_size, nr_bytes);
unsigned bio_bytes = min(bio->bi_iter.bi_size, nr_bytes);
if (bio_bytes == bio->bi_size)
if (bio_bytes == bio->bi_iter.bi_size)
req->bio = bio->bi_next;
req_bio_endio(req, bio, bio_bytes, error);
@@ -2728,7 +2743,7 @@ void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
rq->nr_phys_segments = bio_phys_segments(q, bio);
rq->buffer = bio_data(bio);
}
rq->__data_len = bio->bi_size;
rq->__data_len = bio->bi_iter.bi_size;
rq->bio = rq->biotail = bio;
if (bio->bi_bdev)
@@ -2746,10 +2761,10 @@ void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
void rq_flush_dcache_pages(struct request *rq)
{
struct req_iterator iter;
struct bio_vec *bvec;
struct bio_vec bvec;
rq_for_each_segment(bvec, rq, iter)
flush_dcache_page(bvec->bv_page);
flush_dcache_page(bvec.bv_page);
}
EXPORT_SYMBOL_GPL(rq_flush_dcache_pages);
#endif
+4
View File
@@ -60,6 +60,10 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
rq->rq_disk = bd_disk;
rq->end_io = done;
/*
* don't check dying flag for MQ because the request won't
* be resued after dying flag is set
*/
if (q->mq_ops) {
blk_mq_insert_request(q, rq, true);
return;
+1 -1
View File
@@ -548,7 +548,7 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
* copied from blk_rq_pos(rq).
*/
if (error_sector)
*error_sector = bio->bi_sector;
*error_sector = bio->bi_iter.bi_sector;
bio_put(bio);
return ret;
+22 -18
View File
@@ -43,30 +43,32 @@ static const char *bi_unsupported_name = "unsupported";
*/
int blk_rq_count_integrity_sg(struct request_queue *q, struct bio *bio)
{
struct bio_vec *iv, *ivprv = NULL;
struct bio_vec iv, ivprv = { NULL };
unsigned int segments = 0;
unsigned int seg_size = 0;
unsigned int i = 0;
struct bvec_iter iter;
int prev = 0;
bio_for_each_integrity_vec(iv, bio, i) {
bio_for_each_integrity_vec(iv, bio, iter) {
if (ivprv) {
if (!BIOVEC_PHYS_MERGEABLE(ivprv, iv))
if (prev) {
if (!BIOVEC_PHYS_MERGEABLE(&ivprv, &iv))
goto new_segment;
if (!BIOVEC_SEG_BOUNDARY(q, ivprv, iv))
if (!BIOVEC_SEG_BOUNDARY(q, &ivprv, &iv))
goto new_segment;
if (seg_size + iv->bv_len > queue_max_segment_size(q))
if (seg_size + iv.bv_len > queue_max_segment_size(q))
goto new_segment;
seg_size += iv->bv_len;
seg_size += iv.bv_len;
} else {
new_segment:
segments++;
seg_size = iv->bv_len;
seg_size = iv.bv_len;
}
prev = 1;
ivprv = iv;
}
@@ -87,24 +89,25 @@ EXPORT_SYMBOL(blk_rq_count_integrity_sg);
int blk_rq_map_integrity_sg(struct request_queue *q, struct bio *bio,
struct scatterlist *sglist)
{
struct bio_vec *iv, *ivprv = NULL;
struct bio_vec iv, ivprv = { NULL };
struct scatterlist *sg = NULL;
unsigned int segments = 0;
unsigned int i = 0;
struct bvec_iter iter;
int prev = 0;
bio_for_each_integrity_vec(iv, bio, i) {
bio_for_each_integrity_vec(iv, bio, iter) {
if (ivprv) {
if (!BIOVEC_PHYS_MERGEABLE(ivprv, iv))
if (prev) {
if (!BIOVEC_PHYS_MERGEABLE(&ivprv, &iv))
goto new_segment;
if (!BIOVEC_SEG_BOUNDARY(q, ivprv, iv))
if (!BIOVEC_SEG_BOUNDARY(q, &ivprv, &iv))
goto new_segment;
if (sg->length + iv->bv_len > queue_max_segment_size(q))
if (sg->length + iv.bv_len > queue_max_segment_size(q))
goto new_segment;
sg->length += iv->bv_len;
sg->length += iv.bv_len;
} else {
new_segment:
if (!sg)
@@ -114,10 +117,11 @@ new_segment:
sg = sg_next(sg);
}
sg_set_page(sg, iv->bv_page, iv->bv_len, iv->bv_offset);
sg_set_page(sg, iv.bv_page, iv.bv_len, iv.bv_offset);
segments++;
}
prev = 1;
ivprv = iv;
}
+6 -6
View File
@@ -108,12 +108,12 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
req_sects = end_sect - sector;
}
bio->bi_sector = sector;
bio->bi_iter.bi_sector = sector;
bio->bi_end_io = bio_batch_end_io;
bio->bi_bdev = bdev;
bio->bi_private = &bb;
bio->bi_size = req_sects << 9;
bio->bi_iter.bi_size = req_sects << 9;
nr_sects -= req_sects;
sector = end_sect;
@@ -174,7 +174,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
break;
}
bio->bi_sector = sector;
bio->bi_iter.bi_sector = sector;
bio->bi_end_io = bio_batch_end_io;
bio->bi_bdev = bdev;
bio->bi_private = &bb;
@@ -184,11 +184,11 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
bio->bi_io_vec->bv_len = bdev_logical_block_size(bdev);
if (nr_sects > max_write_same_sectors) {
bio->bi_size = max_write_same_sectors << 9;
bio->bi_iter.bi_size = max_write_same_sectors << 9;
nr_sects -= max_write_same_sectors;
sector += max_write_same_sectors;
} else {
bio->bi_size = nr_sects << 9;
bio->bi_iter.bi_size = nr_sects << 9;
nr_sects = 0;
}
@@ -240,7 +240,7 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
break;
}
bio->bi_sector = sector;
bio->bi_iter.bi_sector = sector;
bio->bi_bdev = bdev;
bio->bi_end_io = bio_batch_end_io;
bio->bi_private = &bb;
+3 -3
View File
@@ -20,7 +20,7 @@ int blk_rq_append_bio(struct request_queue *q, struct request *rq,
rq->biotail->bi_next = bio;
rq->biotail = bio;
rq->__data_len += bio->bi_size;
rq->__data_len += bio->bi_iter.bi_size;
}
return 0;
}
@@ -76,7 +76,7 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq,
ret = blk_rq_append_bio(q, rq, bio);
if (!ret)
return bio->bi_size;
return bio->bi_iter.bi_size;
/* if it was boucned we must call the end io function */
bio_endio(bio, 0);
@@ -220,7 +220,7 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
if (IS_ERR(bio))
return PTR_ERR(bio);
if (bio->bi_size != len) {
if (bio->bi_iter.bi_size != len) {
/*
* Grab an extra reference to this bio, as bio_unmap_user()
* expects to be able to drop it twice as it happens on the
+36 -30
View File
@@ -12,10 +12,11 @@
static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
struct bio *bio)
{
struct bio_vec *bv, *bvprv = NULL;
int cluster, i, high, highprv = 1;
struct bio_vec bv, bvprv = { NULL };
int cluster, high, highprv = 1;
unsigned int seg_size, nr_phys_segs;
struct bio *fbio, *bbio;
struct bvec_iter iter;
if (!bio)
return 0;
@@ -25,25 +26,23 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
seg_size = 0;
nr_phys_segs = 0;
for_each_bio(bio) {
bio_for_each_segment(bv, bio, i) {
bio_for_each_segment(bv, bio, iter) {
/*
* the trick here is making sure that a high page is
* never considered part of another segment, since that
* might change with the bounce page.
*/
high = page_to_pfn(bv->bv_page) > queue_bounce_pfn(q);
if (high || highprv)
goto new_segment;
if (cluster) {
if (seg_size + bv->bv_len
high = page_to_pfn(bv.bv_page) > queue_bounce_pfn(q);
if (!high && !highprv && cluster) {
if (seg_size + bv.bv_len
> queue_max_segment_size(q))
goto new_segment;
if (!BIOVEC_PHYS_MERGEABLE(bvprv, bv))
if (!BIOVEC_PHYS_MERGEABLE(&bvprv, &bv))
goto new_segment;
if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bv))
if (!BIOVEC_SEG_BOUNDARY(q, &bvprv, &bv))
goto new_segment;
seg_size += bv->bv_len;
seg_size += bv.bv_len;
bvprv = bv;
continue;
}
@@ -54,7 +53,7 @@ new_segment:
nr_phys_segs++;
bvprv = bv;
seg_size = bv->bv_len;
seg_size = bv.bv_len;
highprv = high;
}
bbio = bio;
@@ -87,6 +86,9 @@ EXPORT_SYMBOL(blk_recount_segments);
static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
struct bio *nxt)
{
struct bio_vec end_bv = { NULL }, nxt_bv;
struct bvec_iter iter;
if (!blk_queue_cluster(q))
return 0;
@@ -97,34 +99,40 @@ static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
if (!bio_has_data(bio))
return 1;
if (!BIOVEC_PHYS_MERGEABLE(__BVEC_END(bio), __BVEC_START(nxt)))
bio_for_each_segment(end_bv, bio, iter)
if (end_bv.bv_len == iter.bi_size)
break;
nxt_bv = bio_iovec(nxt);
if (!BIOVEC_PHYS_MERGEABLE(&end_bv, &nxt_bv))
return 0;
/*
* bio and nxt are contiguous in memory; check if the queue allows
* these two to be merged into one
*/
if (BIO_SEG_BOUNDARY(q, bio, nxt))
if (BIOVEC_SEG_BOUNDARY(q, &end_bv, &nxt_bv))
return 1;
return 0;
}
static void
static inline void
__blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
struct scatterlist *sglist, struct bio_vec **bvprv,
struct scatterlist *sglist, struct bio_vec *bvprv,
struct scatterlist **sg, int *nsegs, int *cluster)
{
int nbytes = bvec->bv_len;
if (*bvprv && *cluster) {
if (*sg && *cluster) {
if ((*sg)->length + nbytes > queue_max_segment_size(q))
goto new_segment;
if (!BIOVEC_PHYS_MERGEABLE(*bvprv, bvec))
if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
goto new_segment;
if (!BIOVEC_SEG_BOUNDARY(q, *bvprv, bvec))
if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
goto new_segment;
(*sg)->length += nbytes;
@@ -150,7 +158,7 @@ new_segment:
sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset);
(*nsegs)++;
}
*bvprv = bvec;
*bvprv = *bvec;
}
/*
@@ -160,7 +168,7 @@ new_segment:
int blk_rq_map_sg(struct request_queue *q, struct request *rq,
struct scatterlist *sglist)
{
struct bio_vec *bvec, *bvprv;
struct bio_vec bvec, bvprv = { NULL };
struct req_iterator iter;
struct scatterlist *sg;
int nsegs, cluster;
@@ -171,10 +179,9 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
/*
* for each bio in rq
*/
bvprv = NULL;
sg = NULL;
rq_for_each_segment(bvec, rq, iter) {
__blk_segment_map_sg(q, bvec, sglist, &bvprv, &sg,
__blk_segment_map_sg(q, &bvec, sglist, &bvprv, &sg,
&nsegs, &cluster);
} /* segments in rq */
@@ -223,18 +230,17 @@ EXPORT_SYMBOL(blk_rq_map_sg);
int blk_bio_map_sg(struct request_queue *q, struct bio *bio,
struct scatterlist *sglist)
{
struct bio_vec *bvec, *bvprv;
struct bio_vec bvec, bvprv = { NULL };
struct scatterlist *sg;
int nsegs, cluster;
unsigned long i;
struct bvec_iter iter;
nsegs = 0;
cluster = blk_queue_cluster(q);
bvprv = NULL;
sg = NULL;
bio_for_each_segment(bvec, bio, i) {
__blk_segment_map_sg(q, bvec, sglist, &bvprv, &sg,
bio_for_each_segment(bvec, bio, iter) {
__blk_segment_map_sg(q, &bvec, sglist, &bvprv, &sg,
&nsegs, &cluster);
} /* segments in bio */
@@ -543,9 +549,9 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
int blk_try_merge(struct request *rq, struct bio *bio)
{
if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_sector)
if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector)
return ELEVATOR_BACK_MERGE;
else if (blk_rq_pos(rq) - bio_sectors(bio) == bio->bi_sector)
else if (blk_rq_pos(rq) - bio_sectors(bio) == bio->bi_iter.bi_sector)
return ELEVATOR_FRONT_MERGE;
return ELEVATOR_NO_MERGE;
}
+1 -36
View File
@@ -28,36 +28,6 @@ static int blk_mq_main_cpu_notify(struct notifier_block *self,
return NOTIFY_OK;
}
static void blk_mq_cpu_notify(void *data, unsigned long action,
unsigned int cpu)
{
if (action == CPU_DEAD || action == CPU_DEAD_FROZEN) {
/*
* If the CPU goes away, ensure that we run any pending
* completions.
*/
struct llist_node *node;
struct request *rq;
local_irq_disable();
node = llist_del_all(&per_cpu(ipi_lists, cpu));
while (node) {
struct llist_node *next = node->next;
rq = llist_entry(node, struct request, ll_list);
__blk_mq_end_io(rq, rq->errors);
node = next;
}
local_irq_enable();
}
}
static struct notifier_block __cpuinitdata blk_mq_main_cpu_notifier = {
.notifier_call = blk_mq_main_cpu_notify,
};
void blk_mq_register_cpu_notifier(struct blk_mq_cpu_notifier *notifier)
{
BUG_ON(!notifier->notify);
@@ -82,12 +52,7 @@ void blk_mq_init_cpu_notifier(struct blk_mq_cpu_notifier *notifier,
notifier->data = data;
}
static struct blk_mq_cpu_notifier __cpuinitdata cpu_notifier = {
.notify = blk_mq_cpu_notify,
};
void __init blk_mq_cpu_init(void)
{
register_hotcpu_notifier(&blk_mq_main_cpu_notifier);
blk_mq_register_cpu_notifier(&cpu_notifier);
hotcpu_notifier(blk_mq_main_cpu_notify, 0);
}
+44 -79
View File
@@ -27,8 +27,6 @@ static LIST_HEAD(all_q_list);
static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx);
DEFINE_PER_CPU(struct llist_head, ipi_lists);
static struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q,
unsigned int cpu)
{
@@ -106,10 +104,13 @@ static int blk_mq_queue_enter(struct request_queue *q)
spin_lock_irq(q->queue_lock);
ret = wait_event_interruptible_lock_irq(q->mq_freeze_wq,
!blk_queue_bypass(q), *q->queue_lock);
!blk_queue_bypass(q) || blk_queue_dying(q),
*q->queue_lock);
/* inc usage with lock hold to avoid freeze_queue runs here */
if (!ret)
if (!ret && !blk_queue_dying(q))
__percpu_counter_add(&q->mq_usage_counter, 1, 1000000);
else if (blk_queue_dying(q))
ret = -ENODEV;
spin_unlock_irq(q->queue_lock);
return ret;
@@ -120,6 +121,22 @@ static void blk_mq_queue_exit(struct request_queue *q)
__percpu_counter_add(&q->mq_usage_counter, -1, 1000000);
}
static void __blk_mq_drain_queue(struct request_queue *q)
{
while (true) {
s64 count;
spin_lock_irq(q->queue_lock);
count = percpu_counter_sum(&q->mq_usage_counter);
spin_unlock_irq(q->queue_lock);
if (count == 0)
break;
blk_mq_run_queues(q, false);
msleep(10);
}
}
/*
* Guarantee no request is in use, so we can change any data structure of
* the queue afterward.
@@ -133,21 +150,13 @@ static void blk_mq_freeze_queue(struct request_queue *q)
queue_flag_set(QUEUE_FLAG_BYPASS, q);
spin_unlock_irq(q->queue_lock);
if (!drain)
return;
if (drain)
__blk_mq_drain_queue(q);
}
while (true) {
s64 count;
spin_lock_irq(q->queue_lock);
count = percpu_counter_sum(&q->mq_usage_counter);
spin_unlock_irq(q->queue_lock);
if (count == 0)
break;
blk_mq_run_queues(q, false);
msleep(10);
}
void blk_mq_drain_queue(struct request_queue *q)
{
__blk_mq_drain_queue(q);
}
static void blk_mq_unfreeze_queue(struct request_queue *q)
@@ -179,6 +188,8 @@ static void blk_mq_rq_ctx_init(struct request_queue *q, struct blk_mq_ctx *ctx,
rq->mq_ctx = ctx;
rq->cmd_flags = rw_flags;
rq->start_time = jiffies;
set_start_time_ns(rq);
ctx->rq_dispatched[rw_is_sync(rw_flags)]++;
}
@@ -305,7 +316,7 @@ void blk_mq_complete_request(struct request *rq, int error)
struct bio *next = bio->bi_next;
bio->bi_next = NULL;
bytes += bio->bi_size;
bytes += bio->bi_iter.bi_size;
blk_mq_bio_endio(rq, bio, error);
bio = next;
}
@@ -326,56 +337,13 @@ void __blk_mq_end_io(struct request *rq, int error)
blk_mq_complete_request(rq, error);
}
#if defined(CONFIG_SMP)
/*
* Called with interrupts disabled.
*/
static void ipi_end_io(void *data)
static void blk_mq_end_io_remote(void *data)
{
struct llist_head *list = &per_cpu(ipi_lists, smp_processor_id());
struct llist_node *entry, *next;
struct request *rq;
struct request *rq = data;
entry = llist_del_all(list);
while (entry) {
next = entry->next;
rq = llist_entry(entry, struct request, ll_list);
__blk_mq_end_io(rq, rq->errors);
entry = next;
}
__blk_mq_end_io(rq, rq->errors);
}
static int ipi_remote_cpu(struct blk_mq_ctx *ctx, const int cpu,
struct request *rq, const int error)
{
struct call_single_data *data = &rq->csd;
rq->errors = error;
rq->ll_list.next = NULL;
/*
* If the list is non-empty, an existing IPI must already
* be "in flight". If that is the case, we need not schedule
* a new one.
*/
if (llist_add(&rq->ll_list, &per_cpu(ipi_lists, ctx->cpu))) {
data->func = ipi_end_io;
data->flags = 0;
__smp_call_function_single(ctx->cpu, data, 0);
}
return true;
}
#else /* CONFIG_SMP */
static int ipi_remote_cpu(struct blk_mq_ctx *ctx, const int cpu,
struct request *rq, const int error)
{
return false;
}
#endif
/*
* End IO on this request on a multiqueue enabled driver. We'll either do
* it directly inline, or punt to a local IPI handler on the matching
@@ -390,11 +358,15 @@ void blk_mq_end_io(struct request *rq, int error)
return __blk_mq_end_io(rq, error);
cpu = get_cpu();
if (cpu == ctx->cpu || !cpu_online(ctx->cpu) ||
!ipi_remote_cpu(ctx, cpu, rq, error))
if (cpu != ctx->cpu && cpu_online(ctx->cpu)) {
rq->errors = error;
rq->csd.func = blk_mq_end_io_remote;
rq->csd.info = rq;
rq->csd.flags = 0;
__smp_call_function_single(ctx->cpu, &rq->csd, 0);
} else {
__blk_mq_end_io(rq, error);
}
put_cpu();
}
EXPORT_SYMBOL(blk_mq_end_io);
@@ -1091,8 +1063,8 @@ static void blk_mq_free_rq_map(struct blk_mq_hw_ctx *hctx)
struct page *page;
while (!list_empty(&hctx->page_list)) {
page = list_first_entry(&hctx->page_list, struct page, list);
list_del_init(&page->list);
page = list_first_entry(&hctx->page_list, struct page, lru);
list_del_init(&page->lru);
__free_pages(page, page->private);
}
@@ -1156,7 +1128,7 @@ static int blk_mq_init_rq_map(struct blk_mq_hw_ctx *hctx,
break;
page->private = this_order;
list_add_tail(&page->list, &hctx->page_list);
list_add_tail(&page->lru, &hctx->page_list);
p = page_address(page);
entries_per_page = order_to_size(this_order) / rq_size;
@@ -1429,7 +1401,6 @@ void blk_mq_free_queue(struct request_queue *q)
int i;
queue_for_each_hw_ctx(q, hctx, i) {
cancel_delayed_work_sync(&hctx->delayed_work);
kfree(hctx->ctx_map);
kfree(hctx->ctxs);
blk_mq_free_rq_map(hctx);
@@ -1451,7 +1422,6 @@ void blk_mq_free_queue(struct request_queue *q)
list_del_init(&q->all_q_node);
mutex_unlock(&all_q_mutex);
}
EXPORT_SYMBOL(blk_mq_free_queue);
/* Basically redo blk_mq_init_queue with queue frozen */
static void blk_mq_queue_reinit(struct request_queue *q)
@@ -1495,11 +1465,6 @@ static int blk_mq_queue_reinit_notify(struct notifier_block *nb,
static int __init blk_mq_init(void)
{
unsigned int i;
for_each_possible_cpu(i)
init_llist_head(&per_cpu(ipi_lists, i));
blk_mq_cpu_init();
/* Must be called after percpu_counter_hotcpu_callback() */
+2 -1
View File
@@ -27,6 +27,8 @@ void blk_mq_complete_request(struct request *rq, int error);
void blk_mq_run_request(struct request *rq, bool run_queue, bool async);
void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
void blk_mq_init_flush(struct request_queue *q);
void blk_mq_drain_queue(struct request_queue *q);
void blk_mq_free_queue(struct request_queue *q);
/*
* CPU hotplug helpers
@@ -38,7 +40,6 @@ void blk_mq_init_cpu_notifier(struct blk_mq_cpu_notifier *notifier,
void blk_mq_register_cpu_notifier(struct blk_mq_cpu_notifier *notifier);
void blk_mq_unregister_cpu_notifier(struct blk_mq_cpu_notifier *notifier);
void blk_mq_cpu_init(void);
DECLARE_PER_CPU(struct llist_head, ipi_lists);
/*
* CPU -> queue mappings
+1
View File
@@ -11,6 +11,7 @@
#include "blk.h"
#include "blk-cgroup.h"
#include "blk-mq.h"
struct queue_sysfs_entry {
struct attribute attr;
+7 -7
View File
@@ -877,14 +877,14 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
do_div(tmp, HZ);
bytes_allowed = tmp;
if (tg->bytes_disp[rw] + bio->bi_size <= bytes_allowed) {
if (tg->bytes_disp[rw] + bio->bi_iter.bi_size <= bytes_allowed) {
if (wait)
*wait = 0;
return 1;
}
/* Calc approx time to dispatch */
extra_bytes = tg->bytes_disp[rw] + bio->bi_size - bytes_allowed;
extra_bytes = tg->bytes_disp[rw] + bio->bi_iter.bi_size - bytes_allowed;
jiffy_wait = div64_u64(extra_bytes * HZ, tg->bps[rw]);
if (!jiffy_wait)
@@ -987,7 +987,7 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
bool rw = bio_data_dir(bio);
/* Charge the bio to the group */
tg->bytes_disp[rw] += bio->bi_size;
tg->bytes_disp[rw] += bio->bi_iter.bi_size;
tg->io_disp[rw]++;
/*
@@ -1003,8 +1003,8 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
*/
if (!(bio->bi_rw & REQ_THROTTLED)) {
bio->bi_rw |= REQ_THROTTLED;
throtl_update_dispatch_stats(tg_to_blkg(tg), bio->bi_size,
bio->bi_rw);
throtl_update_dispatch_stats(tg_to_blkg(tg),
bio->bi_iter.bi_size, bio->bi_rw);
}
}
@@ -1503,7 +1503,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
if (tg) {
if (!tg->has_rules[rw]) {
throtl_update_dispatch_stats(tg_to_blkg(tg),
bio->bi_size, bio->bi_rw);
bio->bi_iter.bi_size, bio->bi_rw);
goto out_unlock_rcu;
}
}
@@ -1559,7 +1559,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
/* out-of-limit, queue to @tg */
throtl_log(sq, "[%c] bio. bdisp=%llu sz=%u bps=%llu iodisp=%u iops=%u queued=%d/%d",
rw == READ ? 'R' : 'W',
tg->bytes_disp[rw], bio->bi_size, tg->bps[rw],
tg->bytes_disp[rw], bio->bi_iter.bi_size, tg->bps[rw],
tg->io_disp[rw], tg->iops[rw],
sq->nr_queued[READ], sq->nr_queued[WRITE]);
+11 -7
View File
@@ -4,8 +4,7 @@
* Written by Cai Zhiyong <caizhiyong@huawei.com>
*
*/
#include <linux/buffer_head.h>
#include <linux/module.h>
#include <linux/export.h>
#include <linux/cmdline-parser.h>
static int parse_subpart(struct cmdline_subpart **subpart, char *partdef)
@@ -159,6 +158,7 @@ void cmdline_parts_free(struct cmdline_parts **parts)
*parts = next_parts;
}
}
EXPORT_SYMBOL(cmdline_parts_free);
int cmdline_parts_parse(struct cmdline_parts **parts, const char *cmdline)
{
@@ -206,6 +206,7 @@ fail:
cmdline_parts_free(parts);
goto done;
}
EXPORT_SYMBOL(cmdline_parts_parse);
struct cmdline_parts *cmdline_parts_find(struct cmdline_parts *parts,
const char *bdev)
@@ -214,17 +215,17 @@ struct cmdline_parts *cmdline_parts_find(struct cmdline_parts *parts,
parts = parts->next_parts;
return parts;
}
EXPORT_SYMBOL(cmdline_parts_find);
/*
* add_part()
* 0 success.
* 1 can not add so many partitions.
*/
void cmdline_parts_set(struct cmdline_parts *parts, sector_t disk_size,
int slot,
int (*add_part)(int, struct cmdline_subpart *, void *),
void *param)
int cmdline_parts_set(struct cmdline_parts *parts, sector_t disk_size,
int slot,
int (*add_part)(int, struct cmdline_subpart *, void *),
void *param)
{
sector_t from = 0;
struct cmdline_subpart *subpart;
@@ -247,4 +248,7 @@ void cmdline_parts_set(struct cmdline_parts *parts, sector_t disk_size,
if (add_part(slot, subpart, param))
break;
}
return slot;
}
EXPORT_SYMBOL(cmdline_parts_set);
+1 -1
View File
@@ -440,7 +440,7 @@ int elv_merge(struct request_queue *q, struct request **req, struct bio *bio)
/*
* See if our hash lookup can find a potential backmerge.
*/
__rq = elv_rqhash_find(q, bio->bi_sector);
__rq = elv_rqhash_find(q, bio->bi_iter.bi_sector);
if (__rq && elv_rq_merge_ok(__rq, bio)) {
*req = __rq;
return ELEVATOR_BACK_MERGE;
+4 -2
View File
@@ -323,12 +323,14 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
if (hdr->iovec_count) {
size_t iov_data_len;
struct iovec *iov;
struct iovec *iov = NULL;
ret = rw_copy_check_uvector(-1, hdr->dxferp, hdr->iovec_count,
0, NULL, &iov);
if (ret < 0)
if (ret < 0) {
kfree(iov);
goto out;
}
iov_data_len = ret;
ret = 0;

Some files were not shown because too many files have changed in this diff Show More