Commit Graph

648271 Commits

Author SHA1 Message Date
Jens Axboe 70f36b6001 blk-mq: allow resize of scheduler requests
Add support for growing the tags associated with a hardware queue, for
the scheduler tags. Currently we only support resizing within the
limits of the original depth, change that so we can grow it as well by
allocating and replacing the existing scheduler tag set.

This is similar to how we could increase the software queue depth with
the legacy IO stack and schedulers.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-20 09:05:53 -07:00
Jens Axboe 7e79dadce2 blk-mq: stop hardware queue in blk_mq_delay_queue()
The run handler we register for the delayed work requires that the
queue be stopped, yet we leave that up to the caller. Let's move
it into blk_mq_delay_queue() itself, so that the API is sane.

This fixes a stall with SCSI, where it calls blk_mq_delay_queue()
without having stopped the queue. Hence the queue is never run.

Reported-by: Hannes Reinecke <hare@suse.com>
Fixes: 70f4db639c ("blk-mq: add blk_mq_delay_queue")
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-19 08:01:55 -07:00
Jens Axboe 8cecb07d70 blk-mq-tag: remove redundant check for 'data->hctx' being non-NULL
We used to pass in NULL for hctx for reserved tags, but we don't
do that anymore. Hence the check for whether hctx is NULL or not
is now redundant, kill it.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: a642a158aec6 ("blk-mq-tag: cleanup the normal/reserved tag allocation")
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-19 07:43:05 -07:00
Jens Axboe 610d886c0c elevator: fix unnecessary put of elevator in failure case
We already checked that e is NULL, so no point in calling
elevator_put() to free it.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: dc877dbd088f ("blk-mq-sched: add framework for MQ capable IO schedulers")
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-19 07:43:05 -07:00
Jens Axboe 38dbb7dd4d blk-cgroup: don't quiesce the queue on policy activate/deactivate
There's no potential harm in quiescing the queue, but it also doesn't
buy us anything. And we can't run the queue async for policy
deactivate, since we could be in the path of tearing the queue down.
If we schedule an async run of the queue at that time, we're racing
with queue teardown AFTER having we've already torn most of it down.

Reported-by: Omar Sandoval <osandov@fb.com>
Fixes: 4d199c6f1c ("blk-cgroup: ensure that we clear the stop bit on quiesced queues")
Tested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-18 15:37:27 -07:00
Omar Sandoval 6c0ca7ae29 sbitmap: fix wakeup hang after sbq resize
When we resize a struct sbitmap_queue, we update the wakeup batch size,
but we don't update the wait count in the struct sbq_wait_states. If we
resized down from a size which could use a bigger batch size, these
counts could be too large and cause us to miss necessary wakeups. To fix
this, update the wait counts when we resize (ensuring some careful
memory ordering so that it's safe w.r.t. concurrent clears).

This also fixes a theoretical issue where two threads could end up
bumping the wait count up by the batch size, which could also
potentially lead to hangs.

Reported-by: Martin Raiber <martin@urbackup.org>
Fixes: e3a2b3f931 ("blk-mq: allow changing of queue depth through sysfs")
Fixes: 2971c35f35 ("blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-18 13:41:55 -07:00
Omar Sandoval f66227de59 sbitmap: use smp_mb__after_atomic() in sbq_wake_up()
We always do an atomic clear_bit() right before we call sbq_wake_up(),
so we can use smp_mb__after_atomic(). While we're here, comment the
memory barriers in here a little more.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-18 13:41:49 -07:00
Jens Axboe 4d199c6f1c blk-cgroup: ensure that we clear the stop bit on quiesced queues
If we call blk_mq_quiesce_queue() on a queue, we must remember to
pair that with something that clears the stopped by on the
queues later on.

Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-18 07:43:26 -07:00
Jens Axboe d348499138 blk-mq-sched: allow setting of default IO scheduler
Add Kconfig entries to manage what devices get assigned an MQ
scheduler, and add a blk-mq flag for drivers to opt out of scheduling.
The latter is useful for admin type queues that still allocate a blk-mq
queue and tag set, but aren't use for normal IO.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17 10:04:31 -07:00
Jens Axboe 945ffb60c1 mq-deadline: add blk-mq adaptation of the deadline IO scheduler
This is basically identical to deadline-iosched, except it registers
as a MQ capable scheduler. This is still a single queue design.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17 10:04:25 -07:00
Jens Axboe bd166ef183 blk-mq-sched: add framework for MQ capable IO schedulers
This adds a set of hooks that intercepts the blk-mq path of
allocating/inserting/issuing/completing requests, allowing
us to develop a scheduler within that framework.

We reuse the existing elevator scheduler API on the registration
side, but augment that with the scheduler flagging support for
the blk-mq interfce, and with a separate set of ops hooks for MQ
devices.

We split driver and scheduler tags, so we can run the scheduling
independently of device queue depth.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17 10:04:20 -07:00
Jens Axboe 2af8cbe305 blk-mq: split tag ->rqs[] into two
This is in preparation for having two sets of tags available. For
that we need a static index, and a dynamically assignable one.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17 10:04:15 -07:00
Jens Axboe fd2d332677 blk-mq: add support for carrying internal tag information in blk_qc_t
No functional change in this patch, just in preparation for having
two types of tags available to the block layer for a single request.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17 10:04:09 -07:00
Jens Axboe cc71a6f438 blk-mq: abstract out helpers for allocating/freeing tag maps
Prep patch for adding an extra tag map for scheduler requests.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17 10:04:04 -07:00
Jens Axboe 4941115bef blk-mq-tag: cleanup the normal/reserved tag allocation
This is in preparation for having another tag set available. Cleanup
the parameters, and allow passing in of tags for blk_mq_put_tag().

Signed-off-by: Jens Axboe <axboe@fb.com>
[hch: even more cleanups]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17 10:03:59 -07:00
Jens Axboe 2c3ad66790 blk-mq: export some helpers we need to the scheduling framework
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17 10:03:53 -07:00
Jens Axboe 16a3c2a70c blk-mq: un-export blk_mq_free_hctx_request()
It's only used in blk-mq, kill it from the main exported header
and kill the symbol export as well.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17 10:03:48 -07:00
Jens Axboe c23ecb4260 block: move rq_ioc() to blk.h
We want to use it outside of blk-core.c.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17 10:03:42 -07:00
Jens Axboe c51ca6cf54 block: move existing elevator ops to union
Prep patch for adding MQ ops as well, since doing anon unions with
named initializers doesn't work on older compilers.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17 10:03:33 -07:00
Alden Tondettar c5082b70ad partitions/efi: Fix integer overflow in GPT size calculation
If a GUID Partition Table claims to have more than 2**25 entries, the
calculation of the partition table size in alloc_read_gpt_entries() will
overflow a 32-bit integer and not enough space will be allocated for the
table.

Nothing seems to get written out of bounds, but later efi_partition() will
read up to 32768 bytes from a 128 byte buffer, possibly OOPSing or exposing
information to /proc/partitions and uevents.

The problem exists on both 64-bit and 32-bit platforms.

Fix the overflow and also print a meaningful debug message if the table
size is too large.

Signed-off-by: Alden Tondettar <alden.tondettar@gmail.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-17 09:02:31 -07:00
Josef Bacik 1e668f4e79 MAINTAINERS: Update maintainer entry for NBD
The old maintainers email is bouncing and I've rewritten most of this
driver in the recent months.  Also add linux-block to the mailinglist
and remove the old tree, I will send patches through the linux-block
tree.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-11 20:49:17 -07:00
Jens Axboe f8a5b12247 blk-mq: make mq_ops a const pointer
We never change it, make that clear.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
2017-01-11 20:47:44 -07:00
Ming Lei 729204ef49 block: relax check on sg gap
If the last bvec of the 1st bio and the 1st bvec of the next
bio are physically contigious, and the latter can be merged
to last segment of the 1st bio, we should think they don't
violate sg gap(or virt boundary) limit.

Both Vitaly and Dexuan reported lots of unmergeable small bios
are observed when running mkfs on Hyper-V virtual storage, and
performance becomes quite low. This patch fixes that performance
issue.

The same issue should exist on NVMe, since it sets virt boundary too.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reported-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-11 20:47:08 -07:00
Vlastimil Babka 1661f2e21c floppy: replace wrong kmalloc(GFP_USER) with GFP_KERNEL
The raw_cmd_copyin() function does a kmalloc() with GFP_USER, although the
allocated structure is obviously not mapped to userspace, just copied from/to.
In this case GFP_KERNEL is more appropriate, so let's use it, although in the
current implementation this does not manifest as any error.

Reported-by: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-01-11 20:47:08 -07:00
Linus Torvalds a121103c92 Linux 4.10-rc3 2017-01-08 14:18:17 -08:00