Commit Graph

217 Commits

Author SHA1 Message Date
Christoph Hellwig
131d08e122 block: split the blk-mq case from elevator_init
There is almost no shared logic, which leads to a very confusing code
flow.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-01 07:38:21 -06:00
Christoph Hellwig
acddf3b308 block: move sysfs_lock into elevator_init
Both callers take just around so function call, so move it in.
Also remove the now pointless blk_mq_sched_init wrapper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-01 07:38:19 -06:00
Christoph Hellwig
ddb7253254 block: remove the always unused name argument to elevator_init
Reported-by: Damien Le Moal <Damien.LeMoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-01 07:38:17 -06:00
Christoph Hellwig
a8a275c9c2 block: unexport elevator_init/exit
These are only used by the block core.  Also move the declarations to
block/blk.h.

Reported-by: Damien Le Moal <Damien.LeMoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-01 07:38:16 -06:00
Christoph Hellwig
cbf62af353 block: move initialization of elevator-related fields to blk_alloc_queue_node
No point in doing this in elevator_init.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Damien Le Moal <Damien.LeMoal@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-01 07:38:14 -06:00
Bart Van Assche
14a23498ba block: Document scheduler modification locking requirements
This patch does not change any functionality.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-18 12:54:42 -07:00
Bart Van Assche
83d016ac86 block: Unexport elv_register_queue() and elv_unregister_queue()
These two functions are only called from inside the block layer so
unexport them.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-18 12:54:41 -07:00
Ming Lei
24f5a90f0d blk-mq: quiesce queue during switching io sched and updating nr_requests
Dispatch may still be in-progress after queue is frozen, so we have to
quiesce queue before switching IO scheduler and updating nr_requests.

Also when switching io schedulers, blk_mq_run_hw_queue() may still be
called somewhere(such as from nvme_reset_work()), and io scheduler's
per-hctx data may not be setup yet, so cause oops even inside
blk_mq_hctx_has_pending(), such as it can be run just between:

        ret = e->ops.mq.init_sched(q, e);
AND
        ret = e->ops.mq.init_hctx(hctx, i)

inside blk_mq_init_sched().

This reverts commit 7a148c2fcff8330(block: don't call blk_mq_quiesce_queue()
after queue is frozen) basically, and makes sure blk_mq_hctx_has_pending
won't be called if queue is quiesced.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Fixes: 7a148c2fcff83309(block: don't call blk_mq_quiesce_queue() after queue is frozen)
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06 09:25:36 -07:00
Jens Axboe
8ac0d9a81e elevator: allow name aliases
Since we now lookup elevator types with the appropriate multiqueue
capability, allow schedulers to register with an alias alongside
the real name. This is in preparation for allowing 'mq-deadline'
to register an alias of 'deadline' as well.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-25 12:36:50 -06:00
Jens Axboe
2527d99789 elevator: lookup mq vs non-mq elevators
If an IO scheduler is selected via elevator= and it doesn't match
the driver in question wrt blk-mq support, then we fail to boot.

The elevator= parameter is deprecated and only supported for
non-mq devices. Augment the elevator lookup API so that we
pass in if we're looking for an mq capable scheduler or not,
so that we only ever return a valid type for the queue in
question.

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196695
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-25 12:36:45 -06:00
Christoph Hellwig
5fdee2127f block: remove QUEUE_FLAG_STACKABLE
We already have a queue_is_rq_based helper to check if a request_queue
is request based, so we can remove the flag for it.

Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-10-05 15:22:59 -06:00
David Jeffery
e9a823fb34 block: fix warning when I/O elevator is changed as request_queue is being removed
There is a race between changing I/O elevator and request_queue removal
which can trigger the warning in kobject_add_internal.  A program can
use sysfs to request a change of elevator at the same time another task
is unregistering the request_queue the elevator would be attached to.
The elevator's kobject will then attempt to be connected to the
request_queue in the object tree when the request_queue has just been
removed from sysfs.  This triggers the warning in kobject_add_internal
as the request_queue no longer has a sysfs directory:

kobject_add_internal failed for iosched (error: -2 parent: queue)
------------[ cut here ]------------
WARNING: CPU: 3 PID: 14075 at lib/kobject.c:244 kobject_add_internal+0x103/0x2d0

To fix this warning, we can check the QUEUE_FLAG_REGISTERED flag when
changing the elevator and use the request_queue's sysfs_lock to
serialize between clearing the flag and the elevator testing the flag.

Signed-off-by: David Jeffery <djeffery@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-28 10:52:44 -06:00
Bart Van Assche
e29387ebd8 block: Add fallthrough markers to switch statements
This patch suppresses gcc 7 warnings about falling through in switch
statements when building with W=1. From the gcc documentation: The
-Wimplicit-fallthrough=3 warning is enabled by -Wextra. See also
https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gcc/Warning-Options.html.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-21 11:46:07 -06:00
Jens Axboe
340ff32167 elevator: remove redundant warnings on IO scheduler switch
We warn twice for switching to a scheduler, if that switch fails.
As we also report the failure in the return value to the
sysfs write, remove the dmesg induced failures.

Keep the failure print for warning to switch to the kconfig
selected IO scheduler, as we can't report errors for that in
any other way.

Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-10 07:40:04 -06:00
Ming Lei
7a148c2fcf block: don't call blk_mq_quiesce_queue() after queue is frozen
After queue is frozen, no request in this queue can be in use at all, so
there can't be any .queue_rq() running on this queue.  It isn't
necessary to call blk_mq_quiesce_queue() any more, so remove it in both
elevator_switch_mq() and blk_mq_update_nr_requests().

Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>

Fixed up the description a bit.

Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-02 11:33:08 -06:00
Bart Van Assche
c033269490 block: Remove elevator_change()
Since commit 8425339492 ("remove the mg_disk driver") removed the
only caller of elevator_change(), also remove the elevator_change()
function itself.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-02 07:52:08 -06:00
Linus Torvalds
694752922b Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:

 - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
   was initially a fork of CFQ, but subsequently changed to implement
   fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
   to be used on desktop type single drives, providing good fairness.
   From Paolo.

 - Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
   using a scalable token based algorithm that throttles IO based on
   live completion IO stats, similary to blk-wbt. From Omar.

 - A series from Jan, moving users to separately allocated backing
   devices. This continues the work of separating backing device life
   times, solving various problems with hot removal.

 - A series of updates for lightnvm, mostly from Javier. Includes a
   'pblk' target that exposes an open channel SSD as a physical block
   device.

 - A series of fixes and improvements for nbd from Josef.

 - A series from Omar, removing queue sharing between devices on mostly
   legacy drivers. This helps us clean up other bits, if we know that a
   queue only has a single device backing. This has been overdue for
   more than a decade.

 - Fixes for the blk-stats, and improvements to unify the stats and user
   windows. This both improves blk-wbt, and enables other users to
   register a need to receive IO stats for a device. From Omar.

 - blk-throttle improvements from Shaohua. This provides a scalable
   framework for implementing scalable priotization - particularly for
   blk-mq, but applicable to any type of block device. The interface is
   marked experimental for now.

 - Bucketized IO stats for IO polling from Stephen Bates. This improves
   efficiency of polled workloads in the presence of mixed block size
   IO.

 - A few fixes for opal, from Scott.

 - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
   From a variety of folks, mostly Sagi and James Smart.

 - A series from Bart, improving our exposed info and capabilities from
   the blk-mq debugfs support.

 - A series from Christoph, cleaning up how handle WRITE_ZEROES.

 - A series from Christoph, cleaning up the block layer handling of how
   we track errors in a request. On top of being a nice cleanup, it also
   shrinks the size of struct request a bit.

 - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
   never used by platforms, and the latter has outlived it's usefulness.

 - Various little bug fixes and cleanups from a wide variety of folks.

* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
  block: hide badblocks attribute by default
  blk-mq: unify hctx delay_work and run_work
  block: add kblock_mod_delayed_work_on()
  blk-mq: unify hctx delayed_run_work and run_work
  nbd: fix use after free on module unload
  MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
  blk-mq-sched: alloate reserved tags out of normal pool
  mtip32xx: use runtime tag to initialize command header
  scsi: Implement blk_mq_ops.show_rq()
  blk-mq: Add blk_mq_ops.show_rq()
  blk-mq: Show operation, cmd_flags and rq_flags names
  blk-mq: Make blk_flags_show() callers append a newline character
  blk-mq: Move the "state" debugfs attribute one level down
  blk-mq: Unregister debugfs attributes earlier
  blk-mq: Only unregister hctxs for which registration succeeded
  blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
  blk-mq: Let blk_mq_debugfs_register() look up the queue name
  blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
  ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
  ide-pm: always pass 0 error to __blk_end_request_all
  ..
2017-05-01 10:39:57 -07:00
Ming Lei
3a5088c8c1 block: respect BLK_MQ_F_NO_SCHED
If one driver claims that it doesn't support io scheduler via
BLK_MQ_F_NO_SCHED, we should not allow to change and show the
availabe io schedulers.

This patch adds check to enhance this behaviour.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-19 14:15:43 -06:00
Jan Kara
8330cdb0fe block: Make writeback throttling defaults consistent for SQ devices
When CFQ is used as an elevator, it disables writeback throttling
because they don't play well together. Later when a different elevator
is chosen for the device, writeback throttling doesn't get enabled
again as it should. Make sure CFQ enables writeback throttling (if it
should be enabled by default) when we switch from it to another IO
scheduler.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-19 08:49:03 -06:00
Omar Sandoval
54d5329d42 blk-mq-sched: fix crash in switch error path
In elevator_switch(), if blk_mq_init_sched() fails, we attempt to fall
back to the original scheduler. However, at this point, we've already
torn down the original scheduler's tags, so this causes a crash. Doing
the fallback like the legacy elevator path is much harder for mq, so fix
it by just falling back to none, instead.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07 08:56:48 -06:00
Omar Sandoval
6917ff0b5b blk-mq-sched: refactor scheduler initialization
Preparation cleanup for the next couple of fixes, push
blk_mq_sched_setup() and e->ops.mq.init_sched() into a helper.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07 08:56:44 -06:00
Jens Axboe
7b36a7189f block: don't call ioc_exit_icq() with the queue lock held for blk-mq
For legacy scheduling, we always call ioc_exit_icq() with both the
ioc and queue lock held. This poses a problem for blk-mq with
scheduling, since the queue lock isn't what we use in the scheduler.
And since we don't need the queue lock held for ioc exit there,
don't grab it and leave any extra locking up to the blk-mq scheduler.

Reported-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 13:59:08 -07:00
Jens Axboe
b86dd815ff block: get rid of blk-mq default scheduler choice Kconfig entries
The wording in the entries were poor and not understandable
by even deities. Kill the selection for default block scheduler,
and impose a policy with sane defaults.

Architected-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-22 13:19:45 -07:00
Jens Axboe
818551e2b2 Merge branch 'for-4.11/next' into for-4.11/linus-merge
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-17 14:08:19 -07:00
Jens Axboe
d1a987f35e elevator: fix loading wrong elevator type for blk-mq devices
The old elevator= boot parameter blindly attempts to load the
same scheduler for mq and !mq devices, leading to a crash if
we specify the wrong one.

Ensure that we only apply this boot parameter to old !mq devices.

Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-14 08:16:41 -07:00