linux-apfs

mirror of https://github.com/linux-apfs/linux-apfs.git synced 2026-05-01 15:00:59 -07:00

Author	SHA1	Message	Date
Keith Busch	bea99a5007	blk-mq-sched: Enable merging discard bio into request Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-02-01 14:45:11 -07:00
Mike Snitzer	9e97d2951a	blk-mq-sched: remove unused 'can_block' arg from blk_mq_sched_insert_request After commit: `923218f616` ("blk-mq: don't allocate driver tag upfront for flush rq") we no longer use the 'can_block' argument in blk_mq_sched_insert_request(). Kill it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Added actual commit message as to why it's being removed. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-17 09:49:21 -07:00
Liu Bo	913a9500b9	blk-mq: remove confusing comment of blk_mq_sched_dispatch_requests Commit `de14829740` ("blk-mq: introduce .get_budget and .put_budget in blk_mq_ops") changes the function to return bool type, and then commit `1f460b63d4` ("blk-mq: don't restart queue when .get_budget returns BLK_STS_RESOURCE") changes it back to void, but the comment remains. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-05 08:36:33 -07:00
Jens Axboe	79f720a751	blk-mq: only run the hardware queue if IO is pending Currently we are inconsistent in when we decide to run the queue. Using blk_mq_run_hw_queues() we check if the hctx has pending IO before running it, but we don't do that from the individual queue run function, blk_mq_run_hw_queue(). This results in a lot of extra and pointless queue runs, potentially, on flush requests and (much worse) on tag starvation situations. This is observable just looking at top output, with lots of kworkers active. For the !async runs, it just adds to the CPU overhead of blk-mq. Move the has-pending check into the run function instead of having callers do it. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-11-10 19:55:57 -07:00
Jens Axboe	05b7941394	Revert "blk-mq: don't handle TAG_SHARED in restart" This reverts commit `358a3a6bcc`. We have cases that aren't covered 100% in the drivers, so for now we have to retain the shared tag restart loops. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-11-10 19:53:25 -07:00
Ming Lei	923218f616	blk-mq: don't allocate driver tag upfront for flush rq The idea behind it is simple: 1) for none scheduler, driver tag has to be borrowed for flush rq, otherwise we may run out of tag, and that causes an IO hang. And get/put driver tag is actually noop for none, so reordering tags isn't necessary at all. 2) for a real I/O scheduler, we need not allocate a driver tag upfront for flush rq. It works just fine to follow the same approach as normal requests: allocate driver tag for each rq just before calling ->queue_rq(). One driver visible change is that the driver tag isn't shared in the flush request sequence. That won't be a problem, since we always do that in legacy path. Then flush rq need not be treated specially wrt. get/put driver tag. This cleans up the code - for instance, reorder_tags_to_front() can be removed, and we needn't worry about request ordering in dispatch list for avoiding I/O deadlock. Also we have to put the driver tag before requeueing. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-11-04 12:40:13 -06:00
Ming Lei	a6a252e649	blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ In case of IO scheduler we always pre-allocate one driver tag before calling blk_insert_flush(), and flush request will be marked as RQF_FLUSH_SEQ once it is in flush machinery. So if RQF_FLUSH_SEQ isn't set, we call blk_insert_flush() to handle the request, otherwise the flush request is dispatched to ->dispatch list directly. This is a preparation patch for not preallocating a driver tag for flush requests, and for not treating flush requests as a special case. This is similar to what the legacy path does. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-11-04 12:38:50 -06:00
Ming Lei	88022d7201	blk-mq: don't handle failure in .get_budget It is enough to just check if we can get the budget via .get_budget(). And we don't need to deal with device state change in .get_budget(). For SCSI, one issue to be fixed is that we have to call scsi_mq_uninit_cmd() to free allocated ressources if SCSI device fails to handle the request. And it isn't enough to simply call blk_mq_end_request() to do that if this request is marked as RQF_DONTPREP. Fixes: 0df21c86bdbf(scsi: implement .get_budget and .put_budget for blk-mq) Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-11-04 12:31:08 -06:00
Ming Lei	1f460b63d4	blk-mq: don't restart queue when .get_budget returns BLK_STS_RESOURCE SCSI restarts its queue in scsi_end_request() automatically, so we don't need to handle this case in blk-mq. Especailly any request won't be dequeued in this case, we needn't to worry about IO hang caused by restart vs. dispatch. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-11-01 08:20:34 -06:00
Ming Lei	358a3a6bcc	blk-mq: don't handle TAG_SHARED in restart Now restart is used in the following cases, and TAG_SHARED is for SCSI only. 1) .get_budget() returns BLK_STS_RESOURCE - if resource in target/host level isn't satisfied, this SCSI device will be added in shost->starved_list, and the whole queue will be rerun (via SCSI's built-in RESTART) in scsi_end_request() after any request initiated from this host/targe is completed. Forget to mention, host level resource can't be an issue for blk-mq at all. - the same is true if resource in the queue level isn't satisfied. - if there isn't outstanding request on this queue, then SCSI's RESTART can't work(blk-mq's can't work too), and the queue will be run after SCSI_QUEUE_DELAY, and finally all starved sdevs will be handled by SCSI's RESTART when this request is finished 2) scsi_dispatch_cmd() returns BLK_STS_RESOURCE - if there isn't onprogressing request on this queue, the queue will be run after SCSI_QUEUE_DELAY - otherwise, SCSI's RESTART covers the rerun. 3) blk_mq_get_driver_tag() failed - BLK_MQ_S_TAG_WAITING covers the cross-queue RESTART for driver allocation. In one word, SCSI's built-in RESTART is enough to cover the queue rerun, and we don't need to pay special attention to TAG_SHARED wrt. restart. In my test on scsi_debug(8 luns), this patch improves IOPS by 20% ~ 30% when running I/O on these 8 luns concurrently. Aslo Roman Pen reported the current RESTART is very expensive especialy when there are lots of LUNs attached in one host, such as in his test, RESTART causes half of IOPS be cut. Fixes: https://marc.info/?l=linux-kernel&m=150832216727524&w=2 Fixes: `6d8c6c0f97` ("blk-mq: Restart a single queue if tag sets are shared") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-11-01 08:20:33 -06:00
Ming Lei	b347689ffb	blk-mq-sched: improve dispatching from sw queue SCSI devices use host-wide tagset, and the shared driver tag space is often quite big. However, there is also a queue depth for each lun( .cmd_per_lun), which is often small, for example, on both lpfc and qla2xxx, .cmd_per_lun is just 3. So lots of requests may stay in sw queue, and we always flush all belonging to same hw queue and dispatch them all to driver. Unfortunately it is easy to cause queue busy because of the small .cmd_per_lun. Once these requests are flushed out, they have to stay in hctx->dispatch, and no bio merge can happen on these requests, and sequential IO performance is harmed. This patch introduces blk_mq_dequeue_from_ctx for dequeuing a request from a sw queue, so that we can dispatch them in scheduler's way. We can then avoid dequeueing too many requests from sw queue, since we don't flush ->dispatch completely. This patch improves dispatching from sw queue by using the .get_budget and .put_budget callbacks. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-11-01 08:20:02 -06:00
Ming Lei	de14829740	blk-mq: introduce .get_budget and .put_budget in blk_mq_ops For SCSI devices, there is often a per-request-queue depth, which needs to be respected before queuing one request. Currently blk-mq always dequeues the request first, then calls .queue_rq() to dispatch the request to lld. One obvious issue with this approach is that I/O merging may not be successful, because when the per-request-queue depth can't be respected, .queue_rq() has to return BLK_STS_RESOURCE, and then this request has to stay in hctx->dispatch list. This means it never gets a chance to be merged with other IO. This patch introduces .get_budget and .put_budget callback in blk_mq_ops, then we can try to get reserved budget first before dequeuing request. If the budget for queueing I/O can't be satisfied, we don't need to dequeue request at all. Hence the request can be left in the IO scheduler queue, for more merging opportunities. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-11-01 08:20:02 -06:00
Ming Lei	caf8eb0d60	blk-mq-sched: move actual dispatching into one helper So that it becomes easy to support to dispatch from sw queue in the following patch. No functional change. Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Suggested-by: Christoph Hellwig <hch@lst.de> # for simplifying dispatch logic Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-11-01 08:20:02 -06:00
Ming Lei	5e3d02bbaf	blk-mq-sched: dispatch from scheduler IFF progress is made in ->dispatch When the hw queue is busy, we shouldn't take requests from the scheduler queue any more, otherwise it is difficult to do IO merge. This patch fixes the awful IO performance on some SCSI devices(lpfc, qla2xxx, ...) when mq-deadline/kyber is used by not taking requests if hw queue is busy. Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-11-01 08:20:02 -06:00
Ming Lei	32825c45ff	blk-mq-sched: fix performance regression of mq-deadline When mq-deadline is taken, IOPS of sequential read and seqential write is observed more than 20% drop on sata(scsi-mq) devices, compared with using 'none' scheduler. The reason is that the default nr_requests for scheduler is too big for small queuedepth devices, and latency is increased much. Since the principle of taking 256 requests for mq scheduler is based on 128 queue depth, this patch changes into double size of min(hw queue_depth, 128). Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-07-03 16:54:09 -06:00
Jens Axboe	f95a0d6a95	Merge commit '8e8320c9315c' into for-4.13/block Pull in the fix for shared tags, as it conflicts with the pending changes in for-4.13/block. We already pulled in v4.12-rc5 to solve other conflicts or get fixes that went into 4.12, so not a lot of changes in this merge. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-22 21:55:24 -06:00
Jens Axboe	8e8320c931	blk-mq: fix performance regression with shared tags If we have shared tags enabled, then every IO completion will trigger a full loop of every queue belonging to a tag set, and every hardware queue for each of those queues, even if nothing needs to be done. This causes a massive performance regression if you have a lot of shared devices. Instead of doing this huge full scan on every IO, add an atomic counter to the main queue that tracks how many hardware queues have been marked as needing a restart. With that, we can avoid looking for restartable queues, if we don't have to. Max reports that this restores performance. Before this patch, 4K IOPS was limited to 22-23K IOPS. With the patch, we are running at 950-970K IOPS. Fixes: `6d8c6c0f97` ("blk-mq: Restart a single queue if tag sets are shared") Reported-by: Max Gurtovoy <maxg@mellanox.com> Tested-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-21 10:17:49 -06:00
Bart Van Assche	7b6078146c	blk-mq: Document locking assumptions Document the locking assumptions in functions that modify blk_mq_ctx.rq_list to make it easier for humans to verify this code. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-20 19:27:14 -06:00
Ming Lei	f4560ffe8c	blk-mq: use QUEUE_FLAG_QUIESCED to quiesce queue It is required that no dispatch can happen any more once blk_mq_quiesce_queue() returns, and we don't have such requirement on APIs of stopping queue. But blk_mq_quiesce_queue() still may not block/drain dispatch in the the case of BLK_MQ_S_START_ON_RUN, so use the new introduced flag of QUEUE_FLAG_QUIESCED and evaluate it inside RCU read-side critical sections for fixing this issue. Also blk_mq_quiesce_queue() is implemented via stopping queue, which limits its uses, and easy to cause race, because any queue restart in other paths may break blk_mq_quiesce_queue(). With the introduced flag of QUEUE_FLAG_QUIESCED, we don't need to depend on stopping queue for quiescing any more. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-18 14:24:27 -06:00
Christoph Hellwig	44e8c2bff8	blk-mq: refactor blk_mq_sched_assign_ioc blk_mq_sched_assign_ioc now only handles the assigned of the ioc if the schedule needs it (bfq only at the moment). The caller to the per-request initializer is moved out so that it can be merged with a similar call for the kyber I/O scheduler. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-18 10:08:55 -06:00
Christoph Hellwig	ea511e3c28	blk-mq: remove blk_mq_sched_{get,put}_rq_priv Having these as separate helpers in a header really does not help readability, or my chances to refactor this code sanely. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-18 10:08:55 -06:00
Christoph Hellwig	d2c0d38324	blk-mq: move blk_mq_sched_{get,put}_request to blk-mq.c Having them out of line in blk-mq-sched.c just makes the code flow unnecessarily complicated. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-18 10:08:55 -06:00
Ming Lei	9bddeb2a5b	blk-mq: make per-sw-queue bio merge as default .bio_merge Because what the per-sw-queue bio merge does is basically same with scheduler's .bio_merge(), this patch makes per-sw-queue bio merge as the default .bio_merge if no scheduler is used or io scheduler doesn't provide .bio_merge(). Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-05-26 14:12:04 -06:00
Omar Sandoval	d332ce0918	blk-mq-debugfs: allow schedulers to register debugfs attributes This provides the infrastructure for schedulers to expose their internal state through debugfs. We add a list of queue attributes and a list of hctx attributes to struct elevator_type and wire them up when switching schedulers. Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Add missing seq_file.h header in blk-mq-debugfs.h Signed-off-by: Jens Axboe <axboe@fb.com>	2017-05-04 08:24:40 -06:00
Jens Axboe	9f2779bff2	blk-mq-sched: remove hack that bypasses scheduler for reserved requests We have update the troublesome driver (mtip32xx) to deal with this appropriately. So kill the hack that bypassed scheduler allocation and insertion for reserved requests. Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-05-02 07:52:08 -06:00

1 2 3

53 Commits