Merge tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
 "Core block changes that have been queued up for this release:

   - Remove dead blk-throttle and blk-wbt code (Guoqing)

   - Include pid in blktrace note traces (Jan)

   - Don't spew I/O errors on wouldblock termination (me)

   - Zone append addition (Johannes, Keith, Damien)

   - IO accounting improvements (Konstantin, Christoph)

   - blk-mq hardware map update improvements (Ming)

   - Scheduler dispatch improvement (Salman)

   - Inline block encryption support (Satya)

   - Request map fixes and improvements (Weiping)

   - blk-iocost tweaks (Tejun)

   - Fix for timeout failing with error injection (Keith)

   - Queue re-run fixes (Douglas)

   - CPU hotplug improvements (Christoph)

   - Queue entry/exit improvements (Christoph)

   - Move DMA drain handling to the few drivers that use it (Christoph)

   - Partition handling cleanups (Christoph)"

* tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits)
  block: mark bio_wouldblock_error() bio with BIO_QUIET
  blk-wbt: rename __wbt_update_limits to wbt_update_limits
  blk-wbt: remove wbt_update_limits
  blk-throttle: remove tg_drain_bios
  blk-throttle: remove blk_throtl_drain
  null_blk: force complete for timeout request
  blk-mq: drain I/O when all CPUs in a hctx are offline
  blk-mq: add blk_mq_all_tag_iter
  blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx
  blk-mq: use BLK_MQ_NO_TAG in more places
  blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG
  blk-mq: move more request initialization to blk_mq_rq_ctx_init
  blk-mq: simplify the blk_mq_get_request calling convention
  blk-mq: remove the bio argument to ->prepare_request
  nvme: force complete cancelled requests
  blk-mq: blk-mq: provide forced completion method
  block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds
  block: blk-crypto-fallback: remove redundant initialization of variable err
  block: reduce part_stat_lock() scope
  block: use __this_cpu_add() instead of access by smp_processor_id()
  ...
This commit is contained in:
Linus Torvalds
2020-06-02 15:29:19 -07:00
122 changed files with 4514 additions and 1493 deletions

View File

@@ -14,6 +14,7 @@ Block
cmdline-partition
data-integrity
deadline-iosched
inline-encryption
ioprio
kyber-iosched
null_blk

View File

@@ -0,0 +1,263 @@
.. SPDX-License-Identifier: GPL-2.0
=================
Inline Encryption
=================
Background
==========
Inline encryption hardware sits logically between memory and the disk, and can
en/decrypt data as it goes in/out of the disk. Inline encryption hardware has a
fixed number of "keyslots" - slots into which encryption contexts (i.e. the
encryption key, encryption algorithm, data unit size) can be programmed by the
kernel at any time. Each request sent to the disk can be tagged with the index
of a keyslot (and also a data unit number to act as an encryption tweak), and
the inline encryption hardware will en/decrypt the data in the request with the
encryption context programmed into that keyslot. This is very different from
full disk encryption solutions like self encrypting drives/TCG OPAL/ATA
Security standards, since with inline encryption, any block on disk could be
encrypted with any encryption context the kernel chooses.
Objective
=========
We want to support inline encryption (IE) in the kernel.
To allow for testing, we also want a crypto API fallback when actual
IE hardware is absent. We also want IE to work with layered devices
like dm and loopback (i.e. we want to be able to use the IE hardware
of the underlying devices if present, or else fall back to crypto API
en/decryption).
Constraints and notes
=====================
- IE hardware has a limited number of "keyslots" that can be programmed
with an encryption context (key, algorithm, data unit size, etc.) at any time.
One can specify a keyslot in a data request made to the device, and the
device will en/decrypt the data using the encryption context programmed into
that specified keyslot. When possible, we want to make multiple requests with
the same encryption context share the same keyslot.
- We need a way for upper layers like filesystems to specify an encryption
context to use for en/decrypting a struct bio, and a device driver (like UFS)
needs to be able to use that encryption context when it processes the bio.
- We need a way for device drivers to expose their inline encryption
capabilities in a unified way to the upper layers.
Design
======
We add a :c:type:`struct bio_crypt_ctx` to :c:type:`struct bio` that can
represent an encryption context, because we need to be able to pass this
encryption context from the upper layers (like the fs layer) to the
device driver to act upon.
While IE hardware works on the notion of keyslots, the FS layer has no
knowledge of keyslots - it simply wants to specify an encryption context to
use while en/decrypting a bio.
We introduce a keyslot manager (KSM) that handles the translation from
encryption contexts specified by the FS to keyslots on the IE hardware.
This KSM also serves as the way IE hardware can expose its capabilities to
upper layers. The generic mode of operation is: each device driver that wants
to support IE will construct a KSM and set it up in its struct request_queue.
Upper layers that want to use IE on this device can then use this KSM in
the device's struct request_queue to translate an encryption context into
a keyslot. The presence of the KSM in the request queue shall be used to mean
that the device supports IE.
The KSM uses refcounts to track which keyslots are idle (either they have no
encryption context programmed, or there are no in-flight struct bios
referencing that keyslot). When a new encryption context needs a keyslot, it
tries to find a keyslot that has already been programmed with the same
encryption context, and if there is no such keyslot, it evicts the least
recently used idle keyslot and programs the new encryption context into that
one. If no idle keyslots are available, then the caller will sleep until there
is at least one.
blk-mq changes, other block layer changes and blk-crypto-fallback
=================================================================
We add a pointer to a ``bi_crypt_context`` and ``keyslot`` to
:c:type:`struct request`. These will be referred to as the ``crypto fields``
for the request. This ``keyslot`` is the keyslot into which the
``bi_crypt_context`` has been programmed in the KSM of the ``request_queue``
that this request is being sent to.
We introduce ``block/blk-crypto-fallback.c``, which allows upper layers to remain
blissfully unaware of whether or not real inline encryption hardware is present
underneath. When a bio is submitted with a target ``request_queue`` that doesn't
support the encryption context specified with the bio, the block layer will
en/decrypt the bio with the blk-crypto-fallback.
If the bio is a ``WRITE`` bio, a bounce bio is allocated, and the data in the bio
is encrypted stored in the bounce bio - blk-mq will then proceed to process the
bounce bio as if it were not encrypted at all (except when blk-integrity is
concerned). ``blk-crypto-fallback`` sets the bounce bio's ``bi_end_io`` to an
internal function that cleans up the bounce bio and ends the original bio.
If the bio is a ``READ`` bio, the bio's ``bi_end_io`` (and also ``bi_private``)
is saved and overwritten by ``blk-crypto-fallback`` to
``bio_crypto_fallback_decrypt_bio``. The bio's ``bi_crypt_context`` is also
overwritten with ``NULL``, so that to the rest of the stack, the bio looks
as if it was a regular bio that never had an encryption context specified.
``bio_crypto_fallback_decrypt_bio`` will decrypt the bio, restore the original
``bi_end_io`` (and also ``bi_private``) and end the bio again.
Regardless of whether real inline encryption hardware is used or the
blk-crypto-fallback is used, the ciphertext written to disk (and hence the
on-disk format of data) will be the same (assuming the hardware's implementation
of the algorithm being used adheres to spec and functions correctly).
If a ``request queue``'s inline encryption hardware claimed to support the
encryption context specified with a bio, then it will not be handled by the
``blk-crypto-fallback``. We will eventually reach a point in blk-mq when a
:c:type:`struct request` needs to be allocated for that bio. At that point,
blk-mq tries to program the encryption context into the ``request_queue``'s
keyslot_manager, and obtain a keyslot, which it stores in its newly added
``keyslot`` field. This keyslot is released when the request is completed.
When the first bio is added to a request, ``blk_crypto_rq_bio_prep`` is called,
which sets the request's ``crypt_ctx`` to a copy of the bio's
``bi_crypt_context``. bio_crypt_do_front_merge is called whenever a subsequent
bio is merged to the front of the request, which updates the ``crypt_ctx`` of
the request so that it matches the newly merged bio's ``bi_crypt_context``. In particular, the request keeps a copy of the ``bi_crypt_context`` of the first
bio in its bio-list (blk-mq needs to be careful to maintain this invariant
during bio and request merges).
To make it possible for inline encryption to work with request queue based
layered devices, when a request is cloned, its ``crypto fields`` are cloned as
well. When the cloned request is submitted, blk-mq programs the
``bi_crypt_context`` of the request into the clone's request_queue's keyslot
manager, and stores the returned keyslot in the clone's ``keyslot``.
API presented to users of the block layer
=========================================
``struct blk_crypto_key`` represents a crypto key (the raw key, size of the
key, the crypto algorithm to use, the data unit size to use, and the number of
bytes required to represent data unit numbers that will be specified with the
``bi_crypt_context``).
``blk_crypto_init_key`` allows upper layers to initialize such a
``blk_crypto_key``.
``bio_crypt_set_ctx`` should be called on any bio that a user of
the block layer wants en/decrypted via inline encryption (or the
blk-crypto-fallback, if hardware support isn't available for the desired
crypto configuration). This function takes the ``blk_crypto_key`` and the
data unit number (DUN) to use when en/decrypting the bio.
``blk_crypto_config_supported`` allows upper layers to query whether or not the
an encryption context passed to request queue can be handled by blk-crypto
(either by real inline encryption hardware, or by the blk-crypto-fallback).
This is useful e.g. when blk-crypto-fallback is disabled, and the upper layer
wants to use an algorithm that may not supported by hardware - this function
lets the upper layer know ahead of time that the algorithm isn't supported,
and the upper layer can fallback to something else if appropriate.
``blk_crypto_start_using_key`` - Upper layers must call this function on
``blk_crypto_key`` and a ``request_queue`` before using the key with any bio
headed for that ``request_queue``. This function ensures that either the
hardware supports the key's crypto settings, or the crypto API fallback has
transforms for the needed mode allocated and ready to go. Note that this
function may allocate an ``skcipher``, and must not be called from the data
path, since allocating ``skciphers`` from the data path can deadlock.
``blk_crypto_evict_key`` *must* be called by upper layers before a
``blk_crypto_key`` is freed. Further, it *must* only be called only once
there are no more in-flight requests that use that ``blk_crypto_key``.
``blk_crypto_evict_key`` will ensure that a key is removed from any keyslots in
inline encryption hardware that the key might have been programmed into (or the blk-crypto-fallback).
API presented to device drivers
===============================
A :c:type:``struct blk_keyslot_manager`` should be set up by device drivers in
the ``request_queue`` of the device. The device driver needs to call
``blk_ksm_init`` on the ``blk_keyslot_manager``, which specifying the number of
keyslots supported by the hardware.
The device driver also needs to tell the KSM how to actually manipulate the
IE hardware in the device to do things like programming the crypto key into
the IE hardware into a particular keyslot. All this is achieved through the
:c:type:`struct blk_ksm_ll_ops` field in the KSM that the device driver
must fill up after initing the ``blk_keyslot_manager``.
The KSM also handles runtime power management for the device when applicable
(e.g. when it wants to program a crypto key into the IE hardware, the device
must be runtime powered on) - so the device driver must also set the ``dev``
field in the ksm to point to the `struct device` for the KSM to use for runtime
power management.
``blk_ksm_reprogram_all_keys`` can be called by device drivers if the device
needs each and every of its keyslots to be reprogrammed with the key it
"should have" at the point in time when the function is called. This is useful
e.g. if a device loses all its keys on runtime power down/up.
``blk_ksm_destroy`` should be called to free up all resources used by a keyslot
manager upon ``blk_ksm_init``, once the ``blk_keyslot_manager`` is no longer
needed.
Layered Devices
===============
Request queue based layered devices like dm-rq that wish to support IE need to
create their own keyslot manager for their request queue, and expose whatever
functionality they choose. When a layered device wants to pass a clone of that
request to another ``request_queue``, blk-crypto will initialize and prepare the
clone as necessary - see ``blk_crypto_insert_cloned_request`` in
``blk-crypto.c``.
Future Optimizations for layered devices
========================================
Creating a keyslot manager for a layered device uses up memory for each
keyslot, and in general, a layered device merely passes the request on to a
"child" device, so the keyslots in the layered device itself are completely
unused, and don't need any refcounting or keyslot programming. We can instead
define a new type of KSM; the "passthrough KSM", that layered devices can use
to advertise an unlimited number of keyslots, and support for any encryption
algorithms they choose, while not actually using any memory for each keyslot.
Another use case for the "passthrough KSM" is for IE devices that do not have a
limited number of keyslots.
Interaction between inline encryption and blk integrity
=======================================================
At the time of this patch, there is no real hardware that supports both these
features. However, these features do interact with each other, and it's not
completely trivial to make them both work together properly. In particular,
when a WRITE bio wants to use inline encryption on a device that supports both
features, the bio will have an encryption context specified, after which
its integrity information is calculated (using the plaintext data, since
the encryption will happen while data is being written), and the data and
integrity info is sent to the device. Obviously, the integrity info must be
verified before the data is encrypted. After the data is encrypted, the device
must not store the integrity info that it received with the plaintext data
since that might reveal information about the plaintext data. As such, it must
re-generate the integrity info from the ciphertext data and store that on disk
instead. Another issue with storing the integrity info of the plaintext data is
that it changes the on disk format depending on whether hardware inline
encryption support is present or the kernel crypto API fallback is used (since
if the fallback is used, the device will receive the integrity info of the
ciphertext, not that of the plaintext).
Because there isn't any real hardware yet, it seems prudent to assume that
hardware implementations might not implement both features together correctly,
and disallow the combination for now. Whenever a device supports integrity, the
kernel will pretend that the device does not support hardware inline encryption
(by essentially setting the keyslot manager in the request_queue of the device
to NULL). When the crypto API fallback is enabled, this means that all bios with
and encryption context will use the fallback, and IO will complete as usual.
When the fallback is disabled, a bio with an encryption context will be failed.

View File

@@ -146,6 +146,7 @@ config BLK_CGROUP_IOLATENCY
config BLK_CGROUP_IOCOST
bool "Enable support for cost model based cgroup IO controller"
depends on BLK_CGROUP=y
select BLK_RQ_IO_DATA_LEN
select BLK_RQ_ALLOC_TIME
---help---
Enabling this option enables the .weight interface for cost
@@ -185,6 +186,23 @@ config BLK_SED_OPAL
Enabling this option enables users to setup/unlock/lock
Locking ranges for SED devices using the Opal protocol.
config BLK_INLINE_ENCRYPTION
bool "Enable inline encryption support in block layer"
help
Build the blk-crypto subsystem. Enabling this lets the
block layer handle encryption, so users can take
advantage of inline encryption hardware if present.
config BLK_INLINE_ENCRYPTION_FALLBACK
bool "Enable crypto API fallback for blk-crypto"
depends on BLK_INLINE_ENCRYPTION
select CRYPTO
select CRYPTO_SKCIPHER
help
Enabling this lets the block layer handle inline encryption
by falling back to the kernel crypto API when inline
encryption hardware is not present.
menu "Partition Types"
source "block/partitions/Kconfig"

View File

@@ -36,3 +36,5 @@ obj-$(CONFIG_BLK_DEBUG_FS) += blk-mq-debugfs.o
obj-$(CONFIG_BLK_DEBUG_FS_ZONED)+= blk-mq-debugfs-zoned.o
obj-$(CONFIG_BLK_SED_OPAL) += sed-opal.o
obj-$(CONFIG_BLK_PM) += blk-pm.o
obj-$(CONFIG_BLK_INLINE_ENCRYPTION) += keyslot-manager.o blk-crypto.o
obj-$(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) += blk-crypto-fallback.o

View File

@@ -6073,7 +6073,7 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
* comments on bfq_init_rq for the reason behind this delayed
* preparation.
*/
static void bfq_prepare_request(struct request *rq, struct bio *bio)
static void bfq_prepare_request(struct request *rq)
{
/*
* Regardless of whether we have an icq attached, we have to

View File

@@ -42,6 +42,9 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
struct bio_set *bs = bio->bi_pool;
unsigned inline_vecs;
if (WARN_ON_ONCE(bio_has_crypt_ctx(bio)))
return ERR_PTR(-EOPNOTSUPP);
if (!bs || !mempool_initialized(&bs->bio_integrity_pool)) {
bip = kmalloc(struct_size(bip, bip_inline_vecs, nr_vecs), gfp_mask);
inline_vecs = nr_vecs;

View File

@@ -18,6 +18,7 @@
#include <linux/blk-cgroup.h>
#include <linux/highmem.h>
#include <linux/sched/sysctl.h>
#include <linux/blk-crypto.h>
#include <trace/events/block.h>
#include "blk.h"
@@ -237,6 +238,8 @@ void bio_uninit(struct bio *bio)
if (bio_integrity(bio))
bio_integrity_free(bio);
bio_crypt_free_ctx(bio);
}
EXPORT_SYMBOL(bio_uninit);
@@ -708,6 +711,8 @@ struct bio *bio_clone_fast(struct bio *bio, gfp_t gfp_mask, struct bio_set *bs)
__bio_clone_fast(b, bio);
bio_crypt_clone(b, bio, gfp_mask);
if (bio_integrity(bio)) {
int ret;
@@ -748,9 +753,14 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
return true;
}
static bool bio_try_merge_pc_page(struct request_queue *q, struct bio *bio,
struct page *page, unsigned len, unsigned offset,
bool *same_page)
/*
* Try to merge a page into a segment, while obeying the hardware segment
* size limit. This is not for normal read/write bios, but for passthrough
* or Zone Append operations that we can't split.
*/
static bool bio_try_merge_hw_seg(struct request_queue *q, struct bio *bio,
struct page *page, unsigned len,
unsigned offset, bool *same_page)
{
struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
unsigned long mask = queue_segment_boundary(q);
@@ -765,38 +775,32 @@ static bool bio_try_merge_pc_page(struct request_queue *q, struct bio *bio,
}
/**
* __bio_add_pc_page - attempt to add page to passthrough bio
* @q: the target queue
* @bio: destination bio
* @page: page to add
* @len: vec entry length
* @offset: vec entry offset
* @same_page: return if the merge happen inside the same page
* bio_add_hw_page - attempt to add a page to a bio with hw constraints
* @q: the target queue
* @bio: destination bio
* @page: page to add
* @len: vec entry length
* @offset: vec entry offset
* @max_sectors: maximum number of sectors that can be added
* @same_page: return if the segment has been merged inside the same page
*
* Attempt to add a page to the bio_vec maplist. This can fail for a
* number of reasons, such as the bio being full or target block device
* limitations. The target block device must allow bio's up to PAGE_SIZE,
* so it is always possible to add a single page to an empty bio.
*
* This should only be used by passthrough bios.
* Add a page to a bio while respecting the hardware max_sectors, max_segment
* and gap limitations.
*/
int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
int bio_add_hw_page(struct request_queue *q, struct bio *bio,
struct page *page, unsigned int len, unsigned int offset,
bool *same_page)
unsigned int max_sectors, bool *same_page)
{
struct bio_vec *bvec;
/*
* cloned bio must not modify vec list
*/
if (unlikely(bio_flagged(bio, BIO_CLONED)))
if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
return 0;
if (((bio->bi_iter.bi_size + len) >> 9) > queue_max_hw_sectors(q))
if (((bio->bi_iter.bi_size + len) >> 9) > max_sectors)
return 0;
if (bio->bi_vcnt > 0) {
if (bio_try_merge_pc_page(q, bio, page, len, offset, same_page))
if (bio_try_merge_hw_seg(q, bio, page, len, offset, same_page))
return len;
/*
@@ -823,11 +827,27 @@ int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
return len;
}
/**
* bio_add_pc_page - attempt to add page to passthrough bio
* @q: the target queue
* @bio: destination bio
* @page: page to add
* @len: vec entry length
* @offset: vec entry offset
*
* Attempt to add a page to the bio_vec maplist. This can fail for a
* number of reasons, such as the bio being full or target block device
* limitations. The target block device must allow bio's up to PAGE_SIZE,
* so it is always possible to add a single page to an empty bio.
*
* This should only be used by passthrough bios.
*/
int bio_add_pc_page(struct request_queue *q, struct bio *bio,
struct page *page, unsigned int len, unsigned int offset)
{
bool same_page = false;
return __bio_add_pc_page(q, bio, page, len, offset, &same_page);
return bio_add_hw_page(q, bio, page, len, offset,
queue_max_hw_sectors(q), &same_page);
}
EXPORT_SYMBOL(bio_add_pc_page);
@@ -936,6 +956,7 @@ void bio_release_pages(struct bio *bio, bool mark_dirty)
put_page(bvec->bv_page);
}
}
EXPORT_SYMBOL_GPL(bio_release_pages);
static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
{
@@ -1010,6 +1031,50 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
return 0;
}
static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
{
unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
struct request_queue *q = bio->bi_disk->queue;
unsigned int max_append_sectors = queue_max_zone_append_sectors(q);
struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
struct page **pages = (struct page **)bv;
ssize_t size, left;
unsigned len, i;
size_t offset;
if (WARN_ON_ONCE(!max_append_sectors))
return 0;
/*
* Move page array up in the allocated memory for the bio vecs as far as
* possible so that we can start filling biovecs from the beginning
* without overwriting the temporary page array.
*/
BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
if (unlikely(size <= 0))
return size ? size : -EFAULT;
for (left = size, i = 0; left > 0; left -= len, i++) {
struct page *page = pages[i];
bool same_page = false;
len = min_t(size_t, PAGE_SIZE - offset, left);
if (bio_add_hw_page(q, bio, page, len, offset,
max_append_sectors, &same_page) != len)
return -EINVAL;
if (same_page)
put_page(page);
offset = 0;
}
iov_iter_advance(iter, size);
return 0;
}
/**
* bio_iov_iter_get_pages - add user or kernel pages to a bio
* @bio: bio to add pages to
@@ -1039,16 +1104,23 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
return -EINVAL;
do {
if (is_bvec)
ret = __bio_iov_bvec_add_pages(bio, iter);
else
ret = __bio_iov_iter_get_pages(bio, iter);
if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
if (WARN_ON_ONCE(is_bvec))
return -EINVAL;
ret = __bio_iov_append_get_pages(bio, iter);
} else {
if (is_bvec)
ret = __bio_iov_bvec_add_pages(bio, iter);
else
ret = __bio_iov_iter_get_pages(bio, iter);
}
} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
if (is_bvec)
bio_set_flag(bio, BIO_NO_PAGE_REF);
return bio->bi_vcnt ? 0 : ret;
}
EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
static void submit_bio_wait_endio(struct bio *bio)
{
@@ -1105,6 +1177,7 @@ void bio_advance(struct bio *bio, unsigned bytes)
if (bio_integrity(bio))
bio_integrity_advance(bio, bytes);
bio_crypt_advance(bio, bytes);
bio_advance_iter(bio, &bio->bi_iter, bytes);
}
EXPORT_SYMBOL(bio_advance);
@@ -1303,55 +1376,6 @@ defer:
schedule_work(&bio_dirty_work);
}
void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
{
unsigned long stamp;
again:
stamp = READ_ONCE(part->stamp);
if (unlikely(stamp != now)) {
if (likely(cmpxchg(&part->stamp, stamp, now) == stamp)) {
__part_stat_add(part, io_ticks, end ? now - stamp : 1);
}
}
if (part->partno) {
part = &part_to_disk(part)->part0;
goto again;
}
}
void generic_start_io_acct(struct request_queue *q, int op,
unsigned long sectors, struct hd_struct *part)
{
const int sgrp = op_stat_group(op);
part_stat_lock();
update_io_ticks(part, jiffies, false);
part_stat_inc(part, ios[sgrp]);
part_stat_add(part, sectors[sgrp], sectors);
part_inc_in_flight(q, part, op_is_write(op));
part_stat_unlock();
}
EXPORT_SYMBOL(generic_start_io_acct);
void generic_end_io_acct(struct request_queue *q, int req_op,
struct hd_struct *part, unsigned long start_time)
{
unsigned long now = jiffies;
unsigned long duration = now - start_time;
const int sgrp = op_stat_group(req_op);
part_stat_lock();
update_io_ticks(part, now, true);
part_stat_add(part, nsecs[sgrp], jiffies_to_nsecs(duration));
part_dec_in_flight(q, part, op_is_write(req_op));
part_stat_unlock();
}
EXPORT_SYMBOL(generic_end_io_acct);
static inline bool bio_remaining_done(struct bio *bio)
{
/*
@@ -1445,6 +1469,10 @@ struct bio *bio_split(struct bio *bio, int sectors,
BUG_ON(sectors <= 0);
BUG_ON(sectors >= bio_sectors(bio));
/* Zone append commands cannot be split */
if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
return NULL;
split = bio_clone_fast(bio, gfp, bs);
if (!split)
return NULL;

View File

@@ -1530,6 +1530,10 @@ static void blkcg_scale_delay(struct blkcg_gq *blkg, u64 now)
{
u64 old = atomic64_read(&blkg->delay_start);
/* negative use_delay means no scaling, see blkcg_set_delay() */
if (atomic_read(&blkg->use_delay) < 0)
return;
/*
* We only want to scale down every second. The idea here is that we
* want to delay people for min(delay_nsec, NSEC_PER_SEC) in a certain
@@ -1717,6 +1721,8 @@ void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay)
*/
void blkcg_add_delay(struct blkcg_gq *blkg, u64 now, u64 delta)
{
if (WARN_ON_ONCE(atomic_read(&blkg->use_delay) < 0))
return;
blkcg_scale_delay(blkg, now);
atomic64_add(delta, &blkg->delay_nsec);
}

File diff suppressed because it is too large Load Diff

657
block/blk-crypto-fallback.c Normal file

File diff suppressed because it is too large Load Diff

201
block/blk-crypto-internal.h Normal file
View File

@@ -0,0 +1,201 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright 2019 Google LLC
*/
#ifndef __LINUX_BLK_CRYPTO_INTERNAL_H
#define __LINUX_BLK_CRYPTO_INTERNAL_H
#include <linux/bio.h>
#include <linux/blkdev.h>
/* Represents a crypto mode supported by blk-crypto */
struct blk_crypto_mode {
const char *cipher_str; /* crypto API name (for fallback case) */
unsigned int keysize; /* key size in bytes */
unsigned int ivsize; /* iv size in bytes */
};
extern const struct blk_crypto_mode blk_crypto_modes[];
#ifdef CONFIG_BLK_INLINE_ENCRYPTION
void bio_crypt_dun_increment(u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE],
unsigned int inc);
bool bio_crypt_rq_ctx_compatible(struct request *rq, struct bio *bio);
bool bio_crypt_ctx_mergeable(struct bio_crypt_ctx *bc1, unsigned int bc1_bytes,
struct bio_crypt_ctx *bc2);
static inline bool bio_crypt_ctx_back_mergeable(struct request *req,
struct bio *bio)
{
return bio_crypt_ctx_mergeable(req->crypt_ctx, blk_rq_bytes(req),
bio->bi_crypt_context);
}
static inline bool bio_crypt_ctx_front_mergeable(struct request *req,
struct bio *bio)
{
return bio_crypt_ctx_mergeable(bio->bi_crypt_context,
bio->bi_iter.bi_size, req->crypt_ctx);
}
static inline bool bio_crypt_ctx_merge_rq(struct request *req,
struct request *next)
{
return bio_crypt_ctx_mergeable(req->crypt_ctx, blk_rq_bytes(req),
next->crypt_ctx);
}
static inline void blk_crypto_rq_set_defaults(struct request *rq)
{
rq->crypt_ctx = NULL;
rq->crypt_keyslot = NULL;
}
static inline bool blk_crypto_rq_is_encrypted(struct request *rq)
{
return rq->crypt_ctx;
}
#else /* CONFIG_BLK_INLINE_ENCRYPTION */
static inline bool bio_crypt_rq_ctx_compatible(struct request *rq,
struct bio *bio)
{
return true;
}
static inline bool bio_crypt_ctx_front_mergeable(struct request *req,
struct bio *bio)
{
return true;
}
static inline bool bio_crypt_ctx_back_mergeable(struct request *req,
struct bio *bio)
{
return true;
}
static inline bool bio_crypt_ctx_merge_rq(struct request *req,
struct request *next)
{
return true;
}
static inline void blk_crypto_rq_set_defaults(struct request *rq) { }
static inline bool blk_crypto_rq_is_encrypted(struct request *rq)
{
return false;
}
#endif /* CONFIG_BLK_INLINE_ENCRYPTION */
void __bio_crypt_advance(struct bio *bio, unsigned int bytes);
static inline void bio_crypt_advance(struct bio *bio, unsigned int bytes)
{
if (bio_has_crypt_ctx(bio))
__bio_crypt_advance(bio, bytes);
}
void __bio_crypt_free_ctx(struct bio *bio);
static inline void bio_crypt_free_ctx(struct bio *bio)
{
if (bio_has_crypt_ctx(bio))
__bio_crypt_free_ctx(bio);
}
static inline void bio_crypt_do_front_merge(struct request *rq,
struct bio *bio)
{
#ifdef CONFIG_BLK_INLINE_ENCRYPTION
if (bio_has_crypt_ctx(bio))
memcpy(rq->crypt_ctx->bc_dun, bio->bi_crypt_context->bc_dun,
sizeof(rq->crypt_ctx->bc_dun));
#endif
}
bool __blk_crypto_bio_prep(struct bio **bio_ptr);
static inline bool blk_crypto_bio_prep(struct bio **bio_ptr)
{
if (bio_has_crypt_ctx(*bio_ptr))
return __blk_crypto_bio_prep(bio_ptr);
return true;
}
blk_status_t __blk_crypto_init_request(struct request *rq);
static inline blk_status_t blk_crypto_init_request(struct request *rq)
{
if (blk_crypto_rq_is_encrypted(rq))
return __blk_crypto_init_request(rq);
return BLK_STS_OK;
}
void __blk_crypto_free_request(struct request *rq);
static inline void blk_crypto_free_request(struct request *rq)
{
if (blk_crypto_rq_is_encrypted(rq))
__blk_crypto_free_request(rq);
}
void __blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
gfp_t gfp_mask);
static inline void blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
gfp_t gfp_mask)
{
if (bio_has_crypt_ctx(bio))
__blk_crypto_rq_bio_prep(rq, bio, gfp_mask);
}
/**
* blk_crypto_insert_cloned_request - Prepare a cloned request to be inserted
* into a request queue.
* @rq: the request being queued
*
* Return: BLK_STS_OK on success, nonzero on error.
*/
static inline blk_status_t blk_crypto_insert_cloned_request(struct request *rq)
{
if (blk_crypto_rq_is_encrypted(rq))
return blk_crypto_init_request(rq);
return BLK_STS_OK;
}
#ifdef CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK
int blk_crypto_fallback_start_using_mode(enum blk_crypto_mode_num mode_num);
bool blk_crypto_fallback_bio_prep(struct bio **bio_ptr);
int blk_crypto_fallback_evict_key(const struct blk_crypto_key *key);
#else /* CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK */
static inline int
blk_crypto_fallback_start_using_mode(enum blk_crypto_mode_num mode_num)
{
pr_warn_once("crypto API fallback is disabled\n");
return -ENOPKG;
}
static inline bool blk_crypto_fallback_bio_prep(struct bio **bio_ptr)
{
pr_warn_once("crypto API fallback disabled; failing request.\n");
(*bio_ptr)->bi_status = BLK_STS_NOTSUPP;
return false;
}
static inline int
blk_crypto_fallback_evict_key(const struct blk_crypto_key *key)
{
return 0;
}
#endif /* CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK */
#endif /* __LINUX_BLK_CRYPTO_INTERNAL_H */

404
block/blk-crypto.c Normal file
View File

@@ -0,0 +1,404 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright 2019 Google LLC
*/
/*
* Refer to Documentation/block/inline-encryption.rst for detailed explanation.
*/
#define pr_fmt(fmt) "blk-crypto: " fmt
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/keyslot-manager.h>
#include <linux/module.h>
#include <linux/slab.h>
#include "blk-crypto-internal.h"
const struct blk_crypto_mode blk_crypto_modes[] = {
[BLK_ENCRYPTION_MODE_AES_256_XTS] = {
.cipher_str = "xts(aes)",
.keysize = 64,
.ivsize = 16,
},
[BLK_ENCRYPTION_MODE_AES_128_CBC_ESSIV] = {
.cipher_str = "essiv(cbc(aes),sha256)",
.keysize = 16,
.ivsize = 16,
},
[BLK_ENCRYPTION_MODE_ADIANTUM] = {
.cipher_str = "adiantum(xchacha12,aes)",
.keysize = 32,
.ivsize = 32,
},
};
/*
* This number needs to be at least (the number of threads doing IO
* concurrently) * (maximum recursive depth of a bio), so that we don't
* deadlock on crypt_ctx allocations. The default is chosen to be the same
* as the default number of post read contexts in both EXT4 and F2FS.
*/
static int num_prealloc_crypt_ctxs = 128;
module_param(num_prealloc_crypt_ctxs, int, 0444);
MODULE_PARM_DESC(num_prealloc_crypt_ctxs,
"Number of bio crypto contexts to preallocate");
static struct kmem_cache *bio_crypt_ctx_cache;
static mempool_t *bio_crypt_ctx_pool;
static int __init bio_crypt_ctx_init(void)
{
size_t i;
bio_crypt_ctx_cache = KMEM_CACHE(bio_crypt_ctx, 0);
if (!bio_crypt_ctx_cache)
goto out_no_mem;
bio_crypt_ctx_pool = mempool_create_slab_pool(num_prealloc_crypt_ctxs,
bio_crypt_ctx_cache);
if (!bio_crypt_ctx_pool)
goto out_no_mem;
/* This is assumed in various places. */
BUILD_BUG_ON(BLK_ENCRYPTION_MODE_INVALID != 0);
/* Sanity check that no algorithm exceeds the defined limits. */
for (i = 0; i < BLK_ENCRYPTION_MODE_MAX; i++) {
BUG_ON(blk_crypto_modes[i].keysize > BLK_CRYPTO_MAX_KEY_SIZE);
BUG_ON(blk_crypto_modes[i].ivsize > BLK_CRYPTO_MAX_IV_SIZE);
}
return 0;
out_no_mem:
panic("Failed to allocate mem for bio crypt ctxs\n");
}
subsys_initcall(bio_crypt_ctx_init);
void bio_crypt_set_ctx(struct bio *bio, const struct blk_crypto_key *key,
const u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE], gfp_t gfp_mask)
{
struct bio_crypt_ctx *bc = mempool_alloc(bio_crypt_ctx_pool, gfp_mask);
bc->bc_key = key;
memcpy(bc->bc_dun, dun, sizeof(bc->bc_dun));
bio->bi_crypt_context = bc;
}
void __bio_crypt_free_ctx(struct bio *bio)
{
mempool_free(bio->bi_crypt_context, bio_crypt_ctx_pool);
bio->bi_crypt_context = NULL;
}
void __bio_crypt_clone(struct bio *dst, struct bio *src, gfp_t gfp_mask)
{
dst->bi_crypt_context = mempool_alloc(bio_crypt_ctx_pool, gfp_mask);
*dst->bi_crypt_context = *src->bi_crypt_context;
}
EXPORT_SYMBOL_GPL(__bio_crypt_clone);
/* Increments @dun by @inc, treating @dun as a multi-limb integer. */
void bio_crypt_dun_increment(u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE],
unsigned int inc)
{
int i;
for (i = 0; inc && i < BLK_CRYPTO_DUN_ARRAY_SIZE; i++) {
dun[i] += inc;
/*
* If the addition in this limb overflowed, then we need to
* carry 1 into the next limb. Else the carry is 0.
*/
if (dun[i] < inc)
inc = 1;
else
inc = 0;
}
}
void __bio_crypt_advance(struct bio *bio, unsigned int bytes)
{
struct bio_crypt_ctx *bc = bio->bi_crypt_context;
bio_crypt_dun_increment(bc->bc_dun,
bytes >> bc->bc_key->data_unit_size_bits);
}
/*
* Returns true if @bc->bc_dun plus @bytes converted to data units is equal to
* @next_dun, treating the DUNs as multi-limb integers.
*/
bool bio_crypt_dun_is_contiguous(const struct bio_crypt_ctx *bc,
unsigned int bytes,
const u64 next_dun[BLK_CRYPTO_DUN_ARRAY_SIZE])
{
int i;
unsigned int carry = bytes >> bc->bc_key->data_unit_size_bits;
for (i = 0; i < BLK_CRYPTO_DUN_ARRAY_SIZE; i++) {
if (bc->bc_dun[i] + carry != next_dun[i])
return false;
/*
* If the addition in this limb overflowed, then we need to
* carry 1 into the next limb. Else the carry is 0.
*/
if ((bc->bc_dun[i] + carry) < carry)
carry = 1;
else
carry = 0;
}
/* If the DUN wrapped through 0, don't treat it as contiguous. */
return carry == 0;
}
/*
* Checks that two bio crypt contexts are compatible - i.e. that
* they are mergeable except for data_unit_num continuity.
*/
static bool bio_crypt_ctx_compatible(struct bio_crypt_ctx *bc1,
struct bio_crypt_ctx *bc2)
{
if (!bc1)
return !bc2;
return bc2 && bc1->bc_key == bc2->bc_key;
}
bool bio_crypt_rq_ctx_compatible(struct request *rq, struct bio *bio)
{
return bio_crypt_ctx_compatible(rq->crypt_ctx, bio->bi_crypt_context);
}
/*
* Checks that two bio crypt contexts are compatible, and also
* that their data_unit_nums are continuous (and can hence be merged)
* in the order @bc1 followed by @bc2.
*/
bool bio_crypt_ctx_mergeable(struct bio_crypt_ctx *bc1, unsigned int bc1_bytes,
struct bio_crypt_ctx *bc2)
{
if (!bio_crypt_ctx_compatible(bc1, bc2))
return false;
return !bc1 || bio_crypt_dun_is_contiguous(bc1, bc1_bytes, bc2->bc_dun);
}
/* Check that all I/O segments are data unit aligned. */
static bool bio_crypt_check_alignment(struct bio *bio)
{
const unsigned int data_unit_size =
bio->bi_crypt_context->bc_key->crypto_cfg.data_unit_size;
struct bvec_iter iter;
struct bio_vec bv;
bio_for_each_segment(bv, bio, iter) {
if (!IS_ALIGNED(bv.bv_len | bv.bv_offset, data_unit_size))
return false;
}
return true;
}
blk_status_t __blk_crypto_init_request(struct request *rq)
{
return blk_ksm_get_slot_for_key(rq->q->ksm, rq->crypt_ctx->bc_key,
&rq->crypt_keyslot);
}
/**
* __blk_crypto_free_request - Uninitialize the crypto fields of a request.
*
* @rq: The request whose crypto fields to uninitialize.
*
* Completely uninitializes the crypto fields of a request. If a keyslot has
* been programmed into some inline encryption hardware, that keyslot is
* released. The rq->crypt_ctx is also freed.
*/
void __blk_crypto_free_request(struct request *rq)
{
blk_ksm_put_slot(rq->crypt_keyslot);
mempool_free(rq->crypt_ctx, bio_crypt_ctx_pool);
blk_crypto_rq_set_defaults(rq);
}
/**
* __blk_crypto_bio_prep - Prepare bio for inline encryption
*
* @bio_ptr: pointer to original bio pointer
*
* If the bio crypt context provided for the bio is supported by the underlying
* device's inline encryption hardware, do nothing.
*
* Otherwise, try to perform en/decryption for this bio by falling back to the
* kernel crypto API. When the crypto API fallback is used for encryption,
* blk-crypto may choose to split the bio into 2 - the first one that will
* continue to be processed and the second one that will be resubmitted via
* generic_make_request. A bounce bio will be allocated to encrypt the contents
* of the aforementioned "first one", and *bio_ptr will be updated to this
* bounce bio.
*
* Caller must ensure bio has bio_crypt_ctx.
*
* Return: true on success; false on error (and bio->bi_status will be set
* appropriately, and bio_endio() will have been called so bio
* submission should abort).
*/
bool __blk_crypto_bio_prep(struct bio **bio_ptr)
{
struct bio *bio = *bio_ptr;
const struct blk_crypto_key *bc_key = bio->bi_crypt_context->bc_key;
/* Error if bio has no data. */
if (WARN_ON_ONCE(!bio_has_data(bio))) {
bio->bi_status = BLK_STS_IOERR;
goto fail;
}
if (!bio_crypt_check_alignment(bio)) {
bio->bi_status = BLK_STS_IOERR;
goto fail;
}
/*
* Success if device supports the encryption context, or if we succeeded
* in falling back to the crypto API.
*/
if (blk_ksm_crypto_cfg_supported(bio->bi_disk->queue->ksm,
&bc_key->crypto_cfg))
return true;
if (blk_crypto_fallback_bio_prep(bio_ptr))
return true;
fail:
bio_endio(*bio_ptr);
return false;
}
/**
* __blk_crypto_rq_bio_prep - Prepare a request's crypt_ctx when its first bio
* is inserted
*
* @rq: The request to prepare
* @bio: The first bio being inserted into the request
* @gfp_mask: gfp mask
*/
void __blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
gfp_t gfp_mask)
{
if (!rq->crypt_ctx)
rq->crypt_ctx = mempool_alloc(bio_crypt_ctx_pool, gfp_mask);
*rq->crypt_ctx = *bio->bi_crypt_context;
}
/**
* blk_crypto_init_key() - Prepare a key for use with blk-crypto
* @blk_key: Pointer to the blk_crypto_key to initialize.
* @raw_key: Pointer to the raw key. Must be the correct length for the chosen
* @crypto_mode; see blk_crypto_modes[].
* @crypto_mode: identifier for the encryption algorithm to use
* @dun_bytes: number of bytes that will be used to specify the DUN when this
* key is used
* @data_unit_size: the data unit size to use for en/decryption
*
* Return: 0 on success, -errno on failure. The caller is responsible for
* zeroizing both blk_key and raw_key when done with them.
*/
int blk_crypto_init_key(struct blk_crypto_key *blk_key, const u8 *raw_key,
enum blk_crypto_mode_num crypto_mode,
unsigned int dun_bytes,
unsigned int data_unit_size)
{
const struct blk_crypto_mode *mode;
memset(blk_key, 0, sizeof(*blk_key));
if (crypto_mode >= ARRAY_SIZE(blk_crypto_modes))
return -EINVAL;
mode = &blk_crypto_modes[crypto_mode];
if (mode->keysize == 0)
return -EINVAL;
if (dun_bytes == 0 || dun_bytes > BLK_CRYPTO_MAX_IV_SIZE)
return -EINVAL;
if (!is_power_of_2(data_unit_size))
return -EINVAL;
blk_key->crypto_cfg.crypto_mode = crypto_mode;
blk_key->crypto_cfg.dun_bytes = dun_bytes;
blk_key->crypto_cfg.data_unit_size = data_unit_size;
blk_key->data_unit_size_bits = ilog2(data_unit_size);
blk_key->size = mode->keysize;
memcpy(blk_key->raw, raw_key, mode->keysize);
return 0;
}
/*
* Check if bios with @cfg can be en/decrypted by blk-crypto (i.e. either the
* request queue it's submitted to supports inline crypto, or the
* blk-crypto-fallback is enabled and supports the cfg).
*/
bool blk_crypto_config_supported(struct request_queue *q,
const struct blk_crypto_config *cfg)
{
return IS_ENABLED(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) ||
blk_ksm_crypto_cfg_supported(q->ksm, cfg);
}
/**
* blk_crypto_start_using_key() - Start using a blk_crypto_key on a device
* @key: A key to use on the device
* @q: the request queue for the device
*
* Upper layers must call this function to ensure that either the hardware
* supports the key's crypto settings, or the crypto API fallback has transforms
* for the needed mode allocated and ready to go. This function may allocate
* an skcipher, and *should not* be called from the data path, since that might
* cause a deadlock
*
* Return: 0 on success; -ENOPKG if the hardware doesn't support the key and
* blk-crypto-fallback is either disabled or the needed algorithm
* is disabled in the crypto API; or another -errno code.
*/
int blk_crypto_start_using_key(const struct blk_crypto_key *key,
struct request_queue *q)
{
if (blk_ksm_crypto_cfg_supported(q->ksm, &key->crypto_cfg))
return 0;
return blk_crypto_fallback_start_using_mode(key->crypto_cfg.crypto_mode);
}
/**
* blk_crypto_evict_key() - Evict a key from any inline encryption hardware
* it may have been programmed into
* @q: The request queue who's associated inline encryption hardware this key
* might have been programmed into
* @key: The key to evict
*
* Upper layers (filesystems) must call this function to ensure that a key is
* evicted from any hardware that it might have been programmed into. The key
* must not be in use by any in-flight IO when this function is called.
*
* Return: 0 on success or if key is not present in the q's ksm, -err on error.
*/
int blk_crypto_evict_key(struct request_queue *q,
const struct blk_crypto_key *key)
{
if (blk_ksm_crypto_cfg_supported(q->ksm, &key->crypto_cfg))
return blk_ksm_evict_key(q->ksm, key);
/*
* If the request queue's associated inline encryption hardware didn't
* have support for the key, then the key might have been programmed
* into the fallback keyslot manager, so try to evict from there.
*/
return blk_crypto_fallback_evict_key(key);
}

View File

@@ -55,7 +55,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
rq->rq_disk = bd_disk;
rq->end_io = done;
blk_account_io_start(rq, true);
blk_account_io_start(rq);
/*
* don't check dying flag for MQ because the request won't

View File

@@ -258,7 +258,6 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
blk_flush_complete_seq(rq, fq, seq, error);
}
fq->flush_queue_delayed = 0;
spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
}
@@ -433,41 +432,20 @@ void blk_insert_flush(struct request *rq)
* blkdev_issue_flush - queue a flush
* @bdev: blockdev to issue flush for
* @gfp_mask: memory allocation flags (for bio_alloc)
* @error_sector: error sector
*
* Description:
* Issue a flush for the block device in question. Caller can supply
* room for storing the error offset in case of a flush error, if they
* wish to.
* Issue a flush for the block device in question.
*/
int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
sector_t *error_sector)
int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask)
{
struct request_queue *q;
struct bio *bio;
int ret = 0;
if (bdev->bd_disk == NULL)
return -ENXIO;
q = bdev_get_queue(bdev);
if (!q)
return -ENXIO;
bio = bio_alloc(gfp_mask, 0);
bio_set_dev(bio, bdev);
bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
ret = submit_bio_wait(bio);
/*
* The driver must store the error location in ->bi_sector, if
* it supports it. For non-stacked drivers, this should be
* copied from blk_rq_pos(rq).
*/
if (error_sector)
*error_sector = bio->bi_iter.bi_sector;
bio_put(bio);
return ret;
}

View File

@@ -409,6 +409,13 @@ void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
bi->tag_size = template->tag_size;
disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
#ifdef CONFIG_BLK_INLINE_ENCRYPTION
if (disk->queue->ksm) {
pr_warn("blk-integrity: Integrity and hardware inline encryption are not supported together. Disabling hardware inline encryption.\n");
blk_ksm_unregister(disk->queue);
}
#endif
}
EXPORT_SYMBOL(blk_integrity_register);

View File

@@ -260,6 +260,7 @@ enum {
VTIME_PER_SEC_SHIFT = 37,
VTIME_PER_SEC = 1LLU << VTIME_PER_SEC_SHIFT,
VTIME_PER_USEC = VTIME_PER_SEC / USEC_PER_SEC,
VTIME_PER_NSEC = VTIME_PER_SEC / NSEC_PER_SEC,
/* bound vrate adjustments within two orders of magnitude */
VRATE_MIN_PPM = 10000, /* 1% */
@@ -1206,14 +1207,14 @@ static enum hrtimer_restart iocg_waitq_timer_fn(struct hrtimer *timer)
return HRTIMER_NORESTART;
}
static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now, u64 cost)
static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now)
{
struct ioc *ioc = iocg->ioc;
struct blkcg_gq *blkg = iocg_to_blkg(iocg);
u64 vtime = atomic64_read(&iocg->vtime);
u64 vmargin = ioc->margin_us * now->vrate;
u64 margin_ns = ioc->margin_us * NSEC_PER_USEC;
u64 expires, oexpires;
u64 delta_ns, expires, oexpires;
u32 hw_inuse;
lockdep_assert_held(&iocg->waitq.lock);
@@ -1236,15 +1237,10 @@ static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now, u64 cost)
return false;
/* use delay */
if (cost) {
u64 cost_ns = DIV64_U64_ROUND_UP(cost * NSEC_PER_USEC,
now->vrate);
blkcg_add_delay(blkg, now->now_ns, cost_ns);
}
blkcg_use_delay(blkg);
expires = now->now_ns + DIV64_U64_ROUND_UP(vtime - now->vnow,
now->vrate) * NSEC_PER_USEC;
delta_ns = DIV64_U64_ROUND_UP(vtime - now->vnow,
now->vrate) * NSEC_PER_USEC;
blkcg_set_delay(blkg, delta_ns);
expires = now->now_ns + delta_ns;
/* if already active and close enough, don't bother */
oexpires = ktime_to_ns(hrtimer_get_softexpires(&iocg->delay_timer));
@@ -1265,7 +1261,7 @@ static enum hrtimer_restart iocg_delay_timer_fn(struct hrtimer *timer)
spin_lock_irqsave(&iocg->waitq.lock, flags);
ioc_now(iocg->ioc, &now);
iocg_kick_delay(iocg, &now, 0);
iocg_kick_delay(iocg, &now);
spin_unlock_irqrestore(&iocg->waitq.lock, flags);
return HRTIMER_NORESTART;
@@ -1383,7 +1379,7 @@ static void ioc_timer_fn(struct timer_list *timer)
if (waitqueue_active(&iocg->waitq) || iocg->abs_vdebt) {
/* might be oversleeping vtime / hweight changes, kick */
iocg_kick_waitq(iocg, &now);
iocg_kick_delay(iocg, &now, 0);
iocg_kick_delay(iocg, &now);
} else if (iocg_is_idle(iocg)) {
/* no waiter and idle, deactivate */
iocg->last_inuse = iocg->inuse;
@@ -1543,19 +1539,39 @@ skip_surplus_transfers:
if (rq_wait_pct > RQ_WAIT_BUSY_PCT ||
missed_ppm[READ] > ppm_rthr ||
missed_ppm[WRITE] > ppm_wthr) {
/* clearly missing QoS targets, slow down vrate */
ioc->busy_level = max(ioc->busy_level, 0);
ioc->busy_level++;
} else if (rq_wait_pct <= RQ_WAIT_BUSY_PCT * UNBUSY_THR_PCT / 100 &&
missed_ppm[READ] <= ppm_rthr * UNBUSY_THR_PCT / 100 &&
missed_ppm[WRITE] <= ppm_wthr * UNBUSY_THR_PCT / 100) {
/* take action iff there is contention */
if (nr_shortages && !nr_lagging) {
/* QoS targets are being met with >25% margin */
if (nr_shortages) {
/*
* We're throttling while the device has spare
* capacity. If vrate was being slowed down, stop.
*/
ioc->busy_level = min(ioc->busy_level, 0);
/* redistribute surpluses first */
if (!nr_surpluses)
/*
* If there are IOs spanning multiple periods, wait
* them out before pushing the device harder. If
* there are surpluses, let redistribution work it
* out first.
*/
if (!nr_lagging && !nr_surpluses)
ioc->busy_level--;
} else {
/*
* Nobody is being throttled and the users aren't
* issuing enough IOs to saturate the device. We
* simply don't know how close the device is to
* saturation. Coast.
*/
ioc->busy_level = 0;
}
} else {
/* inside the hysterisis margin, we're good */
ioc->busy_level = 0;
}
@@ -1678,6 +1694,31 @@ static u64 calc_vtime_cost(struct bio *bio, struct ioc_gq *iocg, bool is_merge)
return cost;
}
static void calc_size_vtime_cost_builtin(struct request *rq, struct ioc *ioc,
u64 *costp)
{
unsigned int pages = blk_rq_stats_sectors(rq) >> IOC_SECT_TO_PAGE_SHIFT;
switch (req_op(rq)) {
case REQ_OP_READ:
*costp = pages * ioc->params.lcoefs[LCOEF_RPAGE];
break;
case REQ_OP_WRITE:
*costp = pages * ioc->params.lcoefs[LCOEF_WPAGE];
break;
default:
*costp = 0;
}
}
static u64 calc_size_vtime_cost(struct request *rq, struct ioc *ioc)
{
u64 cost;
calc_size_vtime_cost_builtin(rq, ioc, &cost);
return cost;
}
static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
{
struct blkcg_gq *blkg = bio->bi_blkg;
@@ -1762,7 +1803,7 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
*/
if (bio_issue_as_root_blkg(bio) || fatal_signal_pending(current)) {
iocg->abs_vdebt += abs_cost;
if (iocg_kick_delay(iocg, &now, cost))
if (iocg_kick_delay(iocg, &now))
blkcg_schedule_throttle(rqos->q,
(bio->bi_opf & REQ_SWAP) == REQ_SWAP);
spin_unlock_irq(&iocg->waitq.lock);
@@ -1850,7 +1891,7 @@ static void ioc_rqos_merge(struct rq_qos *rqos, struct request *rq,
spin_lock_irqsave(&iocg->waitq.lock, flags);
if (likely(!list_empty(&iocg->active_list))) {
iocg->abs_vdebt += abs_cost;
iocg_kick_delay(iocg, &now, cost);
iocg_kick_delay(iocg, &now);
} else {
iocg_commit_bio(iocg, bio, cost);
}
@@ -1868,7 +1909,7 @@ static void ioc_rqos_done_bio(struct rq_qos *rqos, struct bio *bio)
static void ioc_rqos_done(struct rq_qos *rqos, struct request *rq)
{
struct ioc *ioc = rqos_to_ioc(rqos);
u64 on_q_ns, rq_wait_ns;
u64 on_q_ns, rq_wait_ns, size_nsec;
int pidx, rw;
if (!ioc->enabled || !rq->alloc_time_ns || !rq->start_time_ns)
@@ -1889,8 +1930,10 @@ static void ioc_rqos_done(struct rq_qos *rqos, struct request *rq)
on_q_ns = ktime_get_ns() - rq->alloc_time_ns;
rq_wait_ns = rq->start_time_ns - rq->alloc_time_ns;
size_nsec = div64_u64(calc_size_vtime_cost(rq, ioc), VTIME_PER_NSEC);
if (on_q_ns <= ioc->params.qos[pidx] * NSEC_PER_USEC)
if (on_q_ns <= size_nsec ||
on_q_ns - size_nsec <= ioc->params.qos[pidx] * NSEC_PER_USEC)
this_cpu_inc(ioc->pcpu_stat->missed[rw].nr_met);
else
this_cpu_inc(ioc->pcpu_stat->missed[rw].nr_missed);
@@ -2297,6 +2340,7 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
spin_lock_irq(&ioc->lock);
if (enable) {
blk_stat_enable_accounting(ioc->rqos.q);
blk_queue_flag_set(QUEUE_FLAG_RQ_ALLOC_TIME, ioc->rqos.q);
ioc->enabled = true;
} else {

View File

@@ -257,6 +257,7 @@ out_bmd:
static struct bio *bio_map_user_iov(struct request_queue *q,
struct iov_iter *iter, gfp_t gfp_mask)
{
unsigned int max_sectors = queue_max_hw_sectors(q);
int j;
struct bio *bio;
int ret;
@@ -294,8 +295,8 @@ static struct bio *bio_map_user_iov(struct request_queue *q,
if (n > bytes)
n = bytes;
if (!__bio_add_pc_page(q, bio, page, n, offs,
&same_page)) {
if (!bio_add_hw_page(q, bio, page, n, offs,
max_sectors, &same_page)) {
if (same_page)
put_page(page);
break;
@@ -549,6 +550,7 @@ int blk_rq_append_bio(struct request *rq, struct bio **bio)
rq->biotail->bi_next = *bio;
rq->biotail = *bio;
rq->__data_len += (*bio)->bi_iter.bi_size;
bio_crypt_free_ctx(*bio);
}
return 0;
@@ -654,8 +656,6 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
bio = rq->bio;
} while (iov_iter_count(&i));
if (!bio_flagged(bio, BIO_USER_MAPPED))
rq->rq_flags |= RQF_COPY_USER;
return 0;
unmap_rq:
@@ -731,7 +731,6 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
{
int reading = rq_data_dir(rq) == READ;
unsigned long addr = (unsigned long) kbuf;
int do_copy = 0;
struct bio *bio, *orig_bio;
int ret;
@@ -740,8 +739,7 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
if (!len || !kbuf)
return -EINVAL;
do_copy = !blk_rq_aligned(q, addr, len) || object_is_on_stack(kbuf);
if (do_copy)
if (!blk_rq_aligned(q, addr, len) || object_is_on_stack(kbuf))
bio = bio_copy_kern(q, kbuf, len, gfp_mask, reading);
else
bio = bio_map_kern(q, kbuf, len, gfp_mask);
@@ -752,9 +750,6 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
bio->bi_opf &= ~REQ_OP_MASK;
bio->bi_opf |= req_op(rq);
if (do_copy)
rq->rq_flags |= RQF_COPY_USER;
orig_bio = bio;
ret = blk_rq_append_bio(rq, &bio);
if (unlikely(ret)) {

View File

@@ -336,16 +336,6 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,
/* there isn't chance to merge the splitted bio */
split->bi_opf |= REQ_NOMERGE;
/*
* Since we're recursing into make_request here, ensure
* that we mark this bio as already having entered the queue.
* If not, and the queue is going away, we can get stuck
* forever on waiting for the queue reference to drop. But
* that will never happen, as we're already holding a
* reference to it.
*/
bio_set_flag(*bio, BIO_QUEUE_ENTERED);
bio_chain(split, *bio);
trace_block_split(q, split, (*bio)->bi_iter.bi_sector);
generic_make_request(*bio);
@@ -519,44 +509,20 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
* map a request to scatterlist, return number of sg entries setup. Caller
* must make sure sg can hold rq->nr_phys_segments entries
*/
int blk_rq_map_sg(struct request_queue *q, struct request *rq,
struct scatterlist *sglist)
int __blk_rq_map_sg(struct request_queue *q, struct request *rq,
struct scatterlist *sglist, struct scatterlist **last_sg)
{
struct scatterlist *sg = NULL;
int nsegs = 0;
if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
nsegs = __blk_bvec_map_sg(rq->special_vec, sglist, &sg);
nsegs = __blk_bvec_map_sg(rq->special_vec, sglist, last_sg);
else if (rq->bio && bio_op(rq->bio) == REQ_OP_WRITE_SAME)
nsegs = __blk_bvec_map_sg(bio_iovec(rq->bio), sglist, &sg);
nsegs = __blk_bvec_map_sg(bio_iovec(rq->bio), sglist, last_sg);
else if (rq->bio)
nsegs = __blk_bios_map_sg(q, rq->bio, sglist, &sg);
nsegs = __blk_bios_map_sg(q, rq->bio, sglist, last_sg);
if (unlikely(rq->rq_flags & RQF_COPY_USER) &&
(blk_rq_bytes(rq) & q->dma_pad_mask)) {
unsigned int pad_len =
(q->dma_pad_mask & ~blk_rq_bytes(rq)) + 1;
sg->length += pad_len;
rq->extra_len += pad_len;
}
if (q->dma_drain_size && q->dma_drain_needed(rq)) {
if (op_is_write(req_op(rq)))
memset(q->dma_drain_buffer, 0, q->dma_drain_size);
sg_unmark_end(sg);
sg = sg_next(sg);
sg_set_page(sg, virt_to_page(q->dma_drain_buffer),
q->dma_drain_size,
((unsigned long)q->dma_drain_buffer) &
(PAGE_SIZE - 1));
nsegs++;
rq->extra_len += q->dma_drain_size;
}
if (sg)
sg_mark_end(sg);
if (*last_sg)
sg_mark_end(*last_sg);
/*
* Something must have been wrong if the figured number of
@@ -566,7 +532,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
return nsegs;
}
EXPORT_SYMBOL(blk_rq_map_sg);
EXPORT_SYMBOL(__blk_rq_map_sg);
static inline int ll_new_hw_segment(struct request *req, struct bio *bio,
unsigned int nr_phys_segs)
@@ -596,6 +562,8 @@ int ll_back_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs)
if (blk_integrity_rq(req) &&
integrity_req_gap_back_merge(req, bio))
return 0;
if (!bio_crypt_ctx_back_mergeable(req, bio))
return 0;
if (blk_rq_sectors(req) + bio_sectors(bio) >
blk_rq_get_max_sectors(req, blk_rq_pos(req))) {
req_set_nomerge(req->q, req);
@@ -612,6 +580,8 @@ int ll_front_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs
if (blk_integrity_rq(req) &&
integrity_req_gap_front_merge(req, bio))
return 0;
if (!bio_crypt_ctx_front_mergeable(req, bio))
return 0;
if (blk_rq_sectors(req) + bio_sectors(bio) >
blk_rq_get_max_sectors(req, bio->bi_iter.bi_sector)) {
req_set_nomerge(req->q, req);
@@ -661,6 +631,9 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
if (blk_integrity_merge_rq(q, req, next) == false)
return 0;
if (!bio_crypt_ctx_merge_rq(req, next))
return 0;
/* Merge is OK... */
req->nr_phys_segments = total_phys_segments;
return 1;
@@ -696,20 +669,17 @@ void blk_rq_set_mixed_merge(struct request *rq)
rq->rq_flags |= RQF_MIXED_MERGE;
}
static void blk_account_io_merge(struct request *req)
static void blk_account_io_merge_request(struct request *req)
{
if (blk_do_io_stat(req)) {
struct hd_struct *part;
part_stat_lock();
part = req->part;
part_dec_in_flight(req->q, part, rq_data_dir(req));
hd_struct_put(part);
part_stat_inc(req->part, merges[op_stat_group(req_op(req))]);
part_stat_unlock();
hd_struct_put(req->part);
}
}
/*
* Two cases of handling DISCARD merge:
* If max_discard_segments > 1, the driver takes every bio
@@ -821,7 +791,7 @@ static struct request *attempt_merge(struct request_queue *q,
/*
* 'next' is going away, so update stats accordingly
*/
blk_account_io_merge(next);
blk_account_io_merge_request(next);
/*
* ownership of bio passed from next to req, return 'next' for
@@ -885,6 +855,10 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
if (blk_integrity_merge_bio(rq->q, rq, bio) == false)
return false;
/* Only merge if the crypt contexts are compatible */
if (!bio_crypt_rq_ctx_compatible(rq, bio))
return false;
/* must be using the same buffer */
if (req_op(rq) == REQ_OP_WRITE_SAME &&
!blk_write_same_mergeable(rq->bio, bio))

View File

@@ -213,6 +213,7 @@ static const char *const hctx_state_name[] = {
HCTX_STATE_NAME(STOPPED),
HCTX_STATE_NAME(TAG_ACTIVE),
HCTX_STATE_NAME(SCHED_RESTART),
HCTX_STATE_NAME(INACTIVE),
};
#undef HCTX_STATE_NAME
@@ -239,6 +240,7 @@ static const char *const hctx_flag_name[] = {
HCTX_FLAG_NAME(TAG_SHARED),
HCTX_FLAG_NAME(BLOCKING),
HCTX_FLAG_NAME(NO_SCHED),
HCTX_FLAG_NAME(STACKING),
};
#undef HCTX_FLAG_NAME
@@ -292,7 +294,6 @@ static const char *const rqf_name[] = {
RQF_NAME(MQ_INFLIGHT),
RQF_NAME(DONTPREP),
RQF_NAME(PREEMPT),
RQF_NAME(COPY_USER),
RQF_NAME(FAILED),
RQF_NAME(QUIET),
RQF_NAME(ELVPRIV),

View File

@@ -80,16 +80,22 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
blk_mq_run_hw_queue(hctx, true);
}
#define BLK_MQ_BUDGET_DELAY 3 /* ms units */
/*
* Only SCSI implements .get_budget and .put_budget, and SCSI restarts
* its queue by itself in its completion handler, so we don't need to
* restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
*
* Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to
* be run again. This is necessary to avoid starving flushes.
*/
static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
struct elevator_queue *e = q->elevator;
LIST_HEAD(rq_list);
int ret = 0;
do {
struct request *rq;
@@ -97,12 +103,25 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
break;
if (!list_empty_careful(&hctx->dispatch)) {
ret = -EAGAIN;
break;
}
if (!blk_mq_get_dispatch_budget(hctx))
break;
rq = e->type->ops.dispatch_request(hctx);
if (!rq) {
blk_mq_put_dispatch_budget(hctx);
/*
* We're releasing without dispatching. Holding the
* budget could have blocked any "hctx"s with the
* same queue and if we didn't dispatch then there's
* no guarantee anyone will kick the queue. Kick it
* ourselves.
*/
blk_mq_delay_run_hw_queues(q, BLK_MQ_BUDGET_DELAY);
break;
}
@@ -113,6 +132,8 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
*/
list_add(&rq->queuelist, &rq_list);
} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
return ret;
}
static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
@@ -130,16 +151,25 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
* Only SCSI implements .get_budget and .put_budget, and SCSI restarts
* its queue by itself in its completion handler, so we don't need to
* restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
*
* Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to
* to be run again. This is necessary to avoid starving flushes.
*/
static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
static int blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
LIST_HEAD(rq_list);
struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
int ret = 0;
do {
struct request *rq;
if (!list_empty_careful(&hctx->dispatch)) {
ret = -EAGAIN;
break;
}
if (!sbitmap_any_bit_set(&hctx->ctx_map))
break;
@@ -149,6 +179,14 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
rq = blk_mq_dequeue_from_ctx(hctx, ctx);
if (!rq) {
blk_mq_put_dispatch_budget(hctx);
/*
* We're releasing without dispatching. Holding the
* budget could have blocked any "hctx"s with the
* same queue and if we didn't dispatch then there's
* no guarantee anyone will kick the queue. Kick it
* ourselves.
*/
blk_mq_delay_run_hw_queues(q, BLK_MQ_BUDGET_DELAY);
break;
}
@@ -165,21 +203,17 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
WRITE_ONCE(hctx->dispatch_from, ctx);
return ret;
}
void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
static int __blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
struct elevator_queue *e = q->elevator;
const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
int ret = 0;
LIST_HEAD(rq_list);
/* RCU or SRCU read lock is needed before checking quiesced flag */
if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
return;
hctx->run++;
/*
* If we have previous entries on our dispatch list, grab them first for
* more fair dispatch.
@@ -208,19 +242,41 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
blk_mq_sched_mark_restart_hctx(hctx);
if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
if (has_sched_dispatch)
blk_mq_do_dispatch_sched(hctx);
ret = blk_mq_do_dispatch_sched(hctx);
else
blk_mq_do_dispatch_ctx(hctx);
ret = blk_mq_do_dispatch_ctx(hctx);
}
} else if (has_sched_dispatch) {
blk_mq_do_dispatch_sched(hctx);
ret = blk_mq_do_dispatch_sched(hctx);
} else if (hctx->dispatch_busy) {
/* dequeue request one by one from sw queue if queue is busy */
blk_mq_do_dispatch_ctx(hctx);
ret = blk_mq_do_dispatch_ctx(hctx);
} else {
blk_mq_flush_busy_ctxs(hctx, &rq_list);
blk_mq_dispatch_rq_list(q, &rq_list, false);
}
return ret;
}
void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
/* RCU or SRCU read lock is needed before checking quiesced flag */
if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
return;
hctx->run++;
/*
* A return of -EAGAIN is an indication that hctx->dispatch is not
* empty and we must run again in order to avoid starving flushes.
*/
if (__blk_mq_sched_dispatch_requests(hctx) == -EAGAIN) {
if (__blk_mq_sched_dispatch_requests(hctx) == -EAGAIN)
blk_mq_run_hw_queue(hctx, true);
}
}
bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,

Some files were not shown because too many files have changed in this diff Show More