Merge tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
 "Core block changes that have been queued up for this release:

   - Remove dead blk-throttle and blk-wbt code (Guoqing)
   - Include pid in blktrace note traces (Jan)
   - Don't spew I/O errors on wouldblock termination (me)
   - Zone append addition (Johannes, Keith, Damien)
   - IO accounting improvements (Konstantin, Christoph)
   - blk-mq hardware map update improvements (Ming)
   - Scheduler dispatch improvement (Salman)
   - Inline block encryption support (Satya)
   - Request map fixes and improvements (Weiping)
   - blk-iocost tweaks (Tejun)
   - Fix for timeout failing with error injection (Keith)
   - Queue re-run fixes (Douglas)
   - CPU hotplug improvements (Christoph)
   - Queue entry/exit improvements (Christoph)
   - Move DMA drain handling to the few drivers that use it (Christoph)
   - Partition handling cleanups (Christoph)"

* tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits)
  block: mark bio_wouldblock_error() bio with BIO_QUIET
  blk-wbt: rename __wbt_update_limits to wbt_update_limits
  blk-wbt: remove wbt_update_limits
  blk-throttle: remove tg_drain_bios
  blk-throttle: remove blk_throtl_drain
  null_blk: force complete for timeout request
  blk-mq: drain I/O when all CPUs in a hctx are offline
  blk-mq: add blk_mq_all_tag_iter
  blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx
  blk-mq: use BLK_MQ_NO_TAG in more places
  blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG
  blk-mq: move more request initialization to blk_mq_rq_ctx_init
  blk-mq: simplify the blk_mq_get_request calling convention
  blk-mq: remove the bio argument to ->prepare_request
  nvme: force complete cancelled requests
  blk-mq: blk-mq: provide forced completion method
  block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds
  block: blk-crypto-fallback: remove redundant initialization of variable err
  block: reduce part_stat_lock() scope
  block: use __this_cpu_add() instead of access by smp_processor_id()
  ...
Documentation/block/index.rst

@@ -14,6 +14,7 @@ Block
   cmdline-partition
   data-integrity
   deadline-iosched
   inline-encryption
   ioprio
   kyber-iosched
   null_blk

263  Documentation/block/inline-encryption.rst  (new file)
@@ -0,0 +1,263 @@
.. SPDX-License-Identifier: GPL-2.0

=================
Inline Encryption
=================

Background
==========

Inline encryption hardware sits logically between memory and the disk, and can
en/decrypt data as it goes in/out of the disk. Inline encryption hardware has a
fixed number of "keyslots" - slots into which encryption contexts (i.e. the
encryption key, encryption algorithm, data unit size) can be programmed by the
kernel at any time. Each request sent to the disk can be tagged with the index
of a keyslot (and also a data unit number to act as an encryption tweak), and
the inline encryption hardware will en/decrypt the data in the request with the
encryption context programmed into that keyslot. This is very different from
full disk encryption solutions like self encrypting drives/TCG OPAL/ATA
Security standards, since with inline encryption, any block on disk could be
encrypted with any encryption context the kernel chooses.


Objective
=========

We want to support inline encryption (IE) in the kernel.
To allow for testing, we also want a crypto API fallback when actual
IE hardware is absent. We also want IE to work with layered devices
like dm and loopback (i.e. we want to be able to use the IE hardware
of the underlying devices if present, or else fall back to crypto API
en/decryption).


Constraints and notes
=====================

- IE hardware has a limited number of "keyslots" that can be programmed
  with an encryption context (key, algorithm, data unit size, etc.) at any time.
  One can specify a keyslot in a data request made to the device, and the
  device will en/decrypt the data using the encryption context programmed into
  that specified keyslot. When possible, we want to make multiple requests with
  the same encryption context share the same keyslot.

- We need a way for upper layers like filesystems to specify an encryption
  context to use for en/decrypting a struct bio, and a device driver (like UFS)
  needs to be able to use that encryption context when it processes the bio.

- We need a way for device drivers to expose their inline encryption
  capabilities in a unified way to the upper layers.


Design
======

We add a :c:type:`struct bio_crypt_ctx` to :c:type:`struct bio` that can
represent an encryption context, because we need to be able to pass this
encryption context from the upper layers (like the fs layer) to the
device driver to act upon.

While IE hardware works on the notion of keyslots, the FS layer has no
knowledge of keyslots - it simply wants to specify an encryption context to
use while en/decrypting a bio.

We introduce a keyslot manager (KSM) that handles the translation from
encryption contexts specified by the FS to keyslots on the IE hardware.
This KSM also serves as the way IE hardware can expose its capabilities to
upper layers. The generic mode of operation is: each device driver that wants
to support IE will construct a KSM and set it up in its struct request_queue.
Upper layers that want to use IE on this device can then use this KSM in
the device's struct request_queue to translate an encryption context into
a keyslot. The presence of the KSM in the request queue shall be used to mean
that the device supports IE.

The KSM uses refcounts to track which keyslots are idle (either they have no
encryption context programmed, or there are no in-flight struct bios
referencing that keyslot). When a new encryption context needs a keyslot, it
tries to find a keyslot that has already been programmed with the same
encryption context, and if there is no such keyslot, it evicts the least
recently used idle keyslot and programs the new encryption context into that
one. If no idle keyslots are available, then the caller will sleep until there
is at least one.
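
To make the keyslot bookkeeping described above concrete, here is a small
illustrative model of the idea written as plain, self-contained C. It is *not*
the keyslot manager code (that lives in ``block/keyslot-manager.c`` and also
has to deal with locking, wait queues and actually programming the hardware);
all names and sizes below are invented for the sketch::

  #include <string.h>

  #define NUM_SLOTS 8

  struct toy_slot {
          char key[64];              /* encryption context currently programmed */
          int refcount;              /* in-flight users of this slot */
          unsigned long idle_since;  /* "time" at which the slot became idle */
  };

  static struct toy_slot slots[NUM_SLOTS];
  static unsigned long toy_clock;

  /* Returns a slot programmed with @key, or -1 if the caller must wait. */
  static int toy_get_slot(const char *key)
  {
          int i, victim = -1;

          /* Prefer a slot that already holds this key, idle or not. */
          for (i = 0; i < NUM_SLOTS; i++) {
                  if (strcmp(slots[i].key, key) == 0) {
                          slots[i].refcount++;
                          return i;
                  }
          }

          /* Otherwise evict the least recently used *idle* slot. */
          for (i = 0; i < NUM_SLOTS; i++) {
                  if (slots[i].refcount == 0 &&
                      (victim < 0 || slots[i].idle_since < slots[victim].idle_since))
                          victim = i;
          }
          if (victim < 0)
                  return -1;         /* no idle slot; the real code sleeps here */

          strncpy(slots[victim].key, key, sizeof(slots[victim].key) - 1);
          slots[victim].refcount = 1;
          return victim;
  }

  static void toy_put_slot(int i)
  {
          if (--slots[i].refcount == 0)
                  slots[i].idle_since = ++toy_clock;   /* slot is idle again */
  }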


blk-mq changes, other block layer changes and blk-crypto-fallback
=================================================================

We add a pointer to a ``bi_crypt_context`` and ``keyslot`` to
:c:type:`struct request`. These will be referred to as the ``crypto fields``
for the request. This ``keyslot`` is the keyslot into which the
``bi_crypt_context`` has been programmed in the KSM of the ``request_queue``
that this request is being sent to.

We introduce ``block/blk-crypto-fallback.c``, which allows upper layers to remain
blissfully unaware of whether or not real inline encryption hardware is present
underneath. When a bio is submitted with a target ``request_queue`` that doesn't
support the encryption context specified with the bio, the block layer will
en/decrypt the bio with the blk-crypto-fallback.

If the bio is a ``WRITE`` bio, a bounce bio is allocated, and the data in the bio
is encrypted and stored in the bounce bio - blk-mq will then proceed to process
the bounce bio as if it were not encrypted at all (except when blk-integrity is
concerned). ``blk-crypto-fallback`` sets the bounce bio's ``bi_end_io`` to an
internal function that cleans up the bounce bio and ends the original bio.

If the bio is a ``READ`` bio, the bio's ``bi_end_io`` (and also ``bi_private``)
is saved and overwritten by ``blk-crypto-fallback`` to
``bio_crypto_fallback_decrypt_bio``. The bio's ``bi_crypt_context`` is also
overwritten with ``NULL``, so that to the rest of the stack, the bio looks
as if it was a regular bio that never had an encryption context specified.
``bio_crypto_fallback_decrypt_bio`` will decrypt the bio, restore the original
``bi_end_io`` (and also ``bi_private``) and end the bio again.

Regardless of whether real inline encryption hardware is used or the
blk-crypto-fallback is used, the ciphertext written to disk (and hence the
on-disk format of data) will be the same (assuming the hardware's implementation
of the algorithm being used adheres to spec and functions correctly).

If a ``request queue``'s inline encryption hardware claimed to support the
encryption context specified with a bio, then it will not be handled by the
``blk-crypto-fallback``. We will eventually reach a point in blk-mq when a
:c:type:`struct request` needs to be allocated for that bio. At that point,
blk-mq tries to program the encryption context into the ``request_queue``'s
keyslot_manager, and obtain a keyslot, which it stores in its newly added
``keyslot`` field. This keyslot is released when the request is completed.

When the first bio is added to a request, ``blk_crypto_rq_bio_prep`` is called,
which sets the request's ``crypt_ctx`` to a copy of the bio's
``bi_crypt_context``. ``bio_crypt_do_front_merge`` is called whenever a
subsequent bio is merged to the front of the request, which updates the
``crypt_ctx`` of the request so that it matches the newly merged bio's
``bi_crypt_context``. In particular, the request keeps a copy of the
``bi_crypt_context`` of the first bio in its bio-list (blk-mq needs to be
careful to maintain this invariant during bio and request merges).

To make it possible for inline encryption to work with request queue based
layered devices, when a request is cloned, its ``crypto fields`` are cloned as
well. When the cloned request is submitted, blk-mq programs the
``bi_crypt_context`` of the request into the clone's request_queue's keyslot
manager, and stores the returned keyslot in the clone's ``keyslot``.


API presented to users of the block layer
=========================================

``struct blk_crypto_key`` represents a crypto key (the raw key, size of the
key, the crypto algorithm to use, the data unit size to use, and the number of
bytes required to represent data unit numbers that will be specified with the
``bi_crypt_context``).

``blk_crypto_init_key`` allows upper layers to initialize such a
``blk_crypto_key``.

``bio_crypt_set_ctx`` should be called on any bio that a user of
the block layer wants en/decrypted via inline encryption (or the
blk-crypto-fallback, if hardware support isn't available for the desired
crypto configuration). This function takes the ``blk_crypto_key`` and the
data unit number (DUN) to use when en/decrypting the bio.
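
As a rough illustration, an upper layer (e.g. a filesystem) might use these two
functions as sketched below. This is only an outline: ``ci`` stands for some
long-lived per-inode object invented for the example, ``raw_key`` and
``first_data_unit_index`` are placeholders, and the choice of AES-256-XTS,
8-byte DUNs, 4096-byte data units and ``GFP_NOIO`` is illustrative rather than
required::

  #include <linux/blk-crypto.h>

  u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE] = { first_data_unit_index };
  int err;

  /*
   * ci->key is a struct blk_crypto_key embedded in a long-lived object; it
   * must stay valid (and not be reused) until all bios tagged with it have
   * completed and blk_crypto_evict_key() has been called.
   */
  err = blk_crypto_init_key(&ci->key, raw_key,
                            BLK_ENCRYPTION_MODE_AES_256_XTS,
                            8,             /* dun_bytes */
                            4096);         /* data_unit_size */
  if (err)
          return err;

  /*
   * Tag a bio so that blk-crypto (inline encryption hardware or the
   * fallback) will en/decrypt it with ci->key, starting at data unit
   * number dun.  Only valid once blk_crypto_start_using_key() (described
   * below) has succeeded for the target request_queue.
   */
  bio_crypt_set_ctx(bio, &ci->key, dun, GFP_NOIO);
  submit_bio(bio);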

``blk_crypto_config_supported`` allows upper layers to query whether or not an
encryption context passed to a request queue can be handled by blk-crypto
(either by real inline encryption hardware, or by the blk-crypto-fallback).
This is useful e.g. when blk-crypto-fallback is disabled, and the upper layer
wants to use an algorithm that may not be supported by hardware - this function
lets the upper layer know ahead of time that the algorithm isn't supported,
and the upper layer can fall back to something else if appropriate.

``blk_crypto_start_using_key`` - Upper layers must call this function on a
``blk_crypto_key`` and a ``request_queue`` before using the key with any bio
headed for that ``request_queue``. This function ensures that either the
hardware supports the key's crypto settings, or the crypto API fallback has
transforms for the needed mode allocated and ready to go. Note that this
function may allocate an ``skcipher``, and must not be called from the data
path, since allocating ``skciphers`` from the data path can deadlock.

``blk_crypto_evict_key`` *must* be called by upper layers before a
``blk_crypto_key`` is freed. Further, it *must* only be called once
there are no more in-flight requests that use that ``blk_crypto_key``.
``blk_crypto_evict_key`` will ensure that a key is removed from any keyslots in
inline encryption hardware that the key might have been programmed into (or the
blk-crypto-fallback).
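
Tying the pieces above together, the expected calling pattern for an upper
layer is roughly the following sketch (error handling shortened, ``q`` being
the ``request_queue`` of the device the bios will be sent to; the exact
structure of the caller is of course up to the user of the API)::

  /* Optional: ask ahead of time whether blk-crypto can handle this config. */
  if (!blk_crypto_config_supported(q, &key->crypto_cfg))
          return -EOPNOTSUPP;         /* fall back to some other mechanism */

  /* Before the first bio using the key is submitted to q. */
  err = blk_crypto_start_using_key(key, q);
  if (err)
          return err;

  /* ... submit bios tagged with bio_crypt_set_ctx(bio, key, dun, ...) ... */

  /*
   * After the last in-flight request using the key has completed, and
   * before the key is zeroized and freed.
   */
  err = blk_crypto_evict_key(q, key);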

API presented to device drivers
===============================

A :c:type:`struct blk_keyslot_manager` should be set up by device drivers in
the ``request_queue`` of the device. The device driver needs to call
``blk_ksm_init`` on the ``blk_keyslot_manager``, specifying the number of
keyslots supported by the hardware.

The device driver also needs to tell the KSM how to actually manipulate the
IE hardware in the device to do things like programming the crypto key into
the IE hardware into a particular keyslot. All this is achieved through the
:c:type:`struct blk_ksm_ll_ops` field in the KSM that the device driver
must fill in after initializing the ``blk_keyslot_manager``.
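
For a device driver, the setup described in this section looks roughly like the
sketch below. ``my_dev``, ``my_keyslot_program()``, ``my_keyslot_evict()`` and
the advertised capabilities are placeholders for the example; the exact member
names should be checked against ``include/linux/keyslot-manager.h``, and the
way the KSM is attached to the queue (the final assignment) is likewise only
indicative::

  static int my_keyslot_program(struct blk_keyslot_manager *ksm,
                                const struct blk_crypto_key *key,
                                unsigned int slot)
  {
          /* Write key->raw (key->size bytes) into hardware keyslot @slot. */
          return 0;
  }

  static int my_keyslot_evict(struct blk_keyslot_manager *ksm,
                              const struct blk_crypto_key *key,
                              unsigned int slot)
  {
          /* Clear hardware keyslot @slot. */
          return 0;
  }

  ...

  err = blk_ksm_init(&my_dev->ksm, num_hw_keyslots);
  if (err)
          return err;

  my_dev->ksm.ksm_ll_ops.keyslot_program = my_keyslot_program;
  my_dev->ksm.ksm_ll_ops.keyslot_evict = my_keyslot_evict;
  my_dev->ksm.dev = my_dev->dev;      /* used for runtime PM, see below */

  /* Advertise AES-256-XTS with 4096-byte data units and 8-byte DUNs. */
  my_dev->ksm.crypto_modes_supported[BLK_ENCRYPTION_MODE_AES_256_XTS] = 4096;
  my_dev->ksm.max_dun_bytes_supported = 8;

  q->ksm = &my_dev->ksm;              /* make the KSM visible to upper layers */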

The KSM also handles runtime power management for the device when applicable
(e.g. when it wants to program a crypto key into the IE hardware, the device
must be runtime powered on) - so the device driver must also set the ``dev``
field in the KSM to point to the ``struct device`` for the KSM to use for
runtime power management.

``blk_ksm_reprogram_all_keys`` can be called by device drivers if the device
needs each and every one of its keyslots to be reprogrammed with the key it
"should have" at the point in time when the function is called. This is useful
e.g. if a device loses all its keys on runtime power down/up.

``blk_ksm_destroy`` should be called to free up all resources allocated for a
keyslot manager by ``blk_ksm_init``, once the ``blk_keyslot_manager`` is no
longer needed.


Layered Devices
===============

Request queue based layered devices like dm-rq that wish to support IE need to
create their own keyslot manager for their request queue, and expose whatever
functionality they choose. When a layered device wants to pass a clone of that
request to another ``request_queue``, blk-crypto will initialize and prepare the
clone as necessary - see ``blk_crypto_insert_cloned_request`` in
``blk-crypto-internal.h``.


Future Optimizations for layered devices
========================================

Creating a keyslot manager for a layered device uses up memory for each
keyslot, and in general, a layered device merely passes the request on to a
"child" device, so the keyslots in the layered device itself are completely
unused, and don't need any refcounting or keyslot programming. We can instead
define a new type of KSM; the "passthrough KSM", that layered devices can use
to advertise an unlimited number of keyslots, and support for any encryption
algorithms they choose, while not actually using any memory for each keyslot.
Another use case for the "passthrough KSM" is for IE devices that do not have a
limited number of keyslots.


Interaction between inline encryption and blk integrity
========================================================

At the time of this patch, there is no real hardware that supports both these
features. However, these features do interact with each other, and it's not
completely trivial to make them both work together properly. In particular,
when a WRITE bio wants to use inline encryption on a device that supports both
features, the bio will have an encryption context specified, after which
its integrity information is calculated (using the plaintext data, since
the encryption will happen while data is being written), and the data and
integrity info is sent to the device. Obviously, the integrity info must be
verified before the data is encrypted. After the data is encrypted, the device
must not store the integrity info that it received with the plaintext data
since that might reveal information about the plaintext data. As such, it must
re-generate the integrity info from the ciphertext data and store that on disk
instead. Another issue with storing the integrity info of the plaintext data is
that it changes the on-disk format depending on whether hardware inline
encryption support is present or the kernel crypto API fallback is used (since
if the fallback is used, the device will receive the integrity info of the
ciphertext, not that of the plaintext).

Because there isn't any real hardware yet, it seems prudent to assume that
hardware implementations might not implement both features together correctly,
and disallow the combination for now. Whenever a device supports integrity, the
kernel will pretend that the device does not support hardware inline encryption
(by essentially setting the keyslot manager in the request_queue of the device
to NULL). When the crypto API fallback is enabled, this means that all bios with
an encryption context will use the fallback, and IO will complete as usual.
When the fallback is disabled, a bio with an encryption context will be failed.

block/Kconfig

@@ -146,6 +146,7 @@ config BLK_CGROUP_IOLATENCY
config BLK_CGROUP_IOCOST
        bool "Enable support for cost model based cgroup IO controller"
        depends on BLK_CGROUP=y
        select BLK_RQ_IO_DATA_LEN
        select BLK_RQ_ALLOC_TIME
        ---help---
          Enabling this option enables the .weight interface for cost

@@ -185,6 +186,23 @@ config BLK_SED_OPAL
          Enabling this option enables users to setup/unlock/lock
          Locking ranges for SED devices using the Opal protocol.

config BLK_INLINE_ENCRYPTION
        bool "Enable inline encryption support in block layer"
        help
          Build the blk-crypto subsystem. Enabling this lets the
          block layer handle encryption, so users can take
          advantage of inline encryption hardware if present.

config BLK_INLINE_ENCRYPTION_FALLBACK
        bool "Enable crypto API fallback for blk-crypto"
        depends on BLK_INLINE_ENCRYPTION
        select CRYPTO
        select CRYPTO_SKCIPHER
        help
          Enabling this lets the block layer handle inline encryption
          by falling back to the kernel crypto API when inline
          encryption hardware is not present.

menu "Partition Types"

source "block/partitions/Kconfig"

block/Makefile

@@ -36,3 +36,5 @@ obj-$(CONFIG_BLK_DEBUG_FS) += blk-mq-debugfs.o
obj-$(CONFIG_BLK_DEBUG_FS_ZONED)+= blk-mq-debugfs-zoned.o
obj-$(CONFIG_BLK_SED_OPAL) += sed-opal.o
obj-$(CONFIG_BLK_PM) += blk-pm.o
obj-$(CONFIG_BLK_INLINE_ENCRYPTION) += keyslot-manager.o blk-crypto.o
obj-$(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) += blk-crypto-fallback.o

block/bfq-iosched.c

@@ -6073,7 +6073,7 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
 * comments on bfq_init_rq for the reason behind this delayed
 * preparation.
 */
static void bfq_prepare_request(struct request *rq, struct bio *bio)
static void bfq_prepare_request(struct request *rq)
{
        /*
         * Regardless of whether we have an icq attached, we have to

block/bio-integrity.c

@@ -42,6 +42,9 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
        struct bio_set *bs = bio->bi_pool;
        unsigned inline_vecs;

        if (WARN_ON_ONCE(bio_has_crypt_ctx(bio)))
                return ERR_PTR(-EOPNOTSUPP);

        if (!bs || !mempool_initialized(&bs->bio_integrity_pool)) {
                bip = kmalloc(struct_size(bip, bip_inline_vecs, nr_vecs), gfp_mask);
                inline_vecs = nr_vecs;

184  block/bio.c
@@ -18,6 +18,7 @@
|
||||
#include <linux/blk-cgroup.h>
|
||||
#include <linux/highmem.h>
|
||||
#include <linux/sched/sysctl.h>
|
||||
#include <linux/blk-crypto.h>
|
||||
|
||||
#include <trace/events/block.h>
|
||||
#include "blk.h"
|
||||
@@ -237,6 +238,8 @@ void bio_uninit(struct bio *bio)
|
||||
|
||||
if (bio_integrity(bio))
|
||||
bio_integrity_free(bio);
|
||||
|
||||
bio_crypt_free_ctx(bio);
|
||||
}
|
||||
EXPORT_SYMBOL(bio_uninit);
|
||||
|
||||
@@ -708,6 +711,8 @@ struct bio *bio_clone_fast(struct bio *bio, gfp_t gfp_mask, struct bio_set *bs)
|
||||
|
||||
__bio_clone_fast(b, bio);
|
||||
|
||||
bio_crypt_clone(b, bio, gfp_mask);
|
||||
|
||||
if (bio_integrity(bio)) {
|
||||
int ret;
|
||||
|
||||
@@ -748,9 +753,14 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
|
||||
return true;
|
||||
}
|
||||
|
||||
static bool bio_try_merge_pc_page(struct request_queue *q, struct bio *bio,
|
||||
struct page *page, unsigned len, unsigned offset,
|
||||
bool *same_page)
|
||||
/*
|
||||
* Try to merge a page into a segment, while obeying the hardware segment
|
||||
* size limit. This is not for normal read/write bios, but for passthrough
|
||||
* or Zone Append operations that we can't split.
|
||||
*/
|
||||
static bool bio_try_merge_hw_seg(struct request_queue *q, struct bio *bio,
|
||||
struct page *page, unsigned len,
|
||||
unsigned offset, bool *same_page)
|
||||
{
|
||||
struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
|
||||
unsigned long mask = queue_segment_boundary(q);
|
||||
@@ -765,38 +775,32 @@ static bool bio_try_merge_pc_page(struct request_queue *q, struct bio *bio,
|
||||
}
|
||||
|
||||
/**
|
||||
* __bio_add_pc_page - attempt to add page to passthrough bio
|
||||
* @q: the target queue
|
||||
* @bio: destination bio
|
||||
* @page: page to add
|
||||
* @len: vec entry length
|
||||
* @offset: vec entry offset
|
||||
* @same_page: return if the merge happen inside the same page
|
||||
* bio_add_hw_page - attempt to add a page to a bio with hw constraints
|
||||
* @q: the target queue
|
||||
* @bio: destination bio
|
||||
* @page: page to add
|
||||
* @len: vec entry length
|
||||
* @offset: vec entry offset
|
||||
* @max_sectors: maximum number of sectors that can be added
|
||||
* @same_page: return if the segment has been merged inside the same page
|
||||
*
|
||||
* Attempt to add a page to the bio_vec maplist. This can fail for a
|
||||
* number of reasons, such as the bio being full or target block device
|
||||
* limitations. The target block device must allow bio's up to PAGE_SIZE,
|
||||
* so it is always possible to add a single page to an empty bio.
|
||||
*
|
||||
* This should only be used by passthrough bios.
|
||||
* Add a page to a bio while respecting the hardware max_sectors, max_segment
|
||||
* and gap limitations.
|
||||
*/
|
||||
int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
|
||||
int bio_add_hw_page(struct request_queue *q, struct bio *bio,
|
||||
struct page *page, unsigned int len, unsigned int offset,
|
||||
bool *same_page)
|
||||
unsigned int max_sectors, bool *same_page)
|
||||
{
|
||||
struct bio_vec *bvec;
|
||||
|
||||
/*
|
||||
* cloned bio must not modify vec list
|
||||
*/
|
||||
if (unlikely(bio_flagged(bio, BIO_CLONED)))
|
||||
if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
|
||||
return 0;
|
||||
|
||||
if (((bio->bi_iter.bi_size + len) >> 9) > queue_max_hw_sectors(q))
|
||||
if (((bio->bi_iter.bi_size + len) >> 9) > max_sectors)
|
||||
return 0;
|
||||
|
||||
if (bio->bi_vcnt > 0) {
|
||||
if (bio_try_merge_pc_page(q, bio, page, len, offset, same_page))
|
||||
if (bio_try_merge_hw_seg(q, bio, page, len, offset, same_page))
|
||||
return len;
|
||||
|
||||
/*
|
||||
@@ -823,11 +827,27 @@ int __bio_add_pc_page(struct request_queue *q, struct bio *bio,
|
||||
return len;
|
||||
}
|
||||
|
||||
/**
|
||||
* bio_add_pc_page - attempt to add page to passthrough bio
|
||||
* @q: the target queue
|
||||
* @bio: destination bio
|
||||
* @page: page to add
|
||||
* @len: vec entry length
|
||||
* @offset: vec entry offset
|
||||
*
|
||||
* Attempt to add a page to the bio_vec maplist. This can fail for a
|
||||
* number of reasons, such as the bio being full or target block device
|
||||
* limitations. The target block device must allow bio's up to PAGE_SIZE,
|
||||
* so it is always possible to add a single page to an empty bio.
|
||||
*
|
||||
* This should only be used by passthrough bios.
|
||||
*/
|
||||
int bio_add_pc_page(struct request_queue *q, struct bio *bio,
|
||||
struct page *page, unsigned int len, unsigned int offset)
|
||||
{
|
||||
bool same_page = false;
|
||||
return __bio_add_pc_page(q, bio, page, len, offset, &same_page);
|
||||
return bio_add_hw_page(q, bio, page, len, offset,
|
||||
queue_max_hw_sectors(q), &same_page);
|
||||
}
|
||||
EXPORT_SYMBOL(bio_add_pc_page);
|
||||
|
||||
@@ -936,6 +956,7 @@ void bio_release_pages(struct bio *bio, bool mark_dirty)
|
||||
put_page(bvec->bv_page);
|
||||
}
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bio_release_pages);
|
||||
|
||||
static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
|
||||
{
|
||||
@@ -1010,6 +1031,50 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
|
||||
{
|
||||
unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
|
||||
unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
|
||||
struct request_queue *q = bio->bi_disk->queue;
|
||||
unsigned int max_append_sectors = queue_max_zone_append_sectors(q);
|
||||
struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
|
||||
struct page **pages = (struct page **)bv;
|
||||
ssize_t size, left;
|
||||
unsigned len, i;
|
||||
size_t offset;
|
||||
|
||||
if (WARN_ON_ONCE(!max_append_sectors))
|
||||
return 0;
|
||||
|
||||
/*
|
||||
* Move page array up in the allocated memory for the bio vecs as far as
|
||||
* possible so that we can start filling biovecs from the beginning
|
||||
* without overwriting the temporary page array.
|
||||
*/
|
||||
BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
|
||||
pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
|
||||
|
||||
size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
|
||||
if (unlikely(size <= 0))
|
||||
return size ? size : -EFAULT;
|
||||
|
||||
for (left = size, i = 0; left > 0; left -= len, i++) {
|
||||
struct page *page = pages[i];
|
||||
bool same_page = false;
|
||||
|
||||
len = min_t(size_t, PAGE_SIZE - offset, left);
|
||||
if (bio_add_hw_page(q, bio, page, len, offset,
|
||||
max_append_sectors, &same_page) != len)
|
||||
return -EINVAL;
|
||||
if (same_page)
|
||||
put_page(page);
|
||||
offset = 0;
|
||||
}
|
||||
|
||||
iov_iter_advance(iter, size);
|
||||
return 0;
|
||||
}
|
||||
|
||||
/**
|
||||
* bio_iov_iter_get_pages - add user or kernel pages to a bio
|
||||
* @bio: bio to add pages to
|
||||
@@ -1039,16 +1104,23 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
|
||||
return -EINVAL;
|
||||
|
||||
do {
|
||||
if (is_bvec)
|
||||
ret = __bio_iov_bvec_add_pages(bio, iter);
|
||||
else
|
||||
ret = __bio_iov_iter_get_pages(bio, iter);
|
||||
if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
|
||||
if (WARN_ON_ONCE(is_bvec))
|
||||
return -EINVAL;
|
||||
ret = __bio_iov_append_get_pages(bio, iter);
|
||||
} else {
|
||||
if (is_bvec)
|
||||
ret = __bio_iov_bvec_add_pages(bio, iter);
|
||||
else
|
||||
ret = __bio_iov_iter_get_pages(bio, iter);
|
||||
}
|
||||
} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
|
||||
|
||||
if (is_bvec)
|
||||
bio_set_flag(bio, BIO_NO_PAGE_REF);
|
||||
return bio->bi_vcnt ? 0 : ret;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
|
||||
|
||||
static void submit_bio_wait_endio(struct bio *bio)
|
||||
{
|
||||
@@ -1105,6 +1177,7 @@ void bio_advance(struct bio *bio, unsigned bytes)
|
||||
if (bio_integrity(bio))
|
||||
bio_integrity_advance(bio, bytes);
|
||||
|
||||
bio_crypt_advance(bio, bytes);
|
||||
bio_advance_iter(bio, &bio->bi_iter, bytes);
|
||||
}
|
||||
EXPORT_SYMBOL(bio_advance);
|
||||
@@ -1303,55 +1376,6 @@ defer:
|
||||
schedule_work(&bio_dirty_work);
|
||||
}
|
||||
|
||||
void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
|
||||
{
|
||||
unsigned long stamp;
|
||||
again:
|
||||
stamp = READ_ONCE(part->stamp);
|
||||
if (unlikely(stamp != now)) {
|
||||
if (likely(cmpxchg(&part->stamp, stamp, now) == stamp)) {
|
||||
__part_stat_add(part, io_ticks, end ? now - stamp : 1);
|
||||
}
|
||||
}
|
||||
if (part->partno) {
|
||||
part = &part_to_disk(part)->part0;
|
||||
goto again;
|
||||
}
|
||||
}
|
||||
|
||||
void generic_start_io_acct(struct request_queue *q, int op,
|
||||
unsigned long sectors, struct hd_struct *part)
|
||||
{
|
||||
const int sgrp = op_stat_group(op);
|
||||
|
||||
part_stat_lock();
|
||||
|
||||
update_io_ticks(part, jiffies, false);
|
||||
part_stat_inc(part, ios[sgrp]);
|
||||
part_stat_add(part, sectors[sgrp], sectors);
|
||||
part_inc_in_flight(q, part, op_is_write(op));
|
||||
|
||||
part_stat_unlock();
|
||||
}
|
||||
EXPORT_SYMBOL(generic_start_io_acct);
|
||||
|
||||
void generic_end_io_acct(struct request_queue *q, int req_op,
|
||||
struct hd_struct *part, unsigned long start_time)
|
||||
{
|
||||
unsigned long now = jiffies;
|
||||
unsigned long duration = now - start_time;
|
||||
const int sgrp = op_stat_group(req_op);
|
||||
|
||||
part_stat_lock();
|
||||
|
||||
update_io_ticks(part, now, true);
|
||||
part_stat_add(part, nsecs[sgrp], jiffies_to_nsecs(duration));
|
||||
part_dec_in_flight(q, part, op_is_write(req_op));
|
||||
|
||||
part_stat_unlock();
|
||||
}
|
||||
EXPORT_SYMBOL(generic_end_io_acct);
|
||||
|
||||
static inline bool bio_remaining_done(struct bio *bio)
|
||||
{
|
||||
/*
|
||||
@@ -1445,6 +1469,10 @@ struct bio *bio_split(struct bio *bio, int sectors,
|
||||
BUG_ON(sectors <= 0);
|
||||
BUG_ON(sectors >= bio_sectors(bio));
|
||||
|
||||
/* Zone append commands cannot be split */
|
||||
if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
|
||||
return NULL;
|
||||
|
||||
split = bio_clone_fast(bio, gfp, bs);
|
||||
if (!split)
|
||||
return NULL;
|
||||
|
||||
@@ -1530,6 +1530,10 @@ static void blkcg_scale_delay(struct blkcg_gq *blkg, u64 now)
|
||||
{
|
||||
u64 old = atomic64_read(&blkg->delay_start);
|
||||
|
||||
/* negative use_delay means no scaling, see blkcg_set_delay() */
|
||||
if (atomic_read(&blkg->use_delay) < 0)
|
||||
return;
|
||||
|
||||
/*
|
||||
* We only want to scale down every second. The idea here is that we
|
||||
* want to delay people for min(delay_nsec, NSEC_PER_SEC) in a certain
|
||||
@@ -1717,6 +1721,8 @@ void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay)
|
||||
*/
|
||||
void blkcg_add_delay(struct blkcg_gq *blkg, u64 now, u64 delta)
|
||||
{
|
||||
if (WARN_ON_ONCE(atomic_read(&blkg->use_delay) < 0))
|
||||
return;
|
||||
blkcg_scale_delay(blkg, now);
|
||||
atomic64_add(delta, &blkg->delay_nsec);
|
||||
}
|
||||
|
||||
335  block/blk-core.c
     (file diff suppressed because it is too large)
657  block/blk-crypto-fallback.c  (new file)
     (file diff suppressed because it is too large)
201  block/blk-crypto-internal.h  (new file)
@@ -0,0 +1,201 @@
|
||||
/* SPDX-License-Identifier: GPL-2.0 */
|
||||
/*
|
||||
* Copyright 2019 Google LLC
|
||||
*/
|
||||
|
||||
#ifndef __LINUX_BLK_CRYPTO_INTERNAL_H
|
||||
#define __LINUX_BLK_CRYPTO_INTERNAL_H
|
||||
|
||||
#include <linux/bio.h>
|
||||
#include <linux/blkdev.h>
|
||||
|
||||
/* Represents a crypto mode supported by blk-crypto */
|
||||
struct blk_crypto_mode {
|
||||
const char *cipher_str; /* crypto API name (for fallback case) */
|
||||
unsigned int keysize; /* key size in bytes */
|
||||
unsigned int ivsize; /* iv size in bytes */
|
||||
};
|
||||
|
||||
extern const struct blk_crypto_mode blk_crypto_modes[];
|
||||
|
||||
#ifdef CONFIG_BLK_INLINE_ENCRYPTION
|
||||
|
||||
void bio_crypt_dun_increment(u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE],
|
||||
unsigned int inc);
|
||||
|
||||
bool bio_crypt_rq_ctx_compatible(struct request *rq, struct bio *bio);
|
||||
|
||||
bool bio_crypt_ctx_mergeable(struct bio_crypt_ctx *bc1, unsigned int bc1_bytes,
|
||||
struct bio_crypt_ctx *bc2);
|
||||
|
||||
static inline bool bio_crypt_ctx_back_mergeable(struct request *req,
|
||||
struct bio *bio)
|
||||
{
|
||||
return bio_crypt_ctx_mergeable(req->crypt_ctx, blk_rq_bytes(req),
|
||||
bio->bi_crypt_context);
|
||||
}
|
||||
|
||||
static inline bool bio_crypt_ctx_front_mergeable(struct request *req,
|
||||
struct bio *bio)
|
||||
{
|
||||
return bio_crypt_ctx_mergeable(bio->bi_crypt_context,
|
||||
bio->bi_iter.bi_size, req->crypt_ctx);
|
||||
}
|
||||
|
||||
static inline bool bio_crypt_ctx_merge_rq(struct request *req,
|
||||
struct request *next)
|
||||
{
|
||||
return bio_crypt_ctx_mergeable(req->crypt_ctx, blk_rq_bytes(req),
|
||||
next->crypt_ctx);
|
||||
}
|
||||
|
||||
static inline void blk_crypto_rq_set_defaults(struct request *rq)
|
||||
{
|
||||
rq->crypt_ctx = NULL;
|
||||
rq->crypt_keyslot = NULL;
|
||||
}
|
||||
|
||||
static inline bool blk_crypto_rq_is_encrypted(struct request *rq)
|
||||
{
|
||||
return rq->crypt_ctx;
|
||||
}
|
||||
|
||||
#else /* CONFIG_BLK_INLINE_ENCRYPTION */
|
||||
|
||||
static inline bool bio_crypt_rq_ctx_compatible(struct request *rq,
|
||||
struct bio *bio)
|
||||
{
|
||||
return true;
|
||||
}
|
||||
|
||||
static inline bool bio_crypt_ctx_front_mergeable(struct request *req,
|
||||
struct bio *bio)
|
||||
{
|
||||
return true;
|
||||
}
|
||||
|
||||
static inline bool bio_crypt_ctx_back_mergeable(struct request *req,
|
||||
struct bio *bio)
|
||||
{
|
||||
return true;
|
||||
}
|
||||
|
||||
static inline bool bio_crypt_ctx_merge_rq(struct request *req,
|
||||
struct request *next)
|
||||
{
|
||||
return true;
|
||||
}
|
||||
|
||||
static inline void blk_crypto_rq_set_defaults(struct request *rq) { }
|
||||
|
||||
static inline bool blk_crypto_rq_is_encrypted(struct request *rq)
|
||||
{
|
||||
return false;
|
||||
}
|
||||
|
||||
#endif /* CONFIG_BLK_INLINE_ENCRYPTION */
|
||||
|
||||
void __bio_crypt_advance(struct bio *bio, unsigned int bytes);
|
||||
static inline void bio_crypt_advance(struct bio *bio, unsigned int bytes)
|
||||
{
|
||||
if (bio_has_crypt_ctx(bio))
|
||||
__bio_crypt_advance(bio, bytes);
|
||||
}
|
||||
|
||||
void __bio_crypt_free_ctx(struct bio *bio);
|
||||
static inline void bio_crypt_free_ctx(struct bio *bio)
|
||||
{
|
||||
if (bio_has_crypt_ctx(bio))
|
||||
__bio_crypt_free_ctx(bio);
|
||||
}
|
||||
|
||||
static inline void bio_crypt_do_front_merge(struct request *rq,
|
||||
struct bio *bio)
|
||||
{
|
||||
#ifdef CONFIG_BLK_INLINE_ENCRYPTION
|
||||
if (bio_has_crypt_ctx(bio))
|
||||
memcpy(rq->crypt_ctx->bc_dun, bio->bi_crypt_context->bc_dun,
|
||||
sizeof(rq->crypt_ctx->bc_dun));
|
||||
#endif
|
||||
}
|
||||
|
||||
bool __blk_crypto_bio_prep(struct bio **bio_ptr);
|
||||
static inline bool blk_crypto_bio_prep(struct bio **bio_ptr)
|
||||
{
|
||||
if (bio_has_crypt_ctx(*bio_ptr))
|
||||
return __blk_crypto_bio_prep(bio_ptr);
|
||||
return true;
|
||||
}
|
||||
|
||||
blk_status_t __blk_crypto_init_request(struct request *rq);
|
||||
static inline blk_status_t blk_crypto_init_request(struct request *rq)
|
||||
{
|
||||
if (blk_crypto_rq_is_encrypted(rq))
|
||||
return __blk_crypto_init_request(rq);
|
||||
return BLK_STS_OK;
|
||||
}
|
||||
|
||||
void __blk_crypto_free_request(struct request *rq);
|
||||
static inline void blk_crypto_free_request(struct request *rq)
|
||||
{
|
||||
if (blk_crypto_rq_is_encrypted(rq))
|
||||
__blk_crypto_free_request(rq);
|
||||
}
|
||||
|
||||
void __blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
|
||||
gfp_t gfp_mask);
|
||||
static inline void blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
|
||||
gfp_t gfp_mask)
|
||||
{
|
||||
if (bio_has_crypt_ctx(bio))
|
||||
__blk_crypto_rq_bio_prep(rq, bio, gfp_mask);
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_crypto_insert_cloned_request - Prepare a cloned request to be inserted
|
||||
* into a request queue.
|
||||
* @rq: the request being queued
|
||||
*
|
||||
* Return: BLK_STS_OK on success, nonzero on error.
|
||||
*/
|
||||
static inline blk_status_t blk_crypto_insert_cloned_request(struct request *rq)
|
||||
{
|
||||
|
||||
if (blk_crypto_rq_is_encrypted(rq))
|
||||
return blk_crypto_init_request(rq);
|
||||
return BLK_STS_OK;
|
||||
}
|
||||
|
||||
#ifdef CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK
|
||||
|
||||
int blk_crypto_fallback_start_using_mode(enum blk_crypto_mode_num mode_num);
|
||||
|
||||
bool blk_crypto_fallback_bio_prep(struct bio **bio_ptr);
|
||||
|
||||
int blk_crypto_fallback_evict_key(const struct blk_crypto_key *key);
|
||||
|
||||
#else /* CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK */
|
||||
|
||||
static inline int
|
||||
blk_crypto_fallback_start_using_mode(enum blk_crypto_mode_num mode_num)
|
||||
{
|
||||
pr_warn_once("crypto API fallback is disabled\n");
|
||||
return -ENOPKG;
|
||||
}
|
||||
|
||||
static inline bool blk_crypto_fallback_bio_prep(struct bio **bio_ptr)
|
||||
{
|
||||
pr_warn_once("crypto API fallback disabled; failing request.\n");
|
||||
(*bio_ptr)->bi_status = BLK_STS_NOTSUPP;
|
||||
return false;
|
||||
}
|
||||
|
||||
static inline int
|
||||
blk_crypto_fallback_evict_key(const struct blk_crypto_key *key)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
|
||||
#endif /* CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK */
|
||||
|
||||
#endif /* __LINUX_BLK_CRYPTO_INTERNAL_H */
|
||||
404  block/blk-crypto.c  (new file)
@@ -0,0 +1,404 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
/*
|
||||
* Copyright 2019 Google LLC
|
||||
*/
|
||||
|
||||
/*
|
||||
* Refer to Documentation/block/inline-encryption.rst for detailed explanation.
|
||||
*/
|
||||
|
||||
#define pr_fmt(fmt) "blk-crypto: " fmt
|
||||
|
||||
#include <linux/bio.h>
|
||||
#include <linux/blkdev.h>
|
||||
#include <linux/keyslot-manager.h>
|
||||
#include <linux/module.h>
|
||||
#include <linux/slab.h>
|
||||
|
||||
#include "blk-crypto-internal.h"
|
||||
|
||||
const struct blk_crypto_mode blk_crypto_modes[] = {
|
||||
[BLK_ENCRYPTION_MODE_AES_256_XTS] = {
|
||||
.cipher_str = "xts(aes)",
|
||||
.keysize = 64,
|
||||
.ivsize = 16,
|
||||
},
|
||||
[BLK_ENCRYPTION_MODE_AES_128_CBC_ESSIV] = {
|
||||
.cipher_str = "essiv(cbc(aes),sha256)",
|
||||
.keysize = 16,
|
||||
.ivsize = 16,
|
||||
},
|
||||
[BLK_ENCRYPTION_MODE_ADIANTUM] = {
|
||||
.cipher_str = "adiantum(xchacha12,aes)",
|
||||
.keysize = 32,
|
||||
.ivsize = 32,
|
||||
},
|
||||
};
|
||||
|
||||
/*
|
||||
* This number needs to be at least (the number of threads doing IO
|
||||
* concurrently) * (maximum recursive depth of a bio), so that we don't
|
||||
* deadlock on crypt_ctx allocations. The default is chosen to be the same
|
||||
* as the default number of post read contexts in both EXT4 and F2FS.
|
||||
*/
|
||||
static int num_prealloc_crypt_ctxs = 128;
|
||||
|
||||
module_param(num_prealloc_crypt_ctxs, int, 0444);
|
||||
MODULE_PARM_DESC(num_prealloc_crypt_ctxs,
|
||||
"Number of bio crypto contexts to preallocate");
|
||||
|
||||
static struct kmem_cache *bio_crypt_ctx_cache;
|
||||
static mempool_t *bio_crypt_ctx_pool;
|
||||
|
||||
static int __init bio_crypt_ctx_init(void)
|
||||
{
|
||||
size_t i;
|
||||
|
||||
bio_crypt_ctx_cache = KMEM_CACHE(bio_crypt_ctx, 0);
|
||||
if (!bio_crypt_ctx_cache)
|
||||
goto out_no_mem;
|
||||
|
||||
bio_crypt_ctx_pool = mempool_create_slab_pool(num_prealloc_crypt_ctxs,
|
||||
bio_crypt_ctx_cache);
|
||||
if (!bio_crypt_ctx_pool)
|
||||
goto out_no_mem;
|
||||
|
||||
/* This is assumed in various places. */
|
||||
BUILD_BUG_ON(BLK_ENCRYPTION_MODE_INVALID != 0);
|
||||
|
||||
/* Sanity check that no algorithm exceeds the defined limits. */
|
||||
for (i = 0; i < BLK_ENCRYPTION_MODE_MAX; i++) {
|
||||
BUG_ON(blk_crypto_modes[i].keysize > BLK_CRYPTO_MAX_KEY_SIZE);
|
||||
BUG_ON(blk_crypto_modes[i].ivsize > BLK_CRYPTO_MAX_IV_SIZE);
|
||||
}
|
||||
|
||||
return 0;
|
||||
out_no_mem:
|
||||
panic("Failed to allocate mem for bio crypt ctxs\n");
|
||||
}
|
||||
subsys_initcall(bio_crypt_ctx_init);
|
||||
|
||||
void bio_crypt_set_ctx(struct bio *bio, const struct blk_crypto_key *key,
|
||||
const u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE], gfp_t gfp_mask)
|
||||
{
|
||||
struct bio_crypt_ctx *bc = mempool_alloc(bio_crypt_ctx_pool, gfp_mask);
|
||||
|
||||
bc->bc_key = key;
|
||||
memcpy(bc->bc_dun, dun, sizeof(bc->bc_dun));
|
||||
|
||||
bio->bi_crypt_context = bc;
|
||||
}
|
||||
|
||||
void __bio_crypt_free_ctx(struct bio *bio)
|
||||
{
|
||||
mempool_free(bio->bi_crypt_context, bio_crypt_ctx_pool);
|
||||
bio->bi_crypt_context = NULL;
|
||||
}
|
||||
|
||||
void __bio_crypt_clone(struct bio *dst, struct bio *src, gfp_t gfp_mask)
|
||||
{
|
||||
dst->bi_crypt_context = mempool_alloc(bio_crypt_ctx_pool, gfp_mask);
|
||||
*dst->bi_crypt_context = *src->bi_crypt_context;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__bio_crypt_clone);
|
||||
|
||||
/* Increments @dun by @inc, treating @dun as a multi-limb integer. */
|
||||
void bio_crypt_dun_increment(u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE],
|
||||
unsigned int inc)
|
||||
{
|
||||
int i;
|
||||
|
||||
for (i = 0; inc && i < BLK_CRYPTO_DUN_ARRAY_SIZE; i++) {
|
||||
dun[i] += inc;
|
||||
/*
|
||||
* If the addition in this limb overflowed, then we need to
|
||||
* carry 1 into the next limb. Else the carry is 0.
|
||||
*/
|
||||
if (dun[i] < inc)
|
||||
inc = 1;
|
||||
else
|
||||
inc = 0;
|
||||
}
|
||||
}
|
||||
|
||||
void __bio_crypt_advance(struct bio *bio, unsigned int bytes)
|
||||
{
|
||||
struct bio_crypt_ctx *bc = bio->bi_crypt_context;
|
||||
|
||||
bio_crypt_dun_increment(bc->bc_dun,
|
||||
bytes >> bc->bc_key->data_unit_size_bits);
|
||||
}
|
||||
|
||||
/*
|
||||
* Returns true if @bc->bc_dun plus @bytes converted to data units is equal to
|
||||
* @next_dun, treating the DUNs as multi-limb integers.
|
||||
*/
|
||||
bool bio_crypt_dun_is_contiguous(const struct bio_crypt_ctx *bc,
|
||||
unsigned int bytes,
|
||||
const u64 next_dun[BLK_CRYPTO_DUN_ARRAY_SIZE])
|
||||
{
|
||||
int i;
|
||||
unsigned int carry = bytes >> bc->bc_key->data_unit_size_bits;
|
||||
|
||||
for (i = 0; i < BLK_CRYPTO_DUN_ARRAY_SIZE; i++) {
|
||||
if (bc->bc_dun[i] + carry != next_dun[i])
|
||||
return false;
|
||||
/*
|
||||
* If the addition in this limb overflowed, then we need to
|
||||
* carry 1 into the next limb. Else the carry is 0.
|
||||
*/
|
||||
if ((bc->bc_dun[i] + carry) < carry)
|
||||
carry = 1;
|
||||
else
|
||||
carry = 0;
|
||||
}
|
||||
|
||||
/* If the DUN wrapped through 0, don't treat it as contiguous. */
|
||||
return carry == 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Checks that two bio crypt contexts are compatible - i.e. that
|
||||
* they are mergeable except for data_unit_num continuity.
|
||||
*/
|
||||
static bool bio_crypt_ctx_compatible(struct bio_crypt_ctx *bc1,
|
||||
struct bio_crypt_ctx *bc2)
|
||||
{
|
||||
if (!bc1)
|
||||
return !bc2;
|
||||
|
||||
return bc2 && bc1->bc_key == bc2->bc_key;
|
||||
}
|
||||
|
||||
bool bio_crypt_rq_ctx_compatible(struct request *rq, struct bio *bio)
|
||||
{
|
||||
return bio_crypt_ctx_compatible(rq->crypt_ctx, bio->bi_crypt_context);
|
||||
}
|
||||
|
||||
/*
|
||||
* Checks that two bio crypt contexts are compatible, and also
|
||||
* that their data_unit_nums are continuous (and can hence be merged)
|
||||
* in the order @bc1 followed by @bc2.
|
||||
*/
|
||||
bool bio_crypt_ctx_mergeable(struct bio_crypt_ctx *bc1, unsigned int bc1_bytes,
|
||||
struct bio_crypt_ctx *bc2)
|
||||
{
|
||||
if (!bio_crypt_ctx_compatible(bc1, bc2))
|
||||
return false;
|
||||
|
||||
return !bc1 || bio_crypt_dun_is_contiguous(bc1, bc1_bytes, bc2->bc_dun);
|
||||
}
|
||||
|
||||
/* Check that all I/O segments are data unit aligned. */
|
||||
static bool bio_crypt_check_alignment(struct bio *bio)
|
||||
{
|
||||
const unsigned int data_unit_size =
|
||||
bio->bi_crypt_context->bc_key->crypto_cfg.data_unit_size;
|
||||
struct bvec_iter iter;
|
||||
struct bio_vec bv;
|
||||
|
||||
bio_for_each_segment(bv, bio, iter) {
|
||||
if (!IS_ALIGNED(bv.bv_len | bv.bv_offset, data_unit_size))
|
||||
return false;
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
blk_status_t __blk_crypto_init_request(struct request *rq)
|
||||
{
|
||||
return blk_ksm_get_slot_for_key(rq->q->ksm, rq->crypt_ctx->bc_key,
|
||||
&rq->crypt_keyslot);
|
||||
}
|
||||
|
||||
/**
|
||||
* __blk_crypto_free_request - Uninitialize the crypto fields of a request.
|
||||
*
|
||||
* @rq: The request whose crypto fields to uninitialize.
|
||||
*
|
||||
* Completely uninitializes the crypto fields of a request. If a keyslot has
|
||||
* been programmed into some inline encryption hardware, that keyslot is
|
||||
* released. The rq->crypt_ctx is also freed.
|
||||
*/
|
||||
void __blk_crypto_free_request(struct request *rq)
|
||||
{
|
||||
blk_ksm_put_slot(rq->crypt_keyslot);
|
||||
mempool_free(rq->crypt_ctx, bio_crypt_ctx_pool);
|
||||
blk_crypto_rq_set_defaults(rq);
|
||||
}
|
||||
|
||||
/**
|
||||
* __blk_crypto_bio_prep - Prepare bio for inline encryption
|
||||
*
|
||||
* @bio_ptr: pointer to original bio pointer
|
||||
*
|
||||
* If the bio crypt context provided for the bio is supported by the underlying
|
||||
* device's inline encryption hardware, do nothing.
|
||||
*
|
||||
* Otherwise, try to perform en/decryption for this bio by falling back to the
|
||||
* kernel crypto API. When the crypto API fallback is used for encryption,
|
||||
* blk-crypto may choose to split the bio into 2 - the first one that will
|
||||
* continue to be processed and the second one that will be resubmitted via
|
||||
* generic_make_request. A bounce bio will be allocated to encrypt the contents
|
||||
* of the aforementioned "first one", and *bio_ptr will be updated to this
|
||||
* bounce bio.
|
||||
*
|
||||
* Caller must ensure bio has bio_crypt_ctx.
|
||||
*
|
||||
* Return: true on success; false on error (and bio->bi_status will be set
|
||||
* appropriately, and bio_endio() will have been called so bio
|
||||
* submission should abort).
|
||||
*/
|
||||
bool __blk_crypto_bio_prep(struct bio **bio_ptr)
|
||||
{
|
||||
struct bio *bio = *bio_ptr;
|
||||
const struct blk_crypto_key *bc_key = bio->bi_crypt_context->bc_key;
|
||||
|
||||
/* Error if bio has no data. */
|
||||
if (WARN_ON_ONCE(!bio_has_data(bio))) {
|
||||
bio->bi_status = BLK_STS_IOERR;
|
||||
goto fail;
|
||||
}
|
||||
|
||||
if (!bio_crypt_check_alignment(bio)) {
|
||||
bio->bi_status = BLK_STS_IOERR;
|
||||
goto fail;
|
||||
}
|
||||
|
||||
/*
|
||||
* Success if device supports the encryption context, or if we succeeded
|
||||
* in falling back to the crypto API.
|
||||
*/
|
||||
if (blk_ksm_crypto_cfg_supported(bio->bi_disk->queue->ksm,
|
||||
&bc_key->crypto_cfg))
|
||||
return true;
|
||||
|
||||
if (blk_crypto_fallback_bio_prep(bio_ptr))
|
||||
return true;
|
||||
fail:
|
||||
bio_endio(*bio_ptr);
|
||||
return false;
|
||||
}
|
||||
|
||||
/**
|
||||
* __blk_crypto_rq_bio_prep - Prepare a request's crypt_ctx when its first bio
|
||||
* is inserted
|
||||
*
|
||||
* @rq: The request to prepare
|
||||
* @bio: The first bio being inserted into the request
|
||||
* @gfp_mask: gfp mask
|
||||
*/
|
||||
void __blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
|
||||
gfp_t gfp_mask)
|
||||
{
|
||||
if (!rq->crypt_ctx)
|
||||
rq->crypt_ctx = mempool_alloc(bio_crypt_ctx_pool, gfp_mask);
|
||||
*rq->crypt_ctx = *bio->bi_crypt_context;
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_crypto_init_key() - Prepare a key for use with blk-crypto
|
||||
* @blk_key: Pointer to the blk_crypto_key to initialize.
|
||||
* @raw_key: Pointer to the raw key. Must be the correct length for the chosen
|
||||
* @crypto_mode; see blk_crypto_modes[].
|
||||
* @crypto_mode: identifier for the encryption algorithm to use
|
||||
* @dun_bytes: number of bytes that will be used to specify the DUN when this
|
||||
* key is used
|
||||
* @data_unit_size: the data unit size to use for en/decryption
|
||||
*
|
||||
* Return: 0 on success, -errno on failure. The caller is responsible for
|
||||
* zeroizing both blk_key and raw_key when done with them.
|
||||
*/
|
||||
int blk_crypto_init_key(struct blk_crypto_key *blk_key, const u8 *raw_key,
|
||||
enum blk_crypto_mode_num crypto_mode,
|
||||
unsigned int dun_bytes,
|
||||
unsigned int data_unit_size)
|
||||
{
|
||||
const struct blk_crypto_mode *mode;
|
||||
|
||||
memset(blk_key, 0, sizeof(*blk_key));
|
||||
|
||||
if (crypto_mode >= ARRAY_SIZE(blk_crypto_modes))
|
||||
return -EINVAL;
|
||||
|
||||
mode = &blk_crypto_modes[crypto_mode];
|
||||
if (mode->keysize == 0)
|
||||
return -EINVAL;
|
||||
|
||||
if (dun_bytes == 0 || dun_bytes > BLK_CRYPTO_MAX_IV_SIZE)
|
||||
return -EINVAL;
|
||||
|
||||
if (!is_power_of_2(data_unit_size))
|
||||
return -EINVAL;
|
||||
|
||||
blk_key->crypto_cfg.crypto_mode = crypto_mode;
|
||||
blk_key->crypto_cfg.dun_bytes = dun_bytes;
|
||||
blk_key->crypto_cfg.data_unit_size = data_unit_size;
|
||||
blk_key->data_unit_size_bits = ilog2(data_unit_size);
|
||||
blk_key->size = mode->keysize;
|
||||
memcpy(blk_key->raw, raw_key, mode->keysize);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/*
|
||||
* Check if bios with @cfg can be en/decrypted by blk-crypto (i.e. either the
|
||||
* request queue it's submitted to supports inline crypto, or the
|
||||
* blk-crypto-fallback is enabled and supports the cfg).
|
||||
*/
|
||||
bool blk_crypto_config_supported(struct request_queue *q,
|
||||
const struct blk_crypto_config *cfg)
|
||||
{
|
||||
return IS_ENABLED(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) ||
|
||||
blk_ksm_crypto_cfg_supported(q->ksm, cfg);
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_crypto_start_using_key() - Start using a blk_crypto_key on a device
|
||||
* @key: A key to use on the device
|
||||
* @q: the request queue for the device
|
||||
*
|
||||
* Upper layers must call this function to ensure that either the hardware
|
||||
* supports the key's crypto settings, or the crypto API fallback has transforms
|
||||
* for the needed mode allocated and ready to go. This function may allocate
|
||||
* an skcipher, and *should not* be called from the data path, since that might
|
||||
* cause a deadlock
|
||||
*
|
||||
* Return: 0 on success; -ENOPKG if the hardware doesn't support the key and
|
||||
* blk-crypto-fallback is either disabled or the needed algorithm
|
||||
* is disabled in the crypto API; or another -errno code.
|
||||
*/
|
||||
int blk_crypto_start_using_key(const struct blk_crypto_key *key,
|
||||
struct request_queue *q)
|
||||
{
|
||||
if (blk_ksm_crypto_cfg_supported(q->ksm, &key->crypto_cfg))
|
||||
return 0;
|
||||
return blk_crypto_fallback_start_using_mode(key->crypto_cfg.crypto_mode);
|
||||
}
|
||||
|
||||
/**
|
||||
* blk_crypto_evict_key() - Evict a key from any inline encryption hardware
|
||||
* it may have been programmed into
|
||||
* @q: The request queue who's associated inline encryption hardware this key
|
||||
* might have been programmed into
|
||||
* @key: The key to evict
|
||||
*
|
||||
* Upper layers (filesystems) must call this function to ensure that a key is
|
||||
* evicted from any hardware that it might have been programmed into. The key
|
||||
* must not be in use by any in-flight IO when this function is called.
|
||||
*
|
||||
* Return: 0 on success or if key is not present in the q's ksm, -err on error.
|
||||
*/
|
||||
int blk_crypto_evict_key(struct request_queue *q,
|
||||
const struct blk_crypto_key *key)
|
||||
{
|
||||
if (blk_ksm_crypto_cfg_supported(q->ksm, &key->crypto_cfg))
|
||||
return blk_ksm_evict_key(q->ksm, key);
|
||||
|
||||
/*
|
||||
* If the request queue's associated inline encryption hardware didn't
|
||||
* have support for the key, then the key might have been programmed
|
||||
* into the fallback keyslot manager, so try to evict from there.
|
||||
*/
|
||||
return blk_crypto_fallback_evict_key(key);
|
||||
}
|
||||
@@ -55,7 +55,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
|
||||
rq->rq_disk = bd_disk;
|
||||
rq->end_io = done;
|
||||
|
||||
blk_account_io_start(rq, true);
|
||||
blk_account_io_start(rq);
|
||||
|
||||
/*
|
||||
* don't check dying flag for MQ because the request won't
|
||||
|
||||
@@ -258,7 +258,6 @@ static void flush_end_io(struct request *flush_rq, blk_status_t error)
|
||||
blk_flush_complete_seq(rq, fq, seq, error);
|
||||
}
|
||||
|
||||
fq->flush_queue_delayed = 0;
|
||||
spin_unlock_irqrestore(&fq->mq_flush_lock, flags);
|
||||
}
|
||||
|
||||
@@ -433,41 +432,20 @@ void blk_insert_flush(struct request *rq)
|
||||
* blkdev_issue_flush - queue a flush
|
||||
* @bdev: blockdev to issue flush for
|
||||
* @gfp_mask: memory allocation flags (for bio_alloc)
|
||||
* @error_sector: error sector
|
||||
*
|
||||
* Description:
|
||||
* Issue a flush for the block device in question. Caller can supply
|
||||
* room for storing the error offset in case of a flush error, if they
|
||||
* wish to.
|
||||
* Issue a flush for the block device in question.
|
||||
*/
|
||||
int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
|
||||
sector_t *error_sector)
|
||||
int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask)
|
||||
{
|
||||
struct request_queue *q;
|
||||
struct bio *bio;
|
||||
int ret = 0;
|
||||
|
||||
if (bdev->bd_disk == NULL)
|
||||
return -ENXIO;
|
||||
|
||||
q = bdev_get_queue(bdev);
|
||||
if (!q)
|
||||
return -ENXIO;
|
||||
|
||||
bio = bio_alloc(gfp_mask, 0);
|
||||
bio_set_dev(bio, bdev);
|
||||
bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
|
||||
|
||||
ret = submit_bio_wait(bio);
|
||||
|
||||
/*
|
||||
* The driver must store the error location in ->bi_sector, if
|
||||
* it supports it. For non-stacked drivers, this should be
|
||||
* copied from blk_rq_pos(rq).
|
||||
*/
|
||||
if (error_sector)
|
||||
*error_sector = bio->bi_iter.bi_sector;
|
||||
|
||||
bio_put(bio);
|
||||
return ret;
|
||||
}
|
||||
|
||||
@@ -409,6 +409,13 @@ void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
|
||||
bi->tag_size = template->tag_size;
|
||||
|
||||
disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
|
||||
|
||||
#ifdef CONFIG_BLK_INLINE_ENCRYPTION
|
||||
if (disk->queue->ksm) {
|
||||
pr_warn("blk-integrity: Integrity and hardware inline encryption are not supported together. Disabling hardware inline encryption.\n");
|
||||
blk_ksm_unregister(disk->queue);
|
||||
}
|
||||
#endif
|
||||
}
|
||||
EXPORT_SYMBOL(blk_integrity_register);
|
||||
|
||||
|
||||
@@ -260,6 +260,7 @@ enum {
|
||||
VTIME_PER_SEC_SHIFT = 37,
|
||||
VTIME_PER_SEC = 1LLU << VTIME_PER_SEC_SHIFT,
|
||||
VTIME_PER_USEC = VTIME_PER_SEC / USEC_PER_SEC,
|
||||
VTIME_PER_NSEC = VTIME_PER_SEC / NSEC_PER_SEC,
|
||||
|
||||
/* bound vrate adjustments within two orders of magnitude */
|
||||
VRATE_MIN_PPM = 10000, /* 1% */
|
||||
@@ -1206,14 +1207,14 @@ static enum hrtimer_restart iocg_waitq_timer_fn(struct hrtimer *timer)
    return HRTIMER_NORESTART;
}

static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now, u64 cost)
static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now)
{
    struct ioc *ioc = iocg->ioc;
    struct blkcg_gq *blkg = iocg_to_blkg(iocg);
    u64 vtime = atomic64_read(&iocg->vtime);
    u64 vmargin = ioc->margin_us * now->vrate;
    u64 margin_ns = ioc->margin_us * NSEC_PER_USEC;
    u64 expires, oexpires;
    u64 delta_ns, expires, oexpires;
    u32 hw_inuse;

    lockdep_assert_held(&iocg->waitq.lock);
@@ -1236,15 +1237,10 @@ static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now, u64 cost)
        return false;

    /* use delay */
    if (cost) {
        u64 cost_ns = DIV64_U64_ROUND_UP(cost * NSEC_PER_USEC,
                                         now->vrate);
        blkcg_add_delay(blkg, now->now_ns, cost_ns);
    }
    blkcg_use_delay(blkg);

    expires = now->now_ns + DIV64_U64_ROUND_UP(vtime - now->vnow,
                                               now->vrate) * NSEC_PER_USEC;
    delta_ns = DIV64_U64_ROUND_UP(vtime - now->vnow,
                                  now->vrate) * NSEC_PER_USEC;
    blkcg_set_delay(blkg, delta_ns);
    expires = now->now_ns + delta_ns;

    /* if already active and close enough, don't bother */
    oexpires = ktime_to_ns(hrtimer_get_softexpires(&iocg->delay_timer));
@@ -1265,7 +1261,7 @@ static enum hrtimer_restart iocg_delay_timer_fn(struct hrtimer *timer)

    spin_lock_irqsave(&iocg->waitq.lock, flags);
    ioc_now(iocg->ioc, &now);
    iocg_kick_delay(iocg, &now, 0);
    iocg_kick_delay(iocg, &now);
    spin_unlock_irqrestore(&iocg->waitq.lock, flags);

    return HRTIMER_NORESTART;
@@ -1383,7 +1379,7 @@ static void ioc_timer_fn(struct timer_list *timer)
        if (waitqueue_active(&iocg->waitq) || iocg->abs_vdebt) {
            /* might be oversleeping vtime / hweight changes, kick */
            iocg_kick_waitq(iocg, &now);
            iocg_kick_delay(iocg, &now, 0);
            iocg_kick_delay(iocg, &now);
        } else if (iocg_is_idle(iocg)) {
            /* no waiter and idle, deactivate */
            iocg->last_inuse = iocg->inuse;
@@ -1543,19 +1539,39 @@ skip_surplus_transfers:
    if (rq_wait_pct > RQ_WAIT_BUSY_PCT ||
        missed_ppm[READ] > ppm_rthr ||
        missed_ppm[WRITE] > ppm_wthr) {
        /* clearly missing QoS targets, slow down vrate */
        ioc->busy_level = max(ioc->busy_level, 0);
        ioc->busy_level++;
    } else if (rq_wait_pct <= RQ_WAIT_BUSY_PCT * UNBUSY_THR_PCT / 100 &&
               missed_ppm[READ] <= ppm_rthr * UNBUSY_THR_PCT / 100 &&
               missed_ppm[WRITE] <= ppm_wthr * UNBUSY_THR_PCT / 100) {
        /* take action iff there is contention */
        if (nr_shortages && !nr_lagging) {
        /* QoS targets are being met with >25% margin */
        if (nr_shortages) {
            /*
             * We're throttling while the device has spare
             * capacity. If vrate was being slowed down, stop.
             */
            ioc->busy_level = min(ioc->busy_level, 0);
            /* redistribute surpluses first */
            if (!nr_surpluses)

            /*
             * If there are IOs spanning multiple periods, wait
             * them out before pushing the device harder. If
             * there are surpluses, let redistribution work it
             * out first.
             */
            if (!nr_lagging && !nr_surpluses)
                ioc->busy_level--;
        } else {
            /*
             * Nobody is being throttled and the users aren't
             * issuing enough IOs to saturate the device. We
             * simply don't know how close the device is to
             * saturation. Coast.
             */
            ioc->busy_level = 0;
        }
    } else {
        /* inside the hysterisis margin, we're good */
        ioc->busy_level = 0;
    }

@@ -1678,6 +1694,31 @@ static u64 calc_vtime_cost(struct bio *bio, struct ioc_gq *iocg, bool is_merge)
    return cost;
}

static void calc_size_vtime_cost_builtin(struct request *rq, struct ioc *ioc,
                                         u64 *costp)
{
    unsigned int pages = blk_rq_stats_sectors(rq) >> IOC_SECT_TO_PAGE_SHIFT;

    switch (req_op(rq)) {
    case REQ_OP_READ:
        *costp = pages * ioc->params.lcoefs[LCOEF_RPAGE];
        break;
    case REQ_OP_WRITE:
        *costp = pages * ioc->params.lcoefs[LCOEF_WPAGE];
        break;
    default:
        *costp = 0;
    }
}

static u64 calc_size_vtime_cost(struct request *rq, struct ioc *ioc)
{
    u64 cost;

    calc_size_vtime_cost_builtin(rq, ioc, &cost);
    return cost;
}

static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
{
    struct blkcg_gq *blkg = bio->bi_blkg;
@@ -1762,7 +1803,7 @@ static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
     */
    if (bio_issue_as_root_blkg(bio) || fatal_signal_pending(current)) {
        iocg->abs_vdebt += abs_cost;
        if (iocg_kick_delay(iocg, &now, cost))
        if (iocg_kick_delay(iocg, &now))
            blkcg_schedule_throttle(rqos->q,
                    (bio->bi_opf & REQ_SWAP) == REQ_SWAP);
        spin_unlock_irq(&iocg->waitq.lock);
@@ -1850,7 +1891,7 @@ static void ioc_rqos_merge(struct rq_qos *rqos, struct request *rq,
    spin_lock_irqsave(&iocg->waitq.lock, flags);
    if (likely(!list_empty(&iocg->active_list))) {
        iocg->abs_vdebt += abs_cost;
        iocg_kick_delay(iocg, &now, cost);
        iocg_kick_delay(iocg, &now);
    } else {
        iocg_commit_bio(iocg, bio, cost);
    }
@@ -1868,7 +1909,7 @@ static void ioc_rqos_done_bio(struct rq_qos *rqos, struct bio *bio)
static void ioc_rqos_done(struct rq_qos *rqos, struct request *rq)
{
    struct ioc *ioc = rqos_to_ioc(rqos);
    u64 on_q_ns, rq_wait_ns;
    u64 on_q_ns, rq_wait_ns, size_nsec;
    int pidx, rw;

    if (!ioc->enabled || !rq->alloc_time_ns || !rq->start_time_ns)
@@ -1889,8 +1930,10 @@ static void ioc_rqos_done(struct rq_qos *rqos, struct request *rq)

    on_q_ns = ktime_get_ns() - rq->alloc_time_ns;
    rq_wait_ns = rq->start_time_ns - rq->alloc_time_ns;
    size_nsec = div64_u64(calc_size_vtime_cost(rq, ioc), VTIME_PER_NSEC);

    if (on_q_ns <= ioc->params.qos[pidx] * NSEC_PER_USEC)
    if (on_q_ns <= size_nsec ||
        on_q_ns - size_nsec <= ioc->params.qos[pidx] * NSEC_PER_USEC)
        this_cpu_inc(ioc->pcpu_stat->missed[rw].nr_met);
    else
        this_cpu_inc(ioc->pcpu_stat->missed[rw].nr_missed);
@@ -2297,6 +2340,7 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
    spin_lock_irq(&ioc->lock);

    if (enable) {
        blk_stat_enable_accounting(ioc->rqos.q);
        blk_queue_flag_set(QUEUE_FLAG_RQ_ALLOC_TIME, ioc->rqos.q);
        ioc->enabled = true;
    } else {

@@ -257,6 +257,7 @@ out_bmd:
static struct bio *bio_map_user_iov(struct request_queue *q,
        struct iov_iter *iter, gfp_t gfp_mask)
{
    unsigned int max_sectors = queue_max_hw_sectors(q);
    int j;
    struct bio *bio;
    int ret;
@@ -294,8 +295,8 @@ static struct bio *bio_map_user_iov(struct request_queue *q,
            if (n > bytes)
                n = bytes;

            if (!__bio_add_pc_page(q, bio, page, n, offs,
                                   &same_page)) {
            if (!bio_add_hw_page(q, bio, page, n, offs,
                                 max_sectors, &same_page)) {
                if (same_page)
                    put_page(page);
                break;
@@ -549,6 +550,7 @@ int blk_rq_append_bio(struct request *rq, struct bio **bio)
        rq->biotail->bi_next = *bio;
        rq->biotail = *bio;
        rq->__data_len += (*bio)->bi_iter.bi_size;
        bio_crypt_free_ctx(*bio);
    }

    return 0;
@@ -654,8 +656,6 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
        bio = rq->bio;
    } while (iov_iter_count(&i));

    if (!bio_flagged(bio, BIO_USER_MAPPED))
        rq->rq_flags |= RQF_COPY_USER;
    return 0;

unmap_rq:
@@ -731,7 +731,6 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
{
    int reading = rq_data_dir(rq) == READ;
    unsigned long addr = (unsigned long) kbuf;
    int do_copy = 0;
    struct bio *bio, *orig_bio;
    int ret;

@@ -740,8 +739,7 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
    if (!len || !kbuf)
        return -EINVAL;

    do_copy = !blk_rq_aligned(q, addr, len) || object_is_on_stack(kbuf);
    if (do_copy)
    if (!blk_rq_aligned(q, addr, len) || object_is_on_stack(kbuf))
        bio = bio_copy_kern(q, kbuf, len, gfp_mask, reading);
    else
        bio = bio_map_kern(q, kbuf, len, gfp_mask);
@@ -752,9 +750,6 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
    bio->bi_opf &= ~REQ_OP_MASK;
    bio->bi_opf |= req_op(rq);

    if (do_copy)
        rq->rq_flags |= RQF_COPY_USER;

    orig_bio = bio;
    ret = blk_rq_append_bio(rq, &bio);
    if (unlikely(ret)) {

@@ -336,16 +336,6 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,
        /* there isn't chance to merge the splitted bio */
        split->bi_opf |= REQ_NOMERGE;

        /*
         * Since we're recursing into make_request here, ensure
         * that we mark this bio as already having entered the queue.
         * If not, and the queue is going away, we can get stuck
         * forever on waiting for the queue reference to drop. But
         * that will never happen, as we're already holding a
         * reference to it.
         */
        bio_set_flag(*bio, BIO_QUEUE_ENTERED);

        bio_chain(split, *bio);
        trace_block_split(q, split, (*bio)->bi_iter.bi_sector);
        generic_make_request(*bio);
@@ -519,44 +509,20 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
 * map a request to scatterlist, return number of sg entries setup. Caller
 * must make sure sg can hold rq->nr_phys_segments entries
 */
int blk_rq_map_sg(struct request_queue *q, struct request *rq,
                  struct scatterlist *sglist)
int __blk_rq_map_sg(struct request_queue *q, struct request *rq,
                    struct scatterlist *sglist, struct scatterlist **last_sg)
{
    struct scatterlist *sg = NULL;
    int nsegs = 0;

    if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
        nsegs = __blk_bvec_map_sg(rq->special_vec, sglist, &sg);
        nsegs = __blk_bvec_map_sg(rq->special_vec, sglist, last_sg);
    else if (rq->bio && bio_op(rq->bio) == REQ_OP_WRITE_SAME)
        nsegs = __blk_bvec_map_sg(bio_iovec(rq->bio), sglist, &sg);
        nsegs = __blk_bvec_map_sg(bio_iovec(rq->bio), sglist, last_sg);
    else if (rq->bio)
        nsegs = __blk_bios_map_sg(q, rq->bio, sglist, &sg);
        nsegs = __blk_bios_map_sg(q, rq->bio, sglist, last_sg);

    if (unlikely(rq->rq_flags & RQF_COPY_USER) &&
        (blk_rq_bytes(rq) & q->dma_pad_mask)) {
        unsigned int pad_len =
            (q->dma_pad_mask & ~blk_rq_bytes(rq)) + 1;

        sg->length += pad_len;
        rq->extra_len += pad_len;
    }

    if (q->dma_drain_size && q->dma_drain_needed(rq)) {
        if (op_is_write(req_op(rq)))
            memset(q->dma_drain_buffer, 0, q->dma_drain_size);

        sg_unmark_end(sg);
        sg = sg_next(sg);
        sg_set_page(sg, virt_to_page(q->dma_drain_buffer),
                    q->dma_drain_size,
                    ((unsigned long)q->dma_drain_buffer) &
                    (PAGE_SIZE - 1));
        nsegs++;
        rq->extra_len += q->dma_drain_size;
    }

    if (sg)
        sg_mark_end(sg);
    if (*last_sg)
        sg_mark_end(*last_sg);

    /*
     * Something must have been wrong if the figured number of
@@ -566,7 +532,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,

    return nsegs;
}
EXPORT_SYMBOL(blk_rq_map_sg);
EXPORT_SYMBOL(__blk_rq_map_sg);

static inline int ll_new_hw_segment(struct request *req, struct bio *bio,
        unsigned int nr_phys_segs)
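For illustration only, a minimal sketch (not part of this commit) of a wrapper that keeps the old blk_rq_map_sg() calling convention on top of the renamed __blk_rq_map_sg() shown above; it is assumed from the signature in this hunk, and the wrapper name is hypothetical:

    #include <linux/blkdev.h>
    #include <linux/scatterlist.h>

    static int example_map_rq(struct request_queue *q, struct request *rq,
                              struct scatterlist *sglist)
    {
        struct scatterlist *last_sg = NULL;

        /* same mapping result as the old blk_rq_map_sg(q, rq, sglist);
         * last_sg additionally reports the final scatterlist element */
        return __blk_rq_map_sg(q, rq, sglist, &last_sg);
    }

Callers that need to append extra segments (for example a drain buffer, now handled in the few drivers that use it) can continue from *last_sg instead of re-walking the list.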
@@ -596,6 +562,8 @@ int ll_back_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs)
    if (blk_integrity_rq(req) &&
        integrity_req_gap_back_merge(req, bio))
        return 0;
    if (!bio_crypt_ctx_back_mergeable(req, bio))
        return 0;
    if (blk_rq_sectors(req) + bio_sectors(bio) >
        blk_rq_get_max_sectors(req, blk_rq_pos(req))) {
        req_set_nomerge(req->q, req);
@@ -612,6 +580,8 @@ int ll_front_merge_fn(struct request *req, struct bio *bio, unsigned int nr_segs
    if (blk_integrity_rq(req) &&
        integrity_req_gap_front_merge(req, bio))
        return 0;
    if (!bio_crypt_ctx_front_mergeable(req, bio))
        return 0;
    if (blk_rq_sectors(req) + bio_sectors(bio) >
        blk_rq_get_max_sectors(req, bio->bi_iter.bi_sector)) {
        req_set_nomerge(req->q, req);
@@ -661,6 +631,9 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
    if (blk_integrity_merge_rq(q, req, next) == false)
        return 0;

    if (!bio_crypt_ctx_merge_rq(req, next))
        return 0;

    /* Merge is OK... */
    req->nr_phys_segments = total_phys_segments;
    return 1;
@@ -696,20 +669,17 @@ void blk_rq_set_mixed_merge(struct request *rq)
    rq->rq_flags |= RQF_MIXED_MERGE;
}

static void blk_account_io_merge(struct request *req)
static void blk_account_io_merge_request(struct request *req)
{
    if (blk_do_io_stat(req)) {
        struct hd_struct *part;

        part_stat_lock();
        part = req->part;

        part_dec_in_flight(req->q, part, rq_data_dir(req));

        hd_struct_put(part);
        part_stat_inc(req->part, merges[op_stat_group(req_op(req))]);
        part_stat_unlock();

        hd_struct_put(req->part);
    }
}

/*
 * Two cases of handling DISCARD merge:
 * If max_discard_segments > 1, the driver takes every bio
@@ -821,7 +791,7 @@ static struct request *attempt_merge(struct request_queue *q,
    /*
     * 'next' is going away, so update stats accordingly
     */
    blk_account_io_merge(next);
    blk_account_io_merge_request(next);

    /*
     * ownership of bio passed from next to req, return 'next' for
@@ -885,6 +855,10 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
    if (blk_integrity_merge_bio(rq->q, rq, bio) == false)
        return false;

    /* Only merge if the crypt contexts are compatible */
    if (!bio_crypt_rq_ctx_compatible(rq, bio))
        return false;

    /* must be using the same buffer */
    if (req_op(rq) == REQ_OP_WRITE_SAME &&
        !blk_write_same_mergeable(rq->bio, bio))

@@ -213,6 +213,7 @@ static const char *const hctx_state_name[] = {
    HCTX_STATE_NAME(STOPPED),
    HCTX_STATE_NAME(TAG_ACTIVE),
    HCTX_STATE_NAME(SCHED_RESTART),
    HCTX_STATE_NAME(INACTIVE),
};
#undef HCTX_STATE_NAME

@@ -239,6 +240,7 @@ static const char *const hctx_flag_name[] = {
    HCTX_FLAG_NAME(TAG_SHARED),
    HCTX_FLAG_NAME(BLOCKING),
    HCTX_FLAG_NAME(NO_SCHED),
    HCTX_FLAG_NAME(STACKING),
};
#undef HCTX_FLAG_NAME

@@ -292,7 +294,6 @@ static const char *const rqf_name[] = {
    RQF_NAME(MQ_INFLIGHT),
    RQF_NAME(DONTPREP),
    RQF_NAME(PREEMPT),
    RQF_NAME(COPY_USER),
    RQF_NAME(FAILED),
    RQF_NAME(QUIET),
    RQF_NAME(ELVPRIV),

@@ -80,16 +80,22 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
    blk_mq_run_hw_queue(hctx, true);
}

#define BLK_MQ_BUDGET_DELAY 3 /* ms units */

/*
 * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
 * its queue by itself in its completion handler, so we don't need to
 * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
 *
 * Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to
 * be run again. This is necessary to avoid starving flushes.
 */
static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
{
    struct request_queue *q = hctx->queue;
    struct elevator_queue *e = q->elevator;
    LIST_HEAD(rq_list);
    int ret = 0;

    do {
        struct request *rq;
@@ -97,12 +103,25 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
        if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
            break;

        if (!list_empty_careful(&hctx->dispatch)) {
            ret = -EAGAIN;
            break;
        }

        if (!blk_mq_get_dispatch_budget(hctx))
            break;

        rq = e->type->ops.dispatch_request(hctx);
        if (!rq) {
            blk_mq_put_dispatch_budget(hctx);
            /*
             * We're releasing without dispatching. Holding the
             * budget could have blocked any "hctx"s with the
             * same queue and if we didn't dispatch then there's
             * no guarantee anyone will kick the queue. Kick it
             * ourselves.
             */
            blk_mq_delay_run_hw_queues(q, BLK_MQ_BUDGET_DELAY);
            break;
        }

@@ -113,6 +132,8 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
         */
        list_add(&rq->queuelist, &rq_list);
    } while (blk_mq_dispatch_rq_list(q, &rq_list, true));

    return ret;
}

static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
@@ -130,16 +151,25 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
 * Only SCSI implements .get_budget and .put_budget, and SCSI restarts
 * its queue by itself in its completion handler, so we don't need to
 * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
 *
 * Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has
 * to be run again. This is necessary to avoid starving flushes.
 */
static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
static int blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
{
    struct request_queue *q = hctx->queue;
    LIST_HEAD(rq_list);
    struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
    int ret = 0;

    do {
        struct request *rq;

        if (!list_empty_careful(&hctx->dispatch)) {
            ret = -EAGAIN;
            break;
        }

        if (!sbitmap_any_bit_set(&hctx->ctx_map))
            break;

@@ -149,6 +179,14 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
        rq = blk_mq_dequeue_from_ctx(hctx, ctx);
        if (!rq) {
            blk_mq_put_dispatch_budget(hctx);
            /*
             * We're releasing without dispatching. Holding the
             * budget could have blocked any "hctx"s with the
             * same queue and if we didn't dispatch then there's
             * no guarantee anyone will kick the queue. Kick it
             * ourselves.
             */
            blk_mq_delay_run_hw_queues(q, BLK_MQ_BUDGET_DELAY);
            break;
        }

@@ -165,21 +203,17 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
    } while (blk_mq_dispatch_rq_list(q, &rq_list, true));

    WRITE_ONCE(hctx->dispatch_from, ctx);
    return ret;
}

void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
static int __blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
{
    struct request_queue *q = hctx->queue;
    struct elevator_queue *e = q->elevator;
    const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
    int ret = 0;
    LIST_HEAD(rq_list);

    /* RCU or SRCU read lock is needed before checking quiesced flag */
    if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
        return;

    hctx->run++;

    /*
     * If we have previous entries on our dispatch list, grab them first for
     * more fair dispatch.
@@ -208,19 +242,41 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
        blk_mq_sched_mark_restart_hctx(hctx);
        if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
            if (has_sched_dispatch)
                blk_mq_do_dispatch_sched(hctx);
                ret = blk_mq_do_dispatch_sched(hctx);
            else
                blk_mq_do_dispatch_ctx(hctx);
                ret = blk_mq_do_dispatch_ctx(hctx);
        }
    } else if (has_sched_dispatch) {
        blk_mq_do_dispatch_sched(hctx);
        ret = blk_mq_do_dispatch_sched(hctx);
    } else if (hctx->dispatch_busy) {
        /* dequeue request one by one from sw queue if queue is busy */
        blk_mq_do_dispatch_ctx(hctx);
        ret = blk_mq_do_dispatch_ctx(hctx);
    } else {
        blk_mq_flush_busy_ctxs(hctx, &rq_list);
        blk_mq_dispatch_rq_list(q, &rq_list, false);
    }

    return ret;
}

void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
{
    struct request_queue *q = hctx->queue;

    /* RCU or SRCU read lock is needed before checking quiesced flag */
    if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
        return;

    hctx->run++;

    /*
     * A return of -EAGAIN is an indication that hctx->dispatch is not
     * empty and we must run again in order to avoid starving flushes.
     */
    if (__blk_mq_sched_dispatch_requests(hctx) == -EAGAIN) {
    if (__blk_mq_sched_dispatch_requests(hctx) == -EAGAIN)
        blk_mq_run_hw_queue(hctx, true);
    }
}

bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,

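For illustration only, a minimal sketch (not part of this commit) of the dispatch/rerun contract introduced above, reduced to its control flow; it reuses names from the hunks, but the helpers are static to blk-mq-sched.c, so this is not code you could call from elsewhere:

    /* illustrative control flow only, mirroring blk_mq_sched_dispatch_requests() */
    static void example_run_dispatch(struct blk_mq_hw_ctx *hctx)
    {
        /*
         * The inner dispatch helpers return -EAGAIN as soon as requests
         * appear on hctx->dispatch, so flushes queued there are not
         * starved behind a long scheduler dispatch loop; the hardware
         * queue is simply re-run asynchronously.
         */
        if (__blk_mq_sched_dispatch_requests(hctx) == -EAGAIN)
            blk_mq_run_hw_queue(hctx, true);
    }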
Some files were not shown because too many files have changed in this diff.