Pull device mapper fixes from Mike Snitzer:
- Fix DM integrity shrink crash due to journal entry not being marked
unused.
- Fix DM bio polling to handle possibility that underlying device(s)
return BLK_STS_AGAIN during submission.
- Fix dm_io and dm_target_io flags race condition on Alpha.
- Add some pr_err debugging to help debug cases when DM ioctl structure
is corrupted.
* tag 'for-5.18/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: fix bio polling to handle possibile BLK_STS_AGAIN
dm: fix dm_io and dm_target_io flags race condition on Alpha
dm integrity: set journal entry unused when shrinking device
dm ioctl: log an error if the ioctl structure is corrupted
Expanded testing of DM's bio polling support (using more fio threads
to dm-linear ontop of null_blk) exposed the possibility for polled
bios to hang (repeatedly polling in io_uring) when null_blk responds
with BLK_STS_AGAIN (due to lack of resources):
1) io_complete_rw_iopoll() is called from blkdev_bio_end_io_async() to
notify kiocb is done, that is the completion interface between block
layer and io_uring
2) io_complete_rw_iopoll() is called from io_do_iopoll()
3) dm returns BLK_STS_AGAIN for one bio (on behalf of underlying
driver), then io_complete_rw_iopoll is called, but io_do_iopoll()
doesn't handle -EAGAIN at all (due to logic in io_rw_should_reissue)
4) reason for dm's BLK_STS_AGAIN is underlying null_blk driver ran out
of requests (easier to reproduce by setting low hw_queue_depth).
5) dm should handle BLK_STS_AGAIN for POLLED underlying IO, and may
retry in dm layer.
This fix adds REQ_POLLED specific BLK_STS_AGAIN handling to
dm_io_complete() that clears REQ_POLLED and requeues the bio to DM
using queue_io().
Fixes: b99fdcdc36 ("dm: support bio polling")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
[snitzer: revised header, reused dm_io_complete's REQ_POLLED case]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Early alpha processors cannot write a single byte or short; they read 8
bytes, modify the value in registers and write back 8 bytes.
This could cause race condition in the structure dm_io - if the fields
flags and io_count are modified simultaneously.
Fix this bug by using 32-bit flags if we are on Alpha and if we are
compiling for a processor that doesn't have the byte-word-extension.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: bd4a6dd241 ("dm: reduce size of dm_io and dm_target_io structs")
[snitzer: Jens allowed this change since Mikulas owns a relevant Alpha!]
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Commit f6f72f32c2 ("dm integrity: don't replay journal data past the
end of the device") skips journal replay if the target sector points
beyond the end of the device. Unfortunatelly, it doesn't set the
journal entry unused, which resulted in this BUG being triggered:
BUG_ON(!journal_entry_is_unused(je))
Fix this by calling journal_entry_set_unused() for this case.
Fixes: f6f72f32c2 ("dm integrity: don't replay journal data past the end of the device")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Milan Broz <gmazyland@gmail.com>
[snitzer: revised header]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
This will help triage bugs when userspace is passing invalid ioctl
structure to the kernel.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
[snitzer: log errors using DMERR instead of DMWARN]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Pull driver core updates from Greg KH:
"Here is the set of driver core changes for 5.18-rc1.
Not much here, primarily it was a bunch of cleanups and small updates:
- kobj_type cleanups for default_groups
- documentation updates
- firmware loader minor changes
- component common helper added and take advantage of it in many
drivers (the largest part of this pull request).
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (54 commits)
Documentation: update stable review cycle documentation
drivers/base/dd.c : Remove the initial value of the global variable
Documentation: update stable tree link
Documentation: add link to stable release candidate tree
devres: fix typos in comments
Documentation: add note block surrounding security patch note
samples/kobject: Use sysfs_emit instead of sprintf
base: soc: Make soc_device_match() simpler and easier to read
driver core: dd: fix return value of __setup handler
driver core: Refactor sysfs and drv/bus remove hooks
driver core: Refactor multiple copies of device cleanup
scripts: get_abi.pl: Fix typo in help message
kernfs: fix typos in comments
kernfs: remove unneeded #if 0 guard
ALSA: hda/realtek: Make use of the helper component_compare_dev_name
video: omapfb: dss: Make use of the helper component_compare_dev
power: supply: ab8500: Make use of the helper component_compare_dev
ASoC: codecs: wcd938x: Make use of the helper component_compare/release_of
iommu/mediatek: Make use of the helper component_compare/release_of
drm: of: Make use of the helper component_release_of
...
Pull NVMe write streams removal from Jens Axboe:
"This removes the write streams support in NVMe. No vendor ever really
shipped working support for this, and they are not interested in
supporting it.
With the NVMe support gone, we have nothing in the tree that supports
this. Remove passing around of the hints.
The only discussion point in this patchset imho is the fact that the
file specific write hint setting/getting fcntl helpers will now return
-1/EINVAL like they did before we supported write hints. No known
applications use these functions, I only know of one prototype that I
help do for RocksDB, and that's not used. That said, with a change
like this, it's always a bit controversial. Alternatively, we could
just make them return 0 and pretend it worked. It's placement based
hints after all"
* tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block:
fs: remove fs.f_write_hint
fs: remove kiocb.ki_hint
block: remove the per-bio/request write hint
nvme: remove support or stream based temperature hint
Pull device mapper updates from Mike Snitzer:
- Significant refactoring and fixing of how DM core does bio-based IO
accounting with focus on fixing wildly inaccurate IO stats for
dm-crypt (and other DM targets that defer bio submission in their own
workqueues). End result is proper IO accounting, made possible by
targets being updated to use the new dm_submit_bio_remap() interface.
- Add hipri bio polling support (REQ_POLLED) to bio-based DM.
- Reduce dm_io and dm_target_io structs so that a single dm_io (which
contains dm_target_io and first clone bio) weighs in at 256 bytes.
For reference the bio struct is 128 bytes.
- Various other small cleanups, fixes or improvements in DM core and
targets.
- Update MAINTAINERS with my kernel.org email address to allow
distinction between my "upstream" and "Red" Hats.
* tag 'for-5.18/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (46 commits)
dm: consolidate spinlocks in dm_io struct
dm: reduce size of dm_io and dm_target_io structs
dm: switch dm_target_io booleans over to proper flags
dm: switch dm_io booleans over to proper flags
dm: update email address in MAINTAINERS
dm: return void from __send_empty_flush
dm: factor out dm_io_complete
dm cache: use dm_submit_bio_remap
dm: simplify dm_sumbit_bio_remap interface
dm thin: use dm_submit_bio_remap
dm: add WARN_ON_ONCE to dm_submit_bio_remap
dm: support bio polling
block: add ->poll_bio to block_device_operations
dm mpath: use DMINFO instead of printk with KERN_INFO
dm: stop using bdevname
dm-zoned: remove the ->name field in struct dmz_dev
dm: remove unnecessary local variables in __bind
dm: requeue IO if mapping table not yet available
dm io: remove stale comment block for dm_io()
dm thin metadata: remove unused dm_thin_remove_block and __remove
...
Pull block driver updates from Jens Axboe:
- NVMe updates via Christoph:
- add vectored-io support for user-passthrough (Kanchan Joshi)
- add verbose error logging (Alan Adamson)
- support buffered I/O on block devices in nvmet (Chaitanya
Kulkarni)
- central discovery controller support (Martin Belanger)
- fix and extended the globally unique idenfier validation
(Christoph)
- move away from the deprecated IDA APIs (Sagi Grimberg)
- misc code cleanup (Keith Busch, Max Gurtovoy, Qinghua Jin,
Chaitanya Kulkarni)
- add lockdep annotations for in-kernel sockets (Chris Leech)
- use vmalloc for ANA log buffer (Hannes Reinecke)
- kerneldoc fixes (Chaitanya Kulkarni)
- cleanups (Guoqing Jiang, Chaitanya Kulkarni, Christoph)
- warn about shared namespaces without multipathing (Christoph)
- MD updates via Song with a set of cleanups (Christoph, Mariusz, Paul,
Erik, Dirk)
- loop cleanups and queue depth configuration (Chaitanya)
- null_blk cleanups and fixes (Chaitanya)
- Use descriptive init/exit names in virtio_blk (Randy)
- Use bvec_kmap_local() in drivers (Christoph)
- bcache fixes (Mingzhe)
- xen blk-front persistent grant speedups (Juergen)
- rnbd fix and cleanup (Gioh)
- Misc fixes (Christophe, Colin)
* tag 'for-5.18/drivers-2022-03-18' of git://git.kernel.dk/linux-block: (76 commits)
virtio_blk: eliminate anonymous module_init & module_exit
nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH
nvme: remove nvme_alloc_request and nvme_alloc_request_qid
nvme: cleanup how disk->disk_name is assigned
nvmet: move the call to nvmet_ns_changed out of nvmet_ns_revalidate
nvmet: use snprintf() with PAGE_SIZE in configfs
nvmet: don't fold lines
nvmet-rdma: fix kernel-doc warning for nvmet_rdma_device_removal
nvmet-fc: fix kernel-doc warning for nvmet_fc_unregister_targetport
nvmet-fc: fix kernel-doc warning for nvmet_fc_register_targetport
nvme-tcp: lockdep: annotate in-kernel sockets
nvme-tcp: don't fold the line
nvme-tcp: don't initialize ret variable
nvme-multipath: call bio_io_error in nvme_ns_head_submit_bio
nvme-multipath: use vmalloc for ANA log buffer
xen/blkfront: speed up purge_persistent_grants()
raid5: initialize the stripe_head embeeded bios as needed
raid5-cache: statically allocate the recovery ra bio
raid5-cache: fully initialize flush_bio when needed
raid5-ppl: fully initialize the bio in ppl_new_iounit
...
No reason to have separate startio_lock and endio_lock given endio_lock
could be used during submission anyway.
This change leaves the dm_io struct weighing in at 256 bytes (down
from 272 bytes, so saves a cacheline).
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Remove the from_wq argument from dm_sumbit_bio_remap(). Eliminates the
need for dm_sumbit_bio_remap() callers to know whether they are
calling for a workqueue or from the original dm_submit_bio().
Add map_task to dm_io struct, record the map_task in alloc_io and
clear it after all target ->map() calls have completed. Update
dm_sumbit_bio_remap to check if 'current' matches io->map_task rather
than rely on passed 'from_rq' argument.
This change really simplifies the chore of porting each DM target to
using dm_sumbit_bio_remap() because there is no longer the risk of
programming error by not completely knowing all the different contexts
a particular method that calls dm_sumbit_bio_remap() might be used in.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If a target uses dm_submit_bio_remap() it should set
ti->accounts_remapped_io.
Also, switch dm_start_io_acct() WARN_ON to WARN_ON_ONCE.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Support bio polling (REQ_POLLED) in the following approach:
1) only support io polling on normal READ/WRITE, and other abnormal IOs
still fallback to IRQ mode, so the target io (and DM's clone bio) is
exactly inside the dm io.
2) hold one refcnt on io->io_count after submitting this dm bio with
REQ_POLLED
3) support dm native bio splitting, any dm io instance associated with
current bio will be added into one list which head is bio->bi_private
which will be recovered before ending this bio
4) implement .poll_bio() callback, call bio_poll() on the single target
bio inside the dm io which is retrieved via bio->bi_bio_drv_data; call
dm_io_dec_pending() after the target io is done in .poll_bio()
5) enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL,
which is based on Jeffle's previous patch.
These changes are good for a 30-35% IOPS improvement for polled IO.
For detailed test results please see (Jens, thanks for testing!):
https://listman.redhat.com/archives/dm-devel/2022-March/049868.html
or https://marc.info/?l=linux-block&m=164684246214700&w=2
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Use bio_init to initialize the bios when needed to the full state
instead of a partial initialization plus later setting of dev and op
and bio_reset.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
There is no need to preallocate the bio and reset it when use. Just
allocate it on-stack and use a bvec places next to the pages used for
it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
Stop using bio_reset and just initialize the bio fully when needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>