Commit Graph

7004 Commits

Author SHA1 Message Date
Linus Torvalds fe35fdb305 Merge tag 'for-5.18/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:

 - Fix DM integrity shrink crash due to journal entry not being marked
   unused.

 - Fix DM bio polling to handle possibility that underlying device(s)
   return BLK_STS_AGAIN during submission.

 - Fix dm_io and dm_target_io flags race condition on Alpha.

 - Add some pr_err debugging to help debug cases when DM ioctl structure
   is corrupted.

* tag 'for-5.18/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: fix bio polling to handle possibile BLK_STS_AGAIN
  dm: fix dm_io and dm_target_io flags race condition on Alpha
  dm integrity: set journal entry unused when shrinking device
  dm ioctl: log an error if the ioctl structure is corrupted
2022-04-01 15:57:27 -07:00
Ming Lei 5291984004 dm: fix bio polling to handle possibile BLK_STS_AGAIN
Expanded testing of DM's bio polling support (using more fio threads
to dm-linear ontop of null_blk) exposed the possibility for polled
bios to hang (repeatedly polling in io_uring) when null_blk responds
with BLK_STS_AGAIN (due to lack of resources):

1) io_complete_rw_iopoll() is called from blkdev_bio_end_io_async() to
   notify kiocb is done, that is the completion interface between block
   layer and io_uring

2) io_complete_rw_iopoll() is called from io_do_iopoll()

3) dm returns BLK_STS_AGAIN for one bio (on behalf of underlying
   driver), then io_complete_rw_iopoll is called, but io_do_iopoll()
   doesn't handle -EAGAIN at all (due to logic in io_rw_should_reissue)

4) reason for dm's BLK_STS_AGAIN is underlying null_blk driver ran out
   of requests (easier to reproduce by setting low hw_queue_depth).

5) dm should handle BLK_STS_AGAIN for POLLED underlying IO, and may
   retry in dm layer.

This fix adds REQ_POLLED specific BLK_STS_AGAIN handling to
dm_io_complete() that clears REQ_POLLED and requeues the bio to DM
using queue_io().

Fixes: b99fdcdc36 ("dm: support bio polling")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
[snitzer: revised header, reused dm_io_complete's REQ_POLLED case]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-04-01 13:23:12 -04:00
Mikulas Patocka aad5b23ebf dm: fix dm_io and dm_target_io flags race condition on Alpha
Early alpha processors cannot write a single byte or short; they read 8
bytes, modify the value in registers and write back 8 bytes.

This could cause race condition in the structure dm_io - if the fields
flags and io_count are modified simultaneously.

Fix this bug by using 32-bit flags if we are on Alpha and if we are
compiling for a processor that doesn't have the byte-word-extension.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: bd4a6dd241 ("dm: reduce size of dm_io and dm_target_io structs")
[snitzer: Jens allowed this change since Mikulas owns a relevant Alpha!]
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-04-01 13:19:27 -04:00
Mikulas Patocka cc09e8a9de dm integrity: set journal entry unused when shrinking device
Commit f6f72f32c2 ("dm integrity: don't replay journal data past the
end of the device") skips journal replay if the target sector points
beyond the end of the device. Unfortunatelly, it doesn't set the
journal entry unused, which resulted in this BUG being triggered:
BUG_ON(!journal_entry_is_unused(je))

Fix this by calling journal_entry_set_unused() for this case.

Fixes: f6f72f32c2 ("dm integrity: don't replay journal data past the end of the device")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Milan Broz <gmazyland@gmail.com>
[snitzer: revised header]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-04-01 10:31:23 -04:00
Mikulas Patocka dbdcc906d9 dm ioctl: log an error if the ioctl structure is corrupted
This will help triage bugs when userspace is passing invalid ioctl
structure to the kernel.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
[snitzer: log errors using DMERR instead of DMWARN]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-04-01 10:29:43 -04:00
Linus Torvalds 266d17a8c0 Merge tag 'driver-core-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
 "Here is the set of driver core changes for 5.18-rc1.

  Not much here, primarily it was a bunch of cleanups and small updates:

   - kobj_type cleanups for default_groups

   - documentation updates

   - firmware loader minor changes

   - component common helper added and take advantage of it in many
     drivers (the largest part of this pull request).

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'driver-core-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (54 commits)
  Documentation: update stable review cycle documentation
  drivers/base/dd.c : Remove the initial value of the global variable
  Documentation: update stable tree link
  Documentation: add link to stable release candidate tree
  devres: fix typos in comments
  Documentation: add note block surrounding security patch note
  samples/kobject: Use sysfs_emit instead of sprintf
  base: soc: Make soc_device_match() simpler and easier to read
  driver core: dd: fix return value of __setup handler
  driver core: Refactor sysfs and drv/bus remove hooks
  driver core: Refactor multiple copies of device cleanup
  scripts: get_abi.pl: Fix typo in help message
  kernfs: fix typos in comments
  kernfs: remove unneeded #if 0 guard
  ALSA: hda/realtek: Make use of the helper component_compare_dev_name
  video: omapfb: dss: Make use of the helper component_compare_dev
  power: supply: ab8500: Make use of the helper component_compare_dev
  ASoC: codecs: wcd938x: Make use of the helper component_compare/release_of
  iommu/mediatek: Make use of the helper component_compare/release_of
  drm: of: Make use of the helper component_release_of
  ...
2022-03-28 12:41:28 -07:00
Linus Torvalds 561593a048 Merge tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block
Pull NVMe write streams removal from Jens Axboe:
 "This removes the write streams support in NVMe. No vendor ever really
  shipped working support for this, and they are not interested in
  supporting it.

  With the NVMe support gone, we have nothing in the tree that supports
  this. Remove passing around of the hints.

  The only discussion point in this patchset imho is the fact that the
  file specific write hint setting/getting fcntl helpers will now return
  -1/EINVAL like they did before we supported write hints. No known
  applications use these functions, I only know of one prototype that I
  help do for RocksDB, and that's not used. That said, with a change
  like this, it's always a bit controversial. Alternatively, we could
  just make them return 0 and pretend it worked. It's placement based
  hints after all"

* tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block:
  fs: remove fs.f_write_hint
  fs: remove kiocb.ki_hint
  block: remove the per-bio/request write hint
  nvme: remove support or stream based temperature hint
2022-03-26 11:51:46 -07:00
Linus Torvalds 6f2689a766 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
 "This series consists of the usual driver updates (qla2xxx, pm8001,
  libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
  and bug fixes.

  The high blast radius core update is the removal of write same, which
  affects block and several non-SCSI devices. The other big change,
  which is more local, is the removal of the SCSI pointer"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (281 commits)
  scsi: scsi_ioctl: Drop needless assignment in sg_io()
  scsi: bsg: Drop needless assignment in scsi_bsg_sg_io_fn()
  scsi: lpfc: Copyright updates for 14.2.0.0 patches
  scsi: lpfc: Update lpfc version to 14.2.0.0
  scsi: lpfc: SLI path split: Refactor BSG paths
  scsi: lpfc: SLI path split: Refactor Abort paths
  scsi: lpfc: SLI path split: Refactor SCSI paths
  scsi: lpfc: SLI path split: Refactor CT paths
  scsi: lpfc: SLI path split: Refactor misc ELS paths
  scsi: lpfc: SLI path split: Refactor VMID paths
  scsi: lpfc: SLI path split: Refactor FDISC paths
  scsi: lpfc: SLI path split: Refactor LS_RJT paths
  scsi: lpfc: SLI path split: Refactor LS_ACC paths
  scsi: lpfc: SLI path split: Refactor the RSCN/SCR/RDF/EDC/FARPR paths
  scsi: lpfc: SLI path split: Refactor PLOGI/PRLI/ADISC/LOGO paths
  scsi: lpfc: SLI path split: Refactor base ELS paths and the FLOGI path
  scsi: lpfc: SLI path split: Introduce lpfc_prep_wqe
  scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4
  scsi: lpfc: SLI path split: Refactor lpfc_iocbq
  scsi: lpfc: Use kcalloc()
  ...
2022-03-24 19:37:53 -07:00
Linus Torvalds b1f8ccdaae Merge tag 'for-5.18/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:

 - Significant refactoring and fixing of how DM core does bio-based IO
   accounting with focus on fixing wildly inaccurate IO stats for
   dm-crypt (and other DM targets that defer bio submission in their own
   workqueues). End result is proper IO accounting, made possible by
   targets being updated to use the new dm_submit_bio_remap() interface.

 - Add hipri bio polling support (REQ_POLLED) to bio-based DM.

 - Reduce dm_io and dm_target_io structs so that a single dm_io (which
   contains dm_target_io and first clone bio) weighs in at 256 bytes.
   For reference the bio struct is 128 bytes.

 - Various other small cleanups, fixes or improvements in DM core and
   targets.

 - Update MAINTAINERS with my kernel.org email address to allow
   distinction between my "upstream" and "Red" Hats.

* tag 'for-5.18/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (46 commits)
  dm: consolidate spinlocks in dm_io struct
  dm: reduce size of dm_io and dm_target_io structs
  dm: switch dm_target_io booleans over to proper flags
  dm: switch dm_io booleans over to proper flags
  dm: update email address in MAINTAINERS
  dm: return void from __send_empty_flush
  dm: factor out dm_io_complete
  dm cache: use dm_submit_bio_remap
  dm: simplify dm_sumbit_bio_remap interface
  dm thin: use dm_submit_bio_remap
  dm: add WARN_ON_ONCE to dm_submit_bio_remap
  dm: support bio polling
  block: add ->poll_bio to block_device_operations
  dm mpath: use DMINFO instead of printk with KERN_INFO
  dm: stop using bdevname
  dm-zoned: remove the ->name field in struct dmz_dev
  dm: remove unnecessary local variables in __bind
  dm: requeue IO if mapping table not yet available
  dm io: remove stale comment block for dm_io()
  dm thin metadata: remove unused dm_thin_remove_block and __remove
  ...
2022-03-24 19:25:24 -07:00
Linus Torvalds 69d1dea852 Merge tag 'for-5.18/drivers-2022-03-18' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:

 - NVMe updates via Christoph:
      - add vectored-io support for user-passthrough (Kanchan Joshi)
      - add verbose error logging (Alan Adamson)
      - support buffered I/O on block devices in nvmet (Chaitanya
        Kulkarni)
      - central discovery controller support (Martin Belanger)
      - fix and extended the globally unique idenfier validation
        (Christoph)
      - move away from the deprecated IDA APIs (Sagi Grimberg)
      - misc code cleanup (Keith Busch, Max Gurtovoy, Qinghua Jin,
        Chaitanya Kulkarni)
      - add lockdep annotations for in-kernel sockets (Chris Leech)
      - use vmalloc for ANA log buffer (Hannes Reinecke)
      - kerneldoc fixes (Chaitanya Kulkarni)
      - cleanups (Guoqing Jiang, Chaitanya Kulkarni, Christoph)
      - warn about shared namespaces without multipathing (Christoph)

 - MD updates via Song with a set of cleanups (Christoph, Mariusz, Paul,
   Erik, Dirk)

 - loop cleanups and queue depth configuration (Chaitanya)

 - null_blk cleanups and fixes (Chaitanya)

 - Use descriptive init/exit names in virtio_blk (Randy)

 - Use bvec_kmap_local() in drivers (Christoph)

 - bcache fixes (Mingzhe)

 - xen blk-front persistent grant speedups (Juergen)

 - rnbd fix and cleanup (Gioh)

 - Misc fixes (Christophe, Colin)

* tag 'for-5.18/drivers-2022-03-18' of git://git.kernel.dk/linux-block: (76 commits)
  virtio_blk: eliminate anonymous module_init & module_exit
  nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH
  nvme: remove nvme_alloc_request and nvme_alloc_request_qid
  nvme: cleanup how disk->disk_name is assigned
  nvmet: move the call to nvmet_ns_changed out of nvmet_ns_revalidate
  nvmet: use snprintf() with PAGE_SIZE in configfs
  nvmet: don't fold lines
  nvmet-rdma: fix kernel-doc warning for nvmet_rdma_device_removal
  nvmet-fc: fix kernel-doc warning for nvmet_fc_unregister_targetport
  nvmet-fc: fix kernel-doc warning for nvmet_fc_register_targetport
  nvme-tcp: lockdep: annotate in-kernel sockets
  nvme-tcp: don't fold the line
  nvme-tcp: don't initialize ret variable
  nvme-multipath: call bio_io_error in nvme_ns_head_submit_bio
  nvme-multipath: use vmalloc for ANA log buffer
  xen/blkfront: speed up purge_persistent_grants()
  raid5: initialize the stripe_head embeeded bios as needed
  raid5-cache: statically allocate the recovery ra bio
  raid5-cache: fully initialize flush_bio when needed
  raid5-ppl: fully initialize the bio in ppl_new_iounit
  ...
2022-03-21 17:16:01 -07:00
Linus Torvalds 616355cc81 Merge tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:

 - BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo)

 - blk-rq-qos completion fix (Tejun)

 - blk-cgroup merge fix (Tejun)

 - Add offline error return value to distinguish it from an IO error on
   the device (Song)

 - IO stats fixes (Zhang, Christoph)

 - blkcg refcount fixes (Ming, Yu)

 - Fix for indefinite dispatch loop softlockup (Shin'ichiro)

 - blk-mq hardware queue management improvements (Ming)

 - sbitmap dead code removal (Ming, John)

 - Plugging merge improvements (me)

 - Show blk-crypto capabilities in sysfs (Eric)

 - Multiple delayed queue run improvement (David)

 - Block throttling fixes (Ming)

 - Start deprecating auto module loading based on dev_t (Christoph)

 - bio allocation improvements (Christoph, Chaitanya)

 - Get rid of bio_devname (Christoph)

 - bio clone improvements (Christoph)

 - Block plugging improvements (Christoph)

 - Get rid of genhd.h header (Christoph)

 - Ensure drivers use appropriate flush helpers (Christoph)

 - Refcounting improvements (Christoph)

 - Queue initialization and teardown improvements (Ming, Christoph)

 - Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng,
   Lukas, Nian, Yang, Eric, Chengming)

* tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block: (127 commits)
  block: cancel all throttled bios in del_gendisk()
  block: let blkcg_gq grab request queue's refcnt
  block: avoid use-after-free on throttle data
  block: limit request dispatch loop duration
  block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative"
  sr: simplify the local variable initialization in sr_block_open()
  block: don't merge across cgroup boundaries if blkcg is enabled
  block: fix rq-qos breakage from skipping rq_qos_done_bio()
  block: flush plug based on hardware and software queue order
  block: ensure plug merging checks the correct queue at least once
  block: move rq_qos_exit() into disk_release()
  block: do more work in elevator_exit
  block: move blk_exit_queue into disk_release
  block: move q_usage_counter release into blk_queue_release
  block: don't remove hctx debugfs dir from blk_mq_exit_queue
  block: move blkcg initialization/destroy into disk allocation/release handler
  sr: implement ->free_disk to simplify refcounting
  sd: implement ->free_disk to simplify refcounting
  sd: delay calling free_opal_dev
  sd: call sd_zbc_release_disk before releasing the scsi_device reference
  ...
2022-03-21 16:48:55 -07:00
Mike Snitzer 4d7bca13dd dm: consolidate spinlocks in dm_io struct
No reason to have separate startio_lock and endio_lock given endio_lock
could be used during submission anyway.

This change leaves the dm_io struct weighing in at 256 bytes (down
from 272 bytes, so saves a cacheline).

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-03-21 14:15:36 -04:00
Mike Snitzer bd4a6dd241 dm: reduce size of dm_io and dm_target_io structs
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-03-21 14:15:35 -04:00
Mike Snitzer 655f3aad7a dm: switch dm_target_io booleans over to proper flags
Add flags to dm_target_io and manage them using the same pattern used
for bi_flags in struct bio.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-03-21 14:15:35 -04:00
Mike Snitzer 82f6cdcc36 dm: switch dm_io booleans over to proper flags
Add flags to dm_io and manage them using the same pattern used for
bi_flags in struct bio.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-03-21 14:15:34 -04:00
Mike Snitzer 332f2b1e73 dm: return void from __send_empty_flush
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-10 16:52:11 -05:00
Mike Snitzer e27363472f dm: factor out dm_io_complete
Optimizes dm_io_dec_pending() slightly by avoiding local variables.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-10 16:52:10 -05:00
Mike Snitzer 69596f555b dm cache: use dm_submit_bio_remap
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-10 13:44:57 -05:00
Mike Snitzer b7f8dff098 dm: simplify dm_sumbit_bio_remap interface
Remove the from_wq argument from dm_sumbit_bio_remap(). Eliminates the
need for dm_sumbit_bio_remap() callers to know whether they are
calling for a workqueue or from the original dm_submit_bio().

Add map_task to dm_io struct, record the map_task in alloc_io and
clear it after all target ->map() calls have completed. Update
dm_sumbit_bio_remap to check if 'current' matches io->map_task rather
than rely on passed 'from_rq' argument.

This change really simplifies the chore of porting each DM target to
using dm_sumbit_bio_remap() because there is no longer the risk of
programming error by not completely knowing all the different contexts
a particular method that calls dm_sumbit_bio_remap() might be used in.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-10 13:44:56 -05:00
Mike Snitzer a92512819b dm thin: use dm_submit_bio_remap
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-10 13:44:55 -05:00
Mike Snitzer 0a8e9599b9 dm: add WARN_ON_ONCE to dm_submit_bio_remap
If a target uses dm_submit_bio_remap() it should set
ti->accounts_remapped_io.

Also, switch dm_start_io_acct() WARN_ON to WARN_ON_ONCE.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-10 13:44:43 -05:00
Ming Lei b99fdcdc36 dm: support bio polling
Support bio polling (REQ_POLLED) in the following approach:

1) only support io polling on normal READ/WRITE, and other abnormal IOs
still fallback to IRQ mode, so the target io (and DM's clone bio) is
exactly inside the dm io.

2) hold one refcnt on io->io_count after submitting this dm bio with
REQ_POLLED

3) support dm native bio splitting, any dm io instance associated with
current bio will be added into one list which head is bio->bi_private
which will be recovered before ending this bio

4) implement .poll_bio() callback, call bio_poll() on the single target
bio inside the dm io which is retrieved via bio->bi_bio_drv_data; call
dm_io_dec_pending() after the target io is done in .poll_bio()

5) enable QUEUE_FLAG_POLL if all underlying queues enable QUEUE_FLAG_POLL,
which is based on Jeffle's previous patch.

These changes are good for a 30-35% IOPS improvement for polled IO.

For detailed test results please see (Jens, thanks for testing!):
https://listman.redhat.com/archives/dm-devel/2022-March/049868.html
or https://marc.info/?l=linux-block&m=164684246214700&w=2

Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2022-03-09 12:21:56 -05:00
Christoph Hellwig 03a6b195e8 raid5: initialize the stripe_head embeeded bios as needed
Use bio_init to initialize the bios when needed to the full state
instead of a partial initialization plus later setting of dev and op
and bio_reset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 22:55:13 -08:00
Christoph Hellwig 89f94b6440 raid5-cache: statically allocate the recovery ra bio
There is no need to preallocate the bio and reset it when use.  Just
allocate it on-stack and use a bvec places next to the pages used for
it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 22:55:09 -08:00
Christoph Hellwig 0dd00cba99 raid5-cache: fully initialize flush_bio when needed
Stop using bio_reset and just initialize the bio fully when needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
2022-03-08 22:55:04 -08:00