Commit Graph

5371 Commits

Author SHA1 Message Date
Linus Torvalds
582cd91f69 Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
 "Another nice round of removing more code than what is added, mostly
  due to Christoph's relentless pursuit of tech debt removal/cleanups.
  This pull request contains:

   - Two series of BFQ improvements (Paolo, Jan, Jia)

   - Block iov_iter improvements (Pavel)

   - bsg error path fix (Pan)

   - blk-mq scheduler improvements (Jan)

   - -EBUSY discard fix (Jan)

   - bvec allocation improvements (Ming, Christoph)

   - bio allocation and init improvements (Christoph)

   - Store bdev pointer in bio instead of gendisk + partno (Christoph)

   - Block trace point cleanups (Christoph)

   - hard read-only vs read-only split (Christoph)

   - Block based swap cleanups (Christoph)

   - Zoned write granularity support (Damien)

   - Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)"

* tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits)
  mm: simplify swapdev_block
  sd_zbc: clear zone resources for non-zoned case
  block: introduce blk_queue_clear_zone_settings()
  zonefs: use zone write granularity as block size
  block: introduce zone_write_granularity limit
  block: use blk_queue_set_zoned in add_partition()
  nullb: use blk_queue_set_zoned() to setup zoned devices
  nvme: cleanup zone information initialization
  block: document zone_append_max_bytes attribute
  block: use bi_max_vecs to find the bvec pool
  md/raid10: remove dead code in reshape_request
  block: mark the bio as cloned in bio_iov_bvec_set
  block: set BIO_NO_PAGE_REF in bio_iov_bvec_set
  block: remove a layer of indentation in bio_iov_iter_get_pages
  block: turn the nr_iovecs argument to bio_alloc* into an unsigned short
  block: remove the 1 and 4 vec bvec_slabs entries
  block: streamline bvec_alloc
  block: factor out a bvec_alloc_gfp helper
  block: move struct biovec_slab to bio.c
  block: reuse BIO_INLINE_VECS for integrity bvecs
  ...
2021-02-21 11:02:48 -08:00
Damien Le Moal
508aebb805 block: introduce blk_queue_clear_zone_settings()
Introduce the internal function blk_queue_clear_zone_settings() to
cleanup all limits and resources related to zoned block devices. This
new function is called from blk_queue_set_zoned() when a disk zoned
model is set to BLK_ZONED_NONE. This particular case can happens when a
partition is created on a host-aware scsi disk.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:44:41 -07:00
Damien Le Moal
a805a4fa4f block: introduce zone_write_granularity limit
Per ZBC and ZAC specifications, host-managed SMR hard-disks mandate that
all writes into sequential write required zones be aligned to the device
physical block size. However, NVMe ZNS does not have this constraint and
allows write operations into sequential zones to be aligned to the
device logical block size. This inconsistency does not help with
software portability across device types.

To solve this, introduce the zone_write_granularity queue limit to
indicate the alignment constraint, in bytes, of write operations into
zones of a zoned block device. This new limit is exported as a
read-only sysfs queue attribute and the helper
blk_queue_zone_write_granularity() introduced for drivers to set this
limit.

The function blk_queue_set_zoned() is modified to set this new limit to
the device logical block size by default. NVMe ZNS devices as well as
zoned nullb devices use this default value as is. The scsi disk driver
is modified to execute the blk_queue_zone_write_granularity() helper to
set the zone write granularity of host-managed SMR disks to the disk
physical block size.

The accessor functions queue_zone_write_granularity() and
bdev_zone_write_granularity() are also introduced.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:44:40 -07:00
Damien Le Moal
eafc63a9f7 block: use blk_queue_set_zoned in add_partition()
When changing the zoned model of host-aware zoned block devices, use
blk_queue_set_zoned() instead of directly assigning the gendisk queue
zoned limit.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:44:40 -07:00
Johannes Thumshirn
ae29333fa6 block: add bio_add_zone_append_page
Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which
is intended to be used by file systems that directly add pages to a bio
instead of using bio_iov_iter_get_pages().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-09 00:52:19 +01:00
Christoph Hellwig
7a800a20ae block: use bi_max_vecs to find the bvec pool
Instead of encoding of the bvec pool using magic bio flags, just use
a helper to find the pool based on the max_vecs value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-08 08:33:16 -07:00
Christoph Hellwig
977be01273 block: mark the bio as cloned in bio_iov_bvec_set
bio_iov_bvec_set clones the bio_vecs from the iter, and thus should be
treated like a cloned bio in every respect.  That also includes not
touching bi_max_vecs as that is a property of the bio allocation and not
its current payload.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-08 08:33:16 -07:00
Christoph Hellwig
ed97ce5e1d block: set BIO_NO_PAGE_REF in bio_iov_bvec_set
bio_iov_bvec_set assigns the foreign bvec, so setting the NO_PAGE_REF
directly there seems like the best fit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-08 08:33:16 -07:00
Christoph Hellwig
86004515ed block: remove a layer of indentation in bio_iov_iter_get_pages
Remove a pointless layer of indentation after a return statement.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-08 08:33:16 -07:00
Christoph Hellwig
0f2e6ab851 block: turn the nr_iovecs argument to bio_alloc* into an unsigned short
The bi_max_vecs and bi_vcnt fields are defined as unsigned short, so
don't allow passing larger values in.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-08 08:33:16 -07:00
Christoph Hellwig
de76fd8930 block: remove the 1 and 4 vec bvec_slabs entries
All bios with up to 4 bvecs use the inline bvecs in the bio itself, so
don't bother to define bvec_slabs entries for them.  Also decruftify
the bvec_slabs definition and initialization while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-08 08:33:16 -07:00
Christoph Hellwig
f007a3d66c block: streamline bvec_alloc
Avoid the pointless goto by trying the slab allocation first and falling
through to the mempool.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-08 08:33:16 -07:00
Christoph Hellwig
f2c3eb9bb0 block: factor out a bvec_alloc_gfp helper
Clean up bvec_alloc a little by factoring out a helper for the gfp_t
manipulations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-08 08:33:16 -07:00
Christoph Hellwig
6ac0b71537 block: move struct biovec_slab to bio.c
struct biovec_slab is only used inside of bio.c, so move it there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-08 08:33:16 -07:00
Christoph Hellwig
dc0b8a57ad block: reuse BIO_INLINE_VECS for integrity bvecs
bvec_alloc always uses biovec_slabs, and thus always needs to use the
same number of inline vecs.  Share a single definition for the data
and integrity bvecs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-08 08:33:15 -07:00
Linus Torvalds
eec7918121 Merge tag 'block-5.11-2021-02-05' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A few small regression fixes:

   - NVMe pull request from Christoph:
       - more quirks for buggy devices (Thorsten Leemhuis, Claus Stovgaard)
       - update the email address for Keith (Keith Busch)
       - fix an out of bounds access in nvmet-tcp (Sagi Grimberg)

   - Regression fix for BFQ shallow depth calculations introduced in
     this merge window (Lin)"

* tag 'block-5.11-2021-02-05' of git://git.kernel.dk/linux-block:
  nvmet-tcp: fix out-of-bounds access when receiving multiple h2cdata PDUs
  bfq-iosched: Revert "bfq: Fix computation of shallow depth"
  update the email address for Keith Bush
  nvme-pci: ignore the subsysem NQN on Phison E16
  nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs
2021-02-06 14:40:27 -08:00
Lin Feng
388c705b95 bfq-iosched: Revert "bfq: Fix computation of shallow depth"
This reverts commit 6d4d273588.

bfq.limit_depth passes word_depths[] as shallow_depth down to sbitmap core
sbitmap_get_shallow, which uses just the number to limit the scan depth of
each bitmap word, formula:
scan_percentage_for_each_word = shallow_depth / (1 << sbimap->shift) * 100%

That means the comments's percentiles 50%, 75%, 18%, 37% of bfq are correct.
But after commit patch 'bfq: Fix computation of shallow depth', we use
sbitmap.depth instead, as a example in following case:

sbitmap.depth = 256, map_nr = 4, shift = 6; sbitmap_word.depth = 64.
The resulsts of computed bfqd->word_depths[] are {128, 192, 48, 96}, and
three of the numbers exceed core dirver's 'sbitmap_word.depth=64' limit
nothing.

Signed-off-by: Lin Feng <linf@wangsu.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-02 20:37:08 -07:00
Ming Lei
8358c28a5d block: fix memory leak of bvec
bio_init() clears bio instance, so the bvec index has to be set after
bio_init(), otherwise bio->bi_io_vec may be leaked.

Fixes: 3175199ab0 ("block: split bio_kmalloc from bio_alloc_bioset")
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-02 08:57:56 -07:00
Linus Torvalds
2ba1c4d1a4 Merge tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "All over the place fixes for this release:

   - blk-cgroup iteration teardown resched fix (Baolin)

   - NVMe pull request from Christoph:
        - add another Write Zeroes quirk (Chaitanya Kulkarni)
        - handle a no path available corner case (Daniel Wagner)
        - use the proper RCU aware list_add helper (Chao Leng)

   - bcache regression fix (Coly)

   - bdev->bd_size_lock IRQ fix. This will be fixed in drivers for 5.12,
     but for now, we'll make it IRQ safe (Damien)

   - null_blk zoned init fix (Damien)

   - add_partition() error handling fix (Dinghao)

   - s390 dasd kobject fix (Jan)

   - nbd fix for freezing queue while adding connections (Josef)

   - tag queueing regression fix (Ming)

   - revert of a patch that inadvertently meant that we regressed write
     performance on raid (Maxim)"

* tag 'block-5.11-2021-01-29' of git://git.kernel.dk/linux-block:
  null_blk: cleanup zoned mode initialization
  nvme-core: use list_add_tail_rcu instead of list_add_tail for nvme_init_ns_head
  nvme-multipath: Early exit if no path is available
  nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a SPCC device
  bcache: only check feature sets when sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES
  block: fix bd_size_lock use
  blk-cgroup: Use cond_resched() when destroy blkgs
  Revert "block: simplify set_init_blocksize" to regain lost performance
  nbd: freeze the queue while we're adding connections
  s390/dasd: Fix inconsistent kobject removal
  block: Fix an error handling in add_partition
  blk-mq: test QUEUE_FLAG_HCTX_ACTIVE for sbitmap_shared in hctx_may_queue
2021-01-29 13:50:06 -08:00
Lukas Bulwahn
f7bf5e24e0 block: drop removed argument from kernel-doc of blk_execute_rq()
Commit 684da7628d ("block: remove unnecessary argument from
blk_execute_rq") changes the signature of blk_execute_rq(), but misses
to adjust its kernel-doc.

Hence, make htmldocs warns on ./block/blk-exec.c:78:

  warning: Excess function parameter 'q' description in 'blk_execute_rq'

Drop removed argument from kernel-doc of blk_execute_rq() as well.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Guoqing Jiang <Guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-29 07:43:29 -07:00
Lukas Bulwahn
7f31bee360 block: remove typo in kernel-doc of set_disk_ro()
Commit 52f019d43c ("block: add a hard-readonly flag to struct gendisk")
provides some kernel-doc for set_disk_ro(), but introduces a small typo.

Hence, make htmldocs warns on ./block/genhd.c:1441:

  warning: Function parameter or member 'read_only' not described in 'set_disk_ro'
  warning: Excess function parameter 'ready_only' description in 'set_disk_ro'

Remove that typo in the kernel-doc for set_disk_ro().

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-29 07:15:50 -07:00
Baolin Wang
6b4eeba331 blk-cgroup: Remove obsolete macro
Remove the obsolete 'MAX_KEY_LEN' macro.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-28 07:33:36 -07:00
Damien Le Moal
0fe37724f8 block: fix bd_size_lock use
Some block device drivers, e.g. the skd driver, call set_capacity() with
IRQ disabled. This results in lockdep ito complain about inconsistent
lock states ("inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage")
because set_capacity takes a block device bd_size_lock using the
functions spin_lock() and spin_unlock(). Ensure a consistent locking
state by replacing these calls with spin_lock_irqsave() and
spin_lock_irqrestore(). The same applies to bdev_set_nr_sectors().
With this fix, all lockdep complaints are resolved.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-28 07:31:50 -07:00
Baolin Wang
6c635caef4 blk-cgroup: Use cond_resched() when destroy blkgs
On !PREEMPT kernel, we can get below softlockup when doing stress
testing with creating and destroying block cgroup repeatly. The
reason is it may take a long time to acquire the queue's lock in
the loop of blkcg_destroy_blkgs(), or the system can accumulate a
huge number of blkgs in pathological cases. We can add a need_resched()
check on each loop and release locks and do cond_resched() if true
to avoid this issue, since the blkcg_destroy_blkgs() is not called
from atomic contexts.

[ 4757.010308] watchdog: BUG: soft lockup - CPU#11 stuck for 94s!
[ 4757.010698] Call trace:
[ 4757.010700]  blkcg_destroy_blkgs+0x68/0x150
[ 4757.010701]  cgwb_release_workfn+0x104/0x158
[ 4757.010702]  process_one_work+0x1bc/0x3f0
[ 4757.010704]  worker_thread+0x164/0x468
[ 4757.010705]  kthread+0x108/0x138

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-28 07:31:48 -07:00
Christoph Hellwig
c6bf3f0e25 block: use an on-stack bio in blkdev_issue_flush
There is no point in allocating memory for a synchronous flush.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-27 09:51:48 -07:00