Pull bdev size cleanups from Jens Axboe:
"Clean up the bdev size handling with new bdev_nr_bytes() helper"
* tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-block: (34 commits)
partitions/ibm: use bdev_nr_sectors instead of open coding it
partitions/efi: use bdev_nr_bytes instead of open coding it
block/ioctl: use bdev_nr_sectors and bdev_nr_bytes
block: cache inode size in bdev
udf: use sb_bdev_nr_blocks
reiserfs: use sb_bdev_nr_blocks
ntfs: use sb_bdev_nr_blocks
jfs: use sb_bdev_nr_blocks
ext4: use sb_bdev_nr_blocks
block: add a sb_bdev_nr_blocks helper
block: use bdev_nr_bytes instead of open coding it in blkdev_fallocate
squashfs: use bdev_nr_bytes instead of open coding it
reiserfs: use bdev_nr_bytes instead of open coding it
pstore/blk: use bdev_nr_bytes instead of open coding it
ntfs3: use bdev_nr_bytes instead of open coding it
nilfs2: use bdev_nr_bytes instead of open coding it
nfs/blocklayout: use bdev_nr_bytes instead of open coding it
jfs: use bdev_nr_bytes instead of open coding it
hfsplus: use bdev_nr_sectors instead of open coding it
hfs: use bdev_nr_sectors instead of open coding it
...
Pull block updates from Jens Axboe:
- mq-deadline accounting improvements (Bart)
- blk-wbt timer fix (Andrea)
- Untangle the block layer includes (Christoph)
- Rework the poll support to be bio based, which will enable adding
support for polling for bio based drivers (Christoph)
- Block layer core support for multi-actuator drives (Damien)
- blk-crypto improvements (Eric)
- Batched tag allocation support (me)
- Request completion batching support (me)
- Plugging improvements (me)
- Shared tag set improvements (John)
- Concurrent queue quiesce support (Ming)
- Cache bdev in ->private_data for block devices (Pavel)
- bdev dio improvements (Pavel)
- Block device invalidation and block size improvements (Xie)
- Various cleanups, fixes, and improvements (Christoph, Jackie,
Masahira, Tejun, Yu, Pavel, Zheng, me)
* tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block: (174 commits)
blk-mq-debugfs: Show active requests per queue for shared tags
block: improve readability of blk_mq_end_request_batch()
virtio-blk: Use blk_validate_block_size() to validate block size
loop: Use blk_validate_block_size() to validate block size
nbd: Use blk_validate_block_size() to validate block size
block: Add a helper to validate the block size
block: re-flow blk_mq_rq_ctx_init()
block: prefetch request to be initialized
block: pass in blk_mq_tags to blk_mq_rq_ctx_init()
block: add rq_flags to struct blk_mq_alloc_data
block: add async version of bio_set_polled
block: kill DIO_MULTI_BIO
block: kill unused polling bits in __blkdev_direct_IO()
block: avoid extra iter advance with async iocb
block: Add independent access ranges support
blk-mq: don't issue request directly in case that current is to be blocked
sbitmap: silence data race warning
blk-cgroup: synchronize blkg creation against policy deactivation
block: refactor bio_iov_bvec_set()
block: add single bio async direct IO helper
...
Reading the inode size brings in a new cacheline for IO submit, and
it's in the hot path being checked for every single IO. When doing
millions of IOs per core per second, this is noticeable overhead.
Cache the nr_sectors in the bdev itself.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When adding partitions to the disk, the reference count of the disk
object is increased. then alloc partition device and called
device_add(), if the device_add() return error, the reference
count of the disk object will be reduced twice, at put_device(pdev)
and put_disk(disk). this leads to the end of the object's life cycle
prematurely, and trigger following calltrace.
__init_work+0x2d/0x50 kernel/workqueue.c:519
synchronize_rcu_expedited+0x3af/0x650 kernel/rcu/tree_exp.h:847
bdi_remove_from_list mm/backing-dev.c:938 [inline]
bdi_unregister+0x17f/0x5c0 mm/backing-dev.c:946
release_bdi+0xa1/0xc0 mm/backing-dev.c:968
kref_put include/linux/kref.h:65 [inline]
bdi_put+0x72/0xa0 mm/backing-dev.c:976
bdev_free_inode+0x11e/0x220 block/bdev.c:408
i_callback+0x3f/0x70 fs/inode.c:226
rcu_do_batch kernel/rcu/tree.c:2508 [inline]
rcu_core+0x76d/0x16c0 kernel/rcu/tree.c:2743
__do_softirq+0x1d7/0x93b kernel/softirq.c:558
invoke_softirq kernel/softirq.c:432 [inline]
__irq_exit_rcu kernel/softirq.c:636 [inline]
irq_exit_rcu+0xf2/0x130 kernel/softirq.c:648
sysvec_apic_timer_interrupt+0x93/0xc0
making disk is NULL when calling put_disk().
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211018103422.2043-1-qiang.zhang1211@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Partition scanning only happens on the whole device, so pass a
struct gendisk instead of the whole device block_device to the scanners.
This allows to simplify printing the device name in various places as the
disk name is available in disk->name.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20210810154512.1809898-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull block fixes from Jens Axboe:
"A few minor fixes:
- Fix ldm kernel-doc warning (Bart)
- Fix adding offset twice for DMA address in n64cart (Christoph)
- Fix use-after-free in dasd path handling (Stefan)
- Order kyber insert trace correctly (Vincent)
- raid1 errored write handling fix (Wei)
- Fix blk-iolatency queue get failure handling (Yu)"
* tag 'block-5.14-2021-08-07' of git://git.kernel.dk/linux-block:
kyber: make trace_block_rq call consistent with documentation
block/partitions/ldm.c: Fix a kernel-doc warning
blk-iolatency: error out if blk_get_queue() failed in iolatency_set_limit()
n64cart: fix the dma address in n64cart_do_bvec
s390/dasd: fix use after free in dasd path handling
md/raid10: properly indicate failure when ending a failed write request
Instead of acquiring an inode reference on open make sure partitions
always hold device model references to the disk while alive, and switch
open to grab only a device model reference to the opened block device.
If that is a partition the disk reference is transitively held by the
partition already.
Link: https://lore.kernel.org/r/20210722075402.983367-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>