Commit Graph

731 Commits

Author SHA1 Message Date
Linus Torvalds
cb690f5238 Merge tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-block
Pull more block driver updates from Jens Axboe:

 - Last series adding error handling support for add_disk() in drivers.
   After this one, and once the SCSI side has been merged, we can
   finally annotate add_disk() as must_check. (Luis)

 - bcache fixes (Coly)

 - zram fixes (Ming)

 - ataflop locking fix (Tetsuo)

 - nbd fixes (Ye, Yu)

 - MD merge via Song
      - Cleanup (Yang)
      - sysfs fix (Guoqing)

 - Misc fixes (Geert, Wu, luo)

* tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-block: (34 commits)
  bcache: Revert "bcache: use bvec_virt"
  ataflop: Add missing semicolon to return statement
  floppy: address add_disk() error handling on probe
  ataflop: address add_disk() error handling on probe
  block: update __register_blkdev() probe documentation
  ataflop: remove ataflop_probe_lock mutex
  mtd/ubi/block: add error handling support for add_disk()
  block/sunvdc: add error handling support for add_disk()
  z2ram: add error handling support for add_disk()
  nvdimm/pmem: use add_disk() error handling
  nvdimm/pmem: cleanup the disk if pmem_release_disk() is yet assigned
  nvdimm/blk: add error handling support for add_disk()
  nvdimm/blk: avoid calling del_gendisk() on early failures
  nvdimm/btt: add error handling support for add_disk()
  nvdimm/btt: use goto error labels on btt_blk_init()
  loop: Remove duplicate assignments
  drbd: Fix double free problem in drbd_create_device
  nvdimm/btt: do not call del_gendisk() if not needed
  bcache: fix use-after-free problem in bcache_device_free()
  zram: replace fsync_bdev with sync_blockdev
  ...
2021-11-09 11:24:08 -08:00
Coly Li
2878feaed5 bcache: Revert "bcache: use bvec_virt"
This reverts commit 2fd3e5efe7.

The above commit replaces page_address(bv->bv_page) by bvec_virt(bv) to
avoid directly access to bv->bv_page, but in situation bv->bv_offset is
not zero and page_address(bv->bv_page) is not equal to bvec_virt(bv). In
such case a memory corruption may happen because memory in next page is
tainted by following line in do_btree_node_write(),
	memcpy(bvec_virt(bv), addr, PAGE_SIZE);

This patch reverts the mentioned commit to avoid the memory corruption.

Fixes: 2fd3e5efe7 ("bcache: use bvec_virt")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # 5.15
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211103151041.70516-1-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-08 06:23:17 -07:00
Coly Li
8468f45091 bcache: fix use-after-free problem in bcache_device_free()
In bcache_device_free(), pointer disk is referenced still in
ida_simple_remove() after blk_cleanup_disk() gets called on this
pointer. This may cause a potential panic by use-after-free on the
disk pointer.

This patch fixes the problem by calling blk_cleanup_disk() after
ida_simple_remove().

Fixes: bc70852fd1 ("bcache: convert to blk_alloc_disk/blk_cleanup_disk")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: stable@vger.kernel.org # v5.14+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211103064917.67383-1-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-03 06:00:29 -06:00
Linus Torvalds
3f01727f75 Merge tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-block
Pull bdev size cleanups from Jens Axboe:
 "Clean up the bdev size handling with new bdev_nr_bytes() helper"

* tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-block: (34 commits)
  partitions/ibm: use bdev_nr_sectors instead of open coding it
  partitions/efi: use bdev_nr_bytes instead of open coding it
  block/ioctl: use bdev_nr_sectors and bdev_nr_bytes
  block: cache inode size in bdev
  udf: use sb_bdev_nr_blocks
  reiserfs: use sb_bdev_nr_blocks
  ntfs: use sb_bdev_nr_blocks
  jfs: use sb_bdev_nr_blocks
  ext4: use sb_bdev_nr_blocks
  block: add a sb_bdev_nr_blocks helper
  block: use bdev_nr_bytes instead of open coding it in blkdev_fallocate
  squashfs: use bdev_nr_bytes instead of open coding it
  reiserfs: use bdev_nr_bytes instead of open coding it
  pstore/blk: use bdev_nr_bytes instead of open coding it
  ntfs3: use bdev_nr_bytes instead of open coding it
  nilfs2: use bdev_nr_bytes instead of open coding it
  nfs/blocklayout: use bdev_nr_bytes instead of open coding it
  jfs: use bdev_nr_bytes instead of open coding it
  hfsplus: use bdev_nr_sectors instead of open coding it
  hfs: use bdev_nr_sectors instead of open coding it
  ...
2021-11-01 09:50:37 -07:00
Qing Wang
1b86db5f4e bcache: replace snprintf in show functions with sysfs_emit
coccicheck complains about the use of snprintf() in sysfs show functions.

Fix the following coccicheck warning:
drivers/md/bcache/sysfs.h:54:12-20: WARNING: use scnprintf or sprintf.

Implement sysfs_print() by sysfs_emit() and remove snprint() since no one
uses it any more.

Suggested-by: Coly Li <colyli@suse.de>
Signed-off-by: Qing Wang <wangqing@vivo.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211029060930.119923-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29 06:43:21 -06:00
Coly Li
cf2197ca4b bcache: move uapi header bcache.h to bcache code directory
The header file include/uapi/linux/bcache.h is not really a user space
API heaer. This file defines the ondisk format of bcache internal meta
data but no one includes it from user space, bcache-tools has its own
copy of this header with minor modification.

Therefore, this patch moves include/uapi/linux/bcache.h to bcache code
directory as drivers/md/bcache/bcache_ondisk.h.

Suggested-by: Arnd Bergmann <arnd@kernel.org>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211029060930.119923-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29 06:43:21 -06:00
Luis Chamberlain
2961c3bbca bcache: add error handling support for add_disk()
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

This driver doesn't do any unwinding with blk_cleanup_disk()
even on errors after add_disk() and so we follow that
tradition.

Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-5-mcgrof@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-21 09:00:56 -06:00
Christoph Hellwig
39fa7a9555 bcache: remove bch_crc64_update
bch_crc64_update is an entirely pointless wrapper around crc64_be.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211020143812.6403-9-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20 08:40:54 -06:00
Christoph Hellwig
00387bd21d bcache: use bvec_kmap_local in bch_data_verify
Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.

Also switch from page_address to bvec_kmap_local for cbv to be on the
safe side and to avoid pointlessly poking into bvec internals.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211020143812.6403-8-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20 08:40:54 -06:00
Christoph Hellwig
0f5cd7815f bcache: remove the backing_dev_name field from struct cached_dev
Just use the %pg format specifier to print the name directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211020143812.6403-7-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20 08:40:54 -06:00
Christoph Hellwig
7e84c21507 bcache: remove the cache_dev_name field from struct cache
Just use the %pg format specifier to print the name directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211020143812.6403-6-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20 08:40:54 -06:00
Lin Feng
0259d4498b bcache: move calc_cached_dev_sectors to proper place on backing device detach
Calculation of cache_set's cached sectors is done by travelling
cached_devs list as shown below:

static void calc_cached_dev_sectors(struct cache_set *c)
{
...
        list_for_each_entry(dc, &c->cached_devs, list)
                sectors += bdev_sectors(dc->bdev);

        c->cached_dev_sectors = sectors;
}

But cached_dev won't be unlinked from c->cached_devs list until we call
following list_move(&dc->list, &uncached_devices),
so previous fix in 'commit 46010141da
("bcache: recal cached_dev_sectors on detach")' is wrong, now we move
it to its right place.

Signed-off-by: Lin Feng <linf@wangsu.com>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211020143812.6403-5-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20 08:40:54 -06:00
Chao Yu
d55f7cb2e5 bcache: fix error info in register_bcache()
In register_bcache(), there are several cases we didn't set
correct error info (return value and/or error message):
- if kzalloc() fails, it needs to return ENOMEM and print
"cannot allocate memory";
- if register_cache() fails, it's better to propagate its
return value rather than using default EINVAL.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211020143812.6403-4-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20 08:40:54 -06:00
Ding Senjie
a307e2abfc md: bcache: Fix spelling of 'acquire'
acqurie -> acquire

Signed-off-by: Ding Senjie <dingsenjie@yulong.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211020143812.6403-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-20 08:40:54 -06:00
Christoph Hellwig
cda25b82c4 bcache: remove bdev_sectors
Use the equivalent block layer helper instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Acked-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20211018101130.1838532-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 14:43:22 -06:00
Christoph Hellwig
3e08773c38 block: switch polling to be bio based
Replace the blk_poll interface that requires the caller to keep a queue
and cookie from the submissions with polling based on the bio.

Polling for the bio itself leads to a few advantages:

 - the cookie construction can made entirely private in blk-mq.c
 - the caller does not need to remember the request_queue and cookie
   separately and thus sidesteps their lifetime issues
 - keeping the device and the cookie inside the bio allows to trivially
   support polling BIOs remapping by stacking drivers
 - a lot of code to propagate the cookie back up the submission path can
   be removed entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18 06:17:36 -06:00
Christoph Hellwig
2fd3e5efe7 bcache: use bvec_virt
Use bvec_virt instead of open coding it.  Note that the existing code is
fine despite ignoring bv_offset as the bio is known to contain exactly
one page from the page allocator per bio_vec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210804095634.460779-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-16 10:50:33 -06:00
Christoph Hellwig
b75f4aed88 bcache: move the del_gendisk call out of bcache_device_free
Let the callers call del_gendisk so that we can check if add_disk
has been called properly for the cached device case instead of relying
on the block layer internal GENHD_FL_UP flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210809064028.1198327-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-12 10:29:36 -06:00
Christoph Hellwig
224b068322 bcache: add proper error unwinding in bcache_device_init
Except for the IDA none of the allocations in bcache_device_init is
unwound on error, fix that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20210809064028.1198327-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-12 10:29:36 -06:00
Guoqing Jiang
018eca456c block: move some macros to blkdev.h
Move them (PAGE_SECTORS_SHIFT, PAGE_SECTORS and SECTOR_MASK) to the
generic header file to remove redundancy.

Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Link: https://lore.kernel.org/r/20210721025315.1729118-1-guoqing.jiang@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-11 19:40:28 -06:00
Christoph Hellwig
c66fd01971 block: make the block holder code optional
Move the block holder code into a separate file as it is not in any way
related to the other block_dev.c code, and add a new selectable config
option for it so that we don't have to build it without any remapped
drivers selected.

The Kconfig symbol contains a _DEPRECATED suffix to match the comments
added in commit 49731baa41
("block: restore multiple bd_link_disk_holder() support").

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-09 11:50:42 -06:00
Linus Torvalds
df668a5fe4 Merge tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:

 - disk events cleanup (Christoph)

 - gendisk and request queue allocation simplifications (Christoph)

 - bdev_disk_changed cleanups (Christoph)

 - IO priority improvements (Bart)

 - Chained bio completion trace fix (Edward)

 - blk-wbt fixes (Jan)

 - blk-wbt enable/disable fix (Zhang)

 - Scheduler dispatch improvements (Jan, Ming)

 - Shared tagset scheduler improvements (John)

 - BFQ updates (Paolo, Luca, Pietro)

 - BFQ lock inversion fix (Jan)

 - Documentation improvements (Kir)

 - CLONE_IO block cgroup fix (Tejun)

 - Remove of ancient and deprecated block dump feature (zhangyi)

 - Discard merge fix (Ming)

 - Misc fixes or followup fixes (Colin, Damien, Dan, Long, Max, Thomas,
   Yang)

* tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block: (129 commits)
  block: fix discard request merge
  block/mq-deadline: Remove a WARN_ON_ONCE() call
  blk-mq: update hctx->dispatch_busy in case of real scheduler
  blk: Fix lock inversion between ioc lock and bfqd lock
  bfq: Remove merged request already in bfq_requests_merged()
  block: pass a gendisk to bdev_disk_changed
  block: move bdev_disk_changed
  block: add the events* attributes to disk_attrs
  block: move the disk events code to a separate file
  block: fix trace completion for chained bio
  block/partitions/msdos: Fix typo inidicator -> indicator
  block, bfq: reset waker pointer with shared queues
  block, bfq: check waker only for queues with no in-flight I/O
  block, bfq: avoid delayed merge of async queues
  block, bfq: boost throughput by extending queue-merging times
  block, bfq: consider also creation time in delayed stable merge
  block, bfq: fix delayed stable merge check
  block, bfq: let also stably merged queues enjoy weight raising
  blk-wbt: make sure throttle is enabled properly
  blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled()
  ...
2021-06-30 12:12:56 -07:00
Linus Torvalds
efc1fd601a Merge tag 'block-5.13-2021-06-12' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A few fixes that should go into 5.13:

   - Fix a regression deadlock introduced in this release between open
     and remove of a bdev (Christoph)

   - Fix an async_xor md regression in this release (Xiao)

   - Fix bcache oversized read issue (Coly)"

* tag 'block-5.13-2021-06-12' of git://git.kernel.dk/linux-block:
  block: loop: fix deadlock between open and remove
  async_xor: check src_offs is not NULL before updating it
  bcache: avoid oversized read request in cache missing code path
  bcache: remove bcache device self-defined readahead
2021-06-12 11:59:58 -07:00
Coly Li
41fe8d088e bcache: avoid oversized read request in cache missing code path
In the cache missing code path of cached device, if a proper location
from the internal B+ tree is matched for a cache miss range, function
cached_dev_cache_miss() will be called in cache_lookup_fn() in the
following code block,
[code block 1]
  526         unsigned int sectors = KEY_INODE(k) == s->iop.inode
  527                 ? min_t(uint64_t, INT_MAX,
  528                         KEY_START(k) - bio->bi_iter.bi_sector)
  529                 : INT_MAX;
  530         int ret = s->d->cache_miss(b, s, bio, sectors);

Here s->d->cache_miss() is the call backfunction pointer initialized as
cached_dev_cache_miss(), the last parameter 'sectors' is an important
hint to calculate the size of read request to backing device of the
missing cache data.

Current calculation in above code block may generate oversized value of
'sectors', which consequently may trigger 2 different potential kernel
panics by BUG() or BUG_ON() as listed below,

1) BUG_ON() inside bch_btree_insert_key(),
[code block 2]
   886         BUG_ON(b->ops->is_extents && !KEY_SIZE(k));
2) BUG() inside biovec_slab(),
[code block 3]
   51         default:
   52                 BUG();
   53                 return NULL;

All the above panics are original from cached_dev_cache_miss() by the
oversized parameter 'sectors'.

Inside cached_dev_cache_miss(), parameter 'sectors' is used to calculate
the size of data read from backing device for the cache missing. This
size is stored in s->insert_bio_sectors by the following lines of code,
[code block 4]
  909    s->insert_bio_sectors = min(sectors, bio_sectors(bio) + reada);

Then the actual key inserting to the internal B+ tree is generated and
stored in s->iop.replace_key by the following lines of code,
[code block 5]
  911   s->iop.replace_key = KEY(s->iop.inode,
  912                    bio->bi_iter.bi_sector + s->insert_bio_sectors,
  913                    s->insert_bio_sectors);
The oversized parameter 'sectors' may trigger panic 1) by BUG_ON() from
the above code block.

And the bio sending to backing device for the missing data is allocated
with hint from s->insert_bio_sectors by the following lines of code,
[code block 6]
  926    cache_bio = bio_alloc_bioset(GFP_NOWAIT,
  927                 DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS),
  928                 &dc->disk.bio_split);
The oversized parameter 'sectors' may trigger panic 2) by BUG() from the
agove code block.

Now let me explain how the panics happen with the oversized 'sectors'.
In code block 5, replace_key is generated by macro KEY(). From the
definition of macro KEY(),
[code block 7]
  71 #define KEY(inode, offset, size)                                  \
  72 ((struct bkey) {                                                  \
  73      .high = (1ULL << 63) | ((__u64) (size) << 20) | (inode),     \
  74      .low = (offset)                                              \
  75 })

Here 'size' is 16bits width embedded in 64bits member 'high' of struct
bkey. But in code block 1, if "KEY_START(k) - bio->bi_iter.bi_sector" is
very probably to be larger than (1<<16) - 1, which makes the bkey size
calculation in code block 5 is overflowed. In one bug report the value
of parameter 'sectors' is 131072 (= 1 << 17), the overflowed 'sectors'
results the overflowed s->insert_bio_sectors in code block 4, then makes
size field of s->iop.replace_key to be 0 in code block 5. Then the 0-
sized s->iop.replace_key is inserted into the internal B+ tree as cache
missing check key (a special key to detect and avoid a racing between
normal write request and cache missing read request) as,
[code block 8]
  915   ret = bch_btree_insert_check_key(b, &s->op, &s->iop.replace_key);

Then the 0-sized s->iop.replace_key as 3rd parameter triggers the bkey
size check BUG_ON() in code block 2, and causes the kernel panic 1).

Another kernel panic is from code block 6, is by the bvecs number
oversized value s->insert_bio_sectors from code block 4,
        min(sectors, bio_sectors(bio) + reada)
There are two possibility for oversized reresult,
- bio_sectors(bio) is valid, but bio_sectors(bio) + reada is oversized.
- sectors < bio_sectors(bio) + reada, but sectors is oversized.

From a bug report the result of "DIV_ROUND_UP(s->insert_bio_sectors,
PAGE_SECTORS)" from code block 6 can be 344, 282, 946, 342 and many
other values which larther than BIO_MAX_VECS (a.k.a 256). When calling
bio_alloc_bioset() with such larger-than-256 value as the 2nd parameter,
this value will eventually be sent to biovec_slab() as parameter
'nr_vecs' in following code path,
   bio_alloc_bioset() ==> bvec_alloc() ==> biovec_slab()
Because parameter 'nr_vecs' is larger-than-256 value, the panic by BUG()
in code block 3 is triggered inside biovec_slab().

From the above analysis, we know that the 4th parameter 'sector' sent
into cached_dev_cache_miss() may cause overflow in code block 5 and 6,
and finally cause kernel panic in code block 2 and 3. And if result of
bio_sectors(bio) + reada exceeds valid bvecs number, it may also trigger
kernel panic in code block 3 from code block 6.

Now the almost-useless readahead size for cache missing request back to
backing device is removed, this patch can fix the oversized issue with
more simpler method.
- add a local variable size_limit,  set it by the minimum value from
  the max bkey size and max bio bvecs number.
- set s->insert_bio_sectors by the minimum value from size_limit,
  sectors, and the sectors size of bio.
- replace sectors by s->insert_bio_sectors to do bio_next_split.

By the above method with size_limit, s->insert_bio_sectors will never
result oversized replace_key size or bio bvecs number. And split bio
'miss' from bio_next_split() will always match the size of 'cache_bio',
that is the current maximum bio size we can sent to backing device for
fetching the cache missing data.

Current problmatic code can be partially found since Linux v3.13-rc1,
therefore all maintained stable kernels should try to apply this fix.

Reported-by: Alexander Ullrich <ealex1979@gmail.com>
Reported-by: Diego Ercolani <diego.ercolani@gmail.com>
Reported-by: Jan Szubiak <jan.szubiak@linuxpolska.pl>
Reported-by: Marco Rebhan <me@dblsaiko.net>
Reported-by: Matthias Ferdinand <bcache@mfedv.net>
Reported-by: Victor Westerhuis <victor@westerhu.is>
Reported-by: Vojtech Pavlik <vojtech@suse.cz>
Reported-and-tested-by: Rolf Fokkens <rolf@rolffokkens.nl>
Reported-and-tested-by: Thorsten Knabe <linux@thorsten-knabe.de>
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Nix <nix@esperi.org.uk>
Cc: Takashi Iwai <tiwai@suse.com>
Link: https://lore.kernel.org/r/20210607125052.21277-3-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-08 15:06:03 -06:00
Coly Li
1616a4c2ab bcache: remove bcache device self-defined readahead
For read cache missing, bcache defines a readahead size for the read I/O
request to the backing device for the missing data. This readahead size
is initialized to 0, and almost no one uses it to avoid unnecessary read
amplifying onto backing device and write amplifying onto cache device.
Considering upper layer file system code has readahead logic allready
and works fine with readahead_cache_policy sysfile interface, we don't
have to keep bcache self-defined readahead anymore.

This patch removes the bcache self-defined readahead for cache missing
request for backing device, and the readahead sysfs file interfaces are
removed as well.

This is the preparation for next patch to fix potential kernel panic due
to oversized request in a simpler method.

Reported-by: Alexander Ullrich <ealex1979@gmail.com>
Reported-by: Diego Ercolani <diego.ercolani@gmail.com>
Reported-by: Jan Szubiak <jan.szubiak@linuxpolska.pl>
Reported-by: Marco Rebhan <me@dblsaiko.net>
Reported-by: Matthias Ferdinand <bcache@mfedv.net>
Reported-by: Victor Westerhuis <victor@westerhu.is>
Reported-by: Vojtech Pavlik <vojtech@suse.cz>
Reported-and-tested-by: Rolf Fokkens <rolf@rolffokkens.nl>
Reported-and-tested-by: Thorsten Knabe <linux@thorsten-knabe.de>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Nix <nix@esperi.org.uk>
Cc: Takashi Iwai <tiwai@suse.com>
Link: https://lore.kernel.org/r/20210607125052.21277-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-08 15:06:03 -06:00