Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:

 - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
   was initially a fork of CFQ, but subsequently changed to implement
   fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
   to be used on desktop type single drives, providing good fairness.
   From Paolo.

 - Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
   using a scalable token based algorithm that throttles IO based on
   live completion IO stats, similarly to blk-wbt. From Omar.

 - A series from Jan, moving users to separately allocated backing
   devices. This continues the work of separating backing device life
   times, solving various problems with hot removal.

 - A series of updates for lightnvm, mostly from Javier. Includes a
   'pblk' target that exposes an open channel SSD as a physical block
   device.

 - A series of fixes and improvements for nbd from Josef.

 - A series from Omar, removing queue sharing between devices on mostly
   legacy drivers. This helps us clean up other bits, if we know that a
   queue only has a single device backing. This has been overdue for
   more than a decade.

 - Fixes for the blk-stats, and improvements to unify the stats and user
   windows. This both improves blk-wbt, and enables other users to
   register a need to receive IO stats for a device. From Omar.

 - blk-throttle improvements from Shaohua. This provides a scalable
   framework for implementing scalable prioritization - particularly for
   blk-mq, but applicable to any type of block device. The interface is
   marked experimental for now.

 - Bucketized IO stats for IO polling from Stephen Bates. This improves
   efficiency of polled workloads in the presence of mixed block size IO.

 - A few fixes for opal, from Scott.

 - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
   From a variety of folks, mostly Sagi and James Smart.

 - A series from Bart, improving our exposed info and capabilities from
   the blk-mq debugfs support.

 - A series from Christoph, cleaning up how we handle WRITE_ZEROES.

 - A series from Christoph, cleaning up the block layer handling of how
   we track errors in a request. On top of being a nice cleanup, it also
   shrinks the size of struct request a bit.

 - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
   never used by platforms, and the latter has outlived its usefulness.

 - Various little bug fixes and cleanups from a wide variety of folks.
* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
  block: hide badblocks attribute by default
  blk-mq: unify hctx delay_work and run_work
  block: add kblockd_mod_delayed_work_on()
  blk-mq: unify hctx delayed_run_work and run_work
  nbd: fix use after free on module unload
  MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
  blk-mq-sched: alloate reserved tags out of normal pool
  mtip32xx: use runtime tag to initialize command header
  scsi: Implement blk_mq_ops.show_rq()
  blk-mq: Add blk_mq_ops.show_rq()
  blk-mq: Show operation, cmd_flags and rq_flags names
  blk-mq: Make blk_flags_show() callers append a newline character
  blk-mq: Move the "state" debugfs attribute one level down
  blk-mq: Unregister debugfs attributes earlier
  blk-mq: Only unregister hctxs for which registration succeeded
  blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
  blk-mq: Let blk_mq_debugfs_register() look up the queue name
  blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
  ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
  ide-pm: always pass 0 error to __blk_end_request_all
  ...
@@ -213,14 +213,8 @@ What: /sys/block/<disk>/queue/discard_zeroes_data
Date:		May 2011
Contact:	Martin K. Petersen <martin.petersen@oracle.com>
Description:
		Devices that support discard functionality may return
		stale or random data when a previously discarded block
		is read back. This can cause problems if the filesystem
		expects discarded blocks to be explicitly cleared. If a
		device reports that it deterministically returns zeroes
		when a discarded area is read the discard_zeroes_data
		parameter will be set to one. Otherwise it will be 0 and
		the result of reading a discarded area is undefined.
		Will always return 0. Don't rely on any specific behavior
		for discards, and don't read this file.

What:		/sys/block/<disk>/queue/write_same_max_bytes
Date:		January 2012
@@ -1,5 +1,7 @@
00-INDEX
	- This file
bfq-iosched.txt
	- BFQ IO scheduler and its tunables
biodoc.txt
	- Notes on the Generic Block Layer Rewrite in Linux 2.5
biovecs.txt
@@ -0,0 +1,14 @@
Kyber I/O scheduler tunables
===========================

The only two tunables for the Kyber scheduler are the target latencies for
reads and synchronous writes. Kyber will throttle requests in order to meet
these target latencies.

read_lat_nsec
-------------
Target latency for reads (in nanoseconds).

write_lat_nsec
--------------
Target latency for synchronous writes (in nanoseconds).
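Both tunables are exposed through sysfs once kyber is the active scheduler for a queue, next to the scheduler selector itself. The following userspace helper is a rough sketch only: the device name "sda", the 1 ms target value, and the /sys/block/<disk>/queue/iosched/ path are assumptions for illustration, not something stated by this patch.

/*
 * Minimal sketch (not part of the merge): pick kyber for an assumed
 * device and lower its read target latency. Adjust the device name
 * and value for the disk actually being tuned.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_attr(const char *path, const char *val)
{
    int fd = open(path, O_WRONLY);

    if (fd < 0 || write(fd, val, strlen(val)) < 0) {
        perror(path);
        if (fd >= 0)
            close(fd);
        return -1;
    }
    close(fd);
    return 0;
}

int main(void)
{
    /* Select the scheduler first so the iosched/ directory exists. */
    if (write_attr("/sys/block/sda/queue/scheduler", "kyber"))
        return 1;
    /* Example value: a 1 ms read target, expressed in nanoseconds. */
    return write_attr("/sys/block/sda/queue/iosched/read_lat_nsec",
                      "1000000") ? 1 : 0;
}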
@@ -43,11 +43,6 @@ large discards are issued, setting this value lower will make Linux issue
smaller discards and potentially help reduce latencies induced by large
discard operations.

discard_zeroes_data (RO)
------------------------
When read, this file will show if the discarded block are zeroed by the
device or not. If its value is '1' the blocks are zeroed otherwise not.

hw_sector_size (RO)
-------------------
This is the hardware sector size of the device, in bytes.

@@ -192,5 +187,11 @@ scaling back writes. Writing a value of '0' to this file disables the
feature. Writing a value of '-1' to this file resets the value to the
default setting.

throttle_sample_time (RW)
-------------------------
This is the time window that blk-throttle samples data, in millisecond.
blk-throttle makes decision based on the samplings. Lower time means cgroups
have more smooth throughput, but higher CPU overhead. This exists only when
CONFIG_BLK_DEV_THROTTLING_LOW is enabled.

Jens Axboe <jens.axboe@oracle.com>, February 2009
@@ -1,84 +0,0 @@
This document describes m[g]flash support in linux.

Contents
1. Overview
2. Reserved area configuration
3. Example of mflash platform driver registration

1. Overview

Mflash and gflash are embedded flash drive. The only difference is mflash is
MCP(Multi Chip Package) device. These two device operate exactly same way.
So the rest mflash repersents mflash and gflash altogether.

Internally, mflash has nand flash and other hardware logics and supports
2 different operation (ATA, IO) modes. ATA mode doesn't need any new
driver and currently works well under standard IDE subsystem. Actually it's
one chip SSD. IO mode is ATA-like custom mode for the host that doesn't have
IDE interface.

Following are brief descriptions about IO mode.
A. IO mode based on ATA protocol and uses some custom command. (read confirm,
write confirm)
B. IO mode uses SRAM bus interface.
C. IO mode supports 4kB boot area, so host can boot from mflash.

2. Reserved area configuration
If host boot from mflash, usually needs raw area for boot loader image. All of
the mflash's block device operation will be taken this value as start offset.
Note that boot loader's size of reserved area and kernel configuration value
must be same.

3. Example of mflash platform driver registration
Working mflash is very straight forward. Adding platform device stuff to board
configuration file is all. Here is some pseudo example.

static struct mg_drv_data mflash_drv_data = {
    /* If you want to polling driver set to 1 */
    .use_polling = 0,
    /* device attribution */
    .dev_attr = MG_BOOT_DEV
};

static struct resource mg_mflash_rsc[] = {
    /* Base address of mflash */
    [0] = {
        .start = 0x08000000,
        .end = 0x08000000 + SZ_64K - 1,
        .flags = IORESOURCE_MEM
    },
    /* mflash interrupt pin */
    [1] = {
        .start = IRQ_GPIO(84),
        .end = IRQ_GPIO(84),
        .flags = IORESOURCE_IRQ
    },
    /* mflash reset pin */
    [2] = {
        .start = 43,
        .end = 43,
        .name = MG_RST_PIN,
        .flags = IORESOURCE_IO
    },
    /* mflash reset-out pin
     * If you use mflash as storage device (i.e. other than MG_BOOT_DEV),
     * should assign this */
    [3] = {
        .start = 51,
        .end = 51,
        .name = MG_RSTOUT_PIN,
        .flags = IORESOURCE_IO
    }
};

static struct platform_device mflash_dev = {
    .name = MG_DEV_NAME,
    .id = -1,
    .dev = {
        .platform_data = &mflash_drv_data,
    },
    .num_resources = ARRAY_SIZE(mg_mflash_rsc),
    .resource = mg_mflash_rsc
};

platform_device_register(&mflash_dev);
@@ -0,0 +1,21 @@
pblk: Physical Block Device Target
==================================

pblk implements a fully associative, host-based FTL that exposes a traditional
block I/O interface. Its primary responsibilities are:

  - Map logical addresses onto physical addresses (4KB granularity) in a
    logical-to-physical (L2P) table.
  - Maintain the integrity and consistency of the L2P table as well as its
    recovery from normal tear down and power outage.
  - Deal with controller- and media-specific constrains.
  - Handle I/O errors.
  - Implement garbage collection.
  - Maintain consistency across the I/O stack during synchronization points.

For more information please refer to:

  http://lightnvm.io

which maintains updated FAQs, manual pages, technical documentation, tools,
contacts, etc.
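The first responsibility above, a logical-to-physical table kept at 4KB granularity, can be pictured with a toy translation step. The sketch below is purely illustrative: a flat in-memory array with made-up names (toy_map, toy_lookup, PBLK_SECTOR_SZ), not pblk's actual data structures, and it ignores persistence, recovery and garbage collection entirely.

/*
 * Toy illustration of a 4KB-granularity L2P lookup, loosely modelled
 * on the responsibility described above. Not pblk code.
 */
#include <stdint.h>
#include <stdio.h>

#define PBLK_SECTOR_SZ 4096u
#define TOY_NR_LBAS    1024u
#define ADDR_EMPTY     UINT64_MAX

static uint64_t l2p[TOY_NR_LBAS];   /* lba -> physical page address */

static int toy_map(uint64_t lba, uint64_t ppa)
{
    if (lba >= TOY_NR_LBAS)
        return -1;
    l2p[lba] = ppa;                 /* a write updates the mapping */
    return 0;
}

static uint64_t toy_lookup(uint64_t byte_off)
{
    uint64_t lba = byte_off / PBLK_SECTOR_SZ;   /* 4KB granularity */

    return lba < TOY_NR_LBAS ? l2p[lba] : ADDR_EMPTY;
}

int main(void)
{
    for (unsigned i = 0; i < TOY_NR_LBAS; i++)
        l2p[i] = ADDR_EMPTY;        /* unmapped LBAs read back as "empty" */

    toy_map(3, 0x1000);             /* pretend LBA 3 lives at PPA 0x1000 */
    printf("byte offset 12288 -> ppa %#llx\n",
           (unsigned long long)toy_lookup(3 * PBLK_SECTOR_SZ));
    return 0;
}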
@@ -2544,6 +2544,14 @@ F: block/
F:	kernel/trace/blktrace.c
F:	lib/sbitmap.c

BFQ I/O SCHEDULER
M:	Paolo Valente <paolo.valente@linaro.org>
M:	Jens Axboe <axboe@kernel.dk>
L:	linux-block@vger.kernel.org
S:	Maintained
F:	block/bfq-*
F:	Documentation/block/bfq-iosched.txt

BLOCK2MTD DRIVER
M:	Joern Engel <joern@lazybastard.org>
L:	linux-mtd@lists.infradead.org
@@ -115,6 +115,18 @@ config BLK_DEV_THROTTLING

	  See Documentation/cgroups/blkio-controller.txt for more information.

config BLK_DEV_THROTTLING_LOW
	bool "Block throttling .low limit interface support (EXPERIMENTAL)"
	depends on BLK_DEV_THROTTLING
	default n
	---help---
	Add .low limit interface for block throttling. The low limit is a best
	effort limit to prioritize cgroups. Depending on the setting, the limit
	can be used to protect cgroups in terms of bandwidth/iops and better
	utilize disk resource.

	Note, this is an experimental interface and could be changed someday.

config BLK_CMDLINE_PARSER
	bool "Block device command line partition parser"
	default n
@@ -40,6 +40,7 @@ config CFQ_GROUP_IOSCHED
	  Enable group IO scheduling in CFQ.

choice

	prompt "Default I/O scheduler"
	default DEFAULT_CFQ
	help

@@ -69,6 +70,35 @@ config MQ_IOSCHED_DEADLINE
	---help---
	  MQ version of the deadline IO scheduler.

config MQ_IOSCHED_KYBER
	tristate "Kyber I/O scheduler"
	default y
	---help---
	  The Kyber I/O scheduler is a low-overhead scheduler suitable for
	  multiqueue and other fast devices. Given target latencies for reads and
	  synchronous writes, it will self-tune queue depths to achieve that
	  goal.

config IOSCHED_BFQ
	tristate "BFQ I/O scheduler"
	default n
	---help---
	  BFQ I/O scheduler for BLK-MQ. BFQ distributes the bandwidth of
	  of the device among all processes according to their weights,
	  regardless of the device parameters and with any workload. It
	  also guarantees a low latency to interactive and soft
	  real-time applications. Details in
	  Documentation/block/bfq-iosched.txt

config BFQ_GROUP_IOSCHED
	bool "BFQ hierarchical scheduling support"
	depends on IOSCHED_BFQ && BLK_CGROUP
	default n
	---help---

	  Enable hierarchical scheduling in BFQ, using the blkio
	  (cgroups-v1) or io (cgroups-v2) controller.

endmenu

endif
@@ -20,6 +20,9 @@ obj-$(CONFIG_IOSCHED_NOOP) += noop-iosched.o
obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
obj-$(CONFIG_MQ_IOSCHED_DEADLINE) += mq-deadline.o
obj-$(CONFIG_MQ_IOSCHED_KYBER) += kyber-iosched.o
bfq-y := bfq-iosched.o bfq-wf2q.o bfq-cgroup.o
obj-$(CONFIG_IOSCHED_BFQ) += bfq.o

obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
obj-$(CONFIG_BLK_CMDLINE_PARSER) += cmdline-parser.o
@@ -30,6 +30,7 @@
#include <linux/cgroup.h>

#include <trace/events/block.h>
#include "blk.h"

/*
 * Test patch to inline a certain number of bi_io_vec's inside the bio

@@ -427,7 +428,8 @@ static void punt_bios_to_rescuer(struct bio_set *bs)
 * RETURNS:
 * Pointer to new bio on success, NULL on failure.
 */
struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
struct bio *bio_alloc_bioset(gfp_t gfp_mask, unsigned int nr_iovecs,
                             struct bio_set *bs)
{
    gfp_t saved_gfp = gfp_mask;
    unsigned front_pad;

@@ -1824,6 +1826,11 @@ static inline bool bio_remaining_done(struct bio *bio)
 * bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
 * way to end I/O on a bio. No one should call bi_end_io() directly on a
 * bio unless they own it and thus know that it has an end_io function.
 *
 * bio_endio() can be called several times on a bio that has been chained
 * using bio_chain(). The ->bi_end_io() function will only be called the
 * last time. At this point the BLK_TA_COMPLETE tracing event will be
 * generated if BIO_TRACE_COMPLETION is set.
 **/
void bio_endio(struct bio *bio)
{

@@ -1844,6 +1851,13 @@ again:
        goto again;
    }

    if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
        trace_block_bio_complete(bdev_get_queue(bio->bi_bdev),
                                 bio, bio->bi_error);
        bio_clear_flag(bio, BIO_TRACE_COMPLETION);
    }

    blk_throtl_bio_endio(bio);
    if (bio->bi_end_io)
        bio->bi_end_io(bio);
}

@@ -1882,6 +1896,9 @@ struct bio *bio_split(struct bio *bio, int sectors,

    bio_advance(bio, split->bi_iter.bi_size);

    if (bio_flagged(bio, BIO_TRACE_COMPLETION))
        bio_set_flag(bio, BIO_TRACE_COMPLETION);

    return split;
}
EXPORT_SYMBOL(bio_split);
@@ -772,6 +772,27 @@ struct blkg_rwstat blkg_rwstat_recursive_sum(struct blkcg_gq *blkg,
}
EXPORT_SYMBOL_GPL(blkg_rwstat_recursive_sum);

/* Performs queue bypass and policy enabled checks then looks up blkg. */
static struct blkcg_gq *blkg_lookup_check(struct blkcg *blkcg,
                                          const struct blkcg_policy *pol,
                                          struct request_queue *q)
{
    WARN_ON_ONCE(!rcu_read_lock_held());
    lockdep_assert_held(q->queue_lock);

    if (!blkcg_policy_enabled(q, pol))
        return ERR_PTR(-EOPNOTSUPP);

    /*
     * This could be the first entry point of blkcg implementation and
     * we shouldn't allow anything to go through for a bypassing queue.
     */
    if (unlikely(blk_queue_bypass(q)))
        return ERR_PTR(blk_queue_dying(q) ? -ENODEV : -EBUSY);

    return __blkg_lookup(blkcg, q, true /* update_hint */);
}

/**
 * blkg_conf_prep - parse and prepare for per-blkg config update
 * @blkcg: target block cgroup

@@ -789,6 +810,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
    __acquires(rcu) __acquires(disk->queue->queue_lock)
{
    struct gendisk *disk;
    struct request_queue *q;
    struct blkcg_gq *blkg;
    struct module *owner;
    unsigned int major, minor;

@@ -807,44 +829,95 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
    if (!disk)
        return -ENODEV;
    if (part) {
        owner = disk->fops->owner;
        put_disk(disk);
        module_put(owner);
        return -ENODEV;
        ret = -ENODEV;
        goto fail;
    }

    q = disk->queue;

    rcu_read_lock();
    spin_lock_irq(disk->queue->queue_lock);

    if (blkcg_policy_enabled(disk->queue, pol))
        blkg = blkg_lookup_create(blkcg, disk->queue);
    else
        blkg = ERR_PTR(-EOPNOTSUPP);
    spin_lock_irq(q->queue_lock);

    blkg = blkg_lookup_check(blkcg, pol, q);
    if (IS_ERR(blkg)) {
        ret = PTR_ERR(blkg);
        rcu_read_unlock();
        spin_unlock_irq(disk->queue->queue_lock);
        owner = disk->fops->owner;
        put_disk(disk);
        module_put(owner);
        /*
         * If queue was bypassing, we should retry. Do so after a
         * short msleep(). It isn't strictly necessary but queue
         * can be bypassing for some time and it's always nice to
         * avoid busy looping.
         */
        if (ret == -EBUSY) {
            msleep(10);
            ret = restart_syscall();
        }
        return ret;
        goto fail_unlock;
    }

    if (blkg)
        goto success;

    /*
     * Create blkgs walking down from blkcg_root to @blkcg, so that all
     * non-root blkgs have access to their parents.
     */
    while (true) {
        struct blkcg *pos = blkcg;
        struct blkcg *parent;
        struct blkcg_gq *new_blkg;

        parent = blkcg_parent(blkcg);
        while (parent && !__blkg_lookup(parent, q, false)) {
            pos = parent;
            parent = blkcg_parent(parent);
        }

        /* Drop locks to do new blkg allocation with GFP_KERNEL. */
        spin_unlock_irq(q->queue_lock);
        rcu_read_unlock();

        new_blkg = blkg_alloc(pos, q, GFP_KERNEL);
        if (unlikely(!new_blkg)) {
            ret = -ENOMEM;
            goto fail;
        }

        rcu_read_lock();
        spin_lock_irq(q->queue_lock);

        blkg = blkg_lookup_check(pos, pol, q);
        if (IS_ERR(blkg)) {
            ret = PTR_ERR(blkg);
            goto fail_unlock;
        }

        if (blkg) {
            blkg_free(new_blkg);
        } else {
            blkg = blkg_create(pos, q, new_blkg);
            if (unlikely(IS_ERR(blkg))) {
                ret = PTR_ERR(blkg);
                goto fail_unlock;
            }
        }

        if (pos == blkcg)
            goto success;
    }
success:
    ctx->disk = disk;
    ctx->blkg = blkg;
    ctx->body = body;
    return 0;

fail_unlock:
    spin_unlock_irq(q->queue_lock);
    rcu_read_unlock();
fail:
    owner = disk->fops->owner;
    put_disk(disk);
    module_put(owner);
    /*
     * If queue was bypassing, we should retry. Do so after a
     * short msleep(). It isn't strictly necessary but queue
     * can be bypassing for some time and it's always nice to
     * avoid busy looping.
     */
    if (ret == -EBUSY) {
        msleep(10);
        ret = restart_syscall();
    }
    return ret;
}
EXPORT_SYMBOL_GPL(blkg_conf_prep);
@@ -268,10 +268,8 @@ void blk_sync_queue(struct request_queue *q)
        struct blk_mq_hw_ctx *hctx;
        int i;

        queue_for_each_hw_ctx(q, hctx, i) {
            cancel_work_sync(&hctx->run_work);
            cancel_delayed_work_sync(&hctx->delay_work);
        }
        queue_for_each_hw_ctx(q, hctx, i)
            cancel_delayed_work_sync(&hctx->run_work);
    } else {
        cancel_delayed_work_sync(&q->delay_work);
    }

@@ -500,6 +498,13 @@ void blk_set_queue_dying(struct request_queue *q)
    queue_flag_set(QUEUE_FLAG_DYING, q);
    spin_unlock_irq(q->queue_lock);

    /*
     * When queue DYING flag is set, we need to block new req
     * entering queue, so we call blk_freeze_queue_start() to
     * prevent I/O from crossing blk_queue_enter().
     */
    blk_freeze_queue_start(q);

    if (q->mq_ops)
        blk_mq_wake_waiters(q);
    else {

@@ -556,9 +561,13 @@ void blk_cleanup_queue(struct request_queue *q)
     * prevent that q->request_fn() gets invoked after draining finished.
     */
    blk_freeze_queue(q);
    spin_lock_irq(lock);
    if (!q->mq_ops)
    if (!q->mq_ops) {
        spin_lock_irq(lock);
        __blk_drain_queue(q, true);
    } else {
        blk_mq_debugfs_unregister_mq(q);
        spin_lock_irq(lock);
    }
    queue_flag_set(QUEUE_FLAG_DEAD, q);
    spin_unlock_irq(lock);

@@ -669,6 +678,15 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
        if (nowait)
            return -EBUSY;

        /*
         * read pair of barrier in blk_freeze_queue_start(),
         * we need to order reading __PERCPU_REF_DEAD flag of
         * .q_usage_counter and reading .mq_freeze_depth or
         * queue dying flag, otherwise the following wait may
         * never return if the two reads are reordered.
         */
        smp_rmb();

        ret = wait_event_interruptible(q->mq_freeze_wq,
                !atomic_read(&q->mq_freeze_depth) ||
                blk_queue_dying(q));

@@ -720,6 +738,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
    if (!q->backing_dev_info)
        goto fail_split;

    q->stats = blk_alloc_queue_stats();
    if (!q->stats)
        goto fail_stats;

    q->backing_dev_info->ra_pages =
            (VM_MAX_READAHEAD * 1024) / PAGE_SIZE;
    q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;

@@ -776,6 +798,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
fail_ref:
    percpu_ref_exit(&q->q_usage_counter);
fail_bdi:
    blk_free_queue_stats(q->stats);
fail_stats:
    bdi_put(q->backing_dev_info);
fail_split:
    bioset_free(q->bio_split);

@@ -889,7 +913,6 @@ out_exit_flush_rq:
        q->exit_rq_fn(q, q->fq->flush_rq);
out_free_flush_queue:
    blk_free_flush_queue(q->fq);
    wbt_exit(q);
    return -ENOMEM;
}
EXPORT_SYMBOL(blk_init_allocated_queue);

@@ -1128,7 +1151,6 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,

    blk_rq_init(q, rq);
    blk_rq_set_rl(rq, rl);
    blk_rq_set_prio(rq, ioc);
    rq->cmd_flags = op;
    rq->rq_flags = rq_flags;

@@ -1608,17 +1630,23 @@ out:
    return ret;
}

void init_request_from_bio(struct request *req, struct bio *bio)
void blk_init_request_from_bio(struct request *req, struct bio *bio)
{
    struct io_context *ioc = rq_ioc(bio);

    if (bio->bi_opf & REQ_RAHEAD)
        req->cmd_flags |= REQ_FAILFAST_MASK;

    req->errors = 0;
    req->__sector = bio->bi_iter.bi_sector;
    if (ioprio_valid(bio_prio(bio)))
        req->ioprio = bio_prio(bio);
    else if (ioc)
        req->ioprio = ioc->ioprio;
    else
        req->ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
    blk_rq_bio_prep(req->q, req, bio);
}
EXPORT_SYMBOL_GPL(blk_init_request_from_bio);

static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
{

@@ -1709,7 +1737,7 @@ get_rq:
     * We don't worry about that case for efficiency. It won't happen
     * often, and the elevators are able to handle it.
     */
    init_request_from_bio(req, bio);
    blk_init_request_from_bio(req, bio);

    if (test_bit(QUEUE_FLAG_SAME_COMP, &q->queue_flags))
        req->cpu = raw_smp_processor_id();

@@ -1936,7 +1964,13 @@ generic_make_request_checks(struct bio *bio)
    if (!blkcg_bio_issue_check(q, bio))
        return false;

    trace_block_bio_queue(q, bio);
    if (!bio_flagged(bio, BIO_TRACE_COMPLETION)) {
        trace_block_bio_queue(q, bio);
        /* Now that enqueuing has been traced, we need to trace
         * completion as well.
         */
        bio_set_flag(bio, BIO_TRACE_COMPLETION);
    }
    return true;

not_supported:

@@ -2478,7 +2512,7 @@ void blk_start_request(struct request *req)
    blk_dequeue_request(req);

    if (test_bit(QUEUE_FLAG_STATS, &req->q->queue_flags)) {
        blk_stat_set_issue_time(&req->issue_stat);
        blk_stat_set_issue(&req->issue_stat, blk_rq_sectors(req));
        req->rq_flags |= RQF_STATS;
        wbt_issue(req->q->rq_wb, &req->issue_stat);
    }

@@ -2540,22 +2574,11 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
{
    int total_bytes;

    trace_block_rq_complete(req->q, req, nr_bytes);
    trace_block_rq_complete(req, error, nr_bytes);

    if (!req->bio)
        return false;

    /*
     * For fs requests, rq is just carrier of independent bio's
     * and each partial completion should be handled separately.
     * Reset per-request error on each partial completion.
     *
     * TODO: tj: This is too subtle. It would be better to let
     * low level drivers do what they see fit.
     */
    if (!blk_rq_is_passthrough(req))
        req->errors = 0;

    if (error && !blk_rq_is_passthrough(req) &&
        !(req->rq_flags & RQF_QUIET)) {
        char *error_type;

@@ -2601,6 +2624,8 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
        if (bio_bytes == bio->bi_iter.bi_size)
            req->bio = bio->bi_next;

        /* Completion has already been traced */
        bio_clear_flag(bio, BIO_TRACE_COMPLETION);
        req_bio_endio(req, bio, bio_bytes, error);

        total_bytes += bio_bytes;

@@ -2699,7 +2724,7 @@ void blk_finish_request(struct request *req, int error)
    struct request_queue *q = req->q;

    if (req->rq_flags & RQF_STATS)
        blk_stat_add(&q->rq_stats[rq_data_dir(req)], req);
        blk_stat_add(req);

    if (req->rq_flags & RQF_QUEUED)
        blk_queue_end_tag(q, req);

@@ -2776,7 +2801,7 @@ static bool blk_end_bidi_request(struct request *rq, int error,
 * %false - we are done with this request
 * %true - still buffers pending for this request
 **/
bool __blk_end_bidi_request(struct request *rq, int error,
static bool __blk_end_bidi_request(struct request *rq, int error,
                                   unsigned int nr_bytes, unsigned int bidi_bytes)
{
    if (blk_update_bidi_request(rq, error, nr_bytes, bidi_bytes))

@@ -2828,43 +2853,6 @@ void blk_end_request_all(struct request *rq, int error)
}
EXPORT_SYMBOL(blk_end_request_all);

/**
 * blk_end_request_cur - Helper function to finish the current request chunk.
 * @rq: the request to finish the current chunk for
 * @error: %0 for success, < %0 for error
 *
 * Description:
 * Complete the current consecutively mapped chunk from @rq.
 *
 * Return:
 * %false - we are done with this request
 * %true - still buffers pending for this request
 */
bool blk_end_request_cur(struct request *rq, int error)
{
    return blk_end_request(rq, error, blk_rq_cur_bytes(rq));
}
EXPORT_SYMBOL(blk_end_request_cur);

/**
 * blk_end_request_err - Finish a request till the next failure boundary.
 * @rq: the request to finish till the next failure boundary for
 * @error: must be negative errno
 *
 * Description:
 * Complete @rq till the next failure boundary.
 *
 * Return:
 * %false - we are done with this request
 * %true - still buffers pending for this request
 */
bool blk_end_request_err(struct request *rq, int error)
{
    WARN_ON(error >= 0);
    return blk_end_request(rq, error, blk_rq_err_bytes(rq));
}
EXPORT_SYMBOL_GPL(blk_end_request_err);

/**
 * __blk_end_request - Helper function for drivers to complete the request.
 * @rq: the request being processed

@@ -2924,26 +2912,6 @@ bool __blk_end_request_cur(struct request *rq, int error)
}
EXPORT_SYMBOL(__blk_end_request_cur);

/**
 * __blk_end_request_err - Finish a request till the next failure boundary.
 * @rq: the request to finish till the next failure boundary for
 * @error: must be negative errno
 *
 * Description:
 * Complete @rq till the next failure boundary. Must be called
 * with queue lock held.
 *
 * Return:
 * %false - we are done with this request
 * %true - still buffers pending for this request
 */
bool __blk_end_request_err(struct request *rq, int error)
{
    WARN_ON(error >= 0);
    return __blk_end_request(rq, error, blk_rq_err_bytes(rq));
}
EXPORT_SYMBOL_GPL(__blk_end_request_err);

void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
                     struct bio *bio)
{

@@ -3106,6 +3074,13 @@ int kblockd_schedule_work_on(int cpu, struct work_struct *work)
}
EXPORT_SYMBOL(kblockd_schedule_work_on);

int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
                                unsigned long delay)
{
    return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
}
EXPORT_SYMBOL(kblockd_mod_delayed_work_on);

int kblockd_schedule_delayed_work(struct delayed_work *dwork,
                                  unsigned long delay)
{
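The new kblockd_mod_delayed_work_on() helper above simply forwards to mod_delayed_work_on() on the block layer's kblockd workqueue, so an already-pending delayed work is re-armed instead of being queued a second time. The following is only a hypothetical out-of-tree sketch of a caller, assuming the helper is visible through linux/blkdev.h as exported; the module wrapper, the work body and the 10 ms delay are invented for illustration and are not taken from this merge.

/*
 * Hypothetical module sketch: (re)arm a delayed work on kblockd using
 * the helper added above. Work body and delay are made up.
 */
#include <linux/module.h>
#include <linux/blkdev.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

static void demo_fn(struct work_struct *work)
{
    pr_info("kblockd delayed work ran\n");
}

static DECLARE_DELAYED_WORK(demo_work, demo_fn);

static int __init demo_init(void)
{
    /* mod_*() semantics: schedule, replacing any pending timer. */
    kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &demo_work,
                                msecs_to_jiffies(10));
    return 0;
}

static void __exit demo_exit(void)
{
    cancel_delayed_work_sync(&demo_work);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");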
@@ -69,8 +69,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,

    if (unlikely(blk_queue_dying(q))) {
        rq->rq_flags |= RQF_QUIET;
        rq->errors = -ENXIO;
        __blk_end_request_all(rq, rq->errors);
        __blk_end_request_all(rq, -ENXIO);
        spin_unlock_irq(q->queue_lock);
        return;
    }

@@ -92,11 +91,10 @@ EXPORT_SYMBOL_GPL(blk_execute_rq_nowait);
 * Insert a fully prepared request at the back of the I/O scheduler queue
 * for execution and wait for completion.
 */
int blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk,
void blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk,
                    struct request *rq, int at_head)
{
    DECLARE_COMPLETION_ONSTACK(wait);
    int err = 0;
    unsigned long hang_check;

    rq->end_io_data = &wait;

@@ -108,10 +106,5 @@ int blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk,
        while (!wait_for_completion_io_timeout(&wait, hang_check * (HZ/2)));
    else
        wait_for_completion_io(&wait);

    if (rq->errors)
        err = -EIO;

    return err;
}
EXPORT_SYMBOL(blk_execute_rq);
@@ -447,7 +447,7 @@ void blk_insert_flush(struct request *rq)
        if (q->mq_ops)
            blk_mq_end_request(rq, 0);
        else
            __blk_end_bidi_request(rq, 0, 0, 0);
            __blk_end_request(rq, 0, 0);
        return;
    }

@@ -497,8 +497,7 @@ void blk_insert_flush(struct request *rq)
 * Description:
 *    Issue a flush for the block device in question. Caller can supply
 *    room for storing the error offset in case of a flush error, if they
 *    wish to. If WAIT flag is not passed then caller may check only what
 *    request was pushed in some internal queue for later handling.
 *    wish to.
 */
int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
                       sector_t *error_sector)