DEBUG_BLOCK_EXT_DEVT shuffles SCSI and IDE device numbers and root
device number set using rdev become meaningless. Root devices should
be explicitly specified using textual names. Warn about it if root
can't be found and DEBUG_BLOCK_EXT_DEVT is enabled. Also, add warning
to the help text.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
bdget_disk() and blk_lookup_devt() never cared whether the specified
partition (or disk) is zero sized or not. I got confused while
converting those not to depend on consecutive minor numbers in commit
5a6411b1178baf534aa9138052864dfa89d3eada and later when dev0 was added
it broke callers which expected to get valid return for zero sized
disk devices.
So, they never needed nr_sects checks in the first place. Kill them.
This problem was spotted and debugged by Bartlmoiej Zolnierkiewicz.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's a debug option that you would explicitly enable to test this
feature, we should default it to 'n' to prevent accidental surprises
for now.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Noticed by sparse:
block/blk-softirq.c:156:12: warning: symbol 'blk_softirq_init' was not declared. Should it be static?
block/genhd.c:583:28: warning: function 'bdget_disk' with external linkage has definition
block/genhd.c:659:17: warning: incorrect type in argument 1 (different base types)
block/genhd.c:659:17: expected unsigned int [unsigned] [usertype] size
block/genhd.c:659:17: got restricted gfp_t
block/genhd.c:659:29: warning: incorrect type in argument 2 (different base types)
block/genhd.c:659:29: expected restricted gfp_t [usertype] flags
block/genhd.c:659:29: got unsigned int
block: kmalloc args reversed
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This adds blk_rq_aligned helper function to see if alignment and
padding requirement is satisfied for DMA transfer. This also converts
blk_rq_map_kern and __blk_rq_map_user to use the helper function.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch converts the indirect IO path (including mmap IO and old
struct sg_header) to use the block layer functions (blk_get_request,
blk_execute_rq_nowait, blk_rq_map_user, etc) instead of
scsi_execute_async().
[Jens: fixed compile error with SCSI logging enabled]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch introduces struct rq_map_data to enable bio_copy_use_iov()
use reserved pages.
Currently, bio_copy_user_iov allocates bounce pages but
drivers/scsi/sg.c wants to allocate pages by itself and use
them. struct rq_map_data can be used to pass allocated pages to
bio_copy_user_iov.
The current users of bio_copy_user_iov simply passes NULL (they don't
want to use pre-allocated pages).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Douglas Gilbert <dougg@torque.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
CFQ's detection of queueing devices assumes a non-queuing device and detects
if the queue depth reaches a certain threshold. Under some workloads (e.g.
synchronous reads), CFQ effectively forces a unit queue depth, thus defeating
the detection logic. This leads to poor performance on queuing hardware,
since the idle window remains enabled.
This patch inverts the sense of the logic: assume a queuing-capable device,
and detect if the depth does not exceed the threshold.
Signed-off-by: Aaron Carroll <aaronc@gelato.unsw.edu.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
We should just check for rq->bio, as that is really the information
we are looking for. Even if the bio attached doesn't carry data,
we still need to do IO post processing on it.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Somewhat incomplete, as we do allow merges of requests and bios
that have different completion CPUs given. This is done on the
assumption that a larger IO is still more beneficial than CPU
locality.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch adds support for controlling the IO completion CPU of
either all requests on a queue, or on a per-request basis. We export
a sysfs variable (rq_affinity) which, if set, migrates completions
of requests to the CPU that originally submitted it. A bio helper
(bio_set_completion_cpu()) is also added, so that queuers can ask
for completion on that specific CPU.
In testing, this has been show to cut the system time by as much
as 20-40% on synthetic workloads where CPU affinity is desired.
This requires a little help from the architecture, so it'll only
work as designed for archs that are using the new generic smp
helper infrastructure.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Now that disk and partition handlings are mostly unified, it's easy to
allow disk to have extended device number. This patch makes
add_disk() use extended device number if disk->minors is zero. Both
sd and ide-disk are updated to use this.
* sd_format_disk_name() is implemented which can generically determine
the drive name. This removes disk number restriction stemming from
limited device names.
* If sd index goes over SD_MAX_DISKS (which can be increased now BTW),
sd simply doesn't initialize minors letting block layer choose
extended device number.
* If CONFIG_DEBUG_EXT_DEVT is set, both sd and ide-disk always set
minors to 0 and use extended device numbers.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
With previous changes, it's meaningless to limit the number of
partitions. Replace @ext_minors with GENHD_FL_EXT_DEVT such that
setting the flag allows the disk to have maximum number of allowed
partitions (only limited by the number of entries in parsed_partitions
as determined by MAX_PART constant).
This kills not-too-pretty alloc_disk_ext[_node]() functions and makes
@minors parameter to alloc_disk[_node]() unnecessary. The parameter
is left alone to avoid disturbing the users.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
disk->__part used to be statically allocated to the maximum possible
number of partitions. This patch makes partition array allocation
dynamic. The added overhead is minimal as only real change is one
memory dereference changed to RCU one. This saves both a bit of
memory and cpu cycles iterating through unoccupied slots and makes
increasing partition limit easier.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Move stats related fields - stamp, in_flight, dkstats - from disk to
part0 and unify stat handling such that...
* part_stat_*() now updates part0 together if the specified partition
is not part0. ie. part_stat_*() are now essentially all_stat_*().
* {disk|all}_stat_*() are gone.
* part_round_stats() is updated similary. It handles part0 stats
automatically and disk_round_stats() is killed.
* part_{inc|dec}_in_fligh() is implemented which automatically updates
part0 stats for parts other than part0.
* disk_map_sector_rcu() is updated to return part0 if no part matches.
Combined with the above changes, this makes NULL special case
handling in callers unnecessary.
* Separate stats show code paths for disk are collapsed into part
stats show code paths.
* Rename disk_stat_lock/unlock() to part_stat_lock/unlock()
While at it, reposition stat handling macros a bit and add missing
parentheses around macro parameters.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
GENHD_FL_FAIL for disk is what make_it_fail is for parts. Kill it and
use part0->make_it_fail. Sysfs node handling is unified too.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>