blk_init_allocated_queue_node may fail and the caller _could_ retry.
Accommodate the unlikely event that blk_init_allocated_queue_node is
called on an already initialized (possibly partially) request_queue.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
On blk_init_allocated_queue_node failure, only free the request_queue if
it is wasn't previously allocated outside the block layer
(e.g. blk_init_queue_node was blk_init_allocated_queue_node caller).
This addresses an interface bug introduced by the following commit:
01effb0 block: allow initialization of previously allocated
request_queue
Otherwise the request_queue may be free'd out from underneath a caller
that is managing the request_queue directly (e.g. caller uses
blk_alloc_queue + blk_init_allocated_queue_node).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Bio-based DM doesn't use an elevator (queue is !blk_queue_stackable()).
Longer-term DM will not allocate an elevator for bio-based DM. But even
then there will be small potential for an elevator to be allocated for
a request-based DM table only to have a bio-based table be loaded in the
end.
Displaying "none" for bio-based DM will help avoid user confusion.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Use small consequent indexes as radix tree keys instead of sparse cfqd address.
This change will reduce radix tree depth from 11 (6 for 32-bit hosts)
to 1 if host have <=64 disks under cfq control, or to 0 if there only one disk.
So, this patch save 10*560 bytes for each process (5*296 for 32-bit hosts)
For each cfqd allocate cic index from ida.
To unlink dead cic from tree without cfqd access store index into ->key.
(bit 0 -- dead mark, bits 1..30 -- index: ida produce id in range 0..2^31-1)
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Remove ->dead_key field from cfq_io_context to shrink its size to 128 bytes.
(64 bytes for 32-bit hosts)
Use lower bit in ->key as dead-mark, instead of moving key to separate field.
After this for dead cfq_io_context we got cic->key != cfqd automatically.
Thus, io_context's last-hit cache should work without changing.
Now to check ->key for non-dead state compare it with cfqd,
instead of checking ->key for non-null value as it was before.
Plus remove obsolete race protection in cfq_cic_lookup.
This race gone after v2.6.24-1728-g4ac845a
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Remove all rcu head inits. We don't care about the RCU head state before passing
it to call_rcu() anyway. Only leave the "on_stack" variants so debugobjects can
keep track of objects on stack.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
blk_init_queue() allocates the request_queue structure and then
initializes it as needed (request_fn, elevator, etc).
Split initialization out to blk_init_allocated_queue_node.
Introduce blk_init_allocated_queue wrapper function to model existing
blk_init_queue and blk_init_queue_node interfaces.
Export elv_register_queue to allow a newly added elevator to be
registered with sysfs. Export elv_unregister_queue for symmetry.
These changes allow DM to initialize a device's request_queue with more
precision. In particular, DM no longer unconditionally initializes a
full request_queue (elevator et al). It only does so for a
request-based DM device.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
with CONFIG_PROVE_RCU=y, a warning can be triggered:
# mount -t cgroup -o blkio xxx /mnt
# mkdir /mnt/subgroup
...
kernel/cgroup.c:4442 invoked rcu_dereference_check() without protection!
...
To fix this, we avoid caling css_depth() here, which is a bit simpler
than the original code.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
- Add bio_batch helper primitive. This is rather generic primitive
for submitting/waiting a complex request which consists of several
bios.
- blkdev_issue_zeroout() generate number of zero filed write bios.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Move blkdev_issue_discard from blk-barrier.c because it is
not barrier related.
Later the file will be populated by other helpers.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
In some places caller don't want to wait a request to complete.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The patch just convert all blkdev_issue_xxx function to common
set of flags. Wait/allocation semantics preserved.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch fixes few usability and configurability issues.
o All the cgroup based controller options are configurable from
"Genral Setup/Control Group Support/" menu. blkio is the only exception.
Hence make this option visible in above menu and make it configurable from
there to bring it inline with rest of the cgroup based controllers.
o Get rid of CONFIG_DEBUG_CFQ_IOSCHED.
This option currently does two things.
- Enable printing of cgroup paths in blktrace
- Enables CONFIG_DEBUG_BLK_CGROUP, which in turn displays additional stat
files in cgroup.
If we are using group scheduling, blktrace data is of not really much use
if cgroup information is not present. To get this data, currently one has to
also enable CONFIG_DEBUG_CFQ_IOSCHED, which in turn brings the overhead of
all the additional debug stat files which is not desired.
Hence, this patch moves printing of cgroup paths under
CONFIG_CFQ_GROUP_IOSCHED.
This allows us to get rid of CONFIG_DEBUG_CFQ_IOSCHED completely. Now all
the debug stat files are controlled only by CONFIG_DEBUG_BLK_CGROUP which
can be enabled through config menu.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Divyesh Shah <dpshah@google.com>
Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
blk_rq_timed_out_timer() relied on blk_add_timer() never returning a
timer value of zero, but commit 7838c15b8d
removed the code that bumped this value when it was zero.
Therefore when jiffies is near wrap we could get unlucky & not set the
timeout value correctly.
This patch uses a flag to indicate that the timeout value was set and so
handles jiffies wrap correctly, and it keeps all the logic in one
function so should be easier to maintain in the future.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
After merging the block tree, 20100414's linux-next build (x86_64
allmodconfig) failed like this:
ERROR: "get_gendisk" [block/blk-cgroup.ko] undefined!
ERROR: "sched_clock" [block/blk-cgroup.ko] undefined!
This happens because the two symbols aren't exported and hence not available
when blk-cgroup code is built as a module. I've tried to stay consistent with
the use of EXPORT_SYMBOL or EXPORT_SYMBOL_GPL with the other symbols in the
respective files.
Signed-off-by: Divyesh Shah <dpshah@google.com>
Acked-by: Gui Jianfeng <guijianfeng@cn.fujitsu.cn>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
blk_rq_timed_out_timer() relied on blk_add_timer() never returning a
timer value of zero, but commit 7838c15b8d
removed the code that bumped this value when it was zero.
Therefore when jiffies is near wrap we could get unlucky & not set the
timeout value correctly.
This patch uses a flag to indicate that the timeout value was set and so
handles jiffies wrap correctly, and it keeps all the logic in one
function so should be easier to maintain in the future.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>