- 100msec sleep is a little excessive, lots of requests can complete
in that timeframe. Use 10msec instead.
- Rename QUEUE_FLAG_BYPASS to QUEUE_FLAG_ELVSWITCH to indicate what
is going on.
Signed-off-by: Jens Axboe <axboe@suse.de>
This patch reimplements elevator switch. This patch assumes generic
dispatch queue patchset is applied.
* Each request is tagged with REQ_ELVPRIV flag if it has its elevator
private data set.
* Requests which doesn't have REQ_ELVPRIV flag set never enter
iosched. They are always directly back inserted to dispatch queue.
Of course, elevator_put_req_fn is called only for requests which
have its REQ_ELVPRIV set.
* Request queue maintains the current number of requests which have
its elevator data set (elevator_set_req_fn called) in
q->rq->elvpriv.
* If a request queue has QUEUE_FLAG_BYPASS set, elevator private data
is not allocated for new requests.
To switch to another iosched, we set QUEUE_FLAG_BYPASS and wait until
elvpriv goes to zero; then, we attach the new iosched and clears
QUEUE_FLAG_BYPASS. New implementation is much simpler and main code
paths are less cluttered, IMHO.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Implements generic dispatch queue which can replace all
dispatch queues implemented by each iosched. This reduces
code duplication, eases enforcing semantics over dispatch
queue, and simplifies specific ioscheds.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
disk stat when "now" is different from disk->stamp. Otherwise, we
are again needlessly adding zero to the stats.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
struct gendisk has these two fields: stamp, stamp_idle. Update to
stamp_idle is always in sync with stamp and they are always the same.
Therefore, it does not add any value in having two fields tracking
same timestamp. Suggest to remove it.
Also, we should only update gendisk stats with non-zero value.
Advantage is that we don't have to needlessly calculate memory address,
and then add zero to the content.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
This function was removed a while ago, but crept in again via a recent
scsi merge.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Per-queue parameters should be updated using the appropriate blk_queue_xxx
functions.
Signed-off-by: Stuart McLaren <stuart.mclaren@hp.com>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
My patch in commit fa72b903f7 incorrectly
removed blk_queue_tag->real_max_depth.
The original resize implementation was incorrect in the following
points.
* actual allocation size of tag_index was shorter than real_max_size,
but assumed to be of the same size, possibly causing memory access
beyond the allocated area.
* bits in tag_map between max_deptn and real_max_depth were
initialized to 1's, making the tags permanently reserved.
In an attempt to fix above two bugs, I had removed allocation optimization
in init_tag_map and real_max_size. Tag map/index were allocated and freed
immediately during resize.
Unfortunately, I wasn't considering that tag map/index can be resized
dynamically with tags beyond new_depth active. This led to accessing
freed area after shrinking tags and led to the following bug reporting
thread on linux-scsi.
http://marc.theaimsgroup.com/?l=linux-scsi&m=112319898111885&w=2
To fix the problem, I've revived real_max_depth without allocation
optimization in init_tag_map, and Andrew Vasquez confirmed that the
problem was fixed. As Jens is not going to be available for a week, he
asked me to make sure that this patch reaches you.
http://marc.theaimsgroup.com/?l=linux-scsi&m=112325778530886&w=2
Also, a comment was added to make sure that real_max_size is needed for
dynamic shrinking.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
get_request is now expected to be holding on to queue_lock, with interrupts
disabled, when it returns NULL; but one path forgot that, causing all kinds
of nastiness under swap load - badness backtraces, strange failures, BUGs.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
get_io_context needlessly turned off interrupts and checked for racing io
context creations. Both of which aren't needed, because the io context can
only be created while in process context of the current process.
Also, split the function in 2. A light version, current_io_context does not
elevate the reference count specifically, but can be used when in process
context, because the process holds a reference itself.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change around locking a bit for a result of 1-2 less spin lock unlock pairs in
request submission paths.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the case where the request is not able to be merged by the elevator, don't
retake the lock and retry the merge mechanism after allocating a new request.
Instead assume that the chance of a merge remains slim, and now that we've
done most of the work allocating a request we may as well just go with it.
Also be rid of the GFP_ATOMIC allocation: we've got working mempools for the
block layer now, so let's save atomic memory for things like networking.
Lastly, in get_request_wait, do an initial get_request call before going into
the waitqueue. This is reported to help efficiency.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently we cap request allocations at q->nr_requests, but we allow a
batching io context to allocate up to 32 more (default setting). This
can flood the queue with request allocations, with only a few batching
processes. The real fix would be to limit the number of batchers, but
as that isn't currently tracked, I suggest we just cap the maximum
number of allocated requests to eg 50% over the limit.
This was observed in real life, users typically see this as vmstat bo
numbers going off the wall with seconds of no queueing afterwards.
Behaviour this bursty is not beneficial.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This updates the CFQ io scheduler to the new time sliced design (cfq
v3). It provides full process fairness, while giving excellent
aggregate system throughput even for many competing processes. It
supports io priorities, either inherited from the cpu nice value or set
directly with the ioprio_get/set syscalls. The latter closely mimic
set/getpriority.
This import is based on my latest from -mm.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the following cleanups:
- make needlessly global code static
- remove the following unused global functions:
- blkdev_scsi_issue_flush_fn
- __blk_attempt_remerge
- remove the following unused EXPORT_SYMBOL's:
- blk_phys_contig_segment
- blk_hw_contig_segment
- blkdev_scsi_issue_flush_fn
- __blk_attempt_remerge
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This memory barrier is not needed because the waitqueue will only get waiters
on it in the following situations:
rq->count has exceeded the threshold - however all manipulations of ->count
are performed under the runqueue lock, and so we will correctly pick up any
waiter.
Memory allocation for the request fails. In this case, there is no additional
help provided by the memory barrier. We are guaranteed to eventually wake up
waiters because the request allocation mempool guarantees that if the mem
allocation for a request fails, there must be some requests in flight. They
will wake up waiters when they are retired.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>