commit a534dbe96e upstream.
blk_rq_timed_out_timer() relied on blk_add_timer() never returning a
timer value of zero, but commit 7838c15b8d
removed the code that bumped this value when it was zero.
Therefore when jiffies is near wrap we could get unlucky & not set the
timeout value correctly.
This patch uses a flag to indicate that the timeout value was set and so
handles jiffies wrap correctly, and it keeps all the logic in one
function so should be easier to maintain in the future.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
block: Backport of various I/O topology fixes from 2.6.33 and 2.6.34
The stacking code incorrectly scaled up the data offset in some cases
causing misaligned devices to report alignment. Rewrite the stacking
algorithm to remedy this.
(Upstream commit 9504e0864b)
The top device misalignment flag would not be set if the added bottom
device was already misaligned as opposed to causing a stacking failure.
Also massage the reporting so that an error is only returned if adding
the bottom device caused the misalignment. I.e. don't return an error
if the top is already flagged as misaligned.
(Upstream commit fe0b393f2c)
lcm() was defined to take integer-sized arguments. The supplied
arguments are multiplied, however, causing us to overflow given
sufficiently large input. That in turn led to incorrect optimal I/O
size reporting in some cases. Switch lcm() over to unsigned long
similar to gcd() and move the function from blk-settings.c to lib.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
CFQ has an optimization for cooperated applications. if several
io-context have close requests, they will get boost. But the
optimization get abused. Considering thread a, b, which work on one
file. a reads sectors s, s+2, s+4, ...; b reads sectors s+1, s+3, s
+5, ... Both a and b are sequential read, so they can open idle window.
a reads a sector s and goes to idle window and wakeup b. b reads sector
s+1, since in current implementation, cfq_should_preempt() thinks a and
b are cooperators, b will preempt a. b then reads sector s+1 and goes to
idle window and wakeup a. for the same reason, a will preempt b and
reads s+2. a and b will continue the circle. The circle will be very
long, and a and b will occupy whole disk queue. Other applications will
nearly have no chance to run.
Fix this limiting coop preempt until a queue is scheduled normally
again.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Commit a6151c3a5c inadvertently reversed
a preempt condition check, potentially causing a performance regression.
Make the meta check correct again.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
With 2.6.32-rc5 in a KVM guest using dm and virtio_blk, we see the
following errors:
end_request: I/O error, dev vda, sector 0
end_request: I/O error, dev vda, sector 0
The errors go away if dm stops submitting empty barriers, by reverting:
commit 52b1fd5a27
Author: Mikulas Patocka <mpatocka@redhat.com>
dm: send empty barriers to targets in dm_flush
We should silently error all barriers, even empty barriers, on devices
like virtio_blk which don't support them.
See also:
https://bugzilla.redhat.com/514901
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Neil Brown <neilb@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
If the average think time is larger than the remaining time slice
for any given queue, don't allow it to idle. A succesful idle also
means that we need to dispatch and complete a request, so if we don't
even have time left for the idle process, we would overrun the slice
in any case.
Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Saves 16 bytes of text, woohoo. But the more important point is
that it makes the code more readable when returning bool for 0/1
cases.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
CFQ enables idle only for processes that think less than the allowed
idle time. Since idle time is lower for seeky queues, we should use the
correct value in the comparison.
Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
We should subtract the slice residual from the rb tree key, since
a negative residual count indicates that the cfqq overran its slice
the last time. Hence we want to add the overrun time, to position
it a bit further away in the service tree.
Reported-by: Corrado Zoccolo <czoccolo@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Commit a9327cac44 added seperate read
and write statistics of in_flight requests. And exported the number
of read and write requests in progress seperately through sysfs.
But Corrado Zoccolo <czoccolo@gmail.com> reported getting strange
output from "iostat -kx 2". Global values for service time and
utilization were garbage. For interval values, utilization was always
100%, and service time is higher than normal.
So this was reverted by commit 0f78ab9899
The problem was in part_round_stats_single(), I missed the following:
if (now == part->stamp)
return;
- if (part->in_flight) {
+ if (part_in_flight(part)) {
__part_stat_add(cpu, part, time_in_queue,
part_in_flight(part) * (now - part->stamp));
__part_stat_add(cpu, part, io_ticks, (now - part->stamp));
With this chunk included, the reported regression gets fixed.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
--
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It was briefly introduced to allow CFQ to to delayed scheduling,
but we ended up removing that feature again. So lets kill the
function and export, and just switch CFQ back to the normal work
schedule since it is now passing in a '0' delay from all call
sites.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The RR service tree is indexed by a key that is relative to current jiffies.
This can cause problems on jiffies wraparound.
The patch fixes it using time_before comparison, and changing
the add_front path to use a relative number, too.
Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
cfq uses rq->start_time as the fifo indicator, but that field may
get modified prior to cfq doing it's fifo list adjustment when
a request gets merged with another request. This can cause the
fifo list to become unordered.
Reported-by: Corrado Zoccolo <czoccolo@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
We cannot delay for the first dispatch of the async queue if it
hasn't dispatched at all, since that could present a local user
DoS attack vector using an app that just did slow timed sync reads
while filling memory.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Not all users of the topology information want to use libblkid. Provide
the topology information through bdev ioctls.
Also clarify sector size comments for existing BLK ioctls.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Don't think that's necessarily a perfect description of what this
option fiddles with, but it's probably better than 'desktop'.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This slowly ramps up the async queue depth based on the time
passed since the sync IO, and doesn't allow async at all until
a sync slice period has passed.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
o Do not allow more than max_dispatch requests from an async queue, if some
sync request has finished recently. This is in the hope that sync activity
is still going on in the system and we might receive a sync request soon.
Most likely from a sync queue which finished a request and we did not enable
idling on it.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>