We normally try to avoid reading from write-mostly devices, but when
we do we really have to check for bad blocks and be sure not to
try reading them.
With the current code, best_good_sectors might not get set and that
causes zero-length read requests to be send down which is very
confusing.
This bug was introduced in commit d2eb35acfd and so the patch
is suitable for 3.1.x and 3.2.x
Reported-and-tested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Reported-and-tested-by: Art -kwaak- van Breemen <ard@telegraafnet.nl>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@vger.kernel.org
Now that WantReplacement drives are replaced cleanly, mark a drive
as want_replacement when we see a write error. It might get failed soon so
the WantReplacement flag is irrelevant, but if the write error is recorded
in the bad block log, we still want to activate any spare that might
be available.
Signed-off-by: NeilBrown <neilb@suse.de>
When attempting to add a spare to a RAID1 array, also consider
adding it as a replacement for a want_replacement device.
Signed-off-by: NeilBrown <neilb@suse.de>
If a Replacement is seen, file it as such.
If we see two replacements (or two normal devices) for the one slot,
abort.
Signed-off-by: NeilBrown <neilb@suse.de>
When recovery completes ->spare_active is called.
This checks if the replacement is ready and if so it fails
the original.
Signed-off-by: NeilBrown <neilb@suse.de>
In RAID1, a replacement is much like a normal device, so we just
double the size of the relevant arrays and look at all possible
devices for reads and writes.
This means that the array looks like it is now double the size in some
way - we need to be careful about that.
In particular, we checking if the array is still degraded while
creating a recovery request we need to only consider the first 'half'
- i.e. the real (non-replacement) devices.
Signed-off-by: NeilBrown <neilb@suse.de>
In general mddev->raid_disks can change unexpectedly while
conf->raid_disks will only change in a very controlled way. So change
some uses of one to the other.
The use of mddev->raid_disks will not cause actually problems but
this way is more consistent and safer in the long term.
Signed-off-by: NeilBrown <neilb@suse.de>
Soon an array will be able to have multiple devices with the
same raid_disk number (an original and a replacement). So removing
a device based on the number won't work. So pass the actual device
handle instead.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
* 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
block: don't call blk_drain_queue() if elevator is not up
blk-throttle: use queue_is_locked() instead of lockdep_is_held()
blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
blk-throttle: Free up policy node associated with deleted rule
block: warn if tag is greater than real_max_depth.
block: make gendisk hold a reference to its queue
blk-flush: move the queue kick into
blk-flush: fix invalid BUG_ON in blk_insert_flush
block: Remove the control of complete cpu from bio.
block: fix a typo in the blk-cgroup.h file
block: initialize the bounce pool if high memory may be added later
block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
block: drop @tsk from attempt_plug_merge() and explain sync rules
block: make get_request[_wait]() fail if queue is dead
block: reorganize throtl_get_tg() and blk_throtl_bio()
block: reorganize queue draining
block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
block: pass around REQ_* flags instead of broken down booleans during request alloc/free
block: move blk_throtl prototypes to block/blk.h
block: fix genhd refcounting in blkio_policy_parse_and_set()
...
Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
and making the request functions be of type "void" instead of "int" in
- drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
- drivers/staging/zram/zram_drv.c
A pending cleanup will mean that module.h won't be implicitly
everywhere anymore. Make sure the modular drivers in md dir
are actually calling out for <module.h> explicitly in advance.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
In 3.0 we changed the way recovery_disabled was handle so that instead
of testing against zero, we test an mddev-> value against a conf->
value.
Two problems:
1/ one place in raid1 was missed and still sets to '1'.
2/ We didn't explicitly set the conf-> value at array creation
time.
It defaulted to '0' just like the mddev value does so they
could appear equal and thus disable recovery.
This did not affect normal 'md' as it calls bind_rdev_to_array
which changes the mddev value. However the dmraid interface
doesn't call this and so doesn't change ->recovery_disabled; so at
array start all recovery is incorrectly disabled.
So initialise the 'conf' value to one less that the mddev value, so
the will only be the same when explicitly set that way.
Reported-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
bio originally has the functionality to set the complete cpu, but
it is broken.
Chirstoph said that "This code is unused, and from the all the
discussions lately pretty obviously broken. The only thing keeping
it serves is creating more confusion and possibly more bugs."
And Jens replied with "We can kill bio_set_completion_cpu(). I'm fine
with leaving cpu control to the request based drivers, they are the
only ones that can toggle the setting anyway".
So this patch tries to remove all the work of controling complete cpu
from a bio.
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
RAID1 and RAID10 handle write requests by queuing them for handling by
a separate thread. This is because when a write-intent-bitmap is
active we might need to update the bitmap first, so it is good to
queue a lot of writes, then do one big bitmap update for them all.
However writeback request devices to appear to be congested after a
while so it can make some guesstimate of throughput. The infinite
queue defeats that (note that RAID5 has already has a finite queue so
it doesn't suffer from this problem).
So impose a limit on the number of pending write requests. By default
it is 1024 which seems to be generally suitable. Make it configurable
via module option just in case someone finds a regression.
Signed-off-by: NeilBrown <neilb@suse.de>
The typedefs are just annoying. 'mdk' probably refers to 'md_k.h'
which used to be an include file that defined this thing.
Signed-off-by: NeilBrown <neilb@suse.de>
When normal-write and sync-read/write bio completes, we should
find out the disk number the bio belongs to. Factor those common
code out to a separate function.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>