Commit Graph

48 Commits

Author SHA1 Message Date
Shaohua Li
bc0934f047 raid5: support sync request
REQ_SYNC is ignored in current raid5 code. Block layer does use it to do
policy,
for example ioscheduler. This patch adds it.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-05-22 13:55:05 +10:00
NeilBrown
b5254dd5fd md/raid5: allow for change in data_offset while managing a reshape.
The important issue here is incorporating the different in data_offset
into calculations concerning when we might need to over-write data
that is still thought to be valid.

To this end we find the minimum offset difference across all devices
and add that where appropriate.

Signed-off-by: NeilBrown <neilb@suse.de>
2012-05-21 09:27:01 +10:00
NeilBrown
9a3e1101b8 md/raid5: detect and handle replacements during recovery.
During recovery we want to write to the replacement but not
the original.  So we have two new flags
 - R5_NeedReplace if this stripe has a replacement that needs to
   be written at some stage
 - R5_WantReplace if NeedReplace, and the data is available, and
   a 'sync' has been requested on this stripe.

We also distinguish between 'sync and replace' which need to read
all other devices, and 'replace' which only needs to read the
devices being replaced.

Note that during resync we always write to any replacement device.
It might not need to be written to, but as we don't read to compare,
we have to write to be sure.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23 10:17:53 +11:00
NeilBrown
977df36255 md/raid5: writes should get directed to replacement as well as original.
When writing, we need to submit two writes, one to the original, and
one to the replacement - if there is a replacement.

If the write to the replacement results in a write error, we just fail
the device.  We only try to record write errors to the original.

When writing for recovery, we shouldn't write to the original.  This
will be addressed in a subsequent patch that generally addresses
recovery.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23 10:17:53 +11:00
NeilBrown
ede7ee8b4d md/raid5: raid5.h cleanup
Remove some #defines that are no longer used, and replace some
others with an enum.
And remove an unused field.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23 10:17:52 +11:00
NeilBrown
671488cc25 md/raid5: allow each slot to have an extra replacement device
Just enhance data structures to record a second device per slot to be
used as a 'replacement' device, replacing the original.
We also have a second bio in each slot in each stripe_head.  This will
only be used when writing to the array - we need to write to both the
original and the replacement at the same time, so will need two bios.

For now, only try using the replacement drive for aligned-reads.
In this case, we prefer the replacement if it has been recovered far
enough, otherwise use the original.

This includes a small enhancement.  Previously we would only do
aligned reads if the target device was fully recovered.  Now we also
do them if it has recovered far enough.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23 10:17:52 +11:00
NeilBrown
d1688a6d55 md/raid5: typedef removal: raid5_conf_t -> struct r5conf
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:49:52 +11:00
NeilBrown
2b8bf3451d md: remove typedefs: mdk_thread_t -> struct md_thread
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:48:23 +11:00
NeilBrown
fd01b88c75 md: remove typedefs: mddev_t -> struct mddev
Having mddev_t and 'struct mddev_s' is ugly and not preferred

Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:47:53 +11:00
NeilBrown
3cb0300200 md: removing typedefs: mdk_rdev_t -> struct md_rdev
The typedefs are just annoying. 'mdk' probably refers to 'md_k.h'
which used to be an include file that defined this thing.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11 16:45:26 +11:00
NeilBrown
b84db560ea md/raid5: Clear bad blocks on successful write.
On a successful write to a known bad block, flag the sh
so that raid5d can remove the known bad block from the list.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-28 11:39:23 +10:00
NeilBrown
bc2607f393 md/raid5: write errors should be recorded as bad blocks if possible.
When a write error is detected, don't mark the device as failed
immediately but rather record the fact for handle_stripe to deal with.

Handle_stripe then attempts to record a bad block.  Only if that fails
does the device get marked as faulty.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-28 11:39:22 +10:00
NeilBrown
7f0da59bdc md/raid5: use bad-block log to improve handling of uncorrectable read errors.
If we get an uncorrectable read error - record a bad block rather than
failing the device.
And if these errors (which may be due to known bad blocks) cause
recovery to be impossible, record a bad block on the recovering
devices, or abort the recovery.

As we might abort a recovery without failing a device we need to teach
RAID5 about recovery_disabled handling.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-28 11:39:22 +10:00
NeilBrown
c5709ef6a0 md/raid5: add some more fields to stripe_head_state
Adding these three fields will allow more common code to be moved
to handle_stripe()

struct field rearrangement by Namhyung Kim.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-26 11:35:20 +10:00
NeilBrown
f2b3b44dee md/raid5: unify stripe_head_state and r6_state
'struct stripe_head_state' stores state about the 'current' stripe
that is passed around while handling the stripe.
For RAID6 there is an extension structure: r6_state, which is also
passed around.
There is no value in keeping these separate, so move the fields from
the latter into the former.

This means that all code now needs to treat s->failed_num as an small
array, but this is a small cost.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-26 11:35:19 +10:00
NeilBrown
c4c1663be4 md/raid5: replace sh->lock with an 'active' flag.
sh->lock is now mainly used to ensure that two threads aren't running
in the locked part of handle_stripe[56] at the same time.

That can more neatly be achieved with an 'active' flag which we set
while running handle_stripe.  If we find the flag is set, we simply
requeue the stripe for later by setting STRIPE_HANDLE.

For safety we take ->device_lock while examining the state of the
stripe and creating a summary in 'stripe_head_state / r6_state'.
This possibly isn't needed but as shared fields like ->toread,
->towrite are checked it is safer for now at least.

We leave the label after the old 'unlock' called "unlock" because it
will disappear in a few patches, so renaming seems pointless.

This leaves the stripe 'locked' for longer as we clear STRIPE_ACTIVE
later, but that is not a problem.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-26 11:34:20 +10:00
NeilBrown
83206d66b6 md/raid5: Remove use of sh->lock in sync_request
This is the start of a series of patches to remove sh->lock.

sync_request takes sh->lock before setting STRIPE_SYNCING to ensure
there is no race with testing it in handle_stripe[56].

Instead, use a new flag STRIPE_SYNC_REQUESTED and test it early
in handle_stripe[56] (after getting the same lock) and perform the
same set/clear operations if it was set.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-26 11:19:49 +10:00
NeilBrown
482c083492 md - remove old plugging code.
md has some plugging infrastructure for RAID5 to use because the
normal plugging infrastructure required a 'request_queue', and when
called from dm, RAID5 doesn't have one of those available.

This relied on the ->unplug_fn callback which doesn't exist any more.

So remove all of that code, both in md and raid5.  Subsequent patches
with restore the plugging functionality.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-04-18 18:25:42 +10:00
Jens Axboe
7eaceaccab block: remove per-queue plugging
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:07 +01:00
Tejun Heo
e9c7469bb4 md: implment REQ_FLUSH/FUA support
This patch converts md to support REQ_FLUSH/FUA instead of now
deprecated REQ_HARDBARRIER.  In the core part (md.c), the following
changes are notable.

* Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA don't interfere with
  processing of other requests and thus there is no reason to mark the
  queue congested while FLUSH/FUA is in progress.

* REQ_FLUSH/FUA failures are final and its users don't need retry
  logic.  Retry logic is removed.

* Preflush needs to be issued to all member devices but FUA writes can
  be handled the same way as other writes - their processing can be
  deferred to request_queue of member devices.  md_barrier_request()
  is renamed to md_flush_request() and simplified accordingly.

For linear, raid0 and multipath, the core changes are enough.  raid1,
5 and 10 need the following conversions.

* raid1: Handling of FLUSH/FUA bio's can simply be deferred to
  request_queues of member devices.  Barrier related logic removed.

* raid5: Queue draining logic dropped.  FUA bit is propagated through
  biodrain and stripe resconstruction such that all the updated parts
  of the stripe are written out with FUA writes if any of the dirtying
  writes was FUA.  preread_active_stripes handling in make_request()
  is updated as suggested by Neil Brown.

* raid10: FUA bit needs to be propagated to write clones.

linear, raid0, 1, 5 and 10 tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-09-10 12:35:38 +02:00
NeilBrown
9f7c222001 md/raid5: export raid5 unplugging interface.
Also remove remaining accesses to ->queue and ->gendisk when ->queue
is NULL (As it is in a DM target).

Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26 12:53:10 +10:00
NeilBrown
2ac8740151 md/raid5: add simple plugging infrastructure.
md/raid5 uses the plugging infrastructure provided by the block layer
and 'struct request_queue'.  However when we plug raid5 under dm there
is no request queue so we cannot use that.

So create a similar infrastructure that is much lighter weight and use
it for raid5.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26 12:53:08 +10:00
NeilBrown
11d8a6e371 md/raid5: export is_congested test
the dm module will need this for dm-raid45.

Also only access ->queue->backing_dev_info->congested_fn
if ->queue actually exists.  It won't in a dm target.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26 12:52:29 +10:00
NeilBrown
f4be6b43f1 md/raid5: ensure we create a unique name for kmem_cache when mddev has no gendisk
We will shortly allow md devices with no gendisk (they are attached to
a dm-target instead).  That will cause mdname() to return 'mdX'.
There is one place where mdname really needs to be unique: when
creating the name for a slab cache.
So in that case, if there is no gendisk, you the address of the mddev
formatted in HEX to provide a unique name.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-26 12:52:26 +10:00
NeilBrown
c41d4ac40d md/raid5: factor out code for changing size of stripe cache.
Separate the actual 'change' code from the sysfs interface
so that it can eventually be called internally.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-07-21 13:28:15 +10:00