Commit Graph

52 Commits

Author SHA1 Message Date
Jonathan Brassow
c4a3955145 MD: Remember the last sync operation that was performed
MD:  Remember the last sync operation that was performed

This patch adds a field to the mddev structure to track the last
sync operation that was performed.  This is especially useful when
it comes to what is recorded in mismatch_cnt in sysfs.  If the
last operation was "data-check", then it reports the number of
descrepancies found by the user-initiated check.  If it was a
"repair" operation, then it is reporting the number of
descrepancies repaired.  etc.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-26 12:38:24 +10:00
Jingoo Han
b29bebd66d md: replace strict_strto*() with kstrto*()
The usage of strict_strtoul() is not preferred, because
strict_strtoul() is obsolete. Thus, kstrtoul() should be
used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-14 08:10:26 +10:00
NeilBrown
3f6bbd3ffd dm-raid: silence compiler warning on rebuilds_per_group.
This doesn't really need to be initialised, but it doesn't hurt,
silences the compiler, and as it is a counter it makes sense for it to
start at zero.

Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-14 08:10:26 +10:00
Jonathan Brassow
a4dc163a55 DM RAID: Fix raid_resume not reviving failed devices in all cases
DM RAID:  Fix raid_resume not reviving failed devices in all cases

When a device fails in a RAID array, it is marked as Faulty.  Later,
md_check_recovery is called which (through the call chain) calls
'hot_remove_disk' in order to have the personalities remove the device
from use in the array.

Sometimes, it is possible for the array to be suspended before the
personalities get their chance to perform 'hot_remove_disk'.  This is
normally not an issue.  If the array is deactivated, then the failed
device will be noticed when the array is reinstantiated.  If the
array is resumed and the disk is still missing, md_check_recovery will
be called upon resume and 'hot_remove_disk' will be called at that
time.  However, (for dm-raid) if the device has been restored,
a resume on the array would cause it to attempt to revive the device
by calling 'hot_add_disk'.  If 'hot_remove_disk' had not been called,
a situation is then created where the device is thought to concurrently
be the replacement and the device to be replaced.  Thus, the device
is first sync'ed with the rest of the array (because it is the replacement
device) and then marked Faulty and removed from the array (because
it is also the device being replaced).

The solution is to check and see if the device had properly been removed
before the array was suspended.  This is done by seeing whether the
device's 'raid_disk' field is -1 - a condition that implies that
'md_check_recovery -> remove_and_add_spares (where raid_disk is set to -1)
-> hot_remove_disk' has been called.  If 'raid_disk' is not -1, then
'hot_remove_disk' must be called to complete the removal of the previously
faulty device before it can be revived via 'hot_add_disk'.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-14 08:10:25 +10:00
Jonathan Brassow
f381e71b04 DM RAID: Break-up untidy function
DM RAID:  Break-up untidy function

Clean-up excessive indentation by moving some code in raid_resume()
into its own function.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-14 08:10:25 +10:00
Jonathan Brassow
9092c02d94 DM RAID: Add ability to restore transiently failed devices on resume
DM RAID: Add ability to restore transiently failed devices on resume

This patch adds code to the resume function to check over the devices
in the RAID array.  If any are found to be marked as failed and their
superblocks can be read, an attempt is made to reintegrate them into
the array.  This allows the user to refresh the array with a simple
suspend and resume of the array - rather than having to load a
completely new table, allocate and initialize all the structures and
throw away the old instantiation.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-06-14 08:10:24 +10:00
Jonathan Brassow
be83651f00 DM RAID: Add message/status support for changing sync action
DM RAID:  Add message/status support for changing sync action

This patch adds a message interface to dm-raid to allow the user to more
finely control the sync actions being performed by the MD driver.  This
gives the user the ability to initiate "check" and "repair" (i.e. scrubbing).
Two additional fields have been appended to the status output to provide more
information about the type of sync action occurring and the results of those
actions, specifically: <sync_action> and <mismatch_cnt>.  These new fields
will always be populated.  This is essentially the device-mapper way of doing
what MD controls through the 'sync_action' sysfs file and shows through the
'mismatch_cnt' sysfs file.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-04-24 11:42:43 +10:00
Linus Torvalds
a5e0d73163 Merge tag 'md-3.9' of git://neil.brown.name/md
Pull md updates from NeilBrown:
 "Mostly little bugfixes.

  Only "feature" is a new RAID10 layout which slightly improves the
  number of sets of devices that can concurrently fail, without data
  loss."

* tag 'md-3.9' of git://neil.brown.name/md:
  md: expedite metadata update when switching  read-auto -> active
  md: remove CONFIG_MULTICORE_RAID456
  md/raid1,raid10: fix deadlock with freeze_array()
  md/raid0: improve error message when converting RAID4-with-spares to RAID0
  md: raid0: fix error return from create_stripe_zones.
  md: fix two bugs when attempting to resize RAID0 array.
  DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms
  MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 2)
  MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1)
  MD RAID10: Minor non-functional code changes
  md: raid1,10: Handle REQ_WRITE_SAME flag in write bios
  md: protect against crash upon fsync on ro array
2013-03-05 17:22:08 -08:00
Alasdair G Kergon
55a62eef8d dm: rename request variables to bios
Use 'bio' in the name of variables and functions that deal with
bios rather than 'request' to avoid confusion with the normal
block layer use of 'request'.

No functional changes.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:47 +00:00
Mikulas Patocka
fd7c092e71 dm: fix truncated status strings
Avoid returning a truncated table or status string instead of setting
the DM_BUFFER_FULL_FLAG when the last target of a table fills the
buffer.

When processing a table or status request, the function retrieve_status
calls ti->type->status. If ti->type->status returns non-zero,
retrieve_status assumes that the buffer overflowed and sets
DM_BUFFER_FULL_FLAG.

However, targets don't return non-zero values from their status method
on overflow. Most targets returns always zero.

If a buffer overflow happens in a target that is not the last in the
table, it gets noticed during the next iteration of the loop in
retrieve_status; but if a buffer overflow happens in the last target, it
goes unnoticed and erroneously truncated data is returned.

In the current code, the targets behave in the following way:
* dm-crypt returns -ENOMEM if there is not enough space to store the
  key, but it returns 0 on all other overflows.
* dm-thin returns errors from the status method if a disk error happened.
  This is incorrect because retrieve_status doesn't check the error
  code, it assumes that all non-zero values mean buffer overflow.
* all the other targets always return 0.

This patch changes the ti->type->status function to return void (because
most targets don't use the return code). Overflow is detected in
retrieve_status: if the status method fills up the remaining space
completely, it is assumed that buffer overflow happened.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:44 +00:00
Jonathan Brassow
fe5d2f4a15 DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms
DM RAID:  Add support for MD's RAID10 "far" and "offset" algorithms

Until now, dm-raid.c only supported the "near" algorthm of MD's RAID10
implementation.  This patch adds support for the "far" and "offset"
algorithms, but only with the improved redundancy that is brought with
the introduction of the 'use_far_sets' bit, which shifts copied stripes
according to smaller sets vs the entire array.  That is, the 17th bit
of the 'layout' variable that defines the RAID10 implementation will
always be set.   (More information on how the 'layout' variable selects
the RAID10 algorithm can be found in the opening comments of
drivers/md/raid10.c.)

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-02-26 11:55:36 +11:00
Jonathan Brassow
55ebbb59c1 DM-RAID: Fix RAID10's check for sufficient redundancy
Before attempting to activate a RAID array, it is checked for sufficient
redundancy.  That is, we make sure that there are not too many failed
devices - or devices specified for rebuild - to undermine our ability to
activate the array.  The current code performs this check twice - once to
ensure there were not too many devices specified for rebuild by the user
('validate_rebuild_devices') and again after possibly experiencing a failure
to read the superblock ('analyse_superblocks').  Neither of these checks are
sufficient.  The first check is done properly but with insufficient
information about the possible failure state of the devices to make a good
determination if the array can be activated.  The second check is simply
done wrong in the case of RAID10 because it doesn't account for the
independence of the stripes (i.e. mirror sets).  The solution is to use the
properly written check ('validate_rebuild_devices'), but perform the check
after the superblocks have been read and we know which devices have failed.
This gives us one check instead of two and performs it in a location where
it can be done right.

Only RAID10 was affected and it was affected in the following ways:
- the code did not properly catch the condition where a user specified
  a device for rebuild that already had a failed device in the same mirror
  set.  (This condition would, however, be caught at a deeper level in MD.)
- the code triggers a false positive and denies activation when devices in
  independent mirror sets have failed - counting the failures as though they
  were all in the same set.

The most likely place this error was introduced (or this patch should have
been included) is in commit 4ec1e369 - first introduced in v3.7-rc1.
Consequently this fix should also go in v3.7.y, however there is a
small conflict on the .version in raid_target, so I'll submit a
separate patch to -stable.

Cc: stable@vger.kernel.org
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2013-01-24 12:02:36 +11:00
Mikulas Patocka
7de3ee57da dm: remove map_info
This patch removes map_info from bio-based device mapper targets.
map_info is still used for request-based targets.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:41 +00:00
Jonathan Brassow
3a0f9aaee0 dm raid: round region_size to power of two
If the user does not supply a bitmap region_size to the dm raid target,
a reasonable size is computed automatically.  If this is not a power of 2,
the md code will report an error later.

This patch catches the problem early and rounds the region_size to the
next power of two.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:33 +00:00
Jonathan Brassow
761becff01 DM RAID: Fix for "sync" directive ineffectiveness
There are two table arguments that can be given to a DM RAID target
that control whether the array is forced to (re)synchronize or skip
initialization: "sync" and "nosync".  When "sync" is given, we set
mddev->recovery_cp to 0 in order to cause the device to resynchronize.
This is insufficient if there is a bitmap in use, because the array
will simply look at the bitmap and see that there is no recovery
necessary.

The fix is to skip over the loading of the superblocks when "sync" is
given, causing new superblocks to be written that will force the array
to go through initialization (i.e. synchronization).

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:42:19 +11:00
Jonathan Brassow
7386199c47 DM RAID: Fix comparison of index and quantity for "rebuild" parameter
DM RAID: Fix comparison of index and quantity for "rebuild" parameter

The "rebuild" parameter takes an index argument that starts counting from
zero.  The conditional used to validate the index was using '>' rather than
'>=', leaving the door open for an index value that would be 1 too large.

Reported-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:40:36 +11:00
Jonathan Brassow
4ec1e369af DM RAID: Add rebuild capability for RAID10
DM RAID:  Add code to validate replacement slots for RAID10 arrays

RAID10 can handle 'copies - 1' failures for each mirror group.  This code
ensures the user has provided a valid array - one whose devices specified for
rebuild do not exceed the amount of redundancy available.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:40:24 +11:00
Jonathan Brassow
eb6491236f DM RAID: Move 'rebuild' checking code to its own function
DM RAID:  Move chunk of code to it's own function

The code that checks whether device replacements/rebuilds are possible given
a specific RAID type is moved to it's own function.  It will further expand
when the code to check RAID10 is added.  A separate function makes it easier
to read.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-10-11 13:40:09 +11:00
Linus Torvalds
fcff06c438 Merge branch 'for-next' of git://neil.brown.name/md
Pull md updates from NeilBrown.

* 'for-next' of git://neil.brown.name/md:
  DM RAID: Add support for MD RAID10
  md/RAID1: Add missing case for attempting to repair known bad blocks.
  md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE.
  md/raid1: don't abort a resync on the first badblock.
  md: remove duplicated test on ->openers when calling do_md_stop()
  raid5: Add R5_ReadNoMerge flag which prevent bio from merging at block layer
  md/raid1: prevent merging too large request
  md/raid1: read balance chooses idlest disk for SSD
  md/raid1: make sequential read detection per disk based
  MD RAID10: Export md_raid10_congested
  MD: Move macros from raid1*.h to raid1*.c
  MD RAID1: rename mirror_info structure
  MD RAID10: rename mirror_info structure
  MD RAID10: Fix compiler warning.
  raid5: add a per-stripe lock
  raid5: remove unnecessary bitmap write optimization
  raid5: lockless access raid5 overrided bi_phys_segments
  raid5: reduce chance release_stripe() taking device_lock
2012-08-01 09:02:01 -07:00
Jonathan Brassow
63f33b8dda DM RAID: Add support for MD RAID10
Support the MD RAID10 personality through dm-raid.c

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-08-01 20:41:20 +10:00
Alasdair G Kergon
1f4e0ff079 dm thin: commit before gathering status
Commit outstanding metadata before returning the status for a dm thin
pool so that the numbers reported are as up-to-date as possible.

The commit is not performed if the device is suspended or if
the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target
through a new 'status_flags' parameter in the target's dm_status_fn.

The userspace dmsetup tool will support the --noflush flag with the
'dmsetup status' and 'dmsetup wait' commands from version 1.02.76
onwards.

Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:16 +01:00
Jonathan E Brassow
c039c332f2 dm raid: move sectors_per_dev calculation
In preparation for RAID10 inclusion in dm-raid, we move the sectors_per_dev
calculation later in the device creation process.  This is because we won't
know up-front how many stripes vs how many mirrors there are which will
change the calculation.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:04 +01:00
Jonathan E Brassow
f999e8fe70 dm raid: restructure parse_raid_params
In preparation for RAID10 addition to dm-raid, we change an 'if' conditional
to a 'switch' conditional to make it easier to see what is being checked for
each RAID type.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:04 +01:00
Mike Snitzer
542f903814 dm: support non power of two target max_io_len
Remove the restriction that limits a target's specified maximum incoming
I/O size to be a power of 2.

Rename this setting from 'split_io' to the less-ambiguous 'max_io_len'.
Change it from sector_t to uint32_t, which is plenty big enough, and
introduce a wrapper function dm_set_target_max_io_len() to set it.
Use sector_div() to process it now that it is not necessarily a power of 2.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:00 +01:00
Jonathan Brassow
c32fb9e7ec DM RAID: Use md_error() in place of simply setting Faulty bit
When encountering an error while reading the superblock, call md_error.

We are currently setting the 'Faulty' bit on one of the array devices when an
error is encountered while reading the superblock of a dm-raid array.  We should
be calling md_error(), as it handles the error more completely.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2012-05-22 13:55:31 +10:00