Commit Graph

2156 Commits

Author SHA1 Message Date
NeilBrown 961902c0f8 md/bitmap: It is OK to clear bits during recovery.
commit d0a4bb4927 introduced a
regression which is annoying but fairly harmless.

When writing to an array that is undergoing recovery (a spare
in being integrated into the array), writing to the array will
set bits in the bitmap, but they will not be cleared when the
write completes.

For bits covering areas that have not been recovered yet this is not a
problem as the recovery will clear the bits.  However bits set in
already-recovered region will stay set and never be cleared.
This doesn't risk data integrity.  The only negatives are:
 - next time there is a crash, more resyncing than necessary will
   be done.
 - the bitmap doesn't look clean, which is confusing.

While an array is recovering we don't want to update the
'events_cleared' setting in the bitmap but we do still want to clear
bits that have very recently been set - providing they were written to
the recovering device.

So split those two needs - which previously both depended on 'success'
and always clear the bit of the write went to all devices.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23 09:57:48 +11:00
NeilBrown 60fc13702a md: don't give up looking for spares on first failure-to-add
Before performing a recovery we try to remove any spares that
might not be working, then add any that might have become relevant.

Currently we abort on the first spare that cannot be added.
This is a false optimisation.
It is conceivable that - depending on rules in the personality - a
subsequent spare might be accepted.
Also the loop does other things like count the available spares and
reset the 'recovery_offset' value.

If we abort early these might not happen properly.

So remove the early abort.

In particular if you have an array what is undergoing recovery and
which has extra spares, then the recovery may not restart after as
reboot as the could of 'spares' might end up as zero.

Reported-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23 09:57:19 +11:00
NeilBrown 30d7a48368 md/raid5: ensure correct assessment of drives during degraded reshape.
While reshaping a degraded array (as when reshaping a RAID0 by first
converting it to a degraded RAID4) we currently get confused about
which devices are in_sync.  In most cases we get it right, but in the
region that is being reshaped we need to treat non-failed devices as
in-sync when we have the data but haven't actually written it out yet.

Reported-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23 09:57:00 +11:00
NeilBrown 09cd9270ea md/linear: fix hot-add of devices to linear arrays.
commit d70ed2e4fa
broke hot-add to a linear array.
After that commit, metadata if not written to devices until they
have been fully integrated into the array as determined by
saved_raid_disk.  That patch arranged to clear that field after
a recovery completed.

However for linear arrays, there is no recovery - the integration is
instantaneous.  So we need to explicitly clear the saved_raid_disk
field.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23 09:56:55 +11:00
Adam Kwolek 5d8c71f9e5 md: raid5 crash during degradation
NULL pointer access causes crash in raid5 module.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-09 14:26:11 +11:00
NeilBrown 9283d8c5af md/raid5: never wait for bad-block acks on failed device.
Once a device is failed we really want to completely ignore it.
It should go away soon anyway.

In particular the presence of bad blocks on it should not cause us to
block as we won't be trying to write there anyway.

So as soon as we can check if a device is Faulty, do so and pretend
that it is already gone if it is Faulty.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-08 16:27:57 +11:00
NeilBrown 8bd2f0a05b md: ensure new badblocks are handled promptly.
When we mark blocks as bad we need them to be acknowledged by the
metadata handler promptly.

For an in-kernel metadata handler that was already being done.  But
for an external metadata handler we need to alert it of the change by
sending a notification through the sysfs file.  This adds that
notification.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-08 16:26:08 +11:00
NeilBrown 52c64152a9 md: bad blocks shouldn't cause a Blocked status on a Faulty device.
Once a device is marked Faulty the badblocks - whether acknowledged or
not - become irrelevant.  So they shouldn't cause the device to be
marked as Blocked.

Without this patch, a process might write "-blocked" to clear the
Blocked status, but while that will correctly fail the device, it
won't remove the apparent 'blocked' status.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-08 16:22:48 +11:00
NeilBrown af8a24347f md: take a reference to mddev during sysfs access.
When we are accessing an mddev via sysfs we know that the
mddev cannot disappear because it has an embedded kobj which
is refcounted by sysfs.
And we also take the mddev_lock.
However this is not enough.

The final mddev_put could have been called and the
mddev_delayed_delete is waiting for sysfs to let go so it can destroy
the kobj and mddev.
In this state there are a lot of changes that should not be attempted.

To to guard against this we:
 - initialise mddev->all_mddevs in on last put so the state can be
   easily detected.
 - in md_attr_show and md_attr_store, check ->all_mddevs under
   all_mddevs_lock and mddev_get the mddev if it still appears to
   be active.

This means that if we get to sysfs as the mddev is being deleted we
will get -EBUSY.

rdev_attr_store and rdev_attr_show are similar but already have
sufficient protection.  They check that rdev->mddev still points to
mddev after taking mddev_lock.  As this is cleared  before delayed
removal which can only be requested under the mddev_lock, this
ensure the rdev and mddev are still alive.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-08 15:49:46 +11:00
NeilBrown 1d23f178d5 md: refine interpretation of "hold_active == UNTIL_IOCTL".
We like md devices to disappear when they really are not needed.
However it is not possible to tell from the current state whether it
is needed or not.  We can only tell from recent history of changes.

In particular immediately after we create an md device it looks very
similar to immediately after we have finished with it.

So we always preserve a newly created md device until something
significant happens.  This state is stored in 'hold_active'.

The normal case is to keep it until an ioctl happens, as that will
normally either activate it, or explicitly de-activate it.  If it
doesn't then it was probably created by mistake and it is now time to
get rid of it.

We can also modify an array via sysfs (instead of via ioctl) and we
currently treat any change via sysfs like an ioctl as a sign that if
it now isn't more active, it should be destroyed.
However this is not appropriate as changes made via sysfs are more
gradual so we should look for a more definitive change.

So this patch only clears 'hold_active' from UNTIL_IOCTL to clear when
the array_state is changed via sysfs.  Other changes via sysfs
are ignored.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-08 15:49:12 +11:00
NeilBrown 7c8f424798 md/lock: ensure updates to page_attrs are properly locked.
Page attributes are set using __set_bit rather than set_bit as
it normally called under a spinlock so the extra atomicity is not
needed.

However there are two places where we might set or clear page
attributes without holding the spinlock.
So add the spinlock in those cases.

This might be the cause of occasional reports that bits a aren't
getting clear properly - theory is that BITMAP_PAGE_PENDING gets lost
when BITMAP_PAGE_NEEDWRITE is set or cleared.  This is an
inconvenience, not a threat to data safety.

Signed-off-by: NeilBrown <neilb@suse.de>
2011-11-23 10:18:52 +11:00
Dan Williams 257a4b42af md/raid5: STRIPE_ACTIVE has lock semantics, add barriers
All updates that occur under STRIPE_ACTIVE should be globally visible
when STRIPE_ACTIVE clears.  test_and_set_bit() implies a barrier, but
clear_bit() does not.

This is suitable for 3.1-stable.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
2011-11-08 16:22:06 +11:00
NeilBrown 9a3f530f39 md/raid5: abort any pending parity operations when array fails.
When the number of failed devices exceeds the allowed number
we must abort any active parity operations (checks or updates) as they
are no longer meaningful, and can lead to a BUG_ON in
handle_parity_checks6.

This bug was introduce by commit 6c0069c0ae
in 2.6.29.

Reported-by: Manish Katiyar <mkatiyar@gmail.com>
Tested-by: Manish Katiyar <mkatiyar@gmail.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org
2011-11-08 16:22:01 +11:00
Stephen Rothwell a84450604d device-mapper: using EXPORT_SYBOL in dm-space-map-checker.c needs export.h
Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-07 10:29:10 -08:00
Stephen Rothwell 6f66263f8e device-mapper: dm-bufio.c needs to include module.h
since it uses the module facilities.

Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-07 10:29:10 -08:00
Paul Gortmaker 1944ce60fe drivers/md: change module.h -> export.h in persistent-data/dm-*
For the files which are not themselves modular, we can change
them to include only the smaller export.h since all they are
doing is looking for EXPORT_SYMBOL.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-07 10:29:09 -08:00
Linus Torvalds 32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Linus Torvalds b4fdcb02f1 Merge branch 'for-3.2/core' of git://git.kernel.dk/linux-block
* 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
  block: don't call blk_drain_queue() if elevator is not up
  blk-throttle: use queue_is_locked() instead of lockdep_is_held()
  blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
  blk-throttle: Free up policy node associated with deleted rule
  block: warn if tag is greater than real_max_depth.
  block: make gendisk hold a reference to its queue
  blk-flush: move the queue kick into
  blk-flush: fix invalid BUG_ON in blk_insert_flush
  block: Remove the control of complete cpu from bio.
  block: fix a typo in the blk-cgroup.h file
  block: initialize the bounce pool if high memory may be added later
  block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
  block: drop @tsk from attempt_plug_merge() and explain sync rules
  block: make get_request[_wait]() fail if queue is dead
  block: reorganize throtl_get_tg() and blk_throtl_bio()
  block: reorganize queue draining
  block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
  block: pass around REQ_* flags instead of broken down booleans during request alloc/free
  block: move blk_throtl prototypes to block/blk.h
  block: fix genhd refcounting in blkio_policy_parse_and_set()
  ...

Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
and making the request functions be of type "void" instead of "int" in
 - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
 - drivers/staging/zram/zram_drv.c
2011-11-04 17:06:58 -07:00
Linus Torvalds 43672a0784 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/linux-dm
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/linux-dm:
  dm: raid fix device status indicator when array initializing
  dm log userspace: add log device dependency
  dm log userspace: fix comment hyphens
  dm: add thin provisioning target
  dm: add persistent data library
  dm: add bufio
  dm: export dm get md
  dm table: add immutable feature
  dm table: add always writeable feature
  dm table: add singleton feature
  dm kcopyd: add dm_kcopyd_zero to zero an area
  dm: remove superfluous smp_mb
  dm: use local printk ratelimit
  dm table: propagate non rotational flag
2011-11-02 17:02:37 -07:00
Paul Gortmaker daaa5f7cbe md: Add in export.h for files using EXPORT_SYMBOL
These files were getting the defines for EXPORT_SYMBOL because
device.h was including module.h.  But we are going to put an
end to that.  So add the proper export.h include now.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:19 -04:00
Paul Gortmaker 056075c764 md: Add module.h to all files using it implicitly
A pending cleanup will mean that module.h won't be implicitly
everywhere anymore.  Make sure the modular drivers in md dir
are actually calling out for <module.h> explicitly in advance.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:18 -04:00
Linus Torvalds 571109f536 Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
  md/raid10:  Fix bug when activating a hot-spare.
2011-10-31 15:21:29 -07:00
Jonathan E Brassow 2e727c3ca1 dm: raid fix device status indicator when array initializing
When devices in a RAID array are not in-sync, they are supposed to be
reported as such in the status output as an 'a' character, which means
"alive, but not in-sync".  But when the entire array is rebuilt 'A' is
being used, which is incorrect.  This patch corrects this to 'a'.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-10-31 20:21:26 +00:00
Jonathan E Brassow 5a25f0eb70 dm log userspace: add log device dependency
Allow userspace dm log implementations to register their log device so it
is no longer missing from the list of device dependencies.

When device mapper targets use a device they normally call dm_get_device
which includes it in the device list returned to userspace applications
such as LVM through the DM_TABLE_DEPS ioctl.  Userspace log devices
don't use dm_get_device as userspace opens them so they are missing from
the list of dependencies.

This patch extends the DM_ULOG_CTR operation to allow userspace to
respond with the name of the log device (if appropriate) to be
registered via 'dm_get_device'.  DM_ULOG_REQUEST_VERSION is incremented.

This is backwards compatible.  If the kernel and userspace log server
have both been updated, the new information will be passed down to the
kernel and the device will be registered.  If the kernel is new, but
the log server is old, the log server will not pass down any device
information and the kernel will simply bypass the device registration
as before.  If the kernel is old but the log server is new, the log
server will see the old version number and not pass the device info.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-10-31 20:21:24 +00:00
Jonathan Brassow b89544575d dm log userspace: fix comment hyphens
Fix comments: clustered-disk needs a hyphen not an underscore.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-10-31 20:21:22 +00:00