Commit Graph

237 Commits

Author SHA1 Message Date
NeilBrown 1a67dde0ab md/raid5: Properly remove excess drives after shrinking a raid5/6
We were removing the drives, from the array, but not
removing symlinks from /sys/.... and not marking the device
as having been removed.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-13 10:41:49 +10:00
NeilBrown a639755cf8 md/raid5: make sure a reshape restarts at the correct address.
This "if" don't allow for the possibility that the number of devices
doesn't change, and so sector_nr isn't set correctly in that case.
So change '>' to '>='.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-13 10:13:00 +10:00
NeilBrown 67ac6011db md/raid5: allow new reshape modes to be restarted in the middle.
md/raid5 doesn't allow a reshape to restart if it involves writing
over the same part of disk that it would be reading from.
This happens at the beginning of a reshape that increases the number
of devices, at the end of a reshape that decreases the number of
devices, and continuously for a reshape that does not change the
number of devices.

The current code is correct for the "increase number of devices"
case as the critical section at the start is handled by userspace
performing a backup.

It does not work for reducing the number of devices, or the
no-change case.
For 'reducing', we need to invert the test.  For no-change we cannot
really be sure things will be safe, so simply require the array
to be read-only, which is how the user-space code which carefully
starts such arrays works.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-13 10:06:24 +10:00
NeilBrown 449aad3e25 md: Use revalidate_disk to effect changes in size of device.
As revalidate_disk calls check_disk_size_change, it will cause
any capacity change of a gendisk to be propagated to the blockdev
inode.  So use that instead of mucking about with locks and
i_size_write.

Also add a call to revalidate_disk in do_md_run and a few other places
where the gendisk capacity is changed.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-03 10:59:58 +10:00
NeilBrown 64bd660b51 md: allow raid5_quiesce to work properly when reshape is happening.
The ->quiesce method is not supposed to stop resync/recovery/reshape,
just normal IO.
But in raid5 we don't have a way to know which stripes are being
used for normal IO and which for resync etc, so we need to wait for
all stripes to be idle to be sure that all writes have completed.

However reshape keeps at least some stripe busy for an extended period
of time, so a call to raid5_quiesce can block for several seconds
needlessly.
So arrange for reshape etc to pause briefly while raid5_quiesce is
trying to quiesce the array so that the active_stripes count can
drop to zero.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-03 10:59:58 +10:00
NeilBrown e516402c0d md/raid5: set reshape_position correctly when reshape starts.
As the internal reshape_progress counter is the main driver
for reshape, the fact that reshape_position sometimes starts with the
wrong value has minimal effect.  It is visible in sysfs and that
is all.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-03 10:59:57 +10:00
Dan Williams 95fc17aac4 md/raid6: release spare page at ->stop()
Add missing call to safe_put_page from stop() by unifying open coded
raid5_conf_t de-allocation under free_conf().

Cc: <stable@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-07-31 12:39:15 +10:00
NeilBrown e62e58a5ff md: use interruptible wait when duration is controlled by userspace.
User space can set various limits on an md array so that resync waits
when it gets to a certain point, or so that I/O is blocked for a short
while.
When md is waiting against one of these limit, it should use an
interruptible wait so as not to add to the load average, and so are
not to trigger a warning if the wait goes on for too long.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-07-01 13:15:35 +10:00
NeilBrown a5c308d4d1 md/raid5: suspend shouldn't affect read requests.
md allows write to regions on an array to be suspended temporarily.
This allows user-space to participate is aspects of reshape.
In particular, data can be copied with not risk of a race.
We should not be blocking read requests though, so don't.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
2009-07-01 13:15:35 +10:00
Martin K. Petersen 8f6c2e4b32 md: Use new topology calls to indicate alignment and I/O sizes
Switch MD over to the new disk_stack_limits() function which checks for
aligment and adjusts preferred I/O sizes when stacking.

Also indicate preferred I/O sizes where applicable.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-07-01 11:13:45 +10:00
NeilBrown 48606a9f2f md/raid5: correctly update sync_completed when we reach max_resync
At the end of reshape_request we update cyrr_resync_completed
if we are about to pause due to reaching resync_max.
However we update it to the wrong value.  We need to add the
"reshape_sectors" that have just been reshaped.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 09:14:12 +10:00
Dan Williams 7a3ab90894 md/raid5: add missing call to schedule() after prepare_to_wait()
In the unlikely event that reshape progresses past the current request
while it is waiting for a stripe we need to schedule() before retrying
for 2 reasons:
1/ Prevent list corruption from duplicated list_add() calls without
   intervening list_del().
2/ Give the reshape code a chance to make some progress to resolve the
   conflict.

Cc: <stable@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 08:50:18 +10:00
Andre Noll 8c6ac868b1 md: Push down reconstruction log message to personality code.
Currently, the md layer checks in analyze_sbs() if the raid level
supports reconstruction (mddev->level >= 1) and if reconstruction is
in progress (mddev->recovery_cp != MaxSector).

Move that printk into the personality code of those raid levels that
care (levels 1, 4, 5, 6, 10).

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 08:48:06 +10:00
NeilBrown 50ac168a6e md: merge reconfig and check_reshape methods.
The difference between these two methods is artificial.
Both check that a pending reshape is valid, and perform any
aspect of it that can be done immediately.
'reconfig' handles chunk size and layout.
'check_reshape' handles raid_disks.

So make them just one method.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 08:47:55 +10:00
NeilBrown 597a711b69 md: remove unnecessary arguments from ->reconfig method.
Passing the new layout and chunksize as args is not necessary as
the mddev has fields for new_check and new_layout.

This is preparation for combining the check_reshape and reconfig
methods

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 08:47:42 +10:00
NeilBrown 01ee22b496 md: raid5: check stripe cache is large enough in start_reshape
In reshape cases that do not change the number of devices,
start_reshape is called without first calling check_reshape.

Currently, the check that the stripe_cache is large enough is
only done in check_reshape.  It should be in start_reshape too.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 08:47:20 +10:00
Andre Noll cdc2ae6d6a md: fix some comments.
1/ Raid5 has learned to take over also raid4 and raid6 arrays.
2/ new_chunk in mdp_superblock_1 is in sectors, not bytes.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 08:46:47 +10:00
Andre Noll 0ba459d262 md/raid5: Use is_power_of_2() in raid5_reconfig()/raid6_reconfig().
Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 08:46:10 +10:00
Andre Noll 09c9e5fa1b md: convert conf->chunk_size and conf->prev_chunk to sectors.
This kills some more shifts.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 08:45:55 +10:00
Andre Noll 664e7c413f md: Convert mddev->new_chunk to sectors.
A straight-forward conversion which gets rid of some
multiplications/divisions/shifts. The patch also introduces a couple
of new ones, most of which are due to conf->chunk_size still being
represented in bytes. This will be cleaned up in subsequent patches.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 08:45:27 +10:00
Andre Noll 9d8f036362 md: Make mddev->chunk_size sector-based.
This patch renames the chunk_size field to chunk_sectors with the
implied change of semantics.  Since

	is_power_of_2(chunk_size) = is_power_of_2(chunk_sectors << 9)
				  = is_power_of_2(chunk_sectors)

these bits don't need an adjustment for the shift.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-18 08:45:01 +10:00
raz ben yehuda 740da44918 md: raid5: chunk size check in setup_conf
have raid5 check chunk size in run/reshape method instead of in md

Signed-off-by: raziebe@gmail.com
Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 17:01:36 +10:00
NeilBrown 070ec55d07 md: remove mddev_to_conf "helper" macro
Having a macro just to cast a void* isn't really helpful.
I would must rather see that we are simply de-referencing ->private,
than have to know what the macro does.

So open code the macro everywhere and remove the pointless cast.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-16 16:54:21 +10:00
Linus Torvalds c9059598ea Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
  block: add request clone interface (v2)
  floppy: fix hibernation
  ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
  fs/bio.c: add missing __user annotation
  block: prevent possible io_context->refcount overflow
  Add serial number support for virtio_blk, V4a
  block: Add missing bounce_pfn stacking and fix comments
  Revert "block: Fix bounce limit setting in DM"
  cciss: decode unit attention in SCSI error handling code
  cciss: Remove no longer needed sendcmd reject processing code
  cciss: change SCSI error handling routines to work with interrupts enabled.
  cciss: separate error processing and command retrying code in sendcmd_withirq_core()
  cciss: factor out fix target status processing code from sendcmd functions
  cciss: simplify interface of sendcmd() and sendcmd_withirq()
  cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
  cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
  block: needs to set the residual length of a bidi request
  Revert "block: implement blkdev_readpages"
  block: Fix bounce limit setting in DM
  Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
  ...

Manually fix conflicts with tracing updates in:
	block/blk-sysfs.c
	drivers/ide/ide-atapi.c
	drivers/ide/ide-cd.c
	drivers/ide/ide-floppy.c
	drivers/ide/ide-tape.c
	include/trace/events/block.h
	kernel/trace/blktrace.c
2009-06-11 11:10:35 -07:00
NeilBrown 0e6e0271a2 md/raid5: fix bug in reshape code when chunk_size decreases.
Now that we support changing the chunksize, we calculate
"reshape_sectors" to be the max of number of sectors in old
and new chunk size.
However there is one please where we still use 'chunksize'
rather than 'reshape_sectors'.
This causes a reshape that reduces the size of chunks to freeze.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-09 16:32:22 +10:00