Commit Graph

264 Commits

Author SHA1 Message Date
NeilBrown 4b3df5668c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx into for-linus 2009-09-23 18:31:11 +10:00
NeilBrown 3fa841d7e7 md: report device as congested when suspended
This should writeback from coming when the device is temporarily
suspended.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-09-23 18:10:29 +10:00
NeilBrown 0da3c6194e md: Improve name of threads created by md_register_thread
The management thread for raid4,5,6 arrays are all called
mdX_raid5, independent of the actual raid level, which is wrong and
can be confusion.

So change md_register_thread to use the name from the personality
unless no alternate name (like 'resync' or 'reshape') is given.

This is simpler and more correct.

Cc: Jinzc <zhenchengjin@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-09-23 18:09:45 +10:00
NeilBrown a9f326ebf2 md: remove sparse waring "symbol xxx shadows an earlier one"
Rename some variable and remove some duplicate definitions
to avoid there warnings.  None of them are actual errors.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-09-23 18:06:41 +10:00
Dan Williams 6c910a78e4 md/raid6: cleanup ops_run_compute6_2
Neil says:
	"It is correct as it stands, but the fact that every branch in
	 the 'if' part ends with a 'return' isn't immediately obvious,
	 so it is clearer if we are explicit about the if / then / else
	 structure."

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-16 12:24:54 -07:00
Dan Williams 2d6e4ecc87 md/raid6: eliminate BUG_ON with side effect
As pointed out by Neil it should be possible to build a driver with all
BUG_ON statements deleted.  It's bad form to have a BUG_ON with a side
effect.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-16 12:11:54 -07:00
Jens Axboe 1f98a13f62 bio: first step in sanitizing the bio->bi_rw flag testing
Get rid of any functions that test for these bits and make callers
use bio_rw_flagged() directly. Then it is at least directly apparent
what variable and flag they check.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 14:33:31 +02:00
Dan Williams 9134d02bc0 Merge commit 'md/for-linus' into async-tx-next
Conflicts:
	drivers/md/raid5.c
2009-09-08 17:55:54 -07:00
Dan Williams bbb20089a3 Merge branch 'dmaengine' into async-tx-next
Conflicts:
	crypto/async_tx/async_xor.c
	drivers/dma/ioat/dma_v2.h
	drivers/dma/ioat/pci.c
	drivers/md/raid5.c
2009-09-08 17:55:21 -07:00
Dan Williams 0403e38277 dmaengine: add fence support
Some engines optimize operation by reading ahead in the descriptor chain
such that descriptor2 may start execution before descriptor1 completes.
If descriptor2 depends on the result from descriptor1 then a fence is
required (on descriptor2) to disable this optimization.  The async_tx
api could implicitly identify dependencies via the 'depend_tx'
parameter, but that would constrain cases where the dependency chain
only specifies a completion order rather than a data dependency.  So,
provide an ASYNC_TX_FENCE to explicitly identify data dependencies.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-09-08 17:42:50 -07:00
Dan Williams f9dd213437 Merge branch 'md-raid6-accel' into ioat3.2
Conflicts:
	include/linux/dmaengine.h
2009-09-08 17:42:29 -07:00
Dan Williams 07a3b417dc md/raid456: distribute raid processing over multiple cores
Now that the resources to handle stripe_head operations are allocated
percpu it is possible for raid5d to distribute stripe handling over
multiple cores.  This conversion also adds a call to cond_resched() in
the non-multicore case to prevent one core from getting monopolized for
raid operations.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-08-29 19:13:13 -07:00
Yuri Tikhonov b774ef491b md/raid6: remove synchronous infrastructure
These routines have been replaced by there asynchronous counterparts.

Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-08-29 19:13:13 -07:00
Yuri Tikhonov 6c0069c0ae md/raid6: asynchronous handle_stripe6
1/ Use STRIPE_OP_BIOFILL to offload completion of read requests to
   raid_run_ops
2/ Implement a handler for sh->reconstruct_state similar to the raid5 case
   (adds handling of Q parity)
3/ Prevent handle_parity_checks6 from running concurrently with 'compute'
   operations
4/ Hook up raid_run_ops

Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-08-29 19:13:13 -07:00
Dan Williams d82dfee0ad md/raid6: asynchronous handle_parity_check6
[ Based on an original patch by Yuri Tikhonov ]

Implement the state machine for handling the RAID-6 parities check and
repair functionality.  Note that the raid6 case does not need to check
for new failures, like raid5, as it will always writeback the correct
disks.  The raid5 case can be updated to check zero_sum_result to avoid
getting confused by new failures rather than retrying the entire check
operation.

Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-08-29 19:13:13 -07:00
Yuri Tikhonov a9b39a741a md/raid6: asynchronous handle_stripe_dirtying6
In the synchronous implementation of stripe dirtying we processed a
degraded stripe with one call to handle_stripe_dirtying6().  I.e.
compute the missing blocks from the other drives, then copy in the new
data and reconstruct the parities.

In the asynchronous case we do not perform stripe operations directly.
Instead, operations are scheduled with flags to be later serviced by
raid_run_ops.  So, for the degraded case the final reconstruction step
can only be carried out after all blocks have been brought up to date by
being read, or computed.  Like the raid5 case schedule_reconstruction()
sets STRIPE_OP_RECONSTRUCT to request a parity generation pass and
through operation chaining can handle compute and reconstruct in a
single raid_run_ops pass.

[dan.j.williams@intel.com: fixup handle_stripe_dirtying6 gating]
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-08-29 19:13:12 -07:00
Yuri Tikhonov 5599becca4 md/raid6: asynchronous handle_stripe_fill6
Modify handle_stripe_fill6 to work asynchronously by introducing
fetch_block6 as the raid6 analog of fetch_block5 (schedule compute
operations for missing/out-of-sync disks).

[dan.j.williams@intel.com: compute D+Q in one pass]
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-08-29 19:13:12 -07:00
Yuri Tikhonov c0f7bddbe6 md/raid5,6: common schedule_reconstruction for raid5/6
Extend schedule_reconstruction5 for reuse by the raid6 path.  Add
support for generating Q and BUG() if a request is made to perform
'prexor'.

Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-08-29 19:13:12 -07:00
Dan Williams ac6b53b6e6 md/raid6: asynchronous raid6 operations
[ Based on an original patch by Yuri Tikhonov ]

The raid_run_ops routine uses the asynchronous offload api and
the stripe_operations member of a stripe_head to carry out xor+pq+copy
operations asynchronously, outside the lock.

The operations performed by RAID-6 are the same as in the RAID-5 case
except for no support of STRIPE_OP_PREXOR operations. All the others
are supported:
STRIPE_OP_BIOFILL
 - copy data into request buffers to satisfy a read request
STRIPE_OP_COMPUTE_BLK
 - generate missing blocks (1 or 2) in the cache from the other blocks
STRIPE_OP_BIODRAIN
 - copy data out of request buffers to satisfy a write request
STRIPE_OP_RECONSTRUCT
 - recalculate parity for new data that has entered the cache
STRIPE_OP_CHECK
 - verify that the parity is correct

The flow is the same as in the RAID-5 case, and reuses some routines, namely:
1/ ops_complete_postxor (renamed to ops_complete_reconstruct)
2/ ops_complete_compute (updated to set up to 2 targets uptodate)
3/ ops_run_check (renamed to ops_run_check_p for xor parity checks)

[neilb@suse.de: fixes to get it to pass mdadm regression suite]
Reviewed-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-08-29 19:13:12 -07:00
Dan Williams 4e7d2c0aef md/raid5: factor out mark_uptodate from ops_complete_compute5
ops_complete_compute5 can be reused in the raid6 path if it is updated to
generically handle a second target.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-08-29 19:13:11 -07:00
Dan Williams ad283ea4a3 async_tx: add sum check flags
Replace the flat zero_sum_result with a collection of flags to contain
the P (xor) zero-sum result, and the soon to be utilized Q (raid6 reed
solomon syndrome) zero-sum result.  Use the SUM_CHECK_ namespace instead
of DMA_ since these flags will be used on non-dma-zero-sum enabled
platforms.

Reviewed-by: Andre Noll <maan@systemlinux.org>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-08-29 19:09:26 -07:00
Dan Williams d6f38f31f3 md/raid5,6: add percpu scribble region for buffer lists
Use percpu memory rather than stack for storing the buffer lists used in
parity calculations.  Include space for dma address conversions and pass
that to async_tx via the async_submit_ctl.scribble pointer.

[ Impact: move memory pressure from stack to heap ]

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-08-29 19:09:26 -07:00
Dan Williams 36d1c6476b md/raid6: move the spare page to a percpu allocation
In preparation for asynchronous handling of raid6 operations move the
spare page to a percpu allocation to allow multiple simultaneous
synchronous raid6 recovery operations.

Make this allocation cpu hotplug aware to maximize allocation
efficiency.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2009-08-29 19:09:26 -07:00
NeilBrown 1a67dde0ab md/raid5: Properly remove excess drives after shrinking a raid5/6
We were removing the drives, from the array, but not
removing symlinks from /sys/.... and not marking the device
as having been removed.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-13 10:41:49 +10:00
NeilBrown a639755cf8 md/raid5: make sure a reshape restarts at the correct address.
This "if" don't allow for the possibility that the number of devices
doesn't change, and so sector_nr isn't set correctly in that case.
So change '>' to '>='.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-08-13 10:13:00 +10:00