linux-apfs

mirror of https://github.com/linux-apfs/linux-apfs.git synced 2026-05-01 15:00:59 -07:00

Author	SHA1	Message	Date
NeilBrown	f001a70cdc	md/raid5: use conf->raid_disks in preference to mddev->raid_disk mddev->raid_disks can be changed and any time by a request from user-space. It is a suggestion as to what number of raid_disks is desired. conf->raid_disks can only be changed by the raid5 module with suitable locks in place. It is a statement as to the current number of raid_disks. There are two places where the latter should be used, but the former is used. This can lead to a crash when reshaping an array. This patch changes to mddev-> to conf-> Signed-off-by: NeilBrown <neilb@suse.de>	2009-06-09 14:30:31 +10:00
Dan Williams	a08abd8ca8	async_tx: structify submission arguments, add scribble Prepare the api for the arrival of a new parameter, 'scribble'. This will allow callers to identify scratchpad memory for dma address or page address conversions. As this adds yet another parameter, take this opportunity to convert the common submission parameters (flags, dependency, callback, and callback argument) into an object that is passed by reference. Also, take this opportunity to fix up the kerneldoc and add notes about the relevant ASYNC_TX_* flags for each routine. [ Impact: moves api pass-by-value parameters to a pass-by-reference struct ] Signed-off-by: Andre Noll <maan@systemlinux.org> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-06-03 14:07:35 -07:00
Dan Williams	88ba2aa586	async_tx: kill ASYNC_TX_DEP_ACK flag In support of inter-channel chaining async_tx utilizes an ack flag to gate whether a dependent operation can be chained to another. While the flag is not set the chain can be considered open for appending. Setting the ack flag closes the chain and flags the descriptor for garbage collection. The ASYNC_TX_DEP_ACK flag essentially means "close the chain after adding this dependency". Since each operation can only have one child the api now implicitly sets the ack flag at dependency submission time. This removes an unnecessary management burden from clients of the api. [ Impact: clean up and enforce one dependency per operation ] Reviewed-by: Andre Noll <maan@systemlinux.org> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-06-03 14:07:34 -07:00
NeilBrown	ed37d83e6a	md: raid5: change incorrect usage of 'min' macro to 'min_t' A recent patch to raid5.c use min on an int and a sector_t. This isn't allowed. So change it to min_t(sector_t,x,y). Signed-off-by: NeilBrown <neilb@suse.de>	2009-05-27 21:39:05 +10:00
NeilBrown	848b318236	md: raid5: avoid sector values going negative when testing reshape progress. As sector_t in unsigned, we cannot afford to let 'safepos' etc go negative. So replace a -= b; by a -= min(b,a); Signed-off-by: NeilBrown <neilb@suse.de>	2009-05-26 12:41:08 +10:00
Martin K. Petersen	ae03bf639a	block: Use accessor functions for queue limits Convert all external users of queue limits to using wrapper functions instead of poking the request queue variables directly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 23:22:54 +02:00
NeilBrown	c03f6a1969	md: update sync_completed and reshape_position even more often. There are circumstances when a user-space process might need to "oversee" a resync/reshape process. For example when doing an in-place reshape of a raid5, it is prudent to take a backup of each section before reshaping it as this is the only way to provide safety against an unplanned shutdown (i.e. crash/power failure). The sync_max sysfs value can be used to stop the resync from advancing beyond a particular point. So user-space can: suspend IO to the first section and back it up set 'sync_max' to the end of the section wait for 'sync_completed' to reach that point resume IO on the first section and move on to the next section. However this process requires the kernel and user-space to run in lock-step which could introduce unnecessary delays. It would be better if a 'double buffered' approach could be used with userspace and kernel space working on different sections with the 'next' section always ready when the 'current' section is finished. One problem with implementing this is that sync_completed is only guaranteed to be updated when the sync process reaches sync_max. (it is updated on a time basis at other times, but it is hard to rely on that). This defeats some of the double buffering. With this patch, sync_completed (and reshape_position) get updated as the current position approaches sync_max, so there is room for userspace to advance sync_max early without losing updates. To be precise, sync_completed is updated when the current sync position reaches half way between the current value of sync_completed and the value of sync_max. This will usually be a good time for user space to update sync_max. If sync_max does not get updated, the updates to sync_completed (together with associated metadata updates) will occur at an exponentially increasing frequency which will get unreasonably fast (one update every page) immediately before the process hits sync_max and stops. So the update rate will be unreasonably fast only for an insignificant period of time. Signed-off-by: NeilBrown <neilb@suse.de>	2009-04-17 11:06:30 +10:00
NeilBrown	acb180b0e3	md: improve usefulness and accuracy of sysfs file md/sync_completed. The sync_completed file reports how much of a resync (or recovery or reshape) has been completed. However due to the possibility of out-of-order completion of writes, it is not certain to be accurate. We have an internal value - mddev->curr_resync_completed - which is an accurate value (though it might not always be quite so uptodate). So: - make curr_resync_completed be uptodate a little more often, particularly when raid5 reshape updates status in the metadata - report curr_resync_completed in the sysfs file - allow poll/select to report all updates to md/sync_completed. This makes sync_completed completed usable by any external metadata handler that wants to record this status information in its metadata. Signed-off-by: NeilBrown <neilb@suse.de>	2009-04-14 16:28:34 +10:00
Dan Williams	099f53cb50	async_tx: rename zero_sum to val 'zero_sum' does not properly describe the operation of generating parity and checking that it validates against an existing buffer. Change the name of the operation to 'val' (for 'validate'). This is in anticipation of the p+q case where it is a requirement to identify the target parity buffers separately from the source buffers, because the target parity buffers will not have corresponding pq coefficients. Reviewed-by: Andre Noll <maan@systemlinux.org> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-04-08 14:28:37 -07:00
NeilBrown	c8f517c444	md/raid5 revise rules for when to update metadata during reshape We currently update the metadata : 1/ every 3Megabytes 2/ When the place we will write new-layout data to is recorded in the metadata as still containing old-layout data. Rule one exists to avoid having to re-do too much reshaping in the face of a crash/restart. So it should really be time based rather than size based. So change it to "every 10 seconds". Rule two turns out to be too harsh when restriping an array 'in-place', as in that case the metadata much be updates for every stripe. For the in-place update, it can only possibly be safe from a crash if some user-space program data a backup of every e.g. few hundred stripes before allowing them to be reshaped. In that case, the constant metadata update is pointless. So only update the metadata if the new metadata will report that the end of the 'old-layout' data is beyond where we are currently writing 'new-layout' data. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:28:40 +11:00
NeilBrown	b0f9ec047b	md/raid5: minor code cleanups in make_request. ... and to be certain the that make_request doesn't wait forever, add a 'wake_up' when ->reshape_progress has been set to MaxSector Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:27:18 +11:00
NeilBrown	2cffc4a01d	md: remove CONFIG_MD_RAID_RESHAPE config option. This was only needed when the code was experimental. Most of it is well tested now, so the option is no longer useful. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:27:05 +11:00
NeilBrown	ab69ae12ce	md/raid5: be more careful about write ordering when reshaping. When we are reshaping an array, it is very important that we read the data from a particular sector offset before writing new data at that offset. In most cases when growing or shrinking an array we read long before we even consider writing. But when restriping an array without changing it size, there is a small possibility that we might have some data to available write before the read has happened at the same location. This would require some stripes to be in cache already. To guard against this small possibility, we check, before writing, that the 'old' stripe at the same location is not in the process of being read. And we ensure that we mark all 'source' stripes as such before allowing new 'destination' stripes to proceed. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:26:47 +11:00
NeilBrown	88ce4930e2	md/raid5: allow layout and chunksize to be changed on active array. If an array has 3 or more devices, we allow the chunksize or layout to be changed and when a reshape starts, we use these as the 'new' values. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:24:23 +11:00
NeilBrown	7a66138107	md/raid5: reshape using largest of old and new chunk size This ensures that even when old and new stripes are overlapping, we will try to read all of the old before having to write any of the new. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:21:40 +11:00
NeilBrown	e183eaedd5	md/raid5: prepare for allowing reshape to change layout Add prev_algo to raid5_conf_t along the same lines as prev_chunk and previous_raid_disks. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:20:22 +11:00
NeilBrown	784052ecc6	md/raid5: prepare for allowing reshape to change chunksize. Add "prev_chunk" to raid5_conf_t, similar to "previous_raid_disks", to remember what the chunk size was before the reshape that is currently underway. This seems like duplication with "chunk_size" and "new_chunk" in mddev_t, and to some extent it is, but there are differences. The values in mddev_t are always defined and often the same. The prev* values are only defined if a reshape is underway. Also (and more significantly) the raid5_conf_t values will be changed at the same time (inside an appropriate lock) that the reshape is started by setting reshape_position. In contrast, the new_chunk value is set when the sysfs file is written which could be well before the reshape starts. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:19:07 +11:00
NeilBrown	86b42c713b	md/raid5: clearly differentiate 'before' and 'after' stripes during reshape. During a raid5 reshape, we have some stripes in the cache that are 'before' the reshape (and are still to be processed) and some that are 'after'. They are currently differentiated by having different ->disks values as the only reshape current supported involves changing the number of disks. However we will soon support reshapes that do not change the number of disks (chunk parity or chunk size). So make the difference more explicit with a 'generation' number. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:19:03 +11:00
NeilBrown	ec32a2bd35	md: allow number of drives in raid5 to be reduced When reshaping a raid5 to have fewer devices, we work from the end of the array to the beginning. md_do_sync gives addresses to sync_request that go from the beginning to the end. So largely ignore them use the internal state variable "reshape_progress" to keep track of what to do next. Never allow the size to be reduced below the minimum (4 for raid6, 3 otherwise). We require that the size of the array has already been reduced before the array is reshaped to a smaller size. This is because simply reducing the size is an easily reversible operation, while the reshape is immediately destructive and so is not reversible for the blocks at the ends of the devices. Thus to reshape an array to have fewer devices, you must first write an appropriately small size to md/array_size. When reshape finished, we remove any drives that are no longer needed and fix up ->degraded. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:17:38 +11:00
NeilBrown	fef9c61fdf	md/raid5: change reshape-progress measurement to cope with reshaping backwards. When reducing the number of devices in a raid4/5/6, the reshape process has to start at the end of the array and work down to the beginning. So we need to handle expand_progress and expand_lo differently. This patch renames "expand_progress" and "expand_lo" to avoid the implication that anything is getting bigger (expand->reshape) and every place they are used, we make sure that they are used the right way depending on whether delta_disks is positive or negative. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:16:46 +11:00
NeilBrown	cea9c22800	md: add explicit method to signal the end of a reshape. Currently raid5 (the only module that supports restriping) notices that the reshape has finished be sync_request being given a large value, and handles any cleanup them. This patch changes it so md_check_recovery calls into an explicit finish_reshape method as well. The clean-up from sync_request can do things that need to be done promptly, typically things local to the raid5_conf_t structure. The "finish_reshape" method is called under the mddev_lock so it can do things involving reconfiguring the device. This allows us to get rid of md_set_array_sectors_locked, which would have caused a deadlock if you tried to stop and array while a reshape was happening. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:15:05 +11:00
NeilBrown	7ec0547838	md/raid5: enhance raid5_size to work correctly with negative delta_disks This is the first of four patches which combine to allow md/raid5 to reduce the number of devices in the array by restriping the data over a subset of the devices. If the number of disks in a raid4/5/6 is being reduced, then the default size must be based on the new number, not the old number of devices. In general, it should be based on the smaller of new and old. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:10:36 +11:00
NeilBrown	34e04e87fb	md/raid5: drop qd_idx from r6_state We now have this value in stripe_head so we don't need to duplicate it. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:10:16 +11:00
Dan Williams	f701d589aa	md/raid6: move raid6 data processing to raid6_pq.ko Move the raid6 data processing routines into a standalone module (raid6_pq) to prepare them to be called from async_tx wrappers and other non-md drivers/modules. This precludes a circular dependency of raid456 needing the async modules for data processing while those modules in turn depend on raid456 for the base level synchronous raid6 routines. To support this move: 1/ The exportable definitions in raid6.h move to include/linux/raid/pq.h 2/ The raid6_call, recovery calls, and table symbols are exported 3/ Extra #ifdef __KERNEL__ statements to enable the userspace raid6test to compile Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:09:39 +11:00
Andre Noll	18b0033491	md: raid5 run(): Fix max_degraded for raid level 4. raid4 allows only one failed disk. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 15:00:56 +11:00

1 2 3 4 5 ...

264 Commits