linux

mirror of https://github.com/Dasharo/linux.git synced 2026-03-06 15:25:10 -08:00

Author	SHA1	Message	Date
NeilBrown	ffa23322b1	md/bitmap: update dirty flag when bitmap bits are explicitly set. There is a sysfs file which allows bits in the write-intent bitmap to be explicit set - indicating that the block is thought to be 'dirty'. When this happens we should really set recovery_cp backwards to include the block to reflect this dirtiness. In particular, a 'resync' process will refuse to start if recovery_cp is beyond the end of the array, so this is needed to allow a resync to be triggered. Signed-off-by: NeilBrown <neilb@suse.de>	2009-12-14 12:51:41 +11:00
NeilBrown	ece5cff0da	md: Support write-intent bitmaps with externally managed metadata. In this case, the metadata needs to not be in the same sector as the bitmap. md will not read/write any bitmap metadata. Config must be done via sysfs and when a recovery makes the array non-degraded again, writing 'true' to 'bitmap/can_clear' will allow bits in the bitmap to be cleared again. Signed-off-by: NeilBrown <neilb@suse.de>	2009-12-14 12:51:41 +11:00
NeilBrown	624ce4f565	md/bitmap: move setting of daemon_lastrun out of bitmap_read_sb Setting daemon_lastrun really has nothing to do with reading the bitmap superblock, it just happens to be needed at the same time. bitmap_read_sb is about to become options, so move that code out to after the call to bitmap_read_sb. Signed-off-by: NeilBrown <neilb@suse.de>	2009-12-14 12:51:41 +11:00
NeilBrown	43a705076e	md: support updating bitmap parameters via sysfs. A new attribute directory 'bitmap' in 'md' is created which contains files for configuring the bitmap. 'location' identifies where the bitmap is, either 'none', or 'file' or 'sector offset from metadata'. Writing 'location' can create or remove a bitmap. Adding a 'file' bitmap this way is not yet supported. 'chunksize' and 'time_base' must be set before 'location' can be set. 'chunksize' can be set before creating a bitmap, but is currently always over-ridden by the bitmap superblock. 'time_base' and 'backlog' can be updated at any time. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Andre Noll <maan@systemlinux.org>	2009-12-14 12:51:41 +11:00
NeilBrown	f6af949c56	md: support bitmap offset appropriate for external-metadata arrays. For md arrays were metadata is managed externally, the kernel does not know about a superblock so the superblock offset is 0. If we want to have a write-intent-bitmap near the end of the devices of such an array, we should support sector_t sized offset. We need offset be possibly negative for when the bitmap is before the metadata, so use loff_t instead. Also add sanity check that bitmap does not overlap with data. Signed-off-by: NeilBrown <neilb@suse.de>	2009-12-14 12:51:41 +11:00
NeilBrown	9cd30fdc33	md: remove needless setting of thread->timeout in raid10_quiesce As bitmap_create and bitmap_destroy already set thread->timeout as appropriate, there is no need to do it in raid10_quiesce. There is a possible need to wake the thread after the timeout has been set low, but it is better to do that where the timeout is actually set low, in bitmap_create. Signed-off-by: NeilBrown <neilb@suse.de>	2009-12-14 12:51:41 +11:00
NeilBrown	1b04be96f6	md: change daemon_sleep to be in 'jiffies' rather than 'seconds'. This removes a lot of multiplications by HZ. Signed-off-by: NeilBrown <neilb@suse.de>	2009-12-14 12:51:41 +11:00
NeilBrown	42a04b5078	md: move offset, daemon_sleep and chunksize out of bitmap structure ... and into bitmap_info. These are all configuration parameters that need to be set before the bitmap is created. Signed-off-by: NeilBrown <neilb@suse.de>	2009-12-14 12:51:41 +11:00
NeilBrown	c3d9714e88	md: collect bitmap-specific fields into one structure. In preparation for making bitmap fields configurable via sysfs, start tidying up by making a single structure to contain the configuration fields. Signed-off-by: NeilBrown <neilb@suse.de>	2009-12-14 12:51:41 +11:00
NeilBrown	aa5cbd1038	md/bitmap: protect against bitmap removal while being updated. A write intent bitmap can be removed from an array while the array is active. When this happens, all IO is suspended and flushed before the bitmap is removed. However it is possible that bitmap_daemon_work is still running to clear old bits from the bitmap. If it is, it can dereference the bitmap after it has been freed. So introduce a new mutex to protect bitmap_daemon_work and get it before destroying a bitmap. This is suitable for any current -stable kernel. Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org	2009-12-14 12:49:46 +11:00
NeilBrown	ae8fa2831b	md: remove clumsy usage of do_sync_mapping_range from bitmap code and replace with vfs_fsync which is much neater (but wasn't exported, or even in existence at the time the code was written). Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.de>	2009-10-16 15:56:01 +11:00
NeilBrown	ee305acef5	md: remove sparse warnings about lock context. There was a real error here on a failure path where we incorrectly call rcu_read_unlock. Signed-off-by: NeilBrown <neilb@suse.de>	2009-09-23 18:06:44 +10:00
Linus Torvalds	c9059598ea	Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits) block: add request clone interface (v2) floppy: fix hibernation ramdisk: remove long-deprecated "ramdisk=" boot-time parameter fs/bio.c: add missing __user annotation block: prevent possible io_context->refcount overflow Add serial number support for virtio_blk, V4a block: Add missing bounce_pfn stacking and fix comments Revert "block: Fix bounce limit setting in DM" cciss: decode unit attention in SCSI error handling code cciss: Remove no longer needed sendcmd reject processing code cciss: change SCSI error handling routines to work with interrupts enabled. cciss: separate error processing and command retrying code in sendcmd_withirq_core() cciss: factor out fix target status processing code from sendcmd functions cciss: simplify interface of sendcmd() and sendcmd_withirq() cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code cciss: Use schedule_timeout_uninterruptible in SCSI error handling code block: needs to set the residual length of a bidi request Revert "block: implement blkdev_readpages" block: Fix bounce limit setting in DM Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt ... Manually fix conflicts with tracing updates in: block/blk-sysfs.c drivers/ide/ide-atapi.c drivers/ide/ide-cd.c drivers/ide/ide-floppy.c drivers/ide/ide-tape.c include/trace/events/block.h kernel/trace/blktrace.c	2009-06-11 11:10:35 -07:00
NeilBrown	be51269103	md: bitmap: improve bitmap maintenance code. The code for checking which bits in the bitmap can be cleared has 2 problems: 1/ it repeatedly takes and drops a spinlock, where it would make more sense to just hold on to it most of the time. 2/ it doesn't make use of some opportunities to skip large sections of the bitmap This patch fixes those. It will only affect CPU consumption, not correctness. Signed-off-by: NeilBrown <neilb@suse.de>	2009-05-26 09:41:17 +10:00
Martin K. Petersen	e1defc4ff0	block: Do away with the notion of hardsect_size Until now we have had a 1:1 mapping between storage device physical block size and the logical block sized used when addressing the device. With SATA 4KB drives coming out that will no longer be the case. The sector size will be 4KB but the logical block size will remain 512-bytes. Hence we need to distinguish between the physical block size and the logical ditto. This patch renames hardsect_size to logical_block_size. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 23:22:54 +02:00
NeilBrown	db305e507d	md: fix some (more) errors with bitmaps on devices larger than 2TB. If a write intent bitmap covers more than 2TB, we sometimes work with values beyond 32bit, so these need to be sector_t. This patches add the required casts to some unsigned longs that are being shifted up. This will affect any raid10 larger than 2TB, or any raid1/4/5/6 with member devices that are larger than 2TB. Signed-off-by: NeilBrown <neilb@suse.de> Reported-by: "Mario 'BitKoenig' Holbe" <Mario.Holbe@TU-Ilmenau.DE> Cc: stable@kernel.org	2009-05-07 12:49:06 +10:00
NeilBrown	b74fd2826c	md: fix loading of out-of-date bitmap. When md is loading a bitmap which it knows is out of date, it fills each page with 1s and writes it back out again. However the write_page call makes used of bitmap->file_pages and bitmap->last_page_size which haven't been set correctly yet. So this can sometimes fail. Move the setting of file_pages and last_page_size to before the call to write_page. This bug can cause the assembly on an array to fail, thus making the data inaccessible. Hence I think it is a suitable candidate for -stable. Cc: stable@kernel.org Reported-by: Vojtech Pavlik <vojtech@suse.cz> Signed-off-by: NeilBrown <neilb@suse.de>	2009-05-07 12:47:19 +10:00
NeilBrown	1f59390339	md: support bitmaps on RAID10 arrays larger then 2 terabytes .. and other arrays with components larger than 2 terabytes. We use a "long" rather than a "sector_t" in part of the bitmap size calculations, which is sad. Reported-by: "Mario 'BitKoenig' Holbe" <Mario.Holbe@TU-Ilmenau.DE> Signed-off-by: NeilBrown <neilb@suse.de>	2009-04-20 11:50:24 +10:00
NeilBrown	acb180b0e3	md: improve usefulness and accuracy of sysfs file md/sync_completed. The sync_completed file reports how much of a resync (or recovery or reshape) has been completed. However due to the possibility of out-of-order completion of writes, it is not certain to be accurate. We have an internal value - mddev->curr_resync_completed - which is an accurate value (though it might not always be quite so uptodate). So: - make curr_resync_completed be uptodate a little more often, particularly when raid5 reshape updates status in the metadata - report curr_resync_completed in the sysfs file - allow poll/select to report all updates to md/sync_completed. This makes sync_completed completed usable by any external metadata handler that wants to record this status information in its metadata. Signed-off-by: NeilBrown <neilb@suse.de>	2009-04-14 16:28:34 +10:00
Andre Noll	58c0fed400	md: Make mddev->size sector-based. This patch renames the "size" field of struct mddev_s to "dev_sectors" and stores the number of 512-byte sectors instead of the number of 1K-blocks in it. All users of that field, including raid levels 1,4-6,10, are adjusted accordingly. This simplifies the code a bit because it allows to get rid of a couple of divisions/multiplications by two. In order to make checkpatch happy, some minor coding style issues have also been addressed. In particular, size_store() now uses strict_strtoull() instead of simple_strtoull(). Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 14:33:13 +11:00
NeilBrown	97e4f42d62	md: occasionally checkpoint drive recovery to reduce duplicate effort after a crash Version 1.x metadata has the ability to record the status of a partially completed drive recovery. However we only update that record on a clean shutdown. It would be nice to update it on unclean shutdowns too, particularly when using a bitmap that removes much to the 'sync' effort after an unclean shutdown. One complication with checkpointing recovery is that we only know where we are up to in terms of IO requests started, not which ones have completed. And we need to know what has completed to record how much is recovered. So occasionally pause the recovery until all submitted requests are completed, then update the record of where we are up to. When we have a bitmap, we already do that pause occasionally to keep the bitmap up-to-date. So enhance that code to record the recovery offset and schedule a superblock update. And when there is no bitmap, just pause 16 times during the resync to do a checkpoint. '16' is a fairly arbitrary number. But we don't really have any good way to judge how often is acceptable, and it seems like a reasonable number for now. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 14:33:13 +11:00
NeilBrown	43b2e5d86d	md: move md_k.h from include/linux/raid/ to drivers/md/ It really is nicer to keep related code together.. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 14:33:13 +11:00
NeilBrown	bff61975b3	md: move lots of #include lines out of .h files and into .c This makes the includes more explicit, and is preparation for moving md_k.h to drivers/md/md.h Remove include/raid/md.h as its only remaining use was to #include other files. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 14:33:13 +11:00
Christoph Hellwig	ef740c372d	md: move headers out of include/linux/raid/ Move the headers with the local structures for the disciplines and bitmap.h into drivers/md/ so that they are more easily grepable for hacking and not far away. md.h is left where it is for now as there are some uses from the outside. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 14:27:03 +11:00
NeilBrown	355a43e641	md: write bitmap information to devices that are undergoing recovery. When we add some spares to an array and start recovery, and we have a bitmap which is stored 'internally' on all devices, we call bitmap_write_all to make sure the bitmap is correct on the new device(s). However that doesn't work as write_sb_page only writes to 'In_sync' devices, and devices undergoing recovery are not 'In_sync' until recovery finishes. So extend write_sb_page (actually next_active_rdev) to include devices that are under recovery. Signed-off-by: NeilBrown <neilb@suse.de>	2009-03-31 14:27:02 +11:00

1 2 3 4 5

106 Commits