linux

mirror of https://github.com/armbian/linux.git synced 2026-01-06 10:13:00 -08:00

Author	SHA1	Message	Date
NeilBrown	e0ee778528	md/raid10: fix problem with on-stack allocation of r10bio structure. A 'struct r10bio' has an array of per-copy information at the end. This array is declared with size [0] and r10bio_pool_alloc allocates enough extra space to store the per-copy information depending on the number of copies needed. So declaring a 'struct r10bio on the stack isn't going to work. It won't allocate enough space, and memory corruption will ensue. So in the two places where this is done, declare a sufficiently large structure and use that instead. The two call-sites of this bug were introduced in 3.4 and 3.5 so this is suitable for both those kernels. The patch will have to be modified for 3.4 as it only has one bug. Cc: stable@vger.kernel.org Reported-by: Ivan Vasilyev <ivan.vasilyev@gmail.com> Tested-by: Ivan Vasilyev <ivan.vasilyev@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-08-18 09:51:42 +10:00
Linus Torvalds	eff0d13f38	Merge branch 'for-3.6/drivers' of git://git.kernel.dk/linux-block Pull block driver changes from Jens Axboe: - Making the plugging support for drivers a bit more sane from Neil. This supersedes the plugging change from Shaohua as well. - The usual round of drbd updates. - Using a tail add instead of a head add in the request completion for ndb, making us find the most completed request more quickly. - A few floppy changes, getting rid of a duplicated flag and also running the floppy init async (since it takes forever in boot terms) from Andi. * 'for-3.6/drivers' of git://git.kernel.dk/linux-block: floppy: remove duplicated flag FD_RAW_NEED_DISK blk: pass from_schedule to non-request unplug functions. block: stack unplug blk: centralize non-request unplug handling. md: remove plug_cnt feature of plugging. block/nbd: micro-optimization in nbd request completion drbd: announce FLUSH/FUA capability to upper layers drbd: fix max_bio_size to be unsigned drbd: flush drbd work queue before invalidate/invalidate remote drbd: fix potential access after free drbd: call local-io-error handler early drbd: do not reset rs_pending_cnt too early drbd: reset congestion information before reporting it in /proc/drbd drbd: report congestion if we are waiting for some userland callback drbd: differentiate between normal and forced detach drbd: cleanup, remove two unused global flags floppy: Run floppy initialization asynchronous	2012-08-01 09:06:47 -07:00
NeilBrown	0021b7bc04	md: remove plug_cnt feature of plugging. This seemed like a good idea at the time, but after further thought I cannot see it making a difference other than very occasionally and testing to try to exercise the case it is most likely to help did not show any performance difference by removing it. So remove the counting of active plugs and allow 'pending writes' to be activated at any time, not just when no plugs are active. This is only relevant when there is a write-intent bitmap, and the updating of the bitmap will likely introduce enough delay that the single-threading of bitmap updates will be enough to collect large numbers of updates together. Removing this will make it easier to centralise the unplug code, and will clear the other for other unplug enhancements which have a measurable effect. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-07-31 09:08:14 +02:00
Jonathan Brassow	cc4d1efdd0	MD RAID10: Export md_raid10_congested md/raid10: Export is_congested test. In similar fashion to commits `11d8a6e371` `1ed7242e59` we export the RAID10 congestion checking function so that dm-raid.c can make use of it and make use of the personality. The 'queue' and 'gendisk' structures will not be available to the MD code when device-mapper sets up the device, so we conditionalize access to these fields also. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-07-31 10:03:53 +10:00
Jonathan Brassow	473e87ce48	MD: Move macros from raid1.h to raid1.c MD RAID1/RAID10: Move some macros from .h file to .c file There are three macros (IO_BLOCKED,IO_MADE_GOOD,BIO_SPECIAL) which are defined in both raid1.h and raid10.h. They are only used in there respective .c files. However, if we wish to make RAID10 accessible to the device-mapper RAID target (dm-raid.c), then we need to move these macros into the .c files where they are used so that they do not conflict with each other. The macros from the two files are identical and could be moved into md.h, but I chose to leave the duplication and have them remain in the personality files. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-07-31 10:03:52 +10:00
Jonathan Brassow	dc280d987f	MD RAID10: rename mirror_info structure MD RAID10: Rename the structure 'mirror_info' to 'raid10_info' The same structure name ('mirror_info') is used by raid1. Each of these structures are defined in there respective header files. If dm-raid is to support both RAID1 and RAID10, the header files will be included and the structure names must not collide. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-07-31 10:03:52 +10:00
Jonathan Brassow	3bbae04b12	MD RAID10: Fix compiler warning. MD RAID10: Fix compiler warning. Initialize variable to prevent compiler warning. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-07-31 10:03:52 +10:00
NeilBrown	10684112c9	md/raid10: fix careless build error build error introduced by commit `b357f04a67` That function doesn't get extra args until a later patch. Bother. Reported-by: Fengguang Wu <wfg@linux.intel.com> Reported-by: Simon Kirby <sim@hostway.ca> Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Signed-off-by: NeilBrown <neilb@suse.de>	2012-07-04 09:35:35 +10:00
NeilBrown	b357f04a67	md: fix up plugging (again). The value returned by "mddev_check_plug" is only valid until the next 'schedule' as that will unplug things. This could happen at any call to mempool_alloc. So just calling mddev_check_plug at the start doesn't really make sense. So call it just before, or just after, queuing things for the thread. As the action that happens at unplug is to wake the thread, this makes lots of sense. If we cannot add a plug (which requires a small GFP_ATOMIC alloc) we wake thread immediately. RAID5 is a bit different. Requests are queued for the thread and the thread is woken by release_stripe. So we don't need to wake the thread on failure. However the thread doesn't perform certain actions when there is any active plug, so it is important to install a plug before waking the thread. So for RAID5 we install the plug before queuing the request and waking the thread. Without this patch it is possible for raid1 or raid10 to queue a request without then waking the thread, resulting in the array locking up. Also change raid10 to only flush_pending_write when there are not active plugs, just like raid1. This patch is suitable for 3.0 or later. I plan to submit it to -stable, but I'll like to let it spend a few weeks in mainline first to be sure it is completely safe. Signed-off-by: NeilBrown <neilb@suse.de>	2012-07-03 17:45:31 +10:00
NeilBrown	0232605d98	md: make 'name' arg to md_register_thread non-optional. Having the 'name' arg optional and defaulting to the current personality name is no necessary and leads to errors, as when changing the level of an array we can end up using the name of the old level instead of the new one. So make it non-optional and always explicitly pass the name of the level that the array will be. Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-07-03 15:56:52 +10:00
NeilBrown	055d3747db	md/raid10: fix failure when trying to repair a read error. commit `58c54fcca3` md/raid10: handle further errors during fix_read_error better. in 3.1 added "r10_sync_page_io" which takes an IO size in sectors. But we were passing the IO size in bytes!!! This resulting in bio_add_page failing, and empty request being sent down, and a consequent BUG_ON in scsi_lib. [fix missing space in error message at same time] This fix is suitable for 3.1.y and later. Cc: stable@vger.kernel.org Reported-by: Christian Balzer <chibi@gol.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-07-03 15:55:33 +10:00
NeilBrown	fc448a18ae	md/raid10: Don't try to recovery unmatched (and unused) chunks. If a RAID10 has an odd number of chunks - as might happen when there are an odd number of devices - the last chunk has no pair and so is not mirrored. We don't store data there, but when recovering the last device in an array we retry to recover that last chunk from a non-existent location. This results in an error, and the recovery aborts. When we get to that last chunk we should just stop - there is nothing more to do anyway. This bug has been present since the introduction of RAID10, so the patch is appropriate for any -stable kernel. Cc: stable@vger.kernel.org Reported-by: Christian Balzer <chibi@gol.com> Tested-by: Christian Balzer <chibi@gol.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-07-03 10:37:30 +10:00
NeilBrown	aba336bd1d	md: raid1/raid10: fix problem with merge_bvec_fn The new merge_bvec_fn which calls the corresponding function in subsidiary devices requires that mddev->merge_check_needed be set if any child has a merge_bvec_fn. However were were only setting that when a device was hot-added, not when a device was present from the start. This bug was introduced in 3.4 so patch is suitable for 3.4.y kernels. However that are conflicts in raid10.c so a separate patch will be needed for 3.4.y. Cc: stable@vger.kernel.org Reported-by: Sebastian Riemer <sebastian.riemer@profitbricks.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-05-31 15:56:30 +10:00
NeilBrown	63aced6102	md/raid10: Remove extras after reshape to smaller number of devices. When a reshape which reduced the number of devices finishes we must remove the extra devices. So ensure that raid10_remove_disk won't try to keep them, and have raid10_finish_reshape clear the 'in_sync' flag. Then remove_and_add_spares will be able to remove them. Reported-by: Hannes Reinecke <hare@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-05-22 13:55:33 +10:00
NeilBrown	bb63a7019d	md/raid10: resize bitmap when required during reshape. If a reshape changes the size of the array, then we can now update the bitmap to suit - so do so. Signed-off-by: NeilBrown <neilb@suse.de>	2012-05-22 13:55:28 +10:00
NeilBrown	a4a6125a07	md: allow array to be resized while bitmap is present. Now that bitmaps can be resized, we can allow an array to be resized while the bitmap is present. This only covers resizing that involves changing the effective size of member devices, not resizing that changes the number of devices. Signed-off-by: NeilBrown <neilb@suse.de>	2012-05-22 13:55:27 +10:00
majianpeng	5fdd2cf826	md/raid10: Fix memleak in r10buf_pool_alloc If the allocation of rep1_bio fails, we currently don't free the 'bio' of the same dev. Reported by kmemleak. Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-05-22 13:55:03 +10:00
NeilBrown	3ea7daa5d7	md/raid10: add reshape support A 'near' or 'offset' lay RAID10 array can be reshaped to a different 'near' or 'offset' layout, a different chunk size, and a different number of devices. However the number of copies cannot change. Unlike RAID5/6, we do not support having user-space backup data that is being relocated during a 'critical section'. Rather, the data_offset of each device must change so that when writing any block to a new location, it will not over-write any data that is still 'live'. This means that RAID10 reshape is not supportable on v0.90 metadata. The different between the old data_offset and the new_offset must be at least the larger of the chunksize multiplied by offset copies of each of the old and new layout. (for 'near' mode, offset_copies == 1). A larger difference of around 64M seems useful for in-place reshapes as more data can be moved between metadata updates. Very large differences (e.g. 512M) seem to slow the process down due to lots of long seeks (on oldish consumer graded devices at least). Metadata needs to be updated whenever the place we are about to write to is considered - by the current metadata - to still contain data in the old layout. [unbalanced locking fix from Dan Carpenter <dan.carpenter@oracle.com>] Signed-off-by: NeilBrown <neilb@suse.de>	2012-05-22 13:53:47 +10:00
NeilBrown	deb200d085	md/raid10: split out interpretation of layout to separate function. We will soon be interpreting the layout (and chunksize etc) from multiple places to support reshape. So split it out into separate function. Signed-off-by: NeilBrown <neilb@suse.de>	2012-05-21 09:28:33 +10:00
NeilBrown	f8c9e74ff0	md/raid10: Introduce 'prev' geometry to support reshape. When RAID10 supports reshape it will need a 'previous' and a 'current' geometry, so introduce that here. Use the 'prev' geometry when before the reshape_position, and the current 'geo' when beyond it. At other times, use both as appropriate. For now, both are identical (And reshape_position is never set). When we use the 'prev' geometry, we must use the old data_offset. When we use the current (And a reshape is happening) we must use the new_data_offset. Signed-off-by: NeilBrown <neilb@suse.de>	2012-05-21 09:28:33 +10:00
NeilBrown	5cf00fcd3c	md/raid10: collect some geometry fields into a dedicated structure. We will shortly be adding reshape support for RAID10 which will require it having 2 concurrent geometries (before and after). To make that easier, collect most geometry fields into 'struct geom' and access them from there. Then we will more easily be able to add a second set of fields. Note that 'copies' is not in this struct and so cannot be changed. There is little need to change this number and doing so is a lot more difficult as it requires reallocating more things. So leave it out for now. Signed-off-by: NeilBrown <neilb@suse.de>	2012-05-21 09:28:20 +10:00
NeilBrown	c6563a8c38	md: add possibility to change data-offset for devices. When reshaping we can avoid costly intermediate backup by changing the 'start' address of the array on the device (if there is enough room). So as a first step, allow such a change to be requested through sysfs, and recorded in v1.x metadata. (As we didn't previous check that all 'pad' fields were zero, we need a new FEATURE flag for this. A (belatedly) check that all remaining 'pad' fields are zero to avoid a repeat of this) The new data offset must be requested separately for each device. This allows each to have a different change in the data offset. This is not likely to be used often but as data_offset can be set per-device, new_data_offset should be too. This patch also removes the 'acknowledged' arg to rdev_set_badblocks as it is never used and never will be. At the same time we add a new arg ('in_new') which is currently always zero but will be used more soon. When a reshape finishes we will need to update the data_offset and rdev->sectors. So provide an exported function to do that. Signed-off-by: NeilBrown <neilb@suse.de>	2012-05-21 09:27:00 +10:00
NeilBrown	b0d634d568	md/raid10: fix transcription error in calc_sectors conversion. The old code was sector_div(stride, fc); the new code was sector_dir(size, conf->near_copies); 'size' is right (the stride various wasn't really needed), but 'fc' means 'far_copies', and that is an important difference. Signed-off-by: NeilBrown <neilb@suse.de>	2012-05-19 09:01:13 +10:00
NeilBrown	6508fdbf40	md/raid10: set dev_sectors properly when resizing devices in array. raid10 stores dev_sectors in 'conf' separately from the one in 'mddev' because it can have a very significant effect on block addressing and so need to be updated carefully. However raid10_resize isn't updating it at all! To update it correctly, we need to make sure it is a proper multiple of the chunksize taking various details of the layout in to account. This calculation is currently done in setup_conf. So split it out from there and call it from raid10_resize as well. Then set conf->dev_sectors properly. Signed-off-by: NeilBrown <neilb@suse.de>	2012-05-17 10:08:45 +10:00
majianpeng	f4380a9158	md/raid1,raid10: Fix calculation of 'vcnt' when processing error recovery. If r1bio->sectors % 8 != 0,then the memcmp and a later memcpy will omit the last bio_vec. This is suitable for any stable kernel since 3.1 when bad-block management was introduced. Cc: stable@vger.kernel.org Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>	2012-04-12 16:04:47 +10:00

1 2 3 4 5 ...

252 Commits