linux-apfs

mirror of https://github.com/linux-apfs/linux-apfs.git synced 2026-05-01 15:00:59 -07:00

Author	SHA1	Message	Date
K.Tanaka	a07e6ab41b	md: the md RAID10 resync thread could cause a md RAID10 array deadlock This message describes another issue about md RAID10 found by testing the 2.6.24 md RAID10 using new scsi fault injection framework. Abstract: When a scsi error results in disabling a disk during RAID10 recovery, the resync threads of md RAID10 could stall. This case, the raid array has already been broken and it may not matter. But I think stall is not preferable. If it occurs, even shutdown or reboot will fail because of resource busy. The deadlock mechanism: The r10bio_s structure has a "remaining" member to keep track of BIOs yet to be handled when recovering. The "remaining" counter is incremented when building a BIO in sync_request() and is decremented when finish a BIO in end_sync_write(). If building a BIO fails for some reasons in sync_request(), the "remaining" should be decremented if it has already been incremented. I found a case where this decrement is forgotten. This causes a md_do_sync() deadlock because md_do_sync() waits for md_done_sync() called by end_sync_write(), but end_sync_write() never calls md_done_sync() because of the "remaining" counter mismatch. For example, this problem would be reproduced in the following case: Personalities : [raid10] md0 : active raid10 sdf1[4] sde1[5](F) sdd1[2] sdc1[1] sdb1[6](F) 3919616 blocks 64K chunks 2 near-copies [4/2] [_UU_] [>....................] recovery = 2.2% (45376/1959808) finish=0.7min speed=45376K/sec This case, sdf1 is recovering, sdb1 and sde1 are disabled. An additional error with detaching sdd will cause a deadlock. md0 : active raid10 sdf1[4] sde1[5](F) sdd1[6](F) sdc1[1] sdb1[7](F) 3919616 blocks 64K chunks 2 near-copies [4/1] [_U__] [=>...................] recovery = 5.0% (99520/1959808) finish=5.9min speed=5237K/sec 2739 ? S< 0:17 [md0_raid10] 28608 ? D< 0:00 [md0_resync] 28629 pts/1 Ss 0:00 bash 28830 pts/1 R+ 0:00 ps ax 31819 ? D< 0:00 [kjournald] The resync thread keeps working, but actually it is deadlocked. Patch: By this patch, the remaining counter will be decremented if needed. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:18 -08:00
NeilBrown	1c830532f6	md: fix possible raid1/raid10 deadlock on read error during resync Thanks to K.Tanaka and the scsi fault injection framework, here is a fix for another possible deadlock in raid1/raid10 error handing. If a read request returns an error while a resync is happening and a resync request is pending, the attempt to fix the error will block until the resync progresses, and the resync will block until the read request completes. Thus a deadlock. This patch fixes the problem. Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:18 -08:00
Keld Simonsen	8ed3a19563	md: don't attempt read-balancing for raid10 'far' layouts This patch changes the disk to be read for layout "far > 1" to always be the disk with the lowest block address. Thus the chunks to be read will always be (for a fully functioning array) from the first band of stripes, and the raid will then work as a raid0 consisting of the first band of stripes. Some advantages: The fastest part which is the outer sectors of the disks involved will be used. The outer blocks of a disk may be as much as 100 % faster than the inner blocks. Average seek time will be smaller, as seeks will always be confined to the first part of the disks. Mixed disks with different performance characteristics will work better, as they will work as raid0, the sequential read rate will be number of disks involved times the IO rate of the slowest disk. If a disk is malfunctioning, the first disk which is working, and has the lowest block address for the logical block will be used. Signed-off-by: Keld Simonsen <keld@dkuug.dk> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:18 -08:00
NeilBrown	27c529bb8e	md: lock access to rdev attributes properly When we access attributes of an rdev (component device on an md array) through sysfs, we really need to lock the array against concurrent changes. We currently do that when we change an attribute, but not when we read an attribute. We need to lock when reading as well else rdev->mddev could become NULL while we are accessing it. So add appropriate locking (mddev_lock) to rdev_attr_show. rdev_size_store requires some extra care as well as it needs to unlock the mddev while scanning other mddevs for overlapping regions. We currently assume that rdev->mddev will still be unchanged after the scan, but that cannot be certain. So take a copy of rdev->mddev for use at the end of the function. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:18 -08:00
NeilBrown	2515619823	md: make sure a reshape is started when device switches to read-write A resync/reshape/recovery thread will refuse to progress when the array is marked read-only. So whenever it mark it not read-only, it is important to wake up thread resync thread. There is one place we didn't do this. The problem manifests if the start_ro module parameters is set, and a raid5 array that is in the middle of a reshape (restripe) is started. The array will initially be semi-read-only (meaning it acts like it is readonly until the first write). So the reshape will not proceed. On the first write, the array will become read-write, but the reshape will not be started, and there is no event which will ever restart that thread. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:18 -08:00
NeilBrown	d0fae18f1b	md: clean up irregularity with raid autodetect When a raid1 array is stopped, all components currently get added to the list for auto-detection. However we should really only add components that were found by autodetection in the first place. So add a flag to record that information, and use it. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:18 -08:00
NeilBrown	a1801f858e	md: guard against possible bad array geometry in v1 metadata Make sure the data doesn't start before the end of the superblock when the superblock is at the start of the device. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:17 -08:00
NeilBrown	8311c29d40	md: reduce CPU wastage on idle md array with a write-intent bitmap On an md array with a write-intent bitmap, a thread wakes up every few seconds and scans the bitmap looking for work to do. If the array is idle, there will be no work to do, but a lot of scanning is done to discover this. So cache the fact that the bitmap is completely clean, and avoid scanning the whole bitmap when the cache is known to be clean. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:17 -08:00
NeilBrown	a35e63efa1	md: fix deadlock in md/raid1 and md/raid10 when handling a read error When handling a read error, we freeze the array to stop any other IO while attempting to over-write with correct data. This is done in the raid1d(raid10d) thread and must wait for all submitted IO to complete (except for requests that failed and are sitting in the retry queue - these are counted in ->nr_queue and will stay there during a freeze). However write requests need attention from raid1d as bitmap updates might be required. This can cause a deadlock as raid1 is waiting for requests to finish that themselves need attention from raid1d. So we create a new function 'flush_pending_writes' to give that attention, and call it in freeze_array to be sure that we aren't waiting on raid1d. Thanks to "K.Tanaka" <k-tanaka@ce.jp.nec.com> for finding and reporting this problem. Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:17 -08:00
FUJITA Tomonori	466634488e	iommu: parisc: make the IOMMUs respect the segment boundary limits Make PARISC's two IOMMU implementations not allocate a memory area spanning LLD's segment boundary. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:17 -08:00
FUJITA Tomonori	7c8cda625a	iommu: parisc: pass struct device to iommu_alloc_range This adds struct device argument to sba_alloc_range and ccio_alloc_range, a preparation for modifications to fix the IOMMU segment boundary problem. This change enables ccio_alloc_range to access to LLD's segment boundary limits. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:17 -08:00
FUJITA Tomonori	3715863aa1	iommu: export iommu_is_span_boundary helper function iommu_is_span_boundary is used internally in the IOMMU helper (lib/iommu-helper.c), a primitive function that judges whether a memory area spans LLD's segment boundary or not. It's difficult to convert some IOMMUs to use the IOMMU helper but iommu_is_span_boundary is still useful for them. So this patch exports it. This is needed for the parisc iommu fixes. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:17 -08:00
Kyle McMartin	7eb701dc77	hisax_fcpcipnp: move request_irq later in probe After a quick glance at the code, we're getting the DEBUG_SHIRQ spurious interrupt before we have the adapter template filled in. Real interrupts appear to be turned on by fcpci*_init(), so move request_irq until just before that. Signed-off-by: Kyle McMartin <kmcmartin@redhat.com> Cc: Karsten Keil <kkeil@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:16 -08:00
Michael Halcrow	e4465fdaeb	eCryptfs: make ecryptfs_prepare_write decrypt the page When the page is not up to date, ecryptfs_prepare_write() should be acting much like ecryptfs_readpage(). This includes the painfully obvious step of actually decrypting the page contents read from the lower encrypted file. Note that this patch resolves a bug in eCryptfs in 2.6.24 that one can produce with these steps: # mount -t ecryptfs /secret /secret # echo "abc" > /secret/file.txt # umount /secret # mount -t ecryptfs /secret /secret # echo "def" >> /secret/file.txt # cat /secret/file.txt Without this patch, the resulting data returned from cat is likely to be something other than "abc\ndef\n". (Thanks to Benedikt Driessen for reporting this.) Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Benedikt Driessen <bdriessen@escrypt.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:16 -08:00
Jesper Nilsson	87ffbe679e	cris: correct syscall numbers in unistd.h for timerfd_settime and timerfd_gettime Last commit for unistd was not correct, it only had a partial update of syscall numbers for __NR_timerfd_settime and __NR_timerfd_gettime. Also, NR_syscalls was not incremented for the new syscalls. Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Mikael Starvik <mikael.starvik@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:16 -08:00
Jesper Nilsson	07f2402b4a	cris: correct usage of __user for copy to and from user space in lib/usercopy and uaccess.h Function __copy_user_zeroing in arch/lib/usercopy.c had the wrong parameter set as __user, and in include/asm-cris/uaccess.h, it was not set at all for some of the calling functions. This will cut the number of warnings quite dramatically when using sparse. While we're here, remove useless CVS log and correct confusing typo. Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Mikael Starvik <mikael.starvik@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:16 -08:00
Henrique de Moraes Holschuh	cee47f5a32	ACPI: thinkpad-acpi: fix hotkey_get_tablet_mode I used the wrong return convention on hotkey_get_tablet_mode(), breaking a lot of stuff. Bad Henrique! Fix it to return the status in the parameter-by-reference, and IO status on the function return value. Duh. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Zdenek Kabelac <zdenek.kabelac@gmail.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Lukas Hejtmanek <xhejtman@ics.muni.cz> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:16 -08:00
Julia Lawall	acc1f3ede9	fs/reiserfs/super.c: correct use of ! and & In commit `e6bafba5b4` ("wmi: (!x & y) strikes again"), a bug was fixed that involved converting !x & y to !(x & y). The code below shows the same pattern, and thus should perhaps be fixed in the same way. This is not tested and clearly changes the semantics, so it is only something to consider. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ expression E1,E2; @@ ( !E1 & !E2 \| - !E1 & E2 + !(E1 & E2) ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:16 -08:00
Julia Lawall	022d917d96	drivers/serial/m32r_sio.c: correct use of ! and & In commit `e6bafba5b4` ("wmi: (!x & y) strikes again"), a bug was fixed that involved converting !x & y to !(x & y). The code below shows the same pattern, and thus should perhaps be fixed in the same way. This is not tested and clearly changes the semantics, so it is only something to consider. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ expression E1,E2; @@ ( !E1 & !E2 \| - !E1 & E2 + !(E1 & E2) ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Hirokazu Takata <takata@linux-m32r.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:16 -08:00
Julia Lawall	ae91d60ba8	drivers/isdn: correct use of ! and & In commit `e6bafba5b4` ("wmi: (!x & y) strikes again"), a bug was fixed that involved converting !x & y to !(x & y). The code below shows the same pattern, and thus should perhaps be fixed in the same way. This is not tested and clearly changes the semantics, so it is only something to consider. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ expression E1,E2; @@ ( !E1 & !E2 \| - !E1 & E2 + !(E1 & E2) ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Karsten Keil <kkeil@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:16 -08:00
Julia Lawall	07fb6f26ba	drivers/char/isicom.c: correct use of ! and & In commit `e6bafba5b4` ("wmi: (!x & y) strikes again"), a bug was fixed that involved converting !x & y to !(x & y). The code below shows the same pattern, and thus should perhaps be fixed in the same way. This is not tested and clearly changes the semantics, so it is only something to consider. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ expression E1,E2; @@ ( !E1 & !E2 \| - !E1 & E2 + !(E1 & E2) ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:15 -08:00
Hugh Dickins	fb59e9f1e9	memcg: fix oops on NULL lru list While testing force_empty, during an exit_mmap, __mem_cgroup_remove_list called from mem_cgroup_uncharge_page oopsed on a NULL pointer in the lru list. I couldn't see what racing tasks on other cpus were doing, but surmise that another must have been in mem_cgroup_charge_common on the same page, between its unlock_page_cgroup and spin_lock_irqsave near done (thanks to that kzalloc which I'd almost changed to a kmalloc). Normally such a race cannot happen, the ref_cnt prevents it, the final uncharge cannot race with the initial charge. But force_empty buggers the ref_cnt, that's what it's all about; and thereafter forced pages are vulnerable to races such as this (just think of a shared page also mapped into an mm of another mem_cgroup than that just emptied). And remain vulnerable until they're freed indefinitely later. This patch just fixes the oops by moving the unlock_page_cgroups down below adding to and removing from the list (only possible given the previous patch); and while we're at it, we might as well make it an invariant that page->page_cgroup is always set while pc is on lru. But this behaviour of force_empty seems highly unsatisfactory to me: why have a ref_cnt if we always have to cope with it being violated (as in the earlier page migration patch). We may prefer force_empty to move pages to an orphan mem_cgroup (could be the root, but better not), from which other cgroups could recover them; we might need to reverse the locking again; but no time now for such concerns. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:15 -08:00
Hirokazu Takahashi	9b3c0a07e0	memcg: simplify force_empty and move_lists As for force_empty, though this may not be the main topic here, mem_cgroup_force_empty_list() can be implemented simpler. It is possible to make the function just call mem_cgroup_uncharge_page() instead of releasing page_cgroups by itself. The tip is to call get_page() before invoking mem_cgroup_uncharge_page(), so the page won't be released during this function. Kamezawa-san points out that by the time mem_cgroup_uncharge_page() uncharges, the page might have been reassigned to an lru of a different mem_cgroup, and now be emptied from that; but Hugh claims that's okay, the end state is the same as when it hasn't gone to another list. And once force_empty stops taking lock_page_cgroup within mz->lru_lock, mem_cgroup_move_lists() can be simplified to take mz->lru_lock directly while holding page_cgroup lock (but still has to use try_lock_page_cgroup). Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp> Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: David Rientjes <rientjes@google.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:15 -08:00
Hugh Dickins	2680eed723	memcg: fix mem_cgroup_move_lists locking Ever since the VM_BUG_ON(page_get_page_cgroup(page)) (now Bad page state) went into page freeing, I've hit it from time to time in testing on some machines, sometimes only after many days. Recently found a machine which could usually produce it within a few hours, which got me there at last. The culprit is mem_cgroup_move_lists, whose locking is inadequate; and the arrangement of structures was such that you got page_cgroups from the lru list neatly put on to SLUB's freelist. Kamezawa-san identified the same hole independently. The main problem was that it was missing the lock_page_cgroup it needs to safely page_get_page_cgroup; but it's tricky to go beyond that too, and I couldn't do it with SLAB_DESTROY_BY_RCU as I'd expected. See the code for comments on the constraints. This patch immediately gets replaced by a simpler one from Hirokazu-san; but is it just foolish pride that tells me to put this one on record, in case we need to come back to it later? Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: David Rientjes <rientjes@google.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hirokazu Takahashi <taka@valinux.co.jp> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:15 -08:00
Hugh Dickins	6d48ff8bcf	memcg: css_put after remove_list mem_cgroup_uncharge_page does css_put on the mem_cgroup before uncharging from it, and before removing page_cgroup from one of its lru lists: isn't there a danger that struct mem_cgroup memory could be freed and reused before completing that, so corrupting something? Never seen it, and for all I know there may be other constraints which make it impossible; but let's be defensive and reverse the ordering there. mem_cgroup_force_empty_list is safe because there's an extra css_get around all its works; but even so, change its ordering the same way round, to help get in the habit of doing it like this. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: David Rientjes <rientjes@google.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hirokazu Takahashi <taka@valinux.co.jp> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-04 16:35:15 -08:00

1 2 3 4 5 ...

86933 Commits