linux-apfs

mirror of https://github.com/linux-apfs/linux-apfs.git synced 2026-05-01 15:00:59 -07:00

Author	SHA1	Message	Date
Linus Torvalds	67b9d76f9e	Merge tag 'dm-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix DM multipath IO hang regression from 3.15 due to logic bug in multipath_busy. This impacted cable-pull testing and also the ability to boot with IPR SCSI on a POWER8 box. - Fix possible deadlock with deferred device removal by using a new dedicated workqueue rather than using the system workqueue. - Fix NULL pointer crash due to race condition in dm-io's wake up code for sync_io by using a completion. - Update dm-crypt and dm-zero author name following legal name change; this is important to Jana so I didn't see any reason to hold it back. * tag 'dm-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm mpath: fix IO hang due to logic bug in multipath_busy dm io: fix a race condition in the wake up code for sync_io dm crypt, dm zero: update author name following legal name change dm: allocate a special workqueue for deferred device removal	2014-07-11 09:33:36 -07:00
Jun'ichi Nomura	7a7a3b45fe	dm mpath: fix IO hang due to logic bug in multipath_busy Commit `e80991773` ("dm mpath: push back requests instead of queueing") modified multipath_busy() to return true if !pg_ready(). pg_ready() checks the current state of the multipath device and may return false even if a new IO is needed to change the state. Bart Van Assche reported that he had multipath IO lockup when he was performing cable pull tests. Analysis showed that the multipath device had a single path group with both paths active, but that the path group itself was not active. During the multipath device state transitions 'queue_io' got set but nothing could clear it. Clearing 'queue_io' only happens in __choose_pgpath(), but it won't be called if multipath_busy() returns true due to pg_ready() returning false when 'queue_io' is set. As such the !pg_ready() check in multipath_busy() is wrong because new IO will not be sent to multipath target and the multipath state change won't happen. That results in multipath IO lockup. The intent of multipath_busy() is to avoid unnecessary cycles of dequeue + request_fn + requeue if it is known that the multipath device will requeue. Such "busy" situations would be: - path group is being activated - there is no path and the multipath is setup to requeue if no path Fix multipath_busy() to return "busy" early only for these specific situations. Reported-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.15	2014-07-10 16:44:15 -04:00
Joe Thornber	10f1d5d111	dm io: fix a race condition in the wake up code for sync_io There's a race condition between the atomic_dec_and_test(&io->count) in dec_count() and the waking of the sync_io() thread. If the thread is spuriously woken immediately after the decrement it may exit, making the on stack io struct invalid, yet the dec_count could still be using it. Fix this race by using a completion in sync_io() and dec_count(). Reported-by: Minfei Huang <huangminfei@ucloud.cn> Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org	2014-07-10 16:44:14 -04:00
Jana Saout	bf14299f1c	dm crypt, dm zero: update author name following legal name change Signed-off-by: Jana Saout <jana@saout.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-07-10 16:44:14 -04:00
Mikulas Patocka	acfe0ad74d	dm: allocate a special workqueue for deferred device removal The commit `2c140a246d` ("dm: allow remove to be deferred") introduced a deferred removal feature for the device mapper. When this feature is used (by passing a flag DM_DEFERRED_REMOVE to DM_DEV_REMOVE_CMD ioctl) and the user tries to remove a device that is currently in use, the device will be removed automatically in the future when the last user closes it. Device mapper used the system workqueue to perform deferred removals. However, some targets (dm-raid1, dm-mpath, dm-stripe) flush work items scheduled for the system workqueue from their destructor. If the destructor itself is called from the system workqueue during deferred removal, it introduces a possible deadlock - the workqueue tries to flush itself. Fix this possible deadlock by introducing a new workqueue for deferred removals. We allocate just one workqueue for all dm targets. The ability of dm targets to process IOs isn't dependent on deferred removal of unused targets, so a deadlock due to shared workqueue isn't possible. Also, cleanup local_init() to eliminate potential for returning success on failure. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.13+	2014-07-10 16:44:13 -04:00
NeilBrown	133d4527ea	md: flush writes before starting a recovery. When we write to a degraded array which has a bitmap, we make sure the relevant bit in the bitmap remains set when the write completes (so a 're-add' can quickly rebuilt a temporarily-missing device). If, immediately after such a write starts, we incorporate a spare, commence recovery, and skip over the region where the write is happening (because the 'needs recovery' flag isn't set yet), then that write will not get to the new device. Once the recovery finishes the new device will be trusted, but will have incorrect data, leading to possible corruption. We cannot set the 'needs recovery' flag when we start the write as we do not know easily if the write will be "degraded" or not. That depends on details of the particular raid level and particular write request. This patch fixes a corruption issue of long standing and so it suitable for any -stable kernel. It applied correctly to 3.0 at least and will minor editing to earlier kernels. Reported-by: Bill <billstuff2001@sbcglobal.net> Tested-by: Bill <billstuff2001@sbcglobal.net> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/53A518BB.60709@sbcglobal.net Signed-off-by: NeilBrown <neilb@suse.de>	2014-07-03 10:44:45 +10:00
NeilBrown	9bd3592032	md: make sure GET_ARRAY_INFO ioctl reports correct "clean" status If an array has a bitmap, the when we set the "has bitmap" flag we incorrectly clear the "is clean" flag. "is clean" isn't really important when a bitmap is present, but it is best to get it right anyway. Reported-by: George Duffield <forumscollective@gmail.com> Link: http://lkml.kernel.org/CAG__1a4MRV6gJL38XLAurtoSiD3rLBTmWpcS5HYvPpSfPR88UQ@mail.gmail.com Fixes: `36fa30636f` (v2.6.14) Signed-off-by: NeilBrown <neilb@suse.de>	2014-07-03 10:44:31 +10:00
Linus Torvalds	0e04c641b1	Merge tag 'dm-3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: "This pull request is later than I'd have liked because I was waiting for some performance data to help finally justify sending the long-standing dm-crypt cpu scalability improvements upstream. Unfortunately we came up short, so those dm-crypt changes will continue to wait, but it seems we're not far off. . Add dm_accept_partial_bio interface to DM core to allow DM targets to only process a portion of a bio, the remainder being sent in the next bio. This enables the old dm snapshot-origin target to only split write bios on chunk boundaries, read bios are now sent to the origin device unchanged. . Add DM core support for disabling WRITE SAME if the underlying SCSI layer disables it due to command failure. . Reduce lock contention in DM's bio-prison. . A few small cleanups and fixes to dm-thin and dm-era" * tag 'dm-3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm thin: update discard_granularity to reflect the thin-pool blocksize dm bio prison: implement per bucket locking in the dm_bio_prison hash table dm: remove symbol export for dm_set_device_limits dm: disable WRITE SAME if it fails dm era: check for a non-NULL metadata object before closing it dm thin: return ENOSPC instead of EIO when error_if_no_space enabled dm thin: cleanup noflush_work to use a proper completion dm snapshot: do not split read bios sent to snapshot-origin target dm snapshot: allocate a per-target structure for snapshot-origin target dm: introduce dm_accept_partial_bio dm: change sector_count member in clone_info from sector_t to unsigned	2014-06-12 13:33:29 -07:00
Lukas Czerner	09869de57e	dm thin: update discard_granularity to reflect the thin-pool blocksize DM thinp already checks whether the discard_granularity of the data device is a factor of the thin-pool block size. But when using the dm-thin-pool's discard passdown support, DM thinp was not selecting the max of the underlying data device's discard_granularity and the thin-pool's block size. Update set_discard_limits() to set discard_granularity to the max of these values. This enables blkdev_issue_discard() to properly align the discards that are sent to the DM thin device on a full block boundary. As such each discard will now cover an entire DM thin-pool block and the block will be reclaimed. Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2014-06-11 16:56:12 -04:00
Heinz Mauelshagen	adcc44472b	dm bio prison: implement per bucket locking in the dm_bio_prison hash table Split the single per bio-prison lock by using per bucket locking. Per bucket locking benefits both dm-thin and dm-cache targets by reducing bio-prison lock contention. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-06-11 16:48:54 -04:00
Linus Torvalds	8d0304e69d	Merge tag 'md/3.16' of git://neil.brown.name/md Pull md updates from Neil Brown: "Assorted md fixes for 3.16 Mostly performance improvements with a few corner-case bug fixes" * tag 'md/3.16' of git://neil.brown.name/md: raid5: speedup sync_request processing md/raid5: deadlock between retry_aligned_read with barrier io raid5: add an option to avoid copy data from bio to stripe cache md/bitmap: remove confusing code from filemap_get_page. raid5: avoid release list until last reference of the stripe md: md_clear_badblocks should return an error code on failure. md/raid56: Don't perform reads to support writes until stripe is ready. md: refuse to change shape of array if it is active but read-only	2014-06-11 08:33:41 -07:00
Eivind Sarto	053f5b6525	raid5: speedup sync_request processing The raid5 sync_request() processing calls handle_stripe() within the context of the resync-thread. The resync-thread issues the first set of read requests and this adds execution latency and slows down the scheduling of the next sync_request(). The current rebuild/resync speed of raid5 is not much faster than what rotational HDDs can sustain. Testing the following patch on a 6-drive array, I can increase the rebuild speed from 100 MB/s to 175 MB/s. The sync_request() now just sets STRIPE_HANDLE and releases the stripe. This creates some more parallelism between the resync-thread and raid5 kernel daemon. Signed-off-by: Eivind Sarto <esarto@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>	2014-06-10 11:02:01 +10:00
Linus Torvalds	3f17ea6dea	Merge branch 'next' (accumulated 3.16 merge window patches) into master Now that 3.15 is released, this merges the 'next' branch into 'master', bringing us to the normal situation where my 'master' branch is the merge window. * accumulated work in next: (6809 commits) ufs: sb mutex merge + mutex_destroy powerpc: update comments for generic idle conversion cris: update comments for generic idle conversion idle: remove cpu_idle() forward declarations nbd: zero from and len fields in NBD_CMD_DISCONNECT. mm: convert some level-less printks to pr_* MAINTAINERS: adi-buildroot-devel is moderated MAINTAINERS: add linux-api for review of API/ABI changes mm/kmemleak-test.c: use pr_fmt for logging fs/dlm/debug_fs.c: replace seq_printf by seq_puts fs/dlm/lockspace.c: convert simple_str to kstr fs/dlm/config.c: convert simple_str to kstr mm: mark remap_file_pages() syscall as deprecated mm: memcontrol: remove unnecessary memcg argument from soft limit functions mm: memcontrol: clean up memcg zoneinfo lookup mm/memblock.c: call kmemleak directly from memblock_(alloc\|free) mm/mempool.c: update the kmemleak stack trace for mempool allocations lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations mm: introduce kmemleak_update_trace() mm/kmemleak.c: use %u to print ->checksum ...	2014-06-08 11:31:16 -07:00
hui jiao	2844dc32ea	md/raid5: deadlock between retry_aligned_read with barrier io A chunk aligned read increases counter active_aligned_reads and decreases it after sub-device handle it successfully. But when a read error occurs, the read redispatched by raid5d, and the active_aligned_reads will not be decreased until we can grab a stripe head in retry_aligned_read. Now suppose, a barrier io comes, set conf->quiesce to 2, and wait until both active_stripes and active_aligned_reads are zero. The retried chunk aligned read gets stuck at get_active_stripe waiting until conf->quiesce becomes 0. Retry_aligned_read and barrier io are waiting each other now. One possible solution is that we ignore conf->quiesce, let the retried aligned read finish. I reproduced this deadlock and test this patch on centos6.0 Signed-off-by: NeilBrown <neilb@suse.de>	2014-06-05 17:18:19 +10:00
Mike Snitzer	11f0431be2	dm: remove symbol export for dm_set_device_limits There is no need for code other than DM core to use dm_set_device_limits so remove its EXPORT_SYMBOL_GPL. Also, cleanup a couple whitespace nits. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-06-04 09:46:34 -04:00
Mike Snitzer	7eee4ae2db	dm: disable WRITE SAME if it fails Add DM core support for disabling WRITE SAME on first failure to both request-based and bio-based targets. The need to disable WRITE SAME stems from SCSI enabling it by default but then disabling it when it fails. When SCSI does this it returns "permanent target failure, do not retry" using -EREMOTEIO. Update DM core to only disable WRITE SAME on failure if the returned error is -EREMOTEIO. Commit `f84cb8a4` ("dm mpath: disable WRITE SAME if it fails") implemented multipath specific disabling of WRITE SAME if it fails. However, as that commit detailed, the multipath-only solution doesn't go far enough if bio-based DM targets are stacked ontop of the request-based dm-multipath target (as is commonly done using dm-linear to support partitions on multipath devices, via kpartx). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Tested-by: Alex Chen <alex.chen@huawei.com>	2014-06-04 09:45:52 -04:00
Linus Torvalds	776edb5931	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next Pull core locking updates from Ingo Molnar: "The main changes in this cycle were: - reduced/streamlined smp_mb__() interface that allows more usecases and makes the existing ones less buggy, especially in rarer architectures - add rwsem implementation comments - bump up lockdep limits" 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) rwsem: Add comments to explain the meaning of the rwsem's count field lockdep: Increase static allocations arch: Mass conversion of smp_mb__() arch,doc: Convert smp_mb__() arch,xtensa: Convert smp_mb__() arch,x86: Convert smp_mb__() arch,tile: Convert smp_mb__() arch,sparc: Convert smp_mb__() arch,sh: Convert smp_mb__() arch,score: Convert smp_mb__() arch,s390: Convert smp_mb__() arch,powerpc: Convert smp_mb__() arch,parisc: Convert smp_mb__() arch,openrisc: Convert smp_mb__() arch,mn10300: Convert smp_mb__() arch,mips: Convert smp_mb__() arch,metag: Convert smp_mb__() arch,m68k: Convert smp_mb__() arch,m32r: Convert smp_mb__() arch,ia64: Convert smp_mb__() ...	2014-06-03 12:57:53 -07:00
Joe Thornber	989f26f5ad	dm era: check for a non-NULL metadata object before closing it era_ctr() may call era_destroy() before era->md is initialized so era_destory() must only close the metadata object if it is not NULL. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Naohiro Aota <naota@elisp.net> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.15+	2014-06-03 13:44:08 -04:00
Mike Snitzer	af91805a49	dm thin: return ENOSPC instead of EIO when error_if_no_space enabled Update the DM thin provisioning target's allocation failure error to be consistent with commit `a9d6ceb8` ("[SCSI] return ENOSPC on thin provisioning failure"). The DM thin target now returns -ENOSPC rather than -EIO when block allocation fails due to the pool being out of data space (and the 'error_if_no_space' thin-pool feature is enabled). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-By: Joe Thornber <ejt@redhat.com>	2014-06-03 13:44:08 -04:00
Joe Thornber	e7a3e871d8	dm thin: cleanup noflush_work to use a proper completion Factor out a pool_work interface that noflush_work makes use of to wait for and complete work items (in terms of a proper completion struct). Allows discontinuing the use of a custom completion in terms of atomic_t and wait_event. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-06-03 13:44:07 -04:00
Mikulas Patocka	298eaa89b0	dm snapshot: do not split read bios sent to snapshot-origin target Change the snapshot-origin target so that only write bios are split on chunk boundary. Read bios are passed unchanged to the underlying device, so they don't have to be split. Later, we could change the target so that it accepts a larger write bio if it spans an area that is completely covered by snapshot exceptions. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-06-03 13:44:07 -04:00
Mikulas Patocka	599cdf3bfb	dm snapshot: allocate a per-target structure for snapshot-origin target Allocate a per-target dm_origin structure. This is a prerequisite for the next commit ("dm snapshot: do not split read bios sent to snapshot-origin target") which adds a new member to this structure. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-06-03 13:44:07 -04:00
Mikulas Patocka	1dd40c3ecd	dm: introduce dm_accept_partial_bio The function dm_accept_partial_bio allows the target to specify how many sectors of the current bio it will process. If the target only wants to accept part of the bio, it calls dm_accept_partial_bio and the DM core sends the rest of the data in next bio. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-06-03 13:44:06 -04:00
Mikulas Patocka	e0d6609a5f	dm: change sector_count member in clone_info from sector_t to unsigned It is impossible to create bios with 2^23 or more sectors (the size is stored as a 32-bit byte count in the bio). So we convert some sector_t values to unsigned integers. This is needed for the next commit ("dm: introduce dm_accept_partial_bio") that replaces integer value arguments with pointers, so the size of the integer must match. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-06-03 13:44:06 -04:00
Linus Torvalds	ca755175f2	Merge tag 'md/3.15-fixes' of git://neil.brown.name/md Pull two md bugfixes from Neil Brown: "Two md bugfixes for possible corruption when restarting reshape If a raid5/6 reshape is restarted (After stopping and re-assembling the array) and the array is marked read-only (or read-auto), then the reshape will appear to complete immediately, without actually moving anything around. This can result in corruption. There are two patches which do much the same thing in different places. They are separate because one is an older bug and so can be applied to more -stable kernels" * tag 'md/3.15-fixes' of git://neil.brown.name/md: md: always set MD_RECOVERY_INTR when interrupting a reshape thread. md: always set MD_RECOVERY_INTR when aborting a reshape or other "resync".	2014-06-02 17:04:37 -07:00

1 2 3 4 5 ...

3281 Commits