linux-apfs

mirror of https://github.com/linux-apfs/linux-apfs.git synced 2026-05-01 15:00:59 -07:00

Author	SHA1	Message	Date
Dan Ehrenberg	644bda6f34	dm table: fall back to getting device using name_to_dev_t() If a device is used as the root filesystem, it can't be built off of devices which are within the root filesystem (just like command line arguments to root=). For this reason, Linux has a pseudo-filesystem for root= and MD initialization (based on the function name_to_dev_t) which handles different ways of specifying devices including PARTUUID and major:minor. Switch to using name_to_dev_t() in dm_get_device(). Rather than having DM assume that all things which are not major:minor are paths in an already-mounted filesystem, change dm_get_device() to first attempt to look up the device in the filesystem, and if not found it will fall back to using name_to_dev_t(). In terms of backwards compatibility, there are some cases where behavior will be different: - If you have a file in the current working directory named 1:2 and you initialze DM there, then it will try to use that file rather than the disk with that major:minor pair as a backing device. - Similarly for other bdev types which name_to_dev_t() knows how to interpret, the previous behavior was to repeatedly check for the existence of the file (e.g., while waiting for rootfs to come up) but the new behavior is to use the name_to_dev_t() interpretation. For example, if you have a file named /dev/ubiblock0_0 which is a symlink to /dev/sda3, but it is not yet present when DM starts to initialize, then the name_to_dev_t() interpretation will take precedence. These incompatibilities would only show up in really strange setups with bad practices so we shouldn't have to worry about them. Signed-off-by: Dan Ehrenberg <dehrenberg@chromium.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-04-15 12:10:19 -04:00
Mike Snitzer	17e149b8f7	dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr Request-based DM's blk-mq support defaults to off; but a user can easily change the default using the dm_mod.use_blk_mq module/boot option. Also, you can check what mode a given request-based DM device is using with: cat /sys/block/dm-X/dm/use_blk_mq This change enabled further cleanup and reduced work (e.g. the md->io_pool and md->rq_pool isn't created if using blk-mq). Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-04-15 12:10:17 -04:00
Mike Snitzer	022333427a	dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq dm_mq_queue_rq() is in atomic context so care must be taken to not sleep -- as such GFP_ATOMIC is used for the md->bs bioset allocations and dm-mpath's call to blk_get_request(). In the future the bioset allocations will hopefully go away (by removing support for partial completions of bios in a cloned request). Also prepare for supporting DM blk-mq ontop of old-style request_fn device(s) if a new dm-mod 'use_blk_mq' parameter is set. The kthread will still be used to queue work if blk-mq is used ontop of old-style request_fn device(s). Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-04-15 12:10:17 -04:00
Mike Snitzer	bfebd1cdb4	dm: add full blk-mq support to request-based DM Commit `e5863d9ad` ("dm: allocate requests in target when stacking on blk-mq devices") served as the first step toward fully utilizing blk-mq in request-based DM -- it enabled stacking an old-style (request_fn) request_queue ontop of the underlying blk-mq device(s). That first step didn't improve performance of DM multipath ontop of fast blk-mq devices (e.g. NVMe) because the top-level old-style request_queue was severely limited by the queue_lock. The second step offered here enables stacking a blk-mq request_queue ontop of the underlying blk-mq device(s). This unlocks significant performance gains on fast blk-mq devices, Keith Busch tested on his NVMe testbed and offered this really positive news: "Just providing a performance update. All my fio tests are getting roughly equal performance whether accessed through the raw block device or the multipath device mapper (~470k IOPS). I could only push ~20% of the raw iops through dm before this conversion, so this latest tree is looking really solid from a performance standpoint." Signed-off-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Keith Busch <keith.busch@intel.com>	2015-04-15 12:10:16 -04:00
Mike Snitzer	0ce65797a7	dm: impose configurable deadline for dm_request_fn's merge heuristic Otherwise, for sequential workloads, the dm_request_fn can allow excessive request merging at the expense of increased service time. Add a per-device sysfs attribute to allow the user to control how long a request, that is a reasonable merge candidate, can be queued on the request queue. The resolution of this request dispatch deadline is in microseconds (ranging from 1 to 100000 usecs), to set a 20us deadline: echo 20 > /sys/block/dm-7/dm/rq_based_seq_io_merge_deadline The dm_request_fn's merge heuristic and associated extra accounting is disabled by default (rq_based_seq_io_merge_deadline is 0). This sysfs attribute is not applicable to bio-based DM devices so it will only ever report 0 for them. By allowing a request to remain on the queue it will block others requests on the queue. But introducing a short dequeue delay has proven very effective at enabling certain sequential IO workloads on really fast, yet IOPS constrained, devices to build up slightly larger IOs -- yielding 90+% throughput improvements. Having precise control over the time taken to wait for larger requests to build affords control beyond that of waiting for certain IO sizes to accumulate (which would require a deadline anyway). This knob will only ever make sense with sequential IO workloads and the particular value used is storage configuration specific. Given the expected niche use-case for when this knob is useful it has been deemed acceptable to expose this relatively crude method for crafting optimal IO on specific storage -- especially given the solution is simple yet effective. In the context of DM multipath, it is advisable to tune this sysfs attribute to a value that offers the best performance for the common case (e.g. if 4 paths are expected active, tune for that; if paths fail then performance may be slightly reduced). Alternatives were explored to have request-based DM autotune this value (e.g. if/when paths fail) but they were quickly deemed too fragile and complex to warrant further design and development time. If this problem proves more common as faster storage emerges we'll have to look at elevating a generic solution into the block core. Tested-by: Shiva Krishna Merla <shivakrishna.merla@netapp.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-04-15 12:10:15 -04:00
Mike Snitzer	b898320d68	dm sysfs: introduce ability to add writable attributes Add DM_ATTR_RW() macro and establish .store method in dm_sysfs_ops. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-04-15 12:10:15 -04:00
Mike Snitzer	de3ec86dff	dm: don't start current request if it would've merged with the previous Request-based DM's dm_request_fn() is so fast to pull requests off the queue that steps need to be taken to promote merging by avoiding request processing if it makes sense. If the current request would've merged with previous request let the current request stay on the queue longer. Suggested-by: Jens Axboe <axboe@fb.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-04-15 12:10:14 -04:00
Mike Snitzer	d548b34b06	dm: reduce the queue delay used in dm_request_fn from 100ms to 10ms Commit `7eaceaccab` ("block: remove per-queue plugging") didn't justify DM's use of a 100ms delay; such an extended delay is a liability when there is reason to re-kick the queue. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-04-15 12:10:13 -04:00
Mike Snitzer	9d1deb83d4	dm: don't schedule delayed run of the queue if nothing to do In request-based DM's dm_request_fn(), if blk_peek_request() returns NULL just return. Avoids unnecessary blk_delay_queue(). Reported-by: Jens Axboe <axboe@fb.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-04-15 12:10:13 -04:00
Mike Snitzer	9a0e609e3f	dm: only run the queue on completion if congested or no requests pending On really fast storage it can be beneficial to delay running the request_queue to allow the elevator more opportunity to merge requests. Otherwise, it has been observed that requests are being sent to q->request_fn much quicker than is ideal on IOPS-bound backends. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-04-15 12:10:12 -04:00
Mike Snitzer	ff36ab3458	dm: remove request-based logic from make_request_fn wrapper The old dm_request() method used for q->make_request_fn had a branch for request-based DM support but it isn't needed given that dm_init_request_based_queue() sets it to the standard blk_queue_bio() anyway. Cleanup dm_init_md_queue() to be DM device-type agnostic and have dm_setup_md_queue() properly finish queue setup based on DM device-type (bio-based vs request-based). A followup block patch can be made to remove the export for blk_queue_bio() now that DM no longer calls it directly. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-04-15 12:08:48 -04:00
NeilBrown	47d68979cc	md/raid0: fix bug with chunksize not a power of 2. Since commit `20d0189b10` in v3.14-rc1 RAID0 has performed incorrect calculations when the chunksize is not a power of 2. This happens because "sector_div()" modifies its first argument, but this wasn't taken into account in the patch. So restore that first arg before re-using the variable. Reported-by: Joe Landman <joe.landman@gmail.com> Reported-by: Dave Chinner <david@fromorbit.com> Fixes: `20d0189b10` Cc: stable@vger.kernel.org (3.14 and later). Signed-off-by: NeilBrown <neilb@suse.de>	2015-04-10 15:36:31 +10:00
Gu Zheng	74672d069b	md: fix md io stats accounting broken Simon reported the md io stats accounting issue: " I'm seeing "iostat -x -k 1" print this after a RAID1 rebuild on 4.0-rc5. It's not abnormal other than it's 3-disk, with one being SSD (sdc) and the other two being write-mostly: Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 345.00 0.00 0.00 0.00 0.00 100.00 md2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 58779.00 0.00 0.00 0.00 0.00 100.00 md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 12.00 0.00 0.00 0.00 0.00 100.00 " The cause is commit "18c0b223cf9901727ef3b02da6711ac930b4e5d4" uses the generic_start_io_acct to account the disk stats rather than the open code, but it also introduced the increase to .in_flight[rw] which is needless to md. So we re-use the open code here to fix it. Reported-by: Simon Kirby <sim@hostway.ca> Cc: <stable@vger.kernel.org> 3.19 Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: NeilBrown <neilb@suse.de>	2015-04-08 12:53:00 +10:00
Mike Snitzer	d56b9b28a4	dm: remove request-based DM queue's lld_busy_fn hook DM multipath is the only caller of blk_lld_busy() -- which calls a queue's lld_busy_fn hook. Request-based DM doesn't support stacking multipath devices so there is no reason to register the lld_busy_fn hook on a multipath device's queue using blk_queue_lld_busy(). As such, remove functions dm_lld_busy and dm_table_any_busy_target. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-03-31 12:03:49 -04:00
Mike Snitzer	52b09914af	dm: remove unnecessary wrapper around blk_lld_busy There is no need for DM to export a wrapper around the already exported blk_lld_busy(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-03-31 12:03:49 -04:00
Mike Snitzer	09c2d53101	dm: rename __dm_get_reserved_ios() helper to __dm_get_module_param() __dm_get_module_param() could be useful for future DM module parameters besides those related to "reserved_ios". Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-03-31 12:03:49 -04:00
Joe Thornber	e65ff8703f	dm cache policy mq: try not to writeback data that changed in the last second Writeback takes out a lock on the cache block, so will increase the latency for any concurrent io. This patch works by placing 2 sentinel objects on each level of the multiqueues. Every WRITEBACK_PERIOD the oldest sentinel gets moved to the newest end of the queue level. When looking for writeback work: if less than 25% of the cache is clean: we select the oldest object with the lowest hit count otherwise: we select the oldest object that is not past a writeback sentinel. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-03-31 12:03:48 -04:00
Joe Thornber	fdecee3224	dm cache policy mq: remove unused generation member of struct entry Remove to stop wasting memory. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-03-31 12:03:48 -04:00
Joe Thornber	3e45c91e5c	dm cache policy mq: track entries hit this 'tick' via sentinel objects A sentinel object is placed on each level of the multiqueues. When an object is hit it is requeued behind the sentinel. When the tick is incremented we iterate through all objects behind the sentinel and update the hit_count, then reposition the sentinel at the very back. This saves memory by avoiding tracking the tick explicitly for every struct entry object in the multiqueues. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-03-31 12:03:48 -04:00
Joe Thornber	c74ffc5c63	dm cache policy mq: remove queue_shift_down() queue_shift_down() didn't adjust the hit_counts to the new levels, so it just had the effect of scrambling levels. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-03-31 12:03:48 -04:00
Joe Thornber	75da39bf25	dm cache policy mq: keep track of the number of entries in a multiqueue Small optimisation, now queue_empty() doesn't need to walk all levels of the multiqueue. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-03-31 12:03:48 -04:00
Mike Snitzer	ac1f9ef211	dm log userspace: split flush_entry_pool to be per dirty-log Use a single slab cache to allocate a mempool for each dirty-log. This _should_ eliminate DM's need for io_schedule_timeout() in mempool_alloc(); so io_schedule() should be sufficient now. Also, rename struct flush_entry to dm_dirty_log_flush_entry to allow KMEM_CACHE() to create a meaningful global name for the slab cache. Also, eliminate some holes in struct log_c by rearranging members. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Heinz Mauelshagen <heinzm@redhat.com>	2015-03-31 12:03:47 -04:00
Linus Torvalds	be8a9bc633	Merge tag 'dm-4.0-fix-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fix from Mike Snitzer: "Fix DM core device cleanup regression -- due to a latent race that was exposed by the bdi changes that were introduced during the 4.0 merge" * tag 'dm-4.0-fix-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: fix add_disk() NULL pointer due to race with free_dev()	2015-03-26 14:53:47 -07:00
Goldwyn Rodrigues	124eb761ed	md: Fix bitmap offset calculations The calculations of bitmap offset is incorrect with respect to bits to bytes conversion. Also, remove an irrelevant duplicate message. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NeilBrown <neilb@suse.de>	2015-03-25 13:07:55 +11:00
Mike Snitzer	63a4f065ec	dm: fix add_disk() NULL pointer due to race with free_dev() Commit `c4db59d31e` ("fs: don't reassign dirty inodes to default_backing_dev_info") exposed DM to a latent race in free_dev() vs add_disk() in relation to management of the device's minor number. Fix this by refactoring free_dev() to match cleanup order of the alloc_dev() error path. Move cleanup of the gendisk, queue, and bdev to _before_ the cleanup of the idr managed minor number. Also, purely due to cleanup that fell out during the free_dev() audit: - adjust dm_blk_close() to access the gendisk's private_data under the _minor_lock spinlock. - move __dm_destroy()'s dm_get_live_table() call out from under the _minor_lock spinlock. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1202449 Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2015-03-23 18:14:00 -04:00

... 2 3 4 5 6 ...

3676 Commits