linux-apfs

mirror of https://github.com/linux-apfs/linux-apfs.git synced 2026-05-01 15:00:59 -07:00

Author	SHA1	Message	Date
Ming Lei	25d8be77e1	block: move bio_alloc_pages() to bcache bcache is the only user of bio_alloc_pages(), so move this function into bcache, and avoid it being misused in the future. Also rename it to bch_bio_allo_pages() since it is bcache only. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-06 09:18:00 -07:00
Ming Lei	c2421edf5f	bcache: comment on direct access to bvec table All direct access to bvec table are safe even after multipage bvec is supported. Cc: linux-bcache@vger.kernel.org Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-06 09:18:00 -07:00
Ming Lei	8f50e35815	dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE For BIO based DM, some targets aren't ready for dealing with bigger incoming bio than 1Mbyte, such as crypt target. Cc: Mike Snitzer <snitzer@redhat.com> Cc:dm-devel@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-06 09:18:00 -07:00
Ming Lei	263663cd3c	block: convert to bio_first_bvec_all & bio_first_page_all This patch converts to bio_first_bvec_all() & bio_first_page_all() for retrieving the 1st bvec/page, and prepares for supporting multipage bvec. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-01-06 09:18:00 -07:00
Linus Torvalds	ee1b43ece1	Merge tag 'for-4.15/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - fix a particularly nasty DM core bug in a 4.15 refcount_t conversion. - fix various targets to dm_register_target after module __init resources created; otherwise racing lvm2 commands could result in a NULL pointer during initialization of associated DM kernel module. - fix regression in bio-based DM multipath queue_if_no_path handling. - fix DM bufio's shrinker to reclaim more than one buffer per scan. * tag 'for-4.15/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm bufio: fix shrinker scans when (nr_to_scan < retain_target) dm mpath: fix bio-based multipath queue_if_no_path handling dm: fix various targets to dm_register_target after module __init resources created dm table: fix regression from improper dm_dev_internal.count refcount_t conversion	2017-12-15 12:53:37 -08:00
Suren Baghdasaryan	fbc7c07ec2	dm bufio: fix shrinker scans when (nr_to_scan < retain_target) When system is under memory pressure it is observed that dm bufio shrinker often reclaims only one buffer per scan. This change fixes the following two issues in dm bufio shrinker that cause this behavior: 1. ((nr_to_scan - freed) <= retain_target) condition is used to terminate slab scan process. This assumes that nr_to_scan is equal to the LRU size, which might not be correct because do_shrink_slab() in vmscan.c calculates nr_to_scan using multiple inputs. As a result when nr_to_scan is less than retain_target (64) the scan will terminate after the first iteration, effectively reclaiming one buffer per scan and making scans very inefficient. This hurts vmscan performance especially because mutex is acquired/released every time dm_bufio_shrink_scan() is called. New implementation uses ((LRU size - freed) <= retain_target) condition for scan termination. LRU size can be safely determined inside __scan() because this function is called after dm_bufio_lock(). 2. do_shrink_slab() uses value returned by dm_bufio_shrink_count() to determine number of freeable objects in the slab. However dm_bufio always retains retain_target buffers in its LRU and will terminate a scan when this mark is reached. Therefore returning the entire LRU size from dm_bufio_shrink_count() is misleading because that does not represent the number of freeable objects that slab will reclaim during a scan. Returning (LRU size - retain_target) better represents the number of freeable objects in the slab. This way do_shrink_slab() returns 0 when (LRU size < retain_target) and vmscan will not try to scan this shrinker avoiding scans that will not reclaim any memory. Test: tested using Android device running <AOSP>/system/extras/alloc-stress that generates memory pressure and causes intensive shrinker scans Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-12-08 10:54:25 -05:00
Mike Snitzer	c1fd0abee0	dm mpath: fix bio-based multipath queue_if_no_path handling Commit `ca5beb76` ("dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH") caused bio-based DM-multipath to fail mptest's "test_02_sdev_delete". Restoring the logic that existed prior to commit `ca5beb76` fixes this bio-based DM-multipath regression. Also verified all mptest tests pass with request-based DM-multipath. This commit effectively reverts commit `ca5beb76` -- but it does so without reintroducing the need to take the m->lock spinlock in must_push_back_{rq,bio}. Fixes: `ca5beb76` ("dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH") Cc: stable@vger.kernel.org # 4.12+ Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-12-08 10:49:40 -05:00
monty_pavel@sina.com	7e6358d244	dm: fix various targets to dm_register_target after module __init resources created A NULL pointer is seen if two concurrent "vgchange -ay -K <vg name>" processes race to load the dm-thin-pool module: PID: 25992 TASK: ffff883cd7d23500 CPU: 4 COMMAND: "vgchange" #0 [ffff883cd743d600] machine_kexec at ffffffff81038fa9 0000001 [ffff883cd743d660] crash_kexec at ffffffff810c5992 0000002 [ffff883cd743d730] oops_end at ffffffff81515c90 0000003 [ffff883cd743d760] no_context at ffffffff81049f1b 0000004 [ffff883cd743d7b0] __bad_area_nosemaphore at ffffffff8104a1a5 0000005 [ffff883cd743d800] bad_area at ffffffff8104a2ce 0000006 [ffff883cd743d830] __do_page_fault at ffffffff8104aa6f 0000007 [ffff883cd743d950] do_page_fault at ffffffff81517bae 0000008 [ffff883cd743d980] page_fault at ffffffff81514f95 [exception RIP: kmem_cache_alloc+108] RIP: ffffffff8116ef3c RSP: ffff883cd743da38 RFLAGS: 00010046 RAX: 0000000000000004 RBX: ffffffff81121b90 RCX: ffff881bf1e78cc0 RDX: 0000000000000000 RSI: 00000000000000d0 RDI: 0000000000000000 RBP: ffff883cd743da68 R8: ffff881bf1a4eb00 R9: 0000000080042000 R10: 0000000000002000 R11: 0000000000000000 R12: 00000000000000d0 R13: 0000000000000000 R14: 00000000000000d0 R15: 0000000000000246 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 0000009 [ffff883cd743da70] mempool_alloc_slab at ffffffff81121ba5 0000010 [ffff883cd743da80] mempool_create_node at ffffffff81122083 0000011 [ffff883cd743dad0] mempool_create at ffffffff811220f4 0000012 [ffff883cd743dae0] pool_ctr at ffffffffa08de049 [dm_thin_pool] 0000013 [ffff883cd743dbd0] dm_table_add_target at ffffffffa0005f2f [dm_mod] 0000014 [ffff883cd743dc30] table_load at ffffffffa0008ba9 [dm_mod] 0000015 [ffff883cd743dc90] ctl_ioctl at ffffffffa0009dc4 [dm_mod] The race results in a NULL pointer because: Process A (vgchange -ay -K): a. send DM_LIST_VERSIONS_CMD ioctl; b. pool_target not registered; c. modprobe dm_thin_pool and wait until end. Process B (vgchange -ay -K): a. send DM_LIST_VERSIONS_CMD ioctl; b. pool_target registered; c. table_load->dm_table_add_target->pool_ctr; d. _new_mapping_cache is NULL and panic. Note: 1. process A and process B are two concurrent processes. 2. pool_target can be detected by process B but _new_mapping_cache initialization has not ended. To fix dm-thin-pool, and other targets (cache, multipath, and snapshot) with the same problem, simply dm_register_target() after all resources created during module init (as labelled with __init) are finished. Cc: stable@vger.kernel.org Signed-off-by: monty <monty_pavel@sina.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-12-04 10:23:10 -05:00
Mike Snitzer	afc567a497	dm table: fix regression from improper dm_dev_internal.count refcount_t conversion Multiple refcounts are needed if the device was already added. The micro-optimization of setting the refcount to 1 on first added (rather than fall thru to a common refcount_inc) lost sight of the fact that the refcount_inc is also needed for the case when the device already exists and the mode need not be upgraded. Fixes: `2a0b4682e0` ("dm: convert dm_dev_internal.count from atomic_t to refcount_t") Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-12-04 10:23:10 -05:00
Shaohua Li	18022a1bd3	md/raid1/10: add missed blk plug flush_pending_writes isn't always called with block plug, so add it, and plug works in nested way. Signed-off-by: Shaohua Li <shli@fb.com>	2017-12-01 12:19:48 -08:00
Nate Dailey	d2e2ec8222	md: limit mdstat resync progress to max_sectors There is a small window near the end of md_do_sync where mddev->curr_resync can be equal to MaxSector. If status_resync is called during this window, the resulting /proc/mdstat output contains a HUGE number of = signs due to the very large curr_resync: Personalities : [raid1] md123 : active raid1 sdd3[2] sdb3[0] 204736 blocks super 1.0 [2/1] [U_] [===================================================================== ... (82 MB more) ... ================>] recovery =429496729.3% (9223372036854775807/204736) finish=0.2min speed=12796K/sec bitmap: 0/1 pages [0KB], 65536KB chunk Modify status_resync to ensure the resync variable doesn't exceed the array's max_sectors. Signed-off-by: Nate Dailey <nate.dailey@stratus.com> Acked-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-12-01 12:19:47 -08:00
Song Liu	ff35f58e8f	md/r5cache: move mddev_lock() out of r5c_journal_mode_set() r5c_journal_mode_set() is called by r5c_journal_mode_store() and raid_ctr() in dm-raid. We don't need mddev_lock() when calling from raid_ctr(). This patch fixes this by moves the mddev_lock() to r5c_journal_mode_store(). Cc: stable@vger.kernel.org (v4.13+) Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-12-01 11:27:32 -08:00
bingjingc	aff69d89bd	md/raid5: correct degraded calculation in raid5_error When disk failure occurs on new disks for reshape, mddev->degraded is not calculated correctly. Faulty bit of the failure device is not set before raid5_calc_degraded(conf). mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/loop[012] mdadm /dev/md0 -a /dev/loop3 mdadm /dev/md0 --grow -n4 mdadm /dev/md0 -f /dev/loop3 # simulating disk failure cat /sys/block/md0/md/degraded # it outputs 0, but it should be 1. However, mdadm -D /dev/md0 will show that it is degraded. It's a bug. It can be fixed by moving the resources raid5_calc_degraded() depends on before it. Reported-by: Roy Chung <roychung@synology.com> Reviewed-by: Alex Wu <alexwu@synology.com> Signed-off-by: BingJing Chang <bingjingc@synology.com> Signed-off-by: Shaohua Li <shli@fb.com>	2017-12-01 11:27:32 -08:00
Michael Lyle	6c4ca1e36c	bcache: check return value of register_shrinker register_shrinker is now __must_check, so check it to kill a warning. Caller of bch_btree_cache_alloc in super.c appropriately checks return value so this is fully plumbed through. This V2 fixes checkpatch warnings and improves the commit description, as I was too hasty getting the previous version out. Signed-off-by: Michael Lyle <mlyle@lyle.org> Reviewed-by: Vojtech Pavlik <vojtech@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-11-24 16:23:01 -07:00
Rui Hua	e393aa2446	bcache: recover data from backing when data is clean When we send a read request and hit the clean data in cache device, there is a situation called cache read race in bcache(see the commit in the tail of cache_look_up(), the following explaination just copy from there): The bucket we're reading from might be reused while our bio is in flight, and we could then end up reading the wrong data. We guard against this by checking (in bch_cache_read_endio()) if the pointer is stale again; if so, we treat it as an error (s->iop.error = -EINTR) and reread from the backing device (but we don't pass that error up anywhere) It should be noted that cache read race happened under normal circumstances, not the circumstance when SSD failed, it was counted and shown in /sys/fs/bcache/XXX/internal/cache_read_races. Without this patch, when we use writeback mode, we will never reread from the backing device when cache read race happened, until the whole cache device is clean, because the condition (s->recoverable && (dc && !atomic_read(&dc->has_dirty))) is false in cached_dev_read_error(). In this situation, the s->iop.error(= -EINTR) will be passed up, at last, user will receive -EINTR when it's bio end, this is not suitable, and wield to up-application. In this patch, we use s->read_dirty_data to judge whether the read request hit dirty data in cache device, it is safe to reread data from the backing device when the read request hit clean data. This can not only handle cache read race, but also recover data when failed read request from cache device. [edited by mlyle to fix up whitespace, commit log title, comment spelling] Fixes: `d59b237959` ("bcache: only permit to recovery read error when cache device is clean") Cc: <stable@vger.kernel.org> # 4.14 Signed-off-by: Hua Rui <huarui.dev@gmail.com> Reviewed-by: Michael Lyle <mlyle@lyle.org> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-11-24 16:22:59 -07:00
Huacai Chen	cf33c1ee52	bcache: Fix building error on MIPS This patch try to fix the building error on MIPS. The reason is MIPS has already defined the PTR macro, which conflicts with the PTR macro in include/uapi/linux/bcache.h. [fixed by mlyle: corrected a line-length issue] Cc: stable@vger.kernel.org Signed-off-by: Huacai Chen <chenhc@lemote.com> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-11-24 16:22:58 -07:00
Tang Junhui	bb22cafd75	bcache: add a comment in journal bucket reading Journal bucket is a circular buffer, the bucket can be like YYYNNNYY, which means the first valid journal in the 7th bucket, and the latest valid journal in third bucket, in this case, if we do not try we the zero index first, We may get a valid journal in the 7th bucket, then we call find_next_bit(bitmap,ca->sb.njournal_buckets, l + 1) to get the first invalid bucket after the 7th bucket, because all these buckets is valid, so no bit 1 in bitmap, thus find_next_bit() function would return with ca->sb.njournal_buckets (8). So, after that, bcache only read journal in 7th and 8the bucket, the first to the third buckets are lost. So, it is important to let developer know that, we need to try the zero index at first in the hash-search, and avoid any breaks in future's code modification. [ML: Fixed whitespace & formatting & file permissions] Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Signed-off-by: Michael Lyle <mlyle@lyle.org> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-11-24 16:22:55 -07:00
Linus Torvalds	06ede5f608	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull more block layer updates from Jens Axboe: "A followup pull request, with some parts that either needed a bit more testing before going in, merge sync, or just later arriving fixes. This contains: - Timer related updates from Kees. These were purposefully delayed since I didn't want to pull in a later v4.14-rc tag to my block tree. - ide-cd prep sense buffer fix from Bart. Also delayed, as not to clash with the late fix we put into 4.14-rc. - Small BFQ updates series from Luca and Paolo. - Single nvmet fix from James, fixing a non-functional case there. - Bio fast clone fix from Michael, which made bcache return the wrong data for some cases. - Legacy IO path regression hang fix from Ming" * 'for-linus' of git://git.kernel.dk/linux-block: bio: ensure __bio_clone_fast copies bi_partno nvmet_fc: fix better length checking block: wake up all tasks blocked in get_request() block, bfq: move debug blkio stats behind CONFIG_DEBUG_BLK_CGROUP block, bfq: update blkio stats outside the scheduler lock block, bfq: add missing invocations of bfqg_stats_update_io_add/remove doc, block, bfq: update max IOPS sustainable with BFQ ide: Make ide_cdrom_prep_fs() initialize the sense buffer pointer md: Convert timers to use timer_setup() block: swim3: Convert timers to use timer_setup() block/aoe: Convert timers to use timer_setup() amifloppy: Convert timers to use timer_setup() block/floppy: Convert callback to pass timer_list	2017-11-17 10:56:56 -08:00
Linus Torvalds	adeba81ac2	Merge tag 'for-4.15/dm-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull more device mapper updates from Mike Snitzer: "Given your expected travel I figured I'd get these fixes to you sooner rather than later. - a DM multipath stable@ fix to silence an annoying error message that isn't _really_ an error - a DM core @stable fix for discard support that was enabled for an entire DM device despite only having partial support for discards due to a mix of discard capabilities across the underlying devices. - a couple other DM core discard fixes. - a DM bufio @stable fix that resolves a 32-bit overflow" * tag 'for-4.15/dm-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm bufio: fix integer overflow when limiting maximum cache size dm: clear all discard attributes in queue_limits when discards are disabled dm: do not set 'discards_supported' in targets that do not need it dm: discard support requires all targets in a table support discards dm mpath: remove annoying message of 'blk_get_request() returned -11'	2017-11-17 09:40:12 -08:00
Eric Biggers	74d4108d9e	dm bufio: fix integer overflow when limiting maximum cache size The default max_cache_size_bytes for dm-bufio is meant to be the lesser of 25% of the size of the vmalloc area and 2% of the size of lowmem. However, on 32-bit systems the intermediate result in the expression (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100 overflows, causing the wrong result to be computed. For example, on a 32-bit system where the vmalloc area is 520093696 bytes, the result is 1174405 rather than the expected 130023424, which makes the maximum cache size much too small (far less than 2% of lowmem). This causes severe performance problems for dm-verity users on affected systems. Fix this by using mult_frac() to correctly multiply by a percentage. Do this for all places in dm-bufio that multiply by a percentage. Also replace (VMALLOC_END - VMALLOC_START) with VMALLOC_TOTAL, which contrary to the comment is now defined in include/linux/vmalloc.h. Depends-on: `9993bc635` ("sched/x86: Fix overflow in cyc2ns_offset") Fixes: `95d402f057` ("dm: add bufio") Cc: <stable@vger.kernel.org> # v3.2+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-11-16 16:49:57 -05:00
Mike Snitzer	5d47c89f29	dm: clear all discard attributes in queue_limits when discards are disabled Otherwise, it can happen that the QUEUE_FLAG_DISCARD isn't set but the various discard attributes (which get exposed via sysfs) may be set. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-11-16 16:33:55 -05:00
Mike Snitzer	7dea378b23	dm: do not set 'discards_supported' in targets that do not need it The DM target's 'discards_supported' flag is intended to act as an override. Meaning, even if the underlying storage doesn't support discards the DM target will. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-11-16 16:33:54 -05:00
Mike Snitzer	8a74d29d54	dm: discard support requires all targets in a table support discards A DM device with a mix of discard capabilities (due to some underlying devices not having discard support) _should_ just return -EOPNOTSUPP for the region of the device that doesn't support discards (even if only by way of the underlying driver formally not supporting discards). BUT, that does ask the underlying driver to handle something that it never advertised support for. In doing so we're exposing users to the potential for a underlying disk driver hanging if/when a discard is issued a the device that is incapable and never claimed to support discards. Fix this by requiring that each DM target in a DM table provide discard support as a prereq for a DM device to advertise support for discards. This may cause some configurations that were happily supporting discards (even in the face of a mix of discard support) to stop supporting discards -- but the risk of users hitting driver hangs, and forced reboots, outweighs supporting those fringe mixed discard configurations. Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-11-16 16:33:53 -05:00
Ming Lei	9dc112e2da	dm mpath: remove annoying message of 'blk_get_request() returned -11' It is very normal to see allocation failure, especially with blk-mq request_queues, so it's unnecessary to report this error and annoy people. In practice this 'blk_get_request() returned -11' error gets logged quite frequently when a blk-mq DM multipath device sees heavy IO. This change is marked for stable@ because the annoying message in question was included in stable@ commit `7083abbbf`. Fixes: `7083abbbf` ("dm mpath: avoid that path removal can trigger an infinite loop") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2017-11-16 16:29:43 -05:00
Linus Torvalds	1be2172e96	Merge tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull module updates from Jessica Yu: "Summary of modules changes for the 4.15 merge window: - treewide module_param_call() cleanup, fix up set/get function prototype mismatches, from Kees Cook - minor code cleanups" * tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: Do not paper over type mismatches in module_param_call() treewide: Fix function prototypes for module_param_call() module: Prepare to convert all module_param_call() prototypes kernel/module: Delete an error message for a failed memory allocation in add_module_usage()	2017-11-15 13:46:33 -08:00

1 2 3 4 5 ...

5126 Commits