linux-apfs

mirror of https://github.com/linux-apfs/linux-apfs.git synced 2026-05-01 15:00:59 -07:00

Author	SHA1	Message	Date
Tejun Heo	a9520cd6f2	blkcg: make blkcg_policy methods take a pointer to blkcg_policy_data The newly added ->pd_alloc_fn() and ->pd_free_fn() deal with pd (blkg_policy_data) while the older ones use blkg (blkcg_gq). As using blkg doesn't make sense for ->pd_alloc_fn() and after allocation pd can always be mapped to blkg and given that these are policy-specific methods, it makes sense to converge on pd. This patch makes all methods deal with pd instead of blkg. Most conversions are trivial. In blk-cgroup.c, a couple method invocation sites now test whether pd exists instead of policy state for consistency. This shouldn't cause any behavioral differences. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:17 -07:00
Tejun Heo	b2ce2643cc	blk-throttle: clean up blkg_policy_data alloc/init/exit/free methods With the recent addition of alloc and free methods, things became messier. This patch reorganizes them according to the followings. * ->pd_alloc_fn() Responsible for allocation and static initializations - the ones which can be done independent of where the pd might be attached. * ->pd_init_fn() Initializations which require the knowledge of where the pd is attached. * ->pd_free_fn() The counter part of pd_alloc_fn(). Static de-init and freeing. This leaves ->pd_exit_fn() without any users. Removed. While at it, collapse an one liner function throtl_pd_exit(), which has only one user, into its user. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:17 -07:00
Tejun Heo	4fb72036fb	blk-throttle: remove asynchrnous percpu stats allocation mechanism Because percpu allocator couldn't do non-blocking allocations, blk-throttle was forced to implement an ad-hoc asynchronous allocation mechanism for its percpu stats for cases where blkg's (blkcg_gq's) are allocated from an IO path without sleepable context. Now that percpu allocator can handle gfp_mask and blkg_policy_data alloc / free are handled by policy methods, the ad-hoc asynchronous allocation mechanism can be replaced with direct allocation from tg_stats_alloc_fn(). Rit it out. This ensures that an active throtl_grp always has valid non-NULL ->stats_cpu. Remove checks on it. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:16 -07:00
Tejun Heo	001bea73e7	blkcg: replace blkcg_policy->pd_size with ->pd_alloc/free_fn() methods A blkg (blkcg_gq) represents the relationship between a cgroup and request_queue. Each active policy has a pd (blkg_policy_data) on each blkg. The pd's were allocated by blkcg core and each policy could request to allocate extra space at the end by setting blkcg_policy->pd_size larger than the size of pd. This is a bit unusual but was done this way mostly to simplify error handling and all the existing use cases could be handled this way; however, this is becoming too restrictive now that percpu memory can be allocated without blocking. This introduces two new mandatory blkcg_policy methods - pd_alloc_fn() and pd_free_fn() - which are used to allocate and release pd for a given policy. As pd allocation is now done from policy side, it can simply allocate a larger area which embeds pd at the beginning. This change makes ->pd_size pointless. Removed. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:16 -07:00
Tejun Heo	3e41871046	blkcg: make blkcg_activate_policy() allow NULL ->pd_init_fn blkg_create() allows NULL ->pd_init_fn() but blkcg_activate_policy() doesn't. As both in-kernel policies implement ->pd_init_fn, it currently doesn't break anything. Update blkcg_activate_policy() so that its behavior is consistent with blkg_create(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:16 -07:00
Tejun Heo	4c55f4f9ad	blkcg: restructure blkg_policy_data allocation in blkcg_activate_policy() When a policy gets activated, it needs to allocate and install its policy data on all existing blkg's (blkcg_gq's). Because blkg iteration is protected by a spinlock, it currently counts the total number of blkg's in the system, allocates the matching number of policy data on a list and installs them during a single iteration. This can be simplified by using speculative GFP_NOWAIT allocations while iterating and falling back to a preallocated policy data on failure. If the preallocated one has already been consumed, it releases the lock, preallocate with GFP_KERNEL and then restarts the iteration. This can be a bit more expensive than before but policy activation is a very cold path and shouldn't matter. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:16 -07:00
Tejun Heo	bc915e61cd	blkcg: remove unnecessary blkcg_root handling from css_alloc/free paths blkcg_css_alloc() bypasses policy data allocation and blkcg_css_free() bypasses policy data and blkcg freeing for blkcg_root. There's no reason to to treat policy data any differently for blkcg_root. If the root css gets allocated after policies are registered, policy registration path will add policy data; otherwise, the alloc path will. The free path isn't never invoked for root csses. This patch removes the unnecessary special handling of blkcg_root from css_alloc/free paths. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:16 -07:00
Tejun Heo	994b783274	blkcg: use blkg_free() in blkcg_init_queue() failure path When blkcg_init_queue() fails midway after creating a new blkg, it performs kfree() directly; however, this doesn't free the policy data areas. Make it use blkg_free() instead. In turn, blkg_free() is updated to handle root request_list special case. While this fixes a possible memory leak, it's on an unlikely failure path of an already cold path and the size leaked per occurrence is miniscule too. I don't think it needs to be tagged for -stable. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:16 -07:00
Tejun Heo	401efbf835	blkcg: remove unnecessary request_list->blkg NULL test in blk_put_rl() Since `ec13b1d6f0` ("blkcg: always create the blkcg_gq for the root blkcg"), a request_list always has its blkg associated. Drop unnecessary rl->blkg NULL test from blk_put_rl(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:16 -07:00
Tejun Heo	60a837077e	cfq-iosched: charge async IOs to the appropriate blkcg's instead of the root Up until now, all async IOs were queued to async queues which are shared across the whole request_queue, which means that blkcg resource control is completely void on async IOs including all writeback IOs. It was done this way because writeback didn't support writeback and there was no way of telling which writeback IO belonged to which cgroup; however, writeback recently became cgroup aware and writeback bio's are sent down properly tagged with the blkcg's to charge them against. This patch makes async cfq_queues per-cfq_cgroup instead of per-cfq_data so that each async IO is charged to the blkcg that it was tagged for instead of unconditionally attributing it to root. * cfq_data->async_cfqq and ->async_idle_cfqq are moved to cfq_group and alloc / destroy paths are updated accordingly. * cfq_link_cfqq_cfqg() no longer overrides @cfqg to root for async queues. * check_blkcg_changed() now also invalidates async queues as they no longer stay the same across cgroups. After this patch, cfq's proportional IO control through blkio.weight works correctly when cgroup writeback is in use. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:16 -07:00
Tejun Heo	d4aad7ff04	cfq-iosched: fold cfq_find_alloc_queue() into cfq_get_queue() cfq_find_alloc_queue() checks whether a queue actually needs to be allocated, which is unnecessary as its sole caller, cfq_get_queue(), only calls it if so. Also, the oom queue fallback logic is scattered between cfq_get_queue() and cfq_find_alloc_queue(). There really isn't much going on in the latter and things can be made simpler by folding it into cfq_get_queue(). This patch collapses cfq_find_alloc_queue() into cfq_get_queue(). The change is fairly straight-forward with one exception - async_cfqq is now initialized to NULL and the "!is_sync" test in the last if conditional is replaced with "async_cfqq" test. This is because gcc (5.1.1) gets confused for some reason and warns that async_cfqq may be used uninitialized otherwise. Oh well, the code isn't necessarily worse this way. This patch doesn't cause any functional difference. v2: Updated to reflect GFP_ATOMIC -> GPF_NOWAIT. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:16 -07:00
Tejun Heo	322731ed0d	cfq-iosched: move cfq_group determination from cfq_find_alloc_queue() to cfq_get_queue() This is necessary for making async cfq_cgroups per-cfq_group instead of per-cfq_data. While this change makes cfq_get_queue() perform RCU locking and look up cfq_group even when it reuses async queue, the extra overhead is extremely unlikely to be noticeable given that this is already sitting behind cic->cfqq[] cache and the overall cost of cfq operation. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:16 -07:00
Tejun Heo	2da8de0bb7	cfq-iosched: remove @gfp_mask from cfq_find_alloc_queue() Even when allocations fail, cfq_find_alloc_queue() always returns a valid cfq_queue by falling back to the oom cfq_queue. As such, there isn't much point in taking @gfp_mask and trying "harder" if __GFP_WAIT is set. GFP_NOWAIT allocations don't fail often and even when they do the degraded behavior is acceptable and temporary. After all, the only reason get_request(), which ultimately determines the gfp_mask, cares about __GFP_WAIT is to guarantee request allocation, assuming IO forward progress, for callers which are willing to wait. There's no reason for cfq_find_alloc_queue() to behave differently on __GFP_WAIT when it already has a fallback mechanism. Remove @gfp_mask from cfq_find_alloc_queue() and propagate the changes to its callers. This simplifies the function quite a bit and will help making async queues per-cfq_group. v2: Updated to reflect GFP_ATOMIC -> GPF_NOWAIT. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:16 -07:00
Tejun Heo	d93a11f1cd	blkcg, cfq-iosched: use GFP_NOWAIT instead of GFP_ATOMIC for non-critical allocations blkcg performs several allocations to track IOs per cgroup and enforce resource control. Most of these allocations are performed lazily on demand in the IO path and thus can't involve reclaim path. Currently, these allocations use GFP_ATOMIC; however, blkcg can gracefully deal with occassional failures of these allocations by punting IOs to the root cgroup and there's no reason to reach into the emergency reserve. This patch replaces GFP_ATOMIC with GFP_NOWAIT for the following allocations. * bdi_writeback_congested and blkcg_gq allocations in blkg_create(). * radix tree node allocations for blkcg->blkg_tree. * cfq_queue allocation on ioprio changes. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-and-Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Suggested-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:16 -07:00
Tejun Heo	563180a44b	cfq-iosched: minor cleanups * Some were accessing cic->cfqq[] directly. Always use cic_to_cfqq() and cic_set_cfqq(). * check_ioprio_changed() doesn't need to verify cfq_get_queue()'s return for NULL. It's always non-NULL. Simplify accordingly. This patch doesn't cause any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:16 -07:00
Tejun Heo	bce6133b09	cfq-iosched: fix oom cfq_queue ref leak in cfq_set_request() If the cfq_queue cached in cfq_io_cq is the oom one, cfq_set_request() replaces it by invoking cfq_get_queue() again without putting the oom queue leaking the reference it was holding. While oom queues are not released through reference counting, they're still reference counted and this can theoretically lead to the reference count overflowing and incorrectly invoke the usual release path on it. Fix it by making cfq_set_request() put the ref it was holding. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:16 -07:00
Tejun Heo	95e5d6f626	cfq-iosched: fix async oom queue handling Async cfqq's (cfq_queue's) are shared across cfq_data. When cfq_get_queue() obtains a new queue from cfq_find_alloc_queue(), it stashes the pointer in cfq_data and reuses it from then on; however, the function doesn't consider that cfq_find_alloc_queue() may return the oom_cfqq under memory pressure and installs the returned queue unconditionally. If the oom_cfqq is installed as an async cfqq, cfq_set_request() will continue calling cfq_get_queue() hoping to replace it with a proper queue; however, cfq_get_queue() will keep returning the cached queue for the slot - the oom_cfqq. Fix it by skipping caching if the queue is the oom one. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:15 -07:00
Tejun Heo	4ebc1c61d6	cfq-iosched: simplify control flow in cfq_get_queue() cfq_get_queue()'s control flow looks like the following. async_cfqq = NULL; cfqq = NULL; if (!is_sync) { ... async_cfqq = ...; cfqq = async_cfqq; } if (!cfqq) cfqq = ...; if (!is_sync && !(async_cfqq)) ...; The only thing the local variable init, the second if, and the async_cfqq test in the third if achieves is to skip cfqq creation and installation if async_cfqq was already non-NULL. This is needlessly complicated with different tests examining the same condition. Simplify it to the following. if (!is_sync) { ... async_cfqq = ...; cfqq = async_cfqq; if (cfqq) goto out; } cfqq = ...; if (!is_sync) ...; out: Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:15 -07:00
Tejun Heo	5634cc2aa9	writeback: update writeback tracepoints to report cgroup The following tracepoints are updated to report the cgroup used during cgroup writeback. * writeback_write_inode[_start] * writeback_queue * writeback_exec * writeback_start * writeback_written * writeback_wait * writeback_nowork * writeback_wake_background * wbc_writepage * writeback_queue_io * bdi_dirty_ratelimit * balance_dirty_pages * writeback_sb_inodes_requeue * writeback_single_inode[_start] Note that writeback_bdi_register is separated out from writeback_class as reporting cgroup doesn't make sense to it. Tracepoints which take bdi are updated to take bdi_writeback instead. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:15 -07:00
Tejun Heo	9acee9c551	kernfs: implement kernfs_path_len() Add a function to determine the path length of a kernfs node. This for now will be used by writeback tracepoint updates. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:15 -07:00
Tejun Heo	60292bcc1b	writeback: explain why @inode is allowed to be NULL for inode_congested() Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:15 -07:00
Tejun Heo	8a1270cda7	writeback: remove wb_writeback_work->single_wait/done wb_writeback_work->single_wait/done are used for the wait mechanism for synchronous wb_work (wb_writeback_work) items which are issued when bdi_split_work_to_wbs() fails to allocate memory for asynchronous wb_work items; however, there's no reason to use a separate wait mechanism for this. bdi_split_work_to_wbs() can simply use on-stack fallback wb_work item and separate wb_completion to wait for it. This patch removes wb_work->single_wait/done and the related code and make bdi_split_work_to_wbs() use on-stack fallback wb_work and wb_completion instead. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:15 -07:00
Tejun Heo	1ed8d48c57	writeback: bdi_for_each_wb() iteration is memcg ID based not blkcg wb's (bdi_writeback's) are currently keyed by memcg ID; however, in an earlier implementation, wb's were keyed by blkcg ID. bdi_for_each_wb() walks bdi->cgwb_tree in the ascending ID order and allows iterations to start from an arbitrary ID which is used to interrupt and resume iterations. Unfortunately, while changing wb to be keyed by memcg ID instead of blkcg, bdi_for_each_wb() was missed and is still assuming that wb's are keyed by blkcg ID. This doesn't affect iterations which don't get interrupted but bdi_split_work_to_wbs() makes use of iteration resuming on allocation failures and thus may incorrectly skip or repeat wb's. Fix it by changing bdi_for_each_wb() to take memcg IDs instead of blkcg IDs and updating bdi_split_work_to_wbs() accordingly. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-08-18 15:49:15 -07:00
Jens Axboe	11743ee047	Merge branch 'for-4.3-unified-base' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup into for-4.3/blkcg	2015-08-18 15:49:11 -07:00
Tejun Heo	3e1d2eed39	cgroup: introduce cgroup_subsys->legacy_name This allows cgroup subsystems to use a different name on the unified hierarchy. cgroup_subsys->name is used on the unified hierarchy, ->legacy_name elsewhere. If ->legacy_name is not explicitly set, it's automatically set to ->name and the userland visible behavior remains unchanged. v2: Make parse_cgroupfs_options() only consider ->legacy_name as mount options are used only on legacy hierarchies. Suggested by Li Zefan. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: cgroups@vger.kernel.org	2015-08-18 13:58:16 -07:00

1 2 3 4 5 ...

534132 Commits