linux-apfs

mirror of https://github.com/linux-apfs/linux-apfs.git synced 2026-05-01 15:00:59 -07:00

Author	SHA1	Message	Date
Li Zefan	084457f284	cgroup: fix umount vs cgroup_cfts_commit() race cgroup_cfts_commit() uses dget() to keep cgroup alive after cgroup_mutex is dropped, but dget() won't prevent cgroupfs from being umounted. When the race happens, vfs will see some dentries with non-zero refcnt while umount is in process. Keep running this: mount -t cgroup -o blkio xxx /cgroup umount /cgroup And this: modprobe cfq-iosched rmmod cfs-iosched After a while, the BUG() in shrink_dcache_for_umount_subtree() may be triggered: BUG: Dentry xxx{i=0,n=blkio.yyy} still in use (1) [umount of cgroup cgroup] Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org	2013-06-18 09:04:30 -07:00
Tejun Heo	6db8e85c5c	cgroup: disallow rename(2) if sane_behavior cgroup's rename(2) isn't a proper migration implementation - it can't move the cgroup to a different parent in the hierarchy. All it can do is swapping the name string for that cgroup. This isn't useful and can mislead users to think that cgroup supports proper cgroup-level migration. Disallow rename(2) if sane_behavior. v2: Fail with -EPERM instead of -EINVAL so that it matches the vfs return value when ->rename is not implemented as suggested by Li. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-18 08:14:23 -07:00
Tejun Heo	f63674fd0d	cgroup: update sane_behavior documentation `f12dc02014` ("cgroup: mark "tasks" cgroup file as insane") and `cc5943a781` ("cgroup: mark "notify_on_release" and "release_agent" cgroup files insane") forgot to update the changed behavior documentation in cgroup.h. Update it. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-06-13 20:02:26 -07:00
Tejun Heo	d3daf28da1	cgroup: use percpu refcnt for cgroup_subsys_states A css (cgroup_subsys_state) is how each cgroup is represented to a controller. As such, it can be used in hot paths across the various subsystems different controllers are associated with. One of the common operations is reference counting, which up until now has been implemented using a global atomic counter and can have significant adverse impact on scalability. For example, css refcnt can be gotten and put multiple times by blkcg for each IO request. For highops configurations which try to do as much per-cpu as possible, the global frequent refcnting can be very expensive. In general, given the various and hugely diverse paths css's end up being used from, we need to make it cheap and highly scalable. In its usage, css refcnting isn't very different from module refcnting. This patch converts css refcnting to use the recently added percpu_ref. css_get/tryget/put() directly maps to the matching percpu_ref operations and the deactivation logic is no longer necessary as percpu_ref already has refcnt killing. The only complication is that as the refcnt is per-cpu, percpu_ref_kill() in itself doesn't ensure that further tryget operations will fail, which we need to guarantee before invoking ->css_offline()'s. This is resolved collecting kill confirmation using percpu_ref_kill_and_confirm() and initiating the offline phase of destruction after all css refcnt's are confirmed to be seen as killed on all CPUs. The previous patches already splitted destruction into two phases, so percpu_ref_kill_and_confirm() can be hooked up easily. This patch removes css_refcnt() which is used for rcu dereference sanity check in css_id(). While we can add a percpu refcnt API to ask the same question, css_id() itself is scheduled to be removed fairly soon, so let's not bother with it. Just drop the sanity check and use rcu_dereference_raw() instead. v2: - init_cgroup_css() was calling percpu_ref_init() without checking the return value. This causes two problems - the obvious lack of error handling and percpu_ref_init() being called from cgroup_init_subsys() before the allocators are up, which triggers warnings but doesn't cause actual problems as the refcnt isn't used for roots anyway. Fix both by moving percpu_ref_init() to cgroup_create(). - The base references were put too early by percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the refs one extra time. This wasn't noticeable because css's go through another RCU grace period before being freed. Update cgroup_destroy_locked() to grab an extra reference before killing the refcnts. This problem was noticed by Kent. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Kent Overstreet <koverstreet@google.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Alasdair G. Kergon" <agk@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Glauber Costa <glommer@gmail.com>	2013-06-13 19:43:12 -07:00
Tejun Heo	2b0e53a7c8	Merge branch 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into for-3.11 This is to receive percpu_refcount which will replace atomic_t reference count in cgroup_subsys_state. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-06-13 19:42:22 -07:00
Tejun Heo	ea15f8ccdb	cgroup: split cgroup destruction into two steps Split cgroup_destroy_locked() into two steps and put the latter half into cgroup_offline_fn() which is executed from a work item. The latter half is responsible for offlining the css's, removing the cgroup from internal lists, and propagating release notification to the parent. The separation is to allow using percpu refcnt for css. Note that this allows for other cgroup operations to happen between the first and second halves of destruction, including creating a new cgroup with the same name. As the target cgroup is marked DEAD in the first half and cgroup internals don't care about the names of cgroups, this should be fine. A comment explaining this will be added by the next patch which implements the actual percpu refcnting. As RCU freeing is guaranteed to happen after the second step of destruction, we can use the same work item for both. This patch renames cgroup->free_work to ->destroy_work and uses it for both purposes. INIT_WORK() is now performed right before queueing the work item. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 19:27:42 -07:00
Tejun Heo	455050d23e	cgroup: reorder the operations in cgroup_destroy_locked() This patch reorders the operations in cgroup_destroy_locked() such that the userland visible parts happen before css offlining and removal from the ->sibling list. This will be used to make css use percpu refcnt. While at it, split out CGRP_DEAD related comment from the refcnt deactivation one and correct / clarify how different guarantees are met. While this patch changes the specific order of operations, it shouldn't cause any noticeable behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 19:27:41 -07:00
Tejun Heo	dbece3a0f1	percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm() Implement percpu_tryget() which stops giving out references once the percpu_ref is visible as killed. Because the refcnt is per-cpu, different CPUs will start to see a refcnt as killed at different points in time and tryget() may continue to succeed on subset of cpus for a while after percpu_ref_kill() returns. For use cases where it's necessary to know when all CPUs start to see the refcnt as dead, percpu_ref_kill_and_confirm() is added. The new function takes an extra argument @confirm_kill which is invoked when the refcnt is guaranteed to be viewed as killed on all CPUs. While this isn't the prettiest interface, it doesn't force synchronous wait and is much safer than requiring the caller to do its own call_rcu(). v2: Patch description rephrased to emphasize that tryget() may continue to succeed on some CPUs after kill() returns as suggested by Kent. v3: Function comment in percpu_ref_kill_and_confirm() updated warning people to not depend on the implied RCU grace period from the confirm callback as it's an implementation detail. Signed-off-by: Tejun Heo <tj@kernel.org> Slightly-Grumpily-Acked-by: Kent Overstreet <koverstreet@google.com>	2013-06-13 19:23:53 -07:00
Tejun Heo	bc497bd33b	percpu-refcount: implement percpu_ref_cancel_init() Normally, percpu_ref_init() initializes and percpu_ref_kill() initiates destruction which completes asynchronously. The asynchronous destruction can be problematic in init failure path where the caller wants to destroy half-constructed object - distinguishing half-constructed objects from the usual release method can be painful for complex objects. This patch implements percpu_ref_cancel_init() which synchronously destroys the percpu_ref without invoking release. To avoid unintentional misuses, the function requires the ref to have finished percpu_ref_init() but never used and triggers WARN otherwise. v2: Explain the weird name and usage restriction in the function comment. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Kent Overstreet <koverstreet@google.com>	2013-06-13 11:08:27 -07:00
Tejun Heo	acac7883ee	percpu-refcount: add __must_check to percpu_ref_init() and don't use ACCESS_ONCE() in percpu_ref_kill_rcu() Two small changes. * Unlike most init functions, percpu_ref_init() allocates memory and may fail. Let's mark it with __must_check in case the caller forgets. * percpu_ref_kill_rcu() is unnecessarily using ACCESS_ONCE() to dereference @ref->pcpu_count, which can be misleading. The pointer is guaranteed to be valid and visible and can't change underneath the function. Drop ACCESS_ONCE(). Signed-off-by: Tejun Heo <tj@kernel.org>	2013-06-13 11:08:26 -07:00
Tejun Heo	6f3d828f0f	cgroup: remove cgroup->count and use cgroup->count tracks the number of css_sets associated with the cgroup and used only to verify that no css_set is associated when the cgroup is being destroyed. It's superflous as the destruction path can simply check whether cgroup->cset_links is empty instead. Drop cgroup->count and check ->cset_links directly from cgroup_destroy_locked(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:18 -07:00
Tejun Heo	ddd69148bd	cgroup: drop unnecessary RCU dancing from __put_css_set() __put_css_set() does RCU read access on @cgrp across dropping @cgrp->count so that it can continue accessing @cgrp even if the count reached zero and destruction of the cgroup commenced. Given that both sides - __css_put() and cgroup_destroy_locked() - are cold paths, this is unnecessary. Just making cgroup_destroy_locked() grab css_set_lock while checking @cgrp->count is enough. Remove the RCU read locking from __put_css_set() and make cgroup_destroy_locked() read-lock css_set_lock when checking @cgrp->count. This will also allow removing @cgrp->count. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:18 -07:00
Tejun Heo	54766d4a1d	cgroup: rename CGRP_REMOVED to CGRP_DEAD We will add another flag indicating that the cgroup is in the process of being killed. REMOVING / REMOVED is more difficult to distinguish and cgroup_is_removing()/cgroup_is_removed() are a bit awkward. Also, later percpu_ref usage will involve "kill"ing the refcnt. s/CGRP_REMOVED/CGRP_DEAD/ s/cgroup_is_removed()/cgroup_is_dead() This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:18 -07:00
Tejun Heo	5de0107e63	cgroup: clean up css_[try]get() and css_put() * __css_get() isn't used by anyone. Fold it into css_get(). * Add proper function comments to all css reference functions. This patch is purely cosmetic. v2: Typo fix as per Li. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:17 -07:00
Tejun Heo	f4f4be2bd2	cgroup: use kzalloc() instead of kmalloc() There's no point in using kmalloc() instead of the clearing variant for trivial stuff. We can live dangerously elsewhere. Use kzalloc() instead and drop 0 inits. While at it, do trivial code reorganization in cgroup_file_open(). This patch doesn't introduce any functional changes. v2: I was caught in the very distant past where list_del() didn't poison and the initial version converted list_del()s to list_del_init()s too. Li and Kent took me out of the stasis chamber. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <koverstreet@google.com> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:17 -07:00
Tejun Heo	69d0206c79	cgroup: bring some sanity to naming around cg_cgroup_link cgroups and css_sets are mapped M:N and this M:N mapping is represented by struct cg_cgroup_link which forms linked lists on both sides. The naming around this mapping is already confusing and struct cg_cgroup_link exacerbates the situation quite a bit. >From cgroup side, it starts off ->css_sets and runs through ->cgrp_link_list. From css_set side, it starts off ->cg_links and runs through ->cg_link_list. This is rather reversed as cgrp_link_list is used to iterate css_sets and cg_link_list cgroups. Also, this is the only place which is still using the confusing "cg" for css_sets. This patch cleans it up a bit. * s/cgroup->css_sets/cgroup->cset_links/ s/css_set->cg_links/css_set->cgrp_links/ s/cgroup_iter->cg_link/cgroup_iter->cset_link/ * s/cg_cgroup_link/cgrp_cset_link/ * s/cgrp_cset_link->cg/cgrp_cset_link->cset/ s/cgrp_cset_link->cgrp_link_list/cgrp_cset_link->cset_link/ s/cgrp_cset_link->cg_link_list/cgrp_cset_link->cgrp_link/ * s/init_css_set_link/init_cgrp_cset_link/ s/free_cg_links/free_cgrp_cset_links/ s/allocate_cg_links/allocate_cgrp_cset_links/ * s/cgl[12]/link[12]/ in compare_css_sets() * s/saved_link/tmp_link/ s/tmp/tmp_links/ and a couple similar adustments. * Comment and whiteline adjustments. After the changes, we have list_for_each_entry(link, &cont->cset_links, cset_link) { struct css_set cset = link->cset; instead of list_for_each_entry(link, &cont->css_sets, cgrp_link_list) { struct css_set cset = link->cg; This patch is purely cosmetic. v2: Fix broken sentences in the patch description. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:17 -07:00
Tejun Heo	5abb885573	cgroup: consistently use @cset for struct css_set variables cgroup.c uses @cg for most struct css_set variables, which in itself could be a bit confusing, but made much worse by the fact that there are places which use @cg for struct cgroup variables. compare_css_sets() epitomizes this confusion - @[old_]cg are struct css_set while @cg[12] are struct cgroup. It's not like the whole deal with cgroup, css_set and cg_cgroup_link isn't already confusing enough. Let's give it some sanity by uniformly using @cset for all struct css_set variables. * s/cg/cset/ for all css_set variables. * s/oldcg/old_cset/ s/oldcgrp/old_cgrp/. The same for the ones prefixed with "new". * s/cg/cgrp/ for cgroup variables in compare_css_sets(). * s/css/cset/ for the cgroup variable in task_cgroup_from_root(). * Whiteline adjustments. This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:17 -07:00
Tejun Heo	3fc3db9a3a	cgroup: remove now unused css_depth() Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-13 10:55:17 -07:00
Tejun Heo	ac899061a9	percpu-refcount: cosmetic updates * s/percpu_ref_release/percpu_ref_func_t/ as it's customary to have _t postfix for types and the type is gonna be used for a different type of callback too. * Add @ARG to function comments. * Drop unnecessary and unaligned indentation from percpu_ref_init() function comment. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Kent Overstreet <koverstreet@google.com>	2013-06-12 20:43:06 -07:00
Tejun Heo	6a24474da8	percpu-refcount: consistently use plain (non-sched) RCU percpu_ref_get/put() are using preempt_disable/enable() while percpu_ref_kill() is using plain call_rcu() instead of call_rcu_sched(). This is buggy as grace periods of the two may not match. Fix it by using plain RCU in percpu_ref_get/put(). (I suggested using sched RCU in the first place but there's no actual benefit in doing so unless we're gonna introduce different variants of get/put to be called while preemption is alredy disabled, which we definitely shouldn't.) Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Kent Overstreet <koverstreet@google.com>	2013-06-12 20:43:06 -07:00
Tejun Heo	d5c56ced77	cgroup: clean up the cftype array for the base cgroup files * Rename it from files[] (really?) to cgroup_base_files[]. * Drop CGROUP_FILE_GENERIC_PREFIX which was defined as "cgroup." and used inconsistently. Just use "cgroup." directly. * Collect insane files at the end. Note that only the insane ones are missing "cgroup." prefix. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-05 12:00:33 -07:00
Tejun Heo	cc5943a781	cgroup: mark "notify_on_release" and "release_agent" cgroup files insane The empty cgroup notification mechanism currently implemented in cgroup is tragically outdated. Forking and execing userland process stopped being a viable notification mechanism more than a decade ago. We're gonna have a saner mechanism. Let's make it clear that this abomination is going away. Mark "notify_on_release" and "release_agent" with CFTYPE_INSANE. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-06-05 12:00:33 -07:00
Tejun Heo	f12dc02014	cgroup: mark "tasks" cgroup file as insane Some resources controlled by cgroup aren't per-task and cgroup core allowing threads of a single thread_group to be in different cgroups forced memcg do explicitly find the group leader and use it. This is gonna be nasty when transitioning to unified hierarchy and in general we don't want and won't support granularity finer than processes. Mark "tasks" with CFTYPE_INSANE. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: cgroups@vger.kernel.org Cc: Vivek Goyal <vgoyal@redhat.com>	2013-06-05 12:00:33 -07:00
Kent Overstreet	c1ae6e9b4d	percpu-refcount: Don't use silly cmpxchg() The cmpxchg() was just to ensure the debug check didn't race, which was a bit excessive. The caller is supposed to do the appropriate synchronization, which means percpu_ref_kill() can just do a simple store. Signed-off-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-06-03 16:04:04 -07:00
Kent Overstreet	215e262f2a	percpu: implement generic percpu refcounting This implements a refcount with similar semantics to atomic_get()/atomic_dec_and_test() - but percpu. It also implements two stage shutdown, as we need it to tear down the percpu counts. Before dropping the initial refcount, you must call percpu_ref_kill(); this puts the refcount in "shutting down mode" and switches back to a single atomic refcount with the appropriate barriers (synchronize_rcu()). It's also legal to call percpu_ref_kill() multiple times - it only returns true once, so callers don't have to reimplement shutdown synchronization. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: coding-style tweak] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-06-03 15:36:41 -07:00

1 2 3 4 5 ...

376618 Commits