linux

mirror of https://github.com/armbian/linux.git synced 2026-01-06 10:13:00 -08:00

Author	SHA1	Message	Date
Tejun Heo	eeecbd1971	cgroup: implement cgroup_get_e_css() Implement cgroup_get_e_css() which finds and gets the effective css for the specified cgroup and subsystem combination. This function always returns a valid pinned css. This will be used by cgroup writeback support. While at it, add comment to cgroup_e_css() to explain why that function is different from cgroup_get_e_css() and has to test cgrp->child_subsys_mask instead of cgroup_css(cgrp, ss). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:52 -05:00
Tejun Heo	56c807ba4e	cgroup: add cgroup_subsys->css_e_css_changed() Add a new cgroup_subsys operatoin ->css_e_css_changed(). This is invoked if any of the effective csses seen from the css's cgroup may have changed. This will be used to implement cgroup writeback support. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:51 -05:00
Tejun Heo	7d172cc89b	cgroup: add cgroup_subsys->css_released() Add a new cgroup subsys callback css_released(). This is called when the reference count of the css (cgroup_subsys_state) reaches zero before RCU scheduling free. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:51 -05:00
Tejun Heo	db6e305345	cgroup: fix the async css offline wait logic in cgroup_subtree_control_write() When a subsystem is offlined, its entry on @cgrp->subsys[] is cleared asynchronously. If cgroup_subtree_control_write() is requested to enable the subsystem again before the entry is cleared, it has to wait for the previous offlining to finish and clear the @cgrp->subsys[] entry before trying to enable the subsystem again. This is currently done while verifying the input enable / disable parameters. This used to be correct but `f63070d350` ("cgroup: make interface files visible iff enabled on cgroup->subtree_control") breaks it. The commit is one of the commits implementing subsystem dependency. Through subsystem dependency, some subsystems may be enabled and disabled implicitly in addition to the explicitly requested ones. The actual subsystems to be enabled and disabled are determined during @css_enable/disable calculation. The current offline wait logic skips the ones which are already implicitly enabled and then waits for subsystems in @enable; however, this misses the subsystems which may be implicitly enabled through dependency from @enable. If such implicitly subsystem hasn't yet finished offlining yet, the function ends up trying to create a css when its @cgrp->subsys[] slot is already occupied triggering BUG_ON() in init_and_link_css(). Fix it by moving the wait logic after @css_enable is calculated and waiting for all the subsystems in @css_enable. This fixes the above bug as the mask contains all subsystems which are to be enabled including the ones enabled through dependencies. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: `f63070d350` ("cgroup: make interface files visible iff enabled on cgroup->subtree_control") Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:51 -05:00
Tejun Heo	755bf5ee86	cgroup: restructure child_subsys_mask handling in cgroup_subtree_control_write() Make cgroup_subtree_control_write() first calculate new subtree_control (new_sc), child_subsys_mask (new_ss) and css_enable/disable masks before applying them to the cgroup. Also, store the original subtree_control (old_sc) and child_subsys_mask (old_ss) and use them to restore the orignal state after failure. This patch shouldn't cause any behavior changes. This prepares for a fix for a bug in the async css offline wait logic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:50 -05:00
Tejun Heo	0f060deb5c	cgroup: separate out cgroup_calc_child_subsys_mask() from cgroup_refresh_child_subsys_mask() cgroup_refresh_child_subsys_mask() calculates and updates the effective @cgrp->child_subsys_maks according to the current @cgrp->subtree_control. Separate out the calculation part into cgroup_calc_child_subsys_mask(). This will be used to fix a bug in the async css offline wait logic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Zefan Li <lizefan@huawei.com>	2014-11-18 02:49:50 -05:00
Linus Torvalds	c798360cd1	Merge branch 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: "A lot of activities on percpu front. Notable changes are... - percpu allocator now can take @gfp. If @gfp doesn't contain GFP_KERNEL, it tries to allocate from what's already available to the allocator and a work item tries to keep the reserve around certain level so that these atomic allocations usually succeed. This will replace the ad-hoc percpu memory pool used by blk-throttle and also be used by the planned blkcg support for writeback IOs. Please note that I noticed a bug in how @gfp is interpreted while preparing this pull request and applied the fix `6ae833c7fe` ("percpu: fix how @gfp is interpreted by the percpu allocator") just now. - percpu_ref now uses longs for percpu and global counters instead of ints. It leads to more sparse packing of the percpu counters on 64bit machines but the overhead should be negligible and this allows using percpu_ref for refcnting pages and in-memory objects directly. - The switching between percpu and single counter modes of a percpu_ref is made independent of putting the base ref and a percpu_ref can now optionally be initialized in single or killed mode. This allows avoiding percpu shutdown latency for cases where the refcounted objects may be synchronously created and destroyed in rapid succession with only a fraction of them reaching fully operational status (SCSI probing does this when combined with blk-mq support). It's also planned to be used to implement forced single mode to detect underflow more timely for debugging. There's a separate branch percpu/for-3.18-consistent-ops which cleans up the duplicate percpu accessors. That branch causes a number of conflicts with s390 and other trees. I'll send a separate pull request w/ resolutions once other branches are merged" * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits) percpu: fix how @gfp is interpreted by the percpu allocator blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky percpu_ref: add PERCPU_REF_INIT_* flags percpu_ref: decouple switching to percpu mode and reinit percpu_ref: decouple switching to atomic mode and killing percpu_ref: add PCPU_REF_DEAD percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch percpu_ref: replace pcpu_ prefix with percpu_ percpu_ref: minor code and comment updates percpu_ref: relocate percpu_ref_reinit() Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe" Revert "percpu: free percpu allocation info for uniprocessor system" percpu-refcount: make percpu_ref based on longs instead of ints percpu-refcount: improve WARN messages percpu: fix locking regression in the failure path of pcpu_alloc() percpu-refcount: add @gfp to percpu_ref_init() proportions: add @gfp to init functions percpu_counter: add @gfp to percpu_counter_init() percpu_counter: make percpu_counters_lock irq-safe ...	2014-10-10 07:26:02 -04:00
Linus Torvalds	b211e9d7c8	Merge branch 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Nothing too interesting. Just a handful of cleanup patches" * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: Revert "cgroup: remove redundant variable in cgroup_mount()" cgroup: remove redundant variable in cgroup_mount() cgroup: fix missing unlock in cgroup_release_agent() cgroup: remove CGRP_RELEASABLE flag perf/cgroup: Remove perf_put_cgroup() cgroup: remove redundant check in cgroup_ino() cpuset: simplify proc_cpuset_show() cgroup: simplify proc_cgroup_show() cgroup: use a per-cgroup work for release agent cgroup: remove bogus comments cgroup: remove redundant code in cgroup_rmdir() cgroup: remove some useless forward declarations cgroup: fix a typo in comment.	2014-10-10 07:24:40 -04:00
Zefan Li	e756c7b698	Revert "cgroup: remove redundant variable in cgroup_mount()" This reverts commit `0c7bf3e8ca`. If there are child cgroups in the cgroupfs and then we umount it, the superblock will be destroyed but the cgroup_root will be kept around. When we mount it again, cgroup_mount() will find this cgroup_root and allocate a new sb for it. So with this commit we will be trapped in a dead loop in the case described above, because kernfs_pin_sb() keeps returning NULL. Currently I don't see how we can avoid using both pinned_sb and new_sb, so just revert it. Cc: Al Viro <viro@ZenIV.linux.org.uk> Reported-by: Andrey Wagin <avagin@gmail.com> Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-09-26 00:16:23 -04:00
Tejun Heo	2aad2a86f6	percpu_ref: add PERCPU_REF_INIT_* flags With the recent addition of percpu_ref_reinit(), percpu_ref now can be used as a persistent switch which can be turned on and off repeatedly where turning off maps to killing the ref and waiting for it to drain; however, there currently isn't a way to initialize a percpu_ref in its off (killed and drained) state, which can be inconvenient for certain persistent switch use cases. Similarly, percpu_ref_switch_to_atomic/percpu() allow dynamic selection of operation mode; however, currently a newly initialized percpu_ref is always in percpu mode making it impossible to avoid the latency overhead of switching to atomic mode. This patch adds @flags to percpu_ref_init() and implements the following flags. * PERCPU_REF_INIT_ATOMIC : start ref in atomic mode * PERCPU_REF_INIT_DEAD : start ref killed and drained These flags should be able to serve the above two use cases. v2: target_core_tpg.c conversion was missing. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Johannes Weiner <hannes@cmpxchg.org>	2014-09-24 13:31:50 -04:00
Tejun Heo	d06efebf0c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block into for-3.18 This is to receive `0a30288da1` ("blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe") which implements __percpu_ref_kill_expedited() to work around SCSI blk-mq stall. The commit reverted and patches to implement proper fix will be added. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de>	2014-09-24 13:00:21 -04:00
Zefan Li	0c7bf3e8ca	cgroup: remove redundant variable in cgroup_mount() Both pinned_sb and new_sb indicate if a new superblock is needed, so we can just remove new_sb. Note now we must check if kernfs_tryget_sb() returns NULL, because when it returns NULL, kernfs_mount() may still re-use an existing superblock, which is just allocated by another concurent mount. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-09-20 13:09:35 -04:00
Zefan Li	3e2cd91ab9	cgroup: fix missing unlock in cgroup_release_agent() The patch `971ff49355`: "cgroup: use a per-cgroup work for release agent" from Sep 18, 2014, leads to the following static checker warning: kernel/cgroup.c:5310 cgroup_release_agent() warn: 'mutex:&cgroup_mutex' is sometimes locked here and sometimes unlocked. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-09-20 12:23:35 -04:00
Zefan Li	a25eb52e81	cgroup: remove CGRP_RELEASABLE flag We call put_css_set() after setting CGRP_RELEASABLE flag in cgroup_task_migrate(), but in other places we call it without setting the flag. I don't see the necessity of this flag. Moreover once the flag is set, it will never be cleared, unless writing to the notify_on_release control file, so it can be quite confusing if we look at the output of debug.releasable. # mount -t cgroup -o debug xxx /cgroup # mkdir /cgroup/child # cat /cgroup/child/debug.releasable 0 <-- shows 0 though the cgroup is empty # echo $$ > /cgroup/child/tasks # cat /cgroup/child/debug.releasable 0 # echo $$ > /cgroup/tasks && echo $$ > /cgroup/child/tasks # cat /proc/child/debug.releasable 1 <-- shows 1 though the cgroup is not empty This patch removes the flag, and now debug.releasable shows if the cgroup is empty or not. Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-09-19 09:29:32 -04:00
Zefan Li	006f4ac497	cgroup: simplify proc_cgroup_show() Use the ONE macro instead of REG, and we can simplify proc_cgroup_show(). Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-09-18 13:27:23 -04:00
Zefan Li	971ff49355	cgroup: use a per-cgroup work for release agent Instead of using a global work to schedule release agent on removable cgroups, we change to use a per-cgroup work to do this, which makes the code much simpler. v2: use a dedicated work instead of reusing css->destroy_work. (Tejun) Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-09-18 13:14:22 -04:00
Zefan Li	eb4aec84d6	cgroup: fix unbalanced locking cgroup_pidlist_start() holds cgrp->pidlist_mutex and then calls pidlist_array_load(), and cgroup_pidlist_stop() releases the mutex. It is wrong that we release the mutex in the failure path in pidlist_array_load(), because cgroup_pidlist_stop() will be called no matter if cgroup_pidlist_start() returns errno or not. Fixes: `4bac00d16a` Cc: <stable@vger.kernel.org> # 3.14+ Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Cong Wang <xiyou.wangcong@gmail.com>	2014-09-18 12:32:52 -04:00
Li Zefan	0c8fc2c121	cgroup: remove bogus comments We never grab cgroup mutex in fork and exit paths no matter whether notify_on_release is set or not. Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-09-18 06:34:16 +09:00
Li Zefan	244bb9a633	cgroup: remove redundant code in cgroup_rmdir() We no longer clear kn->priv in cgroup_rmdir(), so we don't need to get an extra refcnt. Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-09-18 06:34:15 +09:00
Li Zefan	6213daab25	cgroup: remove some useless forward declarations Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-09-18 06:34:15 +09:00
Tejun Heo	9253b279f4	Merge branch 'for-3.17-fixes' of ra.kernel.org:/pub/scm/linux/kernel/git/tj/cgroup into for-3.18 Pull to receive `a4189487da` ("cgroup: delay the clearing of cgrp->kn->priv") for the scheduled clean up patches. Signed-off-by: Tejun Heo <tj@kernel.org>	2014-09-18 06:29:05 +09:00
Tejun Heo	a34375ef9e	percpu-refcount: add @gfp to percpu_ref_init() Percpu allocator now supports allocation mask. Add @gfp to percpu_ref_init() so that !GFP_KERNEL allocation masks can be used with percpu_refs too. This patch doesn't make any functional difference. v2: blk-mq conversion was missing. Updated. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <koverstreet@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: Jens Axboe <axboe@kernel.dk>	2014-09-08 09:51:30 +09:00
Li Zefan	aa32362f01	cgroup: check cgroup liveliness before unbreaking kernfs When cgroup_kn_lock_live() is called through some kernfs operation and another thread is calling cgroup_rmdir(), we'll trigger the warning in cgroup_get(). ------------[ cut here ]------------ WARNING: CPU: 1 PID: 1228 at kernel/cgroup.c:1034 cgroup_get+0x89/0xa0() ... Call Trace: [<c16ee73d>] dump_stack+0x41/0x52 [<c10468ef>] warn_slowpath_common+0x7f/0xa0 [<c104692d>] warn_slowpath_null+0x1d/0x20 [<c10bb999>] cgroup_get+0x89/0xa0 [<c10bbe58>] cgroup_kn_lock_live+0x28/0x70 [<c10be3c1>] __cgroup_procs_write.isra.26+0x51/0x230 [<c10be5b2>] cgroup_tasks_write+0x12/0x20 [<c10bb7b0>] cgroup_file_write+0x40/0x130 [<c11aee71>] kernfs_fop_write+0xd1/0x160 [<c1148e58>] vfs_write+0x98/0x1e0 [<c114934d>] SyS_write+0x4d/0xa0 [<c16f656b>] sysenter_do_call+0x12/0x12 ---[ end trace 6f2e0c38c2108a74 ]--- Fix this by calling css_tryget() instead of cgroup_get(). v2: - move cgroup_tryget() right below cgroup_get() definition. (Tejun) Cc: <stable@vger.kernel.org> # 3.15+ Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-09-05 01:36:19 +09:00
Li Zefan	a4189487da	cgroup: delay the clearing of cgrp->kn->priv Run these two scripts concurrently: for ((; ;)) { mkdir /cgroup/sub rmdir /cgroup/sub } for ((; ;)) { echo $$ > /cgroup/sub/cgroup.procs echo $$ > /cgroup/cgroup.procs } A kernel bug will be triggered: BUG: unable to handle kernel NULL pointer dereference at 00000038 IP: [<c10bbd69>] cgroup_put+0x9/0x80 ... Call Trace: [<c10bbe19>] cgroup_kn_unlock+0x39/0x50 [<c10bbe91>] cgroup_kn_lock_live+0x61/0x70 [<c10be3c1>] __cgroup_procs_write.isra.26+0x51/0x230 [<c10be5b2>] cgroup_tasks_write+0x12/0x20 [<c10bb7b0>] cgroup_file_write+0x40/0x130 [<c11aee71>] kernfs_fop_write+0xd1/0x160 [<c1148e58>] vfs_write+0x98/0x1e0 [<c114934d>] SyS_write+0x4d/0xa0 [<c16f656b>] sysenter_do_call+0x12/0x12 We clear cgrp->kn->priv in the end of cgroup_rmdir(), but another concurrent thread can access kn->priv after the clearing. We should move the clearing to css_release_work_fn(). At that time no one is holding reference to the cgroup and no one can gain a new reference to access it. v2: - move RCU_INIT_POINTER() into the else block. (Tejun) - remove the cgroup_parent() check. (Tejun) - update the comment in css_tryget_online_from_dir(). Cc: <stable@vger.kernel.org> # 3.15+ Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-09-05 01:36:18 +09:00
Dongsheng Yang	251f8c0364	cgroup: fix a typo in comment. There is no function named cgroup_enable_task_cg_links(). Instead, the correct function name in this comment should be cgroup_enabled_task_cg_lists(). Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-08-25 10:49:29 -04:00

1 2 3 4 5 ...

738 Commits