Commit Graph

231 Commits

Author SHA1 Message Date
Linus Torvalds
db0c2bf69a Merge branch 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
* 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
  cgroup: fix to allow mounting a hierarchy by name
  cgroup: move assignement out of condition in cgroup_attach_proc()
  cgroup: Remove task_lock() from cgroup_post_fork()
  cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()
  cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static
  cgroup: only need to check oldcgrp==newgrp once
  cgroup: remove redundant get/put of task struct
  cgroup: remove redundant get/put of old css_set from migrate
  cgroup: Remove unnecessary task_lock before fetching css_set on migration
  cgroup: Drop task_lock(parent) on cgroup_fork()
  cgroups: remove redundant get/put of css_set from css_set_check_fetched()
  resource cgroups: remove bogus cast
  cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
  cgroup, cpuset: don't use ss->pre_attach()
  cgroup: don't use subsys->can_attach_task() or ->attach_task()
  cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
  cgroup: improve old cgroup handling in cgroup_attach_proc()
  cgroup: always lock threadgroup during migration
  threadgroup: extend threadgroup_lock() to cover exit and exec
  threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem
  ...

Fix up conflict in kernel/cgroup.c due to commit e0197aae59: "cgroups:
fix a css_set not found bug in cgroup_attach_proc" that already
mentioned that the bug is fixed (differently) in Tejun's cgroup
patchset. This one, in other words.
2012-01-09 12:59:24 -08:00
Al Viro
34c80b1d93 vfs: switch ->show_options() to struct dentry *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:19:54 -05:00
Li Zefan
0d19ea8665 cgroup: fix to allow mounting a hierarchy by name
If we mount a hierarchy with a specified name, the name is unique,
and we can use it to mount the hierarchy without specifying its
set of subsystem names. This feature is documented is
Documentation/cgroups/cgroups.txt section 2.3

Here's an example:

	# mount -t cgroup -o cpuset,name=myhier xxx /cgroup1
	# mount -t cgroup -o name=myhier xxx /cgroup2

But it was broken by commit 32a8cf235e
(cgroup: make the mount options parsing more accurate)

This fixes the regression.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
2012-01-05 09:59:46 -08:00
Dan Carpenter
305f3c8b20 cgroup: move assignement out of condition in cgroup_attach_proc()
Gcc complains about this: "kernel/cgroup.c:2179:4: warning: suggest
parentheses around assignment used as truth value [-Wparentheses]"

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-01-04 07:37:48 -08:00
Al Viro
a5e7ed3287 cgroup: propagate mode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:03 -05:00
Al Viro
18bb1db3e7 switch vfs_mkdir() and ->mkdir() to umode_t
vfs_mkdir() gets int, but immediately drops everything that might not
fit into umode_t and that's the only caller of ->mkdir()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
Frederic Weisbecker
7e3aa30ac8 cgroup: Remove task_lock() from cgroup_post_fork()
cgroup_post_fork() is protected between threadgroup_change_begin()
and threadgroup_change_end() against concurrent changes of the
child's css_set in cgroup_task_migrate(). Also the child can't
exit and call cgroup_exit() at this stage, this means it's css_set
can't be changed with init_css_set concurrently.

For these reasons, we don't need to hold task_lock() on the child
because it's css_set can only remain stable in this place.

Let's remove the lock there.

v2: Update comment to explain that we are safe against
cgroup_exit()

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Containers <containers@lists.linux-foundation.org>
Cc: Cgroups <cgroups@vger.kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Mandeep Singh Baines <msb@chromium.org>
2011-12-27 09:05:20 -08:00
Kirill A. Shutemov
c6ca57500c cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-12-27 08:51:47 -08:00
Kirill A. Shutemov
1c6c3fad81 cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-12-27 08:51:46 -08:00
Mandeep Singh Baines
892a2b90ba cgroup: only need to check oldcgrp==newgrp once
In cgroup_attach_proc it is now sufficient to only check that
oldcgrp==newcgrp once. Now that we are using threadgroup_lock()
during the migrations, oldcgrp will not change.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
2011-12-22 07:32:57 -08:00
Mandeep Singh Baines
b07ef77411 cgroup: remove redundant get/put of task struct
threadgroup_lock() guarantees that the target threadgroup will
remain stable - no new task will be added, no new PF_EXITING
will be set and exec won't happen.

Changes in V2:
* https://lkml.org/lkml/2011/12/20/369 (Tejun Heo)
  * Undo incorrect removal of get/put from attach_task_by_pid()
* Author
  * Remove a comment which is made stale by this change

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
2011-12-22 07:32:57 -08:00
Mandeep Singh Baines
026085ef5a cgroup: remove redundant get/put of old css_set from migrate
We can now assume that the css_set reference held by the task
will not go away for an exiting task. PF_EXITING state can be
trusted throughout migration by checking it after locking
threadgroup.

Changes in V4:
* https://lkml.org/lkml/2011/12/20/368 (Tejun Heo)
  * Fix typo in commit message
  * Undid the rename of css_set_check_fetched
* https://lkml.org/lkml/2011/12/20/427 (Li Zefan)
  * Fix comment in cgroup_task_migrate()
Changes in V3:
* https://lkml.org/lkml/2011/12/20/255 (Frederic Weisbecker)
  * Fixed to put error in retval
Changes in V2:
* https://lkml.org/lkml/2011/12/19/289 (Tejun Heo)
  * Updated commit message

-tj: removed stale patch description about dropped function rename.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
2011-12-22 07:32:57 -08:00
Frederic Weisbecker
c84cdf75cc cgroup: Remove unnecessary task_lock before fetching css_set on migration
When we fetch the css_set of the tasks on cgroup migration, we don't need
anymore to synchronize against cgroup_exit() that could swap the old one
with init_css_set. Now that we are using threadgroup_lock() during
the migrations, we don't need to worry about it anymore.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Mandeep Singh Baines <msb@chromium.org>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Containers <containers@lists.linux-foundation.org>
Cc: Cgroups <cgroups@vger.kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
2011-12-21 11:15:01 -08:00
Frederic Weisbecker
7e381b0eb1 cgroup: Drop task_lock(parent) on cgroup_fork()
We don't need to hold the parent task_lock() on the
parent in cgroup_fork() because we are already synchronized
against the two places that may change the parent css_set
concurrently:

- cgroup_exit(), but the parent obviously can't exit concurrently
- cgroup migration: we are synchronized against threadgroup_lock()

So we can safely remove the task_lock() there.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Containers <containers@lists.linux-foundation.org>
Cc: Cgroups <cgroups@vger.kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Mandeep Singh Baines <msb@chromium.org>
2011-12-21 11:15:00 -08:00
Mandeep Singh Baines
29e21368b9 cgroups: remove redundant get/put of css_set from css_set_check_fetched()
We already have a reference to all elements in newcg_list.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: Paul Menage <paul@paulmenage.org>
2011-12-19 09:14:30 -08:00
Mandeep Singh Baines
e0197aae59 cgroups: fix a css_set not found bug in cgroup_attach_proc
There is a BUG when migrating a PF_EXITING proc. Since css_set_prefetch()
is not called for the PF_EXITING case, find_existing_css_set() will return
NULL inside cgroup_task_migrate() causing a BUG.

This bug is easy to reproduce. Create a zombie and echo its pid to
cgroup.procs.

$ cat zombie.c
\#include <unistd.h>

int main()
{
  if (fork())
      pause();
  return 0;
}
$

We are hitting this bug pretty regularly on ChromeOS.

This bug is already fixed by Tejun Heo's cgroup patchset which is
targetted for the next merge window:

https://lkml.org/lkml/2011/11/1/356

I've create a smaller patch here which just fixes this bug so that a
fix can be merged into the current release and stable.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Downstream-Bug-Report: http://crosbug.com/23953
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: stable@kernel.org
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Olof Johansson <olofj@chromium.org>
2011-12-19 09:09:09 -08:00
Tejun Heo
494c167cf7 cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()
These three methods are no longer used.  Kill them.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
2011-12-12 18:12:22 -08:00
Tejun Heo
2f7ee5691e cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()
Currently, there's no way to pass multiple tasks to cgroup_subsys
methods necessitating the need for separate per-process and per-task
methods.  This patch introduces cgroup_taskset which can be used to
pass multiple tasks and their associated cgroups to cgroup_subsys
methods.

Three methods - can_attach(), cancel_attach() and attach() - are
converted to use cgroup_taskset.  This unifies passed parameters so
that all methods have access to all information.  Conversions in this
patchset are identical and don't introduce any behavior change.

-v2: documentation updated as per Paul Menage's suggestion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Menage <paul@paulmenage.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: James Morris <jmorris@namei.org>
2011-12-12 18:12:21 -08:00
Tejun Heo
134d33737f cgroup: improve old cgroup handling in cgroup_attach_proc()
cgroup_attach_proc() behaves differently from cgroup_attach_task() in
the following aspects.

* All hooks are invoked even if no task is actually being moved.

* ->can_attach_task() is called for all tasks in the group whether the
  new cgrp is different from the current cgrp or not; however,
  ->attach_task() is skipped if new equals new.  This makes the calls
  asymmetric.

This patch improves old cgroup handling in cgroup_attach_proc() by
looking up the current cgroup at the head, recording it in the flex
array along with the task itself, and using it to remove the above two
differences.  This will also ease further changes.

-v2: nr_todo renamed to nr_migrating_tasks as per Paul Menage's
     suggestion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Menage <paul@paulmenage.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
2011-12-12 18:12:21 -08:00
Tejun Heo
cd3d095275 cgroup: always lock threadgroup during migration
Update cgroup to take advantage of the fack that threadgroup_lock()
guarantees stable threadgroup.

* Lock threadgroup even if the target is a single task.  This
  guarantees that when the target tasks stay stable during migration
  regardless of the target type.

* Remove PF_EXITING early exit optimization from attach_task_by_pid()
  and check it in cgroup_task_migrate() instead.  The optimization was
  for rather cold path to begin with and PF_EXITING state can be
  trusted throughout migration by checking it after locking
  threadgroup.

* Don't add PF_EXITING tasks to target task array in
  cgroup_attach_proc().  This ensures that task migration is performed
  only for live tasks.

* Remove -ESRCH failure path from cgroup_task_migrate().  With the
  above changes, it's guaranteed to be called only for live tasks.

After the changes, only live tasks are migrated and they're guaranteed
to stay alive until migration is complete.  This removes problems
caused by exec and exit racing against cgroup migration including
symmetry among cgroup attach methods and different cgroup methods
racing each other.

v2: Oleg pointed out that one more PF_EXITING check can be removed
    from cgroup_attach_proc().  Removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
2011-12-12 18:12:21 -08:00
Tejun Heo
257058ae2b threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem
Make the following renames to prepare for extension of threadgroup
locking.

* s/signal->threadgroup_fork_lock/signal->group_rwsem/
* s/threadgroup_fork_read_lock()/threadgroup_change_begin()/
* s/threadgroup_fork_read_unlock()/threadgroup_change_end()/
* s/threadgroup_fork_write_lock()/threadgroup_lock()/
* s/threadgroup_fork_write_unlock()/threadgroup_unlock()/

This patch doesn't cause any behavior change.

-v2: Rename threadgroup_change_done() to threadgroup_change_end() per
     KAMEZAWA's suggestion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
2011-12-12 18:12:21 -08:00
Tejun Heo
e25e2cbb4c cgroup: add cgroup_root_mutex
cgroup wants to make threadgroup stable while modifying cgroup
hierarchies which will introduce locking dependency on
cred_guard_mutex from cgroup_mutex.  This unfortunately completes
circular dependency.

 A. cgroup_mutex -> cred_guard_mutex -> s_type->i_mutex_key -> namespace_sem
 B. namespace_sem -> cgroup_mutex

B is from cgroup_show_options() and this patch breaks it by
introducing another mutex cgroup_root_mutex which nests inside
cgroup_mutex and protects cgroupfs_root.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
2011-12-12 18:12:21 -08:00
Andrew Bresticker
c1e2ee2dc4 memcg: replace ss->id_lock with a rwlock
While back-porting Johannes Weiner's patch "mm: memcg-aware global
reclaim" for an internal effort, we noticed a significant performance
regression during page-reclaim heavy workloads due to high contention of
the ss->id_lock.  This lock protects idr map, and serializes calls to
idr_get_next() in css_get_next() (which is used during the memcg hierarchy
walk).

Since idr_get_next() is just doing a look up, we need only serialize it
with respect to idr_remove()/idr_get_new().  By making the ss->id_lock a
rwlock, contention is greatly reduced and performance improves.

Tested: cat a 256m file from a ramdisk in a 128m container 50 times on
each core (one file + container per core) in parallel on a NUMA machine.
Result is the time for the test to complete in 1 of the containers.
Both kernels included Johannes' memcg-aware global reclaim patches.

Before rwlock patch: 1710.778s
After rwlock patch: 152.227s

Signed-off-by: Andrew Bresticker <abrestic@google.com>
Cc: Paul Menage <menage@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:07:03 -07:00
Ben Blum
77ceab8ea5 cgroups: don't attach task to subsystem if migration failed
If a task has exited to the point it has called cgroup_exit() already,
then we can't migrate it to another cgroup anymore.

This can happen when we are attaching a task to a new cgroup between the
call to ->can_attach_task() on subsystems and the migration that is
eventually tried in cgroup_task_migrate().

In this case cgroup_task_migrate() returns -ESRCH and we don't want to
attach the task to the subsystems because the attachment to the new cgroup
itself failed.

Fix this by only calling ->attach_task() on the subsystems if the cgroup
migration succeeded.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:59 -07:00
Ben Blum
33ef6b6984 cgroups: more safe tasklist locking in cgroup_attach_proc
Fix unstable tasklist locking in cgroup_attach_proc.

According to this thread - https://lkml.org/lkml/2011/7/27/243 - RCU is
not sufficient to guarantee the tasklist is stable w.r.t.  de_thread and
exit.  Taking tasklist_lock for reading, instead of rcu_read_lock, ensures
proper exclusion.

Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:59 -07:00