Commit Graph

15744 Commits

Author SHA1 Message Date
Li Zefan a73456f37b cpuset: re-structure update_cpumask() a bit
Check if cpus_allowed is to be changed before calling validate_change().

This won't change any behavior, but later it will allow us to do this:

 # mkdir /cpuset/child
 # echo $$ > /cpuset/child/tasks	/* empty cpuset */
 # echo > /cpuset/child/cpuset.cpus	/* do nothing, won't fail */

Without this patch, the last operation will fail.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-05 13:55:14 -07:00
Li Zefan 249cc86db7 cpuset: remove cpuset_test_cpumask()
The test is done in set_cpus_allowed_ptr(), so it's redundant.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-05 13:55:14 -07:00
Li Zefan 67bd2c5985 cpuset: remove unnecessary variable in cpuset_attach()
We can just use oldcs->mems_allowed.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-05 13:55:13 -07:00
Li Zefan 40df2deb50 cpuset: cleanup guarantee_online_{cpus|mems}()
- We never pass a NULL @cs to these functions.
- The top cpuset always has some online cpus/mems.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-05 13:55:13 -07:00
Li Zefan 06d6b3cbdf cpuset: remove redundant check in cpuset_cpus_allowed_fallback()
task_cs() will never return NULL.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-06-05 13:55:13 -07:00
Tejun Heo d5c56ced77 cgroup: clean up the cftype array for the base cgroup files
* Rename it from files[] (really?) to cgroup_base_files[].

* Drop CGROUP_FILE_GENERIC_PREFIX which was defined as "cgroup." and
  used inconsistently.  Just use "cgroup." directly.

* Collect insane files at the end.  Note that only the insane ones are
  missing "cgroup." prefix.

This patch doesn't introduce any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-05 12:00:33 -07:00
Tejun Heo cc5943a781 cgroup: mark "notify_on_release" and "release_agent" cgroup files insane
The empty cgroup notification mechanism currently implemented in
cgroup is tragically outdated.  Forking and execing userland process
stopped being a viable notification mechanism more than a decade ago.
We're gonna have a saner mechanism.  Let's make it clear that this
abomination is going away.

Mark "notify_on_release" and "release_agent" with CFTYPE_INSANE.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-06-05 12:00:33 -07:00
Tejun Heo f12dc02014 cgroup: mark "tasks" cgroup file as insane
Some resources controlled by cgroup aren't per-task and cgroup core
allowing threads of a single thread_group to be in different cgroups
forced memcg do explicitly find the group leader and use it.  This is
gonna be nasty when transitioning to unified hierarchy and in general
we don't want and won't support granularity finer than processes.

Mark "tasks" with CFTYPE_INSANE.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: cgroups@vger.kernel.org
Cc: Vivek Goyal <vgoyal@redhat.com>
2013-06-05 12:00:33 -07:00
Tejun Heo 75501a6d59 cgroup: update iterators to use cgroup_next_sibling()
This patch converts cgroup_for_each_child(),
cgroup_next_descendant_pre/post() and thus
cgroup_for_each_descendant_pre/post() to use cgroup_next_sibling()
instead of manually dereferencing ->sibling.next.

The only reason the iterators couldn't allow dropping RCU read lock
while iteration is in progress was because they couldn't determine the
next sibling safely once RCU read lock is dropped.  Using
cgroup_next_sibling() removes that problem and enables all iterators
to allow dropping RCU read lock in the middle.  Comments are updated
accordingly.

This makes the iterators easier to use and will simplify controllers.

Note that @cgroup argument is renamed to @cgrp in
cgroup_for_each_child() because it conflicts with "struct cgroup" used
in the new macro body.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
2013-05-24 10:55:38 +09:00
Tejun Heo 53fa526174 cgroup: add cgroup->serial_nr and implement cgroup_next_sibling()
Currently, there's no easy way to find out the next sibling cgroup
unless it's known that the current cgroup is accessed from the
parent's children list in a single RCU critical section.  This in turn
forces all iterators to require whole iteration to be enclosed in a
single RCU critical section, which sometimes is too restrictive.  This
patch implements cgroup_next_sibling() which can reliably determine
the next sibling regardless of the state of the current cgroup as long
as it's accessible.

It currently is impossible to determine the next sibling after
dropping RCU read lock because the cgroup being iterated could be
removed anytime and if RCU read lock is dropped, nothing guarantess
its ->sibling.next pointer is accessible.  A removed cgroup would
continue to point to its next sibling for RCU accesses but stop
receiving updates from the sibling.  IOW, the next sibling could be
removed and then complete its grace period while RCU read lock is
dropped, making it unsafe to dereference ->sibling.next after dropping
and re-acquiring RCU read lock.

This can be solved by adding a way to traverse to the next sibling
without dereferencing ->sibling.next.  This patch adds a monotonically
increasing cgroup serial number, cgroup->serial_nr, which guarantees
that all cgroup->children lists are kept in increasing serial_nr
order.  A new function, cgroup_next_sibling(), is implemented, which,
if CGRP_REMOVED is not set on the current cgroup, follows
->sibling.next; otherwise, traverses the parent's ->children list
until it sees a sibling with higher ->serial_nr.

This allows the function to always return the next sibling regardless
of the state of the current cgroup without adding overhead in the fast
path.

Further patches will update the iterators to use cgroup_next_sibling()
so that they allow dropping RCU read lock and blocking while iteration
is in progress which in turn will be used to simplify controllers.

v2: Typo fix as per Serge.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
2013-05-24 10:55:38 +09:00
Tejun Heo bdc7119f1b cgroup: make cgroup_is_removed() static
cgroup_is_removed() no longer has external users and it shouldn't grow
any - controllers should deal with cgroup_subsys_state on/offline
state instead of cgroup removal state.  Make it static.

While at it, make it return bool.

Signed-off-by: Tejun Heo <tj@kernel.org>
2013-05-24 10:55:38 +09:00
Tejun Heo 3f33e64f4a Merge branch 'for-3.10-fixes' into for-3.11
Merging to receive 7805d000db ("cgroup: fix a subtle bug in descendant
pre-order walk") so that further iterator updates can build upon it.

Signed-off-by: Tejun Heo <tj@kernel.org>
2013-05-24 10:53:09 +09:00
Tejun Heo 7805d000db cgroup: fix a subtle bug in descendant pre-order walk
When cgroup_next_descendant_pre() initiates a walk, it checks whether
the subtree root doesn't have any children and if not returns NULL.
Later code assumes that the subtree isn't empty.  This is broken
because the subtree may become empty inbetween, which can lead to the
traversal escaping the subtree by walking to the sibling of the
subtree root.

There's no reason to have the early exit path.  Remove it along with
the later assumption that the subtree isn't empty.  This simplifies
the code a bit and fixes the subtle bug.

While at it, fix the comment of cgroup_for_each_descendant_pre() which
was incorrectly referring to ->css_offline() instead of
->css_online().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org
2013-05-24 10:50:24 +09:00
Tejun Heo 857a2beb09 cgroup: implement task_cgroup_path_from_hierarchy()
kdbus folks want a sane way to determine the cgroup path that a given
task belongs to on a given hierarchy, which is a reasonble thing to
expect from cgroup core.

Implement task_cgroup_path_from_hierarchy().

v2: Dropped unnecessary NULL check on the return value of
    task_cgroup_from_root() as suggested by Li Zefan.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <greg@kroah.com>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <daniel@zonque.org>
2013-05-14 11:42:07 -07:00
Tejun Heo 1a57423166 cgroup: make hierarchy_id use cyclic idr
We want to be able to lookup a hierarchy from its id and cyclic
allocation is a whole lot simpler with idr.  Convert to idr and use
idr_alloc_cyclc().

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-05-14 11:42:07 -07:00
Tejun Heo 54e7b4eb15 cgroup: drop hierarchy_id_lock
Now that hierarchy_id alloc / free are protected by the cgroup
mutexes, there's no need for this separate lock.  Drop it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-05-14 11:42:06 -07:00
Tejun Heo fa3ca07e96 cgroup: refactor hierarchy_id handling
We're planning to converting hierarchy_ida to an idr and use it to
look up hierarchy from its id.  As we want the mapping to happen
atomically with cgroupfs_root registration, this patch refactors
hierarchy_id init / exit so that ida operations happen inside
cgroup_[root_]mutex.

* s/init_root_id()/cgroup_init_root_id()/ and make it return 0 or
  -errno like a normal function.

* Move hierarchy_id initialization from cgroup_root_from_opts() into
  cgroup_mount() block where the root is confirmed to be used and
  being registered while holding both mutexes.

* Split cgroup_drop_id() into cgroup_exit_root_id() and
  cgroup_free_root(), so that ID release can happen before dropping
  the mutexes in cgroup_kill_sb().  The latter expects hierarchy_id to
  be exited before being invoked.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
2013-05-14 11:42:05 -07:00
Li Zefan d6cbf35dac cgroup: initialize xattr before calling d_instantiate()
cgroup_create_file() calls d_instantiate(), which may decide to look
at the xattrs on the file. Smack always does this and SELinux can be
configured to do so.

But cgroup_add_file() didn't initialize xattrs before calling
cgroup_create_file(), which finally leads to dereferencing NULL
dentry->d_fsdata.

This bug has been there since cgroup xattr was introduced.

Cc: <stable@vger.kernel.org> # 3.8.x
Reported-by: Ivan Bulatovic <combuster@archlinux.us>
Reported-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-05-14 08:24:15 -07:00
Linus Torvalds 26b840ae5d Merge tag 'trace-fixes-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing/kprobes update from Steven Rostedt:
 "The majority of these changes are from Masami Hiramatsu bringing
  kprobes up to par with the latest changes to ftrace (multi buffering
  and the new function probes).

  He also discovered and fixed some bugs in doing so.  When pulling in
  his patches, I also found a few minor bugs as well and fixed them.

  This also includes a compile fix for some archs that select the ring
  buffer but not tracing.

  I based this off of the last patch you took from me that fixed the
  merge conflict error, as that was the commit that had all the changes
  I needed for this set of changes."

* tag 'trace-fixes-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/kprobes: Support soft-mode disabling
  tracing/kprobes: Support ftrace_event_file base multibuffer
  tracing/kprobes: Pass trace_probe directly from dispatcher
  tracing/kprobes: Increment probe hit-count even if it is used by perf
  tracing/kprobes: Use bool for retprobe checker
  ftrace: Fix function probe when more than one probe is added
  ftrace: Fix the output of enabled_functions debug file
  ftrace: Fix locking in register_ftrace_function_probe()
  tracing: Add helper function trace_create_new_event() to remove duplicate code
  tracing: Modify soft-mode only if there's no other referrer
  tracing: Indicate enabled soft-mode in enable file
  tracing/kprobes: Fix to increment return event probe hit-count
  ftrace: Cleanup regex_lock and ftrace_lock around hash updating
  ftrace, kprobes: Fix a deadlock on ftrace_regex_lock
  ftrace: Have ftrace_regex_write() return either read or error
  tracing: Return error if register_ftrace_function_probe() fails for event_enable_func()
  tracing: Don't succeed if event_enable_func did not register anything
  ring-buffer: Select IRQ_WORK
2013-05-11 17:04:59 -07:00
Linus Torvalds c4cc75c332 Merge git://git.infradead.org/users/eparis/audit
Pull audit changes from Eric Paris:
 "Al used to send pull requests every couple of years but he told me to
  just start pushing them to you directly.

  Our touching outside of core audit code is pretty straight forward.  A
  couple of interface changes which hit net/.  A simple argument bug
  calling audit functions in namei.c and the removal of some assembly
  branch prediction code on ppc"

* git://git.infradead.org/users/eparis/audit: (31 commits)
  audit: fix message spacing printing auid
  Revert "audit: move kaudit thread start from auditd registration to kaudit init"
  audit: vfs: fix audit_inode call in O_CREAT case of do_last
  audit: Make testing for a valid loginuid explicit.
  audit: fix event coverage of AUDIT_ANOM_LINK
  audit: use spin_lock in audit_receive_msg to process tty logging
  audit: do not needlessly take a lock in tty_audit_exit
  audit: do not needlessly take a spinlock in copy_signal
  audit: add an option to control logging of passwords with pam_tty_audit
  audit: use spin_lock_irqsave/restore in audit tty code
  helper for some session id stuff
  audit: use a consistent audit helper to log lsm information
  audit: push loginuid and sessionid processing down
  audit: stop pushing loginid, uid, sessionid as arguments
  audit: remove the old depricated kernel interface
  audit: make validity checking generic
  audit: allow checking the type of audit message in the user filter
  audit: fix build break when AUDIT_DEBUG == 2
  audit: remove duplicate export of audit_enabled
  Audit: do not print error when LSMs disabled
  ...
2013-05-11 14:29:11 -07:00
Linus Torvalds 3644bc2ec7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull stray syscall bits from Al Viro:
 "Several syscall-related commits that were missing from the original"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE
  unicore32: just use mmap_pgoff()...
  unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
  x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)
2013-05-10 09:21:05 -07:00
Linus Torvalds f741df1f3a Merge tag 'for-linus-20130509' of git://git.infradead.org/~dwmw2/random-2.6
Pull misc fixes from David Woodhouse:
 "This is some miscellaneous cleanups that don't really belong anywhere
  else (or were ignored), that have been sitting in linux-next for some
  time.  Two of them are fixes resulting from my audit of krealloc()
  usage that don't seem to have elicited any response when I posted
  them, and the other three are patches from Artem removing dead code."

* tag 'for-linus-20130509' of git://git.infradead.org/~dwmw2/random-2.6:
  pcmcia: remove RPX board stuff
  m68k: remove rpxlite stuff
  pcmcia: remove Motorola MBX860 support
  params: Fix potential memory leak in add_sysfs_param()
  dell-laptop: Fix krealloc() misuse in parse_da_table()
2013-05-10 09:09:47 -07:00
Masami Hiramatsu b8820084f2 tracing/kprobes: Support soft-mode disabling
Support soft-mode disabling on kprobe-based dynamic events.
Soft-disabling is just ignoring recording if the soft disabled
flag is set.

Link: http://lkml.kernel.org/r/20130509054454.30398.7237.stgit@mhiramat-M0-7522

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09 20:22:16 -04:00
Masami Hiramatsu 41a7dd420c tracing/kprobes: Support ftrace_event_file base multibuffer
Support multi-buffer on kprobe-based dynamic events by
using ftrace_event_file.

Link: http://lkml.kernel.org/r/20130509054449.30398.88343.stgit@mhiramat-M0-7522

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09 20:21:47 -04:00
Masami Hiramatsu 2b106aabe6 tracing/kprobes: Pass trace_probe directly from dispatcher
Pass the pointer of struct trace_probe directly from probe
dispatcher to handlers. This removes redundant container_of
macro uses. Same thing has already done in trace_uprobe.

Link: http://lkml.kernel.org/r/20130509054441.30398.69112.stgit@mhiramat-M0-7522

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <tom.zanussi@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-09 20:19:48 -04:00