Commit Graph

15891 Commits

Author SHA1 Message Date
Lorenzo Colitti
4fd02636be Make suspend abort reason logging depend on CONFIG_PM_SLEEP
This unbreaks the build on architectures such as um that do not
support CONFIG_PM_SLEEP.

Change-Id: Ia846ed0a7fca1d762ececad20748d23610e8544f
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
2014-12-02 21:58:47 +00:00
Rom Lemarchand
57114e95e8 cgroup: refactor allow_attach function into common code
move cpu_cgroup_allow_attach to a common subsys_cgroup_allow_attach.
This allows any process with CAP_SYS_NICE to move tasks across cgroups if
they use this function as their allow_attach handler.

Bug: 18260435
Change-Id: I6bb4933d07e889d0dc39e33b4e71320c34a2c90f
Signed-off-by: Rom Lemarchand <romlem@android.com>
2014-11-07 13:47:35 -08:00
Dmitry Shmidt
f141171d7d power: Add check_wakeup_reason() to verify wakeup source irq
Wakeup reason is set before driver resume handlers are called.
It is cleared before driver suspend handlers are called, on
PM_SUSPEND_PREPARE.

Change-Id: I04218c9b0c115a7877e8029c73e6679ff82e0aa4
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2014-11-04 10:47:40 -08:00
Ruchi Kandoi
7af7a7d021 power: Adds functionality to log the last suspend abort reason.
Extends the last_resume_reason to log suspend abort reason. The abort
reasons will have "Abort:" appended at the start to distinguish itself
from the resume reason.

Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
Change-Id: I3207f1844e3d87c706dfc298fb10e1c648814c5f
2014-10-29 10:36:27 -07:00
Ruchi Kandoi
5ffb57932e power: Avoids bogus error messages for the suspend aborts.
Avoids printing bogus error message "tasks refusing to freeze", in cases
where pending wakeup source caused the suspend abort.

Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
Change-Id: I913ad290f501b31cd536d039834c8d24c6f16928
2014-10-16 09:16:24 -07:00
Guenter Roeck
9ac860041d seccomp: Replace BUG(!spin_is_locked()) with assert_spin_lock
Current upstream kernel hangs with mips and powerpc targets in
uniprocessor mode if SECCOMP is configured.

Bisect points to commit dbd952127d ("seccomp: introduce writer locking").
Turns out that code such as
	BUG_ON(!spin_is_locked(&list_lock));
can not be used in uniprocessor mode because spin_is_locked() always
returns false in this configuration, and that assert_spin_locked()
exists for that very purpose and must be used instead.

Fixes: dbd952127d ("seccomp: introduce writer locking")
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
2014-10-08 14:35:33 -07:00
Kees Cook
f14a5db239 seccomp: implement SECCOMP_FILTER_FLAG_TSYNC
Applying restrictive seccomp filter programs to large or diverse
codebases often requires handling threads which may be started early in
the process lifetime (e.g., by code that is linked in). While it is
possible to apply permissive programs prior to process start up, it is
difficult to further restrict the kernel ABI to those threads after that
point.

This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for
synchronizing thread group seccomp filters at filter installation time.

When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
filter) an attempt will be made to synchronize all threads in current's
threadgroup to its new seccomp filter program. This is possible iff all
threads are using a filter that is an ancestor to the filter current is
attempting to synchronize to. NULL filters (where the task is running as
SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be
transitioned into SECCOMP_MODE_FILTER. If prctrl(PR_SET_NO_NEW_PRIVS,
...) has been set on the calling thread, no_new_privs will be set for
all synchronized threads too. On success, 0 is returned. On failure,
the pid of one of the failing threads will be returned and no filters
will have been applied.

The race conditions against another thread are:
- requesting TSYNC (already handled by sighand lock)
- performing a clone (already handled by sighand lock)
- changing its filter (already handled by sighand lock)
- calling exec (handled by cred_guard_mutex)
The clone case is assisted by the fact that new threads will have their
seccomp state duplicated from their parent before appearing on the tasklist.

Holding cred_guard_mutex means that seccomp filters cannot be assigned
while in the middle of another thread's exec (potentially bypassing
no_new_privs or similar). The call to de_thread() may kill threads waiting
for the mutex.

Changes across threads to the filter pointer includes a barrier.

Based on patches by Will Drewry.

Suggested-by: Julien Tinnes <jln@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-10-07 16:42:34 -07:00
Kees Cook
c852ef7782 seccomp: allow mode setting across threads
This changes the mode setting helper to allow threads to change the
seccomp mode from another thread. We must maintain barriers to keep
TIF_SECCOMP synchronized with the rest of the seccomp state.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>

Conflicts:
	kernel/seccomp.c
2014-10-07 16:42:34 -07:00
Kees Cook
61b6b882a0 seccomp: introduce writer locking
Normally, task_struct.seccomp.filter is only ever read or modified by
the task that owns it (current). This property aids in fast access
during system call filtering as read access is lockless.

Updating the pointer from another task, however, opens up race
conditions. To allow cross-thread filter pointer updates, writes to the
seccomp fields are now protected by the sighand spinlock (which is shared
by all threads in the thread group). Read access remains lockless because
pointer updates themselves are atomic.  However, writes (or cloning)
often entail additional checking (like maximum instruction counts)
which require locking to perform safely.

In the case of cloning threads, the child is invisible to the system
until it enters the task list. To make sure a child can't be cloned from
a thread and left in a prior state, seccomp duplication is additionally
moved under the sighand lock. Then parent and child are certain have
the same seccomp state when they exit the lock.

Based on patches by Will Drewry and David Drysdale.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>

Conflicts:
	kernel/fork.c
2014-10-07 16:42:33 -07:00
Kees Cook
b6a12bf4dd seccomp: split filter prep from check and apply
In preparation for adding seccomp locking, move filter creation away
from where it is checked and applied. This will allow for locking where
no memory allocation is happening. The validation, filter attachment,
and seccomp mode setting can all happen under the future locks.

For extreme defensiveness, I've added a BUG_ON check for the calculated
size of the buffer allocation in case BPF_MAXINSN ever changes, which
shouldn't ever happen. The compiler should actually optimize out this
check since the test above it makes it impossible.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>

Conflicts:
	kernel/seccomp.c
2014-10-07 16:42:33 -07:00
Kees Cook
9d0ff694bc sched: move no_new_privs into new atomic flags
Since seccomp transitions between threads requires updates to the
no_new_privs flag to be atomic, the flag must be part of an atomic flag
set. This moves the nnp flag into a separate task field, and introduces
accessors.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>

Conflicts:
	kernel/sys.c
2014-10-07 16:42:32 -07:00
Kees Cook
e985fd474d seccomp: add "seccomp" syscall
This adds the new "seccomp" syscall with both an "operation" and "flags"
parameter for future expansion. The third argument is a pointer value,
used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).

In addition to the TSYNC flag later in this patch series, there is a
non-zero chance that this syscall could be used for configuring a fixed
argument area for seccomp-tracer-aware processes to pass syscall arguments
in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter"
for this syscall. Additionally, this syscall uses operation, flags,
and user pointer for arguments because strictly passing arguments via
a user pointer would mean seccomp itself would be unable to trivially
filter the seccomp syscall itself.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>

Conflicts:
	arch/x86/syscalls/syscall_32.tbl
	arch/x86/syscalls/syscall_64.tbl
	include/uapi/asm-generic/unistd.h
	kernel/seccomp.c

And fixup of unistd32.h to truly enable sys_secomp.

Change-Id: I95bea02382c52007d22e5e9dc563c7d055c2c83f
2014-10-07 16:42:32 -07:00
Kees Cook
8908dde5a7 seccomp: split mode setting routines
Separates the two mode setting paths to make things more readable with
fewer #ifdefs within function bodies.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-10-07 16:42:31 -07:00
Kees Cook
b8a9cff6db seccomp: extract check/assign mode helpers
To support splitting mode 1 from mode 2, extract the mode checking and
assignment logic into common functions.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-10-07 16:42:31 -07:00
Kees Cook
2a30a4386e seccomp: create internal mode-setting function
In preparation for having other callers of the seccomp mode setting
logic, split the prctl entry point away from the core logic that performs
seccomp mode setting.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-10-07 16:42:30 -07:00
Oleg Nesterov
987a0f1102 introduce for_each_thread() to replace the buggy while_each_thread()
while_each_thread() and next_thread() should die, almost every lockless
usage is wrong.

1. Unless g == current, the lockless while_each_thread() is not safe.

   while_each_thread(g, t) can loop forever if g exits, next_thread()
   can't reach the unhashed thread in this case. Note that this can
   happen even if g is the group leader, it can exec.

2. Even if while_each_thread() itself was correct, people often use
   it wrongly.

   It was never safe to just take rcu_read_lock() and loop unless
   you verify that pid_alive(g) == T, even the first next_thread()
   can point to the already freed/reused memory.

This patch adds signal_struct->thread_head and task->thread_node to
create the normal rcu-safe list with the stable head.  The new
for_each_thread(g, t) helper is always safe under rcu_read_lock() as
long as this task_struct can't go away.

Note: of course it is ugly to have both task_struct->thread_node and the
old task_struct->thread_group, we will kill it later, after we change
the users of while_each_thread() to use for_each_thread().

Perhaps we can kill it even before we convert all users, we can
reimplement next_thread(t) using the new thread_head/thread_node.  But
we can't do this right now because this will lead to subtle behavioural
changes.  For example, do/while_each_thread() always sees at least one
task, while for_each_thread() can do nothing if the whole thread group
has died.  Or thread_group_empty(), currently its semantics is not clear
unless thread_group_leader(p) and we need to audit the callers before we
can change it.

So this patch adds the new interface which has to coexist with the old
one for some time, hopefully the next changes will be more or less
straightforward and the old one will go away soon.

Bug 200004307

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Sergey Dyasly <dserrg@gmail.com>
Tested-by: Sergey Dyasly <dserrg@gmail.com>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: "Ma, Xindong" <xindong.ma@intel.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 0c740d0afc)
Signed-off-by: Sri Krishna chowdary <schowdary@nvidia.com>
Change-Id: Id689cb1383ceba2561b66188d88258619b68f5c6
Reviewed-on: http://git-master/r/419041
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
2014-10-07 16:42:30 -07:00
Eric Paris
9499cd23f9 syscall_get_arch: remove useless function arguments
Every caller of syscall_get_arch() uses current for the task and no
implementors of the function need args.  So just get rid of both of
those things.  Admittedly, since these are inline functions we aren't
wasting stack space, but it just makes the prototypes better.

Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux390@de.ibm.com
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org

Conflicts:
	arch/mips/include/asm/syscall.h
	arch/mips/kernel/ptrace.c
2014-10-07 15:36:24 -07:00
JP Abgrall
4149e0de6d seccomp: revert previous patches in prep for updated ones
This reverts the seccomp related patches committed around 2014-08-27.
This allows for a cleaner cherry-pick of newly landed upstream patches.

 f56b1aa arm: fixup NR_syscalls to accommodate the new seccomp syscall
 81ff7fa seccomp: implement SECCOMP_FILTER_FLAG_TSYNC
 d924727 seccomp: allow mode setting across threads
 743266a seccomp: introduce writer locking
 3497a88 seccomp: split filter prep from check and apply
 2c6d7de MIPS: add seccomp syscall
 83f1ccba ARM: add seccomp syscall
 a75a29b seccomp: add "seccomp" syscall
 1a63bce seccomp: split mode setting routines
 c208e4e seccomp: extract check/assign mode helpers
 6862b01 seccomp: create internal mode-setting function
 1ba2ccb MAINTAINERS: create seccomp entry
 c2da3eb seccomp: fix memory leak on filter attach
 945a225 ARM: 7888/1: seccomp: not compatible with ARM OABI

Change-Id: I3f129263d68a7b3c206d79f84f7f9908d13064f6
Signed-off-by: JP Abgrall <jpa@google.com>
2014-09-17 16:56:33 -07:00
Kees Cook
81ff7fa232 seccomp: implement SECCOMP_FILTER_FLAG_TSYNC
Applying restrictive seccomp filter programs to large or diverse
codebases often requires handling threads which may be started early in
the process lifetime (e.g., by code that is linked in). While it is
possible to apply permissive programs prior to process start up, it is
difficult to further restrict the kernel ABI to those threads after that
point.

This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for
synchronizing thread group seccomp filters at filter installation time.

When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
filter) an attempt will be made to synchronize all threads in current's
threadgroup to its new seccomp filter program. This is possible iff all
threads are using a filter that is an ancestor to the filter current is
attempting to synchronize to. NULL filters (where the task is running as
SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be
transitioned into SECCOMP_MODE_FILTER. If prctrl(PR_SET_NO_NEW_PRIVS,
...) has been set on the calling thread, no_new_privs will be set for
all synchronized threads too. On success, 0 is returned. On failure,
the pid of one of the failing threads will be returned and no filters
will have been applied.

The race conditions against another thread are:
- requesting TSYNC (already handled by sighand lock)
- performing a clone (already handled by sighand lock)
- changing its filter (already handled by sighand lock)
- calling exec (handled by cred_guard_mutex)
The clone case is assisted by the fact that new threads will have their
seccomp state duplicated from their parent before appearing on the tasklist.

Holding cred_guard_mutex means that seccomp filters cannot be assigned
while in the middle of another thread's exec (potentially bypassing
no_new_privs or similar). The call to de_thread() may kill threads waiting
for the mutex.

Changes across threads to the filter pointer includes a barrier.

Based on patches by Will Drewry.

Suggested-by: Julien Tinnes <jln@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-08-28 01:54:06 +00:00
Kees Cook
d924727911 seccomp: allow mode setting across threads
This changes the mode setting helper to allow threads to change the
seccomp mode from another thread. We must maintain barriers to keep
TIF_SECCOMP synchronized with the rest of the seccomp state.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>

Conflicts:
	kernel/seccomp.c

Change-Id: I091ffa55d8f4e83ff02558a55e2b4dc76ac26905
2014-08-28 01:53:49 +00:00
Kees Cook
743266ae25 seccomp: introduce writer locking
Normally, task_struct.seccomp.filter is only ever read or modified by
the task that owns it (current). This property aids in fast access
during system call filtering as read access is lockless.

Updating the pointer from another task, however, opens up race
conditions. To allow cross-thread filter pointer updates, writes to the
seccomp fields are now protected by the sighand spinlock (which is shared
by all threads in the thread group). Read access remains lockless because
pointer updates themselves are atomic.  However, writes (or cloning)
often entail additional checking (like maximum instruction counts)
which require locking to perform safely.

In the case of cloning threads, the child is invisible to the system
until it enters the task list. To make sure a child can't be cloned from
a thread and left in a prior state, seccomp duplication is additionally
moved under the sighand lock. Then parent and child are certain have
the same seccomp state when they exit the lock.

Based on patches by Will Drewry and David Drysdale.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>

Conflicts:
	kernel/fork.c

Change-Id: Ie01ece43b610867013f7d0e0a2a7be0b9077630f
2014-08-28 01:53:30 +00:00
Kees Cook
3497a88f55 seccomp: split filter prep from check and apply
In preparation for adding seccomp locking, move filter creation away
from where it is checked and applied. This will allow for locking where
no memory allocation is happening. The validation, filter attachment,
and seccomp mode setting can all happen under the future locks.

For extreme defensiveness, I've added a BUG_ON check for the calculated
size of the buffer allocation in case BPF_MAXINSN ever changes, which
shouldn't ever happen. The compiler should actually optimize out this
check since the test above it makes it impossible.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>

Conflicts:
	kernel/seccomp.c

Change-Id: I8d89f80a5b4f2826d90474dcea441c41f0af6594
2014-08-28 01:53:09 +00:00
Kees Cook
a75a29b16e seccomp: add "seccomp" syscall
This adds the new "seccomp" syscall with both an "operation" and "flags"
parameter for future expansion. The third argument is a pointer value,
used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).

In addition to the TSYNC flag later in this patch series, there is a
non-zero chance that this syscall could be used for configuring a fixed
argument area for seccomp-tracer-aware processes to pass syscall arguments
in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter"
for this syscall. Additionally, this syscall uses operation, flags,
and user pointer for arguments because strictly passing arguments via
a user pointer would mean seccomp itself would be unable to trivially
filter the seccomp syscall itself.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>

Conflicts:
	arch/x86/syscalls/syscall_32.tbl
	arch/x86/syscalls/syscall_64.tbl
	include/uapi/asm-generic/unistd.h
	kernel/seccomp.c

Change-Id: Id7a365079829fd9164315dec75d6ee415c29b176
2014-08-28 01:51:54 +00:00
Kees Cook
1a63bcec4f seccomp: split mode setting routines
Separates the two mode setting paths to make things more readable with
fewer #ifdefs within function bodies.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-08-28 01:51:22 +00:00
Kees Cook
c208e4e9f1 seccomp: extract check/assign mode helpers
To support splitting mode 1 from mode 2, extract the mode checking and
assignment logic into common functions.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
2014-08-28 01:50:55 +00:00