Commit Graph

180 Commits

Author SHA1 Message Date
Harvey Harrison
7ad5b3a505 kernel: remove fastcall in kernel/*
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:31 -08:00
Pavel Emelyanov
6c5f3e7b43 Pidns: make full use of xxx_vnr() calls
Some time ago the xxx_vnr() calls (e.g.  pid_vnr or find_task_by_vpid) were
_all_ converted to operate on the current pid namespace.  After this each call
like xxx_nr_ns(foo, current->nsproxy->pid_ns) is nothing but a xxx_vnr(foo)
one.

Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where
appropriate.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:29 -08:00
Eric W. Biederman
161550d74c pid: sys_wait... fixes
This modifies do_wait and eligible child to take a pair of enum pid_type
and struct pid *pid to precisely specify what set of processes are eligible
to be waited for, instead of the raw pid_t value from sys_wait4.

This fixes a bug in sys_waitid where you could not wait for children in
just process group 1.

This fixes a pid namespace crossing case in eligible_child.  Allowing us to
wait for a processes in our current process group even if our current
process group == 0.

This allows the no child with this pid case to be optimized.  This allows
us to optimize the pid membership test in eligible child to be optimized.

This even closes a theoretical pid wraparound race where in a threaded
parent if two threads are waiting for the same child and one thread picks
up the child and the pid numbers wrap around and generate another child
with that same pid before the other thread is scheduled (teribly insanely
unlikely) we could end up waiting on the second child with the same pid#
and not discover that the specific child we were waiting for has exited.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Oleg Nesterov
5dee1707df move the related code from exit_notify() to exit_signals()
The previous bugfix was not optimal, we shouldn't care about group stop
when we are the only thread or the group stop is in progress.  In that case
nothing special is needed, just set PF_EXITING and return.

Also, take the related "TIF_SIGPENDING re-targeting" code from exit_notify().

So, from the performance POV the only difference is that we don't trust
!signal_pending() until we take ->siglock.  But this in fact fixes another
___pure___ theoretical minor race.  __group_complete_signal() finds the
task without PF_EXITING and chooses it as the target for signal_wake_up().
But nothing prevents this task from exiting in between without noticing the
pending signal and thus unpredictably delaying the actual delivery.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Oleg Nesterov
d12619b5ff fix group stop with exit race
do_signal_stop() counts all sub-thread and sets ->group_stop_count
accordingly.  Every thread should decrement ->group_stop_count and stop,
the last one should notify the parent.

However a sub-thread can exit before it notices the signal_pending(), or it
may be somewhere in do_exit() already.  In that case the group stop never
finishes properly.

Note: this is a minimal fix, we can add some optimizations later.  Say we
can return quickly if thread_group_empty().  Also, we can move some signal
related code from exit_notify() to exit_signals().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Oleg Nesterov
297bd42b15 move daemonized kernel threads into the swapper's session
Daemonized kernel threads run in the init's session. This doesn't match the
behaviour of kthread_create()'ed threads, and this is one of the 2 reasons
why we need a special hack in sys_setsid().

Now that set_special_pids() was changed to use struct pid, not pid_t, we can
use init_struct_pid and set 0,0 special pids.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Oleg Nesterov
8520d7c7f8 teach set_special_pids() to use struct pid
Change set_special_pids() to work with struct pid, not pid_t from global name
space. This again speedups and imho cleanups the code, also a preparation for
the next patch.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Oleg Nesterov
c543f1ee08 wait_task_zombie: remove ->exit_state/exit_signal checks for WNOWAIT
The first "p->exit_state != EXIT_ZOMBIE" check doesn't make too much sense.
The exit_state was EXIT_ZOMBIE when the function was called, and another
thread can change it to EXIT_DEAD right after the check.

The second condition is not possible, detached non-traced threads were already
filtered out by eligible_child(), we didn't drop tasklist since then.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Oleg Nesterov
3a515e4a62 wait_task_continued/zombie: don't use task_pid_nr_ns() lockless
Surprise, the other two wait_task_*() functions also abuse the
task_pid_nr_ns() function, and may cause read-after-free or report nr == 0
in wait_task_continued().  wait_task_zombie() doesn't have this problem,
but it is still better to cache pid_t rather than call task_pid_nr_ns()
three times on the saved pid_namespace.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
f2cc3eb133 do_wait: fix security checks
Imho, the current usage of security_task_wait() is not logical.

Suppose we have the single child p, and security_task_wait(p) return
-EANY.  In that case waitpid(-1) returns this error.  Why? Isn't it
better to return ECHLD? We don't really have reapable children.

Now suppose that child was stolen by gdb.  In that case we find this
child on ->ptrace_children and set flag = 1, but we don't check that the
child was denied.  So, do_wait(..., WNOHANG) returns 0, this doesn't
match the behaviour above.  Without WNOHANG do_wait() blocks only to
return the error later, when the child will be untraced.  Inho, really
strange.

I think eligible_child() should return the error only if the child's pid
was requested explicitly, otherwise we should silently ignore the tasks
which were nacked by security_task_wait().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
96fabbf55a do_wait: cleanup delay_group_leader() usage
eligible_child() == 2 means delay_group_leader().  With the previous patch
this only matters for EXIT_ZOMBIE task, we can move that special check to
the only place it is really needed.

Also, with this patch we don't skip security_task_wait() for the group
leaders in a non-empty thread group.  I don't really understand the exact
semantics of security_task_wait(), but imho this change is a bugfix.

Also rearrange the code a bit to kill an ugly "check_continued" backdoor.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
1bad95c3be wait_task_stopped(): remove unneeded delay_group_leader check
wait_task_stopped() doesn't need the "delay_group_leader" parameter.  If
the child is not traced it must be a group leader.  With or without
subthreads ->group_stop_count == 0 when the whole task is stopped.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Mika Penttila <mika.penttila@kolumbus.fi>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
9cbab81005 do_wait: factor out "retval != 0" checks
Every branch if the main "if" statement does the same code at the end.  Move
it down.  Also, fix the indentation.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
ee7c82da83 wait_task_stopped: simplify and fix races with SIGCONT/SIGKILL/untrace
wait_task_stopped() has multiple races with SIGCONT/SIGKILL.  tasklist_lock
does not pin the child in TASK_TRACED/TASK_STOPPED stated, almost all info
reported (including exit_code) may be wrong.

In fact, the code under write_lock_irq(tasklist_lock) is not safe.  The child
may be PTRACE_DETACH'ed at this time by another subthread, in that case it is
possible we are no longer its ->parent.

Change wait_task_stopped() to take ->siglock before inspecting the task.  This
guarantees that the child can't resume and (for example) clear its
->exit_code, so we don't need to use xchg(&p->exit_code) and re-check.  The
only exception is ptrace_stop() which changes ->state and ->exit_code without
->siglock held during abort.  But this can only happen if both the tracer and
the tracee are dying (coredump is in progress), we don't care.

With this patch wait_task_stopped() doesn't move the child to the end of
the ->parent list on success.  This optimization could be restored, but
in that case we have to take write_lock(tasklist) and do some nasty
checks.

Also change the do_wait() since we don't return EAGAIN any longer.

[akpm@linux-foundation.org: fix up after Willy renamed everything]
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
34a1738f7d kill my_ptrace_child()
Now that my_ptrace_child() is trivial we can use the "p->ptrace & PT_PTRACED"
inline and simplify the corresponding logic in do_wait: we can't find the
child in TASK_TRACED state without PT_PTRACED flag set, ptrace_untrace()
either sets TASK_STOPPED or wakes up the tracee.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
6b39c7bfbd kill PT_ATTACHED
Since the patch

	"Fix ptrace_attach()/ptrace_traceme()/de_thread() race"
	commit f5b40e363a

we set PT_ATTACHED and change child->parent "atomically" wrt task_list lock.

This means we can remove the checks like "PT_ATTACHED && ->parent != ptracer"
which were needed to catch the "ptrace attach is in progress" case.  We can
also remove the flag itself since nobody else uses it.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
0a76fe8e50 do_wait: remove one "else if" branch
Minor cleanup. We can remove one "else if" branch.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:04 -08:00
Oleg Nesterov
ed5d2cac11 exec: rework the group exit and fix the race with kill
As Roland pointed out, we have the very old problem with exec.  de_thread()
sets SIGNAL_GROUP_EXIT, kills other threads, changes ->group_leader and then
clears signal->flags.  All signals (even fatal ones) sent in this window
(which is not too small) will be lost.

With this patch exec doesn't abuse SIGNAL_GROUP_EXIT.  signal_group_exit(),
the new helper, should be used to detect exit_group() or exec() in progress.
It can have more users, but this patch does only strictly necessary changes.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Robin Holt <holt@sgi.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:07 -08:00
Pierre Peiffer
06c93e8757 Remove one useless extern declaration
The file exit.c contains one useless extern declaration of sem_exit().
Moreover, it refers to nothing.

This trivial patch removes it.

Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 16:22:12 +02:00
Matthew Wilcox
338077e54e exit: Use task_is_*
Also restructure the loop in do_wait()

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2007-12-06 17:35:17 -05:00
Scott James Remnant
e6ceb32aa2 wait_task_stopped(): pass correct exit_code to wait_noreap_copyout()
In wait_task_stopped() exit_code already contains the right value for the
si_status member of siginfo, and this is simply set in the non WNOWAIT
case.

If you call waitid() with a stopped or traced process, you'll get the signal
in siginfo.si_status as expected -- however if you call waitid(WNOWAIT) at the
same time, you'll get the signal << 8 | 0x7f

Pass it unchanged to wait_noreap_copyout(); we would only need to shift it
and add 0x7f if we were returning it in the user status field and that
isn't used for any function that permits WNOWAIT.

Signed-off-by: Scott James Remnant <scott@ubuntu.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:55 -08:00
Oleg Nesterov
c895078355 wait_task_stopped(): don't use task_pid_nr_ns() lockless
wait_task_stopped(WNOWAIT) does task_pid_nr_ns() without tasklist/rcu lock,
we can read an already freed memory.  Use the cached pid_t value.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Looks-good-to: Roland McGrath <roland@redhat.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-29 09:24:52 -08:00
Roland McGrath
a3474224e6 wait_task_stopped: Check p->exit_state instead of TASK_TRACED
The original meaning of the old test (p->state > TASK_STOPPED) was
"not dead", since it was before TASK_TRACED existed and before the
state/exit_state split.  It was a wrong correction in commit
14bf01bb05 to make this test for
TASK_TRACED instead.  It should have been changed when TASK_TRACED
was introducted and again when exit_state was introduced.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Kees Cook <kees@ubuntu.com>
Acked-by: Scott James Remnant <scott@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-15 08:36:27 -08:00
Alexey Dobriyan
a39bc51691 Uninline fork.c/exit.c
Save ~650 bytes here.

add/remove: 4/0 grow/shrink: 0/7 up/down: 430/-1088 (-658)
function                                     old     new   delta
__copy_fs_struct                               -     202    +202
__put_fs_struct                                -     112    +112
__exit_fs                                      -      58     +58
__exit_files                                   -      58     +58
exit_files                                    58       2     -56
put_fs_struct                                112       5    -107
exit_fs                                      161       2    -159
sys_unshare                                  774     590    -184
copy_process                                4031    3840    -191
do_exit                                     1791    1597    -194
copy_fs_struct                               202       5    -197

No difference in lmbench lat_proc tests on 2-way Opteron 246.
Smaaaal degradation on UP P4 (within errors).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:56 -07:00
Pavel Emelyanov
ba25f9dcc4 Use helpers to obtain task pid in printks
The task_struct->pid member is going to be deprecated, so start
using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
the kernel.

The first thing to start with is the pid, printed to dmesg - in
this case we may safely use task_pid_nr(). Besides, printks produce
more (much more) than a half of all the explicit pid usage.

[akpm@linux-foundation.org: git-drm went and changed lots of stuff]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:43 -07:00