Commit Graph

186 Commits

Author SHA1 Message Date
Roland McGrath
54a0151041 asmlinkage_protect replaces prevent_tail_call
The prevent_tail_call() macro works around the problem of the compiler
clobbering argument words on the stack, which for asmlinkage functions
is the caller's (user's) struct pt_regs.  The tail/sibling-call
optimization is not the only way that the compiler can decide to use
stack argument words as scratch space, which we have to prevent.
Other optimizations can do it too.

Until we have new compiler support to make "asmlinkage" binding on the
compiler's own use of the stack argument frame, we have work around all
the manifestations of this issue that crop up.

More cases seem to be prevented by also keeping the incoming argument
variables live at the end of the function.  This makes their original
stack slots attractive places to leave those variables, so the compiler
tends not clobber them for something else.  It's still no guarantee, but
it handles some observed cases that prevent_tail_call() did not.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-10 17:28:26 -07:00
Roland McGrath
6efcae4601 Fix waitid si_code regression
In commit ee7c82da83 ("wait_task_stopped:
simplify and fix races with SIGCONT/SIGKILL/untrace"), the magic (short)
cast when storing si_code was lost in wait_task_stopped.  This leaks the
in-kernel CLD_* values that do not match what userland expects.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-08 11:54:00 -08:00
Oleg Nesterov
821c7de719 exit_notify: fix kill_orphaned_pgrp() usage with mt exit
1. exit_notify() always calls kill_orphaned_pgrp(). This is wrong, we
   should do this only when the whole process exits.

2. exit_notify() uses "current" as "ignored_task", obviously wrong.
   Use ->group_leader instead.

Test case:

	void hup(int sig)
	{
		printf("HUP received\n");
	}

	void *tfunc(void *arg)
	{
		sleep(2);
		printf("sub-thread exited\n");
		return NULL;
	}

	int main(int argc, char *argv[])
	{
		if (!fork()) {
			signal(SIGHUP, hup);
			kill(getpid(), SIGSTOP);
			exit(0);
		}

		pthread_t thr;
		pthread_create(&thr, NULL, tfunc, NULL);

		sleep(1);
		printf("main thread exited\n");
		syscall(__NR_exit, 0);

		return 0;
	}

output:

	main thread exited
	HUP received
	Hangup

With this patch the output is:

	main thread exited
	sub-thread exited
	HUP received

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-03 14:53:16 -08:00
Oleg Nesterov
05e83df624 will_become_orphaned_pgrp: partially fix insufficient ->exit_state check
p->exit_state != 0 doesn't mean this process is dead, it may have
sub-threads.  Change the code to use "p->exit_state && thread_group_empty(p)"
instead.

Without this patch, ^Z doesn't deliver SIGTSTP to the foreground process
if the main thread has exited.

However, the new check is not perfect either.  There is a window when
exit_notify() drops tasklist and before release_task().  Suppose that
the last (non-leader) thread exits.  This means that entire group exits,
but thread_group_empty() is not true yet.

As Eric pointed out, is_global_init() is wrong as well, but I did not
dare to do other changes.

Just for the record, has_stopped_jobs() is absolutely wrong too.  But we
can't fix it now, we should first fix SIGNAL_STOP_STOPPED issues.

Even with this patch ^Z doesn't play well with the dead main thread.
The task is stopped correctly but do_wait(WSTOPPED) won't see it.  This
is another unrelated issue, will be (hopefully) fixed separately.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-03 14:53:16 -08:00
Oleg Nesterov
f49ee505b1 introduce kill_orphaned_pgrp() helper
Factor out the common code in reparent_thread() and exit_notify().

No functional changes.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-03 14:53:16 -08:00
Jan Blunck
6ac08c39a1 Use struct path in fs_struct
* Use struct path in fs_struct.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
Harvey Harrison
7ad5b3a505 kernel: remove fastcall in kernel/*
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:31 -08:00
Pavel Emelyanov
6c5f3e7b43 Pidns: make full use of xxx_vnr() calls
Some time ago the xxx_vnr() calls (e.g.  pid_vnr or find_task_by_vpid) were
_all_ converted to operate on the current pid namespace.  After this each call
like xxx_nr_ns(foo, current->nsproxy->pid_ns) is nothing but a xxx_vnr(foo)
one.

Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where
appropriate.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:29 -08:00
Eric W. Biederman
161550d74c pid: sys_wait... fixes
This modifies do_wait and eligible child to take a pair of enum pid_type
and struct pid *pid to precisely specify what set of processes are eligible
to be waited for, instead of the raw pid_t value from sys_wait4.

This fixes a bug in sys_waitid where you could not wait for children in
just process group 1.

This fixes a pid namespace crossing case in eligible_child.  Allowing us to
wait for a processes in our current process group even if our current
process group == 0.

This allows the no child with this pid case to be optimized.  This allows
us to optimize the pid membership test in eligible child to be optimized.

This even closes a theoretical pid wraparound race where in a threaded
parent if two threads are waiting for the same child and one thread picks
up the child and the pid numbers wrap around and generate another child
with that same pid before the other thread is scheduled (teribly insanely
unlikely) we could end up waiting on the second child with the same pid#
and not discover that the specific child we were waiting for has exited.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Oleg Nesterov
5dee1707df move the related code from exit_notify() to exit_signals()
The previous bugfix was not optimal, we shouldn't care about group stop
when we are the only thread or the group stop is in progress.  In that case
nothing special is needed, just set PF_EXITING and return.

Also, take the related "TIF_SIGPENDING re-targeting" code from exit_notify().

So, from the performance POV the only difference is that we don't trust
!signal_pending() until we take ->siglock.  But this in fact fixes another
___pure___ theoretical minor race.  __group_complete_signal() finds the
task without PF_EXITING and chooses it as the target for signal_wake_up().
But nothing prevents this task from exiting in between without noticing the
pending signal and thus unpredictably delaying the actual delivery.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Oleg Nesterov
d12619b5ff fix group stop with exit race
do_signal_stop() counts all sub-thread and sets ->group_stop_count
accordingly.  Every thread should decrement ->group_stop_count and stop,
the last one should notify the parent.

However a sub-thread can exit before it notices the signal_pending(), or it
may be somewhere in do_exit() already.  In that case the group stop never
finishes properly.

Note: this is a minimal fix, we can add some optimizations later.  Say we
can return quickly if thread_group_empty().  Also, we can move some signal
related code from exit_notify() to exit_signals().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Oleg Nesterov
297bd42b15 move daemonized kernel threads into the swapper's session
Daemonized kernel threads run in the init's session. This doesn't match the
behaviour of kthread_create()'ed threads, and this is one of the 2 reasons
why we need a special hack in sys_setsid().

Now that set_special_pids() was changed to use struct pid, not pid_t, we can
use init_struct_pid and set 0,0 special pids.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Oleg Nesterov
8520d7c7f8 teach set_special_pids() to use struct pid
Change set_special_pids() to work with struct pid, not pid_t from global name
space. This again speedups and imho cleanups the code, also a preparation for
the next patch.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Oleg Nesterov
c543f1ee08 wait_task_zombie: remove ->exit_state/exit_signal checks for WNOWAIT
The first "p->exit_state != EXIT_ZOMBIE" check doesn't make too much sense.
The exit_state was EXIT_ZOMBIE when the function was called, and another
thread can change it to EXIT_DEAD right after the check.

The second condition is not possible, detached non-traced threads were already
filtered out by eligible_child(), we didn't drop tasklist since then.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:27 -08:00
Oleg Nesterov
3a515e4a62 wait_task_continued/zombie: don't use task_pid_nr_ns() lockless
Surprise, the other two wait_task_*() functions also abuse the
task_pid_nr_ns() function, and may cause read-after-free or report nr == 0
in wait_task_continued().  wait_task_zombie() doesn't have this problem,
but it is still better to cache pid_t rather than call task_pid_nr_ns()
three times on the saved pid_namespace.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
f2cc3eb133 do_wait: fix security checks
Imho, the current usage of security_task_wait() is not logical.

Suppose we have the single child p, and security_task_wait(p) return
-EANY.  In that case waitpid(-1) returns this error.  Why? Isn't it
better to return ECHLD? We don't really have reapable children.

Now suppose that child was stolen by gdb.  In that case we find this
child on ->ptrace_children and set flag = 1, but we don't check that the
child was denied.  So, do_wait(..., WNOHANG) returns 0, this doesn't
match the behaviour above.  Without WNOHANG do_wait() blocks only to
return the error later, when the child will be untraced.  Inho, really
strange.

I think eligible_child() should return the error only if the child's pid
was requested explicitly, otherwise we should silently ignore the tasks
which were nacked by security_task_wait().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
96fabbf55a do_wait: cleanup delay_group_leader() usage
eligible_child() == 2 means delay_group_leader().  With the previous patch
this only matters for EXIT_ZOMBIE task, we can move that special check to
the only place it is really needed.

Also, with this patch we don't skip security_task_wait() for the group
leaders in a non-empty thread group.  I don't really understand the exact
semantics of security_task_wait(), but imho this change is a bugfix.

Also rearrange the code a bit to kill an ugly "check_continued" backdoor.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
1bad95c3be wait_task_stopped(): remove unneeded delay_group_leader check
wait_task_stopped() doesn't need the "delay_group_leader" parameter.  If
the child is not traced it must be a group leader.  With or without
subthreads ->group_stop_count == 0 when the whole task is stopped.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Mika Penttila <mika.penttila@kolumbus.fi>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
9cbab81005 do_wait: factor out "retval != 0" checks
Every branch if the main "if" statement does the same code at the end.  Move
it down.  Also, fix the indentation.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
ee7c82da83 wait_task_stopped: simplify and fix races with SIGCONT/SIGKILL/untrace
wait_task_stopped() has multiple races with SIGCONT/SIGKILL.  tasklist_lock
does not pin the child in TASK_TRACED/TASK_STOPPED stated, almost all info
reported (including exit_code) may be wrong.

In fact, the code under write_lock_irq(tasklist_lock) is not safe.  The child
may be PTRACE_DETACH'ed at this time by another subthread, in that case it is
possible we are no longer its ->parent.

Change wait_task_stopped() to take ->siglock before inspecting the task.  This
guarantees that the child can't resume and (for example) clear its
->exit_code, so we don't need to use xchg(&p->exit_code) and re-check.  The
only exception is ptrace_stop() which changes ->state and ->exit_code without
->siglock held during abort.  But this can only happen if both the tracer and
the tracee are dying (coredump is in progress), we don't care.

With this patch wait_task_stopped() doesn't move the child to the end of
the ->parent list on success.  This optimization could be restored, but
in that case we have to take write_lock(tasklist) and do some nasty
checks.

Also change the do_wait() since we don't return EAGAIN any longer.

[akpm@linux-foundation.org: fix up after Willy renamed everything]
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
34a1738f7d kill my_ptrace_child()
Now that my_ptrace_child() is trivial we can use the "p->ptrace & PT_PTRACED"
inline and simplify the corresponding logic in do_wait: we can't find the
child in TASK_TRACED state without PT_PTRACED flag set, ptrace_untrace()
either sets TASK_STOPPED or wakes up the tracee.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
6b39c7bfbd kill PT_ATTACHED
Since the patch

	"Fix ptrace_attach()/ptrace_traceme()/de_thread() race"
	commit f5b40e363a

we set PT_ATTACHED and change child->parent "atomically" wrt task_list lock.

This means we can remove the checks like "PT_ATTACHED && ->parent != ptracer"
which were needed to catch the "ptrace attach is in progress" case.  We can
also remove the flag itself since nobody else uses it.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:26 -08:00
Oleg Nesterov
0a76fe8e50 do_wait: remove one "else if" branch
Minor cleanup. We can remove one "else if" branch.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06 10:41:04 -08:00
Oleg Nesterov
ed5d2cac11 exec: rework the group exit and fix the race with kill
As Roland pointed out, we have the very old problem with exec.  de_thread()
sets SIGNAL_GROUP_EXIT, kills other threads, changes ->group_leader and then
clears signal->flags.  All signals (even fatal ones) sent in this window
(which is not too small) will be lost.

With this patch exec doesn't abuse SIGNAL_GROUP_EXIT.  signal_group_exit(),
the new helper, should be used to detect exit_group() or exec() in progress.
It can have more users, but this patch does only strictly necessary changes.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Robin Holt <holt@sgi.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:07 -08:00
Pierre Peiffer
06c93e8757 Remove one useless extern declaration
The file exit.c contains one useless extern declaration of sem_exit().
Moreover, it refers to nothing.

This trivial patch removes it.

Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 16:22:12 +02:00