Remains the question whether it is intended that many, perhaps even large,
tables are compiled in without ever having a chance to get used, i.e.
whether there shouldn't #ifdef CONFIG_xxx get added.
[akpm@linux-foundation.org: fix cut-n-paste error]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the first step (of two) in removing the kill_pgrp_info.
All the users of this function are in kernel/signal.c, but all they need is to
call __kill_pgrp_info() with the tasklist_lock read-locked.
Fortunately, one of its users is the kill_something_info(), which already
needs this lock in one of its branches, so clean these branches up and call
the __kill_pgrp_info() directly.
Based on Oleg's view of how this function should look.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some time ago the xxx_vnr() calls (e.g. pid_vnr or find_task_by_vpid) were
_all_ converted to operate on the current pid namespace. After this each call
like xxx_nr_ns(foo, current->nsproxy->pid_ns) is nothing but a xxx_vnr(foo)
one.
Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where
appropriate.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
signal_struct->tsk points to the ->group_leader and thus we have the nasty
code in de_thread() which has to change it and restart ->real_timer if the
leader is changed.
Use "struct pid *leader_pid" instead. This also allows us to kill now
unneeded send_group_sig_info().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kill_pid_info()->pid_task() could be the old leader of the execing process.
In that case it is possible that the leader will be released before we take
siglock. This means that kill_pid_info() (and thus sys_kill()) can return a
false -ESRCH.
Change the code to retry when lock_task_sighand() fails. The endless loop is
not possible, __exit_signal() both clears ->sighand and does detach_pid().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pid_vnr returns the user space pid with respect to the pid namespace the
struct pid was allocated in. What we want before we return a pid to user
space is the user space pid with respect to the pid namespace of current.
pid_vnr is a very nice optimization but because it isn't quite what we want
it is easy to use pid_vnr at times when we aren't certain the struct pid
was allocated in our pid namespace.
Currently this describes at least tiocgpgrp and tiocgsid in ttyio.c the
parent process reported in the core dumps and the parent process in
get_signal_to_deliver.
So unless the performance impact is huge having an interface that does what
we want instead of always what we want should be much more reliable and
much less error prone.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This modifies do_wait and eligible child to take a pair of enum pid_type
and struct pid *pid to precisely specify what set of processes are eligible
to be waited for, instead of the raw pid_t value from sys_wait4.
This fixes a bug in sys_waitid where you could not wait for children in
just process group 1.
This fixes a pid namespace crossing case in eligible_child. Allowing us to
wait for a processes in our current process group even if our current
process group == 0.
This allows the no child with this pid case to be optimized. This allows
us to optimize the pid membership test in eligible child to be optimized.
This even closes a theoretical pid wraparound race where in a threaded
parent if two threads are waiting for the same child and one thread picks
up the child and the pid numbers wrap around and generate another child
with that same pid before the other thread is scheduled (teribly insanely
unlikely) we could end up waiting on the second child with the same pid#
and not discover that the specific child we were waiting for has exited.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The previous bugfix was not optimal, we shouldn't care about group stop
when we are the only thread or the group stop is in progress. In that case
nothing special is needed, just set PF_EXITING and return.
Also, take the related "TIF_SIGPENDING re-targeting" code from exit_notify().
So, from the performance POV the only difference is that we don't trust
!signal_pending() until we take ->siglock. But this in fact fixes another
___pure___ theoretical minor race. __group_complete_signal() finds the
task without PF_EXITING and chooses it as the target for signal_wake_up().
But nothing prevents this task from exiting in between without noticing the
pending signal and thus unpredictably delaying the actual delivery.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
do_signal_stop() counts all sub-thread and sets ->group_stop_count
accordingly. Every thread should decrement ->group_stop_count and stop,
the last one should notify the parent.
However a sub-thread can exit before it notices the signal_pending(), or it
may be somewhere in do_exit() already. In that case the group stop never
finishes properly.
Note: this is a minimal fix, we can add some optimizations later. Say we
can return quickly if thread_group_empty(). Also, we can move some signal
related code from exit_notify() to exit_signals().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As Eric pointed out, there is no problem with init starting with sid == pgid
== 0, and this was historical linux behavior changed in 2.6.18.
Remove kernel_init()->__set_special_pids(), this is unneeded and complicates
the rules for sys_setsid().
This change and the previous change in daemonize() mean that /sbin/init does
not need the special "session != 1" hack in sys_setsid() any longer. We can't
remove this check yet, we should cleanup copy_process(CLONE_NEWPID) first, so
update the comment only.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daemonized kernel threads run in the init's session. This doesn't match the
behaviour of kthread_create()'ed threads, and this is one of the 2 reasons
why we need a special hack in sys_setsid().
Now that set_special_pids() was changed to use struct pid, not pid_t, we can
use init_struct_pid and set 0,0 special pids.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change set_special_pids() to work with struct pid, not pid_t from global name
space. This again speedups and imho cleanups the code, also a preparation for
the next patch.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sys_setsid() still deals with pid_t's from the global namespace. This means
that the "session > 1" check can't help for sub-namespace init, setsid() can't
succeed because copy_process(CLONE_NEWPID) populates PIDTYPE_PGID/SID links.
Remove the usage of task_struct->pid and convert the code to use "struct pid".
This also simplifies and speedups the code, saves one find_pid().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sys_setpgid() does unneeded conversions from pid_t to "struct pid" and vice
versa. Use "struct pid" more consistently. Saves one find_vpid() and
eliminates the explicit usage of ->nsproxy->pid_ns. Imho, cleanups the
code.
Also use the same_thread_group() helper.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>