Commit Graph

58 Commits

Author SHA1 Message Date
Oleg Nesterov
30f1e3dd8c [PATCH] run_posix_cpu_timers: remove a bogus BUG_ON()
do_exit() clears ->it_##clock##_expires, but nothing prevents
another cpu to attach the timer to exiting process after that.
arm_timer() tries to protect against this race, but the check
is racy.

After exit_notify() does 'write_unlock_irq(&tasklist_lock)' and
before do_exit() calls 'schedule() local timer interrupt can find
tsk->exit_state != 0. If that state was EXIT_DEAD (or another cpu
does sys_wait4) interrupted task has ->signal == NULL.

At this moment exiting task has no pending cpu timers, they were
cleanuped in __exit_signal()->posix_cpu_timers_exit{,_group}(),
so we can just return from irq.

John Stultz recently confirmed this bug, see

	http://marc.theaimsgroup.com/?l=linux-kernel&m=115015841413687

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-17 10:52:13 -07:00
Al Viro
fa84cb935d [PATCH] move call of audit_free() into do_exit()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-05-01 06:06:13 -04:00
Eric W. Biederman
5e85d4abe3 [PATCH] task: Make task list manipulations RCU safe
While we can currently walk through thread groups, process groups, and
sessions with just the rcu_read_lock, this opens the door to walking the
entire task list.

We already have all of the other RCU guarantees so there is no cost in
doing this, this should be enough so that proc can stop taking the
tasklist lock during readdir.

prev_task was killed because it has no users, and using it will miss new
tasks when doing an rcu traversal.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-19 09:13:49 -07:00
Jens Axboe
b92ce55893 [PATCH] splice: add direct fd <-> fd splicing support
It's more efficient for sendfile() emulation. Basically we cache an
internal private pipe and just use that as the intermediate area for
pages. Direct splicing is not available from sys_splice(), it is only
meant to be used for sendfile() emulation.

Additional patch from Ingo Molnar to avoid the PIPE_BUFFERS loop at
exit for the normal fast path.

Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-11 13:52:07 +02:00
Eric W. Biederman
8c7904a00b [PATCH] task: RCU protect task->usage
A big problem with rcu protected data structures that are also reference
counted is that you must jump through several hoops to increase the reference
count.  I think someone finally implemented atomic_inc_not_zero(&count) to
automate the common case.  Unfortunately this means you must special case the
rcu access case.

When data structures are only visible via rcu in a manner that is not
determined by the reference count on the object (i.e.  tasks are visible until
their zombies are reaped) there is a much simpler technique we can employ.
Simply delaying the decrement of the reference count until the rcu interval is
over.

What that means is that the proc code that looks up a task and later
wants to sleep can now do:

rcu_read_lock();
task = find_task_by_pid(some_pid);
if (task) {
	get_task_struct(task);
}
rcu_read_unlock();

The effect on the rest of the kernel is that put_task_struct becomes cheaper
and immediate, and in the case where the task has been reaped it frees the
task immediate instead of unnecessarily waiting an until the rcu interval is
over.

Cleanup of task_struct does not happen when its reference count drops to
zero, instead cleanup happens when release_task is called.  Tasks can only
be looked up via rcu before release_task is called.  All rcu protected
members of task_struct are freed by release_task.

Therefore we can move call_rcu from put_task_struct into release_task.  And
we can modify release_task to not immediately release the reference count
but instead have it call put_task_struct from the function it gives to
call_rcu.

The end result:

- get_task_struct is safe in an rcu context where we have just looked
  up the task.

- put_task_struct() simplifies into its old pre rcu self.

This reorganization also makes put_task_struct uncallable from modules as
it is not exported but it does not appear to be called from any modules so
this should not be an issue, and is trivially fixed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-31 12:18:59 -08:00
Oleg Nesterov
a7e5328a06 [PATCH] cleanup __exit_signal->cleanup_sighand path
Move 'tsk->sighand = NULL' from cleanup_sighand() to __exit_signal().  This
makes the exit path more understandable and allows us to do
cleanup_sighand() outside of ->siglock protected section.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:44 -08:00
Oleg Nesterov
47e65328a7 [PATCH] pids: kill PIDTYPE_TGID
This patch kills PIDTYPE_TGID pid_type thus saving one hash table in
kernel/pid.c and speeding up subthreads create/destroy a bit.  It is also a
preparation for the further tref/pids rework.

This patch adds 'struct list_head thread_group' to 'struct task_struct'
instead.

We don't detach group leader from PIDTYPE_PID namespace until another
thread inherits it's ->pid == ->tgid, so we are safe wrt premature
free_pidmap(->tgid) call.

Currently there are no users of find_task_by_pid_type(PIDTYPE_TGID).
Should the need arise, we can use find_task_by_pid()->group_leader.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-By: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:44 -08:00
Oleg Nesterov
aacc90944d [PATCH] do_group_exit: don't take tasklist_lock
do_group_exit() takes tasklist_lock for zap_other_threads(), this is unneeded
now.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:43 -08:00
Oleg Nesterov
5876700cd3 [PATCH] do __unhash_process() under ->siglock
This patch moves __unhash_process() call from realease_task() to
__exit_signal(), so __detach_pid() is called with ->siglock held.

This means we don't need tasklist_lock to iterate over thread group anymore:

	copy_process() was already changed to do attach_pid()
	under ->siglock.

	Eric's "pidhash-kill-switch_exec_pids.patch" from -mm
	changed de_thread() so it doesn't touch PIDTYPE_TGID.

NOTE: de_thread() still needs some attention.  It still changes task->pid
lockless.  Taking ->sighand.siglock here allows to do more tasklist_lock
removals.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:43 -08:00
Oleg Nesterov
35f5cad8c4 [PATCH] revert "Optimize sys_times for a single thread process"
This patch reverts 'CONFIG_SMP && thread_group_empty()' optimization in
sys_times().  The reason is that the next patch breaks memory ordering which
is needed for that optimization.

tasklist_lock in sys_times() will be eliminated completely by further patch.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:43 -08:00
Oleg Nesterov
6a14c5c9da [PATCH] move __exit_signal() to kernel/exit.c
__exit_signal() is private to release_task() now.  I think it is better to
make it static in kernel/exit.c and export flush_sigqueue() instead - this
function is much more simple and straightforward.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:43 -08:00
Oleg Nesterov
1f09f9749c [PATCH] release_task: replace open-coded ptrace_unlink()
Use ptrace_unlink() instead of open-coding.  No changes in kernel/exit.o

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:41 -08:00
Oleg Nesterov
6ac781b11a [PATCH] reparent_thread: use remove_parent/add_parent
Use remove_parent/add_parent instead of open coding.

No changes in kernel/exit.o

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:41 -08:00
Oleg Nesterov
73b9ebfe12 [PATCH] pidhash: don't count idle threads
fork_idle() does unhash_process() just after copy_process().  Contrary,
boot_cpu's idle thread explicitely registers itself for each pid_type with nr
= 0.

copy_process() already checks p->pid != 0 before process_counts++, I think we
can just skip attach_pid() calls and job control inits for idle threads and
kill unhash_process().  We don't need to cleanup ->proc_dentry in fork_idle()
because with this patch idle threads are never hashed in
kernel/pid.c:pid_hash[].

We don't need to hash pid == 0 in pidmap_init().  free_pidmap() is never
called with pid == 0 arg, so it will never be reused.  So it is still possible
to use pid == 0 in any PIDTYPE_xxx namespace from kernel/pid.c's POV.

However with this patch we don't hash pid == 0 for PIDTYPE_PID case.  We still
have have PIDTYPE_PGID/PIDTYPE_SID entries with pid == 0: /sbin/init and
kernel threads which don't call daemonize().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:41 -08:00
Oleg Nesterov
c97d98931a [PATCH] kill SET_LINKS/REMOVE_LINKS
Both SET_LINKS() and SET_LINKS/REMOVE_LINKS() have exactly one caller, and
these callers already check thread_group_leader().

This patch kills theese macros, they mix two different things: setting
process's parent and registering it in init_task.tasks list.  Callers are
updated to do these actions by hand.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:41 -08:00
Oleg Nesterov
9b678ece42 [PATCH] don't use REMOVE_LINKS/SET_LINKS for reparenting
There are places where kernel uses REMOVE_LINKS/SET_LINKS while changing
process's ->parent.  Use add_parent/remove_parent instead, they don't abuse
of global process list.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:41 -08:00
Oleg Nesterov
8fafabd86f [PATCH] remove add_parent()'s parent argument
add_parent(p, parent) is always called with parent == p->parent, and it makes
no sense to do it differently.  This patch removes this argument.

No changes in affected .o files.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:41 -08:00
Oleg Nesterov
d799f03597 [PATCH] choose_new_parent: remove unused arg, sanitize exit_state check
'child_reaper' arg is not used in choose_new_parent().

"->exit_state >= EXIT_ZOMBIE" check is a leftover, was
valid when EXIT_ZOMBIE lived in ->state var.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:40 -08:00
Eric W. Biederman
fef23e7fbb [PATCH] exec: allow init to exec from any thread.
After looking at the problem of init calling exec some more I figured out
an easy way to make the code work.

The actual symptom without out this patch is that all threads will die
except pid == 1, and the thread calling exec.  The thread calling exec will
wait forever for pid == 1 to die.

Since pid == 1 does not install a handler for SIGKILL it will never die.

This modifies the tests for init from current->pid == 1 to the equivalent
current == child_reaper.  And then it causes exec in the ugly case to
modify child_reaper.

The only weird symptom is that you wind up with an init process that
doesn't have the oldest start time on the box.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28 18:36:40 -08:00
Ingo Molnar
34f192c652 [PATCH] lightweight robust futexes: compat
32-bit syscall compatibility support.  (This patch also moves all futex
related compat functionality into kernel/futex_compat.c.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00
Ingo Molnar
0771dfefc9 [PATCH] lightweight robust futexes: core
Add the core infrastructure for robust futexes: structure definitions, the new
syscalls and the do_exit() based cleanup mechanism.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 08:44:49 -08:00
Ingo Molnar
70522e121a [PATCH] sem2mutex: tty
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:11 -08:00
Al Viro
afc847b7dd [PATCH] don't do exit_io_context() until we know we won't be doing any IO
testcase:

mount /dev/sdb10 /mnt
touch /mnt/tmp/b
umount /mnt
mount /dev/sdb10 /mnt
rm /mnt/tmp/b </mnt/tmp/b
umount /mnt

and watch blkdev_ioc line in /proc/slabinfo.  Vanilla kernel leaks.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18 18:33:46 -05:00
Björn Steinbrink
5914811acf [PATCH] kjournald keeps reference to namespace
In daemonize() a new thread gets cleaned up and 'merged' with init_task.
The current fs_struct is handled there, but not the current namespace.

This adds the namespace part.

[ Eric Biederman pointed out the namespace wrappers, and also notes that
  we can't ever count on using our parents namespace because we already
  have called exit_fs(), which is the only way to the namespace from a
  process. ]

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-20 20:27:38 -08:00
Arjan van de Ven
858119e159 [PATCH] Unlinline a bunch of other functions
Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kb

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-14 18:27:06 -08:00