One anomaly remains from when Andrea rationalized the responsibilities of
mmap_sem and page_table_lock: in dup_mmap we add vmas to the child holding its
page_table_lock, but not the mmap_sem which normally guards the vma list and
rbtree. Which could be an issue for unuse_mm: though since it just walks down
the list (today with page_table_lock, tomorrow not), it's probably okay. Will
need a memory barrier? Oh, keep it simple, Nick and I agreed, no harm in
taking child's mmap_sem here.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the parent's oldmm throughout dup_mmap, instead of perversely going back
to current->mm. (Can you hear the sigh of relief from those mpnts? Usually I
squash them, but not today.)
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I was lazy when we added anon_rss, and chose to change as few places as
possible. So currently each anonymous page has to be counted twice, in rss
and in anon_rss. Which won't be so good if those are atomic counts in some
configurations.
Change that around: keep file_rss and anon_rss separately, and add them
together (with get_mm_rss macro) when the total is needed - reading two
atomics is much cheaper than updating two atomics. And update anon_rss
upfront, typically in memory.c, not tucked away in page_add_anon_rmap.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
How is anon_rss initialized? In dup_mmap, and by mm_alloc's memset; but
that's not so good if an mm_counter_t is a special type. And how is rss
initialized? By set_mm_counter, all over the place. Come on, we just need to
initialize them both at once by set_mm_counter in mm_init (which follows the
memcpy when forking).
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The original vm_stat_account has fallen into disuse, with only one user, and
only one user of vm_stat_unaccount. It's easier to keep track if we convert
them all to __vm_stat_account, then free it from its __shackles.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
frv, sh64, ia64 and sparc64 do not have do_settimeofday() exported (the
last two are using variant in kernel/time.c). Exports added to match
the rest of architectures.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
exit_signal() (called from copy_process's error path) should decrement
->signal->live, otherwise forking process will miss 'group_dead' in
do_exit().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This just makes sure that a thread's expiry times can't get reset after
it clears them in do_exit.
This is what allowed us to re-introduce the stricter BUG_ON() check in
a362f463a6.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts commit 3de463c7d9.
Roland has another patch that allows us to leave the BUG_ON() in place
by just making sure that the condition it tests for really is always
true.
That goes in next.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's a silly off-by-one error in the code that updates the expiration
of posix CPU timers, causing them to not be properly updated when they
hit exactly on their expiration time (which should be the normal case).
This causes them to then fire immediately again, and only _then_ get
properly updated.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With CONFIG_SMP=n:
*** Warning: "cpu_online_map" [drivers/firmware/dcdbas.ko] undefined!
due to set_cpus_allowed().
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This might be harmless, but looks like a race from code inspection (I
was unable to trigger it). I must admit, I don't understand why we
can't return TIMER_RETRY after 'spin_unlock(&p->sighand->siglock)'
without doing bump_cpu_timer(), but this is what original code does.
posix_cpu_timer_set:
read_lock(&tasklist_lock);
spin_lock(&p->sighand->siglock);
list_del_init(&timer->it.cpu.entry);
spin_unlock(&p->sighand->siglock);
We are probaly deleting the timer from run_posix_cpu_timers's 'firing'
local list_head while run_posix_cpu_timers() does list_for_each_safe.
Various bad things can happen, for example we can just delete this timer
so that list_for_each() will not notice it and run_posix_cpu_timers()
will not reset '->firing' flag. In that case,
....
if (timer->it.cpu.firing) {
read_unlock(&tasklist_lock);
timer->it.cpu.firing = -1;
return TIMER_RETRY;
}
sys_timer_settime() goes to 'retry:', calls posix_cpu_timer_set() again,
it returns TIMER_RETRY ...
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
do_exit() clears ->it_##clock##_expires, but nothing prevents
another cpu to attach the timer to exiting process after that.
After exit_notify() does 'write_unlock_irq(&tasklist_lock)' and
before do_exit() calls 'schedule() local timer interrupt can find
tsk->exit_state != 0. If that state was EXIT_DEAD (or another cpu
does sys_wait4) interrupted task has ->signal == NULL.
At this moment exiting task has no pending cpu timers, they were cleaned
up in __exit_signal()->posix_cpu_timers_exit{,_group}(), so we can just
return from irq.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1. cleanup_timers() sets timer->task = NULL under tasklist + ->sighand locks.
That means that this code in posix_cpu_timer_del() and posix_cpu_timer_set()
lock_timer(timer);
if (timer->task == NULL)
return;
read_lock(tasklist);
put_task_struct(timer->task)
is racy. With this patch timer->task modified and accounted only under
timer->it_lock. Sadly, this means that dead task_struct won't be freed
until timer deleted or armed.
2. run_posix_cpu_timers() collects expired timers into local list under
tasklist + ->sighand again. That means that posix_cpu_timer_del()
should check timer->it.cpu.firing under these locks too.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Bursty timers aren't good for anybody, very much including latency for
other programs when we trigger lots of timers in interrupt context. So
set a random limit, after which we'll handle the rest on the next timer
tick.
Noted by Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When I originally moved exit_itimers into __exit_signal, that was the only
place where we could reliably know it was the last thread in the group
dying, without races. Since then we've gotten the signal_struct.live
counter, and do_exit can reliably do group-wide cleanup work.
This patch moves the call to do_exit, where it's made without locks. This
avoids the deadlock issues that the old __exit_signal code's comment talks
about, and the one that Oleg found recently with process CPU timers.
[ This replaces e03d13e985, which is why
it was just reverted. ]
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The PF_NOFREEZE process flag should not be inherited when a thread is
forked. This patch (as585) removes the flag from the child.
This problem is starting to show up more and more as drivers turn to the
kthread API instead of using kernel_thread(). As a result, their kernel
threads are now children of the kthread worker instead of modprobe, and
they inherit the PF_NOFREEZE flag. This can cause problems during system
suspend; the kernel threads are not getting frozen as they ought to be.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov reported an SMP deadlock. If there is a running timer
tracking a different process's CPU time clock when the process owning
the timer exits, we deadlock on tasklist_lock in posix_cpu_timer_del via
exit_itimers.
That code was using tasklist_lock to check for a race with __exit_signal
being called on the timer-target task and clearing its ->signal.
However, there is actually no such race. __exit_signal will have called
posix_cpu_timers_exit and posix_cpu_timers_exit_group before it does
that. Those will clear those k_itimer's association with the dying
task, so posix_cpu_timer_del will return early and never reach the code
in question.
In addition, posix_cpu_timer_del called from exit_itimers during execve
or directly from timer_delete in the process owning the timer can race
with an exiting timer-target task to cause a double put on timer-target
task struct. Make sure we always access cpu_timers lists with sighand
lock held.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This makes call_rcu() keep track of how many events there are on the RCU
list, and cause a reschedule event when the list gets too long.
This helps keep RCU event lists down.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make sure we release the task struct properly when releasing pending
timers.
release_task() does write_lock_irq(&tasklist_lock), so it can't race
with run_posix_cpu_timers() on any cpu.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>