Commit Graph

11449 Commits

Author SHA1 Message Date
Oleg Nesterov
bb7efee2ca signal: cleanup sys_rt_sigprocmask()
sys_rt_sigprocmask() looks unnecessarily complicated, simplify it.
We can just read current->blocked lockless unconditionally before
anything else and then copy-to-user it if needed.  At worst we
copy 4 words on mips.

We could copy-to-user the old mask first and simplify the code even
more, but the patch tries to keep the current behaviour: we change
current->block even if copy_to_user(oset) fails.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
2011-04-28 13:01:38 +02:00
Oleg Nesterov
e6fa16ab9c signal: sigprocmask() should do retarget_shared_pending()
In short, almost every changing of current->blocked is wrong, or at least
can lead to the unexpected results.

For example. Two threads T1 and T2, T1 sleeps in sigtimedwait/pause/etc.
kill(tgid, SIG) can pick T2 for TIF_SIGPENDING. If T2 calls sigprocmask()
and blocks SIG before it notices the pending signal, nobody else can handle
this pending shared signal.

I am not sure this is bug, but at least this looks strange imho. T1 should
not sleep forever, there is a signal which should wake it up.

This patch moves the code which actually changes ->blocked into the new
helper, set_current_blocked() and changes this code to call
retarget_shared_pending() as exit_signals() does. We should only care about
the signals we just blocked, we use "newset & ~current->blocked" as a mask.

We do not check !sigisemptyset(newblocked), retarget_shared_pending() is
cheap unless mask & shared_pending.

Note: for this particular case we could simply change sigprocmask() to
return -EINTR if signal_pending(), but then we should change other callers
and, more importantly, if we need this fix then set_current_blocked() will
have more callers and some of them can't restart. See the next patch as a
random example.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
2011-04-28 13:01:37 +02:00
Oleg Nesterov
73ef4aeb61 signal: sigprocmask: narrow the scope of ->siglock
No functional changes, preparation to simplify the review of the next change.

1. We can read current->block lockless, nobody else can ever change this mask.

2. Calculate the resulting sigset_t outside of ->siglock into the temporary
   variable, then take ->siglock and change ->blocked.

Also, kill the stale comment about BKL.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
2011-04-28 13:01:36 +02:00
Oleg Nesterov
fec9993db0 signal: retarget_shared_pending: optimize while_each_thread() loop
retarget_shared_pending() blindly does recalc_sigpending_and_wake() for
every sub-thread, this is suboptimal. We can check t->blocked and stop
looping once every bit in shared_pending has the new target.

Note: we do not take task_is_stopped_or_traced(t) into account, we are
not trying to speed up the signal delivery or to avoid the unnecessary
(but harmless) signal_wake_up(0) in this unlikely case.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
2011-04-28 13:01:35 +02:00
Oleg Nesterov
f646e227b8 signal: retarget_shared_pending: consider shared/unblocked signals only
exit_signals() checks signal_pending() before retarget_shared_pending() but
this is suboptimal. We can avoid the while_each_thread() loop in case when
there are no shared signals visible to us.

Add the "shared_pending.signal & ~blocked" check. We don't use tsk->blocked
directly but pass ~blocked as an argument, this is needed for the next patch.

Note: we can optimize this more. while_each_thread(t) can check t->blocked
into account and stop after every pending signal has the new target, see the
next patch.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
2011-04-28 13:01:35 +02:00
Oleg Nesterov
0edceb7bcd signal: introduce retarget_shared_pending()
No functional changes. Move the notify-other-threads code from exit_signals()
to the new helper, retarget_shared_pending().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
2011-04-28 13:01:35 +02:00
Oleg Nesterov
e46bc9b6fd Merge branch 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into ptrace 2011-04-07 20:44:11 +02:00
Randy Dunlap
41c57892a2 kernel/signal.c: add kernel-doc notation to syscalls
Add kernel-doc to syscalls in signal.c.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-04 17:51:46 -07:00
Randy Dunlap
5aba085ede kernel/signal.c: fix typos and coding style
General coding style and comment fixes; no code changes:

 - Use multi-line-comment coding style.
 - Put some function signatures completely on one line.
 - Hyphenate some words.
 - Spell Posix as POSIX.
 - Correct typos & spellos in some comments.
 - Drop trailing whitespace.
 - End sentences with periods.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-04 17:51:46 -07:00
Linus Torvalds
148086bb64 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Fix rebalance interval calculation
  sched, doc: Beef up load balancing description
  sched: Leave sched_setscheduler() earlier if possible, do not disturb SCHED_FIFO tasks
2011-04-04 08:36:58 -07:00
Linus Torvalds
4da7e90e65 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix task_struct reference leak
  perf: Fix task context scheduling
  perf: mmap 512 kiB by default
  perf: Rebase max unprivileged mlock threshold on top of page size
  perf tools: Fix NO_NEWT=1 python build error
  perf symbols: Properly align symbol_conf.priv_size
  perf tools: Emit clearer message for sys_perf_event_open ENOENT return
  perf tools: Fixup exit path when not able to open events
  perf symbols: Fix vsyscall symbol lookup
  oprofile, x86: Allow setting EDGE/INV/CMASK for counter events
2011-04-04 08:36:40 -07:00
Richard Cochran
4352d9d44b ntp: fix non privileged system time shifting
The ADJ_SETOFFSET bit added in commit 094aa188 ("ntp: Add ADJ_SETOFFSET
mode bit") also introduced a way for any user to change the system time.
Sneaky or buggy calls to adjtimex() could set

    ADJ_OFFSET_SS_READ | ADJ_SETOFFSET

which would result in a successful call to timekeeping_inject_offset().
This patch fixes the issue by adding the capability check.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-04 08:31:23 -07:00
Oleg Nesterov
321fb56197 ptrace: ptrace_check_attach() should not do s/STOPPED/TRACED/
After "ptrace: Clean transitions between TASK_STOPPED and TRACED"
d79fdd6d96, ptrace_check_attach()
should never see a TASK_STOPPED tracee and s/STOPPED/TRACED/ is
no longer legal. Add the warning.

Note: ptrace_check_attach() can be greatly simplified, in particular
it doesn't need tasklist. But I'd prefer another patch for that.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-04-04 02:11:05 +02:00
Oleg Nesterov
ee77f07592 signal: Turn SIGNAL_STOP_DEQUEUED into GROUP_STOP_DEQUEUED
This patch moves SIGNAL_STOP_DEQUEUED from signal_struct->flags to
task_struct->group_stop, and thus makes it per-thread.

Like SIGNAL_STOP_DEQUEUED, GROUP_STOP_DEQUEUED can be false-positive
after return from get_signal_to_deliver(), this is fine. The only
purpose of this bit is: we can drop ->siglock after __dequeue_signal()
returns the sig_kernel_stop() signal and before we call
do_signal_stop(), in this case we must not miss SIGCONT if it comes in
between.

But, unlike SIGNAL_STOP_DEQUEUED, GROUP_STOP_DEQUEUED can not be
false-positive in do_signal_stop() if multiple threads dequeue the
sig_kernel_stop() signal at the same time.

Consider two threads T1 and T2, SIGTTIN has a hanlder.

	- T1 dequeues SIGTSTP and sets SIGNAL_STOP_DEQUEUED, then
	  it drops ->siglock

	- SIGCONT comes and clears SIGNAL_STOP_DEQUEUED, SIGTSTP
	  should be cancelled.

	- T2 dequeues SIGTTIN and sets SIGNAL_STOP_DEQUEUED again.
	  Since we have a handler we should not stop, T2 returns
	  to usermode to run the handler.

	- T1 continues, calls do_signal_stop() and wrongly starts
	  the group stop because SIGNAL_STOP_DEQUEUED was restored
	  in between.

With or without this change:

	- we need to do something with ptrace_signal() which can
	  return SIGSTOP, but this needs another discussion

	- SIGSTOP can be lost if it races with the mt exec, will
	  be fixed later.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-04-04 02:11:05 +02:00
Oleg Nesterov
780006eac2 signal: do_signal_stop: Remove the unneeded task_clear_group_stop_pending()
PF_EXITING or TASK_STOPPED has already called task_participate_group_stop()
and cleared its ->group_stop. No need to do task_clear_group_stop_pending()
when we start the new group stop.

Add a small comment to explain the !task_is_stopped() check. Note that this
check is not exactly right and it can lead to unnecessary stop later if the
thread is TASK_PTRACED. What we need is task_participated_in_group_stop(),
this will be solved later.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-04-04 02:11:04 +02:00
Oleg Nesterov
1deac632fc signal: prepare_signal(SIGCONT) shouldn't play with TIF_SIGPENDING
prepare_signal(SIGCONT) should never set TIF_SIGPENDING or wake up
the TASK_INTERRUPTIBLE threads. We are going to call complete_signal()
which should pick the right thread correctly. All we need is to wake
up the TASK_STOPPED threads.

If the task was stopped, it can't return to usermode without taking
->siglock. Otherwise we don't care, and the spurious TIF_SIGPENDING
can't be useful.

The comment says:

	* If there is a handler for SIGCONT, we must make
	* sure that no thread returns to user mode before
	* we post the signal

It is not clear what this means. Probably, "when there's only a single
thread" and this continues to be true. Otherwise, even if this SIGCONT
is not private, with or without this change only one thread can dequeue
SIGCONT, other threads can happily return to user mode before before
that thread handles this signal.

Note also that wake_up_state(t, __TASK_STOPPED) can't race with the task
which changes its state, TASK_STOPPED state is protected by ->siglock as
well.

In short: when it comes to signal delivery, SIGCONT is the normal signal
and does not need any special support.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-04-04 02:11:04 +02:00
Anton Blanchard
c0bb9e45f3 kdump: Allow shrinking of kdump region to be overridden
On ppc64 the crashkernel region almost always overlaps an area of firmware.
This works fine except when using the sysfs interface to reduce the kdump
region. If we free the firmware area we are guaranteed to crash.

Rename free_reserved_phys_range to crash_free_reserved_phys_range and make
it a weak function so we can override it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-01 16:14:30 +11:00
Peter Zijlstra
fd1edb3aa2 perf: Fix task_struct reference leak
sys_perf_event_open() had an imbalance in the number of task refs it
took causing memory leakage

Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@kernel.org # .37+
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31 13:02:56 +02:00
Frederic Weisbecker
20443384fe perf: Rebase max unprivileged mlock threshold on top of page size
Ensure we allow 512 kiB + 1 page for user control without
assuming a 4096 bytes page size.

Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: <stable@kernel.org>
LKML-Reference: <1301535209-9679-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31 13:02:54 +02:00
Sisir Koppaka
3436ae1298 sched: Fix rebalance interval calculation
The interval for checking scheduling domains if they are due to be
balanced currently depends on boot state NR_CPUS, which may not
accurately reflect the number of online CPUs at the time of check.

Thus replace NR_CPUS with num_online_cpus().

 (ed: Should only affect those who set NR_CPUS really high, such as 4096
      or so :-)

Signed-off-by: Sisir Koppaka <sisir.koppaka@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <AANLkTikqHWid2Q93F5U5Qw5snJH8C5PXoa7J6=6hYO94@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31 13:00:37 +02:00
Dario Faggioli
a51e919818 sched: Leave sched_setscheduler() earlier if possible, do not disturb SCHED_FIFO tasks
sched_setscheduler() (in sched.c) is called in order of changing the
scheduling policy and/or the real-time priority of a task. Thus,
if we find out that neither of those are actually being modified, it
is possible to return earlier and save the overhead of a full
deactivate+activate cycle of the task in question.

Beside that, if we have more than one SCHED_FIFO task with the same
priority on the same rq (which means they share the same priority queue)
having one of them changing its position in the priority queue because of
a sched_setscheduler (as it happens by means of the deactivate+activate)
that does not actually change the priority violates POSIX which states,
for SCHED_FIFO:

  "If a thread whose policy or priority has been modified by
   pthread_setschedprio() is a running thread or is runnable, the effect on
   its position in the thread list depends on the direction of the
   modification, as follows: a. <...> b. If the priority is unchanged, the
   thread does not change position in the thread list. c. <...>"

     http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_08.html

 (ed: And the POSIX specification here does, briefly and somewhat unexpectedly,
      match what common sense tells us as well. )

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1300971618.3960.82.camel@Palantir>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-31 13:00:34 +02:00
Thomas Gleixner
78c8982564 genirq: Remove the now obsolete config options and select statements
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-30 14:13:23 +02:00
Thomas Gleixner
353c8ed44f genirq: Fix misnamed label in handle_edge_eoi_irq
Reported-by: michael@ellerman.id.au
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
2011-03-29 22:24:05 +02:00
Thomas Gleixner
851d7cf647 genirq: Remove move_*irq leftovers
All users converted to new interface.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29 14:50:32 +02:00
Thomas Gleixner
0c6f8a8b91 genirq: Remove compat code
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-29 14:48:19 +02:00