tracehook_notify_jctl() aids in determining whether and what to report
to the parent when a task is stopped or continued. The function also
adds an extra requirement that siglock may be released across it,
which is currently unused and quite difficult to satisfy in
well-defined manner.
As job control and the notifications are about to receive major
overhaul, remove the tracehook and open code it. If ever necessary,
let's factor it out after the overhaul.
* Oleg spotted incorrect CLD_CONTINUED/STOPPED selection when ptraced.
Fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
do_signal_stop() is used only by get_signal_to_deliver() and after a
successful signal stop, it always calls try_to_freeze(), so the
try_to_freeze() loop around schedule() in do_signal_stop() is
superflous and confusing. Remove it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
This wake_up_state() has a turbulent history. This is a remnant from
ancient ptrace implementation and patently wrong. Commit 95a3540d
(ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic) removed
it but the change was reverted later by commit edaba2c5 (ptrace:
revert "ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic")
citing compatibility breakage and general brokeness of the whole group
stop / ptrace interaction. Then, recently, it got converted from
wake_up_process() to wake_up_state() to make it less dangerous.
Digging through the mailing archives, the compatibility breakage
doesn't seem to be critical in the sense that the behavior isn't well
defined or reliable to begin with and it seems to have been agreed to
remove the wakeup with proper cleanup of the whole thing.
Now that the group stop and its interaction with ptrace are being
cleaned up, it's high time to finally kill this silliness.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
After a task receives SIGCONT, its parent is notified via SIGCHLD with
its siginfo describing what the notified event is. If SIGCONT is
received while the child process is stopped, the code should be
CLD_CONTINUED. If SIGCONT is recieved while the child process is in
the process of being stopped, it should be CLD_STOPPED. Which code to
use is determined in prepare_signal() and recorded in signal->flags
using SIGNAL_CLD_CONTINUED|STOP flags.
get_signal_deliver() should test these flags and then notify
accoringly; however, it incorrectly tested SIGNAL_STOP_CONTINUED
instead of SIGNAL_CLD_CONTINUED, thus incorrectly notifying
CLD_CONTINUED if the signal is delivered before the task is wait(2)ed
and CLD_STOPPED if the state was fetched already.
Fix it by testing SIGNAL_CLD_CONTINUED. While at it, uncompress the
?: test into if/else clause for better readability.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
We've been burned by regressions/bugs which we later realized could have
been triaged quicker if only we'd paid closer attention to dmesg. To make
it easier to audit dmesg, we'd like to make DEFAULT_MESSAGE_LEVEL
Kconfig-settable. That way we can set it to KERN_NOTICE and audit any
messages <= KERN_WARNING.
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In an effort to reduce kernel address leaks that might be used to help
target kernel privilege escalation exploits, this patch uses %pK when
displaying addresses in /proc/kallsyms, /proc/modules, and
/sys/module/*/sections/*.
Note that this changes %x to %p, so some legitimately 0 values in
/proc/kallsyms would have changed from 00000000 to "(null)". To avoid
this, "(null)" is not used when using the "K" format. Anything that was
already successfully parsing "(null)" in addition to full hex digits
should have no problem with this change. (Thanks to Joe Perches for the
suggestion.) Due to the %x to %p, "void *" casts are needed since these
addresses are already "unsigned long" everywhere internally, due to their
starting life as ELF section offsets.
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Cc: Eugene Teo <eugene@redhat.com>
Cc: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For a platform with many consoles like:
"console=tty1 console=ttyMFD2 console=ttyS0 earlyprintk=mrst"
Each time when the non "selected_console" (tty1 and ttyMFD2 here) get
registered, the existing kernel message will be printed out on registered
consoles again, the "mrst" early console will get some same message for 3
times, and "tty1" will get some for twice.
As suggested by Andrew Morton, every time a new console is registered, it
will be set as the "exclusive" console which will dump the already
existing kernel messages.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Greg KH <gregkh@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On some architectures, the boot process involves de-registering the boot
console (early boot), initialize drivers and then re-register the console.
This mechanism introduces a window in which no printk can happen on the
console and messages are buffered and then printed once the new console is
available.
If a kernel crashes during this window, all it's left on the boot console
is "console [foo] enabled, bootconsole disabled" making debug of the crash
rather 'interesting'.
By adding "keep_bootcon" option, do not unregister the boot console, that
will allow to printk everything that is happening up to the crash.
The option is clearly meant only for debugging purposes as it introduces
lots of duplicated info printed on console, but will make bug report from
users easier as it doesn't require a kernel build just to figure out where
we crash.
Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch addresses a couple of problems. One was the case when the
hardlockup failed to start, it also failed to start the softlockup. There
were valid cases when the hardlockup shouldn't start and that shouldn't
block the softlockup (no lapic, bios controls perf counters).
The second problem was when the hardlockup failed to start on boxes (from
a no lapic or bios controlled perf counter case), it reported failure to
the cpu notifier chain. This blocked the notifier from continuing to
start other more critical pieces of cpu bring-up (in our case based on a
2.6.32 fork, it was the mce). As a result, during soft cpu online/offline
testing, the system would panic when a cpu was offlined because the cpu
notifier would succeed in processing a watchdog disable cpu event and
would panic in the mce case as a result of un-initialized variables from a
never executed cpu up event.
I realized the hardlockup/softlockup cases are really just debugging aids
and should never impede the progress of a cpu up/down event. Therefore I
modified the code to always return NOTIFY_OK and instead rely on printks
to inform the user of problems.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a cpu is considered stuck, instead of limping along and just printing
a warning, it is sometimes preferred to just panic, let kdump capture the
vmcore and reboot. This gets the machine back into a stable state quickly
while saving the info that got it into a stuck state to begin with.
Add a Kconfig option to allow users to set the hardlockup to panic
by default. Also add in a 'nmi_watchdog=nopanic' to override this.
[akpm@linux-foundation.org: fix strncmp length]
Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cleanup: kill the dead code which does nothing but complicates the code
and confuses the reader.
sys_unshare(CLONE_THREAD/SIGHAND/VM) is not really implemented, and I
doubt very much it will ever work. At least, nobody even tried since the
original 99d1419d96 ("unshare system call -v5: system call
handler function") was applied more than 4 years ago.
And the code is not consistent. unshare_thread() always fails
unconditionally, while unshare_sighand() and unshare_vm() pretend to work
if there is nothing to unshare.
Remove unshare_thread(), unshare_sighand(), unshare_vm() helpers and
related variables and add a simple CLONE_THREAD | CLONE_SIGHAND| CLONE_VM
check into check_unshare_flags().
Also, move the "CLONE_NEWNS needs CLONE_FS" check from
check_unshare_flags() to sys_unshare(). This looks more consistent and
matches the similar do_sysvsem check in sys_unshare().
Note: with or without this patch "atomic_read(mm->mm_users) > 1" can give
a false positive due to get_task_mm().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Janak Desai <janak@us.ibm.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the printk() calls to have the KERN_INFO/KERN_ERROR stuff, and
fixes other coding style errors. Not _all_ of them are gone, though.
[akpm@linux-foundation.org: revert the bits I disagree with]
Signed-off-by: Michael Rodriguez <dkingston02@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All kthreads being created from a single helper task, they all use memory
from a single node for their kernel stack and task struct.
This patch suite creates kthread_create_on_cpu(), adding a 'cpu' parameter
to parameters already used by kthread_create().
This parameter serves in allocating memory for the new kthread on its
memory node if available.
Users of this new function are : ksoftirqd, kworker, migration, pktgend...
This patch:
Add a node parameter to alloc_task_struct(), and change its name to
alloc_task_struct_node()
This change is needed to allow NUMA aware kthread_create_on_cpu()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
list_del() leaves poison in the prev and next pointers. The next
list_empty() will compare those poisons, and say the list isn't empty.
Any list operations that assume the node is on a list because of such a
check will be fooled into dereferencing poison. One needs to INIT the
node after the del, and fortunately there's already a wrapper for that -
list_del_init().
Some of the dels are followed by deallocations, so can be ignored, and one
can be merged with an add to make a move. Apart from that, I erred on the
side of caution in making nodes list_empty()-queriable.
Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Userland should be able to trust the pid and uid of the sender of a
signal if the si_code is SI_TKILL.
Unfortunately, the kernel has historically allowed sigqueueinfo() to
send any si_code at all (as long as it was negative - to distinguish it
from kernel-generated signals like SIGILL etc), so it could spoof a
SI_TKILL with incorrect siginfo values.
Happily, it looks like glibc has always set si_code to the appropriate
SI_QUEUE, so there are probably no actual user code that ever uses
anything but the appropriate SI_QUEUE flag.
So just tighten the check for si_code (we used to allow any negative
value), and add a (one-time) warning in case there are binaries out
there that might depend on using other si_code values.
Signed-off-by: Julien Tinnes <jln@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: (25 commits)
video: change to new flag variable
scsi: change to new flag variable
rtc: change to new flag variable
rapidio: change to new flag variable
pps: change to new flag variable
net: change to new flag variable
misc: change to new flag variable
message: change to new flag variable
memstick: change to new flag variable
isdn: change to new flag variable
ieee802154: change to new flag variable
ide: change to new flag variable
hwmon: change to new flag variable
dma: change to new flag variable
char: change to new flag variable
fs: change to new flag variable
xtensa: change to new flag variable
um: change to new flag variables
s390: change to new flag variable
mips: change to new flag variable
...
Fix up trivial conflict in drivers/hwmon/Makefile
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
genirq: Fix incorrect unlock in __setup_irq()
cris: Use generic show_interrupts()
genirq: show_interrupts: Check desc->name before printing it blindly
cris: Use accessor functions to set IRQ_PER_CPU flag
cris: Fix irq conversion fallout
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched, kernel-doc: Fix runqueue_is_locked() description
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
trace, filters: Initialize the match variable in process_ops() properly
trace, documentation: Fix branch profiling location in debugfs
oprofile, s390: Cleanups
oprofile, s390: Remove hwsampler_files.c and merge it into init.c
perf: Fix tear-down of inherited group events
perf: Reorder & optimize perf_event_context to remove alignment padding on 64 bit builds
perf: Handle stopped state with tracepoints
perf: Fix the software events state check
perf, powerpc: Handle events that raise an exception without overflowing
perf, x86: Use INTEL_*_CONSTRAINT() for all PEBS event constraints
perf, x86: Clean up SandyBridge PEBS events
perf lock: Fix sorting by wait_min
perf tools: Version incorrect with some versions of grep
perf evlist: New command to list the names of events present in a perf.data file
perf script: Add support for H/W and S/W events
perf script: Add support for dumping symbols
perf script: Support custom field selection for output
perf script: Move printing of 'common' data from print_event and rename
perf tracing: Remove print_graph_cpu and print_graph_proc from trace-event-parse
perf script: Change process_event prototype
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits)
doc: CONFIG_UNEVICTABLE_LRU doesn't exist anymore
Update cpuset info & webiste for cgroups
dcdbas: force SMI to happen when expected
arch/arm/Kconfig: remove one to many l's in the word.
asm-generic/user.h: Fix spelling in comment
drm: fix printk typo 'sracth'
Remove one to many n's in a word
Documentation/filesystems/romfs.txt: fixing link to genromfs
drivers:scsi Change printk typo initate -> initiate
serial, pch uart: Remove duplicate inclusion of linux/pci.h header
fs/eventpoll.c: fix spelling
mm: Fix out-of-date comments which refers non-existent functions
drm: Fix printk typo 'failled'
coh901318.c: Change initate to initiate.
mbox-db5500.c Change initate to initiate.
edac: correct i82975x error-info reported
edac: correct i82975x mci initialisation
edac: correct commented info
fs: update comments to point correct document
target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c
...
Trivial conflict in fs/eventpoll.c (spelling vs addition)
Make sure the 'match' variable always has a value.
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>