Commit 1fb9341ac3 ("module: put modules in list much earlier") moved
some of the module initialization code around, and in the process
changed the exit paths too. But for the duplicate export symbol error
case the change made the ddebug_cleanup path jump to after the module
mutex unlock, even though it happens with the mutex held.
Rusty has some patches to split this function up into some helper
functions, hopefully the mess of complex goto targets will go away
eventually.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull module fixes and a virtio block fix from Rusty Russell:
"Various minor fixes, but a slightly more complex one to fix the
per-cpu overload problem introduced recently by kvm id changes."
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
module: put modules in list much earlier.
module: add new state MODULE_STATE_UNFORMED.
module: prevent warning when finit_module a 0 sized file
virtio-blk: Don't free ida when disk is in use
Pull misc syscall fixes from Al Viro:
- compat syscall fixes (discussed back in December)
- a couple of "make life easier for sigaltstack stuff by reducing
inter-tree dependencies"
- fix up compiler/asmlinkage calling convention disagreement of
sys_clone()
- misc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
sys_clone() needs asmlinkage_protect
make sure that /linuxrc has std{in,out,err}
x32: fix sigtimedwait
x32: fix waitid()
switch compat_sys_wait4() and compat_sys_waitid() to COMPAT_SYSCALL_DEFINE
switch compat_sys_sigaltstack() to COMPAT_SYSCALL_DEFINE
CONFIG_GENERIC_SIGALTSTACK build breakage with asm-generic/syscalls.h
Ensure that kernel_init_freeable() is not inlined into non __init code
The ia64 function "thread_matches()" has no users since commit
e868a55c2a ("[IA64] remove find_thread_for_addr()"). Remove it.
This allows us to make ptrace_check_attach() static to kernel/ptrace.c,
which is good since we'll need to change the semantics of it and fix up
all the callers.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the default iosched is built as module, the kernel may deadlock
while trying to load the iosched module on device probe if the probing
was running off async. This is because async_synchronize_full() at
the end of module init ends up waiting for the async job which
initiated the module loading.
async A modprobe
1. finds a device
2. registers the block device
3. request_module(default iosched)
4. modprobe in userland
5. load and init module
6. async_synchronize_full()
Async A waits for modprobe to finish in request_module() and modprobe
waits for async A to finish in async_synchronize_full().
Because there's no easy to track dependency once control goes out to
userland, implementing properly nested flushing is difficult. For
now, make module init perform async_synchronize_full() iff module init
has queued async jobs as suggested by Linus.
This avoids the described deadlock because iosched module doesn't use
async and thus wouldn't invoke async_synchronize_full(). This is
hacky and incomplete. It will deadlock if async module loading nests;
however, this works around the known problem case and seems to be the
best of bad options.
For more details, please refer to the following thread.
http://thread.gmane.org/gmane.linux.kernel/1420814
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alex Riesen <raa.lkml@gmail.com>
Tested-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Alex Riesen <raa.lkml@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull tracing regression fixes from Steven Rostedt:
"The clean up patch commit 0fb9656d95 "tracing: Make tracing_enabled
be equal to tracing_on" caused two regressions.
1) The irqs off latency tracer no longer starts if tracing_on is off
when the tracer is set, and then tracing_on is enabled. The
tracing_on file needs the hook that tracing_enabled had to enable
tracers if they request it (call the tracer's start() method).
2) That commit had a separate change that really should have been a
separate patch, but it must have been added accidently with the -a
option of git commit. But as the change is still related to the
commit it wasn't noticed in review. That change, changed the way
blocking is done by the trace_pipe file with respect to the
tracing_on settings. I've been told that this change breaks
current userspace, and this specific change is being reverted."
* tag 'trace-3.8-rc3-regression-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix regression of trace_pipe
tracing: Fix regression with irqsoff tracer and tracing_on file
Commit 0fb9656d "tracing: Make tracing_enabled be equal to tracing_on"
changes the behaviour of trace_pipe, ie. it makes trace_pipe return if
we've read something and tracing is enabled, and this means that we have
to 'cat trace_pipe' again and again while running tests.
IMO the right way is if tracing is enabled, we always block and wait for
ring buffer, or we may lose what we want since ring buffer's size is limited.
Link: http://lkml.kernel.org/r/1358132051-5410-1-git-send-email-bo.li.liu@oracle.com
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Prarit's excellent bug report:
> In recent Fedora releases (F17 & F18) some users have reported seeing
> messages similar to
>
> [ 15.478160] kvm: Could not allocate 304 bytes percpu data
> [ 15.478174] PERCPU: allocation failed, size=304 align=32, alloc from
> reserved chunk failed
>
> during system boot. In some cases, users have also reported seeing this
> message along with a failed load of other modules.
>
> What is happening is systemd is loading an instance of the kvm module for
> each cpu found (see commit e9bda3b). When the module load occurs the kernel
> currently allocates the modules percpu data area prior to checking to see
> if the module is already loaded or is in the process of being loaded. If
> the module is already loaded, or finishes load, the module loading code
> releases the current instance's module's percpu data.
Now we have a new state MODULE_STATE_UNFORMED, we can insert the
module into the list (and thus guarantee its uniqueness) before we
allocate the per-cpu region.
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Prarit Bhargava <prarit@redhat.com>
You should never look at such a module, so it's excised from all paths
which traverse the modules list.
We add the state at the end, to avoid gratuitous ABI break (ksplice).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
audit_log_start() performs the same jiffies comparison in two places.
If sufficient time has elapsed between the two comparisons, the second
one produces a negative sleep duration:
schedule_timeout: wrong timeout value fffffffffffffff0
Pid: 6606, comm: trinity-child1 Not tainted 3.8.0-rc1+ #43
Call Trace:
schedule_timeout+0x305/0x340
audit_log_start+0x311/0x470
audit_log_exit+0x4b/0xfb0
__audit_syscall_exit+0x25f/0x2c0
sysret_audit+0x17/0x21
Fix it by performing the comparison a single time.
Reported-by: Dave Jones <davej@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The seccomp path was using AUDIT_ANOM_ABEND from when seccomp mode 1
could only kill a process. While we still want to make sure an audit
record is forced on a kill, this should use a separate record type since
seccomp mode 2 introduces other behaviors.
In the case of "handled" behaviors (process wasn't killed), only emit a
record if the process is under inspection. This change also fixes
userspace examination of seccomp audit events, since it was considered
malformed due to missing fields of the AUDIT_ANOM_ABEND event type.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Julien Tinnes <jln@google.com>
Acked-by: Will Drewry <wad@chromium.org>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 02404baf1b "tracing: Remove deprecated tracing_enabled file"
removed the tracing_enabled file as it never worked properly and
the tracing_on file should be used instead. But the tracing_on file
didn't call into the tracers start/stop routines like the
tracing_enabled file did. This caused trace-cmd to break when it
enabled the irqsoff tracer.
If you just did "echo irqsoff > current_tracer" then it would work
properly. But the tool trace-cmd disables tracing first by writing
"0" into the tracing_on file. Then it writes "irqsoff" into
current_tracer and then writes "1" into tracing_on. Unfortunately,
the above commit changed the irqsoff tracer to check the tracing_on
status instead of the tracing_enabled status. If it's disabled then
it does not start the tracer internals.
The problem is that writing "1" into tracing_on does not call the
tracers "start" routine like writing "1" into tracing_enabled did.
This makes the irqsoff tracer not start when using the trace-cmd
tool, and is a regression for userspace.
Simple fix is to have the tracing_on file call the tracers start()
method when being enabled (and the stop() method when disabled).
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Pull tracing regression fix from Steven Rostedt:
"A change that came in this merge window broke the writing to the
trace_options file. It causes garbage to be read during the compare
of option names, and breaks setting options via the trace_options
file, although options can still be set via the options/<option>
files."
* tag 'trace-3.8-rc2-regression-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix regression of trace_options file setting
The latest change to allow trace options to be set on the command
line also broke the trace_options file.
The zeroing of the last byte of the option name that is echoed into
the trace_option file was removed with the consolidation of some
of the code. The compare between the option and what was written to
the trace_options file fails because the string holding the data
written doesn't terminate with a null character.
A zero needs to be added to the end of the string copied from
user space.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Merge emailed fixes from Andrew Morton:
"Bunch of fixes:
- delayed IPC updates. I held back on this because of some possible
outstanding bug reports, but they appear to have been addressed in
later versions
- A bunch of MAINTAINERS updates
- Yet Another RTC driver. I'd held this back while a couple of
little issues were being worked out.
I'm expecting an intrusive-but-simple patchset from Joe Perches which
splits up printk.c into kernel/printk/*. That will be a pig to
maintain for two months so if it passes testing I'd like to get it
upstream after a week or so."
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
printk: fix incorrect length from print_time() when seconds > 99999
drivers/rtc/rtc-vt8500.c: fix handling of data passed in struct rtc_time
drivers/rtc/rtc-vt8500.c: correct handling of CR_24H bitfield
rtc: add RTC driver for TPS6586x
MAINTAINERS: fix drivers/staging/sm7xx/
MAINTAINERS: remove include/linux/of_pwm.h
MAINTAINERS: remove arch/*/lib/perf_event*.c
MAINTAINERS: remove drivers/mmc/host/imxmmc.*
MAINTAINERS: fix Documentation/mei/
MAINTAINERS: remove arch/x86/platform/mrst/pmu.*
MAINTAINERS: remove firmware/isci/
MAINTAINERS: fix drivers/ieee802154/
MAINTAINERS: fix .../plat-mxc/include/mach/imxfb.h
MAINTAINERS: remove drivers/video/epson1355fb.c
MAINTAINERS: fix drivers/media/usb/dvb-usb/cxusb*
MAINTAINERS: adjust for UAPI
MAINTAINERS: fix drivers/media/platform/atmel-isi.c
MAINTAINERS: fix arch/arm/mach-at91/include/mach/at_hdmac.h
MAINTAINERS: fix drivers/rtc/rtc-vt8500.c
MAINTAINERS: remove arch/arm/plat-s5p/
...
Cleanup. And I think we need more cleanups, in particular
__set_current_blocked() and sigprocmask() should die. Nobody should
ever block SIGKILL or SIGSTOP.
- Change set_current_blocked() to use __set_current_blocked()
- Change sys_sigprocmask() to use set_current_blocked(), this way it
should not worry about SIGKILL/SIGSTOP.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
print_prefix() passes a NULL buf to print_time() to get the length of
the time prefix; when printk times are enabled, the current code just
returns the constant 15, which matches the format "[%5lu.%06lu] " used
to print the time value. However, this is obviously incorrect when the
whole seconds part of the time gets beyond 5 digits (100000 seconds is a
bit more than a day of uptime).
The simple fix is to use snprintf(NULL, 0, ...) to calculate the actual
length of the time prefix. This could be micro-optimized but it seems
better to have simpler, more readable code here.
The bug leads to the syslog system call miscomputing which messages fit
into the userspace buffer. If there are enough messages to fill
log_buf_len and some have a timestamp >= 100000, dmesg may fail with:
# dmesg
klogctl: Bad address
When this happens, strace shows that the failure is indeed EFAULT due to
the kernel mistakenly accessing past the end of dmesg's buffer, since
dmesg asks the kernel how big a buffer it needs, allocates a bit more,
and then gets an error when it asks the kernel to fill it:
syslog(0xa, 0, 0) = 1048576
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa4d25d2000
syslog(0x3, 0x7fa4d25d2010, 0x100008) = -1 EFAULT (Bad address)
As far as I can see, the bug has been there as long as print_time(),
which comes from commit 084681d14e ("printk: flush continuation lines
immediately to console") in 3.5-rc5.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we try to finit_module on a file sized 0 bytes vmalloc will
scream and spit out a warning.
Since modules have to be bigger than 0 bytes anyways we can just
check that beforehand and avoid the warning.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It needs 64bit timespec. As it is, we end up truncating the timeout
to whole seconds; usually it doesn't matter, but for having all
sub-second timeouts truncated to one jiffy is visibly wrong.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It needs 64bit rusage and 32bit siginfo. glibc never calls it with
non-NULL rusage pointer, or we would've seen breakage already...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>