Commit Graph

14670 Commits

Author SHA1 Message Date
Linus Torvalds
ee61abb322 module: fix missing module_mutex unlock
Commit 1fb9341ac3 ("module: put modules in list much earlier") moved
some of the module initialization code around, and in the process
changed the exit paths too.  But for the duplicate export symbol error
case the change made the ddebug_cleanup path jump to after the module
mutex unlock, even though it happens with the mutex held.

Rusty has some patches to split this function up into some helper
functions, hopefully the mess of complex goto targets will go away
eventually.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-20 20:22:58 -08:00
Linus Torvalds
226364766f Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module fixes and a virtio block fix from Rusty Russell:
 "Various minor fixes, but a slightly more complex one to fix the
  per-cpu overload problem introduced recently by kvm id changes."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  module: put modules in list much earlier.
  module: add new state MODULE_STATE_UNFORMED.
  module: prevent warning when finit_module a 0 sized file
  virtio-blk: Don't free ida when disk is in use
2013-01-20 16:44:28 -08:00
Linus Torvalds
3a142ed962 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull misc syscall fixes from Al Viro:

 - compat syscall fixes (discussed back in December)

 - a couple of "make life easier for sigaltstack stuff by reducing
   inter-tree dependencies"

 - fix up compiler/asmlinkage calling convention disagreement of
   sys_clone()

 - misc

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  sys_clone() needs asmlinkage_protect
  make sure that /linuxrc has std{in,out,err}
  x32: fix sigtimedwait
  x32: fix waitid()
  switch compat_sys_wait4() and compat_sys_waitid() to COMPAT_SYSCALL_DEFINE
  switch compat_sys_sigaltstack() to COMPAT_SYSCALL_DEFINE
  CONFIG_GENERIC_SIGALTSTACK build breakage with asm-generic/syscalls.h
  Ensure that kernel_init_freeable() is not inlined into non __init code
2013-01-20 13:58:48 -08:00
Oleg Nesterov
edea0d03ee ia64: kill thread_matches(), unexport ptrace_check_attach()
The ia64 function "thread_matches()" has no users since commit
e868a55c2a ("[IA64] remove find_thread_for_addr()").  Remove it.

This allows us to make ptrace_check_attach() static to kernel/ptrace.c,
which is good since we'll need to change the semantics of it and fix up
all the callers.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-20 12:26:05 -08:00
Al Viro
b1e0318b8c sys_clone() needs asmlinkage_protect
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-01-19 22:13:34 -05:00
Tejun Heo
774a1221e8 module, async: async_synchronize_full() on module init iff async is used
If the default iosched is built as module, the kernel may deadlock
while trying to load the iosched module on device probe if the probing
was running off async.  This is because async_synchronize_full() at
the end of module init ends up waiting for the async job which
initiated the module loading.

 async A				modprobe

 1. finds a device
 2. registers the block device
 3. request_module(default iosched)
					4. modprobe in userland
					5. load and init module
					6. async_synchronize_full()

Async A waits for modprobe to finish in request_module() and modprobe
waits for async A to finish in async_synchronize_full().

Because there's no easy to track dependency once control goes out to
userland, implementing properly nested flushing is difficult.  For
now, make module init perform async_synchronize_full() iff module init
has queued async jobs as suggested by Linus.

This avoids the described deadlock because iosched module doesn't use
async and thus wouldn't invoke async_synchronize_full().  This is
hacky and incomplete.  It will deadlock if async module loading nests;
however, this works around the known problem case and seems to be the
best of bad options.

For more details, please refer to the following thread.

  http://thread.gmane.org/gmane.linux.kernel/1420814

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alex Riesen <raa.lkml@gmail.com>
Tested-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Alex Riesen <raa.lkml@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-16 09:05:33 -08:00
Linus Torvalds
406089d015 Merge tag 'trace-3.8-rc3-regression-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing regression fixes from Steven Rostedt:
 "The clean up patch commit 0fb9656d95 "tracing: Make tracing_enabled
  be equal to tracing_on" caused two regressions.

   1) The irqs off latency tracer no longer starts if tracing_on is off
      when the tracer is set, and then tracing_on is enabled.  The
      tracing_on file needs the hook that tracing_enabled had to enable
      tracers if they request it (call the tracer's start() method).

   2) That commit had a separate change that really should have been a
      separate patch, but it must have been added accidently with the -a
      option of git commit.  But as the change is still related to the
      commit it wasn't noticed in review.  That change, changed the way
      blocking is done by the trace_pipe file with respect to the
      tracing_on settings.  I've been told that this change breaks
      current userspace, and this specific change is being reverted."

* tag 'trace-3.8-rc3-regression-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix regression of trace_pipe
  tracing: Fix regression with irqsoff tracer and tracing_on file
2013-01-14 20:22:16 -08:00
Liu Bo
250bfd3d8e tracing: Fix regression of trace_pipe
Commit 0fb9656d "tracing: Make tracing_enabled be equal to tracing_on"
changes the behaviour of trace_pipe, ie. it makes trace_pipe return if
we've read something and tracing is enabled, and this means that we have
to 'cat trace_pipe' again and again while running tests.

IMO the right way is if tracing is enabled, we always block and wait for
ring buffer, or we may lose what we want since ring buffer's size is limited.

Link: http://lkml.kernel.org/r/1358132051-5410-1-git-send-email-bo.li.liu@oracle.com

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-14 13:13:32 -05:00
Rusty Russell
1fb9341ac3 module: put modules in list much earlier.
Prarit's excellent bug report:
> In recent Fedora releases (F17 & F18) some users have reported seeing
> messages similar to
>
> [   15.478160] kvm: Could not allocate 304 bytes percpu data
> [   15.478174] PERCPU: allocation failed, size=304 align=32, alloc from
> reserved chunk failed
>
> during system boot.  In some cases, users have also reported seeing this
> message along with a failed load of other modules.
>
> What is happening is systemd is loading an instance of the kvm module for
> each cpu found (see commit e9bda3b).  When the module load occurs the kernel
> currently allocates the modules percpu data area prior to checking to see
> if the module is already loaded or is in the process of being loaded.  If
> the module is already loaded, or finishes load, the module loading code
> releases the current instance's module's percpu data.

Now we have a new state MODULE_STATE_UNFORMED, we can insert the
module into the list (and thus guarantee its uniqueness) before we
allocate the per-cpu region.

Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Prarit Bhargava <prarit@redhat.com>
2013-01-12 13:27:46 +10:30
Rusty Russell
0d21b0e347 module: add new state MODULE_STATE_UNFORMED.
You should never look at such a module, so it's excised from all paths
which traverse the modules list.

We add the state at the end, to avoid gratuitous ABI break (ksplice).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-01-12 13:27:05 +10:30
Andrew Morton
829199197a kernel/audit.c: avoid negative sleep durations
audit_log_start() performs the same jiffies comparison in two places.
If sufficient time has elapsed between the two comparisons, the second
one produces a negative sleep duration:

  schedule_timeout: wrong timeout value fffffffffffffff0
  Pid: 6606, comm: trinity-child1 Not tainted 3.8.0-rc1+ #43
  Call Trace:
    schedule_timeout+0x305/0x340
    audit_log_start+0x311/0x470
    audit_log_exit+0x4b/0xfb0
    __audit_syscall_exit+0x25f/0x2c0
    sysret_audit+0x17/0x21

Fix it by performing the comparison a single time.

Reported-by: Dave Jones <davej@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11 14:54:56 -08:00
Kees Cook
0644ec0cc8 audit: catch possible NULL audit buffers
It's possible for audit_log_start() to return NULL.  Handle it in the
various callers.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Julien Tinnes <jln@google.com>
Cc: Will Drewry <wad@google.com>
Cc: Steve Grubb <sgrubb@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11 14:54:55 -08:00
Kees Cook
7b9205bd77 audit: create explicit AUDIT_SECCOMP event type
The seccomp path was using AUDIT_ANOM_ABEND from when seccomp mode 1
could only kill a process.  While we still want to make sure an audit
record is forced on a kill, this should use a separate record type since
seccomp mode 2 introduces other behaviors.

In the case of "handled" behaviors (process wasn't killed), only emit a
record if the process is under inspection.  This change also fixes
userspace examination of seccomp audit events, since it was considered
malformed due to missing fields of the AUDIT_ANOM_ABEND event type.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Julien Tinnes <jln@google.com>
Acked-by: Will Drewry <wad@chromium.org>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11 14:54:55 -08:00
Jiri Kosina
1b963c81b1 lockdep, rwsem: provide down_write_nest_lock()
down_write_nest_lock() provides a means to annotate locking scenario
where an outer lock is guaranteed to serialize the order nested locks
are being acquired.

This is analogoue to already existing mutex_lock_nest_lock() and
spin_lock_nest_lock().

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mel Gorman <mel@csn.ul.ie>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11 14:54:55 -08:00
Steven Rostedt
2df8f8a6a8 tracing: Fix regression with irqsoff tracer and tracing_on file
Commit 02404baf1b "tracing: Remove deprecated tracing_enabled file"
removed the tracing_enabled file as it never worked properly and
the tracing_on file should be used instead. But the tracing_on file
didn't call into the tracers start/stop routines like the
tracing_enabled file did. This caused trace-cmd to break when it
enabled the irqsoff tracer.

If you just did "echo irqsoff > current_tracer" then it would work
properly. But the tool trace-cmd disables tracing first by writing
"0" into the tracing_on file. Then it writes "irqsoff" into
current_tracer and then writes "1" into tracing_on. Unfortunately,
the above commit changed the irqsoff tracer to check the tracing_on
status instead of the tracing_enabled status. If it's disabled then
it does not start the tracer internals.

The problem is that writing "1" into tracing_on does not call the
tracers "start" routine like writing "1" into tracing_enabled did.
This makes the irqsoff tracer not start when using the trace-cmd
tool, and is a regression for userspace.

Simple fix is to have the tracing_on file call the tracers start()
method when being enabled (and the stop() method when disabled).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-11 16:14:10 -05:00
Randy Dunlap
bfbbd96c51 audit: fix auditfilter.c kernel-doc warnings
Fix new kernel-doc warning in auditfilter.c:

  Warning(kernel/auditfilter.c:1157): Excess function parameter 'uid' description in 'audit_receive_filter'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: linux-audit@redhat.com (subscribers-only)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-10 14:35:23 -08:00
Linus Torvalds
4ffd4ebf9d Merge tag 'trace-3.8-rc2-regression-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing regression fix from Steven Rostedt:
 "A change that came in this merge window broke the writing to the
  trace_options file.  It causes garbage to be read during the compare
  of option names, and breaks setting options via the trace_options
  file, although options can still be set via the options/<option>
  files."

* tag 'trace-3.8-rc2-regression-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix regression of trace_options file setting
2013-01-10 09:03:16 -08:00
Steven Rostedt
a8dd2176a8 tracing: Fix regression of trace_options file setting
The latest change to allow trace options to be set on the command
line also broke the trace_options file.

The zeroing of the last byte of the option name that is echoed into
the trace_option file was removed with the consolidation of some
of the code. The compare between the option and what was written to
the trace_options file fails because the string holding the data
written doesn't terminate with a null character.

A zero needs to be added to the end of the string copied from
user space.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-09 20:54:17 -05:00
Linus Torvalds
d0631c6e09 Merge branch 'akpm' (fixes from Andrew)
Merge emailed fixes from Andrew Morton:
 "Bunch of fixes:

   - delayed IPC updates.  I held back on this because of some possible
     outstanding bug reports, but they appear to have been addressed in
     later versions

   - A bunch of MAINTAINERS updates

   - Yet Another RTC driver.  I'd held this back while a couple of
     little issues were being worked out.

  I'm expecting an intrusive-but-simple patchset from Joe Perches which
  splits up printk.c into kernel/printk/*.  That will be a pig to
  maintain for two months so if it passes testing I'd like to get it
  upstream after a week or so."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
  printk: fix incorrect length from print_time() when seconds > 99999
  drivers/rtc/rtc-vt8500.c: fix handling of data passed in struct rtc_time
  drivers/rtc/rtc-vt8500.c: correct handling of CR_24H bitfield
  rtc: add RTC driver for TPS6586x
  MAINTAINERS: fix drivers/staging/sm7xx/
  MAINTAINERS: remove include/linux/of_pwm.h
  MAINTAINERS: remove arch/*/lib/perf_event*.c
  MAINTAINERS: remove drivers/mmc/host/imxmmc.*
  MAINTAINERS: fix Documentation/mei/
  MAINTAINERS: remove arch/x86/platform/mrst/pmu.*
  MAINTAINERS: remove firmware/isci/
  MAINTAINERS: fix drivers/ieee802154/
  MAINTAINERS: fix .../plat-mxc/include/mach/imxfb.h
  MAINTAINERS: remove drivers/video/epson1355fb.c
  MAINTAINERS: fix drivers/media/usb/dvb-usb/cxusb*
  MAINTAINERS: adjust for UAPI
  MAINTAINERS: fix drivers/media/platform/atmel-isi.c
  MAINTAINERS: fix arch/arm/mach-at91/include/mach/at_hdmac.h
  MAINTAINERS: fix drivers/rtc/rtc-vt8500.c
  MAINTAINERS: remove arch/arm/plat-s5p/
  ...
2013-01-07 07:42:38 -08:00
Oleg Nesterov
0c4a842349 signals: set_current_blocked() can use __set_current_blocked()
Cleanup.  And I think we need more cleanups, in particular
__set_current_blocked() and sigprocmask() should die.  Nobody should
ever block SIGKILL or SIGSTOP.

 - Change set_current_blocked() to use __set_current_blocked()

 - Change sys_sigprocmask() to use set_current_blocked(), this way it
   should not worry about SIGKILL/SIGSTOP.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-05 19:34:54 -08:00
Oleg Nesterov
5ba53ff648 signals: sys_ssetmask() uses uninitialized newmask
Commit 77097ae503 ("most of set_current_blocked() callers want
SIGKILL/SIGSTOP removed from set") removed the initialization of newmask
by accident, causing ltp to complain like this:

  ssetmask01    1  TFAIL  :  sgetmask() failed: TEST_ERRNO=???(0): Success

Restore the proper initialization.

Reported-and-tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@kernel.org	# v3.5+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-05 19:34:54 -08:00
Roland Dreier
35dac27ced printk: fix incorrect length from print_time() when seconds > 99999
print_prefix() passes a NULL buf to print_time() to get the length of
the time prefix; when printk times are enabled, the current code just
returns the constant 15, which matches the format "[%5lu.%06lu] " used
to print the time value.  However, this is obviously incorrect when the
whole seconds part of the time gets beyond 5 digits (100000 seconds is a
bit more than a day of uptime).

The simple fix is to use snprintf(NULL, 0, ...) to calculate the actual
length of the time prefix.  This could be micro-optimized but it seems
better to have simpler, more readable code here.

The bug leads to the syslog system call miscomputing which messages fit
into the userspace buffer.  If there are enough messages to fill
log_buf_len and some have a timestamp >= 100000, dmesg may fail with:

    # dmesg
    klogctl: Bad address

When this happens, strace shows that the failure is indeed EFAULT due to
the kernel mistakenly accessing past the end of dmesg's buffer, since
dmesg asks the kernel how big a buffer it needs, allocates a bit more,
and then gets an error when it asks the kernel to fill it:

    syslog(0xa, 0, 0)                       = 1048576
    mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa4d25d2000
    syslog(0x3, 0x7fa4d25d2010, 0x100008)   = -1 EFAULT (Bad address)

As far as I can see, the bug has been there as long as print_time(),
which comes from commit 084681d14e ("printk: flush continuation lines
immediately to console") in 3.5-rc5.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-04 16:11:48 -08:00
Sasha Levin
52441fa8f2 module: prevent warning when finit_module a 0 sized file
If we try to finit_module on a file sized 0 bytes vmalloc will
scream and spit out a warning.

Since modules have to be bigger than 0 bytes anyways we can just
check that beforehand and avoid the warning.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-01-03 11:10:32 +10:30
Al Viro
b2ddedcd21 x32: fix sigtimedwait
It needs 64bit timespec.  As it is, we end up truncating the timeout
to whole seconds; usually it doesn't matter, but for having all
sub-second timeouts truncated to one jiffy is visibly wrong.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-26 01:15:03 -05:00
Al Viro
a566c28882 x32: fix waitid()
It needs 64bit rusage and 32bit siginfo.  glibc never calls it with
non-NULL rusage pointer, or we would've seen breakage already...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-26 01:15:03 -05:00