Now that we do not call kernel_thread(CLONE_VFORK) from the worker
thread we can not deadlock if do_execve() in turn triggers another
call_usermodehelper(), we can remove the kmod_thread_locker code.
Note: we should probably kill khelper_wq and simply use one of the
global workqueues, say, system_unbound_wq, this special wq for umh buys
nothing nowadays.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After "kernel/kmod: fix use-after-free of the sub_infostructure"
CLONE_VFORK in __call_usermodehelper() buys nothing, we rely on on
umh_complete() in ____call_usermodehelper() anyway.
Remove it. This also eliminates the unnecessary sleep/wakeup in the
likely case, and this allows the next change.
While at it, kill the "int wait" locals in ____call_usermodehelper() and
__call_usermodehelper(), they can safely use sub_info->wait.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Found this in the message log on a s390 system:
BUG kmalloc-192 (Not tainted): Poison overwritten
Disabling lock debugging due to kernel taint
INFO: 0x00000000684761f4-0x00000000684761f7. First byte 0xff instead of 0x6b
INFO: Allocated in call_usermodehelper_setup+0x70/0x128 age=71 cpu=2 pid=648
__slab_alloc.isra.47.constprop.56+0x5f6/0x658
kmem_cache_alloc_trace+0x106/0x408
call_usermodehelper_setup+0x70/0x128
call_usermodehelper+0x62/0x90
cgroup_release_agent+0x178/0x1c0
process_one_work+0x36e/0x680
worker_thread+0x2f0/0x4f8
kthread+0x10a/0x120
kernel_thread_starter+0x6/0xc
kernel_thread_starter+0x0/0xc
INFO: Freed in call_usermodehelper_exec+0x110/0x1b8 age=71 cpu=2 pid=648
__slab_free+0x94/0x560
kfree+0x364/0x3e0
call_usermodehelper_exec+0x110/0x1b8
cgroup_release_agent+0x178/0x1c0
process_one_work+0x36e/0x680
worker_thread+0x2f0/0x4f8
kthread+0x10a/0x120
kernel_thread_starter+0x6/0xc
kernel_thread_starter+0x0/0xc
There is a use-after-free bug on the subprocess_info structure allocated
by the user mode helper. In case do_execve() returns with an error
____call_usermodehelper() stores the error code to sub_info->retval, but
sub_info can already have been freed.
Regarding UMH_NO_WAIT, the sub_info structure can be freed by
__call_usermodehelper() before the worker thread returns from
do_execve(), allowing memory corruption when do_execve() failed after
exec_mmap() is called.
Regarding UMH_WAIT_EXEC, the call to umh_complete() allows
call_usermodehelper_exec() to continue which then frees sub_info.
To fix this race the code needs to make sure that the call to
call_usermodehelper_freeinfo() is always done after the last store to
sub_info->retval.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This changes 'do_execve()' to get the executable name as a 'struct
filename', and to free it when it is done. This is what the normal
users want, and it simplifies and streamlines their error handling.
The controlled lifetime of the executable name also fixes a
use-after-free problem with the trace_sched_process_exec tracepoint: the
lifetime of the passed-in string for kernel users was not at all
obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
the pathname allocation lifetime with the execve() having finished,
which in turn meant that the trace point that happened after
mm_release() of the old process VM ended up using already free'd memory.
To solve the kernel string lifetime issue, this simply introduces
"getname_kernel()" that works like the normal user-space getname()
function, except with the source coming from kernel memory.
As Oleg points out, this also means that we could drop the tcomm[] array
from 'struct linux_binprm', since the pathname lifetime now covers
setup_new_exec(). That would be a separate cleanup.
Reported-by: Igor Zhbanov <i.zhbanov@samsung.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If /proc/sys/kernel/core_pattern contains only "|", a NULL pointer
dereference happens upon core dump because argv_split("") returns
argv[0] == NULL.
This bug was once fixed by commit 264b83c07a ("usermodehelper: check
subprocess_info->path != NULL") but was by error reintroduced by commit
7f57cfa4e2 ("usermodehelper: kill the sub_info->path[0] check").
This bug seems to exist since 2.6.19 (the version which core dump to
pipe was added). Depending on kernel version and config, some side
effect might happen immediately after this oops (e.g. kernel panic with
2.6.32-358.18.1.el6).
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
call_usermodehelper_exec() does nothing but returns success if path[0] ==
0. The only user which needs this strange feature is request_module(), it
can check modprobe_path[0] itself like other users do if they want to
detect the "disabled by admin" case.
Kill it. Not only it looks strange, it can confuse other callers. And
this allows us to revert 264b83c0 ("usermodehelper: check
subprocess_info->path != NULL"), do_execve(NULL) is safe.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Lucas De Marchi <lucas.de.marchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
argv_split(empty_or_all_spaces) happily succeeds, it simply returns
argc == 0 and argv[0] == NULL. Change call_usermodehelper_exec() to
check sub_info->path != NULL to avoid the crash.
This is the minimal fix, todo:
- perhaps we should change argv_split() to return NULL or change the
callers.
- kill or justify ->path[0] check
- narrow the scope of helper_lock()
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-By: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
call_usermodehelper_setup() + call_usermodehelper_exec() need to be
called instead of call_usermodehelper_fns() when the cleanup function
needs to be called even when an ENOMEM error occurs. In this case using
call_usermodehelper_fns() the user can't distinguish if the cleanup
function was called or not.
[akpm@linux-foundation.org: export call_usermodehelper_setup() to modules]
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Synchronous requet_module() from an async worker can lead to deadlock
because module init path may invoke async_synchronize_full(). The
async worker waits for request_module() to complete and the module
loading waits for the async task to finish. This bug happened in the
block layer because of default elevator auto-loading.
Block layer has been updated not to do default elevator auto-loading
and it has been decided to disallow synchronous request_module() from
async workers.
Trigger WARN_ON_ONCE() on synchronous request_module() from async
workers.
For more details, please refer to the following thread.
http://thread.gmane.org/gmane.linux.kernel/1420814
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alex Riesen <raa.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
All architectures have
CONFIG_GENERIC_KERNEL_THREAD
CONFIG_GENERIC_KERNEL_EXECVE
__ARCH_WANT_SYS_EXECVE
None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers
of kernel_execve() (which is a trivial wrapper for do_execve() now) left.
Kill the conditionals and make both callers use do_execve().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* allow kernel_execve() leave the actual return to userland to
caller (selected by CONFIG_GENERIC_KERNEL_EXECVE). Callers
updated accordingly.
* architecture that does select GENERIC_KERNEL_EXECVE in its
Kconfig should have its ret_from_kernel_thread() do this:
call schedule_tail
call the callback left for it by copy_thread(); if it ever
returns, that's because it has just done successful kernel_execve()
jump to return from syscall
IOW, its only difference from ret_from_fork() is that it does call the
callback.
* such an architecture should also get rid of ret_from_kernel_execve()
and __ARCH_WANT_KERNEL_EXECVE
This is the last part of infrastructure patches in that area - from
that point on work on different architectures can live independently.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Most of them never returned anyway - only two functions had to be
changed. That allows to simplify their callers a whole lot.
Note that this does *not* apply to kthread_run() callbacks - all of
those had been called from the same kernel_thread() callback, which
did do_exit() already. This is strictly about very few low-level
kernel_thread() callbacks (there are only 6 of those, mostly as part
of kthread.h and kmod.h exported mechanisms, plus kernel_init()
itself).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The system deadlocks (at least since 2.6.10) when
call_usermodehelper(UMH_WAIT_EXEC) request triggers
call_usermodehelper(UMH_WAIT_PROC) request.
This is because "khelper thread is waiting for the worker thread at
wait_for_completion() in do_fork() since the worker thread was created
with CLONE_VFORK flag" and "the worker thread cannot call complete()
because do_execve() is blocked at UMH_WAIT_PROC request" and "the khelper
thread cannot start processing UMH_WAIT_PROC request because the khelper
thread is waiting for the worker thread at wait_for_completion() in
do_fork()".
The easiest example to observe this deadlock is to use a corrupted
/sbin/hotplug binary (like shown below).
# : > /tmp/dummy
# chmod 755 /tmp/dummy
# echo /tmp/dummy > /proc/sys/kernel/hotplug
# modprobe whatever
call_usermodehelper("/tmp/dummy", UMH_WAIT_EXEC) is called from
kobject_uevent_env() in lib/kobject_uevent.c upon loading/unloading a
module. do_execve("/tmp/dummy") triggers a call to
request_module("binfmt-0000") from search_binary_handler() which in turn
calls call_usermodehelper(UMH_WAIT_PROC).
In order to avoid deadlock, as a for-now and easy-to-backport solution, do
not try to call wait_for_completion() in call_usermodehelper_exec() if the
worker thread was created by khelper thread with CLONE_VFORK flag. Future
and fundamental solution might be replacing singleton khelper thread with
some workqueue so that recursive calls up to max_active dependency loop
can be handled without deadlock.
[akpm@linux-foundation.org: add comment to kmod_thread_locker]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we move call_usermodehelper_fns() to kmod.c file and EXPORT_SYMBOL it
we can avoid exporting all it's helper functions:
call_usermodehelper_setup
call_usermodehelper_setfns
call_usermodehelper_exec
And make all of them static to kmod.c
Since the optimizer will see all these as a single call site it will
inline them inside call_usermodehelper_fns(). So we loose the call to
_fns but gain 3 calls to the helpers. (Not that it matters)
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a race condition between the freezer and request_firmware()
such that if request_firmware() is run on one CPU and
freeze_processes() is run on another CPU and usermodehelper_disable()
called by it succeeds to grab umhelper_sem for writing before
usermodehelper_read_trylock() called from request_firmware()
acquires it for reading, the request_firmware() will fail and
trigger a WARN_ON() complaining that it was called at a wrong time.
However, in fact, it wasn't called at a wrong time and
freeze_processes() simply happened to be executed simultaneously.
To avoid this race, at least in some cases, modify
usermodehelper_read_trylock() so that it doesn't fail if the
freezing of tasks has just started and hasn't been completed yet.
Instead, during the freezing of tasks, it will try to freeze the
task that has called it so that it can wait until user space is
thawed without triggering the scary warning.
For this purpose, change usermodehelper_disabled so that it can
take three different values, UMH_ENABLED (0), UMH_FREEZING and
UMH_DISABLED. The first one means that usermode helpers are
enabled, the last one means "hard disable" (i.e. the system is not
ready for usermode helpers to be used) and the second one
is reserved for the freezer. Namely, when freeze_processes() is
started, it sets usermodehelper_disabled to UMH_FREEZING which
tells usermodehelper_read_trylock() that it shouldn't fail just
yet and should call try_to_freeze() if woken up and cannot
return immediately. This way all freezable tasks that happen
to call request_firmware() right before freeze_processes() is
started and lose the race for umhelper_sem with it will be
frozen and will sleep until thaw_processes() unsets
usermodehelper_disabled. [For the non-freezable callers of
request_firmware() the race for umhelper_sem against
freeze_processes() is unfortunately unavoidable.]
Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
If firmware is requested asynchronously, by calling
request_firmware_nowait(), there is no reason to fail the request
(and warn the user) when the system is (presumably temporarily)
unready to handle it (because user space is not available yet or
frozen). For this reason, introduce an alternative routine for
read-locking umhelper_sem, usermodehelper_read_lock_wait(), that
will wait for usermodehelper_disabled to be unset (possibly with
a timeout) and make request_firmware_work_func() use it instead of
usermodehelper_read_trylock().
Accordingly, modify request_firmware() so that it uses
usermodehelper_read_trylock() to acquire umhelper_sem and remove
the code related to that lock from _request_firmware().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Instead of two functions, read_lock_usermodehelper() and
usermodehelper_is_disabled(), used in combination, introduce
usermodehelper_read_trylock() that will only return with umhelper_sem
held if usermodehelper_disabled is unset (and will return -EAGAIN
otherwise) and make _request_firmware() use it.
Rename read_unlock_usermodehelper() to
usermodehelper_read_unlock() to follow the naming convention of the
new function.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org