Pull livepatching fix from Jiri Kosina:
- fix for potential race with module loading, from Petr Mladek.
The race is very unlikely to be seen in real world and has been found
by code inspection, but should be fixed for 4.0 anyway.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Fix subtle race with coming and going modules
There is a notifier that handles live patches for coming and going modules.
It takes klp_mutex lock to avoid races with coming and going patches but
it does not keep the lock all the time. Therefore the following races are
possible:
1. The notifier is called sometime in STATE_MODULE_COMING. The module
is visible by find_module() in this state all the time. It means that
new patch can be registered and enabled even before the notifier is
called. It might create wrong order of stacked patches, see below
for an example.
2. New patch could still see the module in the GOING state even after
the notifier has been called. It will try to initialize the related
object structures but the module could disappear at any time. There
will stay mess in the structures. It might even cause an invalid
memory access.
This patch solves the problem by adding a boolean variable into struct module.
The value is true after the coming and before the going handler is called.
New patches need to be applied when the value is true and they need to ignore
the module when the value is false.
Note that we need to know state of all modules on the system. The races are
related to new patches. Therefore we do not know what modules will get
patched.
Also note that we could not simply ignore going modules. The code from the
module could be called even in the GOING state until mod->exit() finishes.
If we start supporting patches with semantic changes between function
calls, we need to apply new patches to any still usable code.
See below for an example.
Finally note that the patch solves only the situation when a new patch is
registered. There are no such problems when the patch is being removed.
It does not matter who disable the patch first, whether the normal
disable_patch() or the module notifier. There is nothing to do
once the patch is disabled.
Alternative solutions:
======================
+ reject new patches when a patched module is coming or going; this is ugly
+ wait with adding new patch until the module leaves the COMING and GOING
states; this might be dangerous and complicated; we would need to release
kgr_lock in the middle of the patch registration to avoid a deadlock
with the coming and going handlers; also we might need a waitqueue for
each module which seems to be even bigger overhead than the boolean
+ stop modules from entering COMING and GOING states; wait until modules
leave these states when they are already there; looks complicated; we would
need to ignore the module that asked to stop the others to avoid a deadlock;
also it is unclear what to do when two modules asked to stop others and
both are in COMING state (situation when two new patches are applied)
+ always register/enable new patches and fix up the potential mess (registered
patches order) in klp_module_init(); this is nasty and prone to regressions
in the future development
+ add another MODULE_STATE where the kallsyms are visible but the module is not
used yet; this looks too complex; the module states are checked on "many"
locations
Example of patch stacking breakage:
===================================
The notifier could _not_ _simply_ ignore already initialized module objects.
For example, let's have three patches (P1, P2, P3) for functions a() and b()
where a() is from vmcore and b() is from a module M. Something like:
a() b()
P1 a1() b1()
P2 a2() b2()
P3 a3() b3(3)
If you load the module M after all patches are registered and enabled.
The ftrace ops for function a() and b() has listed the functions in this
order:
ops_a->func_stack -> list(a3,a2,a1)
ops_b->func_stack -> list(b3,b2,b1)
, so the pointer to b3() is the first and will be used.
Then you might have the following scenario. Let's start with state when patches
P1 and P2 are registered and enabled but the module M is not loaded. Then ftrace
ops for b() does not exist. Then we get into the following race:
CPU0 CPU1
load_module(M)
complete_formation()
mod->state = MODULE_STATE_COMING;
mutex_unlock(&module_mutex);
klp_register_patch(P3);
klp_enable_patch(P3);
# STATE 1
klp_module_notify(M)
klp_module_notify_coming(P1);
klp_module_notify_coming(P2);
klp_module_notify_coming(P3);
# STATE 2
The ftrace ops for a() and b() then looks:
STATE1:
ops_a->func_stack -> list(a3,a2,a1);
ops_b->func_stack -> list(b3);
STATE2:
ops_a->func_stack -> list(a3,a2,a1);
ops_b->func_stack -> list(b2,b1,b3);
therefore, b2() is used for the module but a3() is used for vmcore
because they were the last added.
Example of the race with going modules:
=======================================
CPU0 CPU1
delete_module() #SYSCALL
try_stop_module()
mod->state = MODULE_STATE_GOING;
mutex_unlock(&module_mutex);
klp_register_patch()
klp_enable_patch()
#save place to switch universe
b() # from module that is going
a() # from core (patched)
mod->exit();
Note that the function b() can be called until we call mod->exit().
If we do not apply patch against b() because it is in MODULE_STATE_GOING,
it will call patched a() with modified semantic and things might get wrong.
[jpoimboe@redhat.com: use one boolean instead of two]
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
MODULE_DEVICE_TABLE() macro used to create aliases to device tables.
Normally alias should have the same type as aliased symbol.
Device tables are arrays, so they have 'struct type##_device_id[x]'
types. Alias created by MODULE_DEVICE_TABLE() will have non-array type -
'struct type##_device_id'.
This inconsistency confuses compiler, it could make a wrong assumption
about variable's size which leads KASan to produce a false positive report
about out of bounds access.
For every global variable compiler calls __asan_register_globals() passing
information about global variable (address, size, size with redzone, name
...) __asan_register_globals() poison symbols redzone to detect possible
out of bounds accesses.
When symbol has an alias __asan_register_globals() will be called as for
symbol so for alias. Compiler determines size of variable by size of
variable's type. Alias and symbol have the same address, so if alias have
the wrong size part of memory that actually belongs to the symbol could be
poisoned as redzone of alias symbol.
By fixing type of alias symbol we will fix size of it, so
__asan_register_globals() will not poison valid memory.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace module_ref per-cpu complex reference counter with
an atomic_t simple refcnt. This is for code simplification.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The within_module*() functions return only true or false. Let's use bool as
the return type.
Note that it should not change kABI because these are inline functions.
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It is just a small optimization that allows to replace few
occurrences of within_module_init() || within_module_core()
with a single call.
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Pull module updates from Rusty Russell:
"Nothing major: the stricter permissions checking for sysfs broke a
staging driver; fix included. Greg KH said he'd take the patch but
hadn't as the merge window opened, so it's included here to avoid
breaking build"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
staging: fix up speakup kobject mode
Use 'E' instead of 'X' for unsigned module taint flag.
VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.
kallsyms: fix percpu vars on x86-64 with relocation.
kallsyms: generalize address range checking
module: LLVMLinux: Remove unused function warning from __param_check macro
Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
module: remove MODULE_GENERIC_TABLE
module: allow multiple calls to MODULE_DEVICE_TABLE() per module
module: use pr_cont
MODULE_DEVICE_TABLE() calles MODULE_GENERIC_TABLE(); make it do the
work directly. This also removes a wart introduced in the last patch,
where the alias is defined to be an unknown struct type "struct
type##__##name##_device_id" instead of "struct type##_device_id" (it's
an extern so GCC doesn't care, but it's wrong).
The other user of MODULE_GENERIC_TABLE (ISAPNP_CARD_TABLE) is unused,
so delete it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Commit 78551277e4: "Input: i8042 - add PNP modaliases" had a bug, where the
second call to MODULE_DEVICE_TABLE() overrode the first resulting in not all
the modaliases being exposed.
This fixes the problem by including the name of the device_id table in the
__mod_*_device_table alias, allowing us to export several device_id tables
per module.
Suggested-by: Kay Sievers <kay@vrfy.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There's nothing in the module.h header that requires tracepoint.h to be
included, and there may be cases that tracepoint.h may need to include
module.h, which will cause recursive header issues.
But module.h requires seeing HAVE_JUMP_LABEL which is set in jump_label.h
which it just coincidentally gets from tracepoint.h.
Link: http://lkml.kernel.org/r/20140307084712.5c68641a@gandalf.local.home
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The option to wait for a module reference count to reach zero was in
the initial module implementation, but it was never supported in
modprobe (you had to use rmmod --wait). After discussion with Lucas,
It has been deprecated (with a 10 second sleep) in kmod for the last
year.
This finally removes it: the flag will evoke a printk warning and a
normal (non-blocking) remove attempt.
Cc: Lucas De Marchi <lucas.de.marchi@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Additional and optional dependencies not found while building the kernel and
modules, can now be declared explicitly.
Signed-off-by: Andreas Robinson <andr345@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We have CONFIG_SYMBOL_PREFIX, which three archs define to the string
"_". But Al Viro broke this in "consolidate cond_syscall and
SYSCALL_ALIAS declarations" (in linux-next), and he's not the first to
do so.
Using CONFIG_SYMBOL_PREFIX is awkward, since we usually just want to
prefix it so something. So various places define helpers which are
defined to nothing if CONFIG_SYMBOL_PREFIX isn't set:
1) include/asm-generic/unistd.h defines __SYMBOL_PREFIX.
2) include/asm-generic/vmlinux.lds.h defines VMLINUX_SYMBOL(sym)
3) include/linux/export.h defines MODULE_SYMBOL_PREFIX.
4) include/linux/kernel.h defines SYMBOL_PREFIX (which differs from #7)
5) kernel/modsign_certificate.S defines ASM_SYMBOL(sym)
6) scripts/modpost.c defines MODULE_SYMBOL_PREFIX
7) scripts/Makefile.lib defines SYMBOL_PREFIX on the commandline if
CONFIG_SYMBOL_PREFIX is set, so that we have a non-string version
for pasting.
(arch/h8300/include/asm/linkage.h defines SYMBOL_NAME(), too).
Let's solve this properly:
1) No more generic prefix, just CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX.
2) Make linux/export.h usable from asm.
3) Define VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR().
4) Make everyone use them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Tested-by: James Hogan <james.hogan@imgtec.com> (metag)
These helper functions just check a set intersection with a range, and
don't actually modify struct module.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
You should never look at such a module, so it's excised from all paths
which traverse the modules list.
We add the state at the end, to avoid gratuitous ABI break (ksplice).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We do a very simple search for a particular string appended to the module
(which is cache-hot and about to be SHA'd anyway). There's both a config
option and a boot parameter which control whether we accept or fail with
unsigned modules and modules that are signed with an unknown key.
If module signing is enabled, the kernel will be tainted if a module is
loaded that is unsigned or has a signature for which we don't have the
key.
(Useful feedback and tweaks by David Howells <dhowells@redhat.com>)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
module_ref contains two "unsigned int" fields.
Thats now too small, since some machines can open more than 2^32 files.
Check commit 518de9b39e (fs: allow for more than 2^31 files) for
reference.
We can add an aligned(2 * sizeof(unsigned long)) attribute to force
alloc_percpu() allocating module_ref areas in single cache lines.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Tejun Heo <tj@kernel.org>
CC: Robin Holt <holt@sgi.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There are files which use module_param and MODULE_PARM_DESC
back to back. They only include moduleparam.h which makes sense,
but the implicit presence of module.h everywhere hid the fact
that MODULE_PARM_DESC wasn't in moduleparam.h at all. Relocate
the macro to moduleparam.h so that the moduleparam infrastructure
can be used independently of module.h
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
A lot of files pull in module.h when all they are really
looking for is the basic EXPORT_SYMBOL functionality. The
recent data from Ingo[1] shows that this is one of several
instances that has a significant impact on compile times,
and it should be targeted for factoring out (as done here).
Note that several commonly used header files in include/*
directly include <linux/module.h> themselves (some 34 of them!)
The most commonly used ones of these will have to be made
independent of module.h before the full benefit of this change
can be realized.
We also transition THIS_MODULE from module.h to export.h,
since there are lots of files with subsystem structs that
in turn will have a struct module *owner and only be doing:
.owner = THIS_MODULE;
and absolutely nothing else modular. So, we also want to have
the THIS_MODULE definition present in the lightweight header.
[1] https://lkml.org/lkml/2011/5/23/76
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Copy the information needed from struct module into a local module list
held within tracepoint.c from within the module coming/going notifier.
This vastly simplifies locking of tracepoint registration /
unregistration, because we don't have to take the module mutex to
register and unregister tracepoints anymore. Steven Rostedt ran into
dependency problems related to modules mutex vs kprobes mutex vs ftrace
mutex vs tracepoint mutex that seems to be hard to fix without removing
this dependency between tracepoint and module mutex. (note: it should be
investigated whether kprobes could benefit of being dissociated from the
modules mutex too.)
This also fixes module handling of tracepoint list iterators, because it
was expecting the list to be sorted by pointer address. Given we have
control on our own list now, it's OK to sort this list which has
tracepoints as its only purpose. The reason why this sorting is required
is to handle the fact that seq files (and any read() operation from
user-space) cannot hold the tracepoint mutex across multiple calls, so
list entries may vanish between calls. With sorting, the tracepoint
iterator becomes usable even if the list don't contain the exact item
pointed to by the iterator anymore.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Jason Baron <jbaron@redhat.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20110810191839.GC8525@Krystal
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>