Several drivers use asynchronous work to do device discovery, and we
synchronize with them in the compiled-in case before we actually try to
mount root filesystems etc.
However, when compiled as modules, that synchronization is missing - the
module loading completes, but the driver hasn't actually finished
probing for devices, and that means that any user mode that expects to
use the devices after the 'insmod' is now potentially broken.
We already saw one case of a similar issue in the ACPI battery code,
where the kernel itself expected the module to be all done, and unmapped
the init memory - but the async device discovery was still running.
That got hacked around by just removing the "__init" (see commit
5d38258ec0 "ACPI battery: fix async boot
oops"), but the real fix is to just make the module loading wait for all
async work to be completed.
It will slow down module loading, but since common devices should be
built in anyway, and since the bug is really annoying and hard to handle
from user space (and caused several S3 resume regressions), the simple
fix to wait is the right one.
This fixes at least
http://bugzilla.kernel.org/show_bug.cgi?id=13063
but probably a few other bugzilla entries too (12936, for example), and
is confirmed to fix Rafael's storage driver breakage after resume bug
report (no bugzilla entry).
We should also be able to now revert that ACPI battery fix.
Reported-and-tested-by: Rafael J. Wysocki <rjw@suse.com>
Tested-by: Heinz Diehl <htd@fancy-poultry.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 9cb610d8e3.
This was an impressively stupid patch. Firstly, we reset the SHF_ALLOC
flag lower down in the same function, so the patch was useless. Even
better, find_sec() ignores sections with SHF_ALLOC not set, so
it breaks CONFIG_MODVERSIONS=y with CONFIG_MODULE_FORCE_LOAD=n, which
refuses to load the module since it can't find the __versions section.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
tracing, net: fix net tree and tracing tree merge interaction
tracing, powerpc: fix powerpc tree and tracing tree interaction
ring-buffer: do not remove reader page from list on ring buffer free
function-graph: allow unregistering twice
trace: make argument 'mem' of trace_seq_putmem() const
tracing: add missing 'extern' keywords to trace_output.h
tracing: provide trace_seq_reserve()
blktrace: print out BLK_TN_MESSAGE properly
blktrace: extract duplidate code
blktrace: fix memory leak when freeing struct blk_io_trace
blktrace: fix blk_probes_ref chaos
blktrace: make classic output more classic
blktrace: fix off-by-one bug
blktrace: fix the original blktrace
blktrace: fix a race when creating blk_tree_root in debugfs
blktrace: fix timestamp in binary output
tracing, Text Edit Lock: cleanup
tracing: filter fix for TRACE_EVENT_FORMAT events
ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
x86: kretprobe-booster interrupt emulation code fix
...
Fix up trivial conflicts in
arch/parisc/include/asm/ftrace.h
include/linux/memory.h
kernel/extable.c
kernel/module.c
Impact: minor cleanup.
I'm not going to neaten anyone else's code, but I'm happy to clean up
my own.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Kay Sievers <kay.sievers@vrfy.org> discovered that boot times are slowed
by about half a second because all the stop_machine_create() calls,
and he only probes about 40 modules (I have 125 loaded on this laptop).
We only do stop_machine_create() so we can unlink the module if
something goes wrong, but it's overkill (and buggy anyway: if
stop_machine_create() fails we still call stop_machine_destroy()).
Since we are only protecting against kallsyms (esp. oops) walking the
list, synchronize_sched() is sufficient (synchronize_rcu() is probably
sufficient, but we're not in a hurry).
Kay says of this patch:
... no module takes more than 40 millisecs to link now, most of
them are between 3 and 8 millisecs.
That looks very different to the numbers without this patch
and the otherwise same setup, where we get heavy noise in the
traces and many delays of up to 200 millisecs until linking,
most of them taking 30+ millisecs.
Tested-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With CONFIG_MODVERSIONS, we version 'struct module' using a dummy
export, but other things matter too:
1) 'struct modversion_info' determines the layout of the __versions section,
2) 'struct kernel_param' determines the layout of the __params section,
3) 'struct kernel_symbol' determines __ksymtab*.
4) 'struct marker' determines __markers.
5) 'struct tracepoint' determines __tracepoints.
So we rename 'struct_module' to 'module_layout' and include these in
the signature. Now it's general we can add others later on without
confusion.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: reduce kernel memory usage
This patch just takes off the SHF_ALLOC flag on __versions so we don't
keep them around after module load.
This saves about 7% of module memory if CONFIG_MODVERSIONS=y.
Cc: Shawn Bohrer <shawn.bohrer@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: Message cleanup
Two of three callers of try_to_force_load() are not because of a
missing version, so change the messages:
Old:
<modname>: no version for "magic" found: kernel tainted.
New:
<modname>: bad vermagic: kernel tainted.
Old:
<modname>: no version for "nocrc" found: kernel tainted.
New:
<modname>: no versions for exported symbols: kernel tainted.
Old:
<modname>: no version for "<symname>" found: kernel tainted.
New:
<modname>: <symname>: kernel tainted.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: Expose some module.c symbols
Ksplice uses several functions from module.c in order to resolve
symbols and implement dependency handling. Calling these functions
requires holding module_mutex, so it is exported.
(This is just the module part of a bigger add-exports patch from Tim).
Cc: Anders Kaseorg <andersk@mit.edu>
Cc: Jeff Arnold <jbarnold@mit.edu>
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: New API
kallsyms_lookup_name only returns the first match that it finds. Ksplice
needs information about all symbols with a given name in order to correctly
resolve local symbols.
kallsyms_on_each_symbol provides a generic mechanism for iterating over the
kallsyms table.
Cc: Jeff Arnold <jbarnold@mit.edu>
Cc: Tim Abbott <tabbott@mit.edu>
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: Replace and remove risky (non-EXPORTed) API
module_text_address() returns a pointer to the module, which given locking
improvements in module.c, is useless except to test for NULL:
1) If the module can't go away, use __module_text_address.
2) Otherwise, just use is_module_text_address().
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: New API, cleanup
ksplice wants to know the bounds of a module, not just the module text.
It makes sense to have __module_address. We then implement
is_module_address and __module_text_address in terms of this (and
change is_module_text_address() to bool while we're at it).
Also, add proper kerneldoc for them all.
Cc: Anders Kaseorg <andersk@mit.edu>
Cc: Jeff Arnold <jbarnold@mit.edu>
Cc: Tim Abbott <tabbott@mit.edu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: Cleanup, internal API change
Ksplice needs access to the kernel_symbol structure in order to support
modifications to the exported symbol table.
Cc: Anders Kaseorg <andersk@mit.edu>
Cc: Jeff Arnold <jbarnold@mit.edu>
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (bugfix and style)
Impact: cleanup
Label 'free_init' is only used when defined(CONFIG_MODULE_UNLOAD) &&
defined(CONFIG_SMP), so move it inside to shut up gcc.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: fix crash on reading from /sys/module/.../ieee80211_default_rc_algo
The module_param type "charp" simply sets a char * pointer in the
module to the parameter in the commandline string: this is why we keep
the (mangled) module command line around. But when set via sysfs (as
about 11 charp parameters can be) this memory is freed on the way
out of the write(). Future reads hit random mem.
So we kstrdup instead: we have to check we're not in early commandline
parsing, and we have to note when we've used it so we can reliably
kfree the parameter when it's next overwritten, and also on module
unload.
(Thanks to Randy Dunlap for CONFIG_SYSFS=n fixes)
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Diagnosed-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch combines Greg Bank's dprintk() work with the existing dynamic
printk patchset, we are now calling it 'dynamic debug'.
The new feature of this patchset is a richer /debugfs control file interface,
(an example output from my system is at the bottom), which allows fined grained
control over the the debug output. The output can be controlled by function,
file, module, format string, and line number.
for example, enabled all debug messages in module 'nf_conntrack':
echo -n 'module nf_conntrack +p' > /mnt/debugfs/dynamic_debug/control
to disable them:
echo -n 'module nf_conntrack -p' > /mnt/debugfs/dynamic_debug/control
A further explanation can be found in the documentation patch.
Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Impact: fix ref-after-free crash on failed module load
Fix refptr bug: Change refptr allocation and release order not to access a module
data structure pointed by 'mod' after freeing mod->module_core.
This bug will cause kernel panic(e.g. failed to find undefined symbols).
This bug was reported on systemtap bugzilla.
http://sources.redhat.com/bugzilla/show_bug.cgi?id=9927
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Impact: add reserved allocation functionality and use it for module
percpu variables
This patch implements reserved allocation from the first chunk. When
setting up the first chunk, arch can ask to set aside certain number
of bytes right after the core static area which is available only
through a separate reserved allocator. This will be used primarily
for module static percpu variables on architectures with limited
relocation range to ensure that the module perpcu symbols are inside
the relocatable range.
If reserved area is requested, the first chunk becomes reserved and
isn't available for regular allocation. If the first chunk also
includes piggy-back dynamic allocation area, a separate chunk mapping
the same region is created to serve dynamic allocation. The first one
is called static first chunk and the second dynamic first chunk.
Although they share the page map, their different area map
initializations guarantee they serve disjoint areas according to their
purposes.
If arch doesn't setup reserved area, reserved allocation is handled
like any other allocation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: new scalable dynamic percpu allocator which allows dynamic
percpu areas to be accessed the same way as static ones
Implement scalable dynamic percpu allocator which can be used for both
static and dynamic percpu areas. This will allow static and dynamic
areas to share faster direct access methods. This feature is optional
and enabled only when CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is defined by
arch. Please read comment on top of mm/percpu.c for details.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>