Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.
The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.
Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.
PS: the next vfs pile will be xattr stuff."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...
1fb9341ac3 made our locking in
load_module more complicated: we grab the mutex once to insert the
module in the list, then again to upgrade it once it's formed.
Since the locking is self-contained, it's neater to do this in
separate functions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Commit 1fb9341ac3 ("module: put modules in list much earlier") moved
some of the module initialization code around, and in the process
changed the exit paths too. But for the duplicate export symbol error
case the change made the ddebug_cleanup path jump to after the module
mutex unlock, even though it happens with the mutex held.
Rusty has some patches to split this function up into some helper
functions, hopefully the mess of complex goto targets will go away
eventually.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull module fixes and a virtio block fix from Rusty Russell:
"Various minor fixes, but a slightly more complex one to fix the
per-cpu overload problem introduced recently by kvm id changes."
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
module: put modules in list much earlier.
module: add new state MODULE_STATE_UNFORMED.
module: prevent warning when finit_module a 0 sized file
virtio-blk: Don't free ida when disk is in use
If the default iosched is built as module, the kernel may deadlock
while trying to load the iosched module on device probe if the probing
was running off async. This is because async_synchronize_full() at
the end of module init ends up waiting for the async job which
initiated the module loading.
async A modprobe
1. finds a device
2. registers the block device
3. request_module(default iosched)
4. modprobe in userland
5. load and init module
6. async_synchronize_full()
Async A waits for modprobe to finish in request_module() and modprobe
waits for async A to finish in async_synchronize_full().
Because there's no easy to track dependency once control goes out to
userland, implementing properly nested flushing is difficult. For
now, make module init perform async_synchronize_full() iff module init
has queued async jobs as suggested by Linus.
This avoids the described deadlock because iosched module doesn't use
async and thus wouldn't invoke async_synchronize_full(). This is
hacky and incomplete. It will deadlock if async module loading nests;
however, this works around the known problem case and seems to be the
best of bad options.
For more details, please refer to the following thread.
http://thread.gmane.org/gmane.linux.kernel/1420814
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alex Riesen <raa.lkml@gmail.com>
Tested-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Alex Riesen <raa.lkml@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prarit's excellent bug report:
> In recent Fedora releases (F17 & F18) some users have reported seeing
> messages similar to
>
> [ 15.478160] kvm: Could not allocate 304 bytes percpu data
> [ 15.478174] PERCPU: allocation failed, size=304 align=32, alloc from
> reserved chunk failed
>
> during system boot. In some cases, users have also reported seeing this
> message along with a failed load of other modules.
>
> What is happening is systemd is loading an instance of the kvm module for
> each cpu found (see commit e9bda3b). When the module load occurs the kernel
> currently allocates the modules percpu data area prior to checking to see
> if the module is already loaded or is in the process of being loaded. If
> the module is already loaded, or finishes load, the module loading code
> releases the current instance's module's percpu data.
Now we have a new state MODULE_STATE_UNFORMED, we can insert the
module into the list (and thus guarantee its uniqueness) before we
allocate the per-cpu region.
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Prarit Bhargava <prarit@redhat.com>
You should never look at such a module, so it's excised from all paths
which traverse the modules list.
We add the state at the end, to avoid gratuitous ABI break (ksplice).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If we try to finit_module on a file sized 0 bytes vmalloc will
scream and spit out a warning.
Since modules have to be bigger than 0 bytes anyways we can just
check that beforehand and avoid the warning.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Pull module update from Rusty Russell:
"Nothing all that exciting; a new module-from-fd syscall for those who
want to verify the source of the module (ChromeOS) and/or use standard
IMA on it or other security hooks."
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
MODSIGN: Fix kbuild output when using default extra_certificates
MODSIGN: Avoid using .incbin in C source
modules: don't hand 0 to vmalloc.
module: Remove a extra null character at the top of module->strtab.
ASN.1: Use the ASN1_LONG_TAG and ASN1_INDEFINITE_LENGTH constants
ASN.1: Define indefinite length marker constant
moduleparam: use __UNIQUE_ID()
__UNIQUE_ID()
MODSIGN: Add modules_sign make target
powerpc: add finit_module syscall.
ima: support new kernel module syscall
add finit_module syscall to asm-generic
ARM: add finit_module syscall to ARM
security: introduce kernel_module_from_file hook
module: add flags arg to sys_finit_module()
module: add syscall to load module from fd
In commit 9c0ece069b ("Get rid of Documentation/feature-removal.txt"),
Linus removed feature-removal-schedule.txt from Documentation, but there
is still some reference to this file. So remove them.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit d0a21265df David Rientjes unified various archs'
module_alloc implementation (including x86) and removed the graduitous
shortcut for size == 0.
Then, in commit de7d2b567d, Joe Perches added a warning for
zero-length vmallocs, which can happen without kallsyms on modules
with no init sections (eg. zlib_deflate).
Fix this once and for all; the module code has to handle zero length
anyway, so get it right at the caller and remove the now-gratuitous
checks within the arch-specific module_alloc implementations.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=42608
Reported-by: Conrad Kostecki <ConiKost@gmx.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
There is a extra null character('\0') at the top of module->strtab for
each module. Commit 59ef28b introduced this bug and this patch fixes it.
Live dump log of the current linus git kernel(HEAD is 2844a4870):
============================================================================
crash> mod | grep loop
ffffffffa01db0a0 loop 16689 (not loaded) [CONFIG_KALLSYMS]
crash> module.core_symtab ffffffffa01db0a0
core_symtab = 0xffffffffa01db320crash> rd 0xffffffffa01db320 12
ffffffffa01db320: 0000005500000001 0000000000000000 ....U...........
ffffffffa01db330: 0000000000000000 0002007400000002 ............t...
ffffffffa01db340: ffffffffa01d8000 0000000000000038 ........8.......
ffffffffa01db350: 001a00640000000e ffffffffa01daeb0 ....d...........
ffffffffa01db360: 00000000000000a0 0002007400000019 ............t...
ffffffffa01db370: ffffffffa01d8068 000000000000001b h...............
crash> module.core_strtab ffffffffa01db0a0
core_strtab = 0xffffffffa01dbb30 ""
crash> rd 0xffffffffa01dbb30 4
ffffffffa01dbb30: 615f70616d6b0000 66780063696d6f74 ..kmap_atomic.xf
ffffffffa01dbb40: 73636e75665f7265 72665f646e696600 er_funcs.find_fr
============================================================================
We expect Just first one byte of '\0', but actually first two bytes
are '\0'. Here is The relationship between symtab and strtab.
symtab_idx strtab_idx symbol
-----------------------------------------------
0 0x1 "\0" # startab_idx should be 0
1 0x2 "kmap_atomic"
2 0xe "xfer_funcs"
3 0x19 "find_fr..."
By applying this patch, it becomes as follows.
symtab_idx strtab_idx symbol
-----------------------------------------------
0 0x0 "\0" # extra byte is removed
1 0x1 "kmap_atomic"
2 0xd "xfer_funcs"
3 0x18 "find_fr..."
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Cc: Masaki Kimura <masaki.kimura.kz@hitachi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now that kernel module origins can be reasoned about, provide a hook to
the LSMs to make policy decisions about the module file. This will let
Chrome OS enforce that loadable kernel modules can only come from its
read-only hash-verified root filesystem. Other LSMs can, for example,
read extended attributes for signatures, etc.
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Thanks to Michael Kerrisk for keeping us honest. These flags are actually
useful for eliminating the only case where kmod has to mangle a module's
internals: for overriding module versioning.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Acked-by: Kees Cook <keescook@chromium.org>
As part of the effort to create a stronger boundary between root and
kernel, Chrome OS wants to be able to enforce that kernel modules are
being loaded only from our read-only crypto-hash verified (dm_verity)
root filesystem. Since the init_module syscall hands the kernel a module
as a memory blob, no reasoning about the origin of the blob can be made.
Earlier proposals for appending signatures to kernel modules would not be
useful in Chrome OS, since it would involve adding an additional set of
keys to our kernel and builds for no good reason: we already trust the
contents of our root filesystem. We don't need to verify those kernel
modules a second time. Having to do signature checking on module loading
would slow us down and be redundant. All we need to know is where a
module is coming from so we can say yes/no to loading it.
If a file descriptor is used as the source of a kernel module, many more
things can be reasoned about. In Chrome OS's case, we could enforce that
the module lives on the filesystem we expect it to live on. In the case
of IMA (or other LSMs), it would be possible, for example, to examine
extended attributes that may contain signatures over the contents of
the module.
This introduces a new syscall (on x86), similar to init_module, that has
only two arguments. The first argument is used as a file descriptor to
the module and the second argument is a pointer to the NULL terminated
string of module arguments.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (merge fixes)
Masaki found and patched a kallsyms issue: the last symbol in a
module's symtab wasn't transferred. This is because we manually copy
the zero'th entry (which is always empty) then copy the rest in a loop
starting at 1, though from src[0]. His fix was minimal, I prefer to
rewrite the loops in more standard form.
There are two loops: one to get the size, and one to copy. Make these
identical: always count entry 0 and any defined symbol in an allocated
non-init section.
This bug exists since the following commit was introduced.
module: reduce symbol table for loaded modules (v2)
commit: 4a4962263f
LKML: http://lkml.org/lkml/2012/10/24/27
Reported-by: Masaki Kimura <masaki.kimura.kz@hitachi.com>
Cc: stable@kernel.org
Emit the magic string that indicates a module has a signature after the
signature data instead of before it. This allows module_sig_check() to
be made simpler and faster by the elimination of the search for the
magic string. Instead we just need to do a single memcmp().
This works because at the end of the signature data there is the
fixed-length signature information block. This block then falls
immediately prior to the magic number.
From the contents of the information block, it is trivial to calculate
the size of the signature data and thus the size of the actual module
data.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we're in FIPS mode, we should panic if we fail to verify the signature on a
module or we're asked to load an unsigned module in signature enforcing mode.
Possibly FIPS mode should automatically enable enforcing mode.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We do a very simple search for a particular string appended to the module
(which is cache-hot and about to be SHA'd anyway). There's both a config
option and a boot parameter which control whether we accept or fail with
unsigned modules and modules that are signed with an unknown key.
If module signing is enabled, the kernel will be tainted if a module is
loaded that is unsigned or has a signature for which we don't have the
key.
(Useful feedback and tweaks by David Howells <dhowells@redhat.com>)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The original module-init-tools module loader used a fnctl lock on the
.ko file to avoid attempts to simultaneously load a module.
Unfortunately, you can't get an exclusive fcntl lock on a read-only
fd, making this not work for read-only mounted filesystems.
module-init-tools has a hacky sleep-and-loop for this now.
It's not that hard to wait in the kernel, and only return -EEXIST once
the first module has finished loading (or continue loading the module
if the first one failed to initialize for some reason). It's also
consistent with what we do for dependent modules which are still loading.
Suggested-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We use resolve_symbol_wait(), which blocks if the module containing
the symbol is still loading. However:
1) The module_wq we use is only woken after calling the modules' init
function, but there are other failure paths after the module is
placed in the linked list where we need to do the same thing.
2) wake_up() only wakes one waiter, and our waitqueue is shared by all
modules, so we need to wake them all.
3) wake_up_all() doesn't imply a memory barrier: I feel happier calling
it after we've grabbed and dropped the module_mutex, not just after
the state assignment.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use the mapping of Elf_[SPE]hdr, Elf_Addr, Elf_Sym, Elf_Dyn, Elf_Rel/Rela,
ELF_R_TYPE() and ELF_R_SYM() to either the 32-bit version or the 64-bit version
into asm-generic/module.h for all arches bar MIPS.
Also, use the generic definition mod_arch_specific where possible.
To this end, I've defined three new config bools:
(*) HAVE_MOD_ARCH_SPECIFIC
Arches define this if they don't want to use the empty generic
mod_arch_specific struct.
(*) MODULES_USE_ELF_RELA
Arches define this if their modules can contain RELA records. This causes
the Elf_Rela mapping to be emitted and allows apply_relocate_add() to be
defined by the arch rather than have the core emit an error message.
(*) MODULES_USE_ELF_REL
Arches define this if their modules can contain REL records. This causes
the Elf_Rel mapping to be emitted and allows apply_relocate() to be
defined by the arch rather than have the core emit an error message.
Note that it is possible to allow both REL and RELA records: m68k and mips are
two arches that do this.
With this, some arch asm/module.h files can be deleted entirely and replaced
with a generic-y marker in the arch Kbuild file.
Additionally, I have removed the bits from m32r and score that handle the
unsupported type of relocation record as that's now handled centrally.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>