Pull more set_fs removal from Al Viro:
"Christoph's 'use kernel_read and friends rather than open-coding
set_fs()' series"
* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: unexport vfs_readv and vfs_writev
fs: unexport vfs_read and vfs_write
fs: unexport __vfs_read/__vfs_write
lustre: switch to kernel_write
gadget/f_mass_storage: stop messing with the address limit
mconsole: switch to kernel_read
btrfs: switch write_buf to kernel_write
net/9p: switch p9_fd_read to kernel_write
mm/nommu: switch do_mmap_private to kernel_read
serial2002: switch serial2002_tty_write to kernel_{read/write}
fs: make the buf argument to __kernel_write a void pointer
fs: fix kernel_write prototype
fs: fix kernel_read prototype
fs: move kernel_read to fs/read_write.c
fs: move kernel_write to fs/read_write.c
autofs4: switch autofs4_write to __kernel_write
ashmem: switch to ->read_iter
Use proper ssize_t and size_t types for the return value and count
argument, move the offset last and make it an in/out argument like
all other read/write helpers, and make the buf argument a void pointer
to get rid of lots of casts in the callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The bprm_secureexec hook can be moved earlier. Right now, it is called
during create_elf_tables(), via load_binary(), via search_binary_handler(),
via exec_binprm(). Nearly all (see exception below) state used by
bprm_secureexec is created during the bprm_set_creds hook, called from
prepare_binprm().
For all LSMs (except commoncaps described next), only the first execution
of bprm_set_creds takes any effect (they all check bprm->called_set_creds
which prepare_binprm() sets after the first call to the bprm_set_creds
hook). However, all these LSMs also only do anything with bprm_secureexec
when they detected a secure state during their first run of bprm_set_creds.
Therefore, it is functionally identical to move the detection into
bprm_set_creds, since the results from secureexec here only need to be
based on the first call to the LSM's bprm_set_creds hook.
The single exception is that the commoncaps secureexec hook also examines
euid/uid and egid/gid differences which are controlled by bprm_fill_uid(),
via prepare_binprm(), which can be called multiple times (e.g.
binfmt_script, binfmt_misc), and may clear the euid/egid for the final
load (i.e. the script interpreter). However, while commoncaps specifically
ignores bprm->cred_prepared, and runs its bprm_set_creds hook each time
prepare_binprm() may get called, it needs to base the secureexec decision
on the final call to bprm_set_creds. As a result, it will need special
handling.
To begin this refactoring, this adds the secureexec flag to the bprm
struct, and calls the secureexec hook during setup_new_exec(). This is
safe since all the cred work is finished (and past the point of no return).
This explicit call will be removed in later patches once the hook has been
removed.
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
We are going to split <linux/sched/task_stack.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/task_stack.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/coredump.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/coredump.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This copying of arguments and environment is common to both NOMMU
binary formats we support. Let's make the elf_fdpic version available
to the flat format as well.
While at it, improve the code a bit not to copy below the actual
data area.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
The offset in the core file used to be tracked with ->written field of
the coredump_params structure. The field was retired in favour of
file->f_pos.
However, ->f_pos is not maintained for pipes which leads to breakage.
Restore explicit tracking of the offset in coredump_params. Introduce
->pos field for this purpose since ->written was already reused.
Fixes: a008393951 ("get rid of coredump_params->written").
Reported-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull misc vfs cleanups from Al Viro:
"Assorted cleanups and fixes all over the place"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
coredump: only charge written data against RLIMIT_CORE
coredump: get rid of coredump_params->written
ecryptfs_lookup(): try either only encrypted or plaintext name
ecryptfs: avoid multiple aliases for directories
bpf: reject invalid names right in ->lookup()
__d_alloc(): treat NULL name as QSTR("/", 1)
mtd: switch ubi_open_volume_path() to vfs_stat()
mtd: switch open_mtd_by_chdev() to use of vfs_stat()
cprm->written is redundant with cprm->file->f_pos, so use that instead.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull libnvdimm updates from Dan Williams:
"Outside of the new ACPI-NFIT hot-add support this pull request is more
notable for what it does not contain, than what it does. There were a
handful of development topics this cycle, dax get_user_pages, dax
fsync, and raw block dax, that need more more iteration and will wait
for 4.5.
The patches to make devm and the pmem driver NUMA aware have been in
-next for several weeks. The hot-add support has not, but is
contained to the NFIT driver and is passing unit tests. The coredump
support is straightforward and was looked over by Jeff. All of it has
received a 0day build success notification across 107 configs.
Summary:
- Add support for the ACPI 6.0 NFIT hot add mechanism to process
updates of the NFIT at runtime.
- Teach the coredump implementation how to filter out DAX mappings.
- Introduce NUMA hints for allocations made by the pmem driver, and
as a side effect all devm allocations now hint their NUMA node by
default"
* tag 'libnvdimm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
coredump: add DAX filtering for FDPIC ELF coredumps
coredump: add DAX filtering for ELF coredumps
acpi: nfit: Add support for hot-add
nfit: in acpi_nfit_init, break on a 0-length table
pmem, memremap: convert to numa aware allocations
devm_memremap_pages: use numa_mem_id
devm: make allocations numa aware by default
devm_memremap: convert to return ERR_PTR
devm_memunmap: use devres_release()
pmem: kill memremap_pmem()
x86, mm: quiet arch_add_memory()
The ELF binary loader in binfmt_elf.c requires an MMU, making it
impossible to use regular ELF binaries on NOMMU archs. However, the FDPIC
ELF loader in binfmt_elf_fdpic.c is fully capable as a loader for plain
ELF, which requires constant displacements between LOAD segments, since it
already supports FDPIC ELF files flagged as needing constant displacement.
This patch adjusts the FDPIC ELF loader to accept non-FDPIC ELF files on
NOMMU archs. They are treated identically to FDPIC ELF files with the
constant-displacement flag bit set, except for personality, which must
match the ABI of the program being loaded; the PER_LINUX_FDPIC personality
controls how the kernel interprets function pointers passed to sigaction.
Files that do not set a stack size requirement explicitly are given a
default stack size (matching the amount of committed stack the normal ELF
loader for MMU archs would give them) rather than being rejected; this is
necessary because plain ELF files generally do not declare stack
requirements in theit program headers.
Only ET_DYN (PIE) format ELF files are supported, since loading at a fixed
virtual address is not possible on NOMMU.
This patch was developed and tested on J2 (SH2-compatible) but should
be usable immediately on all archs where binfmt_elf_fdpic is
available. Moreover, by providing dummy definitions of the
elf_check_fdpic() and elf_check_const_displacement() macros for archs
which lack an FDPIC ABI, it should be possible to enable building of
binfmt_elf_fdpic on all other NOMMU archs and thereby give them ELF
binary support, but I have not yet tested this.
The motivation for using binfmt_elf_fdpic.c rather than adapting
binfmt_elf.c to NOMMU is that the former already has all the necessary
code to work properly on NOMMU and has already received widespread
real-world use and testing. I hope this is not controversial.
I'm not really happy with having to unset the FDPIC_FUNCPTRS
personality bit when loading non-FDPIC ELF. This bit should really
reset automatically on execve, since otherwise, executing non-ELF
binaries (e.g. bFLT) from an FDPIC process will leave the personality
in the wrong state and severely break signal handling. But that's a
separate, existing bug and I don't know the right place to fix it.
Signed-off-by: Rich Felker <dalias@libc.org>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Oleg Endo <oleg.endo@t-online.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add explicit filtering for DAX mappings to FDPIC ELF coredump. This is
useful because DAX mappings have the potential to be very large.
This patch has only been compile tested.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
On NOMMU archs, the FDPIC ELF loader sets up the usable brk range to
overlap with all but the last PAGE_SIZE bytes of the stack. This leads
to catastrophic memory reuse/corruption if brk is used. Fix by setting
the brk area to zero size to disable its use.
Signed-off-by: Rich Felker <dalias@libc.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
... rather than doing that in the guts of ->load_binary().
[updated to fix the bug spotted by Shentino - for SIGSEGV we really need
something stronger than send_sig_info(); again, better do that in one place]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>