* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code
But leave TTM code alone, something is fishy there with global vm_ops
being used.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ignore the address parameter given to NOMMU mmap() as it is a hint, rather
than giving an error if it's non-zero. MAP_FIXED still gets an error.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix MAP_PRIVATE mmap() of files and devices where the data in the backing store
might be mapped directly. Use the BDI_CAP_MAP_DIRECT capability flag to govern
whether or not we should be trying to map a file directly. This can be used to
determine whether or not a region has been filled in at the point where we call
do_mmap_shared() or do_mmap_private().
The BDI_CAP_MAP_DIRECT capability flag is cleared by validate_mmap_request() if
there's any reason we can't use it. It's also cleared in do_mmap_pgoff() if
f_op->get_unmapped_area() fails.
Without this fix, attempting to run a program from a RomFS image on a
non-mappable MTD partition results in a BUG as the kernel attempts XIP, and
this can be caught in gdb:
Program received signal SIGABRT, Aborted.
0xc005dce8 in add_nommu_region (region=<value optimized out>) at mm/nommu.c:547
(gdb) bt
#0 0xc005dce8 in add_nommu_region (region=<value optimized out>) at mm/nommu.c:547
#1 0xc005f168 in do_mmap_pgoff (file=0xc31a6620, addr=<value optimized out>, len=3808, prot=3, flags=6146, pgoff=0) at mm/nommu.c:1373
#2 0xc00a96b8 in elf_fdpic_map_file (params=0xc33fbbec, file=0xc31a6620, mm=0xc31bef60, what=0xc0213144 "executable") at mm.h:1145
#3 0xc00aa8b4 in load_elf_fdpic_binary (bprm=0xc316cb00, regs=<value optimized out>) at fs/binfmt_elf_fdpic.c:343
#4 0xc006b588 in search_binary_handler (bprm=0x6, regs=0xc33fbce0) at fs/exec.c:1234
#5 0xc006c648 in do_execve (filename=<value optimized out>, argv=0xc3ad14cc, envp=0xc3ad1460, regs=0xc33fbce0) at fs/exec.c:1356
#6 0xc0008cf0 in sys_execve (name=<value optimized out>, argv=0xc3ad14cc, envp=0xc3ad1460) at arch/frv/kernel/process.c:263
#7 0xc00075dc in __syscall_call () at arch/frv/kernel/entry.S:897
Note that this fix does the following commit differently:
commit a190887b58
Author: David Howells <dhowells@redhat.com>
Date: Sat Sep 5 11:17:07 2009 -0700
nommu: fix error handling in do_mmap_pgoff()
Reported-by: Graff Yang <graff.yang@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce new truncate helpers truncate_pagecache and inode_newsize_ok.
vmtruncate is also consolidated from mm/memory.c and mm/nommu.c and
into mm/truncate.c.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
My 58fa879e1e "mm: FOLL flags for GUP flags"
broke CONFIG_NOMMU build by forgetting to update nommu.c foll_flags type:
mm/nommu.c:171: error: conflicting types for `__get_user_pages'
mm/internal.h:254: error: previous declaration of `__get_user_pages' was here
make[1]: *** [mm/nommu.o] Error 1
My 03f6462a3a "mm: move highest_memmap_pfn"
broke CONFIG_NOMMU build by forgetting to add a nommu.c highest_memmap_pfn:
mm/built-in.o: In function `memmap_init_zone':
(.meminit.text+0x326): undefined reference to `highest_memmap_pfn'
mm/built-in.o: In function `memmap_init_zone':
(.meminit.text+0x32d): undefined reference to `highest_memmap_pfn'
Fix both breakages, and give myself 30 lashes (ouch!)
Reported-by: Michal Simek <michal.simek@petalogix.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some architectures (like the Blackfin arch) implement some of the
"simpler" features that one would expect out of a MMU such as memory
protection.
In our case, we actually get read/write/exec protection down to the page
boundary so processes can't stomp on each other let alone the kernel.
There is a performance decrease (which depends greatly on the workload)
however as the hardware/software interaction was not optimized at design
time.
Signed-off-by: Bernd Schmidt <bernds_cb1@t-online.de>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__get_user_pages() has been taking its own GUP flags, then processing
them into FOLL flags for follow_page(). Though oddly named, the FOLL
flags are more widely used, so pass them to __get_user_pages() now.
Sorry, VM flags, VM_FAULT flags and FAULT_FLAGs are still distinct.
(The patch to __get_user_pages() looks peculiar, with both gup_flags
and foll_flags: the gup_flags remain constant; but as before there's
an exceptional case, out of scope of the patch, in which foll_flags
per page have FOLL_WRITE masked off.)
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the error handling in do_mmap_pgoff(). If do_mmap_shared_file() or
do_mmap_private() fail, we jump to the error_put_region label at which
point we cann __put_nommu_region() on the region - but we haven't yet
added the region to the tree, and so __put_nommu_region() may BUG
because the region tree is empty or it may corrupt the region tree.
To get around this, we can afford to add the region to the region tree
before calling do_mmap_shared_file() or do_mmap_private() as we keep
nommu_region_sem write-locked, so no-one can race with us by seeing a
transient region.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently SELinux enforcement of controls on the ability to map low memory
is determined by the mmap_min_addr tunable. This patch causes SELinux to
ignore the tunable and instead use a seperate Kconfig option specific to how
much space the LSM should protect.
The tunable will now only control the need for CAP_SYS_RAWIO and SELinux
permissions will always protect the amount of low memory designated by
CONFIG_LSM_MMAP_MIN_ADDR.
This allows users who need to disable the mmap_min_addr controls (usual reason
being they run WINE as a non-root user) to do so and still have SELinux
controls preventing confined domains (like a web server) from being able to
map some area of low memory.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: LCDC dcache flush for deferred io
sh: Fix compiler error and include the definition of IS_ERR_VALUE
sh: re-add LCDC fbdev support to the Migo-R defconfig
sh: fix se7724 ceu names
sh: ms7724se: Enable sh_eth in defconfig.
arch/sh/boards/mach-se/7206/io.c: Remove unnecessary semicolons
sh: ms7724se: Add sh_eth support
nommu: provide follow_pfn().
sh: Kill off unused DEBUG_BOOTMEM symbol.
perf_counter tools: add cpu_relax()/rmb() definitions for sh.
sh64: Hook up page fault events for software perf counters.
sh: Hook up page fault events for software perf counters.
sh: make set_perf_counter_pending() static inline.
clocksource: sh_tmu: Make undefined TCOR behaviour less undefined.
With the introduction of follow_pfn() as an exported symbol, modules have
begun making use of it. Unfortunately this was not reflected on nommu at
the time, so the in-tree users have subsequently all blown up with link
errors there.
This provides a simple follow_pfn() that just returns addr >> PAGE_SHIFT,
which will do the right thing on nommu. There is no need to do range
checking within the vma, as the find_vma() case will already take care of
this.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Currently the 4th parameter of get_user_pages() is called len, but its
in pages, not bytes. Rename the thing to nr_pages to avoid future
confusion.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With the "security: use mmap_min_addr indepedently of security models"
change, mmap_min_addr is used in common areas, which susbsequently blows
up the nommu build. This stubs in the definition in the nommu case as
well.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
--
mm/nommu.c | 3 +++
1 file changed, 3 insertions(+)
Signed-off-by: James Morris <jmorris@namei.org>
Don't check vm_region::vm_start is page aligned in add_nommu_region() because
the region may reflect some non-page-aligned mapped file, such as could be
obtained from RomFS XIP.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NOMMU mmap() has an option controlled by a sysctl variable that determines
whether the allocations made by do_mmap_private() should have the excess
space trimmed off and returned to the allocator. Make the initial setting
of this variable a Kconfig configuration option.
The reason there can be excess space is that the allocator only allocates
in power-of-2 size chunks, but mmap()'s can be made in sizes that aren't a
power of 2.
There are two alternatives:
(1) Keep the excess as dead space. The dead space then remains unused for the
lifetime of the mapping. Mappings of shared objects such as libc, ld.so
or busybox's text segment may retain their dead space forever.
(2) Return the excess to the allocator. This means that the dead space is
limited to less than a page per mapping, but it means that for a transient
process, there's more chance of fragmentation as the excess space may be
reused fairly quickly.
During the boot process, a lot of transient processes are created, and
this can cause a lot of fragmentation as the pagecache and various slabs
grow greatly during this time.
By turning off the trimming of excess space during boot and disabling
batching of frees, Coldfire can manage to boot.
A better way of doing things might be to have /sbin/init turn this option
off. By that point libc, ld.so and init - which are all long-duration
processes - have all been loaded and trimmed.
Reported-by: Lanttor Guo <lanttor.guo@freescale.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Lanttor Guo <lanttor.guo@freescale.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Committed_AS field can underflow in certain situations:
> # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c
> 1 Committed_AS: 18446744073709323392 kB
> 11 Committed_AS: 18446744073709455488 kB
> 6 Committed_AS: 35136 kB
> 5 Committed_AS: 18446744073709454400 kB
> 7 Committed_AS: 35904 kB
> 3 Committed_AS: 18446744073709453248 kB
> 2 Committed_AS: 34752 kB
> 9 Committed_AS: 18446744073709453248 kB
> 8 Committed_AS: 34752 kB
> 3 Committed_AS: 18446744073709320960 kB
> 7 Committed_AS: 18446744073709454080 kB
> 3 Committed_AS: 18446744073709320960 kB
> 5 Committed_AS: 18446744073709454080 kB
> 6 Committed_AS: 18446744073709320960 kB
Because NR_CPUS can be greater than 1000 and meminfo_proc_show() does
not check for underflow.
But NR_CPUS proportional isn't good calculation. In general,
possibility of lock contention is proportional to the number of online
cpus, not theorical maximum cpus (NR_CPUS).
The current kernel has generic percpu-counter stuff. using it is right
way. it makes code simplify and percpu_counter_read_positive() don't
make underflow issue.
Reported-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org> [All kernel versions]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a number of issues with the per-MM VMA patch:
(1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on
a NOMMU system with more than 2G pages. Makes no difference on a 32-bit
system.
(2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value,
lest it overflow.
(3) Move the allocation of the vm_area_struct slab back for fork.c.
(4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs.
(5) Use BUG_ON() rather than if () BUG().
(6) Make the default validate_nommu_regions() a static inline rather than a
#define.
(7) Make free_page_series()'s objection to pages with a refcount != 1 more
informative.
(8) Adjust the __put_nommu_region() banner comment to indicate that the
semaphore must be held for writing.
(9) Limit the number of warnings about munmaps of non-mmapped regions.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Presently we do not support these interfaces, so make them BUG() wrappers
as per the rest of the vmap interface on nommu. Fixes up the modular xfs
build.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Convert all system calls to return a long. This should be a NOP since all
converted types should have the same size anyway.
With the exception of sys_exit_group which returned void. But that doesn't
matter since the system call doesn't return.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Now that we no longer use compound pages for all large allocations,
kobjsize() actively breaks things like binfmt_flat by always handing
back PAGE_SIZE for mmap'ed regions. Fix this up by looking up the
VMA region for non-compounds.
Ideally binfmt_flat wants to get rid of kobjsize() completely, but
this is an incremental step.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Mike Frysinger <vapier.adi@gmail.com>