A few remaining architectures directly kill the page faulting task in an
out of memory situation. This is usually not a good idea since that
task might not even use a significant amount of memory and so may not be
the optimal victim to resolve the situation.
Since 2.6.29's 1c0fe6e ("mm: invoke oom-killer from page fault") there
is a hook that architecture page fault handlers are supposed to call to
invoke the OOM killer and let it pick the right task to kill. Convert
the remaining architectures over to this hook.
To have the previous behavior of simply taking out the faulting task the
vm.oom_kill_allocating_task sysctl can be set to 1.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc bits]
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge first patch-bomb from Andrew Morton:
- various misc bits
- I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been
distracted. There has been quite a bit of activity.
- About half the MM queue
- Some backlight bits
- Various lib/ updates
- checkpatch updates
- zillions more little rtc patches
- ptrace
- signals
- exec
- procfs
- rapidio
- nbd
- aoe
- pps
- memstick
- tools/testing/selftests updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits)
tools/testing/selftests: don't assume the x bit is set on scripts
selftests: add .gitignore for kcmp
selftests: fix clean target in kcmp Makefile
selftests: add .gitignore for vm
selftests: add hugetlbfstest
self-test: fix make clean
selftests: exit 1 on failure
kernel/resource.c: remove the unneeded assignment in function __find_resource
aio: fix wrong comment in aio_complete()
drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
drivers/memstick/host/r592.c: convert to module_pci_driver
drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
pps-gpio: add device-tree binding and support
drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
drivers/parport/share.c: use kzalloc
Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
aoe: update internal version number to v83
aoe: update copyright date
aoe: perform I/O completions in parallel
...
Change signature of free_reserved_area() according to Russell King's
suggestion to fix following build warnings:
arch/arm/mm/init.c: In function 'mem_init':
arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default]
free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
^
In file included from include/linux/mman.h:4:0,
from arch/arm/mm/init.c:15:
include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void *'
extern unsigned long free_reserved_area(unsigned long start, unsigned long end,
mm/page_alloc.c: In function 'free_reserved_area':
>> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default]
In file included from arch/mips/include/asm/page.h:49:0,
from include/linux/mmzone.h:20,
from include/linux/gfp.h:4,
from include/linux/mm.h:8,
from mm/page_alloc.c:18:
arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void *' but argument is of type 'long unsigned int'
mm/page_alloc.c: In function 'free_area_init_nodes':
mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds]
Also address some minor code review comments.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: <sworddragon2@aol.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
are flagged as __cpuinit -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings. In any case, they are temporary and harmless.
This removes all the arch/arc uses of the __cpuinit macros from
all C files. Currently arc does not have any __CPUINIT used in
assembly files.
[1] https://lkml.org/lkml/2013/5/20/589
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
LOAD_FAULT_PTE macro is expected to set r2 with faulting vaddr.
However in case of CONFIG_ARC_DBG_TLB_MISS_COUNT, it was getting
clobbered with statistics collection code.
Fix latter by using a different register.
Note that only I-TLB Miss handler was potentially affected.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* No need to check for READ access in I-TLB Miss handler
* Redundant PAGE_PRESENT update in PTE
Post TLB entry installation, in updating PTE for software accessed/dity
bits, no need to update PAGE_PRESENT since it will already be set.
Infact the entry won't have installed if !PAGE_PRESENT.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
With ECR now part of pt_regs
* No need to propagate from lowest asm handlers as arg
* No need to save it in tsk->thread.cause_code
* Avoid bit chopping to access the bit-fields
More code consolidation, cleanup
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This can be ascertained within do_page_fault() since it gets the full
ECR (Exception Cause Register).
Further, for both the callers of do_page_fault(): Prot-V / D-TLB-Miss,
the cause sub-fields in ECR are same for same type of access, making the
code much more simpler.
D-TLB-Miss [LD] 0x00_21_01_00
Prot-V [LD] 0x00_23_01_00
^^
D-TLB-Miss [ST] 0x00_21_02_00
Prot-V [ST] 0x00_23_02_00
^^
D-TLB-Miss [EX] 0x00_21_03_00
Prot-V [EX] 0x00_23_03_00
^^
This helps code consolidation, which is even better when moving code from
assembler to "C".
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Non-congruent SRC page in copy_user_page() is dcache clean in the end -
so record that fact, to avoid a subsequent extraneous flush.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* Number of (i|d)cache ways can be retrieved from BCRs and hence no need
to cross check with with built-in constants
* Use of IS_ENABLED() to check for a Kconfig option
* is_not_cache_aligned() not used anymore
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* Move the various sub-system defines/types into relevant files/functions
(reduces compilation time)
* move CPU specific stuff out of asm/tlb.h into asm/mmu.h
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
gdbserver inserting a breakpoint ends up calling copy_user_page() for a
code page. The generic version of which (non-aliasing config) didn't set
the PG_arch_1 bit hence update_mmu_cache() didn't sync dcache/icache for
corresponding dynamic loader code page - causing garbade to be executed.
So now aliasing versions of copy_user_highpage()/clear_page() are made
default. There is no significant overhead since all of special alias
handling code is compiled out for non-aliasing build
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The VM_EXEC check in update_mmu_cache() was getting optimized away
because of a stupid error in definition of macro addr_not_cache_congruent()
The intention was to have the equivalent of following:
if (a || (1 ? b : 0))
but we ended up with following:
if (a || 1 ? b : 0)
And because precedence of '||' is more that that of '?', gcc was optimizing
away evaluation of <a>
Nasty Repercussions:
1. For non-aliasing configs it would mean some extraneous dcache flushes
for non-code pages if U/K mappings were not congruent.
2. For aliasing config, some needed dcache flush for code pages might
be missed if U/K mappings were congruent.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This manifested as grep failing psuedo-randomly:
-------------->8---------------------
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$ ip address show lo | grep inet
[ARCLinux]$
[ARCLinux]$ ip address show lo | grep inet
inet 127.0.0.1/8 scope host lo
-------------->8---------------------
ARC700 MMU provides fully orthogonal permission bits per page:
Ur, Uw, Ux, Kr, Kw, Kx
The user mode page permission templates used to have all Kernel mode
access bits enabled.
This caused a tricky race condition observed with uClibc buffered file
read and UNIX pipes.
1. Read access to an anon mapped page in libc .bss: write-protected
zero_page mapped: TLB Entry installed with Ur + K[rwx]
2. grep calls libc:getc() -> buffered read layer calls read(2) with the
internal read buffer in same .bss page.
The read() call is on STDIN which has been redirected to a pipe.
read(2) => sys_read() => pipe_read() => copy_to_user()
3. Since page has Kernel-write permission (despite being user-mode
write-protected), copy_to_user() suceeds w/o taking a MMU TLB-Miss
Exception (page-fault for ARC). core-MM is unaware that kernel
erroneously wrote to the reserved read-only zero-page (BUG #1)
4. Control returns to userspace which now does a write to same .bss page
Since Linux MM is not aware that page has been modified by kernel, it
simply reassigns a new writable zero-init page to mapping, loosing the
prior write by kernel - effectively zero'ing out the libc read buffer
under the hood - hence grep doesn't see right data (BUG #2)
The fix is to make all kernel-mode access permissions mirror the
user-mode ones. Note that the kernel still has full access to pages,
when accessed directly (w/o MMU) - this fix ensures that kernel-mode
access in copy_to_from() path uses the same faulting access model as for
pure user accesses to keep MM fully aware of page state.
The issue is peudo-random because it only shows up if the TLB entry
installed in #1 is present at the time of #3. If it is evicted out, due
to TLB pressure or some-such, then copy_to_user() does take a TLB Miss
Exception, with a routine write-to-anon COW processing installing a
fresh page for kernel writes and also usable as it is in userspace.
Further the issue was dormant for so long as it depends on where the
libc internal read buffer (in .bss) is mapped at runtime.
If it happens to reside in file-backed data mapping of libc (in the
page-aligned slack space trailing the file backed data), loader zero
padding the slack space, does the early cow page replacement, setting
things up at the very beginning itself.
With gcc 4.8 based builds, the libc buffer got pushed out to a real
anon mapping which triggers the issue.
Reported-by: Anton Kolesov <akolesov@synopsys.com>
Cc: <stable@vger.kernel.org> # 3.9
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Flush and INVALIDATE the dcache page.
This helper is only used for writeback of CODE pages to memory. So
there's no value in keeping the dcache lines around. Infact it is risky
as a writeback on natural eviction under pressure can cause un-needed
writeback with weird issues on aliasing dcache configurations.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Pull second set of arc arch updates from Vineet Gupta:
"Aliasing VIPT dcache support for ARC
I'm satisified with testing, specially with fuse which has
historically given grief to VIPT arches (ARM/PARISC...)"
* tag 'arc-v3.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [TB10x] Remove GENERIC_GPIO
ARC: [mm] Aliasing VIPT dcache support 4/4
ARC: [mm] Aliasing VIPT dcache support 3/4
ARC: [mm] Aliasing VIPT dcache support 2/4
ARC: [mm] Aliasing VIPT dcache support 1/4
ARC: [mm] refactor the core (i|d)cache line ops loops
ARC: [mm] serious bug in vaddr based icache flush
Pull ARC port updates from Vineet Gupta:
"Support for two new platforms based on ARC700:
- Abilis TB10x SoC [Chritisian/Pierrick]
- Simulator only System-C Model [Mischa]
ARC specific MM improvements:
- Avoid full TLB flush (ASID increment) on munmap (even single page)
- VIPT Cache Flushing improvements
+ Delayed dcache flush for non-aliasing dcache (big performance boost)
+ icache flush aliasing agnostic (no need to kill all possible aliases)
Others:
- Avoid needless rebuild of DTB files for every kernel build
- Remove builtin cmdline as that is already provided by DeviceTree/bootargs
- Fixing unaligned access emulation corner case
- checkpatch fixes [Sachin]
- Various fixlets [Noam]
- Minor build failures/cleanups"
* tag 'arc-v3.10-rc1-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (35 commits)
ARC: [mm] Lazy D-cache flush (non aliasing VIPT)
ARC: [mm] micro-optimize page size icache invalidate
ARC: [mm] remove the pessimistic all-alias-invalidate icache helpers
ARC: [mm] consolidate icache/dcache sync code
ARC: [mm] optimise icache flush for kernel mappings
ARC: [mm] optimise icache flush for user mappings
ARC: [mm] optimize needless full mm TLB flush on munmap
ARC: Add support for nSIM OSCI System C model
ARC: [TB10x] Adapt device tree to new compatible string
ARC: [TB10x] Add support for TB10x platform
ARC: [TB10x] Device tree of TB100 and TB101 Development Kits
ARC: Prepare interrupt code for external controllers
ARC: Allow embedded arc-intc to be properly placed in DT intc hierarchy
ARC: [cmdline] Don't overwrite u-boot provided bootargs
ARC: [cmdline] Remove CONFIG_CMDLINE
ARC: [plat-arcfpga] defconfig update
ARC: unaligned access emulation broken if callee-reg dest of LD/ST
ARC: unaligned access emulation error handling consolidation
ARC: Debug/crash-printing Improvements
ARC: fix typo with clock speed
...