Pull powerpc fixes from Michael Ellerman:
- Fix build error on 32-bit with checkpoint restart from Aneesh Kumar
- Fix dedotify for binutils >= 2.26 from Andreas Schwab
- Don't trace hcalls on offline CPUs from Denis Kirjanov
- eeh: Fix stale cached primary bus from Gavin Shan
- eeh: Fix stale PE primary bus from Gavin Shan
- mm: Fix Multi hit ERAT cause by recent THP update from Aneesh Kumar K.V
- ioda: Set "read" permission when "write" is set from Alexey Kardashevskiy
* tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/ioda: Set "read" permission when "write" is set
powerpc/mm: Fix Multi hit ERAT cause by recent THP update
powerpc/powernv: Fix stale PE primary bus
powerpc/eeh: Fix stale cached primary bus
powerpc/pseries: Don't trace hcalls on offline CPUs
powerpc: Fix dedotify for binutils >= 2.26
powerpc/book3s_32: Fix build error with checkpoint restart
With ppc64 we use the deposited pgtable_t to store the hash pte slot
information. We should not withdraw the deposited pgtable_t without
marking the pmd none. This ensure that low level hash fault handling
will skip this huge pte and we will handle them at upper levels.
Recent change to pmd splitting changed the above in order to handle the
race between pmd split and exit_mmap. The race is explained below.
Consider following race:
CPU0 CPU1
shrink_page_list()
add_to_swap()
split_huge_page_to_list()
__split_huge_pmd_locked()
pmdp_huge_clear_flush_notify()
// pmd_none() == true
exit_mmap()
unmap_vmas()
zap_pmd_range()
// no action on pmd since pmd_none() == true
pmd_populate()
As result the THP will not be freed. The leak is detected by check_mm():
BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
The above required us to not mark pmd none during a pmd split.
The fix for ppc is to clear the huge pte of _PAGE_USER, so that low
level fault handling code skip this pte. At higher level we do take ptl
lock. That should serialze us against the pmd split. Once the lock is
acquired we do check the pmd again using pmd_same. That should always
return false for us and hence we should retry the access. We do the
pmd_same check in all case after taking plt with
THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
huge_pmd_set_accessed)
Also make sure we wait for irq disable section in other cpus to finish
before flipping a huge pte entry with a regular pmd entry. Code paths
like find_linux_pte_or_hugepte depend on irq disable to get
a stable pte_t pointer. A parallel thp split need to make sure we
don't convert a pmd pte to a regular pmd entry without waiting for the
irq disable section to finish.
Fixes: eef1b3ba05 ("thp: implement split_huge_pmd()")
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The datatype __kernel_time_t is u32 on 32bit platform, so its subject to
overflows in the timeval/timespec to cputime conversion.
Currently the following functions are affected:
1. setitimer()
2. timer_create/timer_settime()
3. sys_clock_nanosleep
This can happen on MIPS32 and ARM32 with "Full dynticks CPU time accounting"
enabled, which is required for CONFIG_NO_HZ_FULL.
Enforce u64 conversion to prevent the overflow.
Fixes: 31c1fc8187 ("ARM: Kconfig: allow full nohz CPU accounting")
Signed-off-by: zengtao <prime.zeng@huawei.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: <fweisbec@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1454384314-154784-1-git-send-email-prime.zeng@huawei.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Merge third patch-bomb from Andrew Morton:
"I'm pretty much done for -rc1 now:
- the rest of MM, basically
- lib/ updates
- checkpatch, epoll, hfs, fatfs, ptrace, coredump, exit
- cpu_mask simplifications
- kexec, rapidio, MAINTAINERS etc, etc.
- more dma-mapping cleanups/simplifications from hch"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits)
MAINTAINERS: add/fix git URLs for various subsystems
mm: memcontrol: add "sock" to cgroup2 memory.stat
mm: memcontrol: basic memory statistics in cgroup2 memory controller
mm: memcontrol: do not uncharge old page in page cache replacement
Documentation: cgroup: add memory.swap.{current,max} description
mm: free swap cache aggressively if memcg swap is full
mm: vmscan: do not scan anon pages if memcg swap limit is hit
swap.h: move memcg related stuff to the end of the file
mm: memcontrol: replace mem_cgroup_lruvec_online with mem_cgroup_online
mm: vmscan: pass memcg to get_scan_count()
mm: memcontrol: charge swap to cgroup2
mm: memcontrol: clean up alloc, online, offline, free functions
mm: memcontrol: flatten struct cg_proto
mm: memcontrol: rein in the CONFIG space madness
net: drop tcp_memcontrol.c
mm: memcontrol: introduce CONFIG_MEMCG_LEGACY_KMEM
mm: memcontrol: allow to disable kmem accounting for cgroup2
mm: memcontrol: account "kmem" consumers in cgroup2 memory controller
mm: memcontrol: move kmem accounting code to CONFIG_MEMCG
mm: memcontrol: separate kmem code from legacy tcp accounting code
...
Pull asm-generic updates from Arnd Bergmann:
"The asm-generic tree this time contains one series from Nicolas Pitre
that makes the optimized do_div() implementation from the ARM
architecture available to all architectures.
This also adds stricter type checking for callers of do_div, which has
uncovered a number of bugs in existing code, and fixes up the ones we
have found"
* tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
ARM: asm/div64.h: adjust to generic codde
__div64_32(): make it overridable at compile time
__div64_const32(): abstract out the actual 128-bit cross product code
do_div(): generic optimization for constant divisor on 32-bit machines
div64.h: optimize do_div() for power-of-two constant divisors
mtd/sm_ftl.c: fix wrong do_div() usage
drm/mgag200/mgag200_mode.c: fix wrong do_div() usage
hid-sensor-hub.c: fix wrong do_div() usage
ti/fapll: fix wrong do_div() usage
ti/clkt_dpll: fix wrong do_div() usage
tegra/clk-divider: fix wrong do_div() usage
imx/clk-pllv2: fix wrong do_div() usage
imx/clk-pllv1: fix wrong do_div() usage
nouveau/nvkm/subdev/clk/gk20a.c: fix wrong do_div() usage
Move the generic implementation to <linux/dma-mapping.h> now that all
architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
that everyone supports them.
[valentinrothberg@gmail.com: remove leftovers in Kconfig]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Helge Deller <deller@gmx.de>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull virtio barrier rework+fixes from Michael Tsirkin:
"This adds a new kind of barrier, and reworks virtio and xen to use it.
Plus some fixes here and there"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (44 commits)
checkpatch: add virt barriers
checkpatch: check for __smp outside barrier.h
checkpatch.pl: add missing memory barriers
virtio: make find_vqs() checkpatch.pl-friendly
virtio_balloon: fix race between migration and ballooning
virtio_balloon: fix race by fill and leak
s390: more efficient smp barriers
s390: use generic memory barriers
xen/events: use virt_xxx barriers
xen/io: use virt_xxx barriers
xenbus: use virt_xxx barriers
virtio_ring: use virt_store_mb
sh: move xchg_cmpxchg to a header by itself
sh: support 1 and 2 byte xchg
virtio_ring: update weak barriers to use virt_xxx
Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb"
asm-generic: implement virt_xxx memory barriers
x86: define __smp_xxx
xtensa: define __smp_xxx
tile: define __smp_xxx
...
Add a helper to check if an object (given an address and a size) is part
of a section (given beginning and end addresses). For convenience, also
provide a helper that performs this check for __init data using the
__init_begin and __init_end limits.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__phys_to_pfn and __pfn_to_phys are symmetric, PHYS_PFN and PFN_PHYS are
semmetric:
- y = (phys_addr_t)x << PAGE_SHIFT
- y >> PAGE_SHIFT = (phys_add_t)x
- (unsigned long)(y >> PAGE_SHIFT) = x
[akpm@linux-foundation.org: use macro arg name `x']
[arnd@arndb.de: include linux/pfn.h for PHYS_PFN definition]
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Guests running within virtual machines might be affected by SMP effects even if
the guest itself is compiled without SMP support. This is an artifact of
interfacing with an SMP host while running an UP kernel. Using mandatory
barriers for this use-case would be possible but is often suboptimal.
In particular, virtio uses a bunch of confusing ifdefs to work around
this, while xen just uses the mandatory barriers.
To better handle this case, low-level virt_mb() etc macros are made available.
These are implemented trivially using the low-level __smp_xxx macros,
the purpose of these wrappers is to annotate those specific cases.
These have the same effect as smp_mb() etc when SMP is enabled, but generate
identical code for SMP and non-SMP systems. For example, virtual machine guests
should use virt_mb() rather than smp_mb() when synchronizing against a
(possibly SMP) host.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
On !SMP, most architectures define their
barriers as compiler barriers.
On SMP, most need an actual barrier.
Make it possible to remove the code duplication for
!SMP by defining low-level __smp_xxx barriers
which do not depend on the value of SMP, then
use them from asm-generic conditionally.
Besides reducing code duplication, these low level APIs will also be
useful for virtualization, where a barrier is sometimes needed even if
!SMP since we might be talking to another kernel on the same SMP system.
Both virtio and Xen drivers will benefit.
The smp_xxx variants should use __smp_XXX ones or barrier() depending on
SMP, identically for all architectures.
We keep ifndef guards around them for now - once/if all
architectures are converted to use the generic
code, we'll be able to remove these.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Allow architectures to override smp_store_release
and smp_load_acquire by guarding the defines
in asm-generic/barrier.h with ifndef directives.
This is in preparation to reusing asm-generic/barrier.h
on architectures which have their own definition
of these macros.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Pull x86 mm updates from Ingo Molnar:
"The main changes in this cycle were:
- make the debugfs 'kernel_page_tables' file read-only, as it only
has read ops. (Borislav Petkov)
- micro-optimize clflush_cache_range() (Chris Wilson)
- swiotlb enhancements, which fixes certain KVM emulated devices
(Igor Mammedov)
- fix an LDT related debug message (Jan Beulich)
- modularize CONFIG_X86_PTDUMP (Kees Cook)
- tone down an overly alarming warning (Laura Abbott)
- Mark variable __initdata (Rasmus Villemoes)
- PAT additions (Toshi Kani)"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Micro-optimise clflush_cache_range()
x86/mm/pat: Change free_memtype() to support shrinking case
x86/mm/pat: Add untrack_pfn_moved for mremap
x86/mm: Drop WARN from multi-BAR check
x86/LDT: Print the real LDT base address
x86/mm/64: Enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFN
x86/mm: Introduce max_possible_pfn
x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-only
x86/mm/mtrr: Mark the 'range_new' static variable in mtrr_calc_range_state() as __initdata
x86/mm: Turn CONFIG_X86_PTDUMP into a module
Pull locking updates from Ingo Molnar:
"So we have a laundry list of locking subsystem changes:
- continuing barrier API and code improvements
- futex enhancements
- atomics API improvements
- pvqspinlock enhancements: in particular lock stealing and adaptive
spinning
- qspinlock micro-enhancements"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op
futex: Cleanup the goto confusion in requeue_pi()
futex: Remove pointless put_pi_state calls in requeue()
futex: Document pi_state refcounting in requeue code
futex: Rename free_pi_state() to put_pi_state()
futex: Drop refcount if requeue_pi() acquired the rtmutex
locking/barriers, arch: Remove ambiguous statement in the smp_store_mb() documentation
lcoking/barriers, arch: Use smp barriers in smp_store_release()
locking/cmpxchg, arch: Remove tas() definitions
locking/pvqspinlock: Queue node adaptive spinning
locking/pvqspinlock: Allow limited lock stealing
locking/pvqspinlock: Collect slowpath lock statistics
sched/core, locking: Document Program-Order guarantees
locking, sched: Introduce smp_cond_acquire() and use it
locking/pvqspinlock, x86: Optimize the PV unlock code path
locking/qspinlock: Avoid redundant read of next pointer
locking/qspinlock: Prefetch the next node cacheline
locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg()
atomics: Add test for atomic operations with _relaxed variants
mremap() with MREMAP_FIXED on a VM_PFNMAP range causes the following
WARN_ON_ONCE() message in untrack_pfn().
WARNING: CPU: 1 PID: 3493 at arch/x86/mm/pat.c:985 untrack_pfn+0xbd/0xd0()
Call Trace:
[<ffffffff817729ea>] dump_stack+0x45/0x57
[<ffffffff8109e4b6>] warn_slowpath_common+0x86/0xc0
[<ffffffff8109e5ea>] warn_slowpath_null+0x1a/0x20
[<ffffffff8106a88d>] untrack_pfn+0xbd/0xd0
[<ffffffff811d2d5e>] unmap_single_vma+0x80e/0x860
[<ffffffff811d3725>] unmap_vmas+0x55/0xb0
[<ffffffff811d916c>] unmap_region+0xac/0x120
[<ffffffff811db86a>] do_munmap+0x28a/0x460
[<ffffffff811dec33>] move_vma+0x1b3/0x2e0
[<ffffffff811df113>] SyS_mremap+0x3b3/0x510
[<ffffffff817793ee>] entry_SYSCALL_64_fastpath+0x12/0x71
MREMAP_FIXED moves a pfnmap from old vma to new vma. untrack_pfn() is
called with the old vma after its pfnmap page table has been removed,
which causes follow_phys() to fail. The new vma has a new pfnmap to
the same pfn & cache type with VM_PAT set. Therefore, we only need to
clear VM_PAT from the old vma in this case.
Add untrack_pfn_moved(), which clears VM_PAT from a given old vma.
move_vma() is changed to call this function with the old vma when
VM_PFNMAP is set. move_vma() then calls do_munmap(), and untrack_pfn()
is a no-op since VM_PAT is cleared.
Reported-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1450832064-10093-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some architectures may want to override the default implementation
at compile time to do things inline. For example, ARM uses a
non-standard calling convention for better efficiency in this case.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
The default C implementation for the 128-bit cross product is abstracted
into the __arch_xprod_64() macro that can be overridden to let
architectures provide their own assembly optimized implementation.
There are many advantages to an assembly version for this operation.
Carry bit handling becomes trivial, and 32-bit shifts may be achieved
simply by inverting register pairs on some architectures. This has the
potential to be quite faster and use much fewer instructions.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
64-by-32-bit divisions are prominent in the kernel, even on 32-bit
machines. Luckily, many of them use a constant divisor that allows
for a much faster multiplication by the divisor's reciprocal.
The compiler already performs this optimization when compiling a 32-by-32
division with a constant divisor. Unfortunately, on 32-bit machines, gcc
does not optimize 64-by-32 divisions in that case, except for constant
divisors that happen to be a power of 2.
Let's avoid the slow path whenever the divisor is constant by manually
computing the reciprocal ourselves and performing the multiplication
inline. In most cases, this improves performance of 64-by-32 divisions
by about two orders of magnitude compared to the __div64_32() fallback,
especially on architectures lacking a native div instruction.
The algorithm used here comes from the existing ARM code.
The __div64_const32_is_OK macro can be predefined by architectures to
disable this optimization in some cases. For example, some ancient gcc
version on ARM would crash with an ICE when fed this code.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Alexey Brodkin <abrodkin@synopsys.com>