Merge misc fixes from Andrew Morton:
"20 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk
cris/arch-v32: cryptocop: print a hex number after a 0x prefix
ipack: print a hex number after a 0x prefix
block: DAC960: print a hex number after a 0x prefix
fs: exofs: print a hex number after a 0x prefix
lib/genalloc.c: start search from start of chunk
mm: memcontrol: do not recurse in direct reclaim
CREDITS: update credit information for Martin Kepplinger
proc: fix NULL dereference when reading /proc/<pid>/auxv
mm: kmemleak: ensure that the task stack is not freed during scanning
lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB
latent_entropy: raise CONFIG_FRAME_WARN by default
kconfig.h: remove config_enabled() macro
ipc: account for kmem usage on mqueue and msg
mm/slab: improve performance of gathering slabinfo stats
mm: page_alloc: use KERN_CONT where appropriate
mm/list_lru.c: avoid error-path NULL pointer deref
h8300: fix syscall restarting
kcov: properly check if we are in an interrupt
mm/slab: fix kmemcg cache creation delayed issue
On 4.0, we saw a stack corruption from a page fault entering direct
memory cgroup reclaim, calling into btrfs_releasepage(), which then
tried to allocate an extent and recursed back into a kmem charge ad
nauseam:
[...]
btrfs_releasepage+0x2c/0x30
try_to_release_page+0x32/0x50
shrink_page_list+0x6da/0x7a0
shrink_inactive_list+0x1e5/0x510
shrink_lruvec+0x605/0x7f0
shrink_zone+0xee/0x320
do_try_to_free_pages+0x174/0x440
try_to_free_mem_cgroup_pages+0xa7/0x130
try_charge+0x17b/0x830
memcg_charge_kmem+0x40/0x80
new_slab+0x2d9/0x5a0
__slab_alloc+0x2fd/0x44f
kmem_cache_alloc+0x193/0x1e0
alloc_extent_state+0x21/0xc0
__clear_extent_bit+0x2b5/0x400
try_release_extent_mapping+0x1a3/0x220
__btrfs_releasepage+0x31/0x70
btrfs_releasepage+0x2c/0x30
try_to_release_page+0x32/0x50
shrink_page_list+0x6da/0x7a0
shrink_inactive_list+0x1e5/0x510
shrink_lruvec+0x605/0x7f0
shrink_zone+0xee/0x320
do_try_to_free_pages+0x174/0x440
try_to_free_mem_cgroup_pages+0xa7/0x130
try_charge+0x17b/0x830
mem_cgroup_try_charge+0x65/0x1c0
handle_mm_fault+0x117f/0x1510
__do_page_fault+0x177/0x420
do_page_fault+0xc/0x10
page_fault+0x22/0x30
On later kernels, kmem charging is opt-in rather than opt-out, and that
particular kmem allocation in btrfs_releasepage() is no longer being
charged and won't recurse and overrun the stack anymore.
But it's not impossible for an accounted allocation to happen from the
memcg direct reclaim context, and we needed to reproduce this crash many
times before we even got a useful stack trace out of it.
Like other direct reclaimers, mark tasks in memcg reclaim PF_MEMALLOC to
avoid recursing into any other form of direct reclaim. Then let
recursive charges from PF_MEMALLOC contexts bypass the cgroup limit.
Link: http://lkml.kernel.org/r/20161025141050.GA13019@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On large systems, when some slab caches grow to millions of objects (and
many gigabytes), running 'cat /proc/slabinfo' can take up to 1-2
seconds. During this time, interrupts are disabled while walking the
slab lists (slabs_full, slabs_partial, and slabs_free) for each node,
and this sometimes causes timeouts in other drivers (for instance,
Infiniband).
This patch optimizes 'cat /proc/slabinfo' by maintaining a counter for
total number of allocated slabs per node, per cache. This counter is
updated when a slab is created or destroyed. This enables us to skip
traversing the slabs_full list while gathering slabinfo statistics, and
since slabs_full tends to be the biggest list when the cache is large,
it results in a dramatic performance improvement. Getting slabinfo
statistics now only requires walking the slabs_free and slabs_partial
lists, and those lists are usually much smaller than slabs_full.
We tested this after growing the dentry cache to 70GB, and the
performance improved from 2s to 5ms.
Link: http://lkml.kernel.org/r/1472517876-26814-1-git-send-email-aruna.ramakrishna@oracle.com
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As described in https://bugzilla.kernel.org/show_bug.cgi?id=177821:
After some analysis it seems to be that the problem is in alloc_super().
In case list_lru_init_memcg() fails it goes into destroy_super(), which
calls list_lru_destroy().
And in list_lru_init() we see that in case memcg_init_list_lru() fails,
lru->node is freed, but not set NULL, which then leads list_lru_destroy()
to believe it is initialized and call memcg_destroy_list_lru().
memcg_destroy_list_lru() in turn can access lru->node[i].memcg_lrus,
which is NULL.
[akpm@linux-foundation.org: add comment]
Signed-off-by: Alexander Polakov <apolyakov@beget.ru>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No, KASAN may not be able to co-exist with HOTPLUG_MEMORY at runtime,
but for build testing there is no reason not to allow them together.
This hopefully means better build coverage and fewer embarrasing silly
problems like the one fixed by commit 9db4f36e82 ("mm: remove unused
variable in memory hotplug") in the future.
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When I removed the per-zone bitlock hashed waitqueues in commit
9dcb8b685f ("mm: remove per-zone hashtable of bitlock waitqueues"), I
removed all the magic hotplug memory initialization of said waitqueues
too.
But when I actually _tested_ the resulting build, I stupidly assumed
that "allmodconfig" would enable memory hotplug. And it doesn't,
because it enables KASAN instead, which then disables hotplug memory
support.
As a result, my build test of the per-zone waitqueues was totally
broken, and I didn't notice that the compiler warns about the now unused
iterator variable 'i'.
I guess I should be happy that that seems to be the worst breakage from
my clearly horribly failed test coverage.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The per-zone waitqueues exist because of a scalability issue with the
page waitqueues on some NUMA machines, but it turns out that they hurt
normal loads, and now with the vmalloced stacks they also end up
breaking gfs2 that uses a bit_wait on a stack object:
wait_on_bit(&gh->gh_iflags, HIF_WAIT, TASK_UNINTERRUPTIBLE)
where 'gh' can be a reference to the local variable 'mount_gh' on the
stack of fill_super().
The reason the per-zone hash table breaks for this case is that there is
no "zone" for virtual allocations, and trying to look up the physical
page to get at it will fail (with a BUG_ON()).
It turns out that I actually complained to the mm people about the
per-zone hash table for another reason just a month ago: the zone lookup
also hurts the regular use of "unlock_page()" a lot, because the zone
lookup ends up forcing several unnecessary cache misses and generates
horrible code.
As part of that earlier discussion, we had a much better solution for
the NUMA scalability issue - by just making the page lock have a
separate contention bit, the waitqueue doesn't even have to be looked at
for the normal case.
Peter Zijlstra already has a patch for that, but let's see if anybody
even notices. In the meantime, let's fix the actual gfs2 breakage by
simplifying the bitlock waitqueues and removing the per-zone issue.
Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch unexports the low-level __get_user_pages() function.
Recent refactoring of the get_user_pages* functions allow flags to be
passed through get_user_pages() which eliminates the need for access to
this function from its one user, kvm.
We can see that the two calls to get_user_pages() which replace
__get_user_pages() in kvm_main.c are equivalent by examining their call
stacks:
get_user_page_nowait():
get_user_pages(start, 1, flags, page, NULL)
__get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL,
false, flags | FOLL_TOUCH)
__get_user_pages(current, current->mm, start, 1,
flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL)
check_user_page_hwpoison():
get_user_pages(addr, 1, flags, NULL, NULL)
__get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL,
false, flags | FOLL_TOUCH)
__get_user_pages(current, current->mm, addr, 1, flags | FOLL_TOUCH, NULL,
NULL, NULL)
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vmap stack fixes from Ingo Molnar:
"This is fallout from CONFIG_HAVE_ARCH_VMAP_STACK=y on x86: stack
accesses that used to be just somewhat questionable are now totally
buggy.
These changes try to do it without breaking the ABI: the fields are
left there, they are just reporting zero, or reporting narrower
information (the maps file change)"
* 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mm: Change vm_is_stack_for_task() to vm_is_stack_for_current()
fs/proc: Stop trying to report thread stacks
fs/proc: Stop reporting eip and esp in /proc/PID/stat
mm/numa: Remove duplicated include from mprotect.c
Merge the gup_flags cleanups from Lorenzo Stoakes:
"This patch series adjusts functions in the get_user_pages* family such
that desired FOLL_* flags are passed as an argument rather than
implied by flags.
The purpose of this change is to make the use of FOLL_FORCE explicit
so it is easier to grep for and clearer to callers that this flag is
being used. The use of FOLL_FORCE is an issue as it overrides missing
VM_READ/VM_WRITE flags for the VMA whose pages we are reading
from/writing to, which can result in surprising behaviour.
The patch series came out of the discussion around commit 38e0885465
("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"),
which addressed a BUG_ON() being triggered when a page was faulted in
with PROT_NONE set but having been overridden by FOLL_FORCE.
do_numa_page() was run on the assumption the page _must_ be one marked
for NUMA node migration as an actual PROT_NONE page would have been
dealt with prior to this code path, however FOLL_FORCE introduced a
situation where this assumption did not hold.
See
https://marc.info/?l=linux-mm&m=147585445805166
for the patch proposal"
Additionally, there's a fix for an ancient bug related to FOLL_FORCE and
FOLL_WRITE by me.
[ This branch was rebased recently to add a few more acked-by's and
reviewed-by's ]
* gup_flag-cleanups:
mm: replace access_process_vm() write parameter with gup_flags
mm: replace access_remote_vm() write parameter with gup_flags
mm: replace __access_remote_vm() write parameter with gup_flags
mm: replace get_user_pages_remote() write/force parameters with gup_flags
mm: replace get_user_pages() write/force parameters with gup_flags
mm: replace get_vaddr_frames() write/force parameters with gup_flags
mm: replace get_user_pages_locked() write/force parameters with gup_flags
mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
mm: remove write/force parameters from __get_user_pages_unlocked()
mm: remove write/force parameters from __get_user_pages_locked()
mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
This removes the 'write' argument from access_process_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.
We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the 'write' argument from access_remote_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.
We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the 'write' argument from __access_remote_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.
We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the 'write' and 'force' from get_user_pages_remote() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the 'write' and 'force' from get_user_pages() and replaces
them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers
as use of this flag can result in surprising behaviour (and hence bugs)
within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the 'write' and 'force' from get_vaddr_frames() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the 'write' and 'force' use from get_user_pages_locked()
and replaces them with 'gup_flags' to make the use of FOLL_FORCE
explicit in callers as use of this flag can result in surprising
behaviour (and hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the 'write' and 'force' use from get_user_pages_unlocked()
and replaces them with 'gup_flags' to make the use of FOLL_FORCE
explicit in callers as use of this flag can result in surprising
behaviour (and hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the redundant 'write' and 'force' parameters from
__get_user_pages_unlocked() to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the redundant 'write' and 'force' parameters from
__get_user_pages_locked() to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>