Commit Graph

1225 Commits

Author SHA1 Message Date
Vlastimil Babka 16096c25bf mm, page_alloc: fix fast-path race with cpuset update or removal
Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode
triggers OOM killer in few seconds, despite lots of free memory.  The
test attempts to repeatedly fault in memory in one process in a cpuset,
while changing allowed nodes of the cpuset between 0 and 1 in another
process.

One possible cause is that in the fast path we find the preferred
zoneref according to current mems_allowed, so that it points to the
middle of the zonelist, skipping e.g.  zones of node 1 completely.  If
the mems_allowed is updated to contain only node 1, we never reach it in
the zonelist, and trigger OOM before checking the cpuset_mems_cookie.

This patch fixes the particular case by redoing the preferred zoneref
search if we switch back to the original nodemask.  The condition is
also slightly changed so that when the last non-root cpuset is removed,
we don't miss it.

Note that this is not a full fix, and more patches will follow.

Link: http://lkml.kernel.org/r/20170120103843.24587-3-vbabka@suse.cz
Fixes: 682a3385e7 ("mm, page_alloc: inline the fast path of the zonelist iterator")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-24 16:26:14 -08:00
Vlastimil Babka ea57485af8 mm, page_alloc: fix check for NULL preferred_zone
Patch series "fix premature OOM regression in 4.7+ due to cpuset races".

This is v2 of my attempt to fix the recent report based on LTP cpuset
stress test [1].  The intention is to go to stable 4.9 LTSS with this,
as triggering repeated OOMs is not nice.  That's why the patches try to
be not too intrusive.

Unfortunately why investigating I found that modifying the testcase to
use per-VMA policies instead of per-task policies will bring the OOM's
back, but that seems to be much older and harder to fix problem.  I have
posted a RFC [2] but I believe that fixing the recent regressions has a
higher priority.

Longer-term we might try to think how to fix the cpuset mess in a better
and less error prone way.  I was for example very surprised to learn,
that cpuset updates change not only task->mems_allowed, but also
nodemask of mempolicies.  Until now I expected the parameter to
alloc_pages_nodemask() to be stable.  I wonder why do we then treat
cpusets specially in get_page_from_freelist() and distinguish HARDWALL
etc, when there's unconditional intersection between mempolicy and
cpuset.  I would expect the nodemask adjustment for saving overhead in
g_p_f(), but that clearly doesn't happen in the current form.  So we
have both crazy complexity and overhead, AFAICS.

[1] https://lkml.kernel.org/r/CAFpQJXUq-JuEP=QPidy4p_=FN0rkH5Z-kfB4qBvsf6jMS87Edg@mail.gmail.com
[2] https://lkml.kernel.org/r/7c459f26-13a6-a817-e508-b65b903a8378@suse.cz

This patch (of 4):

Since commit c33d6c06f6 ("mm, page_alloc: avoid looking up the first
zone in a zonelist twice") we have a wrong check for NULL preferred_zone,
which can theoretically happen due to concurrent cpuset modification.  We
check the zoneref pointer which is never NULL and we should check the zone
pointer.  Also document this in first_zones_zonelist() comment per Michal
Hocko.

Fixes: c33d6c06f6 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Link: http://lkml.kernel.org/r/20170120103843.24587-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-24 16:26:14 -08:00
Lucas Stach 424f6c4818 mm: alloc_contig: re-allow CMA to compact FS pages
Commit 73e64c51af ("mm, compaction: allow compaction for GFP_NOFS
requests") changed compation to skip FS pages if not explicitly allowed
to touch them, but missed to update the CMA compact_control.

This leads to a very high isolation failure rate, crippling performance
of CMA even on a lightly loaded system.  Re-allow CMA to compact FS
pages by setting the correct GFP flags, restoring CMA behavior and
performance to the kernel 4.9 level.

Fixes: 73e64c51af (mm, compaction: allow compaction for GFP_NOFS requests)
Link: http://lkml.kernel.org/r/20170113115155.24335-1-l.stach@pengutronix.de
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-24 16:26:14 -08:00
Alexander Duyck 2976db8018 mm: rename __page_frag functions to __page_frag_cache, drop order from drain
This patch does two things.

First it goes through and renames the __page_frag prefixed functions to
__page_frag_cache so that we can be clear that we are draining or
refilling the cache, not the frags themselves.

Second we drop the order parameter from __page_frag_cache_drain since we
don't actually need to pass it since all fragments are either order 0 or
must be a compound page.

Link: http://lkml.kernel.org/r/20170104023954.13451.5678.stgit@localhost.localdomain
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Alexander Duyck 8c2dd3e4a4 mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
Patch series "Page fragment updates", v4.

This patch series takes care of a few cleanups for the page fragments
API.

First we do some renames so that things are much more consistent.  First
we move the page_frag_ portion of the name to the front of the functions
names.  Secondly we split out the cache specific functions from the
other page fragment functions by adding the word "cache" to the name.

Finally I added a bit of documentation that will hopefully help to
explain some of this.  I plan to revisit this later as we get things
more ironed out in the near future with the changes planned for the DMA
setup to support eXpress Data Path.

This patch (of 3):

This patch renames the page frag functions to be more consistent with
other APIs.  Specifically we place the name page_frag first in the name
and then have either an alloc or free call name that we append as the
suffix.  This makes it a bit clearer in terms of naming.

In addition we drop the leading double underscores since we are
technically no longer a backing interface and instead the front end that
is called from the networking APIs.

Link: http://lkml.kernel.org/r/20170104023854.13451.67390.stgit@localhost.localdomain
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Ard Biesheuvel f073bdc517 mm: don't dereference struct page fields of invalid pages
The VM_BUG_ON() check in move_freepages() checks whether the node id of
a page matches the node id of its zone.  However, it does this before
having checked whether the struct page pointer refers to a valid struct
page to begin with.  This is guaranteed in most cases, but may not be
the case if CONFIG_HOLES_IN_ZONE=y.

So reorder the VM_BUG_ON() with the pfn_valid_within() check.

Link: http://lkml.kernel.org/r/1481706707-6211-2-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Robert Richter <rrichter@cavium.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Michal Hocko 41b6167e8f mm: get rid of __GFP_OTHER_NODE
The flag was introduced by commit 78afd5612d ("mm: add
__GFP_OTHER_NODE flag") to allow proper accounting of remote node
allocations done by kernel daemons on behalf of a process - e.g.
khugepaged.

After "mm: fix remote numa hits statistics" we do not need and actually
use the flag so we can safely remove it because all allocations which
are satisfied from their "home" node are accounted properly.

[mhocko@suse.com: fix build]
Link: http://lkml.kernel.org/r/20170106122225.GK5556@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20170102153057.9451-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Michal Hocko 2df26639e7 mm: fix remote numa hits statistics
Jia He has noticed that commit b9f00e147f ("mm, page_alloc: reduce
branches in zone_statistics") has an unintentional side effect that
remote node allocation requests are accounted as NUMA_MISS rathat than
NUMA_HIT and NUMA_OTHER if such a request doesn't use __GFP_OTHER_NODE.

There are many of these potentially because the flag is used very rarely
while we have many users of __alloc_pages_node.

Fix this by simply ignoring __GFP_OTHER_NODE (it can be removed in a
follow up patch) and treat all allocations that were satisfied from the
preferred zone's node as NUMA_HITS because this is the same node we
requested the allocation from in most cases.  If this is not the local
node then we just account it as NUMA_OTHER rather than NUMA_LOCAL.

One downsize would be that an allocation request for a node which is
outside of the mempolicy nodemask would be reported as a hit which is a
bit weird but that was the case before b9f00e147f already.

Fixes: b9f00e147f ("mm, page_alloc: reduce branches in zone_statistics")
Link: http://lkml.kernel.org/r/20170102153057.9451-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Jia He <hejianet@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz> # with cbmc[1] superpowers
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:54 -08:00
Alexander Duyck 44fdffd705 mm: add support for releasing multiple instances of a page
Add a function that allows us to batch free a page that has multiple
references outstanding.  Specifically this function can be used to drop
a page being used in the page frag alloc cache.  With this drivers can
make use of functionality similar to the page frag alloc cache without
having to do any workarounds for the fact that there is no function that
frees multiple references.

Link: http://lkml.kernel.org/r/20161110113606.76501.70752.stgit@ahduyck-blue-test.jf.intel.com
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Cc: Helge Deller <deller@gmx.de>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Keguang Zhang <keguang.zhang@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Tobias Klauser <tklauser@distanz.ch>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:08 -08:00
Linus Torvalds e34bac726d Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - various misc bits

 - most of MM (quite a lot of MM material is awaiting the merge of
   linux-next dependencies)

 - kasan

 - printk updates

 - procfs updates

 - MAINTAINERS

 - /lib updates

 - checkpatch updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (123 commits)
  init: reduce rootwait polling interval time to 5ms
  binfmt_elf: use vmalloc() for allocation of vma_filesz
  checkpatch: don't emit unified-diff error for rename-only patches
  checkpatch: don't check c99 types like uint8_t under tools
  checkpatch: avoid multiple line dereferences
  checkpatch: don't check .pl files, improve absolute path commit log test
  scripts/checkpatch.pl: fix spelling
  checkpatch: don't try to get maintained status when --no-tree is given
  lib/ida: document locking requirements a bit better
  lib/rbtree.c: fix typo in comment of ____rb_erase_color
  lib/Kconfig.debug: make CONFIG_STRICT_DEVMEM depend on CONFIG_DEVMEM
  MAINTAINERS: add drm and drm/i915 irc channels
  MAINTAINERS: add "C:" for URI for chat where developers hang out
  MAINTAINERS: add drm and drm/i915 bug filing info
  MAINTAINERS: add "B:" for URI where to file bugs
  get_maintainer: look for arbitrary letter prefixes in sections
  printk: add Kconfig option to set default console loglevel
  printk/sound: handle more message headers
  printk/btrfs: handle more message headers
  printk/kdb: handle more message headers
  ...
2016-12-12 20:50:02 -08:00
Linus Torvalds e71c3978d6 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
 "This is the final round of converting the notifier mess to the state
  machine. The removal of the notifiers and the related infrastructure
  will happen around rc1, as there are conversions outstanding in other
  trees.

  The whole exercise removed about 2000 lines of code in total and in
  course of the conversion several dozen bugs got fixed. The new
  mechanism allows to test almost every hotplug step standalone, so
  usage sites can exercise all transitions extensively.

  There is more room for improvement, like integrating all the
  pointlessly different architecture mechanisms of synchronizing,
  setting cpus online etc into the core code"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  tracing/rb: Init the CPU mask on allocation
  soc/fsl/qbman: Convert to hotplug state machine
  soc/fsl/qbman: Convert to hotplug state machine
  zram: Convert to hotplug state machine
  KVM/PPC/Book3S HV: Convert to hotplug state machine
  arm64/cpuinfo: Convert to hotplug state machine
  arm64/cpuinfo: Make hotplug notifier symmetric
  mm/compaction: Convert to hotplug state machine
  iommu/vt-d: Convert to hotplug state machine
  mm/zswap: Convert pool to hotplug state machine
  mm/zswap: Convert dst-mem to hotplug state machine
  mm/zsmalloc: Convert to hotplug state machine
  mm/vmstat: Convert to hotplug state machine
  mm/vmstat: Avoid on each online CPU loops
  mm/vmstat: Drop get_online_cpus() from init_cpu_node_state/vmstat_cpu_dead()
  tracing/rb: Convert to hotplug state machine
  oprofile/nmi timer: Convert to hotplug state machine
  net/iucv: Use explicit clean up labels in iucv_init()
  x86/pci/amd-bus: Convert to hotplug state machine
  x86/oprofile/nmi: Convert to hotplug state machine
  ...
2016-12-12 19:25:04 -08:00
Mel Gorman a6de734bc0 mm, page_alloc: keep pcp count and list contents in sync if struct page is corrupted
Vlastimil Babka pointed out that commit 479f854a20 ("mm, page_alloc:
defer debugging checks of pages allocated from the PCP") will allow the
per-cpu list counter to be out of sync with the per-cpu list contents if
a struct page is corrupted.

The consequence is an infinite loop if the per-cpu lists get fully
drained by free_pcppages_bulk because all the lists are empty but the
count is positive.  The infinite loop occurs here

                do {
                        batch_free++;
                        if (++migratetype == MIGRATE_PCPTYPES)
                                migratetype = 0;
                        list = &pcp->lists[migratetype];
                } while (list_empty(list));

What the user sees is a bad page warning followed by a soft lockup with
interrupts disabled in free_pcppages_bulk().

This patch keeps the accounting in sync.

Fixes: 479f854a20 ("mm, page_alloc: defer debugging checks of pages allocated from the PCP")
Link: http://lkml.kernel.org/r/20161202112951.23346-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>	[4.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:08 -08:00
Minchan Kim 29fac03bef mm: make unreserve highatomic functions reliable
Currently, unreserve_highatomic_pageblock bails out if it found
highatomic pageblock regardless of really moving free pages from the one
so that it could mitigate unreserve logic's goal which saves OOM of a
process.

This patch makes unreserve functions bail out only if it moves some
pages out of !highatomic free list to avoid such false positive.

Another potential problem is that by race between page freeing and
reserve highatomic function, pages could be in highatomic free list even
though the pageblock is !high atomic migratetype.  In that case,
unreserve_highatomic_pageblock can be void if count of highatomic
reserve is less than pageblock_nr_pages.  We could solve it simply via
draining all of reserved pages before the OOM.  It would have a
safeguard role to exhuast reserved pages before converging to OOM.

Link: http://lkml.kernel.org/r/1476259429-18279-5-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:07 -08:00
Minchan Kim 04c8716f7b mm: try to exhaust highatomic reserve before the OOM
I got OOM report from production team with v4.4 kernel.  It had enough
free memory but failed to allocate GFP_KERNEL order-0 page and finally
encountered OOM kill.  It occured during QA process which launches
several apps, switching and so on.  It happned rarely.  IOW, In normal
situation, it was not a problem but if we are unluck so that several
apps uses peak memory at the same time, it can happen.  If we manage to
pass the phase, the system can go working well.

I could reproduce it with my test(memory spike easily.  Look at below.

The reason is free pages(19M) of DMA32 zone are reserved for
HIGHORDERATOMIC and doesn't unreserved before the OOM.

  balloon invoked oom-killer: gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
  balloon cpuset=/ mems_allowed=0
  CPU: 1 PID: 8473 Comm: balloon Tainted: G        W  OE   4.8.0-rc7-00219-g3f74c9559583-dirty #3161
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  Call Trace:
    dump_stack+0x63/0x90
    dump_header+0x5c/0x1ce
    oom_kill_process+0x22e/0x400
    out_of_memory+0x1ac/0x210
    __alloc_pages_nodemask+0x101e/0x1040
    handle_mm_fault+0xa0a/0xbf0
    __do_page_fault+0x1dd/0x4d0
    trace_do_page_fault+0x43/0x130
    do_async_page_fault+0x1a/0xa0
    async_page_fault+0x28/0x30
  Mem-Info:
  active_anon:383949 inactive_anon:106724 isolated_anon:0
   active_file:15 inactive_file:44 isolated_file:0
   unevictable:0 dirty:0 writeback:24 unstable:0
   slab_reclaimable:2483 slab_unreclaimable:3326
   mapped:0 shmem:0 pagetables:1906 bounce:0
   free:6898 free_pcp:291 free_cma:0
  Node 0 active_anon:1535796kB inactive_anon:426896kB active_file:60kB inactive_file:176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:96kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:1418 all_unreclaimable? no
  DMA free:8188kB min:44kB low:56kB high:68kB active_anon:7648kB inactive_anon:0kB active_file:0kB inactive_file:4kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:20kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 1952 1952 1952
  DMA32 free:19404kB min:5628kB low:7624kB high:9620kB active_anon:1528148kB inactive_anon:426896kB active_file:60kB inactive_file:420kB unevictable:0kB writepending:96kB present:2080640kB managed:2030092kB mlocked:0kB slab_reclaimable:9932kB slab_unreclaimable:13284kB kernel_stack:2496kB pagetables:7624kB bounce:0kB free_pcp:900kB local_pcp:112kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 0
  DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 2*4096kB (H) = 8192kB
  DMA32: 7*4kB (H) 8*8kB (H) 30*16kB (H) 31*32kB (H) 14*64kB (H) 9*128kB (H) 2*256kB (H) 2*512kB (H) 4*1024kB (H) 5*2048kB (H) 0*4096kB = 19484kB
  51131 total pagecache pages
  50795 pages in swap cache
  Swap cache stats: add 3532405601, delete 3532354806, find 124289150/1822712228
  Free swap  = 8kB
  Total swap = 255996kB
  524158 pages RAM
  0 pages HighMem/MovableOnly
  12658 pages reserved
  0 pages cma reserved
  0 pages hwpoisoned

Another example exceeded the limit by the race is

  in:imklog: page allocation failure: order:0, mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)
  CPU: 0 PID: 476 Comm: in:imklog Tainted: G            E   4.8.0-rc7-00217-g266ef83c51e5-dirty #3135
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  Call Trace:
    dump_stack+0x63/0x90
    warn_alloc_failed+0xdb/0x130
    __alloc_pages_nodemask+0x4d6/0xdb0
    new_slab+0x339/0x490
    ___slab_alloc.constprop.74+0x367/0x480
    __slab_alloc.constprop.73+0x20/0x40
    __kmalloc+0x1a4/0x1e0
    alloc_indirect.isra.14+0x1d/0x50
    virtqueue_add_sgs+0x1c4/0x470
    __virtblk_add_req+0xae/0x1f0
    virtio_queue_rq+0x12d/0x290
    __blk_mq_run_hw_queue+0x239/0x370
    blk_mq_run_hw_queue+0x8f/0xb0
    blk_mq_insert_requests+0x18c/0x1a0
    blk_mq_flush_plug_list+0x125/0x140
    blk_flush_plug_list+0xc7/0x220
    blk_finish_plug+0x2c/0x40
    __do_page_cache_readahead+0x196/0x230
    filemap_fault+0x448/0x4f0
    ext4_filemap_fault+0x36/0x50
    __do_fault+0x75/0x140
    handle_mm_fault+0x84d/0xbe0
    __do_page_fault+0x1dd/0x4d0
    trace_do_page_fault+0x43/0x130
    do_async_page_fault+0x1a/0xa0
    async_page_fault+0x28/0x30
  Mem-Info:
  active_anon:363826 inactive_anon:121283 isolated_anon:32
   active_file:65 inactive_file:152 isolated_file:0
   unevictable:0 dirty:0 writeback:46 unstable:0
   slab_reclaimable:2778 slab_unreclaimable:3070
   mapped:112 shmem:0 pagetables:1822 bounce:0
   free:9469 free_pcp:231 free_cma:0
  Node 0 active_anon:1455304kB inactive_anon:485132kB active_file:260kB inactive_file:608kB unevictable:0kB isolated(anon):128kB isolated(file):0kB mapped:448kB dirty:0kB writeback:184kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:13641 all_unreclaimable? no
  DMA free:7748kB min:44kB low:56kB high:68kB active_anon:7944kB inactive_anon:104kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:108kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 1952 1952 1952
  DMA32 free:30128kB min:5628kB low:7624kB high:9620kB active_anon:1447360kB inactive_anon:485028kB active_file:260kB inactive_file:608kB unevictable:0kB writepending:184kB present:2080640kB managed:2030132kB mlocked:0kB slab_reclaimable:11112kB slab_unreclaimable:12172kB kernel_stack:2400kB pagetables:7284kB bounce:0kB free_pcp:924kB local_pcp:72kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 0
  DMA: 7*4kB (UE) 3*8kB (UH) 1*16kB (M) 0*32kB 2*64kB (U) 1*128kB (M) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 1*4096kB (H) = 7748kB
  DMA32: 10*4kB (H) 3*8kB (H) 47*16kB (H) 38*32kB (H) 5*64kB (H) 1*128kB (H) 2*256kB (H) 3*512kB (H) 3*1024kB (H) 3*2048kB (H) 4*4096kB (H) = 30128kB
  2775 total pagecache pages
  2536 pages in swap cache
  Swap cache stats: add 206786828, delete 206784292, find 7323106/106686077
  Free swap  = 108744kB
  Total swap = 255996kB
  524158 pages RAM
  0 pages HighMem/MovableOnly
  12648 pages reserved
  0 pages cma reserved
  0 pages hwpoisoned

It's weird to show that zone has enough free memory above min watermark
but OOMed with 4K GFP_KERNEL allocation due to reserved highatomic
pages.  As last resort, try to unreserve highatomic pages again and if
it has moved pages to non-highatmoc free list, retry reclaim once more.

Link: http://lkml.kernel.org/r/1476259429-18279-4-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:07 -08:00
Minchan Kim 4855e4a7f2 mm: prevent double decrease of nr_reserved_highatomic
There is race between page freeing and unreserved highatomic.

 CPU 0				    CPU 1

    free_hot_cold_page
      mt = get_pfnblock_migratetype
      set_pcppage_migratetype(page, mt)
    				    unreserve_highatomic_pageblock
    				    spin_lock_irqsave(&zone->lock)
    				    move_freepages_block
    				    set_pageblock_migratetype(page)
    				    spin_unlock_irqrestore(&zone->lock)
      free_pcppages_bulk
        __free_one_page(mt) <- mt is stale

By above race, a page on CPU 0 could go non-highorderatomic free list
since the pageblock's type is changed.  By that, unreserve logic of
highorderatomic can decrease reserved count on a same pageblock severak
times and then it will make mismatch between nr_reserved_highatomic and
the number of reserved pageblock.

So, this patch verifies whether the pageblock is highatomic or not and
decrease the count only if the pageblock is highatomic.

Link: http://lkml.kernel.org/r/1476259429-18279-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:07 -08:00
Minchan Kim 88ed365ea2 mm: don't steal highatomic pageblock
Patch series "use up highorder free pages before OOM", v3.

I got OOM report from production team with v4.4 kernel.  It had enough
free memory but failed to allocate GFP_KERNEL order-0 page and finally
encountered OOM kill.  It occured during QA process which launches
several apps, switching and so on.  It happned rarely.  IOW, In normal
situation, it was not a problem but if we are unluck so that several
apps uses peak memory at the same time, it can happen.  If we manage to
pass the phase, the system can go working well.

I could reproduce it with my test(memory spike easily.  Look at below.

The reason is free pages(19M) of DMA32 zone are reserved for
HIGHORDERATOMIC and doesn't unreserved before the OOM.

  balloon invoked oom-killer: gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
  balloon cpuset=/ mems_allowed=0
  CPU: 1 PID: 8473 Comm: balloon Tainted: G        W  OE   4.8.0-rc7-00219-g3f74c9559583-dirty #3161
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  Call Trace:
    dump_stack+0x63/0x90
    dump_header+0x5c/0x1ce
    oom_kill_process+0x22e/0x400
    out_of_memory+0x1ac/0x210
    __alloc_pages_nodemask+0x101e/0x1040
    handle_mm_fault+0xa0a/0xbf0
    __do_page_fault+0x1dd/0x4d0
    trace_do_page_fault+0x43/0x130
    do_async_page_fault+0x1a/0xa0
    async_page_fault+0x28/0x30
  Mem-Info:
  active_anon:383949 inactive_anon:106724 isolated_anon:0
   active_file:15 inactive_file:44 isolated_file:0
   unevictable:0 dirty:0 writeback:24 unstable:0
   slab_reclaimable:2483 slab_unreclaimable:3326
   mapped:0 shmem:0 pagetables:1906 bounce:0
   free:6898 free_pcp:291 free_cma:0
  Node 0 active_anon:1535796kB inactive_anon:426896kB active_file:60kB inactive_file:176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:96kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:1418 all_unreclaimable? no
  DMA free:8188kB min:44kB low:56kB high:68kB active_anon:7648kB inactive_anon:0kB active_file:0kB inactive_file:4kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:20kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 1952 1952 1952
  DMA32 free:19404kB min:5628kB low:7624kB high:9620kB active_anon:1528148kB inactive_anon:426896kB active_file:60kB inactive_file:420kB unevictable:0kB writepending:96kB present:2080640kB managed:2030092kB mlocked:0kB slab_reclaimable:9932kB slab_unreclaimable:13284kB kernel_stack:2496kB pagetables:7624kB bounce:0kB free_pcp:900kB local_pcp:112kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 0
  DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 2*4096kB (H) = 8192kB
  DMA32: 7*4kB (H) 8*8kB (H) 30*16kB (H) 31*32kB (H) 14*64kB (H) 9*128kB (H) 2*256kB (H) 2*512kB (H) 4*1024kB (H) 5*2048kB (H) 0*4096kB = 19484kB
  51131 total pagecache pages
  50795 pages in swap cache
  Swap cache stats: add 3532405601, delete 3532354806, find 124289150/1822712228
  Free swap  = 8kB
  Total swap = 255996kB
  524158 pages RAM
  0 pages HighMem/MovableOnly
  12658 pages reserved
  0 pages cma reserved
  0 pages hwpoisoned

Another example exceeded the limit by the race is

  in:imklog: page allocation failure: order:0, mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)
  CPU: 0 PID: 476 Comm: in:imklog Tainted: G            E   4.8.0-rc7-00217-g266ef83c51e5-dirty #3135
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  Call Trace:
    dump_stack+0x63/0x90
    warn_alloc_failed+0xdb/0x130
    __alloc_pages_nodemask+0x4d6/0xdb0
    new_slab+0x339/0x490
    ___slab_alloc.constprop.74+0x367/0x480
    __slab_alloc.constprop.73+0x20/0x40
    __kmalloc+0x1a4/0x1e0
    alloc_indirect.isra.14+0x1d/0x50
    virtqueue_add_sgs+0x1c4/0x470
    __virtblk_add_req+0xae/0x1f0
    virtio_queue_rq+0x12d/0x290
    __blk_mq_run_hw_queue+0x239/0x370
    blk_mq_run_hw_queue+0x8f/0xb0
    blk_mq_insert_requests+0x18c/0x1a0
    blk_mq_flush_plug_list+0x125/0x140
    blk_flush_plug_list+0xc7/0x220
    blk_finish_plug+0x2c/0x40
    __do_page_cache_readahead+0x196/0x230
    filemap_fault+0x448/0x4f0
    ext4_filemap_fault+0x36/0x50
    __do_fault+0x75/0x140
    handle_mm_fault+0x84d/0xbe0
    __do_page_fault+0x1dd/0x4d0
    trace_do_page_fault+0x43/0x130
    do_async_page_fault+0x1a/0xa0
    async_page_fault+0x28/0x30
  Mem-Info:
  active_anon:363826 inactive_anon:121283 isolated_anon:32
   active_file:65 inactive_file:152 isolated_file:0
   unevictable:0 dirty:0 writeback:46 unstable:0
   slab_reclaimable:2778 slab_unreclaimable:3070
   mapped:112 shmem:0 pagetables:1822 bounce:0
   free:9469 free_pcp:231 free_cma:0
  Node 0 active_anon:1455304kB inactive_anon:485132kB active_file:260kB inactive_file:608kB unevictable:0kB isolated(anon):128kB isolated(file):0kB mapped:448kB dirty:0kB writeback:184kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:13641 all_unreclaimable? no
  DMA free:7748kB min:44kB low:56kB high:68kB active_anon:7944kB inactive_anon:104kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:108kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 1952 1952 1952
  DMA32 free:30128kB min:5628kB low:7624kB high:9620kB active_anon:1447360kB inactive_anon:485028kB active_file:260kB inactive_file:608kB unevictable:0kB writepending:184kB present:2080640kB managed:2030132kB mlocked:0kB slab_reclaimable:11112kB slab_unreclaimable:12172kB kernel_stack:2400kB pagetables:7284kB bounce:0kB free_pcp:924kB local_pcp:72kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 0
  DMA: 7*4kB (UE) 3*8kB (UH) 1*16kB (M) 0*32kB 2*64kB (U) 1*128kB (M) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 1*4096kB (H) = 7748kB
  DMA32: 10*4kB (H) 3*8kB (H) 47*16kB (H) 38*32kB (H) 5*64kB (H) 1*128kB (H) 2*256kB (H) 3*512kB (H) 3*1024kB (H) 3*2048kB (H) 4*4096kB (H) = 30128kB
  2775 total pagecache pages
  2536 pages in swap cache
  Swap cache stats: add 206786828, delete 206784292, find 7323106/106686077
  Free swap  = 108744kB
  Total swap = 255996kB
  524158 pages RAM
  0 pages HighMem/MovableOnly
  12648 pages reserved
  0 pages cma reserved
  0 pages hwpoisoned

During the investigation, I found some problems with highatomic so this
patch aims to solve the problems and the final goal is to unreserve
every highatomic free pages before the OOM kill.

This patch (of 4):

In page freeing path, migratetype is racy so that a highorderatomic page
could free into non-highorderatomic free list.  If that page is
allocated, VM can change the pageblock from higorderatomic to something.
In that case, highatomic pageblock accounting is broken so it doesn't
work(e.g., VM cannot reserve highorderatomic pageblocks any more
although it doesn't reach 1% limit).

So, this patch prohibits the changing from highatomic to other type.
It's no problem because MIGRATE_HIGHATOMIC is not listed in fallback
array so stealing will only happen due to unexpected races which is
really rare.  Also, such prohibiting keeps highatomic pageblock more
longer so it would be better for highorderatomic page allocation.

Link: http://lkml.kernel.org/r/1476259429-18279-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12 18:55:07 -08:00
Ingo Molnar 89a01c51cb Merge branch 'x86/cpufeature' into x86/asm, to pick up dependency
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-17 08:30:54 +01:00
Tetsuo Handa 9e80c719a8 mm: remove extra newline from allocation stall warning
Commit 63f53dea0c ("mm: warn about allocations which stall for too
long") by error embedded "\n" in the format string, resulting in strange
output.

  [  722.876655] kworker/0:1: page alloction stalls for 160001ms, order:0
  [  722.876656] , mode:0x2400000(GFP_NOIO)
  [  722.876657] CPU: 0 PID: 6966 Comm: kworker/0:1 Not tainted 4.8.0+ #69

Link: http://lkml.kernel.org/r/1476026219-7974-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-11 08:12:37 -08:00
Sebastian Andrzej Siewior 005fd4bbef mm/page_alloc: Convert to hotplug state machine
Install the callbacks via the state machine.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161103145021.28528-7-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09 23:45:27 +01:00
Linus Torvalds 577f12c07e Merge tag 'gcc-plugins-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull gcc plugin fixes from Kees Cook:
 - make sure required exports from gcc plugins are visible to gcc
 - switch latent_entropy to unsigned long to avoid stack frame bloat

* tag 'gcc-plugins-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  latent_entropy: Fix wrong gcc code generation with 64 bit variables
  gcc-plugins: Export symbols needed by gcc
2016-11-01 17:48:46 -06:00
Ingo Molnar 05b93c19d5 Merge branch 'linus' into x86/asm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-01 07:41:06 +01:00
Kees Cook 58bea4144d latent_entropy: Fix wrong gcc code generation with 64 bit variables
The stack frame size could grow too large when the plugin used long long
on 32-bit architectures when the given function had too many basic blocks.

The gcc warning was:

drivers/pci/hotplug/ibmphp_ebda.c: In function 'ibmphp_access_ebda':
drivers/pci/hotplug/ibmphp_ebda.c:409:1: warning: the frame size of 1108 bytes is larger than 1024 bytes [-Wframe-larger-than=]

This switches latent_entropy from u64 to unsigned long.

Thanks to PaX Team and Emese Revfy for the patch.

Signed-off-by: Kees Cook <keescook@chromium.org>
2016-10-31 11:30:41 -07:00
Joe Perches 1f84a18fc0 mm: page_alloc: use KERN_CONT where appropriate
Recent changes to printk require KERN_CONT uses to continue logging
messages.  So add KERN_CONT where necessary.

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 4bcc595ccd ("printk: reinstate KERN_CONT for printing continuation lines")
Link: http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git.joe@perches.com
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Linus Torvalds 9dcb8b685f mm: remove per-zone hashtable of bitlock waitqueues
The per-zone waitqueues exist because of a scalability issue with the
page waitqueues on some NUMA machines, but it turns out that they hurt
normal loads, and now with the vmalloced stacks they also end up
breaking gfs2 that uses a bit_wait on a stack object:

     wait_on_bit(&gh->gh_iflags, HIF_WAIT, TASK_UNINTERRUPTIBLE)

where 'gh' can be a reference to the local variable 'mount_gh' on the
stack of fill_super().

The reason the per-zone hash table breaks for this case is that there is
no "zone" for virtual allocations, and trying to look up the physical
page to get at it will fail (with a BUG_ON()).

It turns out that I actually complained to the mm people about the
per-zone hash table for another reason just a month ago: the zone lookup
also hurts the regular use of "unlock_page()" a lot, because the zone
lookup ends up forcing several unnecessary cache misses and generates
horrible code.

As part of that earlier discussion, we had a much better solution for
the NUMA scalability issue - by just making the page lock have a
separate contention bit, the waitqueue doesn't even have to be looked at
for the normal case.

Peter Zijlstra already has a patch for that, but let's see if anybody
even notices.  In the meantime, let's fix the actual gfs2 breakage by
simplifying the bitlock waitqueues and removing the per-zone issue.

Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 09:27:57 -07:00
Josh Poimboeuf adb1fe9ae2 mm/page_alloc: Remove kernel address exposure in free_reserved_area()
Linus suggested we try to remove some of the low-hanging fruit related
to kernel address exposure in dmesg.  The only leaks I see on my local
system are:

  Freeing SMP alternatives memory: 32K (ffffffff9e309000 - ffffffff9e311000)
  Freeing initrd memory: 10588K (ffffa0b736b42000 - ffffa0b737599000)
  Freeing unused kernel memory: 3592K (ffffffff9df87000 - ffffffff9e309000)
  Freeing unused kernel memory: 1352K (ffffa0b7288ae000 - ffffa0b728a00000)
  Freeing unused kernel memory: 632K (ffffa0b728d62000 - ffffa0b728e00000)

Linus says:

  "I suspect we should just remove [the addresses in the 'Freeing'
   messages]. I'm sure they are useful in theory, but I suspect they
   were more useful back when the whole "free init memory" was
   originally done.

   These days, if we have a use-after-free, I suspect the init-mem
   situation is the easiest situation by far. Compared to all the dynamic
   allocations which are much more likely to show it anyway. So having
   debug output for that case is likely not all that productive."

With this patch the freeing messages now look like this:

  Freeing SMP alternatives memory: 32K
  Freeing initrd memory: 10588K
  Freeing unused kernel memory: 3592K
  Freeing unused kernel memory: 1352K
  Freeing unused kernel memory: 632K

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/6836ff90c45b71d38e5d4405aec56fa9e5d1d4b2.1477405374.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-25 18:40:37 +02:00