mirror of
https://github.com/Dasharo/linux.git
synced 2026-03-06 15:25:10 -08:00
2eb060975df2aaca635801dcfb99eaa45ede98fe
97 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
2af120bc04 |
mm/compaction: break out of loop on !PageBuddy in isolate_freepages_block
We received several reports of bad page state when freeing CMA pages
previously allocated with alloc_contig_range:
BUG: Bad page state in process Binder_A pfn:63202
page:d21130b0 count:0 mapcount:1 mapping: (null) index:0x7dfbf
page flags: 0x40080068(uptodate|lru|active|swapbacked)
Based on the page state, it looks like the page was still in use. The
page flags do not make sense for the use case though. Further debugging
showed that despite alloc_contig_range returning success, at least one
page in the range still remained in the buddy allocator.
There is an issue with isolate_freepages_block. In strict mode (which
CMA uses), if any pages in the range cannot be isolated,
isolate_freepages_block should return failure 0. The current check
keeps track of the total number of isolated pages and compares against
the size of the range:
if (strict && nr_strict_required > total_isolated)
total_isolated = 0;
After taking the zone lock, if one of the pages in the range is not in
the buddy allocator, we continue through the loop and do not increment
total_isolated. If in the last iteration of the loop we isolate more
than one page (e.g. last page needed is a higher order page), the check
for total_isolated may pass and we fail to detect that a page was
skipped. The fix is to bail out if the loop immediately if we are in
strict mode. There's no benfit to continuing anyway since we need all
pages to be isolated. Additionally, drop the error checking based on
nr_strict_required and just check the pfn ranges. This matches with
what isolate_freepages_range does.
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
6c14466cc0 |
mm: improve documentation of page_order
Developers occasionally try and optimise PFN scanners by using page_order but miss that in general it requires zone->lock. This has happened twice for compaction.c and rejected both times. This patch clarifies the documentation of page_order and adds a note to compaction.c why page_order is not used. [akpm@linux-foundation.org: tweaks] [lauraa@codeaurora.org: Corrected a page_zone(page)->lock reference] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rafael Aquini <aquini@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
309381feae |
mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE
Most of the VM_BUG_ON assertions are performed on a page. Usually, when one of these assertions fails we'll get a BUG_ON with a call stack and the registers. I've recently noticed based on the requests to add a small piece of code that dumps the page to various VM_BUG_ON sites that the page dump is quite useful to people debugging issues in mm. This patch adds a VM_BUG_ON_PAGE(cond, page) which beyond doing what VM_BUG_ON() does, also dumps the page before executing the actual BUG_ON. [akpm@linux-foundation.org: fix up includes] Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
55b7c4c99f |
mm: compaction: reset scanner positions immediately when they meet
Compaction used to start its migrate and free page scaners at the zone's lowest and highest pfn, respectively. Later, caching was introduced to remember the scanners' progress across compaction attempts so that pageblocks are not re-scanned uselessly. Additionally, pageblocks where isolation failed are marked to be quickly skipped when encountered again in future compactions. Currently, both the reset of cached pfn's and clearing of the pageblock skip information for a zone is done in __reset_isolation_suitable(). This function gets called when: - compaction is restarting after being deferred - compact_blockskip_flush flag is set in compact_finished() when the scanners meet (and not again cleared when direct compaction succeeds in allocation) and kswapd acts upon this flag before going to sleep This behavior is suboptimal for several reasons: - when direct sync compaction is called after async compaction fails (in the allocation slowpath), it will effectively do nothing, unless kswapd happens to process the compact_blockskip_flush flag meanwhile. This is racy and goes against the purpose of sync compaction to more thoroughly retry the compaction of a zone where async compaction has failed. The restart-after-deferring path cannot help here as deferring happens only after the sync compaction fails. It is also done only for the preferred zone, while the compaction might be done for a fallback zone. - the mechanism of marking pageblock to be skipped has little value since the cached pfn's are reset only together with the pageblock skip flags. This effectively limits pageblock skip usage to parallel compactions. This patch changes compact_finished() so that cached pfn's are reset immediately when the scanners meet. Clearing pageblock skip flags is unchanged, as well as the other situations where cached pfn's are reset. This allows the sync-after-async compaction to retry pageblocks not marked as skipped, such as blocks !MIGRATE_MOVABLE blocks that async compactions now skips without marking them. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
50b5b094e6 |
mm: compaction: do not mark unmovable pageblocks as skipped in async compaction
Compaction temporarily marks pageblocks where it fails to isolate pages as to-be-skipped in further compactions, in order to improve efficiency. One of the reasons to fail isolating pages is that isolation is not attempted in pageblocks that are not of MIGRATE_MOVABLE (or CMA) type. The problem is that blocks skipped due to not being MIGRATE_MOVABLE in async compaction become skipped due to the temporary mark also in future sync compaction. Moreover, this may follow quite soon during __alloc_page_slowpath, without much time for kswapd to clear the pageblock skip marks. This goes against the idea that sync compaction should try to scan these blocks more thoroughly than the async compaction. The fix is to ensure in async compaction that these !MIGRATE_MOVABLE blocks are not marked to be skipped. Note this should not affect performance or locking impact of further async compactions, as skipping a block due to being !MIGRATE_MOVABLE is done soon after skipping a block marked to be skipped, both without locking. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
7ed695e069 |
mm: compaction: detect when scanners meet in isolate_freepages
Compaction of a zone is finished when the migrate scanner (which begins at the zone's lowest pfn) meets the free page scanner (which begins at the zone's highest pfn). This is detected in compact_zone() and in the case of direct compaction, the compact_blockskip_flush flag is set so that kswapd later resets the cached scanner pfn's, and a new compaction may again start at the zone's borders. The meeting of the scanners can happen during either scanner's activity. However, it may currently fail to be detected when it occurs in the free page scanner, due to two problems. First, isolate_freepages() keeps free_pfn at the highest block where it isolated pages from, for the purposes of not missing the pages that are returned back to allocator when migration fails. Second, failing to isolate enough free pages due to scanners meeting results in -ENOMEM being returned by migrate_pages(), which makes compact_zone() bail out immediately without calling compact_finished() that would detect scanners meeting. This failure to detect scanners meeting might result in repeated attempts at compaction of a zone that keep starting from the cached pfn's close to the meeting point, and quickly failing through the -ENOMEM path, without the cached pfns being reset, over and over. This has been observed (through additional tracepoints) in the third phase of the mmtests stress-highalloc benchmark, where the allocator runs on an otherwise idle system. The problem was observed in the DMA32 zone, which was used as a fallback to the preferred Normal zone, but on the 4GB system it was actually the largest zone. The problem is even amplified for such fallback zone - the deferred compaction logic, which could (after being fixed by a previous patch) reset the cached scanner pfn's, is only applied to the preferred zone and not for the fallbacks. The problem in the third phase of the benchmark was further amplified by commit |
||
|
|
d3132e4b83 |
mm: compaction: reset cached scanner pfn's before reading them
Compaction caches pfn's for its migrate and free scanners to avoid scanning the whole zone each time. In compact_zone(), the cached values are read to set up initial values for the scanners. There are several situations when these cached pfn's are reset to the first and last pfn of the zone, respectively. One of these situations is when a compaction has been deferred for a zone and is now being restarted during a direct compaction, which is also done in compact_zone(). However, compact_zone() currently reads the cached pfn's *before* resetting them. This means the reset doesn't affect the compaction that performs it, and with good chance also subsequent compactions, as update_pageblock_skip() is likely to be called and update the cached pfn's to those being processed. Another chance for a successful reset is when a direct compaction detects that migration and free scanners meet (which has its own problems addressed by another patch) and sets update_pageblock_skip flag which kswapd uses to do the reset because it goes to sleep. This is clearly a bug that results in non-deterministic behavior, so this patch moves the cached pfn reset to be performed *before* the values are read. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
de6c60a6c1 |
mm: compaction: encapsulate defer reset logic
Currently there are several functions to manipulate the deferred compaction state variables. The remaining case where the variables are touched directly is when a successful allocation occurs in direct compaction, or is expected to be successful in the future by kswapd. Here, the lowest order that is expected to fail is updated, and in the case of successful allocation, the deferred status and counter is reset completely. Create a new function compaction_defer_reset() to encapsulate this functionality and make it easier to understand the code. No functional change. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
0eb927c0ab |
mm: compaction: trace compaction begin and end
The broad goal of the series is to improve allocation success rates for
huge pages through memory compaction, while trying not to increase the
compaction overhead. The original objective was to reintroduce
capturing of high-order pages freed by the compaction, before they are
split by concurrent activity. However, several bugs and opportunities
for simple improvements were found in the current implementation, mostly
through extra tracepoints (which are however too ugly for now to be
considered for sending).
The patches mostly deal with two mechanisms that reduce compaction
overhead, which is caching the progress of migrate and free scanners,
and marking pageblocks where isolation failed to be skipped during
further scans.
Patch 1 (from mgorman) adds tracepoints that allow calculate time spent in
compaction and potentially debug scanner pfn values.
Patch 2 encapsulates the some functionality for handling deferred compactions
for better maintainability, without a functional change
type is not determined without being actually needed.
Patch 3 fixes a bug where cached scanner pfn's are sometimes reset only after
they have been read to initialize a compaction run.
Patch 4 fixes a bug where scanners meeting is sometimes not properly detected
and can lead to multiple compaction attempts quitting early without
doing any work.
Patch 5 improves the chances of sync compaction to process pageblocks that
async compaction has skipped due to being !MIGRATE_MOVABLE.
Patch 6 improves the chances of sync direct compaction to actually do anything
when called after async compaction fails during allocation slowpath.
The impact of patches were validated using mmtests's stress-highalloc
benchmark with mmtests's stress-highalloc benchmark on a x86_64 machine
with 4GB memory.
Due to instability of the results (mostly related to the bugs fixed by
patches 2 and 3), 10 iterations were performed, taking min,mean,max
values for success rates and mean values for time and vmstat-based
metrics.
First, the default GFP_HIGHUSER_MOVABLE allocations were tested with the
patches stacked on top of v3.13-rc2. Patch 2 is OK to serve as baseline
due to no functional changes in 1 and 2. Comments below.
stress-highalloc
3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2
2-nothp 3-nothp 4-nothp 5-nothp 6-nothp
Success 1 Min 9.00 ( 0.00%) 10.00 (-11.11%) 43.00 (-377.78%) 43.00 (-377.78%) 33.00 (-266.67%)
Success 1 Mean 27.50 ( 0.00%) 25.30 ( 8.00%) 45.50 (-65.45%) 45.90 (-66.91%) 46.30 (-68.36%)
Success 1 Max 36.00 ( 0.00%) 36.00 ( 0.00%) 47.00 (-30.56%) 48.00 (-33.33%) 52.00 (-44.44%)
Success 2 Min 10.00 ( 0.00%) 8.00 ( 20.00%) 46.00 (-360.00%) 45.00 (-350.00%) 35.00 (-250.00%)
Success 2 Mean 26.40 ( 0.00%) 23.50 ( 10.98%) 47.30 (-79.17%) 47.60 (-80.30%) 48.10 (-82.20%)
Success 2 Max 34.00 ( 0.00%) 33.00 ( 2.94%) 48.00 (-41.18%) 50.00 (-47.06%) 54.00 (-58.82%)
Success 3 Min 65.00 ( 0.00%) 63.00 ( 3.08%) 85.00 (-30.77%) 84.00 (-29.23%) 85.00 (-30.77%)
Success 3 Mean 76.70 ( 0.00%) 70.50 ( 8.08%) 86.20 (-12.39%) 85.50 (-11.47%) 86.00 (-12.13%)
Success 3 Max 87.00 ( 0.00%) 86.00 ( 1.15%) 88.00 ( -1.15%) 87.00 ( 0.00%) 87.00 ( 0.00%)
3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2
2-nothp 3-nothp 4-nothp 5-nothp 6-nothp
User 6437.72 6459.76 5960.32 5974.55 6019.67
System 1049.65 1049.09 1029.32 1031.47 1032.31
Elapsed 1856.77 1874.48 1949.97 1994.22 1983.15
3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2
2-nothp 3-nothp 4-nothp 5-nothp 6-nothp
Minor Faults 253952267 254581900 250030122 250507333 250157829
Major Faults 420 407 506 530 530
Swap Ins 4 9 9 6 6
Swap Outs 398 375 345 346 333
Direct pages scanned 197538 189017 298574 287019 299063
Kswapd pages scanned 1809843 1801308 1846674 1873184 1861089
Kswapd pages reclaimed 1806972 1798684 1844219 1870509 1858622
Direct pages reclaimed 197227 188829 298380 286822 298835
Kswapd efficiency 99% 99% 99% 99% 99%
Kswapd velocity 953.382 970.449 952.243 934.569 922.286
Direct efficiency 99% 99% 99% 99% 99%
Direct velocity 104.058 101.832 153.961 143.200 148.205
Percentage direct scans 9% 9% 13% 13% 13%
Zone normal velocity 347.289 359.676 348.063 339.933 332.983
Zone dma32 velocity 710.151 712.605 758.140 737.835 737.507
Zone dma velocity 0.000 0.000 0.000 0.000 0.000
Page writes by reclaim 557.600 429.000 353.600 426.400 381.800
Page writes file 159 53 7 79 48
Page writes anon 398 375 345 346 333
Page reclaim immediate 825 644 411 575 420
Sector Reads 2781750 2769780 2878547 2939128 2910483
Sector Writes 12080843 12083351 12012892 12002132 12010745
Page rescued immediate 0 0 0 0 0
Slabs scanned
|
||
|
|
6815bf3f23 |
mm/compaction: respect ignore_skip_hint in update_pageblock_skip
update_pageblock_skip() only fits to compaction which tries to isolate by pageblock unit. If isolate_migratepages_range() is called by CMA, it try to isolate regardless of pageblock unit and it don't reference get_pageblock_skip() by ignore_skip_hint. We should also respect it on update_pageblock_skip() to prevent from setting the wrong information. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: <stable@vger.kernel.org> [3.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
9e4be4708e |
mm/compaction.c: update comment about zone lock in isolate_freepages_block
Since commit
|
||
|
|
f6ea3adb70 |
mm/compaction.c: periodically schedule when freeing pages
We've been getting warnings about an excessive amount of time spent allocating pages for migration during memory compaction without scheduling. isolate_freepages_block() already periodically checks for contended locks or the need to schedule, but isolate_freepages() never does. When a zone is massively long and no suitable targets can be found, this iteration can be quite expensive without ever doing cond_resched(). Check periodically for the need to reschedule while the compaction free scanner iterates. Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
3a7200af3d |
mm: compaction: do not compact pgdat for order-0
If kswapd was reclaiming for a high order and resets it to 0 due to fragmentation it will still call compact_pgdat. For the most part, this will fail a compaction_suitable() test and not compact but it is unnecessarily sloppy. It could be fixed in the caller but fix it in the API instead. [dhillf@gmail.com: pointed out that it was a potential problem] Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Hillf Danton <dhillf@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
108bcc96ef |
mm: add & use zone_end_pfn() and zone_spans_pfn()
Add 2 helpers (zone_end_pfn() and zone_spans_pfn()) to reduce code duplication. This also switches to using them in compaction (where an additional variable needed to be renamed), page_alloc, vmstat, memory_hotplug, and kmemleak. Note that in compaction.c I avoid calling zone_end_pfn() repeatedly because I expect at some point the sycronization issues with start_pfn & spanned_pages will need fixing, either by actually using the seqlock or clever memory barrier usage. Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: David Hansen <dave@linux.vnet.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
9c620e2bc5 |
mm: remove offlining arg to migrate_pages
No functional change, but the only purpose of the offlining argument to migrate_pages() etc, was to ensure that __unmap_and_move() could migrate a KSM page for memory hotremove (which took ksm_thread_mutex) but not for other callers. Now all cases are safe, remove the arg. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Petr Holasek <pholasek@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Izik Eidus <izik.eidus@ravellosystems.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
194159fbcc |
mm: remove MIGRATE_ISOLATE check in hotpath
Several functions test MIGRATE_ISOLATE and some of those are hotpath but MIGRATE_ISOLATE is used only if we enable CONFIG_MEMORY_ISOLATION(ie, CMA, memory-hotplug and memory-failure) which are not common config option. So let's not add unnecessary overhead and code when we don't enable CONFIG_MEMORY_ISOLATION. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
7103f16dbf |
mm: compaction: make __compact_pgdat() and compact_pgdat() return void
These functions always return 0. Formalise this. Cc: Jason Liu <r64343@freescale.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
a9aacbccf3 |
mm: compaction: do not accidentally skip pageblocks in the migrate scanner
Compaction uses the ALIGN macro incorrectly with the migrate scanner by adding pageblock_nr_pages to a PFN. It happened to work when initially implemented as the starting PFN was also aligned but with caching restarts and isolating in smaller chunks this is no longer always true. The impact is that the migrate scanner scans outside its current pageblock. As pfn_valid() is still checked properly it does not cause any failure and the impact of the bug is that in some cases it will scan more than necessary when it crosses a page boundary but by no more than COMPACT_CLUSTER_MAX. It is highly unlikely this is even measurable but it's still wrong so this patch addresses the problem. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
8fb74b9fb2 |
mm: compaction: partially revert capture of suitable high-order page
Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when waiting for POLLIN on a local TCP socket. It was easier to trigger if there was disk IO and dirty pages at the same time and he bisected it to commit |
||
|
|
7964c06d66 |
mm: compaction: fix echo 1 > compact_memory return error issue
when run the folloing command under shell, it will return error sh/$ echo 1 > /proc/sys/vm/compact_memory sh/$ sh: write error: Bad address After strace, I found the following log: ... write(1, "1\n", 2) = 3 write(1, "", 4294967295) = -1 EFAULT (Bad address) write(2, "echo: write error: Bad address\n", 31echo: write error: Bad address ) = 31 This tells system return 3(COMPACT_COMPLETE) after write data to compact_memory. The fix is to make the system just return 0 instead 3(COMPACT_COMPLETE) from sysctl_compaction_handler after compaction_nodes finished. Signed-off-by: Jason Liu <r64343@freescale.com> Suggested-by: David Rientjes <rientjes@google.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
010fc29a45 |
compaction: fix build error in CMA && !COMPACTION
isolate_freepages_block() and isolate_migratepages_range() are used for
CMA as well as compaction so it breaks build for CONFIG_CMA &&
!CONFIG_COMPACTION.
This patch fixes it.
[akpm@linux-foundation.org: add "do { } while (0)", per Mel]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
3d59eebc5e |
Merge tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma
Pull Automatic NUMA Balancing bare-bones from Mel Gorman:
"There are three implementations for NUMA balancing, this tree
(balancenuma), numacore which has been developed in tip/master and
autonuma which is in aa.git.
In almost all respects balancenuma is the dumbest of the three because
its main impact is on the VM side with no attempt to be smart about
scheduling. In the interest of getting the ball rolling, it would be
desirable to see this much merged for 3.8 with the view to building
scheduler smarts on top and adapting the VM where required for 3.9.
The most recent set of comparisons available from different people are
mel: https://lkml.org/lkml/2012/12/9/108
mingo: https://lkml.org/lkml/2012/12/7/331
tglx: https://lkml.org/lkml/2012/12/10/437
srikar: https://lkml.org/lkml/2012/12/10/397
The results are a mixed bag. In my own tests, balancenuma does
reasonably well. It's dumb as rocks and does not regress against
mainline. On the other hand, Ingo's tests shows that balancenuma is
incapable of converging for this workloads driven by perf which is bad
but is potentially explained by the lack of scheduler smarts. Thomas'
results show balancenuma improves on mainline but falls far short of
numacore or autonuma. Srikar's results indicate we all suffer on a
large machine with imbalanced node sizes.
My own testing showed that recent numacore results have improved
dramatically, particularly in the last week but not universally.
We've butted heads heavily on system CPU usage and high levels of
migration even when it shows that overall performance is better.
There are also cases where it regresses. Of interest is that for
specjbb in some configurations it will regress for lower numbers of
warehouses and show gains for higher numbers which is not reported by
the tool by default and sometimes missed in treports. Recently I
reported for numacore that the JVM was crashing with
NullPointerExceptions but currently it's unclear what the source of
this problem is. Initially I thought it was in how numacore batch
handles PTEs but I'm no longer think this is the case. It's possible
numacore is just able to trigger it due to higher rates of migration.
These reports were quite late in the cycle so I/we would like to start
with this tree as it contains much of the code we can agree on and has
not changed significantly over the last 2-3 weeks."
* tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits)
mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
mm/rmap: Convert the struct anon_vma::mutex to an rwsem
mm: migrate: Account a transhuge page properly when rate limiting
mm: numa: Account for failed allocations and isolations as migration failures
mm: numa: Add THP migration for the NUMA working set scanning fault case build fix
mm: numa: Add THP migration for the NUMA working set scanning fault case.
mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node
mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG
mm: sched: numa: Control enabling and disabling of NUMA balancing
mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate
mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships
mm: numa: migrate: Set last_nid on newly allocated page
mm: numa: split_huge_page: Transfer last_nid on tail page
mm: numa: Introduce last_nid to the page frame
sched: numa: Slowly increase the scanning period as NUMA faults are handled
mm: numa: Rate limit setting of pte_numa if node is saturated
mm: numa: Rate limit the amount of memory that is migrated between nodes
mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting
mm: numa: Migrate pages handled during a pmd_numa hinting fault
mm: numa: Migrate on reference policy
...
|
||
|
|
c8bf2d8ba4 |
mm: compaction: Fix compiler warning
compact_capture_page() is only used if compaction is enabled so it should be moved into the corresponding #ifdef. Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
5733c7d11d |
mm: introduce putback_movable_pages()
The PATCH "mm: introduce compaction and migration for virtio ballooned pages" hacks around putback_lru_pages() in order to allow ballooned pages to be re-inserted on balloon page list as if a ballooned page was like a LRU page. As ballooned pages are not legitimate LRU pages, this patch introduces putback_movable_pages() to properly cope with cases where the isolated pageset contains ballooned pages and LRU pages, thus fixing the mentioned inelegant hack around putback_lru_pages(). Signed-off-by: Rafael Aquini <aquini@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <andi@firstfloor.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
bf6bddf192 |
mm: introduce compaction and migration for ballooned pages
Memory fragmentation introduced by ballooning might reduce significantly the number of 2MB contiguous memory blocks that can be used within a guest, thus imposing performance penalties associated with the reduced number of transparent huge pages that could be used by the guest workload. This patch introduces the helper functions as well as the necessary changes to teach compaction and migration bits how to cope with pages which are part of a guest memory balloon, in order to make them movable by memory compaction procedures. Signed-off-by: Rafael Aquini <aquini@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |