mirror of
https://github.com/linux-apfs/linux-apfs.git
synced 2026-05-01 15:00:59 -07:00
29c1f677d424e8c5683a837fc4f03fc9f19201d7
127 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
29c1f677d4 |
mm: migration: use rcu_dereference_protected when dereferencing the radix tree slot during file page migration
migrate_pages() -> unmap_and_move() only calls rcu_read_lock() for
anonymous pages, as introduced by git commit
|
||
|
|
500d65d471 |
thp: pmd_trans_huge migrate bugcheck
No pmd_trans_huge should ever materialize in migration ptes areas, because we split the hugepage before migration ptes are instantiated. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
7f0f24967b |
mm: migration: cleanup migrate_pages API by matching types for offlining and sync
With the introduction of the boolean sync parameter, the API looks a little inconsistent as offlining is still an int. Convert offlining to a bool for the sake of being tidy. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
77f1fe6b08 |
mm: migration: allow migration to operate asynchronously and avoid synchronous compaction in the faster path
Migration synchronously waits for writeback if the initial passes fails. Callers of memory compaction do not necessarily want this behaviour if the caller is latency sensitive or expects that synchronous migration is not going to have a significantly better success rate. This patch adds a sync parameter to migrate_pages() allowing the caller to indicate if wait_on_page_writeback() is allowed within migration or not. For reclaim/compaction, try_to_compact_pages() is first called asynchronously, direct reclaim runs and then try_to_compact_pages() is called synchronously as there is a greater expectation that it'll succeed. [akpm@linux-foundation.org: build/merge fix] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
3e7d344970 |
mm: vmscan: reclaim order-0 and use compaction instead of lumpy reclaim
Lumpy reclaim is disruptive. It reclaims a large number of pages and ignores the age of the pages it reclaims. This can incur significant stalls and potentially increase the number of major faults. Compaction has reached the point where it is considered reasonably stable (meaning it has passed a lot of testing) and is a potential candidate for displacing lumpy reclaim. This patch introduces an alternative to lumpy reclaim whe compaction is available called reclaim/compaction. The basic operation is very simple - instead of selecting a contiguous range of pages to reclaim, a number of order-0 pages are reclaimed and then compaction is later by either kswapd (compact_zone_order()) or direct compaction (__alloc_pages_direct_compact()). [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: use conventional task_struct naming] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
0d1836c366 |
mm/migrate.c: fix compilation error
GCC complained about update_mmu_cache() not being defined in migrate.c. Including <asm/tlbflush.h> seems to solve the problem. Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
70384dc6dc |
mm: fix error reporting in move_pages() syscall
The vma returned by find_vma does not necessarily include the target address. If this happens the code tries to follow a page outside of any vma and returns ENOENT instead of EFAULT. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
cf608ac19c |
mm: compaction: fix COMPACTPAGEFAILED counting
Presently update_nr_listpages() doesn't have a role. That's because lists
passed is always empty just after calling migrate_pages. The
migrate_pages cleans up page list which have failed to migrate before
returning by
|
||
|
|
1b430beee5 |
writeback: remove nonblocking/encountered_congestion references
This removes more dead code that was somehow missed by commit
|
||
|
|
3ef8fd7f72 |
Fix migration.c compilation on s390
31bit s390 doesn't have huge pages and failed with: > mm/migrate.c: In function 'remove_migration_pte': > mm/migrate.c:143:3: error: implicit declaration of function 'pte_mkhuge' > mm/migrate.c:143:7: error: incompatible types when assigning to type 'pte_t' from type 'int' Put that code into a ifdef. Reported by Heiko Carstens Signed-off-by: Andi Kleen <ak@linux.intel.com> |
||
|
|
290408d4a2 |
hugetlb: hugepage migration core
This patch extends page migration code to support hugepage migration. One of the potential users of this feature is soft offlining which is triggered by memory corrected errors (added by the next patch.) Todo: - there are other users of page migration such as memory policy, memory hotplug and memocy compaction. They are not ready for hugepage support for now. ChangeLog since v4: - define migrate_huge_pages() - remove changes on isolation/putback_lru_page() ChangeLog since v2: - refactor isolate/putback_lru_page() to handle hugepage - add comment about race on unmap_and_move_huge_page() ChangeLog since v1: - divide migration code path for hugepage - define routine checking migration swap entry for hugetlb - replace "goto" with "if/else" in remove_migration_pte() Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andi Kleen <ak@linux.intel.com> |
||
|
|
76545066c8 |
mm: extend KSM refcounts to the anon_vma root
KSM reference counts can cause an anon_vma to exist after the processe it belongs to have already exited. Because the anon_vma lock now lives in the root anon_vma, we need to ensure that the root anon_vma stays around until after all the "child" anon_vmas have been freed. The obvious way to do this is to have a "child" anon_vma take a reference to the root in anon_vma_fork. When the anon_vma is freed at munmap or process exit, we drop the refcount in anon_vma_unlink and possibly free the root anon_vma. The KSM anon_vma reference count function also needs to be modified to deal with the possibility of freeing 2 levels of anon_vma. The easiest way to do this is to break out the KSM magic and make it generic. When compiling without CONFIG_KSM, this code is compiled out. Signed-off-by: Rik van Riel <riel@redhat.com> Tested-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
012f18004d |
mm: always lock the root (oldest) anon_vma
Always (and only) lock the root (oldest) anon_vma whenever we do something in an anon_vma. The recently introduced anon_vma scalability is due to the rmap code scanning only the VMAs that need to be scanned. Many common operations still took the anon_vma lock on the root anon_vma, so always taking that lock is not expected to introduce any scalability issues. However, always taking the same lock does mean we only need to take one lock, which means rmap_walk on pages from any anon_vma in the vma is excluded from occurring during an munmap, expand_stack or other operation that needs to exclude rmap_walk and similar functions. Also add the proper locking to vma_adjust. Signed-off-by: Rik van Riel <riel@redhat.com> Tested-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
cba48b98f2 |
mm: change direct call of spin_lock(anon_vma->lock) to inline function
Subsitute a direct call of spin_lock(anon_vma->lock) with an inline function doing exactly the same. This makes it easier to do the substitution to the root anon_vma lock in a following patch. We will deal with the handful of special locks (nested, dec_and_lock, etc) separately. Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
ac39cf8cb8 |
memcg: fix mis-accounting of file mapped racy with migration
FILE_MAPPED per memcg of migrated file cache is not properly updated,
because our hook in page_add_file_rmap() can't know to which memcg
FILE_MAPPED should be counted.
Basically, this patch is for fixing the bug but includes some big changes
to fix up other messes.
Now, at migrating mapped file, events happen in following sequence.
1. allocate a new page.
2. get memcg of an old page.
3. charge ageinst a new page before migration. But at this point,
no changes to new page's page_cgroup, no commit for the charge.
(IOW, PCG_USED bit is not set.)
4. page migration replaces radix-tree, old-page and new-page.
5. page migration remaps the new page if the old page was mapped.
6. Here, the new page is unlocked.
7. memcg commits the charge for newpage, Mark the new page's page_cgroup
as PCG_USED.
Because "commit" happens after page-remap, we can count FILE_MAPPED
at "5", because we should avoid to trust page_cgroup->mem_cgroup.
if PCG_USED bit is unset.
(Note: memcg's LRU removal code does that but LRU-isolation logic is used
for helping it. When we overwrite page_cgroup->mem_cgroup, page_cgroup is
not on LRU or page_cgroup->mem_cgroup is NULL.)
We can lose file_mapped accounting information at 5 because FILE_MAPPED
is updated only when mapcount changes 0->1. So we should catch it.
BTW, historically, above implemntation comes from migration-failure
of anonymous page. Because we charge both of old page and new page
with mapcount=0, we can't catch
- the page is really freed before remap.
- migration fails but it's freed before remap
or .....corner cases.
New migration sequence with memcg is:
1. allocate a new page.
2. mark PageCgroupMigration to the old page.
3. charge against a new page onto the old page's memcg. (here, new page's pc
is marked as PageCgroupUsed.)
4. page migration replaces radix-tree, page table, etc...
5. At remapping, new page's page_cgroup is now makrked as "USED"
We can catch 0->1 event and FILE_MAPPED will be properly updated.
And we can catch SWAPOUT event after unlock this and freeing this
page by unmap() can be caught.
7. Clear PageCgroupMigration of the old page.
So, FILE_MAPPED will be correctly updated.
Then, for what MIGRATION flag is ?
Without it, at migration failure, we may have to charge old page again
because it may be fully unmapped. "charge" means that we have to dive into
memory reclaim or something complated. So, it's better to avoid
charge it again. Before this patch, __commit_charge() was working for
both of the old/new page and fixed up all. But this technique has some
racy condtion around FILE_MAPPED and SWAPOUT etc...
Now, the kernel use MIGRATION flag and don't uncharge old page until
the end of migration.
I hope this change will make memcg's page migration much simpler. This
page migration has caused several troubles. Worth to add a flag for
simplification.
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
748446bb6b |
mm: compaction: memory compaction core
This patch is the core of a mechanism which compacts memory in a zone by relocating movable pages towards the end of the zone. A single compaction run involves a migration scanner and a free scanner. Both scanners operate on pageblock-sized areas in the zone. The migration scanner starts at the bottom of the zone and searches for all movable pages within each area, isolating them onto a private list called migratelist. The free scanner starts at the top of the zone and searches for suitable areas and consumes the free pages within making them available for the migration scanner. The pages isolated for migration are then migrated to the newly isolated free pages. [aarcange@redhat.com: Fix unsafe optimisation] [mel@csn.ul.ie: do not schedule work on other CPUs for compaction] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
3fe2011ff5 |
mm: migration: allow the migration of PageSwapCache pages
PageAnon pages that are unmapped may or may not have an anon_vma so are not currently migrated. However, a swap cache page can be migrated and fits this description. This patch identifies page swap caches and allows them to be migrated but ensures that no attempt to made to remap the pages would would potentially try to access an already freed anon_vma. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
67b9509b2c |
mm: migration: do not try to migrate unmapped anonymous pages
rmap_walk_anon() was triggering errors in memory compaction that look like
use-after-free errors. The problem is that between the page being
isolated from the LRU and rcu_read_lock() being taken, the mapcount of the
page dropped to 0 and the anon_vma gets freed. This can happen during
memory compaction if pages being migrated belong to a process that exits
before migration completes. Hence, the use-after-free race looks like
1. Page isolated for migration
2. Process exits
3. page_mapcount(page) drops to zero so anon_vma was no longer reliable
4. unmap_and_move() takes the rcu_lock but the anon_vma is already garbage
4. call try_to_unmap, looks up tha anon_vma and "locks" it but the lock
is garbage.
This patch checks the mapcount after the rcu lock is taken. If the
mapcount is zero, the anon_vma is assumed to be freed and no further
action is taken.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
7f60c214fd |
mm: migration: share the anon_vma ref counts between KSM and page migration
For clarity of review, KSM and page migration have separate refcounts on the anon_vma. While clear, this is a waste of memory. This patch gets KSM and page migration to share their toys in a spirit of harmony. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
3f6c82728f |
mm: migration: take a reference to the anon_vma before migrating
This patchset is a memory compaction mechanism that reduces external
fragmentation memory by moving GFP_MOVABLE pages to a fewer number of
pageblocks. The term "compaction" was chosen as there are is a number of
mechanisms that are not mutually exclusive that can be used to defragment
memory. For example, lumpy reclaim is a form of defragmentation as was
slub "defragmentation" (really a form of targeted reclaim). Hence, this
is called "compaction" to distinguish it from other forms of
defragmentation.
In this implementation, a full compaction run involves two scanners
operating within a zone - a migration and a free scanner. The migration
scanner starts at the beginning of a zone and finds all movable pages
within one pageblock_nr_pages-sized area and isolates them on a
migratepages list. The free scanner begins at the end of the zone and
searches on a per-area basis for enough free pages to migrate all the
pages on the migratepages list. As each area is respectively migrated or
exhausted of free pages, the scanners are advanced one area. A compaction
run completes within a zone when the two scanners meet.
This method is a bit primitive but is easy to understand and greater
sophistication would require maintenance of counters on a per-pageblock
basis. This would have a big impact on allocator fast-paths to improve
compaction which is a poor trade-off.
It also does not try relocate virtually contiguous pages to be physically
contiguous. However, assuming transparent hugepages were in use, a
hypothetical khugepaged might reuse compaction code to isolate free pages,
split them and relocate userspace pages for promotion.
Memory compaction can be triggered in one of three ways. It may be
triggered explicitly by writing any value to /proc/sys/vm/compact_memory
and compacting all of memory. It can be triggered on a per-node basis by
writing any value to /sys/devices/system/node/nodeN/compact where N is the
node ID to be compacted. When a process fails to allocate a high-order
page, it may compact memory in an attempt to satisfy the allocation
instead of entering direct reclaim. Explicit compaction does not finish
until the two scanners meet and direct compaction ends if a suitable page
becomes available that would meet watermarks.
The series is in 14 patches. The first three are not "core" to the series
but are important pre-requisites.
Patch 1 reference counts anon_vma for rmap_walk_anon(). Without this
patch, it's possible to use anon_vma after free if the caller is
not holding a VMA or mmap_sem for the pages in question. While
there should be no existing user that causes this problem,
it's a requirement for memory compaction to be stable. The patch
is at the start of the series for bisection reasons.
Patch 2 merges the KSM and migrate counts. It could be merged with patch 1
but would be slightly harder to review.
Patch 3 skips over unmapped anon pages during migration as there are no
guarantees about the anon_vma existing. There is a window between
when a page was isolated and migration started during which anon_vma
could disappear.
Patch 4 notes that PageSwapCache pages can still be migrated even if they
are unmapped.
Patch 5 allows CONFIG_MIGRATION to be set without CONFIG_NUMA
Patch 6 exports a "unusable free space index" via debugfs. It's
a measure of external fragmentation that takes the size of the
allocation request into account. It can also be calculated from
userspace so can be dropped if requested
Patch 7 exports a "fragmentation index" which only has meaning when an
allocation request fails. It determines if an allocation failure
would be due to a lack of memory or external fragmentation.
Patch 8 moves the definition for LRU isolation modes for use by compaction
Patch 9 is the compaction mechanism although it's unreachable at this point
Patch 10 adds a means of compacting all of memory with a proc trgger
Patch 11 adds a means of compacting a specific node with a sysfs trigger
Patch 12 adds "direct compaction" before "direct reclaim" if it is
determined there is a good chance of success.
Patch 13 adds a sysctl that allows tuning of the threshold at which the
kernel will compact or direct reclaim
Patch 14 temporarily disables compaction if an allocation failure occurs
after compaction.
Testing of compaction was in three stages. For the test, debugging,
preempt, the sleep watchdog and lockdep were all enabled but nothing nasty
popped out. min_free_kbytes was tuned as recommended by hugeadm to help
fragmentation avoidance and high-order allocations. It was tested on X86,
X86-64 and PPC64.
Ths first test represents one of the easiest cases that can be faced for
lumpy reclaim or memory compaction.
1. Machine freshly booted and configured for hugepage usage with
a) hugeadm --create-global-mounts
b) hugeadm --pool-pages-max DEFAULT:8G
c) hugeadm --set-recommended-min_free_kbytes
d) hugeadm --set-recommended-shmmax
The min_free_kbytes here is important. Anti-fragmentation works best
when pageblocks don't mix. hugeadm knows how to calculate a value that
will significantly reduce the worst of external-fragmentation-related
events as reported by the mm_page_alloc_extfrag tracepoint.
2. Load up memory
a) Start updatedb
b) Create in parallel a X files of pagesize*128 in size. Wait
until files are created. By parallel, I mean that 4096 instances
of dd were launched, one after the other using &. The crude
objective being to mix filesystem metadata allocations with
the buffer cache.
c) Delete every second file so that pageblocks are likely to
have holes
d) kill updatedb if it's still running
At this point, the system is quiet, memory is full but it's full with
clean filesystem metadata and clean buffer cache that is unmapped.
This is readily migrated or discarded so you'd expect lumpy reclaim
to have no significant advantage over compaction but this is at
the POC stage.
3. In increments, attempt to allocate 5% of memory as hugepages.
Measure how long it took, how successful it was, how many
direct reclaims took place and how how many compactions. Note
the compaction figures might not fully add up as compactions
can take place for orders other than the hugepage size
X86 vanilla compaction
Final page count 913 916 (attempted 1002)
pages reclaimed 68296 9791
X86-64 vanilla compaction
Final page count: 901 902 (attempted 1002)
Total pages reclaimed: 112599 53234
PPC64 vanilla compaction
Final page count: 93 94 (attempted 110)
Total pages reclaimed: 103216 61838
There was not a dramatic improvement in success rates but it wouldn't be
expected in this case either. What was important is that fewer pages were
reclaimed in all cases reducing the amount of IO required to satisfy a
huge page allocation.
The second tests were all performance related - kernbench, netperf, iozone
and sysbench. None showed anything too remarkable.
The last test was a high-order allocation stress test. Many kernel
compiles are started to fill memory with a pressured mix of unmovable and
movable allocations. During this, an attempt is made to allocate 90% of
memory as huge pages - one at a time with small delays between attempts to
avoid flooding the IO queue.
vanilla compaction
Percentage of request allocated X86 98 99
Percentage of request allocated X86-64 95 98
Percentage of request allocated PPC64 55 70
This patch:
rmap_walk_anon() does not use page_lock_anon_vma() for looking up and
locking an anon_vma and it does not appear to have sufficient locking to
ensure the anon_vma does not disappear from under it.
This patch copies an approach used by KSM to take a reference on the
anon_vma while pages are being migrated. This should prevent rmap_walk()
running into nasty surprises later because anon_vma has been freed.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
e13861d822 |
mm: remove return value of putback_lru_pages()
putback_lru_page() never can fail. So it doesn't matter count of "the number of pages put back". In addition, users of this functions don't use return value. Let's remove unnecessary code. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
5a0e3ad6af |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> |
||
|
|
85f1fb72fa |
mm/migrate.c: kill anon local variable from migrate_page_copy
commit
|
||
|
|
ac0f6f927d |
Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (100 commits) ARM: Eliminate decompressor -Dstatic= PIC hack ARM: 5958/1: ARM: U300: fix inverted clk round rate ARM: 5956/1: misplaced parentheses ARM: 5955/1: ep93xx: move timer defines into core.c and document ARM: 5954/1: ep93xx: move gpio interrupt support to gpio.c ARM: 5953/1: ep93xx: fix broken build of clock.c ARM: 5952/1: ARM: MM: Add ARM_L1_CACHE_SHIFT_6 for handle inside each ARCH Kconfig ARM: 5949/1: NUC900 add gpio virtual memory map ARM: 5948/1: Enable timer0 to time4 clock support for nuc910 ARM: 5940/2: ARM: MMCI: remove custom DBG macro and printk ARM: make_coherent(): fix problems with highpte, part 2 MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself ARM: 5945/1: ep93xx: include correct irq.h in core.c ARM: 5933/1: amba-pl011: support hardware flow control ARM: 5930/1: Add PKMAP area description to memory.txt. ARM: 5929/1: Add checks to detect overlap of memory regions. ARM: 5928/1: Change type of VMALLOC_END to unsigned long. ARM: 5927/1: Make delimiters of DMA area globally visibly. ARM: 5926/1: Add "Virtual kernel memory..." printout. ARM: 5920/1: OMAP4: Enable L2 Cache ... Fix up trivial conflict in arch/arm/mach-mx25/clock.c |
||
|
|
87b8d1adef |
mm: Make copy_from_user() in migrate.c statically predictable
x86-32 has had a static test for copy_on_user() overflow for a while.
This test currently fails in mm/migrate.c resulting in an
allyesconfig/allmodconfig build failure on x86-32:
In function ‘copy_from_user’,
inlined from ‘do_pages_stat’ at
/home/hpa/kernel/git/mm/migrate.c:1012:
/home/hpa/kernel/git/arch/x86/include/asm/uaccess_32.h:212: error:
call to ‘copy_from_user_overflow’ declared
Make the logic more explicit and therefore easier for gcc to
understand.
v2: rewrite the loop entirely using a more normal structure for a
chunked-data loop (Linus Torvalds)
Reported-by: Len Brown <lenb@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-and-Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Arjan van de Ven <arjan@linux.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|