Commit Graph

215 Commits

Author SHA1 Message Date
Eric B Munson 73ae31e598 hugetlb: fix handling of parse errors in sysfs
When parsing changes to the huge page pool sizes made from userspace via
the sysfs interface, bogus input values are being covered up by
nr_hugepages_store_common and nr_overcommit_hugepages_store returning 0
when strict_strtoul returns an error.  This can cause an infinite loop in
the nr_hugepages_store code.  This patch changes the return value for
these functions to -EINVAL when strict_strtoul returns an error.

Signed-off-by: Eric B Munson <emunson@mgebm.net>
Reported-by: CAI Qian <caiqian@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:49 -08:00
Eric B Munson adbe8726dc hugetlb: do not allow pagesize >= MAX_ORDER pool adjustment
Huge pages with order >= MAX_ORDER must be allocated at boot via the
kernel command line, they cannot be allocated or freed once the kernel is
up and running.  Currently we allow values to be written to the sysfs and
sysctl files controling pool size for these huge page sizes.  This patch
makes the store functions for nr_hugepages and nr_overcommit_hugepages
return -EINVAL when the pool for a page size >= MAX_ORDER is changed.

[akpm@linux-foundation.org: avoid multiple return paths in nr_hugepages_store_common()]
[caiqian@redhat.com: add checking in hugetlb_overcommit_handler()]
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Reported-by: CAI Qian <caiqian@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:49 -08:00
Michal Hocko 08d4a24659 hugetlb: check the return value of string conversion in sysctl handler
proc_doulongvec_minmax may fail if the given buffer doesn't represent a
valid number.  If we provide something invalid we will initialize the
resulting value (nr_overcommit_huge_pages in this case) to a random value
from the stack.

The issue was introduced by a3d0c6aa when the default handler has been
replaced by the helper function where we do not check the return value.

Reproducer:
echo "" > /proc/sys/vm/nr_overcommit_hugepages

[akpm@linux-foundation.org: correctly propagate proc_doulongvec_minmax return code]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: CAI Qian <caiqian@redhat.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:49 -08:00
Jesper Juhl 32d6feadf4 mm/hugetlb.c: fix error-path memory leak in nr_hugepages_store_common()
The NODEMASK_ALLOC macro may dynamically allocate memory for its second
argument ('nodes_allowed' in this context).

In nr_hugepages_store_common() we may abort early if strict_strtoul()
fails, but in that case we do not free the memory already allocated to
'nodes_allowed', causing a memory leak.

This patch closes the leak by freeing the memory in the error path.

[akpm@linux-foundation.org: use NODEMASK_FREE, per Minchan Kim]
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:48 -08:00
Andrea Arcangeli 47ad8475c0 thp: clear_copy_huge_page
Move the copy/clear_huge_page functions to common code to share between
hugetlb.c and huge_memory.c.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:41 -08:00
Dean Nelson 1f64d69c7a mm/hugetlb.c: avoid double unlock_page() in hugetlb_fault()
Have hugetlb_fault() call unlock_page(page) only if it had previously
called lock_page(page).

Setting CONFIG_DEBUG_VM=y and then running the libhugetlbfs test suite,
resulted in the tripping of VM_BUG_ON(!PageLocked(page)) in
unlock_page() having been called by hugetlb_fault() when page ==
pagecache_page.  This patch remedied the problem.

Signed-off-by: Dean Nelson <dnelson@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-12-02 14:51:14 -08:00
Dean Nelson 44e2aa937e mm/hugetlb.c: add missing spin_lock() to hugetlb_cow()
Add missing spin_lock() of the page_table_lock before an error return in
hugetlb_cow(). Callers of hugtelb_cow() expect it to be held upon return.

Signed-off-by: Dean Nelson <dnelson@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:11 -07:00
Andi Kleen aa50d3a7aa Encode huge page size for VM_FAULT_HWPOISON errors
This fixes a problem introduced with the hugetlb hwpoison handling

The user space SIGBUS signalling wants to know the size of the hugepage
that caused a HWPOISON fault.

Unfortunately the architecture page fault handlers do not have easy
access to the struct page.

Pass the information out in the fault error code instead.

I added a separate VM_FAULT_HWPOISON_LARGE bit for this case and encode
the hpage index in some free upper bits of the fault code. The small
page hwpoison keeps stays with the VM_FAULT_HWPOISON name to minimize
changes.

Also add code to hugetlb.h to convert that index into a page shift.

Will be used in a further patch.

Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: fengguang.wu@intel.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:46 +02:00
Andi Kleen d5bd910696 hugepage: move is_hugepage_on_freelist inside ifdef to avoid warning
Fixes warning reported by Stephen Rothwell

mm/hugetlb.c:2950: warning: 'is_hugepage_on_freelist' defined but not used

for the !CONFIG_MEMORY_FAILURE case.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:46 +02:00
Naoya Horiguchi 8c6c2ecb44 HWPOSION, hugetlb: recover from free hugepage error when !MF_COUNT_INCREASED
Currently error recovery for free hugepage works only for MF_COUNT_INCREASED.
This patch enables !MF_COUNT_INCREASED case.

Free hugepages can be handled directly by alloc_huge_page() and
dequeue_hwpoisoned_huge_page(), and both of them are protected
by hugetlb_lock, so there is no race between them.

Note that this patch defines the refcount of HWPoisoned hugepage
dequeued from freelist is 1, deviated from present 0, thereby we
can avoid race between unpoison and memory failure on free hugepage.
This is reasonable because unlikely to free buddy pages, free hugepage
is governed by hugetlbfs even after error handling finishes.
And it also makes unpoison code added in the later patch cleaner.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:45 +02:00
Naoya Horiguchi a9869b837c hugetlb: move refcounting in hugepage allocation inside hugetlb_lock
Currently alloc_huge_page() raises page refcount outside hugetlb_lock.
but it causes race when dequeue_hwpoison_huge_page() runs concurrently
with alloc_huge_page().
To avoid it, this patch moves set_page_refcounted() in hugetlb_lock.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:45 +02:00
Naoya Horiguchi 6de2b1aab9 HWPOISON, hugetlb: add free check to dequeue_hwpoison_huge_page()
This check is necessary to avoid race between dequeue and allocation,
which can cause a free hugepage to be dequeued twice and get kernel unstable.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:45 +02:00
Naoya Horiguchi 290408d4a2 hugetlb: hugepage migration core
This patch extends page migration code to support hugepage migration.
One of the potential users of this feature is soft offlining which
is triggered by memory corrected errors (added by the next patch.)

Todo:
- there are other users of page migration such as memory policy,
  memory hotplug and memocy compaction.
  They are not ready for hugepage support for now.

ChangeLog since v4:
- define migrate_huge_pages()
- remove changes on isolation/putback_lru_page()

ChangeLog since v2:
- refactor isolate/putback_lru_page() to handle hugepage
- add comment about race on unmap_and_move_huge_page()

ChangeLog since v1:
- divide migration code path for hugepage
- define routine checking migration swap entry for hugetlb
- replace "goto" with "if/else" in remove_migration_pte()

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:45 +02:00
Naoya Horiguchi 0ebabb416f hugetlb: redefine hugepage copy functions
This patch modifies hugepage copy functions to have only destination
and source hugepages as arguments for later use.
The old ones are renamed from copy_{gigantic,huge}_page() to
copy_user_{gigantic,huge}_page().
This naming convention is consistent with that between copy_highpage()
and copy_user_highpage().

ChangeLog since v4:
- add blank line between local declaration and code
- remove unnecessary might_sleep()

ChangeLog since v2:
- change copy_huge_page() from macro to inline dummy function
  to avoid compile warning when !CONFIG_HUGETLB_PAGE.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:44 +02:00
Naoya Horiguchi bf50bab2b3 hugetlb: add allocate function for hugepage migration
We can't use existing hugepage allocation functions to allocate hugepage
for page migration, because page migration can happen asynchronously with
the running processes and page migration users should call the allocation
function with physical addresses (not virtual addresses) as arguments.

ChangeLog since v3:
- unify alloc_buddy_huge_page() and alloc_buddy_huge_page_node()

ChangeLog since v2:
- remove unnecessary get/put_mems_allowed() (thanks to David Rientjes)

ChangeLog since v1:
- add comment on top of alloc_huge_page_no_vma()

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:44 +02:00
Naoya Horiguchi 998b4382c1 hugetlb: fix metadata corruption in hugetlb_fault()
Since the PageHWPoison() check is for avoiding hwpoisoned page remained
in pagecache mapping to the process, it should be done in "found in pagecache"
branch, not in the common path.
Otherwise, metadata corruption occurs if memory failure happens between
alloc_huge_page() and lock_page() because page fault fails with metadata
changes remained (such as refcount, mapcount, etc.)

This patch moves the check to "found in pagecache" branch and fix the problem.

ChangeLog since v2:
- remove retry check in "new allocation" path.
- make description more detailed
- change patch name from "HWPOISON, hugetlb: move PG_HWPoison bit check"

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-10-08 09:32:44 +02:00
Naoya Horiguchi 56c9cfb13c hugetlb, rmap: fix confusing page locking in hugetlb_cow()
The "if (!trylock_page)" block in the avoidcopy path of hugetlb_cow()
looks confusing and is buggy.  Originally this trylock_page() was
intended to make sure that old_page is locked even when old_page !=
pagecache_page, because then only pagecache_page is locked.

This patch fixes it by moving page locking into hugetlb_fault().

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-23 17:29:18 -07:00
Naoya Horiguchi cd67f0d2a9 hugetlb, rmap: use hugepage_add_new_anon_rmap() in hugetlb_cow()
Obviously, setting anon_vma for COWed hugepage should be done
by hugepage_add_new_anon_rmap() to scan vmas faster.
This patch fixes it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-23 17:29:18 -07:00
Linus Torvalds 1021a64534 Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6:
  hugetlb: add missing unlock in avoidcopy path in hugetlb_cow()
  hwpoison: rename CONFIG
  HWPOISON, hugetlb: support hwpoison injection for hugepage
  HWPOISON, hugetlb: detect hwpoison in hugetlb code
  HWPOISON, hugetlb: isolate corrupted hugepage
  HWPOISON, hugetlb: maintain mce_bad_pages in handling hugepage error
  HWPOISON, hugetlb: set/clear PG_hwpoison bits on hugepage
  HWPOISON, hugetlb: enable error handling path for hugepage
  hugetlb, rmap: add reverse mapping for hugepage
  hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h

Fix up trivial conflicts in mm/memory-failure.c
2010-08-12 10:15:10 -07:00
Naoya Horiguchi 28957a5467 hugetlb: add missing unlock in avoidcopy path in hugetlb_cow()
This patch fixes possible deadlock in hugepage lock_page()
by adding missing unlock_page().

libhugetlbfs test will hit this bug when the next patch in this
patchset ("hugetlb, HWPOISON: move PG_HWPoison bit check") is applied.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-08-11 09:23:48 +02:00
Naoya Horiguchi 43131e141a HWPOISON, hugetlb: support hwpoison injection for hugepage
This patch enables hwpoison injection through debug/hwpoison interfaces,
with which we can test memory error handling for free or reserved
hugepages (which cannot be tested by madvise() injector).

[AK: Export PageHuge too for the injection module]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-08-11 09:23:11 +02:00
Naoya Horiguchi fd6a03edd2 HWPOISON, hugetlb: detect hwpoison in hugetlb code
This patch enables to block access to hwpoisoned hugepage and
also enables to block unmapping for it.

Dependency:
  "HWPOISON, hugetlb: enable error handling path for hugepage"

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-08-11 09:23:01 +02:00
Naoya Horiguchi 93f70f900d HWPOISON, hugetlb: isolate corrupted hugepage
If error hugepage is not in-use, we can fully recovery from error
by dequeuing it from freelist, so return RECOVERY.
Otherwise whether or not we can recovery depends on user processes,
so return DELAYED.

Dependency:
  "HWPOISON, hugetlb: enable error handling path for hugepage"

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-08-11 09:22:46 +02:00
Naoya Horiguchi 0fe6e20b9c hugetlb, rmap: add reverse mapping for hugepage
This patch adds reverse mapping feature for hugepage by introducing
mapcount for shared/private-mapped hugepage and anon_vma for
private-mapped hugepage.

While hugepage is not currently swappable, reverse mapping can be useful
for memory error handler.

Without this patch, memory error handler cannot identify processes
using the bad hugepage nor unmap it from them. That is:
- for shared hugepage:
  we can collect processes using a hugepage through pagecache,
  but can not unmap the hugepage because of the lack of mapcount.
- for privately mapped hugepage:
  we can neither collect processes nor unmap the hugepage.
This patch solves these problems.

This patch include the bug fix given by commit 23be7468e8, so reverts it.

Dependency:
  "hugetlb: move definition of is_vm_hugetlb_page() to hugepage_inline.h"

ChangeLog since May 24.
- create hugetlb_inline.h and move is_vm_hugetlb_index() in it.
- move functions setting up anon_vma for hugepage into mm/rmap.c.

ChangeLog since May 13.
- rebased to 2.6.34
- fix logic error (in case that private mapping and shared mapping coexist)
- move is_vm_hugetlb_page() into include/linux/mm.h to use this function
  from linear_page_index()
- define and use linear_hugepage_index() instead of compound_order()
- use page_move_anon_rmap() in hugetlb_cow()
- copy exclusive switch of __set_page_anon_rmap() into hugepage counterpart.
- revert commit 24be7468 completely

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Acked-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2010-08-11 09:21:15 +02:00
Doug Doan 3edd4fc953 hugetlb: call mmu notifiers on hugepage cow
When a copy-on-write occurs, we take one of two paths in handle_mm_fault:
through handle_pte_fault for normal pages, or through hugetlb_fault for
huge pages.

In the normal page case, we eventually get to do_wp_page and call mmu
notifiers via ptep_clear_flush_notify.  There is no callout to the mmmu
notifiers in the huge page case.  This patch fixes that.

Signed-off-by: Doug Doan <dougd@cray.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09 20:44:54 -07:00