Commit Graph

33923 Commits

Author SHA1 Message Date
Alexander Clouter
1907da78bf hwrng: timeriomem - update to support more than one device
timeriomem_rng only supports a single device instance.  This patch
enables multiple timeriomem_rng devices to coexist as well as adds
some additional error checking.

Signed-off-by: Alexander Clouter <alex@digriz.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-04-25 21:01:45 +08:00
Nicolas Royer
6150f3bc0b ARM: AT91SAM9G45: same platform data structure for all crypto peripherals
Only AES use DMA in AT91SAM9G45 (TDES and SHA use PDC).

However latest Atmel TDES and SHA IP releases use DMA instead of PDC.
  --> Atmel TDES and SHA drivers need DMA platform data for those IP releases.

Goal of this patch is to use the same platform data structure for all Atmel
crypto peripherals. This structure contains information about DMA interface.

Signed-off-by: Nicolas Royer <nicolas@eukrea.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Eric Bénard <eric@eukrea.com>
Tested-by: Eric Bénard <eric@eukrea.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2013-03-10 16:46:41 +08:00
Linus Torvalds
f6d43b93bd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem fixes from James Morris:
 "From Mimi:

    Both of these patches are bug fixes for patches, which were
    upstreamed in this open window.  The first patch addresses a merge
    issue.  The second patch addresses a CONFIG_BLOCK dependency."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  block: fix part_pack_uuid() build error
  ima: "remove enforce checking duplication" merge fix
2013-02-25 15:45:29 -08:00
Linus Torvalds
9043a2650c Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module update from Rusty Russell:
 "The sweeping change is to make add_taint() explicitly indicate whether
  to disable lockdep, but it's a mechanical change."

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  MODSIGN: Add option to not sign modules during modules_install
  MODSIGN: Add -s <signature> option to sign-file
  MODSIGN: Specify the hash algorithm on sign-file command line
  MODSIGN: Simplify Makefile with a Kconfig helper
  module: clean up load_module a little more.
  modpost: Ignore ARC specific non-alloc sections
  module: constify within_module_*
  taint: add explicit flag to show whether lock dep is still OK.
  module: printk message when module signature fail taints kernel.
2013-02-25 15:41:43 -08:00
Mimi Zohar
446d64e3e1 block: fix part_pack_uuid() build error
Commit "85865c1 ima: add policy support for file system uuid"
introduced a CONFIG_BLOCK dependency.  This patch defines a
wrapper called blk_part_pack_uuid(), which returns -EINVAL,
when CONFIG_BLOCK is not defined.

security/integrity/ima/ima_policy.c:538:4: error: implicit declaration
of function 'part_pack_uuid' [-Werror=implicit-function-declaration]

Changelog v2:
- Reference commit number in patch description
Changelog v1:
- rename ima_part_pack_uuid() to blk_part_pack_uuid()
- resolve scripts/checkpatch.pl warnings
Changelog v0:
- fix UUID scripts/Lindent msgs

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-02-26 03:10:52 +11:00
Linus Torvalds
ab7826595e Merge tag 'mfd-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
Pull MFS updates from Samuel Ortiz:
 "This is the MFD pull request for the 3.9 merge window.

  No new drivers this time, but a bunch of fairly big cleanups:

   - Roger Quadros worked on a OMAP USBHS and TLL platform data
     consolidation, OMAP5 support and clock management code cleanup.

   - The first step of a major sync for the ab8500 driver from Lee
     Jones.  In particular, the debugfs and the sysct interfaces got
     extended and improved.

   - Peter Ujfalusi sent a nice patchset for cleaning and fixing the
     twl-core driver, with a much needed module id lookup code
     improvement.

   - The regular wm5102 and arizona cleanups and fixes from Mark Brown.

   - Laxman Dewangan extended the palmas APIs in order to implement the
     palmas GPIO and rt drivers.

   - Laxman also added DT support for the tps65090 driver.

   - The Intel SCH and ICH drivers got a couple fixes from Aaron Sierra
     and Darren Hart.

   - Linus Walleij patchset for the ab8500 driver allowed ab8500 and
     ab9540 based devices to switch to the new abx500 pin-ctrl driver.

   - The max8925 now has device tree and irqdomain support thanks to
     Qing Xu.

   - The recently added rtsx driver got a few cleanups and fixes for a
     better card detection code path and now also supports the RTS5227
     chipset, thanks to Wei Wang and Roger Tseng."

* tag 'mfd-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (109 commits)
  mfd: lpc_ich: Use devres API to allocate private data
  mfd: lpc_ich: Add Device IDs for Intel Wellsburg PCH
  mfd: lpc_sch: Accomodate partial population of the MFD devices
  mfd: da9052-i2c: Staticize da9052_i2c_fix()
  mfd: syscon: Fix sparse warning
  mfd: twl-core: Fix kernel panic on boot
  mfd: rtsx: Fix issue that booting OS with SD card inserted
  mfd: ab8500: Fix compile error
  mfd: Add missing GENERIC_HARDIRQS dependecies
  Documentation: Add docs for max8925 dt
  mfd: max8925: Add dts
  mfd: max8925: Support dt for backlight
  mfd: max8925: Fix onkey driver irq base
  mfd: max8925: Fix mfd device register failure
  mfd: max8925: Add irqdomain for dt
  mfd: vexpress: Allow vexpress-sysreg to self-initialise
  mfd: rtsx: Support RTS5227
  mfd: rtsx: Implement driving adjustment to device-dependent callbacks
  mfd: vexpress: Add pseudo-GPIO based LEDs
  mfd: ab8500: Rename ab8500 to abx500 for hwmon driver
  ...
2013-02-24 20:00:58 -08:00
Linus Torvalds
d9978ec568 Merge tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
Pull libata updates from Jeff Garzik:

1) apply, and then revert, the sysfs export of ATA host controller
   number.  Discussion was continuing after patch application, trying to
   figure out how to best mesh exported data with the installers,
   boot-time agents and other parties that want this info.

2) Merge Zero-Power Optical Device Driver (ZPODD) support, bringing the
   wonderfulness of sane power management to your CD/DVD device.

   Includes one SCSI-subsystem patch (with appropriate ACKs), adding
   runtime PM support to 'sr' driver.  That is the ZPODD interaction
   bits.

   Patchset went through some 13 revisions before it got here; kudos to
   Intel for persistence.

3) pata_samsung_cf: use devm_clk_get()

4) more ata_piix, ahci PCI IDs

5) Add SATA driver for R-Car SoC

6) Convert libata to use devm_ioremap_resource (Note: I think Greg sent
   this to you, also)

7) Set proper Sense Key (SK) in the SCSI simulator when ATA passthrough
   indicates check condition.  Google and specification hawks everywhere
   shall rejoice.

* tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (22 commits)
  [libata] fix smatch warning for zpodd_wake_dev
  [libata] Set proper SK when CK_COND is set.
  [libata] Convert to devm_ioremap_resource()
  libata: add R-Car SATA driver
  ahci: Add Device IDs for Intel Wellsburg PCH
  ata_piix: Add Device IDs for Intel Wellsburg PCH
  [SCSI] remove can_power_off flag from scsi_device
  [libata] scsi: no poll when ODD is powered off
  [SCSI] sr: support runtime pm
  ahci: AHCI-mode SATA patch for Intel Avoton DeviceIDs
  ata_piix: IDE-mode SATA patch for Intel Avoton DeviceIDs
  [libata] PM code cleanup for ata port
  [libata] pm: differentiate system and runtime pm for ata port
  Revert "libata: export host controller number thru /sys"
  libata: do not suspend port if normal ODD is attached
  libata: expose pm qos flags for ata device
  libata: handle power transition of ODD
  libata: check zero power ready status for ZPODD
  libata: move acpi notification code to zpodd
  libata: identify and init ZPODD devices
  ...
2013-02-24 17:32:15 -08:00
Linus Torvalds
89f883372f Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Marcelo Tosatti:
 "KVM updates for the 3.9 merge window, including x86 real mode
  emulation fixes, stronger memory slot interface restrictions, mmu_lock
  spinlock hold time reduction, improved handling of large page faults
  on shadow, initial APICv HW acceleration support, s390 channel IO
  based virtio, amongst others"

* tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
  Revert "KVM: MMU: lazily drop large spte"
  x86: pvclock kvm: align allocation size to page size
  KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr
  x86 emulator: fix parity calculation for AAD instruction
  KVM: PPC: BookE: Handle alignment interrupts
  booke: Added DBCR4 SPR number
  KVM: PPC: booke: Allow multiple exception types
  KVM: PPC: booke: use vcpu reference from thread_struct
  KVM: Remove user_alloc from struct kvm_memory_slot
  KVM: VMX: disable apicv by default
  KVM: s390: Fix handling of iscs.
  KVM: MMU: cleanup __direct_map
  KVM: MMU: remove pt_access in mmu_set_spte
  KVM: MMU: cleanup mapping-level
  KVM: MMU: lazily drop large spte
  KVM: VMX: cleanup vmx_set_cr0().
  KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
  KVM: VMX: disable SMEP feature when guest is in non-paging mode
  KVM: Remove duplicate text in api.txt
  Revert "KVM: MMU: split kvm_mmu_free_page"
  ...
2013-02-24 13:07:18 -08:00
Linus Torvalds
9e2d59ad58 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull signal handling cleanups from Al Viro:
 "This is the first pile; another one will come a bit later and will
  contain SYSCALL_DEFINE-related patches.

   - a bunch of signal-related syscalls (both native and compat)
     unified.

   - a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE
     (fixing several potential problems with missing argument
     validation, while we are at it)

   - a lot of now-pointless wrappers killed

   - a couple of architectures (cris and hexagon) forgot to save
     altstack settings into sigframe, even though they used the
     (uninitialized) values in sigreturn; fixed.

   - microblaze fixes for delivery of multiple signals arriving at once

   - saner set of helpers for signal delivery introduced, several
     architectures switched to using those."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits)
  x86: convert to ksignal
  sparc: convert to ksignal
  arm: switch to struct ksignal * passing
  alpha: pass k_sigaction and siginfo_t using ksignal pointer
  burying unused conditionals
  make do_sigaltstack() static
  arm64: switch to generic old sigaction() (compat-only)
  arm64: switch to generic compat rt_sigaction()
  arm64: switch compat to generic old sigsuspend
  arm64: switch to generic compat rt_sigqueueinfo()
  arm64: switch to generic compat rt_sigpending()
  arm64: switch to generic compat rt_sigprocmask()
  arm64: switch to generic sigaltstack
  sparc: switch to generic old sigsuspend
  sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE
  sparc: kill sign-extending wrappers for native syscalls
  kill sparc32_open()
  sparc: switch to use of generic old sigaction
  sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE
  mips: switch to generic sys_fork() and sys_clone()
  ...
2013-02-23 18:50:11 -08:00
Linus Torvalds
5ce1a70e2f Merge branch 'akpm' (more incoming from Andrew)
Merge second patch-bomb from Andrew Morton:

 - A little DM fix

 - the MM queue

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits)
  ksm: allocate roots when needed
  mm: cleanup "swapcache" in do_swap_page
  mm,ksm: swapoff might need to copy
  mm,ksm: FOLL_MIGRATION do migration_entry_wait
  ksm: shrink 32-bit rmap_item back to 32 bytes
  ksm: treat unstable nid like in stable tree
  ksm: add some comments
  tmpfs: fix mempolicy object leaks
  tmpfs: fix use-after-free of mempolicy object
  mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages
  mm: export mmu notifier invalidates
  mm: accelerate mm_populate() treatment of THP pages
  mm: use long type for page counts in mm_populate() and get_user_pages()
  mm: accurately document nr_free_*_pages functions with code comments
  HWPOISON: change order of error_states[]'s elements
  HWPOISON: fix misjudgement of page_action() for errors on mlocked pages
  memcg: stop warning on memcg_propagate_kmem
  net: change type of virtio_chan->p9_max_pages
  vmscan: change type of vm_total_pages to unsigned long
  fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used
  ...
2013-02-23 17:50:35 -08:00
Hugh Dickins
5117b3b835 mm,ksm: FOLL_MIGRATION do migration_entry_wait
In "ksm: remove old stable nodes more thoroughly" I said that I'd never
seen its WARN_ON_ONCE(page_mapped(page)).  True at the time of writing,
but it soon appeared once I tried fuller tests on the whole series.

It turned out to be due to the KSM page migration itself: unmerge_and_
remove_all_rmap_items() failed to locate and replace all the KSM pages,
because of that hiatus in page migration when old pte has been replaced
by migration entry, but not yet by new pte.  follow_page() finds no page
at that instant, but a KSM page reappears shortly after, without a
fault.

Add FOLL_MIGRATION flag, so follow_page() can do migration_entry_wait()
for KSM's break_cow().  I'd have preferred to avoid another flag, and do
it every time, in case someone else makes the same easy mistake; but did
not find another transgressor (the common get_user_pages() is of course
safe), and cannot be sure that every follow_page() caller is prepared to
sleep - ia64's xencomm_vtop()? Now, THP's wait_split_huge_page() can
already sleep there, since anon_vma locking was changed to mutex, but
maybe that's somehow excluded.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:23 -08:00
Michel Lespinasse
240aadeedc mm: accelerate mm_populate() treatment of THP pages
This change adds a follow_page_mask function which is equivalent to
follow_page, but with an extra page_mask argument.

follow_page_mask sets *page_mask to HPAGE_PMD_NR - 1 when it encounters
a THP page, and to 0 in other cases.

__get_user_pages() makes use of this in order to accelerate populating
THP ranges - that is, when both the pages and vmas arrays are NULL, we
don't need to iterate HPAGE_PMD_NR times to cover a single THP page (and
we also avoid taking mm->page_table_lock that many times).

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:23 -08:00
Michel Lespinasse
28a35716d3 mm: use long type for page counts in mm_populate() and get_user_pages()
Use long type for page counts in mm_populate() so as to avoid integer
overflow when running the following test code:

int main(void) {
  void *p = mmap(NULL, 0x100000000000, PROT_READ,
                 MAP_PRIVATE | MAP_ANON, -1, 0);
  printf("p: %p\n", p);
  mlockall(MCL_CURRENT);
  printf("done\n");
  return 0;
}

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:22 -08:00
Zhang Yanfei
b21e0b90cc vmscan: change type of vm_total_pages to unsigned long
This variable is calculated from nr_free_pagecache_pages so
change its type to unsigned long.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:22 -08:00
Zhang Yanfei
ebec3862fd mm: fix return type for functions nr_free_*_pages
Currently, the amount of RAM that functions nr_free_*_pages return is
held in unsigned int.  But in machines with big memory (exceeding 16TB),
the amount may be incorrect because of overflow, so fix it.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:21 -08:00
Cody P Schafer
da3649e133 mmzone: add pgdat_{end_pfn,is_empty}() helpers & consolidate.
Add pgdat_end_pfn() and pgdat_is_empty() helpers which match the similar
zone_*() functions.

Change node_end_pfn() to be a wrapper of pgdat_end_pfn().

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: David Hansen <dave@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:20 -08:00
Cody P Schafer
2a6e3ebee2 mm: add zone_is_empty() and zone_is_initialized()
Factoring out these 2 checks makes it more clear what we are actually
checking for.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: David Hansen <dave@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:20 -08:00
Cody P Schafer
108bcc96ef mm: add & use zone_end_pfn() and zone_spans_pfn()
Add 2 helpers (zone_end_pfn() and zone_spans_pfn()) to reduce code
duplication.

This also switches to using them in compaction (where an additional
variable needed to be renamed), page_alloc, vmstat, memory_hotplug, and
kmemleak.

Note that in compaction.c I avoid calling zone_end_pfn() repeatedly
because I expect at some point the sycronization issues with start_pfn &
spanned_pages will need fixing, either by actually using the seqlock or
clever memory barrier usage.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: David Hansen <dave@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:20 -08:00
Cody P Schafer
9127ab4ff9 mm: add SECTION_IN_PAGE_FLAGS
Instead of directly utilizing a combination of config options to determine
this, add a macro to specifically address it.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Cc: David Hansen <dave@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:20 -08:00
Johannes Weiner
e3790144c9 mm: refactor inactive_file_is_low() to use get_lru_size()
An inactive file list is considered low when its active counterpart is
bigger, regardless of whether it is a global zone LRU list or a memcg
zone LRU list.  The only difference is in how the LRU size is assessed.

get_lru_size() does the right thing for both global and memcg reclaim
situations.

Get rid of inactive_file_is_low_global() and
mem_cgroup_inactive_file_is_low() by using get_lru_size() and compare
the numbers in common code.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:20 -08:00
Hugh Dickins
9c620e2bc5 mm: remove offlining arg to migrate_pages
No functional change, but the only purpose of the offlining argument to
migrate_pages() etc, was to ensure that __unmap_and_move() could migrate a
KSM page for memory hotremove (which took ksm_thread_mutex) but not for
other callers.  Now all cases are safe, remove the arg.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:19 -08:00
Hugh Dickins
cbf86cfe04 ksm: remove old stable nodes more thoroughly
Switching merge_across_nodes after running KSM is liable to oops on stale
nodes still left over from the previous stable tree.  It's not something
that people will often want to do, but it would be lame to demand a reboot
when they're trying to determine which merge_across_nodes setting is best.

How can this happen?  We only permit switching merge_across_nodes when
pages_shared is 0, and usually set run 2 to force that beforehand, which
ought to unmerge everything: yet oopses still occur when you then run 1.

Three causes:

1. The old stable tree (built according to the inverse
   merge_across_nodes) has not been fully torn down.  A stable node
   lingers until get_ksm_page() notices that the page it references no
   longer references it: but the page is not necessarily freed as soon as
   expected, particularly when swapcache.

   Fix this with a pass through the old stable tree, applying
   get_ksm_page() to each of the remaining nodes (most found stale and
   removed immediately), with forced removal of any left over.  Unless the
   page is still mapped: I've not seen that case, it shouldn't occur, but
   better to WARN_ON_ONCE and EBUSY than BUG.

2. __ksm_enter() has a nice little optimization, to insert the new mm
   just behind ksmd's cursor, so there's a full pass for it to stabilize
   (or be removed) before ksmd addresses it.  Nice when ksmd is running,
   but not so nice when we're trying to unmerge all mms: we were missing
   those mms forked and inserted behind the unmerge cursor.  Easily fixed
   by inserting at the end when KSM_RUN_UNMERGE.

3.  It is possible for a KSM page to be faulted back from swapcache
   into an mm, just after unmerge_and_remove_all_rmap_items() scanned past
   it.  Fix this by copying on fault when KSM_RUN_UNMERGE: but that is
   private to ksm.c, so dissolve the distinction between
   ksm_might_need_to_copy() and ksm_does_need_to_copy(), doing it all in
   the one call into ksm.c.

A long outstanding, unrelated bugfix sneaks in with that third fix:
ksm_does_need_to_copy() would copy from a !PageUptodate page (implying I/O
error when read in from swap) to a page which it then marks Uptodate.  Fix
this case by not copying, letting do_swap_page() discover the error.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:19 -08:00
Mel Gorman
22b751c3d0 mm: rename page struct field helpers
The function names page_xchg_last_nid(), page_last_nid() and
reset_page_last_nid() were judged to be inconsistent so rename them to a
struct_field_op style pattern.  As it looked jarring to have
reset_page_mapcount() and page_nid_reset_last() beside each other in
memmap_init_zone(), this patch also renames reset_page_mapcount() to
page_mapcount_reset().  There are others like init_page_count() but as
it is used throughout the arch code a rename would likely cause more
conflicts than it is worth.

[akpm@linux-foundation.org: fix zcache]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:18 -08:00
Mel Gorman
4468b8f1e2 mm: uninline page_xchg_last_nid()
Andrew Morton pointed out that page_xchg_last_nid() and
reset_page_last_nid() were "getting nuttily large" and asked that it be
investigated.

reset_page_last_nid() is on the page free path and it would be
unfortunate to make that path more expensive than it needs to be.  Due
to the internal use of page_xchg_last_nid() it is already too expensive
but fortunately, it should also be impossible for the page->flags to be
updated in parallel when we call reset_page_last_nid().  Instead of
unlining the function, it uses a simplier implementation that assumes no
parallel updates and should now be sufficiently short for inlining.

page_xchg_last_nid() is called in paths that are already quite expensive
(splitting huge page, fault handling, migration) and it is reasonable to
uninline.  There was not really a good place to place the function but
mm/mmzone.c was the closest fit IMO.

This patch saved 128 bytes of text in the vmlinux file for the kernel
configuration I used for testing automatic NUMA balancing.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:18 -08:00
Paul Szabo
75f7ad8e04 page-writeback.c: subtract min_free_kbytes from dirtyable memory
When calculating amount of dirtyable memory, min_free_kbytes should be
subtracted because it is not intended for dirty pages.

Addresses http://bugs.debian.org/695182

[akpm@linux-foundation.org: fix up min_free_kbytes extern declarations]
[akpm@linux-foundation.org: fix min() warning]
Signed-off-by: Paul Szabo <psz@maths.usyd.edu.au>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:17 -08:00