Commit Graph

39 Commits

Author SHA1 Message Date
Linus Torvalds
ed2d265d12 Merge tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull <linux/bug.h> cleanup from Paul Gortmaker:
 "The changes shown here are to unify linux's BUG support under the one
  <linux/bug.h> file.  Due to historical reasons, we have some BUG code
  in bug.h and some in kernel.h -- i.e.  the support for BUILD_BUG in
  linux/kernel.h predates the addition of linux/bug.h, but old code in
  kernel.h wasn't moved to bug.h at that time.  As a band-aid, kernel.h
  was including <asm/bug.h> to pseudo link them.

  This has caused confusion[1] and general yuck/WTF[2] reactions.  Here
  is an example that violates the principle of least surprise:

      CC      lib/string.o
      lib/string.c: In function 'strlcat':
      lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON'
      make[2]: *** [lib/string.o] Error 1
      $
      $ grep linux/bug.h lib/string.c
      #include <linux/bug.h>
      $

  We've included <linux/bug.h> for the BUG infrastructure and yet we
  still get a compile fail! [We've not kernel.h for BUILD_BUG_ON.] Ugh -
  very confusing for someone who is new to kernel development.

  With the above in mind, the goals of this changeset are:

  1) find and fix any include/*.h files that were relying on the
     implicit presence of BUG code.
  2) find and fix any C files that were consuming kernel.h and hence
     relying on implicitly getting some/all BUG code.
  3) Move the BUG related code living in kernel.h to <linux/bug.h>
  4) remove the asm/bug.h from kernel.h to finally break the chain.

  During development, the order was more like 3-4, build-test, 1-2.  But
  to ensure that git history for bisect doesn't get needless build
  failures introduced, the commits have been reorderd to fix the problem
  areas in advance.

	[1]  https://lkml.org/lkml/2012/1/3/90
	[2]  https://lkml.org/lkml/2012/1/17/414"

Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul
and linux-next.

* tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  kernel.h: doesn't explicitly use bug.h, so don't include it.
  bug: consolidate BUILD_BUG_ON with other bug code
  BUG: headers with BUG/BUG_ON etc. need linux/bug.h
  bug.h: add include of it to various implicit C users
  lib: fix implicit users of kernel.h for TAINT_WARN
  spinlock: macroize assert_spin_locked to avoid bug.h dependency
  x86: relocate get/set debugreg fcns to include/asm/debugreg.
2012-03-24 10:08:39 -07:00
Andrea Arcangeli
1a5a9906d4 mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode
In some cases it may happen that pmd_none_or_clear_bad() is called with
the mmap_sem hold in read mode.  In those cases the huge page faults can
allocate hugepmds under pmd_none_or_clear_bad() and that can trigger a
false positive from pmd_bad() that will not like to see a pmd
materializing as trans huge.

It's not khugepaged causing the problem, khugepaged holds the mmap_sem
in write mode (and all those sites must hold the mmap_sem in read mode
to prevent pagetables to go away from under them, during code review it
seems vm86 mode on 32bit kernels requires that too unless it's
restricted to 1 thread per process or UP builds).  The race is only with
the huge pagefaults that can convert a pmd_none() into a
pmd_trans_huge().

Effectively all these pmd_none_or_clear_bad() sites running with
mmap_sem in read mode are somewhat speculative with the page faults, and
the result is always undefined when they run simultaneously.  This is
probably why it wasn't common to run into this.  For example if the
madvise(MADV_DONTNEED) runs zap_page_range() shortly before the page
fault, the hugepage will not be zapped, if the page fault runs first it
will be zapped.

Altering pmd_bad() not to error out if it finds hugepmds won't be enough
to fix this, because zap_pmd_range would then proceed to call
zap_pte_range (which would be incorrect if the pmd become a
pmd_trans_huge()).

The simplest way to fix this is to read the pmd in the local stack
(regardless of what we read, no need of actual CPU barriers, only
compiler barrier needed), and be sure it is not changing under the code
that computes its value.  Even if the real pmd is changing under the
value we hold on the stack, we don't care.  If we actually end up in
zap_pte_range it means the pmd was not none already and it was not huge,
and it can't become huge from under us (khugepaged locking explained
above).

All we need is to enforce that there is no way anymore that in a code
path like below, pmd_trans_huge can be false, but pmd_none_or_clear_bad
can run into a hugepmd.  The overhead of a barrier() is just a compiler
tweak and should not be measurable (I only added it for THP builds).  I
don't exclude different compiler versions may have prevented the race
too by caching the value of *pmd on the stack (that hasn't been
verified, but it wouldn't be impossible considering
pmd_none_or_clear_bad, pmd_bad, pmd_trans_huge, pmd_none are all inlines
and there's no external function called in between pmd_trans_huge and
pmd_none_or_clear_bad).

		if (pmd_trans_huge(*pmd)) {
			if (next-addr != HPAGE_PMD_SIZE) {
				VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
				split_huge_page_pmd(vma->vm_mm, pmd);
			} else if (zap_huge_pmd(tlb, vma, pmd, addr))
				continue;
			/* fall through */
		}
		if (pmd_none_or_clear_bad(pmd))

Because this race condition could be exercised without special
privileges this was reported in CVE-2012-1179.

The race was identified and fully explained by Ulrich who debugged it.
I'm quoting his accurate explanation below, for reference.

====== start quote =======
      mapcount 0 page_mapcount 1
      kernel BUG at mm/huge_memory.c:1384!

    At some point prior to the panic, a "bad pmd ..." message similar to the
    following is logged on the console:

      mm/memory.c:145: bad pmd ffff8800376e1f98(80000000314000e7).

    The "bad pmd ..." message is logged by pmd_clear_bad() before it clears
    the page's PMD table entry.

        143 void pmd_clear_bad(pmd_t *pmd)
        144 {
    ->  145         pmd_ERROR(*pmd);
        146         pmd_clear(pmd);
        147 }

    After the PMD table entry has been cleared, there is an inconsistency
    between the actual number of PMD table entries that are mapping the page
    and the page's map count (_mapcount field in struct page). When the page
    is subsequently reclaimed, __split_huge_page() detects this inconsistency.

       1381         if (mapcount != page_mapcount(page))
       1382                 printk(KERN_ERR "mapcount %d page_mapcount %d\n",
       1383                        mapcount, page_mapcount(page));
    -> 1384         BUG_ON(mapcount != page_mapcount(page));

    The root cause of the problem is a race of two threads in a multithreaded
    process. Thread B incurs a page fault on a virtual address that has never
    been accessed (PMD entry is zero) while Thread A is executing an madvise()
    system call on a virtual address within the same 2 MB (huge page) range.

               virtual address space
              .---------------------.
              |                     |
              |                     |
            .-|---------------------|
            | |                     |
            | |                     |<-- B(fault)
            | |                     |
      2 MB  | |/////////////////////|-.
      huge <  |/////////////////////|  > A(range)
      page  | |/////////////////////|-'
            | |                     |
            | |                     |
            '-|---------------------|
              |                     |
              |                     |
              '---------------------'

    - Thread A is executing an madvise(..., MADV_DONTNEED) system call
      on the virtual address range "A(range)" shown in the picture.

    sys_madvise
      // Acquire the semaphore in shared mode.
      down_read(&current->mm->mmap_sem)
      ...
      madvise_vma
        switch (behavior)
        case MADV_DONTNEED:
             madvise_dontneed
               zap_page_range
                 unmap_vmas
                   unmap_page_range
                     zap_pud_range
                       zap_pmd_range
                         //
                         // Assume that this huge page has never been accessed.
                         // I.e. content of the PMD entry is zero (not mapped).
                         //
                         if (pmd_trans_huge(*pmd)) {
                             // We don't get here due to the above assumption.
                         }
                         //
                         // Assume that Thread B incurred a page fault and
             .---------> // sneaks in here as shown below.
             |           //
             |           if (pmd_none_or_clear_bad(pmd))
             |               {
             |                 if (unlikely(pmd_bad(*pmd)))
             |                     pmd_clear_bad
             |                     {
             |                       pmd_ERROR
             |                         // Log "bad pmd ..." message here.
             |                       pmd_clear
             |                         // Clear the page's PMD entry.
             |                         // Thread B incremented the map count
             |                         // in page_add_new_anon_rmap(), but
             |                         // now the page is no longer mapped
             |                         // by a PMD entry (-> inconsistency).
             |                     }
             |               }
             |
             v
    - Thread B is handling a page fault on virtual address "B(fault)" shown
      in the picture.

    ...
    do_page_fault
      __do_page_fault
        // Acquire the semaphore in shared mode.
        down_read_trylock(&mm->mmap_sem)
        ...
        handle_mm_fault
          if (pmd_none(*pmd) && transparent_hugepage_enabled(vma))
              // We get here due to the above assumption (PMD entry is zero).
              do_huge_pmd_anonymous_page
                alloc_hugepage_vma
                  // Allocate a new transparent huge page here.
                ...
                __do_huge_pmd_anonymous_page
                  ...
                  spin_lock(&mm->page_table_lock)
                  ...
                  page_add_new_anon_rmap
                    // Here we increment the page's map count (starts at -1).
                    atomic_set(&page->_mapcount, 0)
                  set_pmd_at
                    // Here we set the page's PMD entry which will be cleared
                    // when Thread A calls pmd_clear_bad().
                  ...
                  spin_unlock(&mm->page_table_lock)

    The mmap_sem does not prevent the race because both threads are acquiring
    it in shared mode (down_read).  Thread B holds the page_table_lock while
    the page's map count and PMD table entry are updated.  However, Thread A
    does not synchronize on that lock.

====== end quote =======

[akpm@linux-foundation.org: checkpatch fixes]
Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Jones <davej@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>		[2.6.38+]
Cc: Mark Salter <msalter@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:54:54 -07:00
Paul Gortmaker
187f1882b5 BUG: headers with BUG/BUG_ON etc. need linux/bug.h
If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
other BUG variant in a static inline (i.e. not in a #define) then
that header really should be including <linux/bug.h> and not just
expecting it to be implicitly present.

We can make this change risk-free, since if the files using these
headers didn't have exposure to linux/bug.h already, they would have
been causing compile failures/warnings.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-03-04 17:54:34 -05:00
Nicolas Kaiser
49b24d6b41 include/asm-generic/pgtable.h: fix unbalanced parenthesis
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-15 20:04:00 -07:00
Martin Schwidefsky
2d42552d1c [S390] merge page_test_dirty and page_clear_dirty
The page_clear_dirty primitive always sets the default storage key
which resets the access control bits and the fetch protection bit.
That will surprise a KVM guest that sets non-zero access control
bits or the fetch protection bit. Merge page_test_dirty and
page_clear_dirty back to a single function and only clear the
dirty bit from the storage key.

In addition move the function page_test_and_clear_dirty and
page_test_and_clear_young to page.h where they belong. This
requires to change the parameter from a struct page * to a page
frame number.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-23 10:24:31 +02:00
Ben Hutchings
fbd7184485 mm: <asm-generic/pgtable.h> must include <linux/mm_types.h>
Commit e2cda32264 ("thp: add pmd mangling generic functions") replaced
some macros in <asm-generic/pgtable.h> with inline functions.

If the functions are to be defined (not all architectures need them)
then struct vm_area_struct must be defined first.  So include
<linux/mm_types.h>.

Fixes a build failure seen in Debian:

    CC [M]  drivers/media/dvb/mantis/mantis_pci.o
  In file included from arch/arm/include/asm/pgtable.h:460,
                   from drivers/media/dvb/mantis/mantis_pci.c:25:
  include/asm-generic/pgtable.h: In function 'ptep_test_and_clear_young':
  include/asm-generic/pgtable.h:29: error: dereferencing pointer to incomplete type

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-02-28 17:46:49 -08:00
Andrea Arcangeli
b3697c0255 fix non-x86 build failure in pmdp_get_and_clear
pmdp_get_and_clear/pmdp_clear_flush/pmdp_splitting_flush were trapped as
BUG() and they were defined only to diminish the risk of build issues on
not-x86 archs and to be consistent with the generic pte methods previously
defined in include/asm-generic/pgtable.h.

But they are causing more trouble than they were supposed to solve, so
it's simpler not to define them when THP is off.

This is also correcting the export of pmdp_splitting_flush which is
currently unused (x86 isn't using the generic implementation in
mm/pgtable-generic.c and no other arch needs that [yet]).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Sam Ravnborg <sam@ravnborg.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-16 15:05:44 -08:00
Andrea Arcangeli
e2cda32264 thp: add pmd mangling generic functions
Some are needed to build but not actually used on archs not supporting
transparent hugepages.  Others like pmdp_clear_flush are used by x86 too.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:40 -08:00
Andrea Arcangeli
5f6e8da70a thp: special pmd_trans_* functions
These returns 0 at compile time when the config option is disabled, to
allow gcc to eliminate the transparent hugepage function calls at compile
time without additional #ifdefs (only the export of those functions have
to be visible to gcc but they won't be required at link time and
huge_memory.o can be not built at all).

_PAGE_BIT_UNUSED1 is never used for pmd, only on pte.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:40 -08:00
Martin Schwidefsky
e2b8d7af0e [S390] add support for nonquiescing sske
Improve performance of the sske operation by using the nonquiescing
variant if the affected page has no mappings established. On machines
with no support for the new sske variant the mask bit will be ignored.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:15 +02:00
Shaohua Li
61c77326d1 x86, mm: Avoid unnecessary TLB flush
In x86, access and dirty bits are set automatically by CPU when CPU accesses
memory. When we go into the code path of below flush_tlb_fix_spurious_fault(),
we already set dirty bit for pte and don't need flush tlb. This might mean
tlb entry in some CPUs hasn't dirty bit set, but this doesn't matter. When
the CPUs do page write, they will automatically check the bit and no software
involved.

On the other hand, flush tlb in below position is harmful. Test creates CPU
number of threads, each thread writes to a same but random address in same vma
range and we measure the total time. Under a 4 socket system, original time is
1.96s, while with the patch, the time is 0.8s. Under a 2 socket system, there is
20% time cut too. perf shows a lot of time are taking to send ipi/handle ipi for
tlb flush.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <20100816011655.GA362@sli10-desk.sh.intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andrea Archangeli <aarcange@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-23 10:04:57 -07:00
Paul Mundt
0634a632f5 asm-generic: add dummy pgprot_noncached()
Most architectures now provide a pgprot_noncached(), the
remaining ones can simply use an dummy default implementation,
except for cris and xtensa, which should override the
default appropriately.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Magnus Damm <magnus.damm@gmail.com>
2009-06-23 14:34:30 +02:00
Jeremy Fitzhardinge
224101ed69 x86/paravirt: finish change from lazy cpu to context switch start/end
Impact: fix lazy context switch API

Pass the previous and next tasks into the context switch start
end calls, so that the called functions can properly access the
task state (esp in end_context_switch, in which the next task
is not yet completely current).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2009-03-29 23:36:01 -07:00
Jeremy Fitzhardinge
7fd7d83d49 x86/pvops: replace arch_enter_lazy_cpu_mode with arch_start_context_switch
Impact: simplification, prepare for later changes

Make lazy cpu mode more specific to context switching, so that
it makes sense to do more context-switch specific things in
the callbacks.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2009-03-29 23:35:59 -07:00
venkatesh.pallipadi@intel.com
e4b866ed19 x86 PAT: change track_pfn_vma_new to take pgprot_t pointer param
Impact: cleanup

Change the protection parameter for track_pfn_vma_new() into a pgprot_t pointer.
Subsequent patch changes the x86 PAT handling to return a compatible
memtype in pgprot_t, if what was requested cannot be allowed due to conflicts.
No fuctionality change in this patch.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-13 19:13:01 +01:00
venkatesh.pallipadi@intel.com
34801ba9bf x86: PAT: move track untrack pfnmap stubs to asm-generic
Impact: Cleanup and branch hints only.

Move the track and untrack pfn stub routines from memory.c to asm-generic.
Also add unlikely to pfnmap related calls in fork and exit path.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-19 15:40:30 -08:00
venkatesh.pallipadi@intel.com
2520bd3123 x86: PAT: add pgprot_writecombine() interface for drivers - v3
Impact: New mm functionality.

Add pgprot_writecombine. pgprot_writecombine will be aliased to
pgprot_noncached when not supported by the architecture.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-12-18 13:30:16 -08:00
Sebastian Siewior
fe1a6875fc mm: fix build on non-mmu machines
Commit 1ea0704e0d aka "mm: add a ptep_modify_prot transaction abstraction"

caused:

|  CC      init/main.o
|In file included from include2/asm/pgtable.h:68,
|                 from /home/bigeasy/git/linux-2.6-m68k/include/linux/mm.h:39,
|                 from include2/asm/uaccess.h:8,
|                 from /home/bigeasy/git/linux-2.6-m68k/include/linux/poll.h:13,
|                 from /home/bigeasy/git/linux-2.6-m68k/include/linux/rtc.h:113,
|                 from /home/bigeasy/git/linux-2.6-m68k/include/linux/efi.h:19,
|                 from /home/bigeasy/git/linux-2.6-m68k/init/main.c:43:
|/linux-2.6/include/asm-generic/pgtable.h: In function '__ptep_modify_prot_start':
|/linux-2.6/include/asm-generic/pgtable.h:209: error: implicit declaration of function 'ptep_get_and_clear'
|/linux-2.6/include/asm-generic/pgtable.h:209: error: incompatible types in return
|/linux-2.6/include/asm-generic/pgtable.h: In function '__ptep_modify_prot_commit':
|/linux-2.6/include/asm-generic/pgtable.h:220: error: implicit declaration of function 'set_pte_at'
|make[2]: *** [init/main.o] Error 1
|make[1]: *** [init] Error 2
|make: *** [sub-make] Error 2

on my m68knommu box.

Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-15 13:58:40 -07:00
Jeremy Fitzhardinge
1ea0704e0d mm: add a ptep_modify_prot transaction abstraction
This patch adds an API for doing read-modify-write updates to a pte's
protection bits which may race against hardware updates to the pte.
After reading the pte, the hardware may asynchonously set the accessed
or dirty bits on a pte, which would be lost when writing back the
modified pte value.

The existing technique to handle this race is to use
ptep_get_and_clear() atomically fetch the old pte value and clear it
in memory.  This has the effect of marking the pte as non-present,
which will prevent the hardware from updating its state.  When the new
value is written back, the pte will be present again, and the hardware
can resume updating the access/dirty flags.

When running in a virtualized environment, pagetable updates are
relatively expensive, since they generally involve some trap into the
hypervisor.  To mitigate the cost of these updates, we tend to batch
them.

However, because of the atomic nature of ptep_get_and_clear(), it is
inherently non-batchable.  This new interface allows batching by
giving the underlying implementation enough information to open a
transaction between the read and write phases:

ptep_modify_prot_start() returns the current pte value, and puts the
  pte entry into a state where either the hardware will not update the
  pte, or if it does, the updates will be preserved on commit.

ptep_modify_prot_commit() writes back the updated pte, makes sure that
  any hardware updates made since ptep_modify_prot_start() are
  preserved.

ptep_modify_prot_start() and _commit() must be exactly paired, and
used while holding the appropriate pte lock.  They do not protect
against other software updates of the pte in any way.

The current implementations of ptep_modify_prot_start and _commit are
functionally unchanged from before: _start() uses ptep_get_and_clear()
fetch the pte and zero the entry, preventing any hardware updates.
_commit() simply writes the new pte value back knowing that the
hardware has not updated the pte in the meantime.

The only current user of this interface is mprotect

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-25 15:15:53 +02:00
KAMEZAWA Hiroyuki
954ffcb35f flush icache before set_pte() on ia64: flush icache at set_pte
Current ia64 kernel flushes icache by lazy_mmu_prot_update() *after*
set_pte().  This is too late.  This patch removes lazy_mmu_prot_update and
add modfied set_pte() for flushing if necessary.

This patch flush icache of a page when
	new pte has exec bit.
	&& new pte has present bit
	&& new pte is user's page.
	&& (old *ptep is not present
            || new pte's pfn is not same to old *ptep's ptn)
	&& new pte's page has no Pg_arch_1 bit.
	   Pg_arch_1 is set when a page is cache consistent.

I think this condition checks are much easier to understand than considering
"Where sync_icache_dcache() should be inserted ?".

pte_user() for ia64 was removed by http://lkml.org/lkml/2007/6/12/67 as
clean-up. So, I added it again.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:59 -07:00
Greg Ungerer
9535239f6b changing include/asm-generic/pgtable.h for non-mmu
There are some parts of include/asm-generic/pgtable.h that are relevant to
the non-mmu architectures.  To make it easier to include this from them I
would like to ifdef the relevant parts.

Without this there is a handful of functions that are referenced in here
that are not defined on many non-mmu architectures.  They could be defined
out of course, as an alternative approach.

Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-11 15:47:42 -07:00
Martin Schwidefsky
e21ea246bc mm: remove ptep_test_and_clear_dirty and ptep_clear_flush_dirty
Nobody is using ptep_test_and_clear_dirty and ptep_clear_flush_dirty.  Remove
the functions from all architectures.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:22:59 -07:00
Martin Schwidefsky
f0e47c229b mm: remove ptep_establish()
The last user of ptep_establish in mm/ is long gone.  Remove the architecture
primitive as well.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:22:59 -07:00
Benjamin Herrenschmidt
8dab5241d0 Rework ptep_set_access_flags and fix sun4c
Some changes done a while ago to avoid pounding on ptep_set_access_flags and
update_mmu_cache in some race situations break sun4c which requires
update_mmu_cache() to always be called on minor faults.

This patch reworks ptep_set_access_flags() semantics, implementations and
callers so that it's now responsible for returning whether an update is
necessary or not (basically whether the PTE actually changed).  This allow
fixing the sparc implementation to always return 1 on sun4c.

[akpm@linux-foundation.org: fixes, cleanups]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: David Miller <davem@davemloft.net>
Cc: Mark Fortescue <mark@mtfhpc.demon.co.uk>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-16 13:16:16 -07:00
Martin Schwidefsky
6c210482ae [S390] split page_test_and_clear_dirty.
The page_test_and_clear_dirty primitive really consists of two
operations, page_test_dirty and the page_clear_dirty. The combination
of the two is not an atomic operation, so it makes more sense to have
two separate operations instead of one.
In addition to the improved readability of the s390 version of
SetPageUptodate, it now avoids the page_test_dirty operation which is
an insert-storage-key-extended (iske) instruction which is an expensive
operation.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2007-04-27 16:01:46 +02:00