page_order() is called by memory hotplug's user interface to check the
section is removable or not. (is_mem_section_removable())
It calls page_order() withoug holding zone->lock.
So, even if the caller does
if (PageBuddy(page))
ret = page_order(page) ...
The caller may hit BUG_ON().
For fixing this, there are 2 choices.
1. add zone->lock.
2. remove BUG_ON().
is_mem_section_removable() is used for some "advice" and doesn't need to
be 100% accurate. This is_removable() can be called via user program..
We don't want to take this important lock for long by user's request. So,
this patch removes BUG_ON().
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
System management wants to subscribe to changes in swap configuration.
Make /proc/swaps pollable like /proc/mounts.
[akpm@linux-foundation.org: document proc_poll_event]
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Greg KH <greg@kroah.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes following warning from sparse:
mm/vmstat.c:466:5: warning: symbol 'fragmentation_index' was not declared. Should it be static?
[akpm@linux-foundation.org: move the include to top-of-file]
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename redundant 'tmp' to fix following sparse warnings:
mm/vmalloc.c:296:34: warning: symbol 'tmp' shadows an earlier one
mm/vmalloc.c:293:24: originally declared here
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The page_check_address() conditionally grabs *@ptlp in case of returning
non-NULL. Rename and wrap it using __cond_lock() removes following
warnings from sparse:
mm/rmap.c:472:9: warning: context imbalance in 'page_mapped_in_vma' - unexpected unlock
mm/rmap.c:524:9: warning: context imbalance in 'page_referenced_one' - unexpected unlock
mm/rmap.c:706:9: warning: context imbalance in 'page_mkclean_one' - unexpected unlock
mm/rmap.c:1066:9: warning: context imbalance in 'try_to_unmap_one' - unexpected unlock
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The page_lock_anon_vma() conditionally grabs RCU and anon_vma lock but
page_unlock_anon_vma() releases them unconditionally. This leads sparse
to complain about context imbalance. Annotate them.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The follow_pte() conditionally grabs *@ptlp in case of returning 0.
Rename and wrap it using __cond_lock() removes following warnings:
mm/memory.c:2337:9: warning: context imbalance in 'do_wp_page' - unexpected unlock
mm/memory.c:3142:19: warning: context imbalance in 'handle_mm_fault' - different lock contexts for basic block
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The do_wp_page() releases @ptl but was missing proper annotation. Add it.
This removes following warnings from sparse:
mm/memory.c:2337:9: warning: context imbalance in 'do_wp_page' - unexpected unlock
mm/memory.c:3142:19: warning: context imbalance in 'handle_mm_fault' - different lock contexts for basic block
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The get_locked_pte() conditionally grabs 'ptl' in case of returning
non-NULL. This leads sparse to complain about context imbalance. Rename
and wrap it using __cond_lock() to make sparse happy.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
'end' shadows earlier one and is not necessary at all. Remove it and use
'pos' instead. This removes following sparse warnings:
mm/filemap.c:2180:24: warning: symbol 'end' shadows an earlier one
mm/filemap.c:2132:25: originally declared here
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This change reduces mmap_sem hold times that are caused by waiting for
disk transfers when accessing file mapped VMAs.
It introduces the VM_FAULT_ALLOW_RETRY flag, which indicates that the call
site wants mmap_sem to be released if blocking on a pending disk transfer.
In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and
do_page_fault() will then re-acquire mmap_sem and retry the page fault.
It is expected that the retry will hit the same page which will now be
cached, and thus it will complete with a low mmap_sem hold time.
Tests:
- microbenchmark: thread A mmaps a large file and does random read accesses
to the mmaped area - achieves about 55 iterations/s. Thread B does
mmap/munmap in a loop at a separate location - achieves 55 iterations/s
before, 15000 iterations/s after.
- We are seeing related effects in some applications in house, which show
significant performance regressions when running without this change.
[akpm@linux-foundation.org: fix warning & crash]
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Ying Han <yinghan@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Buggy drivers (e.g. fsl_udc) could call dma_pool_alloc from atomic
context with GFP_KERNEL. In most instances, the first pool_alloc_page
call would succeed and the sleeping functions would never be called. This
allowed the buggy drivers to slip through the cracks.
Add a might_sleep_if() checking for __GFP_WAIT in flags.
Signed-off-by: Dima Zavin <dima@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Keep the current interface but ignore the KM_type and use a stack based
approach.
The advantage is that we get rid of crappy code like:
#define __KM_PTE \
(in_nmi() ? KM_NMI_PTE : \
in_irq() ? KM_IRQ_PTE : \
KM_PTE0)
and in general can stop worrying about what context we're in and what kmap
slots might be appropriate for that.
The downside is that FRV kmap_atomic() gets more expensive.
For now we use a CPP trick suggested by Andrew:
#define kmap_atomic(page, args...) __kmap_atomic(page)
to avoid having to touch all kmap_atomic() users in a single patch.
[ not compiled on:
- mn10300: the arch doesn't actually build with highmem to begin with ]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix up drivers/gpu/drm/i915/intel_overlay.c]
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a page has PG_referenced, shrink_page_list() discards it only if it
is not dirty. This rule works fine if the backing filesystem is a regular
one. PG_dirty is a good signal that the page was used recently because
the flusher threads clean pages periodically. In addition, page writeback
is costlier than simple page discard.
However, when a page is on tmpfs this heuristic doesn't work because
flusher threads don't write back tmpfs pages. Consequently tmpfs pages
always rotate around the lru twice at least and adds unnecessary lru
churn. Simple tmpfs streaming io shouldn't cause large anonymous page
swap-out.
Remove this unncessary reclaim bonus of tmpfs pages.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The dirty_ratio was silently limited in global_dirty_limits() to >= 5%.
This is not a user expected behavior. And it's inconsistent with
calc_period_shift(), which uses the plain vm_dirty_ratio value.
Let's remove the internal bound.
At the same time, fix balance_dirty_pages() to work with the
dirty_thresh=0 case. This allows applications to proceed when
dirty+writeback pages are all cleaned.
And ">" fits with the name "exceeded" better than ">=" does. Neil thinks
it is an aesthetic improvement as well as a functional one :)
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>
Proposed-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michael Rubin <mrubin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If congestion_wait() is called with no BDI congested, the caller will
sleep for the full timeout and this may be an unnecessary sleep. This
patch adds a wait_iff_congested() that checks congestion and only sleeps
if a BDI is congested else, it calls cond_resched() to ensure the caller
is not hogging the CPU longer than its quota but otherwise will not sleep.
This is aimed at reducing some of the major desktop stalls reported during
IO. For example, while kswapd is operating, it calls congestion_wait()
but it could just have been reclaiming clean page cache pages with no
congestion. Without this patch, it would sleep for a full timeout but
after this patch, it'll just call schedule() if it has been on the CPU too
long. Similar logic applies to direct reclaimers that are not making
enough progress.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>