Introduce a new API vmemmap_free() to free and remove vmemmap
pagetables. Since pagetable implements are different, each architecture
has to provide its own version of vmemmap_free(), just like
vmemmap_populate().
Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc.
[mhocko@suse.cz: fix implicit declaration of remove_pagetable]
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When memory is removed, the corresponding pagetables should alse be
removed. This patch introduces some common APIs to support vmemmap
pagetable and x86_64 architecture direct mapping pagetable removing.
All pages of virtual mapping in removed memory cannot be freed if some
pages used as PGD/PUD include not only removed memory but also other
memory. So this patch uses the following way to check whether a page
can be freed or not.
1) When removing memory, the page structs of the removed memory are
filled with 0FD.
2) All page structs are filled with 0xFD on PT/PMD, PT/PMD can be
cleared. In this case, the page used as PT/PMD can be freed.
For direct mapping pages, update direct_pages_count[level] when we freed
their pagetables. And do not free the pages again because they were
freed when offlining.
For vmemmap pages, free the pages and their pagetables.
For larger pages, do not split them into smaller ones because there is
no way to know if the larger page has been split. As a result, there is
no way to decide when to split. We deal the larger pages in the
following way:
1) For direct mapped pages, all the pages were freed when they were
offlined. And since menmory offline is done section by section, all
the memory ranges being removed are aligned to PAGE_SIZE. So only need
to deal with unaligned pages when freeing vmemmap pages.
2) For vmemmap pages being used to store page_struct, if part of the
larger page is still in use, just fill the unused part with 0xFD. And
when the whole page is fulfilled with 0xFD, then free the larger page.
[akpm@linux-foundation.org: fix typo in comment]
[tangchen@cn.fujitsu.com: do not calculate direct mapping pages when freeing vmemmap pagetables]
[tangchen@cn.fujitsu.com: do not free direct mapping pages twice]
[tangchen@cn.fujitsu.com: do not free page split from hugepage one by one]
[tangchen@cn.fujitsu.com: do not split pages when freeing pagetable pages]
[akpm@linux-foundation.org: use pmd_page_vaddr()]
[akpm@linux-foundation.org: fix used-uninitialised bug]
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In __remove_section(), we locked pgdat_resize_lock when calling
sparse_remove_one_section(). This lock will disable irq. But we don't
need to lock the whole function. If we do some work to free pagetables
in free_section_usemap(), we need to call flush_tlb_all(), which need
irq enabled. Otherwise the WARN_ON_ONCE() in smp_call_function_many()
will be triggered.
If we lock the whole sparse_remove_one_section(), then we come to this call trace:
------------[ cut here ]------------
WARNING: at kernel/smp.c:461 smp_call_function_many+0xbd/0x260()
Hardware name: PRIMEQUEST 1800E
......
Call Trace:
smp_call_function_many+0xbd/0x260
smp_call_function+0x3b/0x50
on_each_cpu+0x3b/0xc0
flush_tlb_all+0x1c/0x20
remove_pagetable+0x14e/0x1d0
vmemmap_free+0x18/0x20
sparse_remove_one_section+0xf7/0x100
__remove_section+0xa2/0xb0
__remove_pages+0xa0/0xd0
arch_remove_memory+0x6b/0xc0
remove_memory+0xb8/0xf0
acpi_memory_device_remove+0x53/0x96
acpi_device_remove+0x90/0xb2
__device_release_driver+0x7c/0xf0
device_release_driver+0x2f/0x50
acpi_bus_remove+0x32/0x6d
acpi_bus_trim+0x91/0x102
acpi_bus_hot_remove_device+0x88/0x16b
acpi_os_execute_deferred+0x27/0x34
process_one_work+0x20e/0x5c0
worker_thread+0x12e/0x370
kthread+0xee/0x100
ret_from_fork+0x7c/0xb0
---[ end trace 25e85300f542aa01 ]---
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For removing memmap region of sparse-vmemmap which is allocated bootmem,
memmap region of sparse-vmemmap needs to be registered by
get_page_bootmem(). So the patch searches pages of virtual mapping and
registers the pages by get_page_bootmem().
NOTE: register_page_bootmem_memmap() is not implemented for ia64,
ppc, s390, and sparc. So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE
and revert register_page_bootmem_info_node() when platform doesn't
support it.
It's implemented by adding a new Kconfig option named
CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected
by memory-hotplug feature fully supported archs(currently only on
x86_64).
Since we have 2 config options called MEMORY_HOTPLUG and
MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately,
and codes in function register_page_bootmem_info_node() are only
used for collecting infomation for hot-remove, so reside it under
MEMORY_HOTREMOVE.
Besides page_isolation.c selected by MEMORY_ISOLATION under
MEMORY_HOTPLUG is also such case, move it too.
[mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE]
[linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()]
[mhocko@suse.cz: remove the arch specific functions without any implementation]
[linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed]
[rientjes@google.com: fix defined but not used warning]
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wu Jianguo <wujianguo@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start,
type} sysfs files are created. But there is no code to remove these
files. This patch implements the function to remove them.
We cannot free firmware_map_entry which is allocated by bootmem because
there is no way to do so when the system is up. But we can at least
remember the address of that memory and reuse the storage when the
memory is added next time.
This patch also introduces a new list map_entries_bootmem to link the
map entries allocated by bootmem when they are removed, and a lock to
protect it. And these entries will be reused when the memory is
hot-added again.
The idea is suggestted by Andrew Morton.
NOTE: It is unsafe to return an entry pointer and release the
map_entries_lock. So we should not hold the map_entries_lock
separately in firmware_map_find_entry() and
firmware_map_remove_entry(). Hold the map_entries_lock across find
and remove /sys/firmware/memmap/X operation.
And also, users of these two functions need to be careful to
hold the lock when using these two functions.
[tangchen@cn.fujitsu.com: Hold spinlock across find|remove /sys operation]
[tangchen@cn.fujitsu.com: fix the wrong comments of map_entries]
[tangchen@cn.fujitsu.com: reuse the storage of /sys/firmware/memmap/X/ allocated by bootmem]
[tangchen@cn.fujitsu.com: fix section mismatch problem]
[tangchen@cn.fujitsu.com: fix the doc format in drivers/firmware/memmap.c]
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Julian Calaby <julian.calaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We remove the memory like this:
1. lock memory hotplug
2. offline a memory block
3. unlock memory hotplug
4. repeat 1-3 to offline all memory blocks
5. lock memory hotplug
6. remove memory(TODO)
7. unlock memory hotplug
All memory blocks must be offlined before removing memory. But we don't
hold the lock in the whole operation. So we should check whether all
memory blocks are offlined before step6. Otherwise, kernel maybe
panicked.
Offlining a memory block and removing a memory device can be two
different operations. Users can just offline some memory blocks without
removing the memory device. For this purpose, the kernel has held
lock_memory_hotplug() in __offline_pages(). To reuse the code for
memory hot-remove, we repeat step 1-3 to offline all the memory blocks,
repeatedly lock and unlock memory hotplug, but not hold the memory
hotplug lock in the whole operation.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memory can't be offlined when CONFIG_MEMCG is selected. For example:
there is a memory device on node 1. The address range is [1G, 1.5G).
You will find 4 new directories memory8, memory9, memory10, and memory11
under the directory /sys/devices/system/memory/.
If CONFIG_MEMCG is selected, we will allocate memory to store page
cgroup when we online pages. When we online memory8, the memory stored
page cgroup is not provided by this memory device. But when we online
memory9, the memory stored page cgroup may be provided by memory8. So
we can't offline memory8 now. We should offline the memory in the
reversed order.
When the memory device is hotremoved, we will auto offline memory
provided by this memory device. But we don't know which memory is
onlined first, so offlining memory may fail. In such case, iterate
twice to offline the memory. 1st iterate: offline every non primary
memory block. 2nd iterate: offline primary (i.e. first added) memory
block.
This idea is suggested by KOSAKI Motohiro.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
do_mmap_pgoff() rounds up the desired size to the next PAGE_SIZE
multiple, however there was no equivalent code in mm_populate(), which
caused issues.
This could be fixed by introduced the same rounding in mm_populate(),
however I think it's preferable to make do_mmap_pgoff() return populate
as a size rather than as a boolean, so we don't have to duplicate the
size rounding logic in mm_populate().
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Ungerer <gregungerer@westnet.com.au>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The vm_populate() code populates user mappings without constantly
holding the mmap_sem. This makes it susceptible to racy userspace
programs: the user mappings may change while vm_populate() is running,
and in this case vm_populate() may end up populating the new mapping
instead of the old one.
In order to reduce the possibility of userspace getting surprised by
this behavior, this change introduces the VM_POPULATE vma flag which
gets set on vmas we want vm_populate() to work on. This way
vm_populate() may still end up populating the new mapping after such a
race, but only if the new mapping is also one that the user has
requested (using MAP_SHARED, MAP_LOCKED or mlock) to be populated.
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Ungerer <gregungerer@westnet.com.au>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In find_extend_vma(), we don't need mlock_vma_pages_range() to verify
the vma type - we know we're working with a stack. So, we can call
directly into __mlock_vma_pages_range(), and remove the last
make_pages_present() call site.
Note that we don't use mm_populate() here, so we can't release the
mmap_sem while allocating new stack pages. This is deemed acceptable,
because the stack vmas grow by a bounded number of pages at a time, and
these are anon pages so we don't have to read from disk to populate
them.
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Ungerer <gregungerer@westnet.com.au>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After the MAP_POPULATE handling has been moved to mmap_region() call
sites, the only remaining use of the flags argument is to pass the
MAP_NORESERVE flag. This can be just as easily handled by
do_mmap_pgoff(), so do that and remove the mmap_region() flags
parameter.
[akpm@linux-foundation.org: remove double parens]
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Ungerer <gregungerer@westnet.com.au>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When creating new mappings using the MAP_POPULATE / MAP_LOCKED flags (or
with MCL_FUTURE in effect), we want to populate the pages within the
newly created vmas. This may take a while as we may have to read pages
from disk, so ideally we want to do this outside of the write-locked
mmap_sem region.
This change introduces mm_populate(), which is used to defer populating
such mappings until after the mmap_sem write lock has been released.
This is implemented as a generalization of the former do_mlock_pages(),
which accomplished the same task but was using during mlock() /
mlockall().
Signed-off-by: Michel Lespinasse <walken@google.com>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Ungerer <gregungerer@westnet.com.au>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have many vma manipulation functions that are fast in the typical
case, but can optionally be instructed to populate an unbounded number
of ptes within the region they work on:
- mmap with MAP_POPULATE or MAP_LOCKED flags;
- remap_file_pages() with MAP_NONBLOCK not set or when working on a
VM_LOCKED vma;
- mmap_region() and all its wrappers when mlock(MCL_FUTURE) is in
effect;
- brk() when mlock(MCL_FUTURE) is in effect.
Current code handles these pte operations locally, while the
sourrounding code has to hold the mmap_sem write side since it's
manipulating vmas. This means we're doing an unbounded amount of pte
population work with mmap_sem held, and this causes problems as Andy
Lutomirski reported (we've hit this at Google as well, though it's not
entirely clear why people keep trying to use mlock(MCL_FUTURE) in the
first place).
I propose introducing a new mm_populate() function to do this pte
population work after the mmap_sem has been released. mm_populate()
does need to acquire the mmap_sem read side, but critically, it doesn't
need to hold it continuously for the entire duration of the operation -
it can drop it whenever things take too long (such as when hitting disk
for a file read) and re-acquire it later on.
The following patches are included
- Patches 1 fixes some issues I noticed while working on the existing code.
If needed, they could potentially go in before the rest of the patches.
- Patch 2 introduces the new mm_populate() function and changes
mmap_region() call sites to use it after they drop mmap_sem. This is
inspired from Andy Lutomirski's proposal and is built as an extension
of the work I had previously done for mlock() and mlockall() around
v2.6.38-rc1. I had tried doing something similar at the time but had
given up as there were so many do_mmap() call sites; the recent cleanups
by Linus and Viro are a tremendous help here.
- Patches 3-5 convert some of the less-obvious places doing unbounded
pte populates to the new mm_populate() mechanism.
- Patches 6-7 are code cleanups that are made possible by the
mm_populate() work. In particular, they remove more code than the
entire patch series added, which should be a good thing :)
- Patch 8 is optional to this entire series. It only helps to deal more
nicely with racy userspace programs that might modify their mappings
while we're trying to populate them. It adds a new VM_POPULATE flag
on the mappings we do want to populate, so that if userspace replaces
them with mappings it doesn't want populated, mm_populate() won't
populate those replacement mappings.
This patch:
Assorted small fixes. The first two are quite small:
- Move check for vma->vm_private_data && !(vma->vm_flags & VM_NONLINEAR)
within existing if (!(vma->vm_flags & VM_NONLINEAR)) block.
Purely cosmetic.
- In the VM_LOCKED case, when dropping PG_Mlocked for the over-mapped
range, make sure we own the mmap_sem write lock around the
munlock_vma_pages_range call as this manipulates the vma's vm_flags.
Last fix requires a longer explanation. remap_file_pages() can do its work
either through VM_NONLINEAR manipulation or by creating extra vmas.
These two cases were inconsistent with each other (and ultimately, both wrong)
as to exactly when did they fault in the newly mapped file pages:
- In the VM_NONLINEAR case, new file pages would be populated if
the MAP_NONBLOCK flag wasn't passed. If MAP_NONBLOCK was passed,
new file pages wouldn't be populated even if the vma is already
marked as VM_LOCKED.
- In the linear (emulated) case, the work is passed to the mmap_region()
function which would populate the pages if the vma is marked as
VM_LOCKED, and would not otherwise - regardless of the value of the
MAP_NONBLOCK flag, because MAP_POPULATE wasn't being passed to
mmap_region().
The desired behavior is that we want the pages to be populated and locked
if the vma is marked as VM_LOCKED, or to be populated if the MAP_NONBLOCK
flag is not passed to remap_file_pages().
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Ungerer <gregungerer@westnet.com.au>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that balance_pgdat() is slightly tidied up, thanks to more capable
pgdat_balanced(), it's become obvious that pgdat_balanced() is called to
check the status, then break the loop if pgdat is balanced, just to be
immediately called again. The second call is completely unnecessary, of
course.
The patch introduces pgdat_is_balanced boolean, which helps resolve the
above suboptimal behavior, with the added benefit of slightly better
documenting one other place in the function where we jump and skip lots
of code.
Signed-off-by: Zlatko Calusic <zlatko.calusic@iskon.hr>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>