Prakash Sangappa
81aac3a15e
userfaultfd: selftest: add tests for UFFD_FEATURE_SIGBUS feature
...
Add tests for the UFFD_FEATURE_SIGBUS feature. The tests verify signal
delivery instead of userfault events, and also exercise UFFDIO_COPY to
allocate memory and retry accessing the monitored area after signal
delivery.
Also fix a bug in uffd_poll_thread() where 'uffd' is leaked.
Link: http://lkml.kernel.org/r/1501552446-748335-3-git-send-email-prakash.sangappa@oracle.com
Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:29 -07:00
Prakash Sangappa
2d6d6f5a09
mm: userfaultfd: add feature to request signal delivery
...
In some cases, the userfaultfd mechanism should just deliver a SIGBUS
signal to the faulting process instead of a page-fault event. Dealing
with page-fault events using a monitor thread can be an overhead in
these cases. For example, applications such as databases could use the
signaling mechanism for robustness purposes.
Databases use hugetlbfs for performance reasons. Files on a hugetlbfs
filesystem are created and huge pages allocated using fallocate() API.
Pages are deallocated/freed using fallocate() hole punching support.
These files are mmapped and accessed by many processes as shared memory.
The database keeps track of which offsets in the hugetlbfs file have
pages allocated.
Any access to a mapped address over a hole in the file, which can occur
due to bugs in the application, is considered invalid, and the process
is expected to simply receive a SIGBUS. However, currently when a hole
in the file is accessed via the mapped address, kernel/mm attempts to
automatically allocate a page at page-fault time, implicitly filling
the hole in the file. This may not be the desired behavior for
applications like databases that want to explicitly manage page
allocations of hugetlbfs files.
Using the userfaultfd mechanism with this support to get a signal, a
database application can prevent pages from being allocated implicitly
when processes access mapped addresses over holes in the file.
This patch adds the UFFD_FEATURE_SIGBUS feature to the userfaultfd
mechanism to request delivery of a SIGBUS signal.
See the following for the previous discussion about the database
requirement leading to this proposal, as suggested by Andrea.
http://www.spinics.net/lists/linux-mm/msg129224.html
Link: http://lkml.kernel.org/r/1501552446-748335-2-git-send-email-prakash.sangappa@oracle.com
Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:29 -07:00
Michal Hocko
c41f012ade
mm: rename global_page_state to global_zone_page_state
...
global_page_state is error prone, as a recent bug report pointed out
[1]. It only returns proper values for zone-based counters, as the
enum it takes suggests. We already have global_node_page_state, so
let's rename global_page_state to global_zone_page_state to be more
explicit here. All existing users seem to be correct:
$ git grep "global_page_state(NR_" | sed 's@.*(\(NR_[A-Z_]*\)).*@\1@' | sort | uniq -c
2 NR_BOUNCE
2 NR_FREE_CMA_PAGES
11 NR_FREE_PAGES
1 NR_KERNEL_STACK_KB
1 NR_MLOCK
2 NR_PAGETABLE
This patch shouldn't introduce any functional change.
[1] http://lkml.kernel.org/r/201707260628.v6Q6SmaS030814@www262.sakura.ne.jp
Link: http://lkml.kernel.org/r/20170801134256.5400-2-hannes@cmpxchg.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:29 -07:00
Mike Kravetz
4da243ac1c
mm: shm: use new hugetlb size encoding definitions
...
Use the common definitions from hugetlb_encode.h header file for
encoding hugetlb size definitions in shmget system call flags.
In addition, move these definitions from the internal (kernel) to user
(uapi) header file.
Link: http://lkml.kernel.org/r/1501527386-10736-4-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Mike Kravetz
aafd4562df
mm: arch: consolidate mmap hugetlb size encodings
...
A non-default huge page size can be encoded in the flags argument of the
mmap system call. The definitions for these encodings are in arch
specific header files. However, all architectures use the same values.
Consolidate all the definitions in the primary user header file
(uapi/linux/mman.h). Include definitions for all known huge page sizes.
Use the generic encoding definitions in hugetlb_encode.h as the basis
for these definitions.
Link: http://lkml.kernel.org/r/1501527386-10736-3-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Mike Kravetz
e652f69459
mm: hugetlb: define system call hugetlb size encodings in single file
...
Patch series "Consolidate system call hugetlb page size encodings".
These patches are the result of discussions in
https://lkml.org/lkml/2017/3/8/548. The following changes are made in
the patch set:
1) Put all the log2-encoded huge page size definitions in a common
header file. The idea is to have a set of definitions that can be used
as the basis for system call specific definitions such as MAP_HUGE_*
and SHM_HUGE_*.
2) Remove MAP_HUGE_* definitions in arch specific files. All these
definitions are the same. Consolidate all definitions in the primary
user header file (uapi/linux/mman.h).
3) Remove SHM_HUGE_* definitions intended for user space from kernel
header file, and add to user (uapi/linux/shm.h) header file. Add
definitions for all known huge page size encodings as in mmap.
This patch (of 3):
If hugetlb pages are requested in the mmap or shmget system calls, a
huge page size other than the default can be specified. This is
accomplished by encoding the log2 of the huge page size in the upper
bits of the flag argument. asm-generic and arch-specific headers all
define the same values for these encodings.
Put common definitions in a single header file. The primary uapi header
files for mmap and shm will use these definitions as a basis for
definitions specific to those system calls.
Link: http://lkml.kernel.org/r/1501527386-10736-2-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Jeff Layton
a446d6f9ce
include/linux/fs.h: remove unneeded forward definition of mm_struct
...
Link: http://lkml.kernel.org/r/20170525102927.6163-1-jlayton@redhat.com
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Jeff Layton
de23abd151
fs/sync.c: remove unnecessary NULL f_mapping check in sync_file_range
...
The fsync codepath assumes that f_mapping can never be NULL, but
sync_file_range has a check for that.
Remove the check from sync_file_range, as I don't see how you'd ever
get a NULL pointer in here.
Link: http://lkml.kernel.org/r/20170525110509.9434-1-jlayton@redhat.com
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Mike Rapoport
824f973904
userfaultfd: selftest: enable testing of UFFDIO_ZEROPAGE for shmem
...
Link: http://lkml.kernel.org/r/1497939652-16528-8-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Mike Rapoport
ce53e8e6f2
userfaultfd: report UFFDIO_ZEROPAGE as available for shmem VMAs
...
Now that shmem VMAs can be filled with the zero page via userfaultfd,
we can report that UFFDIO_ZEROPAGE is available for those VMAs.
Link: http://lkml.kernel.org/r/1497939652-16528-7-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Mike Rapoport
8fb44e5403
userfaultfd: shmem: wire up shmem_mfill_zeropage_pte
...
For shmem VMAs, we can use shmem_mfill_zeropage_pte for UFFDIO_ZEROPAGE.
Link: http://lkml.kernel.org/r/1497939652-16528-6-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Mike Rapoport
3217d3c79b
userfaultfd: mcopy_atomic: introduce mfill_atomic_pte helper
...
Shuffle the code a bit to improve readability.
Link: http://lkml.kernel.org/r/1497939652-16528-5-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Mike Rapoport
8d10396342
userfaultfd: shmem: add shmem_mfill_zeropage_pte for userfaultfd support
...
shmem_mfill_zeropage_pte is the low level routine that implements the
userfaultfd UFFDIO_ZEROPAGE command. Since for shmem mappings zero
pages are always allocated and accounted, the new method is a slight
extension of the existing shmem_mcopy_atomic_pte.
Link: http://lkml.kernel.org/r/1497939652-16528-4-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Mike Rapoport
0f07969456
shmem: introduce shmem_inode_acct_block
...
shmem_acct_block and the update of used_blocks follow one another in
all the places they are used. Combine the two into a helper function.
Link: http://lkml.kernel.org/r/1497939652-16528-3-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Mike Rapoport
b1cc94ab2f
shmem: shmem_charge: verify max_blocks is not exceeded before inode update
...
Patch series "userfaultfd: enable zeropage support for shmem".
These patches enable support for UFFDIO_ZEROPAGE for shared memory.
The first two patches are not strictly related to userfaultfd; they
are just minor refactoring to reduce the amount of code duplication.
This patch (of 7):
Currently we update the inode and shmem_inode_info before verifying
that used_blocks will not exceed max_blocks, and undo the update if it
would. Switch the order and verify the block count before updating the
inode and shmem_inode_info.
Link: http://lkml.kernel.org/r/1497939652-16528-2-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Huang Ying
fe490cc0fe
mm, THP, swap: add THP swapping out fallback counting
...
When swapping out a THP (Transparent Huge Page), instead of swapping
out the THP as a whole, sometimes we have to fall back to splitting
the THP into normal pages before swapping, because no free swap
clusters are available, the cgroup limit is exceeded, etc. To count
these fallbacks, a new VM event, THP_SWPOUT_FALLBACK, is added and
counted when we fall back to splitting the THP.
Link: http://lkml.kernel.org/r/20170724051840.2309-13-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Huang Ying
bd4c82c22c
mm, THP, swap: delay splitting THP after swapped out
...
In this patch, splitting a transparent huge page (THP) during swap-out
is delayed from after adding the THP to the swap cache until after
swap-out finishes. After the patch, more operations for anonymous THP
reclaim, such as writing the THP to the swap device and removing the
THP from the swap cache, can be batched, improving the performance of
swapping out anonymous THPs.
This is the second step of THP swap support. The plan is to delay
splitting the THP step by step and finally avoid splitting it
altogether.
With the patchset, swap-out throughput improves 42% (from about
5.81GB/s to about 8.25GB/s) in the vm-scalability swap-w-seq test case
with 16 processes. At the same time, IPIs (reflecting TLB flushing)
are reduced by about 78.9%. The test was done on a Xeon E5 v3 system.
The swap device used is a RAM-simulated PMEM (persistent memory)
device. To test sequential swap-out, the test case creates 8
processes, which sequentially allocate and write to anonymous pages
until the RAM and part of the swap device are used up.
Link: http://lkml.kernel.org/r/20170724051840.2309-12-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Huang Ying
d6810d7300
memcg, THP, swap: make mem_cgroup_swapout() support THP
...
This patch makes mem_cgroup_swapout() work for transparent huge pages
(THP), moving the memory cgroup charge from memory to swap for a THP.
This will be used for THP swap support, where a THP may be swapped out
as a whole to a set of (HPAGE_PMD_NR) contiguous swap slots on the
swap device.
Link: http://lkml.kernel.org/r/20170724051840.2309-11-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Shaohua Li <shli@kernel.org>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Huang Ying
abe2895b76
memcg, THP, swap: avoid duplicated charging of THP in swap cache
...
For a THP (Transparent Huge Page), tail_page->mem_cgroup is NULL, so
to check whether the page is already charged, we need to check the
head page. This was not an issue before, because a THP could never be
in the swap cache. But after we add support for delaying THP splitting
until after swap-out, it is possible now.
Link: http://lkml.kernel.org/r/20170724051840.2309-10-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Shaohua Li <shli@kernel.org>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Huang Ying
3e14a57b24
memcg, THP, swap: support moving mem cgroup charge for swapped-out THP
...
A PTE-mapped THP (Transparent Huge Page) is ignored when moving a
memory cgroup charge. But for a THP in the swap cache, the current
implementation may move the memory cgroup charge for the swap entry of
a tail page. That isn't correct, because the swap charges for all
sub-pages of a THP should be moved together. Following the handling of
PTE-mapped THPs, charge moving for the swap entry of a THP tail page
is now ignored too.
Link: http://lkml.kernel.org/r/20170724051840.2309-9-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Shaohua Li <shli@kernel.org>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Huang Ying
59807685a7
mm, THP, swap: support splitting THP for THP swap out
...
After adding swap-out support for THP (Transparent Huge Page), it is
possible that a THP in the swap cache (partly swapped out) needs to be
split. To split such a THP, the swap cluster backing it needs to be
split too; that is, the CLUSTER_FLAG_HUGE flag needs to be cleared for
the swap cluster. This patch implements that.
Because writing a THP to swap requires that it stay a huge page during
the write, the PageWriteback flag is checked before splitting.
Link: http://lkml.kernel.org/r/20170724051840.2309-8-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Huang Ying
225311a464
mm: test code to write THP to swap device as a whole
...
To support delaying the splitting of a THP (Transparent Huge Page)
until after swap-out, we need to enhance the swap writing code to
support writing a THP as a whole. This will improve swap write IO
performance.
As Ming Lei <ming.lei@redhat.com> pointed out, this should be based on
multipage bvec support, which hasn't been merged yet. So this patch is
only for testing the functionality of the other patches in the series,
and will be reimplemented after multipage bvec support is merged.
Link: http://lkml.kernel.org/r/20170724051840.2309-7-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Shaohua Li <shli@kernel.org>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Huang Ying
98cc093cba
block, THP: make block_device_operations.rw_page support THP
...
The .rw_page method in struct block_device_operations is used by the
swap subsystem to read/write page contents from/into the corresponding
swap slot in the swap device. To support the THP (Transparent Huge
Page) swap optimization, .rw_page is enhanced to read/write a THP when
possible.
Link: http://lkml.kernel.org/r/20170724051840.2309-6-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:27 -07:00
Huang Ying
f0eea189e8
mm, THP, swap: don't allocate huge cluster for file backed swap device
...
It's hard to write a whole transparent huge page (THP) to a file-backed
swap device during swap-out, and file-backed swap devices aren't very
popular. So huge cluster allocation is disabled for file-backed swap
devices.
Link: http://lkml.kernel.org/r/20170724051840.2309-5-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:27 -07:00
Huang Ying
ba3c4ce6de
mm, THP, swap: make reuse_swap_page() work for swapped-out THP
...
Now that splitting a THP (Transparent Huge Page) can be delayed until
after swap-out, it is possible that some page table mappings of the
THP have been turned into swap entries. So reuse_swap_page() needs to
check the swap count in addition to the map count, as before. This
patch does that.
In the huge-PMD write-protect fault handler, the swap count needs to
be checked in addition to the page map count, so the page lock must be
acquired in addition to the page table lock when calling
reuse_swap_page().
[ying.huang@intel.com: silence a compiler warning]
Link: http://lkml.kernel.org/r/87bmnzizjy.fsf@yhuang-dev.intel.com
Link: http://lkml.kernel.org/r/20170724051840.2309-4-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:27 -07:00