Kefeng Wang
60115fa54a
mm: defer kmemleak object creation of module_alloc()
...
Yongqiang reports a kmemleak panic on module insmod/rmmod with KASAN
enabled (without KASAN_VMALLOC) on x86 [1].
When the module area allocates memory, its kmemleak object is created
successfully, but the KASAN shadow memory for the module allocation is not
ready yet, so when kmemleak scans the module's pointers, it panics under
the KASAN check because there is no shadow memory.
module_alloc
  __vmalloc_node_range
    kmemleak_vmalloc
                            kmemleak_scan
                              update_checksum
  kasan_module_alloc
    kmemleak_ignore
Note that there is no problem if KASAN_VMALLOC is enabled, because the
module area's entire shadow memory is preallocated. Thus, the bug only
exists on architectures that support dynamic shadow allocation of the
module area per module load; for now, only x86/arm64/s390 are involved.
Add a VM_DEFER_KMEMLEAK flag and defer the kmemleak registration of the
vmalloc'ed object in module_alloc() to fix this issue.
[1] https://lore.kernel.org/all/6d41e2b9-4692-5ec4-b1cd-cbe29ae89739@huawei.com/
[wangkefeng.wang@huawei.com: fix build]
Link: https://lkml.kernel.org/r/20211125080307.27225-1-wangkefeng.wang@huawei.com
[akpm@linux-foundation.org: simplify ifdefs, per Andrey]
Link: https://lkml.kernel.org/r/CA+fCnZcnwJHUQq34VuRxpdoY6_XbJCDJ-jopksS5Eia4PijPzw@mail.gmail.com
Link: https://lkml.kernel.org/r/20211124142034.192078-1-wangkefeng.wang@huawei.com
Fixes: 793213a82d ("s390/kasan: dynamic shadow mem allocation for modules")
Fixes: 39d114ddc6 ("arm64: add KASAN support")
Fixes: bebf56a1b1 ("kasan: enable instrumentation of global variables")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com >
Reported-by: Yongqiang Liu <liuyongqiang13@huawei.com >
Cc: Andrey Konovalov <andreyknvl@gmail.com >
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com >
Cc: Dmitry Vyukov <dvyukov@google.com >
Cc: Catalin Marinas <catalin.marinas@arm.com >
Cc: Will Deacon <will@kernel.org >
Cc: Heiko Carstens <hca@linux.ibm.com >
Cc: Vasily Gorbik <gor@linux.ibm.com >
Cc: Christian Borntraeger <borntraeger@linux.ibm.com >
Cc: Alexander Gordeev <agordeev@linux.ibm.com >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Dave Hansen <dave.hansen@linux.intel.com >
Cc: Alexander Potapenko <glider@google.com >
Cc: Kefeng Wang <wangkefeng.wang@huawei.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2022-01-15 16:30:25 +02:00
Peter Zijlstra
bd1a8fb2d4
mm/vmalloc: don't allow VM_NO_GUARD on vmap()
...
The vmalloc guard pages are added on top of each allocation, thereby
isolating any two allocations from one another. The top guard of the
lower allocation is the bottom guard of the higher allocation, etc.
Therefore VM_NO_GUARD is dangerous; it breaks the basic premise of
isolating separate allocations.
There are only two in-tree users of this flag, neither of which use it
through the exported interface. Ensure it stays this way.
Link: https://lkml.kernel.org/r/YUMfdA36fuyZ+/xt@hirez.programming.kicks-ass.net
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org >
Reviewed-by: Christoph Hellwig <hch@lst.de >
Reviewed-by: David Hildenbrand <david@redhat.com >
Acked-by: Will Deacon <will@kernel.org >
Acked-by: Kees Cook <keescook@chromium.org >
Cc: Andrey Konovalov <andreyknvl@gmail.com >
Cc: Mel Gorman <mgorman@suse.de >
Cc: Uladzislau Rezki <urezki@gmail.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-11-06 13:30:36 -07:00
Kees Cook
894f24bb56
mm/vmalloc: add __alloc_size attributes for better bounds checking
...
As already done in GrapheneOS, add the __alloc_size attribute for
appropriate vmalloc allocator interfaces, to provide additional hinting
for better bounds checking, assisting CONFIG_FORTIFY_SOURCE and other
compiler optimizations.
Link: https://lkml.kernel.org/r/20210930222704.2631604-7-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org >
Co-developed-by: Daniel Micay <danielmicay@gmail.com >
Signed-off-by: Daniel Micay <danielmicay@gmail.com >
Cc: Andy Whitcroft <apw@canonical.com >
Cc: Christoph Lameter <cl@linux.com >
Cc: David Rientjes <rientjes@google.com >
Cc: Dennis Zhou <dennis@kernel.org >
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com >
Cc: Joe Perches <joe@perches.com >
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com >
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com >
Cc: Miguel Ojeda <ojeda@kernel.org >
Cc: Nathan Chancellor <nathan@kernel.org >
Cc: Nick Desaulniers <ndesaulniers@google.com >
Cc: Pekka Enberg <penberg@kernel.org >
Cc: Tejun Heo <tj@kernel.org >
Cc: Vlastimil Babka <vbabka@suse.cz >
Cc: Alexandre Bounine <alex.bou9@gmail.com >
Cc: Gustavo A. R. Silva <gustavoars@kernel.org >
Cc: Ira Weiny <ira.weiny@intel.com >
Cc: Jing Xiangfeng <jingxiangfeng@huawei.com >
Cc: John Hubbard <jhubbard@nvidia.com >
Cc: kernel test robot <lkp@intel.com >
Cc: Matt Porter <mporter@kernel.crashing.org >
Cc: Randy Dunlap <rdunlap@infradead.org >
Cc: Souptick Joarder <jrdr.linux@gmail.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-11-06 13:30:34 -07:00
Christoph Hellwig
82a70ce042
mm: move ioremap_page_range to vmalloc.c
...
Patch series "small ioremap cleanups".
The first patch moves a little code around the vmalloc/ioremap boundary
following a bigger move by Nick earlier. The second enforces
non-executable mappings on ioremap, just like we do for vmap. No driver
currently uses executable mappings anyway, nor should they.
This patch (of 2):
This keeps ioremap_page_range together with its implementation, and
allows removing the vmap_range wrapper.
Link: https://lkml.kernel.org/r/20210824091259.1324527-1-hch@lst.de
Link: https://lkml.kernel.org/r/20210824091259.1324527-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de >
Reviewed-by: Nicholas Piggin <npiggin@gmail.com >
Cc: Peter Zijlstra <peterz@infradead.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-09-08 11:50:24 -07:00
Zhen Lei
06c8839815
mm: fix spelling mistakes in header files
...
Fix some spelling mistakes in comments:
successfull ==> successful
potentialy ==> potentially
alloced ==> allocated
indicies ==> indices
wont ==> won't
resposible ==> responsible
dirtyness ==> dirtiness
droppped ==> dropped
alread ==> already
occured ==> occurred
interupts ==> interrupts
extention ==> extension
slighly ==> slightly
Dont't ==> Don't
Link: https://lkml.kernel.org/r/20210531034849.9549-2-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com >
Cc: Jerome Glisse <jglisse@redhat.com >
Cc: Mike Kravetz <mike.kravetz@oracle.com >
Cc: Dennis Zhou <dennis@kernel.org >
Cc: Tejun Heo <tj@kernel.org >
Cc: Christoph Lameter <cl@linux.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-07-08 11:48:21 -07:00
Christophe Leroy
3382bbee04
mm/vmalloc: enable mapping of huge pages at pte level in vmalloc
...
On some architectures like powerpc, there are huge pages that are mapped
at pte level.
Enable it in vmalloc.
For that, architectures can provide arch_vmap_pte_supported_shift() that
returns the shift for pages to map at pte level.
Link: https://lkml.kernel.org/r/2c717e3b1fba1894d890feb7669f83025bfa314d.1620795204.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu >
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org >
Cc: Michael Ellerman <mpe@ellerman.id.au >
Cc: Mike Kravetz <mike.kravetz@oracle.com >
Cc: Mike Rapoport <rppt@kernel.org >
Cc: Nicholas Piggin <npiggin@gmail.com >
Cc: Paul Mackerras <paulus@samba.org >
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-06-30 20:47:26 -07:00
Christophe Leroy
f7ee1f13d6
mm/vmalloc: enable mapping of huge pages at pte level in vmap
...
On some architectures like powerpc, there are huge pages that are mapped
at pte level.
Enable it in vmap.
For that, architectures can provide arch_vmap_pte_range_map_size() that
returns the size of pages to map at pte level.
Link: https://lkml.kernel.org/r/fb3ccc73377832ac6708181ec419128a2f98ce36.1620795204.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu >
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org >
Cc: Michael Ellerman <mpe@ellerman.id.au >
Cc: Mike Kravetz <mike.kravetz@oracle.com >
Cc: Mike Rapoport <rppt@kernel.org >
Cc: Nicholas Piggin <npiggin@gmail.com >
Cc: Paul Mackerras <paulus@samba.org >
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-06-30 20:47:26 -07:00
Claudio Imbrenda
15a64f5a88
mm/vmalloc: add vmalloc_no_huge
...
Patch series "mm: add vmalloc_no_huge and use it", v4.
Add vmalloc_no_huge() and export it, so modules can allocate memory with
small pages.
Use the newly added vmalloc_no_huge() in KVM on s390 to get around a
hardware limitation.
This patch (of 2):
Commit 121e6f3258 ("mm/vmalloc: hugepage vmalloc mappings") added
support for hugepage vmalloc mappings; it also added the flag
VM_NO_HUGE_VMAP for __vmalloc_node_range to request that the allocation
be performed with 0-order non-huge pages.
This flag is not accessible when calling vmalloc(); the only option is to
call __vmalloc_node_range() directly, which is not exported.
This means that a module can't vmalloc memory with small pages.
Case in point: KVM on s390x needs to vmalloc a large area, and it needs
to be mapped with non-huge pages, because of a hardware limitation.
This patch adds the function vmalloc_no_huge, which works like vmalloc,
but it is guaranteed to always back the mapping using small pages. This
new function is exported, therefore it is usable by modules.
[akpm@linux-foundation.org: whitespace fixes, per Christoph]
Link: https://lkml.kernel.org/r/20210614132357.10202-1-imbrenda@linux.ibm.com
Link: https://lkml.kernel.org/r/20210614132357.10202-2-imbrenda@linux.ibm.com
Fixes: 121e6f3258 ("mm/vmalloc: hugepage vmalloc mappings")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com >
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com >
Acked-by: Nicholas Piggin <npiggin@gmail.com >
Reviewed-by: David Hildenbrand <david@redhat.com >
Acked-by: David Rientjes <rientjes@google.com >
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com >
Cc: Catalin Marinas <catalin.marinas@arm.com >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Christoph Hellwig <hch@infradead.org >
Cc: Cornelia Huck <cohuck@redhat.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-06-24 19:40:53 -07:00
Ingo Molnar
f0953a1bba
mm: fix typos in comments
...
Fix ~94 single-word typos in locking code comments, plus a few
very obvious grammar mistakes.
Link: https://lkml.kernel.org/r/20210322212624.GA1963421@gmail.com
Link: https://lore.kernel.org/r/20210322205203.GB1959563@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org >
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org >
Reviewed-by: Randy Dunlap <rdunlap@infradead.org >
Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-05-07 00:26:35 -07:00
David Hildenbrand
f7c8ce44eb
mm/vmalloc: remove vwrite()
...
The last user (/dev/kmem) is gone. Let's drop it.
Link: https://lkml.kernel.org/r/20210324102351.6932-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com >
Acked-by: Michal Hocko <mhocko@suse.com >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: Hillf Danton <hdanton@sina.com >
Cc: Matthew Wilcox <willy@infradead.org >
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com >
Cc: Steven Rostedt <rostedt@goodmis.org >
Cc: Minchan Kim <minchan@kernel.org >
Cc: huang ying <huang.ying.caritas@gmail.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-05-07 00:26:34 -07:00
David Hildenbrand
bbcd53c960
drivers/char: remove /dev/kmem for good
...
Patch series "drivers/char: remove /dev/kmem for good".
Exploring /dev/kmem and /dev/mem in the context of memory hot(un)plug and
memory ballooning, I started questioning the existence of /dev/kmem.
Comparing it with the /proc/kcore implementation, it does not seem to be
able to deal with things like
a) Pages unmapped from the direct mapping (e.g., to be used by secretmem)
-> kern_addr_valid(). virt_addr_valid() is not sufficient.
b) Special cases like gart aperture memory that is not to be touched
-> mem_pfn_is_ram()
Unless I am missing something, it's at least broken in some cases and
might fault/crash the machine.
Its existence has been questioned before, in 2005 and 2010 [1]; after ~11
additional years, it might make sense to revive the discussion.
CONFIG_DEVKMEM is only enabled in a single defconfig (on purpose or by
mistake?). All distributions disable it: in Ubuntu it has been disabled
for more than 10 years, in Debian since 2.6.31, in Fedora at least
starting with FC3, in RHEL starting with RHEL4, in SUSE starting from
15sp2, and OpenSUSE has it disabled as well.
1) /dev/kmem was popular for rootkits [2] before it got disabled
basically everywhere. Ubuntu documents [3] "There is no modern user of
/dev/kmem any more beyond attackers using it to load kernel rootkits.".
RHEL documents in a BZ [5] "it served no practical purpose other than to
serve as a potential security problem or to enable binary module drivers
to access structures/functions they shouldn't be touching"
2) /proc/kcore is a decent interface to have a controlled way to read
kernel memory for debugging purposes. (will need some extensions to
deal with memory offlining/unplug, memory ballooning, and poisoned
pages, though)
3) It might be useful for corner case debugging [1]. KDB/KGDB might be a
better fit, especially for writing random memory; it is harder to shoot
yourself in the foot.
4) "Kernel Memory Editor" [4] hasn't seen any updates since 2000 and seems
to be incompatible with 64bit [1]. For educational purposes,
/proc/kcore might be used to monitor value updates -- or older
kernels can be used.
5) It's broken on arm64, and therefore, completely disabled there.
Looks like it's essentially unused and has been replaced by better
suited interfaces for individual tasks (/proc/kcore, KDB/KGDB). Let's
just remove it.
[1] https://lwn.net/Articles/147901/
[2] https://www.linuxjournal.com/article/10505
[3] https://wiki.ubuntu.com/Security/Features#A.2Fdev.2Fkmem_disabled
[4] https://sourceforge.net/projects/kme/
[5] https://bugzilla.redhat.com/show_bug.cgi?id=154796
Link: https://lkml.kernel.org/r/20210324102351.6932-1-david@redhat.com
Link: https://lkml.kernel.org/r/20210324102351.6932-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com >
Acked-by: Michal Hocko <mhocko@suse.com >
Acked-by: Kees Cook <keescook@chromium.org >
Cc: Linus Torvalds <torvalds@linux-foundation.org >
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org >
Cc: "Alexander A. Klimov" <grandmaster@al2klimov.de >
Cc: Alexander Viro <viro@zeniv.linux.org.uk >
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com >
Cc: Andrew Lunn <andrew@lunn.ch >
Cc: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com >
Cc: Arnd Bergmann <arnd@arndb.de >
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org >
Cc: Brian Cain <bcain@codeaurora.org >
Cc: Christian Borntraeger <borntraeger@de.ibm.com >
Cc: Christophe Leroy <christophe.leroy@csgroup.eu >
Cc: Chris Zankel <chris@zankel.net >
Cc: Corentin Labbe <clabbe@baylibre.com >
Cc: "David S. Miller" <davem@davemloft.net >
Cc: "Eric W. Biederman" <ebiederm@xmission.com >
Cc: Geert Uytterhoeven <geert@linux-m68k.org >
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com >
Cc: Greentime Hu <green.hu@gmail.com >
Cc: Gregory Clement <gregory.clement@bootlin.com >
Cc: Heiko Carstens <hca@linux.ibm.com >
Cc: Helge Deller <deller@gmx.de >
Cc: Hillf Danton <hdanton@sina.com >
Cc: huang ying <huang.ying.caritas@gmail.com >
Cc: Ingo Molnar <mingo@kernel.org >
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru >
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com >
Cc: James Troup <james.troup@canonical.com >
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com >
Cc: Jonas Bonn <jonas@southpole.se >
Cc: Jonathan Corbet <corbet@lwn.net >
Cc: Kairui Song <kasong@redhat.com >
Cc: Krzysztof Kozlowski <krzk@kernel.org >
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com >
Cc: Liviu Dudau <liviu.dudau@arm.com >
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com >
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com >
Cc: Luis Chamberlain <mcgrof@kernel.org >
Cc: Matthew Wilcox <willy@infradead.org >
Cc: Matt Turner <mattst88@gmail.com >
Cc: Max Filippov <jcmvbkbc@gmail.com >
Cc: Michael Ellerman <mpe@ellerman.id.au >
Cc: Mike Rapoport <rppt@kernel.org >
Cc: Mikulas Patocka <mpatocka@redhat.com >
Cc: Minchan Kim <minchan@kernel.org >
Cc: Niklas Schnelle <schnelle@linux.ibm.com >
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com >
Cc: openrisc@lists.librecores.org
Cc: Palmer Dabbelt <palmerdabbelt@google.com >
Cc: Paul Mackerras <paulus@samba.org >
Cc: "Pavel Machek (CIP)" <pavel@denx.de >
Cc: Pavel Machek <pavel@ucw.cz >
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org >
Cc: Pierre Morel <pmorel@linux.ibm.com >
Cc: Randy Dunlap <rdunlap@infradead.org >
Cc: Richard Henderson <rth@twiddle.net >
Cc: Rich Felker <dalias@libc.org >
Cc: Robert Richter <rric@kernel.org >
Cc: Rob Herring <robh@kernel.org >
Cc: Russell King <linux@armlinux.org.uk >
Cc: Sam Ravnborg <sam@ravnborg.org >
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de >
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com >
Cc: sparclinux@vger.kernel.org
Cc: Stafford Horne <shorne@gmail.com >
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi >
Cc: Steven Rostedt <rostedt@goodmis.org >
Cc: Sudeep Holla <sudeep.holla@arm.com >
Cc: Theodore Dubois <tblodt@icloud.com >
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Vasily Gorbik <gor@linux.ibm.com >
Cc: Viresh Kumar <viresh.kumar@linaro.org >
Cc: William Cohen <wcohen@redhat.com >
Cc: Xiaoming Ni <nixiaoming@huawei.com >
Cc: Yoshinori Sato <ysato@users.sourceforge.jp >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-05-07 00:26:34 -07:00
Nicholas Piggin
4ad0ae8c64
mm/vmalloc: remove unmap_kernel_range
...
This is a shim around vunmap_range, get rid of it.
Move the main API comment from the _noflush variant to the normal
variant, and make _noflush internal to mm/.
[npiggin@gmail.com: fix nommu builds and a comment bug per sfr]
Link: https://lkml.kernel.org/r/1617292598.m6g0knx24s.astroid@bobo.none
[akpm@linux-foundation.org: move vunmap_range_noflush() stub inside !CONFIG_MMU, not !CONFIG_NUMA]
[npiggin@gmail.com: fix nommu builds]
Link: https://lkml.kernel.org/r/1617292497.o1uhq5ipxp.astroid@bobo.none
Link: https://lkml.kernel.org/r/20210322021806.892164-5-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com >
Reviewed-by: Christoph Hellwig <hch@lst.de >
Cc: Cédric Le Goater <clg@kaod.org >
Cc: Uladzislau Rezki <urezki@gmail.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-04-30 11:20:40 -07:00
Nicholas Piggin
b67177ecd9
mm/vmalloc: remove map_kernel_range
...
Patch series "mm/vmalloc: cleanup after hugepage series", v2.
Christoph pointed out some overdue cleanups required after the huge
vmalloc series, and I had another failure error message improvement as
well.
This patch (of 5):
This is a shim around vmap_pages_range, get rid of it.
Move the main API comment from the _noflush variant to the normal variant,
and make _noflush internal to mm/.
Link: https://lkml.kernel.org/r/20210322021806.892164-1-npiggin@gmail.com
Link: https://lkml.kernel.org/r/20210322021806.892164-2-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com >
Reviewed-by: Christoph Hellwig <hch@lst.de >
Cc: Uladzislau Rezki <urezki@gmail.com >
Cc: Cédric Le Goater <clg@kaod.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-04-30 11:20:40 -07:00
Nicholas Piggin
121e6f3258
mm/vmalloc: hugepage vmalloc mappings
...
Support huge page vmalloc mappings. The config option
HAVE_ARCH_HUGE_VMALLOC enables support on architectures that define
HAVE_ARCH_HUGE_VMAP and support PMD-sized vmap mappings.
vmalloc will attempt to allocate PMD-sized pages when allocating PMD size
or larger, and fall back to small pages if that is unsuccessful.
Architectures must ensure that any arch-specific vmalloc allocations that
require PAGE_SIZE mappings (e.g., module allocations vs strict module rwx)
use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
This can result in more internal fragmentation and memory overhead for a
given allocation; a boot option, nohugevmalloc, is added to disable it.
[colin.king@canonical.com: fix read of uninitialized pointer area]
Link: https://lkml.kernel.org/r/20210318155955.18220-1-colin.king@canonical.com
Link: https://lkml.kernel.org/r/20210317062402.533919-14-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Catalin Marinas <catalin.marinas@arm.com >
Cc: Christoph Hellwig <hch@lst.de >
Cc: Ding Tianhong <dingtianhong@huawei.com >
Cc: "H. Peter Anvin" <hpa@zytor.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Miaohe Lin <linmiaohe@huawei.com >
Cc: Michael Ellerman <mpe@ellerman.id.au >
Cc: Russell King <linux@armlinux.org.uk >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-04-30 11:20:40 -07:00
Nicholas Piggin
5e9e3d777b
mm: move vmap_range from mm/ioremap.c to mm/vmalloc.c
...
This is a generic kernel virtual memory mapper, not specific to ioremap.
Code is unchanged other than making vmap_range non-static.
Link: https://lkml.kernel.org/r/20210317062402.533919-12-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com >
Reviewed-by: Christoph Hellwig <hch@lst.de >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Catalin Marinas <catalin.marinas@arm.com >
Cc: Ding Tianhong <dingtianhong@huawei.com >
Cc: "H. Peter Anvin" <hpa@zytor.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Miaohe Lin <linmiaohe@huawei.com >
Cc: Michael Ellerman <mpe@ellerman.id.au >
Cc: Russell King <linux@armlinux.org.uk >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-04-30 11:20:40 -07:00
Nicholas Piggin
6f680e70b6
mm/vmalloc: provide fallback arch huge vmap support functions
...
If an architecture doesn't support a particular page table level as a huge
vmap page size then allow it to skip defining the support query function.
Link: https://lkml.kernel.org/r/20210317062402.533919-11-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com >
Suggested-by: Christoph Hellwig <hch@lst.de >
Cc: Borislav Petkov <bp@alien8.de >
Cc: Catalin Marinas <catalin.marinas@arm.com >
Cc: Ding Tianhong <dingtianhong@huawei.com >
Cc: "H. Peter Anvin" <hpa@zytor.com >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Miaohe Lin <linmiaohe@huawei.com >
Cc: Michael Ellerman <mpe@ellerman.id.au >
Cc: Russell King <linux@armlinux.org.uk >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com >
Cc: Will Deacon <will@kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-04-30 11:20:40 -07:00
Nicholas Piggin
bbc180a5ad
mm: HUGE_VMAP arch support cleanup
...
This changes the awkward approach where architectures provide init
functions to determine which levels they can provide large mappings for,
to one where the arch is queried for each call.
This removes code and indirection, and allows constant-folding of dead
code for unsupported levels.
This also adds a prot argument to the arch query. This is unused
currently but could help with some architectures (e.g., some powerpc
processors can't map uncacheable memory with large pages).
Link: https://lkml.kernel.org/r/20210317062402.533919-7-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com >
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com >
Acked-by: Catalin Marinas <catalin.marinas@arm.com > [arm64]
Cc: Will Deacon <will@kernel.org >
Cc: Thomas Gleixner <tglx@linutronix.de >
Cc: Ingo Molnar <mingo@redhat.com >
Cc: Borislav Petkov <bp@alien8.de >
Cc: "H. Peter Anvin" <hpa@zytor.com >
Cc: Christoph Hellwig <hch@lst.de >
Cc: Miaohe Lin <linmiaohe@huawei.com >
Cc: Michael Ellerman <mpe@ellerman.id.au >
Cc: Russell King <linux@armlinux.org.uk >
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-04-30 11:20:40 -07:00
Paul E. McKenney
5bb1bb353c
mm: Don't build mm_dump_obj() on CONFIG_PRINTK=n kernels
...
The mem_dump_obj() functionality adds a few hundred bytes, which is a
small price to pay, except on kernels built with CONFIG_PRINTK=n, where
mem_dump_obj() messages will be suppressed anyway. This commit therefore
makes mem_dump_obj() be a static inline empty function on kernels built
with CONFIG_PRINTK=n and excludes all of its support functions as well.
This avoids kernel bloat on systems that cannot use mem_dump_obj().
Cc: Christoph Lameter <cl@linux.com >
Cc: Pekka Enberg <penberg@kernel.org >
Cc: David Rientjes <rientjes@google.com >
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com >
Cc: <linux-mm@kvack.org >
Suggested-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Paul E. McKenney <paulmck@kernel.org >
2021-03-08 14:18:46 -08:00
Ingo Molnar
85e853c5ec
Merge branch 'for-mingo-rcu' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
...
Pull RCU updates from Paul E. McKenney:
- Documentation updates.
- Miscellaneous fixes.
- kfree_rcu() updates: Addition of mem_dump_obj() to provide allocator return
addresses to more easily locate bugs. This has a couple of RCU-related commits,
but is mostly MM. Was pulled in with akpm's agreement.
- Per-callback-batch tracking of numbers of callbacks,
which enables better debugging information and smarter
reactions to large numbers of callbacks.
- The first round of changes to allow CPUs to be runtime switched from and to
callback-offloaded state.
- CONFIG_PREEMPT_RT-related changes.
- RCU CPU stall warning updates.
- Addition of polling grace-period APIs for SRCU.
- Torture-test and torture-test scripting updates, including a "torture everything"
script that runs rcutorture, locktorture, scftorture, rcuscale, and refscale.
Plus does an allmodconfig build.
Signed-off-by: Ingo Molnar <mingo@kernel.org >
2021-02-12 12:56:55 +01:00
Rick Edgecombe
4f6ec86023
mm/vmalloc: separate put pages and flush VM flags
...
When VM_MAP_PUT_PAGES was added, it was defined with the same value as
VM_FLUSH_RESET_PERMS. This doesn't seem like it will cause any big
functional problems other than some excess flushing for VM_MAP_PUT_PAGES
allocations.
Redefine VM_MAP_PUT_PAGES to have its own value. Also, rearrange things
so flags are less likely to be missed in the future.
Link: https://lkml.kernel.org/r/20210122233706.9304-1-rick.p.edgecombe@intel.com
Fixes: b944afc9d6 ("mm: add a VM_MAP_PUT_PAGES flag for vmap")
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com >
Suggested-by: Matthew Wilcox <willy@infradead.org >
Cc: Miaohe Lin <linmiaohe@huawei.com >
Cc: Christoph Hellwig <hch@lst.de >
Cc: Daniel Axtens <dja@axtens.net >
Cc: <stable@vger.kernel.org >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2021-02-05 11:03:47 -08:00
Paul E. McKenney
98f180837a
mm: Make mem_dump_obj() handle vmalloc() memory
...
This commit adds vmalloc() support to mem_dump_obj(). Note that the
vmalloc_dump_obj() function combines the checking and dumping, in
contrast with the split between kmem_valid_obj() and kmem_dump_obj().
The reason for the difference is that the checking in the vmalloc()
case involves acquiring a global lock, and redundant acquisitions of
global locks should be avoided, even on not-so-fast paths.
Note that this change causes on-stack variables to be reported as
vmalloc() storage from kernel_clone() or similar, depending on the degree
of inlining that your compiler does. This is likely more helpful than
the earlier "non-paged (local) memory".
Cc: Andrew Morton <akpm@linux-foundation.org >
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com >
Cc: <linux-mm@kvack.org >
Reported-by: Andrii Nakryiko <andrii@kernel.org >
Acked-by: Vlastimil Babka <vbabka@suse.cz >
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org >
Signed-off-by: Paul E. McKenney <paulmck@kernel.org >
2021-01-22 15:24:04 -08:00
Uladzislau Rezki (Sony)
96e2db4561
mm/vmalloc: rework the drain logic
...
The current "lazy drain" model suffers from at least two issues.
The first is related to the unsorted list of vmap areas: identifying the
[min:max] range of areas to be drained requires a full list scan, which is
time consuming if the list is long.
The second is the merging of all fragments with free space, which is also
time consuming because it has to iterate over the entire list of
outstanding lazy areas.
See the "preemptirqsoff" trace below, which illustrates the high latency
of ~24676us. Our workloads, like audio and video, are affected by such
long latency:
<snip>
tracer: preemptirqsoff
preemptirqsoff latency trace v1.1.5 on 4.9.186-perf+
--------------------------------------------------------------------
latency: 24676 us, #4/4, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 P:8)
-----------------
| task: crtc_commit:112-261 (uid:0 nice:0 policy:1 rt_prio:16)
-----------------
=> started at: __purge_vmap_area_lazy
=> ended at: __purge_vmap_area_lazy
_------=> CPU#
/ _-----=> irqs-off
| / _----=> need-resched
|| / _---=> hardirq/softirq
||| / _--=> preempt-depth
|||| / delay
cmd pid ||||| time | caller
\ / ||||| \ | /
crtc_com-261 1...1 1us*: _raw_spin_lock <-__purge_vmap_area_lazy
[...]
crtc_com-261 1...1 24675us : _raw_spin_unlock <-__purge_vmap_area_lazy
crtc_com-261 1...1 24677us : trace_preempt_on <-__purge_vmap_area_lazy
crtc_com-261 1...1 24683us : <stack trace>
=> free_vmap_area_noflush
=> remove_vm_area
=> __vunmap
=> vfree
=> drm_property_free_blob
=> drm_mode_object_unreference
=> drm_property_unreference_blob
=> __drm_atomic_helper_crtc_destroy_state
=> sde_crtc_destroy_state
=> drm_atomic_state_default_clear
=> drm_atomic_state_clear
=> drm_atomic_state_free
=> complete_commit
=> _msm_drm_commit_work_cb
=> kthread_worker_fn
=> kthread
=> ret_from_fork
<snip>
To address these two issues we can redesign the purging of the
outstanding lazy areas. Instead of queuing vmap areas onto a list, keep
them in a separate rb-tree, so that an area is located in the tree/list
in ascending order. This gives us the advantages below:
a) Outstanding vmap areas are merged into bigger coalesced blocks, so
the space becomes less fragmented.
b) It is possible to calculate the flush range [min:max] without scanning
all elements: O(1) access time.
c) The final merge of areas into the rb-tree that represents free space
is faster because of (a). As a result, lock contention is also reduced.
Link: https://lkml.kernel.org/r/20201116220033.1837-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com >
Cc: Hillf Danton <hdanton@sina.com >
Cc: Michal Hocko <mhocko@suse.com >
Cc: Matthew Wilcox <willy@infradead.org >
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com >
Cc: Steven Rostedt <rostedt@goodmis.org >
Cc: Minchan Kim <minchan@kernel.org >
Cc: huang ying <huang.ying.caritas@gmail.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2020-12-15 12:13:41 -08:00
Christoph Hellwig
301fa9f2dd
mm: remove alloc_vm_area
...
All users are gone now.
Signed-off-by: Christoph Hellwig <hch@lst.de >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com >
Cc: Chris Wilson <chris@chris-wilson.co.uk >
Cc: Jani Nikula <jani.nikula@linux.intel.com >
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com >
Cc: Juergen Gross <jgross@suse.com >
Cc: Matthew Auld <matthew.auld@intel.com >
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org >
Cc: Minchan Kim <minchan@kernel.org >
Cc: Nitin Gupta <ngupta@vflare.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com >
Cc: Stefano Stabellini <sstabellini@kernel.org >
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com >
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com >
Link: https://lkml.kernel.org/r/20201002122204.1534411-12-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2020-10-18 09:27:10 -07:00
Christoph Hellwig
3e9a9e256b
mm: add a vmap_pfn function
...
Add a proper helper to remap PFNs into kernel virtual space so that
drivers don't have to abuse alloc_vm_area and open coded PTE manipulation
for it.
Signed-off-by: Christoph Hellwig <hch@lst.de >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com >
Cc: Chris Wilson <chris@chris-wilson.co.uk >
Cc: Jani Nikula <jani.nikula@linux.intel.com >
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com >
Cc: Juergen Gross <jgross@suse.com >
Cc: Matthew Auld <matthew.auld@intel.com >
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org >
Cc: Minchan Kim <minchan@kernel.org >
Cc: Nitin Gupta <ngupta@vflare.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com >
Cc: Stefano Stabellini <sstabellini@kernel.org >
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com >
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com >
Link: https://lkml.kernel.org/r/20201002122204.1534411-4-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2020-10-18 09:27:10 -07:00
Christoph Hellwig
b944afc9d6
mm: add a VM_MAP_PUT_PAGES flag for vmap
...
Add a flag so that vmap takes ownership of the passed in page array. When
vfree is called on such an allocation it will put one reference on each
page, and free the page array itself.
Signed-off-by: Christoph Hellwig <hch@lst.de >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com >
Cc: Chris Wilson <chris@chris-wilson.co.uk >
Cc: Jani Nikula <jani.nikula@linux.intel.com >
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com >
Cc: Juergen Gross <jgross@suse.com >
Cc: Matthew Auld <matthew.auld@intel.com >
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org >
Cc: Minchan Kim <minchan@kernel.org >
Cc: Nitin Gupta <ngupta@vflare.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com >
Cc: Stefano Stabellini <sstabellini@kernel.org >
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com >
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com >
Link: https://lkml.kernel.org/r/20201002122204.1534411-3-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org >
2020-10-18 09:27:10 -07:00