Andrey Konovalov
01d92c7f35
kasan, vmalloc, arm64: mark vmalloc mappings as pgprot_tagged
...
HW_TAGS KASAN relies on ARM Memory Tagging Extension (MTE). With MTE, a
memory region must be mapped as MT_NORMAL_TAGGED to allow setting memory
tags via MTE-specific instructions.
Add proper protection bits to vmalloc() allocations. These allocations
are always backed by page_alloc pages, so the tags will actually be
getting set on the corresponding physical memory.
Link: https://lkml.kernel.org/r/983fc33542db2f6b1e77b34ca23448d4640bbb9e.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:47 -07:00
Andrey Konovalov
0b7ccc70ee
kasan, vmalloc: drop outdated VM_KASAN comment
...
The comment about VM_KASAN in include/linux/vmalloc.h is outdated.
VM_KASAN is currently only used to mark vm_areas allocated for kernel
modules when CONFIG_KASAN_VMALLOC is disabled.
Drop the comment.
Link: https://lkml.kernel.org/r/780395afea83a147b3b5acc36cf2e38f7f8479f9.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24 19:06:47 -07:00
Linus Torvalds
1ebdbeb03e
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
...
Pull kvm updates from Paolo Bonzini:
"ARM:
- Proper emulation of the OSLock feature of the debug architecture
- Scalability improvements for the MMU lock when dirty logging is on
- New VMID allocator, which will eventually help with SVA in VMs
- Better support for PMUs in heterogeneous systems
- PSCI 1.1 support, enabling support for SYSTEM_RESET2
- Implement CONFIG_DEBUG_LIST at EL2
- Make CONFIG_ARM64_ERRATUM_2077057 default y
- Reduce the overhead of VM exit when no interrupt is pending
- Remove traces of 32bit ARM host support from the documentation
- Updated vgic selftests
- Various cleanups, doc updates and spelling fixes
RISC-V:
- Prevent KVM_COMPAT from being selected
- Optimize __kvm_riscv_switch_to() implementation
- RISC-V SBI v0.3 support
s390:
- memop selftest
- fix SCK locking
- adapter interruptions virtualization for secure guests
- add Claudio Imbrenda as maintainer
- first step to do proper storage key checking
x86:
- Continue switching kvm_x86_ops to static_call(); introduce
static_call_cond() and __static_call_ret0 when applicable.
- Cleanup unused arguments in several functions
- Synthesize AMD 0x80000021 leaf
- Fixes and optimization for Hyper-V sparse-bank hypercalls
- Implement Hyper-V's enlightened MSR bitmap for nested SVM
- Remove MMU auditing
- Eager splitting of page tables (new aka "TDP" MMU only) when dirty
page tracking is enabled
- Cleanup the implementation of the guest PGD cache
- Preparation for the implementation of Intel IPI virtualization
- Fix some segment descriptor checks in the emulator
- Allow AMD AVIC support on systems with physical APIC ID above 255
- Better API to disable virtualization quirks
- Fixes and optimizations for the zapping of page tables:
- Zap roots in two passes, avoiding RCU read-side critical
sections that last too long for very large guests backed by 4
KiB SPTEs.
- Zap invalid and defunct roots asynchronously via
concurrency-managed work queue.
- Allow yielding when zapping TDP MMU roots in response to the
root's last reference being put.
- Batch more TLB flushes with an RCU trick. Whoever frees the
paging structure now holds RCU as a proxy for all vCPUs running
in the guest, i.e. to prolong the grace period on their behalf.
It then kicks the vCPUs out of guest mode before doing
rcu_read_unlock().
Generic:
- Introduce __vcalloc and use it for very large allocations that need
memcg accounting"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
KVM: use kvcalloc for array allocations
KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
kvm: x86: Require const tsc for RT
KVM: x86: synthesize CPUID leaf 0x80000021h if useful
KVM: x86: add support for CPUID leaf 0x80000021
KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask
Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU
KVM: arm64: fix typos in comments
KVM: arm64: Generalise VM features into a set of flags
KVM: s390: selftests: Add error memop tests
KVM: s390: selftests: Add more copy memop tests
KVM: s390: selftests: Add named stages for memop test
KVM: s390: selftests: Add macro as abstraction for MEM_OP
KVM: s390: selftests: Split memop tests
KVM: s390x: fix SCK locking
RISC-V: KVM: Implement SBI HSM suspend call
RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function
RISC-V: Add SBI HSM suspend related defines
RISC-V: KVM: Implement SBI v0.3 SRST extension
...
2022-03-24 11:58:57 -07:00
Bang Li
ff11a7ce1f
mm/vmalloc: fix comments about vmap_area struct
...
The vmap_area_root should be in the "busy" tree and the
free_vmap_area_root should be in the "free" tree.
Link: https://lkml.kernel.org/r/20220305011510.33596-1-libang.linuxer@gmail.com
Fixes: 688fcbfc06 ("mm/vmalloc: modify struct vmap_area to reduce its size")
Signed-off-by: Bang Li <libang.linuxer@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Pengfei Li <lpf.vector@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:05 -07:00
Paolo Bonzini
a8749a35c3
mm: vmalloc: introduce array allocation functions
...
Linux has dozens of occurrences of vmalloc(array_size()) and
vzalloc(array_size()). Simplify such code by providing vmalloc_array()
and vcalloc(), as well as underscored variants that let the caller
specify the GFP flags.
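The idea behind these helpers can be sketched in userspace C. This is only an illustration under stated assumptions: malloc() and calloc() stand in for vmalloc() and vzalloc(), the overflow check uses the GCC/Clang builtin __builtin_mul_overflow(), and every name with a sketch_ prefix is hypothetical rather than the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for vmalloc_array(): allocate n elements of
 * `size` bytes, returning NULL instead of wrapping when n * size overflows.
 */
void *sketch_vmalloc_array(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;
	return malloc(bytes);
}

/* Hypothetical stand-in for vcalloc(): same check, zeroed memory. */
void *sketch_vcalloc(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;
	return calloc(1, bytes);
}

/* Usage: one call replaces the open-coded size multiplication. */
void sketch_array_usage(void)
{
	int *v = sketch_vcalloc(16, sizeof(*v));

	assert(v != NULL && v[15] == 0);
	free(v);
	/* An overflowing request fails cleanly instead of under-allocating. */
	assert(sketch_vmalloc_array(SIZE_MAX, 2) == NULL);
}
```

The point of the design is that the caller passes the element count and element size separately, so the overflow check lives in one place rather than at every call site.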
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-08 09:30:17 -05:00
Kefeng Wang
60115fa54a
mm: defer kmemleak object creation of module_alloc()
...
Yongqiang reports a kmemleak panic on module insmod/rmmod with KASAN
enabled (without KASAN_VMALLOC) on x86 [1].
When the module area allocates memory, its kmemleak_object is created
successfully, but the KASAN shadow memory of the module allocation is not
ready, so when kmemleak scans the module's pointer, it panics because the
KASAN check finds no shadow memory.
module_alloc
  __vmalloc_node_range
    kmemleak_vmalloc
                          kmemleak_scan
                            update_checksum
  kasan_module_alloc
    kmemleak_ignore
Note, there is no problem if KASAN_VMALLOC is enabled, because the module
area's entire shadow memory is preallocated. Thus, the bug only exists on
architectures that support dynamic allocation of the module area per
module load; for now, only x86/arm64/s390 are involved.
Add a VM_DEFER_KMEMLEAK flag and defer the kmemleak registration of
vmalloc'ed objects in module_alloc() to fix this issue.
[1] https://lore.kernel.org/all/6d41e2b9-4692-5ec4-b1cd-cbe29ae89739@huawei.com/
[wangkefeng.wang@huawei.com: fix build]
Link: https://lkml.kernel.org/r/20211125080307.27225-1-wangkefeng.wang@huawei.com
[akpm@linux-foundation.org: simplify ifdefs, per Andrey]
Link: https://lkml.kernel.org/r/CA+fCnZcnwJHUQq34VuRxpdoY6_XbJCDJ-jopksS5Eia4PijPzw@mail.gmail.com
Link: https://lkml.kernel.org/r/20211124142034.192078-1-wangkefeng.wang@huawei.com
Fixes: 793213a82d ("s390/kasan: dynamic shadow mem allocation for modules")
Fixes: 39d114ddc6 ("arm64: add KASAN support")
Fixes: bebf56a1b1 ("kasan: enable instrumentation of global variables")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reported-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:25 +02:00
Peter Zijlstra
bd1a8fb2d4
mm/vmalloc: don't allow VM_NO_GUARD on vmap()
...
The vmalloc guard pages are added on top of each allocation, thereby
isolating any two allocations from one another. The top guard of the
lower allocation is the bottom guard of the higher allocation etc.
Therefore VM_NO_GUARD is dangerous; it breaks the basic premise of
isolating separate allocations.
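The isolation argument can be illustrated with a small address-arithmetic sketch in userspace C. The structure, the helper names, the 4 KiB page size, and the addresses are all hypothetical; the sketch only models the layout described above, not the kernel's vmap_area bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_PAGE_SIZE 4096UL	/* assumed page size */

/* Hypothetical model of one allocation in the vmalloc address space. */
struct sk_area {
	unsigned long start;
	unsigned long size;
	bool no_guard;		/* models VM_NO_GUARD */
};

/* First address usable by the next area: size plus one guard page. */
unsigned long sk_area_end(const struct sk_area *a)
{
	return a->start + a->size + (a->no_guard ? 0 : SK_PAGE_SIZE);
}

/* True if addr falls inside the mapped part of the area. */
bool sk_area_contains(const struct sk_area *a, unsigned long addr)
{
	return addr >= a->start && addr < a->start + a->size;
}

void sk_guard_usage(void)
{
	struct sk_area lower = { 0x10000UL, 2 * SK_PAGE_SIZE, false };
	struct sk_area upper = { sk_area_end(&lower), SK_PAGE_SIZE, false };
	unsigned long overrun = lower.start + lower.size; /* one past the end */

	/* With a guard page, the overrun hits unmapped space. */
	assert(!sk_area_contains(&upper, overrun));

	/* Without it, the same overrun lands in the neighbouring area. */
	lower.no_guard = true;
	upper.start = sk_area_end(&lower);
	assert(sk_area_contains(&upper, overrun));
}
```

In the guarded layout an off-by-one access faults immediately, while in the unguarded layout it silently reads or writes the neighbour, which is the premise the commit wants to keep intact.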
There are only two in-tree users of this flag, neither of which use it
through the exported interface. Ensure it stays this way.
Link: https://lkml.kernel.org/r/YUMfdA36fuyZ+/xt@hirez.programming.kicks-ass.net
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:36 -07:00
Kees Cook
894f24bb56
mm/vmalloc: add __alloc_size attributes for better bounds checking
...
As already done in GrapheneOS, add the __alloc_size attribute for
appropriate vmalloc allocator interfaces, to provide additional hinting
for better bounds checking, assisting CONFIG_FORTIFY_SOURCE and other
compiler optimizations.
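What the attribute gives the compiler can be sketched in userspace with the GCC/Clang alloc_size function attribute. The allocator sk_alloc() is hypothetical and simply wraps malloc(); whether the compiler actually reports the size depends on the optimisation level, so the sketch accepts both outcomes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical allocator annotated in the spirit of the vmalloc
 * interfaces: argument 1 is the size of the returned object in bytes.
 */
__attribute__((alloc_size(1), malloc)) void *sk_alloc(size_t size);

void *sk_alloc(size_t size)
{
	return malloc(size);
}

void sk_alloc_size_usage(void)
{
	char *p = sk_alloc(32);
	size_t known = __builtin_object_size(p, 0);

	/*
	 * With the hint the compiler may know the object is 32 bytes;
	 * when it cannot tell, the builtin yields (size_t)-1.
	 */
	assert(known == 32 || known == (size_t)-1);
	free(p);
}
```

Bounds-checking machinery built on __builtin_object_size() can only catch an overflow when the size is known, which is why annotating allocator entry points widens its coverage.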
Link: https://lkml.kernel.org/r/20210930222704.2631604-7-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Co-developed-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jing Xiangfeng <jingxiangfeng@huawei.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06 13:30:34 -07:00
Christoph Hellwig
82a70ce042
mm: move ioremap_page_range to vmalloc.c
...
Patch series "small ioremap cleanups".
The first patch moves a little code around the vmalloc/ioremap boundary
following a bigger move by Nick earlier. The second enforces
non-executable mapping on ioremap just like we do for vmap. No driver
currently uses executable mappings anyway, as they should.
This patch (of 2):
This keeps it together with the implementation and allows removing the
vmap_range wrapper.
Link: https://lkml.kernel.org/r/20210824091259.1324527-1-hch@lst.de
Link: https://lkml.kernel.org/r/20210824091259.1324527-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-08 11:50:24 -07:00
Zhen Lei
06c8839815
mm: fix spelling mistakes in header files
...
Fix some spelling mistakes in comments:
successfull ==> successful
potentialy ==> potentially
alloced ==> allocated
indicies ==> indices
wont ==> won't
resposible ==> responsible
dirtyness ==> dirtiness
droppped ==> dropped
alread ==> already
occured ==> occurred
interupts ==> interrupts
extention ==> extension
slighly ==> slightly
Dont't ==> Don't
Link: https://lkml.kernel.org/r/20210531034849.9549-2-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 11:48:21 -07:00
Christophe Leroy
3382bbee04
mm/vmalloc: enable mapping of huge pages at pte level in vmalloc
...
On some architectures like powerpc, there are huge pages that are mapped
at pte level.
Enable it in vmalloc.
For that, architectures can provide arch_vmap_pte_supported_shift() that
returns the shift for pages to map at pte level.
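A hook of this kind might look roughly like the userspace sketch below. The function name carries a sk_ prefix to mark it as hypothetical, and the 4 KiB base page together with the 16 KiB and 512 KiB thresholds are illustrative assumptions, not values taken from this commit.

```c
#include <assert.h>

#define SK_PAGE_SHIFT 12	/* assumed 4 KiB base pages */

/*
 * Hypothetical arch hook: return the shift of the largest page size
 * that can be mapped at pte level and still fits within `size`.
 */
int sk_arch_vmap_pte_supported_shift(unsigned long size)
{
	if (size >= (512UL << 10))
		return 19;	/* 512 KiB pages (assumed) */
	if (size >= (16UL << 10))
		return 14;	/* 16 KiB pages (assumed) */
	return SK_PAGE_SHIFT;
}

void sk_pte_shift_usage(void)
{
	assert(sk_arch_vmap_pte_supported_shift(4096) == 12);
	assert(sk_arch_vmap_pte_supported_shift(64UL << 10) == 14);
	assert(sk_arch_vmap_pte_supported_shift(1UL << 20) == 19);
}
```

An architecture without such page sizes would return the base page shift for every request, so generic code can call the hook unconditionally.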
Link: https://lkml.kernel.org/r/2c717e3b1fba1894d890feb7669f83025bfa314d.1620795204.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30 20:47:26 -07:00
Christophe Leroy
f7ee1f13d6
mm/vmalloc: enable mapping of huge pages at pte level in vmap
...
On some architectures like powerpc, there are huge pages that are mapped
at pte level.
Enable it in vmap.
For that, architectures can provide arch_vmap_pte_range_map_size() that
returns the size of pages to map at pte level.
Link: https://lkml.kernel.org/r/fb3ccc73377832ac6708181ec419128a2f98ce36.1620795204.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30 20:47:26 -07:00
Claudio Imbrenda
15a64f5a88
mm/vmalloc: add vmalloc_no_huge
...
Patch series "mm: add vmalloc_no_huge and use it", v4.
Add vmalloc_no_huge() and export it, so modules can allocate memory with
small pages.
Use the newly added vmalloc_no_huge() in KVM on s390 to get around a
hardware limitation.
This patch (of 2):
Commit 121e6f3258 ("mm/vmalloc: hugepage vmalloc mappings") added
support for hugepage vmalloc mappings. It also added the flag
VM_NO_HUGE_VMAP for __vmalloc_node_range to request the allocation to be
performed with 0-order non-huge pages.
This flag is not accessible when calling vmalloc; the only option is to
call __vmalloc_node_range directly, which is not exported.
This means that a module can't vmalloc memory with small pages.
Case in point: KVM on s390x needs to vmalloc a large area, and it needs
to be mapped with non-huge pages, because of a hardware limitation.
This patch adds the function vmalloc_no_huge, which works like vmalloc,
but it is guaranteed to always back the mapping using small pages. This
new function is exported, therefore it is usable by modules.
[akpm@linux-foundation.org: whitespace fixes, per Christoph]
Link: https://lkml.kernel.org/r/20210614132357.10202-1-imbrenda@linux.ibm.com
Link: https://lkml.kernel.org/r/20210614132357.10202-2-imbrenda@linux.ibm.com
Fixes: 121e6f3258 ("mm/vmalloc: hugepage vmalloc mappings")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-24 19:40:53 -07:00
Ingo Molnar
f0953a1bba
mm: fix typos in comments
...
Fix ~94 single-word typos in locking code comments, plus a few
very obvious grammar mistakes.
Link: https://lkml.kernel.org/r/20210322212624.GA1963421@gmail.com
Link: https://lore.kernel.org/r/20210322205203.GB1959563@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07 00:26:35 -07:00
David Hildenbrand
f7c8ce44eb
mm/vmalloc: remove vwrite()
...
The last user (/dev/kmem) is gone. Let's drop it.
Link: https://lkml.kernel.org/r/20210324102351.6932-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: huang ying <huang.ying.caritas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07 00:26:34 -07:00
David Hildenbrand
bbcd53c960
drivers/char: remove /dev/kmem for good
...
Patch series "drivers/char: remove /dev/kmem for good".
Exploring /dev/kmem and /dev/mem in the context of memory hot(un)plug and
memory ballooning, I started questioning the existence of /dev/kmem.
Comparing it with the /proc/kcore implementation, it does not seem to be
able to deal with things like
a) Pages unmapped from the direct mapping (e.g., to be used by secretmem)
-> kern_addr_valid(). virt_addr_valid() is not sufficient.
b) Special cases like gart aperture memory that is not to be touched
-> mem_pfn_is_ram()
Unless I am missing something, it's at least broken in some cases and might
fault/crash the machine.
Looks like its existence has been questioned before in 2005 and 2010 [1],
after ~11 additional years, it might make sense to revive the discussion.
CONFIG_DEVKMEM is only enabled in a single defconfig (on purpose or by
mistake?). All distributions disable it: in Ubuntu it has been disabled
for more than 10 years, in Debian since 2.6.31, in Fedora at least
starting with FC3, in RHEL starting with RHEL4, in SUSE starting from
15sp2, and OpenSUSE has it disabled as well.
1) /dev/kmem was popular for rootkits [2] before it got disabled
basically everywhere. Ubuntu documents [3] "There is no modern user of
/dev/kmem any more beyond attackers using it to load kernel rootkits.".
RHEL documents in a BZ [5] "it served no practical purpose other than to
serve as a potential security problem or to enable binary module drivers
to access structures/functions they shouldn't be touching"
2) /proc/kcore is a decent interface to have a controlled way to read
kernel memory for debugging purposes. (will need some extensions to
deal with memory offlining/unplug, memory ballooning, and poisoned
pages, though)
3) It might be useful for corner case debugging [1]. KDB/KGDB might be a
better fit, especially, to write random memory; harder to shoot
yourself in the foot.
4) "Kernel Memory Editor" [4] hasn't seen any updates since 2000 and seems
to be incompatible with 64bit [1]. For educational purposes,
/proc/kcore might be used to monitor value updates -- or older
kernels can be used.
5) It's broken on arm64, and therefore, completely disabled there.
Looks like it's essentially unused and has been replaced by better
suited interfaces for individual tasks (/proc/kcore, KDB/KGDB). Let's
just remove it.
[1] https://lwn.net/Articles/147901/
[2] https://www.linuxjournal.com/article/10505
[3] https://wiki.ubuntu.com/Security/Features#A.2Fdev.2Fkmem_disabled
[4] https://sourceforge.net/projects/kme/
[5] https://bugzilla.redhat.com/show_bug.cgi?id=154796
Link: https://lkml.kernel.org/r/20210324102351.6932-1-david@redhat.com
Link: https://lkml.kernel.org/r/20210324102351.6932-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Alexander A. Klimov" <grandmaster@al2klimov.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: Corentin Labbe <clabbe@baylibre.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Gregory Clement <gregory.clement@bootlin.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hillf Danton <hdanton@sina.com>
Cc: huang ying <huang.ying.caritas@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: James Troup <james.troup@canonical.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kairui Song <kasong@redhat.com>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: openrisc@lists.librecores.org
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Pavel Machek (CIP)" <pavel@denx.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Cc: sparclinux@vger.kernel.org
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Theodore Dubois <tblodt@icloud.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: William Cohen <wcohen@redhat.com>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07 00:26:34 -07:00
Nicholas Piggin
4ad0ae8c64
mm/vmalloc: remove unmap_kernel_range
...
This is a shim around vunmap_range, get rid of it.
Move the main API comment from the _noflush variant to the normal
variant, and make _noflush internal to mm/.
[npiggin@gmail.com: fix nommu builds and a comment bug per sfr]
Link: https://lkml.kernel.org/r/1617292598.m6g0knx24s.astroid@bobo.none
[akpm@linux-foundation.org: move vunmap_range_noflush() stub inside !CONFIG_MMU, not !CONFIG_NUMA]
[npiggin@gmail.com: fix nommu builds]
Link: https://lkml.kernel.org/r/1617292497.o1uhq5ipxp.astroid@bobo.none
Link: https://lkml.kernel.org/r/20210322021806.892164-5-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Cédric Le Goater <clg@kaod.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Nicholas Piggin
b67177ecd9
mm/vmalloc: remove map_kernel_range
...
Patch series "mm/vmalloc: cleanup after hugepage series", v2.
Christoph pointed out some overdue cleanups required after the huge
vmalloc series, and I had another failure error message improvement as
well.
This patch (of 5):
This is a shim around vmap_pages_range, get rid of it.
Move the main API comment from the _noflush variant to the normal variant,
and make _noflush internal to mm/.
Link: https://lkml.kernel.org/r/20210322021806.892164-1-npiggin@gmail.com
Link: https://lkml.kernel.org/r/20210322021806.892164-2-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Nicholas Piggin
121e6f3258
mm/vmalloc: hugepage vmalloc mappings
...
Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
support PMD-sized vmap mappings.
vmalloc will attempt to allocate PMD-sized pages if allocating PMD size or
larger, and fall back to small pages if that was unsuccessful.
Architectures must ensure that any arch-specific vmalloc allocations that
require PAGE_SIZE mappings (e.g., module allocations vs strict module rwx)
use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
This can result in more internal fragmentation and memory overhead for a
given allocation, so a boot option, nohugevmalloc, is added to disable it.
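The try-huge-then-fall-back policy described above can be sketched in userspace C. All names are hypothetical, the 4 KiB and 2 MiB sizes are assumptions, and the allocation step is abstracted as a callback; the huge_ok parameter models both the per-allocation flag and the boot option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_PAGE_SHIFT 12	/* assumed 4 KiB small pages */
#define SK_PMD_SHIFT  21	/* assumed 2 MiB PMD-sized pages */

/* Hypothetical allocation step: try to back `size` bytes at this shift. */
typedef bool (*sk_alloc_fn)(size_t size, unsigned int shift);

/* Returns the page shift that ended up backing the area, or 0 on failure. */
unsigned int sk_vmalloc_shift(size_t size, bool huge_ok, sk_alloc_fn try_alloc)
{
	/* Only requests of at least PMD size are worth trying with huge pages. */
	if (huge_ok && size >= ((size_t)1 << SK_PMD_SHIFT) &&
	    try_alloc(size, SK_PMD_SHIFT))
		return SK_PMD_SHIFT;

	/* Fall back to small pages when huge pages are refused or unavailable. */
	return try_alloc(size, SK_PAGE_SHIFT) ? SK_PAGE_SHIFT : 0;
}

/* Two toy allocators for the usage example. */
bool sk_alloc_always(size_t size, unsigned int shift)
{
	(void)size;
	(void)shift;
	return true;
}

bool sk_alloc_small_only(size_t size, unsigned int shift)
{
	(void)size;
	return shift == SK_PAGE_SHIFT;
}

void sk_huge_usage(void)
{
	size_t big = (size_t)4 << 20;	/* 4 MiB request */

	assert(sk_vmalloc_shift(big, true, sk_alloc_always) == SK_PMD_SHIFT);
	assert(sk_vmalloc_shift(big, true, sk_alloc_small_only) == SK_PAGE_SHIFT);
	assert(sk_vmalloc_shift(big, false, sk_alloc_always) == SK_PAGE_SHIFT);
	assert(sk_vmalloc_shift(4096, true, sk_alloc_always) == SK_PAGE_SHIFT);
}
```

Keeping the fallback inside the allocator means callers never need to know which page size was used, at the cost of the extra fragmentation the commit message mentions.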
[colin.king@canonical.com: fix read of uninitialized pointer area]
Link: https://lkml.kernel.org/r/20210318155955.18220-1-colin.king@canonical.com
Link: https://lkml.kernel.org/r/20210317062402.533919-14-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Nicholas Piggin
5e9e3d777b
mm: move vmap_range from mm/ioremap.c to mm/vmalloc.c
...
This is a generic kernel virtual memory mapper, not specific to ioremap.
Code is unchanged other than making vmap_range non-static.
Link: https://lkml.kernel.org/r/20210317062402.533919-12-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Nicholas Piggin
6f680e70b6
mm/vmalloc: provide fallback arch huge vmap support functions
...
If an architecture doesn't support a particular page table level as a huge
vmap page size then allow it to skip defining the support query function.
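One common way to provide such fallbacks in C headers is an #ifndef guard around a static inline default. The userspace sketch below shows that pattern with hypothetical sk_ names and a stand-in type for pgprot_t; it illustrates the technique and is not a copy of the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long sk_pgprot_t;	/* stand-in for pgprot_t */

/* "Arch header": this hypothetical arch supports PMD-level huge vmap only. */
static inline bool sk_arch_vmap_pmd_supported(sk_pgprot_t prot)
{
	(void)prot;
	return true;
}
/* Self-referential define so generic code can detect the override. */
#define sk_arch_vmap_pmd_supported sk_arch_vmap_pmd_supported

/* "Generic header": defaults for every level the arch left undefined. */
#ifndef sk_arch_vmap_pud_supported
static inline bool sk_arch_vmap_pud_supported(sk_pgprot_t prot)
{
	(void)prot;
	return false;
}
#endif

#ifndef sk_arch_vmap_pmd_supported
static inline bool sk_arch_vmap_pmd_supported(sk_pgprot_t prot)
{
	(void)prot;
	return false;
}
#endif

void sk_fallback_usage(void)
{
	/* The arch override wins for PMD, the generic default answers for PUD. */
	assert(sk_arch_vmap_pmd_supported(0));
	assert(!sk_arch_vmap_pud_supported(0));
}
```

Because the defaults are constant static inline functions, a compiler can discard the code paths for unsupported levels entirely.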
Link: https://lkml.kernel.org/r/20210317062402.533919-11-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Nicholas Piggin
bbc180a5ad
mm: HUGE_VMAP arch support cleanup
...
This changes the awkward approach where architectures provide init
functions to determine which levels they can provide large mappings for,
to one where the arch is queried for each call.
This removes code and indirection, and allows constant-folding of dead
code for unsupported levels.
This also adds a prot argument to the arch query. This is unused
currently but could help with some architectures (e.g., some powerpc
processors can't map uncacheable memory with large pages).
Link: https://lkml.kernel.org/r/20210317062402.533919-7-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Will Deacon <will@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30 11:20:40 -07:00
Paul E. McKenney
5bb1bb353c
mm: Don't build mem_dump_obj() on CONFIG_PRINTK=n kernels
...
The mem_dump_obj() functionality adds a few hundred bytes, which is a
small price to pay. Except on kernels built with CONFIG_PRINTK=n, in
which mem_dump_obj() messages will be suppressed. This commit therefore
makes mem_dump_obj() be a static inline empty function on kernels built
with CONFIG_PRINTK=n and excludes all of its support functions as well.
This avoids kernel bloat on systems that cannot use mem_dump_obj().
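The stubbing technique can be sketched in userspace C as follows. The macro SK_CONFIG_PRINTK and the sk_ function names are hypothetical stand-ins; the sketch only shows how a config symbol selects between a real function and an empty static inline one.

```c
#include <assert.h>

/* Uncomment to model a CONFIG_PRINTK=y build (hypothetical macro name). */
/* #define SK_CONFIG_PRINTK 1 */

#ifdef SK_CONFIG_PRINTK
#include <stdio.h>
#define SK_DUMP_BUILT 1
/* Full version: only built when messages can actually be printed. */
static inline void sk_mem_dump_obj(const void *obj)
{
	printf("object at %p\n", obj);
}
#else
#define SK_DUMP_BUILT 0
/* Empty stub: callers still compile, but no dump code is emitted. */
static inline void sk_mem_dump_obj(const void *obj)
{
	(void)obj;
}
#endif

/* Callers need no #ifdef of their own. */
int sk_report(const void *obj)
{
	sk_mem_dump_obj(obj);
	return SK_DUMP_BUILT;
}

void sk_stub_usage(void)
{
	int x = 0;

	assert(sk_report(&x) == SK_DUMP_BUILT);
}
```

Putting the conditional in the header keeps every call site identical in both configurations, and the compiler drops the call to the empty stub.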
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <linux-mm@kvack.org>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-08 14:18:46 -08:00
Ingo Molnar
85e853c5ec
Merge branch 'for-mingo-rcu' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
...
Pull RCU updates from Paul E. McKenney:
- Documentation updates.
- Miscellaneous fixes.
- kfree_rcu() updates: Addition of mem_dump_obj() to provide allocator return
addresses to more easily locate bugs. This has a couple of RCU-related commits,
but is mostly MM. Was pulled in with akpm's agreement.
- Per-callback-batch tracking of numbers of callbacks,
which enables better debugging information and smarter
reactions to large numbers of callbacks.
- The first round of changes to allow CPUs to be runtime switched from and to
callback-offloaded state.
- CONFIG_PREEMPT_RT-related changes.
- RCU CPU stall warning updates.
- Addition of polling grace-period APIs for SRCU.
- Torture-test and torture-test scripting updates, including a "torture everything"
script that runs rcutorture, locktorture, scftorture, rcuscale, and refscale.
Plus does an allmodconfig build.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-02-12 12:56:55 +01:00
Rick Edgecombe
4f6ec86023
mm/vmalloc: separate put pages and flush VM flags
...
When VM_MAP_PUT_PAGES was added, it was defined with the same value as
VM_FLUSH_RESET_PERMS. This doesn't seem like it will cause any big
functional problems other than some excess flushing for VM_MAP_PUT_PAGES
allocations.
Redefine VM_MAP_PUT_PAGES to have its own value. Also, rearrange things
so flags are less likely to be missed in the future.
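The effect of two flags sharing one bit can be shown with a small bitmask sketch in userspace C. The flag names and values are hypothetical placeholders chosen for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags; the values are illustrative only. */
#define SK_FLUSH_RESET_PERMS	0x100UL
#define SK_PUT_PAGES_ALIASED	0x100UL	/* bug: shares a bit with the flag above */
#define SK_PUT_PAGES_FIXED	0x200UL	/* fix: a bit of its own */

/* Models the free path asking whether an area needs the extra flush. */
bool sk_needs_flush(unsigned long flags)
{
	return (flags & SK_FLUSH_RESET_PERMS) != 0;
}

void sk_flags_usage(void)
{
	/* With the aliased value, a put-pages area is flushed needlessly. */
	assert(sk_needs_flush(SK_PUT_PAGES_ALIASED));
	/* With its own bit, the two properties are independent. */
	assert(!sk_needs_flush(SK_PUT_PAGES_FIXED));
}
```

Listing the flag definitions in one place with consecutive bit values makes a duplicate far easier to spot in review.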
Link: https://lkml.kernel.org/r/20210122233706.9304-1-rick.p.edgecombe@intel.com
Fixes: b944afc9d6 ("mm: add a VM_MAP_PUT_PAGES flag for vmap")
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-05 11:03:47 -08:00