mirror of
https://github.com/armbian/linux-rockchip.git
synced 2026-01-06 11:08:10 -08:00
9a2f97bb8ddddbf655ce1fcdf688dcec19deb59f
67 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
97105f0ab7 |
mm: vmalloc: show number of vmalloc pages in /proc/meminfo
Vmalloc() is getting more and more used these days (kernel stacks, bpf and
percpu allocator are new top users), and the total % of memory consumed by
vmalloc() can be pretty significant and changes dynamically.
/proc/meminfo is the best place to display this information: its top goal
is to show top consumers of the memory.
Since the VmallocUsed field in /proc/meminfo is not in use for quite a
long time (it has been defined to 0 by
|
||
|
|
68ad4a3304 |
mm/vmalloc.c: keep track of free blocks for vmap allocation
Patch series "improve vmap allocation", v3. Objective --------- Please have a look for the description at: https://lkml.org/lkml/2018/10/19/786 but let me also summarize it a bit here as well. The current implementation has O(N) complexity. Requests with different permissive parameters can lead to long allocation time. When i say "long" i mean milliseconds. Description ----------- This approach organizes the KVA memory layout into free areas of the 1-ULONG_MAX range, i.e. an allocation is done over free areas lookups, instead of finding a hole between two busy blocks. It allows to have lower number of objects which represent the free space, therefore to have less fragmented memory allocator. Because free blocks are always as large as possible. It uses the augment tree where all free areas are sorted in ascending order of va->va_start address in pair with linked list that provides O(1) access to prev/next elements. Since the tree is augment, we also maintain the "subtree_max_size" of VA that reflects a maximum available free block in its left or right sub-tree. Knowing that, we can easily traversal toward the lowest (left most path) free area. Allocation: ~O(log(N)) complexity. It is sequential allocation method therefore tends to maximize locality. The search is done until a first suitable block is large enough to encompass the requested parameters. Bigger areas are split. I copy paste here the description of how the area is split, since i described it in https://lkml.org/lkml/2018/10/19/786 <snip> A free block can be split by three different ways. Their names are FL_FIT_TYPE, LE_FIT_TYPE/RE_FIT_TYPE and NE_FIT_TYPE, i.e. they correspond to how requested size and alignment fit to a free block. FL_FIT_TYPE - in this case a free block is just removed from the free list/tree because it fully fits. Comparing with current design there is an extra work with rb-tree updating. LE_FIT_TYPE/RE_FIT_TYPE - left/right edges fit. In this case what we do is just cutting a free block. It is as fast as a current design. Most of the vmalloc allocations just end up with this case, because the edge is always aligned to 1. NE_FIT_TYPE - Is much less common case. Basically it happens when requested size and alignment does not fit left nor right edges, i.e. it is between them. In this case during splitting we have to build a remaining left free area and place it back to the free list/tree. Comparing with current design there are two extra steps. First one is we have to allocate a new vmap_area structure. Second one we have to insert that remaining free block to the address sorted list/tree. In order to optimize a first case there is a cache with free_vmap objects. Instead of allocating from slab we just take an object from the cache and reuse it. Second one is pretty optimized. Since we know a start point in the tree we do not do a search from the top. Instead a traversal begins from a rb-tree node we split. <snip> De-allocation. ~O(log(N)) complexity. An area is not inserted straight away to the tree/list, instead we identify the spot first, checking if it can be merged around neighbors. The list provides O(1) access to prev/next, so it is pretty fast to check it. Summarizing. If merged then large coalesced areas are created, if not the area is just linked making more fragments. There is one more thing that i should mention here. After modification of VA node, its subtree_max_size is updated if it was/is the biggest area in its left or right sub-tree. Apart of that it can also be populated back to upper levels to fix the tree. For more details please have a look at the __augment_tree_propagate_from() function and the description. Tests and stressing ------------------- I use the "test_vmalloc.sh" test driver available under "tools/testing/selftests/vm/" since 5.1-rc1 kernel. Just trigger "sudo ./test_vmalloc.sh" to find out how to deal with it. Tested on different platforms including x86_64/i686/ARM64/x86_64_NUMA. Regarding last one, i do not have any physical access to NUMA system, therefore i emulated it. The time of stressing is days. If you run the test driver in "stress mode", you also need the patch that is in Andrew's tree but not in Linux 5.1-rc1. So, please apply it: http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/commit/?id=e0cf7749bade6da318e98e934a24d8b62fab512c After massive testing, i have not identified any problems like memory leaks, crashes or kernel panics. I find it stable, but more testing would be good. Performance analysis -------------------- I have used two systems to test. One is i5-3320M CPU @ 2.60GHz and another is HiKey960(arm64) board. i5-3320M runs on 4.20 kernel, whereas Hikey960 uses 4.15 kernel. I have both system which could run on 5.1-rc1 as well, but the results have not been ready by time i an writing this. Currently it consist of 8 tests. There are three of them which correspond to different types of splitting(to compare with default). We have 3 ones(see above). Another 5 do allocations in different conditions. a) sudo ./test_vmalloc.sh performance When the test driver is run in "performance" mode, it runs all available tests pinned to first online CPU with sequential execution test order. We do it in order to get stable and repeatable results. Take a look at time difference in "long_busy_list_alloc_test". It is not surprising because the worst case is O(N). # i5-3320M How many cycles all tests took: CPU0=646919905370(default) cycles vs CPU0=193290498550(patched) cycles # See detailed table with results here: ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_performance_default.txt ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_performance_patched.txt # Hikey960 8x CPUs How many cycles all tests took: CPU0=3478683207 cycles vs CPU0=463767978 cycles # See detailed table with results here: ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/HiKey960_performance_default.txt ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/HiKey960_performance_patched.txt b) time sudo ./test_vmalloc.sh test_repeat_count=1 With this configuration, all tests are run on all available online CPUs. Before running each CPU shuffles its tests execution order. It gives random allocation behaviour. So it is rough comparison, but it puts in the picture for sure. # i5-3320M <default> vs <patched> real 101m22.813s real 0m56.805s user 0m0.011s user 0m0.015s sys 0m5.076s sys 0m0.023s # See detailed table with results here: ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_test_repeat_count_1_default.txt ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_test_repeat_count_1_patched.txt # Hikey960 8x CPUs <default> vs <patched> real unknown real 4m25.214s user unknown user 0m0.011s sys unknown sys 0m0.670s I did not manage to complete this test on "default Hikey960" kernel version. After 24 hours it was still running, therefore i had to cancel it. That is why real/user/sys are "unknown". This patch (of 3): Currently an allocation of the new vmap area is done over busy list iteration(complexity O(n)) until a suitable hole is found between two busy areas. Therefore each new allocation causes the list being grown. Due to over fragmented list and different permissive parameters an allocation can take a long time. For example on embedded devices it is milliseconds. This patch organizes the KVA memory layout into free areas of the 1-ULONG_MAX range. It uses an augment red-black tree that keeps blocks sorted by their offsets in pair with linked list keeping the free space in order of increasing addresses. Nodes are augmented with the size of the maximum available free block in its left or right sub-tree. Thus, that allows to take a decision and traversal toward the block that will fit and will have the lowest start address, i.e. it is sequential allocation. Allocation: to allocate a new block a search is done over the tree until a suitable lowest(left most) block is large enough to encompass: the requested size, alignment and vstart point. If the block is bigger than requested size - it is split. De-allocation: when a busy vmap area is freed it can either be merged or inserted to the tree. Red-black tree allows efficiently find a spot whereas a linked list provides a constant-time access to previous and next blocks to check if merging can be done. In case of merging of de-allocated memory chunk a large coalesced area is created. Complexity: ~O(log(N)) [urezki@gmail.com: v3] Link: http://lkml.kernel.org/r/20190402162531.10888-2-urezki@gmail.com [urezki@gmail.com: v4] Link: http://lkml.kernel.org/r/20190406183508.25273-2-urezki@gmail.com Link: http://lkml.kernel.org/r/20190321190327.11813-2-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
868b104d73 |
mm/vmalloc: Add flag for freeing of special permsissions
Add a new flag VM_FLUSH_RESET_PERMS, for enabling vfree operations to immediately clear executable TLB entries before freeing pages, and handle resetting permissions on the directmap. This flag is useful for any kind of memory with elevated permissions, or where there can be related permissions changes on the directmap. Today this is RO+X and RO memory. Although this enables directly vfreeing non-writeable memory now, non-writable memory cannot be freed in an interrupt because the allocation itself is used as a node on deferred free list. So when RO memory needs to be freed in an interrupt the code doing the vfree needs to have its own work queue, as was the case before the deferred vfree list was added to vmalloc. For architectures with set_direct_map_ implementations this whole operation can be done with one TLB flush when centralized like this. For others with directmap permissions, currently only arm64, a backup method using set_memory functions is used to reset the directmap. When arm64 adds set_direct_map_ functions, this backup can be removed. When the TLB is flushed to both remove TLB entries for the vmalloc range mapping and the direct map permissions, the lazy purge operation could be done to try to save a TLB flush later. However today vm_unmap_aliases could flush a TLB range that does not include the directmap. So a helper is added with extra parameters that can allow both the vmalloc address and the direct mapping to be flushed during this operation. The behavior of the normal vm_unmap_aliases function is unchanged. Suggested-by: Dave Hansen <dave.hansen@intel.com> Suggested-by: Andy Lutomirski <luto@kernel.org> Suggested-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-17-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
3b3b1a29eb |
mm: Use overflow helpers in kvmalloc()
Instead of open-coded multiplication and bounds checking, use the new overflow helper. Additionally prepare for vmalloc() users to add array_size()-family helpers in the future. Signed-off-by: Kees Cook <keescook@chromium.org> |
||
|
|
b24413180f |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
|
8594a21cf7 |
mm, vmalloc: fix vmalloc users tracking properly
Commit |
||
|
|
1f5307b1e0 |
mm, vmalloc: properly track vmalloc users
__vmalloc_node_flags used to be static inline but this has changed by "mm: introduce kv[mz]alloc helpers" because kvmalloc_node needs to use it as well and the code is outside of the vmalloc proper. I haven't realized that changing this will lead to a subtle bug though. The function is responsible to track the caller as well. This caller is then printed by /proc/vmallocinfo. If __vmalloc_node_flags is not inline then we would get only direct users of __vmalloc_node_flags as callers (e.g. v[mz]alloc) which reduces usefulness of this debugging feature considerably. It simply doesn't help to see that the given range belongs to vmalloc as a caller: 0xffffc90002c79000-0xffffc90002c7d000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N0=3 0xffffc90002c81000-0xffffc90002c85000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N1=3 0xffffc90002c8d000-0xffffc90002c91000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N1=3 0xffffc90002c95000-0xffffc90002c99000 16384 vmalloc+0x16/0x18 pages=3 vmalloc N1=3 We really want to catch the _caller_ of the vmalloc function. Fix this issue by making __vmalloc_node_flags static inline again. Link: http://lkml.kernel.org/r/20170502134657.12381-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
a7c3e901a4 |
mm: introduce kv[mz]alloc helpers
Patch series "kvmalloc", v5. There are many open coded kmalloc with vmalloc fallback instances in the tree. Most of them are not careful enough or simply do not care about the underlying semantic of the kmalloc/page allocator which means that a) some vmalloc fallbacks are basically unreachable because the kmalloc part will keep retrying until it succeeds b) the page allocator can invoke a really disruptive steps like the OOM killer to move forward which doesn't sound appropriate when we consider that the vmalloc fallback is available. As it can be seen implementing kvmalloc requires quite an intimate knowledge if the page allocator and the memory reclaim internals which strongly suggests that a helper should be implemented in the memory subsystem proper. Most callers, I could find, have been converted to use the helper instead. This is patch 6. There are some more relying on __GFP_REPEAT in the networking stack which I have converted as well and Eric Dumazet was not opposed [2] to convert them as well. [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com This patch (of 9): Using kmalloc with the vmalloc fallback for larger allocations is a common pattern in the kernel code. Yet we do not have any common helper for that and so users have invented their own helpers. Some of them are really creative when doing so. Let's just add kv[mz]alloc and make sure it is implemented properly. This implementation makes sure to not make a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also to not warn about allocation failures. This also rules out the OOM killer as the vmalloc is a more approapriate fallback than a disruptive user visible action. This patch also changes some existing users and removes helpers which are specific for them. In some cases this is not possible (e.g. ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and require GFP_NO{FS,IO} context which is not vmalloc compatible in general (note that the page table allocation is GFP_KERNEL). Those need to be fixed separately. While we are at it, document that __vmalloc{_node} about unsupported gfp mask because there seems to be a lot of confusion out there. kvmalloc_node will warn about GFP_KERNEL incompatible (which are not superset) flags to catch new abusers. Existing ones would have to die slowly. [sfr@canb.auug.org.au: f2fs fixup] Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Andreas Dilger <adilger@dilger.ca> [ext4 part] Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: John Hubbard <jhubbard@nvidia.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
bf22e37a64 |
mm: add vfree_atomic()
We are going to use sleeping lock for freeing vmap. However some vfree() users want to free memory from atomic (but not from interrupt) context. For this we add vfree_atomic() - deferred variation of vfree() which can be used in any atomic context (except NMIs). [akpm@linux-foundation.org: tweak comment grammar] [aryabinin@virtuozzo.com: use raw_cpu_ptr() instead of this_cpu_ptr()] Link: http://lkml.kernel.org/r/1481553981-3856-1-git-send-email-aryabinin@virtuozzo.com Link: http://lkml.kernel.org/r/1479474236-4139-5-git-send-email-hch@lst.de Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Joel Fernandes <joelaf@google.com> Cc: Jisheng Zhang <jszhang@marvell.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: John Dias <joaodias@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
1d6da87a32 |
Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie: "Here's the main drm pull request for 4.7, it's been a busy one, and I've been a bit more distracted in real life this merge window. Lots more ARM drivers, not sure if it'll ever end. I think I've at least one more coming the next merge window. But changes are all over the place, support for AMD Polaris GPUs is in here, some missing GM108 support for nouveau (found in some Lenovos), a bunch of MST and skylake fixes. I've also noticed a few fixes from Arnd in my inbox, that I'll try and get in asap, but I didn't think they should hold this up. New drivers: - Hisilicon kirin display driver - Mediatek MT8173 display driver - ARC PGU - bitstreamer on Synopsys ARC SDP boards - Allwinner A13 initial RGB output driver - Analogix driver for DisplayPort IP found in exynos and rockchip DRM Core: - UAPI headers fixes and C++ safety - DRM connector reference counting - DisplayID mode parsing for Dell 5K monitors - Removal of struct_mutex from drivers - Connector registration cleanups - MST robustness fixes - MAINTAINERS updates - Lockless GEM object freeing - Generic fbdev deferred IO support panel: - Support for a bunch of new panels i915: - VBT refactoring - PLL computation cleanups - DSI support for BXT - Color manager support - More atomic patches - GEM improvements - GuC fw loading fixes - DP detection fixes - SKL GPU hang fixes - Lots of BXT fixes radeon/amdgpu: - Initial Polaris support - GPUVM/Scheduler/Clock/Power improvements - ASYNC pageflip support - New mesa feature support nouveau: - GM108 support - Power sensor support improvements - GR init + ucode fixes. - Use GPU provided topology information vmwgfx: - Add host messaging support gma500: - Some cleanups and fixes atmel: - Bridge support - Async atomic commit support fsl-dcu: - Timing controller for LCD support - Pixel clock polarity support rcar-du: - Misc fixes exynos: - Pipeline clock support - Exynoss4533 SoC support - HW trigger mode support - export HDMI_PHY clock - DECON5433 fixes - Use generic prime functions - use DMA mapping APIs rockchip: - Lots of little fixes vc4: - Render node support - Gamma ramp support - DPI output support msm: - Mostly cleanups and fixes - Conversion to generic struct fence etnaviv: - Fix for prime buffer handling - Allow hangcheck to be coalesced with other wakeups tegra: - Gamme table size fix" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1050 commits) drm/edid: add displayid detailed 1 timings to the modelist. (v1.1) drm/edid: move displayid validation to it's own function. drm/displayid: Iterate over all DisplayID blocks drm/edid: move displayid tiled block parsing into separate function. drm: Nuke ->vblank_disable_allowed drm/vmwgfx: Report vmwgfx version to vmware.log drm/vmwgfx: Add VMWare host messaging capability drm/vmwgfx: Kill some lockdep warnings drm/nouveau/gr/gf100-: fix race condition in fecs/gpccs ucode drm/nouveau/core: recognise GM108 chipsets drm/nouveau/gr/gm107-: fix touching non-existent ppcs in attrib cb setup drm/nouveau/gr/gk104-: share implementation of ppc exception init drm/nouveau/gr/gk104-: move rop_active_fbps init to nonctx drm/nouveau/bios/pll: check BIT table version before trying to parse it drm/nouveau/bios/pll: prevent oops when limits table can't be parsed drm/nouveau/volt/gk104: round up in gk104_volt_set drm/nouveau/fb/gm200: setup mmu debug buffer registers at init() drm/nouveau/fb/gk20a,gm20b: setup mmu debug buffer registers at init() drm/nouveau/fb/gf100-: allocate mmu debug buffers drm/nouveau/fb: allow chipset-specific actions for oneinit() ... |
||
|
|
80c4bd7a5e |
mm/vmalloc: keep a separate lazy-free list
When mixing lots of vmallocs and set_memory_*() (which calls vm_unmap_aliases()) I encountered situations where the performance degraded severely due to the walking of the entire vmap_area list each invocation. One simple improvement is to add the lazily freed vmap_area to a separate lockless free list, such that we then avoid having to walk the full list on each purge. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Roman Pen <r.peniaev@gmail.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Roman Pen <r.peniaev@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
4da56b99d9 |
mm/vmap: Add a notifier for when we run out of vmap address space
vmaps are temporary kernel mappings that may be of long duration. Reusing a vmap on an object is preferrable for a driver as the cost of setting up the vmap can otherwise dominate the operation on the object. However, the vmap address space is rather limited on 32bit systems and so we add a notification for vmap pressure in order for the driver to release any cached vmappings. The interface is styled after the oom-notifier where the callees are passed a pointer to an unsigned long counter for them to indicate if they have freed any space. v2: Guard the blocking notifier call with gfpflags_allow_blocking() v3: Correct typo in forward declaration and move to head of file Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Roman Peniaev <r.peniaev@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Acked-by: Andrew Morton <akpm@linux-foundation.org> # for inclusion via DRM Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1459777603-23618-3-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> |
||
|
|
244d63ee34 |
mm, vmalloc: remove VM_VPAGES
VM_VPAGES is unnecessary, it's easier to check is_vmalloc_addr() when reading /proc/vmallocinfo. [akpm@linux-foundation.org: remove VM_VPAGES reference via kvfree()] Signed-off-by: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
a5ad88ce8c |
mm: get rid of 'vmalloc_info' from /proc/meminfo
It turns out that at least some versions of glibc end up reading /proc/meminfo at every single startup, because glibc wants to know the amount of memory the machine has. And while that's arguably insane, it's just how things are. And it turns out that it's not all that expensive most of the time, but the vmalloc information statistics (amount of virtual memory used in the vmalloc space, and the biggest remaining chunk) can be rather expensive to compute. The 'get_vmalloc_info()' function actually showed up on my profiles as 4% of the CPU usage of "make test" in the git source repository, because the git tests are lots of very short-lived shell-scripts etc. It turns out that apparently this same silly vmalloc info gathering shows up on the facebook servers too, according to Dave Jones. So it's not just "make test" for git. We had two patches to just cache the information (one by me, one by Ingo) to mitigate this issue, but the whole vmalloc information of of rather dubious value to begin with, and people who *actually* want to know what the situation is wrt the vmalloc area should just look at the much more complete /proc/vmallocinfo instead. In fact, according to my testing - and perhaps more importantly, according to that big search engine in the sky: Google - there is nothing out there that actually cares about those two expensive fields: VmallocUsed and VmallocChunk. So let's try to just remove them entirely. Actually, this just removes the computation and reports the numbers as zero for now, just to try to be minimally intrusive. If this breaks anything, we'll obviously have to re-introduce the code to compute this all and add the caching patches on top. But if given the option, I'd really prefer to just remove this bad idea entirely rather than add even more code to work around our historical mistake that likely nobody really cares about. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
a5af5aa8b6 |
kasan, module, vmalloc: rework shadow allocation for modules
Current approach in handling shadow memory for modules is broken.
Shadow memory could be freed only after memory shadow corresponds it is no
longer used. vfree() called from interrupt context could use memory its
freeing to store 'struct llist_node' in it:
void vfree(const void *addr)
{
...
if (unlikely(in_interrupt())) {
struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
if (llist_add((struct llist_node *)addr, &p->list))
schedule_work(&p->wq);
Later this list node used in free_work() which actually frees memory.
Currently module_memfree() called in interrupt context will free shadow
before freeing module's memory which could provoke kernel crash.
So shadow memory should be freed after module's memory. However, such
deallocation order could race with kasan_module_alloc() in module_alloc().
Free shadow right before releasing vm area. At this point vfree()'d
memory is not used anymore and yet not available for other allocations.
New VM_KASAN flag used to indicate that vm area has dynamically allocated
shadow memory so kasan frees shadow only if it was previously allocated.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
cb9e3c292d |
mm: vmalloc: pass additional vm_flags to __vmalloc_node_range()
For instrumenting global variables KASan will shadow memory backing memory for modules. So on module loading we will need to allocate memory for shadow and map it at address in shadow that corresponds to the address allocated in module_alloc(). __vmalloc_node_range() could be used for this purpose, except it puts a guard hole after allocated area. Guard hole in shadow memory should be a problem because at some future point we might need to have a shadow memory at address occupied by guard hole. So we could fail to allocate shadow for module_alloc(). Now we have VM_NO_GUARD flag disabling guard page, so we need to pass into __vmalloc_node_range(). Add new parameter 'vm_flags' to __vmalloc_node_range() function. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
71394fe501 |
mm: vmalloc: add flag preventing guard hole allocation
For instrumenting global variables KASan will shadow memory backing memory for modules. So on module loading we will need to allocate memory for shadow and map it at address in shadow that corresponds to the address allocated in module_alloc(). __vmalloc_node_range() could be used for this purpose, except it puts a guard hole after allocated area. Guard hole in shadow memory should be a problem because at some future point we might need to have a shadow memory at address occupied by guard hole. So we could fail to allocate shadow for module_alloc(). Add a new vm_struct flag 'VM_NO_GUARD' indicating that vm area doesn't have a guard hole. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
f6f8ed4735 |
mm/vmalloc.c: clean up map_vm_area third argument
Currently map_vm_area() takes (struct page *** pages) as third argument, and after mapping, it moves (*pages) to point to (*pages + nr_mappped_pages). It looks like this kind of increment is useless to its caller these days. The callers don't care about the increments and actually they're trying to avoid this by passing another copy to map_vm_area(). The caller can always guarantee all the pages can be mapped into vm_area as specified in first argument and the caller only cares about whether map_vm_area() fails or not. This patch cleans up the pointer movement in map_vm_area() and updates its callers accordingly. Signed-off-by: WANG Chao <chaowang@redhat.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
20fc02b477 |
mm/vmalloc.c: rename VM_UNLIST to VM_UNINITIALIZED
VM_UNLIST was used to indicate that the vm_struct is not listed in
vmlist.
But after commit
|
||
|
|
e69e9d4aee |
vmalloc: introduce remap_vmalloc_range_partial
We want to allocate ELF note segment buffer on the 2nd kernel in vmalloc space and remap it to user-space in order to reduce the risk that memory allocation fails on system with huge number of CPUs and so with huge ELF note segment that exceeds 11-order block size. Although there's already remap_vmalloc_range for the purpose of remapping vmalloc memory to user-space, we need to specify user-space range via vma. Mmap on /proc/vmcore needs to remap range across multiple objects, so the interface that requires vma to cover full range is problematic. This patch introduces remap_vmalloc_range_partial that receives user-space range as a pair of base address and size and can be used for mmap on /proc/vmcore case. remap_vmalloc_range is rewritten using remap_vmalloc_range_partial. [akpm@linux-foundation.org: use PAGE_ALIGNED()] Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Lisa Mitchell <lisa.mitchell@hp.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
13ba3fcbbe |
kexec, vmalloc: export additional vmalloc layer information
Now, vmap_area_list is exported as VMCOREINFO for makedumpfile to get the start address of vmalloc region (vmalloc_start). The address which contains vmalloc_start value is represented as below: vmap_area_list.next - OFFSET(vmap_area.list) + OFFSET(vmap_area.va_start) However, both OFFSET(vmap_area.va_start) and OFFSET(vmap_area.list) aren't exported as VMCOREINFO. So this patch exports them externally with small cleanup. [akpm@linux-foundation.org: vmalloc.h should include list.h for list_head] Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
f1c4069e1d |
mm, vmalloc: export vmap_area_list, instead of vmlist
Although our intention is to unexport internal structure entirely, but there is one exception for kexec. kexec dumps address of vmlist and makedumpfile uses this information. We are about to remove vmlist, then another way to retrieve information of vmalloc layer is needed for makedumpfile. For this purpose, we export vmap_area_list, instead of vmlist. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
db3808c1ba |
mm, vmalloc: move get_vmalloc_info() to vmalloc.c
Now get_vmalloc_info() is in fs/proc/mmu.c. There is no reason that this code must be here and it's implementation needs vmlist_lock and it iterate a vmlist which may be internal data structure for vmalloc. It is preferable that vmlist_lock and vmlist is only used in vmalloc.c for maintainability. So move the code to vmalloc.c Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
e9da6e9905 |
ARM: dma-mapping: remove custom consistent dma region
This patch changes dma-mapping subsystem to use generic vmalloc areas for all consistent dma allocations. This increases the total size limit of the consistent allocations and removes platform hacks and a lot of duplicated code. Atomic allocations are served from special pool preallocated on boot, because vmalloc areas cannot be reliably created in atomic context. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Minchan Kim <minchan@kernel.org> |
||
|
|
5e6cafc83e |
mm: vmalloc: use const void * for caller argument
'const void *' is a safer type for caller function type. This patch updates all references to caller function type. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Minchan Kim <minchan@kernel.org> |