linux

mirror of https://github.com/Dasharo/linux.git synced 2026-03-06 15:25:10 -08:00

Author	SHA1	Message	Date
Jens Axboe	ba13e83ec3	mm: make __swap_writepage() use bio_set_op_attrs() Cleaner than manipulating bio->bi_rw flags directly. Signed-off-by: Jens Axboe <axboe@fb.com>	2016-08-07 14:41:02 -06:00
Jens Axboe	c11f0c0b5b	block/mm: make bdev_ops->rw_page() take a bool for read/write Commit `abf545484d` changed it from an 'rw' flags type to the newer ops based interface, but now we're effectively leaking some bdev internals to the rest of the kernel. Since we only care about whether it's a read or a write at that level, just pass in a bool 'is_write' parameter instead. Then we can also move op_is_write() and friends back under CONFIG_BLOCK protection. Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-08-07 14:41:02 -06:00
Linus Torvalds	fff648da96	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Here's the second round of block updates for this merge window. It's a mix of fixes for changes that went in previously in this round, and fixes in general. This pull request contains: - Fixes for loop from Christoph - A bdi vs gendisk lifetime fix from Dan, worth two cookies. - A blk-mq timeout fix, when on frozen queues. From Gabriel. - Writeback fix from Jan, ensuring that __writeback_single_inode() does the right thing. - Fix for bio->bi_rw usage in f2fs from me. - Error path deadlock fix in blk-mq sysfs registration from me. - Floppy O_ACCMODE fix from Jiri. - Fix to the new bio op methods from Mike. One more followup will be coming here, ensuring that we don't propagate the block types outside of block. That, and a rename of bio->bi_rw is coming right after -rc1 is cut. - Various little fixes" * 'for-linus' of git://git.kernel.dk/linux-block: mm/block: convert rw_page users to bio op use loop: make do_req_filebacked more robust loop: don't try to use AIO for discards blk-mq: fix deadlock in blk_mq_register_disk() error path Include: blkdev: Removed duplicate 'struct request;' declaration. Fixup direct bi_rw modifiers block: fix bdi vs gendisk lifetime mismatch blk-mq: Allow timeouts to run while queue is freezing nbd: fix race in ioctl block: fix use-after-free in seq file f2fs: drop bio->bi_rw manual assignment block: add missing group association in bio-cloning functions blkcg: kill unused field nr_undestroyed_grps writeback: Write dirty times for WB_SYNC_ALL writeback floppy: fix open(O_ACCMODE) for ioctl-only open	2016-08-05 23:31:51 -04:00
Linus Torvalds	2cfd716d27	Merge tag 'powerpc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull more powerpc updates from Michael Ellerman: "These were delayed for various reasons, so I let them sit in next a bit longer, rather than including them in my first pull request. Fixes: - Fix early access to cpu_spec relocation from Benjamin Herrenschmidt - Fix incorrect event codes in power9-event-list from Madhavan Srinivasan - Move register_process_table() out of ppc_md from Michael Ellerman Use jump_label use for [cpu|mmu]_has_feature(): - Add mmu_early_init_devtree() from Michael Ellerman - Move disable_radix handling into mmu_early_init_devtree() from Michael Ellerman - Do hash device tree scanning earlier from Michael Ellerman - Do radix device tree scanning earlier from Michael Ellerman - Do feature patching before MMU init from Michael Ellerman - Check features don't change after patching from Michael Ellerman - Make MMU_FTR_RADIX a MMU family feature from Aneesh Kumar K.V - Convert mmu_has_feature() to returning bool from Michael Ellerman - Convert cpu_has_feature() to returning bool from Michael Ellerman - Define radix_enabled() in one place & use static inline from Michael Ellerman - Add early_[cpu|mmu]_has_feature() from Michael Ellerman - Convert early cpu/mmu feature check to use the new helpers from Aneesh Kumar K.V - jump_label: Make it possible for arches to invoke jump_label_init() earlier from Kevin Hao - Call jump_label_init() in apply_feature_fixups() from Aneesh Kumar K.V - Remove mfvtb() from Kevin Hao - Move cpu_has_feature() to a separate file from Kevin Hao - Add kconfig option to use jump labels for cpu/mmu_has_feature() from Michael Ellerman - Add option to use jump label for cpu_has_feature() from Kevin Hao - Add option to use jump label for mmu_has_feature() from Kevin Hao - Catch usage of cpu/mmu_has_feature() before jump label init from Aneesh Kumar K.V - Annotate jump label assembly from Michael Ellerman TLB flush enhancements from Aneesh Kumar K.V: - radix: Implement tlb mmu gather flush efficiently - Add helper for finding SLBE LLP encoding - Use hugetlb flush functions - Drop multiple definition of mm_is_core_local - radix: Add tlb flush of THP ptes - radix: Rename function and drop unused arg - radix/hugetlb: Add helper for finding page size - hugetlb: Add flush_hugetlb_tlb_range - remove flush_tlb_page_nohash Add new ptrace regsets from Anshuman Khandual and Simon Guo: - elf: Add powerpc specific core note sections - Add the function flush_tmregs_to_thread - Enable in transaction NT_PRFPREG ptrace requests - Enable in transaction NT_PPC_VMX ptrace requests - Enable in transaction NT_PPC_VSX ptrace requests - Adapt gpr32_get, gpr32_set functions for transaction - Enable support for NT_PPC_CGPR - Enable support for NT_PPC_CFPR - Enable support for NT_PPC_CVMX - Enable support for NT_PPC_CVSX - Enable support for TM SPR state - Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR - Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR - Enable support for EBB registers - Enable support for Performance Monitor registers" * tag 'powerpc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (48 commits) powerpc/mm: Move register_process_table() out of ppc_md powerpc/perf: Fix incorrect event codes in power9-event-list powerpc/32: Fix early access to cpu_spec relocation powerpc/ptrace: Enable support for Performance Monitor registers powerpc/ptrace: Enable support for EBB registers powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR powerpc/ptrace: Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR powerpc/ptrace: Enable support for TM SPR state powerpc/ptrace: Enable support for NT_PPC_CVSX powerpc/ptrace: Enable support for NT_PPC_CVMX powerpc/ptrace: Enable support for NT_PPC_CFPR powerpc/ptrace: Enable support for NT_PPC_CGPR powerpc/ptrace: Adapt gpr32_get, gpr32_set functions for transaction powerpc/ptrace: Enable in transaction NT_PPC_VSX ptrace requests powerpc/ptrace: Enable in transaction NT_PPC_VMX ptrace requests powerpc/ptrace: Enable in transaction NT_PRFPREG ptrace requests powerpc/process: Add the function flush_tmregs_to_thread elf: Add powerpc specific core note sections powerpc/mm: remove flush_tlb_page_nohash powerpc/mm/hugetlb: Add flush_hugetlb_tlb_range ...	2016-08-05 09:00:54 -04:00
zijun_hu	e47608ab6d	mm/memblock.c: fix NULL dereference error It causes NULL dereference error and failure to get type_a->regions[0] info if parameter type_b of __next_mem_range_rev() == NULL Fix this by checking before dereferring and initializing idx_b to 0 The approach is tested by dumping all types of region via __memblock_dump_all() and __next_mem_range_rev() fixed to UART separately the result is okay after checking the logs. Link: http://lkml.kernel.org/r/57A0320D.6070102@zoho.com Signed-off-by: zijun_hu <zijun_hu@htc.com> Tested-by: zijun_hu <zijun_hu@htc.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-04 20:02:09 -04:00
Geert Uytterhoeven	117d54df7a	slub: drop bogus inline for fixup_red_left() With m68k-linux-gnu-gcc-4.1: include/linux/slub_def.h:126: warning: `fixup_red_left' declared inline after being called include/linux/slub_def.h:126: warning: previous declaration of `fixup_red_left' was here Commit `c146a2b98e` ("mm, kasan: account for object redzone in SLUB's nearest_obj()") made fixup_red_left() global, but forgot to remove the inline keyword. Fixes: `c146a2b98e` ("mm, kasan: account for object redzone in SLUB's nearest_obj()") Link: http://lkml.kernel.org/r/1470256262-1586-1-git-send-email-geert@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Alexander Potapenko <glider@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-04 20:02:09 -04:00
Mel Gorman	b4911ea2bc	mm: initialise per_cpu_nodestats for all online pgdats at boot Paul Mackerras and Reza Arbab reported that machines with memoryless nodes fail when vmstats are refreshed. Paul reported an oops as follows Unable to handle kernel paging request for data at address 0xff7a10000 Faulting instruction address: 0xc000000000270cd0 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.7.0-kvm+ #118 task: c000000ff0680010 task.stack: c000000ff0704000 NIP: c000000000270cd0 LR: c000000000270ce8 CTR: 0000000000000000 REGS: c000000ff0707900 TRAP: 0300 Not tainted (4.7.0-kvm+) MSR: 9000000102009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE,TM[E]> CR: 846b6824 XER: 20000000 CFAR: c000000000008768 DAR: 0000000ff7a10000 DSISR: 42000000 SOFTE: 1 NIP refresh_zone_stat_thresholds+0x80/0x240 LR refresh_zone_stat_thresholds+0x98/0x240 Call Trace: refresh_zone_stat_thresholds+0xb8/0x240 (unreliable) Both supplied potential fixes but one potentially misses checks and another had redundant initialisations. This version initialises per_cpu_nodestats on a per-pgdat basis instead of on a per-zone basis. Link: http://lkml.kernel.org/r/20160804092404.GI2799@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Paul Mackerras <paulus@ozlabs.org> Reported-by: Reza Arbab <arbab@linux.vnet.ibm.com> Tested-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-04 20:02:09 -04:00
Alexander Kuleshov	412d0008d6	mm/memblock: fix a typo in a comment s/accomodate/accommodate/ Link: http://lkml.kernel.org/r/20160804121824.18100-1-kuleshovmail@gmail.com Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-04 20:02:09 -04:00
zhong jiang	1e185736d2	mm: disable CONFIG_MEMORY_HOTPLUG when KASAN is enabled At present it is obvious that memory online and offline will fail when KASAN is enabled. So add the condition to limit the memory_hotplug when KASAN is enabled. Link: http://lkml.kernel.org/r/1470063651-29519-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-04 20:02:09 -04:00
Mike Christie	abf545484d	mm/block: convert rw_page users to bio op use The rw_page users were not converted to use bio/req ops. As a result bdev_write_page is not passing down REQ_OP_WRITE and the IOs will be sent down as reads. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixes: `4e1b2d52a8` ("block, fs, drivers: remove REQ_OP compat defs and related code") Modified by me to: 1) Drop op_flags passing into ->rw_page(), as we don't use it. 2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK Signed-off-by: Jens Axboe <axboe@fb.com>	2016-08-04 14:25:33 -06:00
Dan Williams	df08c32ce3	block: fix bdi vs gendisk lifetime mismatch The name for a bdi of a gendisk is derived from the gendisk's devt. However, since the gendisk is destroyed before the bdi it leaves a window where a new gendisk could dynamically reuse the same devt while a bdi with the same name is still live. Arrange for the bdi to hold a reference against its "owner" disk device while it is registered. Otherwise we can hit sysfs duplicate name collisions like the following: WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80 sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1' Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015 0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351 0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8 Call Trace: [<ffffffff8134caec>] dump_stack+0x63/0x87 [<ffffffff8108c351>] __warn+0xd1/0xf0 [<ffffffff8108c3cf>] warn_slowpath_fmt+0x5f/0x80 [<ffffffff812a0d34>] sysfs_warn_dup+0x64/0x80 [<ffffffff812a0e1e>] sysfs_create_dir_ns+0x7e/0x90 [<ffffffff8134faaa>] kobject_add_internal+0xaa/0x320 [<ffffffff81358d4e>] ? vsnprintf+0x34e/0x4d0 [<ffffffff8134ff55>] kobject_add+0x75/0xd0 [<ffffffff816e66b2>] ? mutex_lock+0x12/0x2f [<ffffffff8148b0a5>] device_add+0x125/0x610 [<ffffffff8148b788>] device_create_groups_vargs+0xd8/0x100 [<ffffffff8148b7cc>] device_create_vargs+0x1c/0x20 [<ffffffff811b775c>] bdi_register+0x8c/0x180 [<ffffffff811b7877>] bdi_register_dev+0x27/0x30 [<ffffffff813317f5>] add_disk+0x175/0x4a0 Cc: <stable@vger.kernel.org> Reported-by: Yi Zhang <yizhan@redhat.com> Tested-by: Yi Zhang <yizhan@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Fixed up missing 0 return in bdi_register_owner(). Signed-off-by: Jens Axboe <axboe@fb.com>	2016-08-04 14:19:16 -06:00
Geert Uytterhoeven	4620a06e4b	shmem: Fix link error if huge pages support is disabled If CONFIG_TRANSPARENT_HUGE_PAGECACHE=n, HPAGE_PMD_NR evaluates to BUILD_BUG_ON(), and may cause (e.g. with gcc 4.12): mm/built-in.o: In function `shmem_alloc_hugepage': shmem.c:(.text+0x17570): undefined reference to `__compiletime_assert_1365' To fix this, move the assignment to hindex after the check for huge pages support. Fixes: `800d8c63b2` ("shmem: add huge pages support") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-03 18:20:12 -04:00
Linus Torvalds	d52bd54db8	Merge branch 'akpm' (patches from Andrew) Merge yet more updates from Andrew Morton: - the rest of ocfs2 - various hotfixes, mainly MM - quite a bit of misc stuff - drivers, fork, exec, signals, etc. - printk updates - firmware - checkpatch - nilfs2 - more kexec stuff than usual - rapidio updates - w1 things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits) ipc: delete "nr_ipc_ns" kcov: allow more fine-grained coverage instrumentation init/Kconfig: add clarification for out-of-tree modules config: add android config fragments init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig relay: add global mode support for buffer-only channels init: allow blacklisting of module_init functions w1:omap_hdq: fix regression w1: add helper macro module_w1_family w1: remove need for ida and use PLATFORM_DEVID_AUTO rapidio/switches: add driver for IDT gen3 switches powerpc/fsl_rio: apply changes for RIO spec rev 3 rapidio: modify for rev.3 specification changes rapidio: change inbound window size type to u64 rapidio/idt_gen2: fix locking warning rapidio: fix error handling in mbox request/release functions rapidio/tsi721_dma: advance queue processing from transfer submit call rapidio/tsi721: add messaging mbox selector parameter rapidio/tsi721: add PCIe MRRS override parameter rapidio/tsi721_dma: add channel mask and queue size parameters ...	2016-08-02 21:08:07 -04:00
Kees Cook	ba093a6d93	mm: refuse wrapped vm_brk requests The vm_brk() alignment calculations should refuse to overflow. The ELF loader depending on this, but it has been fixed now. No other unsafe callers have been found. Link: http://lkml.kernel.org/r/1468014494-25291-3-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Ismael Ripoll Ripoll <iripoll@upv.es> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Chen Gang <gang.chen.5i5j@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-02 19:35:15 -04:00
Fabian Frederick	bd721ea73e	treewide: replace obsolete _refok by __ref There was only one use of __initdata_refok and __exit_refok __init_refok was used 46 times against 82 for __ref. Those definitions are obsolete since commit `312b1485fb` ("Introduce new section reference annotations tags: __ref, __refdata, __refconst") This patch removes the following compatibility definitions and replaces them treewide. /* compatibility defines */ #define __init_refok __ref #define __initdata_refok __refdata #define __exit_refok __ref I can also provide separate patches if necessary. (One patch per tree and check in 1 month or 2 to remove old definitions) [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Ingo Molnar <mingo@redhat.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-02 17:31:41 -04:00
Vladimir Davydov	b5afba2974	mm: vmscan: fix memcg-aware shrinkers not called on global reclaim We must call shrink_slab() for each memory cgroup on both global and memcg reclaim in shrink_node_memcg(). Commit d71df22b55099 accidentally changed that so that now shrink_slab() is only called with memcg != NULL on memcg reclaim. As a result, memcg-aware shrinkers (including dentry/inode) are never invoked on global reclaim. Fix that. Fixes: `b2e18757f2` ("mm, vmscan: begin reclaiming pages on a per-node basis") Link: http://lkml.kernel.org/r/1470056590-7177-1-git-send-email-vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-02 17:31:41 -04:00
Alexander Potapenko	c3cee37228	kasan: avoid overflowing quarantine size on low memory systems If the total amount of memory assigned to quarantine is less than the amount of memory assigned to per-cpu quarantines, |new_quarantine_size| may overflow. Instead, set it to zero. [akpm@linux-foundation.org: cleanup: use WARN_ONCE return value] Link: http://lkml.kernel.org/r/1470063563-96266-1-git-send-email-glider@google.com Fixes: `55834c5909` ("mm: kasan: initial memory quarantine implementation") Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-02 17:31:41 -04:00
Andrey Ryabinin	7e08897893	kasan: improve double-free reports Currently we just dump stack in case of double free bug. Let's dump all info about the object that we have. [aryabinin@virtuozzo.com: change double free message per Alexander] Link: http://lkml.kernel.org/r/1470153654-30160-1-git-send-email-aryabinin@virtuozzo.com Link: http://lkml.kernel.org/r/1470062715-14077-6-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-02 17:31:41 -04:00
Andrey Ryabinin	b3cbd9bf77	mm/kasan: get rid of ->state in struct kasan_alloc_meta The state of object currently tracked in two places - shadow memory, and the ->state field in struct kasan_alloc_meta. We can get rid of the latter. The will save us a little bit of memory. Also, this allow us to move free stack into struct kasan_alloc_meta, without increasing memory consumption. So now we should always know when the last time the object was freed. This may be useful for long delayed use-after-free bugs. As a side effect this fixes following UBSAN warning: UBSAN: Undefined behaviour in mm/kasan/quarantine.c:102:13 member access within misaligned address ffff88000d1efebc for type 'struct qlist_node' which requires 8 byte alignment Link: http://lkml.kernel.org/r/1470062715-14077-5-git-send-email-aryabinin@virtuozzo.com Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-02 17:31:41 -04:00
Andrey Ryabinin	47b5c2a0f0	mm/kasan: get rid of ->alloc_size in struct kasan_alloc_meta Size of slab object already stored in cache->object_size. Note, that kmalloc() internally rounds up size of allocation, so object_size may be not equal to alloc_size, but, usually we don't need to know the exact size of allocated object. In case if we need that information, we still can figure it out from the report. The dump of shadow memory allows to identify the end of allocated memory, and thereby the exact allocation size. Link: http://lkml.kernel.org/r/1470062715-14077-4-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-02 17:31:41 -04:00
Andrey Ryabinin	f7376aed6c	mm/kasan, slub: don't disable interrupts when object leaves quarantine SLUB doesn't require disabled interrupts to call ___cache_free(). Link: http://lkml.kernel.org/r/1470062715-14077-3-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-02 17:31:41 -04:00
Andrey Ryabinin	4b3ec5a3f4	mm/kasan: don't reduce quarantine in atomic contexts Currently we call quarantine_reduce() for ___GFP_KSWAPD_RECLAIM (implied by __GFP_RECLAIM) allocation. So, basically we call it on almost every allocation. quarantine_reduce() sometimes is heavy operation, and calling it with disabled interrupts may trigger hard LOCKUP: NMI watchdog: Watchdog detected hard LOCKUP on cpu 2irq event stamp: 1411258 Call Trace: <NMI> dump_stack+0x68/0x96 watchdog_overflow_callback+0x15b/0x190 __perf_event_overflow+0x1b1/0x540 perf_event_overflow+0x14/0x20 intel_pmu_handle_irq+0x36a/0xad0 perf_event_nmi_handler+0x2c/0x50 nmi_handle+0x128/0x480 default_do_nmi+0xb2/0x210 do_nmi+0x1aa/0x220 end_repeat_nmi+0x1a/0x1e <<EOE>> __kernel_text_address+0x86/0xb0 print_context_stack+0x7b/0x100 dump_trace+0x12b/0x350 save_stack_trace+0x2b/0x50 set_track+0x83/0x140 free_debug_processing+0x1aa/0x420 __slab_free+0x1d6/0x2e0 ___cache_free+0xb6/0xd0 qlist_free_all+0x83/0x100 quarantine_reduce+0x177/0x1b0 kasan_kmalloc+0xf3/0x100 Reduce the quarantine_reduce iff direct reclaim is allowed. Fixes: 55834c59098d("mm: kasan: initial memory quarantine implementation") Link: http://lkml.kernel.org/r/1470062715-14077-2-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Dave Jones <davej@codemonkey.org.uk> Acked-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-02 17:31:41 -04:00
Andrey Ryabinin	4a3d308d66	mm/kasan: fix corruptions and false positive reports Once an object is put into quarantine, we no longer own it, i.e. object could leave the quarantine and be reallocated. So having set_track() call after the quarantine_put() may corrupt slab objects. BUG kmalloc-4096 (Not tainted): Poison overwritten ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: 0xffff8804540de850-0xffff8804540de857. First byte 0xb5 instead of 0x6b ... INFO: Freed in qlist_free_all+0x42/0x100 age=75 cpu=3 pid=24492 __slab_free+0x1d6/0x2e0 ___cache_free+0xb6/0xd0 qlist_free_all+0x83/0x100 quarantine_reduce+0x177/0x1b0 kasan_kmalloc+0xf3/0x100 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc+0x109/0x3e0 mmap_region+0x53e/0xe40 do_mmap+0x70f/0xa50 vm_mmap_pgoff+0x147/0x1b0 SyS_mmap_pgoff+0x2c7/0x5b0 SyS_mmap+0x1b/0x30 do_syscall_64+0x1a0/0x4e0 return_from_SYSCALL_64+0x0/0x7a INFO: Slab 0xffffea0011503600 objects=7 used=7 fp=0x (null) flags=0x8000000000004080 INFO: Object 0xffff8804540de848 @offset=26696 fp=0xffff8804540dc588 Redzone ffff8804540de840: bb bb bb bb bb bb bb bb ........ Object ffff8804540de848: 6b 6b 6b 6b 6b 6b 6b 6b b5 52 00 00 f2 01 60 cc kkkkkkkk.R....`. Similarly, poisoning after the quarantine_put() leads to false positive use-after-free reports: BUG: KASAN: use-after-free in anon_vma_interval_tree_insert+0x304/0x430 at addr ffff880405c540a0 Read of size 8 by task trinity-c0/3036 CPU: 0 PID: 3036 Comm: trinity-c0 Not tainted 4.7.0-think+ #9 Call Trace: dump_stack+0x68/0x96 kasan_report_error+0x222/0x600 __asan_report_load8_noabort+0x61/0x70 anon_vma_interval_tree_insert+0x304/0x430 anon_vma_chain_link+0x91/0xd0 anon_vma_clone+0x136/0x3f0 anon_vma_fork+0x81/0x4c0 copy_process.part.47+0x2c43/0x5b20 _do_fork+0x16d/0xbd0 SyS_clone+0x19/0x20 do_syscall_64+0x1a0/0x4e0 entry_SYSCALL64_slow_path+0x25/0x25 Fix this by putting an object in the quarantine after all other operations. Fixes: `80a9201a59` ("mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB") Link: http://lkml.kernel.org/r/1470062715-14077-1-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Dave Jones <davej@codemonkey.org.uk> Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Reported-by: Sasha Levin <alexander.levin@verizon.com> Acked-by: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-02 17:31:41 -04:00
Michal Hocko	d6507ff533	memcg: put soft limit reclaim out of way if the excess tree is empty We've had a report about soft lockups caused by lock bouncing in the soft reclaim path: BUG: soft lockup - CPU#0 stuck for 22s! [kav4proxy-kavic:3128] RIP: 0010:[<ffffffff81469798>] [<ffffffff81469798>] _raw_spin_lock+0x18/0x20 Call Trace: mem_cgroup_soft_limit_reclaim+0x25a/0x280 shrink_zones+0xed/0x200 do_try_to_free_pages+0x74/0x320 try_to_free_pages+0x112/0x180 __alloc_pages_slowpath+0x3ff/0x820 __alloc_pages_nodemask+0x1e9/0x200 alloc_pages_vma+0xe1/0x290 do_wp_page+0x19f/0x840 handle_pte_fault+0x1cd/0x230 do_page_fault+0x1fd/0x4c0 page_fault+0x25/0x30 There are no memcgs created so there cannot be any in the soft limit excess obviously: [...] memory 0 1 1 so all this just seems to be mem_cgroup_largest_soft_limit_node trying to get spin_lock_irq(&mctz->lock) just to find out that the soft limit excess tree is empty. This is just pointless wasting of cycles and cache line bouncing during heavy parallel reclaim on large machines. The particular machine wasn't very healthy and most probably suffering from a memory leak which just caused the memory reclaim to trash heavily. But bouncing on the lock certainly didn't help... Fix this by optimistic lockless check and bail out early if the tree is empty. This is theoretically racy but that shouldn't matter all that much. First of all soft limit is a best effort feature and it is slowly getting deprecated and its usage should be really scarce. Bouncing on a lock without a good reason is surely much bigger problem, especially on large CPU machines. Link: http://lkml.kernel.org/r/1470073277-1056-1-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-02 17:31:41 -04:00
Michal Hocko	4e666314d2	mm, hugetlb: fix huge_pte_alloc BUG_ON Zhong Jiang has reported a BUG_ON from huge_pte_alloc hitting when he runs his database load with memory online and offline running in parallel. The reason is that huge_pmd_share might detect a shared pmd which is currently migrated and so it has migration pte which is !pte_huge. There doesn't seem to be any easy way to prevent from the race and in fact seeing the migration swap entry is not harmful. Both callers of huge_pte_alloc are prepared to handle them. copy_hugetlb_page_range will copy the swap entry and make it COW if needed. hugetlb_fault will back off and so the page fault is retries if the page is still under migration and waits for its completion in hugetlb_fault. That means that the BUG_ON is wrong and we should update it. Let's simply check that all present ptes are pte_huge instead. Link: http://lkml.kernel.org/r/20160721074340.GA26398@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: zhongjiang <zhongjiang@huawei.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-08-02 17:31:41 -04:00


10650 Commits