Commit Graph

561657 Commits

Author SHA1 Message Date
Chris Wilson 86fffe4a61 kernel: remove stop_machine() Kconfig dependency
Currently the full stop_machine() routine is only enabled on SMP if
module unloading is enabled, or if the CPUs are hotpluggable.  This
leads to configurations where stop_machine() is broken as it will then
only run the callback on the local CPU with irqs disabled, and not stop
the other CPUs or run the callback on them.

For example, this breaks MTRR setup on x86 in certain configs since
ea8596bb2d ("kprobes/x86: Remove unused text_poke_smp() and
text_poke_smp_batch() functions") as the MTRR is only established on the
boot CPU.

This patch removes the Kconfig option for STOP_MACHINE and uses the SMP
and HOTPLUG_CPU config options to compile the correct stop_machine() for
the architecture, removing the false dependency on MODULE_UNLOAD in the
process.

Link: https://lkml.org/lkml/2014/10/8/124
References: https://bugs.freedesktop.org/show_bug.cgi?id=84794
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Pranith Kumar <bobby.prani@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-12 10:15:34 -08:00
Nicolas Iooss 98e89cf02a mm: kmemleak: mark kmemleak_init prototype as __init
The kmemleak_init() definition in mm/kmemleak.c is marked __init but its
prototype in include/linux/kmemleak.h is marked __ref since commit
a6186d89c9 ("kmemleak: Mark the early log buffer as __initdata").

This causes a section mismatch which is reported as a warning when
building with clang -Wsection, because kmemleak_init() is declared in
section .ref.text but defined in .init.text.

Fix this by marking kmemleak_init() prototype __init.

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-12 10:15:34 -08:00
Hugh Dickins 25be6a6595 mm: fix kerneldoc on mem_cgroup_replace_page
Whoops, I missed removing the kerneldoc comment of the lrucare arg
removed from mem_cgroup_replace_page; but it's a good comment, keep it.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-12 10:15:34 -08:00
Hugh Dickins 3066a9670b osd fs: __r4w_get_page rely on PageUptodate for uptodate
Commit 42cb14b110 ("mm: migrate dirty page without
clear_page_dirty_for_io etc") simplified the migration of a PageDirty
pagecache page: one stat needs moving from zone to zone and that's about
all.

It's convenient and safest for it to shift the PageDirty bit from old
page to new, just before updating the zone stats: before copying data
and marking the new PageUptodate.  This is all done while both pages are
isolated and locked, just as before; and just as before, there's a
moment when the new page is visible in the radix_tree, but not yet
PageUptodate.  What's new is that it may now be briefly visible as
PageDirty before it is PageUptodate.

When I scoured the tree to see if this could cause a problem anywhere,
the only places I found were in two similar functions __r4w_get_page():
which look up a page with find_get_page() (not using page lock), then
claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.

I'm not sure whether that was right before, but now it might be wrong
(on rare occasions): only claim the page is uptodate if PageUptodate.
Or perhaps the page in question could never be migratable anyway?

Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Boaz Harrosh <ooo@electrozaur.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-12 10:15:34 -08:00
Johannes Weiner ed0f1e2102 MAINTAINERS: make Vladimir co-maintainer of the memory controller
Vladimir architected and authored much of the current state of the
memcg's slab memory accounting and tracking.  Make sure he gets CC'd on
bug reports ;-)

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-12 10:15:34 -08:00
Michal Hocko 373ccbe592 mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress
Tetsuo Handa has reported that the system might basically livelock in
OOM condition without triggering the OOM killer.

The issue is caused by internal dependency of the direct reclaim on
vmstat counter updates (via zone_reclaimable) which are performed from
the workqueue context.  If all the current workers get assigned to an
allocation request, though, they will be looping inside the allocator
trying to reclaim memory but zone_reclaimable can see stalled numbers so
it will consider a zone reclaimable even though it has been scanned way
too much.  WQ concurrency logic will not consider this situation as a
congested workqueue because it relies that worker would have to sleep in
such a situation.  This also means that it doesn't try to spawn new
workers or invoke the rescuer thread if the one is assigned to the
queue.

In order to fix this issue we need to do two things.  First we have to
let wq concurrency code know that we are in trouble so we have to do a
short sleep.  In order to prevent from issues handled by 0e093d9976
("writeback: do not sleep on the congestion queue if there are no
congested BDIs or if significant congestion is not being encountered in
the current zone") we limit the sleep only to worker threads which are
the ones of the interest anyway.

The second thing to do is to create a dedicated workqueue for vmstat and
mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
have a spare worker thread for it.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Tejun Heo <tj@kernel.org>
Cc: Cristopher Lameter <clameter@sgi.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-12 10:15:34 -08:00
Vlastimil Babka 475a2f905d mm: fix swapped Movable and Reclaimable in /proc/pagetypeinfo
Commit 016c13daa5 ("mm, page_alloc: use masks and shifts when
converting GFP flags to migrate types") has swapped MIGRATE_MOVABLE and
MIGRATE_RECLAIMABLE in the enum definition.  However, migratetype_names
wasn't updated to reflect that.

As a result, the file /proc/pagetypeinfo shows the counts for Movable as
Reclaimable and vice versa.

Additionally, commit 0aaa29a56e ("mm, page_alloc: reserve pageblocks
for high-order atomic allocations on demand") introduced
MIGRATE_HIGHATOMIC, but did not add a letter to distinguish it into
show_migration_types(), so it doesn't appear in the listing of free
areas during page alloc failures or oom kills.

This patch fixes both problems.  The atomic reserves will show with a
letter 'H' in the free areas listings.

Fixes: 016c13daa5 ("mm, page_alloc: use masks and shifts when converting GFP flags to migrate types")
Fixes: 0aaa29a56e ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-12 10:15:34 -08:00
Vladimir Davydov 9516a18a9a memcg: fix memory.high target
When the memory.high threshold is exceeded, try_charge() schedules a
task_work to reclaim the excess.  The reclaim target is set to the
number of pages requested by try_charge().

This is wrong, because try_charge() usually charges more pages than
requested (batch > nr_pages) in order to refill per cpu stocks.  As a
result, a process in a cgroup can easily exceed memory.high
significantly when doing a lot of charges w/o returning to userspace
(e.g.  reading a file in big chunks).

Fix this issue by assuring that when exceeding memory.high a process
reclaims as many pages as were actually charged (i.e.  batch).

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-12 10:15:34 -08:00
Naoya Horiguchi a88c769548 mm: hugetlb: fix hugepage memory leak caused by wrong reserve count
When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back on
alloc_buddy_huge_page() to directly create a hugepage from the buddy
allocator.

In that case, however, if alloc_buddy_huge_page() succeeds we don't
decrement h->resv_huge_pages, which means that successful
hugetlb_fault() returns without releasing the reserve count.  As a
result, subsequent hugetlb_fault() might fail despite that there are
still free hugepages.

This patch simply adds decrementing code on that code path.

I reproduced this problem when testing v4.3 kernel in the following situation:
 - the test machine/VM is a NUMA system,
 - hugepage overcommiting is enabled,
 - most of hugepages are allocated and there's only one free hugepage
   which is on node 0 (for example),
 - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
   node 1, tries to allocate a hugepage,
 - the allocation should fail but the reserve count is still hold.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org> [3.16+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-12 10:15:34 -08:00
Linus Torvalds b9d85451dd Merge tag 'dm-4.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
 "Five stable fixes:

   - Two DM btree bufio buffer leak fixes that resolve reported BUG_ONs
     during DM thinp metadata close's dm_bufio_client_destroy().

   - A DM thinp range discard fix to handle discarding a partially
     mapped range.

   - A DM thinp metadata snapshot fix to make sure the btree roots saved
     in the metadata snapshot are the most current.

   - A DM space map metadata refcounting fix that improves both DM thinp
     and DM cache metadata"

* tag 'dm-4.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm btree: fix bufio buffer leaks in dm_btree_del() error path
  dm space map metadata: fix ref counting bug when bootstrapping a new space map
  dm thin metadata: fix bug when taking a metadata snapshot
  dm thin metadata: fix bug in dm_thin_remove_range()
  dm btree: fix leak of bufio-backed block in btree_split_sibling error path
2015-12-11 11:00:30 -08:00
Linus Torvalds 732c4a9e14 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
 "Two bugfixes, both bound for -stable"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: break infinite loop in fuse_fill_write_pages()
  cuse: fix memory leak
2015-12-11 10:56:41 -08:00
Linus Torvalds 4be460d96f Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Not too much this time.

   - One nouveau workaround extended to a few more GPUs
   - Some amdgpu big endian fixes, and a regression fixer
   - Some vmwgfx fixes
   - One ttm locking fix
   - One vgaarb fix"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  vgaarb: fix signal handling in vga_get()
  radeon: Fix VCE IB test on Big-Endian systems
  radeon: Fix VCE ring test for Big-Endian systems
  radeon/cik: Fix GFX IB test on Big-Endian
  drm/amdgpu: fix the lost duplicates checking
  drm/nouveau/pmu: remove whitelist for PGOB-exit WAR, enable by default
  drm/vmwgfx: Implement the cursor_set2 callback v2
  drm/vmwgfx: fix a warning message
  drm/ttm: Fixed a read/write lock imbalance
2015-12-11 10:51:02 -08:00
Kirill A. Shutemov 9f5bd30818 vgaarb: fix signal handling in vga_get()
There are few defects in vga_get() related to signal hadning:

  - we shouldn't check for pending signals for TASK_UNINTERRUPTIBLE
    case;

  - if we found pending signal we must remove ourself from wait queue
    and change task state back to running;

  - -ERESTARTSYS is more appropriate, I guess.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: stable@vger.kernel.org
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-12-11 14:04:44 +10:00
Dave Airlie 49307da31b Merge branch 'drm-fixes-4.4' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
some big endian fixes and one regression fix.

* 'drm-fixes-4.4' of git://people.freedesktop.org/~agd5f/linux:
  radeon: Fix VCE IB test on Big-Endian systems
  radeon: Fix VCE ring test for Big-Endian systems
  radeon/cik: Fix GFX IB test on Big-Endian
  drm/amdgpu: fix the lost duplicates checking
2015-12-11 14:04:09 +10:00
Linus Torvalds 0bd0f1e6d4 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
 "Most are minor to important fixes.

  There is one performance enhancement that I took on the grounds that
  failing to check if other processes can run before running what's
  intended to be a background, idle-time task is a bug, even though the
  primary effect of the fix is to improve performance (and it was a very
  simple patch)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/mlx5: Postpone remove_keys under knowledge of coming preemption
  IB/mlx4: Use vmalloc for WR buffers when needed
  IB/mlx4: Use correct order of variables in log message
  iser-target: Remove explicit mlx4 work-around
  mlx4: Expose correct max_sge_rd limit
  IB/mad: Require CM send method for everything except ClassPortInfo
  IB/cma: Add a missing rcu_read_unlock()
  IB core: Fix ib_sg_to_pages()
  IB/srp: Fix srp_map_sg_fr()
  IB/srp: Fix indirect data buffer rkey endianness
  IB/srp: Initialize dma_length in srp_map_idb
  IB/srp: Fix possible send queue overflow
  IB/srp: Fix a memory leak
  IB/sa: Put netlink request into the request list before sending
  IB/iser: use sector_div instead of do_div
  IB/core: use RCU for uverbs id lookup
  IB/qib: Minor fixes to qib per SFF 8636
  IB/core: Fix user mode post wr corruption
  IB/qib: Fix qib_mr structure
2015-12-10 14:42:22 -08:00
Linus Torvalds a80c47daa8 Merge tag 'sound-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
 "Again less intensive changes in this rc: you can find only a few
  HD-audio fixes (noise fixes for Intel Broxton chip and a few Thinkpad
  models, quirks for Alienware 17 and Packard Bell DOTS) in addition to
  a long-standing rme96 bug fix"

* tag 'sound-4.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/ca0132 - quirk for Alienware 17 2015
  ALSA: hda - Fix noise problems on Thinkpad T440s
  ALSA: hda - Fixing speaker noise on the two latest thinkpad models
  ALSA: hda - Add inverted dmic for Packard Bell DOTS
  ALSA: hda - Fix playback noise with 24/32 bit sample size on BXT
  ALSA: rme96: Fix unexpected volume reset after rate changes
2015-12-10 10:44:32 -08:00
Joe Thornber ed8b45a367 dm btree: fix bufio buffer leaks in dm_btree_del() error path
If dm_btree_del()'s call to push_frame() fails, e.g. due to
btree_node_validator finding invalid metadata, the dm_btree_del() error
path must unlock all frames (which have active dm-bufio buffers) that
were pushed onto the del_stack.

Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio
buffers have leaked, e.g.:
  device-mapper: bufio: leaked buffer 3, hold count 1, list 0

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-12-10 10:30:18 -05:00
Linus Torvalds 6764e5ebd5 Merge tag 'vfio-v4.4-rc5' of git://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:

 - Various fixes for removing redundancy, const'ifying structs, avoiding
   stack usage, fixing WARN usage (Krzysztof Kozlowski, Julia Lawall,
   Kees Cook, Dan Carpenter)

 - Revert No-IOMMU mode as the intended user has not emerged (Alex
   Williamson)

* tag 'vfio-v4.4-rc5' of git://github.com/awilliam/linux-vfio:
  Revert: "vfio: Include No-IOMMU mode"
  vfio: fix a warning message
  vfio: platform: remove needless stack usage
  vfio-pci: constify pci_error_handlers structures
  vfio: Drop owner assignment from platform_driver
2015-12-09 16:52:12 -08:00
Linus Torvalds eef121f407 Merge tag 'devicetree-fixes-for-4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DT fixes from Rob Herring:
 "I think this should be all for 4.4:

   - Fix incorrect warning about overlapping memory regions

   - Export of_irq_find_parent again which was made static in 4.4, but
     has users pending for 4.5.

   - Fix of_msi_map_rid declaration location

   - Fix re-entrancy for of_fdt_unflatten_tree

   - Clean-up of phys_addr_t printks"

* tag 'devicetree-fixes-for-4.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of/irq: move of_msi_map_rid declaration to the correct ifdef section
  of/irq: Export of_irq_find_parent again
  of/fdt: Add mutex protection for calls to __unflatten_device_tree()
  of/address: fix typo in comment block of of_translate_one()
  of: do not use 0x in front of %pa
  of: Fix comparison of reserved memory regions
2015-12-09 16:44:07 -08:00
Linus Torvalds abb7e2b3f0 Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
 "One small build fix, a couple do_div() fixes, and a fix for the gpio
  basic clock type are the major changes here.  There's also a couple
  fixes for the TI, sunxi, and scpi clock drivers"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: sunxi: pll2: Fix clock running too fast
  clk: scpi: add missing of_node_put
  clk: qoriq: fix memory leak
  imx/clk-pllv2: fix wrong do_div() usage
  imx/clk-pllv1: fix wrong do_div() usage
  clk: mmp: add linux/clk.h includes
  clk: ti: drop locking code from mux/divider drivers
  clk: ti816x: Add missing dmtimer clkdev entries
  clk: ti: fapll: fix wrong do_div() usage
  clk: ti: clkt_dpll: fix wrong do_div() usage
  clk: gpio: Get parent clk names in of_gpio_clk_setup()
2015-12-09 16:36:29 -08:00
Linus Torvalds 9a0f76fde9 Merge tag 'for-linus-4.4-1' of git://git.code.sf.net/p/openipmi/linux-ipmi
Pull IPMI fix from Corey Minyard:
 "Fix an Oops if an interrupt occurs at startup.  This can happen on
  some hardware"

* tag 'for-linus-4.4-1' of git://git.code.sf.net/p/openipmi/linux-ipmi:
  ipmi: move timer init to before irq is setup
2015-12-09 11:57:10 -08:00
Jan Stancek 27f972d3e0 ipmi: move timer init to before irq is setup
We encountered a panic on boot in ipmi_si on a dell per320 due to an
uninitialized timer as follows.

static int smi_start_processing(void       *send_info,
                                ipmi_smi_t intf)
{
        /* Try to claim any interrupts. */
        if (new_smi->irq_setup)
                new_smi->irq_setup(new_smi);

 --> IRQ arrives here and irq handler tries to modify uninitialized timer

    which triggers BUG_ON(!timer->function) in __mod_timer().

 Call Trace:
   <IRQ>
   [<ffffffffa0532617>] start_new_msg+0x47/0x80 [ipmi_si]
   [<ffffffffa053269e>] start_check_enables+0x4e/0x60 [ipmi_si]
   [<ffffffffa0532bd8>] smi_event_handler+0x1e8/0x640 [ipmi_si]
   [<ffffffff810f5584>] ? __rcu_process_callbacks+0x54/0x350
   [<ffffffffa053327c>] si_irq_handler+0x3c/0x60 [ipmi_si]
   [<ffffffff810efaf0>] handle_IRQ_event+0x60/0x170
   [<ffffffff810f245e>] handle_edge_irq+0xde/0x180
   [<ffffffff8100fc59>] handle_irq+0x49/0xa0
   [<ffffffff8154643c>] do_IRQ+0x6c/0xf0
   [<ffffffff8100ba53>] ret_from_intr+0x0/0x11

        /* Set up the timer that drives the interface. */
        setup_timer(&new_smi->si_timer, smi_timeout, (long)new_smi);

The following patch fixes the problem.

To: Openipmi-developer@lists.sourceforge.net
To: Corey Minyard <minyard@acm.org>
CC: linux-kernel@vger.kernel.org

Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Cc: stable@vger.kernel.org # Applies cleanly to 3.10-, needs small rework before
2015-12-09 13:13:06 -06:00
Sasha Levin d7e35dfa25 bitops.h: correctly handle rol32 with 0 byte shift
ROL on a 32 bit integer with a shift of 32 or more is undefined and the
result is arch-dependent. Avoid this by handling the trivial case of
roling by 0 correctly.

The trivial solution of checking if shift is 0 breaks gcc's detection
of this code as a ROL instruction, which is unacceptable.

This bug was reported and fixed in GCC
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157):

	The standard rotate idiom,

	  (x << n) | (x >> (32 - n))

	is recognized by gcc (for concreteness, I discuss only the case that x
	is an uint32_t here).

	However, this is portable C only for n in the range 0 < n < 32. For n
	== 0, we get x >> 32 which gives undefined behaviour according to the
	C standard (6.5.7, Bitwise shift operators). To portably support n ==
	0, one has to write the rotate as something like

	  (x << n) | (x >> ((-n) & 31))

	And this is apparently not recognized by gcc.

Note that this is broken on older GCCs and will result in slower ROL.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-09 10:35:16 -08:00
Joe Thornber 50dd842ad8 dm space map metadata: fix ref counting bug when bootstrapping a new space map
When applying block operations (BOPs) do not remove them from the
uncommitted BOP ring-buffer until after they've been applied -- in case
we recurse.

Also, perform BOP_INC operation, in dm_sm_metadata_create() and
sm_metadata_extend(), in terms of the uncommitted BOP ring-buffer rather
than using direct calls to sm_ll_inc().

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-12-09 13:27:25 -05:00
Joe Thornber 49e99fc717 dm thin metadata: fix bug when taking a metadata snapshot
When you take a metadata snapshot the btree roots for the mapping and
details tree need to have their reference counts incremented so they
persist for the lifetime of the metadata snap.

The roots being incremented were those currently written in the
superblock, which could possibly be out of date if concurrent IO is
triggering new mappings, breaking of sharing, etc.

Fix this by performing a commit with the metadata lock held while taking
a metadata snapshot.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-12-09 13:18:12 -05:00