Commit Graph

4183 Commits

Author SHA1 Message Date
Linus Torvalds 564884fbde Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A set of fixes that wasn't included in the first merge window pull
  request.  This pull request contains:

   - A set of NVMe fixes from Keith, and one from Nic for the integrity
     side of it.

   - Fix from Ming, clearing ->mq_ops if we don't successfully setup a
     queue for multiqueue.

   - A set of stability fixes for bcache from Jiri, and also marking
     bcache as orphaned as it's no longer actively maintained (in
     mainline, at least)"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: clear q->mq_ops if init fail
  MAINTAINERS: mark bcache as orphan
  bcache: bch_gc_thread() is not freezable
  bcache: bch_allocator_thread() is not freezable
  bcache: bch_writeback_thread() is not freezable
  nvme/host: Add missing blk_integrity tag_size + flags assignments
  NVMe: Add device ID's with stripe quirk
  NVMe: Short-cut removal on surprise hot-unplug
  NVMe: Allow user initiated rescan
  NVMe: Reduce driver log spamming
  NVMe: Unbind driver on failure
  NVMe: Delete only created queues
  NVMe: Allocate queues only for online cpus
2016-05-27 14:28:09 -07:00
Jiri Kosina 29e6c57cc7 bcache: bch_gc_thread() is not freezable
bch_gc_thread() doesn't mark itself freezable, so calling try_to_freeze()
in its context is just an expensive no-op.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-24 09:00:45 -06:00
Jiri Kosina 770b8ce400 bcache: bch_allocator_thread() is not freezable
bch_allocator_thread() is calling try_to_freeze(), but that's just an
expensive no-op given the fact that the thread is not marked freezable.

The bucket allocator has to be up and running until the very last stages
of suspend, because bcache I/O may still be in flight (think of writing
a hibernation image to a swap device served by bcache).

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-24 09:00:43 -06:00
Jiri Kosina 7c87df9c15 bcache: bch_writeback_thread() is not freezable
bch_writeback_thread() is calling try_to_freeze(), but that's just an
expensive no-op given the fact that the thread is not marked freezable.

I/O helper kthreads, such as the bcache writeback thread, actually
shouldn't be freezable, because they are potentially necessary for
finalizing the image write-out.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-24 09:00:40 -06:00
Linus Torvalds feaa7cb5c5 Merge tag 'md/4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Pull MD updates from Shaohua Li:
 "Several patches from Guoqing fixing md-cluster bugs and several
  patches from Heinz fixing dm-raid bugs"

* tag 'md/4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  md-cluster: check the return value of process_recvd_msg
  md-cluster: gather resync infos and enable recv_thread after bitmap is ready
  md: set MD_CHANGE_PENDING in a atomic region
  md: raid5: add prerequisite to run underneath dm-raid
  md: raid10: add prerequisite to run underneath dm-raid
  md: md.c: fix oops in mddev_suspend for raid0
  md-cluster: fix ifnullfree.cocci warnings
  md-cluster/bitmap: unplug bitmap to sync dirty pages to disk
  md-cluster/bitmap: fix wrong page num in bitmap_file_clear_bit and bitmap_file_set_bit
  md-cluster/bitmap: fix wrong calcuation of offset
  md-cluster: sync bitmap when node received RESYNCING msg
  md-cluster: always setup in-memory bitmap
  md-cluster: wakeup thread if activated a spare disk
  md-cluster: change array_sectors and update size are not supported
  md-cluster: fix locking when node joins cluster during message broadcast
  md-cluster: unregister thread if err happened
  md-cluster: wake up thread to continue recovery
  md-cluser: make resync_finish only called after pers->sync_request
  md-cluster: change resync lock from asynchronous to synchronous
2016-05-19 17:25:13 -07:00
Linus Torvalds b80fed9595 Merge tag 'dm-4.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:

 - based on Jens' 'for-4.7/core' to have DM thinp's discard support use
   bio_inc_remaining() and the block core's new async __blkdev_issue_discard()
   interface

 - make DM multipath's fast code-paths lockless, using lockless_dereference,
   to significantly improve large NUMA performance when using blk-mq.
   The m->lock spinlock contention was a serious bottleneck.

 - a few other small code cleanups and Documentation fixes

* tag 'dm-4.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm thin: unroll issue_discard() to create longer discard bio chains
  dm thin: use __blkdev_issue_discard for async discard support
  dm thin: remove __bio_inc_remaining() and switch to using bio_inc_remaining()
  dm raid: make sure no feature flags are set in metadata
  dm ioctl: drop use of __GFP_REPEAT in copy_params()'s __vmalloc() call
  dm stats: fix spelling mistake in Documentation
  dm cache: update cache-policies.txt now that mq is an alias for smq
  dm mpath: eliminate use of spinlock in IO fast-paths
  dm mpath: move trigger_event member to the end of 'struct multipath'
  dm mpath: use atomic_t for counting members of 'struct multipath'
  dm mpath: switch to using bitops for state flags
  dm thin: Remove return statement from void function
  dm: remove unused mapped_device argument from free_tio()
2016-05-17 16:13:00 -07:00
Linus Torvalds 24b9f0cf00 Merge branch 'for-4.7/drivers' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
 "On top of the core pull request, this is the drivers pull request for
  this merge window.  This contains:

   - Switch drivers to the new write back cache API, and kill off the
     flush flags.  From me.

   - Kill the discard support for the STEC pci-e flash driver.  It's
     trivially broken, and apparently unmaintained, so it's safer to
     just remove it.  From Jeff Moyer.

   - A set of lightnvm updates from the usual suspects (Matias/Javier,
     and Simon), and fixes from Arnd, Jeff Mahoney, Sagi, and Wenwei
     Tao.

   - A set of updates for NVMe:

        - Turn the controller state management into a proper state
          machine.  From Christoph.

        - Shuffling of code in preparation for NVMe-over-fabrics, also
          from Christoph.

        - Cleanup of the command prep part from Ming Lin.

        - Rewrite of the discard support from Ming Lin.

        - Deadlock fix for namespace removal from Ming Lin.

        - Use the now exported blk-mq tag helper for IO termination.
          From Sagi.

        - Various little fixes from Christoph, Guilherme, Keith, Ming
          Lin, Wang Sheng-Hui.

   - Convert mtip32xx to use the now exported blk-mq tag iter function,
     from Keith"

* 'for-4.7/drivers' of git://git.kernel.dk/linux-block: (74 commits)
  lightnvm: reserved space calculation incorrect
  lightnvm: rename nr_pages to nr_ppas on nvm_rq
  lightnvm: add is_cached entry to struct ppa_addr
  lightnvm: expose gennvm_mark_blk to targets
  lightnvm: remove mgt targets on mgt removal
  lightnvm: pass dma address to hardware rather than pointer
  lightnvm: do not assume sequential lun alloc.
  nvme/lightnvm: Log using the ctrl named device
  lightnvm: rename dma helper functions
  lightnvm: enable metadata to be sent to device
  lightnvm: do not free unused metadata on rrpc
  lightnvm: fix out of bound ppa lun id on bb tbl
  lightnvm: refactor set_bb_tbl for accepting ppa list
  lightnvm: move responsibility for bad blk mgmt to target
  lightnvm: make nvm_set_rqd_ppalist() aware of vblks
  lightnvm: remove struct factory_blks
  lightnvm: refactor device ops->get_bb_tbl()
  lightnvm: introduce nvm_for_each_lun_ppa() macro
  lightnvm: refactor dev->online_target to global nvm_targets
  lightnvm: rename nvm_targets to nvm_tgt_type
  ...
2016-05-17 16:03:32 -07:00
Joe Thornber 202bae5293 dm thin: unroll issue_discard() to create longer discard bio chains
There is little benefit to doing this but it does structure DM thinp's
code to more cleanly use the __blkdev_issue_discard() interface --
particularly in passdown_double_checking_shared_status().

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-13 09:04:20 -04:00
Mike Snitzer 3dba53a958 dm thin: use __blkdev_issue_discard for async discard support
With commit 38f2525533 ("block: add __blkdev_issue_discard") DM thinp
no longer needs to carry its own async discard method.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-05-13 09:03:52 -04:00
Mike Snitzer 13e4f8a695 dm thin: remove __bio_inc_remaining() and switch to using bio_inc_remaining()
DM thinp's use of bio_inc_remaining() is critical to ensure the original
parent discard bio isn't completed before sub-discards have.  DM thinp
needs this due to the extra quiescing that occurs, via multiple DM thinp
mappings, while processing large discards.  As such DM thinp must build
the async discard bio chain after some delay -- so bio_inc_remaining()
is used to enable DM thinp to take a reference on the original parent
discard bio for each mapping.  This allows the immediate use of
bio_endio() on that discard bio; but with the understanding that the
actual completion won't occur until each of the sub-discards'
per-mapping references are dropped.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
2016-05-13 09:03:52 -04:00
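The reference counting described in the bio_inc_remaining() commit above can be pictured with a small userspace sketch. This is not the block layer's code: struct parent_io and the parent_*() helpers are hypothetical names, and C11 atomics stand in for the kernel's own primitives. It only shows why calling the end-io function early is safe once every sub-operation holds its own reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a parent discard bio. */
struct parent_io {
    atomic_int remaining;   /* starts at 1: the submitter's own reference */
    bool completed;         /* set once the last reference is dropped */
};

static void parent_init(struct parent_io *p)
{
    assert(p != NULL);
    atomic_init(&p->remaining, 1);
    p->completed = false;
}

/* Take one extra reference on behalf of a sub-operation (one per mapping). */
static void parent_inc_remaining(struct parent_io *p)
{
    atomic_fetch_add(&p->remaining, 1);
}

/* Drop one reference; the parent completes only when the count reaches zero. */
static void parent_endio(struct parent_io *p)
{
    if (atomic_fetch_sub(&p->remaining, 1) == 1)
        p->completed = true;
}
```

With two mappings, the submitter can call parent_endio() immediately; the parent is only marked completed after both sub-operations have dropped their references as well.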
Heinz Mauelshagen 4c9971ca6a dm raid: make sure no feature flags are set in metadata
Given we don't yet support any feature flags in the dm-raid ondisk
metadata (see: 'features' member of 'struct dm_raid_superblock'),
add a check to ensure no flags are actually set.  If any features are
set, reject the activation of the RAID mapping.

This is to prevent possible data corruption in case of a kernel
downgrade when there'll potentially be feature flags set by a future
dm-raid target.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-13 09:03:51 -04:00
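The check described in the dm raid commit above amounts to rejecting any feature bit the running code does not know about. A minimal sketch, assuming a hypothetical struct raid_superblock_sketch with a host-endian 'features' field (the real on-disk structure and its byte order are not reproduced here) and a hypothetical SUPPORTED_FEATURES mask of zero:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: no feature flags are supported yet, so the mask is empty. */
#define SUPPORTED_FEATURES UINT32_C(0)

/* Hypothetical, simplified stand-in for the on-disk superblock. */
struct raid_superblock_sketch {
    uint32_t features;
};

/* Return 0 if the metadata can be activated, -EINVAL if unknown flags are set. */
static int validate_features(const struct raid_superblock_sketch *sb)
{
    assert(sb != NULL);
    if (sb->features & ~SUPPORTED_FEATURES)
        return -EINVAL;
    return 0;
}
```

Refusing activation on unknown bits is what protects against the downgrade case: metadata written by a newer target with feature flags set is rejected instead of being misinterpreted.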
Guoqing Jiang 1fa9a1ad0a md-cluster: check the return value of process_recvd_msg
We don't need to run the full path of recv_daemon
if process_recvd_msg doesn't return 0.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-05-09 09:24:04 -07:00
Guoqing Jiang 51e453aecb md-cluster: gather resync infos and enable recv_thread after bitmap is ready
The in-memory bitmap is not ready when a node joins the cluster,
so it doesn't make sense to call gather_all_resync_info()
that early; it needs to be called after the node's
bitmap is set up. Also, recv_thread could be woken up after
the node joins the cluster, but that could cause problems if the node
receives a RESYNCING message without a personality, since
mddev->pers->quiesce is called in process_suspend_info.

This commit introduces a new cluster interface, load_bitmaps,
to fix the above problems. load_bitmaps is called in bitmap_load,
where the bitmap and personality are ready, and it
does the following tasks:

1. call gather_all_resync_info to load all nodes'
   bitmap info.
2. set the MD_CLUSTER_ALREADY_IN_CLUSTER bit so that recv_thread
   can be woken up, and wake up recv_thread if there is a
   pending recv event.

Then ack_bast only wakes up recv_thread after the IN_CLUSTER
bit is set; otherwise MD_CLUSTER_PENDING_RESYNC_EVENT is
set.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-05-09 09:24:03 -07:00
Guoqing Jiang 85ad1d13ee md: set MD_CHANGE_PENDING in a atomic region
Some code waits for a metadata update by:

1. flagging that it is needed (MD_CHANGE_DEVS or MD_CHANGE_CLEAN)
2. setting MD_CHANGE_PENDING and waking the management thread
3. waiting for MD_CHANGE_PENDING to be cleared

If the first two are done without locking, the code in md_update_sb()
which checks if it needs to repeat might test if an update is needed
before step 1, then clear MD_CHANGE_PENDING after step 2, resulting
in the wait returning early.

So make sure all places that set MD_CHANGE_PENDING do so atomically;
bit_clear_unless (suggested by Neil) is introduced for this purpose.

Cc: Martin Kepplinger <martink@posteo.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: <linux-kernel@vger.kernel.org>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-05-09 09:24:02 -07:00
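The race described in the MD_CHANGE_PENDING commit above goes away when "flag the reason and set PENDING" is one atomic step, and "clear PENDING" only succeeds if no reason bit is set. The sketch below illustrates that idea in userspace C11. It is not the kernel's bit_clear_unless macro: clear_bits_unless(), request_update() and the CHANGE_* bit values are hypothetical names chosen to mirror the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits, named after the flags in the commit message. */
#define CHANGE_DEVS    (1u << 0)
#define CHANGE_CLEAN   (1u << 1)
#define CHANGE_PENDING (1u << 2)

/*
 * Atomically clear the bits in 'clear' unless any bit in 'test' is set.
 * Returns true if the bits were cleared, false if a 'test' bit blocked it.
 */
static bool clear_bits_unless(atomic_uint *word, unsigned clear, unsigned test)
{
    assert(word != NULL);
    unsigned old = atomic_load(word);
    do {
        if (old & test)
            return false;   /* an update is still needed; keep PENDING */
        /* on failure, 'old' is reloaded and the test is repeated */
    } while (!atomic_compare_exchange_weak(word, &old, old & ~clear));
    return true;
}

/* Steps 1 and 2 from the message, done as a single atomic read-modify-write. */
static void request_update(atomic_uint *word, unsigned reason)
{
    atomic_fetch_or(word, reason | CHANGE_PENDING);
}
```

A waiter can then rely on CHANGE_PENDING staying set until the updater has actually seen and handled the reason bits.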
Heinz Mauelshagen fe67d19a2d md: raid5: add prerequisite to run underneath dm-raid
In case md runs underneath the dm-raid target, the mddev does not have
a request queue or gendisk, thus avoid accesses to them.

This patch adds a missing conditional to the raid5 personality.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-05-09 09:24:02 -07:00
Heinz Mauelshagen 859644f0fa md: raid10: add prerequisite to run underneath dm-raid
In case md runs underneath the dm-raid target, the mddev does not have
a request queue or gendisk, thus avoid accesses to it.

This patch adds two missing conditionals to the raid10 personality.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-05-09 09:24:01 -07:00
Heinz Mauelshagen 092398dce8 md: md.c: fix oops in mddev_suspend for raid0
Introduced by upstream commit 70d9798b95

Unlike other personalities, the raid0 personality does not create
mddev->thread, so the unconditional access to it in
mddev_suspend() causes an oops.

This patch checks for mddev->thread in order to preserve the
intent of the aforementioned commit.

Fixes: 70d9798b95 ("MD: warn for potential deadlock")
Cc: stable@vger.kernel.org (4.5+)
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-05-09 09:23:23 -07:00
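The shape of the raid0 fix above is a guard on an optional pointer before it is dereferenced. A minimal sketch under stated assumptions: struct worker, struct array_dev and called_from_own_thread() are hypothetical stand-ins, not the md structures, and an integer id stands in for the identity of the calling task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-device management thread. */
struct worker {
    int owner_id;
};

/* Hypothetical device; 'thread' is NULL for personalities that create none. */
struct array_dev {
    struct worker *thread;
};

/*
 * Report whether the caller is the device's own worker. The NULL check comes
 * first, so a device without a worker is handled instead of dereferenced.
 */
static bool called_from_own_thread(const struct array_dev *dev, int caller_id)
{
    assert(dev != NULL);
    return dev->thread != NULL && dev->thread->owner_id == caller_id;
}
```

The deadlock warning the earlier commit wanted is kept for devices that have a thread, and devices without one simply never match.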
Michal Hocko 72f6d8d8c9 dm ioctl: drop use of __GFP_REPEAT in copy_params()'s __vmalloc() call
copy_params()'s use of __GFP_REPEAT for the __vmalloc() call doesn't make much
sense because vmalloc doesn't rely on costly high order allocations.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05 15:25:55 -04:00
Mike Snitzer 2da1610ae2 dm mpath: eliminate use of spinlock in IO fast-paths
The primary motivation of this commit is to improve the scalability of
DM multipath on large NUMA systems where m->lock spinlock contention has
been proven to be a serious bottleneck on really fast storage.

The ability to atomically read a pointer, using lockless_dereference(),
is leveraged in this commit.  But all pointer writes are still protected
by the m->lock spinlock (which is fine since these all now occur in the
slow-path).

The following functions no longer require the m->lock spinlock in their
fast-path: multipath_busy(), __multipath_map(), and do_end_io()

And choose_pgpath() is modified to _not_ update m->current_pgpath unless
it also switches the path-group.  This is done to avoid needing to take
the m->lock every time __multipath_map() calls choose_pgpath().
But m->current_pgpath will be reset if it is failed via fail_path().

Suggested-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05 15:25:52 -04:00
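The split described in the dm mpath commit above (lock-free pointer reads in the fast path, pointer writes still serialized by a lock in the slow path) can be sketched in userspace C11. This is an illustration only: struct multipath_sketch and its helpers are hypothetical, an atomic_flag spin loop stands in for the m->lock spinlock, and an acquire load is used in place of lockless_dereference(), which is a stronger ordering than the kernel helper requires.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct path {
    int id;
};

/* Hypothetical, reduced stand-in for 'struct multipath'. */
struct multipath_sketch {
    atomic_flag lock;                /* stands in for the m->lock spinlock */
    _Atomic(struct path *) current;  /* stands in for m->current_pgpath */
};

static void mp_init(struct multipath_sketch *m)
{
    assert(m != NULL);
    atomic_flag_clear(&m->lock);
    atomic_init(&m->current, NULL);
}

/* Fast path: read the pointer without taking the lock. */
static struct path *mp_read_current(struct multipath_sketch *m)
{
    return atomic_load_explicit(&m->current, memory_order_acquire);
}

/* Slow path: writers still serialize on the lock before publishing. */
static void mp_set_current(struct multipath_sketch *m, struct path *p)
{
    while (atomic_flag_test_and_set_explicit(&m->lock, memory_order_acquire))
        ;  /* spin until the lock is free */
    atomic_store_explicit(&m->current, p, memory_order_release);
    atomic_flag_clear_explicit(&m->lock, memory_order_release);
}
```

Readers may observe either the old or the new pointer, but never a torn value, which is why the fast path can skip the lock as long as updates are rare.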
Mike Snitzer 20800cb345 dm mpath: move trigger_event member to the end of 'struct multipath'
Allows the 'work_mutex' member to no longer cross a cacheline.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05 15:25:52 -04:00
Mike Snitzer 91e968aa60 dm mpath: use atomic_t for counting members of 'struct multipath'
The use of atomic_t for nr_valid_paths, pg_init_in_progress and
pg_init_count will allow relaxing the use of the m->lock spinlock.

Suggested-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05 15:25:51 -04:00
Mike Snitzer 518257b132 dm mpath: switch to using bitops for state flags
Mechanical change that doesn't make any real effort to reduce the use of
m->lock; that will come later (once atomics are used for counters, etc).

Suggested-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05 15:25:50 -04:00
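The "bitops for state flags" change above replaces separate boolean fields with numbered bits in one word that is updated atomically. A rough userspace analogue, assuming hypothetical flag numbers (FLAG_QUEUE_IO, FLAG_PG_INIT_REQUIRED) and flag_*() helpers built on C11 atomics rather than the kernel's set_bit()/clear_bit()/test_bit():

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the real flag names are not reproduced here. */
#define FLAG_QUEUE_IO         0u
#define FLAG_PG_INIT_REQUIRED 1u

static void flag_set(atomic_ulong *flags, unsigned nr)
{
    assert(nr < sizeof(unsigned long) * CHAR_BIT);
    atomic_fetch_or(flags, 1UL << nr);
}

static void flag_clear(atomic_ulong *flags, unsigned nr)
{
    assert(nr < sizeof(unsigned long) * CHAR_BIT);
    atomic_fetch_and(flags, ~(1UL << nr));
}

static bool flag_test(atomic_ulong *flags, unsigned nr)
{
    assert(nr < sizeof(unsigned long) * CHAR_BIT);
    return (atomic_load(flags) & (1UL << nr)) != 0;
}
```

Because each helper is a single atomic operation, individual flags can be read or flipped without holding a lock, which is the groundwork the later lock-reduction commits build on.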
Amitoj Kaur Chawla 813923b1a2 dm thin: Remove return statement from void function
Return statement at the end of a void function is useless.

The Coccinelle semantic patch used to make this change is as follows:
//<smpl>
@@
identifier f;
expression e;
@@
void f(...) {
<...
- return
  e;
...>
}
//</smpl>

Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05 15:25:50 -04:00
Mike Snitzer cfae7529b5 dm: remove unused mapped_device argument from free_tio()
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-05-05 15:25:49 -04:00
kbuild test robot bc47e84258 md-cluster: fix ifnullfree.cocci warnings
drivers/md/bitmap.c:2049:6-11: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values.

 NULL check before some freeing functions is not needed.

 Based on checkpatch warning
 "kfree(NULL) is safe this check is probably not required"
 and kfreeaddr.cocci by Julia Lawall.

Generated by: scripts/coccinelle/free/ifnullfree.cocci

Acked-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2016-05-04 12:39:35 -07:00
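The ifnullfree warning above rests on the fact that freeing a NULL pointer is a defined no-op. The same holds for free() in standard C, so a userspace sketch can show the simplified form; release_buffer() is a hypothetical helper, not kernel code.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Release a buffer and reset the caller's pointer. No "if (*buf)" guard is
 * needed: free(NULL) is defined by the C standard to do nothing.
 */
static void release_buffer(char **buf)
{
    assert(buf != NULL);
    free(*buf);
    *buf = NULL;
}
```

Calling release_buffer() on an already-released pointer is therefore harmless, which is the property the Coccinelle script relies on when it drops the redundant check.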