Commit Graph

165 Commits

Author SHA1 Message Date
Keith Busch 61e4ce086d NVMe: Make iod bio timeout a parameter
This was originally set to 4 times the IO timeout, but that was when
the IO timeout was 5 seconds instead of 30. 20 seconds for total time
to failure seemed more reasonable than 2 minutes for most, but other
users have requested to make this a module parameter instead.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[renamed the module parameter to retry_time]
[made retry_time static]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-06-03 22:56:43 -04:00
Santosh Y 6808c5fb7f NVMe: Prevent possible NULL pointer dereference
kmalloc() used by the nvme_alloc_iod() to allocate memory for 'iod'
can fail. So check the return value.

Signed-off-by: Santosh Y <santosh.sy@samsung.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-06-03 16:43:30 -04:00
Keith Busch f0d54a5417 NVMe: Implement PCIe reset notification callback
Quiesce and shutdown the device prior to reset, then restart the device and
resume IO after.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-05-27 11:12:29 -06:00
Matthew Wilcox 21bd78bcf4 NVMe: Enable BUILD_BUG_ON checks
Since _nvme_check_size() wasn't being called from anywhere, the compiler
was optimising it away ... along with all the link-time build failures
that would result if any of the structures were the wrong size.  Call it
from nvme_exit() for no particular reason.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-05-09 22:42:39 -04:00
Keith Busch 53562be74b NVMe: Flush with data support
It is possible a filesystem may send a flush flagged bio with write
data. There is no such composite NVMe command, so the driver sends flush
and write separately.

The device is allowed to execute these commands in any order, so it was
possible the driver ends the bio after the write completes, but while the
flush is still active. We don't want to let a filesystem believe flush
succeeded before it really has; this could cause data corruption on a
power loss between these events. To fix, this patch splits the flush
and write into chained bios.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-05-05 10:54:02 -04:00
Keith Busch a7d2ce2832 NVMe: Configure support for block flush
This configures an nvme request_queue as flush capable if the device
has a volatile write cache present.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-05-05 10:53:53 -04:00
Keith Busch 3291fa57cb NVMe: Add tracepoints
Adding tracepoints for bio_complete and block_split into nvme to help
with gathering IO info using blktrace and blkparse.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-05-05 10:41:26 -04:00
Keith Busch 94bbac4052 NVMe: Protect against badly formatted CQEs
If a misbehaving device posts a CQE with a command id < depth but for
one that was never allocated, the command info will have a callback
function set to NULL and we don't want to try invoking that.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-05-05 10:41:25 -04:00
Matthew Wilcox 27e8166c31 NVMe: Improve error messages
Help people diagnose what is going wrong at initialisation time by
printing out which command has gone wrong and what the device returned.
Also fix the error message printed while waiting for reset.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
2014-05-05 10:41:25 -04:00
Matthew Wilcox 8757ad65d3 NVMe: Update copyright headers
Make the copyright dates accurate and remove the final paragraph that
includes the address of the FSF.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-05-05 10:41:25 -04:00
Linus Torvalds 3e8072d48b Merge git://git.infradead.org/users/willy/linux-nvme
Pull NVMe driver updates from Matthew Wilcox:
 "Various updates to the NVMe driver.  The most user-visible change is
  that drive hotplugging now works and CPU hotplug while an NVMe drive
  is installed should also work better"

* git://git.infradead.org/users/willy/linux-nvme:
  NVMe: Retry failed commands with non-fatal errors
  NVMe: Add getgeo to block ops
  NVMe: Start-stop nvme_thread during device add-remove.
  NVMe: Make I/O timeout a module parameter
  NVMe: CPU hot plug notification
  NVMe: per-cpu io queues
  NVMe: Replace DEFINE_PCI_DEVICE_TABLE
  NVMe: Fix divide-by-zero in nvme_trans_io_get_num_cmds
  NVMe: IOCTL path RCU protect queue access
  NVMe: RCU protected access to io queues
  NVMe: Initialize device reference count earlier
  NVMe: Add CONFIG_PM_SLEEP to suspend/resume functions
2014-04-11 16:45:59 -07:00
Keith Busch edd10d3328 NVMe: Retry failed commands with non-fatal errors
For commands returned with failed status, queue these for resubmission
and continue retrying them until success or for a limited amount of
time. The final timeout was arbitrarily chosen so requests can't be
retried indefinitely.

Since these are requeued on the nvmeq that submitted the command, the
callbacks have to take an nvmeq instead of an nvme_dev as a parameter
so that we can use the locked queue to append the iod to retry later.

The nvme_iod conviently can be used to track how long we've been trying
to successfully complete an iod request. The nvme_iod also provides the
nvme prp dma mappings, so I had to move a few things around so we can
keep those mappings.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[fixed checkpatch issue with long line]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-04-10 17:11:59 -04:00
Keith Busch 4cc09e2dc4 NVMe: Add getgeo to block ops
Some programs require HDIO_GETGEO work, which requires we implement
getgeo.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-04-10 17:06:11 -04:00
Dan McLeran b9afca3efb NVMe: Start-stop nvme_thread during device add-remove.
Done to ensure nvme_thread is not running when there
are no devices to poll.

Signed-off-by: Dan McLeran <daniel.mcleran@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-04-10 17:04:46 -04:00
Keith Busch b355084a89 NVMe: Make I/O timeout a module parameter
Increase the default timeout to 30 seconds to match SCSI.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[use byte instead of ushort]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-04-10 17:04:38 -04:00
Keith Busch 33b1e95c90 NVMe: CPU hot plug notification
Registers with hot cpu notification to rebalance, and potentially allocate
additional, io queues.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-04-10 17:03:42 -04:00
Keith Busch 42f614201e NVMe: per-cpu io queues
The device's IO queues are associated with CPUs, so we can use a per-cpu
variable to map the a qid to a cpu. This provides a convienient way
to optimally assign queues to multiple cpus when the device supports
fewer queues than the host has cpus. The previous implementation may
have assigned these poorly in these situations. This patch addresses
this by sharing queues among cpus that are "close" together and should
have a lower lock contention penalty.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-04-10 17:03:15 -04:00
Linus Torvalds b33ce44299 Merge branch 'for-3.15/drivers' of git://git.kernel.dk/linux-block
Pull block driver update from Jens Axboe:
 "On top of the core pull request, here's the pull request for the
  driver related changes for 3.15.  It contains:

   - Improvements for msi-x registration for block drivers (mtip32xx,
     skd, cciss, nvme) from Alexander Gordeev.

   - A round of cleanups and improvements for drbd from Andreas
     Gruenbacher and Rashika Kheria.

   - A round of clanups and improvements for bcache from Kent.

   - Removal of sleep_on() and friends in DAC960, ataflop, swim3 from
     Arnd Bergmann.

   - Bug fix for a bug in the mtip32xx async completion code from Sam
     Bradshaw.

   - Bug fix for accidentally bouncing IO on 32-bit platforms with
     mtip32xx from Felipe Franciosi"

* 'for-3.15/drivers' of git://git.kernel.dk/linux-block: (103 commits)
  bcache: remove nested function usage
  bcache: Kill bucket->gc_gen
  bcache: Kill unused freelist
  bcache: Rework btree cache reserve handling
  bcache: Kill btree_io_wq
  bcache: btree locking rework
  bcache: Fix a race when freeing btree nodes
  bcache: Add a real GC_MARK_RECLAIMABLE
  bcache: Add bch_keylist_init_single()
  bcache: Improve priority_stats
  bcache: Better alloc tracepoints
  bcache: Kill dead cgroup code
  bcache: stop moving_gc marking buckets that can't be moved.
  bcache: Fix moving_pred()
  bcache: Fix moving_gc deadlocking with a foreground write
  bcache: Fix discard granularity
  bcache: Fix another bug recovering from unclean shutdown
  bcache: Fix a bug recovering from unclean shutdown
  bcache: Fix a journalling reclaim after recovery bug
  bcache: Fix a null ptr deref in journal replay
  ...
2014-04-01 19:43:53 -07:00
Matthew Wilcox 6eb0d698ef NVMe: Replace DEFINE_PCI_DEVICE_TABLE
Checkpatch has started warning against using DEFINE_PCI_DEVICE_TABLE,
so replace it.  Also update the copyright date and bump the module
version number to 0.9.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-03-24 10:11:22 -04:00
Keith Busch 4f5099af4f NVMe: IOCTL path RCU protect queue access
This adds rcu protected access to a queue in the nvme IOCTL path
to fix potential races between a surprise removal and queue usage in
nvme_submit_sync_cmd. The fix holds the rcu_read_lock() here to prevent
the nvme_queue from freeing while this path is executing so it can't
sleep, and so this path will no longer wait for a available command
id should they all be in use at the time a passthrough IOCTL request
is received.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-03-24 08:54:40 -04:00
Keith Busch 5a92e700af NVMe: RCU protected access to io queues
This adds rcu protected access to nvme_queue to fix a race between a
surprise removal freeing the queue and a thread with open reference on
a NVMe block device using that queue.

The queues do not need to be rcu protected during the initialization or
shutdown parts, so I've added a helper function for raw deferencing
to get around the sparse errors.

There is still a hole in the IOCTL path for the same problem, which is
fixed in a subsequent patch.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-03-24 08:45:57 -04:00
Keith Busch fb35e914b3 NVMe: Initialize device reference count earlier
If an NVMe device becomes ready but fails to create IO queues, the driver
creates a character device handle so the device can be managed. The
device reference count needs to be initialized before creating the
character device.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-03-24 08:15:14 -04:00
Jingoo Han 671a6018db NVMe: Add CONFIG_PM_SLEEP to suspend/resume functions
Add CONFIG_PM_SLEEP to suspend/resume functions to fix the following
build warning when CONFIG_PM_SLEEP is not selected. This is because
sleep PM callbacks defined by SIMPLE_DEV_PM_OPS are only used when
the CONFIG_PM_SLEEP is enabled.

drivers/block/nvme-core.c:2541:12: warning: 'nvme_suspend' defined but not used [-Wunused-function]
drivers/block/nvme-core.c:2550:12: warning: 'nvme_resume' defined but not used [-Wunused-function]

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-03-24 08:05:28 -04:00
Alexander Gordeev be577fabf3 nvme: Use pci_enable_msi_range() and pci_enable_msix_range()
As result of deprecation of MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block() all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range()  or pci_enable_msi_exact()
and pci_enable_msix_range() or pci_enable_msix_exact()
interfaces.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-nvme@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-03-13 14:56:39 -06:00
Tejun Heo 9ca9737444 nvme: don't use PREPARE_WORK
PREPARE_[DELAYED_]WORK() are being phased out.  They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

nvme_dev->reset_work is multiplexed with multiple work functions.
Introduce nvme_reset_workfn() which invokes nvme_dev->reset_workfn and
always use it as the work function and update the users to set the
->reset_workfn field instead of overriding the work function using
PREPARE_WORK().

It would probably be best to route this with other related updates
through the workqueue tree.

Compile tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: linux-nvme@lists.infradead.org
2014-03-07 10:24:49 -05:00