Commit Graph

5885 Commits

Author SHA1 Message Date
Ross Zwisler 1d037577c3 loop: Fix lost writes caused by missing flag
The following commit:

commit aa4d86163e ("block: loop: switch to VFS ITER_BVEC")

replaced __do_lo_send_write(), which used ITER_KVEC iterators, with
lo_write_bvec() which uses ITER_BVEC iterators.  In this change, though,
the WRITE flag was lost:

-       iov_iter_kvec(&from, ITER_KVEC | WRITE, &kvec, 1, len);
+       iov_iter_bvec(&i, ITER_BVEC, bvec, 1, bvec->bv_len);

This flag is necessary for the DAX case because we make decisions based on
whether or not the iterator is a READ or a WRITE in dax_iomap_actor() and
in dax_iomap_rw().

We end up going through this path in configurations where we combine a PMEM
device with 4k sectors, a loopback device and DAX.  The consequence of this
missed flag is that what we intend as a write actually turns into a read in
the DAX code, so no data is ever written.

The very simplest test case is to create a loopback device and try and
write a small string to it, then hexdump a few bytes of the device to see
if the write took.  Without this patch you read back all zeros, with this
you read back the string you wrote.

For XFS this causes us to fail or panic during the following xfstests:

	xfs/074 xfs/078 xfs/216 xfs/217 xfs/250

For ext4 we have a similar issue where writes never happen, but we don't
currently have any xfstests that use loopback and show this issue.

Fix this by restoring the WRITE flag argument to iov_iter_bvec().  This
causes the xfstests to all pass.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Fixes: commit aa4d86163e ("block: loop: switch to VFS ITER_BVEC")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-09 08:36:36 -07:00
Jens Axboe 9296080a18 Merge branch 'stable/for-jens-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus
Pull a xen_blkfront fix from Konrad:

"It has one simple fix for the multi-queue support not showing up after
a block device was detached/re-attached."

* 'stable/for-jens-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen-blkfront: move negotiate_mq to cover all cases of new VBDs
2018-03-08 09:24:52 -07:00
Bhavesh Davda 7ed8ce1c5f xen-blkfront: move negotiate_mq to cover all cases of new VBDs
negotiate_mq should happen in all cases of a new VBD being discovered by
xen-blkfront, whether called through _probe() or a hot-attached new VBD
from dom-0 via xenstore. Otherwise, hot-attached new VBDs are left
configured without multi-queue.

Signed-off-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-03-07 15:51:24 -05:00
Jiufei Xue 158e61865a block: fix a typo
Fix a typo in pkt_start_recovery.

Fixes: 74d46992e0 ("block: replace bi_bdev with a gendisk pointer and partitions index")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-01 08:41:27 -07:00
Gustavo A. R. Silva 0979962f54 nbd: fix return value in error handling path
It seems that the proper value to return in this particular case is the
one contained into variable new_index instead of ret.

Addresses-Coverity-ID: 1465148 ("Copy-paste error")
Fixes: e46c7287b1 ("nbd: add a basic netlink interface")
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-27 15:51:37 -07:00
Jan Kara 3079c22ea8 genhd: Rename get_disk() to get_disk_and_module()
Rename get_disk() to get_disk_and_module() to make sure what the
function does. It's not a great name but at least it is now clear that
put_disk() is not it's counterpart.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-26 09:48:42 -07:00
Linus Torvalds 9e95dae76b Merge tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
 "Things have been very quiet on the rbd side, as work continues on the
  big ticket items slated for the next merge window.

  On the CephFS side we have a large number of cap handling
  improvements, a fix for our long-standing abuse of ->journal_info in
  ceph_readpages() and yet another dentry pointer management patch"

* tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-client:
  ceph: improving efficiency of syncfs
  libceph: check kstrndup() return value
  ceph: try to allocate enough memory for reserved caps
  ceph: fix race of queuing delayed caps
  ceph: delete unreachable code in ceph_check_caps()
  ceph: limit rate of cap import/export error messages
  ceph: fix incorrect snaprealm when adding caps
  ceph: fix un-balanced fsc->writeback_count update
  ceph: track read contexts in ceph_file_info
  ceph: avoid dereferencing invalid pointer during cached readdir
  ceph: use atomic_t for ceph_inode_info::i_shared_gen
  ceph: cleanup traceless reply handling for rename
  ceph: voluntarily drop Fx cap for readdir request
  ceph: properly drop caps for setattr request
  ceph: voluntarily drop Lx cap for link/rename requests
  ceph: voluntarily drop Ax cap for requests that create new inode
  rbd: whitelist RBD_FEATURE_OPERATIONS feature bit
  rbd: don't NULL out ->obj_request in rbd_img_obj_parent_read_full()
  rbd: use kmem_cache_zalloc() in rbd_img_request_create()
  rbd: obj_request->completion is unused
2018-02-08 11:38:59 -08:00
Linus Torvalds 846ade7dd2 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio/vhost updates from Michael Tsirkin:
 "virtio, vhost: fixes, cleanups, features

  This includes the disk/cache memory stats for for the virtio balloon,
  as well as multiple fixes and cleanups"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: don't hold onto file pointer for VHOST_SET_LOG_FD
  vhost: don't hold onto file pointer for VHOST_SET_VRING_ERR
  vhost: don't hold onto file pointer for VHOST_SET_VRING_CALL
  ringtest: ring.c malloc & memset to calloc
  virtio_vop: don't kfree device on register failure
  virtio_pci: don't kfree device on register failure
  virtio: split device_register into device_initialize and device_add
  vhost: remove unused lock check flag in vhost_dev_cleanup()
  vhost: Remove the unused variable.
  virtio_blk: print capacity at probe time
  virtio: make VIRTIO a menuconfig to ease disabling it all
  virtio/ringtest: virtio_ring: fix up need_event math
  virtio/ringtest: fix up need_event math
  virtio: virtio_mmio: make of_device_ids const.
  firmware: Use PTR_ERR_OR_ZERO()
  virtio-mmio: Use PTR_ERR_OR_ZERO()
  vhost/scsi: Improve a size determination in four functions
  virtio_balloon: include disk/file caches memory statistics
2018-02-08 10:41:00 -08:00
Linus Torvalds 105cf3c8c6 Merge tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:

 - skip AER driver error recovery callbacks for correctable errors
   reported via ACPI APEI, as we already do for errors reported via the
   native path (Tyler Baicar)

 - fix DPC shared interrupt handling (Alex Williamson)

 - print full DPC interrupt number (Keith Busch)

 - enable DPC only if AER is available (Keith Busch)

 - simplify DPC code (Bjorn Helgaas)

 - calculate ASPM L1 substate parameter instead of hardcoding it (Bjorn
   Helgaas)

 - enable Latency Tolerance Reporting for ASPM L1 substates (Bjorn
   Helgaas)

 - move ASPM internal interfaces out of public header (Bjorn Helgaas)

 - allow hot-removal of VGA devices (Mika Westerberg)

 - speed up unplug and shutdown by assuming Thunderbolt controllers
   don't support Command Completed events (Lukas Wunner)

 - add AtomicOps support for GPU and Infiniband drivers (Felix Kuehling,
   Jay Cornwall)

 - expose "ari_enabled" in sysfs to help NIC naming (Stuart Hayes)

 - clean up PCI DMA interface usage (Christoph Hellwig)

 - remove PCI pool API (replaced with DMA pool) (Romain Perier)

 - deprecate pci_get_bus_and_slot(), which assumed PCI domain 0 (Sinan
   Kaya)

 - move DT PCI code from drivers/of/ to drivers/pci/ (Rob Herring)

 - add PCI-specific wrappers for dev_info(), etc (Frederick Lawler)

 - remove warnings on sysfs mmap failure (Bjorn Helgaas)

 - quiet ROM validation messages (Alex Deucher)

 - remove redundant memory alloc failure messages (Markus Elfring)

 - fill in types for compile-time VGA and other I/O port resources
   (Bjorn Helgaas)

 - make "pci=pcie_scan_all" work for Root Ports as well as Downstream
   Ports to help AmigaOne X1000 (Bjorn Helgaas)

 - add SPDX tags to all PCI files (Bjorn Helgaas)

 - quirk Marvell 9128 DMA aliases (Alex Williamson)

 - quirk broken INTx disable on Ceton InfiniTV4 (Bjorn Helgaas)

 - fix CONFIG_PCI=n build by adding dummy pci_irqd_intx_xlate() (Niklas
   Cassel)

 - use DMA API to get MSI address for DesignWare IP (Niklas Cassel)

 - fix endpoint-mode DMA mask configuration (Kishon Vijay Abraham I)

 - fix ARTPEC-6 incorrect IS_ERR() usage (Wei Yongjun)

 - add support for ARTPEC-7 SoC (Niklas Cassel)

 - add endpoint-mode support for ARTPEC (Niklas Cassel)

 - add Cadence PCIe host and endpoint controller driver (Cyrille
   Pitchen)

 - handle multiple INTx status bits being set in dra7xx (Vignesh R)

 - translate dra7xx hwirq range to fix INTD handling (Vignesh R)

 - remove deprecated Exynos PHY initialization code (Jaehoon Chung)

 - fix MSI erratum workaround for HiSilicon Hip06/Hip07 (Dongdong Liu)

 - fix NULL pointer dereference in iProc BCMA driver (Ray Jui)

 - fix Keystone interrupt-controller-node lookup (Johan Hovold)

 - constify qcom driver structures (Julia Lawall)

 - rework Tegra config space mapping to increase space available for
   endpoints (Vidya Sagar)

 - simplify Tegra driver by using bus->sysdata (Manikanta Maddireddy)

 - remove PCI_REASSIGN_ALL_BUS usage on Tegra (Manikanta Maddireddy)

 - add support for Global Fabric Manager Server (GFMS) event to
   Microsemi Switchtec switch driver (Logan Gunthorpe)

 - add IDs for Switchtec PSX 24xG3 and PSX 48xG3 (Kelvin Cao)

* tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
  PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller
  dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller
  PCI: endpoint: Fix EPF device name to support multi-function devices
  PCI: endpoint: Add the function number as argument to EPC ops
  PCI: cadence: Add host driver for Cadence PCIe controller
  dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller
  PCI: Add vendor ID for Cadence
  PCI: Add generic function to probe PCI host controllers
  PCI: generic: fix missing call of pci_free_resource_list()
  PCI: OF: Add generic function to parse and allocate PCI resources
  PCI: Regroup all PCI related entries into drivers/pci/Makefile
  PCI/DPC: Reformat DPC register definitions
  PCI/DPC: Add and use DPC Status register field definitions
  PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error()
  PCI/DPC: Remove unnecessary RP PIO register structs
  PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info()
  PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info()
  PCI/DPC: Make RP PIO log size check more generic
  PCI/DPC: Rename local "status" to "dpc_status"
  PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error()
  ...
2018-02-06 09:59:40 -08:00
Linus Torvalds 64b28683de Merge tag 'for-linus-20180204' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
 "Most of this is fixes and not new code/features:

   - skd fix from Arnd, fixing a build error dependent on sla allocator
     type.

   - blk-mq scheduler discard merging fixes, one from me and one from
     Keith. This fixes a segment miscalculation for blk-mq-sched, where
     we mistakenly think two segments are physically contigious even
     though the request isn't carrying real data. Also fixes a bio-to-rq
     merge case.

   - Don't re-set a bit on the buffer_head flags, if it's already set.
     This can cause scalability concerns on bigger machines and
     workloads. From Kemi Wang.

   - Add BLK_STS_DEV_RESOURCE return value to blk-mq, allowing us to
     distuingish between a local (device related) resource starvation
     and a global one. The latter might happen without IO being in
     flight, so it has to be handled a bit differently. From Ming"

* tag 'for-linus-20180204' of git://git.kernel.dk/linux-block:
  block: skd: fix incorrect linux/slab_def.h inclusion
  buffer: Avoid setting buffer bits that are already set
  blk-mq-sched: Enable merging discard bio into request
  blk-mq: fix discard merge with scheduler attached
  blk-mq: introduce BLK_STS_DEV_RESOURCE
2018-02-04 11:16:35 -08:00
Arnd Bergmann 1d51877578 block: skd: fix incorrect linux/slab_def.h inclusion
skd includes slab_def.h to get access to the slab cache object size.
However, including this header breaks when we use SLUB or SLOB instead of
the SLAB allocator, since the structure layout is completely different,
as shown by this warning when we build this driver in one of the invalid
configurations with link-time optimizations enabled:

include/linux/slab.h:715:0: error: type of 'kmem_cache_size' does not match original declaration [-Werror=lto-type-mismatch]
 unsigned int kmem_cache_size(struct kmem_cache *s);

mm/slab_common.c:77:14: note: 'kmem_cache_size' was previously declared here
 unsigned int kmem_cache_size(struct kmem_cache *s)
              ^
mm/slab_common.c:77:14: note: code may be misoptimized unless -fno-strict-aliasing is used
include/linux/slab.h:147:0: error: type of 'kmem_cache_destroy' does not match original declaration [-Werror=lto-type-mismatch]
 void kmem_cache_destroy(struct kmem_cache *);

mm/slab_common.c:858:6: note: 'kmem_cache_destroy' was previously declared here
 void kmem_cache_destroy(struct kmem_cache *s)
      ^
mm/slab_common.c:858:6: note: code may be misoptimized unless -fno-strict-aliasing is used
include/linux/slab.h:140:0: error: type of 'kmem_cache_create' does not match original declaration [-Werror=lto-type-mismatch]
 struct kmem_cache *kmem_cache_create(const char *name, size_t size,

mm/slab_common.c:534:1: note: 'kmem_cache_create' was previously declared here
 kmem_cache_create(const char *name, size_t size, size_t align,
 ^

This removes the header inclusion and instead uses the kmem_cache_size()
interface to get the size in a reliable way.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-02 08:07:45 -07:00
Stefan Hajnoczi daf2a50169 virtio_blk: print capacity at probe time
Print the capacity of the block device when the driver is probed.  Many
users expect this since SCSI disks (sd) do it.  Moreover, kernel dmesg
output is the primary source of troubleshooting information so it's
helpful to include the disk size there.

The capacity is already printed by virtio_blk when a resize event
occurs.  Extract the code and reuse it from virtblk_probe().

This patch also adds the block device name to the message so it can be
correlated with a specific device:

  virtio_blk virtio0: [vda] 20971520 512-byte logical blocks (10.7 GB/10.0 GiB)

Cc: Rodrigo A B Freire <rfreire@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-01 16:26:43 +02:00
Ming Lei 86ff7c2a80 blk-mq: introduce BLK_STS_DEV_RESOURCE
This status is returned from driver to block layer if device related
resource is unavailable, but driver can guarantee that IO dispatch
will be triggered in future when the resource is available.

Convert some drivers to return BLK_STS_DEV_RESOURCE.  Also, if driver
returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after
a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls.  BLK_MQ_DELAY_QUEUE is
3 ms because both scsi-mq and nvmefc are using that magic value.

If a driver can make sure there is in-flight IO, it is safe to return
BLK_STS_DEV_RESOURCE because:

1) If all in-flight IOs complete before examining SCHED_RESTART in
blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
is run immediately in this case by blk_mq_dispatch_rq_list();

2) if there is any in-flight IO after/when examining SCHED_RESTART
in blk_mq_dispatch_rq_list():
- if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
- otherwise, this request will be dispatched after any in-flight IO is
  completed via blk_mq_sched_restart()

3) if SCHED_RESTART is set concurently in context because of
BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
cases and make sure IO hang can be avoided.

One invariant is that queue will be rerun if SCHED_RESTART is set.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-30 20:18:28 -07:00
Linus Torvalds 1ed2d76e02 Merge branch 'work.sock_recvmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull kern_recvmsg reduction from Al Viro:
 "kernel_recvmsg() is a set_fs()-using wrapper for sock_recvmsg(). In
  all but one case that is not needed - use of ITER_KVEC for ->msg_iter
  takes care of the data and does not care about set_fs(). The only
  exception is svc_udp_recvfrom() where we want cmsg to be store into
  kernel object; everything else can just use sock_recvmsg() and be done
  with that.

  A followup converting svc_udp_recvfrom() away from set_fs() (and
  killing kernel_recvmsg() off) is *NOT* in here - I'd like to hear what
  netdev folks think of the approach proposed in that followup)"

* 'work.sock_recvmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  tipc: switch to sock_recvmsg()
  smc: switch to sock_recvmsg()
  ipvs: switch to sock_recvmsg()
  mISDN: switch to sock_recvmsg()
  drbd: switch to sock_recvmsg()
  lustre lnet_sock_read(): switch to sock_recvmsg()
  cfs2: switch to sock_recvmsg()
  ncpfs: switch to sock_recvmsg()
  dlm: switch to sock_recvmsg()
  svc_recvfrom(): switch to sock_recvmsg()
2018-01-30 18:59:03 -08:00
Linus Torvalds 0a4b6e2f80 Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
 "This is the main pull request for block IO related changes for the
  4.16 kernel. Nothing major in this pull request, but a good amount of
  improvements and fixes all over the map. This contains:

   - BFQ improvements, fixes, and cleanups from Angelo, Chiara, and
     Paolo.

   - Support for SMR zones for deadline and mq-deadline from Damien and
     Christoph.

   - Set of fixes for bcache by way of Michael Lyle, including fixes
     from himself, Kent, Rui, Tang, and Coly.

   - Series from Matias for lightnvm with fixes from Hans Holmberg,
     Javier, and Matias. Mostly centered around pblk, and the removing
     rrpc 1.2 in preparation for supporting 2.0.

   - A couple of NVMe pull requests from Christoph. Nothing major in
     here, just fixes and cleanups, and support for command tracing from
     Johannes.

   - Support for blk-throttle for tracking reads and writes separately.
     From Joseph Qi. A few cleanups/fixes also for blk-throttle from
     Weiping.

   - Series from Mike Snitzer that enables dm to register its queue more
     logically, something that's alwways been problematic on dm since
     it's a stacked device.

   - Series from Ming cleaning up some of the bio accessor use, in
     preparation for supporting multipage bvecs.

   - Various fixes from Ming closing up holes around queue mapping and
     quiescing.

   - BSD partition fix from Richard Narron, fixing a problem where we
     can't mount newer (10/11) FreeBSD partitions.

   - Series from Tejun reworking blk-mq timeout handling. The previous
     scheme relied on atomic bits, but it had races where we would think
     a request had timed out if it to reused at the wrong time.

   - null_blk now supports faking timeouts, to enable us to better
     exercise and test that functionality separately. From me.

   - Kill the separate atomic poll bit in the request struct. After
     this, we don't use the atomic bits on blk-mq anymore at all. From
     me.

   - sgl_alloc/free helpers from Bart.

   - Heavily contended tag case scalability improvement from me.

   - Various little fixes and cleanups from Arnd, Bart, Corentin,
     Douglas, Eryu, Goldwyn, and myself"

* 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits)
  block: remove smart1,2.h
  nvme: add tracepoint for nvme_complete_rq
  nvme: add tracepoint for nvme_setup_cmd
  nvme-pci: introduce RECONNECTING state to mark initializing procedure
  nvme-rdma: remove redundant boolean for inline_data
  nvme: don't free uuid pointer before printing it
  nvme-pci: Suspend queues after deleting them
  bsg: use pr_debug instead of hand crafted macros
  blk-mq-debugfs: don't allow write on attributes with seq_operations set
  nvme-pci: Fix queue double allocations
  block: Set BIO_TRACE_COMPLETION on new bio during split
  blk-throttle: use queue_is_rq_based
  block: Remove kblockd_schedule_delayed_work{,_on}()
  blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays
  blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly()
  lib/scatterlist: Fix chaining support in sgl_alloc_order()
  blk-throttle: track read and write request individually
  block: add bdev_read_only() checks to common helpers
  block: fail op_is_write() requests to read-only partitions
  blk-throttle: export io_serviced_recursive, io_service_bytes_recursive
  ...
2018-01-29 11:51:49 -08:00
Ilya Dryomov e573427a44 rbd: whitelist RBD_FEATURE_OPERATIONS feature bit
This feature bit restricts older clients from performing certain
maintenance operations against an image (e.g. clone, snap create).
krbd does not perform maintenance operations.

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-01-29 18:36:03 +01:00
Ilya Dryomov d98f153f1a rbd: don't NULL out ->obj_request in rbd_img_obj_parent_read_full()
If rbd_img_request_submit() fails, parent_request->obj_request is
NULLed out, triggering an assert in rbd_obj_request_put():

  rbd_img_request_put(parent_request)
    rbd_parent_request_destroy
      rbd_obj_request_put(NULL)

Just remove it -- parent_request->obj_request will be put in
rbd_parent_request_destroy().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 15:23:01 +01:00
Ilya Dryomov a0c5895b27 rbd: use kmem_cache_zalloc() in rbd_img_request_create()
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 15:23:01 +01:00
Ilya Dryomov 2e584bce70 rbd: obj_request->completion is unused
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29 15:22:56 +01:00
Corentin Labbe 796baeeef8 block: remove smart1,2.h
smart1,2.h is unused since commit d436641439 ("cpqarray: remove it from the kernel")
Remove it from tree.

Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-26 13:23:35 -07:00
Tina Ruchandani 85cf955df8 aoe: use ktime_t instead of timeval
'struct frame' uses two variables to store the sent timestamp - 'struct
timeval' and jiffies. jiffies is used to avoid discrepancies caused by
updates to system time. 'struct timeval' is deprecated because it uses
32-bit representation for seconds which will overflow in year 2038.

This patch does the following:
- Replace the use of 'struct timeval' and jiffies with ktime_t, which
  is the recommended type for timestamping
- ktime_t provides both long range (like jiffies) and high resolution
  (like timeval). Using ktime_get (monotonic time) instead of wall-clock
  time prevents any discprepancies caused by updates to system time.

[updates by Arnd below]
The original patch from Tina never went anywhere as we discussed how
to keep the impact on performance minimal. I've started over now but
arrived at basically the same patch that she had originally, except for
an slightly improved tsince_hr() function. I'm making it more robust
against overflows, and also optimize explicitly for the common case
in which a frame is less than 4.2 seconds old, using only a 32-bit
division in that case.

This should make the new version more efficient than the old code,
since we replace the existing two 32-bit division in do_gettimeofday()
plus one multiplication with a single single 32-bit division in
tsince_hr() and drop the double bookkeeping. It's also more efficient
than the ktime_get_us() API we discussed before, since that would
also rely on multiple divisions.

Link: https://lists.linaro.org/pipermail/y2038/2015-May/000276.html
Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
Cc: Ed Cashin <ed.cashin@acm.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-17 08:41:07 -07:00
Linus Torvalds 1545dec46d Merge tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
 "Two rbd fixes for 4.12 and 4.2 issues respectively, marked for
  stable"

* tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client:
  rbd: set max_segments to USHRT_MAX
  rbd: reacquire lock should update lock owner client id
2018-01-11 16:57:32 -08:00
Arnd Bergmann 33f782c49a null_blk: remove explicit 'select FAULT_INJECTION'
Selecting FAULT_INJECTION causes a Kconfig warning when CONFIG_DEBUG_KERNEL
is not set:

warning: (BLK_DEV_NULL_BLK && DRM_I915_SELFTEST) selects FAULT_INJECTION which has unmet direct dependencies (DEBUG_KERNEL)

The other drivers that use FAULT_INJECTION tend to have a separate
Kconfig symbol for turning on that feature, so let's do the same
thing here. This may add a bit more complexity than we like, but
it avoids the warning and is more consistent with the rest of the
kernel.

Fixes: 93b570464c ("null_blk: add option for managing IO timeouts")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-11 07:58:31 -07:00
Jens Axboe 93b570464c null_blk: add option for managing IO timeouts
Use the fault injection framework to provide a way for null_blk
to configure timeouts. This only works for queue_mode 1 and 2,
since the bio mode doesn't have code for tracking timeouts.

Let's say you want to have a 10% chance of timing out every
100,000 requests, and for 5 total timeouts, you could do:

modprobe null_blk timeout="100000,10,0,5"

This is useful for adding blktests to test that IO timeouts
are handled appropriately.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-10 09:06:23 -07:00
Jens Axboe 5448aca41c null_blk: wire up timeouts
This is needed to ensure that we actually handle timeouts.
Without it, the queue_mode=1 path will never call blk_add_timer(),
and the queue_mode=2 path will continually just return
EH_RESET_TIMER and we never actually complete the offending request.

This was used to test the new timeout code, and the changes around
killing off REQ_ATOM_COMPLETE.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-09 14:59:19 -07:00