These two are confusing leftover of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces. For callers that don't
special case we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense. Any check for READA is replaced with an
explicit check for REQ_RAHEAD. Also remove the READA alias for
REQ_RAHEAD.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
WRITE SAME is a data integrity operation and we can't simply ignore
errors.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently blkdev_issue_zeroout cascades down from discards (if the driver
guarantees that discards zero data), to WRITE SAME and then to a loop
writing zeroes. Unfortunately we ignore run-time EOPNOTSUPP errors in the
block layer blkdev_issue_discard helper to work around DM volumes that
may have mixed discard support underneath.
This patch intoroduces a new BLKDEV_DISCARD_ZERO flag to
blkdev_issue_discard that indicates we are called for zeroing operation.
This allows both to ignore the EOPNOTSUPP hack and actually consolidating
the discard_zeroes_data check into the function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
When CONFIG_NUMA is enabled and node 0 is memoryless, the system
crashes because nvme_probe() sets the device->numa_node to 0 by
set_dev_node(&pdev->dev, 0), so it tries to allocate memory from node 0.
To avoid the crash, we should change the 0 to first_memory_node.
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Many controller implementations will return errors to commands that will
not succeed, but without the DNR bit set. The driver previously retried
these commands an unlimited number of times until the command timeout
has exceeded, which takes an unnecessarilly long period of time.
This patch limits the number of retries a command can have, defaulting
to 5, but is user tunable at load or runtime.
The struct request's 'retries' field is used to track the number of
retries attempted. This is in contrast with scsi's use of this field,
which indicates how many retries are allowed.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
There is no error number returned if loop driver fails in function
alloc_disk to add new loop device. Add a correct error number to make
user notify in this case.
Signed-off-by: Minfei Huang <mnghuan@gmail.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
I ran into the same problem on NVME_TARGET_RDMA now,
which otherwise needs dependencies on both CONFIG_BLOCK and
CONFIGFS_FS:
warning: (NVME_TARGET_LOOP && NVME_TARGET_RDMA) selects NVME_TARGET which has unmet direct dependencies (BLOCK && CONFIGFS_FS)
0xA002B368 Mon Jul 11 18:00:45 CEST 2016 failed
In file included from ../drivers/nvme/target/core.c:16:0:
drivers/nvme/target/nvmet.h:222:14: error: field 'inline_bio' has incomplete type
struct bio inline_bio;
^~~~~~~~~~
drivers/nvme/target/core.c: In function 'nvmet_async_event_work':
drivers/nvme/target/core.c:98:3: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]
kfree(aen);
^~~~~
../drivers/nvme/target/core.c: In function 'nvmet_ns_enable':
../drivers/nvme/target/core.c:269:13: error: implicit declaration of function 'blkdev_get_by_path' [-Werror=implicit-function-declaration]
ns->bdev = blkdev_get_by_path(ns->device_path, FMODE_READ | FMODE_WRITE,
Folding in my patch below should address that too.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
In case of error, the function kstrndup() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with NULL test.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The timeout before error recovery logic kicks in is
dictated by the nvme keep-alive, so we don't really need
a transport layer retry count. transports can retry for
as much as they like.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Always use the maximum qp retry count as the
error recovery timeout is dictated from the nvme
keep-alive.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
userspace application can send READ_SUB_CHANNEL command with time bit
enabled and disabled. The time bit allows selection of address reporting
format. If the time bit is disabled the response is in logical block
address(CDROM_LBA) format, represented as a 32-bit integer with ms-byte
first. If the time bit is enabled the response is in time format i.e.,
minutes, second, frame (CDROM_MSF) format.
Signed-off-by: vchannaiah <vanitha.channaiah@in.bosch.com>
Signed-off-by: Mahendran Kuppusamy <mahendran.kuppusamy@in.bosch.com>
[veeraiyan.chidambaram@in.bosch.com: updated Documentation/ioctl/cdrom.txt]
Signed-off-by: Veeraiyan Chidambaram <veeraiyan.chidambaram@in.bosch.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
When disabling the controller, the specification says the register
NVME_REG_CC should be written and then driver needs to wait the
adapter to be ready, which is checked by reading another register
bit (NVME_CSTS_RDY). There's a timeout validation in this checking,
so in case this timeout is reached the driver gives up and removes
the adapter from the system.
After a firmware activation procedure, the PCI_DEVICE(0x1c58, 0x0003)
(HGST adapter) end up being removed if we issue a reset_controller,
because driver keeps verifying the NVME_REG_CSTS until the timeout is
reached. This patch adds a necessary quirk for this adapter, by
introducing a delay before nvme_wait_ready(), so the reset procedure
is able to be completed. This quirk is needed because just increasing
the timeout is not enough in case of this adapter - the driver must
wait before start reading NVME_REG_CSTS register on this specific
device.
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Dan writes:
"The removal of ->driverfs_dev in favor of just passing the parent
device in as a parameter to add_disk(). See below, it has received a
"Reviewed-by" from Christoph, Bart, and Johannes.
It is also a pre-requisite for Fam Zheng's work to cleanup gendisk
uevents vs attribute visibility [1]. We would extend device_add_disk()
to take an attribute_group list.
This is based off a branch of block.git/for-4.8/drivers and has
received a positive build success notification from the kbuild robot
across several configs.
[1]: "gendisk: Generate uevent after attribute available"
http://marc.info/?l=linux-virtualization&m=146725201522201&w=2"
This patch implements the RDMA host (initiator in SCSI speak) driver. It
can be used to connect to remote NVMe over Fabrics controllers over
Infiniband, RoCE or iWarp, and uses the existing NVMe core driver as well
a the new fabrics library.
To connect to all NVMe over Fabrics controller reachable on a given taget
port using RDMA/CM use the following command:
nvme connect-all -t rdma -a $IPADDR
This requires the latest version of nvme-cli with Fabrics support.
Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch implements the RDMA transport for the NVMe over Fabrics target,
which allows exporting NVMe over Fabrics functionality over RDMA fabrics
(Infiniband, RoCE, iWARP).
All NVMe logic is in the generic target and this module just provides a
small glue between it and the generic code in the RDMA subsystem.
Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com>,
Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
NVMe over Fabrics RDMA transport defines a connection establishment
protocol over the RDMA connection manager. This header will be used by
both the host and target drivers to negotiate the connection
establishment parameters.
Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The nvme fabric (RDMA, FC, etc...) can introduce port, link or node
failures that may require a reconnect to re-establish the connection.
Add a new reconnecting state that will initially be used by the RDMA
driver.
Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The new nvme-rdma driver will need to reinitialize all the tags as part of
the error recovery procedure (realloc the tag memory region). Add a helper
in blk-mq for it that can iterate over all requests in a tagset to make
this easier.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Ming Lin <ming.l@ssi.samsung.com>
Reviewed-by: Stephen Bates <Stephen.Bates@pmcs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The __nvm_submit_ppa() function is not used outside lightnvm core.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The passed by reference ppa list in nvm_set_rqd_list() is updated when
multiple planes are available. In that case, each PPA plane is
incremented when the device side PPA list is created. This prevents the
caller to rely on the PPA list to be unmodified after a call.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The gen_mark_blk_bad function marks the wrong block when a block is on
a different channel. Fix the index calculation, so that it updates the
correct block.
Reported-by: Javier Gonzalez <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The nvm_get_blk() function is called with rlun->lock held. This is ok
when the media manager implementation doesn't go out of its atomic
context. However, if a media manager persists its metadata, and
guarantees that the block is given to the target, this is no longer
a viable approach. Therefore, clean up the flow of rrpc_map_page,
and make sure that nvm_get_blk() is called without any locks acquired.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>