In case of a failure when submitting a request, convert the ppa_list
addresses to the target format so that it can interpret ppas for
recovery
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
pblk_submit_read() uses bio_clone_bioset() but doesn't change the
io_vec, so bio_clone_fast() is a better choice.
It also uses fs_bio_set which is intended for filesystems. Using it
in a device driver can deadlock.
So allocate a new bioset, and and use bio_clone_fast().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Javier González <javier@cnexlabs.com>
Tested-by: Javier González <javier@cnexlabs.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk_queue_split() is always called with the last arg being q->bio_split,
where 'q' is the first arg.
Also blk_queue_split() sometimes uses the passed-in 'bs' and sometimes uses
q->bio_split.
This is inconsistent and unnecessary. Remove the last arg and always use
q->bio_split inside blk_queue_split()
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Credit-to: Javier González <jg@lightnvm.io> (Noticed that lightnvm was missed)
Reviewed-by: Javier González <javier@cnexlabs.com>
Tested-by: Javier González <javier@cnexlabs.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Free memory correctly when an allocation fails on a loop and we free
backwards previously successful allocations.
Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
'blks' is malloced in pblk_bb_discovery() and should be freed
before leaving from the nvm_get_tgt_bb_tbl() error handling cases,
otherwise it will cause memory leak. Also skip assign blks to
rlun->bb_list when error.
Fixes: a4bd217b43 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
When block erases fail, these blocks are marked bad. The number of valid
blocks in the line was not updated, which could cause an infinite loop
on the erase path.
Fix this atomic counter and, in order to avoid taking an irq lock on the
interrupt context, make the erase counters atomic too.
Also, in the case that a significant number of blocks become bad in a
line, the result is the double shared metadata buffer (emeta) to stop
the pipeline until all metadata is flushed to the media. Increase the
number of metadata lines from 2 to 4 to avoid this case.
Fixes: a4bd217b43 "lightnvm: physical block device (pblk) target"
Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
When a line allocation fails, for example, due to having too many bad
blocks, free its metadata correctly.
Fixes: a4bd217b43 "lightnvm: physical block device (pblk) target"
Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
When a pblk line fails (or is recovered), make sure to take the line
management lock.
Fixes: a4bd217b43 "lightnvm: physical block device (pblk) target"
Signed-off-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Reading from ADDR_EMPTY is out of bounds. The current code generates a
static checker warning because we check for out of bounds "lba" before
we check for ADDR_EMPTY, so the second check is always false. It looks
like we intended ADDR_EMPTY to be a no-op without printing a warning.
Fixes: a4bd217b43 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This is a static checker fix, and perhaps not a real bug. The static
checker thinks that nr_secs could be negative. It would result in
zeroing more memory than intended. Anyway, even if it's not a bug,
changing this variable to unsigned makes the code easier to audit.
Fixes: a4bd217b43 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
From userspace calling ioctl(NVM_DEV_CREATE) was returning ENOMEM for
invalid arguments even though pblk (pblk_init) was returning correctly
-EINVAL to nvm_create_tgt inside core. This patch propagates the
correct return value to userspace.
Because pblk was introduced recently this only needs to go in 4.12.
Fixes: a4bd217b43 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The driver uses both u64 and sector_t to refer to offsets, and assigns between the
two. This causes one harmless warning when sector_t is 32-bit:
drivers/lightnvm/pblk-rb.c: In function 'pblk_rb_write_entry_gc':
include/linux/lightnvm.h:215:20: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
drivers/lightnvm/pblk-rb.c:324:22: note: in expansion of macro 'ADDR_EMPTY'
As the driver is already doing this inconsistently, changing the type
won't make it worse and is an easy way to avoid the warning.
Fixes: a4bd217b43 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
There were a bunch of places in pblk_lines_init() where we didn't set an
error code. And in pblk_writer_init() we accidentally return 1 instead
of a correct error code, which would result in a Oops later.
Fixes: 11a5d6fdf919 ("lightnvm: physical block device (pblk) target")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
WARN_ON() takes a condition, not an error message. I slightly tweaked
some conditions so hopefully it's more clear.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
These labels are reversed so we could end up dereferencing an error
pointer or leaking.
Fixes: 7f347ba6bb3a ("lightnvm: physical block device (pblk) target")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch introduces pblk, a host-side translation layer for
Open-Channel SSDs to expose them like block devices. The translation
layer allows data placement decisions, and I/O scheduling to be
managed by the host, enabling users to optimize the SSD for their
specific workloads.
An open-channel SSD has a set of LUNs (parallel units) and a
collection of blocks. Each block can be read in any order, but
writes must be sequential. Writes may also fail, and if a block
requires it, must also be reset before new writes can be
applied.
To manage the constraints, pblk maintains a logical to
physical address (L2P) table, write cache, garbage
collection logic, recovery scheme, and logic to rate-limit
user I/Os versus garbage collection I/Os.
The L2P table is fully-associative and manages sectors at a
4KB granularity. Pblk stores the L2P table in two places, in
the out-of-band area of the media and on the last page of a
line. In the cause of a power failure, pblk will perform a
scan to recover the L2P table.
The user data is organized into lines. A line is data
striped across blocks and LUNs. The lines enable the host to
reduce the amount of metadata to maintain besides the user
data and makes it easier to implement RAID or erasure coding
in the future.
pblk implements multi-tenant support and can be instantiated
multiple times on the same drive. Each instance owns a
portion of the SSD - both regarding I/O bandwidth and
capacity - providing I/O isolation for each case.
Finally, pblk also exposes a sysfs interface that allows
user-space to peek into the internals of pblk. The interface
is available at /dev/block/*/pblk/ where * is the block
device name exposed.
This work also contains contributions from:
Matias Bjørling <matias@cnexlabs.com>
Simon A. F. Lund <slund@cnexlabs.com>
Young Tack Jin <youngtack.jin@gmail.com>
Huaicheng Li <huaicheng@cs.uchicago.edu>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Convert sprintf calls to strlcpy in order to make possible buffer
overflow more obvious.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Prefix the nvm_free static function with a missing static keyword.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Target initialization has two responsibilities: creating the target
partition and instantiating the target. This patch enables to create a
factory partition (e.g., do not trigger recovery on the given target).
This is useful for target development and for being able to restore the
device state at any moment in time without requiring a full-device
erase.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>