Add and use a new op_stat_group() function for indexing partition stat
fields rather than indexing them by rq_data_dir() or bio_data_dir().
This function works similarly to op_is_sync() in that it takes the
request::cmd_flags or bio::bi_opf flags and determines which stats
should et updated.
In addition, the second parameter to generic_start_io_acct() and
generic_end_io_acct() is now a REQ_OP rather than simply a read or
write bit and it uses op_stat_group() on the parameter to determine
the stat group.
Note that the partition in_flight counts are not part of the per-cpu
statistics and as such are not indexed via this function. It's now
indexed by op_is_write().
tj: Refreshed on top of v4.17. Updated to pass around REQ_OP.
Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Matias Bjorling <mb@lightnvm.io>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Alasdair Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We can't know if a block is closed or not on 1.2 devices, so assume
closed state to make sure that blocks are erased before writing.
Fixes: 32ef9412c1 ("lightnvm: pblk: implement get log report chunk")
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In the read path, partial reads are currently performed synchronously
which affects performance for workloads that generate many partial
reads. This patch adds an asynchronous partial read path as well as
the required partial read ctx.
Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
Reviewed-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The error messages in pblk does not say which pblk instance that
a message occurred from. Update each error message to reflect the
instance it belongs to, and also prefix it with pblk, so we know
the message comes from the pblk module.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If using pblk on a 32bit architecture, and there is a need to
perform a partial read, the partial read bitmap will only have
allocated 32 entries, where as 64 are needed.
Make sure that the read_bitmap is initialized to 64bits on 32bit
architectures as well.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Reviewed-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When recovering a line, an extra check was added when debugging was
active, such that minor version where also checked. Unfortunately,
this used the ifdef NVM_DEBUG, which is not correct.
Instead use the proper DEBUG def, and now that it compiles, also fix
the variable.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Fixes: d0ab0b1ab9 ("lightnvm: pblk: check data lines version on recovery")
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is no users of CONFIG_NVM_DEBUG in the LightNVM subsystem. All
users are in pblk. Rename NVM_DEBUG to NVM_PBLK_DEBUG and enable
only for pblk.
Also fix up the CONFIG_NVM_PBLK entry to follow the code style for
Kconfig files.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Some devices can expose mw_cunits equal to 0, it can cause the
creation of too small write buffer and cause performance to drop
on write workloads.
Additionally, write buffer size must cover write data requirements,
such as WS_MIN and MW_CUNITS - it must be greater than or equal to
the larger one multiplied by the number of PUs. However, for
performance reasons, use the WS_OPT value to calculation instead of
WS_MIN.
Because the place where buffer size is calculated was changed, this
patch also removes pgs_in_buffer filed in pblk structure.
Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull block fixes from Jens Axboe:
- Further timeout fixes. We aren't quite there yet, so expect another
round of fixes for that to completely close some of the IRQ vs
completion races. (Christoph/Bart)
- Set of NVMe fixes from the usual suspects, mostly error handling
- Two off-by-one fixes (Dan)
- Another bdi race fix (Jan)
- Fix nbd reconfigure with NBD_DISCONNECT_ON_CLOSE (Doron)
* tag 'for-linus-20180623' of git://git.kernel.dk/linux-block:
blk-mq: Fix timeout handling in case the timeout handler returns BLK_EH_DONE
bdi: Fix another oops in wb_workfn()
lightnvm: Remove depends on HAS_DMA in case of platform dependency
nvme-pci: limit max IO size and segments to avoid high order allocations
nvme-pci: move nvme_kill_queues to nvme_remove_dead_ctrl
nvme-fc: release io queues to allow fast fail
nbd: Add the nbd NBD_DISCONNECT_ON_CLOSE config flag.
block: sed-opal: Fix a couple off by one bugs
blk-mq-debugfs: Off by one in blk_mq_rq_state_name()
nvmet: reset keep alive timer in controller enable
nvme-rdma: don't override opts->queue_size
nvme-rdma: Fix command completion race at error recovery
nvme-rdma: fix possible free of a non-allocated async event buffer
nvme-rdma: fix possible double free condition when failing to create a controller
Revert "block: Add warning for bi_next not NULL in bio_endio()"
block: fix timeout changes for legacy request drivers
Remove dependencies on HAS_DMA where a Kconfig symbol depends on another
symbol that implies HAS_DMA, and, optionally, on "|| COMPILE_TEST".
In most cases this other symbol is an architecture or platform specific
symbol, or PCI.
Generic symbols and drivers without platform dependencies keep their
dependencies on HAS_DMA, to prevent compiling subsystems or drivers that
cannot work anyway.
This simplifies the dependencies, and allows to improve compile-testing.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently the error exit path when the emeta could not be
interpreted is via fail_free_ws and this fails to free
invalid_bitmap. Fix this by adding another exit label and
exiting via this to kfree invalid_bitmap.
Detected by CoverityScan, CID#1469659 ("Resource leak")
Fixes: 48b8d20895 ("lightnvm: pblk: garbage collect lines with failed writes")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fixes the following sparse warning:
drivers/lightnvm/pblk-init.c:23:14: warning:
symbol 'write_buffer_size' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
pblk allocates line bitmaps within the line lock unnecessarily. In order
to take pressure out of the fast patch, allocate line bitmaps outside
of this lock and refactor accordingly.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Unless we kick the writer directly when setting a new flush point, the
user risks having to wait for up to one second (the default timeout for
the write thread to be kicked) for the IO to complete.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When switching between different lun configurations, there is no
guarantee that all lines that contain closed/open chunks have some
valid data to recover.
Check that the smeta chunk has been written to instead. Also
skip bad lines (that does not have enough good chunks).
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In the read path, pblk gets a reference to the incoming bio and puts it
after ending the bio. Though this behavior is correct, it is unnecessary
since pblk is the one putting the bio, therefore, it cannot disappear
underneath it.
Removing this reference, allows to clean up rqd->bio and avoids pointer
bouncing for the different read paths. Now, the incoming bio always
resides in the read context and pblk's internal bios (if any) reside in
rqd->bio.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In some cases, users can want set write buffer size manually, e.g. to
adjust it to specific workload. This patch provides the possibility
to set write buffer size via module parameter feature.
Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently in case of error caused by bio_pc_add_page in
pblk_bio_add_pages two issues occur when calling from
pblk_rb_read_to_bio(). First one is in pblk_bio_free_pages, since we
are trying to free pages not allocated from our mempool. Second one
is the warn from dma_pool_free, that we are trying to free NULL
pointer dma.
This commit fix both issues.
Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@intel.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>