bt_get() already does a non-blocking pass and runs the queue when
scheduling internally, so there is no need to duplicate it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Two cases:
1) blk_mq_alloc_request() needlessly re-runs the queue, after
calling into the tag allocation without NOWAIT set. We don't
need to do that.
2) blk_mq_map_request() should just use blk_mq_run_hw_queue() with
the async flag set to false.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
device_add() may fail, and all callers are supposed to check the
return value, but one new user in lightnvm doesn't:
drivers/lightnvm/sysfs.c: In function 'nvm_sysfs_register_dev':
drivers/lightnvm/sysfs.c:184:2: error: ignoring return value of 'device_add',
declared with attribute warn_unused_result [-Werror=unused-result]
This changes the caller to propagate any error codes, which avoids
the warning.
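A minimal sketch of the pattern, not the actual lightnvm code: the
struct and both functions below are hypothetical stand-ins, and the stub
only mimics the warn_unused_result contract that device_add() carries.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel's struct device. */
struct fake_device {
	int fail_add;	/* nonzero simulates a failing device_add() */
};

/* Stub carrying the same warn_unused_result attribute as device_add(). */
__attribute__((warn_unused_result))
static int fake_device_add(struct fake_device *dev)
{
	return dev->fail_add ? -ENOMEM : 0;
}

/* Register the device and hand any error code back to the caller. */
static int fake_sysfs_register_dev(struct fake_device *dev)
{
	int ret;

	ret = fake_device_add(dev);
	if (ret)
		return ret;

	/* any further setup would go here */
	return 0;
}
```

Because the return value is consumed and returned, the compiler has
nothing to warn about and the caller can see the failure.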
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 38c9e260b9f9 ("lightnvm: expose device geometry through sysfs")
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
For a host to access an Open-Channel SSD, it has to know its geometry,
so that it writes and reads at the appropriate device bounds.
Currently, the geometry information is kept within the kernel, and not
exported to user-space for consumption. This patch exposes the
configuration through sysfs and enables user-space libraries, such as
liblightnvm, to use the sysfs implementation to get the geometry of an
Open-Channel SSD.
The sysfs entries are stored within the device hierarchy, and can be
found using the "lightnvm" device type.
An example configuration looks like this:
  /sys/class/nvme/
  └── nvme0n1
      ├── capabilities: 3
      ├── device_mode: 1
      ├── erase_max: 1000000
      ├── erase_typ: 1000000
      ├── flash_media_type: 0
      ├── media_capabilities: 0x00000001
      ├── media_type: 0
      ├── multiplane: 0x00010101
      ├── num_blocks: 1022
      ├── num_channels: 1
      ├── num_luns: 4
      ├── num_pages: 64
      ├── num_planes: 1
      ├── page_size: 4096
      ├── prog_max: 100000
      ├── prog_typ: 100000
      ├── read_max: 10000
      ├── read_typ: 10000
      ├── sector_oob_size: 0
      ├── sector_size: 4096
      ├── media_manager: gennvm
      ├── ppa_format: 0x380830082808001010102008
      ├── vendor_opcode: 0
      ├── max_phys_secs: 64
      └── version: 1
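As a rough illustration of how user space might consume one of these
entries, the sketch below reads a single numeric attribute file. The
helper name read_attr_ulong() is hypothetical and is not part of
liblightnvm; the caller supplies the full path to the attribute.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read one numeric attribute file. Base 0 lets strtoul() accept both the
 * decimal entries (e.g. page_size) and the 0x-prefixed ones (e.g.
 * multiplane). Returns 0 on success or a negative errno value.
 */
static int read_attr_ulong(const char *path, unsigned long *val)
{
	char buf[64];
	char *end;
	FILE *f;

	f = fopen(path, "r");
	if (!f)
		return -errno;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	errno = 0;
	*val = strtoul(buf, &end, 0);
	if (errno || end == buf)
		return -EINVAL;
	return 0;
}
```

Entries that are not plain numbers (media_manager) or that are wider
than an unsigned long (ppa_format) are rejected by this helper and
would have to be read as strings instead.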
Signed-off-by: Simon A. F. Lund <slund@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
LightNVM compatible device drivers do not have a method to expose
LightNVM specific sysfs entries.
To enable LightNVM sysfs entries to be exposed, lightnvm device
drivers require a struct device to attach them to. To allow both the
actual device driver and lightnvm sysfs entries to coexist, the device
driver tracks the lifetime of the nvm_dev structure.
This patch refactors NVMe and null_blk to handle the lifetime of struct
nvm_dev, which eliminates the need for struct gendisk when a lightnvm
compatible device is provided.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Enable devices without a gendisk instance to register themselves with
blk-mq and expose the associated multi-queue sysfs entries.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
With LightNVM enabled devices, the gendisk structure is not exposed
to the user. This hides the device driver specific sysfs entries, and
prevents binding of LightNVM geometry information to the device.
Refactor the device registration process, so that gendisk and
non-gendisk devices are easily managed.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
With LightNVM enabled namespaces, the gendisk structure is not exposed
to the user. This prevents LightNVM users from accessing the NVMe device
driver specific sysfs entries, and LightNVM namespace geometry.
Refactor the revalidation process, so that a namespace, instead of a
gendisk, is revalidated. This allows later patches to wire the sysfs
entries up to a non-gendisk namespace.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
If NO_DMA=y:
drivers/built-in.o: In function `nvme_nvm_dev_dma_free':
lightnvm.c:(.text+0x23df1a): undefined reference to `dma_pool_free'
drivers/built-in.o: In function `nvme_nvm_dev_dma_alloc':
lightnvm.c:(.text+0x23df38): undefined reference to `dma_pool_alloc'
drivers/built-in.o: In function `nvme_nvm_destroy_dma_pool':
lightnvm.c:(.text+0x23df4c): undefined reference to `dma_pool_destroy'
drivers/built-in.o: In function `nvme_nvm_create_dma_pool':
lightnvm.c:(.text+0x23df7e): undefined reference to `dma_pool_create'
and
ERROR: "dma_pool_destroy" [drivers/nvme/host/nvme-core.ko] undefined!
ERROR: "dma_pool_free" [drivers/nvme/host/nvme-core.ko] undefined!
ERROR: "dma_pool_alloc" [drivers/nvme/host/nvme-core.ko] undefined!
ERROR: "dma_pool_create" [drivers/nvme/host/nvme-core.ko] undefined!
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Variable weight is not being initialized to zero before it is
used to compute the weight sum. Ensure it is initialized to zero.
Found with static analysis with cppcheck:
[lib/sbitmap.c:177]: (error) Uninitialized variable: weight
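A simplified sketch of the accumulation pattern, not the lib/sbitmap.c
code itself: bitmap_weight_sketch() is a hypothetical name, and
__builtin_popcountl() assumes a GCC or Clang compiler.

```c
#include <stddef.h>

/* Sum the number of set bits over an array of words. */
static unsigned int bitmap_weight_sketch(const unsigned long *words, size_t nr)
{
	unsigned int weight = 0;	/* must be zero before accumulating */
	size_t i;

	for (i = 0; i < nr; i++)
		weight += (unsigned int)__builtin_popcountl(words[i]);
	return weight;
}
```

Without the initializer, the first += reads an indeterminate value and
the returned sum is meaningless.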
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
If we have a bunch of high-numbered bits allocated and then we resize
the struct sbitmap_queue, when those bits get cleared, we'll update the
hint and then have to re-randomize it repeatedly. Avoid that by checking
that the cleared bit is still a valid hint. No measurable performance
difference in the common case.
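Roughly, the check on the clear path can be pictured as below. This is
a single-threaded sketch with a hypothetical struct hint_state standing
in for the per-cpu hint and the bitmap depth; it is not the kernel code.

```c
/* Hypothetical stand-in for the queue depth and one CPU's hint. */
struct hint_state {
	unsigned int depth;	/* current depth after any resize */
	unsigned int hint;	/* where the next allocation starts looking */
};

/* Called when bit nr is freed; only remember it if it is still in range. */
static void clear_and_update_hint(struct hint_state *s, unsigned int nr)
{
	/* the bit itself would be cleared in the bitmap here */
	if (nr < s->depth)
		s->hint = nr;
}
```

A bit freed from beyond the new depth leaves the hint alone, so the
allocation path does not have to re-randomize it afterwards.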
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
After a struct sbitmap_queue is resized smaller, the allocation hints
may still be set to bits beyond the new depth of the bitmap. This means
that, for example, if the number of blk-mq tags is reduced through
sysfs, more requests than the nominal queue depth may be in flight.
It's tempting to fix this at resize time by doing a one-time
reinitialization of the hints, but this can race with
__sbitmap_queue_get() updating the hint. Instead, check the hint before
we use it. This caused no measurable performance difference in my
synthetic benchmarks.
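The idea of checking the hint at the point of use can be sketched as
follows. validate_hint() is a hypothetical helper, rand() stands in for
whatever random source the kernel uses, and the per-cpu and concurrency
aspects are left out.

```c
#include <stdlib.h>

/*
 * Return a hint that is valid for the given depth. A stale hint left over
 * from a larger depth is replaced with a fresh in-range value.
 */
static unsigned int validate_hint(unsigned int hint, unsigned int depth)
{
	if (hint >= depth)
		hint = depth ? (unsigned int)rand() % depth : 0;
	return hint;
}
```

Since the check runs on every allocation, it does not matter when the
resize happened relative to the last hint update.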
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
In order to get good cache behavior from an sbitmap, we want each CPU to
stick to its own cacheline(s) as much as possible. This might happen
naturally as the bitmap gets filled up and the alloc_hint values spread
out, but we really want this behavior from the start. blk-mq apparently
intended to do this, but the code to do this was never wired up. Get rid
of the dead code and make it part of the sbitmap library.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Again, there's no point in passing this in every time. Make it part of
struct sbitmap_queue and clean up the API.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Allocating your own per-cpu allocation hint separately makes for an
awkward API. Instead, allocate the per-cpu hint as part of the struct
sbitmap_queue. There's no point in having a struct sbitmap_queue without
the cache, but you can still use a bare struct sbitmap.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The original bt_alloc() we converted from was using kzalloc(), not
kzalloc_node(), to allocate the wait queues. This was probably an
oversight, so fix it for sbitmap_queue_init_node().
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This is a generally useful data structure, so make it available to
anyone else who might want to use it. It's also a nice cleanup
separating the allocation logic from the rest of the tag handling logic.
The code is behind a new Kconfig option, CONFIG_SBITMAP, which is only
selected by CONFIG_BLOCK for now.
This should be a complete noop functionality-wise.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
We currently account a '0' dispatch, and anything above that still falls
below the range set by BLK_MQ_MAX_DISPATCH_ORDER. If we dispatch more,
we don't account it.
Change the last bucket to be inclusive of anything above the range we
track, and have the sysfs file reflect that by including a '+' in the
output:
$ cat /sys/block/nvme0n1/mq/0/dispatched
0 1006
1 20229
2 1
4 0
8 0
16 0
32+ 0
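The bucketing can be sketched like this. The function name is
hypothetical, and MAX_DISPATCH_ORDER is set to 7 only because the
sample output above has seven rows; it is an assumption, not a value
taken from this message.

```c
#define MAX_DISPATCH_ORDER 7	/* assumed: seven buckets, "0" through "32+" */

/* Map a dispatch count to its bucket; the last bucket is open-ended. */
static unsigned int dispatch_bucket(unsigned int nr)
{
	unsigned int idx = 0;

	/* idx becomes floor(log2(nr)) + 1, or stays 0 when nr == 0 */
	while (nr) {
		idx++;
		nr >>= 1;
	}
	/* anything above the tracked range lands in the last bucket */
	if (idx > MAX_DISPATCH_ORDER - 1)
		idx = MAX_DISPATCH_ORDER - 1;
	return idx;
}
```

With these assumptions a dispatch of 3 requests is counted in the "2"
row, and a dispatch of 32 or more is counted in the "32+" row.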
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
blk_mq_delay_kick_requeue_list() provides the ability to kick the
q->requeue_list after a specified time. To do this the request_queue's
'requeue_work' member was changed to a delayed_work.
blk_mq_delay_kick_requeue_list() allows DM to defer processing requeued
requests while it doesn't make sense to immediately requeue them
(e.g. when all paths in a DM multipath have failed).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Since REQ_OP_BITS == 3 and __REQ_NR_BITS == 30 it is not that hard
to pass an op_flags argument to bio_set_op_attrs() that is larger
than the number of bits reserved for the op_flags argument. Complain
if this happens. Additionally, ensure that negative arguments trigger
a complaint (1 << ... is signed while 1U << ... is unsigned; adding
0U to an integer expression causes it to be promoted to an unsigned
type).
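The effect of adding 0U can be seen in a small standalone sketch.
FLAG_BITS and flags_in_range() are hypothetical; the width of 29 is
purely illustrative and is not taken from this message.

```c
#define FLAG_BITS 29	/* illustrative width only */

/* Nonzero when flags fits in FLAG_BITS bits; negative values are rejected. */
static int flags_in_range(int flags)
{
	/*
	 * flags + 0U converts flags to unsigned int, so -1 becomes UINT_MAX
	 * and fails the comparison. A signed comparison against (1 << ...)
	 * would let every negative value through.
	 */
	return flags + 0U < (1U << FLAG_BITS);
}
```

The same expression therefore catches both values that are too large
and values that are negative.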
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@hgst.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Introduce the bio_flags() macro. Ensure that the second argument of
bio_set_op_attrs() only contains flags and no operation. This patch
does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Chris Mason <clm@fb.com> (maintainer:BTRFS FILE SYSTEM)
Cc: Josef Bacik <jbacik@fb.com> (maintainer:BTRFS FILE SYSTEM)
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@hgst.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>