Looks like we pull it in through other ways on x86, but we fail
on sparc:
In file included from drivers/block/cryptoloop.c:30:0:
drivers/block/loop.h:63:24: error: field 'tag_set' has incomplete type
struct blk_mq_tag_set tag_set;
Add the include to loop.h, kill it from loop.c.
Signed-off-by: Jens Axboe <axboe@fb.com>
block core handles REQ_FUA by its flush state machine, so
won't do it in loop explicitly.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
No behaviour change, just move the handling for REQ_DISCARD
and REQ_FLUSH in these two functions.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The conversion is a bit straightforward, and use work queue to
dispatch requests of loop block, and one big change is that requests
is submitted to backend file/device concurrently with work queue,
so throughput may get improved much. Given write requests over same
file are often run exclusively, so don't handle them concurrently for
avoiding extra context switch cost, possible lock contention and work
schedule cost. Also with blk-mq, there is opportunity to get loop I/O
merged before submitting to backend file/device.
In the following test:
- base: v3.19-rc2-2041231
- loop over file in ext4 file system on SSD disk
- bs: 4k, libaio, io depth: 64, O_DIRECT, num of jobs: 1
- throughput: IOPS
------------------------------------------------------
| | base | base with loop-mq | delta |
------------------------------------------------------
| randread | 1740 | 25318 | +1355%|
------------------------------------------------------
| read | 42196 | 51771 | +22.6%|
-----------------------------------------------------
| randwrite | 35709 | 34624 | -3% |
-----------------------------------------------------
| write | 39137 | 40326 | +3% |
-----------------------------------------------------
So loop-mq can improve throughput for both read and randread, meantime,
performance of write and randwrite isn't hurted basically.
Another benefit is that loop driver code gets simplified
much after blk-mq conversion, and the patch can be thought as
cleanup too.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Discard requests are ignored if the encryption is enabled for the given
loop device. Update comment to match the code, and similar comments
elsewhere in the file.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The value passed is 0 in all but "it can never happen" cases (and those
only in a couple of drivers) *and* it would've been lost on the way
out anyway, even if something tried to pass something meaningful.
Just don't bother.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull VFS updates from Al Viro,
Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).
7kloc removed.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
don't bother with deferred freeing of fdtables
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
proc: Make the PROC_I() and PDE() macros internal to procfs
proc: Supply a function to remove a proc entry by PDE
take cgroup_open() and cpuset_open() to fs/proc/base.c
ppc: Clean up scanlog
ppc: Clean up rtas_flash driver somewhat
hostap: proc: Use remove_proc_subtree()
drm: proc: Use remove_proc_subtree()
drm: proc: Use minor->index to label things, not PDE->name
drm: Constify drm_proc_list[]
zoran: Don't print proc_dir_entry data in debug
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
proc: Supply an accessor for getting the data from a PDE's parent
airo: Use remove_proc_subtree()
rtl8192u: Don't need to save device proc dir PDE
rtl8187se: Use a dir under /proc/net/r8180/
proc: Add proc_mkdir_data()
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
proc: Move PDE_NET() to fs/proc/proc_net.c
...
Pull block fixes from Jens Axboe:
"I've got a few smaller fixes queued up for 3.9 that should go in. The
major one is the loop regression, the others are nice fixes on their
own though. It contains:
- Fix for unitialized var in the block sysfs code, courtesy of Arnd
and gcc-4.8.
- Two fixes for mtip32xx, fixing probe and command timeout. Also a
debug measure that could have waited for 3.10, but it's driver
only, so I let it slip in.
- Revert the loop partition cleanup fix, it could cause a deadlock on
auto-teardown as part of umount. The fix is clear, but at this
point we just want to revert it and get a real fix in for 3.10."
* tag 'for-linus-20130409' of git://git.kernel.dk/linux-block:
Revert "loop: cleanup partitions when detaching loop device"
mtip32xx: fix two smatch warnings
mtip32xx: Add debugfs entry device_status
mtip32xx: return 0 from pci probe in case of rebuild
mtip32xx: recovery from command timeout
block: avoid using uninitialized value in from queue_var_store
This reverts commit 8761a3dc1f.
There are situations where the destruction path is called
with the bdev->bd_mutex already held, which then deadlocks in
loop_clr_fd(). The normal partition cleanup does a trylock()
on the mutex, but it'd be nice to have a more bullet proof
method in loop. So punt this more involved fix to the next
merge window, and just back out this buggy fix for now.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".
But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
bd_set_size+0x10/0xa0
loop_clr_fd+0x1f8/0x420 [loop]
lo_ioctl+0x200/0x7e0 [loop]
lo_compat_ioctl+0x47/0xe0 [loop]
compat_blkdev_ioctl+0x341/0x1290
do_filp_open+0x42/0xa0
compat_sys_ioctl+0xc1/0xf20
do_sys_open+0x16e/0x1d0
sysenter_dispatch+0x7/0x1a
To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().
The issue is reprodusible on current Linus head and v3.3. Here is the test:
dd if=/dev/zero of=loop.file bs=1M count=1
while [ true ]; do
losetup /dev/loop0 loop.file
echo 2 > /proc/sys/vm/drop_caches
losetup -d /dev/loop0
done
[ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
time we call loop_set_fd() we check that loop_device->lo_state is
Lo_unbound and set it to Lo_bound If somebody will try to set_fd again
it will get EBUSY. And if we try to loop_clr_fd() on unbound loop
device we'll get ENXIO.
loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
loop_device->lo_ctl_mutex. ]
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Any partitions added by user space to the loop device were being
left in place after detaching the loop device. This was because
the detach path issued a BLKRRPART to clean up partitions if
LO_FLAGS_PARTSCAN was set, meaning that the partitions were auto
scanned on attach. Replace this BLKRRPART with code that
unconditionally cleans up partitions on detach instead.
Signed-off-by: Phillip Susi <psusi@ubuntu.com>
Modified by Jens to export delete_partition().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fix to return a negative error code from the error handling
case, as returned elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>