Clear QUEUE_FLAG_ADD_RANDOM in all block drivers that set
QUEUE_FLAG_NONROT.
Historically, all block devices have automatically made entropy
contributions. But as previously stated in commit e2e1a148 ("block: add
sysfs knob for turning off disk entropy contributions"):
- On SSD disks, the completion times aren't as random as they
are for rotational drives. So it's questionable whether they
should contribute to the random pool in the first place.
- Calling add_disk_randomness() has a lot of overhead.
There are more reliable sources for randomness than non-rotational block
devices. From a security perspective it is better to err on the side of
caution than to allow entropy contributions from unreliable "random"
sources.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This adds a mechanism by which we can advance a bio by an arbitrary
number of bytes without modifying the biovec: bio->bi_iter.bi_bvec_done
indicates the number of bytes completed in the current bvec.
Various driver code still needs to be updated to not refer to the bvec
directly before we can use this for interesting things, like efficient
bio splitting.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Cc: drbd-user@lists.linbit.com
Cc: nbd-general@lists.sourceforge.net
Currently, when a disconnect is requested by the user (via NBD_DISCONNECT
ioctl) the return from NBD_DO_IT is undefined (it is usually one of
several error codes). This means that nbd-client does not know if a
manual disconnect was performed or whether a network error occurred.
Because of this, nbd-client's persist mode (which tries to reconnect after
error, but not after manual disconnect) does not always work correctly.
This change fixes this by causing NBD_DO_IT to always return 0 if a user
requests a disconnect. This means that nbd-client can correctly either
persist the connection (if an error occurred) or disconnect (if the user
requested it).
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Acked-by: Rob Landley <rob@landley.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The NBD_CLEAR_QUE ioctl has been deprecated for quite some time (its job
is now done by two other ioctls). We should stop trying to make bogus
assertions in it. Also, user-level code should remove calls to
NBD_CLEAR_QUE, ASAP.
Signed-off-by: Michal Belczyk <belczyk@bsd.krakow.pl>
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Disk names may contain arbitrary strings, so they must not be
interpreted as format strings. It seems that only md allows arbitrary
strings to be used for disk names, but this could allow for a local
memory corruption from uid 0 into ring 0.
CVE-2013-2851
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Raise the default max request size for nbd to 128KB (from 127KB) to get it
4KB aligned. This patch also allows the max request size to be increased
(via /sys/block/nbd<x>/queue/max_sectors_kb) to 32MB.
The patch makes nbd network traffic more efficient by:
- reducing request fragmentation (4KB alignment)
- reducing the number of requests (fewer round trips, less network overhead)
Especially in high latency networks, larger request size can make a dramatic
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Michal Belczyk <belczyk@bsd.krakow.pl>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I just fixed this in "drivers/block/rbd.c" and I noticed that
"drivers/block/nbd.c" has the same problem. Fix a warning issued by
sparse by adding some lockdep annotations to indicate the queue lock gets
dropped (because it's held when do_nbd_request() is called) and
re-acquired within the function.
Signed-off-by: Alex Elder <elder@inktank.com>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Paul Clements <paul.clements@us.sios.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are two problems with shutdown in the NBD driver.
1: Receiving the NBD_DISCONNECT ioctl does not sync the filesystem.
This patch adds the sync operation into __nbd_ioctl()'s
NBD_DISCONNECT handler. This is useful because BLKFLSBUF is restricted
to processes that have CAP_SYS_ADMIN, and the NBD client may not
possess it (fsync of the block device does not sync the filesystem,
either).
2: Once we clear the socket we have no guarantee that later reads will
come from the same backing storage.
The patch adds calls to kill_bdev() in __nbd_ioctl()'s socket
clearing code so the page cache is cleaned, lest reads that hit on the
page cache will return stale data from the previously-accessible disk.
Example:
# qemu-nbd -r -c/dev/nbd0 /dev/sr0
# file -s /dev/nbd0
/dev/stdin: # UDF filesystem data (version 1.5) etc.
# qemu-nbd -d /dev/nbd0
# qemu-nbd -r -c/dev/nbd0 /dev/sda
# file -s /dev/nbd0
/dev/stdin: # UDF filesystem data (version 1.5) etc.
While /dev/sda has:
# file -s /dev/sda
/dev/sda: x86 boot sector; etc.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Paul Clements <Paul.Clements@steeleye.com>
Cc: Alex Bligh <alex@alex.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, the NBD device does not accept flush requests from the Linux
block layer. If the NBD server opened the target with neither O_SYNC nor
O_DSYNC, however, the device will be effectively backed by a writeback
cache. Without issuing flushes properly, operation of the NBD device will
not be safe against power losses.
The NBD protocol has support for both a cache flush command and a FUA
command flag; the server will also pass a flag to note its support for
these features. This patch adds support for the cache flush command and
flag. In the kernel, we receive the flags via the NBD_SET_FLAGS ioctl,
and map NBD_FLAG_SEND_FLUSH to the argument of blk_queue_flush. When the
flag is active the block layer will send REQ_FLUSH requests, which we
translate to NBD_CMD_FLUSH commands.
FUA support is not included in this patch because all free software
servers implement it with a full fdatasync; thus it has no advantage over
supporting flush only. Because I [Paolo] cannot really benchmark it in a
realistic scenario, I cannot tell if it is a good idea or not. It is also
not clear if it is valid for an NBD server to support FUA but not flush.
The Linux block layer gives a warning for this combination, the NBD
protocol documentation says nothing about it.
The patch also fixes a small problem in the handling of flags: nbd->flags
must be cleared at the end of NBD_DO_IT, but the driver was not doing
that. The bug manifests itself as follows. Suppose you two different
client/server pairs to start the NBD device. Suppose also that the first
client supports NBD_SET_FLAGS, and the first server sends
NBD_FLAG_SEND_FLUSH; the second pair instead does neither of these two
things. Before this patch, the second invocation of NBD_DO_IT will use a
stale value of nbd->flags, and the second server will issue an error every
time it receives an NBD_CMD_FLUSH command.
This bug is pre-existing, but it becomes much more important after this
patch; flush failures make the device pretty much unusable, unlike
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bligh <alex@alex.org.uk>
Acked-by: Paul Clements <Paul.Clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add discard support to nbd. If the nbd-server supports discard, it will
send NBD_FLAG_SEND_TRIM to the client. The client will then set the flag
in the kernel via NBD_SET_FLAGS, which tells the kernel to enable discards
for the device (QUEUE_FLAG_DISCARD).
If discard support is enabled, then when the nbd client system receives a
discard request, this will be passed along to the nbd-server. When the
discard request is received by the nbd-server, it will perform:
fallocate(.. FALLOC_FL_PUNCH_HOLE ..)
To punch a hole in the backend storage, which is no longer needed.
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a set-flags ioctl, allowing various option flags to be set on an nbd
device. This allows the nbd-client to set the device flags (to enable
read-only mode, or enable discard support, etc.).
Flags are typically specified by the nbd-server. During the negotiation
phase of the nbd connection, the server sends its flags to the client.
The client then uses NBD_SET_FLAGS to inform the kernel of the options.
Also included is a one-line fix to debug output for the set-timeout ioctl.
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a serious but uncommon bug in nbd which occurs when there is heavy
I/O going to the nbd device while, at the same time, a failure (server,
network) or manual disconnect of the nbd connection occurs.
There is a small window between the time that the nbd_thread is stopped
and the socket is shutdown where requests can continue to be queued to
nbd's internal waiting_queue. When this happens, those requests are
never completed or freed.
The fix is to clear the waiting_queue on shutdown of the nbd device, in
the same way that the nbd request queue (queue_head) is already being
cleared.
Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull block driver changes from Jens Axboe:
- Making the plugging support for drivers a bit more sane from Neil.
This supersedes the plugging change from Shaohua as well.
- The usual round of drbd updates.
- Using a tail add instead of a head add in the request completion for
ndb, making us find the most completed request more quickly.
- A few floppy changes, getting rid of a duplicated flag and also
running the floppy init async (since it takes forever in boot terms)
from Andi.
* 'for-3.6/drivers' of git://git.kernel.dk/linux-block:
floppy: remove duplicated flag FD_RAW_NEED_DISK
blk: pass from_schedule to non-request unplug functions.
block: stack unplug
blk: centralize non-request unplug handling.
md: remove plug_cnt feature of plugging.
block/nbd: micro-optimization in nbd request completion
drbd: announce FLUSH/FUA capability to upper layers
drbd: fix max_bio_size to be unsigned
drbd: flush drbd work queue before invalidate/invalidate remote
drbd: fix potential access after free
drbd: call local-io-error handler early
drbd: do not reset rs_pending_cnt too early
drbd: reset congestion information before reporting it in /proc/drbd
drbd: report congestion if we are waiting for some userland callback
drbd: differentiate between normal and forced detach
drbd: cleanup, remove two unused global flags
floppy: Run floppy initialization asynchronous
Add in-flight cmds to the tail. That way while searching
(during request completion),we will always get a hit on the
first element.
Signed-off-by: Chetan Loke <loke.chetan@gmail.com>
Acked-by: Paul.Clements@steeleye.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge third batch of patches from Andrew Morton:
- Some MM stragglers
- core SMP library cleanups (on_each_cpu_mask)
- Some IPI optimisations
- kexec
- kdump
- IPMI
- the radix-tree iterator work
- various other misc bits.
"That'll do for -rc1. I still have ~10 patches for 3.4, will send
those along when they've baked a little more."
* emailed from Andrew Morton <akpm@linux-foundation.org>: (35 commits)
backlight: fix typo in tosa_lcd.c
crc32: add help text for the algorithm select option
mm: move hugepage test examples to tools/testing/selftests/vm
mm: move slabinfo.c to tools/vm
mm: move page-types.c from Documentation to tools/vm
selftests/Makefile: make `run_tests' depend on `all'
selftests: launch individual selftests from the main Makefile
radix-tree: use iterators in find_get_pages* functions
radix-tree: rewrite gang lookup using iterator
radix-tree: introduce bit-optimized iterator
fs/proc/namespaces.c: prevent crash when ns_entries[] is empty
nbd: rename the nbd_device variable from lo to nbd
pidns: add reboot_pid_ns() to handle the reboot syscall
sysctl: use bitmap library functions
ipmi: use locks on watchdog timeout set on reboot
ipmi: simplify locking
ipmi: fix message handling during panics
ipmi: use a tasklet for handling received messages
ipmi: increase KCS timeouts
ipmi: decrease the IPMI message transaction time in interrupt mode
...
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:
perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`
Signed-off-by: David Howells <dhowells@redhat.com>