This patch removes the kernel blocking API as it has been completely
replaced by the callback API.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The get_blocking_random_bytes API is broken because the wait can
be arbitrarily long (potentially forever) so there is no safe way
of calling it from within the kernel.
This patch replaces it with a callback API instead. The callback
is invoked potentially from interrupt context so the user needs
to schedule their own work thread if necessary.
In addition to adding callbacks, they can also be removed as
otherwise this opens up a way for user-space to allocate kernel
memory with no bound (by opening algif_rng descriptors and then
closing them).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If more than one application invokes getrandom(2) before the pool
is ready, then all bar one will be stuck forever because we use
wake_up_interruptible which wakes up a single task.
This patch replaces it with wake_up_all.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There was a bad typo in commit 43759d4f42 ("random: use an improved
fast_mix() function") and I didn't notice because it "looked right", so
I saw what I expected to see when I reviewed it.
Only months later did I look and notice it's not the Threefish-inspired
mix function that I had designed and optimized.
Mea Culpa. Each input bit still has a chance to affect each output bit,
and the fast pool is spilled *long* before it fills, so it's not a total
disaster, but it's definitely not the intended great improvement.
I'm still working on finding better rotation constants. These are good
enough, but since it's unrolled twice, it's possible to get better
mixing for free by using eight different constants rather than repeating
the same four.
Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull /dev/random updates from Ted Ts'o:
"This adds a memzero_explicit() call which is guaranteed not to be
optimized away by GCC. This is important when we are wiping
cryptographically sensitive material"
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
crypto: memzero_explicit - make sure to clear out sensitive data
random: add and use memzero_explicit() for clearing data
zatimend has reported that in his environment (3.16/gcc4.8.3/corei7)
memset() calls which clear out sensitive data in extract_{buf,entropy,
entropy_user}() in random driver are being optimized away by gcc.
Add a helper memzero_explicit() (similarly as explicit_bzero() variants)
that can be used in such cases where a variable with sensitive data is
being cleared out in the end. Other use cases might also be in crypto
code. [ I have put this into lib/string.c though, as it's always built-in
and doesn't need any dependencies then. ]
Fixes kernel bugzilla: 82041
Reported-by: zatimend@hotmail.co.uk
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Pull randomness updates from Ted Ts'o:
"Cleanups and bug fixes to /dev/random, add a new getrandom(2) system
call, which is a superset of OpenBSD's getentropy(2) call, for use
with userspace crypto libraries such as LibreSSL.
Also add the ability to have a kernel thread to pull entropy from
hardware rng devices into /dev/random"
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
hwrng: Pass entropy to add_hwgenerator_randomness() in bits, not bytes
random: limit the contribution of the hw rng to at most half
random: introduce getrandom(2) system call
hw_random: fix sparse warning (NULL vs 0 for pointer)
random: use registers from interrupted code for CPU's w/o a cycle counter
hwrng: add per-device entropy derating
hwrng: create filler thread
random: add_hwgenerator_randomness() for feeding entropy from devices
random: use an improved fast_mix() function
random: clean up interrupt entropy accounting for archs w/o cycle counters
random: only update the last_pulled time if we actually transferred entropy
random: remove unneeded hash of a portion of the entropy pool
random: always update the entropy pool under the spinlock
For people who don't trust a hardware RNG which can not be audited,
the changes to add support for RDSEED can be troubling since 97% or
more of the entropy will be contributed from the in-CPU hardware RNG.
We now have a in-kernel khwrngd, so for those people who do want to
implicitly trust the CPU-based system, we could create an arch-rng
hw_random driver, and allow khwrng refill the entropy pool. This
allows system administrator whether or not they trust the CPU (I
assume the NSA will trust RDRAND/RDSEED implicitly :-), and if so,
what level of entropy derating they want to use.
The reason why this is a really good idea is that if different people
use different levels of entropy derating, it will make it much more
difficult to design a backdoor'ed hwrng that can be generally
exploited in terms of the output of /dev/random when different attack
targets are using differing levels of entropy derating.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The getrandom(2) system call was requested by the LibreSSL Portable
developers. It is analoguous to the getentropy(2) system call in
OpenBSD.
The rationale of this system call is to provide resiliance against
file descriptor exhaustion attacks, where the attacker consumes all
available file descriptors, forcing the use of the fallback code where
/dev/[u]random is not available. Since the fallback code is often not
well-tested, it is better to eliminate this potential failure mode
entirely.
The other feature provided by this new system call is the ability to
request randomness from the /dev/urandom entropy pool, but to block
until at least 128 bits of entropy has been accumulated in the
/dev/urandom entropy pool. Historically, the emphasis in the
/dev/urandom development has been to ensure that urandom pool is
initialized as quickly as possible after system boot, and preferably
before the init scripts start execution.
This is because changing /dev/urandom reads to block represents an
interface change that could potentially break userspace which is not
acceptable. In practice, on most x86 desktop and server systems, in
general the entropy pool can be initialized before it is needed (and
in modern kernels, we will printk a warning message if not). However,
on an embedded system, this may not be the case. And so with this new
interface, we can provide the functionality of blocking until the
urandom pool has been initialized. Any userspace program which uses
this new functionality must take care to assure that if it is used
during the boot process, that it will not cause the init scripts or
other portions of the system startup to hang indefinitely.
SYNOPSIS
#include <linux/random.h>
int getrandom(void *buf, size_t buflen, unsigned int flags);
DESCRIPTION
The system call getrandom() fills the buffer pointed to by buf
with up to buflen random bytes which can be used to seed user
space random number generators (i.e., DRBG's) or for other
cryptographic uses. It should not be used for Monte Carlo
simulations or other programs/algorithms which are doing
probabilistic sampling.
If the GRND_RANDOM flags bit is set, then draw from the
/dev/random pool instead of the /dev/urandom pool. The
/dev/random pool is limited based on the entropy that can be
obtained from environmental noise, so if there is insufficient
entropy, the requested number of bytes may not be returned.
If there is no entropy available at all, getrandom(2) will
either block, or return an error with errno set to EAGAIN if
the GRND_NONBLOCK bit is set in flags.
If the GRND_RANDOM bit is not set, then the /dev/urandom pool
will be used. Unlike using read(2) to fetch data from
/dev/urandom, if the urandom pool has not been sufficiently
initialized, getrandom(2) will block (or return -1 with the
errno set to EAGAIN if the GRND_NONBLOCK bit is set in flags).
The getentropy(2) system call in OpenBSD can be emulated using
the following function:
int getentropy(void *buf, size_t buflen)
{
int ret;
if (buflen > 256)
goto failure;
ret = getrandom(buf, buflen, 0);
if (ret < 0)
return ret;
if (ret == buflen)
return 0;
failure:
errno = EIO;
return -1;
}
RETURN VALUE
On success, the number of bytes that was filled in the buf is
returned. This may not be all the bytes requested by the
caller via buflen if insufficient entropy was present in the
/dev/random pool, or if the system call was interrupted by a
signal.
On error, -1 is returned, and errno is set appropriately.
ERRORS
EINVAL An invalid flag was passed to getrandom(2)
EFAULT buf is outside the accessible address space.
EAGAIN The requested entropy was not available, and
getentropy(2) would have blocked if the
GRND_NONBLOCK flag was not set.
EINTR While blocked waiting for entropy, the call was
interrupted by a signal handler; see the description
of how interrupted read(2) calls on "slow" devices
are handled with and without the SA_RESTART flag
in the signal(7) man page.
NOTES
For small requests (buflen <= 256) getrandom(2) will not
return EINTR when reading from the urandom pool once the
entropy pool has been initialized, and it will return all of
the bytes that have been requested. This is the recommended
way to use getrandom(2), and is designed for compatibility
with OpenBSD's getentropy() system call.
However, if you are using GRND_RANDOM, then getrandom(2) may
block until the entropy accounting determines that sufficient
environmental noise has been gathered such that getrandom(2)
will be operating as a NRBG instead of a DRBG for those people
who are working in the NIST SP 800-90 regime. Since it may
block for a long time, these guarantees do *not* apply. The
user may want to interrupt a hanging process using a signal,
so blocking until all of the requested bytes are returned
would be unfriendly.
For this reason, the user of getrandom(2) MUST always check
the return value, in case it returns some error, or if fewer
bytes than requested was returned. In the case of
!GRND_RANDOM and small request, the latter should never
happen, but the careful userspace code (and all crypto code
should be careful) should check for this anyway!
Finally, unless you are doing long-term key generation (and
perhaps not even then), you probably shouldn't be using
GRND_RANDOM. The cryptographic algorithms used for
/dev/urandom are quite conservative, and so should be
sufficient for all purposes. The disadvantage of GRND_RANDOM
is that it can block, and the increased complexity required to
deal with partially fulfilled getrandom(2) requests.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Zach Brown <zab@zabbo.net>
The expression entropy_count -= ibytes << (ENTROPY_SHIFT + 3) could
actually increase entropy_count if during assignment of the unsigned
expression on the RHS (mind the -=) we reduce the value modulo
2^width(int) and assign it to entropy_count. Trinity found this.
[ Commit modified by tytso to add an additional safety check for a
negative entropy_count -- which should never happen, and to also add
an additional paranoia check to prevent overly large count values to
be passed into urandom_read(). ]
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
For CPU's that don't have a cycle counter, or something equivalent
which can be used for random_get_entropy(), random_get_entropy() will
always return 0. In that case, substitute with the saved interrupt
registers to add a bit more unpredictability.
Some folks have suggested hashing all of the registers
unconditionally, but this would increase the overhead of
add_interrupt_randomness() by at least an order of magnitude, and this
would very likely be unacceptable.
The changes in this commit have been benchmarked as mostly unaffecting
the overhead of add_interrupt_randomness() if the entropy counter is
present, and doubling the overhead if it is not present.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Jörn Engel <joern@logfs.org>
This patch adds an interface to the random pool for feeding entropy
in-kernel.
Signed-off-by: Torsten Duwe <duwe@suse.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Use more efficient fast_mix() function. Thanks to George Spelvin for
doing the leg work to find a more efficient mixing function.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: George Spelvin <linux@horizon.com>
For architectures that don't have cycle counters, the algorithm for
deciding when to avoid giving entropy credit due to back-to-back timer
interrupts didn't make any sense, since we were checking every 64
interrupts. Change it so that we only give an entropy credit if the
majority of the interrupts are not based on the timer.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: George Spelvin <linux@horizon.com>
In xfer_secondary_pull(), check to make sure we need to pull from the
secondary pool before checking and potentially updating the
last_pulled time.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: George Spelvin <linux@horizon.com>
We previously extracted a portion of the entropy pool in
mix_pool_bytes() and hashed it in to avoid racing CPU's from returning
duplicate random values. Now that we are using a spinlock to prevent
this from happening, this is no longer necessary. So remove it, to
simplify the code a bit.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: George Spelvin <linux@horizon.com>
Instead of using lockless techniques introduced in commit
902c098a36, use spin_trylock to try to grab entropy pool's lock. If
we can't get the lock, then just try again on the next interrupt.
Based on discussions with George Spelvin.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: George Spelvin <linux@horizon.com>
Commit 0fb7a01af5 "random: simplify accounting code", introduced in
v3.15, has a very nasty accounting problem when the entropy pool has
has fewer bytes of entropy than the number of requested reserved
bytes. In that case, "have_bytes - reserved" goes negative, and since
size_t is unsigned, the expression:
ibytes = min_t(size_t, ibytes, have_bytes - reserved);
... does not do the right thing. This is rather bad, because it
defeats the catastrophic reseeding feature in the
xfer_secondary_pool() path.
It also can cause the "BUG: spinlock trylock failure on UP" for some
kernel configurations when prandom_reseed() calls get_random_bytes()
in the early init, since when the entropy count gets corrupted,
credit_entropy_bits() erroneously believes that the nonblocking pool
has been fully initialized (when in fact it is not), and so it calls
prandom_reseed(true) recursively leading to the spinlock BUG.
The logic is *not* the same it was originally, but in the cases where
it matters, the behavior is the same, and the resulting code is
hopefully easier to read and understand.
Fixes: 0fb7a01af5 "random: simplify accounting code"
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Greg Price <price@mit.edu>
Cc: stable@vger.kernel.org #v3.15
Pull block core updates from Jens Axboe:
"It's a big(ish) round this time, lots of development effort has gone
into blk-mq in the last 3 months. Generally we're heading to where
3.16 will be a feature complete and performant blk-mq. scsi-mq is
progressing nicely and will hopefully be in 3.17. A nvme port is in
progress, and the Micron pci-e flash driver, mtip32xx, is converted
and will be sent in with the driver pull request for 3.16.
This pull request contains:
- Lots of prep and support patches for scsi-mq have been integrated.
All from Christoph.
- API and code cleanups for blk-mq from Christoph.
- Lots of good corner case and error handling cleanup fixes for
blk-mq from Ming Lei.
- A flew of blk-mq updates from me:
* Provide strict mappings so that the driver can rely on the CPU
to queue mapping. This enables optimizations in the driver.
* Provided a bitmap tagging instead of percpu_ida, which never
really worked well for blk-mq. percpu_ida relies on the fact
that we have a lot more tags available than we really need, it
fails miserably for cases where we exhaust (or are close to
exhausting) the tag space.
* Provide sane support for shared tag maps, as utilized by scsi-mq
* Various fixes for IO timeouts.
* API cleanups, and lots of perf tweaks and optimizations.
- Remove 'buffer' from struct request. This is ancient code, from
when requests were always virtually mapped. Kill it, to reclaim
some space in struct request. From me.
- Remove 'magic' from blk_plug. Since we store these on the stack
and since we've never caught any actual bugs with this, lets just
get rid of it. From me.
- Only call part_in_flight() once for IO completion, as includes two
atomic reads. Hopefully we'll get a better implementation soon, as
the part IO stats are now one of the more expensive parts of doing
IO on blk-mq. From me.
- File migration of block code from {mm,fs}/ to block/. This
includes bio.c, bio-integrity.c, bounce.c, and ioprio.c. From me,
from a discussion on lkml.
That should describe the meat of the pull request. Also has various
little fixes and cleanups from Dave Jones, Shaohua Li, Duan Jiong,
Fengguang Wu, Fabian Frederick, Randy Dunlap, Robert Elliott, and Sam
Bradshaw"
* 'for-3.16/core' of git://git.kernel.dk/linux-block: (100 commits)
blk-mq: push IPI or local end_io decision to __blk_mq_complete_request()
blk-mq: remember to start timeout handler for direct queue
block: ensure that the timer is always added
blk-mq: blk_mq_unregister_hctx() can be static
blk-mq: make the sysfs mq/ layout reflect current mappings
blk-mq: blk_mq_tag_to_rq should handle flush request
block: remove dead code in scsi_ioctl:blk_verify_command
blk-mq: request initialization optimizations
block: add queue flag for disabling SG merging
block: remove 'magic' from struct blk_plug
blk-mq: remove alloc_hctx and free_hctx methods
blk-mq: add file comments and update copyright notices
blk-mq: remove blk_mq_alloc_request_pinned
blk-mq: do not use blk_mq_alloc_request_pinned in blk_mq_map_request
blk-mq: remove blk_mq_wait_for_tags
blk-mq: initialize request in __blk_mq_alloc_request
blk-mq: merge blk_mq_alloc_reserved_request into blk_mq_alloc_request
blk-mq: add helper to insert requests from irq context
blk-mq: remove stale comment for blk_mq_complete_request()
blk-mq: allow non-softirq completions
...
Commit ee1de406ba ("random: simplify accounting logic") simplified
things too much, in that it allows the following to trigger an
overflow that results in a BUG_ON crash:
dd if=/dev/urandom of=/dev/zero bs=67108707 count=1
Thanks to Peter Zihlstra for discovering the crash, and Hannes
Frederic for analyizing the root cause.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Greg Price <price@mit.edu>
This will be needed for pending changes to the scsi midlayer that now
calls lower level block APIs, as well as any blk-mq driver that wants to
contribute to the random pool.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jens Axboe <axboe@fb.com>