On Fri, Aug 30, 2024 at 10:51:54AM -0700, Eric Biggers wrote:
>
> Given below in defconfig form, use 'make olddefconfig' to apply. The failures
> are nondeterministic and sometimes there are different ones, for example:
>
> [ 0.358017] alg: skcipher: failed to allocate transform for cbc(twofish-generic): -2
> [ 0.358365] alg: self-tests for cbc(twofish) using cbc(twofish-generic) failed (rc=-2)
> [ 0.358535] alg: skcipher: failed to allocate transform for cbc(camellia-generic): -2
> [ 0.358918] alg: self-tests for cbc(camellia) using cbc(camellia-generic) failed (rc=-2)
> [ 0.371533] alg: skcipher: failed to allocate transform for xts(ecb(aes-generic)): -2
> [ 0.371922] alg: self-tests for xts(aes) using xts(ecb(aes-generic)) failed (rc=-2)
>
> Modules are not enabled, maybe that matters (I haven't checked yet).
Yes I think that was the key. This triggers a massive self-test
run which executes in parallel and reveals a few race conditions
in the system. I think it boils down to the following scenario:
Base algorithm X-generic, X-optimised
Template Y
Optimised algorithm Y-X-optimised
Everything gets registered, and then the self-tests are started.
When Y-X-optimised gets tested, it requests the creation of the
generic Y(X-generic). Which then itself undergoes testing.
The race is that after Y(X-generic) gets registered, but just
before it gets tested, X-optimised finally finishes self-testing
which then causes all spawns of X-generic to be destroyed. So
by the time the self-test for Y(X-generic) comes along, it can
no longer find the algorithm. This error then bubbles up all
the way up to the self-test of Y-X-optimised which then fails.
Note that there is some complexity that I've omitted here because
when the generic self-test fails to find Y(X-generic) it actually
triggers the construction of it again which then fails for various
other reasons (these are not important because the construction
should *not* be triggered at this point).
So in a way the error is expected, and we should probably remove
the pr_err for the case where ENOENT is returned for the algorithm
that we're currently testing.
The solution is two-fold. First when an algorithm undergoes
self-testing it should not trigger its construction. Secondly
if an instance larval fails to materialise due to it being destroyed
by a more optimised algorithm coming along, it should obviously
retry the construction.
Remove the check in __crypto_alg_lookup that stops a larval from
matching new requests based on differences in the mask. It is better
to block new requests even if it is wrong and then simply retry the
lookup. If this ends up being the wrong larval it will sort iself
out during the retry.
Reduce the CRYPTO_ALG_TYPE_MASK bits in type during larval creation
as otherwise LSKCIPHER algorithms may not match SKCIPHER larvals.
Also block the instance creation during self-testing in the function
crypto_larval_lookup by checking for CRYPTO_ALG_TESTED in the mask
field.
Finally change the return value when crypto_alg_lookup fails in
crypto_larval_wait to EAGAIN to redo the lookup.
Fixes: 37da5d0ffa ("crypto: api - Do not wait for tests during registration")
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
As registration is usually carried out during module init, this
is a context where as little work as possible should be carried
out. Testing may trigger module loads of underlying components,
which could even lead back to the module that is registering at
the moment. This may lead to dead-locks outside of the Crypto API.
Avoid this by not waiting for the tests to complete. They will
be scheduled but completion will be asynchronous. Any users will
still wait for completion.
Reported-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In order to allow testing to complete asynchronously after the
registration process, instance larvals need to complete prior
to having a test result. Support this by redoing the lookup for
instance larvals after completion. This should locate the pending
test larval and then repeat the wait on that (if it is still pending).
As the lookup is now repeated there is no longer any need to compute
the fulfilment status and all that code can be removed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The boot-test-finished toggle is only necessary if algapi
is built into the kernel. Do not include this code if it is a module.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There is a confusing pattern in the kernel to use a variable named 'timeout' to
store the result of wait_for_completion_killable_timeout() causing patterns like:
timeout = wait_for_completion_killable_timeout(...)
if (!timeout) return -ETIMEDOUT;
with all kinds of permutations. Use 'time_left' as a variable to make the code
self explaining.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use it straight away in crypto_clone_cipher(), as that is not meant to
sleep.
Fixes: 51d8d6d0f4 ("crypto: cipher - Add crypto_clone_cipher")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Purge crypto_type::init() as well.
The last user seems to be gone with commit d63007eb95 ("crypto:
ablkcipher - remove deprecated and unused ablkcipher support").
Signed-off-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds the helper crypto_clone_tfm. The purpose is to
allocate a tfm object with GFP_ATOMIC. As we cannot sleep, the
object has to be cloned from an existing tfm object.
This allows code paths that cannot otherwise allocate a crypto_tfm
object to do so. Once a new tfm has been obtained its key could
then be changed without impacting other users.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a crypto_tfm_get interface to allow tfm objects to be shared.
They can still be freed in the usual way.
This should only be done with tfm objects with no keys. You must
also not modify the tfm flags in any way once it becomes shared.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch does the final flag day conversion of all completion
functions which are now all contained in the Crypto API.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The crypto_boot_test_finished static key is unnecessary when self-tests
are disabled in the kconfig, so optimize it out accordingly, along with
the entirety of crypto_start_tests(). This mainly avoids the overhead
of an unnecessary static_branch_enable() on every boot.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently, registering an algorithm with the crypto API always causes a
notification to be posted to the "cryptomgr", which then creates a
kthread to self-test the algorithm. However, if self-tests are disabled
in the kconfig (as is the default option), then this kthread just
notifies waiters that the algorithm has been tested, then exits.
This causes a significant amount of overhead, especially in the kthread
creation and destruction, which is not necessary at all. For example,
in a quick test I found that booting a "minimum" x86_64 kernel with all
the crypto options enabled (except for the self-tests) takes about 400ms
until PID 1 can start. Of that, a full 13ms is spent just doing this
pointless dance, involving a kthread being created, run, and destroyed
over 200 times. That's over 3% of the entire kernel start time.
Fix this by just skipping the creation of the test larval and the
posting of the registration notification entirely, when self-tests are
disabled.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently we do not distinguish between algorithms that fail on
the self-test vs. those which are disabled in FIPS mode (not allowed).
Both are marked as having failed the self-test.
Recently the need arose to allow the usage of certain algorithms only
as arguments to specific template instantiations in FIPS mode. For
example, standalone "dh" must be blocked, but e.g. "ffdhe2048(dh)" is
allowed. Other potential use cases include "cbcmac(aes)", which must
only be used with ccm(), or "ghash", which must be used only for
gcm().
This patch allows this scenario by adding a new flag FIPS_INTERNAL to
indicate those algorithms that are not FIPS-allowed. They can then be
used as template arguments only, i.e. when looked up via
crypto_grab_spawn() to be more specific. The FIPS_INTERNAL bit gets
propagated upwards recursively into the surrounding template
instances, until the construction eventually matches an explicit
testmgr entry with ->fips_allowed being set, if any.
The behaviour to skip !->fips_allowed self-test executions in FIPS
mode will be retained. Note that this effectively means that
FIPS_INTERNAL algorithms are handled very similarly to the INTERNAL
ones in this regard. It is expected that the FIPS_INTERNAL algorithms
will receive sufficient testing when the larger constructions they're
a part of, if any, get exercised by testmgr.
Note that as a side-effect of this patch algorithms which are not
FIPS-allowed will now return ENOENT instead of ELIBBAD. Hopefully
this is not an issue as some people were relying on this already.
Link: https://lore.kernel.org/r/YeEVSaMEVJb3cQkq@gondor.apana.org.au
Originally-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The soft dependency on cryptomgr is only needed in algapi because
if algapi isn't present then no algorithms can be loaded. This
also fixes the case where api is built-in but algapi is built as
a module as the soft dependency would otherwise get lost.
Fixes: 8ab23d547f ("crypto: api - Add softdep on cryptomgr")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The delayed boot-time testing patch created a dependency loop
between api.c and algapi.c because it added a crypto_alg_tested
call to the former when the crypto manager is disabled.
We could instead avoid creating the test larvals if the crypto
manager is disabled. This avoids the dependency loop as well
as saving some unnecessary work, albeit in a very unlikely case.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: kernel test robot <lkp@intel.com>
Fixes: adad556efc ("crypto: api - Fix built-in testing dependency failures")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When complex algorithms that depend on other algorithms are built
into the kernel, the order of registration must be done such that
the underlying algorithms are ready before the ones on top are
registered. As otherwise they would fail during the self-test
which is required during registration.
In the past we have used subsystem initialisation ordering to
guarantee this. The number of such precedence levels are limited
and they may cause ripple effects in other subsystems.
This patch solves this problem by delaying all self-tests during
boot-up for built-in algorithms. They will be tested either when
something else in the kernel requests for them, or when we have
finished registering all built-in algorithms, whichever comes
earlier.
Reported-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
As said by Linus:
A symmetric naming is only helpful if it implies symmetries in use.
Otherwise it's actively misleading.
In "kzalloc()", the z is meaningful and an important part of what the
caller wants.
In "kzfree()", the z is actively detrimental, because maybe in the
future we really _might_ want to use that "memfill(0xdeadbeef)" or
something. The "zero" part of the interface isn't even _relevant_.
The main reason that kzfree() exists is to clear sensitive information
that should not be leaked to other future users of the same memory
objects.
Rename kzfree() to kfree_sensitive() to follow the example of the recently
added kvfree_sensitive() and make the intention of the API more explicit.
In addition, memzero_explicit() is used to clear the memory to make sure
that it won't get optimized away by the compiler.
The renaming is done by using the command sequence:
git grep -w --name-only kzfree |\
xargs sed -i 's/kzfree/kfree_sensitive/'
followed by some editing of the kfree_sensitive() kerneldoc and adding
a kzfree backward compatibility macro in slab.h.
[akpm@linux-foundation.org: fs/crypto/inline_crypt.c needs linux/slab.h]
[akpm@linux-foundation.org: fix fs/crypto/inline_crypt.c some more]
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Joe Perches <joe@perches.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Link: http://lkml.kernel.org/r/20200616154311.12314-3-longman@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For a Linux server with NUMA, there are possibly multiple (de)compressors
which are either local or remote to some NUMA node. Some drivers will
automatically use the (de)compressor near the CPU calling acomp_alloc().
However, it is not necessarily correct because users who send acomp_req
could be from different NUMA node with the CPU which allocates acomp.
Just like kernel has kmalloc() and kmalloc_node(), here crypto can have
same support.
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There are two problems in crypto_spawn_alg. First of all it may
return spawn->alg even if spawn->dead is set. This results in a
double-free as detected by syzbot.
Secondly the setting of the DYING flag is racy because we hold
the read-lock instead of the write-lock. We should instead call
crypto_shoot_alg in a safe manner by gaining a refcount, dropping
the lock, and then releasing the refcount.
This patch fixes both problems.
Reported-by: syzbot+fc0674cde00b66844470@syzkaller.appspotmail.com
Fixes: 4f87ee118d ("crypto: api - Do not zap spawn->alg")
Fixes: 73669cc556 ("crypto: api - Fix race condition in...")
Cc: <stable@vger.kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>