This perl code is taken from the OpenSSL project and added gcm_init_htable function
used in the p10-aes-gcm-glue.c code to initialize hash table. gcm_hash_p8 is used
to hash encrypted data blocks.
Signed-off-by: Danny Tsen <dtsen@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This code is taken from CRYPTOGAMs[1]. The following functions are used,
aes_p8_set_encrypt_key is used to generate AES round keys and aes_p8_encrypt is used
to encrypt single block.
Signed-off-by: Danny Tsen <dtsen@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Improve overall performance of AES/GCM encrypt and decrypt operations
for Power10+ CPU.
Signed-off-by: Danny Tsen <dtsen@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Defined CRYPTO_P10_AES_GCM in Kconfig to support AES/GCM
stitched implementation for Power10+ CPU.
Added a new module driver p10-aes-gcm-crypto.
Signed-off-by: Danny Tsen <dtsen@linux.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
aria-avx512 implementation uses AVX512 and GFNI.
It supports 64way parallel processing.
So, byteslicing code is changed to support 64way parallel.
And it exports some aria-avx2 functions such as encrypt() and decrypt().
AVX and AVX2 have 16 registers.
They should use memory to store/load state because of lack of registers.
But AVX512 supports 32 registers.
So, it doesn't require store/load in the s-box layer.
It means that it can reduce overhead of store/load in the s-box layer.
Also code become much simpler.
Benchmark with modprobe tcrypt mode=610 num_mb=8192, i3-12100:
ARIA-AVX512(128bit and 256bit)
testing speed of multibuffer ecb(aria) (ecb-aria-avx512) encryption
tcrypt: 1 operation in 1504 cycles (1024 bytes)
tcrypt: 1 operation in 4595 cycles (4096 bytes)
tcrypt: 1 operation in 1763 cycles (1024 bytes)
tcrypt: 1 operation in 5540 cycles (4096 bytes)
testing speed of multibuffer ecb(aria) (ecb-aria-avx512) decryption
tcrypt: 1 operation in 1502 cycles (1024 bytes)
tcrypt: 1 operation in 4615 cycles (4096 bytes)
tcrypt: 1 operation in 1759 cycles (1024 bytes)
tcrypt: 1 operation in 5554 cycles (4096 bytes)
ARIA-AVX2 with GFNI(128bit and 256bit)
testing speed of multibuffer ecb(aria) (ecb-aria-avx2) encryption
tcrypt: 1 operation in 2003 cycles (1024 bytes)
tcrypt: 1 operation in 5867 cycles (4096 bytes)
tcrypt: 1 operation in 2358 cycles (1024 bytes)
tcrypt: 1 operation in 7295 cycles (4096 bytes)
testing speed of multibuffer ecb(aria) (ecb-aria-avx2) decryption
tcrypt: 1 operation in 2004 cycles (1024 bytes)
tcrypt: 1 operation in 5956 cycles (4096 bytes)
tcrypt: 1 operation in 2409 cycles (1024 bytes)
tcrypt: 1 operation in 7564 cycles (4096 bytes)
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
aria-avx2 implementation uses AVX2, AES-NI, and GFNI.
It supports 32way parallel processing.
So, byteslicing code is changed to support 32way parallel.
And it exports some aria-avx functions such as encrypt() and decrypt().
There are two main logics, s-box layer and diffusion layer.
These codes are the same as aria-avx implementation.
But some instruction are exchanged because they don't support 256bit
registers.
Also, AES-NI doesn't support 256bit register.
So, aesenclast and aesdeclast are used twice like below:
vextracti128 $1, ymm0, xmm6;
vaesenclast xmm7, xmm0, xmm0;
vaesenclast xmm7, xmm6, xmm6;
vinserti128 $1, xmm6, ymm0, ymm0;
Benchmark with modprobe tcrypt mode=610 num_mb=8192, i3-12100:
ARIA-AVX2 with GFNI(128bit and 256bit)
testing speed of multibuffer ecb(aria) (ecb-aria-avx2) encryption
tcrypt: 1 operation in 2003 cycles (1024 bytes)
tcrypt: 1 operation in 5867 cycles (4096 bytes)
tcrypt: 1 operation in 2358 cycles (1024 bytes)
tcrypt: 1 operation in 7295 cycles (4096 bytes)
testing speed of multibuffer ecb(aria) (ecb-aria-avx2) decryption
tcrypt: 1 operation in 2004 cycles (1024 bytes)
tcrypt: 1 operation in 5956 cycles (4096 bytes)
tcrypt: 1 operation in 2409 cycles (1024 bytes)
tcrypt: 1 operation in 7564 cycles (4096 bytes)
ARIA-AVX with GFNI(128bit and 256bit)
testing speed of multibuffer ecb(aria) (ecb-aria-avx) encryption
tcrypt: 1 operation in 2761 cycles (1024 bytes)
tcrypt: 1 operation in 9390 cycles (4096 bytes)
tcrypt: 1 operation in 3401 cycles (1024 bytes)
tcrypt: 1 operation in 11876 cycles (4096 bytes)
testing speed of multibuffer ecb(aria) (ecb-aria-avx) decryption
tcrypt: 1 operation in 2735 cycles (1024 bytes)
tcrypt: 1 operation in 9424 cycles (4096 bytes)
tcrypt: 1 operation in 3369 cycles (1024 bytes)
tcrypt: 1 operation in 11954 cycles (4096 bytes)
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
aria-avx assembly code accesses members of aria_ctx with magic number
offset. If the shape of struct aria_ctx is changed carelessly,
aria-avx will not work.
So, we need to ensure accessing members of aria_ctx with correct
offset values, not with magic numbers.
It adds ARIA_CTX_enc_key, ARIA_CTX_dec_key, and ARIA_CTX_rounds in the
asm-offsets.c So, correct offset definitions will be generated.
aria-avx assembly code can access members of aria_ctx safely with
these definitions.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
avx accelerated aria module used local keystream array.
But, keystream array size is too big.
So, it puts the keystream array into request ctx.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
For SEV_GET_ID2, the user provided length does not have a specified
limitation because the length of the ID may change in the future. The
kernel memory allocation, however, is implicitly limited to 4MB on x86 by
the page allocator, otherwise the kzalloc() will fail.
When this happens, it is best not to spam the kernel log with the warning.
Simply fail the allocation and return ENOMEM to the user.
Fixes: d6112ea0cb ("crypto: ccp - introduce SEV_GET_ID2 command")
Reported-by: Andy Nguyen <theflow@google.com>
Reported-by: Peter Gonda <pgonda@google.com>
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
GFP_DMA does not guarantee that the returned memory is aligned
for DMA. It should be removed where it is superfluous.
However, kmalloc may start returning DMA-unaligned memory in future
so fix this by adding the alignment by hand.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
GFP_DMA does not guarantee that the returned memory is aligned
for DMA. It should be removed where it is superfluous.
However, kmalloc may start returning DMA-unaligned memory in future
so fix this by adding the alignment by hand.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The kernel provides implementations of the NIST ECDSA signature
verification primitives. For key sizes of 256 and 384 bits respectively
they are approved and can be enabled in FIPS mode. Do so.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ghash may be used only as part of the gcm(aes) construction in FIPS
mode. Since commit d6097b8d5d ("crypto: api - allow algs only in specific
constructions in FIPS mode") there's support for using spawns which by
itself are marked as non-approved from approved template instantiations.
So simply mark plain ghash as non-approved in testmgr to block any attempts
of direct instantiations in FIPS mode.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
cbcmac(aes) may be used only as part of the ccm(aes) construction in FIPS
mode. Since commit d6097b8d5d ("crypto: api - allow algs only in specific
constructions in FIPS mode") there's support for using spawns which by
itself are marked as non-approved from approved template instantiations.
So simply mark plain cbcmac(aes) as non-approved in testmgr to block any
attempts of direct instantiations in FIPS mode.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
xts_fallback_setkey() in xts_aes_set_key() will now enforce key size
rule in FIPS mode when setting up the fallback algorithm keys, which
makes the check in xts_aes_set_key() redundant or unreachable. So just
drop this check.
xts_fallback_setkey() now makes a key size check in xts_verify_key():
xts_fallback_setkey()
crypto_skcipher_setkey() [ skcipher_setkey_unaligned() ]
cipher->setkey() { .setkey = xts_setkey }
xts_setkey()
xts_verify_key()
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
xts_check_key() is obsoleted by xts_verify_key(). Over time XTS crypto
drivers adopted the newer xts_verify_key() variant, but xts_check_key()
is still used by a number of drivers. Switch drivers to use the newer
xts_verify_key() and make a couple of cleanups. This allows us to drop
xts_check_key() completely and avoid redundancy.
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
According to FIPS 140-3 IG C.I., only (total) key lengths of either
256 bits or 512 bits are allowed with xts(aes). Make xts_verify_key() to
reject anything else in FIPS mode.
As xts(aes) is the only approved xts() template instantiation in FIPS mode,
the new restriction implemented in xts_verify_key() effectively only
applies to this particular construction.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
GFP_DMA does not guarantee that the returned memory is aligned
for DMA. In fact for sun8i-ss it is superfluous and can be removed.
However, kmalloc may start returning DMA-unaligned memory in future
so fix this by adding the alignment by hand.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The helper mpi_read_raw_from_sgl sets the number of entries in
the SG list according to nbytes. However, if the last entry
in the SG list contains more data than nbytes, then it may overrun
the buffer because it only allocates enough memory for nbytes.
Fixes: 2d4d1eea54 ("lib/mpi: Add mpi sgl helpers")
Reported-by: Roberto Sassu <roberto.sassu@huaweicloud.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reduce the stack usage further by splitting up the test function.
Also squash blocks and unaligned_blocks into one array.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 453de3eb08 ("crypto: ux500/cryp - delete driver") removes the
config CRYPTO_DEV_UX500_CRYP, but leaves an obsolete reference in the
dependencies of config CRYPTO_DEV_UX500_DEBUG.
Remove that obsolete reference, and adjust the description while at it.
Fixes: 453de3eb08 ("crypto: ux500/cryp - delete driver")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The memory sanitizer causes excessive register spills in this function:
crypto/wp512.c:782:13: error: stack frame size (2104) exceeds limit (2048) in 'wp512_process_buffer' [-Werror,-Wframe-larger-than]
Assume that this one is safe, and mark it as needing no checks to
get the stack usage back down to the normal level.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
kmap_atomic() is used to create short-lived mappings of pages that may
not be accessible via the kernel direct map. This is only needed on
32-bit architectures that implement CONFIG_HIGHMEM, but it can be used
on 64-bit other architectures too, where the returned mapping is simply
the kernel direct address of the page.
However, kmap_atomic() does not support migration on CONFIG_HIGHMEM
configurations, due to the use of per-CPU kmap slots, and so it disables
preemption on all architectures, not just the 32-bit ones. This implies
that all scatterwalk based crypto routines essentially execute with
preemption disabled all the time, which is less than ideal.
So let's switch scatterwalk_map/_unmap and the shash/ahash routines to
kmap_local() instead, which serves a similar purpose, but without the
resulting impact on preemption on architectures that have no need for
CONFIG_HIGHMEM.
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Elliott, Robert (Servers)" <elliott@hpe.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>