struct s3c2410_dma_client gets defined multiple times as it is defined
in more than one header file. Changing it at the header file level causes
many more build breakages as they are interdependent in a complex way.
Hence fixing this problem by using the mach version of the header file.
Without this patch, following build error is observed:
arch/arm/plat-samsung/include/plat/dma-pl330.h:106:27: error:
redefinition of struct s3c2410_dma_client
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
without it:
|drivers/built-in.o:(.data+0x14588): undefined reference to `crypto_ablkcipher_type'
|drivers/built-in.o:(.data+0x14668): undefined reference to `crypto_ablkcipher_type'
Not sure when this broke.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
clk_put(dd->aes_clk) will dereference an error-valued pointer since the
dd->aes_clk is a ERR_PTR() value. The correct check is call clk_put()
if !IS_ERR(dd->aes_clk).
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add the missing unlock on the error handling path in function
tegra_aes_get_random() and tegra_aes_rng_reset().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Introduce new assembler functions to avoid use temporary stack buffers in glue
code. This also allows use of vector instructions for xoring output in CTR and
CBC modes and construction of IVs for CTR mode.
ECB mode sees ~0.5% decrease in speed because added one extra function
call. CBC mode decryption and CTR mode benefit from vector operations
and gain ~5%.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Introduce new assembler functions to avoid use temporary stack buffers in glue
code. This also allows use of vector instructions for xoring output in CTR and
CBC modes and construction of IVs for CTR mode.
ECB mode sees ~0.5% decrease in speed because added one extra function
call. CBC mode decryption and CTR mode benefit from vector operations
and gain ~3%.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Introduce new assembler functions to avoid use temporary stack buffers in glue
code. This also allows use of vector instructions for xoring output in CTR and
CBC modes and construction of IVs for CTR mode.
ECB mode sees ~0.2% decrease in speed because added one extra function
call. CBC mode decryption and CTR mode benefit from vector operations
and gain ~3%.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Introduce new assembler functions to avoid use temporary stack buffers in
glue code. This also allows use of vector instructions for xoring output
in CTR and CBC modes and construction of IVs for CTR mode.
ECB mode sees ~0.5% decrease in speed because added one extra function
call. CBC mode decryption and CTR mode benefit from vector operations
and gain ~2%.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
'u128' currently used for CTR mode is on little-endian 'long long' swapped
and would require extra swap operations by SSE/AVX code. Use of le128
instead of u128 allows IV calculations to be done with vector registers
easier.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Most DES3_EDE testvectors are short and do not test parallelised codepaths
well. Add larger testvectors to test large crypto operations and to test
multi-page crypto with DES3_EDE.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Most DES testvectors are short and do not test parallelised codepaths
well. Add larger testvectors to test large crypto operations and to test
multi-page crypto with DES.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Most AES testvectors are short and do not test parallelised codepaths
well. Add larger testvectors to test large crypto operations and to test
multi-page crypto with AES.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
AVX2 implementation of serpent cipher processes 16 blocks parallel, so
we need to make test vectors larger to check parallel code paths.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
AVX2 implementation of blowfish cipher processes 32 blocks parallel, so
we need to make test vectors larger to check parallel code paths.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
AVX/AES-NI implementation of camellia cipher processes 16 blocks
parallel, so we need to make test vectors larger to check parallel
code paths.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
VMAC implementation, as it is, does not work with blocks that
are not multiples of 128-bytes. Furthermore, this is a problem
when using the implementation on scatterlists, even
when the complete plain text is 128-byte multiple, as the pieces
that get passed to vmac_update can be pretty much any size.
I also added test cases for unaligned blocks.
Signed-off-by: Salman Qazi <sqazi@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds the crc_pcl function that calculates CRC32C checksum using the
PCLMULQDQ instruction on processors that support this feature. This will
provide speedup over using CRC32 instruction only.
The usage of PCLMULQDQ necessitate the invocation of kernel_fpu_begin and
kernel_fpu_end and incur some overhead. So the new crc_pcl function is only
invoked for buffer size of 512 bytes or more. Larger sized
buffers will expect to see greater speedup. This feature is best used coupled
with eager_fpu which reduces the kernel_fpu_begin/end overhead. For
buffer size of 1K the speedup is around 1.6x and for buffer size greater than
4K, the speedup is around 3x compared to original implementation in crc32c-intel
module. Test was performed on Sandy Bridge based platform with constant frequency
set for cpu.
A white paper detailing the algorithm can be found here:
http://download.intel.com/design/intarch/papers/323405.pdf
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>