commit f2ea0f5f04 upstream.
Use standard ror64() instead of hand-written.
There is no standard ror64, so create it.
The difference is shift value being "unsigned int" instead of uint64_t
(for which there is no reason). gcc starts to emit native ROR instructions
which it doesn't do for some reason currently. This should make the code
faster.
Patch survives in-tree crypto test and ping flood with hmac(sha512) on.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3a92d687c8 upstream.
Unfortunately in reducing W from 80 to 16 we ended up unrolling
the loop twice. As gcc has issues dealing with 64-bit ops on
i386 this means that we end up using even more stack space (>1K).
This patch solves the W reduction by moving LOAD_OP/BLEND_OP
into the loop itself, thus avoiding the need to duplicate it.
While the stack space still isn't great (>0.5K) it is at least
in the same ball park as the amount of stack used for our C sha1
implementation.
Note that this patch basically reverts to the original code so
the diff looks bigger than it really is.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 58d7d18b52 upstream.
The previous patch used the modulus operator over a power of 2
unnecessarily which may produce suboptimal binary code. This
patch changes changes them to binary ands instead.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 51fc6dc8f9 upstream.
For rounds 16--79, W[i] only depends on W[i - 2], W[i - 7], W[i - 15] and W[i - 16].
Consequently, keeping all W[80] array on stack is unnecessary,
only 16 values are really needed.
Using W[16] instead of W[80] greatly reduces stack usage
(~750 bytes to ~340 bytes on x86_64).
Line by line explanation:
* BLEND_OP
array is "circular" now, all indexes have to be modulo 16.
Round number is positive, so remainder operation should be
without surprises.
* initial full message scheduling is trimmed to first 16 values which
come from data block, the rest is calculated before it's needed.
* original loop body is unrolled version of new SHA512_0_15 and
SHA512_16_79 macros, unrolling was done to not do explicit variable
renaming. Otherwise it's the very same code after preprocessing.
See sha1_transform() code which does the same trick.
Patch survives in-tree crypto test and original bugreport test
(ping flood with hmac(sha512).
See FIPS 180-2 for SHA-512 definition
http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 84e31fdb7c upstream.
commit f9e2bca6c2
aka "crypto: sha512 - Move message schedule W[80] to static percpu area"
created global message schedule area.
If sha512_update will ever be entered twice, hash will be silently
calculated incorrectly.
Probably the easiest way to notice incorrect hashes being calculated is
to run 2 ping floods over AH with hmac(sha512):
#!/usr/sbin/setkey -f
flush;
spdflush;
add IP1 IP2 ah 25 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000025;
add IP2 IP1 ah 52 -A hmac-sha512 0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000052;
spdadd IP1 IP2 any -P out ipsec ah/transport//require;
spdadd IP2 IP1 any -P in ipsec ah/transport//require;
XfrmInStateProtoError will start ticking with -EBADMSG being returned
from ah_input(). This never happens with, say, hmac(sha1).
With patch applied (on BOTH sides), XfrmInStateProtoError does not tick
with multiple bidirectional ping flood streams like it doesn't tick
with SHA-1.
After this patch sha512_transform() will start using ~750 bytes of stack on x86_64.
This is OK for simple loads, for something more heavy, stack reduction will be done
separatedly.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2bac6acf8 upstream.
As cryptd is depeneded on by other algorithms such as aesni-intel,
it needs to be registered before them. When everything is built
as modules, this occurs naturally. However, for this to work when
they are built-in, we need to use subsys_initcall in cryptd.
Tested-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We are going to use this for TCP/IP sequence number and fragment ID
generation.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
They are 64K and result in order-4 allocations, even with SLUB.
Therefore, just like we always have for the deflate buffers, use
vmalloc.
Reported-by: Martin Jackson <mjackson220.list@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (45 commits)
crypto: caam - add support for sha512 variants of existing AEAD algorithms
crypto: caam - remove unused authkeylen from caam_ctx
crypto: caam - fix decryption shared vs. non-shared key setting
crypto: caam - platform_bus_type migration
crypto: aesni-intel - fix aesni build on i386
crypto: aesni-intel - Merge with fpu.ko
crypto: mv_cesa - make count_sgs() null-pointer proof
crypto: mv_cesa - copy remaining bytes to SRAM only when needed
crypto: mv_cesa - move digest state initialisation to a better place
crypto: mv_cesa - fill inner/outer IV fields only in HMAC case
crypto: mv_cesa - refactor copy_src_to_buf()
crypto: mv_cesa - no need to save digest state after the last chunk
crypto: mv_cesa - print a warning when registration of AES algos fail
crypto: mv_cesa - drop this call to mv_hash_final from mv_hash_finup
crypto: mv_cesa - the descriptor pointer register needs to be set just once
crypto: mv_cesa - use ablkcipher_request_cast instead of the manual container_of
crypto: caam - fix printk recursion for long error texts
crypto: caam - remove unused keylen from session context
hwrng: amd - enable AMD hw rnd driver for Maple PPC boards
hwrng: amd - manage resource allocation
...
Loading fpu without aesni-intel does nothing. Loading aesni-intel
without fpu causes modes like xts to fail. (Unloading
aesni-intel will restore those modes.)
One solution would be to make aesni-intel depend on fpu, but it
seems cleaner to just combine the modules.
This is probably responsible for bugs like:
https://bugzilla.redhat.com/show_bug.cgi?id=589390
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Instead of always creating a huge (268K) deflate_workspace with the
maximum compression parameters (windowBits=15, memLevel=8), allow the
caller to obtain a smaller workspace by specifying smaller parameter
values.
For example, when capturing oops and panic reports to a medium with
limited capacity, such as NVRAM, compression may be the only way to
capture the whole report. In this case, a small workspace (24K works
fine) is a win, whether you allocate the workspace when you need it (i.e.,
during an oops or panic) or at boot time.
I've verified that this patch works with all accepted values of windowBits
(positive and negative), memLevel, and compression level.
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1480 commits)
bonding: enable netpoll without checking link status
xfrm: Refcount destination entry on xfrm_lookup
net: introduce rx_handler results and logic around that
bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag
bonding: wrap slave state work
net: get rid of multiple bond-related netdevice->priv_flags
bonding: register slave pointer for rx_handler
be2net: Bump up the version number
be2net: Copyright notice change. Update to Emulex instead of ServerEngines
e1000e: fix kconfig for crc32 dependency
netfilter ebtables: fix xt_AUDIT to work with ebtables
xen network backend driver
bonding: Improve syslog message at device creation time
bonding: Call netif_carrier_off after register_netdevice
bonding: Incorrect TX queue offset
net_sched: fix ip_tos2prio
xfrm: fix __xfrm_route_forward()
be2net: Fix UDP packet detected status in RX compl
Phonet: fix aligned-mode pipe socket buffer header reserve
netxen: support for GbE port settings
...
Fix up conflicts in drivers/staging/brcm80211/brcmsmac/wl_mac80211.c
with the staging updates.
ESP with separate encryption/authentication algorithms needs a special
treatment for the associated data. This patch add a new algorithm that
handles esp with extended sequence numbers.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit da7f033ddc (”crypto: cryptomgr - Add test infrastructure”) added a
const to variable which is later used as target buffer of memcpy.
crypto/tcrypt.c:217:12: warning: passing 'const char (*)[128]' to parameter of type 'void *' discards qualifiers
memset(&iv, 0xff, iv_len);
crypto/tcrypt.c:test_cipher_speed()
- unsigned char *key, iv[128];
+ const char *key, iv[128];
...
memset(&iv, 0xff, iv_len);
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In light of the recent discovery of the bug with partial block
processing on s390, we need best test coverage for that. This
patch adds a test vector for SHA1 that should catch such problems.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
A self-test failure in fips mode means a panic. Well, gcm(aes)
self-tests currently fail in fips mode, as gcm is dependent on ghash,
which semi-recently got self-test vectors added, but wasn't marked as a
fips_allowed algorithm. Because of gcm's dependence on what is now seen
as a non-fips_allowed algorithm, its self-tests refuse to run.
Previously, ghash got a pass in fips mode, due to the lack of any test
vectors at all, and thus gcm self-tests were able to run. After this
patch, a 'modprobe tcrypt mode=35' no longer panics in fips mode, and
successful self-test of gcm(aes) is reported.
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We (Red Hat) are intending to include dm-crypt functionality, using
xts(aes) for disk encryption, as part of an upcoming FIPS-140-2
certification effort, and xts(aes) *is* on the list of possible
mode/cipher combinations that can be certified. To make that possible, we
need to mark xts(aes) as fips_allowed in the crypto subsystem.
A 'modprobe tcrypt mode=10' in fips mode shows xts(aes) self-tests
passing successfully after this change.
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
kcrypto_wq and pcrypt->wq's are used to run ciphers and may consume
considerable amount of CPU cycles. Mark both as CPU_INTENSIVE so that
they don't block other work items.
As the workqueues are primarily used to burn CPU cycles, concurrency
levels shouldn't matter much and are left at 1. A higher value may be
beneficial and needs investigation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>