Jeffrey Walton
65222dfe9e
Move location of CRYPTOPP_ARM_ACLE_AVAILABLE test in config.h
...
This should make it easier to detect when we need to include <arm_acle.h>
2017-12-09 13:07:50 -05:00
Jeffrey Walton
5856ab5a7e
Add Valgrind suppression file for Salsa20 and runtime's __memcmp_sse4_1
2017-12-08 17:46:44 -05:00
Jeffrey Walton
e457ca26f7
Add SSE3 <pmmintrin.h> for Simon and Speck
...
Add additional comments for WORKAROUND_GCC_OPTERON_ISSUE
2017-12-08 13:54:00 -05:00
Jeffrey Walton
148202369b
Fix Speck-64 CTR mode
...
It looks like the delay was due to some GCC 7 issue. We had to disable parallel blocks on Aarch64 with GCC 7. We may be running out of registers and that could be causing problems. It looks like GCC uses up to v30.
2017-12-07 22:30:03 -05:00
Jeffrey Walton
02037b5ce6
Fix Simon-64 CTR mode
...
This fixes CTR mode for Simon-64. We were only incrementing half the counters.
We still have Speck-64 to clean up.
2017-12-07 19:45:32 -05:00
Jeffrey Walton
07f2a4fc3f
Fix Simon-64 and Speck-64 CTR mode
...
This fixes CTR mode for IA-32. We were only incrementing half the counters.
Added additional test vectors
2017-12-07 16:55:23 -05:00
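The "only incrementing half the counters" symptom in the CTR fixes above can be pictured with a small portable model. This is a sketch under stated assumptions, not the Crypto++ code: it assumes two 64-bit counter blocks share one 128-bit register, and the names CounterPair, MakeCounters, MakeCountersHalfIncremented and AllLanesConsecutive are hypothetical. Real CTR mode increments a big-endian byte string; plain 64-bit integers are used here to keep the idea visible.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one 128-bit register holding two 64-bit counter blocks.
struct CounterPair
{
    std::uint64_t lane0;
    std::uint64_t lane1;
};

// Intended behavior: every lane receives its own consecutive counter value.
std::vector<CounterPair> MakeCounters(std::uint64_t start, std::size_t pairs)
{
    std::vector<CounterPair> out;
    for (std::size_t i = 0; i < pairs; ++i)
    {
        const std::uint64_t base = start + 2 * i;
        out.push_back(CounterPair{base, base + 1});
    }
    return out;
}

// Assumed shape of the bug: the second lane is never advanced, so both
// blocks in a register are encrypted under the same counter value.
std::vector<CounterPair> MakeCountersHalfIncremented(std::uint64_t start, std::size_t pairs)
{
    std::vector<CounterPair> out;
    for (std::size_t i = 0; i < pairs; ++i)
    {
        const std::uint64_t base = start + 2 * i;
        out.push_back(CounterPair{base, base});
    }
    return out;
}

// Returns true when the lanes, read in order, count up by one from start.
bool AllLanesConsecutive(const std::vector<CounterPair>& v, std::uint64_t start)
{
    std::uint64_t expected = start;
    for (const CounterPair& p : v)
    {
        if (p.lane0 != expected++) return false;
        if (p.lane1 != expected++) return false;
    }
    return true;
}
```

A check such as AllLanesConsecutive is the kind of property that multi-block test vectors exercise indirectly: a repeated counter produces a repeated keystream block, which shows up as a mismatch against a single-block reference implementation.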
Jeffrey Walton
fe257e92a9
Add const-ness to internal BLAKE2 functions (GH #527 )
2017-12-06 17:40:34 -05:00
Jeffrey Walton
b436411de5
Fix MSVC 2017 hang on BLAKE2 (GH #527 )
...
It looks like the macros for BLAKE2B and BLAKE2S round functions were too much for the compiler to handle
2017-12-06 14:02:28 -05:00
Jeffrey Walton
86acc8ed45
Use 6x-2x-1x for Simon and Speck on IA-32
...
For Simon-64 and Speck-64 this means we are effectively using 12x-4x-1x. We are mostly at the threshold for IA-32 and parallelization. At any time 10 to 13 XMM registers are being used.
Prefer movsd by way of _mm_load_sd and _mm_store_sd.
Fix "error C3861: _mm_cvtsi128_si64x identifier not found".
2017-12-06 06:18:46 -05:00
Jeffrey Walton
e9654192f2
Remove unneeded temp[] array
2017-12-05 20:35:57 -05:00
Jeffrey Walton
490701acca
Use 12x-4x-1x for Simon and Speck on ARM
2017-12-05 18:43:53 -05:00
Jeffrey Walton
7bc621da62
Enable NEON/ASIMD for Simon and Speck on Aarch32/Aarch64 (GH #545 )
2017-12-05 14:02:48 -05:00
Jeffrey Walton
9b61d4143d
Add big- and little-endian rotates for Aarch32 and Aarch64
2017-12-05 12:32:26 -05:00
Jeffrey Walton
9faa504a24
Fix Aarch32 and Aarch64 rotates
2017-12-05 11:15:26 -05:00
Jeffrey Walton
c18793f862
Fix SIMON-64 missing transform
2017-12-05 09:14:58 -05:00
Jeffrey Walton
4990ffe5b8
Add SIMON-64 NEON intrinsics
2017-12-05 08:53:57 -05:00
Jeffrey Walton
b208c8c1b4
Add 4 additional lanes to SPECK-64 for ARM
2017-12-05 07:16:34 -05:00
Jeffrey Walton
e09e6af1f8
Enable multi-block for SPECK-64 and SIMON-64
...
Also cleaned up SIMON-64 vector permute code. Thanks again to Peter Cordes
2017-12-05 04:19:44 -05:00
Jeffrey Walton
147ecba5df
Add temp working variable for SPECK64_AdvancedProcessBlocks_SSE41
...
Avoid potential undefined behavior by using aligned words
2017-12-04 14:52:36 -05:00
Jeffrey Walton
076937eb81
Update comments for vector permutes in SPECK-128
2017-12-04 12:31:32 -05:00
Jeffrey Walton
25709d2597
Fix SPECK64 vector permutes
...
Thanks to Peter Cordes for the suggestion on handling the case
2017-12-04 09:47:26 -05:00
Jeffrey Walton
46271660a1
Switch to uint64x2_t for SIMON-128
2017-12-04 05:47:34 -05:00
Jeffrey Walton
e9714b40d2
Switch to _mm_unpacklo_epi32 and _mm_unpackhi_epi32
...
The manual _mm_extract_epi32 and _mm_insert_epi32 are required during setup, but we can use SSE on teardown
2017-12-04 05:01:27 -05:00
Jeffrey Walton
cd31fa29dc
Switch to uint64x2_t for SPECK-128
2017-12-04 03:38:39 -05:00
Jeffrey Walton
1de143203e
Add SPECK-64 NEON intrinsics
2017-12-03 18:47:39 -05:00
Jeffrey Walton
cd55613b80
Disable NEON for SPECK-64
...
This was inadvertently checked in
2017-12-03 11:02:15 -05:00
Jeffrey Walton
f0e49785f6
Fix incorrect SPECK-128 decrypt when blocks >= 6
...
Add defines for CRYPTOPP_SPECK64_ADVANCED_PROCESS_BLOCKS and CRYPTOPP_SPECK128_ADVANCED_PROCESS_BLOCKS
2017-12-03 09:00:39 -05:00
Jeffrey Walton
18ccd89965
Add SSE4 flags to makefile for Simon and Speck
2017-12-03 06:02:24 -05:00
Jeffrey Walton
081afde0fd
Add SIMON-64 SSE intrinsics
...
Performance went from about 29 cpb (C++) to about 11.1 cpb (SSE)
2017-12-03 04:10:55 -05:00
Jeffrey Walton
6bb1f1d9c4
Add SPECK-64 SSE intrinsics
...
Performance went from about 11.9 cpb (C++) to about 4.5 cpb (SSE)
2017-12-03 02:28:40 -05:00
Jeffrey Walton
77ff7aa528
Add additional Simon test vectors
2017-12-02 21:07:33 -05:00
Jeffrey Walton
ca158d56f8
Add additional Speck test vectors
2017-12-02 20:00:32 -05:00
Jeffrey Walton
25493ded49
Add AVX512VL rotate support
2017-12-01 09:39:05 -05:00
Jeffrey Walton
49a119cbf7
Add SPECK-64 and SPECK-128 large block tests
...
The tests were generated using Crypto++ and the straight C++ implementation. It should allow us to test the SSE and NEON implementations and multiple blocks
2017-12-01 07:33:21 -05:00
Jeffrey Walton
3c1914b020
Add SIMON-64 and SIMON-128 large block tests
...
The tests were generated using Crypto++ and the straight C++ implementation. It should allow us to test the SSE and NEON implementations and multiple blocks
2017-12-01 07:10:42 -05:00
Jeffrey Walton
4792578f09
Rearrange statements and avoid intermediates
...
The folding of statements helps GCC eliminate some of the intermediate stores it was performing. The elimination saved about 1.0 cpb. SIMON-128 is now running around 10 cpb, but it is still off the Simon and Speck team's numbers of 3.5 cpb
2017-12-01 04:11:31 -05:00
Jeffrey Walton
b7ced67892
Update comments
2017-12-01 02:38:19 -05:00
Jeffrey Walton
a7fec9c0f6
Fix assert in Debug builds
...
This was copy/paste from the template function
2017-11-30 11:54:21 -05:00
Jeffrey Walton
14e326482c
Update comments
2017-11-30 02:07:04 -05:00
Jeffrey Walton
22257c4b6e
Remove SunCC const cast workaround
...
This code does not suffer SunCC losing const-ness
2017-11-29 12:56:19 -05:00
Jeffrey Walton
39594a53b0
Add fast rotate-by-8 for Aarch32 and Aarch64
2017-11-29 12:33:34 -05:00
Jeffrey Walton
532f13fe53
Fix compile using SunCC 12.4
2017-11-29 12:10:19 -05:00
Jeffrey Walton
61ec50dabe
Change Doxygen comment style from //! to ///
...
Also see https://groups.google.com/forum/#!topic/cryptopp-users/A7-Xt5Knlzw
2017-11-29 10:54:33 -05:00
Jeffrey Walton
16ebfa72bf
Cleanup comments and whitespace
2017-11-29 10:15:41 -05:00
Jeffrey Walton
6e829cebee
Use EPI8 Shuffle rather than Shifts and Or for rotate when R=8
...
Louis Wingers and Bryan Weeks from the Simon and Speck team offered the suggestion. The change saves 0.7 cpb for Speck, and 5 cpb for Simon on x86_64.
Speck is now running very close to the Team's time for SSE4. Simon is still off, but we know the root cause. For Simon, the Team used a fast bit-sliced implementation
2017-11-29 08:53:48 -05:00
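The rotate-by-8 change above relies on the fact that rotating a word by a whole number of bytes is just a byte permutation, so a single byte shuffle can replace two shifts and an OR. The following portable sketch illustrates the equivalence for one 64-bit word; it is not the Crypto++ code, and the function names RotateLeft8_ShiftOr and RotateLeft8_ByteShuffle are hypothetical. On x86 with SSSE3 the permutation step would typically be a single _mm_shuffle_epi8 applied to a whole vector.

```cpp
#include <cstdint>

// Conventional rotate-left by 8 bits: two shifts and an OR.
inline std::uint64_t RotateLeft8_ShiftOr(std::uint64_t x)
{
    return (x << 8) | (x >> 56);
}

// The same rotate expressed as a byte permutation. Bytes are indexed from
// least significant (0) to most significant (7), independent of host
// endianness. After a left rotate by 8, result byte i is source byte i-1,
// with the top byte wrapping around to position 0.
inline std::uint64_t RotateLeft8_ByteShuffle(std::uint64_t x)
{
    std::uint8_t in[8], out[8];
    for (unsigned i = 0; i < 8; ++i)
        in[i] = static_cast<std::uint8_t>(x >> (8 * i));

    // Fixed permutation; this is the role a shuffle mask plays in SIMD code.
    for (unsigned i = 0; i < 8; ++i)
        out[i] = in[(i + 7) % 8];

    std::uint64_t r = 0;
    for (unsigned i = 0; i < 8; ++i)
        r |= static_cast<std::uint64_t>(out[i]) << (8 * i);
    return r;
}
```

Both functions return the same value for every input. The scalar loop here is slower than shift-and-OR; the benefit appears only in vector code, where the permutation collapses to one instruction per register.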
Jeffrey Walton
bdb2db7ac2
Uncouple GetAlignment from CRYPTOPP_DISABLE_SOSEMANUK_ASM
...
The class declaration needs to always include the functions for the platform. The implementation can simply return a different number, and that is hidden from the user
2017-11-29 08:00:21 -05:00
Jeffrey Walton
c6c8dd3b32
Add Valgrind suppression file to file list (GH #543 )
2017-11-29 07:15:42 -05:00
Jeffrey Walton
92436b9f9b
Re-enable Salsa20 ASM (GH #543 )
...
We are fairly certain this is a false positive due to glibc's __memcmp_sse4_1.
2017-11-29 06:55:19 -05:00
Jeffrey Walton
f86c6124a8
Add Valgrind suppression file (GH #543 )
2017-11-29 06:52:43 -05:00
Jeffrey Walton
33caa1e13f
Add Valgrind --track-origins=yes to recipe
2017-11-29 05:26:21 -05:00