Jeffrey Walton
8279fab432
Fix AdvancedProcessBlocks128_6x1_NEON template name
2018-06-23 12:35:06 -04:00
Ilja
ec6c442cc6
Remove extra ; from rijndael-simd.cpp (PR #621 )
2018-03-31 13:04:42 -04:00
Jeffrey Walton
bd8c20562c
Clear unused variable warnings
2018-02-20 17:03:32 -05:00
Jeffrey Walton
244c40ed61
Remove unneeded round parameter on Rijndael_UncheckedSetKey_SSE4_AESNI
2018-02-20 13:32:53 -05:00
Jeffrey Walton
33c10bc027
Fix ODR violation in AdvancedProcessBlocks_{ARCH} (GH #585 )
...
The ALTIVEC function required an inline declaration. Lack of inline caused the self test failure. Two NEON functions needed the same. We also cleaned up constants in unnamed namespaces
2018-02-20 13:17:05 -05:00
Jeffrey Walton
c80e28eec8
Remove unneeded parameter for Rijndael_UncheckedSetKey_POWER8
2018-02-20 06:42:43 -05:00
Jeffrey Walton
d30afa4d01
Whitespace check-in
2018-02-20 04:18:58 -05:00
Jeffrey Walton
2b2303bc75
Remove unneeded Rijndael_Subkey_POWER8 (GH #588 )
...
This is due to the removal of a path in Rijndael_UncheckedSetKey_POWER8
2018-02-20 02:24:09 -05:00
Jeffrey Walton
5b09d46665
Cleanup signed integer overflow on ppc64 (GH #588 )
...
The code below was flagged by undefined behavior santizier under GCC 8. The offender was the doubling at "r4 = vec_add(r4, r4)". R4 is rcon and an unsigned type. It depends on integer wrap but GCC is generating code that is being flagged for signed overflow. GCC 7 and below is OK.
for (unsigned int i=0; i<8; ++i)
{
r1 = Rijndael_Subkey_POWER8(r1, r4, r5);
r4 = vec_add(r4, r4);
skptr = IncrementPointerAndStore(r1, skptr);
}
// Final two rounds using table lookup
...
2018-02-20 02:10:17 -05:00
Jeffrey Walton
5cee4a6573
Improve logic for <arm_acle.h> include (GH #568 )
2018-01-20 13:23:41 -05:00
Jeffrey Walton
d6d53f2e9d
Add Power4 Vector Load, Store, Add and Xor
2018-01-02 08:13:42 -05:00
Jeffrey Walton
fac3a44a84
Move Altivec AdvancedProcessBlocks into adv-simd.h
2018-01-02 07:08:13 -05:00
Jeffrey Walton
66da740ad3
Use M128_CAST and CONST_M128_CAST for Clang
...
Also see http://bugs.llvm.org/show_bug.cgi?id=20670
2017-12-26 11:20:18 -05:00
Jeffrey Walton
8e916e7bac
Use M128_CAST and CONST_M128_CAST for Clang
...
Also see http://bugs.llvm.org/show_bug.cgi?id=20670
2017-12-26 11:16:52 -05:00
Jeffrey Walton
2c79be7a54
Add CRYPTOPP_POWER5_AVAILABLE
...
Power4 lacks 'vector long long'
Rename datatypes such as 'uint8x16_p8' to 'uint8x16_p'. Originally the p8 suffix indicated use with Power8 in-core crypto. We are now using Altivec/Power4 for general vector operations.
2017-12-12 08:17:17 -05:00
Jeffrey Walton
b7e636ac51
Rename ppc-crypto.h to ppc-simd.h
2017-12-12 07:15:59 -05:00
Jeffrey Walton
195ac2c7c9
Refactor rijndael-simd.cpp and simon.simd.cpp to use adv-simd.h
2017-12-10 11:09:50 -05:00
Jeffrey Walton
69c8a4f9c6
Prefix IS_LITTLE_ENDIAN and IS_BIG_ENDIAN with CRYPTOPP
2017-11-10 14:15:30 -05:00
Jeffrey Walton
ea3c80c949
Move Rijndael_AdvancedProcessBlocks_ARMV8 into anonymous namespace
2017-09-23 05:28:59 -04:00
Jeffrey Walton
26597059d9
Move to anonymous namespaces in rijndael-simd.cpp
2017-09-23 02:13:16 -04:00
Jeffrey Walton
12953fd0e4
Add IncrementPointerAndStore
...
This speeds up XL C/C++ by 0.1 to 0.2 cpb
2017-09-22 20:35:18 -04:00
Jeffrey Walton
1057f89363
Move Power8 crypto functions into ppc-crypto.h
2017-09-22 05:23:29 -04:00
Jeffrey Walton
3e55817819
Add C++ templates for additional Vector ops
...
Removed lower-level C-like functions such as Store8x16 and Store64x2
2017-09-22 04:15:33 -04:00
Jeffrey Walton
441e944a66
Switch to vec_vsx_ld, remove unaligned loads
...
Partially unroll loop Rijndael_UncheckedSetKey_POWER8 loop. It saves about another 60 cycles
2017-09-22 02:53:08 -04:00
Jeffrey Walton
d9592a303c
Updated comments
2017-09-21 21:45:23 -04:00
Jeffrey Walton
dabad4b409
Cleanup asserts and casts
2017-09-21 20:55:35 -04:00
Jeffrey Walton
1edea5a80f
Vectorize tail of Rijndael_UncheckedSetKey_POWER8
2017-09-21 20:02:40 -04:00
Jeffrey Walton
e43c0eee74
Fold ConditionalByteReverse for non-Power8 paths
2017-09-21 19:17:42 -04:00
Jeffrey Walton
f763bf3da6
Updated comments
2017-09-21 12:08:54 -04:00
Jeffrey Walton
e78464a1af
Enable little endian Rijndael_UncheckedSetKey_POWER8 using built-ins
...
The problem was vec_sld is endian sensitive. The built-in required more than us setting up arguments to ensure the vsx load resulted in a big endian value. Thanks to Paul R on Stack Overflow for sharing the information that IBM did not provide. Also see http://stackoverflow.com/q/46341923/608639
2017-09-21 09:56:37 -04:00
Jeffrey Walton
c6b096ddd4
Move Rijndael_UncheckedSetKey_POWER8 prior to GetUserKey call
...
Arg... GetUserKey was performing a 32-bit word reverse. It was part of the problem on little endian machines
2017-09-21 01:08:44 -04:00
Jeffrey Walton
9fd5d023f9
Load r5 mask once for key expansion
2017-09-20 20:27:58 -04:00
Jeffrey Walton
35c0fa82fd
Use <time.h> for Borland/Embarcadero (GH #512 )
2017-09-20 18:10:07 -04:00
Jeffrey Walton
c5a427d690
Add PowerPC VectorLoadKeyUnaligned for AES-192
...
Make internal functions static. We get better optimizations depsice using unnamed namespaces
Add PowerPC uint32x4 functions for handling 32-bit rcon and mask
2017-09-20 08:57:53 -04:00
Jeffrey Walton
c94d076aa1
Move r1 write to caller; remove from Rijndael_Subkey_POWER8
...
Signed-off-by: Jeffrey Walton <noloader@gmail.com>
2017-09-20 04:38:53 -04:00
Jeffrey Walton
5159d0803d
Add Power8 key expansion for big endian
...
This is AES-128 key expansion for big endian. Little endian has a bug in it so it can't be enabled at the moment. GDB is acting up on GCC112, so I've had trouble investigating it
2017-09-20 03:34:54 -04:00
Jeffrey Walton
6102333fc3
Add CRYPTOPP_NO_CPU_FEATURE_PROBES (GH #511 )
...
We determine machine capabilities by performing an os/platform *query* first, like getauxv(). If the *query* fails, we move onto a cpu *probe*. The cpu *probe* tries to exeute an instruction and then catches a SIGILL on Linux or the exception EXCEPTION_ILLEGAL_INSTRUCTION on Windows. Some OSes fail to hangle a SIGILL gracefully, like Apple OSes. Apple machines corrupt memory and variables around the probe.
2017-09-19 21:08:37 -04:00
Jeffrey Walton
6440921723
Add Rijndael_UncheckedSetKey_POWER8
...
We are going to attempt to perform key setup using Power8 in-core vector instructions
2017-09-19 04:55:15 -04:00
Jeffrey Walton
923cf95571
ByteReverseArray → ReverseByteArrayLE
2017-09-18 18:40:19 -04:00
Jeffrey Walton
2c18fe8af8
Refactor LoadT() and StoreT(). Add separate ReverseT() for little endian machines
...
The refactoring has no effect on little endian machines. However, on big endian GCC119 using GCC 7.1 the performance improved by 2.5x for ECB and CTR modes:
BEFORE:
<TR><TH>AES/CTR (128-bit key)<TD>2723<TD>1.4<TD>0.163<TD>670
<TR><TH>AES/CTR (192-bit key)<TD>2560<TD>1.5<TD>0.175<TD>719
<TR><TH>AES/CTR (256-bit key)<TD>2728<TD>1.4<TD>0.183<TD>749
<TR><TH>AES/CBC (128-bit key)<TD>1204<TD>3.2<TD>0.135<TD>554
<TR><TH>AES/CBC (192-bit key)<TD>1066<TD>3.7<TD>0.148<TD>605
<TR><TH>AES/CBC (256-bit key)<TD>948<TD>4.1<TD>0.155<TD>635
<TR><TH>AES/OFB (128-bit key)<TD>1019<TD>3.8<TD>0.158<TD>648
<TR><TH>AES/CFB (128-bit key)<TD>949<TD>4.1<TD>0.192<TD>787
<TR><TH>AES/ECB (128-bit key)<TD>3564<TD>1.1<TD>0.082<TD>337
AFTER:
<TR><TH>AES/CTR (128-bit key)<TD>6484<TD>0.6<TD>0.163<TD>677
<TR><TH>AES/CTR (192-bit key)<TD>5641<TD>0.7<TD>0.176<TD>728
<TR><TH>AES/CTR (256-bit key)<TD>5005<TD>0.8<TD>0.183<TD>761
<TR><TH>AES/CBC (128-bit key)<TD>1223<TD>3.2<TD>0.135<TD>559
<TR><TH>AES/CBC (192-bit key)<TD>1080<TD>3.7<TD>0.147<TD>611
<TR><TH>AES/CBC (256-bit key)<TD>966<TD>4.1<TD>0.155<TD>642
<TR><TH>AES/OFB (128-bit key)<TD>1057<TD>3.7<TD>0.158<TD>656
<TR><TH>AES/CFB (128-bit key)<TD>1217<TD>3.3<TD>0.186<TD>774
<TR><TH>AES/ECB (128-bit key)<TD>7289<TD>0.5<TD>0.082<TD>342
2017-09-18 18:15:25 -04:00
Jeffrey Walton
f0c2324f6b
Fix armeabi and armv7-a for Android (GH #509 )
2017-09-17 20:07:53 -04:00
Jeffrey Walton
adea69ab68
Avoid increment during stores of 6x blocks
...
This provides another 0.1 cpb with GCC
2017-09-14 21:06:44 -04:00
Jeffrey Walton
25efb7a140
Use 6x blocks for ARMv8 AES rather than 4x
...
We gain 0.1 to 0.3 cpb, depending on the mode
2017-09-14 20:32:06 -04:00
Jeffrey Walton
58890ff053
Use 6x blocks for Power8 AES rather than 4x
...
Perforamnce increased for all modes when performing 6x vs 4x. 8x and 12x performed worse.
Here are the numbers:
4x Blocks:
<TR><TH>AES/CTR (128-bit key)<TD>1563<TD>2.1<TD>0.409<TD>1392
<TR><TH>AES/CTR (192-bit key)<TD>1403<TD>2.3<TD>0.450<TD>1529
<TR><TH>AES/CTR (256-bit key)<TD>1280<TD>2.5<TD>0.482<TD>1639
<TR><TH>AES/CBC (128-bit key)<TD>582<TD>5.6<TD>0.359<TD>1222
<TR><TH>AES/CBC (192-bit key)<TD>517<TD>6.3<TD>0.394<TD>1339
<TR><TH>AES/CBC (256-bit key)<TD>474<TD>6.8<TD>0.432<TD>1469
<TR><TH>AES/OFB (128-bit key)<TD>533<TD>6.1<TD>0.402<TD>1368
<TR><TH>AES/CFB (128-bit key)<TD>563<TD>5.8<TD>0.461<TD>1568
<TR><TH>AES/ECB (128-bit key)<TD>1829<TD>1.8<TD>0.240<TD>817
6x Blocks:
<TR><TH>AES/CTR (128-bit key)<TD>1750<TD>1.7<TD>0.406<TD>1300
<TR><TH>AES/CTR (192-bit key)<TD>1638<TD>1.9<TD>0.447<TD>1432
<TR><TH>AES/CTR (256-bit key)<TD>1528<TD>2.0<TD>0.482<TD>1541
<TR><TH>AES/CBC (128-bit key)<TD>582<TD>5.2<TD>0.358<TD>1145
<TR><TH>AES/CBC (192-bit key)<TD>517<TD>5.9<TD>0.394<TD>1260
<TR><TH>AES/CBC (256-bit key)<TD>474<TD>6.4<TD>0.431<TD>1379
<TR><TH>AES/OFB (128-bit key)<TD>533<TD>5.7<TD>0.400<TD>1281
<TR><TH>AES/CFB (128-bit key)<TD>563<TD>5.4<TD>0.461<TD>1476
<TR><TH>AES/ECB (128-bit key)<TD>1950<TD>1.6<TD>0.238<TD>763
2017-09-14 16:07:21 -04:00
Jeffrey Walton
08e4ee422e
Avoid increment during stores of 4x blocks
...
This provides another 0.1 cpb with GCC
2017-09-14 15:12:07 -04:00
Jeffrey Walton
ddeae859d0
Use vec_xl_be and vec_xst_be for IBM XL C/C++ compiler
2017-09-14 13:27:49 -04:00
Jeffrey Walton
63a0af4efa
Fix endianess for s_one on ARM big-endian
2017-09-13 22:52:29 -04:00
Jeffrey Walton
8e52ce6dd2
Load correct value fo 1 under ARM big endian
2017-09-13 21:42:15 -04:00
Jeffrey Walton
c22507e38b
Clear unused variable warnings under Clang
2017-09-13 21:37:55 -04:00
Jeffrey Walton
8d98417306
Add Aarch64 specific defines to Android cross-compile
...
Move <arm_acle.h> logic into "sonfig.h". Detecting when we can/should include <arm_acle.h> is proving to be troublesome
2017-09-13 17:16:57 -04:00