Commit Graph

21 Commits

Author SHA1 Message Date
Jeffrey Walton
565bd844fc
Clear GCC -Wcast-align warnings on ARM
The buffers and workspaces are aligned
2018-01-20 19:39:49 -05:00
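To illustrate the fix above: GCC's -Wcast-align fires when a byte pointer is cast directly to a wider type on ARM. A minimal sketch of the usual remedy, assuming (as the commit states) that the buffers really are aligned; the helper name is invented:

```cpp
#include <cstdint>

// Hypothetical helper: casting through void* silences -Wcast-align.
// This is only sound because the caller guarantees p points to storage
// suitably aligned for uint32_t, as the buffers and workspaces here are.
inline std::uint32_t* as_word_ptr(unsigned char* p)
{
    return reinterpret_cast<std::uint32_t*>(static_cast<void*>(p));
}
```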
Jeffrey Walton
a074722bfa
Switch to rotlConstant and rotrConstant
This will help Clang and its need for a constexpr
2017-11-25 02:52:19 -05:00
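The point of rotlConstant and rotrConstant is that the rotation amount becomes a template parameter, so it is a compile-time constant that satisfies Clang's constexpr checks and lets compilers emit a rotate-immediate. A minimal sketch in the spirit of the misc.h templates; the library's version may differ in detail:

```cpp
#include <cstdint>

template <unsigned int R, class T>
inline T rotlConstant(T x)
{
    // R is a compile-time constant; the mask avoids undefined behavior
    // in the complementary shift.
    static const unsigned int THIS_SIZE = sizeof(T) * 8;
    static const unsigned int MASK = THIS_SIZE - 1;
    static_assert(R < THIS_SIZE, "rotation amount out of range");
    return T((x << R) | (x >> ((0 - R) & MASK)));
}

// Usage: rotlConstant<19>(w) rather than rotlFixed(w, 19).
```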
Jeffrey Walton
69c8a4f9c6
Prefix IS_LITTLE_ENDIAN and IS_BIG_ENDIAN with CRYPTOPP
2017-11-10 14:15:30 -05:00
Jeffrey Walton
a9534a7cf3
Use CRYPTOPP_SSE2_INTRIN_AVAILABLE for consistent naming
2017-08-18 02:11:41 -04:00
Jeffrey Walton
e2c377effd
Split source files to support Base Implementation + SIMD implementation (GH #461)
2017-08-17 12:33:43 -04:00
Jeffrey Walton
bc40d36075
Fixed ARIA self test failures under SunCC
2017-05-22 04:34:57 -04:00
Jeffrey Walton
1543649ead
Cleanup ARIA typedefs
2017-04-28 21:35:55 -04:00
Jeffrey Walton
dad532cb4b
Remove stdio.h header
2017-04-16 13:23:27 -04:00
Jeffrey Walton
1d1a150737
Avoid extra loads of workspace variables
2017-04-16 13:00:45 -04:00
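A sketch of the idea in the commit above, with invented names: because stores through the cipher's byte pointers may alias the workspace, the compiler reloads the workspace words around every store, so the code copies them into locals once and works on registers:

```cpp
#include <cstdint>

// Illustrative only: load each workspace word once, compute in locals,
// store once at the end, instead of re-reading w[i] around every store.
void round_sketch(std::uint32_t* w, const std::uint32_t* rk)
{
    std::uint32_t t0 = w[0], t1 = w[1], t2 = w[2], t3 = w[3];

    t0 ^= rk[0]; t1 ^= rk[1]; t2 ^= rk[2]; t3 ^= rk[3];

    w[0] = t0; w[1] = t1; w[2] = t2; w[3] = t3;
}
```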
Jeffrey Walton
ddc0f3a899
Switch to Put and Get blocks. Remove unneeded macros
2017-04-16 08:06:20 -04:00
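The Put and Get blocks referred to above are the library's endian-aware load/store helpers from misc.h. A sketch of the pattern as other block ciphers in the library use it; the surrounding function is invented:

```cpp
#include "misc.h"   // GetBlock, PutBlock, BlockGetAndPut

using CryptoPP::byte;
using CryptoPP::word32;
using CryptoPP::BigEndian;

typedef CryptoPP::BlockGetAndPut<word32, BigEndian> Block;

void demo(const byte* inBlock, const byte* xorBlock, byte* outBlock)
{
    word32 a, b, c, d;
    Block::Get(inBlock)(a)(b)(c)(d);   // endian-correct load of the 16-byte block

    // ... cipher rounds operate on a..d ...

    // Store the result, XORing with xorBlock when it is non-NULL.
    Block::Put(xorBlock, outBlock)(a)(b)(c)(d);
}
```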
Jeffrey Walton
b081f7c634
Use full S1 table for timing attack countermeasures
Change the stride to the cache line size divided by the word size, based on Yun's 32-bit word implementation
2017-04-14 06:24:54 -04:00
Jeffrey Walton
70cf88f230
Apply S-box timing attack countermeasures to ARIA
The ARIA S-boxes could leak timing information. This commit applies the countermeasures present in Rijndael and Camellia to ARIA. We take a penalty of about 0.05 to 0.1 cpb, which equates to roughly 0 MiB/s on an ARM device and about 2 MiB/s on a modern Skylake.

We recently gained some performance through the use of SSE and NEON in ProcessAndXorBlock, so the net result is an improvement.
2017-04-13 17:46:51 -04:00
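The countermeasure in the two commits above is the one Rijndael and Camellia already use: before processing, read one word from every cache line of the S-box table so all lines are resident and lookups no longer miss in a secret-dependent way. A hedged sketch, assuming a 64-byte cache line (the library queries the real size at runtime) and a stand-in table:

```cpp
#include <cstdint>

static const std::uint32_t S1[256] = {0};   // stand-in; the real table holds the ARIA S-box

std::uint32_t preload_sbox()
{
    const unsigned int cacheLineSize = 64;  // assumed here; queried at runtime in the library
    const unsigned int stride = cacheLineSize / sizeof(std::uint32_t);

    volatile std::uint32_t u = 0;
    for (unsigned int i = 0; i < 256; i += stride)
        u &= S1[i];                         // touch one word per cache line
    return u;                               // keeps the loads from being optimized away
}
```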
Jeffrey Walton
65c3c63b52
Break out and clean up macros. Add CRYPTOPP_ENABLE_ARIA_SSE2_INTRINSICS, CRYPTOPP_ENABLE_ARIA_SSSE3_INTRINSICS and CRYPTOPP_ENABLE_ARIA_NEON_INTRINSICS.
Tune the CRYPTOPP_ENABLE_ARIA_SSE2_INTRINSICS and CRYPTOPP_ENABLE_ARIA_SSSE3_INTRINSICS macros for older GCC and Clang. Clang needs some more tuning on Aarch64 because performance is off by about 15%.

Add additional NEON code paths.

Remove keyBits from Aarch64 code paths.
2017-04-13 17:45:58 -04:00
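A sketch of what guards like these fold together: the raw feature-availability macro plus per-compiler carve-outs, so the rest of the file only ever tests one name. The GCC version cutoff below is purely illustrative:

```cpp
// Illustrative only; the real conditions live in aria.cpp / config.h.
#if (CRYPTOPP_BOOL_SSSE3_INTRINSICS_AVAILABLE) && !(defined(__GNUC__) && (__GNUC__ < 5))
# define CRYPTOPP_ENABLE_ARIA_SSSE3_INTRINSICS 1
#endif

#if CRYPTOPP_ENABLE_ARIA_SSSE3_INTRINSICS
  // SSSE3 code path
#else
  // portable code path
#endif
```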
Jeffrey Walton
04908cca48
Improve x86 and x64 ARIA performance
The changes were meant to improve Windows, but GCC benefited more. Windows gained 0.3 cpb, while GCC gained 1.2 cpb
2017-04-13 06:52:56 -04:00
Jeffrey Walton
35f95fb739
Fix unaligned pointer crash on Win32 due to _mm_load_si128
The SSSE3 intrinsics were performing aligned loads with _mm_load_si128 on user-supplied pointers. The pointers are only byte pointers, so their alignment can drop to 1 or 2. Switching to _mm_loadu_si128 sidesteps the potential problems. The crash surfaced under Win32 testing.

Switch to memcpy when performing the bulk assignment x[0]=y[0] ... x[3]=y[3]. I believe Yun used the pattern to promote vectorization. Some compilers appear to be braindead and issue integer moves one word at a time. Non-braindead compilers will still take the optimization when advantageous, and slower compilers will benefit from the bulk move. We also cherry-picked vectorization opportunities, like in ARIA_GSRK_NEON.

Remove keyBits variable. We now use UncheckedSetKey's keylen throughout.

Also fix a typo in CRYPTOPP_BOOL_SSSE3_INTRINSICS_AVAILABLE. __SSSE3__ was listed twice.
2017-04-13 04:28:02 -04:00
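Both fixes above are easy to show: _mm_load_si128 requires 16-byte alignment while caller-supplied byte pointers may be aligned to 1 or 2, so the unaligned form is used instead; and the bulk assignment goes through memcpy, which compilers lower to a single wide move when they can. A sketch with invented function names:

```cpp
#include <emmintrin.h>   // SSE2
#include <cstdint>
#include <cstring>

// Safe for any alignment; _mm_load_si128 would fault on a pointer
// that is not 16-byte aligned.
__m128i load_user_block(const unsigned char* userData)
{
    return _mm_loadu_si128(reinterpret_cast<const __m128i*>(userData));
}

// x[0]=y[0] ... x[3]=y[3] expressed as one bulk move; good compilers
// still vectorize it, and slower ones avoid word-at-a-time integer moves.
void bulk_assign(std::uint32_t* x, const std::uint32_t* y)
{
    std::memcpy(x, y, 4 * sizeof(std::uint32_t));
}
```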
Jeffrey Walton
59767be52e
Add Intel and ARM intrinsics
Win32 and Win64 benefited from the Intel intrinsics. A32 and Aarch64 benefited from the ARM intrinsics. The intrinsics shaved 150 to 350 cycles from key setup.

The intrinsics slowed modern GCC down a bit, and did not appear to affect old GCC. As such, the Intel intrinsics were only enabled for Microsoft compilers.

We were not able to improve encryption and decryption. In fact, some of the attempted macro conversions and intrinsics slowed things down considerably. For example, GCC 5.4 on x86_64 dropped from 120 MB/s to about 70 MB/s when we tried to improve the code around the Key XOR Layer (ARIA_KXL).
2017-04-12 23:28:41 -04:00
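For context, ARIA's Key XOR Layer (ARIA_KXL) just XORs the 128-bit round key into the state, so an SSE2 version is a single pxor. A sketch of the kind of change that was tried here, which, as the message notes, GCC's scalar code already beat:

```cpp
#include <emmintrin.h>

// Sketch: XOR the 16-byte round key into the state with one SSE2 op.
// Unaligned loads/stores are used since both arguments are byte pointers.
void kxl_sse2(unsigned char* state, const unsigned char* roundKey)
{
    __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(state));
    __m128i k = _mm_loadu_si128(reinterpret_cast<const __m128i*>(roundKey));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(state), _mm_xor_si128(s, k));
}
```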
Jeffrey Walton
f44e705c16
Add NEON intrinsics for ARIA_GSRK_NEON
Update documentation
2017-04-12 12:15:32 -04:00
Jeffrey Walton
af561758df
Rework ARIA_GSRK to have MSVC generate "rotate imm" rather than "rot reg"
The immediate version of rotate can be 4 to 6 times faster than the register version
2017-04-11 20:47:54 -04:00
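ARIA's round keys have the shape X XOR (Y >>> n) on 128-bit values, computed word by word. A hedged sketch of the reworked shape: with n a template parameter, the per-word shift counts are immediates, so MSVC can emit rotate/shift-by-immediate instead of loading the count into a register. Word order and the exact macro in aria.cpp may differ:

```cpp
#include <cstdint>

template <unsigned int N>
inline void gsrk_sketch(const std::uint32_t X[4], const std::uint32_t Y[4],
                        std::uint32_t RK[4])
{
    static_assert(N < 128 && N % 32 != 0, "sketch assumes a non-zero bit offset");
    const unsigned int Q = 4 - (N / 32);   // word part of the 128-bit rotation
    const unsigned int R = N % 32;         // bit part, a compile-time constant

    for (unsigned int i = 0; i < 4; ++i)
        RK[i] = X[i] ^ (Y[(Q + i) % 4] >> R) ^ (Y[(Q + i + 3) % 4] << (32 - R));
}

// Usage: gsrk_sketch<19>(w0, w1, rk);   // ARIA uses n = 19, 31, 67, 97, 109
```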
Jeffrey Walton
d6b295203b
Additional library integration for ARIA
2017-04-11 16:19:36 -04:00
Jeffrey Walton
0d742591e0
Switch to code based on 32-bit implementation
The 32-bit code is based on Aaram Yun's implementation. Combined with a few library-specific tweaks, it improves performance to roughly that of Camellia.
2017-04-11 11:39:45 -04:00
Jeffrey Walton
8ca0f47939
Add ARIA block cipher
This is the reference implementation, test data and test vectors from the ARIA.zip package on the KISA website. The website is located at http://seed.kisa.or.kr/iwt/ko/bbs/EgovReferenceList.do?bbsId=BBSMSTR_000000000002.

We have optimized routines that improve Key Setup and Bulk Encryption performance, but they are not being checked in at the moment. The ARIA team is updating its implementation for contemporary hardware, and we would like to use it as a starting point before we wander too far away from the KISA implementation.
2017-04-10 10:52:40 -04:00