Autotools sets up its config.h file with the '#define XXX 0' or '#define XXX 1' pattern. This check-in makes the sources Autotools aware. We need to verify CMake does the same
Also use CRYPTOPP_DISABLE_XXX_ASM consistently. The pattern is needed for Clang which still can't compile Intel assembly language. Also see http://llvm.org/bugs/show_bug.cgi?id=24232.
We were able to gut CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS for everything except Rijndael. Rijndael uses unaligned accesses on x86 to harden against timing attacks.
There's a little more to CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS and Rijndael. If we remove unaligned access then AliasedWithTable hangs in an endless loop on non-AESNI machines. So care must be taken when trying to remove the vestige from Rijndael.
Move m_aliasBlock into Rijndael::Base. m_aliasBlock is now an extra data member for Dec because the aliased table is only used for Enc when unaligned data access is in effect. However, the SecBlock is not allocated in the Dec class so there is no runtime penalty.
Moving m_aliasBlock into Base also allowed us to remove the Enc::Enc() constructor, which always appeared as a wart in my eyes. Now m_aliasBlock is sized in UncheckedSetKey, so there's no need for the ctor initialization.
Also see https://stackoverflow.com/q/46561818/608639 on Stack Overflow. The SO question had an unusual/unexpected interaction with CMake, so the removal of the Enc::Enc() ctor should help the problem.
The problem was vec_sld is endian sensitive. The built-in required more than us setting up arguments to ensure the vsx load resulted in a big endian value. Thanks to Paul R on Stack Overflow for sharing the information that IBM did not provide. Also see http://stackoverflow.com/q/46341923/608639
This increases performance to about 1.6 cpb. We are about 0.5 cpb behind Botan, and about 1.0 cpb behind OpenSSL. However, it beats the snot out of C/C++, which runs at 20 to 30 cpb
This is the forward direction on encryption only. Crypto++ uses the "Equivalent Inverse Cipher" (FIPS-197, Section 5.3.5, p.23), and it is not compatible with IBM hardware. The library library will need to re-work the decryption key scheduling routines. (We may be able to work around it another way, but I have not investigated it).
The strategy of "cleanup under-aligned buffers" is not scaling well. Corner cases are still turing up. The library has some corner-case breaks, like old 32-bit Intels. And it still has not solved the AltiVec and Power8 alignment problems.
For now we are backing out the changes and investigating other strategies
This commit supports the upcoming AltiVec and Power8 processor. This commit affects a number of classes due to the ubiquitous use of AES. The commit adds debug asserts to warn of under-aligned and misaligned buffers in debug builds.
This commit supports the upcoming AltiVec and Power8 processor. This commit affects a number of classes due to the ubiquitous use of AES. The commit provides the data alignment requirements.
Currently the CRYPTOPP_BOOL_XXX macros set the macro value to 0 or 1. If we remove setting the 0 value (the #else part of the expression), then the self tests speed up by about 0.3 seconds. I can't explain it, but I have observed it repeatedly.
This check-in prepares for the removal in Upstream master
Reworked SHA class internals to align all the implementations. Formerly all hashes were software based, IterHashBase handled endian conversions, IterHashBase repeatedly called the single block SHA{N}::Transform. The rework added SHA{N}::HashMultipleBlocks, and the SHA classes attempt to always use it.
Now SHA{N}::Transform calls into SHA{N}_HashMultipleBlocks, which is a free standing function. An added wrinkle is hardware wants little endian data and software presents big endian data, so HashMultipleBlocks accepts a ByteOrder for the incoming data. Hardware based SHA{N}_HashMultipleBlocks can often perform the endian swap much easier by setting an EPI mask so it was profitable to defer to hardware when available.
The rework also removed the hacked-in pointers to implementations. The class now looks more like AES, GCM, etc.
When MSVC init_seg or GCC init_priority is available, we don't need to use the Singleton. We only need to create a file scope class variable and place it in the segment for MSVC or provide the attribute for GCC.
An additional upside is we cleared all the memory leaks that used to be reported by MSVC for debug builds.