Use PowerPC unaligned loads and stores with Power8. Formerly we were using Power7 as the floor because the IBM POWER Architecture manuals said unaligned loads and stores were available. However, some compilers generate bad code for unaligned loads and stores using `-march=power7`, so bump to a known good.
GCM can do some bulk XOR's using the SIMD unit. However, we still need loads and stores to be fast. Fast loads and stores of unaligned data requires the VSX unit
Autotools sets up its config.h file with the '#define XXX 0' or '#define XXX 1' pattern. This check-in makes the sources Autotools aware. We need to verify CMake does the same
Also use CRYPTOPP_DISABLE_XXX_ASM consistently. The pattern is needed for Clang which still can't compile Intel assembly language. Also see http://llvm.org/bugs/show_bug.cgi?id=24232.
GCM_SetKeyWithoutResync_VMULL, GCM_Multiply_VMULL and GCM_Reduce_VMULL work as expected on Linux (ppc64-le) and AIX (ppc64-be). We are still working on GCM_AuthenticateBlocks_VMULL.
Currently the CRYPTOPP_BOOL_XXX macros set the macro value to 0 or 1. If we remove setting the 0 value (the #else part of the expression), then the self tests speed up by about 0.3 seconds. I can't explain it, but I have observed it repeatedly.
This check-in prepares for the removal in Upstream master
I wish this god damn compiler would stop pretending to be other compilers when it can't consume the same program. Even the GCC devs have told the LLVM devs to stop ding that crap
Clang causes too many problems. Early versions of the compiler simply crashes. Later versions of the compiler still have trouble with Intel ASM and still produce incorrect results on occassion. Additionally, we have to special case the integrated assemvler. Its making a mess of the code and causing self test failures
The macros that invoke GCC inline ASM have better code generation and speedup GCM ops by about 70 MiB/s on an Opteron 1100. The intrinsics are still available for Windows platforms and Visual Studio 2017 and above
It appears Apple Clang disgorges carryless multiply (PMULL) from Crypto (AES and SHA). The breakout added CRYPTOPP_BOOL_ARM_PMULL_INTRINSICS_AVAILABLE for PMULL, and retained CRYPTOPP_BOOL_ARM_CRYPTO_INTRINSICS_AVAILABLE for AES and SHA only
trap.h and CRYPTOPP_ASSERT has existed for over a year in Master. We deferred on the cut-over waiting for a minor version bump (5.7). We have to use it now due to CVE-2016-7420