This resolves failing AES-GCM tests on amd64 with GCC 11 on Linux
with SSE2 on and other CPU features off.
While here, remove unused r11 and clobber ebx unconditionally.
Co-authored-by: v1ne <v1ne2go@gmail.com>
Use PowerPC unaligned loads and stores with Power8. Formerly we were using Power7 as the floor because the IBM POWER Architecture manuals said unaligned loads and stores were available. However, some compilers generate bad code for unaligned loads and stores using `-march=power7`, so bump to a known good.
GCM can do some bulk XOR's using the SIMD unit. However, we still need loads and stores to be fast. Fast loads and stores of unaligned data requires the VSX unit
Autotools sets up its config.h file with the '#define XXX 0' or '#define XXX 1' pattern. This check-in makes the sources Autotools aware. We need to verify CMake does the same
Also use CRYPTOPP_DISABLE_XXX_ASM consistently. The pattern is needed for Clang which still can't compile Intel assembly language. Also see http://llvm.org/bugs/show_bug.cgi?id=24232.
GCM_SetKeyWithoutResync_VMULL, GCM_Multiply_VMULL and GCM_Reduce_VMULL work as expected on Linux (ppc64-le) and AIX (ppc64-be). We are still working on GCM_AuthenticateBlocks_VMULL.
Currently the CRYPTOPP_BOOL_XXX macros set the macro value to 0 or 1. If we remove setting the 0 value (the #else part of the expression), then the self tests speed up by about 0.3 seconds. I can't explain it, but I have observed it repeatedly.
This check-in prepares for the removal in Upstream master
I wish this god damn compiler would stop pretending to be other compilers when it can't consume the same program. Even the GCC devs have told the LLVM devs to stop ding that crap
Clang causes too many problems. Early versions of the compiler simply crashes. Later versions of the compiler still have trouble with Intel ASM and still produce incorrect results on occassion. Additionally, we have to special case the integrated assemvler. Its making a mess of the code and causing self test failures
The macros that invoke GCC inline ASM have better code generation and speedup GCM ops by about 70 MiB/s on an Opteron 1100. The intrinsics are still available for Windows platforms and Visual Studio 2017 and above
It appears Apple Clang disgorges carryless multiply (PMULL) from Crypto (AES and SHA). The breakout added CRYPTOPP_BOOL_ARM_PMULL_INTRINSICS_AVAILABLE for PMULL, and retained CRYPTOPP_BOOL_ARM_CRYPTO_INTRINSICS_AVAILABLE for AES and SHA only