We need to ensure SSE2 does not cross pollinate into other CPU functions since SSE2 is greater than the minimum arch. The minimum arch is i586/i686, and both lack SSE2 instructions
This avoids the probe for SSE2 in most circumstances. The SSE2 test is mostly benign nowadays since SSE2 and OS support is nearly ubiquitous. But the define CRYPTOPP_NO_CPU_FEATURE_PROBES added for Apple OSes was interacting badly on x86 machines. Also see GH #511.
We determine machine capabilities by performing an os/platform *query* first, like getauxv(). If the *query* fails, we move onto a cpu *probe*. The cpu *probe* tries to exeute an instruction and then catches a SIGILL on Linux or the exception EXCEPTION_ILLEGAL_INSTRUCTION on Windows. Some OSes fail to hangle a SIGILL gracefully, like Apple OSes. Apple machines corrupt memory and variables around the probe.
The inline ASM code now uses local variables to save the EAX-EDX registers, and then copies the locals into the function parameters. It side steps problems with calling conventions
Currently the CRYPTOPP_BOOL_XXX macros set the macro value to 0 or 1. If we remove setting the 0 value (the #else part of the expression), then the self tests speed up by about 0.3 seconds. I can't explain it, but I have observed it repeatedly.
This check-in prepares for the removal in Upstream master
I wish GCC would get its head out of its ass and define the apprpriate defines. NEON/ASIMD cannot be disgorged from Aarch32/Aarch64 just like SSE2 cannot be disgorged from x86_64. They are core instruction sets
Wrap DetectArmFeatures and DetectX86Features in InitializeCpu class
Use init_priority for InitializeCpu
Remove HAVE_GCC_CONSTRUCTOR1 and HAVE_GCC_CONSTRUCTOR0
Use init_seg(<name>) on Windows and explicitly insert at XCU segment
Simplify logic for HAVE_GAS
Remove special recipies for MACPORTS_GCC_COMPILER
Move C++ static initializers into anonymous namespace when possible
Add default NullNameValuePairs ctor for Clang
When MSVC init_seg or GCC init_priority is available, we don't need to use the Singleton. We only need to create a file scope class variable and place it in the segment for MSVC or provide the attribute for GCC.
An additional upside is we cleared all the memory leaks that used to be reported by MSVC for debug builds.
The macros that invoke GCC inline ASM have better code generation and speedup GCM ops by about 70 MiB/s on an Opteron 1100. The intrinsics are still available for Windows platforms and Visual Studio 2017 and above
It appears Apple Clang disgorges carryless multiply (PMULL) from Crypto (AES and SHA). The breakout added CRYPTOPP_BOOL_ARM_PMULL_INTRINSICS_AVAILABLE for PMULL, and retained CRYPTOPP_BOOL_ARM_CRYPTO_INTRINSICS_AVAILABLE for AES and SHA only
This should not be needed, but it does not hurt. According to Ian Lance Taylor (http://gcc.gnu.org/ml/gcc-help/2014-02/msg00023.html), the CC clobber causes GCC to forget its internal representation of flag state. It should not be needed for cpuid. However, Clang has some odd behave in a couple of versions of its compiler when using cpuid. Both JW and UB experienced it on separate occassions.