This cleans up the compile on old PowerMac G5s. Our Altivec and Crypto code relies on Power7 and Power8 extensions. There's no need to shoehorn Altivec and Power4 into old platforms, so we disable Altivec and Crypto unless Power7 is available. The GNUmakefile sets CRYPTOPP_DISABLE_ALTIVEC if Power7 is not available.
Enable aligned allocations under IBM XL C/C++. Based on the AIX malloc man pages, "... the block is aligned so that it can be used for any type of data". Previously CRYPTOPP_NO_ALIGNED_ALLOC was in effect.
Use malloc instead of calloc on OS X. Based on the OS X malloc man pages, "... the allocated memory is aligned such that it can be used for any data type, including AltiVec- and SSE-related types". Additionally, calloc zero'd the memory it allocated which slowed things down on Apple systems.
We determine machine capabilities by performing an os/platform *query* first, like getauxval(). If the *query* fails, we move on to a cpu *probe*. The cpu *probe* tries to execute an instruction and then catches a SIGILL on Linux or the exception EXCEPTION_ILLEGAL_INSTRUCTION on Windows. Some OSes, like Apple OSes, fail to handle a SIGILL gracefully. Apple machines corrupt memory and variables around the probe.
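A minimal sketch of the probe half on a POSIX system, assuming sigsetjmp/siglongjmp are acceptable for leaving the handler. The names CpuProbe, Harmless and RaisesSigIll are hypothetical and are not the library's code; a real probe would execute the instruction under test instead of calling raise().

```cpp
#include <cassert>
#include <setjmp.h>
#include <signal.h>

// Hypothetical names throughout; this only illustrates the probe pattern.
static sigjmp_buf g_probeEnv;            // jump target for the SIGILL handler

static void ProbeSigIllHandler(int) {
    siglongjmp(g_probeEnv, 1);           // leave the faulting code path
}

// Runs `candidate` and reports whether it finished without raising SIGILL.
bool CpuProbe(void (*candidate)()) {
    assert(candidate != nullptr);

    struct sigaction newAction = {}, oldAction = {};
    newAction.sa_handler = ProbeSigIllHandler;
    sigemptyset(&newAction.sa_mask);
    if (sigaction(SIGILL, &newAction, &oldAction) != 0)
        return false;                    // cannot install handler: assume absent

    volatile bool ok = false;
    if (sigsetjmp(g_probeEnv, 1) == 0) { // 1: save and later restore signal mask
        candidate();                     // would execute the test instruction
        ok = true;
    }

    sigaction(SIGILL, &oldAction, nullptr);  // always restore the old handler
    return ok;
}

// Stand-ins for "instruction is supported" and "instruction is illegal".
void Harmless() {}
void RaisesSigIll() { raise(SIGILL); }
```

CpuProbe(Harmless) reports success and CpuProbe(RaisesSigIll) reports failure. The sketch does nothing about platforms that mishandle SIGILL, which is why the query is tried first.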
The strategy of "cleanup under-aligned buffers" is not scaling well. Corner cases are still turning up. The library has some corner-case breaks, like old 32-bit Intels. And it still has not solved the AltiVec and Power8 alignment problems.
For now we are backing out the changes and investigating other strategies.
Currently the CRYPTOPP_BOOL_XXX macros set the macro value to 0 or 1. If we remove setting the 0 value (the #else part of the expression), then the self tests speed up by about 0.3 seconds. I can't explain it, but I have observed it repeatedly.
This check-in prepares for the removal in Upstream master.
I wish GCC would get its head out of its ass and define the appropriate defines. NEON/ASIMD cannot be divorced from Aarch32/Aarch64 just like SSE2 cannot be divorced from x86_64. They are core instruction sets.
CRYPTOPP_NO_UNALIGNED_DATA_ACCESS was required in Crypto++ 5.6 and earlier because unaligned data access was the norm. It caused problems at -O3 and on ARM NEON.
At Crypto++ 6.0, no unaligned data access became a first class citizen. Folks who want to allow it must now define CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS.
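To illustrate what avoiding unaligned data access looks like in practice, here is a small sketch. LoadWord32 is a hypothetical helper, not a library function; it reads a word through memcpy instead of casting the byte pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads a 32-bit word, in host byte order, from a byte
// pointer of any alignment.
inline std::uint32_t LoadWord32(const unsigned char* p) {
    assert(p != nullptr);
    std::uint32_t w;
    std::memcpy(&w, p, sizeof(w));   // well defined at any alignment
    return w;
    // The unaligned alternative, *reinterpret_cast<const std::uint32_t*>(p),
    // is undefined behavior when p is not suitably aligned.
}
```

Compilers usually lower the memcpy to a single load where the target allows it, so the safe form rarely costs anything.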
When compiling with Visual Studio 2015+, Crypto++ uses CryptoNG by
default. CryptoNG is only available on Windows Vista and later, and
Crypto++ currently ignores an explicit request to target Windows XP.
Unlike with other Windows SDK features, everything compiles, but the
application doesn't start on Windows XP because bcrypt.dll is missing.
That is an issue when updating Visual Studio because the root cause is
hard to find.
To fix this, use CryptoNG by default only when targeting Windows 8+,
regardless of the Visual Studio version.
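A sketch of how such a gate could be keyed to the target Windows version rather than the compiler version. EXAMPLE_USE_CNG and UsesCng are hypothetical names, not the library's macros; 0x0602 is the _WIN32_WINNT value for Windows 8.

```cpp
#include <cassert>

// Hypothetical macro name. Enable CryptoNG only when the build explicitly
// targets Windows 8 (0x0602) or later, whatever the compiler version is.
#if defined(_WIN32) && defined(_WIN32_WINNT) && (_WIN32_WINNT >= 0x0602)
# define EXAMPLE_USE_CNG 1
#else
# define EXAMPLE_USE_CNG 0
#endif

// Lets ordinary code observe the compile-time decision.
constexpr bool UsesCng() { return EXAMPLE_USE_CNG != 0; }
```

With this shape, a build that sets _WIN32_WINNT to a Windows XP value never references bcrypt.dll.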
This reverts commit eb3b27a6a5. The change broke GCC 4.8 and an unknown version of Clang on OS X. UB reported the OS X break, and JW duplicated the break on an ARM CubieTruck with GCC 4.8.
Most of these appear to have been cleared over the last couple of years.
C4127 is too prevalent. We are probably going to have to live with it.
We may be able to clear C4250 with a using statement. For example 'using ASN1CryptoMaterial::Load'.
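A sketch of the using-declaration idea on a small diamond. The class names are hypothetical stand-ins, and whether this silences C4250 on a given MSVC version is untested here; the sketch only shows that the declaration leaves behavior unchanged.

```cpp
#include <cassert>

// Hypothetical classes forming the kind of diamond that triggers C4250
// ("inherits via dominance").
struct Material {
    virtual ~Material() {}
    virtual int Load() { return 0; }
};
struct Asn1Material : virtual Material {
    int Load() override { return 1; }   // the dominating override
};
struct KeyMaterial : virtual Material {};

struct PublicKey : Asn1Material, KeyMaterial {
    using Asn1Material::Load;           // name the intended override explicitly
};
```

Calls through PublicKey or through a Material reference still reach Asn1Material::Load.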
MSVC resisted clearing C4661 by pushing/popping in iterhash.h and osrng.h. It was like MSVC simply ignored it.
Tune CRYPTOPP_ENABLE_ARIA_SSE2_INTRINSICS and CRYPTOPP_ENABLE_ARIA_SSSE3_INTRINSICS macros for older GCC and Clang. Clang needs some more tuning on Aarch64 because performance is off by about 15%.
Add additional NEON code paths.
Remove keyBits from Aarch64 code paths.
The SSSE3 intrinsics were performing aligned loads with _mm_load_si128 on user supplied pointers. The pointers are only byte pointers, so their alignment can drop to 1 or 2. Switching to _mm_loadu_si128 will sidestep potential problems. The crash surfaced under Win32 testing.
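A sketch of the unaligned-load pattern. Xor16 is a hypothetical helper, not taken from the ARIA code, and the scalar branch exists only so the example builds on non-x86 targets.

```cpp
#include <cassert>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
# include <emmintrin.h>
# define EXAMPLE_HAVE_SSE2 1
#endif

// Hypothetical helper: out[0..15] = in[0..15] ^ key[0..15]. The pointers are
// plain byte pointers, so nothing may be assumed about their alignment.
void Xor16(const unsigned char* in, const unsigned char* key, unsigned char* out) {
    assert(in != nullptr && key != nullptr && out != nullptr);
#if defined(EXAMPLE_HAVE_SSE2)
    // _mm_loadu_si128 and _mm_storeu_si128 accept any alignment;
    // _mm_load_si128 requires 16-byte alignment and can fault without it.
    const __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    const __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(key));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_xor_si128(a, b));
#else
    for (int i = 0; i < 16; ++i)
        out[i] = static_cast<unsigned char>(in[i] ^ key[i]);
#endif
}
```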
Switch to memcpy when performing bulk assignment x[0]=y[0] ... x[3]=y[3]. I believe Yun used the pattern to promote vectorization. Some compilers appear to be braindead and issue integer moves one word at a time. Non-braindead compilers will still take the optimization when advantageous, and slower compilers will benefit from the bulk move. We also cherry picked vectorization opportunities, like in ARIA_GSRK_NEON.
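The bulk-assignment change can be sketched as follows. CopyBlock and the word32 typedef are stand-ins for illustration and are not copied from aria.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

typedef std::uint32_t word32;   // stand-in for the library's 32-bit word type

// Hypothetical helper: copies one 4-word block in a single bulk move.
inline void CopyBlock(word32* x, const word32* y) {
    assert(x != nullptr && y != nullptr);
    // Replaces: x[0]=y[0]; x[1]=y[1]; x[2]=y[2]; x[3]=y[3];
    std::memcpy(x, y, 4 * sizeof(word32));
}
```

A compiler is free to turn the fixed-size memcpy into one vector move or four word moves, whichever it judges faster.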
Remove keyBits variable. We now use UncheckedSetKey's keylen throughout.
Also fix a typo in CRYPTOPP_BOOL_SSSE3_INTRINSICS_AVAILABLE. __SSSE3__ was listed twice.
Win32 and Win64 benefited from the Intel intrinsics. A32 and Aarch64 benefited from the ARM intrinsics. The intrinsics shaved 150 to 350 cycles from key setup.
The intrinsics slowed modern GCC down a small bit, and did not appear to affect old GCC. As such, Intel intrinsics were only enabled for Microsoft compilers.
We were not able to improve encryption and decryption. In fact, some of the attempted macro conversions and intrinsics attempts slowed things down considerably. For example, GCC 5.4 on x86_64 went from 120 MB/s to about 70 MB/s when we tried to improve code around the Key XOR Layer (ARIA_KXL).
Couple the use of initialization priorities to the absence of NO_OS_DEPENDENCE
Add comments explaining what Integer does, how it does it, and why we want to improve on the Singleton pattern as a resource manager.
Update documentation.
This effectively decouples Integer and Public Key from the rest of the library. The change means a compile time define is used rather than a runtime pointer. It avoids the race with Issue 389.
The Public Key algorithms will fail if you use them. For example, running the self tests with CRYPTOPP_NO_ASSIGN_TO_INTEGER in effect results in "CryptoPP::Exception caught: NameValuePairs: type mismatch for 'EquivalentTo', stored 'i', trying to retrieve 'N8CryptoPP7IntegerE'". The exception is expected, and the same happened when g_pAssignIntToInteger was present.
Wrap DetectArmFeatures and DetectX86Features in InitializeCpu class
Use init_priority for InitializeCpu
Remove HAVE_GCC_CONSTRUCTOR1 and HAVE_GCC_CONSTRUCTOR0
Use init_seg(<name>) on Windows and explicitly insert at XCU segment
Simplify logic for HAVE_GAS
Remove special recipes for MACPORTS_GCC_COMPILER
Move C++ static initializers into anonymous namespace when possible
Add default NullNameValuePairs ctor for Clang
When MSVC init_seg or GCC init_priority is available, we don't need to use the Singleton. We only need to create a file scope class variable and place it in the segment for MSVC or provide the attribute for GCC.
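A sketch of the file scope variable approach. ExampleInitializeCpu, g_cpuDetected and the priority value 250 are hypothetical choices for illustration, and the section name passed to init_seg is an assumption based on the XCU segment mentioned earlier in this log.

```cpp
#include <cassert>

namespace {

bool g_cpuDetected = false;   // hypothetical state; constant-initialized first

// Hypothetical stand-in for a class like InitializeCpu.
class ExampleInitializeCpu {
public:
    ExampleInitializeCpu() { g_cpuDetected = true; }  // feature detection goes here
};

#if defined(_MSC_VER)
// MSVC: initializers that follow in this file go into the named segment.
# pragma init_seg(".CRT$XCU")
const ExampleInitializeCpu s_init;
#elif defined(__GNUC__) && !defined(__APPLE__)
// GCC and compatible: lower numbers run earlier; 101..65535 are user values.
const ExampleInitializeCpu s_init __attribute__((init_priority(250)));
#else
// Fallback: ordinary dynamic initialization with no ordering control.
const ExampleInitializeCpu s_init;
#endif

}  // anonymous namespace

// True once the file scope object has been constructed.
bool CpuFeaturesReady() { return g_cpuDetected; }
```

Because s_init is an ordinary object with static storage duration, nothing is heap allocated, which fits the observation about leak reports in debug builds.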
An additional upside is we cleared all the memory leaks that used to be reported by MSVC for debug builds.