We got reports that x86_64 was producing incorrect results. Also, the problem persisted in i386 builds. I don't think we can work around this issue. Oracle must fix it.
This PR adds ARMv8.4 cpu feature detection support. Previously we only needed ARMv8.1 and things were much easier. For example, ARMv8.1 `__ARM_FEATURE_CRYPTO` meant PMULL, AES, SHA-1 and SHA-256 were available. ARMv8.4 `__ARM_FEATURE_CRYPTO` means PMULL, AES, SHA-1, SHA-256, SHA-512, SHA-3, SM3 and SM4 are available.
We still use the same pattern as before. We make something available based on compiler version and/or preprocessor macros. But this time around we had to tighten things up a bit to ensure ARMv8.4 did not cross-pollinate down into ARMv8.1.
ARMv8.4 is largely untested at the moment. There is no hardware in the field and CI lacks QEMU with the relevant patches/support. We will probably have to revisit some of this stuff in the future.
Since this update applies to ARM gadgets we took the time to expand Android and iOS testing on Travis. Travis now tests more platforms, and includes Autotools and CMake builds, too.
We were able to gut CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS for everything except Rijndael. Rijndael uses unaligned accesses on x86 to harden against timing attacks.
There's a little more to CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS and Rijndael. If we remove unaligned access then AliasedWithTable hangs in an endless loop on non-AESNI machines. So care must be taken when trying to remove the vestige from Rijndael.
Visual Studio 2008 kind of forced out hand with this. VS2008 lacks <stdint.h> and <cstdint> and it caused compile problems in NaCl gear. We were being a tad bit lazy by relying on int8_t, int32_t and int64_t, but the compiler errors made us act
Perforance increases significantly, but there's still room for improvement. Even OpenSSL's numbers are relatively dull. We expect Power8's SHA-256 to be somewhere between 2 to 8 cpb but we are not hitting them.
SHA-256, GCC112 (ppc64-le): C++ 23.43, Power8 13.24 cpb (+ 110 MiB/s)
SHA-256, GCC119 (ppc64-be): C++ 10.16, Power8 9.74 cpb (+ 50 MiB/s)
SHA-512, GCC112 (ppc64-le): C++ 14.00, Power8 9.25 cpb (+ 150 MiB/s)
SHA-512, GCC119 (ppc64-be): C++ 21.05, Power8 6.17 cpb (+ 450 MiB/s)
$ CXXFLAGS=-std=gnu++17 gmake
clang++ -std=gnu++17 -fPIC -pthread -pipe -c cryptlib.cpp
In file included from cryptlib.cpp:19:
./misc.h:2542:43: error: no member named 'bind2nd' in namespace 'std'
return std::find_if(first, last, std::bind2nd(std::not_equal_to<T>(), value));
~~~~~^
1 error generated.
It is easier to defer to the runtime for aligned allocations. We found the preprocessor macros needed to identitify the availability. Also see https://forum.kde.org/viewtopic.php?p=66274
It is easier to defer to the runtime for aligned allocations. We found the preprocessor macros needed to identitify the availability. Also see https://forum.kde.org/viewtopic.php?p=66274
Well, the IBM docs were not quite correct when they stated "The block is aligned so that it can be used for any type of data". The vector data types are pretty standard, even across different machines from diffent manufacturers
TweetNaCl is a compact reimplementation of the NaCl library by Daniel J. Bernstein, Bernard van Gastel, Wesley Janssen, Tanja Lange, Peter Schwabe and Sjaak Smetsers. The library is less than 20 KB in size and provides 25 of the NaCl library functions.
The compact library uses curve25519, XSalsa20, Poly1305 and SHA-512 as default primitives, and includes both x25519 key exchange and ed25519 signatures. The complete list of functions can be found in TweetNaCl: A crypto library in 100 tweets (20140917), Table 1, page 5.
Crypto++ retained the function names and signatures but switched to data types provided by <stdint.h> to promote interoperability with Crypto++ and avoid size problems on platforms like Cygwin. For example, NaCl typdef'd u64 as an unsigned long long, but Cygwin, MinGW and MSYS are LP64 systems (not LLP64 systems). In addition, Crypto++ was missing NaCl's signed 64-bit integer i64.
Crypto++ enforces the 0-key restriction due to small points. The TweetNaCl library allowed the 0-keys to small points. Also see RFC 7748, Elliptic Curves for Security, Section 6.
TweetNaCl is well written but not well optimized. It runs 2x to 3x slower than optimized routines from libsodium. However, the library is still 2x to 4x faster than the algorithms NaCl was designed to replace.
The Crypto++ wrapper for TweetNaCl requires OS features. That is, NO_OS_DEPENDENCE cannot be defined. It is due to TweetNaCl's internal function randombytes. Crypto++ used DefaultAutoSeededRNG within randombytes, so OS integration must be enabled. You can use another generator like RDRAND to avoid the restriction.
Power4 lacks 'vector long long'
Rename datatypes such as 'uint8x16_p8' to 'uint8x16_p'. Originally the p8 suffix indicated use with Power8 in-core crypto. We are now using Altivec/Power4 for general vector operations.
Both BLAKE2 and SPECK slow down when using NEON/ASIMD. When just BLAKE2 experienced the issue, it was a one-off problem. Its now wider than a one-off, so add the formal define
* Set default target Windows version for MinGW to XP
The original MinGW from mingw.org targets Windows 2000 by default, but lacks
the <wspiapi.h> include needed for Windows 2000 support.
* Disable CRYPTOPP_CXX11_SYNCHRONIZATION for original MinGW
std::mutex is only available in libstdc++ if _GLIBCXX_HAS_GTHREADS is defined,
which is not the case for original MinGW. Make the existing fix for AIX more
general to fix this. Unfortunately, any C++ header has to be included to
detect the standard library and the otherwise empty <ciso646> is going to be
removed from C++20, so use <cstddef> instead.
This cleans up the compile on old PwerMac G5's. Our Altivec and Crypto code relies on Power7 and Power8 extensions. There's no need to shoehorn Altivec and Power4 into old platforms, so we disable Altivec and Crypto unless Power7 is available. The GNUmakefile sets CRYPTOPP_DISABLE_ALTIVEC if Power7 is not available.
Enable aligned allocations under IBM XL C/C++. Based on the AIX malloc man pages, "... the block is aligned so that it can be used for any type of data". Previously CRYPTOPP_NO_ALIGNED_ALLOC was in effect.
Use malloc instead of calloc on OS X. Based on the OS X malloc man pages, "... the allocated memory is aligned such that it can be used for any data type, including AltiVec- and SSE-related types". Additionally, calloc zero'd the memory it allocated which slowed things down on Apple systems.
We determine machine capabilities by performing an os/platform *query* first, like getauxv(). If the *query* fails, we move onto a cpu *probe*. The cpu *probe* tries to exeute an instruction and then catches a SIGILL on Linux or the exception EXCEPTION_ILLEGAL_INSTRUCTION on Windows. Some OSes fail to hangle a SIGILL gracefully, like Apple OSes. Apple machines corrupt memory and variables around the probe.
The strategy of "cleanup under-aligned buffers" is not scaling well. Corner cases are still turing up. The library has some corner-case breaks, like old 32-bit Intels. And it still has not solved the AltiVec and Power8 alignment problems.
For now we are backing out the changes and investigating other strategies
Currently the CRYPTOPP_BOOL_XXX macros set the macro value to 0 or 1. If we remove setting the 0 value (the #else part of the expression), then the self tests speed up by about 0.3 seconds. I can't explain it, but I have observed it repeatedly.
This check-in prepares for the removal in Upstream master
I wish GCC would get its head out of its ass and define the apprpriate defines. NEON/ASIMD cannot be disgorged from Aarch32/Aarch64 just like SSE2 cannot be disgorged from x86_64. They are core instruction sets
CRYPTOPP_NO_UNALIGNED_DATA_ACCESS was required in Crypto++ 5.6 and earlier because unaligned data access was the norm. It caused problems at -O3 and on ARM NEON.
At Crypto++ 6.0 no unaligned data access became a first class citizen. Folks who want to allow it must now define CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS
When compiling with Visual Studio 2015+, Crypto++ uses CryptoNG by
default. CryptoNG is only available on Windows Vista and later and
Crypto++ currently ignores if the user explicitly wants to target
Windows XP. Unlike with other Windows SDK features, everything
compiles, but the application doesn't start on Windows XP because
bcrypt.dll is missing. That is an issue when updating Visual Studio
because the root cause is hard to find.
Making use of CryptoNG when targeting Windows 8+ instead by default,
regardless of the Visual Studio version, to fix this.
This reverts commit eb3b27a6a5. The change broke GCC 4.8 and unknown version of Clang on OS X. UB reported the OS X break, and JW found duplicated the break on a ARM CubieTruck with GCC 4.8.
Most of these appear to have been cleared over the last couple of years.
C4127 is too prevelant. We are probably going to have to live with it.
We may be able to clear C4250 with a using statement. For example 'using ASN1CryptoMaterial::Load'.
MSVC resisted clearing C4661 by pushing/poping in iterhash.h and osrng.h. It was like MSVC simply ignored it.
Tune CRYPTOPP_ENABLE_ARIA_SSE2_INTRINSICS and CRYPTOPP_ENABLE_ARIA_SSSE3_INTRINSICS macro for older GCC and Clang. Clang needs some more tuning on Aarch64 becuase performance is off by about 15%.
Add additional NEON code paths.
Remove keyBits from Aarch64 code paths.
The SSSE3 intrinsics were performing aligned loads using _mm_load_si128 using user supplied pointers. The pointers are only a byte pointer, so its alignment can drop to 1 or 2. Switching to _mm_loadu_si128 will sidestep potential problems. The crash surfaced under Win32 testing.
Switch to memcpy's when performing bulk assignment x[0]=y[0] ... x[3]=y[3]. I believe Yun used the pattern to promote vectorization. Some compilers appear to be braindead and issue integer move's one word at a time. Non-braindead compiler will still take the optimization when advantageous, and slower compilers will benefit from the bulk move. We also cherry picked vectorization opportunities, like in ARIA_GSRK_NEON.
Remove keyBits variable. We now use UncheckedSetKey's keylen throughout.
Also fix a typo in CRYPTOPP_BOOL_SSSE3_INTRINSICS_AVAILABLE. __SSSE3__ was listed twice.