This PR adds ARMv8.4 cpu feature detection support. Previously we only needed ARMv8.1 and things were much easier. For example, ARMv8.1 `__ARM_FEATURE_CRYPTO` meant PMULL, AES, SHA-1 and SHA-256 were available. ARMv8.4 `__ARM_FEATURE_CRYPTO` means PMULL, AES, SHA-1, SHA-256, SHA-512, SHA-3, SM3 and SM4 are available.
We still use the same pattern as before. We make something available based on compiler version and/or preprocessor macros. But this time around we had to tighten things up a bit to ensure ARMv8.4 did not cross-pollinate down into ARMv8.1.
ARMv8.4 is largely untested at the moment. There is no hardware in the field and CI lacks QEMU with the relevant patches/support. We will probably have to revisit some of this stuff in the future.
Since this update applies to ARM gadgets we took the time to expand Android and iOS testing on Travis. Travis now tests more platforms, and includes Autotools and CMake builds, too.
We were able to gut CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS for everything except Rijndael. Rijndael uses unaligned accesses on x86 to harden against timing attacks.
There's a little more to CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS and Rijndael. If we remove unaligned access then AliasedWithTable hangs in an endless loop on non-AESNI machines. So care must be taken when trying to remove the vestige from Rijndael.
Visual Studio 2008 kind of forced out hand with this. VS2008 lacks <stdint.h> and <cstdint> and it caused compile problems in NaCl gear. We were being a tad bit lazy by relying on int8_t, int32_t and int64_t, but the compiler errors made us act
Perforance increases significantly, but there's still room for improvement. Even OpenSSL's numbers are relatively dull. We expect Power8's SHA-256 to be somewhere between 2 to 8 cpb but we are not hitting them.
SHA-256, GCC112 (ppc64-le): C++ 23.43, Power8 13.24 cpb (+ 110 MiB/s)
SHA-256, GCC119 (ppc64-be): C++ 10.16, Power8 9.74 cpb (+ 50 MiB/s)
SHA-512, GCC112 (ppc64-le): C++ 14.00, Power8 9.25 cpb (+ 150 MiB/s)
SHA-512, GCC119 (ppc64-be): C++ 21.05, Power8 6.17 cpb (+ 450 MiB/s)
$ CXXFLAGS=-std=gnu++17 gmake
clang++ -std=gnu++17 -fPIC -pthread -pipe -c cryptlib.cpp
In file included from cryptlib.cpp:19:
./misc.h:2542:43: error: no member named 'bind2nd' in namespace 'std'
return std::find_if(first, last, std::bind2nd(std::not_equal_to<T>(), value));
~~~~~^
1 error generated.
It is easier to defer to the runtime for aligned allocations. We found the preprocessor macros needed to identitify the availability. Also see https://forum.kde.org/viewtopic.php?p=66274
It is easier to defer to the runtime for aligned allocations. We found the preprocessor macros needed to identitify the availability. Also see https://forum.kde.org/viewtopic.php?p=66274
Well, the IBM docs were not quite correct when they stated "The block is aligned so that it can be used for any type of data". The vector data types are pretty standard, even across different machines from diffent manufacturers
TweetNaCl is a compact reimplementation of the NaCl library by Daniel J. Bernstein, Bernard van Gastel, Wesley Janssen, Tanja Lange, Peter Schwabe and Sjaak Smetsers. The library is less than 20 KB in size and provides 25 of the NaCl library functions.
The compact library uses curve25519, XSalsa20, Poly1305 and SHA-512 as default primitives, and includes both x25519 key exchange and ed25519 signatures. The complete list of functions can be found in TweetNaCl: A crypto library in 100 tweets (20140917), Table 1, page 5.
Crypto++ retained the function names and signatures but switched to data types provided by <stdint.h> to promote interoperability with Crypto++ and avoid size problems on platforms like Cygwin. For example, NaCl typdef'd u64 as an unsigned long long, but Cygwin, MinGW and MSYS are LP64 systems (not LLP64 systems). In addition, Crypto++ was missing NaCl's signed 64-bit integer i64.
Crypto++ enforces the 0-key restriction due to small points. The TweetNaCl library allowed the 0-keys to small points. Also see RFC 7748, Elliptic Curves for Security, Section 6.
TweetNaCl is well written but not well optimized. It runs 2x to 3x slower than optimized routines from libsodium. However, the library is still 2x to 4x faster than the algorithms NaCl was designed to replace.
The Crypto++ wrapper for TweetNaCl requires OS features. That is, NO_OS_DEPENDENCE cannot be defined. It is due to TweetNaCl's internal function randombytes. Crypto++ used DefaultAutoSeededRNG within randombytes, so OS integration must be enabled. You can use another generator like RDRAND to avoid the restriction.
Power4 lacks 'vector long long'
Rename datatypes such as 'uint8x16_p8' to 'uint8x16_p'. Originally the p8 suffix indicated use with Power8 in-core crypto. We are now using Altivec/Power4 for general vector operations.
Both BLAKE2 and SPECK slow down when using NEON/ASIMD. When just BLAKE2 experienced the issue, it was a one-off problem. Its now wider than a one-off, so add the formal define