The Crypto++ functions follow IBM's lead and provide VectorLoad, VectorLoadBE, VectorStore, and VectorStoreBE. Additionally, VectorLoadKey was removed in favor of vanilla VectorLoad.
Fix two compilation errors encountered with C++Builder (Starter Edition):
- In `cpu.cpp`, 0ccdc197b introduced a dependency on `_xgetbv()` from `<immintrin.h>` that doesn't exist on C++Builder. Enlist it for the workaround, similar to SunCC in 692ed2a2b.
- In `adv-simd.h`, `<pmmintrin.h>` is being #included under the `CRYPTOPP_SSE2_INTRIN_AVAILABLE` macro. This header, [which apparently provides SSE3 intrinsics](https://stackoverflow.com/a/11228864/1433768), is not shipped with C++Builder. (This section of code was recently downgraded from a SSSE3 to a SSE2 block in 09c8ae28, followed by moving away from `<immintrin.h>` in bc8da71a, followed by reintroducing the SSSE3 check in d1e646a5.) Split the SSE2 and SSSE3 cases such that `<pmmintrin.h>` is not #included for SSE2. This seems safe to do, because some `git grep` analysis shows that:
- `adv-simd.h` is not #included by any other header, but only directly #included by some `.cpp` files.
- Among those `.cpp` files, only `sm4-simd.cpp` has a `CRYPTOPP_SSE2_INTRIN_AVAILABLE` preprocessor block, and there it again includes the other two headers (`<emmintrin.h>` and `<xmmintrin.h>`).
NOTE: I was compiling via the IDE after [setting up a project file](https://github.com/tanzislam/cryptopals/wiki/Importing-into-Embarcadero-C%E2%94%BC%E2%94%BCBuilder-Starter-10.2#using-the-crypto-library). My compilation command was effectively:
```
bcc32c.exe -DCRYPTOPP_NO_CXX11 -DCRYPTOPP_DISABLE_SSSE3 -D__SSE2__ -D__SSE__ -D__MMX__
```
This PR adds ARMv8.4 cpu feature detection support. Previously we only needed ARMv8.1 and things were much easier. For example, ARMv8.1 `__ARM_FEATURE_CRYPTO` meant PMULL, AES, SHA-1 and SHA-256 were available. ARMv8.4 `__ARM_FEATURE_CRYPTO` means PMULL, AES, SHA-1, SHA-256, SHA-512, SHA-3, SM3 and SM4 are available.
We still use the same pattern as before. We make something available based on compiler version and/or preprocessor macros. But this time around we had to tighten things up a bit to ensure ARMv8.4 did not cross-pollinate down into ARMv8.1.
ARMv8.4 is largely untested at the moment. There is no hardware in the field and CI lacks QEMU with the relevant patches/support. We will probably have to revisit some of this stuff in the future.
Since this update applies to ARM gadgets we took the time to expand Android and iOS testing on Travis. Travis now tests more platforms, and includes Autotools and CMake builds, too.
This was needed a while ago but we mostly side-stepped the issues with casts. CHAM64 uses a word16 type for subkeys and a cast won't fix it because we favor word32 for 64-bit block sizes.
The ALTIVEC function required an inline declaration. Lack of inline caused the self test failure. Two NEON functions needed the same. We also cleaned up constants in unnamed namespaces
Analysis tools are generating findings when the pointer xorBlocks is used as the flag. The other missing piece is, xorBlocks is never NULL when either BT_XorInput or BT_XorOuput. But we don't know how to train the analyzers with the information, so we make it explicit with the boolean flags xorInput and xorOutput.
Switching to the explicit flags costs us about 0.01 cpb on a modern Intel Core processor. In the typical case 0.01 is negligible.
We were cating UBsan findings under Clang similar to "adv-simd.h:1138:26: runtime error: addition of unsigned offset to 0x000002d41410 overflowed to 0x000002d41400". The problem was CRYPTOPP_CONSTANT, which used an enum. The compiler is allowed to pick the underlying data type, and Clang was picking a signed type
This also fixes the SPECK64 bug where CTR mode self tests fail. It was an odd failure because it only affected 64-bit SPECK. SIMON was fine and it used nearly the same code. We tracked it down through trial and error to the table based rotates.