Some comments in config.h were old. Time for a refresh.
Switch from CRYPTOPP_BOOL_ARM64 to CRYPTOPP_BOOL_ARMV8. Aarch32 is ARMv8, and that's the important part.
Fix two compilation errors encountered with C++Builder (Starter Edition):
- In `cpu.cpp`, 0ccdc197b introduced a dependency on `_xgetbv()` from `<immintrin.h>` that doesn't exist on C++Builder. Enlist it for the workaround, similar to SunCC in 692ed2a2b.
- In `adv-simd.h`, `<pmmintrin.h>` is being #included under the `CRYPTOPP_SSE2_INTRIN_AVAILABLE` macro. This header, [which apparently provides SSE3 intrinsics](https://stackoverflow.com/a/11228864/1433768), is not shipped with C++Builder. (This section of code was recently downgraded from a SSSE3 to a SSE2 block in 09c8ae28, followed by moving away from `<immintrin.h>` in bc8da71a, followed by reintroducing the SSSE3 check in d1e646a5.) Split the SSE2 and SSSE3 cases such that `<pmmintrin.h>` is not #included for SSE2. This seems safe to do, because some `git grep` analysis shows that:
- `adv-simd.h` is not #included by any other header, but only directly #included by some `.cpp` files.
- Among those `.cpp` files, only `sm4-simd.cpp` has a `CRYPTOPP_SSE2_INTRIN_AVAILABLE` preprocessor block, and there it again includes the other two headers (`<emmintrin.h>` and `<xmmintrin.h>`).
NOTE: I was compiling via the IDE after [setting up a project file](https://github.com/tanzislam/cryptopals/wiki/Importing-into-Embarcadero-C%E2%94%BC%E2%94%BCBuilder-Starter-10.2#using-the-crypto-library). My compilation command was effectively:
```
bcc32c.exe -DCRYPTOPP_NO_CXX11 -DCRYPTOPP_DISABLE_SSSE3 -D__SSE2__ -D__SSE__ -D__MMX__
```
This PR adds ARMv8.4 cpu feature detection support. Previously we only needed ARMv8.1 and things were much easier. For example, ARMv8.1 `__ARM_FEATURE_CRYPTO` meant PMULL, AES, SHA-1 and SHA-256 were available. ARMv8.4 `__ARM_FEATURE_CRYPTO` means PMULL, AES, SHA-1, SHA-256, SHA-512, SHA-3, SM3 and SM4 are available.
We still use the same pattern as before. We make something available based on compiler version and/or preprocessor macros. But this time around we had to tighten things up a bit to ensure ARMv8.4 did not cross-pollinate down into ARMv8.1.
ARMv8.4 is largely untested at the moment. There is no hardware in the field and CI lacks QEMU with the relevant patches/support. We will probably have to revisit some of this stuff in the future.
Since this update applies to ARM gadgets we took the time to expand Android and iOS testing on Travis. Travis now tests more platforms, and includes Autotools and CMake builds, too.
There are no corresponding defines in config.h at the moment. Programs will have to use the preprocessor macros __AVX__ and __AVX2__ to determine when they are available.
It looks like the 0 return value for _SC_LEVEL1_DCACHE_LINESIZE is not a 1-off problem with PPC. It appears Glibc regularly returns 0 instead of failure. Also see https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/posix/sysconf.c.
We were OK before the change. The difference now is, we expect all Glibc queries to misbehave
If CRYPTOPP_GETAUXV_AVAILABLE is undefined, getauxval function is
defined to return 0 however AT_HWCAP and AT_HWCAP2 are not defined so
compilation on toolchain without getauxval and these variables such as
uclibc-ng will fail.
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
This was a latent bug that just surfaced on a Sun Core2 workstation. RDSEED caused an illegal instruction exception on the Core2. It seems we managed to miss it because old processors had family and stepping values so low they never set CPUID.EBX.RDSEED[bit 18] = 1. Newer processors had the feature so CPUID.EBX.RDSEED[bit 18] = 1 was accurate.
We need to ensure SSE2 does not cross pollinate into other CPU functions since SSE2 is greater than the minimum arch. The minimum arch is i586/i686, and both lack SSE2 instructions
This avoids the probe for SSE2 in most circumstances. The SSE2 test is mostly benign nowadays since SSE2 and OS support is nearly ubiquitous. But the define CRYPTOPP_NO_CPU_FEATURE_PROBES added for Apple OSes was interacting badly on x86 machines. Also see GH #511.
We determine machine capabilities by performing an os/platform *query* first, like getauxv(). If the *query* fails, we move onto a cpu *probe*. The cpu *probe* tries to exeute an instruction and then catches a SIGILL on Linux or the exception EXCEPTION_ILLEGAL_INSTRUCTION on Windows. Some OSes fail to hangle a SIGILL gracefully, like Apple OSes. Apple machines corrupt memory and variables around the probe.
The inline ASM code now uses local variables to save the EAX-EDX registers, and then copies the locals into the function parameters. It side steps problems with calling conventions
Currently the CRYPTOPP_BOOL_XXX macros set the macro value to 0 or 1. If we remove setting the 0 value (the #else part of the expression), then the self tests speed up by about 0.3 seconds. I can't explain it, but I have observed it repeatedly.
This check-in prepares for the removal in Upstream master
I wish GCC would get its head out of its ass and define the apprpriate defines. NEON/ASIMD cannot be disgorged from Aarch32/Aarch64 just like SSE2 cannot be disgorged from x86_64. They are core instruction sets
Wrap DetectArmFeatures and DetectX86Features in InitializeCpu class
Use init_priority for InitializeCpu
Remove HAVE_GCC_CONSTRUCTOR1 and HAVE_GCC_CONSTRUCTOR0
Use init_seg(<name>) on Windows and explicitly insert at XCU segment
Simplify logic for HAVE_GAS
Remove special recipies for MACPORTS_GCC_COMPILER
Move C++ static initializers into anonymous namespace when possible
Add default NullNameValuePairs ctor for Clang
When MSVC init_seg or GCC init_priority is available, we don't need to use the Singleton. We only need to create a file scope class variable and place it in the segment for MSVC or provide the attribute for GCC.
An additional upside is we cleared all the memory leaks that used to be reported by MSVC for debug builds.
The macros that invoke GCC inline ASM have better code generation and speedup GCM ops by about 70 MiB/s on an Opteron 1100. The intrinsics are still available for Windows platforms and Visual Studio 2017 and above
It appears Apple Clang disgorges carryless multiply (PMULL) from Crypto (AES and SHA). The breakout added CRYPTOPP_BOOL_ARM_PMULL_INTRINSICS_AVAILABLE for PMULL, and retained CRYPTOPP_BOOL_ARM_CRYPTO_INTRINSICS_AVAILABLE for AES and SHA only
This should not be needed, but it does not hurt. According to Ian Lance Taylor (http://gcc.gnu.org/ml/gcc-help/2014-02/msg00023.html), the CC clobber causes GCC to forget its internal representation of flag state. It should not be needed for cpuid. However, Clang has some odd behave in a couple of versions of its compiler when using cpuid. Both JW and UB experienced it on separate occassions.