This PR adds ARMv8.4 cpu feature detection support. Previously we only needed ARMv8.1 and things were much easier. For example, ARMv8.1 `__ARM_FEATURE_CRYPTO` meant PMULL, AES, SHA-1 and SHA-256 were available. ARMv8.4 `__ARM_FEATURE_CRYPTO` means PMULL, AES, SHA-1, SHA-256, SHA-512, SHA-3, SM3 and SM4 are available.
We still use the same pattern as before. We make something available based on compiler version and/or preprocessor macros. But this time around we had to tighten things up a bit to ensure ARMv8.4 did not cross-pollinate down into ARMv8.1.
ARMv8.4 is largely untested at the moment. There is no hardware in the field and CI lacks QEMU with the relevant patches/support. We will probably have to revisit some of this stuff in the future.
Since this update applies to ARM gadgets we took the time to expand Android and iOS testing on Travis. Travis now tests more platforms, and includes Autotools and CMake builds, too.
The ALTIVEC function required an inline declaration. Lack of inline caused the self test failure. Two NEON functions needed the same. We also cleaned up constants in unnamed namespaces
The code below was flagged by undefined behavior santizier under GCC 8. The offender was the doubling at "r4 = vec_add(r4, r4)". R4 is rcon and an unsigned type. It depends on integer wrap but GCC is generating code that is being flagged for signed overflow. GCC 7 and below is OK.
for (unsigned int i=0; i<8; ++i)
{
r1 = Rijndael_Subkey_POWER8(r1, r4, r5);
r4 = vec_add(r4, r4);
skptr = IncrementPointerAndStore(r1, skptr);
}
// Final two rounds using table lookup
...
Power4 lacks 'vector long long'
Rename datatypes such as 'uint8x16_p8' to 'uint8x16_p'. Originally the p8 suffix indicated use with Power8 in-core crypto. We are now using Altivec/Power4 for general vector operations.
The problem was vec_sld is endian sensitive. The built-in required more than us setting up arguments to ensure the vsx load resulted in a big endian value. Thanks to Paul R on Stack Overflow for sharing the information that IBM did not provide. Also see http://stackoverflow.com/q/46341923/608639
Make internal functions static. We get better optimizations depsice using unnamed namespaces
Add PowerPC uint32x4 functions for handling 32-bit rcon and mask
This is AES-128 key expansion for big endian. Little endian has a bug in it so it can't be enabled at the moment. GDB is acting up on GCC112, so I've had trouble investigating it
We determine machine capabilities by performing an os/platform *query* first, like getauxv(). If the *query* fails, we move onto a cpu *probe*. The cpu *probe* tries to exeute an instruction and then catches a SIGILL on Linux or the exception EXCEPTION_ILLEGAL_INSTRUCTION on Windows. Some OSes fail to hangle a SIGILL gracefully, like Apple OSes. Apple machines corrupt memory and variables around the probe.