The CORE function provides the implementation for ChaCha_OperateKeystream_ALTIVEC, ChaCha_OperateKeystream_POWER7, BLAKE2_Compress32_ALTIVEC and BLAKE2_Compress32_POWER7. Depending on the options used to compile the source files, either POWER7 or ALTIVEC will be used.
This is needed to support the "new toolchain, ancient hardware" use case.
This is kind of tricky. We automatically drop from POWER7 to POWER4 if 7 is notavailable. However, if POWER7 is available the runtime test checks for HasAltivec(), and not HasPower7(), if the drop does not occur.
All of this goodness is happening on an old Apple G4 laptop with Gentoo. It is a "new toolchain on old hardware".
This is kind of tricky. We automatically drop from POWER7 to POWER4 if 7 is not available. However, if POWER7 is available the runtime test checks for HasAltivec(), and not HasPower7(), if the drop does not occur.
All of this goodness is happening on an old Apple G4 laptop with Gentoo. It is a "new toolchain on old hardware".
GCM can do some bulk XOR's using the SIMD unit. However, we still need loads and stores to be fast. Fast loads and stores of unaligned data requires the VSX unit
The way the existing ppc_simd.h is written makes it hard to to switch between the old Altivec loads and stores and the new POWER7 loads and stores. This checkin rewrites the wrappers to use _ALTIVEC_, _ARCH_PWR7 and _ARCH_PWR8. The wrappers in this file now honor -maltivec, -mcpu-power7 and -mcpu=power8. It allows users to compile a source file, like chacha_simd.cpp, with a lower ISA and things just work for them.
This costs about 0.6 cpb (700 MB/s on GCC112), but it makes the faster algorithm available to more machines. In the future we may want to provide both POWER7 and POWER8