We are going to keep the bug report open until we get some official test vectors. We will probably have to modify one of the Blake team's test programs since they did not publish test vectors using salt or personalization
Use PowerPC unaligned loads and stores with Power8. Formerly we were using Power7 as the floor because the IBM POWER Architecture manuals said unaligned loads and stores were available. However, some compilers generate bad code for unaligned loads and stores using `-march=power7`, so bump to a known good.
The ParameterBlocks for BLAKE2 had undefined behavior. We relied on the compiler packing the bytes in the structure, then we used the first byte as the start of an array.
This rewrite does things correctly. We don't memset the structure, and we don't treat the structure as a contiguous array.
The CORE function provides the implementation for ChaCha_OperateKeystream_ALTIVEC, ChaCha_OperateKeystream_POWER7, BLAKE2_Compress32_ALTIVEC and BLAKE2_Compress32_POWER7. Depending on the options used to compile the source files, either POWER7 or ALTIVEC will be used.
This is needed to support the "new toolchain, ancient hardware" use case.
This is kind of tricky. We automatically drop from POWER7 to POWER4 if 7 is not available. However, if POWER7 is available the runtime test checks for HasAltivec(), and not HasPower7(), if the drop does not occur.
All of this goodness is happening on an old Apple G4 laptop with Gentoo. It is a "new toolchain on old hardware".
This commit also favors init priorities over C++ dynamic initialization. After the std::call_once problems on Sparc and PowerPC I'm worried about problems with Dynamic Initialization and Destruction with Concurrency.
We also do away with supressing warnings and use CRYPTOPP_UNUSED instead.
Both BLAKE2 and SPECK slow down when using NEON/ASIMD. When just BLAKE2 experienced the issue, it was a one-off problem. Its now wider than a one-off, so add the formal define
There was no aliasing violation in practice. We used a to assign the right pointer. If the compiler would have removed the unneeded assignment based on T_64bit, then we would not have been flagged.