Use PowerPC unaligned loads and stores with Power8. Formerly we were using Power7 as the floor because the IBM POWER Architecture manuals said unaligned loads and stores were available. However, some compilers generate bad code for unaligned loads and stores using `-march=power7`, so bump to a known good.
Some comments in config.h were old. Time for a refresh.
Switch from CRYPTOPP_BOOL_ARM64 to CRYPTOPP_BOOL_ARMV8. Aarch32 is ARMv8, and that's the important part.
Autotools sets up its config.h file with the '#define XXX 0' or '#define XXX 1' pattern. This check-in makes the sources Autotools aware. We need to verify CMake does the same
Commit afbd3e60f6 effectively treated a symptom and not the underlying problem. The problem was linkers on 32-bit systems ignore CRYPTOPP_ALIGN_DAT(16) passed down by the compiler and align to 8-bytes or less. We have to use Wei's original code in some places. It is not a bad thing, but the bit fiddling is something we would like to contain a little more by depending more on language or platform features.
This commit keeps the original changes which improve partial specializations; but fixes 32-bit linker behavior by effectively reverting afbd3e60f6 and e054d36dc8. We also add more comments so the next person has understands why things are done they way they are.
This allocator still has some demons buried inside due to the bit fiddling. This commit should isolate the demons to aligned stack allocations when an alignment facility from the platform or OS is not available. That is, we use CRYPTOPP_ALIGN_DATA when we can because it is most reliable.
We can tell when things have gone sideways using Debug builds. The CRYPTOPP_ASSERT(m_allocated) will fire on destruction because the flag gets overwritten.
Commit 3ed38e42f6 added the POWER8 infrastructure for GCM mode. It also added GCM_SetKeyWithoutResync_VMULL, GCM_Multiply_VMULL and GCM_Reduce_VMULL. This commit adds the remainder, which includes GCM_AuthenticateBlocks_VMULL.
GCC is OK on Linux (ppc64-le) and AIX (ppc64-be). We may need some touchups for XLC compiler
GCM_SetKeyWithoutResync_VMULL, GCM_Multiply_VMULL and GCM_Reduce_VMULL work as expected on Linux (ppc64-le) and AIX (ppc64-be). We are still working on GCM_AuthenticateBlocks_VMULL.