This one should have been fixed before the Crypto++ 6.1 release. Its no big deal, however. Power8 accelerated SHA-256 is 1.5x to 2x slower than straight C++. SHA-512 may be better, but the implementation is not ready to performance test.
This was a latent bug that just surfaced on a Sun Core2 workstation. RDSEED caused an illegal instruction exception on the Core2. It seems we managed to miss it because old processors had family and stepping values so low they never set CPUID.EBX.RDSEED[bit 18] = 1. Newer processors had the feature so CPUID.EBX.RDSEED[bit 18] = 1 was accurate.
Power4 lacks 'vector long long'
Rename datatypes such as 'uint8x16_p8' to 'uint8x16_p'. Originally the p8 suffix indicated use with Power8 in-core crypto. We are now using Altivec/Power4 for general vector operations.
We need to ensure SSE2 does not cross pollinate into other CPU functions since SSE2 is greater than the minimum arch. The minimum arch is i586/i686, and both lack SSE2 instructions
We determine machine capabilities by performing an os/platform *query* first, like getauxv(). If the *query* fails, we move onto a cpu *probe*. The cpu *probe* tries to exeute an instruction and then catches a SIGILL on Linux or the exception EXCEPTION_ILLEGAL_INSTRUCTION on Windows. Some OSes fail to hangle a SIGILL gracefully, like Apple OSes. Apple machines corrupt memory and variables around the probe.
This is the forward direction on encryption only. Crypto++ uses the "Equivalent Inverse Cipher" (FIPS-197, Section 5.3.5, p.23), and it is not compatible with IBM hardware. The library library will need to re-work the decryption key scheduling routines. (We may be able to work around it another way, but I have not investigated it).