Perforance increases significantly, but there's still room for improvement. Even OpenSSL's numbers are relatively dull. We expect Power8's SHA-256 to be somewhere between 2 to 8 cpb but we are not hitting them.
SHA-256, GCC112 (ppc64-le): C++ 23.43, Power8 13.24 cpb (+ 110 MiB/s)
SHA-256, GCC119 (ppc64-be): C++ 10.16, Power8 9.74 cpb (+ 50 MiB/s)
SHA-512, GCC112 (ppc64-le): C++ 14.00, Power8 9.25 cpb (+ 150 MiB/s)
SHA-512, GCC119 (ppc64-be): C++ 21.05, Power8 6.17 cpb (+ 450 MiB/s)
We determine machine capabilities by performing an os/platform *query* first, like getauxv(). If the *query* fails, we move onto a cpu *probe*. The cpu *probe* tries to exeute an instruction and then catches a SIGILL on Linux or the exception EXCEPTION_ILLEGAL_INSTRUCTION on Windows. Some OSes fail to hangle a SIGILL gracefully, like Apple OSes. Apple machines corrupt memory and variables around the probe.