These changes made it in by accident at Commit b74a6f4445. We were going to try to let them ride but they broke versioning. They may be added later but we should avoid the change at this time.
We also switched to "IsAligned<HashWordType>(input)". Using word64 was due to debug testing on Solaris (the alignment check is needed). Hard coding word64 should not have been checked in.
It looks like GetAlignmentOf was returning the "UnsignedMin(4U, sizeof(T))" for SunCC. It was causing SIGBUSes on Sparc when T=word64. OpenCSW provided access to their build farm and we were able to test "__alignof__(T)" back to an early SunCC on Solaris 9.
Man, Sparc does not mess around with unaligned buffers. Without -xmemalign=4i the hardware wants 8-byte aligned word64's so it can use the high performance 64-bit move or add.
Since we do not use -xmemalign we get the default behavior of either -xmemalgin=8i or -xmemalgin=8s. It shoul dnot matter to us since we removed unaligned data access at GH #682.
IteratedHashBase::Update aligns the buffer, but IteratedHashBase::HashBlock does not. It was causing a fair number of asserts to fire when the code was instrumented with alignment checks. Linux benchmarks shows the code does not run materially slower on i686 or x86_64.
I believe Andrew Marlow first reported it. At the time we could not get our hands on hardware to fully test things. Instead we were using -xmemalign=4i option as a band-aide to avoid running afoul of the Sparc instruction that moves 64-bits of data in one shot.
Local scopes and loading the constants with _mm_set_epi32 saves about 0.03 cpb. It does not sound like much but it improves GMAC by about 500 MB/s. GMAC is just shy of 8 GB/s.
We got reports that x86_64 was producing incorrect results. Also, the problem persisted in i386 builds. I don't think we can work around this issue. Oracle must fix it.
I think we have this issue somewhat sorted out. First, there is a compiler bug. Second, it seems to be triggered when function parameters mix const and non-const references. Third, to work around it, all parameters need to be non-const (as in this patch).
I'm really glad we kind of got to the bottom of things. The crash when compiling GCM has been bothering me for nearly 3 years.