We think this is another instance of the problem that surfaced under GH #683 when inString == outString. The aliasing violates the compiler's assumptions, and the optimizer begins removing code.
The ultimate workaround was to add a member variable, m_tempOutString, as scratch space for when inString == outString. For some reason we did not lose much in the way of performance. It looks like AES/CTR lost about 0.03-0.05 cpb.
When combined with the updated xorbuf from GH #1020, the net result was a speedup of 0.1-0.6 cpb. In fact, some ciphers, like RC6, gained almost 5 cpb.
We found we can avoid the memcpy in the previous workaround by using a volatile pointer. The pointer appears to tame the optimizer so the compiler does not short-circuit some calls when outString == inString.
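Roughly, the pattern is the sketch below. The function and parameter names are hypothetical and this is not the library's actual code; the point is only that the output is written through a volatile-qualified pointer, so the store cannot be elided when the buffers alias.

    #include <cstddef>
    typedef unsigned char byte;

    // Hypothetical sketch: write through a volatile pointer so the
    // optimizer cannot short-circuit the store when outString == inString.
    void XorKeystream(byte* outString, const byte* inString,
                      const byte* keystream, size_t length)
    {
        volatile byte* out = outString;   // tames the optimizer
        for (size_t i = 0; i < length; ++i)
            out[i] = inString[i] ^ keystream[i];
    }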
So it looks like sed added a '\r' between the closing paren and the semicolon. Grepping for '^;' showed no hits because the '\r' (and the semicolon after it) was considered part of the previous line. I finally had to write a C program to properly identify and fix those damn stray semicolons.
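A rough C++ sketch of that kind of scanner (not the actual program used) just looks for the ')' '\r' ';' byte sequence:

    #include <cstdio>
    #include <fstream>
    #include <iterator>
    #include <string>

    // Sketch: report offsets where a ')' is followed by '\r' and then ';'.
    int main(int argc, char* argv[])
    {
        if (argc < 2) {
            std::fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        std::ifstream file(argv[1], std::ios::binary);
        std::string data((std::istreambuf_iterator<char>(file)),
                         std::istreambuf_iterator<char>());
        for (std::string::size_type i = 0; i + 2 < data.size(); ++i) {
            if (data[i] == ')' && data[i+1] == '\r' && data[i+2] == ';')
                std::printf("stray CR before ';' at offset %lu\n",
                            static_cast<unsigned long>(i + 1));
        }
        return 0;
    }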
* Remove superfluous semicolon
* Remove C-style casts from public headers
clang warns about them with -Wold-style-cast. It also warns about
implicitly casting away const with -Wcast-qual. Fix both by removing
unnecessary casts and converting the remaining ones to C++ casts.
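For illustration only (a hypothetical snippet, not lifted from the headers), the conversions look like this:

    #include <cstddef>
    typedef unsigned char byte;

    void Example(const void* data, size_t size)
    {
        // Old: C-style cast; trips -Wold-style-cast, and -Wcast-qual
        // if it also drops const:
        //   byte* p = (byte*)data;

        // New: a const-preserving C++ cast. Where const really must be
        // dropped, const_cast makes that explicit and visible.
        const byte* p = static_cast<const byte*>(data);
        (void)p; (void)size;
    }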
The strategy of "cleaning up under-aligned buffers" is not scaling well. Corner cases are still turning up, and the library still has some corner-case breaks, like old 32-bit Intels. It also has not solved the AltiVec and Power8 alignment problems.
For now we are backing out the changes and investigating other strategies.
This commit prepares for the upcoming AltiVec and Power8 stream cipher support. It affects GlobalRNG() most because it's an AES-based generator. The commit favors AlignedSecByteBlock over SecByteBlock in places where messages are handled on the AltiVec and Power8 processor data paths. The data paths include all block cipher modes of operation and some filters, like FilterWithBufferedInput.
Intel and ARM processors are tolerant of under-aligned buffers when using crypto instructions. AltiVec and Power8 are less tolerant: they simply ignore the low-order bits of an address to force alignment, so loads and stores through an under-aligned pointer silently hit the wrong location. This has caused a fair number of wild writes on the stack and in the heap.
Testing on a 64-bit Intel Skylake showed a marked improvement in performance. We suspect GCC is generating better code because it knows the alignment of the pointers and does not have to emit fixup code for under-aligned and mis-aligned data. Testing on a mid-2000s 32-bit VIA C7-D with SSE2+SSSE3 showed no improvement, but no performance was lost.
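A rough sketch of the change (hedged; not a literal diff from the commit, and the function and buffer size are illustrative) is swapping the buffer type on the affected data paths:

    #include "secblock.h"
    using CryptoPP::SecByteBlock;
    using CryptoPP::AlignedSecByteBlock;

    void Example()
    {
        // Before: default allocator; alignment is not guaranteed, which
        // Intel and ARM tolerate but the AltiVec/Power8 paths do not.
        SecByteBlock plain(4096);

        // After: aligned allocator, so the vector loads and stores on
        // the AltiVec and Power8 data paths see an aligned address.
        AlignedSecByteBlock aligned(4096);

        (void)plain; (void)aligned;
    }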
This reverts commit eb3b27a6a5. The change broke GCC 4.8 and an unknown version of Clang on OS X. UB reported the OS X break, and JW duplicated the break on an ARM CubieTruck with GCC 4.8.
trap.h and CRYPTOPP_ASSERT have existed for over a year in Master. We deferred the cut-over while waiting for a minor version bump (5.7). We have to use it now due to CVE-2016-7420.
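Roughly, the cut-over at a hypothetical call site looks like the sketch below. CRYPTOPP_ASSERT comes from trap.h; it fires only in debug builds (it traps to the debugger) and compiles away in release builds, avoiding the abort-and-diagnostics behavior behind CVE-2016-7420.

    #include "trap.h"
    #include <cstddef>

    // Hypothetical call site showing the assert -> CRYPTOPP_ASSERT
    // cut-over; the function and its checks are for illustration only.
    void ProcessBuffer(const unsigned char* ptr, size_t size)
    {
        // Before: assert(ptr != NULL && size >= 16);
        CRYPTOPP_ASSERT(ptr != NULL);
        CRYPTOPP_ASSERT(size >= 16);
        // ... process the buffer ...
    }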
- added AuthenticatedSymmetricCipher interface class and Filter wrappers (see the usage sketch after this list)
- added CCM, GCM (with SSE2 assembly), CMAC, and SEED
- improved AES speed on x86 and x64
- removed WORD64_AVAILABLE; compiler 64-bit int support is now required
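As a hedged usage sketch of the new interface and filter wrappers (not taken from the release notes; the key/IV sizes and plaintext are illustrative), GCM can be driven through AuthenticatedEncryptionFilter like this:

    #include "aes.h"
    #include "gcm.h"
    #include "filters.h"
    #include "osrng.h"
    #include "secblock.h"
    #include <string>

    int main()
    {
        using namespace CryptoPP;

        AutoSeededRandomPool prng;
        SecByteBlock key(AES::DEFAULT_KEYLENGTH), iv(12);
        prng.GenerateBlock(key, key.size());
        prng.GenerateBlock(iv, iv.size());

        GCM<AES>::Encryption enc;
        enc.SetKeyWithIV(key, key.size(), iv, iv.size());

        std::string plain = "attack at dawn", cipher;
        StringSource ss(plain, true,
            new AuthenticatedEncryptionFilter(enc,
                new StringSink(cipher)));   // ciphertext with appended tag
        return 0;
    }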