Commit Graph

7 Commits

Author SHA1 Message Date
Jeffrey Walton
babdf8b38b
Add XOP aware CHAM and LEA 2018-10-24 17:12:03 -04:00
Jeffrey Walton
ed4d57cecb
Add XOP aware ChaCha
ChaCha is about 50% faster using XOP for the rotates on AMD machines
2018-10-24 16:15:13 -04:00
Jeffrey Walton
b4c4c5aa14
Add SSSE3 rotates when available
This change obtains the remaining 0.1 to 0.15 cpb. It should be engaged with -march=native
2018-10-24 15:34:54 -04:00
Jeffrey Walton
18dcbdf514
Move input xor to ChaCha_OperateKeystream_SSE2
This picks up about 0.2 cpb in ChaCha::OperateKeystream. It may not sound like much but it puts SSE2 intrinsics version on par with the ASM version of Salsa20. Salsa20 leads ChaCha by 0.1 to 0.15 cpb, which equates to about 50 MB/s.
2018-10-24 11:00:35 -04:00
Jeffrey Walton
d230999b40
Fix ChaCha compile on ARM and MIPS 2018-10-24 01:11:45 -04:00
Jeffrey Walton
6a5d2ab03d
Remove unneeded params from ChaCha_OperateKeystream_SSE2 2018-10-23 08:52:29 -04:00
Jeffrey Walton
916c4484a2
Add ChaCha SSE2 implementation
Thanks to Jack Lloyd and Botan for allowing us to use the implementation.
The numbers for SSE2 are very good. When compared with Salsa20 ASM the results are:
  * Salsa20 2.55 cpb; ChaCha/20 2.90 cpb
  * Salsa20/12 1.61 cpb; ChaCha/12 1.90 cpb
  * Salsa20/8 1.34 cpb; ChaCha/8 1.5 cpb
2018-10-23 07:57:59 -04:00