Thanks to Jack Lloyd and Botan for allowing us to use the implementation.
The numbers for SSE2 are very good. When compared with Salsa20 ASM the results are:
* Salsa20 2.55 cpb; ChaCha/20 2.90 cpb
* Salsa20/12 1.61 cpb; ChaCha/12 1.90 cpb
* Salsa20/8 1.34 cpb; ChaCha/8 1.5 cpb
These changes made it in by accident at Commit b74a6f4445. We were going to try to let them ride but they broke versioning. They may be added later but we should avoid the change at this time.