which generally performs worse than the simpler loop finalizer
(see https://github.com/Cyan4973/xxHash/pull/519#issuecomment-807868078),
especially on 32-bit / ARM systems.
The switch finalizer also significantly increases the binary size of the XXH64 function.
Removed XXH_REROLL_XXH64, which is no longer needed.
This simplifies the code base.
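For reference, a loop-style finalizer can be sketched as follows. This is a minimal illustration of the XXH64 tail processing for the last (fewer than 32) bytes, not a copy of the library source; the names `finalize64_loop`, `short_hash64`, and `read_le` are hypothetical, and the reads are assembled byte by byte so the sketch does not depend on host endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define P64_1 0x9E3779B185EBCA87ULL
#define P64_2 0xC2B2AE3D27D4EB4FULL
#define P64_3 0x165667B19E3779F9ULL
#define P64_4 0x85EBCA77C2B2AE63ULL
#define P64_5 0x27D4EB2F165667C5ULL

static uint64_t rotl64(uint64_t x, int r) { return (x << r) | (x >> (64 - r)); }

/* Little-endian read of n bytes (n <= 8), assembled byte by byte. */
static uint64_t read_le(const uint8_t* p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) v |= (uint64_t)p[i] << (8 * i);
    return v;
}

static uint64_t avalanche64(uint64_t h)
{
    h ^= h >> 33; h *= P64_2;
    h ^= h >> 29; h *= P64_3;
    h ^= h >> 32;
    return h;
}

/* Loop-style finalizer: consumes the trailing bytes with two short loops
 * and one branch, instead of a large fully unrolled switch. */
static uint64_t finalize64_loop(uint64_t h, const uint8_t* p, size_t len)
{
    len &= 31;
    while (len >= 8) {                       /* 8-byte lanes */
        uint64_t k = read_le(p, 8) * P64_2;
        k = rotl64(k, 31) * P64_1;
        h ^= k;
        h = rotl64(h, 27) * P64_1 + P64_4;
        p += 8; len -= 8;
    }
    if (len >= 4) {                          /* at most one 4-byte lane */
        h ^= read_le(p, 4) * P64_1;
        h = rotl64(h, 23) * P64_2 + P64_3;
        p += 4; len -= 4;
    }
    while (len > 0) {                        /* remaining single bytes */
        h ^= (uint64_t)(*p++) * P64_5;
        h = rotl64(h, 11) * P64_1;
        len--;
    }
    return avalanche64(h);
}

/* Hypothetical helper: XXH64-style hash for inputs shorter than 32 bytes. */
static uint64_t short_hash64(const void* data, size_t len, uint64_t seed)
{
    assert(len < 32);
    uint64_t h = seed + P64_5 + (uint64_t)len;
    return finalize64_loop(h, (const uint8_t*)data, len);
}
```

The whole tail fits in a few dozen instructions, which is where the binary size saving over an unrolled switch comes from.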
- AArch64 does not benefit enough from unrolling XXH64_finalize to
justify the code size increase.
- Include aarch64 in the XXH32 asm guard. Clang autovectorizes this
incorrectly. Throughput goes from 2.5 to 4 GB/s on Clang 11 + Snapdragon 730G.
- Replace the asm guards with a macro.
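A guard macro of this kind could look like the sketch below. The macro name `COMPILER_GUARD` and both round functions are hypothetical; the idea is only that an empty asm statement marks the accumulator as possibly modified, which discourages the compiler from autovectorizing the surrounding code without changing the computed value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name. An empty asm statement that claims to read and write
 * `var`, so the optimizer must treat the value as opaque at this point. */
#if defined(__GNUC__) || defined(__clang__)
#  define COMPILER_GUARD(var) __asm__ __volatile__("" : "+r" (var))
#else
#  define COMPILER_GUARD(var) ((void)0)
#endif

#define P32_1 0x9E3779B1U
#define P32_2 0x85EBCA77U

static uint32_t rotl32(uint32_t x, int r)
{
    assert(r > 0 && r < 32);
    return (x << r) | (x >> (32 - r));
}

/* XXH32-style round without any guard. */
static uint32_t round32_plain(uint32_t acc, uint32_t input)
{
    acc += input * P32_2;
    acc  = rotl32(acc, 13);
    acc *= P32_1;
    return acc;
}

/* Same round with the guard; the result is identical, only codegen differs. */
static uint32_t round32_guarded(uint32_t acc, uint32_t input)
{
    acc += input * P32_2;
    acc  = rotl32(acc, 13);
    acc *= P32_1;
    COMPILER_GUARD(acc);
    return acc;
}
```

Keeping the asm in one macro means the per-architecture condition lives in a single place instead of being repeated at every call site.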
XXH32's API, XXH*_state_s, the config macros, and some other things are
now documented in Doxygen format, and a basic Doxyfile was added.
Some functions were also redocumented in detail.
Still need to work on structuring the main page (set it to README.md?).
How to view docs for now:
$ sudo apt install doxygen
$ npm install -g http-server # or you can use LLVM's scan-view
$ doxygen && hs doxygen/html -s -c-1 -o
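As an illustration of the comment style Doxygen picks up, here is a small sketch. The function `demo_checksum` is hypothetical and exists only to carry the documentation block; it is not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*!
 * @brief Sums the bytes of a buffer (hypothetical function, illustration only).
 *
 * @param input  Pointer to the data. May be NULL only if @p length is 0.
 * @param length Number of bytes to read from @p input.
 * @param seed   Initial value of the sum.
 *
 * @return The seed plus the sum of all bytes, modulo 2^32.
 *
 * @pre @p input points to at least @p length readable bytes.
 */
static uint32_t demo_checksum(const void* input, size_t length, uint32_t seed)
{
    const uint8_t* p = (const uint8_t*)input;
    assert(p != NULL || length == 0);
    uint32_t sum = seed;
    for (size_t i = 0; i < length; i++) sum += p[i];
    return sum;
}
```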
when state was already used with the same `seed` previously.
This skips secret generation when `seed` remains constant.
Also improves streaming speed on small data,
thanks to a lighter memset() during reset().
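The idea can be sketched as below. The state layout, the `demo_` names, and the placeholder secret derivation are all assumptions made for illustration; only the control flow (regenerate the secret solely when the seed changes, and clear only the bookkeeping on reset) reflects the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SECRET_SIZE 192

/* Hypothetical state layout; a real streaming state has more fields. */
typedef struct {
    uint64_t seed;
    int      secret_ready;                    /* 0 until a secret was generated */
    unsigned generations;                     /* instrumentation for the demo */
    uint8_t  custom_secret[DEMO_SECRET_SIZE];
    uint8_t  buffer[256];
    size_t   buffered;
} demo_state;

/* Placeholder derivation, NOT the real algorithm. */
static void demo_generate_secret(demo_state* s, uint64_t seed)
{
    for (size_t i = 0; i < DEMO_SECRET_SIZE; i++)
        s->custom_secret[i] = (uint8_t)((seed >> (8 * (i % 8))) + i);
    s->generations++;
}

/* One-time full initialization of a fresh state. */
static void demo_init(demo_state* s) { memset(s, 0, sizeof *s); }

static void demo_reset_with_seed(demo_state* s, uint64_t seed)
{
    assert(s != NULL);
    /* Light reset: clear only the bookkeeping, not the whole struct. */
    s->buffered = 0;
    /* Regenerate the secret only if the seed actually changed. */
    if (!s->secret_ready || s->seed != seed) {
        demo_generate_secret(s, seed);
        s->seed = seed;
        s->secret_ready = 1;
    }
}
```

Resetting twice with the same seed therefore costs one comparison instead of a full secret derivation.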
Following comments from @koraa:
- fix endianness issue, by using canonical representation
- all segments are derived from the first one, in order to reduce the dependency chain
- all derived segments use a seed, which is a combination of raw custom seed content and segNb
- updated documentation
This variant prevents an actor
from deriving the entire secret from the knowledge of one of its segments,
since doing so requires the seed, which is derived from the original custom seed,
which is not present anywhere and is only condensed in the scrambling operation.
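The endianness fix mentioned in the list above relies on a canonical byte order, which in xxHash is big-endian. A minimal sketch of writing and reading a 64-bit value that way is shown here; the function names are hypothetical, and how the values are fed into the secret derivation is not shown in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v most significant byte first, whatever the host byte order is. */
static void write_canonical64(uint8_t dst[8], uint64_t v)
{
    assert(dst != NULL);
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Inverse operation: rebuild the value from big-endian bytes. */
static uint64_t read_canonical64(const uint8_t src[8])
{
    uint64_t v = 0;
    assert(src != NULL);
    for (int i = 0; i < 8; i++) v = (v << 8) | src[i];
    return v;
}
```

Because the bytes are produced by shifts rather than by copying the in-memory representation, little-endian and big-endian hosts generate the same output.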
Update the 128-bit scrambler at each 16-byte round.
Ensure there is no delta-correlation possible between segments,
even when the initial custom seed is poor (<= 16 bytes, or full of `\0` characters).
as this architecture offers decent performance for unaligned memory accesses (like x86/x64).
This circumvents the problem described in #383,
as the direct read path using `const xxh_u32*` will not be generated by default.
Note, however, that this does not address the root of the problem,
which is a possible strict-aliasing issue
when using the direct read path _and_ inlining xxh32.
That issue is more difficult to solve without a reproduction case.
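For context, the portable alternative to the direct read path is a `memcpy`-based read, sketched below. The function name `read32_memcpy` is hypothetical; the direct path is shown only in a comment, for contrast.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable unaligned read: memcpy has no alignment or aliasing requirement,
 * and compilers typically lower it to a single load where that is legal. */
static uint32_t read32_memcpy(const void* p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* The direct path, shown for contrast only. Dereferencing a cast pointer
 * like this can violate strict-aliasing and alignment rules:
 *
 *     return *(const uint32_t*)p;
 */
```

On architectures with efficient unaligned loads, both forms usually compile to the same instruction, so the portable version costs nothing there.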
I also used this opportunity to fix or reinforce documentation.
- Every symbol, typedef, and macro is prefixed with XXH or xxh.
- For compatibility with legacy code that messes with the internals,
XXH_OLD_NAMES can be defined to wrap the old names in macros. This
also brings back U32 and friends.
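The compatibility layer can be pictured roughly as follows. This is a sketch of the opt-in mechanism only: `BYTE` and `U64` are assumed to be among the "friends", `legacy_add` is a hypothetical piece of legacy code, and `XXH_OLD_NAMES` is defined inside the snippet purely for demonstration (a user would normally define it before including the header).

```c
#include <assert.h>
#include <stdint.h>

/* New, prefixed type names. */
typedef uint8_t  xxh_u8;
typedef uint32_t xxh_u32;
typedef uint64_t xxh_u64;

#define XXH_OLD_NAMES   /* defined here only to demonstrate the opt-in */

/* Old, unprefixed names are exposed as macros only on request. */
#ifdef XXH_OLD_NAMES
#  define BYTE xxh_u8
#  define U32  xxh_u32
#  define U64  xxh_u64
#endif

/* Legacy-style code keeps compiling when XXH_OLD_NAMES is defined. */
static U32 legacy_add(U32 a, U32 b)
{
    assert(sizeof(U32) == sizeof(xxh_u32));
    return a + b;
}
```

Without the define, names such as `U32` are never introduced, so they cannot collide with identifiers in the including project.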