Answering: https://github.com/Cyan4973/xxHash/issues/175#issuecomment-548108921
The probability of receiving an empty string is larger than random (> 1 / 2^64),
making the generated hash more "common".
For some algorithms, it's an issue if this "more common" value is 0.
The empty input now maps instead to an avalanche of an arbitrary start value (prime64).
The start value is blended with the `seed` and the `secret`,
so that the result depends on them too.
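As a rough illustration of the idea above, here is a minimal sketch, not the library's actual code. The avalanche steps and prime constants are the XXH64 ones; the helper names `demo_avalanche64` and `demo_hash_empty`, and the exact blending formula (add the seed, xor the first 8 secret bytes), are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* XXH64 prime constants. */
#define DEMO_PRIME64_1 0x9E3779B185EBCA87ULL
#define DEMO_PRIME64_2 0xC2B2AE3D27D4EB4FULL
#define DEMO_PRIME64_3 0x165667B19E3779F9ULL

/* XXH64-style avalanche: xorshifts and odd multiplications.
 * Every step is invertible, so only an input of 0 produces 0. */
static uint64_t demo_avalanche64(uint64_t h)
{
    h ^= h >> 33;
    h *= DEMO_PRIME64_2;
    h ^= h >> 29;
    h *= DEMO_PRIME64_3;
    h ^= h >> 32;
    return h;
}

/* Hypothetical helper: hash of the empty input.
 * ASSUMPTION: the blend (prime + seed) ^ secret is illustrative only. */
static uint64_t demo_hash_empty(uint64_t seed, const unsigned char *secret)
{
    uint64_t key;
    assert(secret != NULL);
    memcpy(&key, secret, sizeof key);   /* alignment-safe read of 8 secret bytes */
    return demo_avalanche64((DEMO_PRIME64_1 + seed) ^ key);
}
```

With this shape, the empty input hashes to 0 only when the blended start value itself is 0, which depends on the particular `seed` and `secret` rather than being fixed for everyone.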
Inlining hash functions is generally beneficial for performance.
It becomes extremely beneficial whenever the input size is a compile-time constant.
To inline xxhash functions, one just needs to define `XXH_INLINE_ALL` before including "xxhash.h".
One potential issue is that "xxhash.h" may have already been included previously,
typically as part of another included `*.h`.
In that case, the second `#include` statement has no effect.
This patch fixes this situation:
now, when `XXH_INLINE_ALL` is defined, all identifiers are renamed,
in order to avoid name collisions and confusion with already included identifiers.
The renaming process uses XXH_NAMESPACE.
XXH_NAMESPACE must be available (i.e. not already used) for renaming to work.
A test has been added in `tests/` to ensure this scenario works correctly.
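The renaming technique can be sketched in isolation as follows. This is a self-contained toy, not an excerpt from "xxhash.h": all `DEMO_*` / `demo_*` names are hypothetical, and the first `demo_hash` stands in for a definition pulled in earlier by some other header.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the copy that "was already included" earlier. */
static uint32_t demo_hash(uint32_t x) { return x + 1u; }
static uint32_t call_original(uint32_t x) { return demo_hash(x); }

/* Hypothetical prefix; it plays the role XXH_NAMESPACE plays in xxHash. */
#define DEMO_NAMESPACE inlined_

/* Two-level expansion so DEMO_NAMESPACE is expanded before pasting. */
#define DEMO_CAT(a, b) a##b
#define DEMO_NAME2(a, b) DEMO_CAT(a, b)

/* From here on, every use of demo_hash refers to inlined_demo_hash. */
#define demo_hash DEMO_NAME2(DEMO_NAMESPACE, demo_hash)

/* The inlined variant; its real name is inlined_demo_hash. */
static inline uint32_t demo_hash(uint32_t x) { return x * 2654435761u; }
static uint32_t call_inlined(uint32_t x) { return demo_hash(x); }

/* Usage: both definitions coexist in one translation unit. */
static void demo_check(void)
{
    assert(call_original(1u) == 2u);
    assert(call_inlined(1u) == 2654435761u);
}
```

The prefix has to be free for this to work: if the earlier copy had already been renamed with the same prefix, the two definitions would collide again.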
AArch64 and ARMv7 now use the same codepath, with the VZIP.32 hack being
abstracted away in a macro.
I also fully (over?) documented it to explain the hack.
This marks all internal functions as `static`, and gives the compiler
full control over whether to inline functions or not.
This is automatically defined on GCC and Clang when `-O0`, `-Os`, `-Oz`,
or `-fno-inline` is used.
With clang -Oz for AArch64, the .text section goes from 16880 bytes to
8136 bytes.
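A minimal sketch of how such a switch can be wired up, assuming hypothetical names `DEMO_NO_INLINE_HINTS` and `DEMO_FORCE_INLINE` (the real macro names are not shown here). It relies on GCC and Clang defining `__OPTIMIZE_SIZE__` for `-Os`/`-Oz` and `__NO_INLINE__` for `-O0`/`-fno-inline`.

```c
#include <assert.h>
#include <stdint.h>

/* Auto-detect unless the user already chose a value. */
#ifndef DEMO_NO_INLINE_HINTS
#  if defined(__OPTIMIZE_SIZE__) /* -Os, -Oz */ \
   || defined(__NO_INLINE__)     /* -O0, -fno-inline */
#    define DEMO_NO_INLINE_HINTS 1
#  else
#    define DEMO_NO_INLINE_HINTS 0
#  endif
#endif

#if DEMO_NO_INLINE_HINTS
   /* Plain static: the compiler decides whether to inline. */
#  define DEMO_FORCE_INLINE static
#elif defined(__GNUC__)
#  define DEMO_FORCE_INLINE static inline __attribute__((always_inline))
#else
#  define DEMO_FORCE_INLINE static inline
#endif

/* An internal helper declared through the macro. */
DEMO_FORCE_INLINE uint32_t demo_rotl32(uint32_t x, unsigned r)
{
    assert(r >= 1 && r <= 31);          /* avoid shifting by 0 or 32 */
    return (x << r) | (x >> (32 - r));
}
```

Building with `-DDEMO_NO_INLINE_HINTS=1` forces the size-friendly path regardless of the optimization level.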
When gcc was updated to version 10 in Fedora, the build of the xxhash
package started failing. The compilation succeeds, but the checks
trigger an error. This change, suggested by the maintainers of the gcc
package in Fedora, addresses the issue. For details, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1798908
- Removes the permxor optimization; it isn't worth the boilerplate
- Uses memcpy and automatically swaps in the load
This removes big endian checks in the loop and improves portability,
hopefully maintaining performance.
- Loads are ugly. I haven't found any good documentation about
unaligned loads.
- Hopefully reduces the conditionals
I mostly want to test on Travis, as I don't have an s390x toolchain
at the moment.
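For reference, the "memcpy and swap in the load" idea looks roughly like this in scalar form. This is only a sketch with hypothetical `demo_*` names, not the vector code from the patch; it shows how one load helper can hide both alignment and endianness from the loop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable 64-bit byte swap. */
static uint64_t demo_swap64(uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

/* Endianness probe: inspects the first byte of a known 16-bit value. */
static int demo_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Reads 8 bytes as a little-endian value from any address.
 * memcpy makes the unaligned access well-defined; the swap only
 * happens on big-endian hosts such as s390x. */
static uint64_t demo_read_le64(const void *p)
{
    uint64_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return demo_is_little_endian() ? v : demo_swap64(v);
}
```

Callers then use `demo_read_le64()` everywhere and never test the byte order themselves, which is what removes the big endian checks from the loop.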