777 Commits

Author SHA1 Message Date
Yann Collet
1c3bfee5f0
Merge pull request #284 from Cyan4973/s390x
added s390x tests on travis
2019-12-14 13:31:36 -08:00
easyaspi314 (Devin)
6aa9beeb50 [s390x] Identify s390x in xxhsum 2019-12-14 16:30:22 -05:00
Yann Collet
4863cba0fa added s390x tests on travis 2019-12-14 13:07:04 -08:00
Yann Collet
9fd98d7c5f
Merge pull request #283 from adamretter/dev
Use Travis CI Arm64 and PPC64le arch
2019-12-14 12:50:25 -08:00
Adam Retter
618dd731b6 Use Travis CI Arm64 and PPC64le arch 2019-12-14 21:34:07 +01:00
Yann Collet
ae245428b8 update internal benchmark
to reduce the risk of rounding bias
when a measurement takes too little time.
2019-12-10 13:12:28 -08:00
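The log does not record how the benchmark was changed. One common way to limit rounding bias from a coarse timer is to keep doubling the iteration count until a single measurement lasts at least a minimum duration. The sketch below assumes that approach; all `example_` names are hypothetical and only `clock()` from the C standard library is used.

```c
#include <time.h>   /* clock, clock_t, CLOCKS_PER_SEC */

/* Hypothetical workload; a real benchmark would hash a buffer here. */
static volatile unsigned long example_sink = 0;
static void example_workload(void) { example_sink++; }

/* Returns the average number of clock ticks per call.
 * The loop count doubles until one measurement lasts at least min_ticks,
 * so the timer granularity stays small relative to the measured duration. */
static double example_ticks_per_call(void (*fn)(void), clock_t min_ticks)
{
    unsigned long loops = 1;
    for (;;) {
        unsigned long i;
        clock_t elapsed;
        clock_t const start = clock();
        for (i = 0; i < loops; i++) fn();
        elapsed = clock() - start;
        if (elapsed >= min_ticks) return (double)elapsed / (double)loops;
        loops *= 2;   /* too short to trust: measure a longer run */
    }
}
```

Calling `example_ticks_per_call(example_workload, CLOCKS_PER_SEC / 100)` would require each accepted measurement to last at least 10 ms of CPU time.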
Yann Collet
c173142da0 removed ===== comment separators
for easier compatibility with git merge conflict.

fix #277
2019-12-03 16:44:41 -08:00
Yann Collet
3109b8ef98
Merge pull request #278 from Cyan4973/xxh_implem
transferred implementation inside xxhash.h
2019-12-03 16:41:15 -08:00
Yann Collet
cff3643459 fixed make command
on Windows, command line doesn't accept environment variables before the command
2019-12-03 15:55:50 -08:00
Yann Collet
8f6e9c92e6 fixed mingw+clang compilation test
which does not work in strict c90 mode,
due to the (incorrect) presence of the `inline` keyword
in some standard library header files.

The previous method disabled the `inline` keyword,
but this introduces other problems with more complex multi-file projects,
such as benchHash, which was recently added as part of `make test`.

Added a new environment variable to disable the c90 compatibility test :
NO_C90_TEST=true

note : apparently, Appveyor doesn't like comments inside () sub-blocks :(
2019-12-02 17:32:56 -08:00
Yann Collet
4e2199def8
Merge pull request #280 from clockfort/patch-2
fix docs link
2019-11-24 11:49:53 -08:00
Chris Lockfort
a96cd994e7 fix docs link 2019-11-24 04:15:44 -08:00
Yann Collet
aa61b378b3 removed extraneous include
also : attempts to fix benchHash compilation on Windows
using msys2 environment with clang compiler
2019-11-04 21:17:07 -08:00
Yann Collet
d3a76b3a28 attempt to reconcile usan and benchHash 2019-11-04 17:25:42 -08:00
Yann Collet
a642aba0f5 transferred implementation inside xxhash.h
instead of xxhash.c .

This seems preferable for some build systems,
which don't like the `#include "xxhash.c"` statement
when inlining xxhash, as reported by @pdillinger .

Note that `xxhash.c` still exists,
it just includes the implementation and instantiates it.
2019-11-04 16:04:28 -08:00
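The layout described in this commit, a header that carries the implementation and a thin `.c` file that only instantiates it, can be sketched in a single file as follows. This is a generic illustration of the pattern, not the actual contents of `xxhash.h`: the macro names, the `examplehash_sum` function, and its toy checksum are all hypothetical.

```c
/* Stands in for what a thin .c file would do before including the header. */
#define EXAMPLEHASH_IMPLEMENTATION

/* ---- start of hypothetical header "examplehash.h" ---- */
#ifndef EXAMPLEHASH_H
#define EXAMPLEHASH_H
#include <stddef.h>   /* size_t */
/* Declaration only: visible to every file that includes the header. */
unsigned long examplehash_sum(const void *data, size_t len);
#endif /* EXAMPLEHASH_H */

/* Implementation section: emitted once, and only where it was requested. */
#if defined(EXAMPLEHASH_IMPLEMENTATION) && !defined(EXAMPLEHASH_IMPLEM_DONE)
#define EXAMPLEHASH_IMPLEM_DONE
unsigned long examplehash_sum(const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *)data;
    unsigned long sum = 0;
    size_t i;
    for (i = 0; i < len; i++) sum += p[i];   /* toy checksum, not xxHash */
    return sum;
}
#endif
/* ---- end of hypothetical header ---- */
```

With this arrangement a build system never needs to `#include` a `.c` file: a project that wants the code inlined defines the implementation macro before including the header, and every other file sees only the declarations.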
Yann Collet
44cd858d49 removed non-ascii character 2019-10-24 15:57:00 -07:00
Yann Collet
8e5fdcbe70 added benchHash compilation to test target 2019-10-11 08:12:25 -07:00
Yann Collet
e6dc2443f0 fix: multiple include with/without XXH_STATIC_LINKING 2019-10-11 07:50:06 -07:00
Yann Collet
1cc9634053 improve speed for large inputs
by up to +20%
2019-10-08 20:58:02 -07:00
Yann Collet
1ea98d6a38 changed strict-overflow warning to level 2
since it inexplicably complains about `main` since 4e4570f751
2019-10-07 10:56:02 -07:00
Yann Collet
c6ea1d2107 updated man page regarding -q 2019-10-07 08:34:39 -07:00
Yann Collet
4e4570f751 removed non-error messages from stderr when specifying -q 2019-10-07 08:25:57 -07:00
Yann Collet
c5f72f87af added guard macro in xxhash.c
since it can be included.
2019-10-07 07:56:42 -07:00
Yann Collet
73e6c5206c simplified type declaration
some types were not needed.

Also : xxh_u* types are only necessary within libxxhash,
not xxhsum
2019-10-07 07:52:32 -07:00
Yann Collet
c996521986 reuse XXHnn_hash_t definitions
for internal typedef xxh_u32 and xxh_u64
2019-10-07 01:00:50 -07:00
Yann Collet
8dab0315ac
Merge pull request #271 from easyaspi314/typedefs
Improve typedefs, fix 16-bit int/seed type bug
2019-10-07 00:52:53 -07:00
easyaspi314 (Devin)
368a6f9699 Improve typedefs, fix 16-bit int/seed type bug
Fixes #258.

```c
BYTE -> xxh_u8
U32  -> xxh_u32
U64  -> xxh_u64
```

Additionally, I hopefully fixed an issue for targets where int is 16
bits. XXH32 used unsigned int for its seed, and in C90 mode, unsigned
int as its U32. This would cause truncation issues. I check limits.h in
C90 mode to make sure UINT_MAX == 0xFFFFFFFFUL, and if it isn't, use
unsigned long.

We should see if we can set up an AVR CI test. Just to run the
verification program, though, as the benchmark will take a very long
time.

Lastly, the seed types are XXH32_hash_t and XXH64_hash_t for XXH32/64.
This matches xxhash.c and prevents the aforementioned 16-bit int bug.
2019-10-06 19:14:12 -04:00
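The `limits.h` check described in this commit message can be sketched as below. The message states the condition (`UINT_MAX == 0xFFFFFFFFUL`) and the fallback (`unsigned long`), but the surrounding code is not shown in the log, so the type name `example_u32` and the helper function are hypothetical.

```c
#include <limits.h>   /* UINT_MAX */

/* example_u32 is a hypothetical stand-in for a 32-bit unsigned type. */
#if UINT_MAX == 0xFFFFFFFFUL
typedef unsigned int  example_u32;   /* int is exactly 32 bits wide */
#else
typedef unsigned long example_u32;   /* e.g. 16-bit int: long has at least 32 bits */
#endif

/* Returns the upper 16 bits of a 32-bit seed. If the seed were passed as a
 * 16-bit unsigned int, these bits would be lost before the call. */
static example_u32 example_seed_high16(example_u32 seed)
{
    return (seed >> 16) & 0xFFFFUL;
}
```

For a seed of `0x12345678`, the helper returns `0x1234`. On a target where `unsigned int` is 16 bits, a seed parameter of that type would arrive already truncated to `0x5678`, which is the bug the commit describes.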
Yann Collet
b604c7bee5
Merge pull request #269 from easyaspi314/endianness_fix
Fix endianness detection on GCC, avoid XXH_cpuIsLittleEndian.
2019-10-04 13:38:51 -07:00
easyaspi314 (Devin)
028c0fd534 Fix endianness detection on GCC, avoid XXH_cpuIsLittleEndian. 2019-10-04 09:47:09 -04:00
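The log gives only the title of this change. As a general illustration of compile-time endianness detection on GCC-compatible compilers, the sketch below uses the predefined `__BYTE_ORDER__` macros and falls back to a runtime probe elsewhere. The `EXAMPLE_` and `example_` names are hypothetical and this is not the code from the commit.

```c
/* Runtime probe: looks at the first byte of an integer holding the value 1. */
static int example_probe_little_endian(void)
{
    const union { unsigned int u; unsigned char c[sizeof(unsigned int)]; } one = { 1 };
    return one.c[0];   /* 1 on little-endian, 0 on big-endian */
}

/* Prefer a compile-time constant when the compiler provides GCC-style macros. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) \
    && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#  define EXAMPLE_IS_LITTLE_ENDIAN 1
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) \
    && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#  define EXAMPLE_IS_LITTLE_ENDIAN 0
#else
#  define EXAMPLE_IS_LITTLE_ENDIAN example_probe_little_endian()
#endif
```

When the macro expands to a constant, the compiler can remove the unused byte-swapping branch entirely, which a function call does not always allow.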
Yann Collet
71a8150b6f
Merge pull request #267 from easyaspi314/main_loop_cleanup
[SCALAR] Improve scalar XXH3_accumulate_512 loop
2019-10-03 10:38:43 -07:00
easyaspi314 (Devin)
9b6fa1067f [SCALAR] Improve scalar XXH3_accumulate_512 loop
The previous XXH3_accumulate_512 loop didn't fare well since XXH128
started swapping the addition.

Neither GCC nor Clang could follow the barely-readable loop, resulting
in garbage code output.

This made XXH3 much slower. Take 32-bit scalar ARM.

Ignoring loads and potential interleaving optimizations, in the main
loop, XXH32 takes 16 cycles for 8 bytes on a typical ARMv6+ CPU, or 2 cpb.

```asm
        mla     r0, r2, r5, r0  @ 4 cycles
        ror     r0, r0, #19     @ 1 cycle
        mul     r0, r0, r6      @ 3 cycles
        mla     r1, r3, r5, r1  @ 4 cycles
        ror     r1, r1, #19     @ 1 cycle
        mul     r1, r1, r6      @ 3 cycles
```

XXH3_64b takes 9 cycles, or 1.1 cpb:
```asm
        adds    r0, r0, r2      @ 2 cycles
        adc     r1, r1, r3      @ 1 cycle
        eor     r4, r4, r2      @ 1 cycle
        eor     r5, r5, r3      @ 1 cycle
        umlal   r0, r1, r4, r5  @ 4 cycles
```

Benchmarking on a Pixel 2 XL (with a binary for ARMv4T), previously,
XXH32 got 1.8 GB/s, while XXH3_64b got 1.7.

Now, XXH3_64b gets 2.3 GB/s! This calculates out well (as additional
loads and stores have some overhead).

Unlike before, it is better to disable autovectorization completely, as
the compiler can't vectorize it as well. (Especially with Clang and
NEON, where it extracts to multiply instead of the obvious vmlal.u32!).

On that same device in aarch64 mode, XXH3's scalar version, when compiled
with `clang-8 -O3 -DXXH_VECTOR=0 -fno-vectorize -fno-slp-vectorize`,
went from 2.3 GB/s to 4.3 GB/s. For comparison, the NEON version
gets 6.0 GB/s.

However, almost all platforms with decent autovectorization have a
handwritten intrinsics version which is much faster.

For optimal performance, use -fno-tree-vectorize -fno-tree-slp-vectorize
(or simply disable SIMD instructions entirely).

From testing, ARM32 also prefers forced inlining, so I enabled it.

I also fixed some typos.
2019-10-03 09:56:18 -04:00
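Read as C, the five-instruction XXH3_64b sequence quoted in this commit message corresponds to roughly the loop below: add the input to the accumulator, XOR the input with the key, then multiply the two 32-bit halves of the result and add the 64-bit product. This is a simplified sketch derived from that assembly listing, with hypothetical names, and not the library's actual `XXH3_accumulate_512`.

```c
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint32_t, uint64_t */

/* One accumulation step per 64-bit lane. A 512-bit stripe has 8 such lanes. */
static void example_accumulate(uint64_t *acc, const uint64_t *data,
                               const uint64_t *key, size_t lanes)
{
    size_t i;
    for (i = 0; i < lanes; i++) {
        uint64_t const mixed = data[i] ^ key[i];               /* eor, eor   */
        acc[i] += data[i];                                     /* adds, adc  */
        acc[i] += (uint64_t)(uint32_t)mixed * (mixed >> 32);   /* umlal      */
    }
}
```

Writing the step as a plain 32x32-to-64-bit multiply-accumulate gives a 32-bit ARM compiler the chance to emit a single `umlal`, which is how the message arrives at 9 cycles per 8 bytes of input.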
Yann Collet
96e8472380 documented open API consistency questions 2019-10-02 14:47:59 -07:00
Yann Collet
28950be40c updated code comments
especially on the canonical representation paragraph,
to make it clear it's the preferred format for storage and transmission.
2019-10-02 14:31:14 -07:00
Yann Collet
2b956f86b0
Merge pull request #266 from easyaspi314/varnames
Try to improve some variable names.
2019-10-02 11:28:45 -07:00
Yann Collet
b2154f3583
Merge pull request #265 from easyaspi314/fair_bench
Use both seeded and unseeded variants in the bench
2019-10-02 11:01:31 -07:00
easyaspi314 (Devin)
1367385768 Fix mixed declaration
I need to stop coding before my coffee. :/
2019-10-02 13:01:51 -04:00
easyaspi314 (Devin)
425dbd8d86 Try to improve some variable names.
It's a start, but an improvement. I still have more things I would like
to change but it is good for now.
2019-10-02 12:28:01 -04:00
easyaspi314 (Devin)
91d6e4927e Use both seeded and unseeded variants in the bench
Previously, XXH3_64bits looked much faster than XXH3_128bits. The truth
is that they are similar in long keys. The difference was that
XXH3_64b's benchmark was unseeded, putting it at an unfair advantage
over XXH128 which is seeded.

I don't think I am going to do the dummy bench. That made things more
complicated.
2019-10-01 23:23:55 -04:00
Yann Collet
3df9e91856
Merge pull request #264 from easyaspi314/voidptrfix
Reduce void pointers and evil casts.
2019-10-01 20:07:53 -07:00
easyaspi314 (Devin)
cb4adfcc10 Typo 2019-10-01 19:00:28 -04:00
easyaspi314 (Devin)
f90b0aba40 Reduce void pointers and evil casts. 2019-10-01 18:52:21 -04:00
Yann Collet
a44629ace1
Merge pull request #262 from Cyan4973/xxh128_17p
improve xxh128 for mid-size
2019-09-30 23:11:01 -07:00
Yann Collet
c8f3fb514c factorized mix32B
changing xxh128 results for len within 129-240.
2019-09-30 22:36:07 -07:00
Yann Collet
9d79fd7bc1 factor mix32 2019-09-30 17:55:46 -07:00
Yann Collet
43b5c76b4c fixed mistake in last ingested segment 2019-09-30 17:33:38 -07:00
Yann Collet
0bed0c2e5b updated self-test values for xxh128 2019-09-30 17:26:04 -07:00
Yann Collet
6896c5798f fix input distribution over 128-bit state
for mid-size length 17+
2019-09-30 17:13:59 -07:00
Yann Collet
cd0f5c2209 slightly updated xxh128 at len 1-3
for a slightly better bias
2019-09-28 20:02:55 -07:00
Yann Collet
ea5c659701 update man page 2019-09-28 17:55:41 -07:00
Yann Collet
eab46160a9 update examples and comment 2019-09-28 17:39:00 -07:00