Yann Collet
f2163b8b75
changed algorithm for small inputs
...
- seed modifies the key values,
the value that can trigger a zero multiply is now seed-dependent
- accumulator also receives input as addition,
cancelling the impact of zero multiply.
Performance on small inputs seems slightly slower, but within measurement noise.
2019-03-16 21:27:39 -07:00
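The zero-multiply concern in commit f2163b8b75 can be illustrated with a toy mixer. This is a sketch under stated assumptions, not the XXH3 formula: the function names, the 32-bit lane layout, and the way the seed is folded into the keys are all hypothetical. It only shows why a product of `(input ^ key)` terms can collapse to zero, and how a seed-dependent key plus an additive input term addresses that.

```c
#include <stdint.h>

/* Illustration only: NOT the actual XXH3 small-input formula. */

/* Plain multiply mix: if a == ka, the first factor is zero and the
 * product is zero whatever b is, so b has no effect on the result. */
static uint64_t mix_plain(uint32_t a, uint32_t b, uint32_t ka, uint32_t kb)
{
    return (uint64_t)(a ^ ka) * (uint64_t)(b ^ kb);
}

/* Hypothetical variant reflecting the two ideas in the commit message:
 * 1. the seed modifies the keys, so the input value that zeroes a
 *    factor depends on the seed;
 * 2. the input is also added to the result, so it still contributes
 *    when the product happens to be zero. */
static uint64_t mix_seeded(uint32_t a, uint32_t b,
                           uint32_t ka, uint32_t kb, uint32_t seed)
{
    uint32_t const ska = ka + seed;   /* assumed seed/key combination */
    uint32_t const skb = kb - seed;   /* assumed seed/key combination */
    uint64_t const product = (uint64_t)(a ^ ska) * (uint64_t)(b ^ skb);
    return product + (((uint64_t)b << 32) | a);
}
```

With `mix_plain`, any input whose first lane equals `ka` hashes to zero. With `mix_seeded`, the same inputs still produce distinct results because of the additive term.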
Yann Collet
f3d4bf4eef
xxh3: ensure all 64 bits of the seed are ingested
...
in the len=[1-3] case
2019-03-16 20:32:20 -07:00
Yann Collet
4229399fc9
fixed minor warnings on Visual
2019-03-16 09:35:10 -07:00
Yann Collet
b810177b0a
added Visual target on Appveyor
2019-03-16 09:18:56 -07:00
Yann Collet
701423eeda
fixed most Visual compilation issues
...
still this dllimport thing,
I don't know why it was added,
maybe something to remove altogether.
2019-03-16 06:59:46 -07:00
Yann Collet
40dbf78fa9
renamed XXH128_hash_t members to low64 and high64
2019-03-14 13:08:38 -07:00
easyaspi314 (Devin)
c1ae3287a1
Update ARM NEON code
...
The NEON algorithms have now been updated to match the SSE2 algorithm.
2019-03-12 22:20:45 -04:00
Yann Collet
8423e82ef8
fixed last integration issues
2019-03-12 18:13:46 -07:00
Yann Collet
af852ac752
fixed last strict aliasing issues
2019-03-12 17:48:59 -07:00
Yann Collet
e6433e8dfd
restored clang #pragma unroll statement
...
that has been accidentally lost in an update.
2019-03-12 17:36:37 -07:00
Yann Collet
3fe53a4ab9
fixed endianness issue
2019-03-12 15:57:56 -07:00
Yann Collet
51ac7dc7e9
fixed minor conversion warning
...
detected on ARM 32-bit
2019-03-12 12:56:52 -07:00
Yann Collet
c76d96454b
xxh3: fixed declaration after statement in AVX2 path
...
also:
- added header license
- fixed alignment declaration
2019-03-12 11:55:37 -07:00
Yann Collet
405e49403c
xxh3: fixed scalar variant
...
scrambling stage wasn't updated to match new formula
2019-03-11 15:40:01 -07:00
Yann Collet
638993f16b
added consistency tests for XXH3_64b
...
validated against SSE2 path
2019-03-11 15:09:27 -07:00
Yann Collet
2010b7e7de
fixed addition discrepancy between scalar and vector code
...
both now use a 64-bit addition with carry
2019-03-09 00:19:40 -05:00
Yann Collet
c92d6bcdd8
Merge pull request #172 from easyaspi314/xxh3-pi-test
...
Add unroll pragma for Clang in XXH3_accumulate.
2019-03-08 22:37:31 -05:00
Yann Collet
a5d5bf778f
improve algorithm by compensating UMAC deficiency
...
no longer possible to nullify one member through another
2019-03-08 22:32:11 -05:00
easyaspi314 (Devin)
60215c5bfb
Fix typo causing build failure on 32-bit
2019-03-08 22:26:25 -05:00
easyaspi314 (Devin)
c5953f132c
Add unroll pragma for Clang in XXH3_accumulate.
...
Clang doesn't unroll the XXH3_accumulate loop for some reason. Using
`#pragma clang loop unroll(enable)` to hint to Clang that it should
unroll results in a huge 1.4-1.5x speedup.
Before: 15 GB/s
After: 21 GB/s
2019-03-08 22:07:08 -05:00
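The unroll hint described in commit c5953f132c can be sketched as follows. The loop body, the function name `accumulate_sketch`, and the `STRIPE_LANES` constant are hypothetical stand-ins, not the real `XXH3_accumulate`. Only the pragma itself comes from the commit message. The guard keeps other compilers from warning about an unknown pragma.

```c
#include <stddef.h>
#include <stdint.h>

#define STRIPE_LANES 8  /* hypothetical lane count, for illustration only */

/* Hypothetical stand-in for an accumulate loop: adds each stripe of
 * input into the accumulator lanes. */
static void accumulate_sketch(uint64_t acc[STRIPE_LANES],
                              const uint64_t *input, size_t nbStripes)
{
    size_t n;
    /* The pragma must directly precede the loop it applies to. */
#if defined(__clang__)
#  pragma clang loop unroll(enable)
#endif
    for (n = 0; n < nbStripes; n++) {
        size_t i;
        for (i = 0; i < STRIPE_LANES; i++) {
            acc[i] += input[n * STRIPE_LANES + i];
        }
    }
}
```

The pragma is only a hint. Whether it helps depends on the compiler version and target, so the speedup quoted in the commit should be re-measured rather than assumed.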
Yann Collet
2afd24d8bb
xxh128: minor modifications to improve bias
...
1.4% => 0.6%
2019-03-08 16:03:24 -05:00
Yann Collet
27b4f31b77
Merge branch 'xxh3_128' into xxh3
2019-03-08 15:54:41 -05:00
Yann Collet
4f4f63c73b
modified xxh128 so that low part == xxh3_64b
2019-03-08 15:37:06 -05:00
easyaspi314 (Devin)
02d0ba79a0
Remove preprocessor statement leftover from testing
...
What '0 &&' ? No idea what you are talking about...
2019-03-07 19:51:39 -05:00
easyaspi314 (Devin)
7558f18493
Add improved 128-bit multiply routine for 32-bit and use intrinsics long multiply
2019-03-07 17:26:49 -05:00
Yann Collet
a951c0aeba
xxh3: updated mul128 with a 32-bit backup path
...
also:
started XXH128 (not finished yet)
2019-03-06 23:42:04 -05:00
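A 32-bit backup path for a 64x64 to 128-bit multiply, as mentioned in commit a951c0aeba, is commonly built from four 32x32 to 64-bit partial products. The sketch below shows that standard schoolbook technique. It is not claimed to be the code in these commits: the type `u128_sketch` and the function `mul64to128_portable` are hypothetical names, and the member names simply follow the `low64`/`high64` convention from the log.

```c
#include <stdint.h>

/* Hypothetical 128-bit result type for illustration. */
typedef struct { uint64_t low64; uint64_t high64; } u128_sketch;

/* Portable 64x64 -> 128 multiply using only 64-bit arithmetic. */
static u128_sketch mul64to128_portable(uint64_t a, uint64_t b)
{
    uint64_t const a_lo = a & 0xFFFFFFFFU;
    uint64_t const a_hi = a >> 32;
    uint64_t const b_lo = b & 0xFFFFFFFFU;
    uint64_t const b_hi = b >> 32;

    /* Four 32x32 -> 64 partial products. */
    uint64_t const lo_lo = a_lo * b_lo;
    uint64_t const hi_lo = a_hi * b_lo;
    uint64_t const lo_hi = a_lo * b_hi;
    uint64_t const hi_hi = a_hi * b_hi;

    /* Middle column: this sum cannot overflow 64 bits. */
    uint64_t const cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFU) + lo_hi;

    u128_sketch r;
    r.high64 = (hi_lo >> 32) + (cross >> 32) + hi_hi;
    r.low64  = (cross << 32) | (lo_lo & 0xFFFFFFFFU);
    return r;
}
```

On 64-bit targets a compiler extension or intrinsic is usually faster. This form is the fallback when neither is available.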
Yann Collet
8d96de3e1c
added variant with seed
2019-03-06 17:46:42 -05:00
Yann Collet
48e3d724d1
updated xxh3
2019-03-06 11:55:48 -05:00
Yann Collet
1d32e2664a
Merge pull request #167 from easyaspi314/xxh3
...
xxh3: add NEON support
2019-02-28 17:40:33 -08:00
easyaspi314 (Devin)
8d345470e6
xxh3: add NEON support
...
Signed-off-by: easyaspi314 (Devin) <easyaspi314@users.noreply.github.com>
2019-02-28 20:28:29 -05:00
Yann Collet
b348fa896a
restored 8-way mixer
2019-02-28 16:43:44 -08:00
Yann Collet
fa31d0b02f
xxh3: fixed last minor quality metric
...
in extended tests
2019-02-27 15:03:23 -08:00
Yann Collet
5b827f538c
improved 8-way mixer
2019-02-26 18:38:20 -08:00
Yann Collet
2be95459cd
fixed minor c90 warning
2019-02-26 16:42:50 -08:00
Yann Collet
7784d41ce3
fixed ARM compilation error
2019-02-26 16:36:03 -08:00
Yann Collet
94bebd5b86
xxh3: more c90 compatibility
2019-02-26 15:55:25 -08:00
Yann Collet
43c10239c9
minor C90 adaptation fixes
...
added -Wconversion flag
2019-02-26 13:45:56 -08:00
Yann Collet
45f39e6d34
first implementation of XXH3_64b
...
currently can only be used for benchmarking (`-b`)
2019-02-26 12:36:23 -08:00