Commit Graph

955 Commits

Author SHA1 Message Date
easyaspi314 (Devin)
06d13f72b5 Document 1to3 input setup 2020-02-27 10:17:08 -05:00
easyaspi314 (Devin)
9d278c565a Document the shift in XXH3_len_4to8_128b 2020-02-27 10:13:59 -05:00
easyaspi314 (Devin)
9d375060c8 Document 128-bit ops on XXH3_len_9to16_128b 2020-02-27 10:12:12 -05:00
Yann Collet
b54708f05b xxh128 len[4-8] : minor change
It's not useful to swap input segments:
the differentiation from seed is already taken care of by the seed itself,
and keeping the number in the low bits slightly improves dispersion.
It may also improve speed for the specific case len=8 (constant).
2020-02-26 19:32:01 -08:00
Yann Collet
48933f0037
Merge pull request #311 from Cyan4973/xxh128
Simplify len [4,8] for xxh128
2020-02-26 18:13:37 -08:00
Yann Collet
5543c3dbe9 xxh128 len[4-8]: improved distribution quality 2020-02-26 16:07:18 -08:00
Yann Collet
4ca5b6e20e xxh128 : len [4-8]: shift len by << 2
to preserve oddness of multiplier,
as suggested by @easyaspi314.

Also : stats from << 2 look better than << 1
2020-02-26 14:50:28 -08:00
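
The oddness argument in 4ca5b6e20e can be checked directly. The sketch below is illustrative and is not taken from the xxHash source: `PRIME64_1` is xxHash's first 64-bit prime, while `len_multiplier` and `inverse_mod_2_64` are hypothetical helpers. Adding `len << 2` to an odd constant leaves the lowest bit untouched, so the multiplier stays odd, and an odd multiplier is invertible modulo 2^64, so the multiplication itself loses no information.

```c
#include <assert.h>
#include <stdint.h>

/* xxHash's first 64-bit prime; it is odd. */
#define PRIME64_1 0x9E3779B185EBCA87ULL

/* Hypothetical helper: derive a length-dependent multiplier.
 * (len << 2) has its two low bits clear, so the sum stays odd. */
static uint64_t len_multiplier(uint64_t len)
{
    return PRIME64_1 + (len << 2);
}

/* Hypothetical helper: multiplicative inverse of an odd m modulo 2^64,
 * computed by Newton iteration. Starting from m gives 3 correct bits
 * (m * m == 1 mod 8 for any odd m); each step doubles that count,
 * so 5 steps cover all 64 bits. */
static uint64_t inverse_mod_2_64(uint64_t m)
{
    uint64_t inv = m;
    int i;
    assert(m & 1); /* only odd values are invertible */
    for (i = 0; i < 5; i++) {
        inv *= 2 - m * inv;
    }
    return inv;
}
```

Multiplying `len_multiplier(len)` by its inverse yields 1 for every `len` in [4, 8], which is what "preserving oddness" buys: an even multiplier would have no inverse and would discard high bits of the input.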
Yann Collet
c6013d80d9 xxh128: slight optimization for len [4,8] 2020-02-24 18:14:32 -08:00
Yann Collet
935f280a76 xxh128: speedup len [4,8] 2020-02-24 17:39:33 -08:00
Yann Collet
b0104d2a82 Merge branch 'dev' into xxh128 2020-02-24 16:16:10 -08:00
Yann Collet
eba72be9fe fixed prng using seed 2020-02-24 16:10:20 -08:00
Yann Collet
3a0c1c3336 fixed PerlinNoise test 2020-02-24 12:25:23 -08:00
Yann Collet
64f655a28e
Merge pull request #304 from easyaspi314/unicode-windows-fixes
Fix Unicode support on Windows, minor Windows tweaks
2020-02-24 09:55:47 -08:00
Yann Collet
71f0f6ffd3
Merge pull request #308 from Cyan4973/mul32len8test
Last variant for the 4to8 segment (mul32to64)
2020-02-24 09:52:33 -08:00
Yann Collet
8d80010b7b fixed seed space reduction
thanks to @easyaspi314
2020-02-22 10:29:04 -08:00
Yann Collet
ee460fdbbb minor variation passing the PRNG test 2020-02-21 10:46:57 -08:00
Yann Collet
c8c4cc0f81
Merge pull request #309 from easyaspi314/compiler-specific-fixes
Compiler specific fixes
2020-02-21 10:14:59 -08:00
easyaspi314 (Devin)
5309e282ce Force -O2 on GCC + AVX2, document split load
GCC for AVX2 goes overboard on the unrolling with -O3, causing slower
code than MSVC and Clang.

We can override that with a pragma that forces GCC to use -O2 instead.

Note that GCC still generates the best scalar and SSE2 code with -O3.

I also documented that GCC will split _mm256_loadu_si256 into
two instructions on a generic+avx2 target (an optimization that
only applies to the non-AVX2 Sandy and Ivy Bridge chips), and provided
the recommended flags.
2020-02-21 10:11:14 -05:00
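
A minimal sketch of the pragma approach described in 5309e282ce, assuming GCC's `push_options`, `optimize`, and `pop_options` pragmas. The macro `DEMO_FORCE_O2` and the function `sum_u64` are hypothetical; the actual guard condition used in xxHash may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only real GCC should see these pragmas. Clang also defines __GNUC__,
 * so it is excluded explicitly. */
#if defined(__GNUC__) && !defined(__clang__)
#  define DEMO_FORCE_O2 1 /* hypothetical macro name */
#else
#  define DEMO_FORCE_O2 0
#endif

#if DEMO_FORCE_O2
#  pragma GCC push_options    /* remember the command-line options */
#  pragma GCC optimize("O2")  /* functions below are compiled as if with -O2 */
#endif

/* Stand-in for a hot loop that GCC might over-unroll at -O3. */
static uint64_t sum_u64(const uint64_t *v, size_t n)
{
    uint64_t acc = 0;
    size_t i;
    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++) {
        acc += v[i];
    }
    return acc;
}

#if DEMO_FORCE_O2
#  pragma GCC pop_options     /* restore the command-line options */
#endif
```

Scoping the override with push/pop keeps the rest of the translation unit at whatever level the user requested, which matters here because the commit notes that GCC's scalar and SSE2 code is still best at -O3.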
easyaspi314 (Devin)
777ec6529a Implement alternative byteshift load
XXH_FORCE_MEMORY_ACCESS==3 will use a byteshift operation. This is
preferred on older compilers which don't inline `memcpy()`, or on some
big-endian systems without a native byteswap.

Also fix a small typo.
2020-02-21 10:01:04 -05:00
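
As an illustration of what a byteshift load looks like (a sketch, not the code from 777ec6529a; the function names are hypothetical), a 32-bit value can be assembled one byte at a time. This needs neither `memcpy()` nor a byteswap, and gives the same result regardless of host endianness or pointer alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value byte by byte. */
static uint32_t read_le32_byteshift(const void *ptr)
{
    const uint8_t *b = (const uint8_t *)ptr;
    assert(ptr != NULL);
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Big-endian counterpart: same idea, byte order reversed. */
static uint32_t read_be32_byteshift(const void *ptr)
{
    const uint8_t *b = (const uint8_t *)ptr;
    assert(ptr != NULL);
    return (uint32_t)b[3]
         | ((uint32_t)b[2] << 8)
         | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[0] << 24);
}
```

On modern compilers this pattern is usually recognized and folded into a single load, so the trade-off mainly concerns the older toolchains the commit mentions.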
easyaspi314 (Devin)
558c9a97bf XXH_mult32to64: Use downcast+upcast instead of mask
Old/stupid compilers may generate an erroneous mask in XXH_mult32to64,
e.g. ARM GCC 2.95:

```c
xxh_u64 XXH_mult32to64(xxh_u64 a, xxh_u64 b)
{
    return (a & 0xffffffff) * (b & 0xffffffff);
}
```

`arm-gcc-2.95 -O3 -S -march=armv4t -mcpu=arm7tdmi -fomit-frame-pointer`
```asm
XXH_mult32to64:
        push    {r4, r5, r6, r7, lr}
        mov     r5, #0
        mov     r4, #0xffffffff
        mov     r7, r5
        mov     r6, r4
        @ mask 32-bit registers by 0x00000000 and 0xffffffff ?!?!?!
        and     r6, r6, r0
        and     r7, r7, r1
        and     r4, r4, r2
        and     r5, r5, r3
        @ full 64x64->64 multiply
        umull   r0, r1, r6, r4
        mla     r1, r6, r5, r1
        mla     r1, r4, r7, r1
        pop     {r4, r5, r6, r7, pc}
```

Meanwhile, using a downcast followed by an upcast generates the expected
code, albeit with some understandable regalloc weirdness (ARM support
was only recently added).

```c
xxh_u64 XXH_mult32to64(xxh_u64 a, xxh_u64 b)
{
    return (xxh_u64)(xxh_u32)a * (xxh_u64)(xxh_u32)b;
}
```

`arm-gcc-2.95 -O3 -S -march=armv4t -mcpu=arm7tdmi -fomit-frame-pointer`
```asm
XXH_mult32to64:
        push    {r4, lr}
        umull   r3, r4, r0, r2
        mov     r1, r4
        mov     r0, r3
        pop     {r4, pc}
```

Switching to this implementation may also remove the requirement for
`__emulu` on MSVC x86, but it hasn't been tested yet.

All modern compilers should recognize both patterns, but it seems that
old 32-bit compilers will prefer the latter, making this a free
optimization.
2020-02-21 06:09:06 -05:00
easyaspi314 (Devin)
77b74c9dc9 Put __attribute__((aligned)) after struct member
Improves compatibility with old GCC versions.
2020-02-21 06:05:55 -05:00
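
A sketch of the attribute placement that 77b74c9dc9 refers to, assuming a GCC-compatible compiler. `DEMO_ALIGN`, `demo_state`, and `demo_is_aligned` are hypothetical names, not the ones used in xxHash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__)
#  define DEMO_ALIGN(n) __attribute__((aligned(n)))
#else
#  define DEMO_ALIGN(n) /* no-op on other compilers in this sketch */
#endif

typedef struct {
    uint32_t total_len;
    /* The attribute follows the declarator, the position the commit
     * describes as more compatible with old GCC versions. The other
     * common spelling puts it before the type:
     *     DEMO_ALIGN(64) uint64_t acc[8];                           */
    uint64_t acc[8] DEMO_ALIGN(64);
} demo_state;

/* Returns 1 if the accumulator array starts on a 64-byte boundary. */
static int demo_is_aligned(const demo_state *s)
{
    assert(s != NULL);
    return ((uintptr_t)s->acc % 64) == 0;
}
```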
Yann Collet
00d5458761 minor optimization
slightly faster for ARM and x86
2020-02-20 15:15:56 -08:00
Yann Collet
6c3f96a9cc minor simplification 2020-02-20 09:00:32 -08:00
Yann Collet
de90226410 try a variation of len_4to8 using mult32to64
adding swap instructions to better dispatch bits.
2020-02-20 00:29:12 -08:00
Yann Collet
5fad59e746 make len8 part of 8to16 2020-02-19 21:02:00 -08:00
Yann Collet
77d65ff45f fixed Perlin Noise test 2020-02-19 17:45:21 -08:00
Yann Collet
741400a5b1 disabled checksum validation
while formula is in flux
2020-02-19 16:08:36 -08:00
Yann Collet
f1051cda49 joined len==8 into 4to8 2020-02-19 16:01:20 -08:00
Yann Collet
123db71fd0 improvement vs mul0 2020-02-19 15:58:19 -08:00
Yann Collet
f486b3c7c4 try a mul32to64 formula for len_4to8 2020-02-19 15:45:43 -08:00
Yann Collet
6456c04490 added likely
removed bijectivity
2020-02-19 14:41:56 -08:00
easyaspi314 (Devin)
0197a2b5b0 Improve comments for Windows Unicode wrappers. 2020-02-14 20:48:12 -05:00
easyaspi314 (Devin)
f0627bc321 Add explicit rules for object files to include FLAGS
This fixes the Clang appveyor build.

Now, FLAGS will always be applied to the object files and linker files.
2020-02-14 19:44:30 -05:00
easyaspi314 (Devin)
b5cef6dce0 Fix accidental typo
Don't know how I did that one.
2020-02-14 19:14:08 -05:00
easyaspi314 (Devin)
cac3ca4d5d Implement a safer Unicode test
This new test doesn't use any Unicode in the source files, instead
encoding all UTF-8 and UTF-16 as hex.

The test scripts are generated by a C program, which emits both a shell
script and a batch script, as well as the Unicode file to test.

On Cygwin, MinGW, and MSYS, we will automatically bail from the shell
script to the batch script, as cmd.exe has more reliable Unicode
support, at least on Windows 7 and later.

When the make rule is called, it first checks whether `$LANG` contains
UTF-8 and defines the (overridable) ENABLE_UNICODE flag accordingly. If
the flag is not set, it skips the test with a warning.

Also fixed an issue with printf in multiInclude.c causing warnings on
old MinGW versions which expect %I64, and updated the .gitignore.
2020-02-14 19:08:09 -05:00
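
The idea of keeping Unicode out of the source files in cac3ca4d5d can be sketched as follows. The sample code points and the names `name_utf8`, `name_utf16`, and `decode_utf8_3byte` are assumptions chosen for illustration; the actual test generator is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Two CJK code points, U+65E5 and U+672C, spelled out as hex so that
 * this source file stays pure ASCII. Both arrays are NUL-terminated. */
static const unsigned char name_utf8[] = {
    0xE6, 0x97, 0xA5,   /* U+65E5 */
    0xE6, 0x9C, 0xAC,   /* U+672C */
    0x00
};
static const uint16_t name_utf16[] = { 0x65E5, 0x672C, 0x0000 };

/* Hypothetical helper: decode one 3-byte UTF-8 sequence
 * (U+0800..U+FFFF). Enough for this sample, not a general decoder. */
static uint16_t decode_utf8_3byte(const unsigned char *s)
{
    assert((s[0] & 0xF0) == 0xE0);
    assert((s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80);
    return (uint16_t)(((s[0] & 0x0F) << 12)
                    | ((s[1] & 0x3F) << 6)
                    |  (s[2] & 0x3F));
}
```

Decoding the UTF-8 bytes reproduces the UTF-16 code units, so the two hex-encoded spellings name the same file without the compiler or editor ever having to interpret a non-ASCII character.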
Yann Collet
993dcf89f7 fixed xxhsum verification values (partial) 2020-02-13 21:53:35 -08:00
Yann Collet
9df2729931 Merge branch 'dev' into smallInputs 2020-02-13 21:31:16 -08:00
Yann Collet
0a5f34f8cf modified small inputs for xxh3
in order to pass the new Perlin_noise test.

Sizes 4-8 should also be slightly faster.
2020-02-13 19:14:58 -08:00
easyaspi314 (Devin)
9bd98b0b45 Fix errors on older MinGW and MSVC
Always use wmain on MSVC, and use _wfopen instead of _wfopen_s.
2020-02-13 18:48:25 -05:00
easyaspi314 (Devin)
dbe2addcc1 Move test-unicode to test-all.
Some systems may, in theory, not handle Unicode well, and the `test`
target is designed to be pretty much universal.

This locks it behind test-all.
2020-02-12 20:58:10 -05:00
easyaspi314 (Devin)
e460437a9d Fix typo 2020-02-12 20:54:13 -05:00
easyaspi314 (Devin)
3593758487 Fix minor typo 2020-02-12 20:46:50 -05:00
easyaspi314 (Devin)
261c28b676 Fix Unicode support on Windows, minor Windows tweaks
- Unicode filenames should now work, with a method that works with
  and without Unicode mode on Windows.
  - Added a test in the Makefile
- Use unbuffered stderr output on Windows, which fixes output not
  updating immediately on MinGW.
- Fix some missing $(EXT)s in the Makefile, which caused Clang to emit
  xxhsum instead of xxhsum.exe on Windows, and fix xxhsum's rule
  ignoring $(FLAGS).
2020-02-12 20:37:34 -05:00
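
For the unbuffered stderr item in 261c28b676, one standard way to get that behavior is `setvbuf()` with `_IONBF`. This is a sketch under that assumption; the helper name is hypothetical, and the commit may implement it differently.

```c
#include <stdio.h>

/* Hypothetical helper: switch stderr to unbuffered mode so progress
 * messages appear immediately. Returns 0 on success, like setvbuf().
 * Best called before anything has been written to stderr. */
static int make_stderr_unbuffered(void)
{
    return setvbuf(stderr, NULL, _IONBF, 0);
}
```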
Yann Collet
16f6cee1bf
Merge pull request #303 from Cyan4973/nullstring
return non-zero on empty string :
2020-02-12 16:45:03 -08:00
Yann Collet
fa0a6ebc7f fixed empty-string results on Big-Endian 2020-02-12 15:32:45 -08:00
Yann Collet
1a67ed4437 return non-zero on empty string :
answering : https://github.com/Cyan4973/xxHash/issues/175#issuecomment-548108921

The probability of receiving an empty string is larger than random (> 1 / 2^64),
making the generated hash more "common".

For some algorithms, it's an issue if this "more common" value is 0.

Map it instead to an avalanche of an arbitrary start value (prime64).
The start value is blended with the `seed` and the `secret`,
so that the result depends on them too.
2020-02-12 15:22:13 -08:00
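
The empty-input mapping in 1a67ed4437 can be sketched as below. This is an assumption-laden illustration, not the committed code: the mixer is the XXH64-style final avalanche used as a stand-in, `hash_empty` is a hypothetical name, and `secret_word` stands for 64 bits read from the secret.

```c
#include <stdint.h>

#define PRIME64_1 0x9E3779B185EBCA87ULL
#define PRIME64_2 0xC2B2AE3D27D4EB4FULL
#define PRIME64_3 0x165667B19E3779F9ULL

/* XXH64-style final avalanche, used here as a stand-in mixer.
 * Every step is reversible, so it is a bijection on 64-bit values,
 * and it maps 0 to 0. */
static uint64_t avalanche64(uint64_t h)
{
    h ^= h >> 33;
    h *= PRIME64_2;
    h ^= h >> 29;
    h *= PRIME64_3;
    h ^= h >> 32;
    return h;
}

/* Hypothetical: hash of the empty input. The start value (a prime)
 * is blended with the seed and with 64 bits of the secret. */
static uint64_t hash_empty(uint64_t seed, uint64_t secret_word)
{
    return avalanche64((PRIME64_1 + seed) ^ secret_word);
}
```

In this sketch the result is 0 only when `PRIME64_1 + seed` happens to equal `secret_word`, so 0 is no longer the typical answer for an empty input, and different seeds or secrets give different results.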
Yann Collet
aee51d5e7b
Merge pull request #302 from Cyan4973/inline_all
xxhash can be inlined even when previously included
2020-02-12 15:01:41 -08:00
Yann Collet
dadf1ef766 fix xxhash.h include from xxh3.h 2020-02-12 14:36:11 -08:00
Yann Collet
b8d761e5a7 Merge branch 'dev' into inline_all 2020-02-12 14:19:13 -08:00
Yann Collet
99077eca58
Merge pull request #301 from Cyan4973/s390x
S390x
2020-02-12 14:18:37 -08:00