Commit Graph

819 Commits

Author SHA1 Message Date
easyaspi314 (Devin)
e5883c4a33 Improve wording on XXH3_accumulate_512 comment
Adds more detail on the changes compared to the original UMAC and reduces
run-ons.
2020-02-27 18:48:16 -05:00
easyaspi314 (Devin)
f2a78adb2a Be consistent with wording 2020-02-27 18:10:34 -05:00
easyaspi314 (Devin)
9363917d08 Swap good and bad in 64-bit subset comment, expand
Flows better.
2020-02-27 18:07:48 -05:00
easyaspi314 (Devin)
2685b58257 Add a quick comment about the 64-bit arithmetic subset
Added it at the same place as the Thumb sanity check because that made the
most sense.

Also noted that the requirements were not much more than XXH32.
2020-02-27 17:53:26 -05:00
easyaspi314 (Devin)
ce652cc503 Note the primary reused subroutines 2020-02-27 17:21:55 -05:00
easyaspi314 (Devin)
1acf797a85 Move 17to128 and 129to240 into a more logical location
Now, it is sorted by length from short to long.

Also, mention how mix32B is slower but better resists multiply by zero.
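
For readers following along, here is a rough sketch of why a 32-byte mixer better resists multiply by zero; the names and structure are simplified stand-ins (the secret/seed keying is omitted), not the actual XXH3 code.

```c
#include <stdint.h>

typedef struct { uint64_t low64, high64; } acc128;   /* illustrative accumulator */

/* Stand-in for a 16-byte multiplicative mix (a mix16B-style step): the only
 * property that matters here is that its result can be zero for unlucky
 * keyed inputs. */
static uint64_t mul_mix_sketch(uint64_t keyed_lo, uint64_t keyed_hi)
{
    return keyed_lo * keyed_hi;
}

/* mix32B-style step: the multiplicative mix of each 16-byte half is paired
 * with an additive fold of the *other* half, so even if one product
 * collapses to zero, the input still reaches the accumulator. */
static acc128 mix32_sketch(acc128 acc,
                           uint64_t a_lo, uint64_t a_hi,   /* first 16 bytes  */
                           uint64_t b_lo, uint64_t b_hi)   /* second 16 bytes */
{
    acc.low64  += mul_mix_sketch(a_lo, a_hi);
    acc.low64  ^= b_lo + b_hi;
    acc.high64 += mul_mix_sketch(b_lo, b_hi);
    acc.high64 ^= a_lo + a_hi;
    return acc;
}
```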
2020-02-27 15:43:18 -05:00
easyaspi314 (Devin)
fb999b1960 Fix yet another typo
Thanks, @aras-p
2020-02-27 15:36:35 -05:00
easyaspi314 (Devin)
b655d2db93 Fix copy-paste issue 2020-02-27 14:43:30 -05:00
easyaspi314 (Devin)
39feadea2e Fix minor typo 2020-02-27 14:40:07 -05:00
easyaspi314 (Devin)
a1ca4ff0e2 Comment on FARSH keys and Mum variant, indent fix 2020-02-27 14:22:45 -05:00
easyaspi314 (Devin)
0ffbf28843 Add some extra details, fix typo 2020-02-27 14:13:57 -05:00
easyaspi314 (Devin)
bba53920a5 Mention the seed-dependent collisions in mix16B
We know it exists, don't hide it.

It is highly unlikely to occur with proper seeding and random inputs,
and it doesn't occur on the 128-bit version, so make sure people are
aware of it.
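
To make the hazard concrete, here is a minimal sketch with illustrative names (the real routine folds a 128-bit product, omitted here): if either keyed term happens to be zero for a particular seed and secret, the product is zero and the other half of the input stops influencing the result.

```c
#include <stdint.h>

/* mix16B-like step, simplified. */
static uint64_t mix16_sketch(uint64_t in_lo, uint64_t in_hi,
                             uint64_t sec_lo, uint64_t sec_hi, uint64_t seed)
{
    uint64_t const x = in_lo ^ (sec_lo + seed);
    uint64_t const y = in_hi ^ (sec_hi - seed);
    /* Seed-dependent hazard: if x == 0 (i.e. in_lo == sec_lo + seed),
     * the result is 0 for every possible in_hi -- and vice versa. */
    return x * y;
}
```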
2020-02-27 14:00:54 -05:00
easyaspi314 (Devin)
daee1fb34e Document the short hash redo compared to XXH64. 2020-02-27 11:15:12 -05:00
easyaspi314 (Devin)
0767d9601d Document accumulate_512 and scrambleAcc, rename vsx typedefs
Comments are now synchronized across all SIMD implementations, and both
now have a summary block comment.

Additionally, VSX now uses xxh_u64x2 to match the scalar typedefs.
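
For context, a rough scalar sketch of the accumulate-512 round that these comments describe, with simplified helpers and illustrative names; the real per-lane details (lane-swapping order, endianness handling) may differ.

```c
#include <stdint.h>
#include <string.h>

#define ACC_NB 8   /* 512 bits of accumulators: 8 lanes of 64 bits */

/* Sketch helper: 64-bit load via memcpy (assumes a little-endian host). */
static uint64_t load64(const unsigned char* p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Accumulate-512-style round: each 8-byte lane is keyed with the secret,
 * split into 32-bit halves and multiplied 32x32->64 into its accumulator,
 * while the raw lane is added to a neighbouring accumulator. */
static void accumulate_512_sketch(uint64_t acc[ACC_NB],
                                  const unsigned char* input,
                                  const unsigned char* secret)
{
    size_t lane;
    for (lane = 0; lane < ACC_NB; lane++) {
        uint64_t const data_val = load64(input  + 8*lane);
        uint64_t const data_key = data_val ^ load64(secret + 8*lane);
        acc[lane ^ 1] += data_val;
        acc[lane]     += (data_key & 0xFFFFFFFFULL) * (data_key >> 32);
    }
}
```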
2020-02-27 10:52:22 -05:00
easyaspi314 (Devin)
06d13f72b5 Document 1to3 input setup 2020-02-27 10:17:08 -05:00
easyaspi314 (Devin)
9d278c565a Document the shift in XXH3_len_4to8_128b 2020-02-27 10:13:59 -05:00
easyaspi314 (Devin)
9d375060c8 Document 128-bit ops on XXH3_len_9to16_128b 2020-02-27 10:12:12 -05:00
Yann Collet
b54708f05b xxh128 len[4-8] : minor change
it's not useful to swap input segments:
the differentiation is already taken care of by the seed itself,
and keeping the number in the low bits slightly improves dispersion.
It may also improve speed for the specific case len=8 (constant).
2020-02-26 19:32:01 -08:00
Yann Collet
48933f0037
Merge pull request #311 from Cyan4973/xxh128
Simplify len [4,8] for xxh128
2020-02-26 18:13:37 -08:00
Yann Collet
5543c3dbe9 xxh128 len[4-8]: improved distribution quality 2020-02-26 16:07:18 -08:00
Yann Collet
4ca5b6e20e xxh128 : len [4-8]: shift len by << 2
to preserve oddness of multiplier,
as suggested by @easyaspi314.

Also : stats from << 2 look better than << 1
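
A quick sketch of the oddness argument, with hypothetical names: `len << 2` has its two low bits clear, so XORing it into an odd 64-bit multiplier can never flip bit 0, and the multiplier stays odd (hence invertible modulo 2^64).

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: fold the input length into an odd base multiplier
 * without losing oddness. Since (len << 2) has bits 0-1 cleared, the XOR
 * cannot change bit 0 of base_mult. */
static uint64_t keyed_multiplier(uint64_t base_mult /* assumed odd */, size_t len)
{
    return base_mult ^ ((uint64_t)len << 2);
}
```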
2020-02-26 14:50:28 -08:00
Yann Collet
c6013d80d9 xxh128: slight optimization for len [4,8] 2020-02-24 18:14:32 -08:00
Yann Collet
935f280a76 xxh128: speedup len [4,8] 2020-02-24 17:39:33 -08:00
Yann Collet
b0104d2a82 Merge branch 'dev' into xxh128 2020-02-24 16:16:10 -08:00
Yann Collet
eba72be9fe fixed prng using seed 2020-02-24 16:10:20 -08:00
Yann Collet
3a0c1c3336 fixed PerlinNoise test 2020-02-24 12:25:23 -08:00
Yann Collet
64f655a28e
Merge pull request #304 from easyaspi314/unicode-windows-fixes
Fix Unicode support on Windows, minor Windows tweaks
2020-02-24 09:55:47 -08:00
Yann Collet
71f0f6ffd3
Merge pull request #308 from Cyan4973/mul32len8test
Last variant for the 4to8 segment (mul32to64)
2020-02-24 09:52:33 -08:00
Yann Collet
8d80010b7b fixed seed space reduction
thanks to @easyaspi314
2020-02-22 10:29:04 -08:00
Yann Collet
ee460fdbbb minor variation passing the PRNG test 2020-02-21 10:46:57 -08:00
Yann Collet
c8c4cc0f81
Merge pull request #309 from easyaspi314/compiler-specific-fixes
Compiler specific fixes
2020-02-21 10:14:59 -08:00
easyaspi314 (Devin)
5309e282ce Force -O2 on GCC + AVX2, document split load
With -O3, GCC goes overboard on unrolling the AVX2 path, producing slower
code than MSVC and Clang.

We can override that with a pragma that forces GCC to use -O2 instead.

Note that GCC still generates the best scalar and SSE2 code with -O3.

I also noted that GCC will split _mm256_loadu_si256 into two instructions
on a generic+avx2 target (an optimization that only benefits the non-AVX2
Sandy Bridge and Ivy Bridge chips), and provided the recommended flags.
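
For reference, the override described above looks roughly like the following (guards simplified; the real ones in xxHash are more thorough):

```c
/* Apply -O2 to the AVX2 functions only, then restore the user's settings. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__AVX2__)
#  pragma GCC push_options
#  pragma GCC optimize("-O2")   /* avoid -O3's excessive unrolling here */
#endif

/* ... AVX2 accumulate/scramble functions go here ... */

#if defined(__GNUC__) && !defined(__clang__) && defined(__AVX2__)
#  pragma GCC pop_options
#endif
```

As for the split load, GCC's `-mno-avx256-split-unaligned-load` (or an `-march` that actually targets an AVX2 CPU) is the kind of flag that keeps `_mm256_loadu_si256` as a single `vmovdqu`.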
2020-02-21 10:11:14 -05:00
easyaspi314 (Devin)
777ec6529a Implement alternative byteshift load
XXH_FORCE_MEMORY_ACCESS==3 will use a byteshift operation. This is
preferred on older compilers which don't inline `memcpy()`, or on some
big-endian systems without a native byteswap.
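
A minimal sketch of what a byteshift load looks like (a 32-bit little-endian read; the actual macro-driven implementation differs): bytes are read one at a time and assembled with shifts, so no unaligned access, `memcpy()`, or native byteswap is needed.

```c
#include <stdint.h>

/* Byteshift read: endianness is fixed by construction, regardless of the
 * host's byte order. */
static uint32_t read32_le_byteshift(const void* ptr)
{
    const uint8_t* p = (const uint8_t*)ptr;
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```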

Also fix a small typo.
2020-02-21 10:01:04 -05:00
easyaspi314 (Devin)
558c9a97bf XXH_mult32to64: Use downcast+upcast instead of mask
Old/stupid compilers may generate a redundant mask and a full 64x64 multiply
in XXH_mult32to64, e.g. ARM GCC 2.95:

```c
xxh_u64 XXH_mult32to64(xxh_u64 a, xxh_u64 b)
{
    return (a & 0xffffffff) * (b & 0xffffffff);
}
```

`arm-gcc-2.95 -O3 -S -march=armv4t -mcpu=arm7tdmi -fomit-frame-pointer`
```asm
XXH_mult32to64:
        push    {r4, r5, r6, r7, lr}
        mov     r5, #0
        mov     r4, #0xffffffff
        mov     r7, r5
        mov     r6, r4
        @ mask 32-bit registers by 0x00000000 and 0xffffffff ?!?!?!
        and     r6, r6, r0
        and     r7, r7, r1
        and     r4, r4, r2
        and     r5, r5, r3
        @ full 64x64->64 multiply
        umull   r0, r1, r6, r4
        mla     r1, r6, r5, r1
        mla     r1, r4, r7, r1
        pop     {r4, r5, r6, r7, pc}
```

Meanwhile, using a downcast followed by an upcast generates the expected
code, albeit with some understandable regalloc weirdness (ARM support
was only recently added).

```c
xxh_u64 XXH_mult32to64(xxh_u64 a, xxh_u64 b)
{
    return (xxh_u64)(xxh_u32)a * (xxh_u64)(xxh_u32)b;
}
```

`arm-gcc-2.95 -O3 -S -march=armv4t -mcpu=arm7tdmi -fomit-frame-pointer`
```asm
XXH_mult32to64:
        push    {r4, lr}
        umull   r3, r4, r0, r2
        mov     r1, r4
        mov     r0, r3
        pop     {r4, pc}
```

Switching to this implementation may also remove the requirement for
`__emulu` on MSVC x86, but it hasn't been tested yet.

All modern compilers should recognize both patterns, but it seems that
old 32-bit compilers will prefer the latter, making this a free
optimization.
2020-02-21 06:09:06 -05:00
easyaspi314 (Devin)
77b74c9dc9 Put __attribute__((aligned)) after struct member
Improves compatibility with old GCC versions.
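An illustrative example of the placement change (the struct and member names are hypothetical): some old GCC releases only accept the attribute reliably when it follows the member declarator.

```c
struct example_state {
    /* attribute placed after the member, not before the type */
    unsigned long long acc[8] __attribute__((aligned(64)));
    unsigned long long total_len;
};
```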
2020-02-21 06:05:55 -05:00
Yann Collet
00d5458761 minor optimization
slightly faster for ARM and x86
2020-02-20 15:15:56 -08:00
Yann Collet
6c3f96a9cc minor simplification 2020-02-20 09:00:32 -08:00
Yann Collet
de90226410 try a variation of len_4to8 using mult32to64
adding swap instructions to better dispatch bits.
2020-02-20 00:29:12 -08:00
Yann Collet
5fad59e746 make len8 part of 8to16 2020-02-19 21:02:00 -08:00
Yann Collet
77d65ff45f fixed Perlin Noise test 2020-02-19 17:45:21 -08:00
Yann Collet
741400a5b1 disabled checksum validation
while formula is in flux
2020-02-19 16:08:36 -08:00
Yann Collet
f1051cda49 joined len==8 into 4to8 2020-02-19 16:01:20 -08:00
Yann Collet
123db71fd0 improvement vs mul0 2020-02-19 15:58:19 -08:00
Yann Collet
f486b3c7c4 try a mul32to64 formula for len_4to8 2020-02-19 15:45:43 -08:00
Yann Collet
6456c04490 added likely
removed bijectivity
2020-02-19 14:41:56 -08:00
easyaspi314 (Devin)
0197a2b5b0 Improve comments for Windows Unicode wrappers. 2020-02-14 20:48:12 -05:00
easyaspi314 (Devin)
f0627bc321 Add explicit rules for object files to include FLAGS
This fixes the Clang appveyor build.

Now, FLAGS will always be applied when compiling the object files and when linking.
2020-02-14 19:44:30 -05:00
easyaspi314 (Devin)
b5cef6dce0 Fix accidental typo
Don't know how I did that one.
2020-02-14 19:14:08 -05:00
easyaspi314 (Devin)
cac3ca4d5d Implement a safer Unicode test
This new test doesn't use any Unicode in the source files, instead
encoding all UTF-8 and UTF-16 as hex.

The test script will be generated from a C file, in which both a shell
script and a batch script will be generated, as well as the Unicode file
to test.
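
A minimal sketch of the hex-encoding idea (the filename and output names below are examples, not the test's real ones): the source stays pure ASCII, and the UTF-8 bytes are emitted from escaped values when the script is generated.

```c
#include <stdio.h>

int main(void)
{
    /* "é.unicode" spelled as raw UTF-8 bytes; no Unicode in the source. */
    static const unsigned char utf8_name[] = {
        0xC3, 0xA9, '.', 'u', 'n', 'i', 'c', 'o', 'd', 'e', 0
    };
    FILE* script = fopen("unicode_test.sh", "wb");   /* hypothetical output */
    if (script == NULL) return 1;
    fprintf(script, "#!/bin/sh\n./xxhsum \"%s\"\n", (const char*)utf8_name);
    fclose(script);
    return 0;
}
```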

On Cygwin, MinGW, and MSYS, we will automatically bail from the shell
script to the batch script, as cmd.exe has more reliable Unicode
support, at least on Windows 7 and later.

When the make rule is called, it first checks whether `$LANG` contains UTF-8,
which defines the (overridable) ENABLE_UNICODE flag. If not, it skips the
test with a warning.

Also fixed an issue with printf in multiInclude.c causing warnings on
old MinGW versions which expect %I64, and updated the .gitignore.
2020-02-14 19:08:09 -05:00
Yann Collet
993dcf89f7 fixed xxhsum verification values (partial) 2020-02-13 21:53:35 -08:00