xxHash

mirror of https://github.com/FEX-Emu/xxHash.git synced 2024-11-24 15:09:44 +00:00

Author	SHA1	Message	Date
Yann Collet	1c3bfee5f0	Merge pull request #284 from Cyan4973/s390x added s390x tests on travis	2019-12-14 13:31:36 -08:00
easyaspi314 (Devin)	6aa9beeb50	[s390x] Identify s390x in xxhsum	2019-12-14 16:30:22 -05:00
Yann Collet	4863cba0fa	added s390x tests on travis	2019-12-14 13:07:04 -08:00
Yann Collet	9fd98d7c5f	Merge pull request #283 from adamretter/dev Use Travis CI Arm64 and PPC64le arch	2019-12-14 12:50:25 -08:00
Adam Retter	618dd731b6	Use Travis CI Arm64 and PPC64le arch	2019-12-14 21:34:07 +01:00
Yann Collet	ae245428b8	update internal benchmark to reduce risks of rounding bias when a measurement uses a too small amount of time.	2019-12-10 13:12:28 -08:00
Yann Collet	c173142da0	removed ===== comment separators for easier compatibility with git merge conflict. fix #277	2019-12-03 16:44:41 -08:00
Yann Collet	3109b8ef98	Merge pull request #278 from Cyan4973/xxh_implem transferred implementation inside xxhash.h	2019-12-03 16:41:15 -08:00
Yann Collet	cff3643459	fixed make command on Windows, command line doesn't accept environment variables before the command	2019-12-03 15:55:50 -08:00
Yann Collet	8f6e9c92e6	fixed mingw+clang compilation test which does not work using c90 strict mode, due to the (incorrect) presence of `inline` keyword in some standard library's header files. The previous method was disabling the `inline` keyword, but this introduces other problems with more complex multi-file projects, such as benchHash, which has been recently added as part of `make test`. Added a new environment variable to disable the c90 compatibility test : NO_C90_TEST=true note : apparently, Appveyor doesn't like comments inside () sub-blocks :(	2019-12-02 17:32:56 -08:00
Yann Collet	4e2199def8	Merge pull request #280 from clockfort/patch-2 fix docs link	2019-11-24 11:49:53 -08:00
Chris Lockfort	a96cd994e7	fix docs link	2019-11-24 04:15:44 -08:00
Yann Collet	aa61b378b3	removed extraneous include also : attempts to fix benchHash compilation on Windows using msys2 environment with clang compiler	2019-11-04 21:17:07 -08:00
Yann Collet	d3a76b3a28	attempt to reconcile usan and benchHash	2019-11-04 17:25:42 -08:00
Yann Collet	a642aba0f5	transferred implementation inside xxhash.h instead of xxhash.c . This seems preferable for some build systems, which don't like the `#include "xxhash.c"` statement when inlining xxhash, as reported by @pdillinger . Note that `xxhash.c` still exists, it just includes the implementation and instantiates it.	2019-11-04 16:04:28 -08:00
Yann Collet	44cd858d49	removed non-ascii character	2019-10-24 15:57:00 -07:00
Yann Collet	8e5fdcbe70	added benchHash compilation to `test` target	2019-10-11 08:12:25 -07:00
Yann Collet	e6dc2443f0	fix: multiple include with/without XXH_STATIC_LINKING	2019-10-11 07:50:06 -07:00
Yann Collet	1cc9634053	improve speed for large inputs by up to +20%	2019-10-08 20:58:02 -07:00
Yann Collet	1ea98d6a38	changed strict-overflow warning to level 2 since it inexplicably complains about `main` since `4e4570f751`	2019-10-07 10:56:02 -07:00
Yann Collet	c6ea1d2107	updated man page regarding -q	2019-10-07 08:34:39 -07:00
Yann Collet	4e4570f751	removed non-error messages from stderr when specifying -q	2019-10-07 08:25:57 -07:00
Yann Collet	c5f72f87af	added guard macro in xxhash.c since it can be included.	2019-10-07 07:56:42 -07:00
Yann Collet	73e6c5206c	simplified type declaration some types were not needed. Also : xxh_u* type are only necessary within libxxhash, not xxhsum	2019-10-07 07:52:32 -07:00
Yann Collet	c996521986	reuse XXHnn_hash_t definitions for internal typedef xxh_u32 and xxh_u64	2019-10-07 01:00:50 -07:00
Yann Collet	8dab0315ac	Merge pull request #271 from easyaspi314/typedefs Improve typedefs, fix 16-bit int/seed type bug	2019-10-07 00:52:53 -07:00
easyaspi314 (Devin)	368a6f9699	Improve typedefs, fix 16-bit int/seed type bug Fixes #258. ```c BYTE -> xxh_u8 U32 -> xxh_u32 U64 -> xxh_u64 ``` Additionally, I hopefully fixed an issue for targets where int is 16 bits. XXH32 used unsigned int for its seed, and in C90 mode, unsigned int as its U32. This would cause truncation issues. I check limits.h in C90 mode to make sure UINT_MAX == 0xFFFFFFFFUL, and if it isn't, use unsigned long. We should see if we can set up an AVR CI test. Just to run the verification program, though, as the benchmark will take a very long time. Lastly, the seed types are XXH32_hash_t and XXH64_hash_t for XXH32/64. This matches xxhash.c and prevents the aforementioned 16-bit int bug.	2019-10-06 19:14:12 -04:00
Yann Collet	b604c7bee5	Merge pull request #269 from easyaspi314/endianness_fix Fix endianness detection on GCC, avoid XXH_cpuIsLittleEndian.	2019-10-04 13:38:51 -07:00
easyaspi314 (Devin)	028c0fd534	Fix endianness detection on GCC, avoid XXH_cpuIsLittleEndian.	2019-10-04 09:47:09 -04:00
Yann Collet	71a8150b6f	Merge pull request #267 from easyaspi314/main_loop_cleanup [SCALAR] Improve scalar XXH3_accumulate_512 loop	2019-10-03 10:38:43 -07:00
easyaspi314 (Devin)	9b6fa1067f	[SCALAR] Improve scalar XXH3_accumulate_512 loop The previous XXH3_accumulate_512 loop didn't fare well since XXH128 started swapping the addition. Neither GCC nor Clang could follow the barely-readable loop, resulting in garbage code output. This made XXH3 much slower. Take 32-bit scalar ARM. Ignoring loads and potential interleaving optimizations, in the main loop, XXH32 takes 16 cycles for 8 bytes on a typical ARMv6+ CPU, or 2 cpb. ```asm mla r0, r2, r5, r0 @ 4 cycles ror r0, r0, #19 @ 1 cycle mul r0, r0, r6 @ 3 cycles mla r1, r3, r5, r1 @ 4 cycles ror r1, r1, #19 @ 1 cycle mul r1, r1, r6 @ 3 cycles ``` XXH3_64b takes 9, or 1.1 cpb: ```asm adds r0, r0, r2 @ 2 cycles adc r1, r1, r3 @ 1 cycle eor r4, r4, r2 @ 1 cycle eor r5, r5, r3 @ 1 cycle umlal r0, r1, r4, r5 @ 4 cycles ``` Benchmarking on a Pixel 2 XL (with a binary for ARMv4T), previously, XXH32 got 1.8 GB/s, while XXH3_64b got 1.7. Now, XXH3_64b gets 2.3 GB/s! This calculates out well (as additional loads and stores have some overhead). Unlike before, it is better to disable autovectorization completely, as the compiler can't vectorize it as well. (Especially with Clang and NEON, where it extracts to multiply instead of the obvious vmlal.u32!). On that same device in aarch64 mode XXH3's scalar version when compiled with `clang-8 -O3 -DXXH_VECTOR=0 -fno-vectorize -fno-slp-vectorize`, XXH3 went from 2.3 GB/s to 4.3 GB/s. For comparison, the NEON version gets 6.0 GB/s. However, almost all platforms with decent autovectorization have a handwritten intrinsics version which is much faster. For optimal performance, use -fno-tree-vectorize -fno-tree-slp-vectorize (or simply disable SIMD instructions entirely). From testing, ARM32 also prefers forced inlining, so I enabled it. I also fixed some typos.	2019-10-03 09:56:18 -04:00
Yann Collet	96e8472380	documented opened API consistency questions	2019-10-02 14:47:59 -07:00
Yann Collet	28950be40c	updated code comments especially on the canonical representation paragraph, to make it clear it's the preferred format for storage and transmission.	2019-10-02 14:31:14 -07:00
Yann Collet	2b956f86b0	Merge pull request #266 from easyaspi314/varnames Try to improve some variable names.	2019-10-02 11:28:45 -07:00
Yann Collet	b2154f3583	Merge pull request #265 from easyaspi314/fair_bench Use both seeded and unseeded variants in the bench	2019-10-02 11:01:31 -07:00
easyaspi314 (Devin)	1367385768	Fix mixed declaration I need to stop coding before my coffee. :/	2019-10-02 13:01:51 -04:00
easyaspi314 (Devin)	425dbd8d86	Try to improve some variable names. It's a start, but an improvement. I still have more things I would like to change but it is good for now.	2019-10-02 12:28:01 -04:00
easyaspi314 (Devin)	91d6e4927e	Use both seeded and unseeded variants in the bench Previously, XXH3_64bits looked much faster than XXH3_128bits. The truth is that they are similar in long keys. The difference was that XXH3_64b's benchmark was unseeded, putting it at an unfair advantage over XXH128 which is seeded. I don't think I am going to do the dummy bench. That made things more complicated.	2019-10-01 23:23:55 -04:00
Yann Collet	3df9e91856	Merge pull request #264 from easyaspi314/voidptrfix Reduce void pointers and evil casts.	2019-10-01 20:07:53 -07:00
easyaspi314 (Devin)	cb4adfcc10	Typo	2019-10-01 19:00:28 -04:00
easyaspi314 (Devin)	f90b0aba40	Reduce void pointers and evil casts.	2019-10-01 18:52:21 -04:00
Yann Collet	a44629ace1	Merge pull request #262 from Cyan4973/xxh128_17p improve xxh128 for mid-size	2019-09-30 23:11:01 -07:00
Yann Collet	c8f3fb514c	factorized mix32B changing xxh128 results for len within 129-240.	2019-09-30 22:36:07 -07:00
Yann Collet	9d79fd7bc1	factor mix32	2019-09-30 17:55:46 -07:00
Yann Collet	43b5c76b4c	fixed mistake in last ingested segment	2019-09-30 17:33:38 -07:00
Yann Collet	0bed0c2e5b	updated self-test values for xxh128	2019-09-30 17:26:04 -07:00
Yann Collet	6896c5798f	fix input distribution over 128-bit state for mid-size length 17+	2019-09-30 17:13:59 -07:00
Yann Collet	cd0f5c2209	slightly updated xxh128 at len 1-3 for a slightly better bias	2019-09-28 20:02:55 -07:00
Yann Collet	ea5c659701	update man page	2019-09-28 17:55:41 -07:00
Yann Collet	eab46160a9	update examples and comment	2019-09-28 17:39:00 -07:00

777 Commits