Using "_WIN32" to decide,
if the MSVC extensions _aligned_malloc / _aligned_free are available
is a bug that breaks other Compiler on Windows. (OpenWatcom as Example)
Regards ... Detlef
So a lot of alterations had to be done to make this not worse and
so far, it's not really better, either. I had to force inlining for
the adler routine, I had to remove the x4 load instruction otherwise
pipelining stalled, and I had to use restrict pointers with a copy
idiom for GCC to inline a copy routine for the tail.
Still, we see a small benefit in benchmarks, particularly when done
with size of our window or larger. There's also an added benefit that
this will fix#1824.
Mark crc32_c and crc32_braid functions as internal, and remove prefix.
Reorder contents of generic_functions, and remove Z_INTERNAL hints from declarations.
Add test/benchmark output to indicate whether Chorba is used.
- Rename N and W to BRAID_N and BRAID_W
- Remove override capabilities for BRAID_N and BRAID_W
- Fix formatting in crc32_braid_tbl.h
- Make makecrct not rely on crc32_braid_p.h
Evidently this instruction, despite the intrinsic having a register operand,
is a memory-register instruction. There seems to be no alignment requirement
for the source operand. Because of this, compilers when not optimized are doing
the unaligned load and then dumping back to the stack to do the broadcasting load.
In doing this, MSVC seems to be dumping to the stack with an aligned move at an
unaligned address, causing a segfault. GCC does not seem to make this mistake, as
it stashes to an aligned address.
If we're on Visual Studio 2015, let's just do the longer 9 cycle sequence of a 128
bit load followed by a vinserti128. This _should_ fix this (issue #1861).
We have to disable the CRC32-VX implementation for some Clang versions
(18 <= version < 19.1.2) that generate bad code for the IBM S390 VGFMA intrinsics.
Warning as an error with GCC from Uubuntu 24.04:
```
/home/runner/work/dotnet_riscv/dotnet_riscv/runtime/src/native/external/zlib-ng/arch/riscv/riscv_features.c(25,33): error G6E97C40B: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses] [/home/runner/work/dotnet_riscv/dotnet_riscv/runtime/src/native/libs/build-native.proj]
```
- Remove obsolete checks
- Fix checks that are inconsistent
- Stop compiling compare256/longest_match variants that never gets called
- Improve how the generic compare256 functions are handled.
- Allow overriding OPTIMAL_CMP
This simplifies the code and avoids having a lot of code in the compiled library than can never get executed.
This fixes a rightful complaint from the alignment sanitizer that we
alias memory in an unaligned fashion. A nice added bonus is that this
improves performance a tiny bit on the larger buffers, perhaps due to
loops that idiomatically decrement a count and increment a single buffer
pointer rather than the maze of conditional pointer reassignments.
While here, let's write a unit test just for this. Since this is the only
variant that accesses memory in a potentially unaligned fashion that doesn't
explicitly go byte by byte or use intrinsics that don't require alignment,
we'll enable it only for this function for now. Adding more tests later if
need be should be possible. For everything else not crc, we're relying on
ubsan to hopefully catch things by chance.