Mirror of https://github.com/xenia-project/snappy.git (synced 2026-01-31 01:25:21 +01:00)
2f0aaf8631d8fb2475ca1a6687c181efb14ed286
whether there are 16 bytes free and then checking right afterwards (after subtracting the literal size) that there are now 5 bytes free, just check once for 21 bytes. This skips a compare and a branch. Although the branch is easily predictable, it still costs a few cycles on a fast path, and we would like to get rid of them.

Benchmarking this yields very confusing results. With open-source GCC 4.8.1 on Haswell, we get exactly the expected results: the benchmarks where we hit the fast path for literals (in particular the two HTML benchmarks and the protobuf benchmark) give very nice speedups, and the others are not really affected. However, results with Google's GCC branch on other hardware are much less clear. It seems that we have a slight loss in some cases, and the wins in the “typical” win cases are not nearly as clear. This appears to depend on the microarchitecture and on plain luck in how we run the benchmark. Looking at the generated assembler, removing the if statement seems to cause other large-scale changes in how the function is laid out, which makes it likely that this is just bad luck.

Thus, we should keep this change, even though its exact current impact is unclear. It is a sensible change per se, and dropping it on the basis of microoptimization for a given compiler (or even a given branch of a compiler) would seem like a bad strategy in the long run.
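The following C++ sketch shows the arithmetic behind this change. The function and constant names are hypothetical and do not come from the Snappy source. The sketch also assumes that the fast path only handles literals of at most 16 bytes, which the 16-byte check suggests.

```cpp
#include <cstddef>

// Hypothetical constants, taken from the numbers in the commit message
// rather than from the Snappy source.
constexpr std::size_t kFastCopyBytes = 16;  // bytes the fast path needs up front
constexpr std::size_t kNextTagBytes = 5;    // bytes that must remain afterwards

// Old scheme: check for 16 bytes, then (after subtracting the literal
// length) check that 5 bytes are left. Assumes literal_len <= 16, so the
// subtraction cannot wrap around once avail >= 16.
bool OldFastPathOk(std::size_t avail, std::size_t literal_len) {
  if (avail < kFastCopyBytes) return false;
  return avail - literal_len >= kNextTagBytes;
}

// New scheme: a single compare against 16 + 5 = 21 bytes. The literal
// length is not needed, because at least 21 - 16 = 5 bytes remain after
// consuming any literal of at most 16 bytes.
bool NewFastPathOk(std::size_t avail) {
  return avail >= kFastCopyBytes + kNextTagBytes;
}
```

Under that assumption, NewFastPathOk(avail) implies OldFastPathOk(avail, len) for every literal length, since avail - len >= 21 - 16 = 5. The converse does not hold. With 20 bytes available and a 10-byte literal, the old checks pass but the new one fails. A few borderline cases therefore miss the fast path, in exchange for one fewer compare and branch on every hit.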
Microbenchmark results (all in 64-bit, opt mode):

Nehalem, Google GCC:

Benchmark              Base (ns)  New (ns)                   Improvement
------------------------------------------------------------------------------
BM_UFlat/0                 76747     75591     1.3GB/s  html       +1.5%
BM_UFlat/1                765756    757040   886.3MB/s  urls       +1.2%
BM_UFlat/2                 10867     10893    10.9GB/s  jpg        -0.2%
BM_UFlat/3                   124       131     1.4GB/s  jpg_200    -5.3%
BM_UFlat/4                 31663     31596     2.8GB/s  pdf        +0.2%
BM_UFlat/5                314162    308176     1.2GB/s  html4      +1.9%
BM_UFlat/6                 29668     29746   790.6MB/s  cp         -0.3%
BM_UFlat/7                 12958     13386   796.4MB/s  c          -3.2%
BM_UFlat/8                  3596      3682   966.0MB/s  lsp        -2.3%
BM_UFlat/9               1019193   1033493   953.3MB/s  xls        -1.4%
BM_UFlat/10                  239       247   775.3MB/s  xls_200    -3.2%
BM_UFlat/11               236411    240271   606.9MB/s  txt1       -1.6%
BM_UFlat/12               206639    209768   571.2MB/s  txt2       -1.5%
BM_UFlat/13               627803    635722   641.4MB/s  txt3       -1.2%
BM_UFlat/14               845932    857816   538.2MB/s  txt4       -1.4%
BM_UFlat/15               402107    391670     1.2GB/s  bin        +2.7%
BM_UFlat/16                  283       279   683.6MB/s  bin_200    +1.4%
BM_UFlat/17                46070     46815   781.5MB/s  sum        -1.6%
BM_UFlat/18                 5053      5163   782.0MB/s  man        -2.1%
BM_UFlat/19                79721     76581     1.4GB/s  pb         +4.1%
BM_UFlat/20               251158    252330   697.5MB/s  gaviota    -0.5%
Sum of all benchmarks    4966150   4980396                         -0.3%

Sandy Bridge, Google GCC:

Benchmark              Base (ns)  New (ns)                   Improvement
------------------------------------------------------------------------------
BM_UFlat/0                 42850     42182     2.3GB/s  html       +1.6%
BM_UFlat/1                525660    515816     1.3GB/s  urls       +1.9%
BM_UFlat/2                  7173      7283    16.3GB/s  jpg        -1.5%
BM_UFlat/3                    92        91     2.1GB/s  jpg_200    +1.1%
BM_UFlat/4                 15147     14872     5.9GB/s  pdf        +1.8%
BM_UFlat/5                199936    192116     2.0GB/s  html4      +4.1%
BM_UFlat/6                 12796     12443     1.8GB/s  cp         +2.8%
BM_UFlat/7                  6588      6400     1.6GB/s  c          +2.9%
BM_UFlat/8                  2010      1951     1.8GB/s  lsp        +3.0%
BM_UFlat/9                761124    763049     1.3GB/s  xls        -0.3%
BM_UFlat/10                  186       189  1016.1MB/s  xls_200    -1.6%
BM_UFlat/11               159354    158460   918.6MB/s  txt1       +0.6%
BM_UFlat/12               139732    139950   856.1MB/s  txt2       -0.2%
BM_UFlat/13               429917    425027   961.7MB/s  txt3       +1.2%
BM_UFlat/14               585255    587324   785.8MB/s  txt4       -0.4%
BM_UFlat/15               276186    266173     1.8GB/s  bin        +3.8%
BM_UFlat/16                  205       207   925.5MB/s  bin_200    -1.0%
BM_UFlat/17                24925     24935     1.4GB/s  sum        -0.0%
BM_UFlat/18                 2632      2576     1.5GB/s  man        +2.2%
BM_UFlat/19                40546     39108     2.8GB/s  pb         +3.7%
BM_UFlat/20               175803    168209  1048.9MB/s  gaviota    +4.5%
Sum of all benchmarks    3408117   3368361                         +1.2%

Haswell, upstream GCC 4.8.1:

Benchmark              Base (ns)  New (ns)                   Improvement
------------------------------------------------------------------------------
BM_UFlat/0                 46308     40641     2.3GB/s  html      +13.9%
BM_UFlat/1                513385    514706     1.3GB/s  urls       -0.3%
BM_UFlat/2                  6197      6151    19.2GB/s  jpg        +0.7%
BM_UFlat/3                    61        61     3.0GB/s  jpg_200    +0.0%
BM_UFlat/4                 13551     13429     6.5GB/s  pdf        +0.9%
BM_UFlat/5                198317    190243     2.0GB/s  html4      +4.2%
BM_UFlat/6                 14768     12560     1.8GB/s  cp        +17.6%
BM_UFlat/7                  6453      6447     1.6GB/s  c          +0.1%
BM_UFlat/8                  1991      1980     1.8GB/s  lsp        +0.6%
BM_UFlat/9                766947    770424     1.2GB/s  xls        -0.5%
BM_UFlat/10                  170       169     1.1GB/s  xls_200    +0.6%
BM_UFlat/11               164350    163554   888.7MB/s  txt1       +0.5%
BM_UFlat/12               145444    143830   832.1MB/s  txt2       +1.1%
BM_UFlat/13               437849    438413   929.2MB/s  txt3       -0.1%
BM_UFlat/14               603587    605309   759.8MB/s  txt4       -0.3%
BM_UFlat/15               249799    248067     1.9GB/s  bin        +0.7%
BM_UFlat/16                  191       188  1011.4MB/s  bin_200    +1.6%
BM_UFlat/17                26064     24778     1.4GB/s  sum        +5.2%
BM_UFlat/18                 2620      2601     1.5GB/s  man        +0.7%
BM_UFlat/19                44551     37373     3.0GB/s  pb        +19.2%
BM_UFlat/20               165408    164584     1.0GB/s  gaviota    +0.5%
Sum of all benchmarks    3408011   3385508                         +0.7%

git-svn-id: https://snappy.googlecode.com/svn/trunk@78 03e5f5b5-db94-4691-08a0-1a8bf15f6143
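The Improvement column can be recomputed from the two timing columns. The commit message does not state the formula; it is inferred from the tables. Treating Improvement as the relative speedup Base / New - 1 reproduces the spot-checked rows. For example, it gives +13.9% for Haswell html (46308 vs. 40641 ns) and -0.3% for the Nehalem sum. Below is a minimal C++ sketch with a hypothetical function name.

```cpp
#include <cmath>

// Hypothetical helper (not part of Snappy's benchmark code): recomputes the
// "Improvement" column as the relative speedup Base / New - 1, in percent,
// rounded to one decimal place like the tables. The formula is inferred
// from the published numbers, not documented in the commit message.
double ImprovementPercent(double base_ns, double new_ns) {
  const double pct = (base_ns / new_ns - 1.0) * 100.0;
  return std::round(pct * 10.0) / 10.0;
}
```

A positive value means the new code is faster. Note that this is not the same as the fractional reduction in running time, (Base - New) / Base, which would give only about 12.2% for the Haswell html row.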
Snappy, a fast compressor/decompressor.


Introduction
============

Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. (For more information, see "Performance", below.)

Snappy has the following properties:

 * Fast: Compression speeds at 250 MB/sec and beyond, with no assembler code. See "Performance" below.
 * Stable: Over the last few years, Snappy has compressed and decompressed petabytes of data in Google's production environment. The Snappy bitstream format is stable and will not change between versions.
 * Robust: The Snappy decompressor is designed not to crash in the face of corrupted or malicious input.
 * Free and open source software: Snappy is licensed under a BSD-type license. For more information, see the included COPYING file.

Snappy has previously been called "Zippy" in some Google presentations and the like.


Performance
===========

Snappy is intended to be fast. On a single core of a Core i7 processor in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more. (These numbers are for the slowest inputs in our benchmark suite; others are much faster.) In our tests, Snappy usually is faster than algorithms in the same class (e.g. LZO, LZF, FastLZ, QuickLZ, etc.) while achieving comparable compression ratios.

Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and other already-compressed data. Similar numbers for zlib in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively.
More sophisticated algorithms are capable of achieving yet higher compression ratios, although usually at the expense of speed. Of course, compression ratio will vary significantly with the input.

Although Snappy should be fairly portable, it is primarily optimized for 64-bit x86-compatible processors, and may run slower in other environments. In particular:

 - Snappy uses 64-bit operations in several places to process more data at once than would otherwise be possible.
 - Snappy assumes unaligned 32- and 64-bit loads and stores are cheap. On some platforms, these must be emulated with single-byte loads and stores, which is much slower.
 - Snappy assumes little-endian throughout, and needs to byte-swap data in several places if running on a big-endian platform.

Experience has shown that even heavily tuned code can be improved. Performance optimizations, whether for 64-bit x86 or other platforms, are of course most welcome; see "Contact", below.


Usage
=====

Note that Snappy, both the implementation and the main interface, is written in C++. However, several third-party bindings to other languages are available; see the Google Code page at http://code.google.com/p/snappy/ for more information. Also, if you want to use Snappy from C code, you can use the included C bindings in snappy-c.h.

To use Snappy from your own C++ program, include the file "snappy.h" from your calling file, and link against the compiled library.

There are many ways to call Snappy, but the simplest possible is

  snappy::Compress(input.data(), input.size(), &output);

and similarly

  snappy::Uncompress(input.data(), input.size(), &output);

where "input" and "output" are both instances of std::string.

There are other interfaces that are more flexible in various ways, including support for custom (non-array) input sources. See the header file for more information.


Tests and benchmarks
====================

When you compile Snappy, snappy_unittest is compiled in addition to the library itself.
You do not need it to use the compressor from your own library, but it contains several useful components for Snappy development.

First of all, it contains unit tests, verifying correctness on your machine in various scenarios. If you want to change or optimize Snappy, please run the tests to verify you have not broken anything. Note that if you have the Google Test library installed, unit test behavior (especially failures) will be significantly more user-friendly. You can find Google Test at

  http://code.google.com/p/googletest/

You probably also want the gflags library for handling of command-line flags; you can find it at

  http://code.google.com/p/google-gflags/

In addition to the unit tests, snappy_unittest contains microbenchmarks used to tune compression and decompression performance. These are automatically run before the unit tests, but you can disable them using the flag --run_microbenchmarks=false if you have gflags installed (otherwise you will need to edit the source).

Finally, snappy_unittest can benchmark Snappy against a few other compression libraries (zlib, LZO, LZF, FastLZ and QuickLZ), if they were detected at configure time. To benchmark using a given file, give the compression algorithm you want to test Snappy against (e.g. --zlib) and then a list of one or more file names on the command line. The testdata/ directory contains the files used by the microbenchmark, which should provide a reasonably balanced starting point for benchmarking. (Note that baddata[1-3].snappy are not intended as benchmarks; they are used to verify correctness in the presence of corrupted data in the unit test.)


Contact
=======

Snappy is distributed through Google Code. For the latest version, a bug tracker, and other information, see

  http://code.google.com/p/snappy/
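The portability notes under "Performance" say that Snappy assumes cheap unaligned 32- and 64-bit loads and little-endian byte order. The following is a minimal sketch of one common way to write such a load portably in C++. It uses memcpy for the unaligned access and an explicit byte swap on big-endian hosts. The helper names are hypothetical and are not Snappy's own functions, and only the 32-bit case is shown.

```cpp
#include <cstdint>
#include <cstring>

// memcpy is the portable way to express an unaligned load. With
// optimization enabled, compilers for x86-64 typically emit a single load;
// on strict-alignment targets it may become several byte loads.
inline uint32_t LoadUnaligned32(const void* p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Returns true when the host stores the least significant byte first.
inline bool HostIsLittleEndian() {
  const uint32_t probe = 1;
  unsigned char first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Reverses the byte order of a 32-bit value.
inline uint32_t ByteSwap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// Reads a little-endian 32-bit value from a possibly unaligned address;
// the swap is only needed on big-endian hosts.
inline uint32_t LoadLittleEndian32(const void* p) {
  const uint32_t v = LoadUnaligned32(p);
  return HostIsLittleEndian() ? v : ByteSwap32(v);
}
```

On a little-endian host, LoadLittleEndian32 reduces to the plain memcpy load, which is the cheap case Snappy is tuned for. The extra swap is the per-load cost the README refers to for big-endian platforms.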
Description: ⚠️ ARCHIVED: Original GitHub repository no longer exists. Preserved as backup on 2026-01-31T05:36:49.290Z
Languages: C++ 93.1%, M4 3.5%, C 2.7%, Makefile 0.6%