In the fast path for decompressing literals, instead of checking
whether there are 16 bytes free and then checking again right afterwards
(after subtracting the literal size) that there are now
5 bytes free, just check once for 21 bytes. This skips a compare
and a branch; although the branch is easily predictable, it still costs
a few cycles on a fast path that we would like to get rid of.
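The two checks and the merged one can be sketched as follows (a toy model
with illustrative names, not Snappy's actual internals; the single 21-byte
check is sufficient, though slightly conservative, for the fast path's
literals of up to 16 bytes):

```cpp
#include <cassert>
#include <cstddef>

// Before: two compares and branches per literal. space_left is the number
// of bytes remaining in the output buffer; literal_len is the literal size.
bool fast_path_before(size_t space_left, size_t literal_len) {
  if (space_left < 16) return false;  // first compare + branch
  size_t remaining = space_left - literal_len;
  if (space_left < literal_len || remaining < 5) return false;  // second
  return true;
}

// After: one check for 16 + 5 = 21 bytes. For literal_len <= 16 this
// implies both earlier conditions (space >= 16, and at least 5 bytes left
// after the copy), at the cost of occasionally taking the slow path for
// borderline buffers that the two-check version would have accepted.
bool fast_path_after(size_t space_left, size_t literal_len) {
  return space_left >= 21 && literal_len <= 16;
}
```

The merged check is a sufficient (not necessary) condition, so it can only
send borderline cases to the slow path, never accept an unsafe copy.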

Benchmarking this yields very confusing results. On open-source
GCC 4.8.1 on Haswell, we get exactly the expected results; the
benchmarks where we hit the fast path for literals (in particular
the two HTML benchmarks and the protobuf benchmark) give very nice
speedups, and the others are not really affected.

However, benchmarking with Google's GCC branch on other hardware
is much less clear-cut. It seems that we take a slight loss in some cases
(and the wins for the “typical” cases are not nearly as pronounced),
but that it depends on microarchitecture and plain luck in how we run
the benchmark. Looking at the generated assembler, it seems that
removing the if causes other large-scale changes in how the
function is laid out, which makes it likely that this is just bad luck.

Thus, we should keep this change, even though its exact current impact is
unclear; it's a sensible change per se, and dropping it on the basis of
microoptimization for a given compiler (or even branch of a compiler)
would seem like a bad strategy in the long run.

Microbenchmark results (all in 64-bit, opt mode):

  Nehalem, Google GCC:

  Benchmark                Base (ns)  New (ns)                       Improvement
  ------------------------------------------------------------------------------
  BM_UFlat/0                   76747     75591  1.3GB/s  html           +1.5%
  BM_UFlat/1                  765756    757040  886.3MB/s  urls         +1.2%
  BM_UFlat/2                   10867     10893  10.9GB/s  jpg           -0.2%
  BM_UFlat/3                     124       131  1.4GB/s  jpg_200        -5.3%
  BM_UFlat/4                   31663     31596  2.8GB/s  pdf            +0.2%
  BM_UFlat/5                  314162    308176  1.2GB/s  html4          +1.9%
  BM_UFlat/6                   29668     29746  790.6MB/s  cp           -0.3%
  BM_UFlat/7                   12958     13386  796.4MB/s  c            -3.2%
  BM_UFlat/8                    3596      3682  966.0MB/s  lsp          -2.3%
  BM_UFlat/9                 1019193   1033493  953.3MB/s  xls          -1.4%
  BM_UFlat/10                    239       247  775.3MB/s  xls_200      -3.2%
  BM_UFlat/11                 236411    240271  606.9MB/s  txt1         -1.6%
  BM_UFlat/12                 206639    209768  571.2MB/s  txt2         -1.5%
  BM_UFlat/13                 627803    635722  641.4MB/s  txt3         -1.2%
  BM_UFlat/14                 845932    857816  538.2MB/s  txt4         -1.4%
  BM_UFlat/15                 402107    391670  1.2GB/s  bin            +2.7%
  BM_UFlat/16                    283       279  683.6MB/s  bin_200      +1.4%
  BM_UFlat/17                  46070     46815  781.5MB/s  sum          -1.6%
  BM_UFlat/18                   5053      5163  782.0MB/s  man          -2.1%
  BM_UFlat/19                  79721     76581  1.4GB/s  pb             +4.1%
  BM_UFlat/20                 251158    252330  697.5MB/s  gaviota      -0.5%
  Sum of all benchmarks      4966150   4980396                          -0.3%


  Sandy Bridge, Google GCC:
  
  Benchmark                Base (ns)  New (ns)                       Improvement
  ------------------------------------------------------------------------------
  BM_UFlat/0                   42850     42182  2.3GB/s  html           +1.6%
  BM_UFlat/1                  525660    515816  1.3GB/s  urls           +1.9%
  BM_UFlat/2                    7173      7283  16.3GB/s  jpg           -1.5%
  BM_UFlat/3                      92        91  2.1GB/s  jpg_200        +1.1%
  BM_UFlat/4                   15147     14872  5.9GB/s  pdf            +1.8%
  BM_UFlat/5                  199936    192116  2.0GB/s  html4          +4.1%
  BM_UFlat/6                   12796     12443  1.8GB/s  cp             +2.8%
  BM_UFlat/7                    6588      6400  1.6GB/s  c              +2.9%
  BM_UFlat/8                    2010      1951  1.8GB/s  lsp            +3.0%
  BM_UFlat/9                  761124    763049  1.3GB/s  xls            -0.3%
  BM_UFlat/10                    186       189  1016.1MB/s  xls_200     -1.6%
  BM_UFlat/11                 159354    158460  918.6MB/s  txt1         +0.6%
  BM_UFlat/12                 139732    139950  856.1MB/s  txt2         -0.2%
  BM_UFlat/13                 429917    425027  961.7MB/s  txt3         +1.2%
  BM_UFlat/14                 585255    587324  785.8MB/s  txt4         -0.4%
  BM_UFlat/15                 276186    266173  1.8GB/s  bin            +3.8%
  BM_UFlat/16                    205       207  925.5MB/s  bin_200      -1.0%
  BM_UFlat/17                  24925     24935  1.4GB/s  sum            -0.0%
  BM_UFlat/18                   2632      2576  1.5GB/s  man            +2.2%
  BM_UFlat/19                  40546     39108  2.8GB/s  pb             +3.7%
  BM_UFlat/20                 175803    168209  1048.9MB/s  gaviota     +4.5%
  Sum of all benchmarks      3408117   3368361                          +1.2%


  Haswell, upstream GCC 4.8.1:

  Benchmark                Base (ns)  New (ns)                       Improvement
  ------------------------------------------------------------------------------
  BM_UFlat/0                   46308     40641  2.3GB/s  html          +13.9%
  BM_UFlat/1                  513385    514706  1.3GB/s  urls           -0.3%
  BM_UFlat/2                    6197      6151  19.2GB/s  jpg           +0.7%
  BM_UFlat/3                      61        61  3.0GB/s  jpg_200        +0.0%
  BM_UFlat/4                   13551     13429  6.5GB/s  pdf            +0.9%
  BM_UFlat/5                  198317    190243  2.0GB/s  html4          +4.2%
  BM_UFlat/6                   14768     12560  1.8GB/s  cp            +17.6%
  BM_UFlat/7                    6453      6447  1.6GB/s  c              +0.1%
  BM_UFlat/8                    1991      1980  1.8GB/s  lsp            +0.6%
  BM_UFlat/9                  766947    770424  1.2GB/s  xls            -0.5%
  BM_UFlat/10                    170       169  1.1GB/s  xls_200        +0.6%
  BM_UFlat/11                 164350    163554  888.7MB/s  txt1         +0.5%
  BM_UFlat/12                 145444    143830  832.1MB/s  txt2         +1.1%
  BM_UFlat/13                 437849    438413  929.2MB/s  txt3         -0.1%
  BM_UFlat/14                 603587    605309  759.8MB/s  txt4         -0.3%
  BM_UFlat/15                 249799    248067  1.9GB/s  bin            +0.7%
  BM_UFlat/16                    191       188  1011.4MB/s  bin_200     +1.6%
  BM_UFlat/17                  26064     24778  1.4GB/s  sum            +5.2%
  BM_UFlat/18                   2620      2601  1.5GB/s  man            +0.7%
  BM_UFlat/19                  44551     37373  3.0GB/s  pb            +19.2%
  BM_UFlat/20                 165408    164584  1.0GB/s  gaviota        +0.5%
  Sum of all benchmarks      3408011   3385508                          +0.7%


git-svn-id: https://snappy.googlecode.com/svn/trunk@78 03e5f5b5-db94-4691-08a0-1a8bf15f6143
2013-06-30 19:24:03 +00:00
2013-02-05 14:36:15 +00:00

Snappy, a fast compressor/decompressor.


Introduction
============

Snappy is a compression/decompression library. It does not aim for maximum
compression, or compatibility with any other compression library; instead,
it aims for very high speeds and reasonable compression. For instance,
compared to the fastest mode of zlib, Snappy is an order of magnitude faster
for most inputs, but the resulting compressed files are anywhere from 20% to
100% bigger. (For more information, see "Performance", below.)

Snappy has the following properties:

 * Fast: Compression speeds at 250 MB/sec and beyond, with no assembler code.
   See "Performance" below.
 * Stable: Over the last few years, Snappy has compressed and decompressed
   petabytes of data in Google's production environment. The Snappy bitstream
   format is stable and will not change between versions.
 * Robust: The Snappy decompressor is designed not to crash in the face of
   corrupted or malicious input.
 * Free and open source software: Snappy is licensed under a BSD-type license.
   For more information, see the included COPYING file.

Snappy has previously been called "Zippy" in some Google presentations
and the like.


Performance
===========
 
Snappy is intended to be fast. On a single core of a Core i7 processor
in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at
about 500 MB/sec or more. (These numbers are for the slowest inputs in our
benchmark suite; others are much faster.) In our tests, Snappy usually
is faster than algorithms in the same class (e.g. LZO, LZF, FastLZ, QuickLZ,
etc.) while achieving comparable compression ratios.

Typical compression ratios (based on the benchmark suite) are about 1.5-1.7x
for plain text, about 2-4x for HTML, and of course 1.0x for JPEGs, PNGs and
other already-compressed data. Similar numbers for zlib in its fastest mode
are 2.6-2.8x, 3-7x and 1.0x, respectively. More sophisticated algorithms are
capable of achieving yet higher compression rates, although usually at the
expense of speed. Of course, compression ratio will vary significantly with
the input.

Although Snappy should be fairly portable, it is primarily optimized
for 64-bit x86-compatible processors, and may run slower in other environments.
In particular:

 - Snappy uses 64-bit operations in several places to process more data at
   once than would otherwise be possible.
 - Snappy assumes unaligned 32- and 64-bit loads and stores are cheap.
   On some platforms, these must be emulated with single-byte loads 
   and stores, which is much slower.
 - Snappy assumes little-endian throughout, and needs to byte-swap data in
   several places if running on a big-endian platform.
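As an illustration of the kind of portability shim this implies (a sketch
with a hypothetical helper name, not Snappy's actual code), an unaligned
little-endian 32-bit load can be written so that it is both
alignment-safe and endian-independent:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: portable unaligned little-endian 32-bit load.
// memcpy avoids alignment faults on strict-alignment platforms, and
// assembling the value byte-by-byte makes the result independent of the
// host's byte order.
static uint32_t LoadLE32(const void* p) {
  uint8_t b[4];
  std::memcpy(b, p, 4);  // safe even if p is unaligned
  return static_cast<uint32_t>(b[0]) |
         (static_cast<uint32_t>(b[1]) << 8) |
         (static_cast<uint32_t>(b[2]) << 16) |
         (static_cast<uint32_t>(b[3]) << 24);
}
```

On little-endian x86 a good compiler reduces this to a single load; on
big-endian targets the shifts perform the byte swap the text describes.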

Experience has shown that even heavily tuned code can be improved.
Performance optimizations, whether for 64-bit x86 or other platforms,
are of course most welcome; see "Contact", below.


Usage
=====

Note that Snappy, both the implementation and the main interface,
is written in C++. However, several third-party bindings to other languages
are available; see the Google Code page at http://code.google.com/p/snappy/
for more information. Also, if you want to use Snappy from C code, you can
use the included C bindings in snappy-c.h.

To use Snappy from your own C++ program, include the file "snappy.h" from
your calling file, and link against the compiled library.

There are many ways to call Snappy, but the simplest possible is

  snappy::Compress(input.data(), input.size(), &output);

and similarly

  snappy::Uncompress(input.data(), input.size(), &output);

where "input" and "output" are both instances of std::string.

There are other interfaces that are more flexible in various ways, including
support for custom (non-array) input sources. See the header file for more
information.


Tests and benchmarks
====================

When you compile Snappy, snappy_unittest is compiled in addition to the
library itself. You do not need it to use the compressor from your own library,
but it contains several useful components for Snappy development.

First of all, it contains unit tests, verifying correctness on your machine in
various scenarios. If you want to change or optimize Snappy, please run the
tests to verify you have not broken anything. Note that if you have the
Google Test library installed, unit test behavior (especially failures) will be
significantly more user-friendly. You can find Google Test at

  http://code.google.com/p/googletest/

You probably also want the gflags library for handling of command-line flags;
you can find it at

  http://code.google.com/p/google-gflags/

In addition to the unit tests, snappy contains microbenchmarks used to
tune compression and decompression performance. These are automatically run
before the unit tests, but you can disable them using the flag
--run_microbenchmarks=false if you have gflags installed (otherwise you will
need to edit the source).

Finally, snappy can benchmark Snappy against a few other compression libraries
(zlib, LZO, LZF, FastLZ and QuickLZ), if they were detected at configure time.
To benchmark using a given file, give the compression algorithm you want to test
Snappy against (e.g. --zlib) and then a list of one or more file names on the
command line. The testdata/ directory contains the files used by the
microbenchmark, which should provide a reasonably balanced starting point for
benchmarking. (Note that baddata[1-3].snappy are not intended as benchmarks; they
are used to verify correctness in the presence of corrupted data in the unit
test.)


Contact
=======

Snappy is distributed through Google Code. For the latest version, a bug tracker,
and other information, see

  http://code.google.com/p/snappy/