mirror of
https://github.com/RPCS3/llvm.git
synced 2024-11-27 21:50:29 +00:00
[libFuzzer] Improve documentation
Reviewers: kcc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19585
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267892 91177308-0d34-0410-b5e6-96231b3b80d8
This commit is contained in:
parent
12db936b00
commit
361c970533
@ -1,6 +1,6 @@
|
||||
========================================================
|
||||
LibFuzzer -- a library for coverage-guided fuzz testing.
|
||||
========================================================
|
||||
=======================================================
|
||||
libFuzzer – a library for coverage-guided fuzz testing.
|
||||
=======================================================
|
||||
.. contents::
|
||||
:local:
|
||||
:depth: 1
|
||||
@ -8,68 +8,33 @@ LibFuzzer -- a library for coverage-guided fuzz testing.
|
||||
Introduction
|
||||
============
|
||||
|
||||
libFuzzer -- library for in-process evolutionary fuzzing of other libraries.
|
||||
LibFuzzer is a library for in-process, coverage-guided, evolutionary fuzzing
|
||||
of other libraries.
|
||||
|
||||
The typical workflow looks like the following.
|
||||
First, implement a fuzzing target function, like this::
|
||||
LibFuzzer is similar in concept to American Fuzzy Lop (AFL_), but it performs
|
||||
all of its fuzzing inside a single process. This in-process fuzzing can be more
|
||||
restrictive and fragile, but is potentially much faster as there is no overhead
|
||||
for process start-up.
|
||||
|
||||
// fuzz_target.cc
|
||||
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
|
||||
DoSomethingInterestingWithMyAPI(Data, Size);
|
||||
return 0; // Non-zero return values are reserved for future use.
|
||||
}
|
||||
|
||||
Next, build the Fuzzer library as a static archive. Note that libFuzzer contains the `main()` function::
|
||||
|
||||
svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
|
||||
# Alternative: get libFuzzer from a dedicated git mirror:
|
||||
# git clone https://chromium.googlesource.com/chromium/llvm-project/llvm/lib/Fuzzer
|
||||
clang++ -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
|
||||
ar ruv libFuzzer.a Fuzzer*.o
|
||||
|
||||
Then build the target function and the library you are going to test.
|
||||
You should use SanitizerCoverage_ and one of ASan, MSan, or UBSan.
|
||||
Link it with `libFuzzer.a`::
|
||||
|
||||
clang -fsanitize-coverage=edge -fsanitize=address your_lib.cc fuzz_target.cc libFuzzer.a -o my_fuzzer
|
||||
|
||||
Create a directory with the initial "seed" samples.
For some input types libFuzzer will work just fine without any seeds,
but for complex inputs this step is very important::
|
||||
|
||||
mkdir CORPUS_DIR
|
||||
cp /some/input/samples/* CORPUS_DIR
|
||||
|
||||
Finally, run the fuzzer on the `CORPUS_DIR`::
|
||||
|
||||
./my_fuzzer CORPUS_DIR # -max_len=1000 -jobs=20 -more_flags=...
|
||||
The fuzzer is linked with the library under test, and feeds fuzzed inputs to the
|
||||
library via a specific fuzzing entrypoint (aka "target function"); the fuzzer
|
||||
then tracks which areas of the code are reached, and generates mutations on the
|
||||
corpus of input data in order to maximize the code coverage. The code coverage
|
||||
information for libFuzzer is provided by LLVM's SanitizerCoverage_
|
||||
instrumentation.
|
||||
|
||||
|
||||
As new interesting test cases are discovered they will be added to the corpus.
|
||||
If a bug is discovered by the sanitizer (ASan, etc) it will be reported as usual and the reproducer
|
||||
will be written to disk.
|
||||
Each Fuzzer process is single-threaded (unless the library starts its own
threads). You can run libFuzzer on the same corpus in multiple processes
in parallel (use the flags `-jobs=N` and `-workers=N`).
|
||||
Versions
|
||||
========
|
||||
|
||||
libFuzzer is similar in concept to AFL_,
|
||||
but uses in-process Fuzzing, which is more fragile and restrictive, but
|
||||
potentially much faster as it has no overhead for process start-up.
|
||||
It uses LLVM's SanitizerCoverage_ instrumentation to get in-process
|
||||
coverage-feedback
|
||||
LibFuzzer is under active development so a current (or at least very recent)
|
||||
version of Clang is the only supported variant.
|
||||
|
||||
The code resides in the LLVM repository,
|
||||
requires the fresh Clang compiler to build
|
||||
and is used to fuzz various parts of LLVM,
|
||||
but the Fuzzer itself does not (and should not) depend on any
|
||||
part of LLVM and can be used for other projects w/o requiring the rest of LLVM.
|
||||
(If `building Clang from trunk`_ is too time-consuming or difficult, then
|
||||
the Clang binaries that the Chromium developers build are likely to be
|
||||
fairly recent:
|
||||
|
||||
Fresh Clang
|
||||
-----------
|
||||
|
||||
If you don't know where to get the fresh Clang binaries and don't want to build
|
||||
it from trunk (why wouldn't you?) you may grab the fresh Clang binaries
|
||||
maintained by the Chromium developers::
|
||||
.. code-block:: console
|
||||
|
||||
mkdir TMP_CLANG
|
||||
cd TMP_CLANG
|
||||
@ -77,42 +42,291 @@ maintained by the Chromium developers::
|
||||
cd ..
|
||||
TMP_CLANG/clang/scripts/update.py
|
||||
|
||||
This will install reasonably fresh and well-tested Clang binaries as
`third_party/llvm-build/Release+Asserts/bin/clang`
|
||||
This installs the Clang binary as
|
||||
``./third_party/llvm-build/Release+Asserts/bin/clang``)
|
||||
|
||||
Usage
|
||||
=====
|
||||
To run fuzzing, pass zero or more directories. New samples will be written into `dir1`; the other directories are read once during startup::
|
||||
The libFuzzer code resides in the LLVM repository, and requires a recent Clang
|
||||
compiler to build (and is used to `fuzz various parts of LLVM itself`_).
|
||||
However the fuzzer itself does not (and should not) depend on any part of LLVM
|
||||
infrastructure and can be used for other projects without requiring the rest
|
||||
of LLVM.
|
||||
|
||||
./fuzzer [-flag1=val1 [-flag2=val2 ...] ] [dir1 [dir2 ...] ]
|
||||
|
||||
To run individual tests without fuzzing, pass one or more files::
|
||||
Corpus
|
||||
======
|
||||
|
||||
./fuzzer [-flag1=val1 [-flag2=val2 ...] ] file1 [file2 ...]
|
||||
Coverage-guided fuzzers like libFuzzer rely on a corpus of sample inputs for the
|
||||
code under test. This corpus should ideally be seeded with a varied collection
|
||||
of valid and invalid inputs for the code under test; for example, for a graphics
|
||||
library the initial corpus might hold a variety of different small PNG/JPG/GIF
|
||||
files. The fuzzer generates random mutations based around the sample inputs in
|
||||
the current corpus. If a mutation triggers execution of a previously-uncovered
|
||||
path in the code under test, then that mutation is saved to the corpus for
|
||||
future variations.
|
||||
|
||||
The most important flags are::
|
||||
LibFuzzer will work fine without any initial seeds, but will be less
|
||||
efficient. In particular, if the library under test accepts complex,
|
||||
structured inputs then starting from a varied corpus is very important.
|
||||
|
||||
seed 0 Random seed. If 0, seed is generated.
|
||||
runs -1 Number of individual test runs (-1 for infinite runs).
|
||||
max_len 0 Maximum length of the test input. If 0, libFuzzer tries to guess a good value based on the corpus and reports it.
|
||||
timeout 1200 Timeout in seconds (if positive). If one unit runs more than this number of seconds the process will abort.
|
||||
timeout_exitcode 77 Unless abort_on_timeout is set, use this exitcode on timeout.
|
||||
max_total_time 0 If positive, indicates the maximal total time in seconds to run the fuzzer.
|
||||
help 0 Print help.
|
||||
merge 0 If 1, the 2-nd, 3-rd, etc corpora will be merged into the 1-st corpus. Only interesting units will be taken.
|
||||
jobs 0 Number of jobs to run. If jobs >= 1 we spawn this number of jobs in separate worker processes with stdout/stderr redirected to fuzz-JOB.log.
|
||||
workers 0 Number of simultaneous worker processes to run the jobs. If zero, "min(jobs,NumberOfCpuCores()/2)" is used.
|
||||
use_traces 0 Experimental: use instruction traces
|
||||
only_ascii 0 If 1, generate only ASCII (isprint+isspace) inputs.
|
||||
artifact_prefix "" Write fuzzing artifacts (crash, timeout, or slow inputs) as $(artifact_prefix)file
|
||||
exact_artifact_path "" Write the single artifact on failure (crash, timeout) as $(exact_artifact_path). This overrides -artifact_prefix and will not use checksum in the file name. Do not use the same path for several parallel processes.
|
||||
print_final_stats 0 If 1, print statistics at exit.
|
||||
close_fd_mask 0 If 1, close stdout at startup; if 2, close stderr; if 3, close both. Be careful, this will also close e.g. asan's stderr/stdout.
|
||||
The corpus can also act as a sanity/regression check, to confirm that the
|
||||
fuzzing entrypoint still works and that all of the sample inputs run through
|
||||
the code under test without problems.
|
||||
|
||||
|
||||
Getting Started
|
||||
===============
|
||||
|
||||
.. contents::
|
||||
:local:
|
||||
:depth: 1
|
||||
|
||||
Building
|
||||
--------
|
||||
|
||||
The first step for using libFuzzer on a library is to implement a fuzzing
|
||||
target function that accepts a sequence of bytes, like this:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
// fuzz_target.cc
|
||||
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
|
||||
DoSomethingInterestingWithMyAPI(Data, Size);
|
||||
return 0; // Non-zero return values are reserved for future use.
|
||||
}
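
As a concrete illustration, the sketch below wraps a small parsing function in a target function. ``ParseKeyValue`` is a hypothetical API invented for this example (it stands in for ``DoSomethingInterestingWithMyAPI`` and is not part of libFuzzer); only the ``LLVMFuzzerTestOneInput`` signature comes from the text above.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string>

// Hypothetical API under test (not part of libFuzzer): splits "key=value"
// and returns true only if both the key and the value are non-empty.
bool ParseKeyValue(const std::string &Input, std::string *Key,
                   std::string *Value) {
  size_t Eq = Input.find('=');
  if (Eq == std::string::npos || Eq == 0 || Eq + 1 == Input.size())
    return false;
  *Key = Input.substr(0, Eq);
  *Value = Input.substr(Eq + 1);
  return true;
}

// The fuzzing entrypoint: hand the raw bytes to the API under test.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  std::string Input(reinterpret_cast<const char *>(Data), Size);
  std::string Key, Value;
  ParseKeyValue(Input, &Key, &Value);  // Result is ignored; we only look for crashes.
  return 0;  // Non-zero return values are reserved for future use.
}
```

The target function deliberately ignores the parse result: the fuzzer is looking for crashes and sanitizer reports, not for inputs that parse successfully.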
|
||||
|
||||
Next, build the libFuzzer library as a static archive, without any sanitizer
|
||||
options. Note that the libFuzzer library contains the ``main()`` function:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
|
||||
# Alternative: get libFuzzer from a dedicated git mirror:
|
||||
# git clone https://chromium.googlesource.com/chromium/llvm-project/llvm/lib/Fuzzer
|
||||
clang++ -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
|
||||
ar ruv libFuzzer.a Fuzzer*.o
|
||||
|
||||
Then build the fuzzing target function and the library under test using
|
||||
the SanitizerCoverage_ option, which instruments the code so that the fuzzer
|
||||
can retrieve code coverage information (to guide the fuzzing). Linking with
|
||||
the libFuzzer code then gives a fuzzer executable.
|
||||
|
||||
You should also enable one or more of the *sanitizers*, which help to expose
|
||||
latent bugs by making incorrect behavior generate errors at runtime:
|
||||
|
||||
- AddressSanitizer_ detects memory access errors.
|
||||
- MemorySanitizer_ detects uninitialized reads: code whose behavior relies on memory
|
||||
contents that have not been initialized to a specific value.
|
||||
- UndefinedBehaviorSanitizer_ detects the use of various features of C/C++ that are explicitly
|
||||
listed as resulting in undefined behavior.
|
||||
|
||||
Finally, link with ``libFuzzer.a``::
|
||||
|
||||
clang -fsanitize-coverage=edge -fsanitize=address your_lib.cc fuzz_target.cc libFuzzer.a -o my_fuzzer
|
||||
|
||||
Running
|
||||
-------
|
||||
|
||||
To run the fuzzer, first create a Corpus_ directory that holds the
|
||||
initial "seed" sample inputs:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
mkdir CORPUS_DIR
|
||||
cp /some/input/samples/* CORPUS_DIR
|
||||
|
||||
Then run the fuzzer on the corpus directory:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
./my_fuzzer CORPUS_DIR # -max_len=1000 -jobs=20 ...
|
||||
|
||||
As the fuzzer discovers new interesting test cases (i.e. test cases that
|
||||
trigger coverage of new paths through the code under test), those test cases
|
||||
will be added to the corpus directory.
|
||||
|
||||
By default, the fuzzing process will continue indefinitely – at least until
|
||||
a bug is found. Any crashes or sanitizer failures will be reported as usual,
|
||||
stopping the fuzzing process, and the particular input that triggered the bug
|
||||
will be written to disk (typically as ``crash-<sha1>`` or ``timeout-<sha1>``).
|
||||
|
||||
|
||||
Parallel Fuzzing
|
||||
----------------
|
||||
|
||||
Each libFuzzer process is single-threaded, unless the library under test starts
|
||||
its own threads. However, it is possible to run multiple libFuzzer processes in
|
||||
parallel with a shared corpus directory; this has the advantage that any new
|
||||
inputs found by one fuzzer process will be available to the other fuzzer
|
||||
processes (unless you disable this with the ``-reload=0`` option).
|
||||
|
||||
This is primarily controlled by the ``-jobs=N`` option, which indicates
that `N` fuzzing jobs should be run to completion (i.e. until a bug is found or
|
||||
time/iteration limits are reached). These jobs will be run across a set of
|
||||
worker processes, by default using half of the available CPU cores; the count of
|
||||
worker processes can be overridden by the ``-workers=N`` option. For example,
|
||||
running with ``-jobs=30`` on a 12-core machine would run 6 workers by default,
with each worker running an average of 5 jobs (and so finding up to 5 bugs) by
completion of the entire process.
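
The arithmetic in this example can be checked with a short sketch. The function names below are hypothetical and this is not libFuzzer's source; it only restates the documented default of ``min(jobs, NumberOfCpuCores()/2)``.

```cpp
#include <algorithm>

// Default worker count as described in the text: min(jobs, cores / 2).
// Illustrative only; this is not libFuzzer's own code.
unsigned DefaultWorkers(unsigned Jobs, unsigned Cores) {
  return std::min(Jobs, Cores / 2);
}

// Average number of jobs each worker runs. Workers must be non-zero.
double JobsPerWorker(unsigned Jobs, unsigned Workers) {
  return static_cast<double>(Jobs) / Workers;
}
```

With 30 jobs on 12 cores, ``DefaultWorkers`` gives 6 and ``JobsPerWorker`` gives 5.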
|
||||
|
||||
|
||||
Options
|
||||
=======
|
||||
|
||||
To run the fuzzer, pass zero or more corpus directories as command line
|
||||
arguments. The fuzzer will read test inputs from each of these corpus
|
||||
directories, and any new test inputs that are generated will be written
|
||||
back to the first corpus directory:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
./fuzzer [-flag1=val1 [-flag2=val2 ...] ] [dir1 [dir2 ...] ]
|
||||
|
||||
If a list of files (rather than directories) is passed to the fuzzer program,
|
||||
then it will re-run those files as test inputs but will not perform any fuzzing.
|
||||
In this mode the fuzzer binary can be used as a regression test (e.g. on a
|
||||
continuous integration system) to check the target function and saved inputs
|
||||
still work.
|
||||
|
||||
The most important command line options are:
|
||||
|
||||
``-help``
|
||||
Print help message.
|
||||
``-seed``
|
||||
Random seed. If 0 (the default), the seed is generated.
|
||||
``-runs``
|
||||
Number of individual test runs, -1 (the default) to run indefinitely.
|
||||
``-max_len``
|
||||
Maximum length of a test input. If 0 (the default), libFuzzer tries to guess
|
||||
a good value based on the corpus (and reports it).
|
||||
``-timeout``
|
||||
Timeout in seconds, default 1200. If an input takes longer than this timeout,
|
||||
the process is treated as a failure case.
|
||||
``-timeout_exitcode``
|
||||
Exit code (default 77) to emit when terminating due to timeout, when
|
||||
``-abort_on_timeout`` is not set.
|
||||
``-max_total_time``
|
||||
If positive, indicates the maximum total time in seconds to run the fuzzer.
|
||||
If 0 (the default), run indefinitely.
|
||||
``-merge``
|
||||
If set to 1, any corpus inputs from the 2nd, 3rd etc. corpus directories
|
||||
that trigger new code coverage will be merged into the first corpus
|
||||
directory. Defaults to 0.
|
||||
``-reload``
|
||||
If set to 1 (the default), the corpus directory is re-read periodically to
|
||||
check for new inputs; this allows detection of new inputs that were discovered
|
||||
by other fuzzing processes.
|
||||
``-jobs``
|
||||
Number of fuzzing jobs to run to completion. Default value is 0, which runs a
|
||||
single fuzzing process until completion. If the value is >= 1, then this
|
||||
number of jobs performing fuzzing are run, in a collection of parallel
|
||||
separate worker processes; each such worker process has its
|
||||
``stdout``/``stderr`` redirected to ``fuzz-<JOB>.log``.
|
||||
``-workers``
|
||||
Number of simultaneous worker processes to run the fuzzing jobs to completion
|
||||
in. If 0 (the default), ``min(jobs, NumberOfCpuCores()/2)`` is used.
|
||||
``-dict``
|
||||
Provide a dictionary of input keywords; see Dictionaries_.
|
||||
``-use_counters``
|
||||
Use `coverage counters`_ to generate approximate counts of how often code
|
||||
blocks are hit; defaults to 1.
|
||||
``-use_traces``
|
||||
Use instruction traces (experimental, defaults to 0); see `Data-flow-guided fuzzing`_.
|
||||
``-only_ascii``
|
||||
If 1, generate only ASCII (``isprint``+``isspace``) inputs. Defaults to 0.
|
||||
``-artifact_prefix``
|
||||
Provide a prefix to use when saving fuzzing artifacts (crash, timeout, or
|
||||
slow inputs) as ``$(artifact_prefix)file``. Defaults to empty.
|
||||
``-exact_artifact_path``
|
||||
Ignored if empty (the default). If non-empty, write the single artifact on
|
||||
failure (crash, timeout) as ``$(exact_artifact_path)``. This overrides
|
||||
``-artifact_prefix`` and will not use checksum in the file name. Do not use
|
||||
the same path for several parallel processes.
|
||||
``-print_final_stats``
|
||||
If 1, print statistics at exit. Defaults to 0.
|
||||
``-close_fd_mask``
|
||||
Indicate output streams to close at startup. Be careful, this will also
|
||||
remove diagnostic output from the tools in use; for example the messages
|
||||
AddressSanitizer_ sends to ``stderr``/``stdout`` will also be lost.
|
||||
|
||||
- 0 (default): close neither ``stdout`` nor ``stderr``
|
||||
- 1 : close ``stdout``
|
||||
- 2 : close ``stderr``
|
||||
- 3 : close both ``stdout`` and ``stderr``.
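
The values listed for ``-close_fd_mask`` behave like a two-bit mask. The sketch below decodes such a mask; ``ClosedStreams`` and ``DecodeCloseFdMask`` are hypothetical names used for illustration and are not taken from libFuzzer.

```cpp
// Which streams a given -close_fd_mask value closes, per the list above.
struct ClosedStreams {
  bool Stdout;
  bool Stderr;
};

// Bit 0 (value 1) selects stdout, bit 1 (value 2) selects stderr.
// Illustrative only; this is not libFuzzer's own code.
ClosedStreams DecodeCloseFdMask(int Mask) {
  return ClosedStreams{(Mask & 1) != 0, (Mask & 2) != 0};
}
```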
|
||||
|
||||
For the full list of flags run the fuzzer binary with ``-help=1``.
|
||||
|
||||
Usage examples
|
||||
==============
|
||||
Output
|
||||
======
|
||||
|
||||
During operation the fuzzer prints information to ``stderr``, for example::
|
||||
|
||||
INFO: Seed: 3338750330
|
||||
Loaded 1024/1211 files from corpus/
|
||||
INFO: -max_len is not provided, using 64
|
||||
#0 READ units: 1211 exec/s: 0
|
||||
#1211 INITED cov: 2575 bits: 8855 indir: 5 units: 830 exec/s: 1211
|
||||
#1422 NEW cov: 2580 bits: 8860 indir: 5 units: 831 exec/s: 1422 L: 21 MS: 1 ShuffleBytes-
|
||||
#1688 NEW cov: 2581 bits: 8865 indir: 5 units: 832 exec/s: 1688 L: 19 MS: 2 EraseByte-CrossOver-
|
||||
#1734 NEW cov: 2583 bits: 8879 indir: 5 units: 833 exec/s: 1734 L: 27 MS: 3 ChangeBit-EraseByte-ShuffleBytes-
|
||||
...
|
||||
|
||||
The early parts of the output include information about the fuzzer options and
|
||||
configuration, including the current random seed (in the ``Seed:`` line; this
|
||||
can be overridden with the ``-seed=N`` flag).
|
||||
|
||||
Further output lines have the form of an event code and statistics. The
|
||||
possible event codes are:
|
||||
|
||||
``READ``
|
||||
The fuzzer has read in all of the provided input samples from the corpus
|
||||
directories.
|
||||
``INITED``
|
||||
The fuzzer has completed initialization, which includes running each of
|
||||
the initial input samples through the code under test.
|
||||
``NEW``
|
||||
The fuzzer has created a test input that covers new areas of the code
|
||||
under test. This input will be saved to the primary corpus directory.
|
||||
``pulse``
|
||||
The fuzzer has generated 2\ :sup:`n` inputs (generated periodically to reassure
|
||||
the user that the fuzzer is still working).
|
||||
``DONE``
|
||||
The fuzzer has completed operation because it has reached the specified
|
||||
iteration limit (``-runs``) or time limit (``-max_total_time``).
|
||||
``MIN<n>``
|
||||
The fuzzer is minimizing the combination of input corpus directories into
|
||||
a single unified corpus (due to the ``-merge`` command line option).
|
||||
``RELOAD``
|
||||
The fuzzer is performing a periodic reload of inputs from the corpus
|
||||
directory; this allows it to discover any inputs discovered by other
|
||||
fuzzer processes (see `Parallel Fuzzing`_).
|
||||
|
||||
Each output line also reports the following statistics (when non-zero):
|
||||
|
||||
``cov:``
|
||||
Total number of code blocks or edges covered by executing the current
|
||||
corpus.
|
||||
``bits:``
|
||||
Rough measure of the number of code blocks or edges covered, and how often;
|
||||
only valid if the fuzzer is run with ``-use_counters=1``.
|
||||
``indir:``
|
||||
Number of distinct function `caller-callee pairs`_ executed with the
|
||||
current corpus; only valid if the code under test was built with
|
||||
``-fsanitize-coverage=indirect-calls``.
|
||||
``units:``
|
||||
Number of entries in the current input corpus.
|
||||
``exec/s:``
|
||||
Number of fuzzer iterations per second.
|
||||
|
||||
For ``NEW`` events, the output line also includes information about the mutation
|
||||
operation that produced the new input:
|
||||
|
||||
``L:``
|
||||
Size of the new input in bytes.
|
||||
``MS: <n> <operations>``
|
||||
Count and list of the mutation operations used to generate the input.
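
If you want to post-process this output (for example, to chart ``cov:`` over time), a status line can be split into its fields. The parser below is a minimal sketch that assumes the line layout shown in the sample output above; ``StatusLine`` and ``ParseStatusLine`` are hypothetical names, and the output format is not a stable interface.

```cpp
#include <exception>
#include <map>
#include <sstream>
#include <string>

struct StatusLine {
  long Iteration = 0;                 // The number after '#'.
  std::string Event;                  // READ, INITED, NEW, pulse, ...
  std::map<std::string, long> Stats;  // cov, bits, indir, units, exec/s, L, MS
};

// Parses a line of the form "#<n> <event> name: value name: value ...".
// Returns false if the line does not start with "#<number> <event>".
bool ParseStatusLine(const std::string &Line, StatusLine *Out) {
  std::istringstream In(Line);
  std::string Token;
  if (!(In >> Token) || Token.size() < 2 || Token[0] != '#')
    return false;
  try {
    Out->Iteration = std::stol(Token.substr(1));
  } catch (const std::exception &) {
    return false;  // The text after '#' was not a number.
  }
  if (!(In >> Out->Event))
    return false;
  std::string Key, ValueText;
  while (In >> Key) {
    if (Key.size() < 2 || Key.back() != ':')
      continue;  // Not a "name:" token, e.g. a mutation name after "MS: <n>".
    if (!(In >> ValueText))
      break;
    try {
      long Value = std::stol(ValueText);
      Out->Stats[Key.substr(0, Key.size() - 1)] = Value;
    } catch (const std::exception &) {
      // The value was not a number; skip this pair.
    }
  }
  return true;
}
```

Lines that do not begin with ``#``, such as the ``INFO:`` lines, are rejected rather than partially parsed.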
|
||||
|
||||
|
||||
Examples
|
||||
========
|
||||
.. contents::
|
||||
:local:
|
||||
:depth: 1
|
||||
@ -120,7 +334,8 @@ Usage examples
|
||||
Toy example
|
||||
-----------
|
||||
|
||||
A simple function that does something interesting if it receives the input "HI!"::
|
||||
A simple function that does something interesting if it receives the input
|
||||
"HI!"::
|
||||
|
||||
cat << EOF >> test_fuzzer.cc
|
||||
#include <stdint.h>
|
||||
@ -142,8 +357,8 @@ You should get an error pretty quickly::
|
||||
|
||||
#0 READ units: 1 exec/s: 0
|
||||
#1 INITED cov: 3 units: 1 exec/s: 0
|
||||
#2 NEW cov: 5 units: 2 exec/s: 0 L: 64 MS: 0
|
||||
#19237 NEW cov: 9 units: 3 exec/s: 0 L: 64 MS: 0
|
||||
#2 NEW cov: 5 units: 2 exec/s: 0 L: 64 MS: 0
|
||||
#19237 NEW cov: 9 units: 3 exec/s: 0 L: 64 MS: 0
|
||||
#20595 NEW cov: 10 units: 4 exec/s: 0 L: 1 MS: 4 ChangeASCIIInt-ShuffleBytes-ChangeByte-CrossOver-
|
||||
#34574 NEW cov: 13 units: 5 exec/s: 0 L: 2 MS: 3 ShuffleBytes-CrossOver-ChangeBit-
|
||||
#34807 NEW cov: 15 units: 6 exec/s: 0 L: 3 MS: 1 CrossOver-
|
||||
@ -159,9 +374,10 @@ Here we show how to use libFuzzer on something real, yet simple: pcre2_::
|
||||
|
||||
COV_FLAGS=" -fsanitize-coverage=edge,indirect-calls,8bit-counters"
|
||||
# Get PCRE2
|
||||
svn co svn://vcs.exim.org/pcre2/code/trunk pcre
|
||||
# Build PCRE2 with AddressSanitizer and coverage.
|
||||
(cd pcre; ./autogen.sh; CC="clang -fsanitize=address $COV_FLAGS" ./configure --prefix=`pwd`/../inst && make -j && make install)
|
||||
wget ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-10.20.tar.gz
|
||||
tar xf pcre2-10.20.tar.gz
|
||||
# Build PCRE2 with AddressSanitizer and coverage; requires autotools.
|
||||
(cd pcre2-10.20; ./autogen.sh; CC="clang -fsanitize=address $COV_FLAGS" ./configure --prefix=`pwd`/../inst && make -j && make install)
|
||||
# Build the fuzzing target function that does something interesting with PCRE2.
|
||||
cat << EOF > pcre_fuzzer.cc
|
||||
#include <string.h>
|
||||
@ -186,52 +402,64 @@ Here we show how to use libFuzzer on something real, yet simple: pcre2_::
|
||||
clang++ -g -fsanitize=address -Wl,--whole-archive inst/lib/*.a -Wl,-no-whole-archive libFuzzer.a pcre_fuzzer.o -o pcre_fuzzer
|
||||
|
||||
This will give you a binary of the fuzzer, called ``pcre_fuzzer``.
|
||||
Now, create a directory that will hold the test corpus::
|
||||
Now, create a directory that will hold the test corpus:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
mkdir -p CORPUS
|
||||
|
||||
For simple input languages like regular expressions this is all you need.
|
||||
For more complicated inputs populate the directory with some input samples.
|
||||
Now run the fuzzer with the corpus dir as the only parameter::
|
||||
For more complicated/structured inputs, the fuzzer works much more efficiently
|
||||
if you can populate the corpus directory with a variety of valid and invalid
|
||||
inputs for the code under test.
|
||||
Now run the fuzzer with the corpus directory as the only parameter:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
./pcre_fuzzer ./CORPUS
|
||||
|
||||
You will see output like this::
|
||||
Initially, you will see Output_ like this::
|
||||
|
||||
Seed: 1876794929
|
||||
#0 READ cov 0 bits 0 units 1 exec/s 0
|
||||
#1 pulse cov 3 bits 0 units 1 exec/s 0
|
||||
#1 INITED cov 3 bits 0 units 1 exec/s 0
|
||||
#2 pulse cov 208 bits 0 units 1 exec/s 0
|
||||
#2 NEW cov 208 bits 0 units 2 exec/s 0 L: 64
|
||||
#3 NEW cov 217 bits 0 units 3 exec/s 0 L: 63
|
||||
#4 pulse cov 217 bits 0 units 3 exec/s 0
|
||||
|
||||
* The ``Seed:`` line shows you the current random seed (you can change it with ``-seed=N`` flag).
|
||||
* The ``READ`` line shows you how many input files were read (since you passed an empty dir there were no inputs, but one dummy input was synthesised).
* The ``INITED`` line shows you how many inputs will be fuzzed.
* The ``NEW`` lines appear when the fuzzer finds a new interesting input, which is saved to the CORPUS dir. If multiple corpus dirs are given, the first one is used.
|
||||
* The ``pulse`` lines appear periodically to show the current status.
|
||||
INFO: Seed: 2938818941
|
||||
INFO: -max_len is not provided, using 64
|
||||
INFO: A corpus is not provided, starting from an empty corpus
|
||||
#0 READ units: 1 exec/s: 0
|
||||
#1 INITED cov: 3 bits: 3 units: 1 exec/s: 0
|
||||
#2 NEW cov: 176 bits: 176 indir: 3 units: 2 exec/s: 0 L: 64 MS: 0
|
||||
#8 NEW cov: 176 bits: 179 indir: 3 units: 3 exec/s: 0 L: 63 MS: 2 ChangeByte-EraseByte-
|
||||
...
|
||||
#14004 NEW cov: 1500 bits: 4536 indir: 5 units: 406 exec/s: 0 L: 54 MS: 3 ChangeBit-ChangeBit-CrossOver-
|
||||
|
||||
Now, interrupt the fuzzer and run it again the same way. You will see::
|
||||
|
||||
Seed: 1879995378
|
||||
#0 READ cov 0 bits 0 units 564 exec/s 0
|
||||
#1 pulse cov 502 bits 0 units 564 exec/s 0
|
||||
INFO: Seed: 3398349082
|
||||
INFO: -max_len is not provided, using 64
|
||||
#0 READ units: 405 exec/s: 0
|
||||
#405 INITED cov: 1499 bits: 4535 indir: 5 units: 286 exec/s: 0
|
||||
#587 NEW cov: 1499 bits: 4540 indir: 5 units: 287 exec/s: 0 L: 52 MS: 2 InsertByte-EraseByte-
|
||||
#667 NEW cov: 1501 bits: 4542 indir: 5 units: 288 exec/s: 0 L: 39 MS: 2 ChangeBit-InsertByte-
|
||||
#672 NEW cov: 1501 bits: 4543 indir: 5 units: 289 exec/s: 0 L: 15 MS: 2 ChangeASCIIInt-ChangeBit-
|
||||
#739 NEW cov: 1501 bits: 4544 indir: 5 units: 290 exec/s: 0 L: 64 MS: 4 ShuffleBytes-ChangeASCIIInt-InsertByte-ChangeBit-
|
||||
...
|
||||
#512 pulse cov 2933 bits 0 units 564 exec/s 512
|
||||
#564 INITED cov 2991 bits 0 units 344 exec/s 564
|
||||
#1024 pulse cov 2991 bits 0 units 344 exec/s 1024
|
||||
#1455 NEW cov 2995 bits 0 units 345 exec/s 1455 L: 49
|
||||
|
||||
This time you were running the fuzzer with a non-empty input corpus (564 items).
|
||||
As the first step, the fuzzer minimized the set to produce 344 interesting items (the ``INITED`` line)
|
||||
On the second execution the fuzzer has a non-empty input corpus (405 items). As
|
||||
the first step, the fuzzer minimized this corpus (the ``INITED`` line) to
|
||||
produce 286 interesting items, omitting inputs that do not hit any additional
|
||||
code.
|
||||
|
||||
You may run ``N`` independent fuzzer jobs in parallel on ``M`` CPUs::
|
||||
(Aside: although the fuzzer only saves new inputs that hit additional code, this
does not mean that the corpus as a whole is kept minimized. For example, if
an input hitting A-B-C is generated and then an input hitting A-B-C-D, they
will both be saved, even though the latter subsumes the former.)
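
The aside can be made concrete with a simplified model of "save an input only if it hits additional code". This is a sketch under that assumption, not libFuzzer's implementation; ``Coverage`` and ``GrowCorpus`` are hypothetical names, and coverage is modelled as a plain set of location labels.

```cpp
#include <set>
#include <string>
#include <vector>

using Coverage = std::set<std::string>;  // Code locations hit by one input.

// Simplified model (not libFuzzer's implementation): keep an input only if
// it reaches at least one location not reached by any earlier input.
std::vector<Coverage> GrowCorpus(const std::vector<Coverage> &Inputs) {
  std::vector<Coverage> Corpus;
  Coverage Seen;
  for (const Coverage &Cov : Inputs) {
    bool AddsNew = false;
    for (const std::string &Loc : Cov)
      if (Seen.insert(Loc).second)  // True if Loc was not seen before.
        AddsNew = true;
    if (AddsNew)
      Corpus.push_back(Cov);
  }
  return Corpus;
}
```

In this model the order of discovery matters: A-B-C followed by A-B-C-D keeps both inputs, while A-B-C-D followed by A-B-C keeps only the first.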
|
||||
|
||||
|
||||
You may run ``N`` independent fuzzer jobs in parallel on ``M`` CPUs:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
N=100; M=4; ./pcre_fuzzer ./CORPUS -jobs=$N -workers=$M
|
||||
|
||||
By default (``-reload=1``) the fuzzer processes will periodically scan the CORPUS directory
|
||||
By default (``-reload=1``) the fuzzer processes will periodically scan the corpus directory
|
||||
and reload any new tests. This way the test inputs found by one process will be picked up
|
||||
by all others.
|
||||
|
||||
@ -241,15 +469,15 @@ Heartbleed
|
||||
----------
|
||||
Remember Heartbleed_?
|
||||
As it was recently `shown <https://blog.hboeck.de/archives/868-How-Heartbleed-couldve-been-found.html>`_,
|
||||
fuzzing with AddressSanitizer can find Heartbleed. Indeed, here are the step-by-step instructions
|
||||
to find Heartbleed with LibFuzzer::
|
||||
fuzzing with AddressSanitizer_ can find Heartbleed. Indeed, here are the step-by-step instructions
|
||||
to find Heartbleed with libFuzzer::
|
||||
|
||||
wget https://www.openssl.org/source/openssl-1.0.1f.tar.gz
|
||||
tar xf openssl-1.0.1f.tar.gz
|
||||
COV_FLAGS="-fsanitize-coverage=edge,indirect-calls" # -fsanitize-coverage=8bit-counters
|
||||
(cd openssl-1.0.1f/ && ./config &&
|
||||
make -j 32 CC="clang -g -fsanitize=address $COV_FLAGS")
|
||||
# Get and build LibFuzzer
|
||||
# Get and build libFuzzer
|
||||
svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
|
||||
clang -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
|
||||
# Get examples of key/pem files.
|
||||
@ -303,7 +531,7 @@ Voila::
|
||||
#2 0x580be3 in ssl3_read_bytes openssl-1.0.1f/ssl/s3_pkt.c:1092:4
|
||||
|
||||
Note: a `similar fuzzer <https://boringssl.googlesource.com/boringssl/+/HEAD/FUZZING.md>`_
|
||||
is now a part of the boringssl source tree.
|
||||
is now a part of the BoringSSL_ source tree.
|
||||
|
||||
Advanced features
|
||||
=================
|
||||
@ -346,7 +574,9 @@ AFL compatibility
|
||||
-----------------
|
||||
LibFuzzer can be used together with AFL_ on the same test corpus.
|
||||
Both fuzzers expect the test corpus to reside in a directory, one file per input.
|
||||
You can run both fuzzers on the same corpus, one after another::
|
||||
You can run both fuzzers on the same corpus, one after another:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
./afl-fuzz -i testcase_dir -o findings_dir /path/to/program @@
|
||||
./llvm-fuzz testcase_dir findings_dir # Will write new tests to testcase_dir
|
||||
@ -360,7 +590,9 @@ How good is my fuzzer?
|
||||
Once you implement your target function ``LLVMFuzzerTestOneInput`` and fuzz it to death,
|
||||
you will want to know whether the function or the corpus can be improved further.
|
||||
One easy to use metric is, of course, code coverage.
|
||||
You can get the coverage for your corpus like this::
|
||||
You can get the coverage for your corpus like this:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
ASAN_OPTIONS=coverage=1 ./fuzzer CORPUS_DIR -runs=0
|
||||
|
||||
@ -379,43 +611,39 @@ Startup initialization
|
||||
----------------------
|
||||
If the library being tested needs to be initialized, there are several options.
|
||||
|
||||
The simplest way is to have a statically initialized global object::
|
||||
The simplest way is to have a statically initialized global object:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
static bool Initialized = DoInitialization();
|
||||
|
||||
Alternatively, you may define an optional init function and it will receive
|
||||
the program arguments that you can read and modify::
|
||||
the program arguments that you can read and modify:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
extern "C" int LLVMFuzzerInitialize(int *argc, char ***argv) {
|
||||
ReadAndMaybeModify(argc, argv);
|
||||
return 0;
|
||||
}
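
As one possible shape for ``ReadAndMaybeModify``, the sketch below looks for a made-up ``--verbose`` argument, records it, and removes it from the argument list. The flag and the ``Verbose`` variable are hypothetical, and the code assumes ``argv`` is terminated by a null pointer, as it is when it comes from ``main()``.

```cpp
#include <cstring>

// Hypothetical setting for the code under test; not a libFuzzer option.
static bool Verbose = false;

extern "C" int LLVMFuzzerInitialize(int *argc, char ***argv) {
  for (int I = 1; I < *argc; ++I) {
    if (std::strcmp((*argv)[I], "--verbose") != 0)
      continue;
    Verbose = true;
    // Shift the remaining arguments down by one. This also copies the
    // terminating null pointer at (*argv)[*argc].
    for (int J = I; J < *argc; ++J)
      (*argv)[J] = (*argv)[J + 1];
    --*argc;
    --I;  // Re-examine the argument that moved into slot I.
  }
  return 0;
}
```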
|
||||
|
||||
Try to avoid initialization inside the target function itself as
|
||||
it will skew the coverage data. Don't do this::
|
||||
|
||||
extern "C" int LLVMFuzzerTestOneInput(...) {
|
||||
static bool initialized = false;
|
||||
if (!initialized) {
|
||||
...
|
||||
}
|
||||
}
|
||||
|
||||
Leaks
-----

When running libFuzzer with AddressSanitizer_ the latter will be able to report
memory leaks, but only when the process exits, so if you suspect memory leaks
in your target you should run libFuzzer with `-runs=N` or `-max_total_time=N`.
If a leak is reported at the end, you will not get the reproducer from libFuzzer.
You will need to re-run the target on every file in the corpus separately to
find which one causes the leak.

Code that has been built with AddressSanitizer_ will report memory leaks,
but only when the process exits. If you suspect memory leaks in the code
under test, you will therefore need to use the ``-runs=N`` or
``-max_total_time=N`` command line options to ensure that the fuzzing
process completes and gives AddressSanitizer_ a chance to report leaks.
Because the leak is only reported at the end of the process, this also means
that it is not clear which input triggered the leak. To narrow this down,
re-run each input file in the corpus separately through the target function.
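
The per-file re-run described above could be done with a small helper along these lines. This is only a sketch: ``RunOneFile`` and ``LastSizeSeen`` are hypothetical names, and the ``LLVMFuzzerTestOneInput`` body is a trivial stand-in for your real target so that the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Stand-in for the real fuzz target, which normally lives in your own
// source file. It only records the size of the last input it received.
static size_t LastSizeSeen = 0;
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  (void)Data;
  LastSizeSeen = Size;
  return 0;
}

// Hypothetical helper: reads one corpus file into memory and feeds it to
// the target exactly once. Returns false if the file cannot be opened.
bool RunOneFile(const std::string &Path) {
  assert(!Path.empty());
  std::ifstream In(Path, std::ios::binary);
  if (!In)
    return false;
  std::vector<uint8_t> Bytes((std::istreambuf_iterator<char>(In)),
                             std::istreambuf_iterator<char>());
  LLVMFuzzerTestOneInput(Bytes.data(), Bytes.size());
  return true;
}
```

If a small driver calls ``RunOneFile`` on a single path per process and the binary is built with AddressSanitizer_, any leak report printed at exit can be attributed to that one file.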

If your target has massive leaks you will eventually run out of RAM.
To protect your machine from OOM death you may use
e.g. `ASAN_OPTIONS=hard_rss_limit_mb=2000` (with AddressSanitizer_).
e.g. ``ASAN_OPTIONS=hard_rss_limit_mb=2000`` (with AddressSanitizer_).

In the future, libFuzzer may support finding and reporting leaks better than this; stay tuned.

Fuzzing components of LLVM
==========================

@ -427,14 +655,16 @@ clang-format-fuzzer
-------------------
The inputs are random pieces of C++-like text.

Build (make sure to use fresh clang as the host compiler)::
Build (make sure to use fresh clang as the host compiler):

.. code-block:: console

  cmake -GNinja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_USE_SANITIZER=Address -DLLVM_USE_SANITIZE_COVERAGE=YES -DCMAKE_BUILD_TYPE=Release /path/to/llvm
  ninja clang-format-fuzzer
  mkdir CORPUS_DIR
  ./bin/clang-format-fuzzer CORPUS_DIR

Optionally build other kinds of binaries (asan+Debug, msan, ubsan, etc).
Optionally build other kinds of binaries (ASan+Debug, MSan, UBSan, etc).

Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23052

@ -464,25 +694,27 @@ finds an invalid instruction or runs out of data.
Please note that the command line interface differs slightly from that of other
fuzzers. The fuzzer arguments should follow ``--fuzzer-args`` and should have
a single dash, while other arguments control the operation mode and target in a
similar manner to ``llvm-mc`` and should have two dashes. For example::
similar manner to ``llvm-mc`` and should have two dashes. For example:

.. code-block:: console

  llvm-mc-fuzzer --triple=aarch64-linux-gnu --disassemble --fuzzer-args -max_len=4 -jobs=10

Buildbot
--------

We have a buildbot that runs the above fuzzers for LLVM components
24/7/365 at http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer .
A buildbot continuously runs the above fuzzers for LLVM components, with results
shown at http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer .

FAQ
=========================

Q. Why libFuzzer does not use any of the LLVM support?
------------------------------------------------------
Q. Why doesn't libFuzzer use any of the LLVM support?
-----------------------------------------------------

There are two reasons.

First, we want this library to be used outside of the LLVM w/o users having to
First, we want this library to be used outside of the LLVM without users having to
build the rest of LLVM. This may sound unconvincing for many LLVM folks,
but in practice the need for building the whole LLVM frightens many potential
users -- and we want more users to use this code.

@ -494,7 +726,7 @@ coverage set of the process (since the fuzzer is in-process). In other words, by
using more external dependencies we will slow down the fuzzer while the main
reason for it to exist is extreme speed.

Q. What about Windows then? The Fuzzer contains code that does not build on Windows.
Q. What about Windows then? The fuzzer contains code that does not build on Windows.
------------------------------------------------------------------------------------

Volunteers are welcome.

@ -504,7 +736,7 @@ Q. When this Fuzzer is not a good solution for a problem?

* If the test inputs are validated by the target library and the validator
  asserts/crashes on invalid inputs, in-process fuzzing is not applicable.
* Bugs in the target library may accumulate w/o being detected. E.g. a memory
* Bugs in the target library may accumulate without being detected. E.g. a memory
  corruption that goes undetected at first and then leads to a crash while
  testing another input. This is why it is highly recommended to run this
  in-process fuzzer with all sanitizers to detect most bugs on the spot.

@ -512,7 +744,7 @@ Q. When this Fuzzer is not a good solution for a problem?
  consumption and infinite loops in the target library (still possible).
* The target library should not have significant global state that is not
  reset between the runs.
* Many interesting target libs are not designed in a way that supports
* Many interesting target libraries are not designed in a way that supports
  the in-process fuzzer interface (e.g. require a file path instead of a
  byte array).
* If a single test run takes a considerable fraction of a second (or

@ -566,18 +798,21 @@ Trophies

* gRPC: `[1] <https://github.com/grpc/grpc/pull/6071/commits/df04c1f7f6aec6e95722ec0b023a6b29b6ea871c>`__ `[2] <https://github.com/grpc/grpc/pull/6071/commits/22a3dfd95468daa0db7245a4e8e6679a52847579>`__ `[3] <https://github.com/grpc/grpc/pull/6071/commits/9cac2a12d9e181d130841092e9d40fa3309d7aa7>`__ `[4] <https://github.com/grpc/grpc/pull/6012/commits/82a91c91d01ce9b999c8821ed13515883468e203>`__ `[5] <https://github.com/grpc/grpc/pull/6202/commits/2e3e0039b30edaf89fb93bfb2c1d0909098519fa>`__ `[6] <https://github.com/grpc/grpc/pull/6106/files>`__


* LLVM: `Clang <https://llvm.org/bugs/show_bug.cgi?id=23057>`_, `Clang-format <https://llvm.org/bugs/show_bug.cgi?id=23052>`_, `libc++ <https://llvm.org/bugs/show_bug.cgi?id=24411>`_, `llvm-as <https://llvm.org/bugs/show_bug.cgi?id=24639>`_, Disassembler: http://reviews.llvm.org/rL247405, http://reviews.llvm.org/rL247414, http://reviews.llvm.org/rL247416, http://reviews.llvm.org/rL247417, http://reviews.llvm.org/rL247420, http://reviews.llvm.org/rL247422.

.. _pcre2: http://www.pcre.org/

.. _AFL: http://lcamtuf.coredump.cx/afl/

.. _SanitizerCoverage: http://clang.llvm.org/docs/SanitizerCoverage.html
.. _SanitizerCoverageTraceDataFlow: http://clang.llvm.org/docs/SanitizerCoverage.html#tracing-data-flow
.. _DataFlowSanitizer: http://clang.llvm.org/docs/DataFlowSanitizer.html
.. _AddressSanitizer: http://clang.llvm.org/docs/AddressSanitizer.html

.. _Heartbleed: http://en.wikipedia.org/wiki/Heartbleed

.. _FuzzerInterface.h: https://github.com/llvm-mirror/llvm/blob/master/lib/Fuzzer/FuzzerInterface.h
.. _3.7.0: http://llvm.org/releases/3.7.0/docs/LibFuzzer.html
.. _building Clang from trunk: http://clang.llvm.org/get_started.html
.. _MemorySanitizer: http://clang.llvm.org/docs/MemorySanitizer.html
.. _UndefinedBehaviorSanitizer: http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
.. _`coverage counters`: http://clang.llvm.org/docs/SanitizerCoverage.html#coverage-counters
.. _`caller-callee pairs`: http://clang.llvm.org/docs/SanitizerCoverage.html#caller-callee-coverage
.. _BoringSSL: https://boringssl.googlesource.com/boringssl/
.. _`fuzz various parts of LLVM itself`: `Fuzzing components of LLVM`_