docs: Add some information about Fuzzing LLVM itself

This splits some content out of the libFuzzer docs and adds a fair
amount of detail about the fuzzers in LLVM.

llvm-svn: 315544
This commit is contained in:
Justin Bogner 2017-10-12 01:44:24 +00:00
parent 6925d8046a
commit b9249fcac2
3 changed files with 232 additions and 65 deletions

221
docs/FuzzingLLVM.rst Normal file
View File

@ -0,0 +1,221 @@
================================
Fuzzing LLVM libraries and tools
================================
.. contents::
:local:
:depth: 2
Introduction
============
The LLVM tree includes a number of fuzzers for various components. These are
built on top of :doc:`LibFuzzer <LibFuzzer>`.
Available Fuzzers
=================
clang-fuzzer
------------
A |generic fuzzer| that tries to compile textual input as C++ code. Some of the
bugs this fuzzer has reported are `on bugzilla <https://llvm.org/pr23057>`__
and `on OSS Fuzz's tracker
<https://bugs.chromium.org/p/oss-fuzz/issues/list?q=proj-llvm+clang-fuzzer>`__.
clang-proto-fuzzer
------------------
A |protobuf fuzzer| that compiles valid C++ programs generated from a protobuf
class that describes a subset of the C++ language.
This fuzzer accepts clang command line options after `ignore_remaining_args=1`.
For example, the following command will fuzz clang with a higher optimization
level:
.. code-block:: shell
% bin/clang-proto-fuzzer <corpus-dir> -ignore_remaining_args=1 -O3
clang-format-fuzzer
-------------------
A |generic fuzzer| that runs clang-format_ on C++ text fragments. Some of the
bugs this fuzzer has reported are `on bugzilla <https://llvm.org/pr23052>`__
and `on OSS Fuzz's tracker
<https://bugs.chromium.org/p/oss-fuzz/issues/list?q=proj-llvm+clang-format-fuzzer`__.
.. _clang-format: https://clang.llvm.org/docs/ClangFormat.html
llvm-as-fuzzer
--------------
A |generic fuzzer| that tries to parse text as :doc:`LLVM assembly <LangRef>`.
Some of the bugs this fuzzer has reported are `on bugzilla
<https://llvm.org/pr24639>`__
llvm-dwarfdump-fuzzer
---------------------
A |generic fuzzer| that interprets inputs as object files and runs
:doc:`llvm-dwarfdump <CommandGuide/llvm-dwarfdump>` on them. Some of the bugs
this fuzzer has reported are `on OSS Fuzz's tracker
<https://bugs.chromium.org/p/oss-fuzz/issues/list?q=proj-llvm+llvm-dwarfdump-fuzzer`__.
llvm-isel-fuzzer
----------------
A |LLVM IR fuzzer| aimed at finding bugs in instruction selection.
This fuzzer accepts flags after `ignore_remaining_args=1`. The flags match
those of :doc:`llc <CommandGuide/llc>` and the triple is required. For example,
the following command would fuzz AArch64 with :doc:`GlobalISel`:
.. code-block:: shell
% bin/llvm-isel-fuzzer <corpus-dir> -ignore_remaining_args=1 -mtriple aarch64 -global-isel -O0
llvm-mc-assemble-fuzzer
-----------------------
A |generic fuzzer| that fuzzes the MC layer's assemblers by treating inputs as
target specific assembly.
Note that this fuzzer has an unusual command line interface which is not fully
compatible with all of libFuzzer's features. Fuzzer arguments must be passed
after ``--fuzzer-args``, and any ``llc`` flags must use two dashes. For
example, to fuzz the AArch64 assembler you might use the following command:
.. code-block:: console
llvm-mc-fuzzer --triple=aarch64-linux-gnu --fuzzer-args -max_len=4
This scheme will likely change in the future.
llvm-mc-disassemble-fuzzer
--------------------------
A |generic fuzzer| that fuzzes the MC layer's disassemblers by treating inputs
as assembled binary data.
Note that this fuzzer has an unusual command line interface which is not fully
compatible with all of libFuzzer's features. See the notes above about
``llvm-mc-assemble-fuzzer`` for details.
.. |generic fuzzer| replace:: :ref:`generic fuzzer <fuzzing-llvm-generic>`
.. |protobuf fuzzer|
replace:: :ref:`libprotobuf-mutator based fuzzer <fuzzing-llvm-protobuf>`
.. |LLVM IR fuzzer|
replace:: :ref:`structured LLVM IR fuzzer <fuzzing-llvm-ir>`
Mutators and Input Generators
=============================
The inputs for a fuzz target are generated via random mutations of a
:ref:`corpus <libfuzzer-corpus>`. There are a few options for the kinds of
mutations that a fuzzer in LLVM might want.
.. _fuzzing-llvm-generic:
Generic Random Fuzzing
----------------------
The most basic form of input mutation is to use the built in mutators of
LibFuzzer. These simply treat the input corpus as a bag of bits and make random
mutations. This type of fuzzer is good for stressing the surface layers of a
program, and is good at testing things like lexers, parsers, or binary
protocols.
Some of the in-tree fuzzers that use this type of mutator are `clang-fuzzer`_,
`clang-format-fuzzer`_, `llvm-as-fuzzer`_, `llvm-dwarfdump-fuzzer`_,
`llvm-mc-assemble-fuzzer`_, and `llvm-mc-disassemble-fuzzer`_.
.. _fuzzing-llvm-protobuf:
Structured Fuzzing using ``libprotobuf-mutator``
------------------------------------------------
We can use libprotobuf-mutator_ in order to perform structured fuzzing and
stress deeper layers of programs. This works by defining a protobuf class that
translates arbitrary data into structurally interesting input. Specifically, we
use this to work with a subset of the C++ language and perform mutations that
produce valid C++ programs in order to exercise parts of clang that are more
interesting than parser error handling.
To build this kind of fuzzer you need `protobuf`_ and its dependencies
installed, and you need to specify some extra flags when configuring the build
with :doc:`CMake <CMake>`. For example, `clang-proto-fuzzer`_ can be enabled by
adding ``-DCLANG_ENABLE_PROTO_FUZZER=ON`` to the flags described in
:ref:`building-fuzzers`.
The only in-tree fuzzer that uses ``libprotobuf-mutator`` today is
`clang-proto-fuzzer`_.
.. _libprotobuf-mutator: https://github.com/google/libprotobuf-mutator
.. _protobuf: https://github.com/google/protobuf
.. _fuzzing-llvm-ir:
Structured Fuzzing of LLVM IR
-----------------------------
We also use a more direct form of structured fuzzing for fuzzers that take
:doc:`LLVM IR <LangRef>` as input. This is achieved through the ``FuzzMutate``
library, which was `discussed at EuroLLVM 2017`_.
The ``FuzzMutate`` library is used to structurally fuzz backends in
`llvm-isel-fuzzer`_.
.. _discussed at EuroLLVM 2017: https://www.youtube.com/watch?v=UBbQ_s6hNgg
Building and Running
====================
.. _building-fuzzers:
Configuring LLVM to Build Fuzzers
---------------------------------
Fuzzers will be built and linked to libFuzzer by default as long as you build
LLVM with sanitizer coverage enabled. You would typically also enable at least
one sanitizer for the fuzzers to be particularly likely, so the most common way
to build the fuzzers is by adding the following two flags to your CMake
invocation: ``-DLLVM_USE_SANITIZER=Address -DLLVM_USE_SANITIZE_COVERAGE=On``.
.. note:: If you have ``compiler-rt`` checked out in an LLVM tree when building
with sanitizers, you'll want to specify ``-DLLVM_BUILD_RUNTIME=Off``
to avoid building the sanitizers themselves with sanitizers enabled.
Continuously Running and Finding Bugs
-------------------------------------
There used to be a public buildbot running LLVM fuzzers continuously, and while
this did find issues, it didn't have a very good way to report problems in an
actionable way. Because of this, we're moving towards using `OSS Fuzz`_ more
instead.
https://github.com/google/oss-fuzz/blob/master/projects/llvm/project.yaml
https://bugs.chromium.org/p/oss-fuzz/issues/list?q=Proj-llvm
.. _OSS Fuzz: https://github.com/google/oss-fuzz
Utilities for Writing Fuzzers
=============================
There are some utilities available for writing fuzzers in LLVM.
Some helpers for handling the command line interface are available in
``include/llvm/FuzzMutate/FuzzerCLI.h``, including functions to parse command
line options in a consistent way and to implement standalone main functions so
your fuzzer can be built and tested when not built against libFuzzer.
There is also some handling of the CMake config for fuzzers, where you should
use the ``add_llvm_fuzzer`` to set up fuzzer targets. This function works
similarly to functions such as ``add_llvm_tool``, but they take care of linking
to LibFuzzer when appropriate and can be passed the ``DUMMY_MAIN`` argument to
enable standalone testing.

View File

@ -42,10 +42,10 @@ This installs the Clang binary as
``./third_party/llvm-build/Release+Asserts/bin/clang``)
The libFuzzer code resides in the LLVM repository, and requires a recent Clang
compiler to build (and is used to `fuzz various parts of LLVM itself`_).
However the fuzzer itself does not (and should not) depend on any part of LLVM
infrastructure and can be used for other projects without requiring the rest
of LLVM.
compiler to build (and is used to :doc:`fuzz various parts of LLVM itself
<FuzzingLLVM>`). However the fuzzer itself does not (and should not) depend on
any part of LLVM infrastructure and can be used for other projects without
requiring the rest of LLVM.
Getting Started
@ -137,6 +137,8 @@ Finally, link with ``libFuzzer.a``::
clang -fsanitize-coverage=trace-pc-guard -fsanitize=address your_lib.cc fuzz_target.cc libFuzzer.a -o my_fuzzer
.. _libfuzzer-corpus:
Corpus
------
@ -627,66 +629,6 @@ which was configured with ``-DLIBFUZZER_ENABLE_TESTS=ON`` flag.
ninja check-fuzzer
Fuzzing components of LLVM
==========================
.. contents::
:local:
:depth: 1
To build any of the LLVM fuzz targets use the build instructions above.
clang-format-fuzzer
-------------------
The inputs are random pieces of C++-like text.
.. code-block:: console
ninja clang-format-fuzzer
mkdir CORPUS_DIR
./bin/clang-format-fuzzer CORPUS_DIR
Optionally build other kinds of binaries (ASan+Debug, MSan, UBSan, etc).
Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23052
clang-fuzzer
------------
The behavior is very similar to ``clang-format-fuzzer``.
Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23057
llvm-as-fuzzer
--------------
Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=24639
llvm-mc-fuzzer
--------------
This tool fuzzes the MC layer. Currently it is only able to fuzz the
disassembler but it is hoped that assembly, and round-trip verification will be
added in future.
When run in dissassembly mode, the inputs are opcodes to be disassembled. The
fuzzer will consume as many instructions as possible and will stop when it
finds an invalid instruction or runs out of data.
Please note that the command line interface differs slightly from that of other
fuzzers. The fuzzer arguments should follow ``--fuzzer-args`` and should have
a single dash, while other arguments control the operation mode and target in a
similar manner to ``llvm-mc`` and should have two dashes. For example:
.. code-block:: console
llvm-mc-fuzzer --triple=aarch64-linux-gnu --disassemble --fuzzer-args -max_len=4 -jobs=10
Buildbot
--------
A buildbot continuously runs the above fuzzers for LLVM components, with results
shown at http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer .
FAQ
=========================
@ -808,4 +750,4 @@ Trophies
.. _`value profile`: #value-profile
.. _`caller-callee pairs`: http://clang.llvm.org/docs/SanitizerCoverage.html#caller-callee-coverage
.. _BoringSSL: https://boringssl.googlesource.com/boringssl/
.. _`fuzz various parts of LLVM itself`: `Fuzzing components of LLVM`_

View File

@ -183,6 +183,7 @@ For developers of applications which use LLVM as a library.
ProgrammersManual
Extensions
LibFuzzer
FuzzingLLVM
ScudoHardenedAllocator
OptBisect
@ -228,6 +229,9 @@ For developers of applications which use LLVM as a library.
:doc:`LibFuzzer`
A library for writing in-process guided fuzzers.
:doc:`FuzzingLLVM`
Information on writing and using Fuzzers to find bugs in LLVM.
:doc:`ScudoHardenedAllocator`
A library that implements a security-hardened `malloc()`.