syz-fuzzer organically grew from a small nice main function
into a huge single-file monster with tons of global state.
Start refactoring it into something more managable.
This change separates 2 things:
1. Proc: a single fuzzing process (ipc.Env wrapper).
2. WorkQueue: holds global non-fuzzing work items.
More work needed, but this is good first step.
Currently threaded/collide are global environment flags.
It can be useful to turn off collider during some executions
(minimization, triage, etc).
Make them per-program options.
Currently getting a complete report requires a complex,
multi-step dance (including getting information that
external users are not interested in -- guilty file).
Simplify interface down to 2 functions: Parse and Symbolize.
Parse does what it did before, Symbolize symbolizes report
and fills in maintainers. This simplifies both implementations
of Reporter interface and all users of the interface.
Potentially we could get this down to 1 function Parse
that does everything. However, (1) Symbolize can fail,
while Parse cannot, (2) usually we want to ignore (log)
Symbolize errors, but otherwise proceed with the report,
(3) repro does not need symbolization for all but the
last report.
Whole raw output is indivisble part of Report,
currently we always pass Output separately along with Report.
Make Output a Report field.
Then, put whole Report into manager Crash and repro context and Result.
There is little point in passing Report as aa bunch of separate fields.
This allows callers to get access to Report.Corrupted.
Better than adding 6-th return value and will allow
to pipe other report properties if necessary.
We currently have several names for crash attributes, which is disturbing.
E.g. crash title is called "Title" or "Desc". Name them consistently.
Title - single line bug identity.
Report - whole crash text.
Log - whole fuzzer/kernel output.
syz-execprog doesn't utilize info about fault injections from a prog log.
Since syz-execprog is used by the repro package to reproduce crashes,
crashes caused by fault injections might not reproduce.
This commit adds tools/check_links.py script, that checks that all local
links from documentation files are valid; fixes some of the invalid links
that we had; and makes travis buildbot check them as well.
During smashing we know what call gave new coverage,
so we can concentrate just on it.
This helps to reduce amount of hints generated (we have too many of them).
Now each prog function accepts the desired target explicitly.
No global, implicit state involved.
This is much cleaner and allows cross-OS/arch testing, etc.
Large overhaul moves syscalls and arg types from sys to prog.
Sys package now depends on prog and contains only generated
descriptions of syscalls.
Introduce prog.Target type that encapsulates all targer properties,
like syscall list, ptr/page size, etc. Also moves OS-dependent pieces
like mmap call generation from prog to sys.
Update #191
A hint is basically a tuple consisting of a pointer to an argument
in one of the syscalls of a program and a value, which should be
assigned to that argument.
A simplified version of hints workflow looks like this:
1. Fuzzer launches a program and collects all the comparisons' data
for every syscall in the program.
2. Next it tries to match the obtained comparison operands' values
vs. the input arguments' values.
3. For every such match the fuzzer mutates the program by
replacing the pointed argument with the saved value.
4. If a valid program is obtained, then fuzzer launches it and
checks if new coverage is obtained.
This commit includes:
1. All the code related to hints generation, parsing and mutations.
2. Fuzzer functions to launch the process.
3. Some new stats gathered by fuzzer and manager, related to hints.
4. An updated version of execprog to test the hints process.
The old parser in sys/sysparser is too hacky, difficult to extend
and drops debug info too early, so that we can't produce proper error messages.
Add a new parser that is build like a proper language parser
and preserves full debug info for every token.
On most distributions default grub target is i386-pc, which works.
However, on some default is x86_64-efi, which fails with:
grub-install: error: cannot find EFI directory.
Explicitly specify i386-pc target.
Exec total is affected by initial triage/minimize phase,
so two experiments can have the same execution speed
in the stable mode, but have constant diff due to the initial phase.
The one that is higher looks better, but that's not very important.
Provide execution speed characteristic that is not affected
by initial phase. It is not displayed by default.
If the script is aborted at an unfortunate point, it leaves the whole system broken.
E.g. we've seen that fdisk cannot update partition table until the next reboot.
If you really need to kill it, use a different signal. But better wait.
Currently we have unix permissions for new files/dirs
hardcoded throughout the code base. Some places use 0644,
some - 0640, some - 0600 and a variety of other constants.
Introduce osutil.MkdirAll/WriteFile that use the default
permissions and use them throughout the code base.
This makes permissions consistent and also allows to easily
change the permissions later if we change our minds.
Also merge pkg/fileutil into pkg/osutil as they become
dependent on each other. The line between them was poorly
defined anyway as both operate on files.
Currently syz-symbolize symbolizes whole input file.
Add a new mode (controlled with -report flag) when
it prints report as would be extracted by syz-manager.
vm/gce differs from other VM types in that it accepts image
in a weird, GCE-specific format (namely, image named disk.raw
is put into .tar.gz file). This makes it impossible to write
generic code that creates images for any VM types.
Make vm/gce accept just image like e.g. vm/qemu
and handle own specifics internally.
Sshkey is a property of image, which is in manager config.
Move sshkey to the same location as image.
The motivation for the move is as follows.
Continuous build produces an image and the key,
both need to be passed manager instance.
Continuous build system should not distinguish
different VM types and mess with their configs.
NOTE FOR USERS: this breaks manager configs again.
Hopefully the last time for now. Docs are updated.
We have 2 packages with the same name: pkg/config and syz-manager/config.
This leads to constant clashes. We either rename one to pkgconfig or
another to mgrconfig. This is not good and will become worse when/if
we have another program-specific config in a separate package.
Rename manager config to mgrconfig.
Other program-specific configs can use the same convention
in future -- fooconfig.
VM infrastructure currently has several problems:
- Config struct is complete mess with a superset of params for all VM types
- verification of Config is mess spread across several places
- there is no place where VM code could do global initialization
like creating GCE connection, uploading GCE image to GCS,
matching adb devices with consoles, etc
- it hard to add private VM implementations
such impl would need to add code to config package
which would lead to constant merge conflicts
- interface for VM implementation is mixed with interface for VM users
this does not allow to provide best interface for both of them
- there is no way to add common code for all VM implementations
This change solves these problems by:
- splitting VM interface for users (vm package) and VM interface
for VM implementations (vmimpl pacakge), this in turn allows
to add common code
- adding Pool concept that allows to do global initialization
and config checking at the right time
- decoupling manager config from VM-specific config
each VM type now defines own config
Note: manager configs need to be changed after this change:
VM-specific parts are moved to own "vm" subobject.
Note: this change also drops "local" VM type.
Its story was long unclear and there is now syz-stress which solves the same problem.
Introduce generic config.Load function that can be
reused across multiple programs (syz-manager, syz-gce, etc).
Move the generic config functionality to pkg/config package.
The idea is to move all helper (non-main) packages to pkg/ dir,
because we have more and more of them and they pollute the top dir.
Move the syz-manager config parts into syz-manager/config package.
This is mostly a cleanup change with little functional change.
In ipc.command.exec, remove the status fallback from the pipe to the
exit status. Once the executor is serving, it always writes the status
over the pipe; anything else is an error.
Remove the panic check in syz-stress, which is no longer needed.
Currently syz-symbolize uses report.Parse function
that extracts crash messages from console output.
Symbolize all console output instead.
E.g. there can be something on the console that is not crash.
If an external sandbox process wraps the executor, it may be helpful to
send a signal other than SIGKILL to the sandbox when the program times
out or fails to respond. This gives the sandbox the opportunity to emit
additional debugging information before exiting.
Add an 'abort' signal to ipc, which is sent to the executor before
SIGKILL. If the executor fails to exit within 5s, the signal is upgraded
to SIGKILL.
The default abort signal remains SIGKILL, maintaining existing behavior.
This commit adds Odroid C2 support to syzkaller.
It's now possible to specify "type": "odroid" in manager config.
Documentation on how to setup fuzzing with Odroid C2 board is here:
https://github.com/google/syzkaller/wiki/Setup:-Odroid-C2
Note, that after this change libusb-1.0-0-dev package should be
installed to build syzkaller.
Currently syzkaller uses per-call basic block (BB) coverage.
This change implements edge (not-per-call) coverage.
Edge coverage is more detailed than BB coverage as it captures
not-taken branches, looping, etc. So it provides better feedback signal.
This coverage is now called "signal" throughout the code.
BB code coverage is also collected as it is required for visualisation.
Not doing per-call coverage reduces corpus ~6-7x (from ~35K to ~5K),
this has profound effect on fuzzing efficiency.
kcovtrace is like strace but show kernel coverage collected with KCOV.
It is very simplistic at this point and does not support multithreaded processes, etc.
It can be used to understand, for example, exact location where kernel bails out
with an error for a particular syscall.
Add new config parameter "ignores" which contains list of regexp expressions.
If one of the expressions is matched against oops line,
crash report is not saved and VM is not restarted.
create-image.sh adds the string "V0:23:respawn:/sbin/getty 115200 hvc0" to inittab
of a virtual machine, but a fresh debian-wheezy doesn't have a hvc0 device.
So getty fails to start and respawns over and over again:
INIT: Id "V0" respawning too fast: disabled for 5 minutes
Let's fix create-image.sh to have a working VM terminal.
Signed-off-by: Alexander Popov <alex.popov@linux.com>
With this change manager will run reproduction on crashes
until reproducer is discovered, but at most 3 times.
If reproducer is discovered it is saved with crashes and shown on the web UI.
Factor out repro logic from syz-repro tool,
so that it can be used in syz-manager.
Also, support sandboxes in code generated by
csoure. This is required to reproduce crashes
that require e.g. namespace sandbox.
Create archive ready to use with syz-gce (pack disk image,
vmlinux, key and tag into a single tar.gz).
Also use sudo only for specific commands, otherwise we create key
file readable only root which is inconvinient.
Log is a simple wrapper around std log package.
It is meant to solve 2 main problems:
1. Logging from non-main packages (mainly, vm/* packages).
Currently they can either always log or not log at all.
But they can't respect program verbosity setting.
Log package allows all packages to use the same verbosity setting.
2. Exposing recent logs in html UI.
Namely we want to tee logs to console and html UI.
Unify and factor out VM monitoring loop used in syz-manager and syz-repro.
This allows syz-repro to detect all the same bugs (e.g. "no output", "lost connection", etc).
And also just deduplicates code.
The new namespace-based sanboxing is good,
but it's not always what one wants
(and also requires special kernel configs).
Change dropprivs config value to sandbox,
which can have different values (currently: none, setuid, namespace).
Setuid mode uses setuid(nobody) before fuzzing as before.
In future we can add more sandboxing modes or, say,
extend -sandbox=setuid to -sandbox=setuid:johndoe
to impersonolate into given user.
When executors send coverage data to the manager, they clamp the addresses
of covered blocks to 32 bits. Manager uses RestorePC() to restore the original
addresses.
Previously, RestorePC() assumed that the upper 4 bytes of a kernel code
address were 0xffffffff, which is not so on Android.
Instead we now parse `readelf -SW vmlinux` output to obtain the upper bytes of
PROGBITS sections VMAs in the case those VMAs are non-zero. We assume that
the upper 4 bytes are the same for every section.
Kmemleak has false positives. To mitigate most of them, it checksums
potentially leaked objects, and reports them only on the next scan
iff the checksum does not change. Because of that we do the following
intricate dance:
Scan, sleep, scan again. At this point we can get some leaks.
If there are leaks, we sleep and scan again, this can remove
false leaks. Then, read kmemleak again. If we get leaks now, then
hopefully these are true positives during the previous testing cycle.
When chroot'ing into the generated debian rootfs PATH is inherited from the host
and assumed to reference each of: /bin, /sbin, /usr/bin, /usr/sbin,
/usr/local/bin and /usr/local/sbin. Not all distros use all of these, so enforce
these in the chroot command.
Use manual parsing instead of a regexp.
Regexp takes ~220ms for typical output size. New code takes ~2ms.
Brings manager CPU consumption from ~250% down to ~25%.
Remove master process entirely, it is not useful in its current form.
We first need to understand what we want from it, and them re-implement it.
Prefix all binaries with syz- to avoid name clashes.
This avoids exec per test.
Also allows to pre-map shared memory regions.
And will allow to pre-map coverage regions, etc.
Seems to work already, but probably there are still some bugs.