Corrupted reports are usually associated with frequently happenning races.
Since they are frequently happenning, we should get a repro for them
without corrupted reports. Reproducing is expensive, so doing it
when we will the repro anyway is harmful.
This allows callers to get access to Report.Corrupted.
Better than adding 6-th return value and will allow
to pipe other report properties if necessary.
When manager is stopped there are sometimes runaway qemu
processes still running. Set PDEATHSIG for all subprocesses.
We never need child processes outliving parents.
We currently have several names for crash attributes, which is disturbing.
E.g. crash title is called "Title" or "Desc". Name them consistently.
Title - single line bug identity.
Report - whole crash text.
Log - whole fuzzer/kernel output.
Add coverage and number of reproducing programs to the periodic messages.
When all machines are busy reproducing crashes, it appears that
syz-manager hanged as number of executed programs does not increase.
Coverage is just a nice characteristic.
Also print machine check message, it appears once and contains useful info.
This adds /rawcover handler which returns a file with all covered so far PCs, e.g.:
0xffffffff8100763e
0xffffffff81007667
...
0xffffffff8100767d
Useful for offline coverage processing, diffing coverage, etc.
In particular allows to do:
curl http://localhost:1234/rawcover | addr2line -e vmlinux
It's currently both optional and non optional.
We require it to be non-empty, but at the same time allow fake "-"
which effectively means "no vmlinux". Make it optional.
We currently build binaries for all targets into bin.
This makes mess in bin/ and does not allow testing of different archs.
Build target binaries into bin/OS_ARCH/ subdirs.
Host binaries are still built into bin/.
Update #333
Update #324
Update #191
If kernel or syzkaller binaries are rebuilt when manager uses them,
nothing good will happen. Manager can start mixing coverage from
old and new kernels, or crash on unknown syscalls.
Now each prog function accepts the desired target explicitly.
No global, implicit state involved.
This is much cleaner and allows cross-OS/arch testing, etc.
Large overhaul moves syscalls and arg types from sys to prog.
Sys package now depends on prog and contains only generated
descriptions of syscalls.
Introduce prog.Target type that encapsulates all targer properties,
like syscall list, ptr/page size, etc. Also moves OS-dependent pieces
like mmap call generation from prog to sys.
Update #191
We have implemented a new version of KCOV, which is able to dump
comparison operands' data, obtained from Clang's instrumentation hooks
__sanitizer_cov_trace_cmp[1248], __sanitizer_cov_trace_const_cmp[1248]
and __sanitizer_cov_trace_switch.
Current KCOV implementation can work in two modes: "Dump only the PCs"
or "Dump only comparisons' data". Mode selection is done by the
following series of calls:
fd = open(KCOV_PATH, ...); // works as previous
ioctl(fd, KCOV_INIT_TRACE, ...); // works as previous
mmap(fd, ...); // works as previous
ioctl(fd, KCOV_ENABLE, mode);
// mode = KCOV_MODE_TRACE_CMP or mode = KCOV_MODE_TRACE_PC
Note that this new interface is backwards compatible, as old KCOV
devices will just return -EINVAL for the last ioctl. This way we can
distinguish if the KCOV device is able to dump the comparisons.
Main changes in this commit:
1. Fuzzer now checks at startup which type (new/old) of KCOV device
is running.
2. Executor now receives an additional flag, which indicates if
executor should read the comparisons data from KCOV. The flag works on
per-call basis, so executor can collect PCs or Comps for each
individual syscall.
Currently hub allows managers to exchange programs from corpus.
But reproducers are not exchanged and we don't know if a crash
happens on other managers as well or not.
Allow hub to exchange reproducers.
Reproducers are stored in a separate db file with own sequence numbers.
This allows to throttle distribution of reproducers to managers,
so that they are not overloaded with reproducers and don't lose them on restarts.
Based on patch by Andrey Konovalov:
https://github.com/google/syzkaller/pull/325Fixes#282
If manager is connected to dashboard it now does not save crashes.
Which means that when we save a repro the crash dir may not exist yet.
Create the dir when saving repros.
We can start reproducing one crash, but end up reproducing another.
Currently we still attribute the resulting repro to the original crash.
This is wrong.
Save the resulting desc/report for reproducers and use that in manager.
Currently we have unix permissions for new files/dirs
hardcoded throughout the code base. Some places use 0644,
some - 0640, some - 0600 and a variety of other constants.
Introduce osutil.MkdirAll/WriteFile that use the default
permissions and use them throughout the code base.
This makes permissions consistent and also allows to easily
change the permissions later if we change our minds.
Also merge pkg/fileutil into pkg/osutil as they become
dependent on each other. The line between them was poorly
defined anyway as both operate on files.
We usually store reports as []byte, not as string. They can be large.
So change arg type to []byte.
Also rename it from log to report. In our terminology log is
not symblized/processed crash output. What this function wants
is called report in manager.
Sshkey is a property of image, which is in manager config.
Move sshkey to the same location as image.
The motivation for the move is as follows.
Continuous build produces an image and the key,
both need to be passed manager instance.
Continuous build system should not distinguish
different VM types and mess with their configs.
NOTE FOR USERS: this breaks manager configs again.
Hopefully the last time for now. Docs are updated.
It was useful only for vm/local which was removed.
The param wasn't documented and if one tries to change it,
it will break manager in obscure way (i.e. spurious
"test machine is not executing programs" crashes).
We have 2 packages with the same name: pkg/config and syz-manager/config.
This leads to constant clashes. We either rename one to pkgconfig or
another to mgrconfig. This is not good and will become worse when/if
we have another program-specific config in a separate package.
Rename manager config to mgrconfig.
Other program-specific configs can use the same convention
in future -- fooconfig.
1. Don't start reproducing crashes until we triage
all inputs from corpus and hub. This minimizes
chances of losing inputs from hub. Also allows
to faster get idea of total coverage.
2. Fix bug when vmCount%instacesPerRepro != 0.
Currently we stop the remainder of instances
and it stays idle.
VM infrastructure currently has several problems:
- Config struct is complete mess with a superset of params for all VM types
- verification of Config is mess spread across several places
- there is no place where VM code could do global initialization
like creating GCE connection, uploading GCE image to GCS,
matching adb devices with consoles, etc
- it hard to add private VM implementations
such impl would need to add code to config package
which would lead to constant merge conflicts
- interface for VM implementation is mixed with interface for VM users
this does not allow to provide best interface for both of them
- there is no way to add common code for all VM implementations
This change solves these problems by:
- splitting VM interface for users (vm package) and VM interface
for VM implementations (vmimpl pacakge), this in turn allows
to add common code
- adding Pool concept that allows to do global initialization
and config checking at the right time
- decoupling manager config from VM-specific config
each VM type now defines own config
Note: manager configs need to be changed after this change:
VM-specific parts are moved to own "vm" subobject.
Note: this change also drops "local" VM type.
Its story was long unclear and there is now syz-stress which solves the same problem.
Introduce generic config.Load function that can be
reused across multiple programs (syz-manager, syz-gce, etc).
Move the generic config functionality to pkg/config package.
The idea is to move all helper (non-main) packages to pkg/ dir,
because we have more and more of them and they pollute the top dir.
Move the syz-manager config parts into syz-manager/config package.
Currently hub sends all inputs on first manager connect.
This can be 100K+ inputs and can take long time
and consume tons of memory. Send inputs in 1K parts.
Also increase rpc timeouts as hub still has global mutex.
This commit adds Odroid C2 support to syzkaller.
It's now possible to specify "type": "odroid" in manager config.
Documentation on how to setup fuzzing with Odroid C2 board is here:
https://github.com/google/syzkaller/wiki/Setup:-Odroid-C2
Note, that after this change libusb-1.0-0-dev package should be
installed to build syzkaller.
Recalculating dynamic priorities requires deserializing all programs,
and that is slow. So do it at most once per 30 mins and don't hold
the mutex during prio calculation.
If hub hangs, it causes all managers to hang as well as they call
hub under the global mutex. So move common rpc code into rpctype
and make it more careful about timeouts (tcp keepalives, call timeouts).
Also don't call hub under the mutex, the call can be slow.
Currently syzkaller uses per-call basic block (BB) coverage.
This change implements edge (not-per-call) coverage.
Edge coverage is more detailed than BB coverage as it captures
not-taken branches, looping, etc. So it provides better feedback signal.
This coverage is now called "signal" throughout the code.
BB code coverage is also collected as it is required for visualisation.
Not doing per-call coverage reduces corpus ~6-7x (from ~35K to ~5K),
this has profound effect on fuzzing efficiency.
In benchmarking mode (if the new -bench flag is specified)
syz-manager writes execution statistics into the specified file.
This allows later comparison of different runs (baseline vs some experiment).
For example, verify that some fuzzing modification indeed leads to larger coverage.
Fuzzing time is amount of time we spent actually fuzzing.
It excludes VM creation time, crash reproducing time, etc.
On the other hand it is multipled by number of currently
fuzzing VMs, so it can be larger than uptime time.
Minimization takes considerable time on start, but the programs were already minimized.
There are some chances that we could minimize it better this time,
but still it does not worth very slow start (which is especially painful for development).
Uncovered PCs were handled very badly:
we added PCs from the same function multiple times
and did not remove covered PCs. As the result total
number of uncovered PCs was terrific.
Fix that.
Currently we read lots of unnecessary files. This is slow on GCE.
Read only necessary info.
For summary report use on readdirnames (which does not do stat on every file).
For detailed crash report read additional info, but only for this crash.
Package db implements a simple key-value database.
The database is cached in memory and mirrored on disk.
It is used to store corpus in syz-manager and syz-hub.
The database strives to minimize number of disk accesses
as they can be slow in virtualized environments (GCE).
Use db in syz-manager instead of the old PersistentSet.
Add new config parameter "ignores" which contains list of regexp expressions.
If one of the expressions is matched against oops line,
crash report is not saved and VM is not restarted.
With this change manager will run reproduction on crashes
until reproducer is discovered, but at most 3 times.
If reproducer is discovered it is saved with crashes and shown on the web UI.
Log is a simple wrapper around std log package.
It is meant to solve 2 main problems:
1. Logging from non-main packages (mainly, vm/* packages).
Currently they can either always log or not log at all.
But they can't respect program verbosity setting.
Log package allows all packages to use the same verbosity setting.
2. Exposing recent logs in html UI.
Namely we want to tee logs to console and html UI.
It is not useful to pass manager verbosity flag to fuzzer,
as fuzzer output is not visible. But it increases amount of fuzzer
output that needs to be parsed by manager. Also increased fuzzer
verbosity reduces effective crash log size (less programs fit).
Enable fuzzer verbosity only if debug flag is given.
Save up to 100 reports. If we already have 100, overwrite the oldest one.
Newer reports are generally more useful. Overwriting is also needed
to be able to understand if a particular bug still happens or already fixed.
If config contains "tag" parameter, save it along with crash reports.
The tag is meant to contain kernel branch/commit hash.
If workdir contains crashes from different kernel versions,
it is useful to be able to find out on what kernel revision a crash happened.
One common issue we see with android devices is that
fuzzing drains battery episodically, device goes down and
then does not boot until one presses the power button.
Check battery level at the beginning of each cycles
and wait if it is too low.
Current numbers are: wait if level < 20% until it is >=30%.
Let's see how it works.
Fixes#79
Add an option to view unique coverage per syscall (i.e. not covered
by any other calls) and unique coverage per-program (not covered by
any other program).
Unify and factor out VM monitoring loop used in syz-manager and syz-repro.
This allows syz-repro to detect all the same bugs (e.g. "no output", "lost connection", etc).
And also just deduplicates code.
Now crashes dir contains 1 subdirectory per unique crash type.
Each subdirectory contains 'description' file with a unique string identifying
the crash type (e.g. "KASAN: slab-out-of-bounds Read of size 2 in bit_putcs"),
and up to 100 logN and reportN files with raw crash log (as before) and
post processed kernel oops message.
This splits generation process into two phases:
1. Extract values of constants from linux kernel sources.
2. Generate Go code.
Constant values are checked in.
The advantage is that the second phase is now completely independent
from linux source files, kernel version, presence of headers for
particular drivers, etc. This allows to change what Go code we generate
any time without access to all kernel headers (which in future won't be
limited to only upstream headers).
Constant extraction process does require proper kernel sources,
but this can be done only once by the person who added the driver
and has access to the required sources. Then the constant values
are checked in for others to use.
Consant extraction process is per-file/per-arch. That is,
if I am adding a driver that is not present upstream and that
works only on a single arch, I will check in constants only for
that driver and for that arch.
Type "none" in config says manager to not manage any VMs,
and just manage the corpus (it still server RPCs).
This is useful when something else manages the VMs
and starts fuzzer processes on them.
The new namespace-based sanboxing is good,
but it's not always what one wants
(and also requires special kernel configs).
Change dropprivs config value to sandbox,
which can have different values (currently: none, setuid, namespace).
Setuid mode uses setuid(nobody) before fuzzing as before.
In future we can add more sandboxing modes or, say,
extend -sandbox=setuid to -sandbox=setuid:johndoe
to impersonolate into given user.
When executors send coverage data to the manager, they clamp the addresses
of covered blocks to 32 bits. Manager uses RestorePC() to restore the original
addresses.
Previously, RestorePC() assumed that the upper 4 bytes of a kernel code
address were 0xffffffff, which is not so on Android.
Instead we now parse `readelf -SW vmlinux` output to obtain the upper bytes of
PROGBITS sections VMAs in the case those VMAs are non-zero. We assume that
the upper 4 bytes are the same for every section.
Add timeout to adb invocations and do more reliable reboot.
Clean up temporary files from previous runs.
Also pass enabled syscalls via rpc, as adb barks at too long command line.
Abd is still unreliable, though. Devices hang.
Use manual parsing instead of a regexp.
Regexp takes ~220ms for typical output size. New code takes ~2ms.
Brings manager CPU consumption from ~250% down to ~25%.
Current interface is suitable only for running syz-fuzzer.
Make the interface more generic (boot, copy file, run an arbitrary command).
This allows to build other tools on top of vm package
(e.g. reproducer creation).
Remove master process entirely, it is not useful in its current form.
We first need to understand what we want from it, and them re-implement it.
Prefix all binaries with syz- to avoid name clashes.