Over time we relaxed parsing to handle all kinds of invalid programs
(excessive/missing args, wrong types, etc).
This is useful when reading old programs from corpus.
But this is harmful for e.g. reading test inputs as they can become arbitrary outdated.
For runtests which creates additional problem of executing not
what is actually written in the test (or at least what author meant).
Add strict parsing mode that does not tolerate any errors.
For now it just checks excessive syscall arguments.
1. Use dashboard style.
2. Allow sorting of tables.
3. Show old crashes in grey.
4. Use tables instead of text output for more pages.
5. Show corpus inputs on a separate page to allow copy-pasting.
6. Use standard JS sorting instead of custom bubble sort (much faster).
7. Fix off-by one in table sorting.
Fixes#694
KMEMLEAK has lots of false positives and bugs without repros
may be unactionable. It's not completely clear how to handle
such cases in automatic systematic testing.
But let's try this and see how it works.
Rewind kmemleak fd before reading it second time,
otherwise we will read truncated reports.
Auto-learn what leak reports we've already seen
and ignore them in future. This is required because
there are some false positives and some fire too frequently.
So now we will hit each leak only once per manager run,
but we still will try to reproduce them.
If we don't determine correct prefix (e.g. some paths are not full paths),
we can plumb kernel source path twice. It seems that it's not possible
to do the right thing in all possible combinations of what can be in
debug info, if the kernel sources were moved or not, if we have kernel_src
or not. But at least don't plumb kernel_src second time.
The tool is run as:
$ syz-runtest -config manager.config
This runs all programs from sys/*/test/* in different modes
on actual VMs and checks results.
Fixes#603
Move work with hub into a separate file and fully separate
its state from the rest of the manager state.
First step towards splitting manager into managable parts.
This also required to rework stats as they are used throughout the code.
Update #538
Update #605
mgrconfig was used only by syz-manager initially,
but now it's used by a dozen of packages and it's
weird to import from under a binary dir.
pkg/ is much more reasonable dir for a widely used
helper package.
docs: fix default sandbox value
The docs and code comments state in several places that 'setuid'
is the default sandbox value. However, the default is actually
'none'. Fix docs.
Only akaros needs OS, because the rest assume host OS.
But speciying OS for all OSes breaks patch testing on syzbot
because old execprog does not have os flag.
Provide stats and logs for failed repro and save it in manager.
In particular log is useful for failed repros,
currently there is no visibility into why bugs
failed to reproduce.
RPC package does excessive caching per connection,
so if a larger object is ever sent in any direction,
rpc connection consumes large amount of memory persistently.
This makes manager consume gigs of memory with large
number of VMs and larger corpus/coverage.
Make all communication done in very limited batches.
Currently all (linux-specific) suppressions are hardcoded in mgrconfig.
This is very wrong. Move them to pkg/report and allow to specify per OS.
Add gvisor-specific suppressions.
This required a bit of refactoring. Introduce mgrconfig.KernelObj finally.
Make report.NewReporter and vm.Create accept mgrconfig directly
instead of passing it as multiple scattered args.
Remove tools/syz-parse and it always did the same as tools/syz-symbolize.
Simplify global vars in syz-manager/cover.go.
Create reporter eagerly in manager. Use sort.Slice more.
Overall -90 lines removed.
We have fallback coverage implmentation for freebsd.
1. It's broken after some recent changes.
2. We need it for fuchsia, windows, akaros, linux too.
3. It's painful to work with C code.
Move fallback coverage to ipc package,
fix it and provide for all OSes.
Currently we only ignore programs that contain syscalls
that are not statically enabled in config. This does not
account for syscalls that are not supported on target
machine. Load corpus after we got machine check with
actual list of supported syscalls.
Currently host feature detection/setup code is spread
across platform-independent fuzzer code, pkg/host, pkg/ipc
and executor.
Move this all into pkg/host and show readable info
about features on manager start.
Fixes#46
We now check for manager-fuzzer-executor commit mismatch (see Manager.Check).
But in some cases commit mismatch is not detected gracefully, and instead
leads to panics in fuzzer. Namely, when -enabled_syscalls fuzzer flag includes
large syscalls numbers, so large that they are no present at all in the an old
revision that fuzzer uses, in such case fuzzer panics.
Notify manager about invalid calls instead.
Fixes#464
We currently have native cross-compilation logic duplicated
in Makefile and in sys/targets. Some pieces are missed in one
place, some are in another. Only pkg/csource knows how to check
for -static support.
Move all CC/CFLAGS logic to sys/targets and pull results in Makefile.
This should make Makefile work on distros that have broken x86_64-linux-gnu-gcc,
now we will use just gcc. And this removes the need to define NOSTATIC,
as it's always auto-detected.
This also paves the way for making pkg/csource work on OSes other than Linux.
syz-ci uses partial (incomplete) manager config in several places.
Currently it is implemented in some ugly way.
Provide better support and unexport DefaultValues and SplitTarget.
Update #501
It turns out to be too difficult to specify a precise set
of syscalls when, say, all setsockopt's for some sockets
need to be enabled, but not enabled for other sockets.
Just warn user about disabled syscalls, but don't abort.
The previous change in behavior break a bunch of existing configs
("bpf" does not match anything). Restore old behavior.
To get only write syscall, one can do:
enable_syscalls: "write",
disable_syscalls: "write$*"
syz-manager used to silently transitively disable syscalls
for which input resources can't be created.
This caused lots of confusion, or worse, users did not notice
that syzkaller does not actually test what they want.
Fail loudly with a readable explanation when a syscall
explicitly enabled in enable_syscalls is actually disabled.
Note: this requires to slightly change enable/disable_syscalls
matching logic. Previously "foo" would match "foo" and all "foo$BAR",
now it matches only "foo". But "foo*" can be used to match all
disciminations.
Use big-endian match/replace for both blobs and ints.
Sometimes we have unmarked blobs (no little/big-endian info);
for ANYBLOBs we intentionally lose all marking;
but even for marked ints we may need this too.
Consider that kernel code does not convert the data
(i.e. not ntohs(pkt->proto) == ETH_P_BATMAN),
but instead converts the constant (i.e. pkt->proto == htons(ETH_P_BATMAN)).
In such case we will see dynamic operand that does not
match what we have in the program.
Type "none" is a special case for debugging/development when manager
does not start any VMs, but instead you start them manually
and start syz-fuzzer there.
1. mmap all memory always, without explicit mmap calls in the program.
This makes lots of things much easier and removes lots of code.
Makes mmap not a special syscall and allows to fuzz without mmap enabled.
2. Change address assignment algorithm.
Current algorithm allocates unmapped addresses too frequently
and allows collisions between arguments of a single syscall.
The new algorithm analyzes actual allocations in the program
and places new arguments at unused locations.
With commit: syz-manager: add simple email support, it will send
emails when a bug is hit for the first time during that particular
run of syz-manager. In other words, if you restart syz-manager and
the same bug is hit, a new email will be sent again. This is due to
the fact that mgr.crashTypes[crash.Title] doesn't keep track of logs
already written to the disk.
Fixed by moving emailCrash() to logic handling log writing.
Fixes#484
Signed-off-by: Tim Tianyang Chen <soapcn@gmail.com>
By default we don't re-minimize/re-smash programs from corpus,
it takes lots of time on start and is unnecessary.
However, when we improve/fix minimization/smashing,
we may want to.
Introduce corpus database versions and allow to re-minimize/re-smash
on version bumps.
Users can specify an email address to reveive notifications when a
bug is discovered for the first time, without setting up a full fledged
dashboard. The supported mailer is mailx.
Signed-off-by: Tim Tianyang Chen <soapcn@gmail.com>
Make it possible to monitor health and operation
of all managers from dashboard.
1. Notify dashboard about internal syz-ci errors
(currently we don't know when/if they happen).
2. Send statistics from managers to dashboard.
Currently getting a complete report requires a complex,
multi-step dance (including getting information that
external users are not interested in -- guilty file).
Simplify interface down to 2 functions: Parse and Symbolize.
Parse does what it did before, Symbolize symbolizes report
and fills in maintainers. This simplifies both implementations
of Reporter interface and all users of the interface.
Potentially we could get this down to 1 function Parse
that does everything. However, (1) Symbolize can fail,
while Parse cannot, (2) usually we want to ignore (log)
Symbolize errors, but otherwise proceed with the report,
(3) repro does not need symbolization for all but the
last report.
Whole raw output is indivisble part of Report,
currently we always pass Output separately along with Report.
Make Output a Report field.
Then, put whole Report into manager Crash and repro context and Result.
There is little point in passing Report as aa bunch of separate fields.
Corrupted reports are usually associated with frequently happenning races.
Since they are frequently happenning, we should get a repro for them
without corrupted reports. Reproducing is expensive, so doing it
when we will the repro anyway is harmful.
This allows callers to get access to Report.Corrupted.
Better than adding 6-th return value and will allow
to pipe other report properties if necessary.
When manager is stopped there are sometimes runaway qemu
processes still running. Set PDEATHSIG for all subprocesses.
We never need child processes outliving parents.
We currently have several names for crash attributes, which is disturbing.
E.g. crash title is called "Title" or "Desc". Name them consistently.
Title - single line bug identity.
Report - whole crash text.
Log - whole fuzzer/kernel output.