1. mmap all memory always, without explicit mmap calls in the program.
This makes lots of things much easier and removes lots of code.
Makes mmap not a special syscall and allows to fuzz without mmap enabled.
2. Change address assignment algorithm.
Current algorithm allocates unmapped addresses too frequently
and allows collisions between arguments of a single syscall.
The new algorithm analyzes actual allocations in the program
and places new arguments at unused locations.
Detect informational kernel reports that are not bugs in itself,
but contain stack traces. If we see them in the middle of another
report, we know stacks are intermixed and the report is potentially
corrupted.
Put the underflow entry at the end.
Entries must end on an unconditional, non-goto entry,
otherwise fallthrough from the last entry is invalid.
Add arp tables support.
Split unspec matches/targets to unspec and inet.
Reset ipv6 and arp tables in executor.
Fix number of counters in tables.
Plus a bunch of assorted fixes for matches/targets.
1. Make extractStackFrame more picky about stray frames.
This fixes some TODO's in tests where we matched completley
unrelated frames printed by another task.
2. Extract KASAN guilty frame from report header
if the frame should not be skipped (e.g. not __lock_acquire).
This makes parsing more tolerant to corrupted reports.
If there are more than one report, detect where the second
report starts and extract description only from the first report.
There are too many cases where several reports gets intermixed
and as the result we extract bogus description.
1. Replace stacktraceRe with custom code which is more flexible.
stacktraceRe stumbled on any unrelated lines and
could not properly parse truncated stacks.
2. Match report regexp earlier.
If we match simler title regexp, but don't match
report regexp or fail to parse stack trace, the report is corrupted.
This eliminates lots of duplicate corrupted oops entries,
which were there only because we had complex regexp's in titles.
3. Ignore low-level frames during stack parsing.
E.g. we never want to report a GPF in lock_acquire or memcpy
(somewhat similar to what we do for guilty files).
4. Add a bunch of specialized formats for WARNINGs.
There is number of generic debugging facilities (like ODEBUG,
debug usercopy, kobject, refcount_t, etc), and the bug
is never in these facilities, it's in the caller instead.
5. Improve some other oops formats.
6. Add a bunch of additional tests.
This resolves most of TODOs in tests.
Fixes#515
We currently print unsupported consts to console during make extract.
But this is not very useful as there are too many output now.
This also does not allow to understand what's unsupported
in newly checked-in descriptions, or what's unsupported in all current
decriptions.
Save unsupported consts to the const files instead.
This solves all of the above problems.
Unions with only 1 field are not actually unions,
and can always be replaced with the option type.
However, they are still useful when there will be
more options in future but currently only 1 is described.
Alternatives are:
- not using union (but then all existing programs will be
broken when union is finally introduced)
- adding a fake field (ugly and reduces fuzzer efficiency)
Allow unions with only 1 field.
Consider the following example:
type len_templ1[DATA1, DATA2] {
data DATA1
inner len_temp2[DATA2]
}
type len_temp2[DATA] {
data DATA
len len[len_templ1, int8]
}
Here len refers to a parent struct, but the struct is a template,
so it's actual name is something like "len_templ1[int8, int16]".
Currently this does not work as compiler barks at incorrect
len target.
Make this work.
Collect kernel build commit title/date.
Add support for kernel repo aliases (to be able
to say linux-next instead of full git repo address).
Collect on what managers a bug happened.
Reuse Crash.ReportLen as generic crash reporting priority.
Make it possible to prioritize reporting of particular
kernel repos and arches.
Fixes#473
Currently we use the latest syzkaller commit that syz-ci uses itself.
As the result syz-execprog can fail to deserialize the reproducer.
Use the original syzkaller commit.
For sandbox=namespace we first create network devices
and then do CLONE_NEWNS, which brings us into a new
namespace which actually does not have any of these devices.
Tun mostly worked, because we hold fd to the tun device.
However, even for tun we could not see the "syz0" device.
We test in a new network namespace, which does not have any
devices set up (even lo). Create/up as many devices as possible.
Give them some addresses and use these addresses in descriptions.
Netlink descriptions contain tons of code duplication,
and need much more for proper descriptions. Introduce
type templates to simplify writing such descriptions
and remove code duplication.
Note: type templates are experimental, have poor error handling
and are subject to change.
Type templates can be declared as follows:
```
type buffer[DIR] ptr[DIR, array[int8]]
type fileoff[BASE] BASE
type nlattr[TYPE, PAYLOAD] {
nla_len len[parent, int16]
nla_type const[TYPE, int16]
payload PAYLOAD
} [align_4]
```
and later used as follows:
```
syscall(a buffer[in], b fileoff[int64], c ptr[in, nlattr[FOO, int32]])
```
Don't print object size (can change from kernel to kernel
and from config to config).
Fix function extraction regexp (must be non-eager).
Account for MSECS_MIN_AGE.
Ignore some known false positives.
This adds builtin:
type bool8 int8[0:1]
type bool16 int16[0:1]
type bool32 int32[0:1]
type bool64 int64[0:1]
type boolptr intptr[0:1]
We used to use just int's for bools.
But bool types provide several advantages:
- make true/false probability equal
- improve description expressiveness
- reduce search space (we will take advantage of this later)
Complex types that are often repeated can be given short type aliases using the
following syntax:
```
type identifier underlying_type
```
For example:
```
type signalno int32[0:65]
type net_port proc[20000, 4, int16be]
```
Then, type alias can be used instead of the underlying type in any contexts.
Underlying type needs to be described as if it's a struct field, that is,
with the base type if it's required. However, type alias can be used as syscall
arguments as well. Underlying types are currently restricted to integer types,
`ptr`, `ptr64`, `const`, `flags` and `proc` types.
On another machine both clang and gcc produce:
test.c:163:32: error: invalid suffix "+procid" on integer constant
*(uint32_t*)0x20001004 = 0x25dfdbfe+procid*4;
Not sure why this wasn't caught on buildbot.
Remove dup newlines around includes.
Makes int values shorter if not hurting readability.
Increase line len to 80.
Remove {} when not needed during copyout.
The "define uint64_t unsigned long long" were too good to work.
With a different toolchain I am getting:
cstdint:69:11: error: expected unqualified-id
using ::uint64_t;
^
executor/common.h:34:18: note: expanded from macro 'uint64_t'
Do it the proper way: introduce uint64/32/16/8 types and use them.
pkg/csource then does s/uint64/uint64_t/ to not clutter code with
additional typedefs.
Even if all 3 levels of processes in executor exit,
execprog will still recreate them.
Model the same in csource.
This matters when the inner process kills loop
and then everything stops.
I see a crash which says:
#0: too much cover 0 (errno 0)
while the code is:
uint64_t n = ...;
if (n >= kCoverSize)
fail("#%d: too much cover %u", th->id, n);
It seems that the high part of n is set, but we don't see it.
Add printf format attribute to fail and friends and fix all similar cases.
Caught a bunch of similar cases and a missing argument in:
exitf("opendir(%s) failed due to NOFILE, exiting");
Support the new scheme of associating fixing commits with bugs.
Now we provide a tag along the lines of:
Reported-by: <syzbot+a4a91f6fc35e102@syzkaller.appspotmail.com>
The tag is supposed to be added to the commit.
Then we parse commit logs and extract these tags.
The final part on the dashboard is not ready yet,
but syz-ci should already parse and send the tags.
Currently csource uses completely different, simpler way of scheduling
syscalls onto threads (thread per call with random sleeps).
Mimic the way calls are scheduled in executor.
Fixes#312
Generated program always uses pid=0 even when there are multiple processes.
Make each process use own pid.
Unfortunately required to do quite significant changes to prog,
because the current format only supported fixed pid.
Fixes#490
We always set RLIMIT_AS to 128MB. I've debugged a program with 21 syscalls.
With collide it creates 42 threads. With default stack size of 8MB this
requires: 42*8 = 336MB. Thread creation fails and nothing works.
Limit thread stacks the same way executor does.
Fixes#488
By default we don't re-minimize/re-smash programs from corpus,
it takes lots of time on start and is unnecessary.
However, when we improve/fix minimization/smashing,
we may want to.
Introduce corpus database versions and allow to re-minimize/re-smash
on version bumps.
syz-fuzzer organically grew from a small nice main function
into a huge single-file monster with tons of global state.
Start refactoring it into something more managable.
This change separates 2 things:
1. Proc: a single fuzzing process (ipc.Env wrapper).
2. WorkQueue: holds global non-fuzzing work items.
More work needed, but this is good first step.
Factor out program parsing from pkg/csource.
csource code that parses program and at the same time
formats output is very messy and complex.
New aproach also allows to understand e.g.
when a call has copyout instructions which is
useful for better C source output.
csource.go is too large and messy.
Move Build/Format into buid.go.
Move generation of common header into common.go.
Split generation of common header into smaller managable functions.
Currently threaded/collide are global environment flags.
It can be useful to turn off collider during some executions
(minimization, triage, etc).
Make them per-program options.
That was the last test that used inline input data.
Merge it into TestParse.
Test Output for all crashes in TestParse.
Support multiple oopes in crash
Add more test cases for start/end line.
linux_test.go is total mess and very hard to work with.
Turns out we had 2 tests that do exactly the same
(verify Report), but nobody ever noticed.
Move all test data to testdir/. One file per crash.
linux_test.go is total mess and very hard to work with.
Turns out we had 2 tests that do exactly the same
(verify Report), but nobody ever noticed.
Move all test data to testdir/. One file per crash.
Allow stack traces to be intermixed with random kernel messages that don't
start with a ' ' char (all frames in a stack trace do).
Also improve report headers for BUGs from mm/usercopy.c, as we get quite a
lot of those.
Try extracting report from console output only first. If that doesn't work,
try extracting it from the whole log.
Add regexp for executor printed BUGs.
Optimize regexps for rcu detected stalls.
Update rep.StartPos and rep.EndPos in vm/vm.go as well as rep.Output.
exitf function was not defined with some combinations of options in csource.
Fix defines and switch exitf back to fail, fail already checks ENOMEM/EAGAIN,
so there is no reason to use exitf in this particular case.
Boot and minimally test images before declaring them as good
and switching to using them.
If image build/boot/test fails, upload report about this to dashboard.
Currently getting a complete report requires a complex,
multi-step dance (including getting information that
external users are not interested in -- guilty file).
Simplify interface down to 2 functions: Parse and Symbolize.
Parse does what it did before, Symbolize symbolizes report
and fills in maintainers. This simplifies both implementations
of Reporter interface and all users of the interface.
Potentially we could get this down to 1 function Parse
that does everything. However, (1) Symbolize can fail,
while Parse cannot, (2) usually we want to ignore (log)
Symbolize errors, but otherwise proceed with the report,
(3) repro does not need symbolization for all but the
last report.
Whole raw output is indivisble part of Report,
currently we always pass Output separately along with Report.
Make Output a Report field.
Then, put whole Report into manager Crash and repro context and Result.
There is little point in passing Report as aa bunch of separate fields.
Turns out GetSerialPortOutput API does not work if instance has
serial port connections enabled (which we always have).
Get output from serial port relay service instead.
This allows callers to get access to Report.Corrupted.
Better than adding 6-th return value and will allow
to pipe other report properties if necessary.
When manager is stopped there are sometimes runaway qemu
processes still running. Set PDEATHSIG for all subprocesses.
We never need child processes outliving parents.
We currently have several names for crash attributes, which is disturbing.
E.g. crash title is called "Title" or "Desc". Name them consistently.
Title - single line bug identity.
Report - whole crash text.
Log - whole fuzzer/kernel output.
For some racy bugs syzkaller can generate a C reproducer with tun
enabled, when it's not actuallly required to trigger the bug.
Some kernel developers (that don't have CONFIG_TUN=y on their setups)
complain about such C repros.
When tun is not available, instead of exiting, print a message that tun
initialization failed and proceed.
syz-execprog doesn't utilize info about fault injections from a prog log.
Since syz-execprog is used by the repro package to reproduce crashes,
crashes caused by fault injections might not reproduce.
1. Fetch last 200K commits instead of commits for past year.
For merged commits both author date and commit date can be
arbitrary long in past (e.g. we got a commit dated by 2014).
2. Strip some commit prefixes from commits.
We have some trees where backports are prefixed with "BACKPORT:".
Previously we could no match such commits.
There is no need to specify '-' as the filename for sed(1):
- The default behavior is to read stdin
- It was not done in all places
- It breaks on NetBSD sed(1) (although I am tempted to fix it now :-)
and it does not work
1. Allows sending emails upstream.
2. Filter out duplicate emails coming from our mailing lists.
3. Increase retry attempts for email commands
(don't want them to fail due to concurrent crash reports from managers).
Including access size potentially leads to failure to deduplicate
reports when size comes from user or for racy bugs (bug is detected
on different accesses depending on timings).
We already drop size from UAF and OOB, drop it for other bug types.
We currently use more complex and functional protocol on linux,
and a simple ad-hoc protocol on other OSes.
This leads to code duplication in both ipc and executor.
Linux supports coverage, shared memory communication and fork server,
which would also be useful for most other OSes.
Unify communication protocol and parametrize it by
(1) use of shmem or only pipes, (2) use of fork server.
This reduces duplication in ipc and executor and will
allow to support the useful features for other OSes easily.
Finally, this fixes akaros support as it currently uses
syz-stress running on host (linux) and executor running on akaros.
We print all other output to stderr, write debug output to stderr as well.
This does not matter for the main use case of running syz-execprog -debug,
but can is helpful if we want to communicate with syz-executor via stdin/stdout.
Currently we always send 2MB of data to executor in ipc_simple.go.
Send only what's consumed by the program, and don't send the trailing zeros.
Serialized programs usually take only few KBs.
We currently return raw error, so sometimes it's hard to tell
even what call produced the error (e.g. just "invalid argument").
Extend the error so that it's clear that it comes from cmd.Start.
The call index check episodically fails:
2017/10/02 22:07:32 bad call index 1, calls 1, program:
under unknown circumstances. I've looked at the code again
and don't see where/how we can mess CallIndex.
Added a new test for minimization that especially checks resulting
CallIndex.
It would be good to understand what happens, but we don't have
any reproducers. CallIndex is actually unused at this point.
Manager only needs call name. So remove CallIndex entirely.
A recent linux commit "tun: enable napi_gro_frags() for TUN/TAP driver"
added support for fragmentation when emitting packets via tun.
Support this feature in syz_emit_ethernet.
This breaks circular dependency between:
sysgen -> sys/linux -> sys -> sysgen
With this circular dependency it is very difficult to
update format of generated descriptions because sysgen does not build.
We used to generate them only because manager had no idea
what arch it is testing. So syscalls numbers had to match
between all arches.
This is not needed anymore.
Also don't generate unreferenced structs/resources.
If kernel or syzkaller binaries are rebuilt when manager uses them,
nothing good will happen. Manager can start mixing coverage from
old and new kernels, or crash on unknown syscalls.
Now each prog function accepts the desired target explicitly.
No global, implicit state involved.
This is much cleaner and allows cross-OS/arch testing, etc.
Large overhaul moves syscalls and arg types from sys to prog.
Sys package now depends on prog and contains only generated
descriptions of syscalls.
Introduce prog.Target type that encapsulates all targer properties,
like syscall list, ptr/page size, etc. Also moves OS-dependent pieces
like mmap call generation from prog to sys.
Update #191
ptr64 is like ptr, but always takes 8 bytes of space.
Needed for some APIs. Unfortunately, most of these APIs
use buffer type, so we can't use ptr64 immidiately.
We have implemented a new version of KCOV, which is able to dump
comparison operands' data, obtained from Clang's instrumentation hooks
__sanitizer_cov_trace_cmp[1248], __sanitizer_cov_trace_const_cmp[1248]
and __sanitizer_cov_trace_switch.
Current KCOV implementation can work in two modes: "Dump only the PCs"
or "Dump only comparisons' data". Mode selection is done by the
following series of calls:
fd = open(KCOV_PATH, ...); // works as previous
ioctl(fd, KCOV_INIT_TRACE, ...); // works as previous
mmap(fd, ...); // works as previous
ioctl(fd, KCOV_ENABLE, mode);
// mode = KCOV_MODE_TRACE_CMP or mode = KCOV_MODE_TRACE_PC
Note that this new interface is backwards compatible, as old KCOV
devices will just return -EINVAL for the last ioctl. This way we can
distinguish if the KCOV device is able to dump the comparisons.
Main changes in this commit:
1. Fuzzer now checks at startup which type (new/old) of KCOV device
is running.
2. Executor now receives an additional flag, which indicates if
executor should read the comparisons data from KCOV. The flag works on
per-call basis, so executor can collect PCs or Comps for each
individual syscall.
Currently unsupported consts in structs and resources break build.
However, that can well happen for arch-specific devices (e.g. Android).
Make this non-fatal as it used to be.
We currently use uintptr for all values.
This won't work for 32-bit archs.
Moreover in some cases we use uintptr but assume
that it is always 64-bits (e.g. in encodingexec).
Switch everything to uint64.
Update #324
The old parser in sys/sysparser is too hacky, difficult to extend
and drops debug info too early, so that we can't produce proper error messages.
Add a new parser that is build like a proper language parser
and preserves full debug info for every token.
- save Message-ID and use In-Reply-To in subsequent messages
- remember additional CC entries added manually
- don't mail to maintainers if maintainers list is empty
- improve mail formatting and add a footer
- implement upstream/fix/dup/invalid commands over email
- add tests
On most distributions default grub target is i386-pc, which works.
However, on some default is x86_64-efi, which fails with:
grub-install: error: cannot find EFI directory.
Explicitly specify i386-pc target.
Repro can generate Sandbox="namespace"/UseTmpDir=false.
This combination is broken for two reasons:
- on second and subsequent executions of the program,
it fails to create syz-tmp dir
- with Procs>1, it fails right away, because all procs
try to create syz-tmp dir
Don't generate such combination.
We currently have 2 invalid options combinations:
- collide without threads
- procs>1 without repeat
They are invalid in the sense that result of csource.Write
is the same for them. Filter out these combinations.
This cuts csource testing time in half and reduces repro minimization time.
Locking memory is a reasonably legitimate local DoS vector.
E.g. bpf maps allow allocation of large chunks of kernel memory
without RLIMIT_MEMLOCK, which leads to hangups.
Set RLIMIT_MEMLOCK=8MB in executor.
Currently hub allows managers to exchange programs from corpus.
But reproducers are not exchanged and we don't know if a crash
happens on other managers as well or not.
Allow hub to exchange reproducers.
Reproducers are stored in a separate db file with own sequence numbers.
This allows to throttle distribution of reproducers to managers,
so that they are not overloaded with reproducers and don't lose them on restarts.
Based on patch by Andrey Konovalov:
https://github.com/google/syzkaller/pull/325Fixes#282
Dashboard needs to know when bug fixing commits reach
builders in order to fully close bugs.
Send commits that dashboard is interested in to dashboard.
We can't know the exact values of those sleeps in advance, they can be
different for different bugs. Making them random increases the chance that
the C repro executes with the right timings at some point.
Sometimes C reproducers don't work after the generic prog options were
simplified. This change makes syzkaller to try extracting a C repro before
simplifying prog options and after each simplification step. This gives
us more chance to generate a C reproducer.
If the script is aborted at an unfortunate point, it leaves the whole system broken.
E.g. we've seen that fdisk cannot update partition table until the next reboot.
If you really need to kill it, use a different signal. But better wait.
We can start reproducing one crash, but end up reproducing another.
Currently we still attribute the resulting repro to the original crash.
This is wrong.
Save the resulting desc/report for reproducers and use that in manager.
Currently we have unix permissions for new files/dirs
hardcoded throughout the code base. Some places use 0644,
some - 0640, some - 0600 and a variety of other constants.
Introduce osutil.MkdirAll/WriteFile that use the default
permissions and use them throughout the code base.
This makes permissions consistent and also allows to easily
change the permissions later if we change our minds.
Also merge pkg/fileutil into pkg/osutil as they become
dependent on each other. The line between them was poorly
defined anyway as both operate on files.
If panic_on_warn set, then we frequently have 2 stacks:
one for the actual report (or maybe even more than one),
and then one for panic caused by panic_on_warn. This makes
reports unnecessary long and the panic (current) stack
is always present in the actual report. So we strip the
panic message. However, we check that we have enough lines
before the panic, because sometimes we have, for example,
a single WARNING line without a stack and then the panic
with the stack.
ParsePatch is used by appengine app.
Appengine apps can't depend on syscall/unsafe,
but pkg/kernel currently does.
Move patch parsing to pkg/email which does not
depend on syscall/unsafe.
We usually store reports as []byte, not as string. They can be large.
So change arg type to []byte.
Also rename it from log to report. In our terminology log is
not symblized/processed crash output. What this function wants
is called report in manager.