Currently we first send on errs and then close env.
As the result process can exit before env.Close finishes,
which will leave garbage behind.
Close env before sending on errs.
RPC package does excessive caching per connection,
so if a larger object is ever sent in any direction,
rpc connection consumes large amount of memory persistently.
This makes manager consume gigs of memory with large
number of VMs and larger corpus/coverage.
Make all communication done in very limited batches.
Both manager and fuzzer consume huge amount of memory
(lots of gigs for manager) due to excessive caching
in rpc connections. Compress traffic to reduce memory
consumption.
It's possible to get no signal from normal coverage due to dedup,
in that case we don't want to add fallback coverage
because it can lead to corpus bloat.
Currently all (linux-specific) suppressions are hardcoded in mgrconfig.
This is very wrong. Move them to pkg/report and allow to specify per OS.
Add gvisor-specific suppressions.
This required a bit of refactoring. Introduce mgrconfig.KernelObj finally.
Make report.NewReporter and vm.Create accept mgrconfig directly
instead of passing it as multiple scattered args.
Remove tools/syz-parse and it always did the same as tools/syz-symbolize.
Simplify global vars in syz-manager/cover.go.
Create reporter eagerly in manager. Use sort.Slice more.
Overall -90 lines removed.
We have fallback coverage implmentation for freebsd.
1. It's broken after some recent changes.
2. We need it for fuchsia, windows, akaros, linux too.
3. It's painful to work with C code.
Move fallback coverage to ipc package,
fix it and provide for all OSes.
We did not handle quoted-printable because mime package handles it.
But we can have a non-mime email in quoted-printable.
Simply handle it always, it's not hard.
Currently host feature detection/setup code is spread
across platform-independent fuzzer code, pkg/host, pkg/ipc
and executor.
Move this all into pkg/host and show readable info
about features on manager start.
Fixes#46
Strictly saying, we may not get the connection when
the fuzzer process exits. The accepting goroutine
may have not been scheduled yet.
For the connection for up to 10 seconds.
We now check for manager-fuzzer-executor commit mismatch (see Manager.Check).
But in some cases commit mismatch is not detected gracefully, and instead
leads to panics in fuzzer. Namely, when -enabled_syscalls fuzzer flag includes
large syscalls numbers, so large that they are no present at all in the an old
revision that fuzzer uses, in such case fuzzer panics.
Notify manager about invalid calls instead.
Fixes#464
We see some crashes that suggest corruption of the syscall number:
invalid command number 1296 (errno 11)
invalid command number 107 (errno 110)
Make the table and the number constant to prevent corruption.
We currently have native cross-compilation logic duplicated
in Makefile and in sys/targets. Some pieces are missed in one
place, some are in another. Only pkg/csource knows how to check
for -static support.
Move all CC/CFLAGS logic to sys/targets and pull results in Makefile.
This should make Makefile work on distros that have broken x86_64-linux-gnu-gcc,
now we will use just gcc. And this removes the need to define NOSTATIC,
as it's always auto-detected.
This also paves the way for making pkg/csource work on OSes other than Linux.
Currently kernel build failures are insanely verbose
(contain full kernel build output) and there is no
way to separate short descriptions from full output.
Make it possible.
Also try to extract failure root cause froom build log.
Use this in pkg/bisect to not pollute log on build failures.
Update #501
With 5 tries sometimes only 1 fails,
and sometimes we probably have false negatives.
Increase number of tries to 8 and compress
results if they all are the same.
Update #501
If SYZ_DISABLE_SANDBOXING=yes is set, don't do user sandboxing.
Will be usefule for bisection tool which runs locally,
but needs to build kernel.
Update #501
Only check that syzkaller path is in GOPATH if we are going to build it.
syz-ci image testing does not have syzkaller path in GOPATH,
but it also does not build syzkaller.
Bisect bisects good..bad commit range against the provided predicate (wrapper around git bisect).
The predicate should return an error only if there is no way to proceed
(it will abort the process), if possible it should prefer to return BisectSkip.
Progress of the process is streamed to the provided trace.
Returns the first commit on which the predicate returns BisectBad.
Update #501
loop is much more standard than nbd and does not require additional modules.
nbd broke on Debian rolling.
loop also allows parallel execution thanks to losetup -f.
Use loop instead of nbd.
Also improve cleanup logic and add one missing sudo.
Update #501
gcc8 is stricter when dealing with strings and strncpy and demands that
the size of the actual string to be copied to be explicitly smaller than
the size of the destination, just to make sure the NULL terminator is
taken into considerantion. This patch fixes the issue.
Signed-off-by: Ioana Ciornei <ciorneiioana@gmail.com>
We call the binary syz-executor because it sometimes shows in bug titles,
and we don't want 2 different bugs for when a crash is triggered during
fuzzing and during repro.
Shallow repos created by CheckoutBranch conflict with
what CheckoutCommit tries to do.
Fetch of a shallow repo does not unshallow it.
And then checkout of a non-head commit fails.
This implements 2 features:
1. It's now possible to specify exact commit when testing as:
2. It's possible to test without patch attached
assuming the patch is already committed to the tested tree.
Fixes#558
This leads to false errors when we are switching between gcc and clang:
kernel build failed: failed to run /usr/bin/make [make bzImage -j 32 CC=/syzkaller/clang-kmsan/bin/clang]: exit status 2
arch/x86/Makefile:184: *** Compiler lacks asm-goto support.. Stop.
Fixes#568
Bridge device is used for forwarding. Bond/team device is used for
load balance and fail over. So it would make more sense to add two
slave interfaces for these devices.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Add a veth pair with name bond/team_slave and set their master
to bond0/team0.
Remove veth from devtypes because the cmd `ip link add veth0 type veth`
will actually failed with "RTNETLINK answers: File exists" and no veth
interface created. When create veth device, kernel will create a
pair of veth, so no need to create them one by one.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
1. If we see should_failslab frames during report parsing,
that's a corrupted report with intermixed frames from
fault injection stack.
2. If we matched report title and this report should contains
a guilty stack frame, but we failed to extract any frame,
consider it as corrupted.
New tests added. Also one of the old tests is fixed.
syz-manager always passes explicit value for the flag.
syz-stress does not need coverage.
The only real user is syz-execprog. syz-execprog already
forces coverage with -coverfile is given. Coverage is harmful
for external users trying to reproduce reported bugs.
For the remaining cases of syzkaller developers running
syz-execprog on KCOV-enabled kernel, the flag can be given
manually if really needed.
Fixes#554
syz-manager used to silently transitively disable syscalls
for which input resources can't be created.
This caused lots of confusion, or worse, users did not notice
that syzkaller does not actually test what they want.
Fail loudly with a readable explanation when a syscall
explicitly enabled in enable_syscalls is actually disabled.
Note: this requires to slightly change enable/disable_syscalls
matching logic. Previously "foo" would match "foo" and all "foo$BAR",
now it matches only "foo". But "foo*" can be used to match all
disciminations.
Now file names become:
string[filename]
with a possibility of using other string features:
stringnoz[filename]
string[filename, CONST_SIZE]
and filename is left as type alias as it is commonly used:
type filename string[filename]
SYS_memfd_create define produces warning in scource
if system headers already contain the definition (we strip all ifdefs!).
The same is true for CLONE_NEWCGROUP but we just never hit it yet.
Also fix format string for 32 bits.
Also fix potential uninit var in csource, and a missing new line.
Turns out creating a cgroup per test is too expensive.
Moreover, it leads to hanged tasks as cgroup destruction
is asynchronous and overloads kernel work queues.
Create only a single cgroup per proc, but restrict
descriptions to mess with that single group,
instead test processes create own nested cgroups for messing.
There is test failure on travis:
https://travis-ci.org/google/syzkaller/jobs/349948391
I can't reproduce it locally, and it only happened on 1.8, but not on 1.9?
But this seems to be what could have provoked such failure.
We use errno, vaargs, printf in all of fail/error/exitf,
but we include the corresponding headers only when SYZ_USE_TMP_DIR.
Include them whenever fail/error/exitf are used.
The new pseudo syscall allows opening sockets that can only
be created in init net namespace (BLUETOOTH, NFC, LLC).
Use it to open these sockets.
Unfortunately this only works with sandbox none at the moment.
The problem is that setns of a network namespace requires CAP_SYS_ADMIN
in the target namespace, and we've lost all privs in the init namespace
during creation of a user namespace.
We now always create net namespace for testing,
so socket ports and other IDs do not overlap between
different test processes.
Proc types play badly with squashing packets to ANYBLOB.
To squash into a block we need concrete value, but it depends
on process id.
Removing proc also makes tun setup and address descriptions simpler.
Currently when executor creates fd's it gets: 0, 3, 4.
When tun is enabled: 3, 4, 5.
For C programs: 3, 4, 5.
When run is enabled: 4, 5, 6.
Theoretically it should not matter,
but these fd numbers are probably sometimes are used as data.
So make them consistent in all these cases (3, 4, 5).
We currently use -1 as default value for resources
when the actual value is not available.
-1 is good for fd's, but is not the right default
value for pointers/keys/etc.
Pass from prog and use in executor proper default
value for resources.
Fix alignemnt calculation for packed structs with alignment and bitfields.
Amusingly this affected only a single real struct -- ipv6_fragment_ext_header.
1. mmap all memory always, without explicit mmap calls in the program.
This makes lots of things much easier and removes lots of code.
Makes mmap not a special syscall and allows to fuzz without mmap enabled.
2. Change address assignment algorithm.
Current algorithm allocates unmapped addresses too frequently
and allows collisions between arguments of a single syscall.
The new algorithm analyzes actual allocations in the program
and places new arguments at unused locations.
Detect informational kernel reports that are not bugs in itself,
but contain stack traces. If we see them in the middle of another
report, we know stacks are intermixed and the report is potentially
corrupted.
Put the underflow entry at the end.
Entries must end on an unconditional, non-goto entry,
otherwise fallthrough from the last entry is invalid.
Add arp tables support.
Split unspec matches/targets to unspec and inet.
Reset ipv6 and arp tables in executor.
Fix number of counters in tables.
Plus a bunch of assorted fixes for matches/targets.
1. Make extractStackFrame more picky about stray frames.
This fixes some TODO's in tests where we matched completley
unrelated frames printed by another task.
2. Extract KASAN guilty frame from report header
if the frame should not be skipped (e.g. not __lock_acquire).
This makes parsing more tolerant to corrupted reports.
If there are more than one report, detect where the second
report starts and extract description only from the first report.
There are too many cases where several reports gets intermixed
and as the result we extract bogus description.
1. Replace stacktraceRe with custom code which is more flexible.
stacktraceRe stumbled on any unrelated lines and
could not properly parse truncated stacks.
2. Match report regexp earlier.
If we match simler title regexp, but don't match
report regexp or fail to parse stack trace, the report is corrupted.
This eliminates lots of duplicate corrupted oops entries,
which were there only because we had complex regexp's in titles.
3. Ignore low-level frames during stack parsing.
E.g. we never want to report a GPF in lock_acquire or memcpy
(somewhat similar to what we do for guilty files).
4. Add a bunch of specialized formats for WARNINGs.
There is number of generic debugging facilities (like ODEBUG,
debug usercopy, kobject, refcount_t, etc), and the bug
is never in these facilities, it's in the caller instead.
5. Improve some other oops formats.
6. Add a bunch of additional tests.
This resolves most of TODOs in tests.
Fixes#515
We currently print unsupported consts to console during make extract.
But this is not very useful as there are too many output now.
This also does not allow to understand what's unsupported
in newly checked-in descriptions, or what's unsupported in all current
decriptions.
Save unsupported consts to the const files instead.
This solves all of the above problems.
Unions with only 1 field are not actually unions,
and can always be replaced with the option type.
However, they are still useful when there will be
more options in future but currently only 1 is described.
Alternatives are:
- not using union (but then all existing programs will be
broken when union is finally introduced)
- adding a fake field (ugly and reduces fuzzer efficiency)
Allow unions with only 1 field.
Consider the following example:
type len_templ1[DATA1, DATA2] {
data DATA1
inner len_temp2[DATA2]
}
type len_temp2[DATA] {
data DATA
len len[len_templ1, int8]
}
Here len refers to a parent struct, but the struct is a template,
so it's actual name is something like "len_templ1[int8, int16]".
Currently this does not work as compiler barks at incorrect
len target.
Make this work.
Collect kernel build commit title/date.
Add support for kernel repo aliases (to be able
to say linux-next instead of full git repo address).
Collect on what managers a bug happened.
Reuse Crash.ReportLen as generic crash reporting priority.
Make it possible to prioritize reporting of particular
kernel repos and arches.
Fixes#473
Currently we use the latest syzkaller commit that syz-ci uses itself.
As the result syz-execprog can fail to deserialize the reproducer.
Use the original syzkaller commit.
For sandbox=namespace we first create network devices
and then do CLONE_NEWNS, which brings us into a new
namespace which actually does not have any of these devices.
Tun mostly worked, because we hold fd to the tun device.
However, even for tun we could not see the "syz0" device.
We test in a new network namespace, which does not have any
devices set up (even lo). Create/up as many devices as possible.
Give them some addresses and use these addresses in descriptions.
Netlink descriptions contain tons of code duplication,
and need much more for proper descriptions. Introduce
type templates to simplify writing such descriptions
and remove code duplication.
Note: type templates are experimental, have poor error handling
and are subject to change.
Type templates can be declared as follows:
```
type buffer[DIR] ptr[DIR, array[int8]]
type fileoff[BASE] BASE
type nlattr[TYPE, PAYLOAD] {
nla_len len[parent, int16]
nla_type const[TYPE, int16]
payload PAYLOAD
} [align_4]
```
and later used as follows:
```
syscall(a buffer[in], b fileoff[int64], c ptr[in, nlattr[FOO, int32]])
```