We frequently see boot errors like:
[00010.201] 02991.03067> pkgsvr: 2018/06/30 23:39:41 system: failed to set system root from blob "ccbadb3901372b1e0fc5275f627f708bf3e5f3acfb0d4268638db0ff75fc7fd4": file does not exist
or:
[00003.691] 01126.01153> devmgr: launchpad /fs/blob/e66739acdd3d8efa3b7c9021e2107cf8431765c0b8eb0a1ec7f7dc7fd305f2f7 (pkgfs) failed: launchpad_vmo_from_file failure: -40
Presumably clean build may help.
Only akaros needs OS, because the rest assume host OS.
But speciying OS for all OSes breaks patch testing on syzbot
because old execprog does not have os flag.
Provide stats and logs for failed repro and save it in manager.
In particular log is useful for failed repros,
currently there is no visibility into why bugs
failed to reproduce.
Currently we first send on errs and then close env.
As the result process can exit before env.Close finishes,
which will leave garbage behind.
Close env before sending on errs.
RPC package does excessive caching per connection,
so if a larger object is ever sent in any direction,
rpc connection consumes large amount of memory persistently.
This makes manager consume gigs of memory with large
number of VMs and larger corpus/coverage.
Make all communication done in very limited batches.
Both manager and fuzzer consume huge amount of memory
(lots of gigs for manager) due to excessive caching
in rpc connections. Compress traffic to reduce memory
consumption.
It's possible to get no signal from normal coverage due to dedup,
in that case we don't want to add fallback coverage
because it can lead to corpus bloat.
Currently all (linux-specific) suppressions are hardcoded in mgrconfig.
This is very wrong. Move them to pkg/report and allow to specify per OS.
Add gvisor-specific suppressions.
This required a bit of refactoring. Introduce mgrconfig.KernelObj finally.
Make report.NewReporter and vm.Create accept mgrconfig directly
instead of passing it as multiple scattered args.
Remove tools/syz-parse and it always did the same as tools/syz-symbolize.
Simplify global vars in syz-manager/cover.go.
Create reporter eagerly in manager. Use sort.Slice more.
Overall -90 lines removed.
We have fallback coverage implmentation for freebsd.
1. It's broken after some recent changes.
2. We need it for fuchsia, windows, akaros, linux too.
3. It's painful to work with C code.
Move fallback coverage to ipc package,
fix it and provide for all OSes.
We did not handle quoted-printable because mime package handles it.
But we can have a non-mime email in quoted-printable.
Simply handle it always, it's not hard.
Currently host feature detection/setup code is spread
across platform-independent fuzzer code, pkg/host, pkg/ipc
and executor.
Move this all into pkg/host and show readable info
about features on manager start.
Fixes#46
Strictly saying, we may not get the connection when
the fuzzer process exits. The accepting goroutine
may have not been scheduled yet.
For the connection for up to 10 seconds.
We now check for manager-fuzzer-executor commit mismatch (see Manager.Check).
But in some cases commit mismatch is not detected gracefully, and instead
leads to panics in fuzzer. Namely, when -enabled_syscalls fuzzer flag includes
large syscalls numbers, so large that they are no present at all in the an old
revision that fuzzer uses, in such case fuzzer panics.
Notify manager about invalid calls instead.
Fixes#464
We see some crashes that suggest corruption of the syscall number:
invalid command number 1296 (errno 11)
invalid command number 107 (errno 110)
Make the table and the number constant to prevent corruption.
We currently have native cross-compilation logic duplicated
in Makefile and in sys/targets. Some pieces are missed in one
place, some are in another. Only pkg/csource knows how to check
for -static support.
Move all CC/CFLAGS logic to sys/targets and pull results in Makefile.
This should make Makefile work on distros that have broken x86_64-linux-gnu-gcc,
now we will use just gcc. And this removes the need to define NOSTATIC,
as it's always auto-detected.
This also paves the way for making pkg/csource work on OSes other than Linux.
Currently kernel build failures are insanely verbose
(contain full kernel build output) and there is no
way to separate short descriptions from full output.
Make it possible.
Also try to extract failure root cause froom build log.
Use this in pkg/bisect to not pollute log on build failures.
Update #501
With 5 tries sometimes only 1 fails,
and sometimes we probably have false negatives.
Increase number of tries to 8 and compress
results if they all are the same.
Update #501
If SYZ_DISABLE_SANDBOXING=yes is set, don't do user sandboxing.
Will be usefule for bisection tool which runs locally,
but needs to build kernel.
Update #501
Only check that syzkaller path is in GOPATH if we are going to build it.
syz-ci image testing does not have syzkaller path in GOPATH,
but it also does not build syzkaller.
Bisect bisects good..bad commit range against the provided predicate (wrapper around git bisect).
The predicate should return an error only if there is no way to proceed
(it will abort the process), if possible it should prefer to return BisectSkip.
Progress of the process is streamed to the provided trace.
Returns the first commit on which the predicate returns BisectBad.
Update #501
loop is much more standard than nbd and does not require additional modules.
nbd broke on Debian rolling.
loop also allows parallel execution thanks to losetup -f.
Use loop instead of nbd.
Also improve cleanup logic and add one missing sudo.
Update #501
gcc8 is stricter when dealing with strings and strncpy and demands that
the size of the actual string to be copied to be explicitly smaller than
the size of the destination, just to make sure the NULL terminator is
taken into considerantion. This patch fixes the issue.
Signed-off-by: Ioana Ciornei <ciorneiioana@gmail.com>
We call the binary syz-executor because it sometimes shows in bug titles,
and we don't want 2 different bugs for when a crash is triggered during
fuzzing and during repro.
Shallow repos created by CheckoutBranch conflict with
what CheckoutCommit tries to do.
Fetch of a shallow repo does not unshallow it.
And then checkout of a non-head commit fails.
This implements 2 features:
1. It's now possible to specify exact commit when testing as:
2. It's possible to test without patch attached
assuming the patch is already committed to the tested tree.
Fixes#558
This leads to false errors when we are switching between gcc and clang:
kernel build failed: failed to run /usr/bin/make [make bzImage -j 32 CC=/syzkaller/clang-kmsan/bin/clang]: exit status 2
arch/x86/Makefile:184: *** Compiler lacks asm-goto support.. Stop.
Fixes#568
Bridge device is used for forwarding. Bond/team device is used for
load balance and fail over. So it would make more sense to add two
slave interfaces for these devices.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Add a veth pair with name bond/team_slave and set their master
to bond0/team0.
Remove veth from devtypes because the cmd `ip link add veth0 type veth`
will actually failed with "RTNETLINK answers: File exists" and no veth
interface created. When create veth device, kernel will create a
pair of veth, so no need to create them one by one.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
1. If we see should_failslab frames during report parsing,
that's a corrupted report with intermixed frames from
fault injection stack.
2. If we matched report title and this report should contains
a guilty stack frame, but we failed to extract any frame,
consider it as corrupted.
New tests added. Also one of the old tests is fixed.
syz-manager always passes explicit value for the flag.
syz-stress does not need coverage.
The only real user is syz-execprog. syz-execprog already
forces coverage with -coverfile is given. Coverage is harmful
for external users trying to reproduce reported bugs.
For the remaining cases of syzkaller developers running
syz-execprog on KCOV-enabled kernel, the flag can be given
manually if really needed.
Fixes#554
syz-manager used to silently transitively disable syscalls
for which input resources can't be created.
This caused lots of confusion, or worse, users did not notice
that syzkaller does not actually test what they want.
Fail loudly with a readable explanation when a syscall
explicitly enabled in enable_syscalls is actually disabled.
Note: this requires to slightly change enable/disable_syscalls
matching logic. Previously "foo" would match "foo" and all "foo$BAR",
now it matches only "foo". But "foo*" can be used to match all
disciminations.
Now file names become:
string[filename]
with a possibility of using other string features:
stringnoz[filename]
string[filename, CONST_SIZE]
and filename is left as type alias as it is commonly used:
type filename string[filename]
SYS_memfd_create define produces warning in scource
if system headers already contain the definition (we strip all ifdefs!).
The same is true for CLONE_NEWCGROUP but we just never hit it yet.
Also fix format string for 32 bits.
Also fix potential uninit var in csource, and a missing new line.
Turns out creating a cgroup per test is too expensive.
Moreover, it leads to hanged tasks as cgroup destruction
is asynchronous and overloads kernel work queues.
Create only a single cgroup per proc, but restrict
descriptions to mess with that single group,
instead test processes create own nested cgroups for messing.
There is test failure on travis:
https://travis-ci.org/google/syzkaller/jobs/349948391
I can't reproduce it locally, and it only happened on 1.8, but not on 1.9?
But this seems to be what could have provoked such failure.