We are seeing some flakes during bisection and image testing.
Hard to tell what's the root cause because they are episodic.
But using non-preemptible VMs for bisection and image testing
looks good on all fronts.
Update #501
Separate kernel and syzkaller build failures.
Fix logic to understand when a build is fixed:
look if kernel/syzkaller commit changes to understand
if it's a new good build or re-upload of an old build.
Fixes#1014
There is a bit of a mess: dashboard expects the start commit
in build info, but syz-ci sends the resulting cause commit.
Moreover for inconclusive bisection the commit is not filled at all.
Fill start commit in build info on start.
Update #501
Allow separate sets of managers for patch testing and for bisection.
This makes things more flexible on syz-ci deployment side.
Remove previous hacks for bisection deployment.
Update #501
This adds bulk of support for bisection to dashboard/app and syz-ci:
- APIs to send bisection jobs and accept results
- syz-ci logic to execute bisection jobs
- formatting of emails with results
- showing of results on dashboard
Some difficulties we have to overcome:
- since linux is frequently build/boot broken, lots of bisections are inconclusive,
need to present such results too
- git bisect is poorly suitable for automation, have to resort to output parsing (is output stable?)
- git bisect turns out to fail (exit with non-0 status) when bisection is inconclusive
(multiple potential cause commits)
- older syzkaller revisions can't be built with newer (broken) kernel header, e.g.:
ebtables.h:197:19: error: invalid conversion from ‘void*’ to ‘ebt_entry_target*’
- newer compilers produce more warnings and break old syzkaller builds, e.g.:
kvm.S.h:6:12: error: ‘kvm_asm64_vm86’ defined but not used [-Werror=unused-const-variable=]
- figuring relevant emails to CC from a commit is non-trivial:
besides commit author, there can be some emails in commit tags, or not,
which tags to use is an interesting question (some may include irrelevant emails)
we can also run get_maintainers.pl on the commit, but this can produce too wide
list if commit touches lots of files, it can also produce too small list,
and then we need to resort to blame
- for inconclusive bisection we probably don't need to include emails referenced
in the commits (there can be too many of these commits)
- need to be careful to exclude own syzbot email from commit CC list,
now syzbot emails are referenced in some commits (Reported-by/Tested-by/etc)
(can cause some kind of infinite recursion)
- lots of commits reference stable mailing list,
we should not include it in CC because it's referenced for backports rather then bug reports
- since we add new Bug entity fields which we use in queries,
whole datastore need to be upgrades to add the new field to index
- we must not discard the crash that was used for bisection
(treat it as a reported crash)
- bisection results need 2 forms of reports:
one when we add bisection results to already reported bug
another when we report a bug first time with bisection results
- when reporting a bug with bisection results we need to use the crash
that was used for bisection
- some fraction of bisections will probably fail with various errors
and we will need some mechanism to retry bisection after the root cause is resolved
this is not implemented yet
- linux-next is problematic for 2 reasons:
fix bisection can't possibly run on linux-next as commits are not reachable from HEAD
lots of commits are missing in linux-next (even in linux-next-history)
e.g. we have some c63e9e91a254a52 which is now missing in linux-next/linux-next-history
- older kernels can't be build with fresh gcc/binutils/perl/make/glibc
for now we have to stop at v3.9 (this only requires switching gcc several times along the way)
- kernels past v4.11 do not build with gcc 7 and 8 (undefined reference to `____ilog2_NaN')
- v4.1 and back have only compiler-gcc5.h
- v3.17 and back have only compiler-gcc4.h
- v3.6 and back do not have make olddefconfig
- compat socket calls can't be bisected past "x86/entry/syscalls: Wire up 32-bit
direct socket calls" (v4.10) because of
https://syzkaller.appspot.com/bug?id=b5b150e322d5f48c869bcf1528cdbee08d1421cb
- v2.6.28 and below does not work with modern make:
*** mixed implicit and normal rules: deprecated syntax
- v3.8 build fails:
Can't use 'defined(@array)' (Maybe you should just omit the defined()?) at kernel/timeconst.pl line 373.
kernel/Makefile:134: recipe for target 'kernel/timeconst.h' failed
- make 3.81 works for v2.6.28.
3.81 almost works with current HEAD, you need to run make twice because first run spuriously fails with:
- v2.6.28 with gcc-4.9.4 broken with:
include/linux/kvm.h:240:9: error: duplicate member ‘padding’
- but even defconfig fails:
VDSO arch/x86/vdso/vdso.so.dbg
gcc: error: elf_x86_64: No such file or directory
gcc: error: unrecognized command line option ‘-m’
It seems that we also need old binutils.
- for v3.8 and below we need perl-5.14.4.
Unfortunately this or any manually built perl doesn't work for later kernels:
Can't locate strict.pm in @INC
- kernels starting from 4.14 and older are boot broken:
https://lkml.org/lkml/2018/9/7/648
- kernels older than 4.12 are broken during netdev setup
(fixed by commit 675c8da049fd6556eb2d6cdd745fe812752f07a8)
Update #501
Ctrl+C can kill a child process which will cause an error.
Ignore such errors, it's better to retry after restart then
to report a false failure.
Update #501
syzupdater fails to report build errors because it is constructing the
dashapi client with the wrong parameters (it is mixing dashboardAddr
with dashboardClient).
This commit swaps the order of those parameters in syzupdater's
uploadBuildError function.
Fixes#1044
Useful for local testing.
With -autoupdate=0 syz-ci does not need syzkaller repo,
will not poll, build and update itself.
So a binary with local changes can be tested without
pushing changes to some git repo.
This implements 2 features:
- syz-ci polls a set of additional repos to discover fixing commits sooner
(e.g. it can now discover a fixing commit in netfilter tree before
it reaches any of the tested trees).
- syz-ci uploads info about commits to dashboard.
For example, a user marks a bug as fixed by commit "foo: bar".
syz-ci will find this commit in the main namespace repo
and upload commmit hash/date/author to dashboard. This in turn
allows to show links to fixing commits.
Fixes#691Fixes#610
SSH keys are now included at the fx clean-build config.
A proper escape sequence looked weird so use a string literal to pass
that config.
Fixed some typos I found while debugging.
gometalinter says the function is too complex:
syz-ci/manager.go:155:⚠️ cyclomatic complexity 30 of function (*Manager).loop() is high (> 24) (gocyclo)
Split into 2 functions.
We currently have this list in multiple places (somewhat diverged).
Specify this "overcommit" property in VM implementations.
In particular, we also want to allow overcommit for "vmm" type.
Update #712
If update comes in the middle of a long build (bisection),
we will stop all other managers prematurely (bisection can take a day).
So wait for current builds to finish before starting shutdown.
Update #501
mgrconfig was used only by syz-manager initially,
but now it's used by a dozen of packages and it's
weird to import from under a binary dir.
pkg/ is much more reasonable dir for a widely used
helper package.
We append underlying error to the title of boot/test errors.
The error can come from anywhere and can contain dynamic data,
which can cause duplication of bugs.
Put the underlying error into report body instead.