Try extracting report from console output only first. If that doesn't work,
try extracting it from the whole log.
Add regexp for executor printed BUGs.
Optimize regexps for rcu detected stalls.
Update rep.StartPos and rep.EndPos in vm/vm.go as well as rep.Output.
GCE serial reply seems to be buggy, we see lots of "serialport: VM disconnected"
and "packet_write_wait: Connection to 1.2.3.4 port 9600: Broken pipe"
errors, which do not have any explanation.
Ignore all serial relay errors.
Make it possible to monitor health and operation
of all managers from dashboard.
1. Notify dashboard about internal syz-ci errors
(currently we don't know when/if they happen).
2. Send statistics from managers to dashboard.
Boot and minimally test images before declaring them as good
and switching to using them.
If image build/boot/test fails, upload report about this to dashboard.
Currently getting a complete report requires a complex,
multi-step dance (including getting information that
external users are not interested in -- guilty file).
Simplify interface down to 2 functions: Parse and Symbolize.
Parse does what it did before, Symbolize symbolizes report
and fills in maintainers. This simplifies both implementations
of Reporter interface and all users of the interface.
Potentially we could get this down to 1 function Parse
that does everything. However, (1) Symbolize can fail,
while Parse cannot, (2) usually we want to ignore (log)
Symbolize errors, but otherwise proceed with the report,
(3) repro does not need symbolization for all but the
last report.
Whole raw output is indivisble part of Report,
currently we always pass Output separately along with Report.
Make Output a Report field.
Then, put whole Report into manager Crash and repro context and Result.
There is little point in passing Report as aa bunch of separate fields.
Turns out GetSerialPortOutput API does not work if instance has
serial port connections enabled (which we always have).
Get output from serial port relay service instead.
This allows callers to get access to Report.Corrupted.
Better than adding 6-th return value and will allow
to pipe other report properties if necessary.
New console output code crashes with nil deref,
because we shadow outer err variable and then
dereference nil err.
Also express ssh connect timeout in real time.
Currently the timeout is on par of ~25 mins
(5s sleep + 10s connect timeout) * 100.
Reduce timeout to 5m of real time.
When manager is stopped there are sometimes runaway qemu
processes still running. Set PDEATHSIG for all subprocesses.
We never need child processes outliving parents.
We currently have several names for crash attributes, which is disturbing.
E.g. crash title is called "Title" or "Desc". Name them consistently.
Title - single line bug identity.
Report - whole crash text.
Log - whole fuzzer/kernel output.
Frequently it's the same condition.
In one case there is just a stray error message on console
that turns the crash into "not executing programs".
While in another case there is no stray message,
and then it's detected as "no output".
This is detected with newer Go toolchain:
vm/gce/gce.go:376: Errorf format %v reads arg #1, but call has only 0 args
vm/gce/gce.go:381: Errorf format %v reads arg #1, but call has only 0 args
Do not fail a reboot if the reboot command returns an error. Reduces the
wait time per ssh commands to 30 seconds.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Sometimes we get truncated console output during repro.
The problem is that we start the console reading ssh command,
but do not wait for it to actually connect and start piping console.
Wait while the command actually starts piping console before
starting the target command.
Add a new isolated VM for machines that you cannot easily manage. It
assumes the machine is only available through SSH and create a reverse
proxy to ensure the machine can connect back to syz-manager.
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Currently we have unix permissions for new files/dirs
hardcoded throughout the code base. Some places use 0644,
some - 0640, some - 0600 and a variety of other constants.
Introduce osutil.MkdirAll/WriteFile that use the default
permissions and use them throughout the code base.
This makes permissions consistent and also allows to easily
change the permissions later if we change our minds.
Also merge pkg/fileutil into pkg/osutil as they become
dependent on each other. The line between them was poorly
defined anyway as both operate on files.
Add a new VM option:
// Ensure that a device battery level is at 20+% before fuzzing.
// Sometimes we observe that a device can't charge during heavy fuzzing
// and eventually powers down (which then requires manual intervention).
// This option is enabled by default. Turn it off if your devices
// don't have battery service, or it causes problems otherwise.
Battery_Check bool
Fixes#258
* Port console to Darwin
* Get syz-executor to build correctly
* Do not export unix and syscall constants
* Add presubmit test
* Add myself to contributors
vm/gce differs from other VM types in that it accepts image
in a weird, GCE-specific format (namely, image named disk.raw
is put into .tar.gz file). This makes it impossible to write
generic code that creates images for any VM types.
Make vm/gce accept just image like e.g. vm/qemu
and handle own specifics internally.
Sshkey is a property of image, which is in manager config.
Move sshkey to the same location as image.
The motivation for the move is as follows.
Continuous build produces an image and the key,
both need to be passed manager instance.
Continuous build system should not distinguish
different VM types and mess with their configs.
NOTE FOR USERS: this breaks manager configs again.
Hopefully the last time for now. Docs are updated.