We currently have this list in multiple places (somewhat diverged).
Specify this "overcommit" property in VM implementations.
In particular, we also want to allow overcommit for "vmm" type.
Update #712
The problem with that commit is that for GCE implementation
we immidiately kill console connection too when receive diagnose signal.
This leads to truncated output.
Some kernel bugs don't stop kernel.
For such bugs whiel vm.MonitorExecution waits for kernel output for 10 secs,
fuzzer continues running programs and produces tons of output
after the kernel bug message. Kill fuzzer once MonitorExecution
detects a kernel bug.
Don't connect by hostname, this seems to be broken on GCE.
Episodically connecting by hostname gives:
Could not resolve hostname: Name or service not known
GCE serial reply seems to be buggy, we see lots of "serialport: VM disconnected"
and "packet_write_wait: Connection to 1.2.3.4 port 9600: Broken pipe"
errors, which do not have any explanation.
Ignore all serial relay errors.
Boot and minimally test images before declaring them as good
and switching to using them.
If image build/boot/test fails, upload report about this to dashboard.
Turns out GetSerialPortOutput API does not work if instance has
serial port connections enabled (which we always have).
Get output from serial port relay service instead.
New console output code crashes with nil deref,
because we shadow outer err variable and then
dereference nil err.
Also express ssh connect timeout in real time.
Currently the timeout is on par of ~25 mins
(5s sleep + 10s connect timeout) * 100.
Reduce timeout to 5m of real time.
When manager is stopped there are sometimes runaway qemu
processes still running. Set PDEATHSIG for all subprocesses.
We never need child processes outliving parents.
This is detected with newer Go toolchain:
vm/gce/gce.go:376: Errorf format %v reads arg #1, but call has only 0 args
vm/gce/gce.go:381: Errorf format %v reads arg #1, but call has only 0 args
Sometimes we get truncated console output during repro.
The problem is that we start the console reading ssh command,
but do not wait for it to actually connect and start piping console.
Wait while the command actually starts piping console before
starting the target command.
vm/gce differs from other VM types in that it accepts image
in a weird, GCE-specific format (namely, image named disk.raw
is put into .tar.gz file). This makes it impossible to write
generic code that creates images for any VM types.
Make vm/gce accept just image like e.g. vm/qemu
and handle own specifics internally.
Sshkey is a property of image, which is in manager config.
Move sshkey to the same location as image.
The motivation for the move is as follows.
Continuous build produces an image and the key,
both need to be passed manager instance.
Continuous build system should not distinguish
different VM types and mess with their configs.
NOTE FOR USERS: this breaks manager configs again.
Hopefully the last time for now. Docs are updated.
Currently gce accepts precreated GCE image name as image config param,
while all other VM types accept local file path as image.
This makes it impossible to write generic code that works with all VM types,
i.e. after building a new image it's unclear if it needs to be uploaded
to GCE or not, and what needs to be passed as image in config.
Eliminate this difference by making gce accept local image file as well.
VM infrastructure currently has several problems:
- Config struct is complete mess with a superset of params for all VM types
- verification of Config is mess spread across several places
- there is no place where VM code could do global initialization
like creating GCE connection, uploading GCE image to GCS,
matching adb devices with consoles, etc
- it hard to add private VM implementations
such impl would need to add code to config package
which would lead to constant merge conflicts
- interface for VM implementation is mixed with interface for VM users
this does not allow to provide best interface for both of them
- there is no way to add common code for all VM implementations
This change solves these problems by:
- splitting VM interface for users (vm package) and VM interface
for VM implementations (vmimpl pacakge), this in turn allows
to add common code
- adding Pool concept that allows to do global initialization
and config checking at the right time
- decoupling manager config from VM-specific config
each VM type now defines own config
Note: manager configs need to be changed after this change:
VM-specific parts are moved to own "vm" subobject.
Note: this change also drops "local" VM type.
Its story was long unclear and there is now syz-stress which solves the same problem.
syz-fuzzer never exits (normally) so this does not affect syz-manager.
But during reproduction we can run a short running program (no repeat mode)
and currently VMs treat premature exit as an error.
Properly detect when a program exits and let callers decide what to do with it.
If an image supports all GCE fanciness, we don't need a separate ssh key for it.
It should accept the instance private key that we specify during VM creation.
VM.Close is called when syz-manager terminates on SIGINT.
Waiting for instance deletion in this case is unnecessary,
creation of a new instance will handle deleting instance.
So exit faster.
Log is a simple wrapper around std log package.
It is meant to solve 2 main problems:
1. Logging from non-main packages (mainly, vm/* packages).
Currently they can either always log or not log at all.
But they can't respect program verbosity setting.
Log package allows all packages to use the same verbosity setting.
2. Exposing recent logs in html UI.
Namely we want to tee logs to console and html UI.