We have _some_ limits on program length, but they are really soft.
When we ask to generate a program with 10 calls, sometimes we get
100-150 calls. There are also no checks when we accept external
programs from corpus/hub. Issue #1630 contains an example where
this crashes VM (executor limit on number of 1000 resources is
violated). Larger programs also harm the process overall (slower,
consume more memory, lead to monster reproducers, etc).
Add a set of measure for hard control over program length.
Ensure that generated/mutated programs are not too long;
drop too long programs coming from corpus/hub in manager;
drop too long programs in hub.
As a bonus ensure that mutation don't produce programs with
0 calls (which is currently possible and happens).
Fixes#1630
If we have way too many programs to send (more than 100000),
cap total number to 100000 and give up sending all.
Otherwise new managers will never chew all this on a busy hub.
RPC package does excessive caching per connection,
so if a larger object is ever sent in any direction,
rpc connection consumes large amount of memory persistently.
This makes manager consume gigs of memory with large
number of VMs and larger corpus/coverage.
Make all communication done in very limited batches.
Currently hub allows managers to exchange programs from corpus.
But reproducers are not exchanged and we don't know if a crash
happens on other managers as well or not.
Allow hub to exchange reproducers.
Reproducers are stored in a separate db file with own sequence numbers.
This allows to throttle distribution of reproducers to managers,
so that they are not overloaded with reproducers and don't lose them on restarts.
Based on patch by Andrey Konovalov:
https://github.com/google/syzkaller/pull/325Fixes#282
Currently we have unix permissions for new files/dirs
hardcoded throughout the code base. Some places use 0644,
some - 0640, some - 0600 and a variety of other constants.
Introduce osutil.MkdirAll/WriteFile that use the default
permissions and use them throughout the code base.
This makes permissions consistent and also allows to easily
change the permissions later if we change our minds.
Also merge pkg/fileutil into pkg/osutil as they become
dependent on each other. The line between them was poorly
defined anyway as both operate on files.
Currently hub sends all inputs on first manager connect.
This can be 100K+ inputs and can take long time
and consume tons of memory. Send inputs in 1K parts.
Also increase rpc timeouts as hub still has global mutex.
If hub hangs, it causes all managers to hang as well as they call
hub under the global mutex. So move common rpc code into rpctype
and make it more careful about timeouts (tcp keepalives, call timeouts).
Also don't call hub under the mutex, the call can be slow.