Fixes#188
We now will write just ""/1000 to denote a 1000-byte output buffer.
Also we now don't store 1000-byte buffer in memory just to denote size.
Old format is still parsed.
Clone program only once.
Preallocate slices in clone.
Remove the clone full mode.
Always mutate args in place.
Allocate replacers map lazily.
Don't allocate res map at all (calculate valus on the go).
Remove sliceToUint64, pad.
benchmark old ns/op new ns/op delta
BenchmarkHints 122100048 7466013 -93.89%
Now each prog function accepts the desired target explicitly.
No global, implicit state involved.
This is much cleaner and allows cross-OS/arch testing, etc.
A hint is basically a tuple consisting of a pointer to an argument
in one of the syscalls of a program and a value, which should be
assigned to that argument.
A simplified version of hints workflow looks like this:
1. Fuzzer launches a program and collects all the comparisons' data
for every syscall in the program.
2. Next it tries to match the obtained comparison operands' values
vs. the input arguments' values.
3. For every such match the fuzzer mutates the program by
replacing the pointed argument with the saved value.
4. If a valid program is obtained, then fuzzer launches it and
checks if new coverage is obtained.
This commit includes:
1. All the code related to hints generation, parsing and mutations.
2. Fuzzer functions to launch the process.
3. Some new stats gathered by fuzzer and manager, related to hints.
4. An updated version of execprog to test the hints process.
Right now Arg is a huge struct (160 bytes), which has many different fields
used for different arg kinds. Since most of the args we see in a typical
corpus are ArgConst, this results in a significant memory overuse.
This change:
- makes Arg an interface instead of a struct
- adds a SomethingArg struct for each arg kind we have
- converts all *Arg pointers into just Arg, since interface variable by
itself contains a pointer to the actual data
- removes ArgPageSize, now ConstArg is used instead
- consolidates correspondence between arg kinds and types, see comments
before each SomethingArg struct definition
- now LenType args that denote the length of VmaType args are serialized as
"0x1000" instead of "(0x1000)"; to preserve backwards compatibility
syzkaller is able to parse the old format for now
- multiple small changes all over to make the above work
After this change syzkaller uses twice less memory after deserializing a
typical corpus.
The optimization change removed validation too aggressively.
We do need program validation during deserialization,
because we can get bad programs from corpus or hub.
Restore program validation after deserialization.
A bunch of spot optmizations after cpu/memory profiling:
1. Optimize hot-path coverage comparison in fuzzer.
2. Don't allocate and copy serialized program, serialize directly into shmem.
3. Reduce allocations during parsing of output shmem (encoding/binary sucks).
4. Don't allocate and copy coverage arrays, refer directly to the shmem region
(we are not going to mutate them).
5. Don't validate programs outside of tests, validation allocates tons of memory.
6. Replace the choose primitive with simpler switches.
Choose allocates fullload of memory (for int, func, and everything the func refers).
7. Other minor optimizations.
Eliminate assignTypeAndDir function and instead assign
types to all args during construction.
This will allow considerable simplifation of assignSizes.