We will need a wrapper for target.SanitizeCall that will do more
than just calling the target-provided function. To avoid confusion
and potential mistakes, give the target function and prog function
different names. Prog package will continue to call this "sanitize",
which will include target's "neutralize" + more.
Also refactor API a bit: we need a helper function that sanitizes
the whole program because that's needed most of the time.
Fixes#477Fixes#502
The syz-expand tools allows to parse a program and print it including all
the default values. This is mainly useful for debugging, like doing manual
program modifications while trying to come up with a reproducer for some
particular kernel behavior.
Differences in code formatting between Go versions cause constant
problems for us (https://github.com/golang/go/issues/25161).
Currently we support 1.9 and 1.10. Switch to newer 1.11 and 1.12.
Fixes#1013
Add bulk of checks for strict parsing mode.
Probably not complete, but we can extend then in future as needed.
Turns out we can't easily use it for serialized programs
as they omit default args and during deserialization it looks like missing args.
Over time we relaxed parsing to handle all kinds of invalid programs
(excessive/missing args, wrong types, etc).
This is useful when reading old programs from corpus.
But this is harmful for e.g. reading test inputs as they can become arbitrary outdated.
For runtests which creates additional problem of executing not
what is actually written in the test (or at least what author meant).
Add strict parsing mode that does not tolerate any errors.
For now it just checks excessive syscall arguments.
Filename generated escaping paths in the past.
The reason for the check during validation is to
wipe old program from corpuses. Now that they are
hopefully wiped everywhere, we can relax the check
to restrict only filename to not produce escaping paths,
but allow existing programs with escaping paths.
This is useful in particular if we generate syzkaller
programs from strace output.
TestSerializeDeserializeRandom fails from time to time
because program is different after we serialize/deserialize it.
Turns out openbsd SanitizeCall is not idempotent.
Add a test for this and disable the logic for now.
Make as much code as possible shared between all OSes.
In particular main is now common across all OSes.
Make more code shared between executor and csource
(in particular, loop function and threaded execution logic).
Also make loop and threaded logic shared across all OSes.
Make more posix/unix code shared across OSes
(e.g. signal handling, pthread creation, etc).
Plus other changes along similar lines.
Also support test OS in executor (based on portable posix)
and add 4 arches that cover all execution modes
(fork server/no fork server, shmem/no shmem).
This change paves way for testing of executor code
and allows to preserve consistency across OSes and executor/csource.
1. mmap all memory always, without explicit mmap calls in the program.
This makes lots of things much easier and removes lots of code.
Makes mmap not a special syscall and allows to fuzz without mmap enabled.
2. Change address assignment algorithm.
Current algorithm allocates unmapped addresses too frequently
and allows collisions between arguments of a single syscall.
The new algorithm analyzes actual allocations in the program
and places new arguments at unused locations.
Few managers recently crashed with:
panic: syscall mknod$loop: per proc arg 'proc' has bad value '4294967295'
panic: sync: unlock of unlocked mutex
goroutine 35438 [running]:
sync.(*Mutex).Unlock(0xc42166e0c8)
sync/mutex.go:184 +0xc1
panic(0xb98980, 0xc448971aa0)
runtime/panic.go:491 +0x283
main.(*Manager).Connect(0xc42166e000, 0xc42056d060, 0xc42038f000, 0x0, 0x0)
syz-manager/manager.go:868 +0x11cc
And a similar issue was reported on mailing list.
It's unclear where these bogus programs come from.
It seems that hub was somehow involved here.
4294967295 is (uint32)-1 which is trucated special
value for proc types.
The test did not uncover any bugs, bug since I wrote it
and it looks like a useful test, let's commit it anyway.
Now each prog function accepts the desired target explicitly.
No global, implicit state involved.
This is much cleaner and allows cross-OS/arch testing, etc.
Large overhaul moves syscalls and arg types from sys to prog.
Sys package now depends on prog and contains only generated
descriptions of syscalls.
Introduce prog.Target type that encapsulates all targer properties,
like syscall list, ptr/page size, etc. Also moves OS-dependent pieces
like mmap call generation from prog to sys.
Update #191
We currently use uintptr for all values.
This won't work for 32-bit archs.
Moreover in some cases we use uintptr but assume
that it is always 64-bits (e.g. in encodingexec).
Switch everything to uint64.
Update #324
After a change in syscall description the number of syscall arguments
might change and some of the programs in corpus get invalidated.
This change makes syzkaller to generate missing arguments when decoding a
program as an attempt to fix and keep more programs from corpus.
Right now Arg is a huge struct (160 bytes), which has many different fields
used for different arg kinds. Since most of the args we see in a typical
corpus are ArgConst, this results in a significant memory overuse.
This change:
- makes Arg an interface instead of a struct
- adds a SomethingArg struct for each arg kind we have
- converts all *Arg pointers into just Arg, since interface variable by
itself contains a pointer to the actual data
- removes ArgPageSize, now ConstArg is used instead
- consolidates correspondence between arg kinds and types, see comments
before each SomethingArg struct definition
- now LenType args that denote the length of VmaType args are serialized as
"0x1000" instead of "(0x1000)"; to preserve backwards compatibility
syzkaller is able to parse the old format for now
- multiple small changes all over to make the above work
After this change syzkaller uses twice less memory after deserializing a
typical corpus.
A bunch of spot optmizations after cpu/memory profiling:
1. Optimize hot-path coverage comparison in fuzzer.
2. Don't allocate and copy serialized program, serialize directly into shmem.
3. Reduce allocations during parsing of output shmem (encoding/binary sucks).
4. Don't allocate and copy coverage arrays, refer directly to the shmem region
(we are not going to mutate them).
5. Don't validate programs outside of tests, validation allocates tons of memory.
6. Replace the choose primitive with simpler switches.
Choose allocates fullload of memory (for int, func, and everything the func refers).
7. Other minor optimizations.