Prog.Finalize combines assignSizesCall, SanitizeCall and validate.
Intended for users who build own programs,
so that we don't need to expose all individual methods.
Currently we only generate either valid user-space pointers or NULL.
Extend NULL to a set of special pointers that we will use in programs.
All targets now contain 3 special values:
- NULL
- 0xfffffffffffffff (invalid kernel pointer)
- 0x999999999999999 (non-canonical address)
Each target can add additional special pointers on top of this.
Also generate NULL/special pointers for non-opt ptr's.
This restriction was always too restrictive. We may want to generate
them with very low probability, but we do want to generate them.
Also change pointers to NULL/special during mutation
(but still not in the opposite direction).
The current code is total, unstructured mess.
Since we now have 1:1 type -> arg correspondence,
rework validation around args. This makes code
much cleaner and 30% shorter.
We currently use -1 as default value for resources
when the actual value is not available.
-1 is good for fd's, but is not the right default
value for pointers/keys/etc.
Pass from prog and use in executor proper default
value for resources.
Squash complex structs into flat byte array and mutate this array
with generic blob mutations. This allows to mutate what we currently
consider as paddings and add/remove paddings from structs, etc.
1. mmap all memory always, without explicit mmap calls in the program.
This makes lots of things much easier and removes lots of code.
Makes mmap not a special syscall and allows to fuzz without mmap enabled.
2. Change address assignment algorithm.
Current algorithm allocates unmapped addresses too frequently
and allows collisions between arguments of a single syscall.
The new algorithm analyzes actual allocations in the program
and places new arguments at unused locations.
Make Foreach* callback accept the arg and a context struct
that can contain lots of aux info.
This (1) removes lots of unuser base/parent args,
(2) provides foundation for stopping recursion,
(3) allows to merge foreachSubargOffset.
Put the underflow entry at the end.
Entries must end on an unconditional, non-goto entry,
otherwise fallthrough from the last entry is invalid.
Add arp tables support.
Split unspec matches/targets to unspec and inet.
Reset ipv6 and arp tables in executor.
Fix number of counters in tables.
Plus a bunch of assorted fixes for matches/targets.
This reduces size of a corpus in half.
We store corpus on manager and on hub,
so this will reduce their memory consumption.
But also makes large programs more readable.
Generated program always uses pid=0 even when there are multiple processes.
Make each process use own pid.
Unfortunately required to do quite significant changes to prog,
because the current format only supported fixed pid.
Fixes#490
Introduce isUsed(arg) helper, use it in several places.
Move method definitions closer to their types.
Simplify presence check for ArgUsed.Used() in several places.
Fixes#188
We now will write just ""/1000 to denote a 1000-byte output buffer.
Also we now don't store 1000-byte buffer in memory just to denote size.
Old format is still parsed.
The race initially showed up on the new benchmark (see race report below).
The race indicated a wrong call passed to replaceArg,
as the result we sanitized the wrong call and left the new call un-sanitized.
Fix this.
Add test that exposes this.
Run benchmarks in race mode during presubmit
(benchmarks have higher chances of uncovering races than tests).
WARNING: DATA RACE
Write at 0x00c42000d3f0 by goroutine 18:
github.com/google/syzkaller/sys/linux.(*arch).sanitizeCall()
sys/linux/init.go:155 +0x256
github.com/google/syzkaller/sys/linux.(*arch).(github.com/google/syzkaller/sys/linux.sanitizeCall)-fm()
sys/linux/init.go:42 +0x4b
github.com/google/syzkaller/prog.(*Prog).replaceArg()
prog/prog.go:357 +0x239
github.com/google/syzkaller/prog.generateHints.func2()
prog/hints.go:105 +0x124
github.com/google/syzkaller/prog.checkConstArg()
prog/hints.go:128 +0xf3
github.com/google/syzkaller/prog.generateHints()
prog/hints.go:120 +0x495
github.com/google/syzkaller/prog.(*Prog).MutateWithHints.func1()
prog/hints.go:72 +0x67
github.com/google/syzkaller/prog.foreachSubargImpl.func1()
prog/analysis.go:86 +0x9f
github.com/google/syzkaller/prog.foreachSubargImpl()
prog/analysis.go:104 +0xc8
github.com/google/syzkaller/prog.foreachArgArray()
prog/analysis.go:113 +0x89
github.com/google/syzkaller/prog.foreachArg()
prog/analysis.go:121 +0x50
github.com/google/syzkaller/prog.(*Prog).MutateWithHints()
prog/hints.go:71 +0x18e
github.com/google/syzkaller/prog.BenchmarkHints.func1()
prog/hints_test.go:477 +0x77
testing.(*B).RunParallel.func1()
testing/benchmark.go:626 +0x156
Previous read at 0x00c42000d3f0 by goroutine 17:
github.com/google/syzkaller/prog.clone()
prog/clone.go:38 +0xbaa
github.com/google/syzkaller/prog.(*Prog).cloneImpl()
prog/clone.go:21 +0x17f
github.com/google/syzkaller/prog.generateHints()
prog/hints.go:95 +0xd0
github.com/google/syzkaller/prog.(*Prog).MutateWithHints.func1()
prog/hints.go:72 +0x67
github.com/google/syzkaller/prog.foreachSubargImpl.func1()
prog/analysis.go:86 +0x9f
github.com/google/syzkaller/prog.foreachSubargImpl()
prog/analysis.go:104 +0xc8
github.com/google/syzkaller/prog.foreachArgArray()
prog/analysis.go:113 +0x89
github.com/google/syzkaller/prog.foreachArg()
prog/analysis.go:121 +0x50
github.com/google/syzkaller/prog.(*Prog).MutateWithHints()
prog/hints.go:71 +0x18e
github.com/google/syzkaller/prog.BenchmarkHints.func1()
prog/hints_test.go:477 +0x77
testing.(*B).RunParallel.func1()
testing/benchmark.go:626 +0x156
For string[N] we successfully deserialize a string of any length.
Similarly for a fixed-size array[T, N] we successfully deserialize
an array of any size.
Such programs later crash in foreachSubargOffset because static size
Type.Size() does not match what we've calculated iterating over fields.
The crash happens only in SerializeForExec in syz-fuzzer,
which is especially bad.
Fix this from both sides:
1. Validate sizes of arrays/buffers in Validate.
2. Repair incorrect sizes in Deserialize.
There is effectively infinite number of possible crypto
algorithm names due to templates. Plus there is tricky
relation between algorithms and algorithm type names.
This change adds custom mutator for sockaddr_alg struct
to improve variance in generated algorithms.
Now each prog function accepts the desired target explicitly.
No global, implicit state involved.
This is much cleaner and allows cross-OS/arch testing, etc.
Large overhaul moves syscalls and arg types from sys to prog.
Sys package now depends on prog and contains only generated
descriptions of syscalls.
Introduce prog.Target type that encapsulates all targer properties,
like syscall list, ptr/page size, etc. Also moves OS-dependent pieces
like mmap call generation from prog to sys.
Update #191
We currently use uintptr for all values.
This won't work for 32-bit archs.
Moreover in some cases we use uintptr but assume
that it is always 64-bits (e.g. in encodingexec).
Switch everything to uint64.
Update #324
After a change in syscall description the number of syscall arguments
might change and some of the programs in corpus get invalidated.
This change makes syzkaller to generate missing arguments when decoding a
program as an attempt to fix and keep more programs from corpus.
Right now Arg is a huge struct (160 bytes), which has many different fields
used for different arg kinds. Since most of the args we see in a typical
corpus are ArgConst, this results in a significant memory overuse.
This change:
- makes Arg an interface instead of a struct
- adds a SomethingArg struct for each arg kind we have
- converts all *Arg pointers into just Arg, since interface variable by
itself contains a pointer to the actual data
- removes ArgPageSize, now ConstArg is used instead
- consolidates correspondence between arg kinds and types, see comments
before each SomethingArg struct definition
- now LenType args that denote the length of VmaType args are serialized as
"0x1000" instead of "(0x1000)"; to preserve backwards compatibility
syzkaller is able to parse the old format for now
- multiple small changes all over to make the above work
After this change syzkaller uses twice less memory after deserializing a
typical corpus.
This change adds a `csum[kind, type]` type.
The only available kind right now is `ipv4`.
Using `csum[ipv4, int16be]` in `ipv4_header` makes syzkaller calculate
and embed correct checksums into ipv4 packets.