For string[N] we successfully deserialize a string of any length.
Similarly for a fixed-size array[T, N] we successfully deserialize
an array of any size.
Such programs later crash in foreachSubargOffset because static size
Type.Size() does not match what we've calculated iterating over fields.
The crash happens only in SerializeForExec in syz-fuzzer,
which is especially bad.
Fix this from both sides:
1. Validate sizes of arrays/buffers in Validate.
2. Repair incorrect sizes in Deserialize.
There is effectively infinite number of possible crypto
algorithm names due to templates. Plus there is tricky
relation between algorithms and algorithm type names.
This change adds custom mutator for sockaddr_alg struct
to improve variance in generated algorithms.
During smashing we know what call gave new coverage,
so we can concentrate just on it.
This helps to reduce amount of hints generated (we have too many of them).
Currently we always send 2MB of data to executor in ipc_simple.go.
Send only what's consumed by the program, and don't send the trailing zeros.
Serialized programs usually take only few KBs.
The call index check episodically fails:
2017/10/02 22:07:32 bad call index 1, calls 1, program:
under unknown circumstances. I've looked at the code again
and don't see where/how we can mess CallIndex.
Added a new test for minimization that especially checks resulting
CallIndex.
It would be good to understand what happens, but we don't have
any reproducers. CallIndex is actually unused at this point.
Manager only needs call name. So remove CallIndex entirely.
Now each prog function accepts the desired target explicitly.
No global, implicit state involved.
This is much cleaner and allows cross-OS/arch testing, etc.
Large overhaul moves syscalls and arg types from sys to prog.
Sys package now depends on prog and contains only generated
descriptions of syscalls.
Introduce prog.Target type that encapsulates all targer properties,
like syscall list, ptr/page size, etc. Also moves OS-dependent pieces
like mmap call generation from prog to sys.
Update #191
A hint is basically a tuple consisting of a pointer to an argument
in one of the syscalls of a program and a value, which should be
assigned to that argument.
A simplified version of hints workflow looks like this:
1. Fuzzer launches a program and collects all the comparisons' data
for every syscall in the program.
2. Next it tries to match the obtained comparison operands' values
vs. the input arguments' values.
3. For every such match the fuzzer mutates the program by
replacing the pointed argument with the saved value.
4. If a valid program is obtained, then fuzzer launches it and
checks if new coverage is obtained.
This commit includes:
1. All the code related to hints generation, parsing and mutations.
2. Fuzzer functions to launch the process.
3. Some new stats gathered by fuzzer and manager, related to hints.
4. An updated version of execprog to test the hints process.
We currently use uintptr for all values.
This won't work for 32-bit archs.
Moreover in some cases we use uintptr but assume
that it is always 64-bits (e.g. in encodingexec).
Switch everything to uint64.
Update #324
During minimization we create a single memory mapping that contains all
the smaller mmap() ranges, so that other mmap() calls can be dropped.
This "uber-mmap" used to start at 0x7f0000000000 regardless of where the
smaller mappings were located. Change its starting address to the
beginning of the first small mmap() range.
Due to https://github.com/google/syzkaller/issues/316 there're too many
mmap() calls in the programs, and syzkaller is spending quite a bit of
time mutating them. Most of the time changing mmap() calls won't give
us new coverage, so let's not do it too often.
After a change in syscall description the number of syscall arguments
might change and some of the programs in corpus get invalidated.
This change makes syzkaller to generate missing arguments when decoding a
program as an attempt to fix and keep more programs from corpus.
When syzkaller generates arg that uses a few structs that reference each
other via pointers, it can go into infinite recursion and crash.
Fix this by forcing pointer args to be null when the depth of recursion
reaches 3 for some struct.
Right now Arg is a huge struct (160 bytes), which has many different fields
used for different arg kinds. Since most of the args we see in a typical
corpus are ArgConst, this results in a significant memory overuse.
This change:
- makes Arg an interface instead of a struct
- adds a SomethingArg struct for each arg kind we have
- converts all *Arg pointers into just Arg, since interface variable by
itself contains a pointer to the actual data
- removes ArgPageSize, now ConstArg is used instead
- consolidates correspondence between arg kinds and types, see comments
before each SomethingArg struct definition
- now LenType args that denote the length of VmaType args are serialized as
"0x1000" instead of "(0x1000)"; to preserve backwards compatibility
syzkaller is able to parse the old format for now
- multiple small changes all over to make the above work
After this change syzkaller uses twice less memory after deserializing a
typical corpus.
Mark tests as parallel where makes sense.
Speed up sys.TransitivelyEnabledCalls.
Execution time is now:
ok github.com/google/syzkaller/config 0.172s
ok github.com/google/syzkaller/cover 0.060s
ok github.com/google/syzkaller/csource 3.081s
ok github.com/google/syzkaller/db 0.395s
ok github.com/google/syzkaller/executor 0.060s
ok github.com/google/syzkaller/fileutil 0.106s
ok github.com/google/syzkaller/host 1.530s
ok github.com/google/syzkaller/ifuzz 0.491s
ok github.com/google/syzkaller/ipc 1.374s
ok github.com/google/syzkaller/log 0.014s
ok github.com/google/syzkaller/prog 2.604s
ok github.com/google/syzkaller/report 0.045s
ok github.com/google/syzkaller/symbolizer 0.062s
ok github.com/google/syzkaller/sys 0.365s
ok github.com/google/syzkaller/syz-dash 0.014s
ok github.com/google/syzkaller/syz-hub/state 0.427s
ok github.com/google/syzkaller/vm 0.052s
However, main time is still taken by rebuilding sys package.
Fixes#182
I think resource being a part of a variable length array or a union option
is an acceptable usecase.
I've started hitting this panic with some SCTP setsockopts after making
SCTP assoc_id a resource.
mknod mode also includes ownership flags, so filter out the node type.
Also allow creation of loop nodes.
Remove mount$fs as it does not seem to make any sense.
Also generalize checksums into the two kinds: inet and pseudo.
Inet checksums is just the Internet checksum of a packet.
Pseudo checksum is the Internet checksum of a packet with a pseudo header.
This change adds a `csum[kind, type]` type.
The only available kind right now is `ipv4`.
Using `csum[ipv4, int16be]` in `ipv4_header` makes syzkaller calculate
and embed correct checksums into ipv4 packets.
The optimization change removed validation too aggressively.
We do need program validation during deserialization,
because we can get bad programs from corpus or hub.
Restore program validation after deserialization.
A bunch of spot optmizations after cpu/memory profiling:
1. Optimize hot-path coverage comparison in fuzzer.
2. Don't allocate and copy serialized program, serialize directly into shmem.
3. Reduce allocations during parsing of output shmem (encoding/binary sucks).
4. Don't allocate and copy coverage arrays, refer directly to the shmem region
(we are not going to mutate them).
5. Don't validate programs outside of tests, validation allocates tons of memory.
6. Replace the choose primitive with simpler switches.
Choose allocates fullload of memory (for int, func, and everything the func refers).
7. Other minor optimizations.
Currently we generate arrays of size [0,5] with equal probability.
Generate [0,10] with bias towards smaller arrays. But 0 has the lowest probability.
I've benchmark a slightly different change with max array size of 20,
results are somewhat inconclusive: it was better than baseline almost all way,
but baseline suddenly caught up at the end. It also considerably reduced
executions per second (by ~20%). So increasing array size to 10 should be a win...
Currently we stop mutating with 50% probability.
Stop mutating with 33% probability instead.
Benchmark shows both coverage increase and corpus reduction:
baseline oneof3 diff
coverage 65467 65604 137
corpus 35423 35354 -69
exec total 5474879 5023268 -451611
1. Basic support for arm64 kvm testing.
2. Fix compiler warnings in x86 kvm code.
3. Test all pseudo syz calls in csource.
4. Fix handling of real code in x86.
Add ifuzz package that can generate/mutate machine code.
It is based on Intel XED and for now supports only x86 code
(all of real, protected 16/32 and long modes).
This considerably increases KVM coverage.
Add new pseudo syscall syz_kvm_setup_cpu that setups VCPU into
interesting states for execution. KVM is too difficult to setup otherwise.
Lots of improvements possible, but this is a starting point.
bufio.Scanner has a default limit of 4K per line,
if a program contains longer line, it fails.
Extend the limit to 64K.
Also check scanning errors. Turns out even scanning of bytes.Buffer
can fail due to the line limit.
Currently the added test description leads to crashes:
--- FAIL: TestMinimizeRandom (0.12s)
prog_test.go:20: seed=1480014002950172453
panic: syscall syz_test$regression0: pointer arg 'f0' has output direction [recovered]
panic: syscall syz_test$regression0: pointer arg 'f0' has output direction
The description is OK. Fix that.
This allows to write:
string[salg_type, 14]
which will give a string buffer of size 14 regardless of actual string size.
Convert salg_type/salg_name to this.
Allow to define string flags in txt descriptions. E.g.:
filesystem = "ext2", "ext3", "ext4"
and then use it in string type:
ptr[in, string[filesystem]]
Eliminate assignTypeAndDir function and instead assign
types to all args during construction.
This will allow considerable simplifation of assignSizes.
Currently we store most types by value in sys.Type.
This is somewhat counter-intuitive for C++ programmers,
because one can't easily update the type object.
Store pointers to type objects for all types.
It also makes it easier to update types, e.g. adding paddings.
Add sys/test.txt file with description of syscalls for tests.
These descriptions can be used to ensure that we can parse everything we clain we can parse.
Use these descriptions to write several tests for exec serialization
(one test shows that alignment handling is currently incorrect).
These test descriptions can also be used to write e.g. mutation tests.
Update #78
Currently to add a new resource one needs to modify multiple source files,
which complicates descirption of new system calls.
Move resource descriptions from source code to text desciptions.
This splits generation process into two phases:
1. Extract values of constants from linux kernel sources.
2. Generate Go code.
Constant values are checked in.
The advantage is that the second phase is now completely independent
from linux source files, kernel version, presence of headers for
particular drivers, etc. This allows to change what Go code we generate
any time without access to all kernel headers (which in future won't be
limited to only upstream headers).
Constant extraction process does require proper kernel sources,
but this can be done only once by the person who added the driver
and has access to the required sources. Then the constant values
are checked in for others to use.
Consant extraction process is per-file/per-arch. That is,
if I am adding a driver that is not present upstream and that
works only on a single arch, I will check in constants only for
that driver and for that arch.
ioctl(FIFREEZE) renders machine dead.
FIFREEZE is an interesting thing, and we could test it
in namespace (?) or on manually mounted file systems (?).
But that will require more complex handling.
Disable it until we have that logic.
mknod of char/block devices can do all kinds of nasty stuff
(read/write to IO ports, kernel memory, etc).
Disable it for now.
Addresses that trigger SIGSEGV does not seem to uncover any bugs.
But they crash executor preventing programs from being executed.
Lower probability of generating addresses that lead to SIGSEGVs.
This commit supports inclusive ranged int, like foo int32[-10~10], which will
generate random integer between -10 and 10. In future we will support more than
one range, like int32[0, -5~10, 50, 100~200]
This solves several problems:
- host usually have outdates headers, so previously we need to define missing consts
- host may not have some headers at all
- generation depends on linux distribution and version
- some of the consts cannot be defined at all (e.g. ioctls that use struct arguments)
Syscall numbers for different architectures are now pulled in
from kernel headers. This solves 2 problems:
- we don't need to hardcode numbers for new syscalls (that don't present in typical distro headers)
- we have correct number for different archs (previously hardcoded numbers were for x86_64)
This also makes syscall numbers available for Go code, which can be useful.