Remvoe FieldName from Type and add a separate Field type
that holds field name. Use Field for struct fields, union options
and syscalls arguments, only these really have names.
Reduces size of sys/linux/gen/amd64.go from 5665583 to 5201321 (-8.2%).
Allows to not create new type for squashed any pointer.
But main advantages will follow, e.g. removing StructDesc,
using TypeRef in Arg, etc.
Update #1580
Name "Type" is confusing when referring to pointer/array element type.
Frequently there are too many Type/typ/typ1/t and typ.Type is not very informative.
It _is_ a type, but what's usually more relevant is that it's an _element_ type.
Let's leave type checking to compiler and give it a more meaningful name.
Having Dir is Type is handy, but forces us to duplicate lots of types.
E.g. if a struct is referenced as both in and out, then we need to
have 2 copies and 2 copies of structs/types it includes.
If also prevents us from having the struct type as struct identity
(because we can have up to 3 of them).
Revert to the old way we used to do it: propagate Dir as we walk
syscall arguments. This moves lots of dir passing from pkg/compiler
to prog package.
Now Arg contains the dir, so once we build the tree, we can use dirs
as before.
Reduces size of sys/linux/gen/amd64.go from 6058336 to 5661150 (-6.6%).
Update #1580
Prog.Finalize combines assignSizesCall, SanitizeCall and validate.
Intended for users who build own programs,
so that we don't need to expose all individual methods.
* fixing weird merge error
* fixing presubmit
* fixing presubmit
* removing parsing code because of -Xraw option
* fix presubmit
* update
* deleting vma_call_handlers as we are currently skipping most vma calls. This simplifies memory_tracker as we don't need to keep track of vma allocations
* removing custom handling of bpf_instruction union
* removing ifconf parsing
* update
* removed all expression types and replaced them with constant types. removing ipv6_addr parsing while -Xraw is getting fixed. Removing constants.go
* removing ipv6 parsing
* presubmit
* moving direction check from ipv4_addr out to genUnion
* removing code that parses kcov
* removing redundant test
* removing custom code in generate unions to fill ipv4_addr
* proggen: changing order of imports to make external packages import first
fixing presubmit
* changing log messages to lower case to be consistent with other packages.
* removing pointer type and simplifying memory_tracker
removing comment
* moving context and return_cache to seaparate files
* deleting default argument generation when we should probably throw an error
Filename generated escaping paths in the past.
The reason for the check during validation is to
wipe old program from corpuses. Now that they are
hopefully wiped everywhere, we can relax the check
to restrict only filename to not produce escaping paths,
but allow existing programs with escaping paths.
This is useful in particular if we generate syzkaller
programs from strace output.
Currently we only generate either valid user-space pointers or NULL.
Extend NULL to a set of special pointers that we will use in programs.
All targets now contain 3 special values:
- NULL
- 0xfffffffffffffff (invalid kernel pointer)
- 0x999999999999999 (non-canonical address)
Each target can add additional special pointers on top of this.
Also generate NULL/special pointers for non-opt ptr's.
This restriction was always too restrictive. We may want to generate
them with very low probability, but we do want to generate them.
Also change pointers to NULL/special during mutation
(but still not in the opposite direction).
Check that argument types match expected static types.
I.e. detect when, say, syscall argument is a resource,
but actual generated argument is a pointer.
Hints mutation could produce unsanitized calls.
Sanitize calls after hints mutation.
Also sanitize on load (in validate), because bad programs
can already be in corpuses. And it's just the right thing
to do because sanitization rules can change over time.
The current code is total, unstructured mess.
Since we now have 1:1 type -> arg correspondence,
rework validation around args. This makes code
much cleaner and 30% shorter.
1. mmap all memory always, without explicit mmap calls in the program.
This makes lots of things much easier and removes lots of code.
Makes mmap not a special syscall and allows to fuzz without mmap enabled.
2. Change address assignment algorithm.
Current algorithm allocates unmapped addresses too frequently
and allows collisions between arguments of a single syscall.
The new algorithm analyzes actual allocations in the program
and places new arguments at unused locations.
This reduces size of a corpus in half.
We store corpus on manager and on hub,
so this will reduce their memory consumption.
But also makes large programs more readable.
Introduce isUsed(arg) helper, use it in several places.
Move method definitions closer to their types.
Simplify presence check for ArgUsed.Used() in several places.
Fixes#188
We now will write just ""/1000 to denote a 1000-byte output buffer.
Also we now don't store 1000-byte buffer in memory just to denote size.
Old format is still parsed.
For string[N] we successfully deserialize a string of any length.
Similarly for a fixed-size array[T, N] we successfully deserialize
an array of any size.
Such programs later crash in foreachSubargOffset because static size
Type.Size() does not match what we've calculated iterating over fields.
The crash happens only in SerializeForExec in syz-fuzzer,
which is especially bad.
Fix this from both sides:
1. Validate sizes of arrays/buffers in Validate.
2. Repair incorrect sizes in Deserialize.
Large overhaul moves syscalls and arg types from sys to prog.
Sys package now depends on prog and contains only generated
descriptions of syscalls.
Introduce prog.Target type that encapsulates all targer properties,
like syscall list, ptr/page size, etc. Also moves OS-dependent pieces
like mmap call generation from prog to sys.
Update #191
We currently use uintptr for all values.
This won't work for 32-bit archs.
Moreover in some cases we use uintptr but assume
that it is always 64-bits (e.g. in encodingexec).
Switch everything to uint64.
Update #324
Right now Arg is a huge struct (160 bytes), which has many different fields
used for different arg kinds. Since most of the args we see in a typical
corpus are ArgConst, this results in a significant memory overuse.
This change:
- makes Arg an interface instead of a struct
- adds a SomethingArg struct for each arg kind we have
- converts all *Arg pointers into just Arg, since interface variable by
itself contains a pointer to the actual data
- removes ArgPageSize, now ConstArg is used instead
- consolidates correspondence between arg kinds and types, see comments
before each SomethingArg struct definition
- now LenType args that denote the length of VmaType args are serialized as
"0x1000" instead of "(0x1000)"; to preserve backwards compatibility
syzkaller is able to parse the old format for now
- multiple small changes all over to make the above work
After this change syzkaller uses twice less memory after deserializing a
typical corpus.
This change adds a `csum[kind, type]` type.
The only available kind right now is `ipv4`.
Using `csum[ipv4, int16be]` in `ipv4_header` makes syzkaller calculate
and embed correct checksums into ipv4 packets.
The optimization change removed validation too aggressively.
We do need program validation during deserialization,
because we can get bad programs from corpus or hub.
Restore program validation after deserialization.
A bunch of spot optmizations after cpu/memory profiling:
1. Optimize hot-path coverage comparison in fuzzer.
2. Don't allocate and copy serialized program, serialize directly into shmem.
3. Reduce allocations during parsing of output shmem (encoding/binary sucks).
4. Don't allocate and copy coverage arrays, refer directly to the shmem region
(we are not going to mutate them).
5. Don't validate programs outside of tests, validation allocates tons of memory.
6. Replace the choose primitive with simpler switches.
Choose allocates fullload of memory (for int, func, and everything the func refers).
7. Other minor optimizations.
Currently the added test description leads to crashes:
--- FAIL: TestMinimizeRandom (0.12s)
prog_test.go:20: seed=1480014002950172453
panic: syscall syz_test$regression0: pointer arg 'f0' has output direction [recovered]
panic: syscall syz_test$regression0: pointer arg 'f0' has output direction
The description is OK. Fix that.