The default configuration on PPC64 uses a 64K system page size. Having the
ppc64 page size set to 4K was not a problem until 365fba2440cee3aed74c77
"executor: surround the data mapping with PROT_NONE pages" added
surrounding mappings that are not aligned to the actual system page size.
This changes the page size for ppc64 to 64K and introduces an upper
limit in randPageCount(), as we have a hard-coded limit of 16MB.
In the unlikely event of a PPC64 system with 4K pages, we will end up
allocating fewer pages, which is not great but acceptable.
This avoids using os.Getpagesize(), as the page size on the build host
may differ from that on the test machine, so we always use the bigger
size for simplicity.
Signed-off-by: Alexey Kardashevskiy <aik@linux.ibm.com>
The linux string dictionary dates back to extremely old times
when we did not have proper descriptions for almost anything,
and the dictionary was a quick hack to guess at least some
special strings.
Now we have way better descriptions and the dictionary
has become both unnecessary and probably even harmful.
We chose a non-deterministic resource in createResource
due to map iteration order.
This is caught by the existing TestDeterminism,
but only very infrequently.
We are seeing some panics that say that some disabled
syscalls somehow get into corpus.
I don't see where/how this can happen.
Add a check to syz-fuzzer to panic whenever we execute
a program with a disabled syscall. Hopefully the panic
stack will shed some light.
Also add a check in the manager as the last line of defence
so that bad programs don't get into the corpus.
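Both checks boil down to the same test: does every call in the program belong to the enabled set? A minimal sketch over a hypothetical program representation (Prog, Call and checkEnabled are illustrative names, not the real prog API). The fuzzer could panic on a non-nil error; the manager could reject the program instead.

```go
package main

import "fmt"

// Hypothetical minimal program representation.
type Call struct{ Name string }
type Prog struct{ Calls []*Call }

// checkEnabled returns an error naming the first call whose syscall is
// not in the enabled set, or nil if all calls are enabled.
func checkEnabled(p *Prog, enabled map[string]bool) error {
	for i, c := range p.Calls {
		if !enabled[c.Name] {
			return fmt.Errorf("call %v (%v) is disabled", i, c.Name)
		}
	}
	return nil
}
```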
Use Ref in Arg instead of the full Type interface.
This reduces the size of all args. In particular, the most common
ConstArg is reduced from 32 bytes to 16 and no longer
contains any pointers (better for GC).
Running syz-db bench on a beefy corpus, before:
allocs 7262 MB (18 M), next GC 958 MB, sys heap 1279 MB, live allocs 479 MB (8 M), time 9.704699958s
allocs 7262 MB (18 M), next GC 958 MB, sys heap 1279 MB, live allocs 479 MB (8 M), time 9.873792394s
allocs 7262 MB (18 M), next GC 958 MB, sys heap 1279 MB, live allocs 479 MB (8 M), time 9.820479906s
after:
allocs 7163 MB (18 M), next GC 759 MB, sys heap 1023 MB, live allocs 379 MB (8 M), time 8.938939937s
allocs 7163 MB (18 M), next GC 759 MB, sys heap 1087 MB, live allocs 379 MB (8 M), time 9.410243167s
allocs 7163 MB (18 M), next GC 759 MB, sys heap 1023 MB, live allocs 379 MB (8 M), time 9.38225806s
Max heap and live heap are reduced by 20%.
Update #1580
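The 32 to 16 byte reduction follows from Go's memory layout on 64-bit platforms: an interface value is two words (16 bytes), while a small integer index into a type table fits alongside a 1-byte direction in one word. A sketch with hypothetical layouts (the real ArgCommon/ConstArg fields and the width of Ref may differ); the sizes hold on 64-bit targets such as amd64.

```go
package main

import "unsafe"

// Type stands in for the type interface; only its size matters here.
type Type interface{ Name() string }

// Dir is a hypothetical 1-byte argument direction.
type Dir uint8

// Ref is a hypothetical compact index into a per-target type table.
type Ref uint32

// constArgOld embeds a full interface value: 16 bytes plus padding for dir.
type constArgOld struct {
	typ Type
	dir Dir
	val uint64
}

// constArgNew stores a Ref instead: ref and dir share one word and the
// struct contains no pointers, so the GC does not need to scan it.
type constArgNew struct {
	ref Ref
	dir Dir
	val uint64
}

// sizes reports the size of both layouts in bytes.
func sizes() (before, after uintptr) {
	return unsafe.Sizeof(constArgOld{}), unsafe.Sizeof(constArgNew{})
}
```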
Currently the ANY implementation fabricates new types dynamically.
This is something we don't do anywhere else; generally types
come from the compiler and are all static.
Dynamic types would conflict with the Ref in Arg optimization.
Move ANY types creation into compiler.
Update #1580
Remove FieldName from Type and add a separate Field type
that holds the field name. Use Field for struct fields, union options
and syscall arguments; only these really have names.
Reduces size of sys/linux/gen/amd64.go from 5665583 to 5201321 (-8.2%).
Allows us to avoid creating a new type for a squashed any pointer.
But the main advantages will follow, e.g. removing StructDesc,
using TypeRef in Arg, etc.
Update #1580
Name "Type" is confusing when referring to pointer/array element type.
Frequently there are too many Type/typ/typ1/t and typ.Type is not very informative.
It _is_ a type, but what's usually more relevant is that it's an _element_ type.
Let's leave type checking to the compiler and give it a more meaningful name.
Having Dir in Type is handy, but forces us to duplicate lots of types.
E.g. if a struct is referenced as both in and out, then we need
2 copies of it and 2 copies of the structs/types it includes.
It also prevents us from using the struct type as the struct identity
(because we can have up to 3 of them).
Revert to the way we used to do it: propagate Dir as we walk
syscall arguments. This moves lots of dir passing from pkg/compiler
to the prog package.
Now Arg contains the dir, so once we build the tree, we can use dirs
as before.
Reduces size of sys/linux/gen/amd64.go from 6058336 to 5661150 (-6.6%).
Update #1580
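A minimal sketch of propagating Dir during the walk instead of storing it in each type, over a hypothetical type tree (Node, walk and the Dir constants are illustrative). A pointer carries the direction of its pointee, and that direction flows down into struct fields, so a single struct type can be shared by in and out uses.

```go
package main

// Dir is a hypothetical argument direction.
type Dir int

const (
	DirIn Dir = iota
	DirOut
	DirInOut
)

// Node is a hypothetical type-tree node: a pointer (PtrDir and Elem set),
// a struct (Fields set), or a leaf.
type Node struct {
	Name   string
	PtrDir *Dir    // non-nil for pointers: direction of the pointee
	Elem   *Node   // pointee for pointers
	Fields []*Node // fields for structs
}

// walk visits n and everything below it, passing down the direction
// inherited from the nearest enclosing pointer.
func walk(n *Node, dir Dir, visit func(*Node, Dir)) {
	visit(n, dir)
	if n.PtrDir != nil {
		walk(n.Elem, *n.PtrDir, visit)
		return
	}
	for _, f := range n.Fields {
		walk(f, dir, visit)
	}
}
```

In the usage below, the same "stat" struct node is reached once through an out pointer and once through an in pointer, and its field is visited with both directions without duplicating the type.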
Add common infrastructure for syscall attributes.
Add a few attributes we want, but they are not implemented yet
(they don't affect behavior; this will follow).
Make MakeMmap return more than 1 call.
This is a preparation for future changes.
Also remove addr/size, as they are effectively
always the same and can be inferred from the target
(they would also conflict with the future changes).
Also rename it to MakeDataMmap to better represent
the new purpose: it's not just some arbitrary mmap,
but rather a mapping of the data segment.
We will need a wrapper for target.SanitizeCall that will do more
than just calling the target-provided function. To avoid confusion
and potential mistakes, give the target function and prog function
different names. Prog package will continue to call this "sanitize",
which will include target's "neutralize" + more.
Also refactor the API a bit: we need a helper function that sanitizes
the whole program because that's needed most of the time.
Fixes #477
Fixes #502
We have _some_ limits on program length, but they are really soft.
When we ask to generate a program with 10 calls, sometimes we get
100-150 calls. There are also no checks when we accept external
programs from corpus/hub. Issue #1630 contains an example where
this crashes the VM (the executor limit of 1000 resources is
violated). Larger programs also harm the process overall (they are slower,
consume more memory, lead to monster reproducers, etc).
Add a set of measures for hard control over program length.
Ensure that generated/mutated programs are not too long;
drop too long programs coming from corpus/hub in manager;
drop too long programs in hub.
As a bonus, ensure that mutations don't produce programs with
0 calls (which is currently possible and happens).
Fixes #1630
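A minimal sketch of the three measures over a hypothetical program type: trim generated/mutated programs to a hard limit, drop external programs that exceed it, and refuse to remove the last call during mutation. The limit value and all names are assumptions for illustration. Trimming from the tail is safe in this model because a call can only use resources created by earlier calls.

```go
package main

// maxCalls is a hypothetical hard limit on program length.
const maxCalls = 40

// Prog is a hypothetical program: just a list of call names.
type Prog struct{ Calls []string }

// trimProg enforces the limit on generated/mutated programs by dropping
// trailing calls.
func trimProg(p *Prog) {
	if len(p.Calls) > maxCalls {
		p.Calls = p.Calls[:maxCalls]
	}
}

// acceptExternal reports whether a program from corpus/hub should be
// kept: too long programs are dropped rather than trimmed, and empty
// programs are rejected as well.
func acceptExternal(p *Prog) bool {
	return len(p.Calls) > 0 && len(p.Calls) <= maxCalls
}

// removeCall removes call i but never leaves the program with 0 calls.
func removeCall(p *Prog, i int) bool {
	if len(p.Calls) <= 1 {
		return false
	}
	p.Calls = append(p.Calls[:i], p.Calls[i+1:]...)
	return true
}
```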
Enables the syntax intN[start:end, alignment] for integer ranges. For
instance, int32[0:10, 2] represents even 32-bit numbers between 0 and 10
inclusive. With this change, two NEED tags in syscall descriptions can be
addressed.
Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
We may be in createResource but have no resources at all because of ANYRES
types that are not in target.Resources.
This is actually the case for some test targets. We have resources there,
but the syscalls that create them are disabled.
In such a case we crash in Intn(0).
Check that we have some resources before calling Intn.
Non-determinism is bad:
- it leads to flaky coverage reports
- it makes test failures non-reproducible
Remove 4 sources of non-determinism related to maps:
- file name generation
- string generation
- resource generation
- hints generation
Add a test that ensures all main operations are fully deterministic.
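A determinism test can run the same operation several times with the same seed and require identical results. A minimal sketch (generate and isDeterministic are hypothetical; the real test covers generation, mutation, hints, etc.). The stand-in generator draws all randomness from the seeded RNG and picks from a slice in fixed order, never by ranging over a map.

```go
package main

import (
	"math/rand"
	"reflect"
)

// generate is a hypothetical stand-in for program generation.
func generate(seed int64) []string {
	r := rand.New(rand.NewSource(seed))
	files := []string{"./file0", "./file1", "./file2"}
	var prog []string
	for i := 0; i < 10; i++ {
		prog = append(prog, files[r.Intn(len(files))])
	}
	return prog
}

// isDeterministic runs op several times with the same seed and reports
// whether every run produced the same output.
func isDeterministic(seed int64, runs int, op func(int64) []string) bool {
	first := op(seed)
	for i := 1; i < runs; i++ {
		if !reflect.DeepEqual(first, op(seed)) {
			return false
		}
	}
	return true
}
```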
Filename generation produced escaping paths in the past.
The reason for the check during validation was to
wipe old programs from corpora. Now that they are
hopefully wiped everywhere, we can relax the check
to only prevent filename generation from producing escaping paths,
but allow existing programs with escaping paths.
This is useful in particular if we generate syzkaller
programs from strace output.
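A minimal sketch of an "escaping path" check that a filename generator could apply to its own output, assuming escaping means absolute, or climbing above the working directory after cleaning (escapingFilename is a hypothetical helper built on path/filepath).

```go
package main

import (
	"path/filepath"
	"strings"
)

// escapingFilename reports whether name could refer to a location
// outside the working directory.
func escapingFilename(name string) bool {
	if filepath.IsAbs(name) {
		return true
	}
	clean := filepath.Clean(name) // resolves "a/../b" and "./x"
	return clean == ".." || strings.HasPrefix(clean, ".."+string(filepath.Separator))
}
```

Note that "..foo" is an ordinary name and is not reported, while "./file0/../../x" cleans to "../x" and is.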
Currently we only generate either valid user-space pointers or NULL.
Extend NULL to a set of special pointers that we will use in programs.
All targets now contain 3 special values:
- NULL
- 0xfffffffffffffff (invalid kernel pointer)
- 0x999999999999999 (non-canonical address)
Each target can add additional special pointers on top of this.
Also generate NULL/special pointers for non-opt pointers.
This restriction was always too strict. We may want to generate
them with very low probability, but we do want to generate them.
Also change pointers to NULL/special during mutation
(but still not in the opposite direction).
Now file names are described as:
    string[filename]
with the possibility of using other string features:
    stringnoz[filename]
    string[filename, CONST_SIZE]
and filename is left as a type alias, as it is commonly used:
    type filename string[filename]
Squash complex structs into a flat byte array and mutate this array
with generic blob mutations. This allows us to mutate what we currently
consider paddings, add/remove paddings in structs, etc.
1. mmap all memory always, without explicit mmap calls in the program.
This makes lots of things much easier and removes lots of code.
Makes mmap not a special syscall and allows to fuzz without mmap enabled.
2. Change address assignment algorithm.
The current algorithm allocates unmapped addresses too frequently
and allows collisions between arguments of a single syscall.
The new algorithm analyzes actual allocations in the program
and places new arguments at unused locations.
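A minimal sketch of placing new arguments at unused locations, assuming a small, always-mapped data area tracked at page granularity with first-fit search. The page size, area size, and the allocator type are hypothetical; a real allocator may track much finer granularity.

```go
package main

// Hypothetical geometry of the always-mapped data area.
const (
	pageSize = 4 << 10
	numPages = 16
)

// allocator records which pages are referenced by existing arguments.
type allocator struct {
	used [numPages]bool
}

// noteAlloc marks pages covering [addr, addr+size) as used; addr is an
// offset into the data area.
func (a *allocator) noteAlloc(addr, size uint64) {
	for p := addr / pageSize; p < (addr+size+pageSize-1)/pageSize && p < numPages; p++ {
		a.used[p] = true
	}
}

// alloc finds the first run of free pages that fits size, marks it used
// and returns its offset, or ok=false if nothing fits.
func (a *allocator) alloc(size uint64) (uint64, bool) {
	need := int((size + pageSize - 1) / pageSize)
	if need == 0 {
		need = 1
	}
	for start := 0; start+need <= numPages; start++ {
		free := true
		for p := start; p < start+need; p++ {
			if a.used[p] {
				free = false
				break
			}
		}
		if free {
			addr := uint64(start) * pageSize
			a.noteAlloc(addr, uint64(need)*pageSize)
			return addr, true
		}
	}
	return 0, false
}
```

Because existing allocations are analyzed first (noteAlloc), two arguments of one syscall can no longer land on the same location.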
Make Foreach* callback accept the arg and a context struct
that can contain lots of aux info.
This (1) removes lots of unused base/parent args,
(2) provides foundation for stopping recursion,
(3) allows to merge foreachSubargOffset.
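A minimal sketch of a Foreach-style walker that hands the callback a context struct instead of extra parameters, over a hypothetical Arg tree. The ArgCtx fields (Parent, Depth, Stop) are illustrative; the Stop flag shows how a context enables stopping recursion into sub-arguments.

```go
package main

// Arg is a hypothetical argument tree node.
type Arg struct {
	Name  string
	Inner []*Arg
}

// ArgCtx carries auxiliary info for the callback.
type ArgCtx struct {
	Parent *Arg // enclosing argument, nil at the top level
	Depth  int
	Stop   bool // set by the callback to skip this arg's sub-arguments
}

// ForeachArg visits arg and its sub-arguments depth-first.
func ForeachArg(arg *Arg, f func(*Arg, *ArgCtx)) {
	foreachArg(arg, nil, 0, f)
}

func foreachArg(arg, parent *Arg, depth int, f func(*Arg, *ArgCtx)) {
	ctx := &ArgCtx{Parent: parent, Depth: depth}
	f(arg, ctx)
	if ctx.Stop {
		return
	}
	for _, sub := range arg.Inner {
		foreachArg(sub, arg, depth+1, f)
	}
}
```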