The previous commit "pkg/report: handle cases when whole stack is questionable"
mishandles frames that start with [PC] prefix before " ? ".
Restore that part.
If the report is identified as corrupted because there are no frames at all,
try to re-extract using questionable frames.
This is a bit risky and may produce lots of one-off corrupted reports
at random locations. But we won't know until we deploy this...
Fixes#1216
On X86-64, dereferencing a non-canonical address normally causes a #GP, for
which syzkaller already has a pattern. However, if the base register of the
non-canonical address is RBP (which can happen in builds that use RBP as a
general-purpose register because they don't use frame pointer unwinding),
#SS is thrown instead, for which syzkaller did not yet have a pattern.
To see this kind of fault, you can insert the following code in
kernel_init() after the call to rcu_end_inkernel_boot():
asm volatile(
"movabs $0x8000000000000000, %rbp\n\t"
"movq (%rbp), %rax\n\t"
"ud2\n\t"
);
Linux prints a different error message for #SS, so add that error message
to syzkaller's list of patterns.
The the added test for exception from exception corner case.
"BUG: spinlock lockup" fails to respect panic_on_warn and panic
after printing report (though, it's a BUG already, so it should
have been paniced even without panic_on_warn).
As the result we got "spinlock lockup" followed by "rcu stall" report.
And we have that special exception for rcu stalls b/c for them
the most of the report is irrelevant up to apic_timer_interrupt frame.
The code did not expect this weird double-report case and skipped
everything up to apic_timer_interrupt, though it's actually
a lockup in netfilter code.
An upcoming patch for Linux will change the error reporting pattern for
general protection faults such that the colon doesn't necessarily come
immediately after the string "general protection fault" (see
https://lore.kernel.org/lkml/20191118142144.GC6363@zn.tnic/).
Change the pattern in syzkaller before that happens.
Note that this is not necessarily the final format; in particular, the
ordering of the KASAN note and the "general protection fault" line might
swap.
The port-based exception APIs have been deprecated on Fuchsia and will
be removed shortly. Delete them from the syscall definitions and
modify the Fuchsia executor to use the new channel-based APIs instead.
Some syzkaller panics happen due to memory corruptions,
but it still would be useful at least to get some visibility into these crashes.
On some OSes we actualy already detect them as they have "panic:" oops pattern,
but not e.g. on linux.
Fixes#318
The problem with task hung reports is that they manifest at random victim stacks,
rather at the root cause stack. E.g. if there is something wrong with RCU subsystem,
we are getting hangs all over the kernel on all synchronize_* calls.
So before resotring to the common logic of skipping some common frames,
we look for 2 common buckets: hangs on synchronize_rcu and hangs on rtnl_lock
and group these together.