Summary: Conditional returns were not taken into consideration at all. Implement them by turning them into jumps and normal returns. This means there is a slightly higher performance penalty for conditional returns, but this is the best we can do, and it still disturbs little of the rest.
Reviewers: dberris, echristo
Subscribers: sanjoy, nemanjai, hiraditya, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D38102
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314005 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This fixes code-gen for XRay in PPC. The regression wasn't caught by
codegen tests which we add in this change.
What happened was the following:
- For tail exits, we used to unconditionally prepend the returns/exits
with a pseudo-instruction that gets lowered to the instrumentation
sled (and leave the actual return/exit instruction as-is).
- Changes to the XRay instrumentation pass caused the tail exits to
suddenly also emit the tail exit pseudo-instruction, since the check
for whether a return instruction was also a call instruction meant it
was a tail exit instruction.
- None of the tests caught the regression either due to non-existent
tests, or the tests being disabled/removed for continuous breakage.
This change re-introduces some of the basic tests and verifies that
we're back to a state that allows the back-end to generate appropriate
XRay instrumented binaries for PPC in the presence of tail exits.
Reviewers: echristo, timshen
Subscribers: nemanjai, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D37570
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312772 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently XRay compares its threshold against `Function::size()` . However, `Function::size()` returns the number of basic blocks (as I understand, such as cycle bodies, if/else bodies, switch-case bodies, etc.), rather than the number of instructions.
The name of the parameter `-fxray-instruction-threshold=N`, as well as XRay documentation at http://llvm.org/docs/XRay.html , suggests that instructions should be counted, rather than the number of basic blocks.
I see two options:
1. Count the number of MachineInstr`s in MachineFunction : this gives better estimate for the number of assembly instructions on the target. So a user can check in disassembly that the threshold works more or less correctly.
2. Count the number of Instruction`s in a Function : AFAIK, this gives correct number of IR instructions, which the user can check in IR listing. However, this number may be far (several times for small functions) from the number of assembly instructions finally emitted.
Option 1 is implemented in this patch because I think that having the closer estimate for the number of assembly instructions emitted is more important than to have a clear definition of the metric.
Reviewers: dberris, rengolin
Reviewed By: dberris
Subscribers: llvm-commits, iid_iunknown
Differential Revision: https://reviews.llvm.org/D34027
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305072 91177308-0d34-0410-b5e6-96231b3b80d8
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304787 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is an implementation of the loop detection logic that XRay needs to
determine whether a function might take time at runtime. Without this
heuristic, XRay will tend to not instrument short functions that have
loops that might have runtime dependent on inputs or external values.
While this implementation doesn't do any further analysis than just
figuring out whether there is a loop in the MachineFunction being
code-gen'ed, we're paving the way for being able to perform more
sophisticated analysis of the function in the future (for example to
determine whether the trip count for the loop might be constant, and
make a decision on that instead). This enables us to cover more
functions with the default heuristics, and potentially identify ones
that have variable runtime latency just by looking for the presence of
loops.
Reviewers: chandlerc, rnk, pelikan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32274
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302103 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Adds support for xray instrumentation on mips for both 32-bit and 64-bit.
Reviewed by sdardis, dberris
Differential: D27697
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295164 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
powerpc64 big-endian is not supported, but I believe that most logic can
be shared, except for xray_powerpc64.cc.
Also add a function InvalidateInstructionCache to xray_util.h, which is
copied from llvm/Support/Memory.cpp. I'm not sure if I need to add a unittest,
and I don't know how.
Reviewers: dberris, echristo, iteratee, kbarton, hfinkel
Subscribers: mehdi_amini, nemanjai, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D29742
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294781 91177308-0d34-0410-b5e6-96231b3b80d8
The original version of the code in XRayInstrumentation.cpp assumed that
functions may not have empty machine basic blocks (or that the first one
couldn't be). This change addresses that by special-casing that specific
situation.
We provide two .mir test-cases to make sure we're handling this
appropriately.
Fixes llvm.org/PR31424.
Reviewers: chandlerc
Subscribers: varno, llvm-commits
Differential Revision: https://reviews.llvm.org/D27913
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290091 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds simplified support for tail calls on ARM with XRay instrumentation.
Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall
-std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my
specific flags like --target=armv7-linux-gnueabihf etc.), the following program
#include <cstdio>
#include <cassert>
#include <xray/xray_interface.h>
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() {
std::printf("In fC()\n");
}
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() {
std::printf("In fB()\n");
fC();
}
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() {
std::printf("In fA()\n");
fB();
}
// Avoid infinite recursion in case the logging function is instrumented (so calls logging
// function again).
[[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret)
{
printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
}
int main(int argc, char* argv[]) {
__xray_set_handler(simplyPrint);
printf("Patching...\n");
__xray_patch();
fA();
printf("Unpatching...\n");
__xray_unpatch();
fA();
return 0;
}
gives the following output:
Patching...
XRay: functionId=3 type=0.
In fA()
XRay: functionId=3 type=1.
XRay: functionId=2 type=0.
In fB()
XRay: functionId=2 type=1.
XRay: functionId=1 type=0.
XRay: functionId=1 type=1.
In fC()
Unpatching...
In fA()
In fB()
In fC()
So for function fC() the exit sled seems to be called too much before function
exit: before printing In fC().
Debugging shows that the above happens because printf from fC is also called as
a tail call. So first the exit sled of fC is executed, and only then printf is
jumped into. So it seems we can't do anything about this with the current
approach (i.e. within the simplification described in
https://reviews.llvm.org/D23988 ).
Differential Revision: https://reviews.llvm.org/D25030
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284456 91177308-0d34-0410-b5e6-96231b3b80d8
And associated commits, as they broke the Thumb bots.
This reverts commit r280935.
This reverts commit r280891.
This reverts commit r280888.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280967 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change promotes the 'isTailCall(...)' member function to
TargetInstrInfo as a query interface for determining on a per-target
basis whether a given MachineInstr is a tail call instruction. We build
upon this in the XRay instrumentation pass to emit special sleds for
tail call optimisations, where we emit the correct kind of sled.
The tail call sleds look like a mix between the function entry and
function exit sleds. Form-wise, the sled comes before the "jmp"
instruction that implements the tail call similar to how we do it for
the function entry sled. Functionally, because we know this is a tail
call, it behaves much like an exit sled -- i.e. at runtime we may use
the exit trampolines instead of a different kind of trampoline.
A follow-up change to recognise these sleds will be done in compiler-rt,
so that we can start intercepting these initially as exits, but also
have the option to have different log entries to more accurately reflect
that this is actually a tail call.
Reviewers: echristo, rSerge, majnemer
Subscribers: mehdi_amini, dberris, llvm-commits
Differential Revision: https://reviews.llvm.org/D23986
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280334 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In this patch we implement the following parts of XRay:
- Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches.
- Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts).
- X86-specific nop sleds as described in the white paper.
- A machine function pass that adds the different instrumentation marker instructions at a very late stage.
- A way of identifying which return opcode is considered "normal" for each architecture.
There are some caveats here:
1) We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time to by default be unpacked for platforms where XRay is not availble yet.
2) The generated section for X86 is different from what is described from the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library.
Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk
Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D19904
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275367 91177308-0d34-0410-b5e6-96231b3b80d8