Summary:
The tail call optimisation is performed before register allocation, so
at that point we don't know if LR is being spilt or not. If LR was spilt
to the stack, then we cannot do a tail call optimisation. That would
involve popping back into LR which is not possible in Thumb1 code.
Reviewers: rengolin, jmolloy, rovka, olista01
Reviewed By: olista01
Subscribers: llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D29020
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294000 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In this function, virtual registers can be introduced (for example
through calls to emitThumbRegPlusImmInReg). doScavengeFrameVirtualRegs
will replace those virtual registers with concrete registers later on
in PrologEpilogInserter, which sets NoVRegs again.
This patch fixes the Codegen/Thumb/segmented-stacks.ll test case which
failed with expensive checks.
https://llvm.org/bugs/show_bug.cgi?id=27484
Reviewers: rnk, bkramer, olista01
Reviewed By: olista01
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D28829
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292372 91177308-0d34-0410-b5e6-96231b3b80d8
Replace all uses of AddDefaultPred with MachineInstrBuilder::add(predOps()).
This makes the code building MachineInstrs more readable, because it allows us
to write code like:
MIB.addSomeOperand(blah)
.add(predOps())
.addAnotherOperand(blahblah)
instead of
AddDefaultPred(MIB.addSomeOperand(blah))
.addAnotherOperand(blahblah)
This commit also adds the predOps helper in the ARM backend, as well as the add
method taking a variable number of operands to the MachineInstrBuilder.
The transformation has been done mostly automatically with a custom tool based
on Clang AST Matchers + RefactoringTool.
Differential Revision: https://reviews.llvm.org/D28555
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291890 91177308-0d34-0410-b5e6-96231b3b80d8
Reverts r283938 to reinstate r283867 with a fix.
The original change had an ArrayRef referring to a destroyed temporary
initializer list. Use plain C arrays instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283942 91177308-0d34-0410-b5e6-96231b3b80d8
The high registers are not allocatable in Thumb1 functions, but they
could still be used by inline assembly, so we need to save and restore
the callee-saved high registers (r8-r11) in the prologue and epilogue.
This is complicated by the fact that the Thumb1 push and pop
instructions cannot access these registers. Therefore, we have to move
them down into low registers before pushing, and move them back after
popping into low registers.
In most functions, we will have low registers that are also being
pushed/popped, which we can use as the temporary registers for
saving/restoring the high registers. However, this is not guaranteed, so
we may need to push some extra low registers to ensure that the high
registers can be saved/restored. For correctness, it would be sufficient
to use just one low register, but if we have enough low registers
available then we only need one push/pop instruction, rather than one
per high register.
We can also use the argument/return registers when they are not live,
and the link register when saving (but not restoring), reducing the
number of extra registers we need to push.
There are still a few extreme edge cases where we need two push/pop
instructions, because not enough low registers can be made live in the
prologue or epilogue.
In addition to the regression tests included here, I've also tested this
using a script to generate functions which clobber different
combinations of registers, have different numbers of argument and return
registers (including variadic arguments), allocate different fixed sized
objects on the stack, and do or don't use variable sized allocas and the
__builtin_return_address intrinsic (all of which affect the available
registers in the prologue and epilogue). I ran these functions in a test
harness which verifies that all of the callee-saved registers are
correctly preserved.
Differential Revision: https://reviews.llvm.org/D24228
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283867 91177308-0d34-0410-b5e6-96231b3b80d8
There is not an official documented ABI for frame pointers in Thumb2,
but we should try to emit something which is useful.
We use r7 as the frame pointer for Thumb code, which currently means
that if a function needs to save a high register (r8-r11), it will get
pushed to the stack between the frame pointer (r7) and link register
(r14). This means that while a stack unwinder can follow the chain of
frame pointers up the stack, it cannot know the offset to lr, so does
not know which functions correspond to the stack frames.
To fix this, we need to push the callee-saved registers in two batches,
with the first push saving the low registers, fp and lr, and the second
push saving the high registers. This is already implemented, but
previously only used for iOS. This patch turns it on for all Thumb2
targets when frame pointers are required by the ABI, and the frame
pointer is r7 (Windows uses r11, so this isn't a problem there). If
frame pointer elimination is enabled we still emit a single push/pop
even if we need a frame pointer for other reasons, to avoid increasing
code size.
We must also ensure that lr is pushed to the stack when using a frame
pointer, so that we end up with a complete frame record. Situations that
could cause this were rare, because we already push lr in most
situations so that we can return using the pop instruction.
Differential Revision: https://reviews.llvm.org/D23516
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279506 91177308-0d34-0410-b5e6-96231b3b80d8
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278902 91177308-0d34-0410-b5e6-96231b3b80d8
Remove remaining implicit conversions from MachineInstrBundleIterator to
MachineInstr* from the ARM backend. In most cases, I made them less attractive
by preferring MachineInstr& or using a ranged-based for loop.
Once all the backends are fixed I'll make the operator explicit so that this
doesn't bitrot back.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274920 91177308-0d34-0410-b5e6-96231b3b80d8
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272512 91177308-0d34-0410-b5e6-96231b3b80d8
When setting the frame pointer, the offset from SP is calculated based on the
stack slot it gets allocated, but this slot is in turn based on the order of
the CSR list so that list should match the order we actually save the registers
in. Mostly it did, but in the edge-case of MachO AAPCS targets it was wrong.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269459 91177308-0d34-0410-b5e6-96231b3b80d8
Remove the AddPristinesAndCSRs parameters from
addLiveIns()/addLiveOuts().
We need to respect pristine registers after prologue epilogue insertion,
Seeing that we got this wrong in at least two commits already, we should
rather pay the small price to query MachineFrameInfo for it.
There are three cases that did not set AddPristineAndCSRs to true even
after register allocation:
- ExecutionDepsFix: live-out registers are used as a hint that the
register is used soon. This is not true for pristine registers so
use the new addLiveOutsNoPristines() to maintain this behaviour.
- SystemZShortenInst: Not setting AddPristineAndCSRs to true looks like
a bug, should do the right thing automatically now.
- StackMapLivenessAnalysis: Not adding pristine registers looks like a
bug to me. Added a FIXME comment but maintain the current behaviour
as a change may need to get coordinated with GC runtimes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@268336 91177308-0d34-0410-b5e6-96231b3b80d8
This will become necessary in a subsequent change to make this method
merge adjacent stack adjustments, i.e. it might erase the previous
and/or next instruction.
It also greatly simplifies the calls to this function from Prolog-
EpilogInserter. Previously, that had a bunch of logic to resume iteration
after the call; now it just continues with the returned iterator.
Note that this changes the behaviour of PEI a little. Previously,
it attempted to re-visit the new instruction created by
eliminateCallFramePseudoInstr(). That code was added in r36625,
but I can't see any reason for it: the new instructions will obviously
not be pseudo instructions, they will not have FrameIndex operands,
and we have already accounted for the stack adjustment.
Differential Revision: http://reviews.llvm.org/D18627
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265036 91177308-0d34-0410-b5e6-96231b3b80d8
Change MachineInstr API to prefer MachineInstr& over MachineInstr*
whenever the parameter is expected to be non-null. Slowly inching
toward being able to fix PR26753.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262149 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
* avoid generating POP {LR} in Thumb1 epilogues
* combine MOV LR, Rx + BX LR -> BX Rx in a peephole optimization pass
* combine POP {LR} + B + BX LR -> POP {PC} on v5T+
Test cases by Ana Pazos
Differential Revision: http://reviews.llvm.org/D15707
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256523 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Before ARMv5T, Thumb1 code could not pop PC, as described at D14357 and D14986;
so we need the special fixup in the epilogue.
Reviewers: jroelofs, qcolombet
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D15126
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255047 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This had been broken for a very long time, but nobody noticed until
D14357 enabled shrink-wrapping by default.
Reviewers: jroelofs, qcolombet
Subscribers: tyomitch, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D14986
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254444 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This review is related to another review request http://reviews.llvm.org/D11268, does the same and merely fixes a couple of issues with it.
D11268 is quite old and has merge conflicts against the current trunk.
This request
- rebases D11268 onto the new trunk;
- resolves the merge conflicts;
- fixes the prologue_end tests, which do not pass due to the subprogram definitions not marked as distinct.
Reviewers: echristo, rengolin, kubabrecka
Subscribers: aemerson, rengolin, jyknight, dsanders, llvm-commits, asl
Differential Revision: http://reviews.llvm.org/D14338
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252177 91177308-0d34-0410-b5e6-96231b3b80d8
Shrink-wrapping can now be tested on ARM with -enable-shrink-wrap.
Related to <rdar://problem/20821730>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242908 91177308-0d34-0410-b5e6-96231b3b80d8
This is the first step toward supporting shrink-wrapping for this target.
The changes could be summarized by these items:
- Expand the tail-call return as part of the expand pseudo pass.
- Get rid of the assumptions that the epilogue is the exit block:
* Do not assume which registers are free in the epilogue. (This indirectly
improve the lowering of the code for the segmented stacks, see the test
cases.)
* Take into account that the basic block can be empty.
Related to <rdar://problem/20821730>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242714 91177308-0d34-0410-b5e6-96231b3b80d8
This changes TargetFrameLowering::processFunctionBeforeCalleeSavedScan():
- Rename the function to determineCalleeSaves()
- Pass a bitset of callee saved registers by reference, thus avoiding
the function-global PhysRegUsed bitset in MachineRegisterInfo.
- Without PhysRegUsed the implementation is fine tuned to not save
physcial registers which are only read but never modified.
Related to rdar://21539507
Differential Revision: http://reviews.llvm.org/D10909
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242165 91177308-0d34-0410-b5e6-96231b3b80d8
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.
As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.
** Context **
Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.
** Motivating example **
Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b) {
%tmp = alloca i32, align 4
%tmp2 = icmp slt i32 %a, %b
br i1 %tmp2, label %true, label %false
true:
store i32 %a, i32* %tmp, align 4
%tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
br label %false
false:
%tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
ret i32 %tmp.0
}
On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f: ; @f
; BB#0:
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
LBB0_2: ; %false
mov sp, x29
ldp x29, x30, [sp], #16
ret
With shrink-wrapping we could generate:
_f: ; @f
; BB#0:
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
add sp, x29, #16 ; =16
ldp x29, x30, [sp], #16
LBB0_2: ; %false
ret
Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.
** Proposed Solution **
This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.
Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.
The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.
Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.
** Design Decisions **
1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.
Differential Revision: http://reviews.llvm.org/D9210
<rdar://problem/3201744>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236507 91177308-0d34-0410-b5e6-96231b3b80d8
merge Thumb1RegisterInfo and Thumb2RegisterInfo. This will enable
us to match the TargetMachine for our TargetRegisterInfo classes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232117 91177308-0d34-0410-b5e6-96231b3b80d8
time. The target independent code was passing in one all the
time and targets weren't checking validity before using. Update
a few calls to pass in a MachineFunction where necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231970 91177308-0d34-0410-b5e6-96231b3b80d8
The main issue being fixed here is that APCS targets handling a "byval align N"
parameter with N > 4 were miscounting what objects were where on the stack,
leading to FrameLowering setting the frame pointer incorrectly and clobbering
the stack.
But byval handling had grown over many years, and had multiple layers of cruft
trying to compensate for each other and calculate padding correctly. This only
really needs to be done once, in the HandleByVal function. Elsewhere should
just do what it's told by that call.
I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits
byvals with the correct C ABI alignment), which simplified HandleByVal.
rdar://20095672
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231959 91177308-0d34-0410-b5e6-96231b3b80d8
Followup to r224294:
ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.
Debug info marks the first instruction without the FrameSetup flag
as being the end of the function prologue. Any CFI instructions in the
middle of the function prologue would cause debug info to end the prologue
too early and worse, attach the line number of the CFI instruction, which
incidentally is often 0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224743 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts r214893, re-applying r214881 with the test case relaxed a bit to
satiate the build bots.
POP on armv4t cannot be used to change thumb state (unilke later non-m-class
architectures), therefore we need a different return sequence that uses 'bx'
instead:
POP {r3}
ADD sp, #offset
BX r3
This patch also fixes an issue where the return value in r3 would get clobbered
for functions that return 128 bits of data. In that case, we generate this
sequence instead:
MOV ip, r3
POP {r3}
ADD sp, #offset
MOV lr, r3
MOV r3, ip
BX lr
http://reviews.llvm.org/D4748
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214928 91177308-0d34-0410-b5e6-96231b3b80d8
POP on armv4t cannot be used to change thumb state (unilke later non-m-class
architectures), therefore we need a different return sequence that uses 'bx'
instead:
POP {r3}
ADD sp, #offset
BX r3
This patch also fixes an issue where the return value in r3 would get clobbered
for functions that return 128 bits of data. In that case, we generate this
sequence instead:
MOV ip, r3
POP {r3}
ADD sp, #offset
MOV lr, r3
MOV r3, ip
BX lr
http://reviews.llvm.org/D4748
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214881 91177308-0d34-0410-b5e6-96231b3b80d8
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214838 91177308-0d34-0410-b5e6-96231b3b80d8
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.
The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.
The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.
I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.
The net result is that we don't create temporary labels that are never used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203204 91177308-0d34-0410-b5e6-96231b3b80d8
The changes caused by folding an sp-adjustment into a "pop" previously
disrupted the forward search for the final real instruction in a
terminating block. This switches to a backward search (skipping debug
instrs).
This fixes PR18399.
Patch by Zhaoshi.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199266 91177308-0d34-0410-b5e6-96231b3b80d8
The ARM backend has been using most of the MachO related subtarget
checks almost interchangeably, and since the only target it's had to
run on has been IOS (which is all three of MachO, Darwin and IOS) it's
worked out OK so far.
But we'd like to support embedded targets under the "*-*-none-macho"
triple, which means everything starts falling apart and inconsistent
behaviours emerge.
This patch should pick a reasonably sensible set of behaviours for the
new triple (and any others that come along, with luck). Some choices
were debatable (notably FP == r7 or r11), but we can revisit those
later when deficiencies become apparent.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198617 91177308-0d34-0410-b5e6-96231b3b80d8