537 Commits

Author SHA1 Message Date
Oliver Stannard
a04e9e4a0a [ARM] Generate consistent frame records for Thumb2
There is not an official documented ABI for frame pointers in Thumb2,
but we should try to emit something which is useful.

We use r7 as the frame pointer for Thumb code, which currently means
that if a function needs to save a high register (r8-r11), it will get
pushed to the stack between the frame pointer (r7) and link register
(r14). This means that while a stack unwinder can follow the chain of
frame pointers up the stack, it cannot know the offset to lr, so does
not know which functions correspond to the stack frames.

To fix this, we need to push the callee-saved registers in two batches,
with the first push saving the low registers, fp and lr, and the second
push saving the high registers. This is already implemented, but
previously only used for iOS. This patch turns it on for all Thumb2
targets when frame pointers are required by the ABI, and the frame
pointer is r7 (Windows uses r11, so this isn't a problem there). If
frame pointer elimination is enabled we still emit a single push/pop
even if we need a frame pointer for other reasons, to avoid increasing
code size.

We must also ensure that lr is pushed to the stack when using a frame
pointer, so that we end up with a complete frame record. Situations that
could cause this were rare, because we already push lr in most
situations so that we can return using the pop instruction.

Differential Revision: https://reviews.llvm.org/D23516



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279506 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-23 09:19:22 +00:00
Kyle Butt
a242cdcc77 Revert "CodeGen: If Convert blocks that would form a diamond when tail-merged."
This reverts commit 0fda93481c4231c06b838ef476c0c404c51ff875.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279288 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-19 18:17:04 +00:00
Kyle Butt
0fda93481c CodeGen: If Convert blocks that would form a diamond when tail-merged.
The following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.

If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.

Fixed a regression in the original commit. Need to un-reverse branches after
reversing them, or other conversions go awry.

Regression on self-hosting bots with no obvious explanation. Tidied up range
handling to be more obviously correct, but there was no smoking gun.

define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
        %tmp1434 = icmp eq i32 %a, %b           ; <i1> [#uses=1]
        br i1 %tmp1434, label %bb17, label %bb.outer

bb.outer:               ; preds = %cond_false, %entry
        %b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
        %a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
        br label %bb

bb:             ; preds = %cond_true, %bb.outer
        %indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
        %tmp. = sub i32 0, %b_addr.021.0.ph
        %tmp.40 = mul i32 %indvar, %tmp.
        %a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
        %tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
        br i1 %tmp3, label %cond_true, label %cond_false

cond_true:              ; preds = %bb
        %tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
        %tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
        %indvar.next = add i32 %indvar, 1
        br i1 %tmp1437, label %bb17, label %bb

cond_false:             ; preds = %bb
        %tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
        %tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
        br i1 %tmp14, label %bb17, label %bb.outer

bb17:           ; preds = %cond_false, %cond_true, %entry
        %a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
        ret i32 %a_addr.026.1
}

Without tail-merging or diamond-tail if conversion:
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ble     LBB1_3
@ BB#2:                                 @ %cond_true
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r0, r0, r1
        cmp     r1, r0
        it      ne
        cmpne   r0, r1
        bgt     LBB1_4
LBB1_3:                                 @ %cond_false
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r1, r1, r0
        cmp     r1, r0
        bne     LBB1_1
LBB1_4:                                 @ %bb17
        bx      lr

With diamond-tail if conversion, but without tail-merging:
@ BB#0:                                 @ %entry
        cmp     r0, r1
        it      eq
        bxeq    lr
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ite     le
        suble   r1, r1, r0
        subgt   r0, r0, r1
        cmp     r1, r0
        bne     LBB1_1
@ BB#2:                                 @ %bb17
        bx      lr

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279168 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-18 22:09:27 +00:00
Diana Picus
12fc2327af Revert "CodeGen: If Convert blocks that would form a diamond when tail-merged."
This reverts commit r278287.

This commit broke the clang-cmake-thumbv7-a15-full-sh bot.
See https://llvm.org/bugs/show_bug.cgi?id=28949

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278621 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-14 02:10:18 +00:00
Kyle Butt
3da1bfb213 CodeGen: If Convert blocks that would form a diamond when tail-merged.
The following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.

If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.

Fixed a regression in the original commit. Need to un-reverse branches after
reversing them, or other conversions go awry.

define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
        %tmp1434 = icmp eq i32 %a, %b           ; <i1> [#uses=1]
        br i1 %tmp1434, label %bb17, label %bb.outer

bb.outer:               ; preds = %cond_false, %entry
        %b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
        %a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
        br label %bb

bb:             ; preds = %cond_true, %bb.outer
        %indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
        %tmp. = sub i32 0, %b_addr.021.0.ph
        %tmp.40 = mul i32 %indvar, %tmp.
        %a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
        %tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
        br i1 %tmp3, label %cond_true, label %cond_false

cond_true:              ; preds = %bb
        %tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
        %tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
        %indvar.next = add i32 %indvar, 1
        br i1 %tmp1437, label %bb17, label %bb

cond_false:             ; preds = %bb
        %tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
        %tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
        br i1 %tmp14, label %bb17, label %bb.outer

bb17:           ; preds = %cond_false, %cond_true, %entry
        %a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
        ret i32 %a_addr.026.1
}

Without tail-merging or diamond-tail if conversion:
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ble     LBB1_3
@ BB#2:                                 @ %cond_true
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r0, r0, r1
        cmp     r1, r0
        it      ne
        cmpne   r0, r1
        bgt     LBB1_4
LBB1_3:                                 @ %cond_false
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r1, r1, r0
        cmp     r1, r0
        bne     LBB1_1
LBB1_4:                                 @ %bb17
        bx      lr

With diamond-tail if conversion, but without tail-merging:
@ BB#0:                                 @ %entry
        cmp     r0, r1
        it      eq
        bxeq    lr
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ite     le
        suble   r1, r1, r0
        subgt   r0, r0, r1
        cmp     r1, r0
        bne     LBB1_1
@ BB#2:                                 @ %bb17
        bx      lr

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278287 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-10 20:45:56 +00:00
Sam Parker
822ef54156 [ARM] Improve sxta{b|h} and uxta{b|h} tests
Created a Thumb2 predicated pattern matcher that uses Thumb2 and
HasT2ExtractPack and used it to redefine the patterns for sxta{b|h}
and uxta{b|h}. Also used the similar patterns to fill in isel pattern
gaps for the corresponding instructions in the ARM backend.
The patch is mainly changes to tests since most of this functionality
appears not to have been tested.

Differential Revision: https://reviews.llvm.org/D23273


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278207 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-10 09:34:34 +00:00
Nico Weber
29b5c03449 Revert r277905, it caused PR28894
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277962 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-07 20:18:04 +00:00
Kyle Butt
d9a9f7d3ac CodeGen: If Convert blocks that would form a diamond when tail-merged.
The following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.

If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.
define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
	%tmp1434 = icmp eq i32 %a, %b		; <i1> [#uses=1]
	br i1 %tmp1434, label %bb17, label %bb.outer

bb.outer:		; preds = %cond_false, %entry
	%b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
	%a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
	br label %bb

bb:		; preds = %cond_true, %bb.outer
	%indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
	%tmp. = sub i32 0, %b_addr.021.0.ph
	%tmp.40 = mul i32 %indvar, %tmp.
	%a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
	%tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
	br i1 %tmp3, label %cond_true, label %cond_false

cond_true:		; preds = %bb
	%tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
	%tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
	%indvar.next = add i32 %indvar, 1
	br i1 %tmp1437, label %bb17, label %bb

cond_false:		; preds = %bb
	%tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
	%tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
	br i1 %tmp14, label %bb17, label %bb.outer

bb17:		; preds = %cond_false, %cond_true, %entry
	%a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
	ret i32 %a_addr.026.1
}

Without tail-merging or diamond-tail if conversion:
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ble     LBB1_3
@ BB#2:                                 @ %cond_true
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r0, r0, r1
        cmp     r1, r0
        it      ne
        cmpne   r0, r1
        bgt     LBB1_4
LBB1_3:                                 @ %cond_false
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r1, r1, r0
        cmp     r1, r0
        bne     LBB1_1
LBB1_4:                                 @ %bb17
        bx      lr

With diamond-tail if conversion, but without tail-merging:
@ BB#0:                                 @ %entry
        cmp     r0, r1
        it      eq
        bxeq    lr
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ite     le
        suble   r1, r1, r0
        subgt   r0, r0, r1
        cmp     r1, r0
        bne     LBB1_1
@ BB#2:                                 @ %bb17
        bx      lr

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277905 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-06 01:52:37 +00:00
James Molloy
14cceb3342 [Thumb] Reapply r272251 with a fix for PR28348 (mk 2)
The important thing I was missing was ensuring newly added constants were kept in topological order. Repositioning the node is correct if the constant is newly added (so it has no topological ordering) but wrong if it already existed - positioning it next in the worklist would break the topological ordering.

Original commit message:
  [Thumb] Select a BIC instead of AND if the immediate can be encoded more optimally negated

  If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead;

    int i(int a) {
      return a & 0xfffffeec;
    }

  Used to produce:
      ldr r1, [CONSTPOOL]
      ands r0, r1
    CONSTPOOL: 0xfffffeec

  And now produces:
      movs    r1, #255
      adds    r1, #20  ; Less costly immediate generation
      bics    r0, r1

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274543 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-05 12:37:13 +00:00
James Molloy
298a9a8a84 Revert "[Thumb] Reapply r272251 with a fix for PR28348"
This reverts commit r274510 - it made green dragon unhappy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274512 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-04 17:14:24 +00:00
James Molloy
eb181e82b8 [Thumb] Reapply r272251 with a fix for PR28348
We were using DAG->getConstant instead of DAG->getTargetConstant. This meant that we could inadvertently increase the use count of a constant if stars aligned, which it did in this testcase. Increasing the use count of the constant could cause ISel to fall over (because DAGToDAG lowering assumed the constant had only one use!)

Original commit message:
  [Thumb] Select a BIC instead of AND if the immediate can be encoded more optimally negated

  If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead;

    int i(int a) {
      return a & 0xfffffeec;
    }

  Used to produce:
      ldr r1, [CONSTPOOL]
      ands r0, r1
    CONSTPOOL: 0xfffffeec

  And now produces:
      movs    r1, #255
      adds    r1, #20  ; Less costly immediate generation
      bics    r0, r1

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274510 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-04 16:35:41 +00:00
Kyle Butt
546c8eba34 Codegen: Fix broken assumption in Tail Merge.
Tail merge was making the assumption that a layout successor or
predecessor was always a cfg successor/predecessor. Remove that
assumption. Changes to tests are necessary because the errant cfg edges
were preventing optimizations.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273700 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-24 18:16:36 +00:00
Artur Pilipenko
1fa3fc6d89 Fix an old memset signature in 2009-09-01-PostRAProlog.ll test causing a buildbot failure
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273573 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-23 16:07:10 +00:00
Rafael Espindola
b72793375f Don't print (PLT) on arm.
The R_ARM_PLT32 relocation is deprecated and is not produced by MC.

This means that the code being deleted is dead from the .o point of
view and was making the .s more confusing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272909 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-16 16:09:53 +00:00
Tim Northover
8189e3d887 ARM: stop emitting blx instructions for most calls on MachO.
I'm really not sure why we were in the first place, it's the linker's job to
convert between BL/BLX as necessary. Even worse, using BLX left Thumb calls
that could be locally resolved completely unencodable since all offsets to BLX
are multiples of 4.

rdar://26182344

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269101 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-10 19:17:47 +00:00
Mandeep Singh Grang
efde4d38ac Fix PR26655: Bail out if all regs of an inst BUNDLE have the correct kill flag
Summary:
While setting kill flags on instructions inside a BUNDLE, we bail out as soon
as we set kill flag on a register.  But we are missing a check when all the
registers already have the correct kill flag set. We need to bail out in that
case as well.

This patch refactors the old code and simply makes use of the addRegisterKilled
function in MachineInstr.cpp in order to determine whether to set/remove kill
on an instruction.

Reviewers: apazos, t.p.northover, pete, MatzeB

Subscribers: MatzeB, davide, llvm-commits

Differential Revision: http://reviews.llvm.org/D17356

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269092 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-10 17:57:27 +00:00
Tim Northover
47cbd520cf ARM: use r7 as the frame-pointer on all MachO targets.
This is better for a few reasons:
  + It matches the other tooling for iOS.
  + It matches EABI in more cases (i.e. Thumb-mode, and in practice we don't
    use ARM mode).
  + It leads to infinitesimally smaller code (0.2%, yay!).

rdar://25369506

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266003 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-11 22:27:40 +00:00
Kyle Butt
b76dcf45e6 [Codegen] Decrease minimum jump table density.
Minimum density for both optsize and non optsize are now options
-sparse-jump-table-density (default 10) for non optsize functions
-dense-jump-table-density (default 40) for optsize functions, which
matches the current default. This improves several benchmarks at google
at the cost of a small codesize increase. For code compiled with -Os,
the old behavior continues

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264689 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-29 00:23:41 +00:00
Matthias Braun
299583523e ARM: Introduce conservative load/store optimization mode
Most of the time ARM has the CCR.UNALIGN_TRP bit set to false which
means that unaligned loads/stores do not trap and even extensive testing
will not catch these bugs. However the multi/double variants are not
affected by this bit and will still trap. In effect a more aggressive
load/store optimization will break existing (bad) code.

These bugs do not necessarily manifest in the broken code where the
misaligned pointer is formed but often later in perfectly legal code
where it is accessed. This means recompiling system libraries (which
have no alignment bugs) with a newer compiler will break existing
applications (with alignment bugs) that worked before.

So (under protest) I implemented this safe mode which limits the
formation of multi/double operations to cases that are not affected by
user code (stack operations like spills/reloads) or cases where the
normal operations trap anyway (floating point load/stores). It is
disabled by default.

Differential Revision: http://reviews.llvm.org/D17015

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262504 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-02 19:20:00 +00:00
Wei Mi
eafb39b656 [SCEV] Try to reuse existing value during SCEV expansion
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.

This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.

The original commit triggered regressions in Polly tests. The regressions
exposed two problems which have been fixed in current version.

1. Polly will generate a new function based on the old one. To generate an
instruction for the new function, it builds SCEV for the old instruction,
applies some tranformation on the SCEV generated, then expands the transformed
SCEV and insert the expanded value into new function. Because SCEV expansion
may reuse value cached in ExprValueMap, the value in old function may be
inserted into new function, which is wrong.
   In SCEVExpander::expand, there is a logic to check the cached value to
be used should dominate the insertion point. However, for the above
case, the check always passes. That is because the insertion point is
in a new function, which is unreachable from the old function. However
for unreachable node, DominatorTreeBase::dominates thinks it will be
dominated by any other node.
   The fix is to simply add a check that the cached value to be used in
expansion should be in the same function as the insertion point instruction.

2. When the SCEV is of scConstant type, expanding it directly is cheaper than
reusing a normal value cached. Although in the cached value set in ExprValueMap,
there is a Constant type value, but it is not easy to find it out -- the cached
Value set is not sorted according to the potential cost. Existing reuse logic
in SCEVExpander::expand simply chooses the first legal element from the cached
value set.
   The fix is that when the SCEV is of scConstant type, don't try the reuse
logic. simply expand it.

Differential Revision: http://reviews.llvm.org/D12090



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259736 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-04 01:27:38 +00:00
Wei Mi
dcbf7c311e Revert r259662, which caused regressions on polly tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259675 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-03 18:05:57 +00:00
Wei Mi
e32bfe25a3 [SCEV] Try to reuse existing value during SCEV expansion
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.

This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.

Differential Revision: http://reviews.llvm.org/D12090



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259662 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-03 17:05:12 +00:00
David Majnemer
2404e4b025 Address buildbot fallout from r259065
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259074 91177308-0d34-0410-b5e6-96231b3b80d8
2016-01-28 18:59:04 +00:00
Dan Gohman
f4e788949d [MC] Use .p2align instead of .align
For historic reasons, the behavior of .align differs between targets.
Fortunately, there are alternatives, .p2align and .balign, which make the
interpretation of the parameter explicit, and which behave consistently across
targets.

This patch teaches MC to use .p2align instead of .align, so that people reading
code for multiple architectures don't have to remember which way each platform
does its .align directive.

Differential Revision: http://reviews.llvm.org/D16549


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258750 91177308-0d34-0410-b5e6-96231b3b80d8
2016-01-26 00:03:25 +00:00
Bill Seurer
89c83dc08b Fix test case label check
Several (but not all) of the labels that are checked for in this test case
are checked as strings instead of labels.  This can cause an apparent test
case failure if they are tested in an appropriately named directory.

For example, one of them that fails:

define zeroext i32 @test2(i32 %A.u, i32 %B.u)  {
; A8: test2
; A8: uxtab  r0, r0, r1


Output that causes it to fail:

. . .
	.file	"/home/seurer/llvm/llvm-test2/test/CodeGen/Thumb2/thumb2-uxt_rot.ll"
. . .
	.globl	test2
	.align	1
	.type	test2,%function
	.code	16                      @ @test2
	.thumb_func
test2:
	.fnstart


The "A8: test2" matches on the directory name instead of the label.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253702 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-20 20:24:49 +00:00
Pete Cooper
6d024c616a Revert "Change memcpy/memset/memmove to have dest and source alignments."
This reverts commit r253511.

This likely broke the bots in
http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202
http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253543 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-19 05:56:52 +00:00
Pete Cooper
8b170f7f29 Change memcpy/memset/memmove to have dest and source alignments.
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

These intrinsics currently have an explicit alignment argument which is
required to be a constant integer.  It represents the alignment of the
source and dest, and so must be the minimum of those.

This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments.  The alignment
argument itself is removed.

There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe.  For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.

For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)

For out of tree owners, I was able to strip alignment from calls using sed by replacing:
  (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
  $1i1 false)

and similarly for memmove and memcpy.

I then added back in alignment to test cases which needed it.

A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.

In IRBuilder itself, a new argument was added.  Instead of calling:
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)

There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool.  This is to prevent isVolatile here from passing its default
parameter to the source alignment.

Note, changes in future can now be made to codegen.  I didn't change anything here, but this
change should enable better memcpy code sequences.

Reviewed by Hal Finkel.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253511 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-18 22:17:24 +00:00
Quentin Colombet
9eaef59528 [ARM] Enable shrink-wrapping by default.
Differential Revision: http://reviews.llvm.org/D14357

rdar://problem/21942589


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253411 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-18 00:40:54 +00:00
Renato Golin
0e66a5f53c Revert "[ARM] Enable shrink-wrapping by default."
This reverts commit r252825, as it broke ASAN on ARM. Investigating...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252889 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-12 13:34:50 +00:00
Matthias Braun
ea5575235a LegalizeDAG: Fix and improve FCOPYSIGN/FABS legalization
- Factor out code to query and modify the sign bit of a floatingpoint
  value as an integer. This also works if none of the targets integer
  types is big enough to hold all bits of the floatingpoint value.

- Legalize FABS(x) as FCOPYSIGN(x, 0.0) if FCOPYSIGN is available,
  otherwise perform bit manipulation on the sign bit. The previous code
  used "x >u 0 ? x : -x" which is incorrect for x being -0.0! It also
  takes 34 instructions on ARM Cortex-M4. With this patch we only
  require 5:
    vldr d0, LCPI0_0
    vmov r2, r3, d0
    lsrs r2, r3, #31
    bfi r1, r2, #31, #1
    bx lr
  (This could be further improved if the compiler would recognize that
   r2, r3 is zero).

- Only lower FCOPYSIGN(x, y) = sign(x) ? -FABS(x) : FABS(x) if FABS is
  available otherwise perform bit manipulation on the sign bit.

- Perform the sign(x) test by masking out the sign bit and comparing
  with 0 rather than shifting the sign bit to the highest position and
  testing for "<s 0". For x86 copysignl (on 80bit values) this gets us:
    testl $32768, %eax
  rather than:
    shlq $48, %rax
    sets %al
    testb %al, %al

Differential Revision: http://reviews.llvm.org/D11172

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252839 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-12 01:02:47 +00:00
Quentin Colombet
f241349619 [ARM] Enable shrink-wrapping by default.
Differential Revision: http://reviews.llvm.org/D14357

rdar://problem/21942589


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252825 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-11 23:31:46 +00:00
Akira Hatanaka
67f0f878cd [ARM] Handle t2ADDri in ARMAsmPrinter::EmitUnwindingInstruction.
This fixes a bug in ARMAsmPrinter::EmitUnwindingInstruction where
llvm_unreachable was reached because t2ADDri wasn't handled.

Test case provided by Tim Northover.

rdar://problem/23270609

http://reviews.llvm.org/D14518


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252557 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-10 00:10:41 +00:00
Artyom Skrobov
3685b697a2 [ARM] Renaming +t2dsp feature into +dsp, as discussed on llvm-dev
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251125 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-23 17:19:19 +00:00
Oliver Stannard
21348cfda2 [ARM] Use correct half-precision functions in EABI mode
The ARM RTABI defines the half- to single-precision float conversion functions
with an __aeabi prefix, but libgcc only has them with a __gnu prefix. Therefore
we need to emit the __aeabi version when compiling with an eabi or eabihf
triple, and the __gnu version with a gnueabi or gnueabihf triple.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249565 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-07 16:58:49 +00:00
Jeroen Ketema
874371d11b [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248887 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-30 10:56:37 +00:00
Cong Hou
759d988de3 Scaling up values in ARMBaseInstrInfo::isProfitableToIfCvt() before they are scaled by a probability to avoid precision issue.
In ARMBaseInstrInfo::isProfitableToIfCvt(), there is a simple cost model in which the number of cycles is scaled by a probability to estimate the cost. However, when the number of cycles is small (which is usually the case), there is a precision issue after the computation. To avoid this issue, this patch scales those cycles by 1024 (chosen to make the multiplication a litter faster) before they are scaled by the probability. Other variables are also scaled up for the final comparison.

Differential Revision: http://reviews.llvm.org/D12742



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248018 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-18 18:19:40 +00:00
Cong Hou
715dbbbc3c Distribute the weight on the edge from switch to default statement to edges generated in lowering switch.
Currently, when edge weights are assigned to edges that are created when lowering switch statement, the weight on the edge to default statement (let's call it "default weight" here) is not considered. We need to distribute this weight properly. However, without value profiling, we have no idea how to distribute it. In this patch, I applied the heuristic that this weight is evenly distributed to successors.

For example, given a switch statement with cases 1,2,3,5,10,11,20, and every edge from switch to each successor has weight 10. If there is a binary search tree built to test if n < 10, then its two out-edges will have weight 4x10+10/2 = 45 and 3x10 + 10/2 = 35 respectively (currently they are 40 and 30 without considering the default weight). Each distribution (which is 5 here) will be stored in each SwitchWorkListItem for further distribution.

There are some exceptions:

For a jump table header which doesn't have any edge to default statement, we don't distribute the default weight to it.
For a bit test header which covers a contiguous range and hence has no edges to default statement, we don't distribute the default weight to it.
When the branch checks a single value or a contiguous range with no edge to default statement, we don't distribute the default weight to it.
In other cases, the default weight is evenly distributed to successors.

Differential Revision: http://reviews.llvm.org/D12418



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246522 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-01 01:42:16 +00:00
Matthias Braun
361054b1fa ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2
Re-apply r241926 with an additional check that r13 and r15 are not used
for LDRD/STRD. See http://llvm.org/PR24190. This also already includes
the fix from r241951.

Differential Revision: http://reviews.llvm.org/D10623

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242742 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-21 00:18:59 +00:00
Matthias Braun
f87866e744 Revert "ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2"
This reverts commit r241926. This caused http://llvm.org/PR24190

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242735 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-20 23:17:20 +00:00
Matthias Braun
0dec0e1ea5 ARM: Add scheduling information for LDRLIT instructions to swift scheduling model
These pseudo instructions are only lowered after register allocation and
are therefore still present when the machine scheduler runs.
Add a run: line to a testcase that uses the uncommon flags necessary to
actually produce a LDRLIT instruction on swift.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242587 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-17 23:18:26 +00:00
Matthias Braun
650d9427f0 Arm: Don't define a label twice with two setjmps in a function.
Constructing a name based on the function name didn't give us a unique
symbol if we had more than one setjmp in a function. Using
MCContext::createTempSymbol() always gives us a unique name.

Differential Revision: http://reviews.llvm.org/D9314

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242482 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-16 22:34:20 +00:00
Alexey Bataev
a018099669 [SDAG] Optimize unordered comparison in soft-float mode (patch by Anton Nadolskiy)
Current implementation handles unordered comparison poorly in soft-float mode. 
Consider (a ULE b) which is a <= b. It is lowered to (ledf2(a, b) <= 0 || unorddf2(a, b) != 0) (in general). We can do better job by lowering it to (__gtdf2(a, b) <= 0). 
Such replacement is true for other CMP's (ult, ugt, uge). In general, we just call same function as for ordered case but negate comparison against zero.
Differential Revision: http://reviews.llvm.org/D10804


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242280 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-15 08:39:35 +00:00
Matthias Braun
21910d8bd5 Revert "LegalizeDAG: Fix and improve FCOPYSIGN/FABS legalization"
Accidental commit, needs review first.

This reverts commit r242107.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242108 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-14 02:09:57 +00:00
Matthias Braun
2a46e4c020 LegalizeDAG: Fix and improve FCOPYSIGN/FABS legalization
- Factor out code to query and modify the sign bit of a floatingpoint
  value as an integer. This also works if none of the targets integer
  types is big enough to hold all bits of the floatingpoint value.

- Legalize FABS(x) as FCOPYSIGN(x, 0.0) if FCOPYSIGN is available,
  otherwise perform bit manipulation on the sign bit. The previous code
  used "x >u 0 ? x : -x" which is incorrect for x being -0.0! It also
  takes 34 instructions on ARM Cortex-M4. With this patch we only
  require 5:
    vldr d0, LCPI0_0
    vmov r2, r3, d0
    lsrs r2, r3, #31
    bfi r1, r2, #31, #1
    bx lr
  (This could be further improved if the compiler would recognize that
   r2, r3 is zero).

- Only lower FCOPYSIGN(x, y) = sign(x) ? -FABS(x) : FABS(x) if FABS is
  available otherwise perform bit manipulation on the sign bit.

- Perform the sign(x) test by masking out the sign bit and comparing
  with 0 rather than shifting the sign bit to the highest position and
  testing for "<s 0". For x86 copysignl (on 80bit values) this gets us:
    testl $32768, %eax
  rather than:
    shlq $48, %rax
    sets %al
    testb %al, %al

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242107 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-14 02:08:26 +00:00
Matthias Braun
02e89ace70 ARMLoadStoreOptimizer: Create LDRD/STRD on thumb2
Differential Revision: http://reviews.llvm.org/D10623

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241926 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-10 18:28:49 +00:00
Matthias Braun
7d46df3626 ARMLoadStoreOptimizer: Fix errata 602117 handling and make testcase actually test for it
This fixes PR23912

Differential Revision: http://reviews.llvm.org/D10620

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240582 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-24 20:03:27 +00:00
David Majnemer
cc714e2142 Move the personality function from LandingPadInst to Function
The personality routine currently lives in the LandingPadInst.

This isn't desirable because:
- All LandingPadInsts in the same function must have the same
  personality routine.  This means that each LandingPadInst beyond the
  first has an operand which produces no additional information.

- There is ongoing work to introduce EH IR constructs other than
  LandingPadInst.  Moving the personality routine off of any one
  particular Instruction and onto the parent function seems a lot better
  than have N different places a personality function can sneak onto an
  exceptional function.

Differential Revision: http://reviews.llvm.org/D10429

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239940 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-17 20:52:32 +00:00
Matthias Braun
e942914d29 ARM: Thumb2 LDRD/STRD supports independent input/output regs
The existing code would unnecessarily break LDRD/STRD apart with
non-adjacent registers, on thumb2 this is not necessary.

Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore
as there is not reason to set register hints anymore, changing that is
something for a future patch however.

Differential Revision: http://reviews.llvm.org/D9694

Recommiting after the revert in r238821, the buildbot still failed with
the patch removed so there seems to be another reason for the breakage.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238935 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-03 16:30:24 +00:00
Renato Golin
6b35bec8ef Revert "ARM: Thumb2 LDRD/STRD supports independent input/output regs"
This reverts commit r238795, as it broke the Thumb2 self-hosting buildbot.

Since self-hosting issues with Clang are hard to investigate, I'm taking the
liberty to revert now, so we can investigate it offline.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238821 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-02 11:47:30 +00:00
Matthias Braun
d421582e90 ARM: Thumb2 LDRD/STRD supports independent input/output regs
The existing code would unnecessarily break LDRD/STRD apart with
non-adjacent registers, on thumb2 this is not necessary.

Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore
as there is not reason to set register hints anymore, changing that is
something for a future patch however.

Differential Revision: http://reviews.llvm.org/D9694

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238795 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-01 23:27:08 +00:00