Commit Graph

3043 Commits

Author SHA1 Message Date
Eli Friedman
251a136db4 [ARM] Prefer BIC over BFC in ARM mode.
BIC is generally faster, and it can put the output in a different
register from the input.

We already do this in Thumb2 mode; not sure why the equivalent fix
never got applied to ARM mode.

Differential Revision: https://reviews.llvm.org/D31797



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299803 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-07 22:01:23 +00:00
Diana Picus
f51d2756c0 [ARM] GlobalISel: Test hard float properly
It turns out -float-abi=hard doesn't set the hard float calling
convention for libcalls. We need to use a hard float triple instead
(e.g. gnueabihf).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299761 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-07 12:04:24 +00:00
Diana Picus
aa39cd364d [ARM] GlobalISel: Support frem for 64-bit values
Legalize to a libcall.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299756 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-07 10:50:02 +00:00
Diana Picus
3afaea7b42 [ARM] GlobalISel: Support frem for 32-bit values
Legalize to a libcall.
On this occasion, also start allowing soft float subtargets. For the
moment G_FREM is the only legal floating point operation for them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299753 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-07 09:41:39 +00:00
Eli Friedman
f25acacbe6 Turn on -addr-sink-using-gep by default.
The new codepath has been in the tree for years, and there isn't any
reason to use two codepaths here.

Differential Revision: https://reviews.llvm.org/D30596



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299723 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06 22:42:18 +00:00
Huihui Zhang
e184e1f870 [SelectionDAG] [ARM CodeGen] Fix chain information of LowerMUL
In LowerMUL, the chain information is not preserved for the new
created Load SDNode.

For example, if a Store alias with one of the operand of Mul.
The Load for that operand need to be scheduled before the Store.
The dependence is recorded in the chain of Store, in TokenFactor.
However, when lowering MUL, the SDNodes for the new Loads for
VMULL are not updated in the TokenFactor for the Store. Thus the
chain is not preserved for the lowered VMULL.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299701 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06 20:22:51 +00:00
Yi Kong
dc9458d5a7 Revert "[ARM] Add Kryo to available targets"
This reverts commit 942d6e6f58.

Build breakage.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299689 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06 19:16:14 +00:00
Nirav Dave
d4d2ab353e [SDAG] Fix visitAND optimization to deal with vector extract case again.
Summary:
Fix case elided by rL298920.

Fixes PR32545.

Reviewers: eli.friedman, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31759

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299688 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06 19:05:41 +00:00
Yi Kong
942d6e6f58 [ARM] Add Kryo to available targets
Summary:
Host CPU detection now supports Kryo, so we need to recognize it in ARM
target.

Reviewers: mcrosier, t.p.northover, rengolin, echristo, srhines

Reviewed By: t.p.northover, echristo

Subscribers: aemerson

Differential Revision: https://reviews.llvm.org/D31775

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299674 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06 18:10:08 +00:00
Sanjay Patel
a070c921e5 [DAGCombiner] add and use TLI hook to convert and-of-seteq / or-of-setne to bitwise logic+setcc (PR32401)
This is a generic combine enabled via target hook to reduce icmp logic as discussed in:
https://bugs.llvm.org/show_bug.cgi?id=32401

It's likely that other targets will want to enable this hook for scalar transforms, 
and there are probably other patterns that can use bitwise logic to reduce comparisons.

Note that we are missing an IR canonicalization for these patterns, and we will probably
prefer the pair-of-compares form in IR (shorter, more likely to fold).

Differential Revision: https://reviews.llvm.org/D31483



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299542 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-05 14:09:39 +00:00
Sanjay Patel
e20330959b add/move codegen tests for and/or of setcc; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299396 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-03 22:45:46 +00:00
Sanjay Patel
b476db57df [CodeGen] clean up and add tests for scalar and-of-setcc; NFC
https://bugs.llvm.org/show_bug.cgi?id=32401


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299034 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-29 21:58:52 +00:00
Javed Absar
47652291c2 Improve machine schedulers for in-order processors
This patch enables schedulers to specify instructions that 
cannot be issued with any other instructions.
It also fixes BeginGroup/EndGroup.

Reviewed by: Andrew Trick
Differential Revision: https://reviews.llvm.org/D30744



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298885 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-27 20:46:37 +00:00
Eli Friedman
fb446bff1e [ARM] Fix mixup between Lo and Hi in SMLALBB formation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298752 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-25 00:13:24 +00:00
Pirama Arumuga Nainar
cdc303e5ed [ARM] Fix computeKnownBits for ARMISD::CMOV
Summary:
The true and false operands for the CMOV are operands 0 and 1.
ARMISelLowering.cpp::computeKnownBits was looking at operands 1 and 2
instead.  This can cause CMOV instructions to be incorrectly folded into
BFI if value set by the CMOV is another CMOV, whose known bits are
computed incorrectly.

This patch fixes the issue and adds a test case.

Reviewers: kristof.beyls, jmolloy

Subscribers: llvm-commits, aemerson, srhines, rengolin

Differential Revision: https://reviews.llvm.org/D31265

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298624 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-23 16:47:47 +00:00
Volkan Keles
b5adc93d42 [GlobalISel] Fix shufflevector tests
clang-lld-x86_64-2stage fails because of the order
of the instructions. `CHECK-DAG` directives should
fix the problem.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298367 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-21 13:12:59 +00:00
Volkan Keles
de5c0c25c0 [GlobalISel] Translate shufflevector
Reviewers: qcolombet, aditya_nandakumar, t.p.northover, javed.absar, ab, dsanders

Reviewed By: javed.absar

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30962

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298347 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-21 08:44:13 +00:00
Vadzim Dambrouski
a734e28872 [ARM] Fix PR32130: Handle promotion of zero sized constants.
The special case of zero sized values was previously not handled correctly.
This patch handles this by not promoting if the size is zero.

Patch by Tim Neumann.

Differential Revision: https://reviews.llvm.org/D31116

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298320 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-20 22:59:57 +00:00
Diana Picus
0c1e456560 [GlobalISel] Use the correct calling conv for calls
This commit adds a parameter that lets us pass in the calling convention
of the call to CallLowering::lowerCall. This allows us to handle
situations where the calling convetion of the callee is different from
that of the caller.

Differential Revision: https://reviews.llvm.org/D31039

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298254 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-20 14:40:18 +00:00
Ahmed Bougacha
c86e4bce8e [GlobalISel] Don't select trivially dead instructions.
Folding instructions when selecting can cause them to become dead.
Don't select these dead instructions (if they don't have other side
effects, and don't define physical registers).

Preserve existing tests by adding COPYs.

In some tests, the G_CONSTANT vregs never get constrained to a class:
the only use of the vreg was folded into another instruction, so the
G_CONSTANT, now dead, never gets selected.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298224 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-19 16:13:00 +00:00
Eli Friedman
87552d6290 [SelectionDAG] Remove redundant stores more aggressively.
Handle TokenFactors more aggressively in
SDValue::reachesChainWithoutSideEffects.  This isn't really a
very effective change anymore because of other changes to
chain handling, but it's a cheap check, and the expanded
comments are still useful.

It might be possible to loosen the hasOneUse() requirement with a
deeper analysis, but a naive implementation of that check would be
expensive.

Differential Revision: https://reviews.llvm.org/D29845




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298156 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-17 22:15:50 +00:00
Eli Friedman
f5e1bc881e [ARM] Use alias analysis in ARMPreAllocLoadStoreOpt.
This allows the optimization to rearrange loads and stores more
aggressively. This doesn't really affect performance, but it helps
codesize.

Differential Revision: https://reviews.llvm.org/D30839



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298021 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-17 00:34:26 +00:00
Adrian Prantl
308a80b637 PR32288: More efficient encoding for DWARF expr subregister access.
Citing http://bugs.llvm.org/show_bug.cgi?id=32288

  The DWARF generated by LLVM includes this location:

  0x55 0x93 0x04 DW_OP_reg5 DW_OP_piece(4) When GCC's DWARF is simply
  0x55 (DW_OP_reg5) without the DW_OP_piece. I believe it's reasonable
  to assume the DWARF consumer knows which part of a register
  logically holds the value (low bytes, high bytes, how many bytes,
  etc) for a primitive value like an integer.

This patch gets rid of the redundant DW_OP_piece when a subregister is
at offset 0. It also adds previously missing subregister masking when
a subregister is followed by another operation.

(This reapplies r297960 with two additional testcase updates).

rdar://problem/31069390
https://reviews.llvm.org/D31010

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297965 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-16 17:14:56 +00:00
Jonas Paulsson
1b6f5a39a9 [SelectionDAG] Optimize VSELECT->SETCC of incompatible or illegal types.
Don't scalarize VSELECT->SETCC when operands/results needs to be widened,
or when the type of the SETCC operands are different from those of the VSELECT.

(VSELECT SETCC) and (VSELECT (AND/OR/XOR (SETCC,SETCC))) are handled.

The previous splitting of VSELECT->SETCC in DAGCombiner::visitVSELECT() is
no longer needed and has been removed.

Updated tests:

test/CodeGen/ARM/vuzp.ll
test/CodeGen/NVPTX/f16x2-instructions.ll
test/CodeGen/X86/2011-10-19-widen_vselect.ll
test/CodeGen/X86/2011-10-21-widen-cmp.ll
test/CodeGen/X86/psubus.ll
test/CodeGen/X86/vselect-pcmp.ll

Review: Eli Friedman, Simon Pilgrim
https://reviews.llvm.org/D29489

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297930 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-16 07:17:12 +00:00
Tim Northover
f4523b0efd ARM: avoid clobbering register in v6 jump-table expansion.
If we got unlucky with register allocation and actual constpool placement, we
could end up producing a tTBB_JT with an index that's already been clobbered.

Technically, we might be able to fix this situation up with a MOV, but I think
the constant islands pass is complex enough without having to deal with more
weird edge-cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297871 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-15 18:38:13 +00:00
Sam Parker
75f7c44e38 [ARM] Enable SMLAL[B|T] isel
Enable the selection of the 64-bit signed multiply accumulate
instructions which operate on 16-bit operands. These are enabled for
ARMv5TE onwards for ARM and for V6T2 and other DSP enabled Thumb
architectures.

Differential Revision: https://reviews.llvm.org/D30044



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297809 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-15 08:27:11 +00:00
Sam Parker
53c73db7b9 [ARM] Move SMULW[B|T] isel to DAG Combine
Create nodes for smulwb and smulwt and move their selection from
DAGToDAG to DAG combine. smlawb and smlawt can then be selected
using tablegen. Added some helper functions to detect shift patterns
as well as a wrapper around SimplifyDemandBits. Added a couple of
extra tests.

Differential Revision: https://reviews.llvm.org/D30708



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297716 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-14 09:13:22 +00:00
Nirav Dave
3bbf394145 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements

    Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297695 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-14 00:34:14 +00:00
Diana Picus
9ce81e13ed [ARM] GlobalISel: Support SP in regbankselect
We used to hit an unreachable in getRegBankFromRegClass when dealing with the
stack pointer. This commit adds support for the GPRsp reg class.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297621 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-13 14:28:34 +00:00
Eli Friedman
9ac562c933 [DAGCombine] Simplify ISD::AND in GetDemandedBits.
This helps in cases involving bitfields where an AND is exposed by
legalization.

Differential Revision: https://reviews.llvm.org/D30472



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297249 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-08 00:56:35 +00:00
Arnold Schwaighofer
c20464cdad SjLjEHPrepare: Fix the pass for swifterror arguments
We cannot leave the identity copies 'select true, arg, undef' that this pass
inserts for arguments to simplify handling of values on swifterror arguments.

swifterror arguments have restrictions on their uses.

rdar://30839288

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297197 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-07 20:29:02 +00:00
Ranjeet Singh
f33a699079 [ARM] Reapply r296865 "[ARM] fpscr read/write intrinsics not aware of each other""
The original patch r296865 was reverted as it broke the chromium builds for
Android https://bugs.llvm.org/show_bug.cgi?id=32134, this patch reapplies
r296865 with a fix to make sure it doesn't cause the build regression.

The problem was that intrinsic selection on int_arm_get_fpscr was failing in
ISel this was because the code to manually select this intrinsic still thought
it was the version with no side-effects (INTRINSIC_WO_CHAIN) which is wrong as
it doesn't semantically match the definition in the tablegen code which says it
does have side-effects, I've fixed this by updating the intrinsic type to
INTRINSIC_W_CHAIN (has side-effects). I've also added a test for this based on
Hans original reproducer.

Differential Revision: https://reviews.llvm.org/D30645



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297137 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-07 11:17:53 +00:00
Artyom Skrobov
f57d023cd1 In Thumb1, materialize a move between low registers as a movs, if CPSR isn't live.
Summary: Previously, it had always been materialized as a push/pop sequence.

Reviewers: labrinea, jroelofs

Reviewed By: jroelofs

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30648

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297134 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-07 09:38:16 +00:00
Tim Northover
2c87ca8a0e GlobalISel: restrict G_EXTRACT instruction to just one operand.
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297100 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-06 23:50:28 +00:00
Hans Wennborg
744959b431 Revert r296865 "[ARM] fpscr read/write intrinsics not aware of each other"
It caused PR32134: "Cannot select: intrinsic %llvm.arm.get.fpscr".

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296926 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-03 23:19:31 +00:00
Ranjeet Singh
82eec824a7 [ARM] fpscr read/write intrinsics not aware of each other
The intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr read and
write to the fpscr (Floating-Point Status and Control Register) register.

A bug exists in the __builtin_arm_get_fpscr intrinsic definition in llvm which
treats this intrinsic as a IntroNoMem which means it's not a memory access and
doesn't have any other side-effects. Having this property on this intrinsic
means that various optimizations can be done on this such as common
sub-expression elimination with other reads. This can cause issues if there has
been write to this register, e.g.

void foo(int *p) {
     p[0] = __builtin_arm_get_fpscr();
     __builtin_arm_set_fpscr(1);
     p[1] = __builtin_arm_get_fpscr();
}

in the above example the second read is currently CSE'd into the first read,
this is because llvm isn't aware that the write done by __builtin_arm_set_fpscr
effects the same register that __builtin_arm_get_fpscr reads from, to fix this
problem I've removed the property IntrNoMem so that __builtin_arm_get_fpscr is
treated as a memory access.

Differential Revision: https://reviews.llvm.org/D30542



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296865 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-03 11:40:07 +00:00
Chandler Carruth
f970832c3b [SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296862 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-03 10:02:25 +00:00
Eli Friedman
617c526c5c [ARM] Fix insert point for store rescheduling.
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation
as LastOp.

This patch fixes some cases where we would move stores to the wrong
insert point.

Re-commit with a fix to increment NumMove in the right place.

Differential Revision: https://reviews.llvm.org/D30124



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296815 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-02 21:39:39 +00:00
Sanjay Patel
d541a8113c [DAGCombiner] avoid assertion when folding binops with opaque constants
This bug was introduced with:
https://reviews.llvm.org/rL296699

There may be a way to loosen the restriction, but for now just bail out
on any opaque constant.

The tests show that opacity is target-specific. This goes back to cost
calculations in ConstantHoisting based on TTI->getIntImmCost().


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296768 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-02 17:18:56 +00:00
Eli Friedman
c0998936de Revert r296708; causing test failures on ARM hosts.
Original commit message:

[ARM] Fix insert point for store rescheduling.
    
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
    
This patch fixes some cases where we would sink stores for no reason.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296718 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-02 00:08:50 +00:00
Eli Friedman
f58f29327f [ARM] Fix insert point for store rescheduling.
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
    
This patch fixes some cases where we would sink stores for no reason.
    
Differential Revision: https://reviews.llvm.org/D30124



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296708 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-01 23:20:29 +00:00
Eli Friedman
fc70228004 [ARM] Check correct instructions for load/store rescheduling.
This code starts from the high end of the sorted vector of offsets, and
works backwards: it tries to find contiguous offsets, process them, then
pops them from the end of the vector. Most of the code agrees with this
order of processing, but one loop doesn't: it instead processes elements
from the low end of the vector (which are nodes with unrelated offsets).
Fix that loop to process the correct elements.
    
This has a few implications. One, we don't incorrectly return early when
processing multiple groups of offsets in the same block (which allows
rescheduling prera-ldst-insertpt.mir). Two, we pick the correct insert
point for loads, so they're correctly sorted (which affects the
scheduling of vldm-liveness.ll). I think it might also impact some of
the heuristics slightly.
    
Differential Revision: https://reviews.llvm.org/D30368



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296701 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-01 22:56:20 +00:00
Sanjay Patel
7685de201c [DAGCombiner] fold binops with constant into select-of-constants
This is part of the ongoing attempt to improve select codegen for all targets and select 
canonicalization in IR (see D24480 for more background). The transform is a subset of what
is done in InstCombine's FoldOpIntoSelect().

I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that 
hopes to convert more selects to basic math ops. This appears to be a general missing DAG
transform though, so I added tests for all standard binops in rL296621 
(PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86).

The poor output for "sel_constants_shl_constant" is tracked with:
https://bugs.llvm.org/show_bug.cgi?id=32105

Differential Revision: https://reviews.llvm.org/D30502


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296699 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-01 22:51:31 +00:00
Reid Kleckner
4c3428b604 Elide argument copies during instruction selection
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped.  This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.

This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.

Supersedes D28388

Fixes PR26328

Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29668

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296683 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-01 21:42:00 +00:00
Artur Pilipenko
a4e52312e8 [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336).

Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296651 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-01 18:12:29 +00:00
Diana Picus
efd69e6ab5 [ARM] GlobalISel: Lower call params that need extensions
Lower i1, i8 and i16 call parameters by extending them before storing them on
the stack. Also make sure we encode the correct, extended size in the
corresponding memory operand, and that we compute the correct stack size in the
end.

The latter is a bit more complicated because we used to compute the stack size
in the getStackAddress method, based on the Size and Offset of the parameters.
However, if the last parameter is sign extended, we'd be using the wrong,
non-extended size, and we'd end up with a smaller stack than we need to hold the
extended value. Instead of hacking this up based on the value of Size in
getStackAddress, we move our stack size handling logic to assignArg, where we
have access to the CCState which knows everything we could possibly want to know
about the stack. This way we don't need to duplicate any knowledge or resort to
any ugly hacks.

On this same occasion, update the IRTranslator test to check the sizes of the
stores everywhere, not just for sign extended paramteres.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296631 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-01 15:35:14 +00:00
Nirav Dave
bfdb3f2a5a In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296476 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 14:24:15 +00:00
Diana Picus
109b635928 [ARM] GlobalISel: Lower i32 and fp call parameters on the stack
Lower i32, float and double parameters that need to live on the stack. This
boils down to creating some G_GEPs starting from the stack pointer and storing
the values there. During the process we also keep track of the stack size and
use the final value in the ADJCALLSTACKDOWN/UP instructions.

We currently assert for smaller types, since they usually require extensions.
They will be handled in a separate patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296473 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 14:17:53 +00:00
Diana Picus
de6b1270fb [ARM] GlobalISel: Select 32-bit G_CONSTANT
Put it into a register by means of a MOVi.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296471 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 13:05:42 +00:00
Diana Picus
568082bafe [ARM] GlobalISel: Add mapping for G_CONSTANT
Like G_FRAME_INDEX, G_CONSTANT has one register operand and one non-register
operand.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296469 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 12:13:58 +00:00