9025 Commits

Author SHA1 Message Date
Nirav Dave
3bbf394145 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements

    Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297695 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-14 00:34:14 +00:00
Artyom Skrobov
7a06df3cf9 [Thumb1] combine ADDC/SUBC with a negative immediate
Summary: This simple optimization has been split out of https://reviews.llvm.org/D30400

Reviewers: efriedma, jmolloy

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30829

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297682 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-13 22:36:14 +00:00
Diana Picus
9ce81e13ed [ARM] GlobalISel: Support SP in regbankselect
We used to hit an unreachable in getRegBankFromRegClass when dealing with the
stack pointer. This commit adds support for the GPRsp reg class.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297621 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-13 14:28:34 +00:00
Sjoerd Meijer
f106d176a5 ARMDisassembler: loop over ARM decode tables
Loop over the ARM decode tables; this is a clean-up to reduce some code
duplication.

Differential Revision: https://reviews.llvm.org/D30814


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297608 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-13 09:41:10 +00:00
Artyom Skrobov
a4a753ce0b imm_comp_XFORM (defined in ARMInstrThumb.td) duplicates imm_not_XFORM (defined in ARMInstrInfo.td)
Reviewers: grosbach, rengolin, jmolloy

Reviewed By: jmolloy

Subscribers: aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D30782

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297456 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-10 13:21:12 +00:00
Artyom Skrobov
180a200844 Refactor the multiply-accumulate combines to act on
ARMISD::ADD[CE] nodes, instead of the generic ISD::ADD[CE].

Summary:
This allows for some simplification because the combines
are no longer limited to just one go at the node before
it gets legalized into an ARM target-specific one.

Reviewers: jmolloy, rogfer01

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30401

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297453 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-10 12:41:33 +00:00
Artyom Skrobov
c2bf545fa9 For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,
same as already done for ARM and Thumb2.

Reviewers: jmolloy, rogfer01, efriedma

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30400

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297443 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-10 07:40:27 +00:00
Sjoerd Meijer
a992848024 [ARM] remove FIXMEs and add vcmp MC test
Minor cleanup in ARMInstrVFP.td: removed some FIXMEs and added a MC test for
vcmp that was actually missing.

Differential Revision: https://reviews.llvm.org/D30745


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297376 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-09 13:28:37 +00:00
John Brawn
2a998c0c51 [ARM] Correct handling of LSL #0 in an IT block
The check for LSL #0 in an IT block was checking if operand 4 was zero, but
operand 4 is the condition code operand so it was actually checking for LSLEQ.
Fix this by checking operand 3, which really is the immediate operand, and add
some tests.

Differential Revision: https://reviews.llvm.org/D30692


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297142 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-07 14:42:03 +00:00
Ranjeet Singh
f33a699079 [ARM] Reapply r296865 "[ARM] fpscr read/write intrinsics not aware of each other""
The original patch r296865 was reverted as it broke the chromium builds for
Android https://bugs.llvm.org/show_bug.cgi?id=32134, this patch reapplies
r296865 with a fix to make sure it doesn't cause the build regression.

The problem was that intrinsic selection on int_arm_get_fpscr was failing in
ISel this was because the code to manually select this intrinsic still thought
it was the version with no side-effects (INTRINSIC_WO_CHAIN) which is wrong as
it doesn't semantically match the definition in the tablegen code which says it
does have side-effects, I've fixed this by updating the intrinsic type to
INTRINSIC_W_CHAIN (has side-effects). I've also added a test for this based on
Hans original reproducer.

Differential Revision: https://reviews.llvm.org/D30645



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297137 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-07 11:17:53 +00:00
Artyom Skrobov
f57d023cd1 In Thumb1, materialize a move between low registers as a movs, if CPSR isn't live.
Summary: Previously, it had always been materialized as a push/pop sequence.

Reviewers: labrinea, jroelofs

Reviewed By: jroelofs

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30648

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297134 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-07 09:38:16 +00:00
Tim Northover
2c87ca8a0e GlobalISel: restrict G_EXTRACT instruction to just one operand.
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297100 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-06 23:50:28 +00:00
Krzysztof Parzyszek
88a7ff46b1 Make TargetInstrInfo::isPredicable take a const reference, NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296901 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-03 18:30:54 +00:00
Chandler Carruth
f970832c3b [SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296862 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-03 10:02:25 +00:00
Eli Friedman
617c526c5c [ARM] Fix insert point for store rescheduling.
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation
as LastOp.

This patch fixes some cases where we would move stores to the wrong
insert point.

Re-commit with a fix to increment NumMove in the right place.

Differential Revision: https://reviews.llvm.org/D30124



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296815 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-02 21:39:39 +00:00
Matthew Simpson
201896c9fd [ARM/AArch64] Update costs for interleaved accesses with wide types
After r296750, we're able to match interleaved accesses having types wider than
128 bits. This patch updates the associated TTI costs.

Differential Revision: https://reviews.llvm.org/D29675

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296751 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-02 15:15:35 +00:00
Matthew Simpson
6523b45d2e [ARM/AArch64] Support wide interleaved accesses
This patch teaches (ARM|AArch64)ISelLowering.cpp to match illegal vector types
to interleaved access intrinsics as long as the types are multiples of the
vector register width. A "wide" access will now be mapped to multiple
interleave intrinsics similar to the way in which non-interleaved accesses with
illegal types are legalized into multiple accesses. I'll update the associated
TTI costs (in getInterleavedMemoryOpCost) as a follow-on.

Differential Revision: https://reviews.llvm.org/D29466

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296750 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-02 15:11:20 +00:00
Eli Friedman
c0998936de Revert r296708; causing test failures on ARM hosts.
Original commit message:

[ARM] Fix insert point for store rescheduling.
    
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
    
This patch fixes some cases where we would sink stores for no reason.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296718 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-02 00:08:50 +00:00
Eli Friedman
f58f29327f [ARM] Fix insert point for store rescheduling.
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
    
This patch fixes some cases where we would sink stores for no reason.
    
Differential Revision: https://reviews.llvm.org/D30124



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296708 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-01 23:20:29 +00:00
Eli Friedman
fc70228004 [ARM] Check correct instructions for load/store rescheduling.
This code starts from the high end of the sorted vector of offsets, and
works backwards: it tries to find contiguous offsets, process them, then
pops them from the end of the vector. Most of the code agrees with this
order of processing, but one loop doesn't: it instead processes elements
from the low end of the vector (which are nodes with unrelated offsets).
Fix that loop to process the correct elements.
    
This has a few implications. One, we don't incorrectly return early when
processing multiple groups of offsets in the same block (which allows
rescheduling prera-ldst-insertpt.mir). Two, we pick the correct insert
point for loads, so they're correctly sorted (which affects the
scheduling of vldm-liveness.ll). I think it might also impact some of
the heuristics slightly.
    
Differential Revision: https://reviews.llvm.org/D30368



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296701 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-01 22:56:20 +00:00
Diana Picus
1896046c88 clang-format r296631
Apparently I forgot to run it after fixing up some things...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296634 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-01 15:54:21 +00:00
Diana Picus
efd69e6ab5 [ARM] GlobalISel: Lower call params that need extensions
Lower i1, i8 and i16 call parameters by extending them before storing them on
the stack. Also make sure we encode the correct, extended size in the
corresponding memory operand, and that we compute the correct stack size in the
end.

The latter is a bit more complicated because we used to compute the stack size
in the getStackAddress method, based on the Size and Offset of the parameters.
However, if the last parameter is sign extended, we'd be using the wrong,
non-extended size, and we'd end up with a smaller stack than we need to hold the
extended value. Instead of hacking this up based on the value of Size in
getStackAddress, we move our stack size handling logic to assignArg, where we
have access to the CCState which knows everything we could possibly want to know
about the stack. This way we don't need to duplicate any knowledge or resort to
any ugly hacks.

On this same occasion, update the IRTranslator test to check the sizes of the
stores everywhere, not just for sign extended paramteres.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296631 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-01 15:35:14 +00:00
Oliver Stannard
b6dfd8e2c3 [ARM] Fix parsing of special register masks
This parsing code was incorrectly checking for invalid characters, so an
invalid instruction like:
  msr spsr_w, r0
would be emitted as:
  msr spsr_cxsf, r0

Differential revision: https://reviews.llvm.org/D30462



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296607 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-01 10:51:04 +00:00
Eli Friedman
a918957b28 [ARM] Don't generate deprecated T1 STM.
This prevents generating stm r1!, {r0, r1} on Thumb1, where value
stored for r1 is UNKONWN.

Patch by Zhaoshi Zheng.

Differential Revision: https://reviews.llvm.org/D27910



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296538 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 23:32:55 +00:00
Nirav Dave
bfdb3f2a5a In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296476 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 14:24:15 +00:00
Diana Picus
109b635928 [ARM] GlobalISel: Lower i32 and fp call parameters on the stack
Lower i32, float and double parameters that need to live on the stack. This
boils down to creating some G_GEPs starting from the stack pointer and storing
the values there. During the process we also keep track of the stack size and
use the final value in the ADJCALLSTACKDOWN/UP instructions.

We currently assert for smaller types, since they usually require extensions.
They will be handled in a separate patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296473 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 14:17:53 +00:00
Diana Picus
de6b1270fb [ARM] GlobalISel: Select 32-bit G_CONSTANT
Put it into a register by means of a MOVi.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296471 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 13:05:42 +00:00
Diana Picus
568082bafe [ARM] GlobalISel: Add mapping for G_CONSTANT
Like G_FRAME_INDEX, G_CONSTANT has one register operand and one non-register
operand.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296469 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 12:13:58 +00:00
Diana Picus
e855e8c475 [ARM] GlobalISel: Legalize 32-bit constants
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296468 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 11:33:46 +00:00
Diana Picus
77b493ab59 [ARM] GlobalISel: Select G_GEP
At this point, G_GEP is just an add, so we treat it exactly like a G_ADD.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296462 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 10:14:38 +00:00
Oliver Stannard
c492675c31 [ARM] Diagnose PC-writing instructions in IT blocks
In Thumb2, instructions which write to the PC are UNPREDICTABLE if they are in
an IT block but not the last instruction in the block.

Previously, we only diagnosed this for LDM instructions, this patch extends the
diagnostic to cover all of the relevant instructions.

Differential Revision: https://reviews.llvm.org/D30398



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296459 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 10:04:36 +00:00
Diana Picus
76b7c3efa9 [ARM] GlobalISel: Add reg bank mapping for G_GEP
This should be the same as the mapping for G_ADD etc.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296455 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 09:35:10 +00:00
Diana Picus
761fac3e9b [ARM] GlobalISel: Legalize G_GEP with 32-bit offsets
At the moment we're only interested in GEPs for putting call parameters on the
stack, so we'll stick to 32-bit offsets.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296452 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 09:02:42 +00:00
Sanjay Patel
2c5b9e376b [ARM] don't transform an add(ext Cond), C to select unless there's a setcc of the condition
The transform in question claims to be doing:

// fold (add (select cc, 0, c), x) -> (select cc, x, (add, x, c))

...starting in PerformADDCombineWithOperands(), but it wasn't actually checking for a setcc node
for the sext/zext patterns.

This is exactly the opposite of a transform I'd like to add to DAGCombiner's foldSelectOfConstants(),
so I was seeing infinite loops with my draft of a patch applied.

The changes in select_const.ll look positive (less instructions). The change in arm-and-tst-peephole.ll
is unrelated. We're changing the input IR in that test to preserve the intent of the test, but that's 
not affected by this code change.

Differential Revision:
https://reviews.llvm.org/D30355



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296389 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-27 21:30:54 +00:00
John Brawn
4042b8120e [ARM] LSL #0 is an alias of MOV
Currently we handle this correctly in arm, but in thumb we don't which leads to
an unpredictable instruction being emitted for LSL #0 in an IT block and SP not
being permitted in some cases when it should be.

For the thumb2 LSL we can handle this by making LSL #0 an alias of MOV in the
.td file, but for thumb1 we need to handle it in checkTargetMatchPredicate to
get the IT handling right. We also need to adjust the handling of
MOV rd, rn, LSL #0 to avoid generating the 16-bit encoding in an IT block. We
should also adjust it to allow SP in the same way that it is allowed in
MOV rd, rn, but I haven't done that here because it looks like it would take
quite a lot of work to get right.

Additionally correct the selection of the 16-bit shift instructions in
processInstruction, where it was checking if the two registers were equal when
it should have been checking if they were low. It appears that previously this
code was never executed and the 16-bit encoding was selected by default, but
the other changes I've done here have somehow made it start being used.

Differential Revision: https://reviews.llvm.org/D30294


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296342 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-27 14:40:51 +00:00
Nirav Dave
b89cc7e5e3 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296279 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-26 01:27:32 +00:00
Nirav Dave
32147cef64 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296252 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-25 11:43:58 +00:00
Diana Picus
27abbacf66 [ARM] GlobalISel: Select G_STORE
Same as selecting G_LOAD.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296122 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-24 14:01:27 +00:00
Diana Picus
11601c0bd3 [ARM] GlobalISel: Add reg bank mappings for stores
Same as the ones for loads.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296115 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-24 13:07:25 +00:00
Diana Picus
29289da775 [ARM] GlobalISel: Legalize stores
Allow the same types that we allow for loads.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296108 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-24 11:28:24 +00:00
Diana Picus
269c5fd3e9 Revert "[ARM] GlobalISel: Legalize stores"
This reverts commit r296103 because the test broke on one of the bots. Sorry!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296104 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-24 10:35:39 +00:00
Diana Picus
03012a011d [ARM] GlobalISel: Legalize stores
Allow the same types that we allow for loads.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296103 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-24 10:19:23 +00:00
Tim Northover
a328146a75 ARM: make sure FastISel bails on f64 operations for Cortex-M4.
FastISel wasn't checking the isFPOnlySP subtarget feature before emitting
double-precision operations, so it got completely invalid CodeGen for doubles
on Cortex-M4F.

The normal ISel testing wasn't spectacular either so I added a second RUN line
to improve that while I was in the area.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296031 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-23 22:35:00 +00:00
Diana Picus
31d09e83a5 [ARM] GlobalISel: Lower call returns
Introduce a common ValueHandler for call returns and formal arguments, and
inherit two different versions for handling the differences (at the moment the
only difference is the way physical registers are marked as used).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295973 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-23 14:18:41 +00:00
Diana Picus
5e98318841 [ARM] GlobalISel: Lower call parameters in regs
Add support for lowering calls with parameters than can fit into regs.  Use the
same ValueHandler that we used for function returns, but rename it to match its
new, extended purpose.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295971 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-23 13:25:43 +00:00
Kristof Beyls
93418eeb1d Fix assertion failure in ARMConstantIslandPass.
The ARMConstantIslandPass didn't have support for handling accesses to
constant island objects through ARM::t2LDRBpci instructions. This adds
support for that.

This fixes PR31997.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295964 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-23 12:24:55 +00:00
Roger Ferrer Ibanez
296acceca1 [ARM] Fix constant islands pass.
The pass tries to fix a spill of LR that turns out to be unnecessary.
So it removes the tPOP but forgets to remove tPUSH.
This causes the stack be misaligned upon returning the function.

Thus, remove the tPUSH as well in this case.

Differential Revision: https://reviews.llvm.org/D30207



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295816 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-22 09:06:21 +00:00
Javed Absar
6badf185b6 [ARM] Classification Improvements to ARM Sched-Models. NFCI.
This patch adds missing sched classes for Thumb2 instructions.
This has been missing so far, and as a consequence, machine
scheduler models for individual sub-targets have tended to
be larger than they needed to be. These patches should help
write schedulers better and faster in the future
for ARM sub-targets.

Reviewer: Diana Picus
Differential Revision: https://reviews.llvm.org/D29953



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295811 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-22 07:22:57 +00:00
Dan Gohman
d660a5d68c [WebAssembly] Add skeleton MC support for the Wasm container format
This just adds the basic skeleton for supporting a new object file format.
All of the actual encoding will be implemented in followup patches.

Differential Revision: https://reviews.llvm.org/D26722


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295803 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-22 01:23:18 +00:00
Evgeniy Stepanov
2c0dd61dd0 Fix PR31896.
Address of an alias of a global with offset is incorrectly lowered as an address of the global (i.e. ignoring offset).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295762 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-21 20:17:34 +00:00