3431 Commits

Author SHA1 Message Date
Nirav Dave
3bbf394145 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements

    Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297695 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-14 00:34:14 +00:00
Tim Northover
d0188c3d44 Revert "GlobalISel: move vector extract/insert inside generic opcode region."
I was writing against an earlier branch and Volkan had already fixed this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297668 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-13 21:25:10 +00:00
Tim Northover
6ac2f2bb7f GlobalISel: move vector extract/insert inside generic opcode region.
Otherwise they won't be legalized or selected, causing instruction selection to
fail horribly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297666 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-13 21:18:59 +00:00
Volkan Keles
7fdb926c14 [GlobalISel] Update PRE_ISEL_GENERIC_OPCODE_END marker
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297663 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-13 20:31:45 +00:00
Jessica Paquette
af024ba867 [Outliner] Add tail call support
This commit adds tail call support to the MachineOutliner pass. This allows
the outliner to insert jumps rather than calls in areas where tail calling is
possible. Outlined tail calls include the return or terminator of the basic
block being outlined from.

Tail call support allows the outliner to take returns and terminators into
consideration while finding candidates to outline. It also allows the outliner
to save more instructions. For example, in the X86-64 outliner, a tail called
outlined function saves one instruction since no return has to be inserted.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297653 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-13 18:39:33 +00:00
Craig Topper
27a7060a36 [SelectionDAG] Enhance SDTCisSameNumEltsAs to work with scalar types and use it on extend/trunc/round operations.
Currently we don't enforce that ISD::ANY_EXTEND, ZERO_EXTEND, SIGN_EXTEND, TRUNC, FP_ROUND, FP_EXTEND have the same number of elements(including scalar) between their input and output. Though we have them documented as such. Up until a few months ago x86 created nodes that violated this rule. That's all been fixed now, and we should enforce the rule going forward.

In order to do this we need to allow SDTCisSameNumEltsAs to support scalar types and not enforce being a vector. If one type is scalar we will force the other type to also be scalar.

Differential Revision: https://reviews.llvm.org/D30878



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297648 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-13 17:37:14 +00:00
Volkan Keles
27b5bfd0d0 [GlobalISel] Translate insertelement and extractelement
Reviewers: qcolombet, aditya_nandakumar, dsanders, ab, t.p.northover, javed.absar

Reviewed By: qcolombet

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30761

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297495 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-10 19:08:28 +00:00
Eli Friedman
bfa11453a9 Refactor alias check from MISched into common helper. NFC.
Differential Revision: https://reviews.llvm.org/D30598



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297421 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-09 23:33:36 +00:00
Volkan Keles
0e1e54e6f2 [GlobalISel] Translate floating-point negation
Reviewers: qcolombet, javed.absar, aditya_nandakumar, dsanders, t.p.northover, ab

Reviewed By: qcolombet

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30671

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297171 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-07 18:03:28 +00:00
Tim Northover
2c87ca8a0e GlobalISel: restrict G_EXTRACT instruction to just one operand.
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297100 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-06 23:50:28 +00:00
Volkan Keles
6f84539d5d [GlobalISel] Fix G_FPEXT’s description. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297088 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-06 22:47:19 +00:00
Jessica Paquette
d43adee378 [Outliner] Fixed Asan bot failure in r296418
Fixed the asan bot failure which led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector
is no longer initialized using reserve but just a standard constructor.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297081 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-06 21:31:18 +00:00
Simon Pilgrim
ca6750e3d5 [X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.

We're missing a couple of shuffle combines that will be added in a future patch for review.

Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.

Differential Revision: https://reviews.llvm.org/D30549

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296985 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-05 09:57:20 +00:00
Sanjay Patel
59071e49a9 [DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.

This is part of the ongoing process to obsolete D24480.  The motivation is to 
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
 
PowerPC is an obvious winner for this kind of transform because it has fast and complete 
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).

x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 
changes just shows that that code is a mess of special-case holes. We may be able to remove 
some of that logic now.

My guess is that other targets will want to enable this hook for most cases. The likely 
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to 
math/logic, but the general transform for any 2 constants needs one more instruction 
(multiply or 'and').

ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.

Differential Revision: https://reviews.llvm.org/D30537


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296977 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-04 19:18:09 +00:00
Tim Northover
ce6504cc93 GlobalISel: constrain G_INSERT to inserting just one value per instruction.
It's much easier to reason about single-value inserts and no-one was actually
using the variadic variants before.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296923 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-03 23:05:47 +00:00
Tim Northover
f4d294cc96 GlobalISel: add merge/unmerge nodes for legalization.
These are simplified variants of the current G_SEQUENCE and G_EXTRACT, which
assume the individual parts will be contiguous, homogeneous, and occupy the
entirity of the larger register. This makes reasoning about them much easer
since you only have to look at the first register being merged and the result
to know what the instruction is doing.

I intend to gradually replace all uses of the more complicated sequence/extract
with these (or single-element insert/extracts), and then remove the older
variants. For now we start with legalization.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296921 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-03 22:46:09 +00:00
Krzysztof Parzyszek
88a7ff46b1 Make TargetInstrInfo::isPredicable take a const reference, NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296901 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-03 18:30:54 +00:00
Chandler Carruth
f970832c3b [SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296862 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-03 10:02:25 +00:00
Reid Kleckner
4c3428b604 Elide argument copies during instruction selection
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped.  This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.

This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.

Supersedes D28388

Fixes PR26328

Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29668

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296683 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-01 21:42:00 +00:00
Nirav Dave
bfdb3f2a5a In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296476 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 14:24:15 +00:00
Matthias Braun
73ddbb7dff Revert "Add MIR-level outlining pass"
Revert Machine Outliner for now, as it breaks the asan bot.

This reverts commit r296418.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296426 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 02:24:30 +00:00
Matthias Braun
c043a889f1 Add MIR-level outlining pass
This is a patch for the outliner described in the RFC at:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html

The outliner is a code-size reduction pass which works by finding
repeated sequences of instructions in a program, and replacing them with
calls to functions. This is useful to people working in low-memory
environments, where sacrificing performance for space is acceptable.

This adds an interprocedural outliner directly before printing assembly.
For reference on how this would work, this patch also includes X86
target hooks and an X86 test.

The outliner is run like so:

clang -mno-red-zone -mllvm -enable-machine-outliner file.c

Patch by Jessica Paquette<jpaquette@apple.com>!

rdar://29166825

Differential Revision: https://reviews.llvm.org/D26872

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296418 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-28 00:33:32 +00:00
Nirav Dave
b89cc7e5e3 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296279 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-26 01:27:32 +00:00
Nirav Dave
32147cef64 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296252 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-25 11:43:58 +00:00
Stanislav Mekhanoshin
fef0dbe59c Revert "Correct register pressure calculation in presence of subregs"
This reverts commit r296009. It broke one out of tree target and also
does not account for all partial lines added or removed when calculating
PressureDiff.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296182 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-24 21:56:16 +00:00
Stanislav Mekhanoshin
0bf4d71d50 Correct register pressure calculation in presence of subregs
If a subreg is used in an instruction it counts as a whole superreg
for the purpose of register pressure calculation. This patch corrects
improper register pressure calculation by examining operand's lane mask.

Differential Revision: https://reviews.llvm.org/D29835

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296009 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-23 20:19:44 +00:00
Matt Arsenault
5736385164 TargetOptions: Fix not accounting for NoSignedZerosFPMath in ==
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295928 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-23 03:16:44 +00:00
Geoff Berry
fc170d8f5d [CodeGenPrepare] Sink and duplicate more 'and' instructions.
Summary:
Rework the code that was sinking/duplicating (icmp and, 0) sequences
into blocks where they were being used by conditional branches to form
more tbz instructions on AArch64.  The new code is more general in that
it just looks for 'and's that have all icmp 0's as users, with a target
hook used to select which subset of 'and' instructions to consider.
This change also enables 'and' sinking for X86, where it is more widely
beneficial than on AArch64.

The 'and' sinking/duplicating code is moved into the optimizeInst phase
of CodeGenPrepare, where it can take advantage of the fact the
OptimizeCmpExpression has already sunk/duplicated any icmps into the
blocks where they are used.  One minor complication from this change is
that optimizeLoadExt needed to be updated to always mark 'and's it has
determined should be in the same block as their feeding load in the
InsertedInsts set to avoid an infinite loop of hoisting and sinking the
same 'and'.

This change fixes a regression on X86 in the tsan runtime caused by
moving GVNHoist to a later place in the optimization pipeline (see
PR31382).

Reviewers: t.p.northover, qcolombet, MatzeB

Subscribers: aemerson, mcrosier, sebpop, llvm-commits

Differential Revision: https://reviews.llvm.org/D28813

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295746 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-21 18:53:14 +00:00
Hans Wennborg
b6ae6ad928 [X86] Re-enable conditional tail calls and fix PR31257.
This reverts r294348, which removed support for conditional tail calls
due to the PR above. It fixes the PR by marking live registers as
implicitly used and defined by the now predicated tailcall. This is
similar to how IfConversion predicates instructions.

Differential Revision: https://reviews.llvm.org/D29856

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295262 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-16 00:04:05 +00:00
Tim Northover
432026394b GlobalISel: support translating va_arg
Since (say) i128 and [16 x i8] map to the same type in generic MIR, we also
need to attach the required alignment info.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295254 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 23:22:33 +00:00
Simon Pilgrim
b529f91924 Fix spelling mistake - paramater -> parameter. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295182 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-15 15:11:36 +00:00
Tim Northover
173c7d4750 GlobalISel: introduce G_PTR_MASK to simplify alloca handling.
This instruction clears the low bits of a pointer without requiring (possibly
dodgy if pointers aren't ints) conversions to and from an integer. Since (as
far as I'm aware) all masks are statically known, the instruction takes an
immediate operand rather than a register to specify the mask.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295103 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 20:56:18 +00:00
Reid Kleckner
cfee44a41c [CodeGen] Use bitfields instead of manual masks in ArgFlagsTy, NFC
This revealed that we actually have 8 more unused flag bits, and byval
size doesn't need to be a bitfield at all.

This came up during code review here:
https://reviews.llvm.org/D29668#inline-258469

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294989 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-13 21:33:26 +00:00
Tim Northover
d8f030286c GlobalISel: translate @llvm.pow intrinsic to G_FPOW.
It'll usually be immediately legalized back to a libcall, but occasionally
something can be done with it so we'd just as well enable that flexibility from
the start.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294530 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-08 23:23:32 +00:00
Tim Northover
063022b81a GlobalISel: expand mul-with-overflow into mul-hi on AArch64.
AArch64 has specific instructions to multiply two numbers at double the width
and produce the high part of the result. These can be used to implement LLVM's
mul.with.overflow instructions fairly simply. Helps with C++ operator new[].

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294519 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-08 21:22:15 +00:00
Tim Northover
de0a86879f GlobalISel: translate @llvm.va_start intrinsic.
Because we need to preserve the memory access being performed we need a
separate instruction to represent this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294492 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-08 17:57:20 +00:00
Matt Arsenault
a647924c48 TargetLowering: Remove AddrSpace parameter from GetAddrModeArguments
It doesn't make any sense to pass in to what is supposed to be parsing
the call, and this can be inferred from the pointer output.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294412 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-08 07:09:03 +00:00
Hans Wennborg
34a6e0d36a [X86] Disable conditional tail calls (PR31257)
They are currently modelled incorrectly (as calls, which clobber
registers, confusing e.g. Machine Copy Propagation).

Reverting until we figure out the proper solution.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294348 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-07 20:37:45 +00:00
Sanjoy Das
a0be0b1390 [ImplicitNullCheck] Extend Implicit Null Check scope by using stores
Summary:
This change allows usage of store instruction for implicit null check.

Memory Aliasing Analisys is not used and change conservatively supposes
that any store and load may access the same memory. As a result
re-ordering of store-store, store-load and load-store is prohibited.

Patch by Serguei Katkov!

Reviewers: reames, sanjoy

Reviewed By: sanjoy

Subscribers: atrick, llvm-commits

Differential Revision: https://reviews.llvm.org/D29400

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294338 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-07 19:19:49 +00:00
Nirav Dave
529986a15d Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293915 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-02 18:24:55 +00:00
Nirav Dave
99b0642f83 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixing X86 inc/dec chain bug.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293893 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-02 14:39:42 +00:00
Dehao Chen
fe46230d58 Change debug-info-for-profiling from a TargetOption to a function attribute.
Summary: LTO requires the debug-info-for-profiling to be a function attribute.

Reviewers: echristo, mehdi_amini, dblaikie, probinson, aprantl

Reviewed By: mehdi_amini, dblaikie, aprantl

Subscribers: aprantl, probinson, ahatanak, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D29203

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293833 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-01 22:45:09 +00:00
Evandro Menezes
20de1ea345 [CodeGen] Move MacroFusion to the target
This patch moves the class for scheduling adjacent instructions,
MacroFusion, to the target.

In AArch64, it also expands the fusion to all instructions pairs in a
scheduling block, beyond just among the predecessors of the branch at the
end.

Differential revision: https://reviews.llvm.org/D28489

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293737 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-01 02:54:34 +00:00
Nirav Dave
53d52e9e2b [X86] Implement -mfentry
Summary: Insert calls to __fentry__ at function entry.

Reviewers: hfinkel, craig.topper

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D28000

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293648 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-31 17:00:27 +00:00
Kristof Beyls
e9939d4d42 [GlobalISel] Add support for indirectbr
Differential Revision: https://reviews.llvm.org/D28079


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293470 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-30 09:13:18 +00:00
David Majnemer
7bf48fa3f7 [Target] Add NoSignedZerosFPMath to the TargetOptions constructor
Most flags were already initialized by the TargetOptions constructor but
we missed out on one.  Also, simplify the constructor by using field
initializers when possible.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293406 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-29 01:27:08 +00:00
Stanislav Mekhanoshin
be4948cead Replace addEarlyAsPossiblePasses callback with adjustPassManager
This change introduces adjustPassManager target callback giving a
target an opportunity to tweak PassManagerBuilder before pass
managers are populated.

This generalizes and replaces addEarlyAsPossiblePasses target
callback. In particular that can be used to add custom passes to
extension points other than EP_EarlyAsPossible.

Differential Revision: https://reviews.llvm.org/D28336

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293189 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-26 16:49:08 +00:00
Nirav Dave
d9031ef908 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293184 which is failing in LTO builds

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293188 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-26 16:46:13 +00:00
Nirav Dave
dbb7a65598 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
* Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293184 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-26 16:02:24 +00:00
Krzysztof Parzyszek
d40c764784 Add iterator_range<regclass_iterator> to {Target,MC}RegisterInfo, NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293077 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-25 19:29:04 +00:00