2217 Commits

Author SHA1 Message Date
Craig Topper
d85a5427b8 [DAGCombiner] In visitINSERT_VECTOR_ELT, move check for BUILD_VECTOR being legal below code that just canonicalizes INSERT_VECTOR_ELT without creating BUILD_VECTORS.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294108 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-04 23:26:34 +00:00
Amaury Sechet
ca299bbd89 Formatting in DAGCombiner. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294091 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-04 13:01:53 +00:00
Nirav Dave
529986a15d Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293915 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-02 18:24:55 +00:00
Amaury Sechet
8f643cf8d8 Use N0 instead of N->getOperand(0) in DagCombiner::visitAdd. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293903 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-02 16:07:44 +00:00
Nirav Dave
99b0642f83 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixing X86 inc/dec chain bug.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293893 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-02 14:39:42 +00:00
Nicolai Haehnle
ab43652716 [DAGCombine] require UnsafeFPMath for re-association of addition
Summary:
The affected transforms all implicitly use associativity of addition,
for which we usually require unsafe math to be enabled.

The "Aggressive" flag is only meant to convey information about the
performance of the fused ops relative to a fmul+fadd sequence.

Fixes Bug 31626.

Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD

Subscribers: jholewinski, nemanjai, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D28675

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293635 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-31 14:35:37 +00:00
Craig Topper
a4249525ab [DAGCombiner] Use unsigned for a constant vector index instead of APInt.
The type system requires that the number of vector elements should fit in 32-bits so this should be safe.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293414 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-29 04:38:21 +00:00
Craig Topper
1b7b2ab925 [DAGCombiner] Remove unnecessary check on the size of the type of the index of EXTRACT_SUBVECTOR.
The type system already requires that the number of vector elements must fit in 32-bits so an index should as well. Even if the type of the index were larger all we care about is that the constant index can fit in 64-bits so that we can call getZExtValue.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293413 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-29 04:38:19 +00:00
Craig Topper
d22bc452e6 [DAGCombiner] Make sure index of EXTRACT_SUBVECTOR is a constant before trying to use getConstantOperandVal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293412 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-29 04:38:16 +00:00
Nirav Dave
d9031ef908 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293184 which is failing in LTO builds

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293188 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-26 16:46:13 +00:00
Nirav Dave
dbb7a65598 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
* Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293184 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-26 16:02:24 +00:00
Craig Topper
c3374e91f1 [DAGCombiner] Fold extract_subvector of undef to undef. Fold away inserting undef subvectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293152 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-26 05:38:46 +00:00
Artur Pilipenko
d10b17e88a Fix buildbot failures introduced by 293036
Fix unused variable, specify types explicitly to make VC compiler happy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293039 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-25 09:10:07 +00:00
Artur Pilipenko
b52af07c1d [DAGCombiner] Match load by bytes idiom and fold it into a single load. Attempt #2.
The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html

This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows to stop the traversal earlier if we know that the result won't be used.

From the original commit:

Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it.

Assuming little endian target:
  i8 *a = ...
  i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
  i32 val = *((i32)a)

  i8 *a = ...
  i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
  i32 val = BSWAP(*((i32)a))

This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.

Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
  i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.

The general scheme is to match OR expressions by recursively calculating the origin of individual bytes which constitute the resulting OR value. If all the OR bytes come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.

Reviewed By: RKSimon, filcab, chandlerc 

Differential Revision: https://reviews.llvm.org/D27861


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293036 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-25 08:53:31 +00:00
Matt Arsenault
9291d3c697 DAG: Recognize no-signed-zeros-fp-math attribute
clang already emits this with -cl-no-signed-zeros, but codegen
doesn't do anything with it. Treat it like the other fast math
attributes, and change one place to use it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293024 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-25 06:08:42 +00:00
Matt Arsenault
f337952e56 DAGCombiner: Allow negating ConstantFP after legalize
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293019 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-25 04:54:34 +00:00
Matt Arsenault
1f1c4b3f65 DAG: Don't fold vector extract into load if target doesn't want to
Fixes turning a 32-bit scalar load into an extending vector load
for AMDGPU when dynamically indexing a vector.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292842 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-23 22:48:53 +00:00
Simon Pilgrim
b648fac5ca [SelectionDAG] Add support for BITREVERSE constant folding
We were relying on constant folding of the legalized instructions to do what constant folding we had previously

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292114 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-16 13:39:00 +00:00
Benjamin Kramer
1fb85c6675 Apply clang-tidy's performance-unnecessary-value-param to LLVM.
With some minor manual fixes for using function_ref instead of
std::function. No functional change intended.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291904 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-13 14:39:03 +00:00
Craig Topper
bd496c74a2 Revert r291645 "[DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent."
Some test appears to be hanging on the build bots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291650 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-11 04:59:25 +00:00
Craig Topper
f7b662a8a4 [DAGCombiner] Teach DAG combiner to fold (vselect (N0 xor AllOnes), N1, N2) -> (vselect N0, N2, N1). Only do this if the target indicates its vector boolean type is ZeroOrNegativeOneBooleanContent.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291645 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-11 04:02:23 +00:00
Matt Arsenault
a40945ed88 DAGCombiner: Add hasOneUse checks to fadd/fma combine
Even with aggressive fusion enabled, this requires duplicating
the fmul, or increases an fadd to another fma which is not an
improvement.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291642 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-11 02:02:12 +00:00
Craig Topper
7b6c6fbcbb [DAGCombiner] Merge together duplicate checks for folding fold (select C, 1, X) -> (or C, X) and folding (select C, X, 0) -> (and C, X). Also be consistent about checking that both the condition and the result type are i1. NFC
I guess previously we just assumed if the result type was i1, then the condition type must also be i1?

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291548 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-10 07:42:57 +00:00
Craig Topper
1cc207258c [DAGCombiner] Remove code for optimizing select (xor Cond, 0), X, Y -> select Cond, X, Y. Just let combine on the xor itself take care of it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291534 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-10 04:12:19 +00:00
Evgeny Stupachenko
c7ecdd32f3 The patch fixes (base, index, offset) match.
Summary:
Instead of matching:
  (a + i) + 1 -> (a + i, undef, 1)
Now it matches:
  (a + i) + 1 -> (a, i, 1)

Reviewers: rengolin

Differential Revision: http://reviews.llvm.org/D26367

From: Evgeny Stupachenko <evstupac@gmail.com>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291012 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-04 21:43:39 +00:00
Zijiao Ma
e365f8338a Make the canonicalisation on shifts benifit to more case.
1.Fix pessimized case in FIXME.
2.Add tests for it.
3.The canonicalisation on shifts results in different sequence for
  tests of machine-licm.Correct some check lines.

Differential Revision: https://reviews.llvm.org/D27916

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290410 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-23 02:56:07 +00:00
Wei Mi
38a74cfe54 Change the interface of TLI.isMultiStoresCheaperThanBitsMerge.
This is for splitMergedValStore in DAG Combine to share the target query interface
with similar logic in CodeGenPrepare.

Differential Revision: https://reviews.llvm.org/D24707


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290363 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-22 19:38:22 +00:00
Chandler Carruth
a0c720897c Add extra headers that got deleted by my revert in r289916 but for which
new usage had already grown in the file.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289917 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-16 04:08:31 +00:00
Chandler Carruth
26fa756724 Revert patch series introducing the DAG combine to match a load-by-bytes
idiom.

r289538: Match load by bytes idiom and fold it into a single load
r289540: Fix a buildbot failure introduced by r289538
r289545: Use more detailed assertion messages in the code ...
r289646: Add a couple of assertions to the load combine code ...

This DAG combine has a bad crash in it that is quite hard to trigger
sadly -- it relies on sneaking code with UB through the SDAG build and
into this particular combine. I've responded to the original commit with
a test case that reproduces it.

However, the code also has other problems that will require substantial
changes to address and so I'm going ahead and reverting it for now. This
should unblock us and perhaps others that are hitting the crash in the
wild and will let a fresh patch with updated approach come in cleanly
afterward.

Sorry for any trouble or disruption!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289916 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-16 04:05:22 +00:00
Eli Friedman
986e6724ba Don't combine splats with other shuffles.
We sometimes end up creating shuffles which are worse than the obvious
translation of the IR.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31301 .

Differential Revision: https://reviews.llvm.org/D27793



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289882 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-15 22:41:40 +00:00
Eli Friedman
259f5fb782 Don't combine a shuffle of two BUILD_VECTORs with duplicate elements.
Targets can't handle this case well in general; we often transform
a shuffle of two cheap BUILD_VECTORs to element-by-element insertion,
which is very inefficient.

Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially
fixes https://llvm.org/bugs/show_bug.cgi?id=31301.

Differential Revision: https://reviews.llvm.org/D27787



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289874 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-15 21:36:59 +00:00
Sanjay Patel
61cd9ae50b [DAG] allow more select folding for targets that have 'and not' (PR31175)
The original motivation for this patch comes from wanting to canonicalize 
more IR to selects and also canonicalizing min/max.

If we're going to do that, we need more backend fixups to undo select codegen 
when simpler ops will do. I chose AArch64 for the tests because that shows the
difference in the simplest way. This should fix:
https://llvm.org/bugs/show_bug.cgi?id=31175

Differential Revision: https://reviews.llvm.org/D27489


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289738 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-14 22:59:14 +00:00
Nirav Dave
e461ebb0f4 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
Reverting due to ARM MCJIT and MIPS LLD error.

This reverts commit r289659.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289667 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-14 16:43:44 +00:00
Nirav Dave
ee001a514b In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing after removing load-store factoring through
token factors in favor of improved token factor operand pruning

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289659 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-14 15:44:26 +00:00
Simon Pilgrim
ddae161376 [DAGCombiner] Try to use SelectionDAG::isKnownToBeAPowerOfTwo instead of just APInt::isPowerOf2
Generalize sdiv/udiv/srem/urem combines using APInt::isPowerOf2, which only works for const/splat-const values, to call SelectionDAG::isKnownToBeAPowerOfTwo instead which recognises many more cases.

Added a DAGCombiner::BuildLogBase2 helper since PowerOf2 combines often involve taking the log2 of such a value.

Differential Revision: https://reviews.llvm.org/D27714

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289654 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-14 15:08:13 +00:00
Artur Pilipenko
1452ff9efc Add a couple of assertions to the load combine code introduced by r289538
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289646 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-14 11:55:47 +00:00
Artur Pilipenko
54becec071 Use more detailed assertion messages in the code introduced by r289538
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289545 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-13 16:26:15 +00:00
Artur Pilipenko
5898f99379 Fix a buildbot failure introduced by r289538
Build failed because of unused variable in product mode.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289540 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-13 14:55:31 +00:00
Artur Pilipenko
bf4539eb3a [DAGCombiner] Match load by bytes idiom and fold it into a single load
Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it.

Assuming little endian target:
  i8 *a = ...
  i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
  i32 val = *((i32)a)

  i8 *a = ...
  i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
  i32 val = BSWAP(*((i32)a))

This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.

Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
  i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)

Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.

The general scheme is to match OR expressions by recursively calculating the origin of individual bits which constitute the resulting OR value. If all the OR bits come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.

Reviewed By: hfinkel, RKSimon, filcab

Differential Revision: https://reviews.llvm.org/D26149


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289538 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-13 14:21:14 +00:00
Artur Pilipenko
ac57dd01d3 Move BaseIndexOffset in DAGCombiner.cpp so it will be available for the upcoming user
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289537 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-13 14:16:02 +00:00
Nirav Dave
2c96583422 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r289221 which appears to be triggering an assertion

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289226 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-09 17:18:24 +00:00
Nirav Dave
615f3ccd10 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing overly aggressive load-store forwarding optimization.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289221 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-09 16:15:12 +00:00
Simon Pilgrim
6e9255f2d0 [DAGCombine] Add (sext_in_reg (zext x)) -> (sext x) combine
Handle the case where a sign extension has ended up being split into separate stages (typically to get around vector legal ops) and a zext + sext_in_reg gets inserted.

Differential Revision: https://reviews.llvm.org/D27461

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288842 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-06 19:09:37 +00:00
Nicolai Haehnle
42285f5cb7 [DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default
Summary:
When X = 0 and Y = inf, the original code produces inf, but the transformed
code produces nan. So this transform (and its relatives) should only be
used when the no-infs-fp-math flag is explicitly enabled.

Also disable the transform using fmad (intermediate rounding) when unsafe-math
is not enabled, since it can reduce the precision of the result; consider this
example with binary floating point numbers with two bits of mantissa:

  x = 1.01
  y = 111

  x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step)

  x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero)

The example relies on rounding towards zero at least in the second step.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578

Reviewers: RKSimon, tstellarAMD, spatel, arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26602

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288506 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-02 16:06:18 +00:00
Nicolai Haehnle
77151a0de7 [SelectionDAG] Rename and clarify visitFMULForFMADCombine (NFC)
Summary: Suggested by @spatel in D26602.

Reviewers: spatel, hfinkel

Subscribers: spatel, llvm-commits

Differential Revision: https://reviews.llvm.org/D27260

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288336 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-01 14:04:13 +00:00
Warren Ristow
14cd31679b Test commit. Comment changes. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288100 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-29 02:37:13 +00:00
Sanjay Patel
92c01d8697 [DAG] clean up foldSelectCCToShiftAnd(); NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288088 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-28 23:05:55 +00:00
Sanjay Patel
2365a7c201 [DAG] add helper function for selectcc --> and+shift transforms; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288073 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-28 21:47:41 +00:00
Nirav Dave
78f5fdf3e5 Revert "[DAG] Improve loads-from-store forwarding to handle TokenFactor"
This reverts commit r287773 which caused issues with ppc64le builds.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288035 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-28 14:30:29 +00:00
Simon Pilgrim
a9e6de7c73 Use SDValue helpers instead of explicitly going via SDValue::getNode(). NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287941 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-25 17:25:21 +00:00