1493 Commits

Author SHA1 Message Date
Nemanja Ivanovic
0b61b12b8c Implement vector count leading/trailing bytes with zero lsb and vector parity
builtins - llvm portion

This patch corresponds to review https://reviews.llvm.org/D26003.
Committing on behalf of Zaara Syeda.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285434 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 19:38:24 +00:00
Ehsan Amiri
34a73b3124 [PPC] Adding the removed testcase again
This testcase was originally part of r284995, but I put it in a wrong directory.
So I removed it. Before adding it back I did some small enhancements. Also I
changed the assertions a little bit, to take into account the impact of some
changes performed since code review is done.

This is similar to changes done for another testcase in the original commit.
See: https://reviews.llvm.org/D23614#577749
Basically for instead of vxor we now generate xxlxor in some cases, which is
better.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285333 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-27 19:10:09 +00:00
Nemanja Ivanovic
e9fdaa1bbb [PowerPC] - No SExt/ZExt needed for count trailing zeros
This patch corresponds to review:
https://reviews.llvm.org/D25896

It just eliminates the redundant ZExt after a count trailing zeros instruction.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285267 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-27 05:17:58 +00:00
Nemanja Ivanovic
73235dc8dc Do not assume that FP vector operands are never legalized by expanding
This patch ensures that if a floating point vector operand is legalized by
expanding, it is legalized through the stack rather than by calling
DAGTypeLegalizer::IntegerToVector which will cause a failure since the operand
is a non-integer type.

This fixes PR 30715.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285231 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-26 19:51:35 +00:00
Nemanja Ivanovic
4e7356cfaf [PowerPC] Implement vec_insert_exp builtins - llvm portion
This revision corresponds to review: https://reviews.llvm.org/D25957.
Committing on behalf of Zaara Syeda.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285225 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-26 19:03:40 +00:00
Ehsan Amiri
300e976507 [PPC] Generate positive FP zero using xor insn instead of loading from constant area
https://reviews.llvm.org/D23614

Currently we load +0.0 from constant area. That can change to be generated using
XOR instruction.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284995 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-24 17:31:09 +00:00
Ehsan Amiri
5ba8f14a10 [PPC] Better codegen for AND, ANY_EXT, SRL sequence
https://reviews.llvm.org/D24924

This improves the code generated for a sequence of AND, ANY_EXT, SRL instructions. This is a targetted fix for this special pattern. The pattern is generated by target independet dag combiner and so a more general fix may not be necessary. If we come across other similar cases, some ideas for handling it are discussed on the code review.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284983 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-24 15:46:58 +00:00
Sanjay Patel
85745f9561 [DAG] optimize negation of bool
Use mask and negate for legalization of i1 source type with SIGN_EXTEND_INREG.
With the mask, this should be no worse than 2 shifts. The mask can be eliminated
in some cases, so that should be better than 2 shifts.

This change exposed some missing folds related to negation:
https://reviews.llvm.org/rL284239
https://reviews.llvm.org/rL284395

There may be others, so please let me know if you see any regressions.

Differential Revision: https://reviews.llvm.org/D25485


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284611 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-19 16:58:59 +00:00
Tim Northover
31164bc2c9 PowerPC: specify full triple to avoid different Darwin asm syntax.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284281 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-14 21:25:29 +00:00
Sanjay Patel
65a7ee43be [PowerPC] add tests for PR30661
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284279 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-14 20:51:41 +00:00
Guozhi Wei
8bb12b9f5e [PPC] Shorter sequence to load 64bit constant with same hi/lo words
This is a patch to implement pr30640.

When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register.

This optimization caused failure of test case bperm.ll because of not optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have same cost and there are multiple bit groups need to be ORed together.

Differential Revision: https://reviews.llvm.org/D25521



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284276 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-14 20:41:50 +00:00
Nirav Dave
080559c6d3 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r284151 which appears to be triggering a LTO
failures on Hexagon

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284157 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-13 20:23:25 +00:00
Nirav Dave
19dc709f4b In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after upstream changes.

   Simplify Consecutive Merge Store Candidate Search

   Now that address aliasing is much less conservative, push through
   simplified store merging search which only checks for parallel stores
   through the chain subgraph. This is cleaner as the separation of
   non-interfering loads/stores from the store-merging logic.

   Whem merging stores, search up the chain through a single load, and
   finds all possible stores by looking down from through a load and a
   TokenFactor to all stores visited. This improves the quality of the
   output SelectionDAG and generally the output CodeGen (with some
   exceptions).

   Additional Minor Changes:

       1. Finishes removing unused AliasLoad code
       2. Unifies the the chain aggregation in the merged stores across
       code paths
       3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
       4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

   This finishes the change Matt Arsenault started in r246307 and
   jyknight's original patch.

   Many tests required some changes as memory operations are now
   reorderable. Some tests relying on the order were changed to use
   volatile memory operations

   Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284151 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-13 19:20:16 +00:00
Tim Shen
13774eea42 [PPCMIPeephole] Fix splat elimination
Summary:
In PPCMIPeephole, when we see two splat instructions, we can't simply do the following transformation:
  B = Splat A
  C = Splat B
=>
  C = Splat A
because B may still be used between these two instructions. Instead, we should make the second Splat a PPC::COPY and let later passes decide whether to remove it or not:
  B = Splat A
  C = Splat B
=>
  B = Splat A
  C = COPY B

Fixes PR30663.

Reviewers: echristo, iteratee, kbarton, nemanjai

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D25493


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283961 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-12 00:48:25 +00:00
Kyle Butt
2a18018c10 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.

Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.

Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll

Differential revision: https://reviews.llvm.org/D18226

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283934 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-11 20:36:43 +00:00
Daniel Jasper
ebc8a28377 Revert "Codegen: Tail-duplicate during placement."
This reverts commit r283842.

test/CodeGen/X86/tail-dup-repeat.ll causes and llc crash with our
internal testing. I'll share a link with you.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283857 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-11 07:36:11 +00:00
Kyle Butt
be53d7c9c4 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.

Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.

Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll

Differential revision: https://reviews.llvm.org/D18226

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283842 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-11 01:20:33 +00:00
Kyle Butt
473ebca2dd Revert "Codegen: Tail-duplicate during placement."
This reverts commit 71c312652c10f1855b28d06697c08d47e7a243e4.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283647 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-08 01:47:05 +00:00
Kyle Butt
71c312652c Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.

Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.

Differential revision: https://reviews.llvm.org/D18226

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283619 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-07 22:33:20 +00:00
Michael Kuperstein
1ac52953b4 [DAG] Generalize build_vector -> vector_shuffle combine for more than 2 inputs
This generalizes the build_vector -> vector_shuffle combine to support any
number of inputs. The idea is to create a binary tree of shuffles, where
the first layer performs pairwise shuffles of the input vectors placing each
input element into the correct lane, and the rest of the tree blends these
shuffles together.

This doesn't try to be smart and create any sort of "optimal" shuffles.
The assumption is that even a "poor" shuffle sequence is better than extracting
and inserting the elements one by one.

Differential Revision: https://reviews.llvm.org/D24683



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283480 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-06 18:58:24 +00:00
Kyle Butt
d03fefcc5e Revert "Codegen: Tail-duplicate during placement."
This reverts commit 062ace9764953e9769142c1099281a345f9b6bdc.

Issue with loop info and block removal revealed by polly.
I have a fix for this issue already in another patch, I'll re-roll this
together with that fix, and a test case.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283292 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-05 01:39:29 +00:00
Kyle Butt
062ace9764 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well.

Differential revision: https://reviews.llvm.org/D18226

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283274 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-04 23:54:18 +00:00
Sanjay Patel
b60ab5d110 [Target] move reciprocal estimate settings from TargetOptions to TargetLowering
The motivation for the change is that we can't have pseudo-global settings for
codegen living in TargetOptions because that doesn't work with LTO.

Ideally, these reciprocal attributes will be moved to the instruction-level via
FMF, metadata, or something else. But making them function attributes is at least
an improvement over the current state.

The ingredients of this patch are:

    Remove the reciprocal estimate command-line debug option.
    Add TargetRecip to TargetLowering.
    Remove TargetRecip from TargetOptions.
    Clean up the TargetRecip implementation to work with this new scheme.
    Set the default reciprocal settings in TargetLoweringBase (everything is off).
    Update the PowerPC defaults, users, and tests.
    Update the x86 defaults, users, and tests.

Note that if this patch needs to be reverted, the related clang patch checked in
at r283251 should be reverted too.

Differential Revision: https://reviews.llvm.org/D24816



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283252 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-04 20:46:43 +00:00
Nemanja Ivanovic
94ec1e3c4f [Power9] Exploit D-Form VSX Scalar memory ops that target full VSX register set
This patch corresponds to review:

The newly added VSX D-Form (register + offset) memory ops target the upper half
of the VSX register set. The existing ones target the lower half. In order to
unify these and have the ability to target all the VSX registers using D-Form
operations, this patch defines Pseudo-ops for the loads/stores which are
expanded post-RA. The expansion then choses the correct opcode based on the
register that was allocated for the operation.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283212 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-04 11:25:52 +00:00
Nemanja Ivanovic
02ede7e72c Fix a test case failure on Apple PPC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283191 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-04 07:37:38 +00:00
Nemanja Ivanovic
d0e875cdad [Power9] Part-word VSX integer scalar loads/stores and sign extend instructions
This patch corresponds to review:
https://reviews.llvm.org/D23155

This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:

    Int to Fp conversions of 1 or 2-byte values loaded from memory
    Building vectors of 1 or 2-byte integers with values loaded from memory
    Storing individual 1 or 2-byte elements from integer vectors

This patch implements all of those uses.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283190 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-04 06:59:23 +00:00
Kyle Butt
77893035df Revert "Codegen: Tail-duplicate during placement."
This reverts commit ff234efbe23528e4f4c80c78057b920a51f434b2.

Causing crashes on aarch64 build.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283172 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-04 00:38:23 +00:00
Kyle Butt
ff234efbe2 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283164 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-04 00:00:09 +00:00
Hal Finkel
4c305bebf0 [PowerPC] Refactor soft-float support, and enable PPC64 soft float
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc. and operating system kernels
desiring not to use floating-point registers to lower syscall cost are unlikely
to want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.

Fixes PR26970.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283060 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-02 02:10:20 +00:00
Nirav Dave
bb15ebf5c7 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failues with MCJIT

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282604 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-28 16:37:50 +00:00
Nirav Dave
a6d3e00dff In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  Whem merging stores, search up the chain through a single load, and
  finds all possible stores by looking down from through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282600 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-28 15:50:43 +00:00
Nemanja Ivanovic
7a5ffa3882 [Power9] Builtins for ELF v.2 API conformance - back end portion
This patch corresponds to review:
https://reviews.llvm.org/D24396

This patch adds support for the "vector count trailing zeroes",
"vector compare not equal" and "vector compare not equal or zero instructions"
as well as "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282478 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-27 08:42:12 +00:00
Nemanja Ivanovic
a04f9019ef [Power9] Exploit move and splat instructions for build_vector improvement
This patch corresponds to review:
https://reviews.llvm.org/D21135

This patch exploits the following instructions:
mtvsrws
lxvwsx
mtvsrdd
mfvsrld

In order to improve some build_vector and extractelement patterns.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282246 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-23 13:25:31 +00:00
Nemanja Ivanovic
f2f9e2bcc5 [PowerPC] Sign extend sub-word values for atomic comparisons
Atomic comparison instructions use the sub-word load instruction on
Power8 and up but the value is not sign extended prior to the signed word
compare instruction. This patch adds that sign extension.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282182 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-22 19:06:38 +00:00
Krzysztof Parzyszek
7b83fe6d98 [PPC] Set SP after loading data from stack frame, if no red zone is present
Follow-up to r280705: Make sure that the SP is only restored after all data
is loaded from the stack frame, if there is no red zone.

This completes the fix for https://llvm.org/bugs/show_bug.cgi?id=26519.

Differential Revision: https://reviews.llvm.org/D24466


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282174 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-22 17:22:43 +00:00
Nemanja Ivanovic
a941fe247e [Power9] Add exploitation of non-permuting memory ops
This patch corresponds to review:
https://reviews.llvm.org/D19825

The new lxvx/stxvx instructions do not require the swaps to line the elements
up correctly. In order to select them over the lxvd2x/lxvw4x instructions which
require swaps, the patterns for the old instruction have a predicate that
ensures they won't be selected on Power9 and newer CPUs.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282143 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-22 09:52:19 +00:00
Michael Kuperstein
6dc8c1ab98 Make test slightly more explicit. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281759 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-16 18:20:43 +00:00
Sanjoy Das
9becdeed48 [Stackmap] Added callsite counts to emitted function information.
Summary:
It was previously not possible for tools to use solely the stackmap
information emitted to reconstruct the return addresses of callsites in
the map, which is necessary to use the information to walk a stack. This
patch adds per-function callsite counts when emitting the stackmap
section in order to resolve the problem. Note that this slightly alters
the stackmap format, so external tools parsing these maps will need to
be updated.

**Problem Details:**
Records only store their offset from the beginning of the function they
belong to. While these records and the functions are output in program
order, it is not possible to determine where the end of one function's
records are without the callsite count when processing the records to
compute return addresses.

Patch by Kavon Farvardin!

Reviewers: atrick, ributzka, sanjoy

Subscribers: nemanjai

Differential Revision: https://reviews.llvm.org/D23487

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281532 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-14 20:22:03 +00:00
Nemanja Ivanovic
7328eb7558 Fix code-gen crash on Power9 for insert_vector_elt with variable index (PR30189)
This patch corresponds to review:
https://reviews.llvm.org/D24021

In the initial implementation of this instruction, I forgot to account for
variable indices. This patch fixes PR30189 and should probably be merged into
3.9.1 (I'll open a bug according to the new instructions).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281479 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-14 14:19:09 +00:00
Ayman Musa
7dc5b7e72a Remove MVT:i1 xor instruction before SELECT. (Performance improvement).
Differential Revision: https://reviews.llvm.org/D23764


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281308 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-13 09:12:45 +00:00
Peter Collingbourne
5420de3f15 DebugInfo: New metadata representation for global variables.
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.

Fixes PR30362. A program for upgrading test cases is attached to that
bug.

Differential Revision: http://reviews.llvm.org/D20147

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281284 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-13 01:12:59 +00:00
Hal Finkel
21c68fac74 [PowerPC] Fix address-offset folding for plain addi
When folding an addi into a memory access that can take an immediate offset, we
were implicitly assuming that the existing offset was zero. This was incorrect.
If we're dealing with an addi with a plain constant, we can add it to the
existing offset (assuming that doesn't overflow the immediate, etc.), but if we
have anything else (i.e. something that will become a relocation expression),
we'll go back to requiring the existing immediate offset to be zero (because we
don't know what the requirements on that relocation expression might be - e.g.
maybe it is paired with some addis in some relevant way).

On the other hand, when dealing with a plain addi with a regular constant
immediate, the alignment restrictions (from the TOC base pointer, etc.) are
irrelevant.

I've added the test case from PR30280, which demonstrated the bug, but also
demonstrates a missed optimization opportunity (i.e. we don't need the memory
accesses at all).

Fixes PR30280.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280789 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-07 07:36:11 +00:00
Hal Finkel
8d0ebe9402 [DAGCombine] More fixups to SETCC legality checking (visitANDLike/visitORLike)
I might have called this "r246507, the sequel". It fixes the same issue, as the
issue has cropped up in a few more places. The underlying problem is that
isSetCCEquivalent can pick up select_cc nodes with a result type that is not
legal for a setcc node to have, and if we use that type to create new setcc
nodes, nothing fixes that (and so we've violated the contract that the
infrastructure has with the backend regarding setcc node types).

Fixes PR30276.

For convenience, here's the commit message from r246507, which explains the
problem is greater detail:

[DAGCombine] Fixup SETCC legality checking

SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.

The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.

The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:

  (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
     (which is possible for some combinations of (op1, op2))

If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.

To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).

Fixes PR24636.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280767 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-06 23:02:23 +00:00
Krzysztof Parzyszek
6ee91ac76e [PPC] Claim stack frame before storing into it, if no red zone is present
Unlike PPC64, PPC32/SVRV4 does not have red zone. In the absence of it 
there is no guarantee that this part of the stack will not be modified 
by any interrupt. To avoid this, make sure to claim the stack frame first
before storing into it.

This fixes https://llvm.org/bugs/show_bug.cgi?id=26519.

Differential Revision: https://reviews.llvm.org/D24093


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280705 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-06 12:30:00 +00:00
Hal Finkel
204ba881b8 [PowerPC] Zero-extend constants in FastISel
As it turns out, whether we zero-extend or sign-extend i8/i16 constants, which
are illegal types promoted to i32 on PowerPC, is a choice constrained by
assumptions within the infrastructure. Specifically, the logic in
FunctionLoweringInfo::ComputePHILiveOutRegInfo assumes that constant PHI
operands will be zero extended, and so, at least when materializing constants
that are PHI operands, we must do the same.

The rest of our fast-isel implementation does not appear to depend on the fact
that we were sign-extending i8/i16 constants, and all other targets also appear
to zero-extend small-bitwidth constants in fast-isel; we'll now do the same (we
had been doing this only for i1 constants, and sign-extending the others).

Fixes PR27721.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280614 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-04 06:07:19 +00:00
Hal Finkel
bcc57c0a39 [PowerPC] For larger offsets, when possible, fold offset into addis toc@ha
When we have an offset into a global, etc. that is accessed relative to the TOC
base pointer, and the offset is larger than the minimum alignment of the global
itself and the TOC base pointer (which is 8-byte aligned), we can still fold
the @toc@ha into the memory access, but we must update the addis instruction's
symbol reference with the offset as the symbol addend. When there is only one
use of the addi to be folded and only one use of the addis that would need its
symbol's offset adjusted, then we can make the adjustment and fold the @toc@l
into the memory access.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280545 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-02 21:37:07 +00:00
Hal Finkel
b7a37e9680 [PowerPC] hasAndNotCompare should return true
As Sanjay suggested when he added the hook, PPC should return true from
hasAndNotCompare. We have an efficient negated 'and' on PPC (which can feed a
compare).

Fixes PR27203.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280457 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-02 02:58:25 +00:00
Hal Finkel
9f423e50b0 [PowerPC] Add a pattern for a runtime bit check
Following a suggestion by Sanjay, we should lower:

  %shl = shl i32 1, %y
  %and = and i32 %x, %shl
  %cmp = icmp eq i32 %and, %shl
  ret i1 %cmp

into:

  subfic r4, r4, 32
  rlwnm r3, r3, r4, 31, 31

Add this pattern and some associated patterns for the 64-bit case and the
not-equal case. Fixes PR27356.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280454 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-02 02:34:44 +00:00
Hal Finkel
67a060f802 [PowerPC] Don't apply the PPC64 address-formation peephole for offsets greater than 7
When applying our address-formation PPC64 peephole, we are reusing the @ha TOC
addis value with the low parts associated with different offsets (i.e.
different effective symbol addends). We were assuming this was okay so long as
the offsets were less than the alignment of the global variable being accessed.
This ignored the fact, however, that the TOC base pointer itself need only be
8-byte aligned. As a result, what we were doing is legal only for offsets less
than 8 regardless of the alignment of the object being accessed.

Fixes PR28727.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280441 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-02 00:28:20 +00:00
Hal Finkel
c426463f0b [PowerPC] Don't consider fusion in PPC64 address-formation peephole
The logic in this function assumes that the P8 supports fusion of addis/addi,
but it does not. As a result, there is no advantage to restricting our peephole
application, merging addi instructions into dependent memory accesses, even
when the addi has multiple users, regardless of whether or not we're optimizing
for size.

We might need something like this again for the P9; I suspect we'll revisit
this code when we work on P9 tuning.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280440 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-02 00:27:50 +00:00