Commit Graph

1711 Commits

Author SHA1 Message Date
Kyle Butt
6ece35c79b PPC: Don't select lxv/stxv for insufficiently aligned stack slots.
The lxv/stxv instructions require an offset that is 0 % 16. Previously we were
selecting lxv/stxv for loads and stores to the stack where the offset from the
slot was a multiple of 16, but the stack slot was not 16 or more byte aligned.
When the frame gets lowered these transform to r(1|31) + slot + offset.
If slot is not aligned, slot + offset may not be 0 % 16.
Now we require 16 byte or more alignment for select lxv/stxv to stack slots.

Includes a testcase that shows both sufficiently and insufficiently aligned
stack slots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312843 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 00:37:56 +00:00
Dean Michael Berris
f7f70f7f6c [XRay][CodeGen][PowerPC] Fix tail exit codegen for XRay in PPC
Summary:
This fixes code-gen for XRay in PPC. The regression wasn't caught by
codegen tests  which we add in this change.

What happened was the following:

- For tail exits, we used to unconditionally prepend the returns/exits
  with a pseudo-instruction that gets lowered to the instrumentation
  sled (and leave the actual return/exit instruction as-is).
- Changes to the XRay instrumentation pass caused the tail exits to
  suddenly also emit the tail exit pseudo-instruction, since the check
  for whether a return instruction was also a call instruction meant it
  was a tail exit instruction.
- None of the tests caught the regression either due to non-existent
  tests, or the tests being disabled/removed for continuous breakage.

This change re-introduces some of the basic tests and verifies that
we're back to a state that allows the back-end to generate appropriate
XRay instrumented binaries for PPC in the presence of tail exits.

Reviewers: echristo, timshen

Subscribers: nemanjai, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D37570

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312772 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 01:47:56 +00:00
Hal Finkel
a481ab548d [PowerPC] Don't use xscvdpspn on the P7
xscvdpspn was not introduced until the P8, so don't use it on the P7. Fixes a
regression introduced in r288152.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312612 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-06 03:08:26 +00:00
Hiroshi Inoue
7166ffbe09 [PowerPC] eliminate redundant compare instruction
If multiple conditional branches are executed based on the same comparison, we can execute multiple conditional branches based on the result of one comparison on PPC. For example,

if (a == 0) { ... }
else if (a < 0) { ... }

can be executed by one compare and two conditional branches instead of two pairs of a compare and a conditional branch.

This patch identifies a code sequence of the two pairs of a compare and a conditional branch and merge the compares if possible.
To maximize the opportunity, we do canonicalization of code sequence before merging compares.
For the above example, the input for this pass looks like:

cmplwi r3, 0
beq    0, .LBB0_3
cmpwi  r3, -1
bgt    0, .LBB0_4

So, before merging two compares, we canonicalize it as

cmpwi  r3, 0       ; cmplwi and cmpwi yield same result for beq
beq    0, .LBB0_3
cmpwi  r3, 0       ; greather than -1 means greater or equal to 0
bge    0, .LBB0_4

The generated code should be

cmpwi  r3, 0
beq    0, .LBB0_3
bge    0, .LBB0_4

Differential Revision: https://reviews.llvm.org/D37211



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312514 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-05 04:15:17 +00:00
Sam McCall
c7c869be7e Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding""
This crashes on boringSSL on PPC (will send reduced testcase)

This reverts commit r312328.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312490 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-04 15:47:00 +00:00
Geoff Berry
d168a77ec3 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Issues addressed since original review:
- Moved removal of dead instructions found by
  LiveIntervals::shrinkToUses() outside of loop iterating over
  instructions to avoid instructions being deleted while pointed to by
  iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
  doing so can break code that implicitly relies on the physical
  register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
  can break the machine verifier by creating LiveRanges that don't
  end on a use (since the undef operand is not considered a use).

  [MachineCopyPropagation] Extend pass to do COPY source forwarding

  This change extends MachineCopyPropagation to do COPY source forwarding.

  This change also extends the MachineCopyPropagation pass to be able to
  be run during register allocation, after physical registers have been
  assigned, but before the virtual registers have been re-written, which
  allows it to remove virtual register COPY LiveIntervals that become dead
  through the forwarding of all of their uses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312328 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-01 14:27:20 +00:00
Eric Christopher
b9af9b04de Temporarily revert "Update branch coalescing to be a PowerPC specific pass"
From comments and code review it wasn't intended to be enabled by default yet.

This reverts commit r311588.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312214 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-31 05:56:16 +00:00
Hans Wennborg
92b6b153a4 Revert r312154 "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding""
It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!")

> Issues identified by buildbots addressed since original review:
> - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
> - The pass no longer forwards COPYs to physical register uses, since
>   doing so can break code that implicitly relies on the physical
>   register number of the use.
> - The pass no longer forwards COPYs to undef uses, since doing so
>   can break the machine verifier by creating LiveRanges that don't
>   end on a use (since the undef operand is not considered a use).
>
>   [MachineCopyPropagation] Extend pass to do COPY source forwarding
>
>   This change extends MachineCopyPropagation to do COPY source forwarding.
>
>   This change also extends the MachineCopyPropagation pass to be able to
>   be run during register allocation, after physical registers have been
>   assigned, but before the virtual registers have been re-written, which
>   allows it to remove virtual register COPY LiveIntervals that become dead
>   through the forwarding of all of their uses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312178 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-30 22:11:37 +00:00
Geoff Berry
62c7c252f8 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Issues identified by buildbots addressed since original review:
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
  doing so can break code that implicitly relies on the physical
  register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
  can break the machine verifier by creating LiveRanges that don't
  end on a use (since the undef operand is not considered a use).

  [MachineCopyPropagation] Extend pass to do COPY source forwarding

  This change extends MachineCopyPropagation to do COPY source forwarding.

  This change also extends the MachineCopyPropagation pass to be able to
  be run during register allocation, after physical registers have been
  assigned, but before the virtual registers have been re-written, which
  allows it to remove virtual register COPY LiveIntervals that become dead
  through the forwarding of all of their uses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312154 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-30 18:41:07 +00:00
Adrian Prantl
69e607f200 Canonicalize the representation of empty an expression in DIGlobalVariableExpression
This change simplifies code that has to deal with
DIGlobalVariableExpression and mirrors how we treat DIExpressions in
debug info intrinsics. Before this change there were two ways of
representing empty expressions on globals, a nullptr and an empty
!DIExpression().

If someone needs to upgrade out-of-tree testcases:
  perl -pi -e 's/(!DIGlobalVariableExpression\(var: ![0-9]*)\)/\1, expr: !DIExpression())/g' <MYTEST.ll>
will catch 95%.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312144 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-30 18:06:51 +00:00
Sanjay Patel
c8f9cf9e26 [DAG] convert vector select-of-constants to logic/math
This goes back to a discussion about IR canonicalization. We'd like to preserve and convert
more IR to 'select' than we currently do because that's likely the best choice in IR:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html
...but that's often not true for codegen, so we need to account for this pattern coming in
to the backend and transform it to better DAG ops.

Steps in this patch:

  1. Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to more finely
     enable this transform. Other targets will probably want that anyway to distinguish scalars
     from vectors. We're using that here to exclude AVX512 targets, but it may not be necessary.

  2. Convert a vselect to ext+add. This eliminates a constant load/materialization, and the
     vector ext is often free.

Implementing a more general fold using xor+and can be a follow-up for targets that don't have
a legal vselect. It's also possible that we can remove the TLI hook for the special case fold
implemented here because we're eliminating a constant, but it needs to be tested on other
targets.

Differential Revision: https://reviews.llvm.org/D36840



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311731 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-24 23:24:43 +00:00
Lei Huang
35d604386e Update branch coalescing to be a PowerPC specific pass
Implementing this pass as a PowerPC specific pass.  Branch coalescing utilizes
the analyzeBranch method which currently does not include any implicit operands.
This is not an issue on PPC but must be handled on other targets.

Differential Revision : https: // reviews.llvm.org/D32776

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311588 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-23 19:25:04 +00:00
Hiroshi Inoue
ea638e645f [PowerPC] better instruction selection for OR (XOR) with a 32-bit immediate
- recommitting after fixing a test failure on MacOS

On PPC64, OR (XOR) with a 32-bit immediate can be done with only two instructions, i.e. ori + oris.
But the current LLVM generates three or four instructions for this purpose (and also it clobbers one GPR).

This patch makes PPC backend generate ori + oris (xori + xoris) for OR (XOR) with a 32-bit immediate.

e.g. (x | 0xFFFFFFFF) should be

	ori 3, 3, 65535
	oris 3, 3, 65535

but LLVM generates without this patch

	li 4, 0
	oris 4, 4, 65535
	ori 4, 4, 65535
	or 3, 3, 4

Differential Revision: https://reviews.llvm.org/D34757



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311538 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-23 08:55:18 +00:00
Hiroshi Inoue
0722ecf05c Revert rL311526: [PowerPC] better instruction selection for OR (XOR) with a 32-bit immediate
This reverts commit rL311526 due to failures in some buildbot.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311530 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-23 06:38:05 +00:00
Hiroshi Inoue
65bc8755b1 [PowerPC] better instruction selection for OR (XOR) with a 32-bit immediate
On PPC64, OR (XOR) with a 32-bit immediate can be done with only two instructions, i.e. ori + oris.
But the current LLVM generates three or four instructions for this purpose (and also it clobbers one GPR).

This patch makes PPC backend generate ori + oris (xori + xoris) for OR (XOR) with a 32-bit immediate.

e.g. (x | 0xFFFFFFFF) should be

	ori 3, 3, 65535
	oris 3, 3, 65535

but LLVM generates without this patch

	li 4, 0
	oris 4, 4, 65535
	ori 4, 4, 65535
	or 3, 3, 4

Differential Revision: https://reviews.llvm.org/D34757



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311526 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-23 05:15:15 +00:00
Sean Fertile
2a641f4cd1 [PPC] Refine checks for emiting TOC restore nop and tail-call eligibility.
For the medium and large code models we only need to check if a call crosses
dso-boundaries when considering tail-call elgibility.

Differential Revision: https://reviews.llvm.org/D34245

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311353 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-21 17:35:32 +00:00
Stefan Pintilie
32d044fcf5 [PowerPC] Check if the pre-increment PHI Node already exists
Preparations to use the per-increment are sometimes done in the target
independent pass Loop Strength Reduction. We try to detect them in the PowerPC
specific pass so that they are not done twice and so that we do not add PHIs
that are not required.

Differential Revision: https://reviews.llvm.org/D36736

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311332 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-21 13:36:18 +00:00
Geoff Berry
6c9f36933c Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" round 2
This reverts commit r311135.

sanitizer-x86_64-linux-android buildbot is timing out with just this
patch applied.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311142 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-18 01:43:11 +00:00
Geoff Berry
d93db263e5 Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
Two issues identified by buildbots were addressed:
    - The pass no longer forwards COPYs to physical register uses, since
      doing so can break code that implicitly relies on the physical
      register number of the use.
    - The pass no longer forwards COPYs to undef uses, since doing so
      can break the machine verifier by creating LiveRanges that don't
      end on a use (since the undef operand is not considered a use).

    [MachineCopyPropagation] Extend pass to do COPY source forwarding

    This change extends MachineCopyPropagation to do COPY source forwarding.

    This change also extends the MachineCopyPropagation pass to be able to
    be run during register allocation, after physical registers have been
    assigned, but before the virtual registers have been re-written, which
    allows it to remove virtual register COPY LiveIntervals that become dead
    through the forwarding of all of their uses.

    Reviewers: qcolombet, javed.absar, MatzeB, jonpa

    Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny

    Differential Revision: https://reviews.llvm.org/D30751

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311135 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-17 23:06:55 +00:00
Sanjay Patel
36dc99ec47 [PowerPC] add tests for vector select-of-constants; NFC
We've discussed canonicalizing to this form in IR, so the backend
should be prepared to lower these in ways better than what we see
here.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311099 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-17 17:03:11 +00:00
Geoff Berry
a6a5be21df Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding"
This reverts commit r311038.

Several buildbots are breaking, and at least one appears to be due to
the forwarding of physical regs enabled by this change.  Reverting while
I investigate further.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311062 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-17 04:04:11 +00:00
Geoff Berry
31db6f3bd2 [MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.

This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.

Reviewers: qcolombet, javed.absar, MatzeB, jonpa

Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny

Differential Revision: https://reviews.llvm.org/D30751

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311038 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-16 20:50:01 +00:00
Lei Huang
af34a3a6d6 [PowerPC] Add codegen for VSX word extract convert to FP
Add codegen for VSX word extract conversion from signed/unsigned to single/double
precision.

For UINT_TO_FP:
Extract word unsigned and convert to float was implemented in https://reviews.llvm.org/D20239.
Here we will add the missing extract integer and conversion to double. This
utilizes the new P9 instruction xxextractuw to extracting an integer element
when the result will be converted to double thereby saving 2 direct moves
(VSR <-> GPR).

For SINT_TO_FP:
We will implement the following sequence which will also reduce the number of
instructions by saving 2 direct moves.

v4i32->f32:
        xxspltw
        xvcvsxwsp
        xscvspdpn

v4i32->f64:
        xxspltw
        xvcvsxwdp

Differential Revision: https://reviews.llvm.org/D35859

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310866 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-14 18:09:29 +00:00
Chandler Carruth
2de896abce [PowerPC] Revert r310346 (and followups r310356 & r310424) which
introduce a miscompile bug.

There appears to be a bug where the generated code to extract the sign
bit doesn't work correctly for 32-bit inputs. I've replied to the
original commit pointing out the problem. I think I see by inspection
(and reading the manual for PPC) how to fix this, but I can't be 100%
confident and I also don't know what the best way to test this is.
Currently it seems nearly impossible to get the backend to hit this code
path, but the patch autohr is likely in a better position to craft such
test cases than I am, and based on where the bug is it should be easily
done.

Original commit message for r310346:
"""
[PowerPC] Eliminate compares - add i32 sext/zext handling for SETLE/SETGE

Adds handling for SETLE/SETGE comparisons on i32 values. Furthermore, it
adds the handling for the special case where RHS == 0.

Differential Revision: https://reviews.llvm.org/D34048
"""

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310809 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-14 03:41:00 +00:00
Mikael Holmen
954fd5590e [IfConversion] Maintain the CFG when predicating/merging blocks in IfConvert*
Summary:
This fixes PR32721 in IfConvertTriangle and possible similar problems in
IfConvertSimple, IfConvertDiamond and IfConvertForkedDiamond.

In PR32721 we had a triangle

   EBB
   | \
   |  |
   | TBB
   |  /
   FBB

where FBB didn't have any successors at all since it ended with an
unconditional return. Then TBB and FBB were be merged into EBB, but EBB
would still keep its successors, and the use of analyzeBranch and
CorrectExtraCFGEdges wouldn't help to remove them since the return
instruction is not analyzable (at least not on ARM).

The edge updating code and branch probability updating code is now pushed
into MergeBlocks() which allows us to share the same update logic between
more callsites. This lets us remove several dependencies on analyzeBranch
and completely eliminate RemoveExtraEdges.

One thing that showed up with this patch was that IfConversion sometimes
left a successor with 0% probability even if there was no branch or
fallthrough to the successor.

One such example from the test case ifcvt_bad_zero_prob_succ.mir. The
indirect branch tBRIND can only jump to bb.1, but without the patch we
got:

  bb.0:
    successors: %bb.1(0x80000000)

  bb.1:
    successors: %bb.1(0x80000000), %bb.2(0x00000000)
    tBRIND %r1, 1, %cpsr
    B %bb.1

  bb.2:

There is no way to jump from bb.1 to bb2, but still there is a 0% edge
from bb.1 to bb.2.

With the patch applied we instead get the expected:

  bb.0:
    successors: %bb.1(0x80000000)

  bb.1:
    successors: %bb.1(0x80000000)
    tBRIND %r1, 1, %cpsr
    B %bb.1

Since bb.2 had no predecessor at all, it was removed.

Several testcases had to be updated due to this since the removed
successor made the "Branch Probability Basic Block Placement" pass
sometimes place blocks in a different order.

Finally added a couple of new test cases:

* PR32721_ifcvt_triangle_unanalyzable.mir:
  Regression test for the original problem dexcribed in PR 32721.

* ifcvt_triangleWoCvtToNextEdge.mir:
  Regression test for problem that caused a revert of my first attempt
  to solve PR 32721.

* ifcvt_simple_bad_zero_prob_succ.mir:
  Test case showing the problem where a wrong successor with 0% probability
  was previously left.

* ifcvt_[diamond|forked_diamond|simple]_unanalyzable.mir
  Very simple test cases for the simple and (forked) diamond cases
  involving unanalyzable branches that can be nice to have as a base if
  wanting to write more complicated tests.

Reviewers: iteratee, MatzeB, grosser, kparzysz

Reviewed By: kparzysz

Subscribers: kbarton, davide, aemerson, nemanjai, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34099

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310697 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-11 06:57:08 +00:00
Nemanja Ivanovic
ccf6aaba91 [PowerPC] Don't crash on larger splats achieved through 1-byte splats
We've implemented a 1-byte splat using XXSPLTISB on P9. However, LLVM will
produce a 1-byte splat even for wider element BUILD_VECTOR nodes. This patch
prevents crashing in that situation.

Differential Revision: https://reviews.llvm.org/D35650


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310358 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-08 13:52:45 +00:00
Nemanja Ivanovic
4db6f31bda [PowerPC] Eliminate compares - add i32 sext/zext handling for SETLE/SETGE
Adds handling for SETLE/SETGE comparisons on i32 values. Furthermore, it adds
the handling for the special case where RHS == 0.

Differential Revision: https://reviews.llvm.org/D34048


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310346 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-08 11:20:44 +00:00
Stefan Pintilie
07635d3971 [Power9] Exploit vector absolute difference instructions on Power 9
Power 9 has instructions to do absolute difference (VABSDUB, VABSDUH, VABSDUW)
for byte, halfword and word. We should take advantage of these.

Differential Revision: https://reviews.llvm.org/D34684

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309876 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-02 20:07:21 +00:00
Matthias Braun
b0a0439255 PowerPC: Do not use llc -march in tests.
`llc -march` is problematic because it only switches the target
architecture, but leaves the operating system unchanged. This
occasionally leads to indeterministic tests because the OS from
LLVM_DEFAULT_TARGET_TRIPLE is used.

However we can simply always use `llc -mtriple` instead. This changes
all the tests to do this to avoid people using -march when they copy and
paste parts of tests.

This patch:
- Removes -march if the .ll file already has a matching `target triple`
  directive or -mtriple argument.
- In all other cases changes -march=ppc32/-march=ppc64 to
  -mtriple=ppc32--/-mtriple=ppc64--

See also the discussion in https://reviews.llvm.org/D35287

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309754 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-01 22:20:41 +00:00
Sanjay Patel
8209d78723 [CGP] use subtract or subtract-of-cmps for result of memcmp expansion
As noted in the code comment, transforming this in the other direction might require 
a separate transform here in CGP given the block-at-a-time DAG constraint.

Besides that theoretical motivation, there are 2 practical motivations for the 
subtract-of-cmps form:

1. The codegen for both x86 and PPC is better for this IR (though PPC could be better still). 
   There is discussion about canonicalizing IR to the select form 
   ( http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html ), 
   so we probably need to add DAG transforms for those patterns anyway, but this improves the 
   memcmp output without waiting for that step.

2. If we allow vector-sized chunks for the load and compare, x86 is better prepared to convert
   that to optimal code when using subtract-of-cmps, so another prerequisite patch is avoided
   if we choose to enable that.

Differential Revision: https://reviews.llvm.org/D34904



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309597 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-31 18:08:24 +00:00
Hiroshi Inoue
3356bc7f45 [PowerPC] enable optimizeCompareInstr for branch with static branch hint
In optimizeCompareInstr, a compare instruction is eliminated by using a record form instruction if possible.
If the branch instruction that uses the result of the compare has a static branch hint, the optimization does not happen.
This patch makes this optimization happen regardless of the branch hint by splitting branch hint and branch condition before checking the predicate to identify the possible optimizations.

Differential Revision: https://reviews.llvm.org/D35801



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309255 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-27 08:14:48 +00:00
Nemanja Ivanovic
57a32cdffd [PowerPC] Pretty-print CR bits the way the binutils disassembler does
This patch just adds printing of CR bit registers in a more human-readable
form akin to that used by the GNU binutils.

Differential Revision: https://reviews.llvm.org/D31494


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309001 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-25 18:26:35 +00:00
Nemanja Ivanovic
84cbf60689 [PowerPC] - Recommit r304907 now that the issue has been fixed
This is just a recommit since the issue that the commit exposed is now
resolved.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308995 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-25 17:54:51 +00:00
Nemanja Ivanovic
35b282e0ac [PowerPC] Ensure displacements for DQ-Form instructions are multiples of 16
As outlined in the PR, we didn't ensure that displacements for DQ-Form
instructions are multiples of 16. Since the instruction encoding encodes
a quad-word displacement, a sub-16 byte displacement is meaningless and
ends up being encoded incorrectly.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33671.

Differential Revision: https://reviews.llvm.org/D35007


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307934 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-13 18:17:10 +00:00
Matthias Braun
4b013660b8 Specify complete target triple in test
This should fix the problems on the greendragon build.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307747 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-12 01:16:50 +00:00
Konstantin Zhuravlyov
8f85685860 Enhance synchscope representation
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
  global and local memory. These scopes restrict how synchronization is
  achieved, which can result in improved performance.

  This change extends existing notion of synchronization scopes in LLVM to
  support arbitrary scopes expressed as target-specific strings, in addition to
  the already defined scopes (single thread, system).

  The LLVM IR and MIR syntax for expressing synchronization scopes has changed
  to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
  replaces *singlethread* keyword), or a target-specific name. As before, if
  the scope is not specified, it defaults to CrossThread/System scope.

  Implementation details:
    - Mapping from synchronization scope name/string to synchronization scope id
      is stored in LLVM context;
    - CrossThread/System and SingleThread scopes are pre-defined to efficiently
      check for known scopes without comparing strings;
    - Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
      the bitcode.

Differential Revision: https://reviews.llvm.org/D21723



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307722 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-11 22:23:00 +00:00
Tony Jiang
79b3d6018d [PPC] Fix one test case regression for patch https://reviews.llvm.org/D34337.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307691 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-11 19:07:10 +00:00
Tony Jiang
f6179755b3 [PPC] Fix two bugs in frame lowering.
1. The available program storage region of the red zone to compilers is 288
 bytes rather than 244 bytes.
2. The formula for negative number alignment calculation should be
y = x & ~(n-1) rather than y = (x + (n-1)) & ~(n-1).

Differential Revision: https://reviews.llvm.org/D34337

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307672 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-11 16:42:20 +00:00
Tony Jiang
dc4a67cca0 [PPC CodeGen] Expand the bitreverse.i64 intrinsic.
Differential Revision: https://reviews.llvm.org/D34908
Fix PR: https://bugs.llvm.org/show_bug.cgi?id=33093

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307563 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-10 18:11:23 +00:00
Lei Huang
b6988767a8 [PowerPC] Reduce register pressure by not materializing a constant just for use as an index register for X-Form loads/stores.
For this example:
float test (int *arr) {
    return arr[2];
}

We currently generate the following code:
  li r4, 8
  lxsiwax f0, r3, r4
  xscvsxdsp f1, f0

With this patch, we will now generate:
  addi r3, r3, 8
  lxsiwax f0, 0, r3
  xscvsxdsp f1, f0

Originally reported in: https://bugs.llvm.org/show_bug.cgi?id=27204
Differential Revision: https://reviews.llvm.org/D35027

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307553 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-10 16:44:45 +00:00
Tony Jiang
9163803bf0 [PPC CodeGen] Expand the bitreverse.i32 intrinsic.
Differential Revision: https://reviews.llvm.org/D33572
Fix PR: https://bugs.llvm.org/show_bug.cgi?id=33093

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307413 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-07 16:41:55 +00:00
Sean Fertile
cb358c5493 [PowerPC] Make sure that we remove dead PHI nodes after the PPCCTRLoops pass.
Commiting on behalf of Stefan Pintilie.
Differential Revision: https://reviews.llvm.org/D34829

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307180 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-05 17:57:57 +00:00
Tony Jiang
d1071cff22 [Power9] Exploit vector extract with variable index.
This patch adds the exploitation for new power 9 instructions which extract
variable elements from vectors:
VEXTUBLX
VEXTUBRX
VEXTUHLX
VEXTUHRX
VEXTUWLX
VEXTUWRX

Differential Revision: https://reviews.llvm.org/D34032
Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307174 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-05 16:55:00 +00:00
Tony Jiang
ee1d801b4f [Power9] Exploit vector integer extend instructions when indices aren't correct.
This patch adds on to the exploitation added by https://reviews.llvm.org/D33510.
This now catches build vector nodes where the inputs are coming from sign
extended vector extract elements where the indices used by the vector extract
are not correct. We can still use the new hardware instructions by adding a
shuffle to move the elements to the correct indices. I introduced a new PPCISD
node here because adding a vector_shuffle and changing the elements of the
vector_extracts was getting undone by another DAG combine.

Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com)
Differential Revision: https://reviews.llvm.org/D34009

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307169 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-05 16:00:38 +00:00
Nemanja Ivanovic
66da567057 Add the missing triple to the test case added as part of r307120.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307122 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-05 05:14:43 +00:00
Nemanja Ivanovic
c91f749298 [PowerPC] Fix for PR33636
Remove casts to a constant when a node can be an undef.

Differential Revision: https://reviews.llvm.org/D34808


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307120 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-05 04:51:29 +00:00
Sanjay Patel
9a170a630e [PowerPC] auto-generate check lines; NFC
The existing check lines were more flexible, but these are
small enough tests that there shouldn't be much question
about register allocation. I've been hand-modifying this 
file as I change the CGP memcmp expansion, but that's
more error-prone and time-consuming than just running the 
update script.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306861 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-30 19:20:54 +00:00
Hiroshi Inoue
16d661a030 [PowerPC] fix potential verification error on __tls_get_addr
This patch fixes a verification error with -verify-machineinstrs while expanding __tls_get_addr by not creating ADJCALLSTACKUP and ADJCALLSTACKDOWN if there is another ADJCALLSTACKUP in this basic block since nesting ADJCALLSTACKUP/ADJCALLSTACKDOWN is not allowed.

Here, ADJCALLSTACKUP and ADJCALLSTACKDOWN are created as a fence for instruction scheduling to avoid _tls_get_addr is scheduled before mflr in the prologue (https://bugs.llvm.org//show_bug.cgi?id=25839). So if another ADJCALLSTACKUP exists before _tls_get_addr, we do not need to create a new ADJCALLSTACKUP.

Differential Revision: https://reviews.llvm.org/D34347



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306678 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-29 14:13:38 +00:00
Sanjay Patel
dbbccbae97 [CGP] add specialization for memcmp expansion with only one basic block
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306485 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 23:15:01 +00:00
Sanjay Patel
ca9df19568 [CGP] eliminate a sub instruction in memcmp expansion
As noted in D34071, there are some IR optimization opportunities that could be 
handled by normal IR passes if this expansion wasn't happening so late in CGP.

Regardless of that, it seems wasteful to knowingly produce suboptimal IR here, 
so I'm proposing this change:
  %s = sub i32 %x, %y
  %r = icmp ne %s, 0
    =>
  %r = icmp ne %x, %y

Changing the predicate to 'eq' mimics what InstCombine would do, so that's just
an efficiency improvement if we decide this expansion should happen sooner.

The fact that the PowerPC backend doesn't eliminate the 'subf.' might be 
something for PPC folks to investigate separately.

Differential Revision: https://reviews.llvm.org/D34416


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306471 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-27 21:46:34 +00:00