The lxv/stxv instructions require an offset that is 0 % 16. Previously we were
selecting lxv/stxv for loads and stores to the stack where the offset from the
slot was a multiple of 16, but the stack slot was not 16 or more byte aligned.
When the frame gets lowered these transform to r(1|31) + slot + offset.
If slot is not aligned, slot + offset may not be 0 % 16.
Now we require 16 byte or more alignment for select lxv/stxv to stack slots.
Includes a testcase that shows both sufficiently and insufficiently aligned
stack slots.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312843 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This fixes code-gen for XRay in PPC. The regression wasn't caught by
codegen tests which we add in this change.
What happened was the following:
- For tail exits, we used to unconditionally prepend the returns/exits
with a pseudo-instruction that gets lowered to the instrumentation
sled (and leave the actual return/exit instruction as-is).
- Changes to the XRay instrumentation pass caused the tail exits to
suddenly also emit the tail exit pseudo-instruction, since the check
for whether a return instruction was also a call instruction meant it
was a tail exit instruction.
- None of the tests caught the regression either due to non-existent
tests, or the tests being disabled/removed for continuous breakage.
This change re-introduces some of the basic tests and verifies that
we're back to a state that allows the back-end to generate appropriate
XRay instrumented binaries for PPC in the presence of tail exits.
Reviewers: echristo, timshen
Subscribers: nemanjai, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D37570
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312772 91177308-0d34-0410-b5e6-96231b3b80d8
xscvdpspn was not introduced until the P8, so don't use it on the P7. Fixes a
regression introduced in r288152.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312612 91177308-0d34-0410-b5e6-96231b3b80d8
If multiple conditional branches are executed based on the same comparison, we can execute multiple conditional branches based on the result of one comparison on PPC. For example,
if (a == 0) { ... }
else if (a < 0) { ... }
can be executed by one compare and two conditional branches instead of two pairs of a compare and a conditional branch.
This patch identifies a code sequence of the two pairs of a compare and a conditional branch and merge the compares if possible.
To maximize the opportunity, we do canonicalization of code sequence before merging compares.
For the above example, the input for this pass looks like:
cmplwi r3, 0
beq 0, .LBB0_3
cmpwi r3, -1
bgt 0, .LBB0_4
So, before merging two compares, we canonicalize it as
cmpwi r3, 0 ; cmplwi and cmpwi yield same result for beq
beq 0, .LBB0_3
cmpwi r3, 0 ; greather than -1 means greater or equal to 0
bge 0, .LBB0_4
The generated code should be
cmpwi r3, 0
beq 0, .LBB0_3
bge 0, .LBB0_4
Differential Revision: https://reviews.llvm.org/D37211
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312514 91177308-0d34-0410-b5e6-96231b3b80d8
Issues addressed since original review:
- Moved removal of dead instructions found by
LiveIntervals::shrinkToUses() outside of loop iterating over
instructions to avoid instructions being deleted while pointed to by
iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312328 91177308-0d34-0410-b5e6-96231b3b80d8
From comments and code review it wasn't intended to be enabled by default yet.
This reverts commit r311588.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312214 91177308-0d34-0410-b5e6-96231b3b80d8
It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!")
> Issues identified by buildbots addressed since original review:
> - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
> - The pass no longer forwards COPYs to physical register uses, since
> doing so can break code that implicitly relies on the physical
> register number of the use.
> - The pass no longer forwards COPYs to undef uses, since doing so
> can break the machine verifier by creating LiveRanges that don't
> end on a use (since the undef operand is not considered a use).
>
> [MachineCopyPropagation] Extend pass to do COPY source forwarding
>
> This change extends MachineCopyPropagation to do COPY source forwarding.
>
> This change also extends the MachineCopyPropagation pass to be able to
> be run during register allocation, after physical registers have been
> assigned, but before the virtual registers have been re-written, which
> allows it to remove virtual register COPY LiveIntervals that become dead
> through the forwarding of all of their uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312178 91177308-0d34-0410-b5e6-96231b3b80d8
Issues identified by buildbots addressed since original review:
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312154 91177308-0d34-0410-b5e6-96231b3b80d8
This change simplifies code that has to deal with
DIGlobalVariableExpression and mirrors how we treat DIExpressions in
debug info intrinsics. Before this change there were two ways of
representing empty expressions on globals, a nullptr and an empty
!DIExpression().
If someone needs to upgrade out-of-tree testcases:
perl -pi -e 's/(!DIGlobalVariableExpression\(var: ![0-9]*)\)/\1, expr: !DIExpression())/g' <MYTEST.ll>
will catch 95%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312144 91177308-0d34-0410-b5e6-96231b3b80d8
This goes back to a discussion about IR canonicalization. We'd like to preserve and convert
more IR to 'select' than we currently do because that's likely the best choice in IR:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html
...but that's often not true for codegen, so we need to account for this pattern coming in
to the backend and transform it to better DAG ops.
Steps in this patch:
1. Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to more finely
enable this transform. Other targets will probably want that anyway to distinguish scalars
from vectors. We're using that here to exclude AVX512 targets, but it may not be necessary.
2. Convert a vselect to ext+add. This eliminates a constant load/materialization, and the
vector ext is often free.
Implementing a more general fold using xor+and can be a follow-up for targets that don't have
a legal vselect. It's also possible that we can remove the TLI hook for the special case fold
implemented here because we're eliminating a constant, but it needs to be tested on other
targets.
Differential Revision: https://reviews.llvm.org/D36840
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311731 91177308-0d34-0410-b5e6-96231b3b80d8
Implementing this pass as a PowerPC specific pass. Branch coalescing utilizes
the analyzeBranch method which currently does not include any implicit operands.
This is not an issue on PPC but must be handled on other targets.
Differential Revision : https: // reviews.llvm.org/D32776
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311588 91177308-0d34-0410-b5e6-96231b3b80d8
- recommitting after fixing a test failure on MacOS
On PPC64, OR (XOR) with a 32-bit immediate can be done with only two instructions, i.e. ori + oris.
But the current LLVM generates three or four instructions for this purpose (and also it clobbers one GPR).
This patch makes PPC backend generate ori + oris (xori + xoris) for OR (XOR) with a 32-bit immediate.
e.g. (x | 0xFFFFFFFF) should be
ori 3, 3, 65535
oris 3, 3, 65535
but LLVM generates without this patch
li 4, 0
oris 4, 4, 65535
ori 4, 4, 65535
or 3, 3, 4
Differential Revision: https://reviews.llvm.org/D34757
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311538 91177308-0d34-0410-b5e6-96231b3b80d8
On PPC64, OR (XOR) with a 32-bit immediate can be done with only two instructions, i.e. ori + oris.
But the current LLVM generates three or four instructions for this purpose (and also it clobbers one GPR).
This patch makes PPC backend generate ori + oris (xori + xoris) for OR (XOR) with a 32-bit immediate.
e.g. (x | 0xFFFFFFFF) should be
ori 3, 3, 65535
oris 3, 3, 65535
but LLVM generates without this patch
li 4, 0
oris 4, 4, 65535
ori 4, 4, 65535
or 3, 3, 4
Differential Revision: https://reviews.llvm.org/D34757
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311526 91177308-0d34-0410-b5e6-96231b3b80d8
Preparations to use the per-increment are sometimes done in the target
independent pass Loop Strength Reduction. We try to detect them in the PowerPC
specific pass so that they are not done twice and so that we do not add PHIs
that are not required.
Differential Revision: https://reviews.llvm.org/D36736
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311332 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r311135.
sanitizer-x86_64-linux-android buildbot is timing out with just this
patch applied.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311142 91177308-0d34-0410-b5e6-96231b3b80d8
Two issues identified by buildbots were addressed:
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa
Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny
Differential Revision: https://reviews.llvm.org/D30751
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311135 91177308-0d34-0410-b5e6-96231b3b80d8
We've discussed canonicalizing to this form in IR, so the backend
should be prepared to lower these in ways better than what we see
here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311099 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r311038.
Several buildbots are breaking, and at least one appears to be due to
the forwarding of physical regs enabled by this change. Reverting while
I investigate further.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311062 91177308-0d34-0410-b5e6-96231b3b80d8
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa
Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny
Differential Revision: https://reviews.llvm.org/D30751
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311038 91177308-0d34-0410-b5e6-96231b3b80d8
Add codegen for VSX word extract conversion from signed/unsigned to single/double
precision.
For UINT_TO_FP:
Extract word unsigned and convert to float was implemented in https://reviews.llvm.org/D20239.
Here we will add the missing extract integer and conversion to double. This
utilizes the new P9 instruction xxextractuw to extracting an integer element
when the result will be converted to double thereby saving 2 direct moves
(VSR <-> GPR).
For SINT_TO_FP:
We will implement the following sequence which will also reduce the number of
instructions by saving 2 direct moves.
v4i32->f32:
xxspltw
xvcvsxwsp
xscvspdpn
v4i32->f64:
xxspltw
xvcvsxwdp
Differential Revision: https://reviews.llvm.org/D35859
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310866 91177308-0d34-0410-b5e6-96231b3b80d8
introduce a miscompile bug.
There appears to be a bug where the generated code to extract the sign
bit doesn't work correctly for 32-bit inputs. I've replied to the
original commit pointing out the problem. I think I see by inspection
(and reading the manual for PPC) how to fix this, but I can't be 100%
confident and I also don't know what the best way to test this is.
Currently it seems nearly impossible to get the backend to hit this code
path, but the patch autohr is likely in a better position to craft such
test cases than I am, and based on where the bug is it should be easily
done.
Original commit message for r310346:
"""
[PowerPC] Eliminate compares - add i32 sext/zext handling for SETLE/SETGE
Adds handling for SETLE/SETGE comparisons on i32 values. Furthermore, it
adds the handling for the special case where RHS == 0.
Differential Revision: https://reviews.llvm.org/D34048
"""
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310809 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This fixes PR32721 in IfConvertTriangle and possible similar problems in
IfConvertSimple, IfConvertDiamond and IfConvertForkedDiamond.
In PR32721 we had a triangle
EBB
| \
| |
| TBB
| /
FBB
where FBB didn't have any successors at all since it ended with an
unconditional return. Then TBB and FBB were be merged into EBB, but EBB
would still keep its successors, and the use of analyzeBranch and
CorrectExtraCFGEdges wouldn't help to remove them since the return
instruction is not analyzable (at least not on ARM).
The edge updating code and branch probability updating code is now pushed
into MergeBlocks() which allows us to share the same update logic between
more callsites. This lets us remove several dependencies on analyzeBranch
and completely eliminate RemoveExtraEdges.
One thing that showed up with this patch was that IfConversion sometimes
left a successor with 0% probability even if there was no branch or
fallthrough to the successor.
One such example from the test case ifcvt_bad_zero_prob_succ.mir. The
indirect branch tBRIND can only jump to bb.1, but without the patch we
got:
bb.0:
successors: %bb.1(0x80000000)
bb.1:
successors: %bb.1(0x80000000), %bb.2(0x00000000)
tBRIND %r1, 1, %cpsr
B %bb.1
bb.2:
There is no way to jump from bb.1 to bb2, but still there is a 0% edge
from bb.1 to bb.2.
With the patch applied we instead get the expected:
bb.0:
successors: %bb.1(0x80000000)
bb.1:
successors: %bb.1(0x80000000)
tBRIND %r1, 1, %cpsr
B %bb.1
Since bb.2 had no predecessor at all, it was removed.
Several testcases had to be updated due to this since the removed
successor made the "Branch Probability Basic Block Placement" pass
sometimes place blocks in a different order.
Finally added a couple of new test cases:
* PR32721_ifcvt_triangle_unanalyzable.mir:
Regression test for the original problem dexcribed in PR 32721.
* ifcvt_triangleWoCvtToNextEdge.mir:
Regression test for problem that caused a revert of my first attempt
to solve PR 32721.
* ifcvt_simple_bad_zero_prob_succ.mir:
Test case showing the problem where a wrong successor with 0% probability
was previously left.
* ifcvt_[diamond|forked_diamond|simple]_unanalyzable.mir
Very simple test cases for the simple and (forked) diamond cases
involving unanalyzable branches that can be nice to have as a base if
wanting to write more complicated tests.
Reviewers: iteratee, MatzeB, grosser, kparzysz
Reviewed By: kparzysz
Subscribers: kbarton, davide, aemerson, nemanjai, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34099
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310697 91177308-0d34-0410-b5e6-96231b3b80d8
We've implemented a 1-byte splat using XXSPLTISB on P9. However, LLVM will
produce a 1-byte splat even for wider element BUILD_VECTOR nodes. This patch
prevents crashing in that situation.
Differential Revision: https://reviews.llvm.org/D35650
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310358 91177308-0d34-0410-b5e6-96231b3b80d8
`llc -march` is problematic because it only switches the target
architecture, but leaves the operating system unchanged. This
occasionally leads to indeterministic tests because the OS from
LLVM_DEFAULT_TARGET_TRIPLE is used.
However we can simply always use `llc -mtriple` instead. This changes
all the tests to do this to avoid people using -march when they copy and
paste parts of tests.
This patch:
- Removes -march if the .ll file already has a matching `target triple`
directive or -mtriple argument.
- In all other cases changes -march=ppc32/-march=ppc64 to
-mtriple=ppc32--/-mtriple=ppc64--
See also the discussion in https://reviews.llvm.org/D35287
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309754 91177308-0d34-0410-b5e6-96231b3b80d8
As noted in the code comment, transforming this in the other direction might require
a separate transform here in CGP given the block-at-a-time DAG constraint.
Besides that theoretical motivation, there are 2 practical motivations for the
subtract-of-cmps form:
1. The codegen for both x86 and PPC is better for this IR (though PPC could be better still).
There is discussion about canonicalizing IR to the select form
( http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html ),
so we probably need to add DAG transforms for those patterns anyway, but this improves the
memcmp output without waiting for that step.
2. If we allow vector-sized chunks for the load and compare, x86 is better prepared to convert
that to optimal code when using subtract-of-cmps, so another prerequisite patch is avoided
if we choose to enable that.
Differential Revision: https://reviews.llvm.org/D34904
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309597 91177308-0d34-0410-b5e6-96231b3b80d8
In optimizeCompareInstr, a compare instruction is eliminated by using a record form instruction if possible.
If the branch instruction that uses the result of the compare has a static branch hint, the optimization does not happen.
This patch makes this optimization happen regardless of the branch hint by splitting branch hint and branch condition before checking the predicate to identify the possible optimizations.
Differential Revision: https://reviews.llvm.org/D35801
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309255 91177308-0d34-0410-b5e6-96231b3b80d8
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Differential Revision: https://reviews.llvm.org/D21723
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307722 91177308-0d34-0410-b5e6-96231b3b80d8
1. The available program storage region of the red zone to compilers is 288
bytes rather than 244 bytes.
2. The formula for negative number alignment calculation should be
y = x & ~(n-1) rather than y = (x + (n-1)) & ~(n-1).
Differential Revision: https://reviews.llvm.org/D34337
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307672 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds on to the exploitation added by https://reviews.llvm.org/D33510.
This now catches build vector nodes where the inputs are coming from sign
extended vector extract elements where the indices used by the vector extract
are not correct. We can still use the new hardware instructions by adding a
shuffle to move the elements to the correct indices. I introduced a new PPCISD
node here because adding a vector_shuffle and changing the elements of the
vector_extracts was getting undone by another DAG combine.
Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com)
Differential Revision: https://reviews.llvm.org/D34009
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307169 91177308-0d34-0410-b5e6-96231b3b80d8
The existing check lines were more flexible, but these are
small enough tests that there shouldn't be much question
about register allocation. I've been hand-modifying this
file as I change the CGP memcmp expansion, but that's
more error-prone and time-consuming than just running the
update script.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306861 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes a verification error with -verify-machineinstrs while expanding __tls_get_addr by not creating ADJCALLSTACKUP and ADJCALLSTACKDOWN if there is another ADJCALLSTACKUP in this basic block since nesting ADJCALLSTACKUP/ADJCALLSTACKDOWN is not allowed.
Here, ADJCALLSTACKUP and ADJCALLSTACKDOWN are created as a fence for instruction scheduling to avoid _tls_get_addr is scheduled before mflr in the prologue (https://bugs.llvm.org//show_bug.cgi?id=25839). So if another ADJCALLSTACKUP exists before _tls_get_addr, we do not need to create a new ADJCALLSTACKUP.
Differential Revision: https://reviews.llvm.org/D34347
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306678 91177308-0d34-0410-b5e6-96231b3b80d8
As noted in D34071, there are some IR optimization opportunities that could be
handled by normal IR passes if this expansion wasn't happening so late in CGP.
Regardless of that, it seems wasteful to knowingly produce suboptimal IR here,
so I'm proposing this change:
%s = sub i32 %x, %y
%r = icmp ne %s, 0
=>
%r = icmp ne %x, %y
Changing the predicate to 'eq' mimics what InstCombine would do, so that's just
an efficiency improvement if we decide this expansion should happen sooner.
The fact that the PowerPC backend doesn't eliminate the 'subf.' might be
something for PPC folks to investigate separately.
Differential Revision: https://reviews.llvm.org/D34416
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306471 91177308-0d34-0410-b5e6-96231b3b80d8