663 Commits

Author SHA1 Message Date
Hans Wennborg
5d413c4a80 Merging r325687:
------------------------------------------------------------------------
r325687 | sbaranga | 2018-02-21 16:20:32 +0100 (Wed, 21 Feb 2018) | 8 lines

[SCEV] Temporarily disable loop versioning for the purpose
of turning SCEVUnknowns of PHIs into AddRecExprs.

This feature is now hidden behind the -scev-version-unknown flag.

Fixes PR36032 and PR35432.


------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325773 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-22 12:53:02 +00:00
Hans Wennborg
71667f5037 Merging r324195:
------------------------------------------------------------------------
r324195 | mcrosier | 2018-02-04 16:42:24 +0100 (Sun, 04 Feb 2018) | 12 lines

[LV] Use Demanded Bits and ValueTracking for reduction type-shrinking

The type-shrinking logic in reduction detection, although narrow in scope, is
also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies
the approach to rely on the demanded bits and value tracking analyses, if
available. We currently perform type-shrinking separately for reductions and
other instructions in the loop. Long-term, we should probably think about
computing minimal bit widths in a more complete way for the loops we want to
vectorize.

PR35734
Differential Revision: https://reviews.llvm.org/D42309
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@325508 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-19 15:24:45 +00:00
Hans Wennborg
a1e0ced878 Merging r322473:
------------------------------------------------------------------------
r322473 | a.elovikov | 2018-01-15 02:56:07 -0800 (Mon, 15 Jan 2018) | 23 lines

[LV] Don't call recordVectorLoopValueForInductionCast for newly-created IV from a trunc.

Summary:
This method is supposed to be called for IVs that have casts in their use-def
chains that are completely ignored after vectorization under PSE. However, for
truncates of such IVs the same InductionDescriptor is used during
creation/widening of both original IV based on PHINode and new IV based on
TruncInst.

This leads to unintended second call to recordVectorLoopValueForInductionCast
with a VectorLoopVal set to the newly created IV for a trunc and causes an
assert due to attempt to store new information for already existing entry in the
map. This is wrong and should not be done.

Fixes PR35773.

Reviewers: dorit, Ayal, mssimpso

Reviewed By: dorit

Subscribers: RKSimon, dim, dcaballe, hsaito, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D41913
------------------------------------------------------------------------


git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_60@322673 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-17 15:57:43 +00:00
Florian Hahn
f605692d39 [LV] Remove unnecessary DoExtraAnalysis guard (silent bug)
canVectorize is only checking if the loop has a normalized pre-header if DoExtraAnalysis is true.
This doesn't make sense to me because reporting analysis information shouldn't alter legality
checks. This is probably the result of a last minute minor change before committing (?).

Patch by Diego Caballero.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D40973


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321172 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-20 13:28:38 +00:00
Hal Finkel
c4ce27e207 Move Transforms/LoopVectorize/consecutive-ptr-cg-bug.ll into the X86 subdirectory
This test depends on X86's TTI; move into the X86 subdirectory.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320914 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-16 05:10:20 +00:00
Hal Finkel
3f92210a79 [LV] Extend InstWidening with CM_Widen_Recursive
Changes to the original scalar loop during LV code gen cause the return value
of Legal->isConsecutivePtr() to be inconsistent with the return value during
legal/cost phases (further analysis and information of the bug is in D39346).
This patch is an alternative fix to PR34965 following the CM_Widen approach
proposed by Ayal and Gil in D39346. It extends InstWidening enum with
CM_Widen_Reverse to properly record the widening decision for consecutive
reverse memory accesses and, consequently, get rid of the
Legal->isConsetuviePtr() call in LV code gen. I think this is a simpler/cleaner
solution to PR34965 than the one in D39346.

Fixes PR34965.

Patch by Diego Caballero, thanks!

Differential Revision: https://reviews.llvm.org/D40742

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320913 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-16 02:55:24 +00:00
Dorit Nuzman
3ce8b664b3 [LV] Support efficient vectorization of an induction with redundant casts
D30041 extended SCEVPredicateRewriter to improve handling of Phi nodes whose
update chain involves casts; PSCEV can now build an AddRecurrence for some
forms of such phi nodes, under the proper runtime overflow test. This means
that we can identify such phi nodes as an induction, and the loop-vectorizer
can now vectorize such inductions, however inefficiently. The vectorizer
doesn't know that it can ignore the casts, and so it vectorizes them.

This patch records the casts in the InductionDescriptor, so that they could
be marked to be ignored for cost calculation (we use VecValuesToIgnore for
that) and ignored for vectorization/widening/scalarization (i.e. treated as
TriviallyDead).

In addition to marking all these casts to be ignored, we also need to make
sure that each cast is mapped to the right vector value in the vector loop body
(be it a widened, vectorized, or scalarized induction). So whenever an
induction phi is mapped to a vector value (during vectorization/widening/
scalarization), we also map the respective cast instruction (if exists) to that
vector value. (If the phi-update sequence of an induction involves more than one
cast, then the above mapping to vector value is relevant only for the last cast
of the sequence as we allow only the "last cast" to be used outside the
induction update chain itself).

This is the last step in addressing PR30654.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320672 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-14 07:56:31 +00:00
Dorit Nuzman
330c5d954f [LV] Ignore the cost of values that will not appear in the vectorized loop
VecValuesToIgnore holds values that will not appear in the vectorized loop.
We should therefore ignore their cost when VF > 1.

Differential Revision: https://reviews.llvm.org/D40883



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320463 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-12 08:57:43 +00:00
Dorit Nuzman
6ae5d1f67f [SCEV] Fix wrong Equal predicate created in getAddRecForPhiWithCasts
CreateAddRecFromPHIWithCastsImpl() adds an IncrementNUSW overflow predicate
which allows the PSCEV rewriter to rewrite this scev expression:
 (zext i8 {0, + , (trunc i32 step to i8)} to i32)
into
 {0, +, (sext i8 (trunc i32 step to i8) to i32)}

But then it adds the wrong Equal predicate:
 %step == (zext i8 (trunc i32 %step to i8) to i32).
instead of:
 %step == (sext i8 (trunc i32 %step to i8) to i32)

This is fixed here.

Differential Revision: https://reviews.llvm.org/D40641



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320298 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-10 11:13:35 +00:00
Adam Nemet
d62c5bdaf9 [LV] Interleaved access vectorization: fix computing new alias info
As a new access is generated spanning across multiple fields, we need to
propagate alias info from all the fields to form the most generic alias info.

rdar://35602528

Differential Revision: https://reviews.llvm.org/D40617

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319979 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-06 22:42:24 +00:00
Gil Rapaport
a36486e33b [LV] Model masking in VPlan, introducing VPInstructions
This patch adds a new abstraction layer to VPlan and leverages it to model the planned
instructions that manipulate masks (AND, OR, NOT), introduced during predication.

The new VPValue and VPUser classes model how data flows into, through and out
of a VPlan, forming the vertices of a planned Def-Use graph. The new
VPInstruction class is a generic single-instruction Recipe that models a
planned instruction along with its opcode, operands and users. See
VectorizationPlan.rst for more details.

Differential Revision: https://reviews.llvm.org/D38676


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318645 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-20 12:01:47 +00:00
Gil Rapaport
c07df42b1a [LV] Introduce VPBlendRecipe, VPWidenMemoryInstructionRecipe
This patch is part of D38676.

The patch introduces two new Recipes to handle instructions whose vectorization
involves masking. These Recipes take VPlan-level masks in D38676, but still rely
on ILV's existing createEdgeMask(), createBlockInMask() in this patch.

VPBlendRecipe handles intra-loop phi nodes, which are vectorized as a sequence
of SELECTs. Its execute() code is refactored out of ILV::widenPHIInstruction(),
which now handles only loop-header phi nodes.

VPWidenMemoryInstructionRecipe handles load/store which are to be widened
(but are not part of an Interleave Group). In this patch it simply calls
ILV::vectorizeMemoryInstruction on execute().

Differential Revision: https://reviews.llvm.org/D39068


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318149 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-14 12:09:30 +00:00
Dan Gohman
b5e0bec282 Add an @llvm.sideeffect intrinsic
This patch implements Chandler's idea [0] for supporting languages that
require support for infinite loops with side effects, such as Rust, providing
part of a solution to bug 965 [1].

Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual
effect, but which appears to optimization passes to have obscure side effects,
such that they don't optimize away loops containing it. It also teaches
several optimization passes to ignore this intrinsic, so that it doesn't
significantly impact optimization in most cases.

As discussed on llvm-dev [2], this patch is the first of two major parts.
The second part, to change LLVM's semantics to have defined behavior
on infinite loops by default, with a function attribute for opting into
potential-undefined-behavior, will be implemented and posted for review in
a separate patch.

[0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html
[1] https://bugs.llvm.org/show_bug.cgi?id=965
[2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html

Differential Revision: https://reviews.llvm.org/D38336


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317729 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-08 21:59:51 +00:00
Dorit Nuzman
ee5e318039 [LV/LAA] Avoid specializing a loop for stride=1 when this predicate implies a
single-iteration loop

This fixes PR34681. Avoid adding the "Stride == 1" predicate when we know that
Stride >= Trip-Count. Such a predicate will effectively optimize a single
or zero iteration loop, as Trip-Count <= Stride == 1.

Differential Revision: https://reviews.llvm.org/D38785



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317438 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-05 16:53:15 +00:00
Sanjay Patel
72428f5e04 [SimplifyCFG] use pass options and remove the latesimplifycfg pass
This is no-functional-change-intended.

This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and 
D35411 (defer folding unconditional branches) with pass parameters rather than a named
"latesimplifycfg" pass. Now that we have individual options to control the functionality,
we could decouple when these fire (but that's an independent patch if desired). 

The next planned step would be to add another option bit to disable the sinking transform
mentioned in D38566. This should also make it clear that the new pass manager needs to
be updated to limit simplifycfg in the same way as the old pass manager.

Differential Revision: https://reviews.llvm.org/D38631



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316835 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-28 18:43:07 +00:00
Dehao Chen
b1bfcf247f Do not add discriminator encoding for debug intrinsics.
Summary: There are certain requirements for debug location of debug intrinsics, e.g. the scope of the DILocalVariable should be the same as the scope of its debug location. As a result, we should not add discriminator encoding for debug intrinsics.

Reviewers: dblaikie, aprantl

Reviewed By: aprantl

Subscribers: JDevlieghere, aprantl, bjope, sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D39343

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316703 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-26 21:20:52 +00:00
Ayal Zaks
5eb1a9dfa9 [LV] Fix PR34743 - handle casts that sink after interleaved loads
When ignoring a load that participates in an interleaved group, make sure to
move a cast that needs to sink after it.

Testcase derived from reproducer of PR34743.

Differential Revision: https://reviews.llvm.org/D38338


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314986 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-05 15:45:14 +00:00
Ayal Zaks
0e8a63e67f [LV] Fix PR34711 - widen instruction ranges when sinking casts
Instead of trying to keep LastWidenRecipe updated after creating each recipe,
have tryToWiden() retrieve the last recipe of the current VPBasicBlock and check
if it's a VPWidenRecipe when attempting to extend its range. This ensures that
such extensions, optimized to maintain the original instruction order, do so
only when the instructions are to maintain their relative order. The latter does
not always hold, e.g., when a cast needs to sink to unravel first order
recurrence (r306884).

Testcase derived from reproducer of PR34711.

Differential Revision: https://reviews.llvm.org/D38339


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314981 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-05 12:41:49 +00:00
Matthew Simpson
e0d97dea2f [LV] Use correct insertion point when type shrinking reductions
When type shrinking reductions, we should insert the truncations and extends at
the end of the loop latch block. Previously, these instructions were inserted
at the end of the loop header block. The difference is only a problem for loops
with predicated instructions (e.g., conditional stores and instructions that
may divide by zero). For these instructions, we create new basic blocks inside
the vectorized loop, which cause the loop header and latch to no longer be the
same block. This should fix PR34687.

Reference: https://bugs.llvm.org/show_bug.cgi?id=34687

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314542 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-29 18:07:39 +00:00
Sanjay Patel
00b5bff2c2 [InstCombine] remove extract-of-select vector transform (2nd try)
The 1st attempt at this:
https://reviews.llvm.org/rL314117
was reverted at:
https://reviews.llvm.org/rL314118

because of bot fails for clang tests that were checking optimized IR. That should be fixed with:
https://reviews.llvm.org/rL314144
...so try again. 

Original commit message:

The transform to convert an extract-of-a-select-of-vectors was added at:
https://reviews.llvm.org/rL194013

And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT>

Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.

The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.

The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.

Differential Revision: https://reviews.llvm.org/D38006


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314147 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-25 20:30:53 +00:00
Sanjay Patel
26f60ee8f1 revert r314117 because there are bogus clang tests that depend on the optimizer
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314118 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-25 17:00:04 +00:00
Sanjay Patel
752909f8e3 [InstCombine] remove extract-of-select vector transform
The transform to convert an extract-of-a-select-of-vectors was added at:
rL194013

And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT>

Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.

The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.

The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.

Differential Revision: https://reviews.llvm.org/D38006


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314117 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-25 16:41:34 +00:00
Alon Kom
dde48e1948 [LV] Fix maximum legal VF calculation
This patch fixes pr34283, which exposed that the computation of
maximum legal width for vectorization was wrong, because it relied
on MaxInterleaveFactor to obtain the maximum stride used in the loop,
however not all strided accesses in the loop have an interleave-group
associated with them.
Instead of recording the maximum stride in the loop, which can be over
conservative (e.g. if the access with the maximum stride is not involved
in the dependence limitation), this patch tracks the actual maximum legal
width imposed by accesses that are involved in dependencies.

Differential Revision: https://reviews.llvm.org/D37507

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313237 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-14 07:40:02 +00:00
Anna Thomas
fb0da33fc1 [LV] Avoid computing the register usage for default VF. NFC
These are changes to reduce redundant computations when calculating a
feasible vectorization factor:
1. early return when target has no vector registers
2. don't compute register usage for the default VF.

Suggested during review for D37702.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313176 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-13 19:35:45 +00:00
Ayal Zaks
3c0fd8c54b [LV] Fix PR34523 - avoid generating redundant selects
When converting a PHI into a series of 'select' instructions to combine the
incoming values together according their edge masks, initialize the first
value to the incoming value In0 of the first predecessor, instead of
generating a redundant assignment 'select(Cond[0], In0, In0)'. The latter
fails when the Cond[0] mask is null, representing a full mask, which can
happen only when there's a single incoming value.

No functional changes intended nor expected other than surviving null Cond[0]'s.

This fix follows D35725, which introduced using null to represent full masks.

Differential Revision: https://reviews.llvm.org/D37619


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313119 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-13 06:28:37 +00:00
Anna Thomas
2691122302 [LV] Clamp the VF to the trip count
Summary:
When the MaxVectorSize > ConstantTripCount, we should just clamp the
vectorization factor to be the ConstantTripCount.
This vectorizes loops where the TinyTripCountThreshold >= TripCount < MaxVF.

Earlier we were finding the maximum vector width, which could be greater than
the trip count itself. The Loop vectorizer does all the work for generating a
vectorizable loop, but in the end we would always choose the scalar loop (since
the VF > trip count). This allows us to choose the VF keeping in mind the trip
count if available.

This is a fix on top of rL312472.

Reviewers: Ayal, zvi, hfinkel, dneilson

Reviewed by: Ayal

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37702

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313046 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-12 16:32:45 +00:00
Zvi Rackover
3438d07f09 LoopVectorize: MaxVF should not be larger than the loop trip count
Summary:
Improve how MaxVF is computed while taking into account that MaxVF should not be larger than the loop's trip count.

Other than saving on compile-time by pruning the possible MaxVF candidates, this patch fixes pr34438 which exposed the following flow:
1. Short trip count identified -> Don't bail out, set OptForSize:=True to avoid tail-loop and runtime checks.
2. Compute MaxVF returned 16 on a target supporting AVX512.
3. OptForSize -> choose VF:=MaxVF.
4. Bail out because TripCount = 8, VF = 16, TripCount % VF !=0 means we need a tail loop.

With this patch step 2. will choose MaxVF=8 based on TripCount.

Reviewers: Ayal, dorit, mkuper, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D37425

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312472 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-04 08:35:13 +00:00
Adrian Prantl
69e607f200 Canonicalize the representation of empty an expression in DIGlobalVariableExpression
This change simplifies code that has to deal with
DIGlobalVariableExpression and mirrors how we treat DIExpressions in
debug info intrinsics. Before this change there were two ways of
representing empty expressions on globals, a nullptr and an empty
!DIExpression().

If someone needs to upgrade out-of-tree testcases:
  perl -pi -e 's/(!DIGlobalVariableExpression\(var: ![0-9]*)\)/\1, expr: !DIExpression())/g' <MYTEST.ll>
will catch 95%.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312144 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-30 18:06:51 +00:00
Craig Topper
ccba49dfc2 [InstCombine] Teach select01 helper of foldSelectIntoOp to handle vector splats
We were handling some vectors in foldSelectIntoOp, but not if the operand of the bin op was any kind of vector constant. This patch fixes it to treat vector splats the same as scalars.

Differential Revision: https://reviews.llvm.org/D37232

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311940 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-28 22:00:27 +00:00
Ayal Zaks
a183d6cf2a [LV] Fix PR34248 - recommit D32871 after revert r311304
Original commit r311077 of D32871 was reverted in r311304 due to failures
reported in PR34248.

This recommit fixes PR34248 by restricting the packing of predicated scalars
into vectors only when vectorizing, avoiding doing so when unrolling w/o
vectorizing. Added a test derived from the reproducer of PR34248.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311849 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-27 12:55:46 +00:00
Chandler Carruth
3f7e6da696 Revert r311077: [LV] Using VPlan ...
This causes LLVM to assert fail on PPC64 and crash / infloop in other
cases. Filed http://llvm.org/PR34248 with reproducer attached.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311304 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-20 23:17:11 +00:00
Elena Demikhovsky
4a05fa1f1b Changed basic cost of store operation on X86
Store operation takes 2 UOps on X86 processors. The exact cost calculation affects several optimization passes including loop unroling.
This change compensates performance degradation caused by https://reviews.llvm.org/D34458 and shows improvements on some benchmarks.

Differential Revision: https://reviews.llvm.org/D35888



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311285 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-20 12:34:29 +00:00
Aditya Kumar
70fb4705b4 [Loop Vectorize] Added a separate metadata
Added a separate metadata to indicate when the loop
has already been vectorized instead of setting width and count to 1.

Patch written by Divya Shanmughan and Aditya Kumar

Differential Revision: https://reviews.llvm.org/D36220

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311281 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-20 10:32:41 +00:00
Sam Elliott
74a34d9193 Keep Optimization Remark Yaml in NewPM
Summary:
The New Pass Manager infrastructure was forgetting to keep around the optimization remark yaml file that the compiler might have been producing. This meant setting the option to '-' for stdout worked, but setting it to a filename didn't give file output (presumably it was deleted because compilation didn't explicitly keep it). This change just ensures that the file is kept if compilation succeeds.

So far I have updated one of the optimization remark output tests to add a version with the new pass manager. It is my intention for this patch to also include changes to all tests that use `-opt-remark-output=` but I wanted to get the code patch ready for review while I was making all those changes.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33951

Reviewers: anemet, chandlerc

Reviewed By: anemet, chandlerc

Subscribers: javed.absar, chandlerc, fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D36906

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311271 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-20 01:30:45 +00:00
Ayal Zaks
cd8f8f7fd4 [LV] Using VPlan to model the vectorized code and drive its transformation
VPlan is an ongoing effort to refactor and extend the Loop Vectorizer. This
patch introduces the VPlan model into LV and uses it to represent the vectorized
code and drive the generation of vectorized IR.

In this patch VPlan models the vectorized loop body: the vectorized control-flow
is represented using VPlan's Hierarchical CFG, with predication refactored from
being a post-vectorization-step into a vectorization planning step modeling
if-then VPRegionBlocks, and generating code inline with non-predicated code. The
vectorized code within each VPBasicBlock is represented as a sequence of
Recipes, each responsible for modelling and generating a sequence of IR
instructions. To keep the size of this commit manageable the Recipes in this
patch are coarse-grained and capture large chunks of LV's code-generation logic.
The constructed VPlans are dumped in dot format under -debug.

This commit retains current vectorizer output, except for minor instruction
reorderings; see associated modifications to lit tests.

For further details on the VPlan model see docs/Proposals/VectorizationPlan.rst
and its references.

Authors: Gil Rapaport and Ayal Zaks

Differential Revision: https://reviews.llvm.org/D32871


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311077 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-17 09:29:59 +00:00
Ayal Zaks
745921f87f [LV] Minor savings to Sink casts to unravel first order recurrence
Two minor savings: avoid copying the SinkAfter map and avoid moving a cast if it
is not needed.

Differential Revision: https://reviews.llvm.org/D36408


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310910 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-15 08:32:59 +00:00
Anna Thomas
0428548839 [LoopVectorize] Fix assertion failure in Fcmp vectorization
Summary:
When vectorizing fcmps we can trip on incorrect cast assertion when setting the
FastMathFlags after generating the vectorized FCmp.
This can happen if the FCmp can be folded to true or false directly. The fix
here is to set the FastMathFlag using the FastMathFlagBuilder *before* creating
the FCmp Instruction. This is what's done by other optimizations such as
InstCombine.
Added a test case which trips on cast assertion without this patch.

Reviewers: Ayal, mssimpso, mkuper, gilr

Reviewed by: Ayal, mssimpso

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D36244

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310389 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-08 18:07:44 +00:00
Matt Arsenault
c8df92092d LV: Don't insert runtime ptr checks on divergent targets
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309890 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-02 21:43:08 +00:00
Ayal Zaks
343f60c4b2 [LV] Avoid redundant operations manipulating masks
The Loop Vectorizer generates redundant operations when manipulating masks:
AND with true, OR with false, compare equal to true. Instead of relying on
a subsequent pass to clean them up, this patch avoids generating them.

Use null (no-mask) to represent all-one full masks, instead of a constant
all-one vector, following the convention of masked gathers and scatters.

Preparing for a follow-up VPlan patch in which these mask manipulating
operations are modeled using recipes.

Differential Revision: https://reviews.llvm.org/D35725


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309558 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-31 13:21:42 +00:00
Adrian Prantl
5d0334a48c Remove the obsolete offset parameter from @llvm.dbg.value
There is no situation where this rarely-used argument cannot be
substituted with a DIExpression and removing it allows us to simplify
the DWARF backend. Note that this patch does not yet remove any of
the newly dead code.

rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D35951

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309426 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-28 20:21:02 +00:00
Michael Zolotukhin
8641ab97a4 [tests] Cleanup vect.omp.persistence.ll test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308964 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-25 10:35:16 +00:00
Balaram Makam
788841cb66 [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure.
Summary:
When simplifying unconditional branches from empty blocks, we pre-test if the
BB belongs to a set of loop headers and keep the block to prevent passes from
destroying canonical loop structure. However, the current algorithm fails if
the destination of the branch is a loop header. Especially when such a loop's
latch block is folded into loop header it results in additional backedges and
LoopSimplify turns it into a nested loop which prevent later optimizations
from being applied (e.g., loop  unrolling and loop interleaving).

This patch augments the existing algorithm by further checking if the
destination of the branch belongs to a set of loop headers and defer
eliminating it if yes to LateSimplifyCFG.

Fixes PR33605: https://bugs.llvm.org/show_bug.cgi?id=33605

Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl

Reviewed By: efriedma

Subscribers: ashutosh.nema, gberry, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D35411

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308422 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-19 08:53:34 +00:00
Ayal Zaks
e1f7499ee7 [LV] Test once if vector trip count is zero, instead of twice
Generate a single test to decide if there are enough iterations to jump to the
vectorized loop, or else go to the scalar remainder loop. This test compares the
Scalar Trip Count: if STC < VF * UF go to the scalar loop. If
requiresScalarEpilogue() holds, at-least one iteration must remain scalar; the
rest can be used to form vector iterations. So in this case the test checks
instead if (STC - 1) < VF * UF by comparing STC <= VF * UF, and going to the
scalar loop if so. Otherwise the vector loop is entered for at-least one vector
iteration.

This test covers the case where incrementing the backedge-taken count will
overflow leading to an incorrect trip count of zero. In this (rare) case we will
also avoid the vector loop and jump to the scalar loop.

This patch simplifies the existing tests and effectively removes the basic-block
originally named "min.iters.checked", leaving the single test in block
"vector.ph".

Original observation and initial patch by Evgeny Stupachenko.

Differential Revision: https://reviews.llvm.org/D34150


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308421 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-19 05:16:39 +00:00
Dorit Nuzman
bdc92341e1 PSCEV] Create AddRec for Phis in cases of possible integer overflow,
using runtime checks

Extend the SCEVPredicateRewriter to work a bit harder when it encounters an
UnknownSCEV for a Phi node; Try to build an AddRecurrence also for Phi nodes
whose update chain involves casts that can be ignored under the proper runtime
overflow test. This is one step towards addressing PR30654.

Differential revision: http://reviews.llvm.org/D30041



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308299 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-18 11:57:08 +00:00
Michael Kuperstein
73d05a2a19 [LV] Don't allow outside uses of IVs if the SCEV is predicated on loop conditions.
This fixes PR33706.
Differential Revision: https://reviews.llvm.org/D35227


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307837 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-12 19:53:55 +00:00
Sanjay Patel
9e5be5ac4c [LoopVectorize] partly revert r307475
Bots are failing because of the additional checks.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307476 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-08 16:34:46 +00:00
Sanjay Patel
3b56d63bde [LoopVectorize] auto-generate complete checks; NFC
I'm looking at a cmp transform in InstCombine that would affect these tests,
but it's hard to know if it makes things better or worse without seeing the
full IR. OTOH, maybe these tests shouldn't be running a bunch of transform
passes in the first place?



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307475 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-08 16:10:42 +00:00
NAKAMURA Takumi
c49c5b222e llvm/test/Transforms/LoopVectorize/X86/slm-no-vectorize.ll: -debug is available in +Asserts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306979 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-02 14:25:27 +00:00
Mohammed Agabaria
ffb8f09571 [X86][CM] update add\sub costs of vectors of 64 in X86\SLM arch
this patch updates the cost of addq\subq (add\subtract of vectors of 64bits)
based on the performance numbers of SLM arch.

Differential Revision: https://reviews.llvm.org/D33983



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306974 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-02 12:16:15 +00:00
Teresa Johnson
9d923b35aa Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default."
This still breaks PPC tests we have. I'll forward reproduction
instructions to dehao.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306936 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-01 03:24:09 +00:00