539 Commits

Author SHA1 Message Date
Matthew Simpson
54a309e4ea [LV] Rename RdxPHIsToFix to PHIsToFix (NFC)
In the future, we will vectorize recurrences other than reductions. This patch
renames a few variables and updates their associated comments to enable them to
be reused for non-reduction PHI nodes.

This change was requested in the review for D16197.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259364 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-01 16:07:01 +00:00
Matthew Simpson
cfa9b54c86 [LV] Avoid creating empty reduction entries (NFC)
This patch prevents us from unintentionally creating entries in the reductions
map for PHIs that are not actually reductions. This is currently not an issue
since we bail out if we encounter PHIs other than inductions or reductions.
However the behavior could become problematic as we add support for additional
recurrence types.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256930 91177308-0d34-0410-b5e6-96231b3b80d8
2016-01-06 12:50:29 +00:00
Sanjoy Das
4b892417a6 [SCEV] Add and use SCEVConstant::getAPInt; NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255921 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-17 20:28:46 +00:00
Cong Hou
e956465289 [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
(This is the third attempt to check in this patch, and the first two are r255454
and r255460. The once failed test file reg-usage.ll is now moved to
test/Transform/LoopVectorize/X86 directory with target datalayout and target
triple indicated.)

LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255691 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-15 22:45:09 +00:00
Cong Hou
dbef3b079d Revert r255460, which still causes test failures on some platforms.
Further investigation on the failures is ongoing.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255463 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-13 17:15:38 +00:00
Cong Hou
f26946fa52 [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
(This is the second attempt to check in this patch: REQUIRES: asserts is added
to reg-usage.ll now.)

LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255460 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-13 16:55:46 +00:00
Cong Hou
6f344e5da6 Revert r255454 as it leads to several test failers on buildbots.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255456 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-13 09:28:57 +00:00
Cong Hou
c731de4630 [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255454 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-13 08:44:08 +00:00
Silviu Baranga
90f6cd579a Re-commit r255115, with the PredicatedScalarEvolution class moved to
ScalarEvolution.h, in order to avoid cyclic dependencies between the Transform
and Analysis modules:

[LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions

Summary:
This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the
usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge
by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is
that both LAA and LV should use this interface everywhere.

This also solves a problem involving the result of SCEV expression rewritting when
the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates
  P1: {a,+,b} has nsw
  P2: b = 1.

Applying P1 and then P2 gives us {a,+,1}, while applying P2 and the P1 gives us
sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies).
The SCEVPredicatedLayer maintains the order of transformations by feeding back
the results of previous transformations into new transformations, and therefore
avoiding this issue.

The SCEVPredicatedLayer maintains a cache to remember the results of previous
SCEV rewritting results. This also has the benefit of reducing the overall number
of expression rewrites.

Reviewers: mzolotukhin, anemet

Subscribers: jmolloy, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D14296



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255122 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-09 16:06:28 +00:00
Silviu Baranga
bdd73bcbd7 Revert r255115 until we figure out how to fix the bot failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255117 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-09 15:25:28 +00:00
Silviu Baranga
69c30d5b6c [LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions
Summary:
This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the
usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge
by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is
that both LAA and LV should use this interface everywhere.

This also solves a problem involving the result of SCEV expression rewritting when
the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates
  P1: {a,+,b} has nsw
  P2: b = 1.

Applying P1 and then P2 gives us {a,+,1}, while applying P2 and the P1 gives us
sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies).
The SCEVPredicatedLayer maintains the order of transformations by feeding back
the results of previous transformations into new transformations, and therefore
avoiding this issue.

The SCEVPredicatedLayer maintains a cache to remember the results of previous
SCEV rewritting results. This also has the benefit of reducing the overall number
of expression rewrites.

Reviewers: mzolotukhin, anemet

Subscribers: jmolloy, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D14296

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255115 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-09 15:03:52 +00:00
Cong Hou
c5cf58b8a7 Fix a typo in LoopVectorize.cpp. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254813 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-05 01:00:22 +00:00
Cong Hou
aaaedd7f8f Fix a typo in LoopVectorize.cpp. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254549 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-02 21:33:47 +00:00
Charlie Turner
c8dc70b584 [LoopVectorize] Use MapVector rather than DenseMap for MinBWs.
The order in which instructions are truncated in truncateToMinimalBitwidths
effects code generation. Switch to a map with a determinisic order, since the
iteration order over a DenseMap is not defined.

This code is not hot, so the difference in container performance isn't
interesting.

Many thanks to David Blaikie for making me aware of MapVector!

Fixes PR25490.

Differential Revision: http://reviews.llvm.org/D14981



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254179 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-26 20:39:51 +00:00
Chad Rosier
33e3b5e479 [LV] Add a helper function, isReductionVariable. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253565 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-19 14:19:06 +00:00
Cong Hou
ce103d4605 Fix several long lines (>80) in LoopVectorize.cpp. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253527 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-19 00:32:30 +00:00
Chad Rosier
08e6ab9d25 Typo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253336 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-17 13:58:10 +00:00
James Molloy
ae263d48b0 [LoopVectorize] Address post-commit feedback on r250032
Implemented as many of Michael's suggestions as were possible:
  * clang-format the added code while it is still fresh.
  * tried to change Value* to Instruction* in many places in computeMinimumValueSizes - unfortunately there are several places where Constants need to be handled so this wasn't possible.
  * Reduce the pass list on loop-vectorization-factors.ll.
  * Fix a bug where we were querying MinBWs for I->getOperand(0) but using MinBWs[I].

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252469 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-09 14:32:05 +00:00
Elena Demikhovsky
2c4a333422 LoopVectorizer - skip 'bitcast' between GEP and load.
Skipping 'bitcast' in this case allows to vectorize load:

  %arrayidx = getelementptr inbounds double*, double** %in, i64 %indvars.iv
  %tmp53 = bitcast double** %arrayidx to i64*
  %tmp54 = load i64, i64* %tmp53, align 8

Differential Revision http://reviews.llvm.org/D14112



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251907 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-03 10:29:34 +00:00
Cong Hou
c895fd0d67 Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using larger vectorization factor.
To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers.

This is the second attempt to submit this patch. The first attempt got a test failure on ARM. This patch is updated to try to fix the failure (more specifically, by handling the case when VF=1).

Differential revision: http://reviews.llvm.org/D8943



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251850 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-02 22:53:48 +00:00
Silviu Baranga
a0b73c263e [SCEV][LV] Add SCEV Predicates and use them to re-implement stride versioning
Summary:
SCEV Predicates represent conditions that typically cannot be derived from
static analysis, but can be used to reduce SCEV expressions to forms which are
usable for different optimizers.

ScalarEvolution now has the rewriteUsingPredicate method which can simplify a
SCEV expression using a SCEVPredicateSet. The normal workflow of a pass using
SCEVPredicates would be to hold a SCEVPredicateSet and every time assumptions
need to be made a new SCEV Predicate would be created and added to the set.
Each time after calling getSCEV, the user will call the rewriteUsingPredicate
method.

We add two types of predicates
SCEVPredicateSet - implements a set of predicates
SCEVEqualPredicate - tests for equality between two SCEV expressions

We use the SCEVEqualPredicate to re-implement stride versioning. Every time we
version a stride, we will add a SCEVEqualPredicate to the context.
Instead of adding specific stride checks, LoopVectorize now adds a more
generic SCEV check.

We only need to add support for this in the LoopVectorizer since this is the
only pass that will do stride versioning.

Reviewers: mzolotukhin, anemet, hfinkel, sanjoy

Subscribers: sanjoy, hfinkel, rengolin, jmolloy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13595

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251800 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-02 14:41:02 +00:00
Cong Hou
e29e7e235a Revert the revision 251592 as it fails a test on some platforms.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251617 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-29 05:35:22 +00:00
Cong Hou
72daf62570 Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using larger vectorization factor.
To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251592 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-29 01:28:44 +00:00
NAKAMURA Takumi
c287d4cc99 Whitespace.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251437 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-27 19:02:52 +00:00
NAKAMURA Takumi
50aba1e345 Revert r251291, "Loop Vectorizer - skipping "bitcast" before GEP"
It causes miscompilation of llvm/lib/ExecutionEngine/Interpreter/Execution.cpp.
See also PR25324.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251436 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-27 19:02:36 +00:00
Elena Demikhovsky
df91a787c0 Loop Vectorizer - skipping "bitcast" before GEP
Vectorization of memory instruction (Load/Store) is possible when the pointer is coming from GEP. The GEP analysis allows to estimate the profit.
In some cases we have a "bitcast" between GEP and memory instruction.
I added code that skips the "bitcast".

http://reviews.llvm.org/D13886



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251291 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-26 13:42:41 +00:00
Michael Zolotukhin
df43fcd565 Refactor: Simplify boolean conditional return statements in lib/Transforms/Vectorize (NFC).
Summary: Use clang-tidy to simplify boolean conditional return statements

Differential Revision: http://reviews.llvm.org/D10003

Patch by Richard<legalize@xmission.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251206 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-24 20:16:42 +00:00
Duncan P. N. Exon Smith
f792618d1c Vectorize: Remove implicit ilist iterator conversions, NFC
Besides the usual, I finally added an overload to
`BasicBlock::splitBasicBlock()` that accepts an `Instruction*` instead
of `BasicBlock::iterator`.  Someone can go back and remove this overload
later (after updating the callers I'm going to skip going forward), but
the most common call seems to be
`BB->splitBasicBlock(BB->getTerminator(), ...)` and I'm not sure it's
better to add `->getIterator()` to every one than have the overload.
It's pretty hard to get the usage wrong.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250745 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-19 22:06:09 +00:00
Elena Demikhovsky
da40167c02 Removed parameter "Consecutive" from isLegalMaskedLoad() / isLegalMaskedStore().
Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case.

Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces.

Differential Revision: http://reviews.llvm.org/D13850



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250686 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-19 07:43:38 +00:00
James Molloy
7dab7edf06 [LoopVectorize] Shrink integer operations into the smallest type possible
C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int
type (e.g. i32) whenever arithmetic is performed on them.

For targets with native i8 or i16 operations, usually InstCombine can shrink
the arithmetic type down again. However InstCombine refuses to create illegal
types, so for targets without i8 or i16 registers, the lengthening and
shrinking remains.

Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when
their scalar equivalents do not, so during vectorization it is important to
remove these lengthens and truncates when deciding the profitability of
vectorization.

The algorithm this uses starts at truncs and icmps, trawling their use-def
chains until they terminate or instructions outside the loop are found (or
unsafe instructions like inttoptr casts are found). If the use-def chains
starting from different root instructions (truncs/icmps) meet, they are
unioned. The demanded bits of each node in the graph are ORed together to form
an overall mask of the demanded bits in the entire graph. The minimum bitwidth
that graph can be truncated to is the bitwidth minus the number of leading
zeroes in the overall mask.

The intention is that this algorithm should "first do no harm", so it will
never insert extra cast instructions. This is why the use-def graphs are
unioned, so that subgraphs with different minimum bitwidths do not need casts
inserted between them.

This algorithm works hard to reduce compile time impact. DemandedBits are only
queried if there are extends of illegal types and if a truncate to an illegal
type is seen. In the general case, this results in a simple linear scan of the
instructions in the loop.

No non-noise compile time impact was seen on a clang bootstrap build.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250032 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-12 12:34:45 +00:00
Sanjoy Das
6d6e2b5a35 [SCEV] Introduce ScalarEvolution::getOne and getZero.
Summary:
It is fairly common to call SE->getConstant(Ty, 0) or
SE->getConstant(Ty, 1); this change makes such uses a little bit
briefer.

I've refactored the call sites I could find easily to use getZero /
getOne.

Reviewers: hfinkel, majnemer, reames

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12947

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248362 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-23 01:59:04 +00:00
James Molloy
f30ebaced7 [LoopUtils,LV] Propagate fast-math flags on generated FCmp instructions
We're currently losing any fast-math flags when synthesizing fcmps for
min/max reductions. In LV, make sure we copy over the scalar inst's
flags. In LoopUtils, we know we only ever match patterns with
hasUnsafeAlgebra, so apply that to any synthesized ops.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248201 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-21 19:41:19 +00:00
Chandler Carruth
9146833fa3 [PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible
with the new pass manager, and no longer relying on analysis groups.

This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:

- FunctionAAResults is a type-erasing alias analysis results aggregation
  interface to walk a single query across a range of results from
  different alias analyses. Currently this is function-specific as we
  always assume that aliasing queries are *within* a function.

- AAResultBase is a CRTP utility providing stub implementations of
  various parts of the alias analysis result concept, notably in several
  cases in terms of other more general parts of the interface. This can
  be used to implement only a narrow part of the interface rather than
  the entire interface. This isn't really ideal, this logic should be
  hoisted into FunctionAAResults as currently it will cause
  a significant amount of redundant work, but it faithfully models the
  behavior of the prior infrastructure.

- All the alias analysis passes are ported to be wrapper passes for the
  legacy PM and new-style analysis passes for the new PM with a shared
  result object. In some cases (most notably CFL), this is an extremely
  naive approach that we should revisit when we can specialize for the
  new pass manager.

- BasicAA has been restructured to reflect that it is much more
  fundamentally a function analysis because it uses dominator trees and
  loop info that need to be constructed for each function.

All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.

The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.

This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.

Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.

One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.

Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.

Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.

Differential Revision: http://reviews.llvm.org/D12080

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247167 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-09 17:55:00 +00:00
James Molloy
2993ffe264 Rename ExitCount to BackedgeTakenCount, because that's what it is.
We called a variable ExitCount, stored the backedge count in it, then redefined it to be the exit count again.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247140 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-09 12:51:10 +00:00
James Molloy
22168a9afd Delay predication of stores until near the end of vector code generation
Predicating stores requires creating extra blocks. It's much cleaner if we do this in one pass instead of mutating the CFG while writing vector instructions.

Besides which we can make use of helper functions to update domtree for us, reducing the work we need to do.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247139 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-09 12:51:06 +00:00
James Molloy
6ce50a6980 [LV] Don't bail to MiddleBlock if a runtime check fails, bail to ScalarPH instead
We were bailing to two places if our runtime checks failed. If the initial overflow check failed, we'd go to ScalarPH. If any other check failed, we'd go to MiddleBlock. This caused us to have to have an extra PHI per induction and reduction as the vector loop's exit block was not dominated by its latch.

There's no need to have this behavior - if we just always go to ScalarPH we can get rid of a bunch of complexity.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246637 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 10:15:39 +00:00
James Molloy
e2568a81dc [LV] Move some code around slightly to make the intent of the function more clear.
NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246636 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 10:15:32 +00:00
James Molloy
4718dafb69 [LV] Cleanup: Sink an IRBuilder closer to its uses.
NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246635 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 10:15:27 +00:00
James Molloy
2b7433d981 [LV] Refactor all runtime check emissions into helper functions.
This reduces the complexity of createEmptyBlock() and will open the door to further refactoring.

The test change is simply because we're now constant folding a trivial test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246634 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 10:15:22 +00:00
James Molloy
f6d9948d5a [LV] Pull creation of trip counts into a helper function.
... and do a tad of tidyup while we're at it. Because StartIdx must now be zero, there's no difference between Count and EndIdx.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246633 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 10:15:16 +00:00
James Molloy
cf062a0a18 [LV] Factor the creation of the loop induction variable out of createEmptyLoop()
It makes things easier to understand if this is in a helper method. This is part of my ongoing spaghetti-removal operation on createEmptyLoop.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246632 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 10:15:09 +00:00
James Molloy
fe89784bf1 [LV] Never widen an induction variable.
There's no need to widen canonical induction variables. It's just as efficient to create a *new*, wide, induction variable.

Consider, if we widen an indvar, then we'll have to truncate it before its uses anyway (1 trunc). If we create a new indvar instead, we'll have to truncate that instead (1 trunc) [besides which IndVars should go and clean up our mess after us anyway on principle].

This lets us remove a ton of special-casing code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246631 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 10:15:05 +00:00
James Molloy
390bcc0ac8 [LV] Switch to using canonical induction variables.
Vectorized loops only ever have one induction variable. All induction PHIs from the scalar loop are rewritten to be in terms of this single indvar.

We were trying very hard to pick an indvar that already existed, even if that indvar wasn't canonical (didn't start at zero). But trying so hard is really fruitless - creating a new, canonical, indvar only results in one extra add in the worst case and that add is trivially easy to push through the PHI out of the loop by instcombine.

If we try and be less clever here and instead let instcombine clean up our mess (as we do in many other places in LV), we can remove unneeded complexity.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246630 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 10:14:54 +00:00
Tyler Nowicki
5e59aab216 Improve vectorization diagnostic messages and extend vectorize(enable) pragma.
This patch changes the analysis diagnostics produced when loops with
floating-point recurrences or memory operations are identified. The new messages 
say "cannot prove it is safe to reorder * operations; allow reordering by
specifying #pragma clang loop vectorize(enable)". Depending on the type of 
diagnostic the message will include additional options such as ffast-math or
__restrict__.

This patch also allows the vectorize(enable) pragma to override the low pointer
memory check threshold. When the hint is given a higher threshold is used.

See the clang patch for the options produced for each diagnostic.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246187 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-27 18:56:49 +00:00
Chad Rosier
ed15c79fb6 [LoopVectorize] Add Support for Small Size Reductions.
Unlike scalar operations, we can perform vector operations on element types that
are smaller than the native integer types. We type-promote scalar operations if
they are smaller than a native type (e.g., i8 arithmetic is promoted to i32
arithmetic on Arm targets). This patch detects and removes type-promotions
within the reduction detection framework, enabling the vectorization of small
size reductions.

In the legality phase, we look through the ANDs and extensions that InstCombine
creates during promotion, keeping track of the smaller type. In the
profitability phase, we use the smaller type and ignore the ANDs and extensions
in the cost model. Finally, in the code generation phase, we truncate the result
of the reduction to allow InstCombine to rewrite the entire expression in the
smaller type.

This fixes PR21369.
http://reviews.llvm.org/D12202

Patch by Matt Simpson <mssimpso@codeaurora.org>!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246149 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-27 14:12:17 +00:00
James Molloy
18671a3a8f [LoopVectorize] Extract InductionInfo into a helper class...
... and move it into LoopUtils where it can be used by other passes, just like ReductionDescriptor. The API is very similar to ReductionDescriptor - that is, not very nice at all. Sorting these both out will come in a followup.

NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246145 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-27 09:53:00 +00:00
Tyler Nowicki
8faf2a455a Improved printing of analysis diagnostics in the loop vectorizer.
This patch ensures that every analysis diagnostic produced by the vectorizer
will be printed if the loop has a vectorization hint on it. The condition has
also been improved to prevent printing when a disabling hint is specified.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246132 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-27 01:02:04 +00:00
Wei Mi
e922c3ebfc The patch replace the overflow check in loop vectorization with the minimum loop iterations check.
The loop minimum iterations check below ensures the loop has enough trip count so the generated
vector loop will likely be executed, and it covers the overflow check.

Differential Revision: http://reviews.llvm.org/D12107.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245952 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-25 16:43:47 +00:00
Tyler Nowicki
23d9019965 Standardized 'failed' to 'Failed' in LoopVectorizationRequirements.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245759 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-21 23:03:24 +00:00
Michael Zolotukhin
c23d147533 [LoopVectorize] Propagate 'nontemporal' attribute into vectorized instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245632 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-20 22:27:38 +00:00