Summary:
[VPlan] Implement vector code generation support for simple outer loops.
Context: Patch Series #1 for outer loop vectorization support in LV using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
This patch introduces vector code generation support for simple outer loops that are currently supported in the VPlanNativePath. Changes here essentially do the following:
- force vector code generation using explicit vectorize_width
- add conservative early returns in cost model and other places for VPlanNativePath
- add code for setting up outer loop inductions
- support for widening non-induction PHIs that can result from inner loops and uniform conditional branches
- support for generating uniform inner branches
We plan to add a handful C outer loop executable tests once the initial code generation support is committed. This patch is expected to be NFC for the inner loop vectorizer path. Since we are moving in the direction of supporting outer loop vectorization in LV, it may also be time to rename classes such as InnerLoopVectorizer.
Reviewers: fhahn, rengolin, hsaito, dcaballe, mkuper, hfinkel, Ayal
Reviewed By: fhahn, hsaito
Subscribers: dmgreen, bollu, tschuett, rkruppe, rogfer01, llvm-commits
Differential Revision: https://reviews.llvm.org/D50820
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342197 91177308-0d34-0410-b5e6-96231b3b80d8
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331272 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is a follow up to D45420 (included here since it is still under review and this change is dependent on that) and D45072 (committed).
Actual change for this patch is LoopVectorize* and cmakefile. All others are all from D45420.
LoopVectorizationLegality is an analysis and thus really belongs to Analysis tree. It is modular enough and it is reusable enough ---- we can further improve those aspects once uses outside of LV picks up.
Hopefully, this will make it easier for people familiar with vectorization theory, but not necessarily LV itself to contribute, by lowering the volume of code they should deal with. We probably should start adding some code in LV to check its own capability (i.e., vectorization is legal but LV is not ready to handle it) and then bail out.
Reviewers: rengolin, fhahn, hfinkel, mkuper, aemerson, mssimpso, dcaballe, sguggill
Reviewed By: rengolin, dcaballe
Subscribers: egarcia, rogfer01, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D45552
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331139 91177308-0d34-0410-b5e6-96231b3b80d8
Patch #2 from VPlan Outer Loop Vectorization Patch Series #1
(RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
This patch introduces the basic infrastructure to detect, legality check
and process outer loops annotated with hints for explicit vectorization.
All these changes are protected under the feature flag
-enable-vplan-native-path. This should make this patch NFC for the existing
inner loop vectorizer.
Reviewers: hfinkel, mkuper, rengolin, fhahn, aemerson, mssimpso.
Differential Revision: https://reviews.llvm.org/D42447
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330739 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
For better vectorization result we should take into consideration the
cost of the user insertelement instructions when we try to
vectorize sequences that build the whole vector. I.e. if we have the
following scalar code:
```
<Scalar code>
insertelement <ScalarCode>, ...
```
we should consider the cost of the last `insertelement ` instructions as
the cost of the scalar code.
Reviewers: RKSimon, spatel, hfinkel, mkuper
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D42657
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@324893 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Fixes the bug with incorrect handling of InsertValue|InsertElement
instrucions in SLP vectorizer. Currently, we may use incorrect
ExtractElement instructions as the operands of the original
InsertValue|InsertElement instructions.
Reviewers: mkuper, hfinkel, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41767
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321994 91177308-0d34-0410-b5e6-96231b3b80d8
In SLPVectorizer, the vector build instructions (insertvalue for aggregate type) is passed to BoUpSLP.buildTree, it is treated as UserIgnoreList, so later in cost estimation, the cost of these instructions are not counted.
For aggregate value, later usage are more likely to be done in scalar registers, either used as individual scalars or used as a whole for function call or return value. Ignore scalar extraction instructions may cause too aggressive vectorization for aggregate values, and slow down performance. So for vectorization of aggregate value, the scalar extraction instructions are required in cost estimation.
Differential Revision: https://reviews.llvm.org/D41139
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320736 91177308-0d34-0410-b5e6-96231b3b80d8
Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does:
1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the
array. This array is processed only after the vectorization of the
first-after-these instructions key node is finished. Vectorization goes
in reverse order to try to vectorize as much code as possible.
Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon
Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits
Differential Revision: https://reviews.llvm.org/D29826
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310260 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does:
1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible.
Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon
Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits
Differential Revision: https://reviews.llvm.org/D29826
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310255 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Existing heuristic uses the ratio between the function entry
frequency and the loop invocation frequency to find cold loops. However,
even if the loop executes frequently, if it has a small trip count per
each invocation, vectorization is not beneficial. On the other hand,
even if the loop invocation frequency is much smaller than the function
invocation frequency, if the trip count is high it is still beneficial
to vectorize the loop.
This patch uses estimated trip count computed from the profile metadata
as a primary metric to determine coldness of the loop. If the estimated
trip count cannot be computed, it falls back to the original heuristics.
Reviewers: Ayal, mssimpso, mkuper, danielcdh, wmi, tejohnson
Reviewed By: tejohnson
Subscribers: tejohnson, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D32451
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305729 91177308-0d34-0410-b5e6-96231b3b80d8
The approach I followed was to emit the remark after getTreeCost concludes
that SLP is profitable. I initially tried emitting them after the
vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely
remember missing a few cases for example in HorizontalReduction::tryToReduce.
ORE is placed in BoUpSLP so that it's available from everywhere (notably
HorizontalReduction::tryToReduce).
We use the first instruction in the root bundle as the locator for the remark.
In order to get a sense how far the tree is spanning I've include the size of
the tree in the remark. This is not perfect of course but it gives you at
least a rough idea about the tree. Then you can follow up with -view-slp-tree
to really see the actual tree.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302811 91177308-0d34-0410-b5e6-96231b3b80d8
Commits were:
"Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts"
"Add a new WeakVH value handle; NFC"
"Rename WeakVH to WeakTrackingVH; NFC"
The changes assumed pointers are 8 byte aligned on all architectures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301429 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
I plan to use WeakVH to mean "nulls itself out on deletion, but does
not track RAUW" in a subsequent commit.
Reviewers: dblaikie, davide
Reviewed By: davide
Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D32266
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301424 91177308-0d34-0410-b5e6-96231b3b80d8
the latter to the Transforms library.
While the loop PM uses an analysis to form the IR units, the current
plan is to have the PM itself establish and enforce both loop simplified
form and LCSSA. This would be a layering violation in the analysis
library.
Fundamentally, the idea behind the loop PM is to *transform* loops in
addition to running passes over them, so it really seemed like the most
natural place to sink this was into the transforms library.
We can't just move *everything* because we also have loop analyses that
rely on a subset of the invariants. So this patch splits the the loop
infrastructure into the analysis management that has to be part of the
analysis library, and the transform-aware pass manager.
This also required splitting the loop analyses' printer passes out to
the transforms library, which makes sense to me as running these will
transform the code into LCSSA in theory.
I haven't split the unittest though because testing one component
without the other seems nearly intractable.
Differential Revision: https://reviews.llvm.org/D28452
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291662 91177308-0d34-0410-b5e6-96231b3b80d8
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289756 91177308-0d34-0410-b5e6-96231b3b80d8
Currently SLP vectorizer tries to vectorize a binary operation and dies
immediately after unsuccessful the first unsuccessfull attempt. Patch
tries to improve the situation, trying to vectorize all binary
operations of all children nodes in the binop tree.
Differential Revision: https://reviews.llvm.org/D25517
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288115 91177308-0d34-0410-b5e6-96231b3b80d8