1302 Commits

Author SHA1 Message Date
Taewook Oh
9f93c9df69 Improve profile-guided heuristics to use estimated trip count.
Summary:
Existing heuristic uses the ratio between the function entry
frequency and the loop invocation frequency to find cold loops. However,
even if the loop executes frequently, if it has a small trip count per
each invocation, vectorization is not beneficial. On the other hand,
even if the loop invocation frequency is much smaller than the function
invocation frequency, if the trip count is high it is still beneficial
to vectorize the loop.

This patch uses estimated trip count computed from the profile metadata
as a primary metric to determine coldness of the loop. If the estimated
trip count cannot be computed, it falls back to the original heuristics.

Reviewers: Ayal, mssimpso, mkuper, danielcdh, wmi, tejohnson

Reviewed By: tejohnson

Subscribers: tejohnson, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32451

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305729 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 18:48:58 +00:00
Dinar Temirbulatov
8bcd7ee921 Remove brackets, NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305706 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 16:44:07 +00:00
George Burgess IV
9276050d30 [LoopVectorize] Don't preserve nsw/nuw flags on shrunken ops.
If we're shrinking a binary operation, it may be the case that the new
operations wraps where the old didn't. If this happens, the behavior
should be well-defined. So, we can't always carry wrapping flags with us
when we shrink operations.

If we do, we get incorrect optimizations in cases like:

void foo(const unsigned char *from, unsigned char *to, int n) {
  for (int i = 0; i < n; i++)
    to[i] = from[i] - 128;
}

which gets optimized to:

void foo(const unsigned char *from, unsigned char *to, int n) {
  for (int i = 0; i < n; i++)
    to[i] = from[i] | 128;
}

Because:
- InstCombine turned `sub i32 %from.i, 128` into
  `add nuw nsw i32 %from.i, 128`.
- LoopVectorize vectorized the add to be `add nuw nsw <16 x i8>` with a
  vector full of `i8 128`s
- InstCombine took advantage of the fact that the newly-shrunken add
  "couldn't wrap", and changed the `add` to an `or`.

InstCombine seems happy to figure out whether we can add nuw/nsw on its
own, so I just decided to drop the flags. There are already a number of
places in LoopVectorize where we rely on InstCombine to clean up.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305053 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-09 03:56:15 +00:00
Chandler Carruth
e3e43d9d57 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304787 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-06 11:49:48 +00:00
Ayal Zaks
2cfe765f46 [LV] Make scalarizeInstruction() non-virtual. NFC.
Following the request made in https://reviews.llvm.org/D32871,
scalarizeInstruction() which is no longer overridden by InnerLoopUnroller is
hereby made non-virtual in InnerLoopVectorizer.

Should have been part of r297580 originally.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304685 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-04 13:29:51 +00:00
Galina Kistanova
143302b9f0 Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304636 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-03 05:18:46 +00:00
Alexey Bataev
c20deb63f8 [SLP] Improve comments and naming of functions/variables/members, NFC.
Fixed some comments, added an additional description of the algorithms,
improved readability of the code.

Differential revision: https://reviews.llvm.org/D33320

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304616 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-03 00:08:21 +00:00
Alexey Bataev
cb453a0d29 Revert "[SLP] Improve comments and naming of functions/variables/members, NFC."
This reverts commit 6e311de8b907aa20da9a1a13ab07c3ce2ef4068a.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304609 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-02 23:09:15 +00:00
Alexey Bataev
37aaa827f4 [SLP] Improve comments and naming of functions/variables/members, NFC.
Summary:
Fixed some comments, added an additional description of the algorithms,
improved readability of the code.

Reviewers: anemet

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33320

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304593 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-02 20:39:27 +00:00
Matthew Simpson
ed4243c350 [LV] Reapply r303763 with fix for PR33193
r303763 caused build failures in some out-of-tree tests due to an assertion in
TTI. The original patch updated cost estimates for induction variable update
instructions marked for scalarization. However, it didn't consider that the
incoming value of an induction variable phi node could be a cast instruction.
This caused queries for cast instruction costs with a mix of vector and scalar
types. This patch includes a fix for cast instructions and the test case from
PR33193.

The fix was suggested by Jonas Paulsson <paulsson@linux.vnet.ibm.com>.

Reference: https://bugs.llvm.org/show_bug.cgi?id=33193
Original Differential Revision: https://reviews.llvm.org/D33457

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304235 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-30 19:55:57 +00:00
Joerg Sonnenberger
ef8c4cd636 Revert r303763, results in asserts i.e. while building Ruby.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304179 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-29 22:52:17 +00:00
Matthew Simpson
9e8c6339d7 [LV] Update type in cost model for scalarization
For non-uniform instructions marked for scalarization, we should update
`VectorTy` when computing instruction costs to reflect the scalar type. In
addition to determining instruction costs, this type is also used to signal
that all instructions in the loop will be scalarized. This currently affects
memory instructions and non-pointer induction variables and their updates. (We
also mark GEPs scalar after vectorization, but their cost is computed together
with memory instructions.) For scalarized induction updates, this patch also
scales the scalar cost by the vectorization factor, corresponding to each
induction step.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303763 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-24 15:26:15 +00:00
Jonas Paulsson
a551a28baa [LoopVectorizer] Let target prefer scalar addressing computations.
The loop vectorizer usually vectorizes any instruction it can and then
extracts the elements for a scalarized use. On SystemZ, all elements
containing addresses must be extracted into address registers (GRs). Since
this extraction is not free, it is better to have the address in a suitable
register to begin with. By forcing address arithmetic instructions and loads
of addresses to be scalar after vectorization, two benefits result:

* No need to extract the register
* LSR optimizations trigger (LSR isn't handling vector addresses currently)

Benchmarking show improvements on SystemZ with this new behaviour.

Any other target could try this by returning false in the new hook
prefersVectorizedAddressing().

Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand
https://reviews.llvm.org/D32422

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303744 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-24 13:42:56 +00:00
Ayal Zaks
f93293ef42 [LV] Report multiple reasons for not vectorizing under allowExtraAnalysis
The default behavior of -Rpass-analysis=loop-vectorizer is to report only the
first reason encountered for not vectorizing, if one is found, at which time the
vectorizer aborts its handling of the loop. This patch allows multiple reasons
for not vectorizing to be identified and reported, at the potential expense of
additional compile-time, under allowExtraAnalysis which can currently be turned
on by Clang's -fsave-optimization-record and opt's -pass-remarks-missed.

Removed from LoopVectorizationLegality::canVectorize() the redundant checking
and reporting if we CantComputeNumberOfIterations, as LAI::canAnalyzeLoop() also
does that. This redundancy is caught by a lit test once multiple reasons are
reported.

Patch initially developed by Dror Barak.

Differential Revision: https://reviews.llvm.org/D33396


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303613 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-23 07:08:02 +00:00
Amara Emerson
fc4cf8d86e Fix vector pass-through value being unused in IRBuilder::CreateMaskedGather
Also s/0/nullptr in the call site in LV.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303416 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-19 10:40:18 +00:00
Reid Kleckner
816047d44c [IR] De-virtualize ~Value to save a vptr
Summary:
Implements PR889

Removing the virtual table pointer from Value saves 1% of RSS when doing
LTO of llc on Linux. The impact on time was positive, but too noisy to
conclusively say that performance improved. Here is a link to the
spreadsheet with the original data:

https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing

This change makes it invalid to directly delete a Value, User, or
Instruction pointer. Instead, such code can be rewritten to a null check
and a call Value::deleteValue(). Value objects tend to have their
lifetimes managed through iplist, so for the most part, this isn't a big
deal.  However, there are some places where LLVM deletes values, and
those places had to be migrated to deleteValue.  I have also created
llvm::unique_value, which has a custom deleter, so it can be used in
place of std::unique_ptr<Value>.

I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which
derives from User outside of lib/IR. Code in IR cannot include MemorySSA
headers or call the MemoryAccess object destructors without introducing
a circular dependency, so we need some level of indirection.
Unfortunately, no class derived from User may have any virtual methods,
because adding a virtual method would break User::getHungOffOperands(),
which assumes that it can find the use list immediately prior to the
User object. I've added a static_assert to the appropriate OperandTraits
templates to help people avoid this trap.

Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv

Reviewed By: chandlerc

Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D31261

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303362 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-18 17:24:10 +00:00
Matthew Simpson
61c3d36e2e Revert 303174, 303176, and 303178
These commits are breaking the bots. Reverting to investigate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303182 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-16 15:50:30 +00:00
Matthew Simpson
d9ed77b3ad [LV] Avoid potentential division by zero when selecting IC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303174 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-16 14:43:55 +00:00
Adam Nemet
2efa5091b6 [SLP] Enable 64-bit wide vectorization on AArch64
ARM Neon has native support for half-sized vector registers (64 bits).  This
is beneficial for example for 2D and 3D graphics.  This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.

*** Performance Analysis

This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.

The results are with -O3 and PGO.  A negative percentage is an improvement.
The testsuite was run with a sample size of 4.

** SPEC

* CFP2006/482.sphinx3  -3.34%

A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.

* CFP2000/177.mesa     -3.34%
* CINT2000/256.bzip2   +6.97%

My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%.  There are also other problems with the codegen in
this loop so there is further room for improvement.

** LLVM testsuite

* SingleSource/Benchmarks/Misc/ReedSolomon               -10.75%

There are multiple small SLP vectorizations outside the hot code.  It's a bit
surprising that it adds up to 10%.  Some of this may be code-layout noise.

* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%

The opt-viewer screenshot can be seen at F3218284.  We start at a colder store
but the tree leads us into the hottest loop.

* MultiSource/Applications/lambda-0.1.3/lambda            -2.68%
* MultiSource/Benchmarks/Bullet/bullet                    -2.18%

This is using 3D vectors.

* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%

Noise, binary is unchanged.

* MultiSource/Benchmarks/Ptrdist/anagram/anagram          +4.90%

There is an additional SLP in the cold code.  The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.

* MultiSource/Applications/aha/aha                        +1.63%
* MultiSource/Applications/JM/lencod/lencod               +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark         +1.15%

Differential Revision: https://reviews.llvm.org/D31965

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303116 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 21:15:01 +00:00
Craig Topper
32a237d8c0 [ValueTracking] Replace all uses of ComputeSignBit with computeKnownBits.
This patch finishes off the conversion of ComputeSignBit to computeKnownBits.

Differential Revision: https://reviews.llvm.org/D33166

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303035 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 06:39:41 +00:00
Simon Pilgrim
07ac640d6c [LoopOptimizer][Fix]PR32859, PR24738
The Loop vectorizer pass introduced undef value while it is fixing output of LCSSA form.
Here it is:

before: %e.0.ph = phi i32 [ 0, %for.inc.2.i ]
after: %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ undef, %middle.block ]

and after this change we have:

%e.0.ph = phi i32 [ 0, %for.inc.2.i ]
%e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ 0, %middle.block ]

Committed on behalf of @dtemirbulatov

Differential Revision: https://reviews.llvm.org/D33055

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302988 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-13 13:25:57 +00:00
Craig Topper
d49344495d [KnownBits] Add bit counting methods to KnownBits struct and use them where possible
This patch adds min/max population count, leading/trailing zero/one bit counting methods.

The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.

Differential Revision: https://reviews.llvm.org/D32931

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302925 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-12 17:20:30 +00:00
Adam Nemet
65ad27f81e [SLP] Emit optimization remarks
The approach I followed was to emit the remark after getTreeCost concludes
that SLP is profitable.  I initially tried emitting them after the
vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely
remember missing a few cases for example in HorizontalReduction::tryToReduce.

ORE is placed in BoUpSLP so that it's available from everywhere (notably
HorizontalReduction::tryToReduce).

We use the first instruction in the root bundle as the locator for the remark.
In order to get a sense how far the tree is spanning I've include the size of
the tree in the remark.  This is not perfect of course but it gives you at
least a rough idea about the tree.  Then you can follow up with -view-slp-tree
to really see the actual tree.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302811 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-11 17:06:17 +00:00
Ayal Zaks
7a3330e101 [LV] Refactor ILV.vectorize{Loop}() by introducing LVP.executePlan(); NFC
Introduce LoopVectorizationPlanner.executePlan(), replacing ILV.vectorize() and
refactoring ILV.vectorizeLoop(). Method collectDeadInstructions() is moved from
ILV to LVP. These changes facilitate building VPlans and using them to generate
code, following https://reviews.llvm.org/D28975 and its tentative breakdown.

Method ILV.createEmptyLoop() is renamed ILV.createVectorizedLoopSkeleton() to
improve clarity; it's contents remain intact.

Differential Revision: https://reviews.llvm.org/D32200


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302790 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-11 11:36:33 +00:00
Matthew Simpson
a4fa9d3a63 [AArch64] Consider widening instructions in cost calculations
The AArch64 instruction set has a few "widening" instructions (e.g., uaddl,
saddl, uaddw, etc.) that take one or more doubleword operands and produce
quadword results. The operands are automatically sign- or zero-extended as
appropriate. However, in LLVM IR, these extends are explicit. This patch
updates TTI to consider these widening instructions as single operations whose
cost is attached to the arithmetic instruction. It marks extends that are part
of a widening operation "free" and applies a sub-target specified overhead
(zero by default) to the arithmetic instructions.

Differential Revision: https://reviews.llvm.org/D32706

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302582 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-09 20:18:12 +00:00
Anna Thomas
1fbed43c3e [LV] Fix insertion point for shuffle vectors in first order recurrence
Summary:
In first order recurrence vectorization, when the previous value is a phi node, we need to
set the insertion point to the first non-phi node.
We can have the previous value being a phi node, due to the generation of new
IVs as part of trunc optimization [1].

[1] https://reviews.llvm.org/rL294967

Reviewers: mssimpso, mkuper

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32969

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302532 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-09 14:29:33 +00:00
Amara Emerson
8f1f7ce9d1 Introduce experimental generic intrinsics for horizontal vector reductions.
- This change allows targets to opt-in to using them instead of the log2
  shufflevector algorithm.
- The SLP and Loop vectorizers have the common code to do shuffle reductions
  factored out into LoopUtils, and now have a unified interface for generating
  reductions regardless of the preference of the target. LoopUtils now uses TTI
  to determine what kind of reductions the target wants to handle.
- For CodeGen, basic legalization support is added.

Differential Revision: https://reviews.llvm.org/D30086



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302514 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-09 10:43:25 +00:00
Jonas Paulsson
245084ed3c Use right function in LoopVectorize.
-    unsigned AS = getMemInstAlignment(I);
+    unsigned AS = getMemInstAddressSpace(I);

Review: Hal Finkel

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302114 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-04 05:31:56 +00:00
Sanjoy Das
399b4d037d Rename WeakVH to WeakTrackingVH; NFC
This relands r301424.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301812 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-01 17:07:49 +00:00
Craig Topper
58c7fe69d0 [ValueTracking] Introduce a KnownBits struct to wrap the two APInts for computeKnownBits
This patch introduces a new KnownBits struct that wraps the two APInt used by computeKnownBits. This allows us to treat them as more of a unit.

Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch.

I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. Once place default constructed the APInts so I added a default constructor for those cases.

Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero|One) so we don't write it out everywhere. Maybe a method for (Zero|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with.

Differential Revision: https://reviews.llvm.org/D32376

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301432 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-26 16:39:58 +00:00
Sanjoy Das
263da12ab2 Reverts commit r301424, r301425 and r301426
Commits were:

"Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts"
"Add a new WeakVH value handle; NFC"
"Rename WeakVH to WeakTrackingVH; NFC"

The changes assumed pointers are 8 byte aligned on all architectures.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301429 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-26 16:37:05 +00:00
Matthew Simpson
86197951a0 [LV] Handle external uses of floating-point induction variables
Reference: https://bugs.llvm.org/show_bug.cgi?id=32758
Differential Revision: https://reviews.llvm.org/D32445

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301428 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-26 16:23:02 +00:00
Sanjoy Das
d0cf26e443 Rename WeakVH to WeakTrackingVH; NFC
Summary:
I plan to use WeakVH to mean "nulls itself out on deletion, but does
not track RAUW" in a subsequent commit.

Reviewers: dblaikie, davide

Reviewed By: davide

Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D32266

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301424 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-26 16:20:52 +00:00
Stanislav Mekhanoshin
dff33b262e Skip bitcasts while looking for GEP in LoadStoreVectorizer
Differential Revisison: https://reviews.llvm.org/D32101

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301343 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-25 18:00:08 +00:00
Craig Topper
21db70eaca [APInt] Use isSubsetOf, intersects, and bit counting methods to reduce temporary APInts
This patch uses various APInt methods to reduce temporary APInt creation.

This should be all of the unrelated cleanups that got buried in D32376(creating a KnownBits struct) as well as some pointed out by Simon during the review of that. Plus a few improvements to use counting instead of masking.

I've left out any places where we do something like (KnownZero & KnownOne) != 0 as I plan to add a helper method to KnownBits to ask that question and didn't want to thrash that code an additional time.

Differential Revision: https://reviews.llvm.org/D32495

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301338 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-25 17:46:30 +00:00
Gil Rapaport
e9d7fd8bfe [LV] Remove redundant basic block split
This patch is part of D28975's breakdown.

Genreating the control-flow to guard predicated instructions modified to
only use SplitBlockAndInsertIfThen() for producing the if-then construct.

Differential Revision: https://reviews.llvm.org/D32224


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301293 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-25 05:57:22 +00:00
Matthew Simpson
915a4bc0f9 [LV] Model if-converted phi node costs
Phi nodes in non-header blocks are converted to select instructions after
if-conversion. This patch updates the cost model to account for the selects.

Differential Revision: https://reviews.llvm.org/D31906

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300980 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-21 14:14:54 +00:00
Easwaran Raman
6e087c5152 [SLP vectorizer] Allow phi node reordering in tryToVectorizeList.
In tryToVectorizeList, under a very limited circumstance (when entered
from tryToVectorizePair), the values may be reordered (swapped) and the
SLP tree is built with the new order. This extends that to the case when
starting from phis in vectorizeChainsInBlock when there are exactly two
phis. The textual order of phi nodes shouldn't really matter. Without
this change, the loop body in the accompnaying test case is fully vectorized
when we swap the orde of the phis but not with this order. While this
doesn't solve the phi-ordering problem in a general way (for more than 2
phis), this is simple fix that piggybacks on an existing mechanism and
is useful in cases like multiplying two complex numbers.

Differential revision: https://reviews.llvm.org/D32065

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300574 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-18 18:16:57 +00:00
Gil Rapaport
e9d7070f01 [LV] Cache block mask values
This patch is part of D28975's breakdown.

Add caching for block masks similar to the cache already used for edge masks,
replacing generation per user with reusing the first generated value which
dominates all uses.

Differential Revision: https://reviews.llvm.org/D32054


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300557 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-18 14:43:43 +00:00
Gil Rapaport
e92557c930 [LV] Remove implicit single basic block assumption
This patch is part of D28975's breakdown - no change in output intended.

LV's code currently assumes the vectorized loop is a single basic block up
until predicateInstructions() is called. This patch removes two manifestations
of this assumption (loop phi incoming values, dominator tree update) by
replacing the use of vectorLoopBody with the vectorized loop's latch/header.

Differential Revision: https://reviews.llvm.org/D32040


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300310 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-14 07:30:23 +00:00
Anna Thomas
0bd00d7d97 [LV] Fix the vector code generation for first order recurrence
Summary:
In first order recurrences where phi's are used outside the loop,
we should generate an additional vector.extract of the second last element from
the vectorized phi update.
This is because we require the phi itself (which is the value at the second last
iteration of the vector loop) and not the phi's update within the loop.
Also fix the code gen when we just unroll, but don't vectorize.
Fixes PR32396.

Reviewers: mssimpso, mkuper, anemet

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D31979

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300238 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-13 18:59:25 +00:00
Ayal Zaks
17f2fca608 [LV] Refactor ILV to provide vectorizeInstruction(); NFC
Refactoring InnerLoopVectorizer's vectorizeBlockInLoop() to provide
vectorizeInstruction(). Aligning DeadInstructions with its only user.
Facilitates driving the transformation by VPlan - follows
https://reviews.llvm.org/D28975 and its tentative breakdown.

Differential Revision: https://reviews.llvm.org/D31997


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300183 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-13 09:07:23 +00:00
Jonas Paulsson
0b09474fe1 [SLPVectorizer] Pass the right type argument to getCmpSelInstrCost()
In getEntryCost(), make the scalar type for a compare instruction that of the
operands, not i1. This is needed in order to call getCmpSelInstrCost() for a
compare in a sensible way, the same way as the LoopVectorizer does.

New test: test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll

Review: Matthew Simpson
https://reviews.llvm.org/D31601

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300061 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-12 13:29:25 +00:00
Jonas Paulsson
16b23e95d5 [LoopVectorizer] Improve handling of branches during cost estimation.
The cost for a branch after vectorization is very different depending on if
the vectorizer will if-convert the block (branch is eliminated), or if
scalarized and predicated blocks will be produced (branch duplicated before
each block). There is also the case of remaining scalar branches, such as the
back-edge branch.

This patch handles these cases differently with TTI based cost estimates.

Review: Matthew Simpson
https://reviews.llvm.org/D31175

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300058 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-12 13:13:15 +00:00
Jonas Paulsson
43d439da88 [LoopVectorizer, TTI] New method supportsEfficientVectorElementLoadStore()
Since SystemZ supports vector element load/store instructions, there is no
need for extracts/inserts if a vector load/store gets scalarized.

This patch lets Target specify that it supports such instructions by means of
a new TTI hook that defaults to false.

The use for this is in the LoopVectorizer getScalarizationOverhead() method,
which will with this patch produce a smaller sum for a vector load/store on
SystemZ.

New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll

Review: Adam Nemet
https://reviews.llvm.org/D30680

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300056 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-12 12:41:37 +00:00
Jonas Paulsson
c33bdfa7b1 [SystemZ] TargetTransformInfo cost functions implemented.
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.

Interleaved access vectorization enabled.

BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.

Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300052 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-12 11:49:08 +00:00
Anna Thomas
3d57035bd9 [LV] Avoid vectorizing first order recurrence when phi uses are outside loop
In the vectorization of first order recurrence, we vectorize such
that the last element in the vector will be the one extracted to pass into the
scalar remainder loop. However, this is not true when there is a phi (other
than the primary induction variable) is used outside the loop.
In such a case, we need the value from the second last iteration (i.e.
the phi value), not the last iteration (which would be the phi update).
I've added a test case for this. Also see PR32396.

A follow up patch would generate the correct code gen for such cases,
and turn this vectorization on.

Differential Revision: https://reviews.llvm.org/D31910

Reviewers: mssimpso

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299985 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-11 21:02:00 +00:00
Matthew Simpson
ccc38cc5e7 Reapply r298620: [LV] Vectorize GEPs
This patch reapplies r298620. The original patch was reverted because of two
issues. First, the patch exposed a bug in InstCombine that caused the Chromium
builds to fail (PR32414). This issue was fixed in r299017. Second, the patch
introduced a bug in the vectorizer's scalars analysis that caused test suite
builds to fail on SystemZ. The scalars analysis was too aggressive and marked a
memory instruction scalar, even though it was going to be vectorized. This
issue has been fixed in the current patch and several new test cases for the
scalars analysis have been added.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299770 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-07 14:15:34 +00:00
Matthew Simpson
fcdf36fabb [LV] Transform truncations of non-primary induction variables
The vectorizer tries to replace truncations of induction variables with new
induction variables having the smaller type. After r295063, this optimization
was applied to all integer induction variables, including non-primary ones.
When optimizing the truncation of a non-primary induction variable, we still
need to transform the new induction so that it has the correct start value.
This should fix PR32419.

Reference: https://bugs.llvm.org/show_bug.cgi?id=32419

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298882 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-27 20:07:38 +00:00
Ivan Krasin
76f79b8934 Revert r298620: [LV] Vectorize GEPs
Reason: breaks linking Chromium with LLD + ThinLTO (a pass crashes)
LLVM bug: https://bugs.llvm.org//show_bug.cgi?id=32413

Original change description:

[LV] Vectorize GEPs

This patch adds support for vectorizing GEPs. Previously, we only generated
vector GEPs on-demand when creating gather or scatter operations. All GEPs from
the original loop were scalarized by default, and if a pointer was to be stored
to memory, we would have to build up the pointer vector with insertelement
instructions.

With this patch, we will vectorize all GEPs that haven't already been marked
for scalarization.

The patch refines collectLoopScalars to more exactly identify the scalar GEPs.
The function now more closely resembles collectLoopUniforms. And the patch
moves vector GEP creation out of vectorizeMemoryInstruction and into the main
vectorization loop. The vector GEPs needed for gather and scatter operations
will have already been generated before vectoring the memory accesses.

Original Differential Revision: https://reviews.llvm.org/D30710



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298735 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-24 20:49:43 +00:00