882 Commits

Author SHA1 Message Date
Sanjoy Das
4b892417a6 [SCEV] Add and use SCEVConstant::getAPInt; NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255921 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-17 20:28:46 +00:00
Charlie Turner
8c888e8a57 [SLPVectorizer] Ensure dominated reduction values.
When considering incoming values as part of a reduction phi, ensure the
incoming value is dominated by said phi.

Failing to ensure this property causes miscompiles.

Fixes PR25787.

Many thanks to Mattias Eriksson for reporting, reducing and analyzing the
problem for me.

Differential Revision: http://reviews.llvm.org/D15580



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255792 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-16 18:23:44 +00:00
Cong Hou
e956465289 [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
(This is the third attempt to check in this patch, and the first two are r255454
and r255460. The once failed test file reg-usage.ll is now moved to
test/Transform/LoopVectorize/X86 directory with target datalayout and target
triple indicated.)

LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255691 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-15 22:45:09 +00:00
Cong Hou
dbef3b079d Revert r255460, which still causes test failures on some platforms.
Further investigation on the failures is ongoing.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255463 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-13 17:15:38 +00:00
Cong Hou
f26946fa52 [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
(This is the second attempt to check in this patch: REQUIRES: asserts is added
to reg-usage.ll now.)

LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255460 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-13 16:55:46 +00:00
Cong Hou
6f344e5da6 Revert r255454 as it leads to several test failers on buildbots.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255456 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-13 09:28:57 +00:00
Cong Hou
c731de4630 [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255454 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-13 08:44:08 +00:00
Hal Finkel
47be3618f1 AlignmentFromAssumptions and SLPVectorizer preserves AA and GlobalsAA
GlobalsAA's assumptions that passes do not escape globals not previously
escaped is not violated by AlignmentFromAssumptions and SLPVectorizer. Marking
them as such allows GlobalsAA to be preserved until GVN in the LTO pipeline.

http://lists.llvm.org/pipermail/llvm-dev/2015-December/092972.html

Patch by Vaivaswatha Nagaraj!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255348 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-11 17:46:01 +00:00
Silviu Baranga
90f6cd579a Re-commit r255115, with the PredicatedScalarEvolution class moved to
ScalarEvolution.h, in order to avoid cyclic dependencies between the Transform
and Analysis modules:

[LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions

Summary:
This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the
usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge
by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is
that both LAA and LV should use this interface everywhere.

This also solves a problem involving the result of SCEV expression rewritting when
the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates
  P1: {a,+,b} has nsw
  P2: b = 1.

Applying P1 and then P2 gives us {a,+,1}, while applying P2 and the P1 gives us
sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies).
The SCEVPredicatedLayer maintains the order of transformations by feeding back
the results of previous transformations into new transformations, and therefore
avoiding this issue.

The SCEVPredicatedLayer maintains a cache to remember the results of previous
SCEV rewritting results. This also has the benefit of reducing the overall number
of expression rewrites.

Reviewers: mzolotukhin, anemet

Subscribers: jmolloy, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D14296



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255122 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-09 16:06:28 +00:00
Silviu Baranga
bdd73bcbd7 Revert r255115 until we figure out how to fix the bot failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255117 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-09 15:25:28 +00:00
Silviu Baranga
69c30d5b6c [LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions
Summary:
This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the
usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge
by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is
that both LAA and LV should use this interface everywhere.

This also solves a problem involving the result of SCEV expression rewritting when
the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates
  P1: {a,+,b} has nsw
  P2: b = 1.

Applying P1 and then P2 gives us {a,+,1}, while applying P2 and the P1 gives us
sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies).
The SCEVPredicatedLayer maintains the order of transformations by feeding back
the results of previous transformations into new transformations, and therefore
avoiding this issue.

The SCEVPredicatedLayer maintains a cache to remember the results of previous
SCEV rewritting results. This also has the benefit of reducing the overall number
of expression rewrites.

Reviewers: mzolotukhin, anemet

Subscribers: jmolloy, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D14296

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255115 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-09 15:03:52 +00:00
Cong Hou
c5cf58b8a7 Fix a typo in LoopVectorize.cpp. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254813 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-05 01:00:22 +00:00
Cong Hou
aaaedd7f8f Fix a typo in LoopVectorize.cpp. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254549 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-02 21:33:47 +00:00
Charlie Turner
c8dc70b584 [LoopVectorize] Use MapVector rather than DenseMap for MinBWs.
The order in which instructions are truncated in truncateToMinimalBitwidths
effects code generation. Switch to a map with a determinisic order, since the
iteration order over a DenseMap is not defined.

This code is not hot, so the difference in container performance isn't
interesting.

Many thanks to David Blaikie for making me aware of MapVector!

Fixes PR25490.

Differential Revision: http://reviews.llvm.org/D14981



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254179 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-26 20:39:51 +00:00
Chad Rosier
33e3b5e479 [LV] Add a helper function, isReductionVariable. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253565 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-19 14:19:06 +00:00
Cong Hou
ce103d4605 Fix several long lines (>80) in LoopVectorize.cpp. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253527 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-19 00:32:30 +00:00
Chad Rosier
08e6ab9d25 Typo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253336 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-17 13:58:10 +00:00
Charlie Turner
ca0bc2bcbc [SLP] Enable -slp-vectorize-hor by default.
Measurements primarily on AArch64 have shown this feature does not
significantly effect compile-time. The are no significant perf changes in LNT,
but for AArch64 at least, there are wins in third party benchmarks.

As discussed on llvm-dev, we're going to try turning this on by default and see
how other targets react to the change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252733 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-11 15:03:46 +00:00
James Molloy
ae263d48b0 [LoopVectorize] Address post-commit feedback on r250032
Implemented as many of Michael's suggestions as were possible:
  * clang-format the added code while it is still fresh.
  * tried to change Value* to Instruction* in many places in computeMinimumValueSizes - unfortunately there are several places where Constants need to be handled so this wasn't possible.
  * Reduce the pass list on loop-vectorization-factors.ll.
  * Fix a bug where we were querying MinBWs for I->getOperand(0) but using MinBWs[I].

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252469 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-09 14:32:05 +00:00
Mehdi Amini
7d84928143 Fix SLPVectorizer commutativity reordering
The SLPVectorizer had a very crude way of trying to benefit
from associativity: it tried to optimize for splat/broadcast
or in order to have the same operator on the same side.
This is benefitial to the cost model and allows more vectorization
to occur.
This patch improve the logic and make the detection optimal (locally,
we don't look at the full tree but only at the immediate children).

Should fix https://llvm.org/bugs/show_bug.cgi?id=25247

Reviewers: mzolotukhin

Differential Revision: http://reviews.llvm.org/D13996

From: Mehdi Amini <mehdi.amini@apple.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252337 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-06 20:17:51 +00:00
Elena Demikhovsky
2c4a333422 LoopVectorizer - skip 'bitcast' between GEP and load.
Skipping 'bitcast' in this case allows to vectorize load:

  %arrayidx = getelementptr inbounds double*, double** %in, i64 %indvars.iv
  %tmp53 = bitcast double** %arrayidx to i64*
  %tmp54 = load i64, i64* %tmp53, align 8

Differential Revision http://reviews.llvm.org/D14112



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251907 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-03 10:29:34 +00:00
Cong Hou
c895fd0d67 Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using larger vectorization factor.
To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers.

This is the second attempt to submit this patch. The first attempt got a test failure on ARM. This patch is updated to try to fix the failure (more specifically, by handling the case when VF=1).

Differential revision: http://reviews.llvm.org/D8943



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251850 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-02 22:53:48 +00:00
Silviu Baranga
a0b73c263e [SCEV][LV] Add SCEV Predicates and use them to re-implement stride versioning
Summary:
SCEV Predicates represent conditions that typically cannot be derived from
static analysis, but can be used to reduce SCEV expressions to forms which are
usable for different optimizers.

ScalarEvolution now has the rewriteUsingPredicate method which can simplify a
SCEV expression using a SCEVPredicateSet. The normal workflow of a pass using
SCEVPredicates would be to hold a SCEVPredicateSet and every time assumptions
need to be made a new SCEV Predicate would be created and added to the set.
Each time after calling getSCEV, the user will call the rewriteUsingPredicate
method.

We add two types of predicates
SCEVPredicateSet - implements a set of predicates
SCEVEqualPredicate - tests for equality between two SCEV expressions

We use the SCEVEqualPredicate to re-implement stride versioning. Every time we
version a stride, we will add a SCEVEqualPredicate to the context.
Instead of adding specific stride checks, LoopVectorize now adds a more
generic SCEV check.

We only need to add support for this in the LoopVectorizer since this is the
only pass that will do stride versioning.

Reviewers: mzolotukhin, anemet, hfinkel, sanjoy

Subscribers: sanjoy, hfinkel, rengolin, jmolloy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13595

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251800 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-02 14:41:02 +00:00
Cong Hou
e29e7e235a Revert the revision 251592 as it fails a test on some platforms.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251617 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-29 05:35:22 +00:00
Cong Hou
72daf62570 Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using larger vectorization factor.
To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251592 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-29 01:28:44 +00:00
NAKAMURA Takumi
c287d4cc99 Whitespace.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251437 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-27 19:02:52 +00:00
NAKAMURA Takumi
50aba1e345 Revert r251291, "Loop Vectorizer - skipping "bitcast" before GEP"
It causes miscompilation of llvm/lib/ExecutionEngine/Interpreter/Execution.cpp.
See also PR25324.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251436 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-27 19:02:36 +00:00
Charlie Turner
63fe1641e4 [SLP] Be more aggressive about reduction width selection.
Summary:
This change could be way off-piste, I'm looking for any feedback on whether it's an acceptable approach.

It never seems to be a problem to gobble up as many reduction values as can be found, and then to attempt to reduce the resulting tree. Some of the workloads I'm looking at have been aggressively unrolled by hand, and by selecting reduction widths that are not constrained by a vector register size, it becomes possible to profitably vectorize. My test case shows such an unrolling which SLP was not vectorizing (on neither ARM nor X86) before this patch, but with it does vectorize.

I measure no significant compile time impact of this change when combined with D13949 and D14063. There are also no significant performance regressions on ARM/AArch64 in SPEC or LNT.

The more principled approach I thought of was to generate several candidate tree's and use the cost model to pick the cheapest one. That seemed like quite a big design change (the algorithms seem very much one-shot), and would likely be a costly thing for compile time. This seemed to do the job at very little cost, but I'm worried I've misunderstood something!

Reviewers: nadav, jmolloy

Subscribers: mssimpso, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D14116

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251428 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-27 17:59:03 +00:00
Charlie Turner
751d6ddd6d [SLP] Try a bit harder to find reduction PHIs
Summary:
Currently, when the SLP vectorizer considers whether a phi is part of a reduction, it dismisses phi's whose incoming blocks are not the same as the block containing the phi. For the patterns I'm looking at, extending this rule to allow phis whose incoming block is a containing loop latch allows me to vectorize certain workloads.

There is no significant compile-time impact, and combined with D13949, no performance improvement measured in ARM/AArch64 in any of SPEC2000, SPEC2006 or LNT.

Reviewers: jmolloy, mcrosier, nadav

Subscribers: mssimpso, nadav, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D14063

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251425 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-27 17:54:16 +00:00
Charlie Turner
b13834ec71 [SLP] Treat SelectInsts as reduction values.
Summary:
Certain workloads, in particular sum-of-absdiff loops, can be vectorized using SLP if it can treat select instructions as reduction values.

The test case is a bit awkward. The AArch64 cost model needs some tuning to not be so pessimistic about selects. I've had to tweak the SLP threshold here.

Reviewers: jmolloy, mzolotukhin, spatel, nadav

Subscribers: nadav, mssimpso, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D13949

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251424 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-27 17:49:11 +00:00
Elena Demikhovsky
df91a787c0 Loop Vectorizer - skipping "bitcast" before GEP
Vectorization of memory instruction (Load/Store) is possible when the pointer is coming from GEP. The GEP analysis allows to estimate the profit.
In some cases we have a "bitcast" between GEP and memory instruction.
I added code that skips the "bitcast".

http://reviews.llvm.org/D13886



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251291 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-26 13:42:41 +00:00
Michael Zolotukhin
df43fcd565 Refactor: Simplify boolean conditional return statements in lib/Transforms/Vectorize (NFC).
Summary: Use clang-tidy to simplify boolean conditional return statements

Differential Revision: http://reviews.llvm.org/D10003

Patch by Richard<legalize@xmission.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251206 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-24 20:16:42 +00:00
Mehdi Amini
1ac8af11c6 SLPVectorizer: AllSameOpcode* starts "true" only for instructions
r251085 wasn't as NFC as intended...

From: Mehdi Amini <mehdi.amini@apple.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251087 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-23 01:04:45 +00:00
Mehdi Amini
ddf3065b5f SLPVectorizer: refactor reorderInputsAccordingToOpcode (NFC)
This is intended to simplify the changes needed to solve PR25247.

From: Mehdi Amini <mehdi.amini@apple.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251085 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-23 00:46:17 +00:00
Duncan P. N. Exon Smith
f792618d1c Vectorize: Remove implicit ilist iterator conversions, NFC
Besides the usual, I finally added an overload to
`BasicBlock::splitBasicBlock()` that accepts an `Instruction*` instead
of `BasicBlock::iterator`.  Someone can go back and remove this overload
later (after updating the callers I'm going to skip going forward), but
the most common call seems to be
`BB->splitBasicBlock(BB->getTerminator(), ...)` and I'm not sure it's
better to add `->getIterator()` to every one than have the overload.
It's pretty hard to get the usage wrong.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250745 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-19 22:06:09 +00:00
Elena Demikhovsky
da40167c02 Removed parameter "Consecutive" from isLegalMaskedLoad() / isLegalMaskedStore().
Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case.

Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces.

Differential Revision: http://reviews.llvm.org/D13850



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250686 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-19 07:43:38 +00:00
James Molloy
7dab7edf06 [LoopVectorize] Shrink integer operations into the smallest type possible
C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int
type (e.g. i32) whenever arithmetic is performed on them.

For targets with native i8 or i16 operations, usually InstCombine can shrink
the arithmetic type down again. However InstCombine refuses to create illegal
types, so for targets without i8 or i16 registers, the lengthening and
shrinking remains.

Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when
their scalar equivalents do not, so during vectorization it is important to
remove these lengthens and truncates when deciding the profitability of
vectorization.

The algorithm this uses starts at truncs and icmps, trawling their use-def
chains until they terminate or instructions outside the loop are found (or
unsafe instructions like inttoptr casts are found). If the use-def chains
starting from different root instructions (truncs/icmps) meet, they are
unioned. The demanded bits of each node in the graph are ORed together to form
an overall mask of the demanded bits in the entire graph. The minimum bitwidth
that graph can be truncated to is the bitwidth minus the number of leading
zeroes in the overall mask.

The intention is that this algorithm should "first do no harm", so it will
never insert extra cast instructions. This is why the use-def graphs are
unioned, so that subgraphs with different minimum bitwidths do not need casts
inserted between them.

This algorithm works hard to reduce compile time impact. DemandedBits are only
queried if there are extends of illegal types and if a truncate to an illegal
type is seen. In the general case, this results in a simple linear scan of the
instructions in the loop.

No non-noise compile time impact was seen on a clang bootstrap build.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250032 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-12 12:34:45 +00:00
Piotr Padlewski
264b0361be inariant.group handling in GVN
The most important part required to make clang
devirtualization works ( ͡°͜ʖ ͡°).
The code is able to find non local dependencies, but unfortunatelly
because the caller can only handle local dependencies, I had to add
some restrictions to look for dependencies only in the same BB.

http://reviews.llvm.org/D12992

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249196 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-02 22:12:22 +00:00
Michael Zolotukhin
ad0be616bd [SLP] Don't vectorize loads of non-packed types (like i1, i2).
Summary:
Given an array of i2 elements, 4 consecutive scalar loads will be lowered to
i8-sized loads and thus will access 4 consecutive bytes in memory. If we
vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in
memory. Hence, we should prohibit vectorization in such cases.

PS: Initial patch was proposed by Arnold.

Reviewers: aschwaighofer, nadav, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13277

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248943 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-30 21:05:43 +00:00
Erik Eckstein
e2b6175820 SLPVectorizer: limit the scheduling region size per basic block.
Usually large blocks are not a problem. But if a large block (> 10k instructions)
contains many (potential) chains of vector instructions, and those chains are
spread over a wide range of instructions, then scheduling becomes a compile time problem.
This change introduces a limit for the accumulate scheduling region size of a block.
For real-world functions this limit will never be exceeded (it's about 10x larger than
the maximum value seen in the test-suite and external test suite).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248917 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-30 17:00:44 +00:00
Sanjoy Das
6d6e2b5a35 [SCEV] Introduce ScalarEvolution::getOne and getZero.
Summary:
It is fairly common to call SE->getConstant(Ty, 0) or
SE->getConstant(Ty, 1); this change makes such uses a little bit
briefer.

I've refactored the call sites I could find easily to use getZero /
getOne.

Reviewers: hfinkel, majnemer, reames

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12947

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248362 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-23 01:59:04 +00:00
James Molloy
f30ebaced7 [LoopUtils,LV] Propagate fast-math flags on generated FCmp instructions
We're currently losing any fast-math flags when synthesizing fcmps for
min/max reductions. In LV, make sure we copy over the scalar inst's
flags. In LoopUtils, we know we only ever match patterns with
hasUnsafeAlgebra, so apply that to any synthesized ops.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248201 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-21 19:41:19 +00:00
Chandler Carruth
9146833fa3 [PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible
with the new pass manager, and no longer relying on analysis groups.

This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:

- FunctionAAResults is a type-erasing alias analysis results aggregation
  interface to walk a single query across a range of results from
  different alias analyses. Currently this is function-specific as we
  always assume that aliasing queries are *within* a function.

- AAResultBase is a CRTP utility providing stub implementations of
  various parts of the alias analysis result concept, notably in several
  cases in terms of other more general parts of the interface. This can
  be used to implement only a narrow part of the interface rather than
  the entire interface. This isn't really ideal, this logic should be
  hoisted into FunctionAAResults as currently it will cause
  a significant amount of redundant work, but it faithfully models the
  behavior of the prior infrastructure.

- All the alias analysis passes are ported to be wrapper passes for the
  legacy PM and new-style analysis passes for the new PM with a shared
  result object. In some cases (most notably CFL), this is an extremely
  naive approach that we should revisit when we can specialize for the
  new pass manager.

- BasicAA has been restructured to reflect that it is much more
  fundamentally a function analysis because it uses dominator trees and
  loop info that need to be constructed for each function.

All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.

The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.

This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.

Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.

One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.

Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.

Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.

Differential Revision: http://reviews.llvm.org/D12080

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247167 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-09 17:55:00 +00:00
James Molloy
2993ffe264 Rename ExitCount to BackedgeTakenCount, because that's what it is.
We called a variable ExitCount, stored the backedge count in it, then redefined it to be the exit count again.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247140 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-09 12:51:10 +00:00
James Molloy
22168a9afd Delay predication of stores until near the end of vector code generation
Predicating stores requires creating extra blocks. It's much cleaner if we do this in one pass instead of mutating the CFG while writing vector instructions.

Besides which we can make use of helper functions to update domtree for us, reducing the work we need to do.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247139 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-09 12:51:06 +00:00
James Molloy
6ce50a6980 [LV] Don't bail to MiddleBlock if a runtime check fails, bail to ScalarPH instead
We were bailing to two places if our runtime checks failed. If the initial overflow check failed, we'd go to ScalarPH. If any other check failed, we'd go to MiddleBlock. This caused us to have to have an extra PHI per induction and reduction as the vector loop's exit block was not dominated by its latch.

There's no need to have this behavior - if we just always go to ScalarPH we can get rid of a bunch of complexity.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246637 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 10:15:39 +00:00
James Molloy
e2568a81dc [LV] Move some code around slightly to make the intent of the function more clear.
NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246636 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 10:15:32 +00:00
James Molloy
4718dafb69 [LV] Cleanup: Sink an IRBuilder closer to its uses.
NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246635 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 10:15:27 +00:00
James Molloy
2b7433d981 [LV] Refactor all runtime check emissions into helper functions.
This reduces the complexity of createEmptyBlock() and will open the door to further refactoring.

The test change is simply because we're now constant folding a trivial test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246634 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 10:15:22 +00:00
James Molloy
f6d9948d5a [LV] Pull creation of trip counts into a helper function.
... and do a tad of tidyup while we're at it. Because StartIdx must now be zero, there's no difference between Count and EndIdx.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246633 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-02 10:15:16 +00:00