This patch and a relatec clang patch solve the problem of having to explicitly enable analysis when specifying a loop hint pragma to get the diagnostics. Passing AlwasyPrint as the pass name (see below) causes the front-end to print the diagnostic if the user has specified '-Rpass-analysis' without an '=<target-pass>’. Users of loop hints can pass that compiler option without having to specify the pass and they will get diagnostics for only those loops with loop hints.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244555 91177308-0d34-0410-b5e6-96231b3b80d8
This patch moves checking the threshold of runtime pointer checks to the vectorization requirements (late diagnostics) and emits a diagnostic that infroms the user the loop would be vectorized if not for exceeding the pointer-check threshold. Clang will also append the options that can be used to allow vectorization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244523 91177308-0d34-0410-b5e6-96231b3b80d8
This patch moves the verification of fast-math to just before vectorization is done. This way we can tell clang to append the command line options would that allow floating-point commutativity. Specifically those are enableing fast-math or specifying a loop hint.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244489 91177308-0d34-0410-b5e6-96231b3b80d8
Sometimes interleaving is not beneficial, as determined by the cost-model and sometimes it is disabled by a loop hint (by the user). This patch modifies the diagnostic messages to make it clear why interleaving wasn't done.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244485 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This adds a hook to TTI which enables us to selectively turn on by default
interleaved access vectorization for targets on which we have have performed
the required benchmarking.
Reviewers: rengolin
Subscribers: rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D11901
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244449 91177308-0d34-0410-b5e6-96231b3b80d8
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).
Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.
This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.
Differential Revision: http://reviews.llvm.org/D11734
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243994 91177308-0d34-0410-b5e6-96231b3b80d8
This is useful when we want to do block frequency analysis
conditionally (e.g. only in PGO mode) but don't want to add
one more pass dependence.
Patch by congh.
Approved by dexonsmith.
Differential Revision: http://reviews.llvm.org/D11196
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242248 91177308-0d34-0410-b5e6-96231b3b80d8
I am planning to add more nested classes inside RuntimePointerCheck so
all these triple-nesting would be hard to follow.
Also rename it to RuntimePointerChecking (i.e. append 'ing').
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242218 91177308-0d34-0410-b5e6-96231b3b80d8
Passes should never modify it, just use the const version. While there
reduce copying in LoopInterchange. No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242041 91177308-0d34-0410-b5e6-96231b3b80d8
The following functions are moved from the LoopVectorizer to VectorUtils:
- getGEPInductionOperand
- stripGetElementPtr
- getUniqueCastUse
- getStrideFromPointer
These used to be static functions in LoopVectorize, but will also be used by
the upcoming loop versioning LICM transformation.
Patch by Ashutosh Nema!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241980 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Following the discussion on r241884, it's more reasonable to assume that a
target has no vector registers by default instead of letting every such
target overrides getNumberOfRegisters.
Therefore, this patch modifies BasicTTIImpl::getNumberOfRegisters to
return 0 when Vector is true, and partially reverts r241884 which
modifies NVPTXTTIImpl::getNumberOfRegisters.
It also fixes a performance bug in LoopVectorizer. Even if a target has
no vector registers, vectorization may still help ILP. So, we need both
checks to be false before disabling loop vectorization all together.
Reviewers: hfinkel
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11108
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241942 91177308-0d34-0410-b5e6-96231b3b80d8
Place all code corresponding to a run-time check in one place.
Previously we generated some code, then proceeded to a next check, then
finished the code for the first check (like splitting blocks and
generating branches). Now the code for generating a check is
self-contained.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241741 91177308-0d34-0410-b5e6-96231b3b80d8
This is mostly an NFC, which increases code readability (instead of
saving old terminator, generating new one in front of old, and deleting
old, we just call a function). However, it would additionaly copy
the debug location from old instruction to replacement, which
would help PR23837.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241197 91177308-0d34-0410-b5e6-96231b3b80d8
If we are dealing with a pointer induction variable, isInductionPHI
gives back a step value of Stride / size of pointer. However, we might
be indexing with a legal type wider than the pointer width.
Handle this by inserting casts where appropriate instead of crashing.
This fixes PR23954.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240877 91177308-0d34-0410-b5e6-96231b3b80d8
With option OptForSize enabled, the Loop Vectorizer is not supposed to
create tail loop. The condition checking that was invalid and was not
matching to the comment above.
Patch by Marianne Mailhot-Sarrasin.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240556 91177308-0d34-0410-b5e6-96231b3b80d8
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
-checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
llvm/lib/
Thanks to Eugene Kosov for the original patch!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240137 91177308-0d34-0410-b5e6-96231b3b80d8
A reduction is a special kind of recurrence. In the loop vectorizer we currently
identify basic reductions. Future patches will extend this to identifying basic
recurrences.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239835 91177308-0d34-0410-b5e6-96231b3b80d8
Interleaved memory accesses are grouped and vectorized into vector load/store and shufflevector.
E.g. for (i = 0; i < N; i+=2) {
a = A[i]; // load of even element
b = A[i+1]; // load of odd element
... // operations on a, b, c, d
A[i] = c; // store of even element
A[i+1] = d; // store of odd element
}
The loads of even and odd elements are identified as an interleave load group, which will be transfered into vectorized IRs like:
%wide.vec = load <8 x i32>, <8 x i32>* %ptr
%vec.even = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%vec.odd = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
The stores of even and odd elements are identified as an interleave store group, which will be transfered into vectorized IRs like:
%interleaved.vec = shufflevector <4 x i32> %vec.even, %vec.odd, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %interleaved.vec, <8 x i32>* %ptr
This optimization is currently disabled by defaut. To try it by adding '-enable-interleaved-mem-accesses=true'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239291 91177308-0d34-0410-b5e6-96231b3b80d8
The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture,
by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce
the cost of overflow check, memory boundary check and extra prologue/epilogue code when
regular unroller will unroll the loop another time. Disable it when VF==1 remove the
unnecessary cost on x86. The same can be done for other platforms after verifying
interleaving/memory bound checking to be not perf critical on those platforms.
Differential Revision: http://reviews.llvm.org/D9515
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236613 91177308-0d34-0410-b5e6-96231b3b80d8
This patch refactors the definition of common utility function "isInductionPHI" to LoopUtils.cpp.
This fixes compilation error when configured with -DBUILD_SHARED_LIBS=ON
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235577 91177308-0d34-0410-b5e6-96231b3b80d8
(Re-apply r234361 with a fix and a testcase for PR23157)
Both run-time pointer checking and the dependence analysis are capable
of dealing with uniform addresses. I.e. it's really just an orthogonal
property of the loop that the analysis computes.
Run-time pointer checking will only try to reason about SCEVAddRec
pointers or else gives up. If the uniform pointer turns out the be a
SCEVAddRec in an outer loop, the run-time checks generated will be
correct (start and end bounds would be equal).
In case of the dependence analysis, we work again with SCEVs. When
compared against a loop-dependent address of the same underlying object,
the difference of the two SCEVs won't be constant. This will result in
returning an Unknown dependence for the pair.
When compared against another uniform access, the difference would be
constant and we should return the right type of dependence
(forward/backward/etc).
The changes also adds support to query this property of the loop and
modify the vectorizer to use this.
Patch by Ashutosh Nema!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234424 91177308-0d34-0410-b5e6-96231b3b80d8
Both run-time pointer checking and the dependence analysis are capable
of dealing with uniform addresses. I.e. it's really just an orthogonal
property of the loop that the analysis computes.
Run-time pointer checking will only try to reason about SCEVAddRec
pointers or else gives up. If the uniform pointer turns out the be a
SCEVAddRec in an outer loop, the run-time checks generated will be
correct (start and end bounds would be equal).
In case of the dependence analysis, we work again with SCEVs. When
compared against a loop-dependent address of the same underlying object,
the difference of the two SCEVs won't be constant. This will result in
returning an Unknown dependence for the pair.
When compared against another uniform access, the difference would be
constant and we should return the right type of dependence
(forward/backward/etc).
The changes also adds support to query this property of the loop and
modify the vectorizer to use this.
Patch by Ashutosh Nema!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234361 91177308-0d34-0410-b5e6-96231b3b80d8
The plan here is to push the API changes out from the common components
(like Constant::getGetElementPtr and IRBuilder::CreateGEP related
functions) and just update callers to either pass the type if it's
obvious, or pass null.
Do this with LoadInst as well and anything else that comes up, then to
start porting specific uses to not pass null anymore - this may require
some refactoring in each case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234042 91177308-0d34-0410-b5e6-96231b3b80d8
Now the analysis won't "fail" if the memchecks exceed the threshold. It
is the transform pass' responsibility to perform the check.
This allows the transform pass to further analyze/eliminate the
memchecks. E.g. in Loop distribution we only need to check pointers
that end up in different partitions.
Note that there is a slight change of functionality here. The logic in
analyzeLoop is that if dependence checking fails due to non-constant
distance between the pointers, another attempt is made to prove safety
of the dependences purely using run-time checks.
Before this patch we could fail the loop due to exceeding the memcheck
threshold after the first step, now we only check the threshold in the
client after the full analysis. There is no measurable compile-time
effect but I wanted to record this here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231817 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.
This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.
I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.
I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.
Test Plan:
Reviewers: echristo
Subscribers: llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231740 91177308-0d34-0410-b5e6-96231b3b80d8
Runtime unrolling is an expensive optimization which can bring benefit
only if the loop is hot and iteration number is relatively large enough.
For some loops, we know they are not worth to be runtime unrolled.
The scalar loop from vectorization is one of the cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231631 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
DataLayout keeps the string used for its creation.
As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().
Get rid of DataLayoutPass: the DataLayout is in the Module
The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.
Make DataLayout Non-Optional in the Module
Module->getDataLayout() will never returns nullptr anymore.
Reviewers: echristo
Subscribers: resistor, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D7992
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231270 91177308-0d34-0410-b5e6-96231b3b80d8