Commit Graph

172 Commits

Author SHA1 Message Date
Matt Masten
bbbcccbfc4 Initial support for vectorization using svml (short vector math library).
Differential Revision: https://reviews.llvm.org/D19544


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277166 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-29 16:42:44 +00:00
Elena Demikhovsky
ba55955caa [Loop Vectorizer] Handling loops with FP induction variables.
Allowed loop vectorization with secondary FP IVs. Like this:
float *A;
float x = init;
for (int i=0; i < N; ++i) {
  A[i] = x;
  x -= fp_inc;
}

The auto-vectorization is possible when the induction binary operator is "fast" or the function has "unsafe" attribute.
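A minimal C sketch of why the "fast" flag matters here (an illustration only, not code from the patch; the function names are hypothetical). The scalar loop updates x sequentially, while a vectorized lane in effect needs the closed form init - i * fp_inc. The two forms can round differently in general, so the rewrite is only legal when reassociation is allowed.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar form from the commit message: x depends on the previous iteration. */
void fill_sequential(float *A, size_t n, float init, float fp_inc) {
    assert(A != NULL || n == 0);
    float x = init;
    for (size_t i = 0; i < n; ++i) {
        A[i] = x;
        x -= fp_inc;
    }
}

/* Closed form (hypothetical helper): every element is computed
   independently, which is what lets lanes run in parallel. The results
   may differ from fill_sequential in the last bits for inexact steps. */
void fill_closed_form(float *A, size_t n, float init, float fp_inc) {
    assert(A != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        A[i] = init - (float)i * fp_inc;
}
```

For steps that are exactly representable (such as 0.5f) both functions agree; for a step such as 0.1f they are not guaranteed to.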

Differential Revision: https://reviews.llvm.org/D21330



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276554 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-24 07:24:54 +00:00
Matthew Simpson
0414f48742 [LV] Move vector int induction update to end of latch
This patch moves the update instruction for vectorized integer induction phi
nodes to the end of the latch block. This ensures consistent placement of all
induction updates across all the kinds of int inductions we create (scalar,
splat vector, or vector phi).

Differential Revision: https://reviews.llvm.org/D22416

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276339 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-21 21:20:15 +00:00
Adam Nemet
42a372e9b8 [OptDiag,LV] Add hotness attribute to applied-optimization remarks
Test coverage is provided by modifying the function in the FP-math
testcase that we are allowed to vectorize.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276223 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-21 01:07:13 +00:00
Adam Nemet
cebe016761 [OptDiag,LV] Add hotness attribute to the derived analysis remarks
This includes FPCompute and Aliasing.

Testcase is based on no_fpmath.ll.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276211 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-20 23:50:32 +00:00
Wei Mi
92a8d601a3 Recommit the patch "Use uniforms set to populate VecValuesToIgnore".
Instructions in the uniform set will not have vector versions, so
add them to VecValuesToIgnore.
For induction vars, those only used in uniform instructions or consecutive
ptrs instructions have already been added to VecValuesToIgnore above. For
those induction vars which are only used in uniform instructions or
non-consecutive/non-gather scatter ptr instructions, the related phi and
update will also be added into VecValuesToIgnore set.

The change will make the vector RegUsages estimation less conservative.

Differential Revision: https://reviews.llvm.org/D20474

The recommit fixed the testcase global_alias.ll.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275936 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 00:50:43 +00:00
Wei Mi
fba236f858 Revert rL275912.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275915 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 21:14:43 +00:00
Wei Mi
1938056381 Use uniforms set to populate VecValuesToIgnore.
Instructions in the uniform set will not have vector versions, so
add them to VecValuesToIgnore.
For induction vars, those only used in uniform instructions or consecutive
ptrs instructions have already been added to VecValuesToIgnore above. For
those induction vars which are only used in uniform instructions or
non-consecutive/non-gather scatter ptr instructions, the related phi and
update will also be added into VecValuesToIgnore set.

The change will make the vector RegUsages estimation less conservative.

Differential Revision: https://reviews.llvm.org/D20474


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275912 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 20:59:53 +00:00
Michael Kuperstein
b1fce5cc4c [X86] Make some cast costs more precise
Make some AVX and AVX512 cast costs more precise.
Based on part of a patch by Elena Demikhovsky (D15604).

Differential Revision: http://reviews.llvm.org/D22064


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275106 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-11 21:39:44 +00:00
Elena Demikhovsky
407fc99045 Fixed a bug in vectorizing GEP before gather/scatter intrinsic.
Vectorizing GEP was incorrect and broke SSA in some cases.
 
The patch fixes PR27997 https://llvm.org/bugs/show_bug.cgi?id=27997.

Differential revision: http://reviews.llvm.org/D22035



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274735 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-07 06:06:46 +00:00
Michael Kuperstein
c7432f9ad3 [TTI] The cost model should not assume vector casts get completely scalarized
The cost model should not assume vector casts get completely scalarized, since
on targets that have vector support, the common case is a partial split up to
the legal vector size. So, when a vector cast gets split, the resulting casts
end up legal and cheap.

Instead of pessimistically assuming scalarization, base TTI can use the costs
the concrete TTI provides for the split vector, plus a fudge factor to account
for the cost of the split itself. This fudge factor is currently 1 by default,
except on AMDGPU where inserts and extracts are considered free.
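The difference between the two costing strategies can be sketched in C as follows. This is a simplified model under stated assumptions, not the TTI implementation: the function names, the recursive halving, and the per-element insert/extract charge are all hypothetical choices made for illustration.

```c
#include <assert.h>

/* Hypothetical split-based cost: if the vector is wider than the widest
   legal vector, pay split_cost (the "fudge factor") once and cost the two
   halves recursively; legal vectors use the target-provided legal_cost. */
unsigned split_cast_cost(unsigned num_elts, unsigned legal_elts,
                         unsigned legal_cost, unsigned split_cost) {
    assert(num_elts > 0 && legal_elts > 0);
    if (num_elts <= legal_elts)
        return legal_cost;
    unsigned half = (num_elts + 1) / 2;
    return split_cost +
           2 * split_cast_cost(half, legal_elts, legal_cost, split_cost);
}

/* Hypothetical pessimistic cost: every element is extracted, cast as a
   scalar, and inserted back into the result vector. */
unsigned scalarized_cast_cost(unsigned num_elts, unsigned scalar_cost,
                              unsigned insert_extract_cost) {
    assert(num_elts > 0);
    return num_elts * (scalar_cost + 2 * insert_extract_cost);
}
```

With a fudge factor of 0, as the message describes for AMDGPU, the split cost reduces to the number of legal parts times the legal cast cost.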

Differential Revision: http://reviews.llvm.org/D21251


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274642 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-06 17:30:56 +00:00
Matt Arsenault
6bcca1a915 SLPVectorizer: Move propagateMetadata to VectorUtils
This will be re-used by the LoadStoreVectorizer.

Fix handling of range metadata and testcase by Justin Lebar.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274281 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-30 21:17:59 +00:00
Wei Mi
693c332887 Refine the set of UniformAfterVectorization instructions.
Apart from the seed uniform instructions (conditional branches and consecutive
pointer instructions), a dependency should be added to the uniform set only if
it is used solely by existing uniform instructions or by instructions outside
the current loop.

Differential Revision: http://reviews.llvm.org/D21755


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274262 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-30 18:42:56 +00:00
Artur Pilipenko
48917c9e44 Support arbitrary addrspace pointers in masked load/store intrinsics
This is a resubmission of the 263158 change after fixing the existing problem with intrinsics mangling (see the "LTO and intrinsics mangling" llvm-dev thread for details).

This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274043 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-28 18:27:25 +00:00
Artur Pilipenko
be0da39a48 Revert -r273892 "Support arbitrary addrspace pointers in masked load/store intrinsics" since some of the clang tests don't expect to see the updated signatures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273895 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-27 16:54:33 +00:00
Artur Pilipenko
9227558e8e Support arbitrary addrspace pointers in masked load/store intrinsics
This is a resubmission of the 263158 change after fixing the existing problem with intrinsics mangling (see the "LTO and intrinsics mangling" llvm-dev thread for details).

This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273892 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-27 16:29:26 +00:00
Michael Kuperstein
01d6c3dbf9 [LV] For some IVs, use vector phis instead of widening in the loop body
Previously, whenever we needed a vector IV, we would create it on the fly,
by splatting the scalar IV and adding a step vector. Instead, we can create a
real vector IV. This tends to save a couple of instructions per iteration.

This only changes the behavior for the most basic case - integer primary
IVs with a constant step.
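The two strategies can be modelled in plain C with a fixed-width array standing in for a vector register. This is only a sketch: VF, the function names, and the array representation are assumptions, and the real transformation operates on LLVM IR.

```c
#include <assert.h>
#include <stdbool.h>

#define VF 4 /* assumed vectorization factor for the illustration */

/* Old approach: in every iteration, splat the scalar IV and add a step
   vector {0, 1, 2, ...} * step to build the vector IV from scratch. */
void widen_on_the_fly(int scalar_iv, int step, int out[VF]) {
    for (int lane = 0; lane < VF; ++lane)
        out[lane] = scalar_iv + lane * step;
}

/* New approach: keep a real vector IV (a vector phi) and advance all
   lanes by VF * step once per vector iteration. */
void advance_vector_iv(int vec_iv[VF], int step) {
    for (int lane = 0; lane < VF; ++lane)
        vec_iv[lane] += VF * step;
}

/* Runs both strategies side by side and reports whether they agree. */
bool strategies_agree(int start, int step, int iterations) {
    assert(iterations >= 0);
    int vec_iv[VF], widened[VF];
    widen_on_the_fly(start, step, vec_iv); /* initial value of the phi */
    int scalar_iv = start;
    for (int it = 0; it < iterations; ++it) {
        widen_on_the_fly(scalar_iv, step, widened);
        for (int lane = 0; lane < VF; ++lane)
            if (widened[lane] != vec_iv[lane])
                return false;
        scalar_iv += VF * step;
        advance_vector_iv(vec_iv, step);
    }
    return true;
}
```

The vector phi replaces a splat and an add per iteration with a single add, which is consistent with the saving the message mentions.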

Differential Revision: http://reviews.llvm.org/D20315


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271410 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-01 17:16:46 +00:00
Tim Northover
5b363367fe Move test to X86 directory: I think it depends on X86 TTI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271019 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-27 16:56:54 +00:00
Hal Finkel
d86e7af14a Look for a loop's starting location in the llvm.loop metadata
Getting accurate locations for loops is important, because those locations are
used by the frontend to generate optimization remarks. Currently, optimization
remarks for loops often appear on the wrong line, often the first line of the
loop body instead of the loop itself. This is confusing because that line might
itself be another loop, or might be somewhere else completely if the body was
an inlined function call. This happens because of the way we find the loop's
starting location. First, we look for a preheader, and if we find one, and its
terminator has a debug location, then we use that. Otherwise, we look for a
location on an instruction in the loop header.

The fallback heuristic is not bad, but will almost always find the beginning of
the body, and not the loop statement itself. The preheader location search
often fails because there's often not a preheader, and even when there is a
preheader, depending on how it was formed, it sometimes carries the location of
some preceding code.

I don't see any good theoretical way to fix this problem. On the other hand,
this seems like a straightforward solution: Put the debug location in the
loop's llvm.loop metadata. A companion Clang patch will cause Clang to insert
llvm.loop metadata with appropriate locations when generating debugging
information. With these changes, our loop remarks have much more accurate
locations.

Differential Revision: http://reviews.llvm.org/D19738

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270771 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-25 21:42:37 +00:00
Sanjay Patel
cab076f44c [x86] avoid code explosion from LoopVectorizer for gather loop (PR27826)
By making pointer extraction from a vector more expensive in the cost model,
we avoid the vectorization of a loop that is very likely to be memory-bound:
https://llvm.org/bugs/show_bug.cgi?id=27826

There are still bugs related to this, so we may need a more general solution
to avoid vectorizing obviously memory-bound loops when we don't have HW gather
support.

Differential Revision: http://reviews.llvm.org/D20601



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270729 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-25 17:27:54 +00:00
Wei Mi
7aaac1e6e2 Recommit r255691 since PR26509 has been fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270113 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-19 20:38:03 +00:00
David Majnemer
a89ddf6e7c [LoopVectorize] Add operand bundles to vectorized functions
Also, do not crash when calculating a cost model for loop-invariant
token values.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@268003 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-29 07:09:48 +00:00
Elena Demikhovsky
b7f92d0916 Masked Store in Loop Vectorizer - bugfix
Fixed a bug in loop vectorization with conditional store.

Differential Revision: http://reviews.llvm.org/D19532



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267597 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-26 20:18:04 +00:00
Hal Finkel
681428ed7d [LoopVectorize] Don't consider conditional-load dereferenceability for marked parallel loops
I really thought we were doing this already, but we were not. Given this input:

void Test(int *res, int *c, int *d, int *p) {
  for (int i = 0; i < 16; i++)
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}

we did not vectorize the loop. Even with "assume_safety" the check that we
don't if-convert conditionally-executed loads (to protect against
data-dependent dereferenceability) was not elided.

One subtlety: As implemented, it will still prefer to use a masked-load
intrinsic (given target support) over the speculated load. The choice here
seems architecture specific; the best option depends on how expensive the
masked load is compared to a regular load. Ideally, using the masked load still
reduces unnecessary memory traffic, and so should be preferred. If we'd rather
do it the other way, flipping the order of the checks is easy.

The LangRef is updated to make explicit that llvm.mem.parallel_loop_access also
implies that if conversion is okay.
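For illustration, here is a C sketch of what if-converting the conditional load amounts to. It is a hand-written model with hypothetical function names, not vectorizer output; it drops the unused parameter and takes the trip count as an argument.

```c
#include <assert.h>
#include <stddef.h>

/* Branchy form: d[i] is only loaded when p[i] != 0. */
void update_branchy(int *res, const int *d, const int *p, size_t n) {
    assert(res != NULL && d != NULL && p != NULL);
    for (size_t i = 0; i < n; ++i)
        if (p[i] != 0)
            res[i] = res[i] + d[i];
}

/* If-converted form: d[i] is loaded unconditionally (a speculated load)
   and a select picks the result. This is only safe when d[i] is known to
   be dereferenceable for every i, which is the guarantee a loop marked
   parallel is taken to provide. */
void update_if_converted(int *res, const int *d, const int *p, size_t n) {
    assert(res != NULL && d != NULL && p != NULL);
    for (size_t i = 0; i < n; ++i) {
        int loaded = d[i]; /* executed even when p[i] == 0 */
        res[i] = (p[i] == 0) ? res[i] : res[i] + loaded;
    }
}
```

A masked load would avoid touching d[i] when p[i] == 0, which is the alternative the message says is still preferred when the target supports it.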

Differential Revision: http://reviews.llvm.org/D19512

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267514 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-26 02:00:36 +00:00
Adrian Prantl
422c22e3d3 Convert this sample-based-profiling testcase to use a NoDebug CU.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266481 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-15 22:05:38 +00:00
Adrian Prantl
4eeaa0da04 [PR27284] Reverse the ownership between DICompileUnit and DISubprogram.
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.

Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.

Motivation
----------

Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.

We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference to the block containing the inlined subprogram.

Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.

http://reviews.llvm.org/D19034
<rdar://problem/25256815>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266446 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-15 15:57:41 +00:00
Adam Nemet
cf0a711bff Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"
This reverts commit r266086.

It breaks the LTO build of gcc in SPEC2000.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266282 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-14 08:47:17 +00:00
Artur Pilipenko
80ce67004b Support arbitrary addrspace pointers in masked load/store intrinsics
This is a resubmission of the 263158 change.

This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266086 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-12 15:58:04 +00:00
Davide Italiano
d63fceeb37 [DebugInfo/Test] Add CU as required.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265999 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-11 21:16:48 +00:00
Elena Demikhovsky
9f62954aaa Loop vectorization with uniform load
Vectorization cost of uniform load wasn't correctly calculated.
As a result, a simple loop that loads a uniform value wasn't vectorized.

Differential Revision: http://reviews.llvm.org/D18940



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265901 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-10 16:53:19 +00:00
David Majnemer
951ea8be17 [LoopVectorize] Register cloned assumptions
InstCombine cannot effectively remove redundant assumptions without them
registered in the assumption cache.  The vectorizer can create identical
assumptions but doesn't register them with the cache, resulting in
slower compile times because InstCombine tries to reason about a lot
more assumptions.

Fix this by registering the cloned assumptions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265800 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-08 16:37:10 +00:00
Silviu Baranga
d8cc816f81 Re-commit [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV
This re-commits r265535 which was reverted in r265541 because it
broke the windows bots. The problem was that we had a PointerIntPair
which took a pointer to a struct allocated with new. The problem
was that new doesn't provide sufficient alignment guarantees.
This pattern was already present before r265535 and it just happened
to work. To fix this, we now separate the PointerIntPair from the
ExitNotTakenInfo struct into a pointer and a bool.
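The alignment dependency can be illustrated with a small C analogue of the pointer-plus-flag packing trick. This is a sketch of the general technique, not LLVM's PointerIntPair (which is a C++ template); all names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Packs a flag into bit 0 of a pointer. This only works if the pointer is
   at least 2-byte aligned, so that bit 0 is otherwise always zero. */
uintptr_t pack_ptr_flag(void *ptr, bool flag) {
    uintptr_t bits = (uintptr_t)ptr;
    assert((bits & 1u) == 0 && "pointer is not sufficiently aligned");
    return bits | (uintptr_t)flag;
}

void *unpack_ptr(uintptr_t packed) {
    return (void *)(packed & ~(uintptr_t)1);
}

bool unpack_flag(uintptr_t packed) {
    return (packed & 1u) != 0;
}

/* The alternative described in the message: store the pointer and the
   bool separately, which places no requirement on alignment. */
struct PtrAndFlag {
    void *ptr;
    bool flag;
};
```

The packed form saves space but silently relies on the allocator's alignment; the separate form trades a few bytes for not depending on it.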

Original commit message:

Summary:
When the backedge taken condition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.

However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.

In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.

We use the new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.

Reviewers: anemet, mzolotukhin, hfinkel, sanjoy

Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17201



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265786 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-08 14:29:09 +00:00
Silviu Baranga
89e8236bfb Revert r265535 until we know how we can fix the bots
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265541 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-06 14:06:32 +00:00
Silviu Baranga
39fbde60e1 [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV
Summary:
When the backedge taken condition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.

However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.

In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.

We use the new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.

Reviewers: anemet, mzolotukhin, hfinkel, sanjoy

Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17201

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265535 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-06 13:18:26 +00:00
David Majnemer
8b680c27be [SLPVectorizer] Vectorizing the libm sqrt to llvm's sqrt intrinsic requires nnan
To quote the langref "Unlike sqrt in libm, however, llvm.sqrt has
undefined behavior for negative numbers other than -0.0 (which allows
for better optimization, because there is no need to worry about errno
being set). llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sqrt."

This means that it's unsafe to replace sqrt with llvm.sqrt unless the
call is annotated with nnan.

Thanks to Hal Finkel for pointing this out!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265521 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-06 07:04:53 +00:00
David Majnemer
731666ee90 [SLPVectorizer] Vectorize libcalls of sqrt
We didn't realize that we could transform the libcall into a vectorized
intrinsic.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265493 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-06 00:14:59 +00:00
Davide Italiano
fe735e5923 [DebugInfo] Fix tests so that each subprogram belongs to a CU.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265490 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-05 23:37:08 +00:00
Adrian Prantl
7876f64bc3 testcase gardening: update the emissionKind enum to the new syntax. (NFC)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265081 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-01 00:16:49 +00:00
Hal Finkel
dfdada0adb [LoopVectorize] Don't vectorize loops when everything will be scalarized
This change prevents the loop vectorizer from vectorizing when all of the vector
types it generates will be scalarized. I've run into this problem on the PPC's QPX
vector ISA, which only holds floating-point vector types. The loop vectorizer
will, however, happily vectorize loops with purely integer computation. Here's
an example:

  LV: The Smallest and Widest types: 32 / 32 bits.
  LV: The Widest register is: 256 bits.
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 1 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 1 for VF 1 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 1 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 1 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 1 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Scalar loop costs: 3.
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 2 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 2 for VF 2 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 2 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 2 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 2 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Vector loop of width 2 costs: 2.
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ]
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25
  LV: Found an estimated cost of 0 for VF 4 For instruction:   %2 = trunc i64 %indvars.iv25 to i32
  LV: Found an estimated cost of 4 for VF 4 For instruction:   store i32 %2, i32* %arrayidx, align 4
  LV: Found an estimated cost of 1 for VF 4 For instruction:   %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
  LV: Found an estimated cost of 1 for VF 4 For instruction:   %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600
  LV: Found an estimated cost of 0 for VF 4 For instruction:   br i1 %exitcond27, label %for.cond.cleanup, label %for.body
  LV: Vector loop of width 4 costs: 1.
  ...
  LV: Selecting VF: 8.
  LV: The target has 32 registers
  LV(REG): Calculating max register usage:
  LV(REG): At #0 Interval # 0
  LV(REG): At #1 Interval # 1
  LV(REG): At #2 Interval # 2
  LV(REG): At #4 Interval # 1
  LV(REG): At #5 Interval # 1
  LV(REG): VF = 8

The problem is that the cost model here is not wrong, exactly. Since all of
these operations are scalarized, their costs (aside from the uniform ones) are
indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF
picked, the lower the relative overhead from the loop itself (and the
induction-variable update and check), and so in a sense, picking the largest VF
here is the right thing to do.

The problem is that vectorizing like this, where all of the vectors will be
scalarized in the backend, isn't really vectorizing, but rather interleaving.
By itself, this would be okay, but then the vectorizer itself also interleaves,
and that's where the problem manifests itself. There aren't actually enough
scalar registers to support the normal interleave factor multiplied by a factor
of VF (8 in this example). In other words, the problem with this is that our
register-pressure heuristic does not account for scalarization.

While we might want to improve our register-pressure heuristic, I don't think
this is the right motivating case for that work. Here we have a more-basic
problem: The job of the vectorizer is to vectorize things (interleaving aside),
and if the IR it generates won't generate any actual vector code, then
something is wrong. Thus, if every type looks like it will be scalarized (i.e.
will be split into VF or more parts), then don't consider that VF.
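The rule in the last sentence can be sketched as a small C predicate. This is an illustration under assumptions: the function name and the parts array (how many pieces each type used in the loop is split into at a given VF) are hypothetical, and how those part counts are obtained from the target is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* parts[i] is the number of pieces the i-th type in the loop would be
   split into when widened to vf elements. A type split into vf or more
   pieces is effectively scalarized. The VF is worth considering only if
   at least one type remains a real vector. */
bool vf_worth_considering(const unsigned *parts, size_t num_types,
                          unsigned vf) {
    assert(parts != NULL && num_types > 0 && vf > 0);
    for (size_t i = 0; i < num_types; ++i)
        if (parts[i] < vf)
            return true; /* this type is not fully scalarized */
    return false;        /* every type is split into vf or more parts */
}
```

In the QPX example above, an integer-only loop at VF 8 would have every type split into 8 parts, so the predicate would reject that VF.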

This is not a problem specific to PPC/QPX, however. The problem comes up under
SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's
reduced test case from PR26837 to this commit.

Differential Revision: http://reviews.llvm.org/D18537

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264904 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 19:37:08 +00:00
Matthias Braun
a31e891389 Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"
This commit broke LTO builds. Reverting it to unbreak the bots while the
issue is investigated. See also:

http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160321/341002.html

This reverts r263158

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264088 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-22 20:24:34 +00:00
Artur Pilipenko
980df33d17 Support arbitrary addrspace pointers in masked load/store intrinsics
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263158 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-10 20:39:22 +00:00
Sanjay Patel
d05dce8ca6 [x86] fix cost model inaccuracy for vector memory ops
The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar
has a completely double-pumped AVX implementation. But getting the cost model to
reflect that is a much bigger problem. The small goal here is simply to improve on
the lie that !AVX2 == SandyBridge.

Differential Revision: http://reviews.llvm.org/D18000



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263069 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-09 22:23:33 +00:00
Sanjay Patel
e8b70722e0 add a test RUN to show unexpected behavior
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263037 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-09 17:53:28 +00:00
Hans Wennborg
1836552368 Revert r255691 "[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions."
It caused PR26509.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261368 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-19 21:40:12 +00:00
Elena Demikhovsky
2c7551bff2 Create masked gather and scatter intrinsics in Loop Vectorizer.
Loop vectorizer now knows to vectorize GEP and create masked gather and scatter intrinsics for random memory access.

The feature is enabled on AVX-512 target.
Differential Revision: http://reviews.llvm.org/D15690



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261140 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-17 19:23:04 +00:00
Igor Breger
3c3041375c AVX1 : Enable vector masked_load/store to AVX1.
Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q).

Differential Revision: http://reviews.llvm.org/D16528

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258675 91177308-0d34-0410-b5e6-96231b3b80d8
2016-01-25 10:17:11 +00:00
Cong Hou
e956465289 [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
(This is the third attempt to check in this patch, and the first two are r255454
and r255460. The once failed test file reg-usage.ll is now moved to
test/Transform/LoopVectorize/X86 directory with target datalayout and target
triple indicated.)

LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255691 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-15 22:45:09 +00:00
Cong Hou
dbef3b079d Revert r255460, which still causes test failures on some platforms.
Further investigation on the failures is ongoing.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255463 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-13 17:15:38 +00:00
Cong Hou
f26946fa52 [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
(This is the second attempt to check in this patch: REQUIRES: asserts is added
to reg-usage.ll now.)

LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255460 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-13 16:55:46 +00:00
Cong Hou
6f344e5da6 Revert r255454 as it leads to several test failures on buildbots.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255456 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-13 09:28:57 +00:00