788 Commits

Author SHA1 Message Date
Simon Pilgrim
ce23689b40 [SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction canonicalization
Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops.

This is prep work towards:
(a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands
(b) improving reordering to optimize cases with commutable and non-commutable instructions to still find splat/consecutive ops.

Differential Revision: https://reviews.llvm.org/D59738

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356913 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-25 15:53:55 +00:00
Craig Topper
8ff8e3360b [InstCombine] Don't transform ((C1 OP zext(X)) & C2) -> zext((C1 OP X) & C2) if either zext or OP has another use.
If they have other users we'll just end up increasing the instruction count.

We might be able to weaken this to only one of them having a single use if we can prove that the and will be removed.

Fixes PR41164.

Differential Revision: https://reviews.llvm.org/D59630

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356690 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-21 17:50:49 +00:00
Nikita Popov
cd6f62dcd4 [ValueTracking] Use computeConstantRange() for unsigned add/sub overflow
Improve computeOverflowForUnsignedAdd/Sub in ValueTracking by
intersecting the computeConstantRange() result into the ConstantRange
created from computeKnownBits(). This allows us to detect some
additional never/always overflow conditions that can't be determined
from known bits.
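
As a rough illustration of the idea above (not the ValueTracking code itself), the sketch below models each range as a closed, non-wrapping interval (lo, hi). That is a simplifying assumption, since the real ConstantRange can also wrap. The helper names intersect_ranges and unsigned_add_overflow are hypothetical.

```python
def intersect_ranges(r1, r2):
    """Intersect two closed, non-wrapping unsigned intervals (lo, hi)."""
    lo = max(r1[0], r2[0])
    hi = min(r1[1], r2[1])
    if lo > hi:
        raise ValueError("empty intersection")
    return (lo, hi)


def unsigned_add_overflow(a, b, width):
    """Classify a + b as 'never', 'always' or 'may' overflow at this bit width."""
    umax = (1 << width) - 1
    if a[1] + b[1] <= umax:   # even the largest operands fit
        return "never"
    if a[0] + b[0] > umax:    # even the smallest operands wrap
        return "always"
    return "may"


# Hypothetical inputs: known bits say nothing (full 8-bit range), while a
# separate range analysis bounds both operands to [0, 100].
from_known_bits = (0, 255)
from_range_analysis = (0, 100)
refined = intersect_ranges(from_known_bits, from_range_analysis)
```

With only the known-bits interval the answer is "may"; after intersecting, both operands are at most 100, so an 8-bit add can be classified as "never".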

This revision also adds basic handling for constants to
computeConstantRange(). Non-splat vectors will be handled in a followup.

The signed case will also be handled in a followup, as it needs some
more groundwork.

Differential Revision: https://reviews.llvm.org/D59386

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356489 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-19 17:53:56 +00:00
Warren Ristow
b017ce4be9 [SCEV] Guard movement of insertion point for loop-invariants
This reinstates r347934, along with a tweak to address a problem with
PHI node ordering that that commit created (or exposed). (That commit
was reverted at r348426, due to the PHI node issue.)

Original commit message:

r320789 suppressed moving the insertion point of SCEV expressions with
div/rem operations to the loop header in non-loop-invariant situations.
This, and similar, hoisting is also unsafe in the loop-invariant case,
since there may be a guard against a zero denominator. This is an
adjustment to the fix of r320789 to suppress the movement even in the
loop-invariant case.

This fixes PR30806.

Differential Revision: https://reviews.llvm.org/D57428


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356392 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-18 18:52:35 +00:00
Sanjoy Das
2d9ad10711 Reland "Relax constraints for reduction vectorization"
Change from original commit: move test (that uses an X86 triple) into the X86
subdirectory.

Original description:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355889 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-12 01:31:44 +00:00
Sanjoy Das
5b9ba1171e Revert "Relax constraints for reduction vectorization"
This reverts commit r355868.  Breaks hexagon.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355873 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-11 22:37:31 +00:00
Sanjoy Das
ceec6f23cb Relax constraints for reduction vectorization
Summary:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355868 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-11 21:36:41 +00:00
Florian Hahn
b11b60da2d [InterleavedAccessAnalysis] Fix integer overflow in insertMember.
Without checking for integer overflow, invalid members can be added,
e.g. if the calculated key overflows, becomes positive and is the largest key.

This fixes
      https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=7560
      https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13128
      https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13229

Reviewers: Ayal, anna, hsaito, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D55538

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355613 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-07 17:50:16 +00:00
Nikita Popov
29ba81ded0 [ValueTracking] More accurate unsigned sub overflow detection
Second part of D58593.

Compute precise overflow conditions based on all known bits, rather
than just the sign bits. Unsigned a - b overflows iff a < b, and we
can determine whether this always/never happens based on the minimal
and maximal values achievable for a and b subject to the known bits
constraint.
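
A minimal sketch of this reasoning, assuming known bits are given as a (known_zero, known_one) pair of masks; the function names are hypothetical and this is not the ValueTracking implementation:

```python
def unsigned_bounds(known_zero, known_one, width):
    """Smallest and largest unsigned values consistent with the known bits."""
    mask = (1 << width) - 1
    lo = known_one & mask        # every unknown bit set to 0
    hi = ~known_zero & mask      # every unknown bit set to 1
    return lo, hi


def unsigned_sub_overflow(a_known, b_known, width):
    """Classify a - b: it overflows iff a < b."""
    a_lo, a_hi = unsigned_bounds(a_known[0], a_known[1], width)
    b_lo, b_hi = unsigned_bounds(b_known[0], b_known[1], width)
    if a_lo >= b_hi:             # a can never be smaller than b
        return "never"
    if a_hi < b_lo:              # a is always smaller than b
        return "always"
    return "may"


# Hypothetical 8-bit operands whose sign bits are both known zero, so the
# sign bits alone decide nothing:
#   a = 0b0111???? (112..127), b = 0b00?????? (0..63)
a_known = (0x80, 0x70)
b_known = (0xC0, 0x00)
```

Here unsigned_sub_overflow(a_known, b_known, 8) gives "never" because the minimum of a (112) is at least the maximum of b (63), which is a case the lower bits settle.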

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355109 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-28 18:04:20 +00:00
Michael Kruse
0bb9e7af42 Refactor setAlreadyUnrolled() and setAlreadyVectorized().
Loop::setAlreadyUnrolled() and
LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that
stops the same loop from being transformed multiple times. This patch
merges both implementations.

In doing so we fix 3 potential issues:

 * setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.*
   metadata even though it will not be used anymore. This already caused
   problems such as http://llvm.org/PR40546. Change the behavior to the
   one of setAlreadyUnrolled which deletes this loop metadata.

 * setAlreadyUnrolled() used to create a new LoopID by calling
   MDNode::get with nullptr as the first operand, then replacing it by
   the returned references using replaceOperandWith. It is possible
   that MDNode::get would instead return an existing node (due to
   de-duplication) that then gets modified. To avoid this, use a fresh
   TempMDNode that does not get uniqued with anything else before
   replacing it with replaceOperandWith.

 * LoopVectorizeHints::matchesHintMetadataName() only compares the
   suffix of the attribute to set the new value for. That is, when
   called with "enable", would erase attributes such as
   "llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and
   "llvm.loop.distribute.enable" instead of the one to replace.
   Fortunately, the function was only called with "isvectorized".
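
The suffix-matching problem in the last item can be illustrated with a small sketch. The two helpers are hypothetical stand-ins, not the LoopVectorizeHints code; they only contrast suffix comparison with full-name comparison:

```python
def erase_by_suffix(attrs, name):
    """Drop every attribute whose last component equals `name` (the buggy behavior)."""
    return [a for a in attrs if not a.endswith("." + name)]


def erase_by_full_name(attrs, prefix, name):
    """Drop only the attribute whose full name is prefix + name."""
    return [a for a in attrs if a != prefix + name]


attrs = [
    "llvm.loop.unroll.enable",
    "llvm.loop.vectorize.enable",
    "llvm.loop.distribute.enable",
]
```

Called with "enable", the suffix version removes all three attributes, while the full-name version with the prefix "llvm.loop.vectorize." removes only the vectorizer's.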

Differential Revision: https://reviews.llvm.org/D57566

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353738 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-11 19:45:44 +00:00
Florian Hahn
6bed14d8a4 [LV] Prevent interleaving if computeMaxVF returned None.
As discussed in D57382, interleaving should be avoided if computeMaxVF
returns None, same as we currently do for vectorization.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6477

Reviewers: Ayal, dcaballe, hsaito, mkuper, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D57837

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353461 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-07 20:49:10 +00:00
Alina Sbirlea
db7033fef7 Check bool attribute value in getOptionalBoolLoopAttribute.
Summary:
Check the bool value of the attribute in getOptionalBoolLoopAttribute
not just its existence.
Eliminates the warning noise generated when vectorization is explicitly disabled.

Reviewers: Meinersbur, hfinkel, dmgreen

Subscribers: jlebar, sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D57260

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352555 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-29 22:33:20 +00:00
Johannes Doerfert
9e7dd29d66 [ValueTracking] Look through casts when determining non-nullness
Bitcast and certain Ptr2Int/Int2Ptr instructions will not alter the
value of their operand and can therefore be looked through when we
determine non-nullness.

Differential Revision: https://reviews.llvm.org/D54956

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352293 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-26 23:40:35 +00:00
Simon Pilgrim
21d100aff8 [CostModel][X86] Add explicit vector select costs
Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)|(Y & ~C)) bit select.
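
For reference, a minimal sketch of that bit-select identity on integer lanes. It assumes each condition lane is either all ones or all zeros; the function names and the 32-bit lane width are illustrative choices, not taken from the cost model:

```python
def bit_select(x, y, c, width=32):
    """Per-bit select: take bits of x where c is 1, bits of y where c is 0."""
    mask = (1 << width) - 1
    return ((x & c) | (y & ~c)) & mask


def vector_select(xs, ys, conds, width=32):
    """Lane-wise select built from bit_select, with all-ones/all-zeros lane masks."""
    all_ones = (1 << width) - 1
    return [
        bit_select(x, y, all_ones if cond else 0, width)
        for x, y, cond in zip(xs, ys, conds)
    ]
```

For example, vector_select([1, 2, 3, 4], [10, 20, 30, 40], [True, False, True, False]) picks lanes from the first vector where the condition holds and from the second elsewhere. The two ANDs and the OR (plus the NOT) are what make this more expensive than a single blend instruction.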

Exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason).

The increased pre-SSE41 selection costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests as well to the existing SSE2 tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351685 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-20 13:55:01 +00:00
Sanjay Patel
d4e1d2f774 [LoopVectorizer] give more advice in remark about failure to vectorize call
Something like this is requested by:
https://bugs.llvm.org/show_bug.cgi?id=40265
...and it seems like a common enough case that we should acknowledge it.

Differential Revision: https://reviews.llvm.org/D56551


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351010 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-12 15:27:15 +00:00
Florian Hahn
2e8e13e712 [LAA] Avoid generating RT checks for known deps preventing vectorization.
If we found unsafe dependences other than 'unknown', we already know at
compile time that they are unsafe and the runtime checks should always
fail. So we can avoid generating them in those cases.

This should have no negative impact on performance as the runtime checks
that would be created previously should always fail. As a sanity check,
I measured the test-suite, spec2k and spec2k6 and there were no regressions.

Reviewers: Ayal, anemet, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D55798


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349794 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-20 18:49:09 +00:00
Michael Kruse
42a382c204 Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not a loop identifier. It is
neither unique (there's even a regression test assigning the same LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it is just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instructions is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the need to ever add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carried by this loop).

This intentionally avoids any kind of "ID". Loops that are clones/have
their attributes modified retain the llvm.loop.parallel_accesses
attribute. Access instructions that are cloned point to the same access
group. It is not necessary for each access to have its own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.

The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.

Differential Revision: https://reviews.llvm.org/D52116

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349725 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-20 04:58:07 +00:00
Florian Hahn
efdc43373b [LAA] Introduce enum for vectorization safety status (NFC).
This patch adds a VectorizationSafetyStatus enum, which will be extended
in a follow up patch to distinguish between 'safe with runtime checks'
and 'known unsafe' dependences.

Reviewers: anemet, anna, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D54892


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349556 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-18 22:25:11 +00:00
Sanjay Patel
83544a50cc [LoopVectorize] auto-generate complete checks; NFC
The first test claims to show that the vectorizer will
generate a vector load/loop, but then this file runs
other passes which might scalarize that op. I'm removing 
instcombine from the RUN line here to break that dependency.
Also, I'm generating full checks to make it clear exactly 
what the vectorizer has done.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349554 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-18 22:23:04 +00:00
Michael Kruse
9a395de086 [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
When multiple loop transformations are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance,

    #pragma clang loop unroll_and_jam(enable)
    #pragma clang loop distribute(enable)

is the same as

    #pragma clang loop distribute(enable)
    #pragma clang loop unroll_and_jam(enable)

and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled before the UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also, the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.

This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,

    !0 = !{!0, !1, !2}
    !1 = !{!"llvm.loop.unroll_and_jam.enable"}
    !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
    !3 = !{!"llvm.loop.distribute.enable"}

defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.

Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.

For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patch introduces a WarnMissedTransformations pass, to warn about orphaned transformations.

Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.

To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.

With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).

Reviewed By: hfinkel, dmgreen

Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348944 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-12 17:32:52 +00:00
Nikita Popov
38880e6df9 Reapply "[DemandedBits][BDCE] Support vectors of integers"
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.

Unlike the previous iteration of this patch, getDemandedBits() can now
again be called on arbitrary (sized) instructions, even if they don't
have integer or vector of integer type. (For vector types the size of the
returned mask will now be the scalar size in bits though.)

The added LoopVectorize test case shows a case which triggered an
assertion failure with the previous attempt, because getDemandedBits()
was called on a pointer-typed instruction.

Differential Revision: https://reviews.llvm.org/D55297

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348602 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-07 15:38:13 +00:00
David L. Jones
ad6bed67ac Revert r347934 "[SCEV] Guard movement of insertion point for loop-invariants"
This change caused SEGVs in instcombine. (The r347934 change seems to me to be a
precipitating cause, not a root cause. Details are on the llvm-commits thread
for r347934.)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348426 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-05 23:13:50 +00:00
Craig Topper
26f6d0a901 [X86][LoopVectorize] Replace -mcpu=skylake-avx512 with -mattr=avx512f in some tests that failed when experimenting with defaulting to -mprefer-vector-width=256 for skylake-avx512.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348063 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-01 01:38:44 +00:00
Renato Golin
4e25c19165 Add a new reduction pattern match
Adding a new reduction pattern match for vectorizing code similar
to TSVC s3111:

for (int i = 0; i < N; i++)
  if (a[i] > b)
    sum += a[i];

This patch adds support for fadd, fsub and fmul, as well as multiple
branches and different (but compatible) instructions (ex. add+sub) in
different branches.

The difference from the previous patch (https://reviews.llvm.org/D49168)
is as follows:
 - Added check of fast-math property of fp-instruction to the
   previous patch
 - Fix/add some pattern for if-reduction.ll


Differential Revision: https://reviews.llvm.org/D54464

Patch by Takahiro Miyoshi <takahiro.miyoshi@linaro.org>
     and Masakazu Ueno <masakazu.ueno@linaro.org>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347989 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-30 13:40:10 +00:00
Warren Ristow
28ade4e6f8 [SCEV] Guard movement of insertion point for loop-invariants
r320789 suppressed moving the insertion point of SCEV expressions with
div/rem operations to the loop header in non-loop-invariant situations.
This, and similar, hoisting is also unsafe in the loop-invariant case,
since there may be a guard against a zero denominator. This is an
adjustment to the fix of r320789 to suppress the movement even in the
loop-invariant case.

This fixes PR30806.

Differential Revision: https://reviews.llvm.org/D54713


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347934 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-30 00:02:54 +00:00
Martin Storsjo
b8e45d4727 Revert "[LICM] Enable control flow hoisting by default" and "[LICM] Reapply r347190 "Make LICM able to hoist phis" with fix"
This reverts commits r347776 and r347778.

The first one, r347776, caused significant compile time regressions
for certain input files, see PR39836 for details.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347867 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-29 14:39:39 +00:00
John Brawn
3ed031b2c7 [LICM] Enable control flow hoisting by default
Differential Revision: https://reviews.llvm.org/D54949


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347778 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-28 17:23:03 +00:00
Joel Jones
9951b07907 Revert unapproved commit
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347511 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-24 07:26:55 +00:00
Joel Jones
8ae5719795 [AArch64] Enable libm vectorized functions via SLEEF
This changeset is modeled after Intel's submission for SVML. It enables
trigonometry functions vectorization via SLEEF: http://sleef.org/.

 * A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF.
 * A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF.
 * A comprehensive test case is included in this changeset.
 * In a separate changeset (for clang), a new vectorization library argument is
   added to -fveclib: -fveclib=SLEEF.

Trigonometry functions that are vectorized by sleef:

acos
asin
atan
atanh
cos
cosh
exp
exp2
exp10
lgamma
log10
log2
log
sin
sinh
sqrt
tan
tanh
tgamma

Patch by Stefan Teleman
Differential Revision: https://reviews.llvm.org/D53927


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347510 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-24 06:41:39 +00:00
Benjamin Kramer
e0270602c3 Revert "[LICM] Make LICM able to hoist phis"
This reverts commit r347190.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347225 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-19 16:51:57 +00:00
Anna Thomas
0048f91a87 [LV] Avoid vectorizing unsafe dependencies in uniform address
Summary:
Currently, when vectorizing stores to uniform addresses, the only
instance we prevent vectorization is if there are multiple stores to the
same uniform address causing an unsafe dependency.
This patch teaches LAA to avoid vectorizing loops that have an unsafe
cross-iteration dependency between a load and a store to the same uniform address.

Fixes PR39653.

Reviewers: Ayal, efriedma

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D54538

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347220 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-19 15:39:59 +00:00
John Brawn
ae7ddf0c35 [LICM] Make LICM able to hoist phis
The general approach taken is to make note of loop invariant branches, then when
we see something conditional on that branch, such as a phi, we create a copy of
the branch and (empty versions of) its successors and hoist using that.

This has no impact by itself that I've been able to see, as LICM typically
doesn't see such phis as they will have been converted into selects by the time
LICM is run, but once we start doing phi-to-select conversion later it will be
important.

Differential Revision: https://reviews.llvm.org/D52827


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347190 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-19 11:31:24 +00:00
Simon Pilgrim
bb61c72405 [CostModel] Add more realistic SK_InsertSubvector generic costs.
Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346662 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-12 15:20:24 +00:00
Sanjay Patel
aeeabb506a [VectorUtils] add funnel-shifts to the list of vectorizable intrinsics
This just identifies the intrinsics as candidates for vectorization.
It does not mean we will attempt to vectorize under normal conditions
(the test file is forcing vectorization). 

The cost model must be fixed to show that the transform is profitable 
in general.

Allowing vectorization with these intrinsics is required to avoid
potential regressions from canonicalizing to the intrinsics from
generic IR:
https://bugs.llvm.org/show_bug.cgi?id=37417



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346661 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-12 15:20:14 +00:00
Sanjay Patel
af76058428 [LoopVectorize] add tests for funnel shifts; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346658 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-12 14:52:01 +00:00
Jonas Paulsson
7e36a98252 [SystemZ] Rework getInterleavedMemoryOpCost()
Model this function more closely after the BasicTTIImpl version, with
separate handling of loads and stores. For loads, the set of actually loaded
vectors is checked.

This makes it more readable and just slightly more accurate generally.

Review: Ulrich Weigand
https://reviews.llvm.org/D53071

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345998 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-02 17:15:36 +00:00
Ayal Zaks
6d0de65682 [LV] Avoid vectorizing loops under opt for size that involve SCEV checks
Fix PR39417, PR39497

The loop vectorizer may generate runtime SCEV checks for overflow and stride==1
cases, leading to execution of original scalar loop. The latter is forbidden
when optimizing for size. An assert introduced in r344743 triggered the above
PRs, showing it does happen. This patch fixes this behavior by preventing
vectorization in such cases.

Differential Revision: https://reviews.llvm.org/D53612


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345959 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-02 09:16:12 +00:00
Dorit Nuzman
06bac6c858 [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345705 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-31 09:57:56 +00:00
Jonas Paulsson
ccd4f446eb [LoopVectorizer] Fix for cost values of memory accesses.
This commit is a combination of two patches:

* "Fix in getScalarizationOverhead()"

   If target returns false in TTI.prefersVectorizedAddressing(), it means the
   address registers will not need to be extracted. Therefore, there should
   be no operands scalarization overhead for a load instruction.

* "Don't pass the instruction pointer from getMemInstScalarizationCost."

   Since VF is always > 1, this is a cost query for an instruction in the
   vectorized loop and it should not be evaluated within the scalar
   context of the instruction.

Review: Ulrich Weigand, Hal Finkel
https://reviews.llvm.org/D52351
https://reviews.llvm.org/D52417

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345603 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-30 14:34:15 +00:00
Renato Golin
a8bdd2f238 Revert r344172: [LV] Add a new reduction pattern match
This patch has caused fast-math issues in the reduction pattern.

Will re-work and land again.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345465 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-27 22:13:43 +00:00
Simon Pilgrim
8a10b6b077 [CostModel][X86] Add realistic vXi64 uitofp vXf64 costs
Match codegen improvements from D53649/rL345256

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345263 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-25 13:06:20 +00:00
Dorit Nuzman
63aae622d8 [LV] Don't have fold-tail under optsize invalidate interleave-groups when
masked-interleaving is enabled

Enable interleave-groups under fold-tail scenario for Opt for size compilation;
D50480 added support for vectorizing loops of arbitrary trip-count without a
remainder, which in turn makes everything in the loop conditional, including
interleave-groups if any. It therefore invalidated all interleave-groups
because we didn't have support for vectorizing predicated interleaved-groups
at the time. In the meantime, D53011 introduced this support, so we don't
have to invalidate interleave-groups when masked-interleaved support is enabled.

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: hsaito

Differential Revision: https://reviews.llvm.org/D53559



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345115 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-24 07:11:38 +00:00
Sanjay Patel
f46dd75b53 [InstCombine] use 'match' to handle vectors and simplify code
This is another step towards completely removing the fake 
binop queries for not/neg/fneg.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345036 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-23 15:05:12 +00:00
Dorit Nuzman
c7a8ddb849 [IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when
optimizing for size

LV is careful to respect -Os and not to create a scalar epilog in all cases
(runtime tests, trip-counts that require a remainder loop) except for peeling
due to gaps in interleave-groups. This patch fixes that; -Os will now have us
invalidate such interleave-groups and vectorize without an epilog.

The patch also removes a related FIXME comment that is now obsolete, and was
also inaccurate:
"FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a smaller
MaxVF that does not require a scalar epilog."
(requiresScalarEpilog() has nothing to do with VF).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53420



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344883 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-22 06:17:09 +00:00
Thomas Lively
97a6779252 [LoopVectorize] Loop vectorization for minimum and maximum
Summary: Depends on D52766.

Reviewers: aheejin, dschuff

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52767

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344816 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-19 21:11:43 +00:00
Ayal Zaks
9e75857c92 [LV] Fold tail by masking to vectorize loops of arbitrary trip count under opt for size
When optimizing for size, a loop is vectorized only if the resulting vector loop
completely replaces the original scalar loop. This holds if no runtime guards
are needed, if the original trip-count TC does not overflow, and if TC is a
known constant that is a multiple of the VF. The last two TC-related conditions
can be overcome by
1. rounding the trip-count of the vector loop up from TC to a multiple of VF;
2. masking the vector body under a newly introduced "if (i <= TC-1)" condition.
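
The two steps above can be sketched in scalar Python as follows. This is only an illustration of the folding scheme on a hypothetical element-wise add loop; the name folded_tail_add and the lane-by-lane simulation of a vector step are assumptions, not the vectorizer's code:

```python
def folded_tail_add(a, b, vf):
    """Compute out[i] = a[i] + b[i] in vf-wide masked steps, with no scalar remainder loop."""
    tc = len(a)                              # original trip count TC
    out = [0] * tc
    rounded_tc = (tc + vf - 1) // vf * vf    # step 1: round TC up to a multiple of VF
    for base in range(0, rounded_tc, vf):    # one iteration per vector step
        lanes = range(base, base + vf)
        mask = [i <= tc - 1 for i in lanes]  # step 2: the "i <= TC-1" lane mask
        for i, active in zip(lanes, mask):
            if active:                       # masked-off lanes touch no memory
                out[i] = a[i] + b[i]
    return out
```

With TC = 5 and VF = 4 the loop runs two vector steps; the second has only its first lane active, so no out-of-bounds element is accessed and no scalar epilogue is needed.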

The patch allows loops with arbitrary trip counts to be vectorized under -Os,
subject to the existing cost model considerations. It also applies to loops with
small trip counts (under -O2) which are currently handled as if under -Os.

The patch does not handle loops with reductions, live-outs, or w/o a primary
induction variable, and disallows interleave groups.

(Third, final and main part of -)
Differential Revision: https://reviews.llvm.org/D50480


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344743 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-18 15:03:15 +00:00
Anna Thomas
c2874102cb [LV] Teach vectorizer about variant value store into uniform address
Summary:
Teach vectorizer about vectorizing variant value stores to uniform
address. Similar to rL343028, we do not allow vectorization if we have
multiple stores to the same uniform address.

Cost model already has the change for considering the extract
instruction cost for a variant value store. See added test cases for how
vectorization is done.
The patch also contains changes to the ORE messages.

Reviewers: Ayal, mkuper, anemet, hsaito

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D52656

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344613 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-16 15:46:26 +00:00
Ayal Zaks
dacda52aca [LV] Add test checks when vectorizing loops under opt for size; NFC
Landing this as a separate part of https://reviews.llvm.org/D50480, recording
current behavior more accurately, to clarify subsequent diff ([LV] Vectorizing
loops of arbitrary trip count without remainder under opt for size).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344606 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-16 14:25:02 +00:00
Dorit Nuzman
7d7250490b recommit 344472 after fixing build failure on ARM and PPC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344475 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-14 08:50:06 +00:00
Dorit Nuzman
473da03560 revert 344472 due to failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344473 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-14 07:21:20 +00:00