Commit Graph

314 Commits

Author SHA1 Message Date
Orlando Cazalet-Hyams
b1ba8a93bc [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024

The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:

A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.

In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.

I have set up a separate review D61933 for a fix which is required for this patch.

Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse

Reviewed By: hfinkel, jmorse

Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits

Tags: #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D60831

llvm-svn: 363046

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363786 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-19 10:50:47 +00:00
Warren Ristow
31868b92df [LV] Suppress vectorization in some nontemporal cases
When considering a loop containing nontemporal stores or loads for
vectorization, suppress the vectorization if the corresponding
vectorized store or load with the aligment of the original scaler
memory op is not supported with the nontemporal hint on the target.

This adds two new functions:
  bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
  bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;

to TTI, leaving the target independent default implementation as
returning true, but with overriding implementations for X86 that
check the legality based on available Subtarget features.

This fixes https://llvm.org/PR40759

Differential Revision: https://reviews.llvm.org/D61764


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363581 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-17 17:20:08 +00:00
Bjorn Pettersson
784000b4c2 [LV] Deny irregular types in interleavedAccessCanBeWidened
Summary:
Avoid that loop vectorizer creates loads/stores of vectors
with "irregular" types when interleaving. An example of
an irregular type is x86_fp80 that is 80 bits, but that
may have an allocation size that is 96 bits. So an array
of x86_fp80 is not bitcast compatible with a vector
of the same type.

Not sure if interleavedAccessCanBeWidened is the best
place for this check, but it solves the problem seen
in the added test case. And it is the same kind of check
that already exists in memoryInstructionCanBeWidened.

Reviewers: fhahn, Ayal, craig.topper

Reviewed By: fhahn

Subscribers: hiraditya, rkruppe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63386

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363547 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-17 12:02:24 +00:00
Fangrui Song
1002960b9d [lit] Delete empty lines at the end of lit.local.cfg NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363538 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-17 09:51:07 +00:00
Sam Parker
d021f415d1 [SCEV] Pass NoWrapFlags when expanding an AddExpr
InsertBinop now accepts NoWrapFlags, so pass them through when
expanding a simple add expression.

This is the first re-commit of the functional changes from rL362687,
which was previously reverted.

Differential Revision: https://reviews.llvm.org/D61934


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363364 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-14 09:19:41 +00:00
Orlando Cazalet-Hyams
f36c1c031e Revert "[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion"
This reverts commit 1a0f7a2077b70c9864faa476e15b048686cf1ca7.
See phabricator thread for D60831.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363132 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-12 08:34:51 +00:00
Orlando Cazalet-Hyams
6c60c71a4e [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024

The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:

A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.

In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.

I have set up a separate review D61933 for a fix which is required for this patch.

Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse

Reviewed By: hfinkel, jmorse

Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits

Tags: #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D60831

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363046 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-11 10:37:20 +00:00
Benjamin Kramer
600f7b5a8d Revert "[SCEV] Use wrap flags in InsertBinop"
This reverts commit r362687. Miscompiles llvm-profdata during selfhost.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362699 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-06 12:35:46 +00:00
Sam Parker
6892c31cf9 [SCEV] Use wrap flags in InsertBinop
If the given SCEVExpr has no (un)signed flags attached to it, transfer
these to the resulting instruction or use them to find an existing
instruction.

Differential Revision: https://reviews.llvm.org/D61934


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362687 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-06 08:56:26 +00:00
Simon Pilgrim
3748537115 [CostModel][X86] Improve masked load/store AVX1/AVX2 costs
A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicates that its closer to 2x. Masked stores costs are a lot more diverse but 8x is roughly in the middle of the range.

e.g. SandyBridge
defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;
defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;

e.g. Btver2
defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>;
defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>;
defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>;
defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>;

Differential Revision: https://reviews.llvm.org/D61257

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362338 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-02 20:37:02 +00:00
Craig Topper
8c0b3da9d4 [LoopVectorize] Add FNeg instruction support
Differential Revision: https://reviews.llvm.org/D62510

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362124 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-30 18:19:35 +00:00
Craig Topper
13480b8846 [LoopVectorize] Precommit tests for D62510. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362060 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-30 06:48:13 +00:00
Sanjay Patel
1a7c631eb2 [LoopVectorizer] fix test file to not run the entire -O3 pipeline
This test file has a long history of edits from changes outside
of vectorization, and it would happen again with the proposal in
D61726.

End-to-end testing shouldn't be happening in a test file that is
specifically checking for vector masked load/store ops.
Larger-scale testing goes in PhaseOrdering or the test-suite.

I've hopefully preserved the intent by taking what was completely
unoptimized IR in some tests and passing that through the -O1
pipeline. That becomes the input IR, and now we just run the loop
vectorizer and verify that the vector masked ops are produced as
expected.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360340 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-09 13:43:22 +00:00
Kostya Serebryany
53dcce4331 revert r360162 as it breaks most of the buildbots
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360190 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-07 20:57:11 +00:00
Orlando Cazalet-Hyams
8c101025dc [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024

The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:

A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.

In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.

Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel

Reviewed By: hfinkel

Subscribers: bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits

Tags: #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D60831

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360162 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-07 15:37:38 +00:00
Keno Fischer
35a5cc8b49 [SCEV] Add explicit representations of umin/smin
Summary:
Currently we express umin as `~umax(~x, ~y)`. However, this becomes
a problem for operands in non-integral pointer spaces, because `~x`
is not something we can compute for `x` non-integral. However, since
comparisons are generally still allowed, we are actually able to
express `umin(x, y)` directly as long as we don't try to express is
as a umax. Support this by adding an explicit umin/smin representation
to SCEV. We do this by factoring the existing getUMax/getSMax functions
into a new function that does all four. The previous two functions were
largely identical.

Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D50167

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360159 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-07 15:28:47 +00:00
Alina Sbirlea
1e33097313 [PassManagerBuilder] Add option for interleaved loops, for loop vectorize.
Summary:
Match NewPassManager behavior: add option for interleaved loops in the
old pass manager, and use that instead of the flag used to disable loop unroll.
No changes in the defaults.

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, dmgreen, hsaito, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61030

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359615 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-30 21:29:20 +00:00
Alina Sbirlea
6ac87016ff Enable LoopVectorization by default.
Summary:
When refactoring vectorization flags, vectorization was disabled by default in the new pass manager.
This patch re-enables is for both managers, and changes the assumptions opt makes, based on the new defaults.
Comments in opt.cpp should clarify the intended use of all flags to enable/disable vectorization.

Reviewers: chandlerc, jgorbe

Subscribers: jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61091

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359167 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-25 04:49:48 +00:00
Eric Christopher
598198edbc Revert "Temporarily Revert "Add basic loop fusion pass.""
The reversion apparently deleted the test/Transforms directory.

Will be re-reverting again.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358552 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-17 04:52:47 +00:00
Eric Christopher
02cc44c1b9 Temporarily Revert "Add basic loop fusion pass."
As it's causing some bot failures (and per request from kbarton).

This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358546 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-17 02:12:23 +00:00
Florian Hahn
3259337afd [VPlan] Determine Vector Width programmatically.
With this change, the VPlan native path is triggered with the directive:

   #pragma clang loop vectorize(enable)

There is no need to specify the vectorize_width(N) clause.

Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>

Differential Revision: https://reviews.llvm.org/D57598

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357156 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-28 10:37:12 +00:00
Simon Pilgrim
ce23689b40 [SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction canonicalization
Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops.

This is prep work towards:
(a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands
(b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops.

Differential Revision: https://reviews.llvm.org/D59738

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356913 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-25 15:53:55 +00:00
Craig Topper
8ff8e3360b [InstCombine] Don't transform ((C1 OP zext(X)) & C2) -> zext((C1 OP X) & C2) if either zext or OP has another use.
If they have other users we'll just end up increasing the instruction count.

We might be able to weaken this to only one of them having a single use if we can prove that the and will be removed.

Fixes PR41164.

Differential Revision: https://reviews.llvm.org/D59630

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356690 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-21 17:50:49 +00:00
Nikita Popov
cd6f62dcd4 [ValueTracking] Use computeConstantRange() for unsigned add/sub overflow
Improve computeOverflowForUnsignedAdd/Sub in ValueTracking by
intersecting the computeConstantRange() result into the ConstantRange
created from computeKnownBits(). This allows us to detect some
additional never/always overflows conditions that can't be determined
from known bits.

This revision also adds basic handling for constants to
computeConstantRange(). Non-splat vectors will be handled in a followup.

The signed case will also be handled in a followup, as it needs some
more groundwork.

Differential Revision: https://reviews.llvm.org/D59386

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356489 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-19 17:53:56 +00:00
Sanjoy Das
2d9ad10711 Reland "Relax constraints for reduction vectorization"
Change from original commit: move test (that uses an X86 triple) into the X86
subdirectory.

Original description:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355889 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-12 01:31:44 +00:00
Florian Hahn
b11b60da2d [InterleavedAccessAnalysis] Fix integer overflow in insertMember.
Without checking for integer overflow, invalid members can be added
 e.g. if the calculated key overflows, becomes positive and the largest key.

This fixes
      https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=7560
      https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13128
      https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13229

Reviewers: Ayal, anna, hsaito, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D55538

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355613 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-07 17:50:16 +00:00
Nikita Popov
29ba81ded0 [ValueTracking] More accurate unsigned sub overflow detection
Second part of D58593.

Compute precise overflow conditions based on all known bits, rather
than just the sign bits. Unsigned a - b overflows iff a < b, and we
can determine whether this always/never happens based on the minimal
and maximal values achievable for a and b subject to the known bits
constraint.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355109 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-28 18:04:20 +00:00
Michael Kruse
0bb9e7af42 Refactor setAlreadyUnrolled() and setAlreadyVectorized().
Loop::setAlreadyUnrolled() and
LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that
stops the same loop from being transformed multiple times. This patch
merges both implementations.

In doing so we fix 3 potential issues:

 * setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.*
   metadata even though it will not be used anymore. This already caused
   problems such as http://llvm.org/PR40546. Change the behavior to the
   one of setAlreadyUnrolled which deletes this loop metadata.

 * setAlreadyUnrolled() used to create a new LoopID by calling
   MDNode::get with nullptr as the first operand, then replacing it by
   the returned references using replaceOperandWith. It is possible
   that MDNode::get would instead return an existing node (due to
   de-duplication) that then gets modified. To avoid, use a fresh
   TempMDNode that does not get uniqued with anything else before
   replacing it with replaceOperandWith.

 * LoopVectorizeHints::matchesHintMetadataName() only compares the
   suffix of the attribute to set the new value for. That is, when
   called with "enable", would erase attributes such as
   "llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and
   "llvm.loop.distribute.enable" instead of the one to replace.
   Fortunately, function was only called with "isvectorized".

Differential Revision: https://reviews.llvm.org/D57566

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353738 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-11 19:45:44 +00:00
Johannes Doerfert
9e7dd29d66 [ValueTracking] Look through casts when determining non-nullness
Bitcast and certain Ptr2Int/Int2Ptr instructions will not alter the
value of their operand and can therefore be looked through when we
determine non-nullness.

Differential Revision: https://reviews.llvm.org/D54956

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352293 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-26 23:40:35 +00:00
Simon Pilgrim
21d100aff8 [CostModel][X86] Add explicit vector select costs
Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)|(Y & ~C)) bit select.

Exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason).

The increase pre-SSE41 selection costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests as well to the existing SSE2 tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351685 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-20 13:55:01 +00:00
Michael Kruse
42a382c204 Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not loop identifier. It is
neither unique (there's even a regression test assigning the some LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carries by this loop).

This intentionally avoid any kind of "ID". Loops that are clones/have
their attributes modifies retain the llvm.loop.parallel_accesses
attribute. Access instructions that a cloned point to the same access
group. It is not necessary for each access to have it's own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.

The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.

Differential Revision: https://reviews.llvm.org/D52116

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349725 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-20 04:58:07 +00:00
Sanjay Patel
83544a50cc [LoopVectorize] auto-generate complete checks; NFC
The first test claims to show that the vectorizer will
generate a vector load/loop, but then this file runs
other passes which might scalarize that op. I'm removing 
instcombine from the RUN line here to break that dependency.
Also, I'm generating full checks to make it clear exactly 
what the vectorizer has done.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349554 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-18 22:23:04 +00:00
Michael Kruse
9a395de086 [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g.

    #pragma clang loop unroll_and_jam(enable)
    #pragma clang loop distribute(enable)

is the same as

    #pragma clang loop distribute(enable)
    #pragma clang loop unroll_and_jam(enable)

and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.

This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,

    !0 = !{!0, !1, !2}
    !1 = !{!"llvm.loop.unroll_and_jam.enable"}
    !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
    !3 = !{!"llvm.loop.distribute.enable"}

defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.

Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.

For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations.

Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.

To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.

With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).

Reviewed By: hfinkel, dmgreen

Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348944 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-12 17:32:52 +00:00
Craig Topper
26f6d0a901 [X86][LoopVectorize] Replace -mcpu=skylake-avx512 with -mattr=avx512f in some tests that failed when experimenting with defaulting to -mprefer-vector-width=256 for skylake-avx512.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348063 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-01 01:38:44 +00:00
Simon Pilgrim
bb61c72405 [CostModel] Add more realistic SK_InsertSubvector generic costs.
Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346662 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-12 15:20:24 +00:00
Ayal Zaks
6d0de65682 [LV] Avoid vectorizing loops under opt for size that involve SCEV checks
Fix PR39417, PR39497

The loop vectorizer may generate runtime SCEV checks for overflow and stride==1
cases, leading to execution of original scalar loop. The latter is forbidden
when optimizing for size. An assert introduced in r344743 triggered the above
PR's showing it does happen. This patch fixes this behavior by preventing
vectorization in such cases.

Differential Revision: https://reviews.llvm.org/D53612


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345959 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-02 09:16:12 +00:00
Dorit Nuzman
06bac6c858 [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345705 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-31 09:57:56 +00:00
Simon Pilgrim
8a10b6b077 [CostModel][X86] Add realistic vXi64 uitofp vXf64 costs
Match codegen improvements from D53649/rL345256

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345263 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-25 13:06:20 +00:00
Dorit Nuzman
63aae622d8 [LV] Don't have fold-tail under optsize invalidate interleave-groups when
masked-interleaving is enabled

Enable interleave-groups under fold-tail scenario for Opt for size compilation;
D50480 added support for vectorizing loops of arbitrary trip-count without a
remiander, which in turn makes everything in the loop conditional, including
interleave-groups if any. It therefore invalidated all interleave-groups
because we didn't have support for vectorizing predicated interleaved-groups
at the time. In the meantime, D53011 introduced this support, so we don't
have to invalidate interleave-groups when masked-interleaved support is enabled.

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: hsaito

Differential Revision: https://reviews.llvm.org/D53559



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345115 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-24 07:11:38 +00:00
Dorit Nuzman
c7a8ddb849 [IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when
optimizing for size

LV is careful to respect -Os and not to create a scalar epilog in all cases
(runtime tests, trip-counts that require a remainder loop) except for peeling
due to gaps in interleave-groups. This patch fixes that; -Os will now have us
invalidate such interleave-groups and vectorize without an epilog.

The patch also removes a related FIXME comment that is now obsolete, and was
also inaccurate:
"FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a smaller
MaxVF that does not require a scalar epilog."
(requiresScalarEpilog() has nothing to do with VF).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53420



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344883 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-22 06:17:09 +00:00
Ayal Zaks
9e75857c92 [LV] Fold tail by masking to vectorize loops of arbitrary trip count under opt for size
When optimizing for size, a loop is vectorized only if the resulting vector loop
completely replaces the original scalar loop. This holds if no runtime guards
are needed, if the original trip-count TC does not overflow, and if TC is a
known constant that is a multiple of the VF. The last two TC-related conditions
can be overcome by
1. rounding the trip-count of the vector loop up from TC to a multiple of VF;
2. masking the vector body under a newly introduced "if (i <= TC-1)" condition.

The patch allows loops with arbitrary trip counts to be vectorized under -Os,
subject to the existing cost model considerations. It also applies to loops with
small trip counts (under -O2) which are currently handled as if under -Os.

The patch does not handle loops with reductions, live-outs, or w/o a primary
induction variable, and disallows interleave groups.

(Third, final and main part of -)
Differential Revision: https://reviews.llvm.org/D50480


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344743 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-18 15:03:15 +00:00
Anna Thomas
c2874102cb [LV] Teach vectorizer about variant value store into uniform address
Summary:
Teach vectorizer about vectorizing variant value stores to uniform
address. Similar to rL343028, we do not allow vectorization if we have
multiple stores to the same uniform address.

Cost model already has the change for considering the extract
instruction cost for a variant value store. See added test cases for how
vectorization is done.
The patch also contains changes to the ORE messages.

Reviewers: Ayal, mkuper, anemet, hsaito

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D52656

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344613 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-16 15:46:26 +00:00
Ayal Zaks
dacda52aca [LV] Add test checks when vectorizing loops under opt for size; NFC
Landing this as a separate part of https://reviews.llvm.org/D50480, recording
current behavior more accurately, to clarify subsequent diff ([LV] Vectorizing
loops of arbitrary trip count without remainder under opt for size).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344606 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-16 14:25:02 +00:00
Dorit Nuzman
7d7250490b recommit 344472 after fixing build failure on ARM and PPC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344475 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-14 08:50:06 +00:00
Dorit Nuzman
473da03560 revert 344472 due to failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344473 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-14 07:21:20 +00:00
Dorit Nuzman
a3ff03e8e2 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344472 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-14 07:06:16 +00:00
Sanjay Patel
66a2c5ecaa [InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try
Re-trying r344082 because it unintentionally included extra diffs.

Original commit message:
icmp ne (and X, 1), 0 --> trunc X to N x i1

Ideally, we'd do the same for scalars, but there will likely be
regressions unless we add more trunc folds as we're doing here
for vectors.

The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {

  %c = fcmp ole <4 x float> %x, %y
  %s = sext <4 x i1> %c to <4 x i32>
  %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
  %cond = or <4 x i32> %s1, %s2
  %condtr = trunc <4 x i32> %cond to <4 x i1>
  %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
  ret <4 x float> %r

}

Here's a sampling of the vector codegen for that case using
mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vandps  LCPI0_0(%rip), %xmm0, %xmm0
vxorps  %xmm1, %xmm1, %xmm1
vpcmpeqd        %xmm1, %xmm0, %xmm0
vblendvps       %xmm0, %xmm3, %xmm2, %xmm0

AVX after:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vblendvps       %xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vpbroadcastd    LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd       %zmm1, %zmm0, %k1
vblendmps       %zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vpslld  $31, %xmm0, %xmm0
vptestmd        %zmm0, %zmm0, %k1
vblendmps       %zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

fcmge   v0.4s, v1.4s, v0.4s
zip1    v1.4s, v0.4s, v0.4s
zip2    v0.4s, v0.4s, v0.4s
orr     v0.16b, v1.16b, v0.16b
movi    v1.4s, #1
and     v0.16b, v0.16b, v1.16b
cmeq    v0.4s, v0.4s, #0
bsl     v0.16b, v3.16b, v2.16b

AArch64 after:

fcmge   v0.4s, v1.4s, v0.4s
zip1    v1.4s, v0.4s, v0.4s
zip2    v0.4s, v0.4s, v0.4s
orr     v0.16b, v1.16b, v0.16b
bsl     v0.16b, v2.16b, v3.16b

PowerPC-le before:

xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34

PowerPC-le after:

xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0

Differential Revision: https://reviews.llvm.org/D52747



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344181 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-10 20:47:46 +00:00
Sanjay Patel
9f5daa2df0 revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization
This commit accidentally included the diffs from D53057.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344178 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-10 20:39:39 +00:00
Justin Bogner
a219e952b7 [LV] Move test for r343954 into x86 subdirectory
This test uses an x86 triple, so it needs to be in the x86 specific
test directory.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344087 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-09 22:40:04 +00:00
Sanjay Patel
1dd3c06445 [InstCombine] reverse 'trunc X to <N x i1>' canonicalization
icmp ne (and X, 1), 0 --> trunc X to N x i1

Ideally, we'd do the same for scalars, but there will likely be 
regressions unless we add more trunc folds as we're doing here 
for vectors.

The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
  %c = fcmp ole <4 x float> %x, %y
  %s = sext <4 x i1> %c to <4 x i32>
  %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
  %cond = or <4 x i32> %s1, %s2
  %condtr = trunc <4 x i32> %cond to <4 x i1>
  %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
  ret <4 x float> %r
}

Here's a sampling of the vector codegen for that case using 
mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vandps	LCPI0_0(%rip), %xmm0, %xmm0
vxorps	%xmm1, %xmm1, %xmm1
vpcmpeqd	%xmm1, %xmm0, %xmm0
vblendvps	%xmm0, %xmm3, %xmm2, %xmm0

AVX after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vblendvps	%xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpbroadcastd	LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd	%zmm1, %zmm0, %k1
vblendmps	%zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpslld	$31, %xmm0, %xmm0
vptestmd	%zmm0, %zmm0, %k1
vblendmps	%zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
movi	v1.4s, #1
and	v0.16b, v0.16b, v1.16b
cmeq	v0.4s, v0.4s, #0
bsl	v0.16b, v3.16b, v2.16b

AArch64 after:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
bsl	v0.16b, v2.16b, v3.16b

PowerPC-le before:

xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34

PowerPC-le after:

xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0

Differential Revision: https://reviews.llvm.org/D52747



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344082 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-09 21:26:01 +00:00