475 Commits

Author SHA1 Message Date
Farhana Aleen
13f7859c20 [SLP] Recognize min/max pattern using instructions producing same values.
Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization.

         %1 = extractelement <2 x i32> %a, i32 0
         %2 = extractelement <2 x i32> %a, i32 1
         %cond = icmp sgt i32 %1, %2
         %3 = extractelement <2 x i32> %a, i32 0
         %4 = extractelement <2 x i32> %a, i32 1
         %select = select i1 %cond, i32 %3, i32 %4
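In the snippet above, `%1`/`%3` and `%2`/`%4` are distinct instructions, but each pair reads the same lane of `%a`, so the select is really a signed max of the two lanes. The C++ below is a minimal sketch of that matching idea. It uses a simplified model where an extractelement is just (vector, constant lane). `Extract`, `sameValue`, `MinMax` and `matchSignedMinMax` are hypothetical names, not the SLPVectorizer implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model of an extractelement: the source vector's
// name and a constant lane index.
struct Extract {
  std::string Vec; // e.g. "%a"
  unsigned Lane;   // e.g. 0
};

// Two distinct extractelement instructions produce the same value when they
// read the same lane of the same vector.
bool sameValue(const Extract &X, const Extract &Y) {
  return X.Vec == Y.Vec && X.Lane == Y.Lane;
}

enum class MinMax { None, SMax, SMin };

// select (icmp sgt L, R), T, F is a signed max when T/F produce the same
// values as L/R, and a signed min when they are swapped.
MinMax matchSignedMinMax(const Extract &CmpLHS, const Extract &CmpRHS,
                         const Extract &TrueV, const Extract &FalseV) {
  if (sameValue(TrueV, CmpLHS) && sameValue(FalseV, CmpRHS))
    return MinMax::SMax;
  if (sameValue(TrueV, CmpRHS) && sameValue(FalseV, CmpLHS))
    return MinMax::SMin;
  return MinMax::None;
}
```

Matching on "same value" rather than "same instruction" is what lets the duplicated extracts in the snippet be recognized before any later cleanup merges them.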

Author: FarhanaAleen

Reviewed By: ABataev, RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D47608

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336130 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-02 17:55:31 +00:00
Simon Pilgrim
198bcb65d9 [SLPVectorizer][X86] Begin adding alternate tests for call operators
Alternate opcode handling only supports binary operators; these tests demonstrate a missed opportunity to vectorize ceil/floor calls

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336125 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-02 17:23:45 +00:00
Simon Pilgrim
386f15c93a [SLPVectorizer] Fix alternate opcode + shuffle cost function to correctly handle SK_Select patterns.
We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case.

This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now...
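A sketch of the costing idea, using a toy cost callback in place of the real TTI queries. The two opcodes are found by scanning all lanes, and the vector cost is one vector op per opcode plus the select shuffle. `CostFn`, `findMainAndAltOpcodes` and `alternateVectorCost` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-opcode vector cost callback (a real cost model asks the target).
using CostFn = int (*)(const std::string &Opcode);

// Find the (at most two) distinct opcodes across ALL lanes. Looking only at
// lanes 0 and 1 is valid only for a strictly alternating pattern.
std::optional<std::pair<std::string, std::string>>
findMainAndAltOpcodes(const std::vector<std::string> &LaneOps) {
  assert(!LaneOps.empty());
  std::string Main = LaneOps[0], Alt = LaneOps[0];
  for (const std::string &Op : LaneOps) {
    if (Op == Main || Op == Alt)
      continue;
    if (Alt != Main)
      return std::nullopt; // a third distinct opcode
    Alt = Op;
  }
  return std::make_pair(Main, Alt);
}

// One vector op per distinct opcode, plus the select shuffle when two are used.
std::optional<int> alternateVectorCost(const std::vector<std::string> &LaneOps,
                                       CostFn VecOpCost, int SelectShuffleCost) {
  auto Ops = findMainAndAltOpcodes(LaneOps);
  if (!Ops)
    return std::nullopt;
  if (Ops->first == Ops->second)
    return VecOpCost(Ops->first);
  return VecOpCost(Ops->first) + VecOpCost(Ops->second) + SelectShuffleCost;
}
```

With lanes `fmul, fmul, fdiv, fmul`, an approach that only inspected lanes 0 and 1 would see `fmul` twice and never cost the `fdiv`. The full scan picks up both opcodes.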

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336095 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-02 11:28:01 +00:00
Simon Pilgrim
243c2fa15c [SLPVectorizer][X86] Add some alternate tests for cast operators
Alternate opcode handling only supports binary operators; these tests demonstrate missed opportunities to vectorize some sitofp/uitofp and fptosi/fptoui style casts as well as some (successful) float bits manipulations

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336060 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-01 11:29:46 +00:00
Simon Pilgrim
40be0055ae [SLPVectorizer] Recognise non uniform power of 2 constants
Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them.

As SLP works with arrays of values I don't think we can easily use the pattern match helpers here.
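The per-lane classification over an array of operands can be sketched as below. This is an illustrative model only. It loosely follows the split between an operand's kind (uniform vs. non-uniform constant) and a power-of-2 property that cost queries take. `OperandKind`, `OperandInfo` and `classifyOperands` are hypothetical names, and each lane is modelled as an optional constant rather than an LLVM `Value`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified operand classification.
enum class OperandKind { AnyValue, UniformConstant, NonUniformConstant };

struct OperandInfo {
  OperandKind Kind = OperandKind::AnyValue;
  bool AllPowerOf2 = false;
};

bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Each lane is either a known constant or unknown (nullopt).
OperandInfo classifyOperands(const std::vector<std::optional<uint64_t>> &Lanes) {
  assert(!Lanes.empty());
  OperandInfo Info;
  for (const auto &L : Lanes)
    if (!L)
      return Info; // some lane is not a constant: nothing more to say
  bool Uniform = true, Pow2 = true;
  for (const auto &L : Lanes) {
    Uniform = Uniform && (*L == *Lanes[0]);
    Pow2 = Pow2 && isPowerOf2(*L);
  }
  Info.Kind = Uniform ? OperandKind::UniformConstant
                      : OperandKind::NonUniformConstant;
  Info.AllPowerOf2 = Pow2;
  return Info;
}
```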

Differential Revision: https://reviews.llvm.org/D48214

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335621 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-26 16:20:16 +00:00
Simon Pilgrim
5a40cf8639 [SLPVectorizer] Support alternate opcodes in tryToVectorizeList
Enable tryToVectorizeList to support InstructionsState alternate opcode patterns at a root (build vector etc.) as well as further down the vectorization tree.

NOTE: This patch reduces some of the debug reporting if there are opcode mismatches - I can try to add it back if it proves a problem. But it could get rather messy trying to provide equivalent verbose debug strings via getSameOpcode etc.

Differential Revision: https://reviews.llvm.org/D48488

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335364 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-22 16:37:34 +00:00
Simon Pilgrim
ae9a1a8ee7 [SLPVectorizer] Relax alternate opcodes to accept any BinaryOperator pair
SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle.

This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle.
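The old and new acceptance rules can be sketched as two predicates over opcode names. The set below lists LLVM's binary operators. The function names are hypothetical. In the real pass the final decision still rests on the target cost model, which this sketch does not model.

```cpp
#include <cassert>
#include <set>
#include <string>

// LLVM IR binary operator opcodes.
const std::set<std::string> BinaryOps = {
    "add",  "sub",  "mul",  "udiv", "sdiv", "urem", "srem", "shl",  "lshr",
    "ashr", "and",  "or",   "xor",  "fadd", "fsub", "fmul", "fdiv", "frem"};

bool isBinaryOp(const std::string &Op) { return BinaryOps.count(Op) != 0; }

// Before: only (F)Add/(F)Sub counterparts could form an alternate pair.
bool isLegacyAlternatePair(const std::string &A, const std::string &B) {
  return (A == "add" && B == "sub") || (A == "sub" && B == "add") ||
         (A == "fadd" && B == "fsub") || (A == "fsub" && B == "fadd");
}

// After: any two distinct binary operators are candidates; profitability
// is left to the cost model.
bool isRelaxedAlternatePair(const std::string &A, const std::string &B) {
  return A != B && isBinaryOp(A) && isBinaryOp(B);
}
```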

Differential Revision: https://reviews.llvm.org/D48477

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335349 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-22 14:04:06 +00:00
Simon Pilgrim
1c5cdb19a6 [SLPVectorizer][X86] Add alternate opcode tests for simple build vector cases
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335348 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-22 13:53:58 +00:00
Simon Pilgrim
444b60212b [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc
AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) were being severely overestimated by the default shuffle expansion.

This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.

I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.

Differential Revision: https://reviews.llvm.org/D48172

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335329 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-22 09:45:31 +00:00
Simon Pilgrim
400b266d8f [X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882)
The costs for one/two op general shuffles were overly cautious: VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does.
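A scalar model of the two instructions' 256-bit forms shows the difference. VSHUFPD uses one immediate bit per result element, so each 128-bit lane is controlled separately. VSHUFPS reuses the same four 2-bit selectors in both lanes. The sketch follows the documented instruction semantics. `vshufpd256` and `vshufps256` are hypothetical helper names, not intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4F64 = std::array<double, 4>;
using V8F32 = std::array<float, 8>;

// 256-bit VSHUFPD: one immediate bit per result element, so the low and
// high 128-bit lanes can be shuffled differently.
V4F64 vshufpd256(const V4F64 &A, const V4F64 &B, uint8_t Imm) {
  V4F64 R;
  R[0] = A[(Imm >> 0) & 1];
  R[1] = B[(Imm >> 1) & 1];
  R[2] = A[2 + ((Imm >> 2) & 1)];
  R[3] = B[2 + ((Imm >> 3) & 1)];
  return R;
}

// 256-bit VSHUFPS: the same four 2-bit selectors are applied in both
// 128-bit lanes.
V8F32 vshufps256(const V8F32 &A, const V8F32 &B, uint8_t Imm) {
  V8F32 R;
  for (int Lane = 0; Lane < 8; Lane += 4) {
    R[Lane + 0] = A[Lane + ((Imm >> 0) & 3)];
    R[Lane + 1] = A[Lane + ((Imm >> 2) & 3)];
    R[Lane + 2] = B[Lane + ((Imm >> 4) & 3)];
    R[Lane + 3] = B[Lane + ((Imm >> 6) & 3)];
  }
  return R;
}
```

For example, with immediate `0x6` VSHUFPD takes `A[0], B[1]` in the low lane but `A[3], B[2]` in the high lane, which VSHUFPS cannot express for its elements.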

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335216 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 11:37:13 +00:00
Simon Pilgrim
c660be2252 [SLPVectorizer][X86] Add horizontal add/sub tests
Shows PR37882 perf regression

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335215 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 11:16:10 +00:00
Simon Pilgrim
2970a29566 [SLPVectorizer] Relax "alternate" opcode vectorisation to work with any SK_Select shuffle pattern
D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand.

This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation.
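The relaxation can be sketched with a two-input shuffle mask over N-element vectors. Index I picks lane I of the first operand, and I + N picks lane I of the second, matching the `shufflevector` convention. The helper names are hypothetical. Undef (-1) mask elements are not handled. The old "alternating" form is modelled as even lanes from the first operand and odd lanes from the second.

```cpp
#include <cassert>
#include <vector>

// SK_Select-style mask: every lane stays in place and may come from either
// operand, like a vector select with a constant condition.
bool isSelectMask(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  for (int I = 0; I < N; ++I)
    if (Mask[I] != I && Mask[I] != I + N)
      return false;
  return true;
}

// The stricter alternating form: even lanes from the first operand,
// odd lanes from the second.
bool isAlternatingMask(const std::vector<int> &Mask) {
  const int N = static_cast<int>(Mask.size());
  for (int I = 0; I < N; ++I)
    if (Mask[I] != ((I % 2) ? I + N : I))
      return false;
  return true;
}
```

`<0, 1, 6, 3>` is a valid select (only lane 2 uses the alternate opcode) but not an alternating mask, which is the kind of case the `isOdd()` restriction used to reject.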

The AArch64 regression will be fixed by D48172.

Differential Revision: https://reviews.llvm.org/D48174

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335130 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-20 14:26:28 +00:00
Simon Pilgrim
afe3129d8f [SLP][X86] Add AVX2 run to POW2 SDIV Tests
Non-uniform pow2 tests only make sense on targets with fast (low cost) non-uniform shifts
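For context, sdiv by 2^K is commonly expanded into shifts plus a rounding bias. With a different power of 2 in each lane, every shift amount differs per lane. That is only cheap where the target has per-element variable shifts (AVX2 adds VPSRAVD/VPSRLVD/VPSLLVD). Below is a scalar C++ sketch of one such expansion, assuming 32-bit lanes and 1 <= K <= 30. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Signed division by 2^K using shifts and an add. Negative dividends get a
// bias of 2^K - 1 so the arithmetic right shift rounds toward zero, matching
// sdiv. (Right-shifting a negative int is arithmetic on mainstream compilers
// and guaranteed since C++20.)
int32_t sdivPow2(int32_t X, unsigned K) {
  assert(K >= 1 && K <= 30);
  int32_t Sign = X >> 31; // 0 or -1
  int32_t Bias = Sign & static_cast<int32_t>((1u << K) - 1);
  return (X + Bias) >> K;
}

// Non-uniform divisors: each lane has its own K, so a vector version
// needs per-lane (variable) shift amounts.
std::array<int32_t, 4> sdivPow2Lanes(const std::array<int32_t, 4> &X,
                                     const std::array<unsigned, 4> &K) {
  std::array<int32_t, 4> R{};
  for (int I = 0; I < 4; ++I)
    R[I] = sdivPow2(X[I], K[I]);
  return R;
}
```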

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334821 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 10:29:37 +00:00
Simon Pilgrim
b753b18785 [SLP][X86] Regenerate POW2 SDIV Tests
Added non-uniform pow2 test as well

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334819 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 10:07:03 +00:00
Farhana Aleen
4128fd181f [SLP] Add testcases of min/max reduction pattern for AMDGPU.
Author: FarhanaAleen

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334435 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-11 20:29:31 +00:00
Matt Arsenault
4525054673 AMDGPU: Make v2i16/v2f16 legal on VI
This usually results in better code. Fixes using
inline asm with short2, and also fixes having a different
ABI for function parameters between VI and gfx9.

Partially cleans up the mess used for lowering of the d16
operations. Making v4f16 legal will help clean this up more,
but this requires additional work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332953 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-22 06:32:10 +00:00
Farhana Aleen
030b9437a7 [AMDGPU] Support horizontal vectorization of min/max.
Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: AMDGPU

Differential Revision: https://reviews.llvm.org/D46604

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331920 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 21:18:34 +00:00
Shiva Chen
a8a13bc662 [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.
In order to set breakpoints on labels and list source code around
labels, we need to collect debug information for labels, i.e., the label
name, the function the label belongs to, the line number in the file,
and the address where the label is located. To keep this information in
LLVM IR and to allow the backend to generate debug information correctly,
we create a new kind of metadata for labels, DILabel. The format
of DILabel is

!DILabel(scope: !1, name: "foo", file: !2, line: 3)

We want to keep as much debug information as possible even when the
code is optimized. So, we create a new kind of intrinsic for label
metadata to prevent the metadata from being eliminated along with the
basic block. The intrinsic remains as long as we keep it from being
optimized out. The format of the intrinsic is

call void @llvm.dbg.label(metadata !1)

It has only one argument, the DILabel metadata. The intrinsic
immediately follows the label. The backend can get the label
metadata through the intrinsic's parameter.

We also create a DIBuilder API for labels to be used by frontends.
A frontend can use createLabel() to allocate DILabel objects, and
insertLabel() to insert the llvm.dbg.label intrinsic in LLVM IR.

Differential Revision: https://reviews.llvm.org/D45024

Patch by Hsiangkai Wang.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331841 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 02:40:45 +00:00
Farhana Aleen
20a92cda49 [AMDGPU] Support horizontal vectorization.
Author: FarhanaAleen

Reviewed By: rampitec, arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D46213

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331313 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-01 21:41:12 +00:00
Matthew Simpson
9acd5ab38b [SLP] Add additional test for transposable binary operations with reuse
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331274 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-01 15:59:26 +00:00
Davide Italiano
44735eb19d [SLPVectorizer] Debug info shouldn't impact spill cost computation.
<rdar://problem/39794738>

(Also, PR32761).

Differential Revision: https://reviews.llvm.org/D46199

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331199 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-30 16:57:33 +00:00
Benjamin Kramer
937d9c9219 [NVPTX] Turn on Loop/SLP vectorization
Since PTX has grown a <2 x half> datatype vectorization has become more
important. The late LoadStoreVectorizer intentionally only does loads
and stores, but now arithmetic has to be vectorized for optimal
throughput too.

This is still very limited, SLP vectorization happily creates <2 x half>
if it's a legal type but there's still a lot of register moving
happening to get that fed into a vectorized store. Overall it's a small
performance win by reducing the amount of arithmetic instructions.

I haven't really checked what the loop vectorizer does to PTX code, the
cost model there might need some more tweaks. I didn't see it causing
harm though.

Differential Revision: https://reviews.llvm.org/D46130

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331035 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-27 13:36:05 +00:00
Matthew Simpson
4965d63ae5 [SLP] Add tests for transposable binary operations
These test cases are vectorizable, but we are currently unable to vectorize
them effectively.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330945 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-26 14:50:04 +00:00
Craig Topper
891d17ec5e [X86] Remove unnecessary -mattr to enable avx512bw when the -mcpu already enabled it. NFC
This makes the test similar to the arith-sub.ll and arith-mul.ll tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330144 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-16 18:14:19 +00:00
Haicheng Wu
39a435f255 [SLP] Use getExtractWithExtendCost() to compute the scalar cost of extractelement/ext pair
We use getExtractWithExtendCost to calculate the cost of extractelement and
s|zext together when computing the extract cost after vectorization, but we
calculate the cost of extractelement and s|zext separately when computing the
scalar cost which is larger than it should be.

Differential Revision: https://reviews.llvm.org/D45469

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330143 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-16 18:09:49 +00:00
Haicheng Wu
05d18d68a3 [SLP] update a test case. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329818 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-11 15:09:49 +00:00
Alexey Bataev
e6a456223b [SLP] Additional tests for reorder reuse vectorization, NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329603 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-09 19:02:34 +00:00
Simon Pilgrim
54d7a0223b [SLPVectorizer][X86] Regenerate some tests. NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329196 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-04 13:53:51 +00:00
Alexey Bataev
91811bc488 [SLP] Fix PR36481: vectorize reassociated instructions.
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.

The patch does not support reordering of repeated instructions; this must
be handled in a separate patch.
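For loads, the reordering idea can be sketched like this. If the element offsets of the scalar loads are a permutation of a consecutive run, one wide load plus a shuffle reproduces the original lane order. Offsets are taken in elements from a common base. `jumbledLoadMask` is a hypothetical name, and extractelement/extractvalue indices would be handled analogously. Repeated offsets are rejected, in line with the limitation above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Returns the shuffle mask to apply to one wide load starting at the
// smallest offset (lane I takes element Mask[I]), or nullopt if the offsets
// have a gap or a repeat.
std::optional<std::vector<int>> jumbledLoadMask(const std::vector<long> &Offsets) {
  assert(!Offsets.empty());
  const long Base = *std::min_element(Offsets.begin(), Offsets.end());
  const size_t N = Offsets.size();
  std::vector<bool> Seen(N, false);
  std::vector<int> Mask;
  for (long Off : Offsets) {
    const long Idx = Off - Base; // >= 0 because Base is the minimum
    if (Idx >= static_cast<long>(N) || Seen[Idx])
      return std::nullopt; // gap or repeated offset
    Seen[Idx] = true;
    Mask.push_back(static_cast<int>(Idx));
  }
  return Mask;
}
```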

Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43776

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329085 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 17:14:47 +00:00
Alexey Bataev
0b1a72a7a6 [SLP] Added tests for checks of reordering of the repeated instructions, NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329080 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 16:31:26 +00:00
Benjamin Kramer
4832f865cf Revert "[SLP] Fix PR36481: vectorize reassociated instructions."
This reverts commit r328980 and r329046. Makes the vectorizer crash.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329071 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 14:40:33 +00:00
Haicheng Wu
2784c35c0f [SLP] Distinguish "demanded and shrinkable" from "demanded and not shrinkable" values when determining the minimum bitwidth
We use two approaches for determining the minimum bitwidth.

   * Demanded bits
   * Value tracking

If demanded bits doesn't result in a narrower type, we then try value tracking.
We need this if we want to root SLP trees with the indices of getelementptr
instructions since all the bits of the indices are demanded.

But there is a missing piece. We need to be able to distinguish "demanded
and shrinkable" from "demanded and not shrinkable". For example, the bits of %i
in

%i = sext i32 %e1 to i64
%gep = getelementptr inbounds i64, i64* %p, i64 %i

are demanded, but we can shrink %i's type to i32 because it won't change the
result of the getelementptr. On the other hand, in

%tmp15 = sext i32 %tmp14 to i64
%tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0

it doesn't make sense to shrink %tmp15 and we can skip the value tracking.
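The two pieces can be sketched separately. The first is a user-based test for "demanded but shrinkable". The second is a sign-bit bound of the kind value tracking provides. Both are simplified, hypothetical models (`UserKind`, `isDemandedButShrinkable`, `minSignedBits`), not the SLP implementation. In particular, real users are not limited to these three kinds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical user kinds for a value produced by a sext.
enum class UserKind { GEPIndex, InsertValue, Other };

// All bits are demanded, but if every user is a GEP index the value can
// still be narrowed: sign-extending a narrower index yields the same address.
// Users such as insertvalue into an i64 field need the wide type itself.
bool isDemandedButShrinkable(const std::vector<UserKind> &Users) {
  for (UserKind U : Users)
    if (U != UserKind::GEPIndex)
      return false;
  return true;
}

// Sign-bit bound: a 64-bit value whose top S bits all equal the sign bit
// fits in 64 - S + 1 bits as a signed integer.
unsigned minSignedBits(int64_t V) {
  const uint64_t U = static_cast<uint64_t>(V);
  unsigned SignBits = 1;
  while (SignBits < 64 && ((U >> (63 - SignBits)) & 1) == (U >> 63))
    ++SignBits;
  return 64 - SignBits + 1;
}
```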

Ideas are from Matthew Simpson!

Differential Revision: https://reviews.llvm.org/D44868

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329035 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 00:05:10 +00:00
Alexey Bataev
6616787959 [SLP] Fix PR36481: vectorize reassociated instructions.
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.

Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43776

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328980 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-02 14:51:37 +00:00
Dinar Temirbulatov
09493fff69 [SLPVectorizer] Add tests related to PR30787, NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328813 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-29 18:57:03 +00:00
Haicheng Wu
648a6091ec [SLP] Add more checks to a test case. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328572 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-26 18:59:28 +00:00
Haicheng Wu
b9e7253e39 [SLP] Add a test case. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328546 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-26 16:47:37 +00:00
Matthew Simpson
27f212d583 [SLP] Stop counting cost of gather sequences with multiple uses
When building the SLP tree, we look for reuse among the vectorized tree
entries. However, each gather sequence is represented by a unique tree entry,
even though the sequence may be identical to another one. This means, for
example, that a gather sequence with two uses will be counted twice when
computing the cost of the tree. We should only count the cost of the definition
of a gather sequence rather than its uses. During code generation, the
redundant gather sequences are emitted, but we optimize them away with CSE. So
it looks like this problem just affects the cost model.
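A minimal sketch of the costing rule. It assumes a gather sequence is identified by its ordered list of scalars and that identical sequences are merged by CSE, as described above. `TreeEntry`, `NeedToGather` and `treeCost` are hypothetical here.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical tree entry: a vectorized bundle or a gather of scalars.
struct TreeEntry {
  bool NeedToGather;
  std::vector<std::string> Scalars; // scalar value names, in lane order
  int Cost;
};

// Sum the tree cost, counting each distinct gather sequence only once.
int treeCost(const std::vector<TreeEntry> &Entries) {
  std::set<std::vector<std::string>> SeenGathers;
  int Total = 0;
  for (const TreeEntry &E : Entries) {
    if (E.NeedToGather && !SeenGathers.insert(E.Scalars).second)
      continue; // another use of an already-counted gather sequence
    Total += E.Cost;
  }
  return Total;
}
```

Note that a gather of the same scalars in a different lane order is a different sequence and is still counted.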

Differential Revision: https://reviews.llvm.org/D44742

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328316 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-23 14:18:27 +00:00
Matthew Simpson
63f2cc2aa9 [SLP] Add test case for a gather sequence with multiple uses
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328133 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-21 19:13:14 +00:00
Matthew Simpson
a7fd2c3c2a [AArch64] Implement getArithmeticReductionCost
This patch provides an implementation of getArithmeticReductionCost for
AArch64. We can specialize the cost of add reductions since they are computed
using the 'addv' instruction.
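A little cost arithmetic shows why a dedicated add-reduction cost helps. A generic reduction of a power-of-two vector takes log2(N) shuffle+add steps plus an extract, whereas AArch64 can do it with one ADDV. The costs are hypothetical parameters, not values from the AArch64 cost model. Whether moving the ADDV result to a general-purpose register is included is left to the caller.

```cpp
#include <cassert>

// Hypothetical per-operation costs; a real cost model queries the target.
struct ReductionCosts {
  int Shuffle;     // one cross-lane shuffle
  int VectorAdd;   // one vector add
  int Extract;     // extracting lane 0 to a scalar
  int AcrossLanes; // one across-lanes add such as ADDV
};

// Generic log2(N) shuffle-and-add tree followed by an extract of lane 0.
int treeReductionCost(unsigned NumElts, const ReductionCosts &C) {
  assert(NumElts >= 2 && (NumElts & (NumElts - 1)) == 0);
  int Cost = 0;
  for (unsigned Width = NumElts; Width > 1; Width /= 2)
    Cost += C.Shuffle + C.VectorAdd;
  return Cost + C.Extract;
}

// Specialized add reduction: a single across-lanes instruction.
int acrossLanesReductionCost(const ReductionCosts &C) { return C.AcrossLanes; }
```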

Differential Revision: https://reviews.llvm.org/D44490

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327702 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-16 11:34:15 +00:00
Alexey Bataev
7c6f31a848 [SLP] Additional tests for stores vectorization, NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326740 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-05 20:20:12 +00:00
Mohammad Shahid
820fd02e9a [SLP] Added new tests and updated existing for jumbled load, NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326303 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-28 04:19:34 +00:00
Sanjay Patel
dcf9b1dd5e [AArch64] add SLP test based on TSVC; NFC
This is a slight reduction of one of the benchmarks
that suffered with D43079. Cost model changes should
not cause this test to remain scalarized.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326217 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-27 18:06:15 +00:00
Simon Pilgrim
0eea35a6ef [X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280)
Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark.

Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch.

Differential Revision: https://reviews.llvm.org/D43733

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326133 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-26 22:10:17 +00:00
Alexey Bataev
b4efe59b69 [SLP] Added new test + fixed some checks, NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326117 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-26 20:01:24 +00:00
Simon Pilgrim
b891e74e20 [SLPVectorizer][X86] Add load extend tests (PR36091)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325772 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-22 12:19:34 +00:00
Sanjay Patel
5c377f610d [AArch64] fix IR names to not be 'tmp' because that gives the CHECK script problems
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325718 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-21 20:48:14 +00:00
Sanjay Patel
b0c13268b8 [AArch64] add SLP test for matmul (PR36280); NFC
This is a slight reduction of one of the benchmarks
that suffered with D43079. Cost model changes should
not cause this test to remain scalarized.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325717 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-21 20:34:16 +00:00
Alexey Bataev
27e6b3dc3f [SLP] Fix test checks, NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325689 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-21 15:32:58 +00:00
Sanjay Patel
1c629279f1 revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280)
There are too many perf regressions resulting from this, so we need to 
investigate (and add tests for) targets like ARM and AArch64 before 
trying to reinstate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325658 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-21 01:42:52 +00:00
Alexey Bataev
771994be2d [SLP] Fix tests checks, NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325605 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-20 18:11:50 +00:00