D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand.
This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation.
The AArch64 regression will be fixed by D48172.
Differential Revision: https://reviews.llvm.org/D48174
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335130 91177308-0d34-0410-b5e6-96231b3b80d8
The getArithmeticInstrCost calls for shuffle vectors entry costs specify TargetTransformInfo::OperandValueKind arguments, but are just using the method's default values. This seems to be a copy + paste issue and doesn't affect the costs in anyway. The TargetTransformInfo::OperandValueProperties default arguments are already not being used.
Noticed while working on D47985.
Differential Revision: https://reviews.llvm.org/D48008
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335045 91177308-0d34-0410-b5e6-96231b3b80d8
Ensure we keep track of the input vectors in all cases instead of just for SK_Select.
Ideally we'd reuse the shuffle mask pattern matching in TargetTransformInfo::getInstructionThroughput here to easily add support for all TargetTransformInfo::ShuffleKind without mass code duplication, I've added a TODO for now but D48236 should help us here.
Differential Revision: https://reviews.llvm.org/D48023
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334958 91177308-0d34-0410-b5e6-96231b3b80d8
This is part of the work to cleanup use of 'alternate' ops so we can use the more general SK_Select shuffle type.
Only getSameOpcode calls getMainOpcode and much of the logic is repeated in both functions. This will require some reworking of D28907 but that patch has hit trouble and is unlikely to be completed anytime soon.
Differential Revision: https://reviews.llvm.org/D48120
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334701 91177308-0d34-0410-b5e6-96231b3b80d8
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:
e.g. v4f32: <0,5,2,7> or <4,1,6,3>
This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline:
e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.
This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.
Differential Revision: https://reviews.llvm.org/D47985
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334513 91177308-0d34-0410-b5e6-96231b3b80d8
Currently SmallSet<PointerTy> inherits from SmallPtrSet<PointerTy>. This
patch replaces such types with SmallPtrSet, because IMO it is slightly
clearer and allows us to get rid of unnecessarily including SmallSet.h
Reviewers: dblaikie, craig.topper
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D47836
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334492 91177308-0d34-0410-b5e6-96231b3b80d8
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332240 91177308-0d34-0410-b5e6-96231b3b80d8
Inspired by r331508, I did a grep and found these.
Mostly just change from dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool, so those I changed to isa.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331577 91177308-0d34-0410-b5e6-96231b3b80d8
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331272 91177308-0d34-0410-b5e6-96231b3b80d8
We use getExtractWithExtendCost to calculate the cost of extractelement and
s|zext together when computing the extract cost after vectorization, but we
calculate the cost of extractelement and s|zext separately when computing the
scalar cost which is larger than it should be.
Differential Revision: https://reviews.llvm.org/D45469
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330143 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.
Patch does not support reordering of the repeated instruction, this must
be handled in the separate patch.
Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43776
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329085 91177308-0d34-0410-b5e6-96231b3b80d8
The primary issue here is that using NDEBUG alone isn't enough to guard
debug printing -- instead the DEBUG() macro needs to be used so that the
specific pass debug logging check is employed. Without this, every
asserts-enabled build was printing out information when it hit this.
I also fixed another place where we had multiple statements in a DEBUG
macro to use {}s to be a bit cleaner. And I fixed a place that used
errs() rather than dbgs().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329082 91177308-0d34-0410-b5e6-96231b3b80d8
The primary issue here is that using NDEBUG alone isn't enough to guard
debug printing -- instead the DEBUG() macro needs to be used so that the
specific pass debug logging check is employed. Without this, every
asserts-enabled build was printing out information when it hit this.
I also fixed another place where we had multiple statements in a DEBUG
macro to use {}s to be a bit cleaner. And I fixed a place that used
`errs()` rather than `dbgs()`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329046 91177308-0d34-0410-b5e6-96231b3b80d8
We use two approaches for determining the minimum bitwidth.
* Demanded bits
* Value tracking
If demanded bits doesn't result in a narrower type, we then try value tracking.
We need this if we want to root SLP trees with the indices of getelementptr
instructions since all the bits of the indices are demanded.
But there is a missing piece though. We need to be able to distinguish "demanded
and shrinkable" from "demanded and not shrinkable". For example, the bits of %i
in
%i = sext i32 %e1 to i64
%gep = getelementptr inbounds i64, i64* %p, i64 %i
are demanded, but we can shrink %i's type to i32 because it won't change the
result of the getelementptr. On the other hand, in
%tmp15 = sext i32 %tmp14 to i64
%tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0
it doesn't make sense to shrink %tmp15 and we can skip the value tracking.
Ideas are from Matthew Simpson!
Differential Revision: https://reviews.llvm.org/D44868
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329035 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.
Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43776
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328980 91177308-0d34-0410-b5e6-96231b3b80d8
When building the SLP tree, we look for reuse among the vectorized tree
entries. However, each gather sequence is represented by a unique tree entry,
even though the sequence may be identical to another one. This means, for
example, that a gather sequence with two uses will be counted twice when
computing the cost of the tree. We should only count the cost of the definition
of a gather sequence rather than its uses. During code generation, the
redundant gather sequences are emitted, but we optimize them away with CSE. So
it looks like this problem just affects the cost model.
Differential Revision: https://reviews.llvm.org/D44742
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328316 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Reversed loads are handled as gathering. But we can just reshuffle
these values. Patch adds support for vectorization of reversed loads.
Reviewers: RKSimon, spatel, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43022
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325134 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
For better vectorization result we should take into consideration the
cost of the user insertelement instructions when we try to
vectorize sequences that build the whole vector. I.e. if we have the
following scalar code:
```
<Scalar code>
insertelement <ScalarCode>, ...
```
we should consider the cost of the last `insertelement ` instructions as
the cost of the scalar code.
Reviewers: RKSimon, spatel, hfinkel, mkuper
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D42657
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@324893 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323662 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323530 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323441 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323430 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323348 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323246 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the vectorized tree has truncate to minimum required bit width and
the vector type of the cast operation after the truncation is the same
as the vector type of the cast operands, count cost of the vector cast
operation as 0, because this cast will be later removed.
Also, if the vectorization tree root operations are integer cast operations, do not consider them as candidates for truncation. It will just create extra number of the same vector/scalar operations, which will be removed by instcombiner.
Reviewers: RKSimon, spatel, mkuper, hfinkel, mssimpso
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41948
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322946 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Sometimes vectorization of insertelement instructions with extractelement operands may produce an extra shuffle operation, if these operands are in the reverse order. Patch tries to improve this situation by the reordering of the operands to remove this extra shuffle operation.
Reviewers: mkuper, hfinkel, RKSimon, spatel
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D33954
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322579 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Fixes the bug with incorrect handling of InsertValue|InsertElement
instrucions in SLP vectorizer. Currently, we may use incorrect
ExtractElement instructions as the operands of the original
InsertValue|InsertElement instructions.
Reviewers: mkuper, hfinkel, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41767
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321994 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the vectorized value is marked as extra reduction argument, its users
are not considered as external users. Patch fixes this.
Reviewers: mkuper, hfinkel, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41786
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321993 91177308-0d34-0410-b5e6-96231b3b80d8
The approach was never discussed, I wasn't able to reproduce this
non-determinism, and the original author went AWOL.
After a discussion on the ML, Philip suggested to revert this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321974 91177308-0d34-0410-b5e6-96231b3b80d8
In SLPVectorizer, the vector build instructions (insertvalue for aggregate type) is passed to BoUpSLP.buildTree, it is treated as UserIgnoreList, so later in cost estimation, the cost of these instructions are not counted.
For aggregate value, later usage are more likely to be done in scalar registers, either used as individual scalars or used as a whole for function call or return value. Ignore scalar extraction instructions may cause too aggressive vectorization for aggregate values, and slow down performance. So for vectorization of aggregate value, the scalar extraction instructions are required in cost estimation.
Differential Revision: https://reviews.llvm.org/D41139
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320736 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: mgrang, dcaballe, hans, mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320548 91177308-0d34-0410-b5e6-96231b3b80d8