The removed assert seems bogus - it's perfectly legal for the roots of the
vectorized subtrees to be equal even if the original scalar values aren't,
if the original scalars happen to be equivalent.
This fixes PR31599.
Differential Revision: https://reviews.llvm.org/D28539
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291692 91177308-0d34-0410-b5e6-96231b3b80d8
This patch ensures the correct minimum bit width during type-shrinking.
Previously when type-shrinking, we always sign-extended values back to their
original width. However, if we are going to sign-extend, and the sign bit is
unknown, we have to increase the minimum bit width by one bit so the
sign-extend will fill the upper bits correctly. If the sign bit is known to be
zero, we can perform a zero-extend instead. This should fix PR31243.
Reference: https://llvm.org/bugs/show_bug.cgi?id=31243
Differential Revision: https://reviews.llvm.org/D27466
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289470 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.
Other than some basic sanity checks (e.g. we get constant integers where
we expect constant integers), this checks:
- That by the time an struct access tuple `(base-type, offset)` is
"reduced" to a scalar base type, the offset is `0`. For instance, in
C++ you can't start from, say `("struct-a", 16)`, and end up with
`("int", 4)` -- by the time the base type is `"int"`, the offset
better be zero. In particular, a variant of this invariant is needed
for `llvm::getMostGenericTBAA` to be correct.
- That there are no cycles in a struct path.
- That struct type nodes have their offsets listed in an ascending
order.
- That when generating the struct access path, you eventually reach the
access type listed in the tbaa tag node.
Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D26438
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289402 91177308-0d34-0410-b5e6-96231b3b80d8
When trying to vectorize trees that start at insertelement instructions
function tryToVectorizeList() uses vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree
cost for this fixed vectorization factor is too high.
Patch tries to improve the situation. It tries different vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries
to choose the best one.
Differential Revision: https://reviews.llvm.org/D27215
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289043 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's
builtins (twice: once in r288412 and once in r288497). We should investigate
this offline.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288508 91177308-0d34-0410-b5e6-96231b3b80d8
When trying to vectorize trees that start at insertelement instructions
function tryToVectorizeList() uses vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree
cost for this fixed vectorization factor is too high.
Patch tries to improve the situation. It tries different vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries
to choose the best one.
Differential Revision: https://reviews.llvm.org/D27215
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288497 91177308-0d34-0410-b5e6-96231b3b80d8
When trying to vectorize trees that start at insertelement instructions
function tryToVectorizeList() uses vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree
cost for this fixed vectorization factor is too high.
Patch tries to improve the situation. It tries different vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries
to choose the best one.
Differential Revision: https://reviews.llvm.org/D27215
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288412 91177308-0d34-0410-b5e6-96231b3b80d8
Currently when cost of scalar operations is evaluated the vector type is
used for scalar operations. Patch fixes this issue and fixes evaluation
of the vector operations cost.
Several test showed that vector cost model is too optimistic. It
allowed vectorization of 8 or less add/fadd operations, though scalar
code is faster. Actually, only for 16 or more operations vector code
provides better performance.
Differential Revision: https://reviews.llvm.org/D26277
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288398 91177308-0d34-0410-b5e6-96231b3b80d8
Currently SLP vectorizer tries to vectorize a binary operation and dies
immediately after unsuccessful the first unsuccessfull attempt. Patch
tries to improve the situation, trying to vectorize all binary
operations of all children nodes in the binop tree.
Differential Revision: https://reviews.llvm.org/D25517
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288115 91177308-0d34-0410-b5e6-96231b3b80d8
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287882 91177308-0d34-0410-b5e6-96231b3b80d8
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287762 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This extends FCOPYSIGN support to 512-bit vectors.
I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.
Reviewers: delena, zvi, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26791
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287298 91177308-0d34-0410-b5e6-96231b3b80d8
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.
This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful.
Differential Revision: https://reviews.llvm.org/D25910
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286233 91177308-0d34-0410-b5e6-96231b3b80d8
After successfull horizontal reduction vectorization attempt for PHI node
vectorizer tries to update root binary op by combining vectorized tree
and the ReductionPHI node. But during vectorization this ReductionPHI
can be vectorized itself and replaced by the `undef` value, while the
instruction itself is marked for deletion. This 'marked for deletion'
PHI node then can be used in new binary operation, causing "Use still
stuck around after Def is destroyed" crash upon PHI node deletion.
Also the test is fixed to make it perform actual testing.
Differential Revision: https://reviews.llvm.org/D25671
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285286 91177308-0d34-0410-b5e6-96231b3b80d8
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.
This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.
It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.
Differential Revision: https://reviews.llvm.org/D23808
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284459 91177308-0d34-0410-b5e6-96231b3b80d8
Fixed copy+paste vector alignment to correct for per-element scalar loads
Increased to 512-bit data sizes in preparation of avx512 tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283748 91177308-0d34-0410-b5e6-96231b3b80d8
unrolling.
The next code is not vectorized by the SLPVectorizer:
```
int test(unsigned int *p) {
int sum = 0;
for (int i = 0; i < 8; i++)
sum += p[i];
return sum;
}
```
During optimization this loop is fully unrolled and SLPVectorizer is
unable to vectorize it. Patch tries to fix this problem.
Differential Revision: https://reviews.llvm.org/D24796
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283535 91177308-0d34-0410-b5e6-96231b3b80d8
When vectorizing a tree rooted at a store bundle, we currently try to sort the
stores before building the tree, so that the stores can be vectorized. For other
trees, the order of the root bundle - which determines the order of all other
bundles - is arbitrary. That is bad, since if a leaf bundle of consecutive loads
happens to appear in the wrong order, we will not vectorize it.
This is partially mitigated when the root is a binary operator, by trying to
build a "reversed" tree when that's considered profitable. This patch extends the
workaround we have for binops to trees rooted in a horizontal reduction.
This fixes PR28474.
Differential Revision: https://reviews.llvm.org/D22554
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276477 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds costs for the vectorized implementations of CTPOP, the default values were seriously underestimating the cost of these and was encouraging vectorization on targets where serialized use of POPCNT would be much better.
Differential Revision: https://reviews.llvm.org/D22456
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276104 91177308-0d34-0410-b5e6-96231b3b80d8