Demonstrate a missed opportunity for the urem -> and combine with power-of-2-or-zero non-uniform constant divisors
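For reference, the scalar identity behind the combine can be sketched in plain C++: for an unsigned x and a power-of-two divisor d, x urem d equals x and (d - 1). The helper names below are hypothetical. The sketch covers only the power-of-two lanes and does not model the vector combine itself or the zero-divisor lanes.

```cpp
#include <cstdint>

// Hypothetical helper: true if d is a nonzero power of two.
constexpr bool isPowerOf2(std::uint32_t d) {
  return d != 0 && (d & (d - 1)) == 0;
}

// Reference form: plain unsigned remainder. The caller must pass d != 0.
constexpr std::uint32_t uremReference(std::uint32_t x, std::uint32_t d) {
  return x % d;
}

// Masked form: equal to the reference only when d is a power of two.
constexpr std::uint32_t uremAsAnd(std::uint32_t x, std::uint32_t d) {
  return x & (d - 1);
}
```

Applied per lane to a non-uniform divisor vector such as <4, 8, 16>, each lane would use its own mask (3, 7, 15).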
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288510 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's
builtins (twice: once in r288412 and once in r288497). We should investigate
this offline.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288508 91177308-0d34-0410-b5e6-96231b3b80d8
When trying to vectorize trees that start at insertelement instructions
function tryToVectorizeList() uses vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree
cost for this fixed vectorization factor is too high.
Patch tries to improve the situation. It tries different vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries
to choose the best one.
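A minimal sketch of the search described above, not the actual tryToVectorizeList() code. It assumes the candidate factors are visited by halving and that "best" means lowest cost as reported by a caller-supplied callback. Both are assumptions for illustration, and the names pickBestVF and powerOf2Floor are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Largest power of two that is <= n (0 when n == 0).
constexpr std::uint32_t powerOf2Floor(std::uint32_t n) {
  std::uint32_t p = 0;
  for (std::uint32_t b = 1; b != 0 && b <= n; b <<= 1)
    p = b;
  return p;
}

// Walk the factors from max(powerOf2Floor(numValues), minVF) down to minVF,
// halving each step (assumed), and keep the one with the lowest cost.
// On a tie the larger factor wins because the comparison is strict.
template <typename CostFn>
std::uint32_t pickBestVF(std::uint32_t numValues, std::uint32_t minVF,
                         CostFn cost) {
  assert(minVF != 0 && "minimum vectorization factor must be nonzero");
  const std::uint32_t maxVF = std::max(powerOf2Floor(numValues), minVF);
  std::uint32_t bestVF = maxVF;
  int bestCost = cost(maxVF);
  for (std::uint32_t vf = maxVF / 2; vf >= minVF && vf != 0; vf /= 2) {
    const int c = cost(vf);
    if (c < bestCost) {
      bestCost = c;
      bestVF = vf;
    }
  }
  return bestVF;
}
```

With 10 values and a minimum factor of 2, the candidates under these assumptions are 8, 4 and 2.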
Differential Revision: https://reviews.llvm.org/D27215
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288497 91177308-0d34-0410-b5e6-96231b3b80d8
The instcombine code which folds loads and stores into their use types can trip up if the use is a bitcast to a type which we can't directly load or store in the IR. In principle, such types shouldn't exist, but in practice they do today. This is a workaround to avoid a bug while we work towards the long term goal.
Differential Revision: https://reviews.llvm.org/D24365
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288415 91177308-0d34-0410-b5e6-96231b3b80d8
When trying to vectorize trees that start at insertelement instructions
function tryToVectorizeList() uses vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree
cost for this fixed vectorization factor is too high.
Patch tries to improve the situation. It tries different vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries
to choose the best one.
Differential Revision: https://reviews.llvm.org/D27215
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288412 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, when the cost of scalar operations is evaluated, the vector type
is used for the scalar operations. This patch fixes that and also fixes the
evaluation of the vector operations cost.
Several tests showed that the vector cost model is too optimistic. It
allowed vectorization of 8 or fewer add/fadd operations, though scalar
code is faster. Actually, vector code provides better performance only for
16 or more operations.
Differential Revision: https://reviews.llvm.org/D26277
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288398 91177308-0d34-0410-b5e6-96231b3b80d8
If LoopInfo is available during GVN, BasicAA will use it. However
MergeBlockIntoPredecessor does not update LI as it merges blocks.
This didn't use to cause problems because LI was freed before
GVN/BasicAA. Now with OptimizationRemarkEmitter, the lifetime of LI is
extended so LI needs to be kept up-to-date during GVN.
Differential Revision: https://reviews.llvm.org/D27288
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288307 91177308-0d34-0410-b5e6-96231b3b80d8
This implements PGO-driven loop peeling.
The basic idea is that when the average dynamic trip-count of a loop is known,
based on PGO, to be low, we can expect a performance win by peeling off the
first several iterations of that loop.
Unlike unrolling based on a known trip count, or a trip count multiple, this
doesn't save us the conditional check and branch on each iteration. However,
it does allow us to simplify the straight-line code we get (constant-folding,
etc.). This is important given that we know that we will usually only hit this
code, and not the actual loop.
This is currently disabled by default.
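To illustrate the effect, here is a hand-peeled example in plain C++ (hypothetical function names, peel count of one chosen arbitrarily, where the pass would derive it from profile data). The exit check is still executed for the peeled iteration, but the first-iteration test inside the body folds away.

```cpp
#include <cstddef>
#include <vector>

// Original loop: every iteration re-tests whether it is the first one.
inline int joinedWidth(const std::vector<int>& widths, int sep) {
  int total = 0;
  for (std::size_t i = 0; i < widths.size(); ++i) {
    if (i != 0)
      total += sep;
    total += widths[i];
  }
  return total;
}

// Same computation with the first iteration peeled off by hand.
inline int joinedWidthPeeled(const std::vector<int>& widths, int sep) {
  int total = 0;
  std::size_t i = 0;
  if (i < widths.size()) {  // the loop's exit check is still performed
    total += widths[i];     // "i != 0" is known false here and folds away
    ++i;
  }
  for (; i < widths.size(); ++i) {
    total += sep;           // i >= 1 here, so the test is known true
    total += widths[i];
  }
  return total;
}
```

When the trip count is usually one, only the straight-line peeled code runs and the residual loop is rarely entered.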
Differential Revision: https://reviews.llvm.org/D25963
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288274 91177308-0d34-0410-b5e6-96231b3b80d8
We had a limited version of this for scalar 'and'; this expands
the transform to 'or' and 'xor' and allows vector types too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288273 91177308-0d34-0410-b5e6-96231b3b80d8
Michel Dänzer reported that r288051, "[StructurizeCFG] Use range-based
for loops", introduced a bug into rebuildSSA, wherein we were iterating
over an instruction's use list while modifying it, without taking care
to do this correctly.
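The general hazard can be sketched with a std::list standing in for the use list (an assumption for illustration, this is not the rebuildSSA code). The safe pattern advances the iterator before the current entry is modified, so the loop never steps through an entry that has already been unlinked.

```cpp
#include <list>

// Stand-in for a use list: removes every even entry while walking the list.
// The iterator is advanced first, so erasing the current node never
// invalidates the iterator the loop continues with.
inline int eraseEvenUses(std::list<int>& uses) {
  int erased = 0;
  for (auto it = uses.begin(), end = uses.end(); it != end;) {
    auto cur = it++;      // step to the next entry first ...
    if (*cur % 2 == 0) {  // ... then mutate the current one
      uses.erase(cur);
      ++erased;
    }
  }
  return erased;
}
```

A range-based for loop hides the increment, which is why it is easy to introduce this kind of bug when converting an explicit iterator loop.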
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288200 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r288046.
Trying to see if the revert fixes a compiler crash during a stage2 LTO
build with a GVN backtrace.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288179 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r288047.
Trying to see if the revert fixes a compiler crash during a stage2 LTO
build with a GVN backtrace.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288178 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r288090.
Trying to see if the revert fixes a compiler crash during a stage2 LTO
build with a GVN backtrace.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288177 91177308-0d34-0410-b5e6-96231b3b80d8
Currently the SLP vectorizer tries to vectorize a binary operation and gives
up immediately after the first unsuccessful attempt. This patch
tries to improve the situation by trying to vectorize all binary
operations of all children nodes in the binop tree.
Differential Revision: https://reviews.llvm.org/D25517
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288115 91177308-0d34-0410-b5e6-96231b3b80d8
It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line
670 ("Expected irreducible CFG").
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288052 91177308-0d34-0410-b5e6-96231b3b80d8
In r286814, the algorithm for calculating inline costs changed. This
caused more inlining to take place which is especially apparent
in optsize and minsize modes.
As the cost calculation removed a skewed behaviour (we were inconsistent
about the cost of calls) it isn't possible to update the thresholds to
get exactly the same behaviour as before. However, this threshold change
accounts for the very common case where an inline candidate has no
calls within it. In this case, r286814 would inline around 5-6 more (IR)
instructions.
The changes to -Oz have been heavily benchmarked. The "obvious" value
for the inline threshold at -Oz is zero, but due to inaccuracies in the
inline heuristics this can actually cause code size increases due to
not inlining key thunk functions (that then disappear). Experimentally,
5 was the sweet spot for code size over the test-suite.
For -Os, this change removes the outlier results shown up by green dragon
(http://104.154.54.203/db_default/v4/nts/13248).
Fixes D26848.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288024 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The iterative algorithm for Loop Unswitching may render some of the branches unreachable in the unswitched loops.
Given the exponential nature of the algorithm, this is quite an overhead.
This patch fixes this problem by selectively unswitching only those branches within a loop that are reachable from the loop header.
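The reachability filter can be pictured as a plain graph walk from the loop header. The sketch below is only an illustration under assumed data structures (blocks as indices, successors as adjacency lists). It is not the LoopUnswitch implementation and does not model how known branch conditions prune successors.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// succs[b] lists the successor block indices of block b (hypothetical CFG
// encoding). Returns a flag per block: true if reachable from `header`.
inline std::vector<bool> reachableFrom(
    std::size_t header, const std::vector<std::vector<std::size_t>>& succs) {
  std::vector<bool> seen(succs.size(), false);
  if (header >= succs.size())
    return seen;
  std::queue<std::size_t> work;
  seen[header] = true;
  work.push(header);
  while (!work.empty()) {
    const std::size_t b = work.front();
    work.pop();
    for (std::size_t s : succs[b]) {
      if (s < succs.size() && !seen[s]) {
        seen[s] = true;
        work.push(s);
      }
    }
  }
  return seen;
}
```

Branches in blocks whose flag is false would then be skipped as unswitching candidates.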
Reviewers: Michael Zolotukhin, Anna Thomas, Weiming Zhao.
Subscribers: llvm-commits.
Differential Revision: http://reviews.llvm.org/D26299
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287925 91177308-0d34-0410-b5e6-96231b3b80d8
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287882 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The "getVectorizablePrefix" method would give up if it found an aliasing load for a store chain.
In practice, the aliasing load can be treated as a memory barrier and all stores that precede it
are a valid vectorizable prefix.
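A simplified sketch of the barrier idea, assuming the chain is already classified into stores and aliasing loads (the MemOp enum and function name are hypothetical, and the real method works on instructions with alias analysis):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical classification of the memory operations in program order.
enum class MemOp { Store, AliasingLoad };

// Instead of giving up when an aliasing load is found, treat it as a barrier:
// the stores that precede it still form a valid prefix. Returns the number of
// leading stores before the first aliasing load.
inline std::size_t vectorizablePrefixLength(const std::vector<MemOp>& chain) {
  std::size_t n = 0;
  while (n < chain.size() && chain[n] == MemOp::Store)
    ++n;
  return n;
}
```

For a chain of store, store, aliasing load, store, the first two stores remain candidates for vectorization and the trailing store is left out.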
Issue found by volkan in D26962. Testcase is a pruned version of the one in the original patch.
Reviewers: jlebar, arsenm, tstellarAMD
Subscribers: mzolotukhin, wdng, nhaehnle, anna, volkan, llvm-commits
Differential Revision: https://reviews.llvm.org/D27008
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287781 91177308-0d34-0410-b5e6-96231b3b80d8
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287762 91177308-0d34-0410-b5e6-96231b3b80d8
Without this test, you can just remove the code fixing the
switch to the first constant in ResolvedUndefs and everything
passes. This test, instead, fails with an assertion if the code
is removed. Found while refactoring SCCP to integrate undef in
the solver.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287731 91177308-0d34-0410-b5e6-96231b3b80d8
If there is no debug info in the callee, inlining it will not help the annotator. This avoids an infinite loop as reported in PR/31119.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287710 91177308-0d34-0410-b5e6-96231b3b80d8