Fixes a 35% degradation compared to unvectorized code in
MiBench/automotive-susan and an equally serious regression on a private
image processing benchmark.
radar://14351991
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186188 91177308-0d34-0410-b5e6-96231b3b80d8
against a constant."
This reverts commit r186107. It didn't handle wrapping arithmetic in the
loop correctly and thus caused the following C program to count from
0 to UINT64_MAX instead of from 0 to 255 as intended:
#include <stdio.h>
int main() {
unsigned char first = 0, last = 255;
do { printf("%d\n", first); } while (first++ != last);
}
Full test case and instructions to reproduce with just the -indvars pass
sent to the original review thread rather than to r186107's commit.
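The intended trip count can be checked in isolation. The sketch below uses a hypothetical helper, count_iterations (not part of the commit or its test case), that keeps the same do/while shape but counts iterations instead of printing. With 8-bit wrapping the loop should stop after 256 iterations.

```c
#include <assert.h>

/* Hypothetical helper: same loop shape as the program above, but it
   counts iterations instead of printing. */
unsigned count_iterations(unsigned char first, unsigned char last) {
    unsigned n = 0;
    do {
        ++n;                      /* stands in for the printf call */
    } while (first++ != last);    /* compares the old value; 255 wraps to 0 */
    return n;
}

/* Self-check for the two interesting cases. */
void check_count_iterations(void) {
    assert(count_iterations(0, 255) == 256);  /* 0..255 inclusive */
    assert(count_iterations(250, 3) == 10);   /* 250..255, then 0..3 */
}
```

Any rewrite of the exit condition has to preserve these counts, including the case where the counter wraps before reaching the exit value.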
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186152 91177308-0d34-0410-b5e6-96231b3b80d8
Before we could vectorize PHINodes, scanning successors was a good way of finding candidates. Now we can vectorize the PHI nodes directly, which is simpler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186139 91177308-0d34-0410-b5e6-96231b3b80d8
Patch by Michele Scandale!
Adds special handling for the case where, during loop exit
condition rewriting, the exit value is a constant whose bitwidth is lower
than the type of the induction variable: instead of introducing a
trunc operation to make the operand types match, it
converts the constant value to an equivalent constant,
depending on the initial value of the induction variable and the trip
count, in order to have an equivalent comparison between the induction
variable and the new constant.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186107 91177308-0d34-0410-b5e6-96231b3b80d8
We can vectorize them because in the case where we wrap in the address space the
unvectorized code would have had to access a pointer value of zero which is
undefined behavior in address space zero according to the LLVM IR semantics.
(Thank you Duncan, for pointing this out to me).
Fixes PR16592.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186088 91177308-0d34-0410-b5e6-96231b3b80d8
predecessors of the two blocks it is attempting to merge supply the
same incoming values to any phi in the successor block. This change
allows merging in the case where there is one or more incoming values
that are undef. The undef values are rewritten to match the non-undef
value that flows from the other edge. Patch by Mark Lacey.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186069 91177308-0d34-0410-b5e6-96231b3b80d8
Without the changes introduced by this patch, if TRE saw any allocas at all,
TRE would not perform TRE *or* mark callsites with the tail marker.
Because TRE runs after mem2reg, this inadequacy is not a death sentence. But
given a callsite A without escaping alloca argument, A may not be able to have
the tail marker placed on it due to a separate callsite B having a write-back
parameter passed in via an argument with the nocapture attribute.
Assume that B is the only other callsite besides A and B only has nocapture
escaping alloca arguments (*NOTE* B may have other arguments that are not passed
allocas). In this case not marking A with the tail marker is unnecessarily
conservative since:
1. By assumption A has no escaping alloca arguments itself so it can not
access the caller's stack via its arguments.
2. Since all of B's escaping alloca arguments are passed as parameters with
the nocapture attribute, we know that B does not stash said escaping
allocas in a manner that outlives B itself and thus could be accessed
indirectly by A.
With the changes introduced by this patch:
1. If we see any escaping allocas passed as a capturing argument, we do
nothing and bail early.
2. If we do not see any escaping allocas passed as captured arguments but we
do see escaping allocas passed as nocapture arguments:
i. We do not perform TRE to avoid PR962 since the code generator produces
significantly worse code for the dynamic allocas that would be created
by the TRE algorithm.
ii. If we do not return twice, mark call sites without escaping allocas
with the tail marker. *NOTE* This excludes functions with escaping
nocapture allocas.
3. If we do not see any escaping allocas at all (whether captured or not):
i. If we do not have usage of setjmp, mark all callsites with the tail
marker.
ii. If there are no dynamic/variable sized allocas in the function,
attempt to perform TRE on all callsites in the function.
Based off of a patch by Nick Lewycky.
rdar://14324281.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186057 91177308-0d34-0410-b5e6-96231b3b80d8
(add nsw x, (and x, y)) isn't a power of two if x is zero, it's zero
(add nsw x, (xor x, y)) isn't a power of two if y has bits set that aren't set in x
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185954 91177308-0d34-0410-b5e6-96231b3b80d8
The following transforms are valid if -C is a power of 2:
(icmp ugt (xor X, C), ~C) -> (icmp ult X, C)
(icmp ult (xor X, C), -C) -> (icmp uge X, C)
These are nice, they get rid of the xor.
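Both folds can be checked exhaustively at a small bit width. The sketch below is an 8-bit model in C, not InstCombine code; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if both folds agree with the original compares for every
   8-bit X, for the given constant C. Illustrative model only. */
int xor_folds_hold(uint8_t c) {
    uint8_t not_c = (uint8_t)~c;
    uint8_t neg_c = (uint8_t)(0u - c);
    for (unsigned i = 0; i < 256; ++i) {
        uint8_t x = (uint8_t)i;
        uint8_t t = (uint8_t)(x ^ c);
        if ((t > not_c) != (x < c))  return 0;  /* icmp ugt fold */
        if ((t < neg_c) != (x >= c)) return 0;  /* icmp ult fold */
    }
    return 1;
}

void check_xor_folds(void) {
    assert(xor_folds_hold(0x80));   /* -C == 0x80, a power of 2 */
    assert(xor_folds_hold(0xF0));   /* -C == 0x10, a power of 2 */
    assert(!xor_folds_hold(0x0F));  /* -C == 0xF1, not a power of 2 */
}
```

The failing case shows why the power-of-2 precondition on -C matters: with C = 0x0F the first fold already disagrees at X = 0.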
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185915 91177308-0d34-0410-b5e6-96231b3b80d8
Back in r179493 we determined that two transforms collided with each
other. The fix back then was to reorder the transforms so that the
preferred transform would give it a try and then we would try the
secondary transform. However, it was noted that the best approach would
canonicalize one transform into the other, removing the collision and
allowing us to optimize IR given to us in that form.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185808 91177308-0d34-0410-b5e6-96231b3b80d8
This is a complete re-write of the bottom-up vectorization class.
Before this commit we scanned the instruction tree 3 times. First in search of merge points for the trees. Second, for estimating the cost. And finally for vectorization.
There was a lot of code duplication and adding the DCE exposed bugs. The new design is simpler and DCE was a part of the design.
In this implementation we build the tree once. After that we estimate the cost by scanning the different entries in the constructed tree (in any order). The vectorization phase also works on the built tree.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185774 91177308-0d34-0410-b5e6-96231b3b80d8
functions. Make the function attributes pass add it to known library functions
and when it can deduce it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185735 91177308-0d34-0410-b5e6-96231b3b80d8
This transform was originally added in r185257 but later removed in
r185415. The original transform would create instructions speculatively
and then discard them if the speculation was proved incorrect. This has
been replaced with a scheme that splits the transform into two parts:
preflight and fold. While we preflight, we build up fold actions that
inform the folding stage on how to act.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185667 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to create switches even if instcombine has munged two of the
incoming compares into one and some bit twiddling. This was motivated by enum
compares that are common in clang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185632 91177308-0d34-0410-b5e6-96231b3b80d8
This implies annotating it as nounwind and its arguments as nocapture. To be
conservative, we do not annotate the arguments with noalias since some platforms
do not have restrict on the declaration for gettimeofday.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185502 91177308-0d34-0410-b5e6-96231b3b80d8
I'm reverting this commit because:
1. As discussed during review, it needs to be rewritten (to avoid creating and
then deleting instructions).
2. This is causing optimizer crashes. Specifically, I'm seeing things like
this:
While deleting: i1 %
Use still stuck around after Def is destroyed: <badref> = select i1 <badref>, i32 0, i32 1
opt: /src/llvm-trunk/lib/IR/Value.cpp:79: virtual llvm::Value::~Value(): Assertion `use_empty() && "Uses remain when a value is destroyed!"' failed.
I'd guess that these will go away once we're no longer creating/deleting
instructions here, but just in case, I'm adding a regression test.
Because the code is being rewritten, I've just XFAIL'd the original regression test. Original commit message:
InstCombine: Be more aggressive optimizing 'udiv' instrs with 'select' denoms
Real world code sometimes has the denominator of a 'udiv' be a
'select'. LLVM can handle such cases but only when the 'select'
operands are symmetric in structure (both select operands are a constant
power of two or a left shift, etc.). This falls apart if we are dealt a
'udiv' where the code is not symmetric or if the select operands lead us
to more select instructions.
Instead, we should treat the LHS and each select operand as a distinct
divide operation and try to optimize them independently. If we can
simplify each operation, then we can replace the 'udiv' with, say, a
'lshr' that has a new select with a bunch of new operands for the
select.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185415 91177308-0d34-0410-b5e6-96231b3b80d8
Math functions are marked as readonly because they read the floating point
rounding mode. Because we don't vectorize loops that would contain function
calls that set the rounding mode it is safe to ignore this memory read.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185299 91177308-0d34-0410-b5e6-96231b3b80d8
Inserting a zext or trunc is sufficient. This pattern is somewhat common in
LLVM's pointer mangling code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185270 91177308-0d34-0410-b5e6-96231b3b80d8
Changing the sign when comparing the base pointer would introduce all
sorts of unexpected things like:
%gep.i = getelementptr inbounds [1 x i8]* %a, i32 0, i32 0
%gep2.i = getelementptr inbounds [1 x i8]* %b, i32 0, i32 0
%cmp.i = icmp ult i8* %gep.i, %gep2.i
%cmp.i1 = icmp ult [1 x i8]* %a, %b
%cmp = icmp ne i1 %cmp.i, %cmp.i1
ret i1 %cmp
into:
%cmp.i = icmp slt [1 x i8]* %a, %b
%cmp.i1 = icmp ult [1 x i8]* %a, %b
%cmp = xor i1 %cmp.i, %cmp.i1
ret i1 %cmp
By preserving the original sign, we now get:
ret i1 false
This fixes PR16483.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185259 91177308-0d34-0410-b5e6-96231b3b80d8
Real world code sometimes has the denominator of a 'udiv' be a
'select'. LLVM can handle such cases but only when the 'select'
operands are symmetric in structure (both select operands are a constant
power of two or a left shift, etc.). This falls apart if we are dealt a
'udiv' where the code is not symmetric or if the select operands lead us
to more select instructions.
Instead, we should treat the LHS and each select operand as a distinct
divide operation and try to optimize them independently. If we can
simplify each operation, then we can replace the 'udiv' with, say, a
'lshr' that has a new select with a bunch of new operands for the
select.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185257 91177308-0d34-0410-b5e6-96231b3b80d8
We may, after other optimizations, find ourselves with IR that looks
like:
%shl = shl i32 1, %y
%cmp = icmp ult i32 %shl, 32
Instead, we should just compare the shift count:
%cmp = icmp ult i32 %y, 5
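The equivalence is easy to confirm for every in-range shift count. A minimal C sketch, with hypothetical function names, assuming the shift count is below the bit width (larger counts are undefined in C and do not produce a defined result in the IR either):

```c
#include <assert.h>
#include <stdint.h>

/* Original form: compare the shifted one against 32. */
int cmp_shifted_one(uint32_t y) { return (UINT32_C(1) << y) < 32u; }

/* Folded form: compare the shift count against log2(32). */
int cmp_shift_count(uint32_t y) { return y < 5u; }

/* Both forms agree for every shift count below the bit width. */
void check_shift_compare(void) {
    for (uint32_t y = 0; y < 32; ++y)
        assert(cmp_shifted_one(y) == cmp_shift_count(y));
}
```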
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185242 91177308-0d34-0410-b5e6-96231b3b80d8
To support this we have to insert 'extractelement' instructions to pick the right lane.
We had this functionality before but I removed it when we moved to the multi-block design because it was too complicated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185230 91177308-0d34-0410-b5e6-96231b3b80d8
- lit tests verify that each line of input LLVM IR gets a !dbg node and a
corresponding entry of metadata that contains the line number
- unit tests verify that DebugIR works as advertised in the interface
- refactored some useful IR generation functionality from the MCJIT unit tests
so it can be reused
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185212 91177308-0d34-0410-b5e6-96231b3b80d8
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify. For cases where we know the type of a DI metadata, use
assert.
Also update testing cases to make them conform to the format of DI classes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185135 91177308-0d34-0410-b5e6-96231b3b80d8
Use vectorized instruction instead of original instruction anchored in the
original loop.
Fixes PR16452 and t2075.c of PR16455.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185081 91177308-0d34-0410-b5e6-96231b3b80d8
When we store values for reversed induction stores we must not store the
reversed value in the vectorized value map. Another instruction might use this
value.
This fixes 3 test cases of PR16455.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185051 91177308-0d34-0410-b5e6-96231b3b80d8
The Builtin attribute is an attribute that can be placed on a function call site to signal that even though a function is declared as being a builtin,
rdar://problem/13727199
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185049 91177308-0d34-0410-b5e6-96231b3b80d8
When a 1-element vector alloca is promoted, a store instruction can often be
rewritten without converting the value to a scalar and using an insertelement
instruction to stuff it into the new alloca. This patch just adds a check
to skip that conversion when it is unnecessary. This turns out to be really
important for some ARM Neon operations where <1 x i64> is used to get around
the fact that i64 is not a legal type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184870 91177308-0d34-0410-b5e6-96231b3b80d8
This should hopefully have fixed the stage2/stage3 miscompare on the dragonegg
testers.
"LoopVectorize: Use the dependence test utility class
We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.
We can now vectorize loops with simple constant dependence distances.
for (i = 8; i < 256; ++i) {
a[i] = a[i+4] * a[i+8];
}
for (i = 8; i < 256; ++i) {
a[i] = a[i-4] * a[i-8];
}
We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us not to) in the test suite now. Results on x86-64 are a wash.
I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. A few TSVC loop
kernels also speed up.
radar://13681598"
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184724 91177308-0d34-0410-b5e6-96231b3b80d8
We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.
We can now vectorize loops with simple constant dependence distances.
for (i = 8; i < 256; ++i) {
a[i] = a[i+4] * a[i+8];
}
for (i = 8; i < 256; ++i) {
a[i] = a[i-4] * a[i-8];
}
We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us not to) in the test suite now. Results on x86-64 are a wash.
I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. A few TSVC loop
kernels also speed up.
radar://13681598
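Why the dependence distance bounds the usable vector width can be seen with a small simulation of the second loop. This is only an illustrative model in C, not the vectorizer's dependence test: chunked_loop is a hypothetical stand-in for a vector loop in which all lanes of a chunk are loaded before any lane is stored.

```c
#include <assert.h>
#include <string.h>

#define LEN 256

/* Scalar reference: a[i] = a[i-4] * a[i-8], a dependence distance of 4. */
static void scalar_loop(unsigned *a) {
    for (int i = 8; i < LEN; ++i)
        a[i] = a[i - 4] * a[i - 8];
}

/* Models a vector loop of width vf (vf <= 16 and vf divides 248):
   every lane of a chunk is computed before any lane is stored. */
static void chunked_loop(unsigned *a, int vf) {
    for (int i = 8; i < LEN; i += vf) {
        unsigned tmp[16];
        for (int l = 0; l < vf; ++l)
            tmp[l] = a[i + l - 4] * a[i + l - 8];
        for (int l = 0; l < vf; ++l)
            a[i + l] = tmp[l];
    }
}

/* Returns 1 if the chunked loop reproduces the scalar result. */
int matches_scalar(int vf) {
    unsigned x[LEN], y[LEN];
    for (int i = 0; i < LEN; ++i)
        x[i] = y[i] = (unsigned)i + 1u;   /* unsigned, so products wrap safely */
    scalar_loop(x);
    chunked_loop(y, vf);
    return memcmp(x, y, sizeof x) == 0;
}

void check_dependence_distance(void) {
    assert(matches_scalar(4));    /* width equal to the distance is fine */
    assert(!matches_scalar(8));   /* width 8 reads a[i-4] before it is updated */
}
```

The first loop (a[i+4], a[i+8]) only reads elements that have not been written yet, so in this model loading a chunk before storing it preserves the scalar result at any width.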
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184685 91177308-0d34-0410-b5e6-96231b3b80d8
Until now we detected the vectorizable tree and evaluated the cost of the
entire tree. With this patch we can decide to trim out branches of the tree
that are not profitable to vectorize.
Also, increase the max depth from 6 to 12. In the worst possible case, where all
of the code is made of diamond-shaped graphs, this can bring the cost to 2**10,
but diamonds are not very common.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184681 91177308-0d34-0410-b5e6-96231b3b80d8
Rewrote the SLP-vectorization as a whole-function vectorization pass. It is now able to vectorize chains across multiple basic blocks.
It still does not vectorize PHIs, but this should be easy to do now that we scan the entire function.
I removed the support for extracting values from trees.
We are now able to vectorize more programs, but there are some serious regressions in many workloads (such as flops-6 and mandel-2).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184647 91177308-0d34-0410-b5e6-96231b3b80d8
We collect gather sequences when we vectorize basic blocks. Gather sequences are excellent
hints for vectorization of other basic blocks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184444 91177308-0d34-0410-b5e6-96231b3b80d8
Prior to this change, the considered addressing modes could be invalid since the
maximum and minimum offsets were not taken into account.
This was causing an assertion failure.
The added test case exercises that behavior.
<rdar://problem/14199725> Assertion failed: (CurScaleCost >= 0 && "Legal
addressing mode has an illegal cost!")
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184341 91177308-0d34-0410-b5e6-96231b3b80d8
The type <3 x i8> is common in graphics and we want to be able to vectorize it.
This change accelerates bullet by 12% and 471_omnetpp by 5%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184317 91177308-0d34-0410-b5e6-96231b3b80d8
vectorizing loops with memory accesses to non-zero address spaces. It
simply dropped the AS info. Fixes PR16306.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184103 91177308-0d34-0410-b5e6-96231b3b80d8
This pass was assuming that if hasAddressTaken() returns false for a
function, the function's only uses are call sites. That's not true
because there can be references by BlockAddresses too.
Fix the pass to handle this case. Fix
BlockAddress::replaceUsesOfWithOnConstant() to allow a function's type
to be changed by RAUW'ing the function with a bitcast of the recreated
function.
Patch by Mark Seaborn.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183933 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of a custom implementation of replaceAllUsesWith, we just call
replaceAllUsesWith and recreate llvm.used and llvm.compiler-used.
This change is particularly interesting because it makes llvm see
through what clang is doing with static used functions in extern "C"
contexts. With this change, running clang -O2 in
extern "C" {
__attribute__((used)) static void foo() {}
}
produces
@llvm.used = appending global [1 x i8*] [i8* bitcast (void ()* @foo to
i8*)], section "llvm.metadata"
define internal void @foo() #0 {
entry:
ret void
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183756 91177308-0d34-0410-b5e6-96231b3b80d8
Variadic functions are particularly fragile in the face of ABI changes, so this
limits how much the pass changes them
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183625 91177308-0d34-0410-b5e6-96231b3b80d8
r183584 tries to derive some info from the code *AFTER* a call and apply
this derived info to the code *BEFORE* the call, which is not always safe
as the call in question may never return, and in this case, the derived
info is invalid.
Thanks to Duncan for pointing out this potential bug.
rdar://14073661
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183606 91177308-0d34-0410-b5e6-96231b3b80d8
The MemCpyOpt pass is capable of optimizing:
callee(&S); copy N bytes from S to D.
into:
callee(&D);
subject to some legality constraints.
An assertion is triggered when the compiler tries to evaluate "sizeof(typeof(D))",
while D is an opaque-typed, 'sret' formal argument of the function being compiled,
i.e. the signature of the function being compiled is something like this:
T caller(...,%opaque* noalias nocapture sret %D, ...)
The fix is that when we come across such a situation, instead of calling some
utility functions to get the size of D's type (which will crash), we simply
assume D has at least N bytes as implied by the copy instruction.
rdar://14073661
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183584 91177308-0d34-0410-b5e6-96231b3b80d8
IndVarSimplify is willing to move divide instructions outside of their
loop bodies if they are invariant of the loop. However, it may not be
safe to expand them if we do not know if they can trap.
Instead, check to see if it is not safe to expand the instruction and
skip the expansion.
This fixes PR16041.
Testcase by Rafael Ávila de Espíndola.
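The hazard is visible at the C level. The sketch below is only an analogy with hypothetical function names, not the PR16041 test case: the division in the loop body is never evaluated when the loop does not run, so moving it out of the loop is safe only under the same guard.

```c
#include <assert.h>

/* Original shape: the division only executes if the loop body runs. */
int sum_in_loop(int n, int x, int d) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += x / d;
    return s;
}

/* Hand-hoisted shape that stays safe: the division is guarded by the
   condition that guards the first iteration. */
int sum_hoisted_guarded(int n, int x, int d) {
    if (n <= 0)
        return 0;
    int q = x / d;          /* evaluated once, only when the loop would run */
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += q;
    return s;
}

void check_division_hoist(void) {
    /* With n == 0 the zero divisor is never used in either version. */
    assert(sum_in_loop(0, 7, 0) == 0);
    assert(sum_hoisted_guarded(0, 7, 0) == 0);
    assert(sum_in_loop(3, 7, 2) == sum_hoisted_guarded(3, 7, 2));
}
```

An unguarded hoist of x / d would divide by zero in the n == 0 case even though the original program never does. The commit avoids the problem by skipping the expansion rather than by adding a guard.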
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183239 91177308-0d34-0410-b5e6-96231b3b80d8
The problem this time seems to be a thinko. We were assuming that in the CFG
A
| \
| B
| /
C
speculating the basic block B would cause only the phi value for the B->C edge
to be speculated. That is not true, the phi's are semantically in the edges, so
if the A->B->C path is taken, any code needed for A->C is not executed and we
have to consider it too when deciding to speculate B.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183226 91177308-0d34-0410-b5e6-96231b3b80d8
PR16069 is an interesting case where an incoming value to a PHI is a
trap value while also being a 'ConstantExpr'.
We do not consider this case when performing the 'HoistThenElseCodeToIf'
optimization.
Instead, make our modifications more conservative if we detect that we
cannot transform the PHI to a select.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183152 91177308-0d34-0410-b5e6-96231b3b80d8
index greater than the size of the vector is invalid. The shuffle may be
shrinking the size of the vector. Fixes a crash!
Also drop the maximum recursion depth of the safety check for this
optimization to five.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183080 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes rdar:14036816, PR16130.
There is an opportunity to compute precise trip counts for 'or'
expressions and multi-exit loops.
rdar:14038809: Optimize trip count computation for multi-exit loops.
To do this we need to record the fact that ExitLimit assumes NSW. When
it does not we can safely assume that the loop trip count is the
minimum ExitLimit across all subexpressions and loop exits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183060 91177308-0d34-0410-b5e6-96231b3b80d8
We check that instructions in the loop don't have outside users (except if
they are reduction values). Unfortunately, we skipped this check for
if-convertible PHIs.
Fixes PR16184.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183035 91177308-0d34-0410-b5e6-96231b3b80d8
Namely, check if the target allows folding more than one register in the
addressing mode and, if so, adjust the cost accordingly.
Prior to this commit, reg1 + scale * reg2 accesses were artificially preferred
to reg1 + reg2 accesses. Indeed, the cost model wrongly assumed that reg1 + reg2
needs a temporary register for the computation, whereas it was correctly
estimated for reg1 + scale * reg2.
<rdar://problem/13973908>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183021 91177308-0d34-0410-b5e6-96231b3b80d8
- llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic
by making it the root of additional loop metadata.
- Loop::isAnnotatedParallel now looks for llvm.loop and associated
llvm.mem.parallel_loop_access
- document llvm.loop and update llvm.mem.parallel_loop_access
- add support for llvm.vectorizer.width and llvm.vectorizer.unroll
- document llvm.vectorizer.* metadata
- add utility class LoopVectorizerHints for getting/setting loop metadata
- use llvm.vectorizer.width=1 to indicate already vectorized instead of
already_vectorized
- update existing tests that used llvm.loop.parallel and
llvm.vectorizer.already_vectorized
Reviewed by: Nadav Rotem
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182802 91177308-0d34-0410-b5e6-96231b3b80d8
as the BinaryOperator, *not* in the block where the IRBuilder is currently
inserting into. Fixes a bug where scalarizePHI would create instructions
that would not dominate all uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182639 91177308-0d34-0410-b5e6-96231b3b80d8
We are not working on a DAG and I ran into a number of problems when I enabled the vectorization of 'diamond-trees' (trees that share leaves).
* Improved the numbering API.
* Changed the placement of new instructions to the last root.
* Fixed a bug with external tree users with non-zero lane.
* Fixed a bug in the placement of in-tree users.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182508 91177308-0d34-0410-b5e6-96231b3b80d8
The earlier change list introduced the following inst combines:
B * (uitofp i1 C) -> select C, B, 0
A * (1 - uitofp i1 C) -> select C, 0, A
select C, 0, B + select C, A, 0 -> select C, A, B
Together these 3 changes would simplify :
A * (1 - uitofp i1 C) + B * uitofp i1 C
down to :
select C, B, A
In practice we found that the first two substitutions can have a
negative effect on performance, because they reduce opportunities to
use FMA contractions; between the two options FMAs are often the
better choice. This change list amends the previous one to enable
just these inst combines:
select C, B, 0 + select C, 0, A -> select C, B, A
A * (1 - uitofp i1 C) + B * uitofp i1 C -> select C, B, A
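In C terms, the retained combine turns a hand-written linear blend back into a plain select. A minimal sketch with hypothetical function names, where the int flag c stands in for the i1 value:

```c
#include <assert.h>

/* Linear blend: A * (1 - uitofp C) + B * uitofp C. */
double blend(double a, double b, int c) {
    double f = c ? 1.0 : 0.0;   /* models uitofp i1 C */
    return a * (1.0 - f) + b * f;
}

/* The select it collapses to: select C, B, A. */
double select_value(double a, double b, int c) {
    return c ? b : a;
}

void check_blend(void) {
    assert(blend(2.0, 3.0, 1) == select_value(2.0, 3.0, 1));
    assert(blend(2.0, 3.0, 0) == select_value(2.0, 3.0, 0));
}
```

The two functions agree for ordinary finite inputs. They can differ when the discarded operand is infinite or NaN, since multiplying infinity by 0.0 yields NaN in the blend but has no effect on the select.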
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182499 91177308-0d34-0410-b5e6-96231b3b80d8
The Value pointers we store in the induction variable list can be RAUW'ed by a
call to SCEVExpander::expandCodeFor, use a TrackingVH instead. Do the same thing
in some other places where we store pointers that could potentially be RAUW'ed.
Fixes PR16073.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182485 91177308-0d34-0410-b5e6-96231b3b80d8
This is useful if something that looks like (x & (1 << y)) ? 64 : 32 is
the divisor in a modulo operation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182200 91177308-0d34-0410-b5e6-96231b3b80d8
InstCombine can be uncooperative to vectorization and sink loads into
conditional blocks. This prevents vectorization.
Undo this optimization if there are unconditional memory accesses to the same
addresses in the loop.
radar://13815763
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181860 91177308-0d34-0410-b5e6-96231b3b80d8
CXAAtExitFn was set outside a loop and before optimizations where functions
can be deleted. This patch will set CXAAtExitFn inside the loop and after
optimizations.
Seg fault when running LTO because of accesses to a deleted function.
rdar://problem/13838828
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181838 91177308-0d34-0410-b5e6-96231b3b80d8
We used to give up if we saw two integer inductions. After this patch, we base
further induction variables on the chosen one like we do in the reverse
induction and pointer induction case.
Fixes PR15720.
radar://13851975
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181746 91177308-0d34-0410-b5e6-96231b3b80d8
In the presence of a block being initialized, the frontend will emit the
objc_retain on the original pointer and the release on the pointer loaded from
the alloca. The optimizer will through the provenance analysis realize that the
two are related (albeit different), but since we only require KnownSafe in one
direction, will match the inner retain on the original pointer with the guard
release on the original pointer. This is fixed by ensuring that in the presence
of allocas we only unconditionally remove pointers if both our retain and our
release are KnownSafe (i.e. we are KnownSafe in both directions) since we must
deal with the possibility that the frontend will emit what (to the optimizer)
appears to be unbalanced retain/releases.
An example of the miscompile is:
%A = alloca
retain(%x)
retain(%x) <--- Inner Retain
store %x, %A
%y = load %A
... DO STUFF ...
release(%y)
call void @use(%x)
release(%x) <--- Guarding Release
getting optimized to:
%A = alloca
retain(%x)
store %x, %A
%y = load %A
... DO STUFF ...
release(%y)
call void @use(%x)
rdar://13750319
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181743 91177308-0d34-0410-b5e6-96231b3b80d8
The external user does not have to be in lane #0. We have to save the lane for each scalar so that we know which vector lane to extract.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181674 91177308-0d34-0410-b5e6-96231b3b80d8
There are two transforms in visitUrem that conflict with each other.
*) One, if a divisor is a power of two, subtracts one from the divisor
and turns it into a bitwise-and.
*) The other unwraps both operands if they are surrounded by zext
instructions.
Flipping the order allows the subtraction to go beneath the sign
extension.
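The first transform, and the effect of doing the mask below the extension, can be sketched in C. The function names are made up for illustration, and d is assumed to be a nonzero power of two.

```c
#include <assert.h>
#include <stdint.h>

/* x % d for a power-of-two d, written as a mask with d - 1. */
uint32_t urem_pow2(uint32_t x, uint32_t d) {
    return x & (d - 1u);
}

/* Same result for zero-extended 8-bit operands: the subtraction and the
   mask are done in 8 bits, and only the result is extended. */
uint32_t urem_pow2_narrow(uint8_t x, uint8_t d) {
    uint8_t narrow = (uint8_t)(x & (uint8_t)(d - 1u));
    return (uint32_t)narrow;
}

void check_urem_pow2(void) {
    assert(urem_pow2(37u, 8u) == 37u % 8u);
    assert(urem_pow2_narrow(200, 16) == 200u % 16u);
}
```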
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181668 91177308-0d34-0410-b5e6-96231b3b80d8
Use the widest induction type encountered for the canonical induction variable.
We used to turn the following loop into an empty loop because we used i8 as
the induction variable type and truncated the trip count of 1024 to 0.
int a[1024];
void fail() {
int reverse_induction = 1023;
unsigned char forward_induction = 0;
while ((reverse_induction) >= 0) {
forward_induction++;
a[reverse_induction] = forward_induction;
--reverse_induction;
}
}
radar://13862901
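The truncation itself is easy to reproduce. A minimal sketch, with a hypothetical helper name, of what happens to the trip count when it is squeezed into the 8-bit induction type:

```c
#include <assert.h>
#include <stdint.h>

/* Models computing the trip count in an i8 induction type. */
uint8_t trip_count_as_i8(uint32_t n) {
    return (uint8_t)n;   /* keeps only the low 8 bits */
}

void check_trip_count(void) {
    assert(trip_count_as_i8(1024u) == 0);    /* 1024 == 4 * 256 */
    assert(trip_count_as_i8(1023u) == 255);
}
```

Using the widest induction type seen in the loop (int here) keeps the count at 1024.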
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181667 91177308-0d34-0410-b5e6-96231b3b80d8
For example:
bar() {
int a = A[i];
int b = A[i+1];
B[i] = a;
B[i+1] = b;
foo(a); <--- a is used outside the vectorized expression.
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181648 91177308-0d34-0410-b5e6-96231b3b80d8
The shift amount may be larger than the type leading to undefined behavior.
Limit the transform to constant shift amounts. While there update the bits to
clear in the result which may enable additional optimizations.
PR15959.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181604 91177308-0d34-0410-b5e6-96231b3b80d8
When we replace an internal alias with its target, be careful not to
replace the entry in llvm.used (and llvm.compiler_used).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181524 91177308-0d34-0410-b5e6-96231b3b80d8
That's obviously wrong. Conservatively restrict it to the sign bit, which
matches the original intention of this analysis. Fixes PR15940.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181518 91177308-0d34-0410-b5e6-96231b3b80d8
A computable loop exit count does not imply the presence of an induction
variable. Scalar evolution can return a value for an infinite loop.
Fixes PR15926.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181495 91177308-0d34-0410-b5e6-96231b3b80d8
- the temporary "-debug.ll" files generated by the DebugIR pass are considered tests, even though they are not
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181476 91177308-0d34-0410-b5e6-96231b3b80d8
- simple one-function case
- function-calling case
- external function calling case
- exception throwing case
- vector case
Note: these tests are somewhat coupled to the current format of debug metadata.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181469 91177308-0d34-0410-b5e6-96231b3b80d8
The two nested loops were confusing and also conservative in identifying
reduction variables. This patch replaces them by a worklist based approach.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181369 91177308-0d34-0410-b5e6-96231b3b80d8
We were passing an i32 to ConstantInt::get where an i64 was needed and we must
also pass the sign if we pass negative numbers. The start index passed to
getConsecutiveVector must also be signed.
Should fix PR15882.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181286 91177308-0d34-0410-b5e6-96231b3b80d8
Test case by Michele Scandale!
Fixes PR10293: Load not hoisted out of loop with multiple exits.
There are a few regressions with this patch, now tracked by
rdar:13817079, and a roughly equal number of improvements. The
regressions are almost certainly bad luck because LoopRotate has very
little idea of whether rotation is profitable. Doing better requires a
more comprehensive solution.
This checkin is a quick fix that lacks generality (PR10293 has
a counter-example). But it trivially fixes the case in PR10293 without
interfering with other cases, and it does satisfy the criteria that
LoopRotate is a loop canonicalization pass that should avoid
heuristics and special cases.
I can think of two approaches that would probably be better in
the long run. Ultimately they may both make sense.
(1) LoopRotate should check that the current header would make a good
loop guard, and that the loop does not already have a sufficient
guard. The artificial SimplifiedLoopLatch check would be unnecessary,
and the design would be more general and canonical. Two difficulties:
- We need a strong guarantee that we won't endlessly rotate, so the
analysis would need to be precise in order to avoid the
SimplifiedLoopLatch precondition.
- Analyses like this are usually based on SCEV, which we don't want to
rely on.
(2) Rotate on-demand in late loop passes. This could even be done by
shoving the loop back on the queue after the optimization that needs
it. This could work well when we find LICM opportunities in
multi-branch loops. This requires some work, and it doesn't really
solve the problem of SCEV wanting a loop guard before the analysis.
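For readers unfamiliar with the term, the loop guard discussed above is the entry check that rotation leaves in front of a do-while style loop. A minimal C sketch of the before/after shape follows; the function names are hypothetical, and the sketch does not model the SimplifiedLoopLatch check or loops with multiple exits:

```c
#include <stddef.h>

/* Before rotation: the exit test sits in the loop header. */
static long sum_header_test(const int *a, size_t n) {
    long sum = 0;
    size_t i = 0;
    while (i < n) {
        sum += a[i];
        i++;
    }
    return sum;
}

/* After rotation: a guard checks the entry condition once, and the
 * exit test moves to the latch, giving a do-while loop. */
static long sum_rotated(const int *a, size_t n) {
    long sum = 0;
    size_t i = 0;
    if (i < n) {              /* loop guard */
        do {
            sum += a[i];
            i++;
        } while (i < n);      /* latch test */
    }
    return sum;
}
```

Inside the guarded body the loop is known to execute at least once, which is what lets later passes such as LICM hoist loads out of it.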
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181230 91177308-0d34-0410-b5e6-96231b3b80d8
A * (1 - (uitofp i1 C)) -> select C, 0, A
B * (uitofp i1 C) -> select C, B, 0
select C, 0, A + select C, B, 0 -> select C, B, A
These come up in code that has been hand-optimized from a select to a linear blend,
on platforms where that may have mattered. We want to undo such changes
with the following transform:
A*(1 - uitofp i1 C) + B*(uitofp i1 C) -> select C, B, A
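A small C sketch of the two forms, with hypothetical function names. The cast (float)c stands in for uitofp i1 C. With c true the blend evaluates to b, and with c false it evaluates to a. The two functions agree for finite inputs only, since an infinity or NaN multiplied by zero does not give zero:

```c
#include <stdbool.h>

/* Hand-written linear blend; (float)c plays the role of uitofp i1 C. */
static float blend(float a, float b, bool c) {
    return a * (1.0f - (float)c) + b * (float)c;
}

/* Select form: yields b when c is true and a otherwise. */
static float blend_select(float a, float b, bool c) {
    return c ? b : a;
}
```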
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181216 91177308-0d34-0410-b5e6-96231b3b80d8
We used to disable constant merging not only if a constant is llvm.used, but
also if an alias of a constant is llvm.used. This change fixes that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181175 91177308-0d34-0410-b5e6-96231b3b80d8
Add support for min/max reductions when "no-nans-float-math" is enabled. This
allows us to assume we have ordered floating point math and treat ordered and
unordered predicates equally.
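As an illustration, the two C loops below compute a max reduction with an ordered-style and an unordered-style comparison. The function names are hypothetical. The loops differ only when a NaN is present, which is why assuming no NaNs lets both be treated the same way:

```c
#include <stddef.h>

/* Ordered form: a[i] > m is false if either operand is NaN. */
static float max_ordered(const float *a, size_t n, float init) {
    float m = init;
    for (size_t i = 0; i < n; i++)
        m = (a[i] > m) ? a[i] : m;
    return m;
}

/* Unordered-style form: !(a[i] <= m) is true if either operand is NaN. */
static float max_unordered(const float *a, size_t n, float init) {
    float m = init;
    for (size_t i = 0; i < n; i++)
        m = !(a[i] <= m) ? a[i] : m;
    return m;
}
```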
radar://13723044
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181144 91177308-0d34-0410-b5e6-96231b3b80d8
We can just use the initial element that feeds the reduction.
max(max(x, y), z) == max(max(x,y), max(x,z))
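The identity means every vector lane can start from the same initial element. A rough C simulation of a 4-wide reduction, using integers and hypothetical helper names (the real vectorizer emits vector instructions, not this loop):

```c
#include <stddef.h>

static int max_int(int a, int b) { return a > b ? a : b; }

/* Scalar reference reduction starting from init. */
static int max_scalar(const int *a, size_t n, int init) {
    int m = init;
    for (size_t i = 0; i < n; i++)
        m = max_int(m, a[i]);
    return m;
}

/* Simulated 4-lane reduction: every lane is seeded with init, which is
 * harmless because max(x, x) == x. */
static int max_four_lanes(const int *a, size_t n, int init) {
    int lane[4] = { init, init, init, init };
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (size_t l = 0; l < 4; l++)
            lane[l] = max_int(lane[l], a[i + l]);
    int m = max_int(max_int(lane[0], lane[1]), max_int(lane[2], lane[3]));
    for (; i < n; i++)          /* scalar remainder */
        m = max_int(m, a[i]);
    return m;
}
```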
radar://13723044
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181141 91177308-0d34-0410-b5e6-96231b3b80d8
By supporting the vectorization of PHINodes with more than two incoming values we can vectorize loops with more deeply nested if statements.
We can now vectorize this loop:
void foo(int *A, int *B, int n) {
  for (int i = 0; i < n; i++) {
    int x = 9;
    if (A[i] > B[i]) {
      if (A[i] > 19) {
        x = 3;
      } else if (B[i] < 4) {
        x = 4;
      } else {
        x = 5;
      }
    }
    A[i] = x;
  }
}
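One way to picture what happens to the four-way merge of x is as a chain of selects. The sketch below is a hand-written, branch-free equivalent of the loop body with a hypothetical function name; how the vectorizer actually blends the incoming values is not spelled out in this message, so treat this as an assumption:

```c
/* Select-style (if-converted) version of the nested conditions. */
static void foo_selects(int *A, const int *B, int n) {
    for (int i = 0; i < n; i++) {
        int outer = A[i] > B[i];
        int inner = (A[i] > 19) ? 3 : ((B[i] < 4) ? 4 : 5);
        A[i] = outer ? inner : 9;
    }
}
```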
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181037 91177308-0d34-0410-b5e6-96231b3b80d8
Shuffles are more difficult to lower and we usually don't touch them, while we do optimize selects more often.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180875 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r180802
There's ongoing discussion about whether this is the right place to make
this transformation. Reverting for now while we figure it out.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180834 91177308-0d34-0410-b5e6-96231b3b80d8
Always fold a shuffle-of-shuffle into a single shuffle when there's only one
input vector in the first place. Continue to be more conservative when there are
multiple inputs.
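For the single-input case the fold amounts to composing the two masks. A minimal C sketch with hypothetical helpers, using plain arrays in place of vectors and ignoring undef mask elements:

```c
#include <stddef.h>

#define LANES 4

/* Single-input shuffle: out[i] = in[mask[i]]. */
static void shuffle1(const int *in, const int *mask, int *out) {
    for (size_t i = 0; i < LANES; i++)
        out[i] = in[mask[i]];
}

/* Fold shuffle(shuffle(v, m1), m2) into one mask: folded[i] = m1[m2[i]]. */
static void fold_masks(const int *m1, const int *m2, int *folded) {
    for (size_t i = 0; i < LANES; i++)
        folded[i] = m1[m2[i]];
}
```

Applying shuffle1 once with the folded mask gives the same lanes as applying it twice with m1 and then m2.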
rdar://13402653
PR15866
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180802 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes the optimization introduced in r179748 and reverted in r179750.
While the optimization was sound, it did not properly respect differences in
bit-width.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180777 91177308-0d34-0410-b5e6-96231b3b80d8
This resurrects r179957, but adds code that makes sure we don't touch
atomic/volatile stores:
This transformation will transform a conditional store with a preceding
unconditional store to the same location:
  a[i] = X
  may-alias with a[i] load
  if (cond)
    a[i] = Y
into an unconditional store.
  a[i] = X
  may-alias with a[i] load
  tmp = cond ? Y : X;
  a[i] = tmp
We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outweigh the potential case where the branch would be correctly predicted
and the cost of executing the second store would be noticeably reflected in
performance.
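The same before/after shape, written as two small C functions with hypothetical names. This assumes ordinary (non-volatile, non-atomic) stores, matching the restriction stated above:

```c
/* Before: the second store is guarded by a branch. */
static void store_branchy(int *a, int i, int cond, int x, int y) {
    a[i] = x;
    if (cond)
        a[i] = y;
}

/* After: pick the value first, then store unconditionally. */
static void store_selected(int *a, int i, int cond, int x, int y) {
    a[i] = x;
    int tmp = cond ? y : x;
    a[i] = tmp;
}
```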
hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2% performance improvement on an ARM swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducibly below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade off.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180731 91177308-0d34-0410-b5e6-96231b3b80d8
Turning retains into retainRV calls disrupts the data flow analysis in
ObjCARCOpts. Thus we move it as late as we can by moving it into
ObjCARCContract.
We leave in the conversion from retainRV -> retain in ObjCARCOpt since
it enables the dataflow analysis.
rdar://10813093
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180698 91177308-0d34-0410-b5e6-96231b3b80d8
When the Reassociator optimizes "(x | C1)" ^ "(x & C2)", it may swap the two
subexpressions; however, it forgot to swap the cached constants (C1 and C2)
accordingly.
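To see why the constants must travel with their subexpressions, consider the bitwise identity (x | C1) ^ (x & C2) == (x & (~C1 ^ C2)) ^ C1. Whether this is the exact rewrite the pass performs is an assumption here, but the identity itself is plain bit arithmetic, and C1 and C2 play different roles in it. A C sketch with hypothetical function names:

```c
#include <stdint.h>

/* Left-hand side as written in the message. */
static uint32_t xor_orig(uint32_t x, uint32_t c1, uint32_t c2) {
    return (x | c1) ^ (x & c2);
}

/* Equivalent form: (x & (~c1 ^ c2)) ^ c1. The constants are not
 * interchangeable, so pairing them with the wrong operand changes
 * the result. */
static uint32_t xor_folded(uint32_t x, uint32_t c1, uint32_t c2) {
    return (x & (~c1 ^ c2)) ^ c1;
}
```

For example, with x = 0, C1 = 1 and C2 = 0 the original expression is 1, while the folded form evaluated with the constants exchanged is 0.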
rdar://13739160
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180676 91177308-0d34-0410-b5e6-96231b3b80d8
Mainly adding paranoid checks for the closing brace of a function to
help with FileCheck error readability. Also some other minor changes.
No actual CHECK changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180668 91177308-0d34-0410-b5e6-96231b3b80d8
This patch disables memory-instruction vectorization for types that need padding
bytes, e.g., x86_fp80 has a 10-byte store size with 6 bytes of padding on Darwin
x86_64. Because load/store vectorization is performed by bitcasting to a packed
vector, which has an incompatible memory layout due to the lack of padding
bytes, the present vectorizer produces inconsistent results for memory
instructions of those types.
This patch checks that the AllocSize of a scalar type equals the allocated
size of each vector element, to ensure that there are no padding bytes and the
array can be read/written using vector operations.
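The check can be pictured with a small C sketch. The struct and function are hypothetical, not LLVM data structures, and the 10-byte packed element size for x86_fp80 is inferred from the store size given above:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout description; not an LLVM data structure. */
struct type_layout {
    size_t scalar_alloc_size;  /* bytes between consecutive array elements */
    size_t vector_elem_size;   /* bytes per element in the packed vector */
};

/* The array layout matches the packed vector layout only when the
 * scalar type carries no padding bytes. */
static bool can_vectorize_memory_op(struct type_layout t) {
    return t.scalar_alloc_size == t.vector_elem_size;
}
```

With the numbers above, x86_fp80 would be described as {16, 10} and rejected, while a 4-byte float described as {4, 4} would be accepted.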
Patch by Daisuke Takahashi!
Fixes PR15758.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180196 91177308-0d34-0410-b5e6-96231b3b80d8