Sort the operands of the other entries in the current vectorization root
according to the first entry's operands opcodes.
%conv0 = uitofp ...
%load0 = load float ...
= fmul %conv0, %load0
= fmul %load0, %conv1
= fmul %load0, %conv2
Make sure that we recursively vectorize <%conv0, %conv1, %conv2> and <%load0,
%load0, %load0>.
This makes it more likely to obtain vectorizable trees. We have to be careful
when we sort that we don't destroy 'good' existing ordering implied by source
order.
radar://15080067
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191977 91177308-0d34-0410-b5e6-96231b3b80d8
Generalize the API so we can distinguish symbols that are needed just for a DSO
symbol table from those that are used from some native .o.
The symbols that are only wanted for the dso symbol table can be dropped if
llvm can prove every other dso has a copy (linkonce_odr) and the address is not
important (unnamed_addr).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191922 91177308-0d34-0410-b5e6-96231b3b80d8
Don't vectorize with a runtime check if it requires a
comparison between pointers with different address spaces.
The values can't be assumed to be directly comparable.
Previously it would create an illegal bitcast.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191862 91177308-0d34-0410-b5e6-96231b3b80d8
This recursively strips all GEPs like the existing code. It also handles bitcasts and
other operations that do not change the pointer value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191847 91177308-0d34-0410-b5e6-96231b3b80d8
Switch instructions were crashing the StructurizeCFG pass, and it's
probably easier anyway if we don't need to handle them in this pass.
Reviewed-by: Christian König <christian.koenig@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191841 91177308-0d34-0410-b5e6-96231b3b80d8
infrastructure.
This was essentially work toward PGO based on a design that had several
flaws, partially dating from a time when LLVM had a different
architecture, and with an effort to modernize it abandoned without being
completed. Since then, it has bitrotted for several years further. The
result is nearly unusable, and isn't helping any of the modern PGO
efforts. Instead, it is getting in the way, adding confusion about PGO
in LLVM and distracting everyone with maintenance on essentially dead
code. Removing it paves the way for modern efforts around PGO.
Among other effects, this removes the last of the runtime libraries from
LLVM. Those are being developed in the separate 'compiler-rt' project
now, with somewhat different licensing specifically more approriate for
runtimes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191835 91177308-0d34-0410-b5e6-96231b3b80d8
The test's output doesn't change, but this ensures
this is actually hit with a different address space.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191701 91177308-0d34-0410-b5e6-96231b3b80d8
Defines away the issue where cast<Instruction> would fail because constant
folding happened. Also slightly cleaner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191674 91177308-0d34-0410-b5e6-96231b3b80d8
Inspired by the object from the SLPVectorizer. This found a minor bug in the
debug loc restoration in the vectorizer where the location of a following
instruction was attached instead of the location from the original instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191673 91177308-0d34-0410-b5e6-96231b3b80d8
when it was actually a Constant*.
There are quite a few other casts to Instruction that might have the same problem,
but this is the only one I have a test case for.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191668 91177308-0d34-0410-b5e6-96231b3b80d8
Currently foldSelectICmpAndOr asserts if the "or" involves a vector
containing several of the same power of two. We can easily avoid this by
only performing the fold on integer types, like foldSelectICmpAnd does.
Fixes <rdar://problem/15012516>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191552 91177308-0d34-0410-b5e6-96231b3b80d8
We were previously using getFirstInsertionPt to insert PHI
instructions when vectorizing, but getFirstInsertionPt also skips past
landingpads, causing this to generate invalid IR.
We can avoid this issue by using getFirstNonPHI instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191526 91177308-0d34-0410-b5e6-96231b3b80d8
Put them under a separate flag for experimentation. They are more likely to
interfere with loop vectorization which happens later in the pass pipeline.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191371 91177308-0d34-0410-b5e6-96231b3b80d8
Revert 191122 - with extra checks we are allowed to vectorize math library
function calls.
Standard library indentifiers are reserved names so functions with external
linkage must not overrided them. However, functions with internal linkage can.
Therefore, we can vectorize calls to math library functions with a check for
external linkage and matching signature. This matches what we do during
SelectionDAG building.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191206 91177308-0d34-0410-b5e6-96231b3b80d8
This makes using array_pod_sort significantly safer. The implementation relies
on function pointer casting but that should be safe as we're dealing with void*
here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191175 91177308-0d34-0410-b5e6-96231b3b80d8
SROA wants to convert any types of equivalent widths but it's not possible to
convert vectors of pointers to an integer scalar with a single cast. As a
workaround we add a bitcast to the corresponding int ptr type first. This type
of cast used to be an edge case but has become common with SLP vectorization.
Fixes PR17271.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191143 91177308-0d34-0410-b5e6-96231b3b80d8
Reapply r191108 with a fix for a memory corruption error I introduced. Of
course, we can't reference the scalars that we replace by vectorizing and then
call their eraseFromParent method. I only 'needed' the scalars to get the
DebugLoc. Just store the DebugLoc before actually vectorizing instead. As a nice
side effect, this also simplifies the interface between BoUpSLP and the
HorizontalReduction class to returning a value pointer (the vectorized tree
root).
radar://14607682
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191123 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r191108.
The horizontal.ll test case fails under libgmalloc. Thanks Shuxin for pointing
this out to me.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191121 91177308-0d34-0410-b5e6-96231b3b80d8
The problem of r191017 is that when GVN fabricate a val-number for a dead instruction (in order
to make following expr-PRE happy), it forget to fabricate a leader-table entry for it as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191118 91177308-0d34-0410-b5e6-96231b3b80d8
Match reductions starting at binary operation feeding into a phi. The code
handles trees like
r += v1 + v2 + v3 ...
and
r += v1
r += v2
...
and
r *= v1 + v2 + ...
We currently only handle associative operations (add, fadd fast).
The code can now also handle reductions feeding into stores.
a[i] = v1 + v2 + v3 + ...
The code is currently disabled behind the flag "-slp-vectorize-hor". The cost
model for most architectures is not there yet.
I found one opportunity of a horizontal reduction feeding a phi in TSVC
(LoopRerolling-flt) and there are several opportunities where reductions feed
into stores.
radar://14607682
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191108 91177308-0d34-0410-b5e6-96231b3b80d8
The GEP pattern is what SCEV expander emits for "ugly geps". The latter is what
you get for pointer subtraction in C code. The rest of instcombine already
knows how to deal with that so just canonicalize on that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191090 91177308-0d34-0410-b5e6-96231b3b80d8
If "C1/X" were having multiple uses, the only benefit of this
transformation is to potentially shorten critical path. But it is at the
cost of instroducing additional div.
The additional div may or may not incur cost depending on how div is
implemented. If it is implemented using Newton–Raphson iteration, it dosen't
seem to incur any cost (FIXME). However, if the div blocks the entire
pipeline, that sounds to be pretty expensive. Let CodeGen to take care
this transformation.
This patch sees 6% on a benchmark.
rdar://15032743
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191037 91177308-0d34-0410-b5e6-96231b3b80d8