The transform to convert an extract-of-a-select-of-vectors was added at:
rL194013
And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT>
Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.
The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.
The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.
Differential Revision: https://reviews.llvm.org/D38006
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314117 91177308-0d34-0410-b5e6-96231b3b80d8
Recurse instead of returning on the first found optimization. Also, return early in the caller
instead of continuing because that allows another round of simplification before we might
potentially lose undef information from a shuffle mask by eliminating the shuffle.
As noted in the review, we could probably do better and be more efficient by moving all of
demanded elements into a separate pass, but this is yet another quick fix to instcombine.
Differential Revision: https://reviews.llvm.org/D37236
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312248 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the first insertelement instruction has multiple users and inserts at
position 0, we can re-use this instruction when folding a chain of
insertelement instructions. As we need to generate the first
insertelement instruction anyways, this should be a strict improvement.
We could get rid of the restriction of inserting at position 0 by
creating a different shufflemask, but it is probably worth to keep the
first insertelement instruction with position 0, as this is easier to do
efficiently than at other positions I think.
Reviewers: grosser, mkuper, fpetrogalli, efriedma
Reviewed By: fpetrogalli
Subscribers: gareevroman, llvm-commits
Differential Revision: https://reviews.llvm.org/D37064
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312110 91177308-0d34-0410-b5e6-96231b3b80d8
Previously the InstCombiner class contained a pointer to an IR builder that had been passed to the constructor. Sometimes this would be passed to helper functions as either a pointer or the pointer would be dereferenced to be passed by reference.
This patch makes it a reference everywhere including the InstCombiner class itself so there is more inconsistency. This a large, but mechanical patch. I've done very minimal formatting changes on it despite what clang-format wanted to do.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307451 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This matches the behavior we already had for compares and makes us consistent everywhere.
Reviewers: dberlin, hfinkel, spatel
Reviewed By: dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33604
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305049 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a bug that can cause extractelements with operands that
haven't been defined yet to be inserted at a wrong point when
optimising insertelements.
Patch by Karl Hylen.
Differential Revision: https://reviews.llvm.org/D33449
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304701 91177308-0d34-0410-b5e6-96231b3b80d8
insertelement (insertelement X, Y, IdxC1), ScalarC, IdxC2 -->
insertelement (insertelement X, ScalarC, IdxC2), Y, IdxC1
As noted in the code comment and seen in the test changes, the motivation is that by pulling
constant insertion up, we may be able to constant fold some insertelement instructions.
Differential Revision: https://reviews.llvm.org/D31196
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298520 91177308-0d34-0410-b5e6-96231b3b80d8
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289756 91177308-0d34-0410-b5e6-96231b3b80d8
Removing the limitation in visitInsertElementInst() causes several regressions
because we're not prepared to fold sequences of shuffles or inserts and extracts
separated by shuffles. Fixing that appears to be a difficult mission because we
are purposely trying to avoid creating shuffles with arbitrary shuffle masks
because some targets may choke on those.
https://llvm.org/bugs/show_bug.cgi?id=30923
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286423 91177308-0d34-0410-b5e6-96231b3b80d8
If inserting more than one constant into a vector:
define <4 x float> @foo(<4 x float> %x) {
%ins1 = insertelement <4 x float> %x, float 1.0, i32 1
%ins2 = insertelement <4 x float> %ins1, float 2.0, i32 2
ret <4 x float> %ins2
}
InstCombine could reduce that to a shufflevector:
define <4 x float> @goo(<4 x float> %x) {
%shuf = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.0, float 2.0, float undef>, <4 x i32><i32 0, i32 5, i32 6, i32 3>
ret <4 x float> %shuf
}
Also, InstCombine tries to convert shuffle instruction to single insertelement, if one of the vectors is a constant vector and only a single element from this constant should be used in shuffle, i.e.
shufflevector <4 x float> %v, <4 x float> <float undef, float 1.0, float
undef, float undef>, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef> ->
insertelement <4 x float> %v, float 1.0, 1
Differential Revision: https://reviews.llvm.org/D24182
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282237 91177308-0d34-0410-b5e6-96231b3b80d8
The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards
shrinking that to a single shufflevector.
Note that the transform is intentionally limited to shuffles that are equivalent to vector
selects to avoid creating arbitrary shuffle masks that may not lower well.
This should solve PR29126:
https://llvm.org/bugs/show_bug.cgi?id=29126
Differential Revision: https://reviews.llvm.org/D23886
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280504 91177308-0d34-0410-b5e6-96231b3b80d8
scalarizePHI only looked for phis that have exactly two uses - the "latch"
use, and an extract. Unfortunately, we can not assume all equivalent extracts
are CSE'd, since InstCombine itself may create an extract which is a duplicate
of an existing one. This extends it to handle several distinct extracts from
the same index.
This should fix at least some of the performance regressions from PR27988.
Differential Revision: http://reviews.llvm.org/D20983
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271961 91177308-0d34-0410-b5e6-96231b3b80d8
Most portions of InstCombine properly propagate fast math flags, but
apparently the vector scalarization section was overlooked.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262376 91177308-0d34-0410-b5e6-96231b3b80d8
We would infinite loop because we created a shufflevector that was wider than
needed and then failed to combine that with the insertelement. When subsequently
visiting the extractelement from that shuffle, we see that it's unnecessary,
delete it, and trigger another visit to the insertelement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259236 91177308-0d34-0410-b5e6-96231b3b80d8
This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229
The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.
The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109
For that example, the IR becomes:
%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2
And x86 SSE output improves from:
movq (%rdi), %xmm1 ## xmm1 = mem[0],zero
movdqa %xmm1, %xmm2
shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3]
shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0]
retq
To the almost optimal:
movhpd (%rdi), %xmm0
Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases.
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.
Differential Revision: http://reviews.llvm.org/D15096
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256394 91177308-0d34-0410-b5e6-96231b3b80d8
Stop relying on implicit conversions of ilist iterators in
LLVMInstCombine. No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250183 91177308-0d34-0410-b5e6-96231b3b80d8
InstCombine didn't realize that it needs to use DataLayout to determine
how wide pointers are. This lead to assertion failures.
This fixes PR23113.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234046 91177308-0d34-0410-b5e6-96231b3b80d8
Adding nullptr to all the IRBuilder stuff because it's the first thing
that fails to build when testing without the back-compat functions, so
I'll keep having to re-add these locally for each chunk of migration I
do. Might as well check them in to save me the churn. Eventually I'll
have to migrate these too, but I'm going breadth-first.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232270 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.
This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.
I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.
I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.
Test Plan:
Reviewers: echristo
Subscribers: llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231740 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: SROA generates code that isn't quite as easy to optimize and contains unusual-sized shuffles, but that code is generally correct. As discussed in D7487 the right place to clean things up is InstCombine, which will pick up the type-punning pattern and transform it into a more obvious bitcast+extractelement, while leaving the other patterns SROA encounters as-is.
Test Plan: make check
Reviewers: jvoung, chandlerc
Subscribers: llvm-commits
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230560 91177308-0d34-0410-b5e6-96231b3b80d8
creating a non-internal header file for the InstCombine pass.
I thought about calling this InstCombiner.h or in some way more clearly
associating it with the InstCombiner clas that it is primarily defining,
but there are several other utility interfaces defined within this for
InstCombine. If, in the course of refactoring, those end up moving
elsewhere or going away, it might make more sense to make this the
combiner's header alone.
Naturally, this is a bikeshed to a certain degree, so feel free to lobby
for a different shade of paint if this name just doesn't suit you.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226783 91177308-0d34-0410-b5e6-96231b3b80d8
This patch enables transformations:
BinOp(shuffle(v1), shuffle(v2)) -> shuffle(BinOp(v1, v2))
BinOp(shuffle(v1), const1) -> shuffle(BinOp, const2)
They allow to eliminate extra shuffles in some cases.
Differential Revision: http://reviews.llvm.org/D3525
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208488 91177308-0d34-0410-b5e6-96231b3b80d8
definition below all of the header #include lines, lib/Transforms/...
edition.
This one is tricky for two reasons. We again have a couple of passes
that define something else before the includes as well. I've sunk their
name macros with the DEBUG_TYPE.
Also, InstCombine contains headers that need DEBUG_TYPE, so now those
headers #define and #undef DEBUG_TYPE around their code, leaving them
well formed modular headers. Fixing these headers was a large motivation
for all of these changes, as "leaky" macros of this form are hard on the
modules implementation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206844 91177308-0d34-0410-b5e6-96231b3b80d8
header files and into the cpp files.
These files will require more touches as the header files actually use
DEBUG(). Eventually, I'll have to introduce a matched #define and #undef
of DEBUG_TYPE for the header files, but that comes as step N of many to
clean all of this up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206777 91177308-0d34-0410-b5e6-96231b3b80d8
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
detail
2) Change it to actually be a *Use* iterator rather than a *User*
iterator.
3) Add an adaptor which is a User iterator that always looks through the
Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
they wanted a use_iterator (and to explicitly dig out the User when
needed), or a user_iterator which makes the Use itself totally
opaque.
Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lies of code.
The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.
However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203364 91177308-0d34-0410-b5e6-96231b3b80d8
Sequences of insertelement/extractelements are sometimes used to build
vectorsr; this code tries to put them back together into shuffles, but
could only produce a completely uniform shuffle types (<N x T> from two
<N x T> sources).
This should allow shuffles with different numbers of elements on the
input and output sides as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203229 91177308-0d34-0410-b5e6-96231b3b80d8
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200018 91177308-0d34-0410-b5e6-96231b3b80d8