Summary:
InstCombine tries to transform GEP(PHI(GEP1, GEP2, ..)) into GEP(GEP(PHI(...))
when possible. However, this may leave the old PHI node around. Even if we
do end up folding the GEPs, having an extra PHI node might not be beneficial.
This change makes the transformation more conservative. We now only do this if
the PHI has only one use, and can therefore be removed after the transformation.
Reviewers: jmolloy, majnemer
Subscribers: mcrosier, mssimpso, llvm-commits
Differential Revision: http://reviews.llvm.org/D13887
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251281 91177308-0d34-0410-b5e6-96231b3b80d8
First, the motivation: LLVM currently does not realize that:
((2072 >> (L == 0)) >> 7) & 1 == 0
where L is some arbitrary value. Whether you right-shift 2072 by 7 or by 8, the
lowest-order bit is always zero. There are obviously several ways to go about
fixing this, but the generic solution pursued in this patch is to teach
computeKnownBits something about shifts by a non-constant amount. Previously,
we would give up completely on these. Instead, in cases where we know something
about the low-order bits of the shift-amount operand, we can combine (and
together) the associated restrictions for all shift amounts consistent with
that knowledge. As a further generalization, I refactored all of the logic for
all three kinds of shifts to have this capability. This works well in the above
case, for example, because the dynamic shift amount can only be 0 or 1, and
thus we can say a lot about the known bits of the result.
This brings us to the second part of this change: Even when we know all of the
bits of a value via computeKnownBits, nothing used to constant-fold the result.
This introduces the necessary code into InstCombine and InstSimplify. I've
added it into both because:
1. InstCombine won't automatically pick up the associated logic in
InstSimplify (InstCombine uses InstSimplify, but not via the API that
passes in the original instruction).
2. Putting the logic in InstCombine allows the resulting simplifications to become
part of the iterative worklist
3. Putting the logic in InstSimplify allows the resulting simplifications to be
used by everywhere else that calls SimplifyInstruction (inlining, unrolling,
and many others).
And this requires a small change to our definition of an ephemeral value so
that we don't break the rest case from r246696 (where the icmp feeding the
@llvm.assume, is also feeding a br). Under the old definition, the icmp would
not be considered ephemeral (because it is used by the br), but this causes the
assume to remove itself (in addition to simplifying the branch structure), and
it seems more-useful to prevent that from happening.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251146 91177308-0d34-0410-b5e6-96231b3b80d8
Allow LLVM to optimize the sequence like the following:
%inc = add nsw i32 %i, 1
%cmp = icmp slt %n, %inc
into:
%cmp = icmp sle i32 %n, %i
The case is not handled previously due to the complexity of compuation of %n.
Hence, LLVM cannot swap operands of icmp accordingly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250746 91177308-0d34-0410-b5e6-96231b3b80d8
This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I) intrinsics:
1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'bit index' and 'length' operands are constant
2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index/length are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI if its useful)
3 - Constant folding support
4 - Add zeroinitializer handling
Differential Revision: http://reviews.llvm.org/D13348
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250609 91177308-0d34-0410-b5e6-96231b3b80d8
Stop relying on implicit conversions of ilist iterators in
LLVMInstCombine. No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250183 91177308-0d34-0410-b5e6-96231b3b80d8
As discussed in D13348 - the INSERTQI range combining code is wrong in that it confuses the insertion bit index with an extraction bit index.
The remaining legal combines are very unlikely (especially once we've converted to shuffles in D13348) so I'm removing the optimization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250160 91177308-0d34-0410-b5e6-96231b3b80d8
This is a partial fix for PR24886:
https://llvm.org/bugs/show_bug.cgi?id=24886
Without this IR transform, the backend (x86 at least) was producing inefficient code.
This patch is making 2 assumptions:
1. The canonical form of a fabs() operation is, in fact, the LLVM fabs() intrinsic.
2. The high bit of an FP value is always the sign bit; as noted in the bug report, this isn't specified by the LangRef.
Differential Revision: http://reviews.llvm.org/D13076
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249702 91177308-0d34-0410-b5e6-96231b3b80d8
This will allow us to optimize code such as:
int f(int *p) {
int x;
return p == &x;
}
as well as:
int *allocate(void);
int f() {
int x;
int *p = allocate();
return p == &x;
}
The folding can only be done under certain circumstances. Even though p and &x
cannot alias, the comparison must still return true if the pointer
representations are equal. If a user successfully generates a p that's a
correct guess for &x, comparison should return true even though p is an invalid
pointer.
This patch argues that if the address of the alloca isn't observable outside the
function, the function can act as-if the address is impossible to guess from the
outside. The tricky part is keeping the act consistent: if we fold p == &x to
false in one place, we must make sure to fold any other comparisons based on
those pointers similarly. To ensure that, we only fold when &x is involved
exactly once in comparison instructions.
Differential Revision: http://reviews.llvm.org/D13358
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249490 91177308-0d34-0410-b5e6-96231b3b80d8
If the mask of a select instruction is a ConstantVector, method
SimplifyDemandedVectorElts iterates over the mask elements to identify which
values are selected from the select inputs.
Before this patch, method SimplifyDemandedVectorElts always used method
Constant::isNullValue() to check if a value in the mask was zero. Unfortunately
that method always returns false when called on a ConstantExpr.
This patch fixes the problem in SimplifyDemandedVectorElts by adding an explicit
check for ConstantExpr values. Now, if a value in the mask is a ConstantExpr, we
avoid calling isNullValue() on it.
Fixes PR24922.
Differential Revision: http://reviews.llvm.org/D13219
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249390 91177308-0d34-0410-b5e6-96231b3b80d8
The most important part required to make clang
devirtualization works ( ͡°͜ʖ ͡°).
The code is able to find non local dependencies, but unfortunatelly
because the caller can only handle local dependencies, I had to add
some restrictions to look for dependencies only in the same BB.
http://reviews.llvm.org/D12992
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249196 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Some passes may open up opportunities for optimizations, leaving empty
lifetime start/end ranges. For example, with the following code:
void foo(char *, char *);
void bar(int Size, bool flag) {
for (int i = 0; i < Size; ++i) {
char text[1];
char buff[1];
if (flag)
foo(text, buff); // BBFoo
}
}
the loop unswitch pass will create 2 versions of the loop, one with
flag==true, and the other one with flag==false, but always leaving
the BBFoo basic block, with lifetime ranges covering the scope of the for
loop. Simplify CFG will then remove BBFoo in the case where flag==false,
but will leave the lifetime markers.
This patch teaches InstCombine to remove trivially empty lifetime marker
ranges, that is ranges ending right after they were started (ignoring
debug info or other lifetime markers in the range).
This fixes PR24598: excessive compile time after r234581.
Reviewers: reames, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13305
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249018 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a
builtin shuffle if the mask is constant.
Converting byte shuffle intrinsic calls to builtin shuffles can help finding
more opportunities for combining shuffles later on in selection dag.
We may end up with byte shuffles with constant masks as the result of inlining.
Differential Revision: http://reviews.llvm.org/D13252
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248913 91177308-0d34-0410-b5e6-96231b3b80d8
Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements.
This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases when vectors alias cleanly (i.e. number of elements are an exact multiple of the other vector).
This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts.
Differential Revision: http://reviews.llvm.org/D12935
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248784 91177308-0d34-0410-b5e6-96231b3b80d8
This is one step towards solving PR24766:
https://llvm.org/bugs/show_bug.cgi?id=24766
We were not producing the same IR for these two C functions because the store
to the temp bool causes extra zexts:
#include <stdbool.h>
bool switchy(char x1, char x2, char condition) {
bool conditionMet = false;
switch (condition) {
case 0: conditionMet = (x1 == x2); break;
case 1: conditionMet = (x1 <= x2); break;
}
return conditionMet;
}
bool switchy2(char x1, char x2, char condition) {
switch (condition) {
case 0: return (x1 == x2);
case 1: return (x1 <= x2);
}
return false;
}
As noted in the code comments, this test case manages to avoid the more general existing
phi optimizations where there are only 2 phi inputs or where there are no constant phi
args mixed in with the casts ops. It seems like a corner case, but if we don't catch it,
then I don't think we can get SimplifyCFG to further optimize towards the canonical form
for this function shown in the bug report.
Differential Revision: http://reviews.llvm.org/D12866
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248689 91177308-0d34-0410-b5e6-96231b3b80d8
This is a fix for PR22723:
https://llvm.org/bugs/show_bug.cgi?id=22723
My first attempt at this was to change what I thought was the root problem:
xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32
...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop!
My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would
mean potentially returning a different size for the match than what was input. I think
this would require all users of m_Not to check the size of the returned match, so I
abandoned that idea.
I settled on just fixing the exact case presented in the PR. This patch does allow the
2 functions in PR22723 to compile identically (x86):
bool test(bool x, bool y) { return !x | !y; }
bool test(bool x, bool y) { return !x || !y; }
...
andb %sil, %dil
xorb $1, %dil
movb %dil, %al
retq
Differential Revision: http://reviews.llvm.org/D12705
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248634 91177308-0d34-0410-b5e6-96231b3b80d8
This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead.
LLVM counterpart to D12835
Differential Revision: http://reviews.llvm.org/D13002
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248368 91177308-0d34-0410-b5e6-96231b3b80d8
(icmp eq (ashr C1, %V) -1) may have multiple answers if C1 is not a
power of two and has the sign bit set.
This fixes PR24873.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248074 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
`signum(x)` is sometimes implemented as `(x >> 63) | (-x >>> 63)` (for
an `i64` `x`). This change adds a matcher for that pattern, and an
instcombine rule to optimize `signum(x) s< 1`.
Later, we can also consider optimizing:
icmp slt signum(x), 0 --> icmp slt x, 0
icmp sle signum(x), 1 --> true
etc.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12703
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247846 91177308-0d34-0410-b5e6-96231b3b80d8
The patch extends the optimization to cases where the constant's
magnitude is so small or large that the rounding of the conversion
is irrelevant. The "so small" case includes negative zero.
Differential review: http://reviews.llvm.org/D11210
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247708 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of passing arguments at callsite. In this way it can handle cases where the argument does not have nonnull attribute but has a dominating null check from the CFG. It also adds assertions in isKnownNonNull() and isKnownNonNullFromDominatingCondition() to make sure the value checked is pointer type (as defined in LLVM document). These assertions might trip failures in things which are not covered under llvm/test, but fixes should be pretty obvious.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12779
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247587 91177308-0d34-0410-b5e6-96231b3b80d8
Improved InstCombine support for CVTPH2PS (F16C half 2 float conversion):
<4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16>) - only uses the bottom 4 i16 elements for the conversion.
Added constant folding support.
Differential Revision: http://reviews.llvm.org/D12731
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247504 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of passing arguments at callsite. In this way it can handle cases where the argument does not have nonnull attribute but has a dominating null check from the CFG.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12779
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247356 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of gc.relocate return value. In this way it can handle cases where the relocated value does not have nonnull attribute but has a dominating null check from the CFG.
Reviewers: reames
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D12772
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247353 91177308-0d34-0410-b5e6-96231b3b80d8
removes cast by performing the lshr on smaller types. However, currently there
is no trunc(lshr (sext A), Cst) variant.
This patch add such optimization by transforming trunc(lshr (sext A), Cst)
to ashr A, Cst.
Differential Revision: http://reviews.llvm.org/D12520
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247271 91177308-0d34-0410-b5e6-96231b3b80d8
with the new pass manager, and no longer relying on analysis groups.
This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:
- FunctionAAResults is a type-erasing alias analysis results aggregation
interface to walk a single query across a range of results from
different alias analyses. Currently this is function-specific as we
always assume that aliasing queries are *within* a function.
- AAResultBase is a CRTP utility providing stub implementations of
various parts of the alias analysis result concept, notably in several
cases in terms of other more general parts of the interface. This can
be used to implement only a narrow part of the interface rather than
the entire interface. This isn't really ideal, this logic should be
hoisted into FunctionAAResults as currently it will cause
a significant amount of redundant work, but it faithfully models the
behavior of the prior infrastructure.
- All the alias analysis passes are ported to be wrapper passes for the
legacy PM and new-style analysis passes for the new PM with a shared
result object. In some cases (most notably CFL), this is an extremely
naive approach that we should revisit when we can specialize for the
new pass manager.
- BasicAA has been restructured to reflect that it is much more
fundamentally a function analysis because it uses dominator trees and
loop info that need to be constructed for each function.
All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.
The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.
This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.
Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.
One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.
Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.
Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.
Differential Revision: http://reviews.llvm.org/D12080
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247167 91177308-0d34-0410-b5e6-96231b3b80d8