This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented
as an independent pass, so there's no stretching of scope and feature creep for an existing pass.
I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost
this same functionality as an addition to CGP in the motivating example of PR31028:
https://bugs.llvm.org/show_bug.cgi?id=31028
The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow
more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and
undo the hoisting that is done here.
Decomposing remainder may allow removing some code from the backend (PPC and possibly others).
Differential Revision: https://reviews.llvm.org/D37121
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312862 91177308-0d34-0410-b5e6-96231b3b80d8
SLP vectorizer supports horizontal reductions for Add/FAdd binary
operations. Patch adds support for horizontal min/max reductions.
Function getReductionCost() is split to getArithmeticReductionCost() for
binary operation reductions and getMinMaxReductionCost() for min/max
reductions.
Patch fixes PR26956.
Differential revision: https://reviews.llvm.org/D27846
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312791 91177308-0d34-0410-b5e6-96231b3b80d8
This patch expands the support of lowerInterleavedload to {8|16|32}x8i stride 3.
LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8|16|32}) and we plan to include the store (deinterleved side).
The patch goal is to optimize the following sequence:
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7
into
a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7
Reviewers
1. zvi
2. igor
3. guyblank
4. dorit
5. Ayal
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312722 91177308-0d34-0410-b5e6-96231b3b80d8
Consider this type of a loop:
for (...) {
...
if (...) continue;
...
}
Normally, the "continue" would branch to the loop control code that
checks whether the loop should continue iterating and which contains
the (often) unique loop latch branch. In certain cases jump threading
can "thread" the inner branch directly to the loop header, creating
a second loop latch. Loop canonicalization would then transform this
loop into a loop nest. The problem with this is that in such a loop
nest neither loop is countable even if the original loop was. This
may inhibit subsequent loop optimizations and be detrimental to
performance.
Differential Revision: https://reviews.llvm.org/D36404
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312664 91177308-0d34-0410-b5e6-96231b3b80d8
This is a preliminary step towards solving the remaining part of PR27145 - IR for isfinite():
https://bugs.llvm.org/show_bug.cgi?id=27145
In order to solve that one more generally, we need to add matching for and/or of fcmp ord/uno
with a constant operand.
But while looking at those patterns, I realized we were missing a canonicalization for nonzero
constants. Rather than limiting to just folds for constants, we're adding a general value
tracking method for this based on an existing DAG helper.
By transforming everything to 0.0, we can simplify the existing code in foldLogicOfFCmps()
and pick up missing vector folds.
Differential Revision: https://reviews.llvm.org/D37427
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312591 91177308-0d34-0410-b5e6-96231b3b80d8
As suggested in D37427, we could have a value tracking function and folds that use
it to simplify these cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312578 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This intrinsic represents a label with a list of associated metadata
strings. It is modelled as reading and writing inaccessible memory so
that it won't be removed as dead code. I think the intention is that the
annotation strings should appear at most once in the debug info, so I
marked it noduplicate. We are allowed to inline code with annotations as
long as we strip the annotation, but that can be done later.
Reviewers: majnemer
Subscribers: eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D36904
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312569 91177308-0d34-0410-b5e6-96231b3b80d8
This is possible if C1 and C2 are both powers of 2. Or if binop is 'and' then ~C2 needs to be a power of 2.
We already support this for 'or', but we should be able to support 'and' and 'xor'. This will be enhanced by D37274.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312519 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Improve how MaxVF is computed while taking into account that MaxVF should not be larger than the loop's trip count.
Other than saving on compile-time by pruning the possible MaxVF candidates, this patch fixes pr34438 which exposed the following flow:
1. Short trip count identified -> Don't bail out, set OptForSize:=True to avoid tail-loop and runtime checks.
2. Compute MaxVF returned 16 on a target supporting AVX512.
3. OptForSize -> choose VF:=MaxVF.
4. Bail out because TripCount = 8, VF = 16, TripCount % VF !=0 means we need a tail loop.
With this patch step 2. will choose MaxVF=8 based on TripCount.
Reviewers: Ayal, dorit, mkuper, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D37425
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312472 91177308-0d34-0410-b5e6-96231b3b80d8
Debug information can be, and was, corrupted when the runtime
remainder loop was fully unrolled. This is because a !null node can
be created instead of a unique one describing the loop. In this case,
the original node gets incorrectly updated with the NewLoopID
metadata.
In the case when the remainder loop is going to be quickly fully
unrolled, there isn't the need to add loop metadata for it anyway.
Differential Revision: https://reviews.llvm.org/D37338
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312471 91177308-0d34-0410-b5e6-96231b3b80d8
These are all tests that result in a constant, so moving the tests over to where they are actually handled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312411 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
After a discussion with Rekka, i believe this (or a small variant)
should fix the remaining phi-of-ops problems.
Rekka's algorithm for completeness relies on looking up expressions
that should have no leader, and expecting it to fail (IE looking up
expressions that can't exist in a predecessor, and expecting it to
find nothing).
Unfortunately, sometimes these expressions can be simplified to
constants, but we need the lookup to fail anyway. Additionally, our
simplifier outsmarts this by taking these "not quite right"
expressions, and simplifying them into other expressions or walking
through phis, etc. In the past, we've sometimes been able to find
leaders for these expressions, incorrectly.
This change causes us to not to try to phi of ops such expressions.
We determine safety by seeing if they depend on a phi node in our
block.
This is not perfect, we can do a bit better, but this should be a
"correctness start" that we can then improve. It also requires a
bunch of caching that i'll eventually like to eliminate.
The right solution, longer term, to the simplifier issues, is to make
the query interface for the instruction simplifier/constant folder
have the flags we need, so that we can keep most things going, but
turn off the possibly-invalid parts (threading through phis, etc).
This is an issue in another wrong code bug as well.
Reviewers: davide, mcrosier
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D37175
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312401 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches decomposeBitTestICmp to look through truncate instructions on the input to the compare. If a truncate is found it will now return the pre-truncated Value and appropriately extend the APInt mask.
This allows some code to be removed from InstSimplify that was doing this functionality.
This allows InstCombine's bit test combining code to match a pre-truncate Value with the same Value appear with an 'and' on another icmp. Or it allows us to combine a truncate to i16 and a truncate to i8. This also required removing the type check from the beginning of getMaskedTypeForICmpPair, but I believe that's ok because we still have to find two values from the input to each icmp that are equal before we'll do any transformation. So the type check was really just serving as an early out.
There was one user of decomposeBitTestICmp that didn't want to look through truncates, so I've added a flag to prevent that behavior when necessary.
Differential Revision: https://reviews.llvm.org/D37158
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312382 91177308-0d34-0410-b5e6-96231b3b80d8
Previously this would sporadically crash as TargetType
was never initialized. We special-case the single-operand
case returning earlier and trying to mimic the behaviour of
isLegalAddressingMode as closely as possible.
Differential Revision: https://reviews.llvm.org/D37277
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312357 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: When we backtranslate expressions, we can't use the predicateinfo, since we are evaluating them in a different context.
Reviewers: davide, mcrosier
Subscribers: sanjoy, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D37174
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312352 91177308-0d34-0410-b5e6-96231b3b80d8
Adding test for debug info for integer
variables whose type is shrinked to bool.
Patch by Nikola Prica.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312325 91177308-0d34-0410-b5e6-96231b3b80d8
comparisons into memcmp.
Thanks to recent improvements in the LLVM codegen, the memcmp is typically
inlined as a chain of efficient hardware comparisons.
This typically benefits C++ member or nonmember operator==().
For now this is disabled by default until:
- https://bugs.llvm.org/show_bug.cgi?id=33329 is complete
- Benchmarks show that this is always useful.
Differential Revision:
https://reviews.llvm.org/D33987
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312315 91177308-0d34-0410-b5e6-96231b3b80d8
The BasicBlock passed to FindPredecessorRetainWithSafePath should be the
parent block of Autorelease. This fixes a crash that occurs in
FindDependencies when StartInst is not in StartBB.
rdar://problem/33866381
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312266 91177308-0d34-0410-b5e6-96231b3b80d8
Recurse instead of returning on the first found optimization. Also, return early in the caller
instead of continuing because that allows another round of simplification before we might
potentially lose undef information from a shuffle mask by eliminating the shuffle.
As noted in the review, we could probably do better and be more efficient by moving all of
demanded elements into a separate pass, but this is yet another quick fix to instcombine.
Differential Revision: https://reviews.llvm.org/D37236
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312248 91177308-0d34-0410-b5e6-96231b3b80d8
Current implementation of parseLoopStructure interprets the latch comparison as a
comarison against `iv.next`. If the actual comparison is made against the `iv` current value
then the loop may be rejected, because this misinterpretation leads to incorrect evaluation
of the latch start value.
This patch teaches the IRCE to distinguish this kind of loops and perform the optimization
for them. Now we use `IndVarBase` variable which can be either next or current value of the
induction variable (previously we used `IndVarNext` which was always the value on next iteration).
Differential Revision: https://reviews.llvm.org/D36215
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312221 91177308-0d34-0410-b5e6-96231b3b80d8
See D37236 for discussion. It seems unlikely that we actually want/need
to do this kind of folding in InstCombine in the long run, but moving
everything will be a bigger follow-up step.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312172 91177308-0d34-0410-b5e6-96231b3b80d8
This change simplifies code that has to deal with
DIGlobalVariableExpression and mirrors how we treat DIExpressions in
debug info intrinsics. Before this change there were two ways of
representing empty expressions on globals, a nullptr and an empty
!DIExpression().
If someone needs to upgrade out-of-tree testcases:
perl -pi -e 's/(!DIGlobalVariableExpression\(var: ![0-9]*)\)/\1, expr: !DIExpression())/g' <MYTEST.ll>
will catch 95%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312144 91177308-0d34-0410-b5e6-96231b3b80d8
This code is double-dead:
1. We simplify all selects with constant true/false condition in InstSimplify.
I've minimized/moved the tests to show that works as expected.
2. All remaining vector selects with a constant condition are canonicalized to
shufflevector, so we really can't see this pattern.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312123 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the first insertelement instruction has multiple users and inserts at
position 0, we can re-use this instruction when folding a chain of
insertelement instructions. As we need to generate the first
insertelement instruction anyways, this should be a strict improvement.
We could get rid of the restriction of inserting at position 0 by
creating a different shufflemask, but it is probably worth to keep the
first insertelement instruction with position 0, as this is easier to do
efficiently than at other positions I think.
Reviewers: grosser, mkuper, fpetrogalli, efriedma
Reviewed By: fpetrogalli
Subscribers: gareevroman, llvm-commits
Differential Revision: https://reviews.llvm.org/D37064
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312110 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When jumptable encoding does not match target code encoding (arm vs
thumb), a veneer is inserted by the linker. We can not avoid this
in all cases, because entries within one jumptable must have the same
encoding, but we can make it less common by selecting the jumptable
encoding to match the majority of its targets.
This change only covers FullLTO, and not ThinLTO.
Reviewers: pcc
Subscribers: aemerson, mehdi_amini, javed.absar, kristof.beyls, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D37171
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312054 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Cross-DSO CFI needs all __cfi_check exports to use the same encoding
(ARM vs Thumb).
Reviewers: pcc
Subscribers: aemerson, srhines, kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D37243
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312052 91177308-0d34-0410-b5e6-96231b3b80d8
This is to fix PR34257. rL309059 takes an early return when FindLIVLoopCondition
fails to find a loop invariant condition. This is wrong and it will disable loop
unswitch for select. The patch fixes the bug.
Differential Revision: https://reviews.llvm.org/D36985
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312045 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If SimplifyCFG pass is able to merge conditional stores into single one,
it loses the alignment. This may lead to incorrect codegen. Patch
sets the alignment of the new instruction if it is set in the original
one.
Reviewers: jmolloy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36841
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312030 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds splat support to transformZExtICmp. The test cases are vector versions of tests that failed when commenting out parts of the existing scalar code.
One test didn't vectorize optimize properly due to another bug so a TODO has been added.
Differential Revision: https://reviews.llvm.org/D37253
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312023 91177308-0d34-0410-b5e6-96231b3b80d8
I forgot to specify -unroll-loop-peel, making this test not
really effective. While here, adjust some details (naming and
run line). Thanks to Sanjoy and Michael Z. for pointing out in
their post-commit reviews.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312015 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: We originally assume that in pgo-icp, the promoted direct call will never be null after strip point casts. However, stripPointerCasts is so smart that it could possibly return the value of the function call if it knows that the return value is always an argument. In this case, the returned value cannot cast to Instruction. In this patch, null check is added to ensure null pointer will not be accessed.
Reviewers: tejohnson, xur, davidxl, djasper
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D37252
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312005 91177308-0d34-0410-b5e6-96231b3b80d8