We were using an i1 type and then zero extending to a vector. Instead just create the 0/1 directly as a ConstantInt with the correct type. No need to ask ConstantExpr to zero extend for us.
This bug is a bit tricky to hit because it requires us to visit a zext of an icmp that would normally be simplified to true/false, but that icmp hasnt' been visited yet. In the test case this zext and icmp were created by visiting a udiv and due to worklist ordering we got to the zext first.
Fixes PR34841.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314971 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds splat support to transformZExtICmp. The test cases are vector versions of tests that failed when commenting out parts of the existing scalar code.
One test didn't vectorize optimize properly due to another bug so a TODO has been added.
Differential Revision: https://reviews.llvm.org/D37253
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312023 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the bitsToClear from the LHS of an 'and' comes back non-zero, but all of those bits are known zero on the RHS, we can reset bitsToClear.
Without this, the 'or' in the modified test case blocks the transform because it has non-zero bits in its RHS in those bits.
Reviewers: spatel, majnemer, davide
Reviewed By: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36944
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311343 91177308-0d34-0410-b5e6-96231b3b80d8
I couldn't find any smaller folds to help the cases in:
https://bugs.llvm.org/show_bug.cgi?id=34046
after:
rL310141
The truncated rotate-by-variable patterns elude all of the existing transforms because
of multiple uses and knowledge about demanded bits and knownbits that doesn't exist
without the whole pattern. So we need an unfortunately large pattern match. But by
simplifying this pattern in IR, the backend is already able to generate
rolb/rolw/rorb/rorw for x86 using its existing rotate matching logic (although
there is a likely extraneous 'and' of the rotate amount).
Note that rotate-by-constant doesn't have this problem - smaller folds should already
produce the narrow IR ops.
Differential Revision: https://reviews.llvm.org/D36395
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310509 91177308-0d34-0410-b5e6-96231b3b80d8
In addition to moving the shift transforms over, we may want to
detect too-wide rotate patterns here (PR34046).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310181 91177308-0d34-0410-b5e6-96231b3b80d8
Previously the InstCombiner class contained a pointer to an IR builder that had been passed to the constructor. Sometimes this would be passed to helper functions as either a pointer or the pointer would be dereferenced to be passed by reference.
This patch makes it a reference everywhere including the InstCombiner class itself so there is more inconsistency. This a large, but mechanical patch. I've done very minimal formatting changes on it despite what clang-format wanted to do.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307451 91177308-0d34-0410-b5e6-96231b3b80d8
There are 2 parts to this patch made simultaneously to avoid a regression.
We're reversing the canonicalization that moves bitwise vector ops before bitcasts.
We're moving bitwise vector ops *after* bitcasts instead. That's the 1st and 3rd hunks
of the patch. The motivation is that there's only one fold that currently depends on
the existing canonicalization (see next), but there are many folds that would
automatically benefit from the new canonicalization.
PR33138 ( https://bugs.llvm.org/show_bug.cgi?id=33138 ) shows why/how we have these
patterns in IR.
There's an or(and,andn) pattern that requires an adjustment in order to continue matching
to 'select' because the bitcast changes position. This match is unfortunately complicated
because it requires 4 logic ops with optional bitcast and sext ops.
Test diffs:
1. The bitcast.ll and bitcast-bigendian.ll changes show the most basic difference -
bitcast comes before logic.
2. There are also tests with no diffs in bitcast.ll that verify that we're still doing
folds that were enabled by the previous canonicalization.
3. icmp-xor-signbit.ll shows the payoff. We don't need to adjust existing icmp patterns
to look through bitcasts.
4. logical-select.ll contains several tests for the or(and,andn) --> select fold to
verify that we are still handling those cases. The lone diff shows the movement of
the bitcast from the new canonicalization rule.
Differential Revision: https://reviews.llvm.org/D33517
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306011 91177308-0d34-0410-b5e6-96231b3b80d8
These methods are specifically optimized to only counting leading zeros without an additional uint64_t compare.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304876 91177308-0d34-0410-b5e6-96231b3b80d8
Otherwise we end up miscompiling, transforming:
define i8 @tinky() {
%sext = sext i1 1 to i16
%hibit = lshr i16 %sext, 15
%tr = trunc i16 %hibit to i8
ret i8 %tr
}
into:
%sext = sext i1 1 to i8
ret i8 %sext
and the first get folded to ret i8 1, while the second gets folded
to ret i8 -1.
Eventually we should get rid of this transform entirely, but for now,
this at least fixes a know correctness bug.
Differential Revision: https://reviews.llvm.org/D33338
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303513 91177308-0d34-0410-b5e6-96231b3b80d8
The comment says to avoid the case where zero bits are shifted into the truncated value,
but the code checks that the shift is smaller than the truncated value instead of the
number of bits added by the sign extension. Fixing this allows a shift by more than the
value size to be introduced, which is undefined behavior, so the shift is capped at the
value size minus one, which has the expected behavior of filling the value with the sign
bit.
Patch by Jacob Young!
Differential Revision: https://reviews.llvm.org/D32285
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302548 91177308-0d34-0410-b5e6-96231b3b80d8
This patch introduces a new KnownBits struct that wraps the two APInt used by computeKnownBits. This allows us to treat them as more of a unit.
Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch.
I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. Once place default constructed the APInts so I added a default constructor for those cases.
Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero|One) so we don't write it out everywhere. Maybe a method for (Zero|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with.
Differential Revision: https://reviews.llvm.org/D32376
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301432 91177308-0d34-0410-b5e6-96231b3b80d8
We currently only support folding a subtract into a select but not a PHI. This fixes that.
I had to fix an assumption in FoldOpIntoPhi that assumed the PHI node was always in operand 0. Now we pass it in like we do for FoldOpIntoSelect. But we still require some dancing to find the Constant when we create the BinOp or ConstantExpr. This is based code is similar to what we do for selects.
Since I touched all call sites, this also renames FoldOpIntoPhi to foldOpIntoPhi to match coding standards.
Differential Revision: https://reviews.llvm.org/D31686
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300363 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The call to canEvaluateZExtd in InstCombiner::visitZExt may
return with BitsToClear == SrcTy->getScalarSizeInBits(), but
there is an assert that BitsToClear should be smaller than
SrcTy->getScalarSizeInBits().
I have a test case that triggers the assert, but it only happens
for my downstream target. I've not been able to trigger it for
any upstream target.
The assert triggered for a piece of code such as this
%shr1 = lshr i16 undef, 15
...
%shr2 = lshr i16 %shr1, 1
%conv = zext i16 %shr2 to i32
Normally the lshr instructions are constant folded before we
visit the zext (that is why it is so hard to reproduce).
The original pattern, before instcombine, is of course a lot more
complicated in my test case. The shift count in the second lshr
is for example determined by the outcome of a PHI instruction.
It seems like other rewrites by instcombine leads up to
the pattern above. And then the zext is pulled from the
worklist, and visited (hitting the assert), before we detect
that the lshr instrucions can be constant folded.
Anyway, since the canEvaluateZExtd may return with BitsToClear
equal to SrcTy->getScalarSizeInBits(), and since the rewrite
that converts the expression type to avoid a zero extend works
also for the case where SrcBitsKept ends up being zero, then
it should be OK to liberate the assert to
assert(BitsToClear <= SrcTy->getScalarSizeInBits() &&
"Unreasonable BitsToClear");
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D30993
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297952 91177308-0d34-0410-b5e6-96231b3b80d8
This one seems more obvious than D30270 that it can't make improvements because an extension always needs
all of the incoming bits. There's one specific transform in SimplifyDemandedInstructionBits of converting
a sext to a zext when the sign-bit is known zero, but that is handled explicitly in visitSext() with
ComputeSignBit().
Like D30270, there are no IR differences (other than instruction names) for the case in PR32037:
https://bugs.llvm.org//show_bug.cgi?id=32037
...and no regression test differences.
Zext/sext are a smaller part of the profile, but this still appears to shave off another 0.5% or so from
'opt -O2'.
Differential Revision: https://reviews.llvm.org/D30280
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296129 91177308-0d34-0410-b5e6-96231b3b80d8
Add missing fabs(fpext) optimzation that worked with the call,
and also fixes it creating a second fpext when there were multiple
uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292172 91177308-0d34-0410-b5e6-96231b3b80d8
We had a limited version of this for scalar 'and'; this expands
the transform to 'or' and 'xor' and allows vectors types too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288273 91177308-0d34-0410-b5e6-96231b3b80d8
In PR27925:
https://llvm.org/bugs/show_bug.cgi?id=27925
...we proposed adding this fold to eliminate a bitcast. In D20774, there was
some concern about changing the type of a bitwise op as well as creating
bitcasts that might not be free for a target. However, if we're strictly
eliminating an instruction (by limiting this to one-use ops), then we should
be able to do this in InstCombine.
But we're cautiously restricting the transform for now to vector types to
avoid possible backend problems. A transform to make sure the logic op is
legal for the target should be added to reverse this transform and improve
codegen.
Differential Revision: https://reviews.llvm.org/D26641
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287707 91177308-0d34-0410-b5e6-96231b3b80d8
1. Use 'auto' with dyn_cast.
2. Variables start with a capital letter.
3. Use proper punctuation in comments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285200 91177308-0d34-0410-b5e6-96231b3b80d8
The original patch of the A->B->A BitCast optimization was reverted by r274094 because it may cause infinite loop inside compiler https://llvm.org/bugs/show_bug.cgi?id=27996.
The problem is with following code
xB = load (type B);
xA = load (type A);
+yA = (A)xB; B -> A
+zAn = PHI[yA, xA]; PHI
+zBn = (B)zAn; // A -> B
store zAn;
store zBn;
optimizeBitCastFromPhi generates
+zBn = (B)zAn; // A -> B
and expects it will be combined with the following store instruction to another
store zAn
Unfortunately before combineStoreToValueType is called on the store instruction, optimizeBitCastFromPhi is called on the new BitCast again, and this pattern repeats indefinitely.
optimizeBitCastFromPhi only generates BitCast for load/store instructions, only the BitCast before store can cause the reexecution of optimizeBitCastFromPhi, and BitCast before store can easily be handled by InstCombineLoadStoreAlloca.cpp. So the solution to the problem is if all users of a CI are store instructions, we should not do optimizeBitCastFromPhi on it. Then optimizeBitCastFromPhi will not be called on the new BitCast instructions.
Differential Revision: https://reviews.llvm.org/D23896
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285116 91177308-0d34-0410-b5e6-96231b3b80d8
This reapplies r272987 with a fix for infinitely looping
when the truncated value is another shift of a constant.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281379 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
InstCombine unfolds expressions of the form `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` such that in a later iteration of InstCombine the exposed `zext(icmp)` instructions can be optimized. We now combine this unfolding and the subsequent `zext(icmp)` optimization to be performed together. Since the unfolding doesn't happen separately anymore, we also again enable the folding of `logic(cast(icmp), cast(icmp))` expressions to `cast(logic(icmp, icmp))` which had been disabled due to its interference with the unfolding transformation.
Tested via `make check` and `lnt`.
Background
==========
For a better understanding on how it came to this change we subsequently summarize its history. In commit r275989 we've already tried to enable the folding of `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` which had to be reverted in r276106 because it could lead to an endless loop in InstCombine (also see http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160718/374347.html). The root of this problem is that in `visitZExt()` in InstCombineCasts.cpp there also exists a reverse of the above folding transformation, that unfolds `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` in order to expose `zext(icmp)` operations which would then possibly be eliminated by subsequent iterations of InstCombine. However, before these `zext(icmp)` would be eliminated the folding from r275989 could kick in and cause InstCombine to endlessly switch back and forth between the folding and the unfolding transformation. This is the reason why we now combine the `zext`-unfolding and the elimination of the exposed `zext(icmp)` to happen at one go because this enables us to still allow the cast-folding in `logic(cast(icmp), cast(icmp))` without entering an endless loop again.
Details on the submitted changes
================================
- In `visitZExt()` we combine the unfolding and optimization of `zext` instructions.
- In `transformZExtICmp()` we have to use `Builder->CreateIntCast()` instead of `CastInst::CreateIntegerCast()` to make sure that the new `CastInst` is inserted in a `BasicBlock`. The new calls to `transformZExtICmp()` that we introduce in `visitZExt()` would otherwise cause according assertions to be triggered (in our case this happend, for example, with lnt for the MultiSource/Applications/sqlite3 and SingleSource/Regression/C++/EH/recursive-throw tests). The subsequent usage of `replaceInstUsesWith()` is necessary to ensure that the new `CastInst` replaces the `ZExtInst` accordingly.
- In InstCombineAndOrXor.cpp we again allow the folding of casts on `icmp` instructions.
- The instruction order in the optimized IR for the zext-or-icmp.ll test case is different with the introduced changes.
- The test cases in zext.ll have been adopted from the reverted commits r275989 and r276105.
Reviewers: grosser, majnemer, spatel
Subscribers: eli.friedman, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22864
Contributed-by: Matthias Reisinger <d412vv1n@gmail.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277635 91177308-0d34-0410-b5e6-96231b3b80d8
A ConstantVector can have ConstantExpr operands and vice versa.
However, the folder had no ability to fold ConstantVectors which, in
some cases, was an optimization barrier.
Instead, rephrase the folder in terms of Constants instead of
ConstantExprs and teach callers how to deal with failure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277099 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch cleans up parts of InstCombine to raise its compliance with the LLVM coding standards and to increase its readability. The changes and according rationale are summarized in the following:
- Rename `ShouldOptimizeCast()` to `shouldOptimizeCast()` since functions should start with a lower case letter.
- Move `shouldOptimizeCast()` from InstCombineCasts.cpp to InstCombineAndOrXor.cpp since it's only used there.
- Simplify interface of `shouldOptimizeCast()`.
- Minor code style adaptions in `shouldOptimizeCast()`.
- Remove the documentation on the function definition of `shouldOptimizeCast()` since it just repeats the documentation on its declaration. Also enhance the documentation on its declaration with more information describing its intended use and make it doxygen-compliant.
- Change a comment in `foldCastedBitwiseLogic()` from `fold (logic (cast A), (cast B)) -> (cast (logic A, B))` to `fold logic(cast(A), cast(B)) -> cast(logic(A, B))` since the surrounding comments use this format.
- Remove comment `Only do this if the casts both really cause code to be generated.` in `foldCastedBitwiseLogic()` since it just repeats parts of the documentation of `shouldOptimizeCast()` and does not help to improve readability.
- Simplify the interface of `isEliminableCastPair()`.
- Removed the documentation on the function definition of `isEliminableCastPair()` which only contained obvious statements about its implementation. Instead added more general doxygen-compliant documentation to its declaration.
- Renamed parameter `DoXform` of `transformZExtIcmp()` to `DoTransform` to make its intention clearer.
- Moved documentation of `transformZExtIcmp()` from its definition to its declaration and made it doxygen-compliant.
Reviewers: vtjnash, grosser
Subscribers: majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D22449
Contributed-by: Matthias Reisinger
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275964 91177308-0d34-0410-b5e6-96231b3b80d8