[InstCombine] Refactor optimization of zext(or(icmp, icmp)) to enable more aggressive cast-folding

Summary: InstCombine unfolds expressions of the form `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` such that in a later iteration of InstCombine the exposed `zext(icmp)` instructions can be optimized. We now combine this unfolding and the subsequent `zext(icmp)` optimization to be performed together. Since the unfolding doesn't happen separately anymore, we also again enable the folding of `logic(cast(icmp), cast(icmp))` expressions to `cast(logic(icmp, icmp))` which had been disabled due to its interference with the unfolding transformation. Tested via `make check` and `lnt`. Background ========== For a better understanding on how it came to this change we subsequently summarize its history. In commit r275989 we've already tried to enable the folding of `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` which had to be reverted in r276106 because it could lead to an endless loop in InstCombine (also see http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160718/374347.html). The root of this problem is that in `visitZExt()` in InstCombineCasts.cpp there also exists a reverse of the above folding transformation, that unfolds `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` in order to expose `zext(icmp)` operations which would then possibly be eliminated by subsequent iterations of InstCombine. However, before these `zext(icmp)` would be eliminated the folding from r275989 could kick in and cause InstCombine to endlessly switch back and forth between the folding and the unfolding transformation. This is the reason why we now combine the `zext`-unfolding and the elimination of the exposed `zext(icmp)` to happen at one go because this enables us to still allow the cast-folding in `logic(cast(icmp), cast(icmp))` without entering an endless loop again. Details on the submitted changes ================================ - In `visitZExt()` we combine the unfolding and optimization of `zext` instructions. - In `transformZExtICmp()` we have to use `Builder->CreateIntCast()` instead of `CastInst::CreateIntegerCast()` to make sure that the new `CastInst` is inserted in a `BasicBlock`. The new calls to `transformZExtICmp()` that we introduce in `visitZExt()` would otherwise cause according assertions to be triggered (in our case this happend, for example, with lnt for the MultiSource/Applications/sqlite3 and SingleSource/Regression/C++/EH/recursive-throw tests). The subsequent usage of `replaceInstUsesWith()` is necessary to ensure that the new `CastInst` replaces the `ZExtInst` accordingly. - In InstCombineAndOrXor.cpp we again allow the folding of casts on `icmp` instructions. - The instruction order in the optimized IR for the zext-or-icmp.ll test case is different with the introduced changes. - The test cases in zext.ll have been adopted from the reverted commits r275989 and r276105. Reviewers: grosser, majnemer, spatel Subscribers: eli.friedman, majnemer, llvm-commits Differential Revision: https://reviews.llvm.org/D22864 Contributed-by: Matthias Reisinger <d412vv1n@gmail.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277635 91177308-0d34-0410-b5e6-96231b3b80d8
2024-11-24 20:29:53 +00:00 · 2016-08-03 19:30:35 +00:00 · 2016-08-03 19:30:35 +00:00 · dd5930fbf7
commit dd5930fbf7
parent 52f0f65dcd
4 changed files with 87 additions and 7 deletions
--- a/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@ -1209,8 +1209,7 @@ Instruction *InstCombiner::foldCastedBitwiseLogic(BinaryOperator &I) {
  Value *Cast1Src = Cast1->getOperand(0);

  // fold logic(cast(A), cast(B)) -> cast(logic(A, B))
-  if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) &&
-      shouldOptimizeCast(Cast0) && shouldOptimizeCast(Cast1)) {
+  if (shouldOptimizeCast(Cast0) && shouldOptimizeCast(Cast1)) {
    Value *NewOp = Builder->CreateBinOp(LogicOpc, Cast0Src, Cast1Src,
                                        I.getName());
    return CastInst::Create(CastOpcode, NewOp, DestTy);
--- a/lib/Transforms/InstCombine/InstCombineCasts.cpp
+++ b/lib/Transforms/InstCombine/InstCombineCasts.cpp
@ -623,7 +623,9 @@ Instruction *InstCombiner::transformZExtICmp(ICmpInst *ICI, ZExtInst &CI,

        if (CI.getType() == In->getType())
          return replaceInstUsesWith(CI, In);
-        return CastInst::CreateIntegerCast(In, CI.getType(), false/*ZExt*/);
+
+        Value *IntCast = Builder->CreateIntCast(In, CI.getType(), false);
+        return replaceInstUsesWith(CI, IntCast);
      }
    }
  }
@ -890,16 +892,26 @@ Instruction *InstCombiner::visitZExt(ZExtInst &CI) {

  BinaryOperator *SrcI = dyn_cast<BinaryOperator>(Src);
  if (SrcI && SrcI->getOpcode() == Instruction::Or) {
-    // zext (or icmp, icmp) --> or (zext icmp), (zext icmp) if at least one
-    // of the (zext icmp) will be transformed.
+    // zext (or icmp, icmp) -> or (zext icmp), (zext icmp) if at least one
+    // of the (zext icmp) can be eliminated. If so, immediately perform the
+    // according elimination.
    ICmpInst *LHS = dyn_cast<ICmpInst>(SrcI->getOperand(0));
    ICmpInst *RHS = dyn_cast<ICmpInst>(SrcI->getOperand(1));
    if (LHS && RHS && LHS->hasOneUse() && RHS->hasOneUse() &&
        (transformZExtICmp(LHS, CI, false) ||
         transformZExtICmp(RHS, CI, false))) {
+      // zext (or icmp, icmp) -> or (zext icmp), (zext icmp)
      Value *LCast = Builder->CreateZExt(LHS, CI.getType(), LHS->getName());
      Value *RCast = Builder->CreateZExt(RHS, CI.getType(), RHS->getName());
-      return BinaryOperator::Create(Instruction::Or, LCast, RCast);
+      BinaryOperator *Or = BinaryOperator::Create(Instruction::Or, LCast, RCast);
+
+      // Perform the elimination.
+      if (auto *LZExt = dyn_cast<ZExtInst>(LCast))
+        transformZExtICmp(LHS, *LZExt);
+      if (auto *RZExt = dyn_cast<ZExtInst>(RCast))
+        transformZExtICmp(RHS, *RZExt);
+
+      return Or;
    }
  }

--- a/test/Transforms/InstCombine/zext-or-icmp.ll
+++ b/test/Transforms/InstCombine/zext-or-icmp.ll
@ -13,8 +13,8 @@ define i8 @zext_or_icmp_icmp(i8 %a, i8 %b) {
 ; CHECK-LABEL: zext_or_icmp_icmp(
 ; CHECK-NEXT:    %mask = and i8 %a, 1
 ; CHECK-NEXT:    %toBool2 = icmp eq i8 %b, 0
-; CHECK-NEXT:    %1 = xor i8 %mask, 1
 ; CHECK-NEXT:    %toBool22 = zext i1 %toBool2 to i8
+; CHECK-NEXT:    %1 = xor i8 %mask, 1
 ; CHECK-NEXT:    %zext = or i8 %1, %toBool22
 ; CHECK-NEXT:    ret i8 %zext
 }
--- a/test/Transforms/InstCombine/zext.ll
+++ b/test/Transforms/InstCombine/zext.ll
@ -70,3 +70,72 @@ define <2 x i64> @fold_xor_zext_sandwich_vec(<2 x i1> %a) {
  ret <2 x i64> %zext2
 }

+; Assert that zexts in and(zext(icmp), zext(icmp)) can be folded.
+; CHECK-LABEL: @fold_and_zext_icmp(
+; CHECK-NEXT:    [[ICMP1:%.*]] = icmp sgt i64 %a, %b
+; CHECK-NEXT:    [[ICMP2:%.*]] = icmp slt i64 %a, %c
+; CHECK-NEXT:    [[AND:%.*]] = and i1 [[ICMP1]], [[ICMP2]]
+; CHECK-NEXT:    [[ZEXT:%.*]] = zext i1 [[AND]] to i8
+; CHECK-NEXT:    ret i8 [[ZEXT]]
+define i8 @fold_and_zext_icmp(i64 %a, i64 %b, i64 %c) {
+  %1 = icmp sgt i64 %a, %b
+  %2 = zext i1 %1 to i8
+  %3 = icmp slt i64 %a, %c
+  %4 = zext i1 %3 to i8
+  %5 = and i8 %2, %4
+  ret i8 %5
+}
+
+; Assert that zexts in or(zext(icmp), zext(icmp)) can be folded.
+; CHECK-LABEL: @fold_or_zext_icmp(
+; CHECK-NEXT:    [[ICMP1:%.*]] = icmp sgt i64 %a, %b
+; CHECK-NEXT:    [[ICMP2:%.*]] = icmp slt i64 %a, %c
+; CHECK-NEXT:    [[OR:%.*]] = or i1 [[ICMP1]], [[ICMP2]]
+; CHECK-NEXT:    [[ZEXT:%.*]] = zext i1 [[OR]] to i8
+; CHECK-NEXT:    ret i8 [[ZEXT]]
+define i8 @fold_or_zext_icmp(i64 %a, i64 %b, i64 %c) {
+  %1 = icmp sgt i64 %a, %b
+  %2 = zext i1 %1 to i8
+  %3 = icmp slt i64 %a, %c
+  %4 = zext i1 %3 to i8
+  %5 = or i8 %2, %4
+  ret i8 %5
+}
+
+; Assert that zexts in xor(zext(icmp), zext(icmp)) can be folded.
+; CHECK-LABEL: @fold_xor_zext_icmp(
+; CHECK-NEXT:    [[ICMP1:%.*]] = icmp sgt i64 %a, %b
+; CHECK-NEXT:    [[ICMP2:%.*]] = icmp slt i64 %a, %c
+; CHECK-NEXT:    [[XOR:%.*]] = xor i1 [[ICMP1]], [[ICMP2]]
+; CHECK-NEXT:    [[ZEXT:%.*]] = zext i1 [[XOR]] to i8
+; CHECK-NEXT:    ret i8 [[ZEXT]]
+define i8 @fold_xor_zext_icmp(i64 %a, i64 %b, i64 %c) {
+  %1 = icmp sgt i64 %a, %b
+  %2 = zext i1 %1 to i8
+  %3 = icmp slt i64 %a, %c
+  %4 = zext i1 %3 to i8
+  %5 = xor i8 %2, %4
+  ret i8 %5
+}
+
+; Assert that zexts in logic(zext(icmp), zext(icmp)) are also folded accross
+; nested logical operators.
+; CHECK-LABEL: @fold_nested_logic_zext_icmp(
+; CHECK-NEXT:    [[ICMP1:%.*]] = icmp sgt i64 %a, %b
+; CHECK-NEXT:    [[ICMP2:%.*]] = icmp slt i64 %a, %c
+; CHECK-NEXT:    [[AND:%.*]] = and i1 [[ICMP1]], [[ICMP2]]
+; CHECK-NEXT:    [[ICMP3:%.*]] = icmp eq i64 %a, %d
+; CHECK-NEXT:    [[OR:%.*]] = or i1 [[AND]], [[ICMP3]]
+; CHECK-NEXT:    [[ZEXT:%.*]] = zext i1 [[OR]] to i8
+; CHECK-NEXT:    ret i8 [[ZEXT]]
+define i8 @fold_nested_logic_zext_icmp(i64 %a, i64 %b, i64 %c, i64 %d) {
+  %1 = icmp sgt i64 %a, %b
+  %2 = zext i1 %1 to i8
+  %3 = icmp slt i64 %a, %c
+  %4 = zext i1 %3 to i8
+  %5 = and i8 %2, %4
+  %6 = icmp eq i64 %a, %d
+  %7 = zext i1 %6 to i8
+  %8 = or i8 %5, %7
+  ret i8 %8
+}