[NewPM] Avoid redundant CGSCC run for updated SCC

Summary: When an SCC got split due to inlining, we have two mechanisms for reprocessing the updated SCC, first is UR.UpdatedC that repeatedly rerun the new, current SCC; second is a worklist for all newly split SCCs. We can avoid rerun of the same SCC when the SCC is set to be processed by both mechanisms *back to back*. In pathological cases, such redundant rerun could cause exponential size growth due to inlining along cycles, even when there's no SCC mutation and hence convergence is not a problem. Note that it's ok to have SCC updated and rerun immediately, and also in the work list if we have actually moved an SCC to be topologically "below" the current one due to merging. In that case, we will need to revisit the current SCC after those moved SCCs. For that reason, the redundant avoidance here only targets back to back rerun of the same SCC - the case described by the now removed FIXME comment. Reviewers: chandlerc, wmi Subscribers: llvm-commits, hoy Tags: #llvm Differential Revision: https://reviews.llvm.org/D80589
2025-02-28 00:07:22 +00:00 · 2020-05-26 13:53:35 -07:00 · 2020-05-26 13:53:35 -07:00 · f6c8ca4756
commit f6c8ca4756
parent e63e26dab3
3 changed files with 61 additions and 10 deletions
--- a/include/llvm/Analysis/CGSCCPassManager.h
+++ b/include/llvm/Analysis/CGSCCPassManager.h
@ -820,6 +820,12 @@ ModuleToPostOrderCGSCCPassAdaptor<CGSCCPassT>::run(Module &M,
      LLVM_DEBUG(dbgs() << "Running an SCC pass across the RefSCC: " << *RC
                        << "\n");

+      // The top of the worklist may *also* be the same SCC we just ran over
+      // (and invalidated for). Keep track of that last SCC we processed due
+      // to SCC update to avoid redundant processing when an SCC is both just
+      // updated itself and at the top of the worklist.
+      LazyCallGraph::SCC *LastUpdatedC = nullptr;
+
      // Push the initial SCCs in reverse post-order as we'll pop off the
      // back and so see this in post-order.
      for (LazyCallGraph::SCC &C : llvm::reverse(*RC))
@ -835,6 +841,10 @@ ModuleToPostOrderCGSCCPassAdaptor<CGSCCPassT>::run(Module &M,
          LLVM_DEBUG(dbgs() << "Skipping an invalid SCC...\n");
          continue;
        }
+        if (LastUpdatedC == C) {
+          LLVM_DEBUG(dbgs() << "Skipping redundant run on SCC: " << *C << "\n");
+          continue;
+        }
        if (&C->getOuterRefSCC() != RC) {
          LLVM_DEBUG(dbgs() << "Skipping an SCC that is now part of some other "
                               "RefSCC...\n");
@ -863,11 +873,6 @@ ModuleToPostOrderCGSCCPassAdaptor<CGSCCPassT>::run(Module &M,
        // invalidate the analyses for any SCCs other than themselves which
        // are mutated. However, that seems to lose the robustness of the
        // pass-manager driven invalidation scheme.
-        //
-        // FIXME: This is redundant in one case -- the top of the worklist may
-        // *also* be the same SCC we just ran over (and invalidated for). In
-        // that case, we'll end up doing a redundant invalidation here as
-        // a consequence.
        CGAM.invalidate(*C, UR.CrossSCCPA);

        do {
@ -877,6 +882,7 @@ ModuleToPostOrderCGSCCPassAdaptor<CGSCCPassT>::run(Module &M,
          assert(&C->getOuterRefSCC() == RC &&
                 "Processing an SCC in a different RefSCC!");

+          LastUpdatedC = UR.UpdatedC;
          UR.UpdatedRC = nullptr;
          UR.UpdatedC = nullptr;

--- a/test/Transforms/Inline/cgscc-cycle-debug.ll
+++ b/test/Transforms/Inline/cgscc-cycle-debug.ll
@ -0,0 +1,45 @@
+; When an SCC got split due to inlining, we have two mechanisms for reprocessing the updated SCC, first is UR.UpdatedC
+; that repeatedly rerun the new, current SCC; second is a worklist for all newly split SCCs. We need to avoid rerun of
+; the same SCC when the SCC is set to be processed by both mechanisms back to back. In pathological cases, such extra,
+; redundant rerun could cause exponential size growth due to inlining along cycles.
+;
+; The test cases here illustrates potential redundant rerun and how it's prevented, however there's no extra inlining
+; even if we allow the redundant rerun. In real code, when inliner makes different decisions for different call sites
+; of the same caller-callee edge, we could end up getting more recursive inlining without SCC mutation.
+;
+; REQUIRES: asserts
+; RUN: opt < %s -passes='cgscc(inline)' -inline-threshold=500 -debug-only=cgscc -S 2>&1 | FileCheck %s
+
+; CHECK: Running an SCC pass across the RefSCC: [(test1_c, test1_a, test1_b)]
+; CHECK: Enqueuing the existing SCC in the worklist:(test1_b)
+; CHECK: Enqueuing a newly formed SCC:(test1_c)
+; CHECK: Enqueuing a new RefSCC in the update worklist: [(test1_b)]
+; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_c'
+; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_a'
+; CHECK: Re-running SCC passes after a refinement of the current SCC: (test1_c, test1_a)
+; CHECK: Skipping redundant run on SCC: (test1_c, test1_a)
+; CHECK: Skipping an SCC that is now part of some other RefSCC...
+
+declare void @external(i32 %seed)
+
+define void @test1_a(i32 %num) {
+entry:
+  call void @test1_b(i32 %num)
+  call void @external(i32 %num)
+  ret void
+}
+
+define void @test1_b(i32 %num) {
+entry:
+  call void @test1_c(i32 %num)
+  call void @test1_a(i32 %num)
+  call void @external(i32 %num)
+  ret void
+}
+
+define void @test1_c(i32 %num) #0 {
+  call void @test1_a(i32 %num)
+  ret void
+}
+
+attributes #0 = { noinline nounwind optnone }
--- a/unittests/Analysis/CGSCCPassManagerTest.cpp
+++ b/unittests/Analysis/CGSCCPassManagerTest.cpp
@ -1289,29 +1289,29 @@ TEST_F(CGSCCPassManagerTest, TestAnalysisInvalidationCGSCCUpdate) {
  // We run over four SCCs the first time. But then we split an SCC into three.
  // And then we merge those three back into one. However, this also
  // invalidates all three SCCs further down in the PO walk.
-  EXPECT_EQ(4 + 3 + 1 + 3, SCCAnalysisRuns);
+  EXPECT_EQ(4 + 3 + 3, SCCAnalysisRuns);
  // The module analysis pass should be run three times.
  EXPECT_EQ(3, ModuleAnalysisRuns);
  // We run over four SCCs the first time. Then over the two new ones. Then the
  // entire module is invalidated causing a full run over all seven. Then we
  // fold three SCCs back to one, re-compute for it and the two SCCs above it
  // in the graph, and then run over the whole module again.
-  EXPECT_EQ(4 + 2 + 7 + 1 + 3 + 4, IndirectSCCAnalysisRuns);
-  EXPECT_EQ(4 + 2 + 7 + 1 + 3 + 4, DoublyIndirectSCCAnalysisRuns);
+  EXPECT_EQ(4 + 2 + 7 + 3 + 4, IndirectSCCAnalysisRuns);
+  EXPECT_EQ(4 + 2 + 7 + 3 + 4, DoublyIndirectSCCAnalysisRuns);

  // First we run over all six functions. Then we re-run it over three when we
  // split their SCCs. Then we re-run over the whole module. Then we re-run
  // over three functions merged back into a single SCC, then those three
  // functions again, the two functions in SCCs above it in the graph, and then
  // over the whole module again.
-  EXPECT_EQ(6 + 3 + 6 + 3 + 3 + 2 + 6, FunctionAnalysisRuns);
+  EXPECT_EQ(6 + 3 + 6 + 3 + 2 + 6, FunctionAnalysisRuns);

  // Re run the function analysis over the entire module, and then re-run it
  // over the `(h3, h1, h2)` SCC due to invalidation. Then we re-run it over
  // the entire module, then the three functions merged back into a single SCC,
  // those three functions again, then the two functions in SCCs above it in
  // the graph, and then over the whole module.
-  EXPECT_EQ(6 + 3 + 6 + 3 + 3 + 2 + 6, IndirectFunctionAnalysisRuns);
+  EXPECT_EQ(6 + 3 + 6 + 3 + 2 + 6, IndirectFunctionAnalysisRuns);
 }

 // The (negative) tests below check for assertions so we only run them if NDEBUG