mirror of
https://github.com/RPCS3/llvm.git
synced 2025-01-24 11:36:10 +00:00
4fea871248
entire SCC before iterating on newly-introduced call edges resulting from any inlined function bodies. This more closely matches the behavior of the old PM's inliner. While it wasn't really clear to me initially, this behavior is actually essential to the inliner behaving reasonably in its current design. Because the inliner is fundamentally a bottom-up inliner and all of its cost modeling is designed around that it often runs into trouble within an SCC where we don't have any meaningful bottom-up ordering to use. In addition to potentially cyclic, infinite inlining that we block with the inline history mechanism, it can also take seemingly simple call graph patterns within an SCC and turn them into *insanely* large functions by accidentally working top-down across the SCC without any of the threshold limitations that traditional top-down inliners use. Consider this diabolical monster.cpp file that Richard Smith came up with to help demonstrate this issue: ``` template <int N> extern const char *str; void g(const char *); template <bool K, int N> void f(bool *B, bool *E) { if (K) g(str<N>); if (B == E) return; if (*B) f<true, N + 1>(B + 1, E); else f<false, N + 1>(B + 1, E); } template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); } template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); } extern bool *arr, *end; void test() { f<false, 0>(arr, end); } ``` When compiled with '-DMAX=N' for various values of N, this will create an SCC with a reasonably large number of functions. Previously, the inliner would try to exhaust the inlining candidates in a single function before moving on. This, unfortunately, turns it into a top-down inliner within the SCC. Because our thresholds were never built for that, we will incrementally decide that it is always worth inlining and proceed to flatten the entire SCC into that one function. What's worse, we'll then proceed to the next function, and do the exact same thing except we'll skip the first function, and so on. And at each step, we'll also make some of the constant factors larger, which is awesome. The fix in this patch is the obvious one which makes the new PM's inliner use the same technique used by the old PM: consider all the call edges across the entire SCC before beginning to process call edges introduced by inlining. The result of this is essentially to distribute the inlining across the SCC so that every function incrementally grows toward the inline thresholds rather than allowing the inliner to grow one of the functions vastly beyond the threshold. The code for this is a bit awkward, but it works out OK. We could consider in the future doing something more powerful here such as prioritized order (via lowest cost and/or profile info) and/or a code-growth budget per SCC. However, both of those would require really substantial work both to design the system in a way that wouldn't break really useful abstraction decomposition properties of the current inliner and to be tuned across a reasonably diverse set of code and workloads. It also seems really risky in many ways. I have only found a single real-world file that triggers the bad behavior here and it is generated code that has a pretty pathological pattern. I'm not worried about the inliner not doing an *awesome* job here as long as it does *ok*. On the other hand, the cases that will be tricky to get right in a prioritized scheme with a budget will be more common and idiomatic for at least some frontends (C++ and Rust at least). So while these approaches are still really interesting, I'm not in a huge rush to go after them. Staying even closer to the existing PM's behavior, especially when this easy to do, seems like the right short to medium term approach. I don't really have a test case that makes sense yet... I'll try to find a variant of the IR produced by the monster template metaprogram that is both small enough to be sane and large enough to clearly show when we get this wrong in the future. But I'm not confident this exists. And the behavior change here *should* be unobservable without snooping on debug logging. So there isn't really much to test. The test case updates come from two incidental changes: 1) We now visit functions in an SCC in the opposite order. I don't think there really is a "right" order here, so I just update the test cases. 2) We no longer compute some analyses when an SCC has no call instructions that we consider for inlining. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297374 91177308-0d34-0410-b5e6-96231b3b80d8
102 lines
4.7 KiB
LLVM
102 lines
4.7 KiB
LLVM
; Basic test for the new LTO pipeline.
|
|
; For now the only difference is between -O1 and everything else, so
|
|
; -O2, -O3, -Os, -Oz are the same.
|
|
|
|
; RUN: opt -disable-verify -debug-pass-manager \
|
|
; RUN: -passes='lto<O1>' -S %s 2>&1 \
|
|
; RUN: | FileCheck %s --check-prefix=CHECK-O
|
|
; RUN: opt -disable-verify -debug-pass-manager \
|
|
; RUN: -passes='lto<O2>' -S %s 2>&1 \
|
|
; RUN: | FileCheck %s --check-prefix=CHECK-O --check-prefix=CHECK-O2
|
|
; RUN: opt -disable-verify -debug-pass-manager \
|
|
; RUN: -passes='lto<O3>' -S %s 2>&1 \
|
|
; RUN: | FileCheck %s --check-prefix=CHECK-O --check-prefix=CHECK-O2
|
|
; RUN: opt -disable-verify -debug-pass-manager \
|
|
; RUN: -passes='lto<Os>' -S %s 2>&1 \
|
|
; RUN: | FileCheck %s --check-prefix=CHECK-O --check-prefix=CHECK-O2
|
|
; RUN: opt -disable-verify -debug-pass-manager \
|
|
; RUN: -passes='lto<Oz>' -S %s 2>&1 \
|
|
; RUN: | FileCheck %s --check-prefix=CHECK-O --check-prefix=CHECK-O2
|
|
|
|
; CHECK-O: Starting llvm::Module pass manager run.
|
|
; CHECK-O-NEXT: Running pass: PassManager<{{.*}}Module
|
|
; CHECK-O-NEXT: Starting llvm::Module pass manager run.
|
|
; CHECK-O-NEXT: Running pass: GlobalDCEPass
|
|
; CHECK-O-NEXT: Running pass: ForceFunctionAttrsPass
|
|
; CHECK-O-NEXT: Running pass: InferFunctionAttrsPass
|
|
; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
|
|
; CHECK-O2-NEXT: PGOIndirectCallPromotion
|
|
; CHECK-O2-NEXT: Running pass: IPSCCPPass
|
|
; CHECK-O-NEXT: Running pass: ModuleToPostOrderCGSCCPassAdaptor<{{.*}}PostOrderFunctionAttrsPass>
|
|
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
|
|
; CHECK-O-NEXT: Running analysis: InnerAnalysisManagerProxy
|
|
; CHECK-O-NEXT: Running analysis: LazyCallGraphAnalysis
|
|
; CHECK-O-NEXT: Running analysis: FunctionAnalysisManagerCGSCCProxy
|
|
; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy<{{.*}}LazyCallGraph{{.*}}>
|
|
; CHECK-O-NEXT: Running analysis: AAManager
|
|
; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis
|
|
; CHECK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass
|
|
; CHECK-O-NEXT: Running analysis: CallGraphAnalysis
|
|
; CHECK-O-NEXT: Running pass: GlobalSplitPass
|
|
; CHECK-O-NEXT: Running pass: WholeProgramDevirtPass
|
|
; CHECK-O2-NEXT: Running pass: GlobalOptPass
|
|
; CHECK-O2-NEXT: Running pass: ModuleToFunctionPassAdaptor<{{.*}}PromotePass>
|
|
; CHECK-O2-NEXT: Running analysis: DominatorTreeAnalysis
|
|
; CHECK-O2-NEXT: Running analysis: AssumptionAnalysis
|
|
; CHECK-O2-NEXT: Running pass: ConstantMergePass
|
|
; CHECK-O2-NEXT: Running pass: DeadArgumentEliminationPass
|
|
; CHECK-O2-NEXT: Running pass: ModuleToFunctionPassAdaptor<{{.*}}InstCombinePass>
|
|
; CHECK-O2-NEXT: Running pass: ModuleToPostOrderCGSCCPassAdaptor<{{.*}}InlinerPass>
|
|
; CHECK-O2-NEXT: Running pass: GlobalOptPass
|
|
; CHECK-O2-NEXT: Running pass: GlobalDCEPass
|
|
; CHECK-O2-NEXT: Running pass: ModuleToFunctionPassAdaptor<{{.*}}PassManager{{.*}}>
|
|
; CHECK-O2-NEXT: Starting llvm::Function pass manager run.
|
|
; CHECK-O2-NEXT: Running pass: InstCombinePass
|
|
; CHECK-O2-NEXT: Running pass: JumpThreadingPass
|
|
; CHECK-O2-NEXT: Running analysis: LazyValueAnalysis
|
|
; CHECK-O2-NEXT: Running pass: SROA on foo
|
|
; CHECK-O2-NEXT: Finished llvm::Function pass manager run.
|
|
; CHECK-O2-NEXT: Running pass: ModuleToPostOrderCGSCCPassAdaptor<{{.*}}PostOrderFunctionAttrsPass>
|
|
; CHECK-O2-NEXT: Running pass: ModuleToFunctionPassAdaptor<{{.*}}PassManager{{.*}}>
|
|
; CHECK-O2-NEXT: Running analysis: MemoryDependenceAnalysis
|
|
; CHECK-O2-NEXT: Running analysis: OptimizationRemarkEmitterAnalysis
|
|
; CHECK-O2-NEXT: Running analysis: TargetIRAnalysis
|
|
; CHECK-O2-NEXT: Running analysis: DemandedBitsAnalysis
|
|
; CHECK-O2-NEXT: Running pass: CrossDSOCFIPass
|
|
; CHECK-O2-NEXT: Running pass: ModuleToFunctionPassAdaptor<{{.*}}SimplifyCFGPass>
|
|
; CHECK-O2-NEXT: Running pass: EliminateAvailableExternallyPass
|
|
; CHECK-O2-NEXT: Running pass: GlobalDCEPass
|
|
; CHECK-O-NEXT: Finished llvm::Module pass manager run.
|
|
; CHECK-O-NEXT: Running pass: PrintModulePass
|
|
|
|
; Make sure we get the IR back out without changes when we print the module.
|
|
; CHECK-O-LABEL: define void @foo(i32 %n) local_unnamed_addr {
|
|
; CHECK-O-NEXT: entry:
|
|
; CHECK-O-NEXT: br label %loop
|
|
; CHECK-O: loop:
|
|
; CHECK-O-NEXT: %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
|
|
; CHECK-O-NEXT: %iv.next = add i32 %iv, 1
|
|
; CHECK-O-NEXT: tail call void @bar()
|
|
; CHECK-O-NEXT: %cmp = icmp eq i32 %iv, %n
|
|
; CHECK-O-NEXT: br i1 %cmp, label %exit, label %loop
|
|
; CHECK-O: exit:
|
|
; CHECK-O-NEXT: ret void
|
|
; CHECK-O-NEXT: }
|
|
;
|
|
; CHECK-O-NEXT: Finished llvm::Module pass manager run.
|
|
|
|
declare void @bar() local_unnamed_addr
|
|
|
|
define void @foo(i32 %n) local_unnamed_addr {
|
|
entry:
|
|
br label %loop
|
|
loop:
|
|
%iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
|
|
%iv.next = add i32 %iv, 1
|
|
tail call void @bar()
|
|
%cmp = icmp eq i32 %iv, %n
|
|
br i1 %cmp, label %exit, label %loop
|
|
exit:
|
|
ret void
|
|
}
|