mirror of
https://github.com/RPCSX/llvm.git
synced 2025-01-08 04:52:50 +00:00
961ef79d43
Summary: Widening load in GVN is too early because it will block other optimizations like PRE, LICM. https://llvm.org/bugs/show_bug.cgi?id=29110 The SPECCPU2006 benchmark impact of this patch: Reference: o2_nopatch (1): o2_patched Benchmark Base:Reference (1) ------------------------------------------------------- spec/2006/fp/C++/444.namd 25.2 -0.08% spec/2006/fp/C++/447.dealII 45.92 +1.05% spec/2006/fp/C++/450.soplex 41.7 -0.26% spec/2006/fp/C++/453.povray 35.65 +1.68% spec/2006/fp/C/433.milc 23.79 +0.42% spec/2006/fp/C/470.lbm 41.88 -1.12% spec/2006/fp/C/482.sphinx3 47.94 +1.67% spec/2006/int/C++/471.omnetpp 22.46 -0.36% spec/2006/int/C++/473.astar 21.19 +0.24% spec/2006/int/C++/483.xalancbmk 36.09 -0.11% spec/2006/int/C/400.perlbench 33.28 +1.35% spec/2006/int/C/401.bzip2 22.76 -0.04% spec/2006/int/C/403.gcc 32.36 +0.12% spec/2006/int/C/429.mcf 41.04 -0.41% spec/2006/int/C/445.gobmk 26.94 +0.04% spec/2006/int/C/456.hmmer 24.5 -0.20% spec/2006/int/C/458.sjeng 28 -0.46% spec/2006/int/C/462.libquantum 55.25 +0.27% spec/2006/int/C/464.h264ref 45.87 +0.72% geometric mean +0.23% For most benchmarks, it's a wash, but we do see stable improvements on some benchmarks, e.g. 447,453,482,400. Reviewers: davidxl, hfinkel, dberlin, sanjoy, reames Subscribers: gberry, junbuml Differential Revision: https://reviews.llvm.org/D24096 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281074 91177308-0d34-0410-b5e6-96231b3b80d8 |
||
---|---|---|
.. | ||
AliasAnalysis.cpp | ||
AliasAnalysisEvaluator.cpp | ||
AliasAnalysisSummary.cpp | ||
AliasAnalysisSummary.h | ||
AliasSetTracker.cpp | ||
Analysis.cpp | ||
AssumptionCache.cpp | ||
BasicAliasAnalysis.cpp | ||
BlockFrequencyInfo.cpp | ||
BlockFrequencyInfoImpl.cpp | ||
BranchProbabilityInfo.cpp | ||
CallGraph.cpp | ||
CallGraphSCCPass.cpp | ||
CallPrinter.cpp | ||
CaptureTracking.cpp | ||
CFG.cpp | ||
CFGPrinter.cpp | ||
CFLAndersAliasAnalysis.cpp | ||
CFLGraph.h | ||
CFLSteensAliasAnalysis.cpp | ||
CGSCCPassManager.cpp | ||
CMakeLists.txt | ||
CodeMetrics.cpp | ||
ConstantFolding.cpp | ||
CostModel.cpp | ||
Delinearization.cpp | ||
DemandedBits.cpp | ||
DependenceAnalysis.cpp | ||
DivergenceAnalysis.cpp | ||
DominanceFrontier.cpp | ||
DomPrinter.cpp | ||
EHPersonalities.cpp | ||
GlobalsModRef.cpp | ||
IndirectCallPromotionAnalysis.cpp | ||
InlineCost.cpp | ||
InstCount.cpp | ||
InstructionSimplify.cpp | ||
Interval.cpp | ||
IntervalPartition.cpp | ||
IteratedDominanceFrontier.cpp | ||
IVUsers.cpp | ||
LazyBlockFrequencyInfo.cpp | ||
LazyBranchProbabilityInfo.cpp | ||
LazyCallGraph.cpp | ||
LazyValueInfo.cpp | ||
Lint.cpp | ||
LLVMBuild.txt | ||
Loads.cpp | ||
LoopAccessAnalysis.cpp | ||
LoopInfo.cpp | ||
LoopPass.cpp | ||
LoopPassManager.cpp | ||
LoopUnrollAnalyzer.cpp | ||
MemDepPrinter.cpp | ||
MemDerefPrinter.cpp | ||
MemoryBuiltins.cpp | ||
MemoryDependenceAnalysis.cpp | ||
MemoryLocation.cpp | ||
ModuleDebugInfoPrinter.cpp | ||
ModuleSummaryAnalysis.cpp | ||
ObjCARCAliasAnalysis.cpp | ||
ObjCARCAnalysisUtils.cpp | ||
ObjCARCInstKind.cpp | ||
OptimizationDiagnosticInfo.cpp | ||
OrderedBasicBlock.cpp | ||
PHITransAddr.cpp | ||
PostDominators.cpp | ||
ProfileSummaryInfo.cpp | ||
PtrUseVisitor.cpp | ||
README.txt | ||
RegionInfo.cpp | ||
RegionPass.cpp | ||
RegionPrinter.cpp | ||
ScalarEvolution.cpp | ||
ScalarEvolutionAliasAnalysis.cpp | ||
ScalarEvolutionExpander.cpp | ||
ScalarEvolutionNormalization.cpp | ||
ScopedNoAliasAA.cpp | ||
SparsePropagation.cpp | ||
StratifiedSets.h | ||
TargetLibraryInfo.cpp | ||
TargetTransformInfo.cpp | ||
Trace.cpp | ||
TypeBasedAliasAnalysis.cpp | ||
TypeMetadataUtils.cpp | ||
ValueTracking.cpp | ||
VectorUtils.cpp |
Analysis Opportunities: //===---------------------------------------------------------------------===// In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the ScalarEvolution expression for %r is this: {1,+,3,+,2}<loop> Outside the loop, this could be evaluated simply as (%n * %n), however ScalarEvolution currently evaluates it as (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n)) In addition to being much more complicated, it involves i65 arithmetic, which is very inefficient when expanded into code. //===---------------------------------------------------------------------===// In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll, ScalarEvolution is forming this expression: ((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32))) This could be folded to (-1 * (trunc i64 undef to i32)) //===---------------------------------------------------------------------===//