llvm/lib/Analysis
Andrea Di Biagio 029a76b0a2 [Vectorizer] Add a new 'OperandValueKind' in TargetTransformInfo called
'OK_NonUniformConstValue' to identify operands which are constants but
not constant splats.

The cost model now allows returning 'OK_NonUniformConstValue'
for non splat operands that are instances of ConstantVector or
ConstantDataVector.

With this change, targets are now able to compute different costs
for instructions with non-uniform constant operands.
For example, On X86 the cost of a vector shift may vary depending on whether
the second operand is a uniform or non-uniform constant.

This patch applies the following changes:
 - The cost model computation now takes into account non-uniform constants;
 - The cost of vector shift instructions has been improved in
   X86TargetTransformInfo analysis pass;
 - BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish
   between non-uniform and uniform constant operands.

Added a new test to verify that the output of opt
'-cost-model -analyze' is valid in the following configurations: SSE2,
SSE4.1, AVX, AVX2.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201272 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-12 23:43:47 +00:00
..
IPA GlobalsModRef: Unify and clean up duplicated pointer analysis code. 2014-02-10 14:17:30 +00:00
AliasAnalysis.cpp [cleanup] Move the Dominators.h and Verifier.h headers into the IR 2014-01-13 09:26:24 +00:00
AliasAnalysisCounter.cpp Put the functionality for printing a value to a raw_ostream as an 2014-01-09 02:29:41 +00:00
AliasAnalysisEvaluator.cpp Put the functionality for printing a value to a raw_ostream as an 2014-01-09 02:29:41 +00:00
AliasDebugger.cpp Move all of the header files which are involved in modelling the LLVM IR 2013-01-02 11:36:10 +00:00
AliasSetTracker.cpp Put the functionality for printing a value to a raw_ostream as an 2014-01-09 02:29:41 +00:00
Analysis.cpp [PM] Make the verifier work independently of any pass manager. 2014-01-19 02:22:18 +00:00
BasicAliasAnalysis.cpp Fix known typos 2014-01-24 17:20:08 +00:00
BlockFrequencyInfo.cpp BlockFrequencyInfo: Readded getEntryFreq. 2013-12-20 22:11:11 +00:00
BranchProbabilityInfo.cpp [block-freq] Teach branch probability how to return the edge weight in between a BasicBlock and one of its successors. 2013-12-14 02:24:25 +00:00
CaptureTracking.cpp Make nocapture analysis work with addrspacecast 2014-01-14 19:11:52 +00:00
CFG.cpp Make succ_iterator a real random access iterator and clean up a couple of users. 2014-02-10 14:17:42 +00:00
CFGPrinter.cpp Use the new script to sort the includes of every file under lib. 2012-12-03 16:50:05 +00:00
CMakeLists.txt [PM] Add a new "lazy" call graph analysis pass for the new pass manager. 2014-02-06 04:37:03 +00:00
CodeMetrics.cpp Begin fleshing out an interface in TTI for modelling the costs of 2013-01-22 11:26:02 +00:00
ConstantFolding.cpp Add addrspacecast instruction. 2013-11-15 01:34:59 +00:00
CostModel.cpp [Vectorizer] Add a new 'OperandValueKind' in TargetTransformInfo called 2014-02-12 23:43:47 +00:00
Delinearization.cpp Re-sort all of the includes with ./utils/sort_includes.py so that 2014-01-07 11:48:04 +00:00
DependenceAnalysis.cpp Fix known typos 2014-01-24 17:20:08 +00:00
DominanceFrontier.cpp [PM] Split DominatorTree into a concrete analysis result object which 2014-01-13 13:07:17 +00:00
DomPrinter.cpp [PM] Split DominatorTree into a concrete analysis result object which 2014-01-13 13:07:17 +00:00
InstCount.cpp Move all of the header files which are involved in modelling the LLVM IR 2013-01-02 11:36:10 +00:00
InstructionSimplify.cpp InstSimplify: Make shift, select and GEP simplifications vector-aware. 2014-01-24 17:09:53 +00:00
Interval.cpp Move all of the header files which are involved in modelling the LLVM IR 2013-01-02 11:36:10 +00:00
IntervalPartition.cpp
IVUsers.cpp [PM] Split DominatorTree into a concrete analysis result object which 2014-01-13 13:07:17 +00:00
LazyCallGraph.cpp [PM] Fix horrible typos that somehow didn't cause a failure in a C++11 2014-02-06 05:17:02 +00:00
LazyValueInfo.cpp Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size. 2013-07-04 01:31:24 +00:00
LibCallAliasAnalysis.cpp Move all of the header files which are involved in modelling the LLVM IR 2013-01-02 11:36:10 +00:00
LibCallSemantics.cpp Move all of the header files which are involved in modelling the LLVM IR 2013-01-02 11:36:10 +00:00
Lint.cpp [PM] Split DominatorTree into a concrete analysis result object which 2014-01-13 13:07:17 +00:00
LLVMBuild.txt LLVMBuild: Introduce a common section which currently has a list of the 2011-12-12 22:45:54 +00:00
Loads.cpp Change GetPointerBaseWithConstantOffset's DataLayout argument from a 2013-01-31 02:00:45 +00:00
LoopInfo.cpp [PM] Split DominatorTree into a concrete analysis result object which 2014-01-13 13:07:17 +00:00
LoopPass.cpp Disable most IR-level transform passes on functions marked 'optnone'. 2014-02-06 00:07:05 +00:00
Makefile
MemDepPrinter.cpp Put the functionality for printing a value to a raw_ostream as an 2014-01-09 02:29:41 +00:00
MemoryBuiltins.cpp Update optimization passes to handle inalloca arguments 2014-01-28 02:38:36 +00:00
MemoryDependenceAnalysis.cpp [PM] Split DominatorTree into a concrete analysis result object which 2014-01-13 13:07:17 +00:00
ModuleDebugInfoPrinter.cpp Put the functionality for printing a value to a raw_ostream as an 2014-01-09 02:29:41 +00:00
NoAliasAnalysis.cpp Move all of the header files which are involved in modelling the LLVM IR 2013-01-02 11:36:10 +00:00
PHITransAddr.cpp [cleanup] Move the Dominators.h and Verifier.h headers into the IR 2014-01-13 09:26:24 +00:00
PostDominators.cpp [PM] Pull the generic graph algorithms and data structures for dominator 2014-01-13 10:52:56 +00:00
PtrUseVisitor.cpp Hoist the GEP constant address offset computation to a common home on 2012-12-11 10:29:10 +00:00
README.txt
RegionInfo.cpp [PM] Split DominatorTree into a concrete analysis result object which 2014-01-13 13:07:17 +00:00
RegionPass.cpp Remove the the block_node_iterator of Region, replace it by the block_iterator. 2012-08-27 13:49:24 +00:00
RegionPrinter.cpp Use the new script to sort the includes of every file under lib. 2012-12-03 16:50:05 +00:00
ScalarEvolution.cpp SCEV: Cast switched values to make -Wswitch more useful. 2014-02-11 19:02:55 +00:00
ScalarEvolutionAliasAnalysis.cpp Use the new script to sort the includes of every file under lib. 2012-12-03 16:50:05 +00:00
ScalarEvolutionExpander.cpp [cleanup] Move the Dominators.h and Verifier.h headers into the IR 2014-01-13 09:26:24 +00:00
ScalarEvolutionNormalization.cpp [cleanup] Move the Dominators.h and Verifier.h headers into the IR 2014-01-13 09:26:24 +00:00
SparsePropagation.cpp Move all of the header files which are involved in modelling the LLVM IR 2013-01-02 11:36:10 +00:00
TargetTransformInfo.cpp Make succ_iterator a real random access iterator and clean up a couple of users. 2014-02-10 14:17:42 +00:00
Trace.cpp Put the functionality for printing a value to a raw_ostream as an 2014-01-09 02:29:41 +00:00
TypeBasedAliasAnalysis.cpp TBAA: fix PR17620. 2013-10-22 01:40:25 +00:00
ValueTracking.cpp Allow speculating llvm.sqrt, fma and fmuladd 2014-01-31 00:09:00 +00:00

Analysis Opportunities:

//===---------------------------------------------------------------------===//

In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the
ScalarEvolution expression for %r is this:

  {1,+,3,+,2}<loop>

Outside the loop, this could be evaluated simply as (%n * %n), however
ScalarEvolution currently evaluates it as

  (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n))

In addition to being much more complicated, it involves i65 arithmetic,
which is very inefficient when expanded into code.

//===---------------------------------------------------------------------===//

In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll,

ScalarEvolution is forming this expression:

((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32)))

This could be folded to

(-1 * (trunc i64 undef to i32))

//===---------------------------------------------------------------------===//