172 Commits

Author SHA1 Message Date
Fangrui Song
af7b1832a0 Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338293 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-30 19:41:25 +00:00
Simon Pilgrim
7a7cfd8a89 [TargetTransformInfo] Add pow2 analysis for scalar constants
Add ConstantInt analysis to getOperandInfo so we get more realistic div/rem expansion costs comparable to the vector costs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336827 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-11 17:51:27 +00:00
Sanjay Patel
42f462a392 [IR] move shuffle mask queries from TTI to ShuffleVectorInst
The optimizer is getting smarter (e.g., D47986) about differentiating shuffles
based on their mask values, so we should make queries on the mask constant
operand generally available to avoid code duplication.

We'll probably use this soon in the vectorizers and instcombine (D48023 and 
https://bugs.llvm.org/show_bug.cgi?id=37806).

We might clean up TTI a bit more once all of its current 'SK_*' options are 
covered.

Differential Revision: https://reviews.llvm.org/D48236


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335067 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-19 18:44:00 +00:00
Benjamin Kramer
c011f6948e Fix namespaces. No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334890 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-16 13:37:52 +00:00
Simon Pilgrim
419887cd06 [CostModel] Cleanup isSingleSourceVectorMask to match other shuffle matchers. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334699 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 09:48:19 +00:00
Simon Pilgrim
d9dafe02fb [CostModel] Recognise REVERSE shuffle mask if the elements come from the second src
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334698 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-14 09:35:00 +00:00
Simon Pilgrim
31dfcf10a6 [CostModel] Recognise BROADCAST shuffle mask if the elements come from the second src
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334620 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-13 16:52:02 +00:00
Simon Pilgrim
21582f2af6 [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744)
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:

e.g. v4f32: <0,5,2,7> or <4,1,6,3>

This seems far too restrictive, as most SIMD hardware will implement it using a general blend/bit-select instruction. This patch therefore replaces it with SK_Select, permitting elements from either source as long as they are inline:

e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.
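
A minimal sketch of the "inline" rule described above, assuming the mask is given as plain integers indexing the concatenation of the two sources. The helper name isInlineSelectMask is hypothetical and is not the matcher used in the patch:

```cpp
#include <initializer_list>

// Hypothetical helper, not LLVM's implementation. Returns true when every
// lane I of the mask reads lane I of the first source (index I) or lane I of
// the second source (index I + N), where N is the number of lanes.
constexpr bool isInlineSelectMask(std::initializer_list<int> Mask) {
  const int N = static_cast<int>(Mask.size());
  int I = 0;
  for (int Elt : Mask) {
    if (Elt != I && Elt != I + N)
      return false; // Element crosses lanes, so a blend cannot implement it.
    ++I;
  }
  return true;
}
```

All four v4f32 masks listed above satisfy this check, while a lane-crossing mask such as <1,0,3,2> does not. Note that this sketch also accepts an identity mask; a real matcher may choose to treat that case separately.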

This initial patch just updates the name and the cost model shuffle mask analysis; later patches will update SLP to better utilise this, as it still limits itself to SK_Alternate style patterns.

Differential Revision: https://reviews.llvm.org/D47985

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334513 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-12 16:12:29 +00:00
Simon Pilgrim
2ccbb4c82e Fix signed/unsigned warning. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334509 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-12 15:14:34 +00:00
Simon Pilgrim
861e3f325f [CostModel] Treat Identity shuffle masks as zero cost
As discussed on D47985, identity shuffle masks should probably be free.

I've limited this to the case where the input and output types all match - but we could probably accept all cases.

Differential Revision: https://reviews.llvm.org/D47986

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334506 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-12 14:47:13 +00:00
Simon Pilgrim
91eac1017f [TTI] Add uniform/non-uniform constant Pow2 detection to TargetTransformInfo::getInstructionThroughput
This enables us to detect more fast path sdiv cases under cost analysis.

This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs.
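
As a rough illustration of the uniform versus non-uniform distinction, the sketch below classifies a constant divisor vector by its lanes. The names ConstantKind and classifyDivisor are hypothetical, and lanes are assumed to be unsigned values for simplicity; this is not the code added by the patch:

```cpp
#include <cstdint>
#include <initializer_list>

// Hypothetical classification of a constant vector divisor.
enum class ConstantKind { NotPow2, UniformPow2, NonUniformPow2 };

constexpr bool isPowerOf2(std::uint64_t V) {
  return V != 0 && (V & (V - 1)) == 0;
}

// Every lane must be a power of two; the result then records whether all
// lanes hold the same value (uniform) or differ (non-uniform).
constexpr ConstantKind classifyDivisor(std::initializer_list<std::uint64_t> Lanes) {
  if (Lanes.size() == 0)
    return ConstantKind::NotPow2;
  const std::uint64_t First = *Lanes.begin();
  bool Uniform = true;
  for (std::uint64_t V : Lanes) {
    if (!isPowerOf2(V))
      return ConstantKind::NotPow2;
    if (V != First)
      Uniform = false;
  }
  return Uniform ? ConstantKind::UniformPow2 : ConstantKind::NonUniformPow2;
}
```

Under this sketch, a divisor of <4,4,4,4> is uniform pow2 and <1,2,4,8> is non-uniform pow2; a cost model could map both to a cheaper shift-based expansion.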

Found while working on D46276

Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases.

Differential Revision: https://reviews.llvm.org/D46637

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332969 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-22 10:40:09 +00:00
Adrian Prantl
26b584c691 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers in our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331272 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-01 15:54:18 +00:00
Matthew Simpson
daa39fa144 [TTI, AArch64] Add transpose shuffle kind
This patch adds a new shuffle kind useful for transposing a 2xn matrix. These
transpose shuffle masks read corresponding even- or odd-numbered vector
elements from two n-dimensional source vectors and write each result into
consecutive elements of an n-dimensional destination vector. The transpose
shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such,
this patch also considers transpose shuffles in the AArch64 implementation of
getShuffleCost.
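
The mask shape described above can be sketched as follows, assuming mask indices address the concatenation of the two sources (so the second source starts at index N). The helper name isTransposeLikeMask is hypothetical and this is not the matcher from the patch:

```cpp
#include <initializer_list>

// Hypothetical helper, not LLVM's implementation. Accepts masks of the form
//   <0, N, 2, N+2, ...>     (even elements of both sources, TRN1-style)
//   <1, N+1, 3, N+3, ...>   (odd elements of both sources, TRN2-style)
// where N is the number of lanes.
constexpr bool isTransposeLikeMask(std::initializer_list<int> Mask) {
  const int N = static_cast<int>(Mask.size());
  if (N < 2 || N % 2 != 0)
    return false;
  const int *M = Mask.begin();
  if (M[0] != 0 && M[0] != 1)
    return false;
  for (int I = 0; I < N; I += 2) {
    if (M[I] != M[0] + I)         // Element from the first source.
      return false;
    if (M[I + 1] != M[0] + I + N) // Matching element from the second source.
      return false;
  }
  return true;
}
```

For four lanes this accepts <0,4,2,6> and <1,5,3,7> and rejects an identity mask such as <0,1,2,3>.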

Differential Revision: https://reviews.llvm.org/D45982

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330941 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-26 13:48:33 +00:00
Krzysztof Parzyszek
c6388fd9c7 [LV] Introduce TTI::getMinimumVF
The function getMinimumVF(ElemWidth) will return the minimum VF for
a vector with elements of size ElemWidth bits. This value will only
apply to targets for which TTI::shouldMaximizeVectorBandwidth returns
true. The value of 0 indicates that there is no minimum VF.

Differential Revision: https://reviews.llvm.org/D45271


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330062 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-13 20:16:32 +00:00
David Blaikie
ebb5c41145 Plumb useAA through TargetTransformInfo to remove Transforms->CodeGen header dependency
Thanks to echristo for the pointers on direction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328737 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-28 22:28:50 +00:00
Krzysztof Parzyszek
4153b00c21 [LV] Add TTI::shouldMaximizeVectorBandwidth to allow enabling it per target
The default implementation returns false and keeps the current behavior.

Differential Revision: https://reviews.llvm.org/D44735


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328632 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-27 16:14:11 +00:00
Krzysztof Parzyszek
48972e6c8f [LSR] Allow giving priority to post-incrementing addressing modes
Implement TTI interface for targets to indicate that the LSR should give
priority to post-incrementing addressing modes.

Combination of patches by Sebastian Pop and Brendon Cahoon.

Differential Revision: https://reviews.llvm.org/D44758


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328490 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-26 13:10:09 +00:00
Sanjay Patel
74007202a3 [LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused (PR35681)
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base 
address offsets.

SPEC2017 on Ryzen shows no significant perf difference.

Differential Revision: https://reviews.llvm.org/D42607



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@324289 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-05 23:43:05 +00:00
Zaara Syeda
12b04c98f7 Re-commit : [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This recommits r322721, which was reverted due to sanitizer memory leak build bot failures.

Original commit message:
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323778 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-30 16:17:22 +00:00
Zaara Syeda
bbb2a7afcc Revert [PowerPC] This reverts commit rL322721
Failing build bots. Revert the commit now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322748 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-17 20:00:15 +00:00
Zaara Syeda
4379bc4ba1 [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322721 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-17 18:22:55 +00:00
Guozhi Wei
de740eaa76 Revert r321377, it causes regression to https://reviews.llvm.org/P8055.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321528 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-28 17:02:34 +00:00
Guozhi Wei
7f53692f1d [SimplifyCFG] Don't do if-conversion if there is a long dependence chain
If, after if-conversion, most of the instructions in the new BB form a long and slow dependence chain, the result may be slower than cmp/branch, even if the branch has a high miss rate. If-conversion transforms the control dependence into a data dependence; a control dependence can be speculated, so the second part can execute in parallel with the first part on a modern OOO processor, whereas a data dependence cannot.

This patch checks for a long dependence chain and gives up on if-conversion if it finds one.

Differential Revision: https://reviews.llvm.org/D39352



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321377 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-22 18:54:04 +00:00
Sean Fertile
b71c6a9b39 [Memcpy Loop Lowering] Remove the fixed int8 lowering.
Switch over to the lowering that uses target supplied operand types.

Differential Revision: https://reviews.llvm.org/D41201

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320989 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-18 15:31:14 +00:00
Sanjay Patel
8a189ea7bf [PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend on arg rather than result
This should fix PR31455:
https://bugs.llvm.org/show_bug.cgi?id=31455

Differential Revision: https://reviews.llvm.org/D28314


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319094 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-27 21:15:43 +00:00
Clement Courbet
4ccf677f27 [CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2).
 - Targets that want to support memcmp expansions now return the list of
   supported load sizes.
 - Expansion codegen does not assume that all power-of-two load sizes
   smaller than the max load size are valid. For example, this is not the
   case for x86(32bit)+sse2.
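
To illustrate why an explicit list of load sizes matters, here is a minimal sketch that greedily covers a compare length with the sizes a target reports. The helper countLoads and the size lists in the usage note are assumptions made for illustration, not values or code from the patch:

```cpp
#include <initializer_list>

// Hypothetical helper: counts the loads (per buffer) needed to cover Len
// bytes, picking the largest usable size first. Sizes is assumed to be in
// decreasing order and to contain 1 so that any length can be covered.
constexpr unsigned countLoads(unsigned Len, std::initializer_list<unsigned> Sizes) {
  unsigned Loads = 0;
  for (unsigned S : Sizes) {
    if (S == 0)
      continue;      // Ignore a nonsensical zero-sized entry.
    Loads += Len / S; // As many loads of this size as fit.
    Len %= S;         // Remaining bytes go to the smaller sizes.
  }
  return Loads;
}
```

With an assumed list of {8, 4, 2, 1}, a 15-byte compare needs 4 loads per buffer; with an assumed list of {16, 4, 2, 1}, where the 8-byte load is missing, the same compare needs 5.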

Fixes PR34887.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316905 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-30 14:19:33 +00:00
Artem Belevich
c79e8ba6d6 [NVPTX] allow address space inference for volatile loads/stores.
If a particular target supports volatile memory access operations, we can
avoid AS casting to the generic AS. Currently it's only enabled in NVPTX for
loads and stores that access the global & shared AS.

Differential Revision: https://reviews.llvm.org/D39026

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316495 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-24 20:31:44 +00:00
Daniel Jasper
e9712285e9 Revert r314923: "Recommit : Use the basic cost if a GEP is not used as addressing mode"
Significantly reduces performance (~30%) of gipfeli
(https://github.com/google/gipfeli)

I have not yet managed to reproduce this regression with the open-source
version of the benchmark on github, but will work with others to get a
reproducer to you later today.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315680 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-13 14:04:21 +00:00
Jun Bum Lim
e3f6227d56 Recommit : Use the basic cost if a GEP is not used as addressing mode
Recommitting r314517 with the fix for handling ConstantExpr.

Original commit message:
  Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing
  mode in the target. However, since it doesn't check its actual users, it will
  return FREE even in cases where the GEP cannot be folded away as part of an
  actual addressing mode. For example, if a user of the GEP is a call
  instruction taking the GEP as a parameter, then the GEP may not be folded in
  isel.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314923 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-04 18:33:52 +00:00
Alex Shlyapnikov
682384e698 Revert "Use the basic cost if a GEP is not used as addressing mode"
This reverts commit r314517.

This commit crashes sanitizer bots, for example:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/4167

Stack snippet:
...
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Support/Casting.h:255:0
llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getGEPCost(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Value const*>)
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:742:0
llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getUserCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>)
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:782:0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/lib/Analysis/TargetTransformInfo.cpp:116:0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:116:0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:343:0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:864:0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfo.h:285:0
...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314560 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-29 22:04:45 +00:00
Jun Bum Lim
fd8d5dac84 Use the basic cost if a GEP is not used as addressing mode
Summary:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target.
However, since it doesn't check its actual users, it will return FREE even in cases
where the GEP cannot be folded away as part of an actual addressing mode.
For example, if a user of the GEP is a call instruction taking the GEP as a parameter,
then the GEP may not be folded in isel.
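
The idea can be sketched in isolation as follows. The UserKind enum, the gepCost helper, and the numeric cost constants are hypothetical stand-ins for illustration; the actual patch works on LLVM IR users and its exact checks may differ:

```cpp
#include <initializer_list>

// Hypothetical stand-ins for the kinds of instructions that may use a GEP.
enum class UserKind { Load, Store, Call, Other };

// Assumed cost constants for this sketch.
constexpr int TCC_Free = 0;
constexpr int TCC_Basic = 1;

// The GEP is only treated as free when it is a legal addressing mode AND
// every user is a memory access that can fold the address computation.
constexpr int gepCost(bool IsLegalAddressingMode,
                      std::initializer_list<UserKind> Users) {
  if (!IsLegalAddressingMode)
    return TCC_Basic;
  for (UserKind U : Users)
    if (U != UserKind::Load && U != UserKind::Store)
      return TCC_Basic; // e.g. passed to a call, so it must be materialized.
  return TCC_Free;
}
```

In this sketch, a GEP used only by loads and stores stays free, while the same GEP passed as a call argument is charged the basic cost.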

Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng

Reviewed By: hfinkel

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D38085

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314517 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-29 14:50:16 +00:00
Clement Courbet
bb3c660e87 [CodeGenPrepare][NFC] Rename TargetTransformInfo::expandMemCmp -> TargetTransformInfo::enableMemCmpExpansion.
Summary:
Right now there are two functions with the same name: one does the work
and the other returns true if expansion is needed. Rename
TargetTransformInfo::expandMemCmp to make it more consistent with other
members of TargetTransformInfo.

Remove the unused Instruction* parameter.

Differential Revision: https://reviews.llvm.org/D38165

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314096 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-25 06:35:16 +00:00
Sanjay Patel
193e898f75 [DivRempairs] add a pass to optimize div/rem pairs (PR31028)
This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented 
as an independent pass, so there's no stretching of scope and feature creep for an existing pass. 
I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost 
this same functionality as an addition to CGP in the motivating example of PR31028:
https://bugs.llvm.org/show_bug.cgi?id=31028

The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow 
more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and
undo the hoisting that is done here.

Decomposing remainder may allow removing some code from the backend (PPC and possibly others).
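
Decomposing a remainder relies on the identity X % Y == X - (X / Y) * Y, which lets a div/rem pair share a single division. A minimal sketch, with the hypothetical helper name remViaDiv:

```cpp
// Hypothetical helper: computes the remainder from the quotient, so that a
// div/rem pair on the same operands needs only one division.
// Y must be non-zero, as with the built-in operators.
constexpr int remViaDiv(int X, int Y) {
  return X - (X / Y) * Y;
}
```

Because C++ integer division truncates toward zero, this matches the built-in % operator for negative operands as well.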

Differential Revision: https://reviews.llvm.org/D37121 


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312862 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-09 13:38:18 +00:00
Guozhi Wei
19969b8b8f [TargetTransformInfo] Add a new public interface getInstructionCost
The current TargetTransformInfo supports a throughput cost model and a code size model, but some optimizations also need an instruction latency cost model. Hal suggested a single public interface to query the different costs of an instruction, so I proposed the following interface:

  enum TargetCostKind {
    TCK_RecipThroughput, ///< Reciprocal throughput.
    TCK_Latency,         ///< The latency of instruction.
    TCK_CodeSize         ///< Instruction code size.
  };

  int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const;

All clients should mainly use this function to query the cost of an instruction; the <kind> parameter specifies the desired cost model.

This patch also provides a simple default implementation of getInstructionLatency.

The default getInstructionLatency provides latency numbers for only a small number of instruction classes, and those latency numbers are only reasonable for modern OOO processors. It can be extended in the following ways:

   Add more detail into this function.
   Add getXXXLatency function and call it from here.
   Implement target specific getInstructionLatency function.

Differential Revision: https://reviews.llvm.org/D37170



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312832 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 22:29:17 +00:00
Alexey Bataev
4fcc7e8528 [SLP] Support for horizontal min/max reduction.
SLP vectorizer supports horizontal reductions for Add/FAdd binary
operations. Patch adds support for horizontal min/max reductions.
Function getReductionCost() is split to getArithmeticReductionCost() for
binary operation reductions and getMinMaxReductionCost() for min/max
reductions.
Patch fixes PR26956.

Differential revision: https://reviews.llvm.org/D27846

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312791 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-08 13:49:36 +00:00
Tobias Grosser
2050a0312d Model cache size and associativity in TargetTransformInfo
Summary:
We add the precise cache sizes and associativity for the following Intel
architectures:

  - Penryn
  - Nehalem
  - Westmere
  - Sandy Bridge
  - Ivy Bridge
  - Haswell
  - Broadwell
  - Skylake
  - Kabylake

For several months, Polly has used a performance model for BLAS computations that
derives optimal cache and register tile sizes from cache and latency
information (based on ideas from "Analytical Modeling Is Enough for High-Performance BLIS", by Tze Meng Low published at TOMS 2016).
While bootstrapping this model, these target values have been kept in Polly.
However, as our implementation is now rather mature, it seems time to teach
LLVM itself about cache sizes.

Interestingly, L1 and L2 cache sizes are pretty constant across
micro-architectures, hence a set of architecture specific default values
seems like a good start. They can be expanded to more target specific values,
in case certain newer architectures require different values. For now a set
of Intel architectures is provided.

Just as a little teaser, for a simple gemm kernel this model allows us to
improve performance from 1.2s to 0.27s. For gemm kernels with less optimal
memory layouts even larger speedups can be reported.

Reviewers: Meinersbur, bollu, singam-sanjay, hfinkel, gareevroman, fhahn, sebpop, efriedma, asb

Reviewed By: fhahn, asb

Subscribers: lsaba, asb, pollydev, llvm-commits

Differential Revision: https://reviews.llvm.org/D37051

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311647 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-24 09:46:25 +00:00
Jonas Paulsson
b7123745ed [LSR / TTI / SystemZ] Eliminate TargetTransformInfo::isFoldableMemAccess()
isLegalAddressingMode() has recently gained the extra optional Instruction*
parameter, and therefore it can now do the job that previously only
isFoldableMemAccess() could do.

The SystemZ implementation of isLegalAddressingMode() has gained the
functionality of checking for offsets, which used to be done with
isFoldableMemAccess().

The isFoldableMemAccess() hook has been removed everywhere.

Review: Quentin Colombet, Ulrich Weigand
https://reviews.llvm.org/D35933

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310463 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-09 11:28:01 +00:00
Alexey Bataev
5a34abfe3e [Cost] Rename getReductionCost() to getArithmeticReductionCost(), NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309563 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-31 14:19:32 +00:00
Mohammed Agabaria
3c150611fb [TTI] fixing a bug in the isLegalMaskedScatter API
isLegalMaskedScatter called the Gather version, which is a bug.
A test case is provided within the AVX2 gathers patch at: https://reviews.llvm.org/D35772

Differential Revision: https://reviews.llvm.org/D35786



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309260 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-27 10:28:16 +00:00
Jonas Paulsson
ed69aeeaad [SystemZ, LoopStrengthReduce]
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.

In order to achieve this, the following common code changes were made:

 * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
 LSR should do instruction-based addressing evaluations by calling
 isLegalAddressingMode() with the Instruction pointers.
 * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
 as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
 not just loads or stores.

SystemZ changes:

 * isLSRCostLess() implemented with Insns first, and without ImmCost.
 * New function supportedAddressingMode() that is a helper for TTI methods
 looking at Instructions passed via pointers.

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262
https://reviews.llvm.org/D35049

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308729 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-21 11:59:37 +00:00
Haicheng Wu
8c939cb97f [TTI] Refine the cost of EXT in getUserCost()
Currently, getUserCost() only checks the src and dst types of an EXT to decide
whether it is free. This change first checks the types, then calls
isExtFreeImpl(), and finally checks whether the EXT can form an ExtLoad. At
present, only AArch64 has a customized implementation of isExtFreeImpl() to
check whether the EXT can be folded into its use.

Differential Revision: https://reviews.llvm.org/D34458

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308076 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-15 02:12:16 +00:00
Sean Fertile
471398ffea Extend memcpy expansion in Transform/Utils to handle wider operand types.
Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and match the existing behaviour
if they aren't overridden by the target.

Differential revision: https://reviews.llvm.org/D32536

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307346 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-07 02:00:06 +00:00
Evgeny Astigeevich
0812c948be [TargetTransformInfo, API] Add a list of operands to TTI::getUserCost
The changes are a result of the discussion of https://reviews.llvm.org/D33685.
They solve the following problems:

1. We can inform getGEPCost about simplified indices to help it with
   calculating the cost. But getGEPCost does not take into account the
   context which GEPs are used in.
2. We have getUserCost which can take the context into account but we cannot
   inform about simplified indices.

With these changes, getUserCost will have access to the same additional
information that getGEPCost has.

A one-parameter getUserCost is also provided.

Differential Revision: https://reviews.llvm.org/D34057



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306674 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-29 13:42:12 +00:00
Geoff Berry
28b3f06e1a [LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper

Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D34531

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306554 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-28 15:53:17 +00:00
Alexander Timofeev
7807f69e9b DivergencyAnalysis patch for review
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305494 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-15 19:33:10 +00:00
Anna Thomas
bacc83353b [Atomics][LoopIdiom] Recognize unordered atomic memcpy
Summary:
Expand the loop idiom test for memcpy to also recognize
unordered atomic memcpy. The only difference between recognizing
an unordered atomic memcpy and a normal memcpy is
that the loads and/or stores involved are unordered atomic operations.

Background:  http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

Patch by Daniel Neilson!

Reviewers: reames, anna, skatkov

Reviewed By: reames, anna

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33243

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304806 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-06 16:45:25 +00:00
Evgeny Stupachenko
17e210d01a Fix PR23384 (part 2 of 3) NFC
Summary:
The patch moves the LSR cost comparison to the target part.

Reviewers: qcolombet

Differential Revision: http://reviews.llvm.org/D30561

From: Evgeny Stupachenko <evstupac@gmail.com>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304750 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-05 23:37:00 +00:00
Zaara Syeda
682f92f568 [PPC] Inline expansion of memcmp
This patch does an inline expansion of memcmp.
It changes the memcmp library call into an inline expansion when the size is
known at compile time and is under a target specified threshold.
This expansion is implemented in CodeGenPrepare and expands into straight line
code. The target specifies a maximum load size and the expansion works by using
this size to load the two sources, compare, and exit early if a difference is
found. It also has a special case for when the memcmp result is only used in
an equality comparison with zero.

Differential Revision: https://reviews.llvm.org/D28637

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304313 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-31 17:12:38 +00:00
Jonas Paulsson
a551a28baa [LoopVectorizer] Let target prefer scalar addressing computations.
The loop vectorizer usually vectorizes any instruction it can and then
extracts the elements for a scalarized use. On SystemZ, all elements
containing addresses must be extracted into address registers (GRs). Since
this extraction is not free, it is better to have the address in a suitable
register to begin with. By forcing address arithmetic instructions and loads
of addresses to be scalar after vectorization, two benefits result:

* No need to extract the register
* LSR optimizations trigger (LSR isn't handling vector addresses currently)

Benchmarking shows improvements on SystemZ with this new behaviour.

Any other target could try this by returning false in the new hook
prefersVectorizedAddressing().

Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand
https://reviews.llvm.org/D32422

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303744 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-24 13:42:56 +00:00
Adam Nemet
2efa5091b6 [SLP] Enable 64-bit wide vectorization on AArch64
ARM Neon has native support for half-sized vector registers (64 bits).  This
is beneficial for example for 2D and 3D graphics.  This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.

*** Performance Analysis

This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.

The results are with -O3 and PGO.  A negative percentage is an improvement.
The testsuite was run with a sample size of 4.

** SPEC

* CFP2006/482.sphinx3  -3.34%

A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.

* CFP2000/177.mesa     -3.34%
* CINT2000/256.bzip2   +6.97%

My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%.  There are also other problems with the codegen in
this loop so there is further room for improvement.

** LLVM testsuite

* SingleSource/Benchmarks/Misc/ReedSolomon               -10.75%

There are multiple small SLP vectorizations outside the hot code.  It's a bit
surprising that it adds up to 10%.  Some of this may be code-layout noise.

* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%

The opt-viewer screenshot can be seen at F3218284.  We start at a colder store
but the tree leads us into the hottest loop.

* MultiSource/Applications/lambda-0.1.3/lambda            -2.68%
* MultiSource/Benchmarks/Bullet/bullet                    -2.18%

This is using 3D vectors.

* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%

Noise, binary is unchanged.

* MultiSource/Benchmarks/Ptrdist/anagram/anagram          +4.90%

There is an additional SLP in the cold code.  The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.

* MultiSource/Applications/aha/aha                        +1.63%
* MultiSource/Applications/JM/lencod/lencod               +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark         +1.15%

Differential Revision: https://reviews.llvm.org/D31965

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303116 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-15 21:15:01 +00:00