60 Commits

Author SHA1 Message Date
Matthew Simpson
201896c9fd [ARM/AArch64] Update costs for interleaved accesses with wide types
After r296750, we're able to match interleaved accesses having types wider than
128 bits. This patch updates the associated TTI costs.

Differential Revision: https://reviews.llvm.org/D29675

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296751 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-02 15:15:35 +00:00
Mohammed Agabaria
9c6b24cc3a [X86] updating TTI costs for arithmetic instructions on X86\SLM arch.
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.

special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. 
In case if the real operands bitwidth <= 16.

Differential Revision: https://reviews.llvm.org/D28104 



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291657 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-11 08:23:37 +00:00
Evandro Menezes
eb75eeae13 [AArch64] Consider all vector types for FeatureSlowMisaligned128Store
The original code considered only v2i64 as slow for this feature. This patch
consider all 128-bit long vector types as slow candidates.

In internal tests, extending this feature to all 128-bit vector types
resulted in an overall improvement of 1% on Exynos M1.

Differential revision: https://reviews.llvm.org/D27998

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291616 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-10 23:42:21 +00:00
Mohammed Agabaria
6bf7471dbc Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and need some extra cost for address computation handling.
This code seems to be target dependent which may not be the same for all targets.
Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'.

Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general.

Differential Revision: https://reviews.llvm.org/D27518



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291106 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 14:03:41 +00:00
Matthew Simpson
ad1bdf6350 [AArch64] Guard Misaligned 128-bit store penalty by subtarget feature
This patch checks that the SlowMisaligned128Store subtarget feature is set
when penalizing such stores in getMemoryOpCost.

Differential Revision: https://reviews.llvm.org/D27677

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289845 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-15 18:36:59 +00:00
Matthias Braun
70f2292ce7 AArch64: Do not test for CPUs, use SubtargetFeatures
Testing for specific CPUs has a number of problems, better use subtarget
features:
- When some tweak is added for a specific CPU it is often desirable for
  the next version of that CPU as well, yet we often forget to add it.
- It is hard to keep track of checks scattered around the target code;
  Declaring all target specifics together with the CPU in the tablegen
  file is a clear representation.
- Subtarget features can be tweaked from the command line.

To discourage people from using CPU checks in the future I removed the
isCortexXX(), isCyclone(), ... functions. I added an getProcFamily()
function for exceptional circumstances but made it clear in the comment
that usage is discouraged.

Reformat feature list in AArch64.td to have 1 feature per line in
alphabetical order to simplify merging and sorting for out of tree
tweaks.

No functional change intended.

Differential Revision: http://reviews.llvm.org/D20762

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271555 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-02 18:03:53 +00:00
Matthew Simpson
b4a38c601d Add parentheses to silence buildbot warning
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267734 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-27 16:25:04 +00:00
Matthew Simpson
d0229876a9 [TTI] Add hook for vector extract with extension
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.

Differential Revision: http://reviews.llvm.org/D18523

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267725 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-27 15:20:21 +00:00
Adam Nemet
515ab47338 [LoopDataPrefetch] Centralize the tuning cl::opts under the pass
This is effectively NFC, minus the renaming of the options
(-cyclone-prefetch-distance -> -prefetch-distance).

The change was requested by Tim in D17943.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264806 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-29 23:45:52 +00:00
Adam Nemet
b022ece108 [LoopDataPrefetch] Add TTI to limit the number of iterations to prefetch ahead
Summary:
It can hurt performance to prefetch ahead too much.  Be conservative for
now and don't prefetch ahead more than 3 iterations on Cyclone.

Reviewers: hfinkel

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17949

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263772 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-18 00:27:43 +00:00
Adam Nemet
b4954720ad [LoopDataPrefetch/Aarch64] Allow selective prefetching of large-strided accesses
Summary:
And use this TTI for Cyclone.  As it was explained in the original RFC
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW
prefetcher work up to 2KB strides.

I am also adding tests for this and the previous change (D17943):

* Cyclone prefetching accesses with a large stride
* Cyclone not prefetching accesses with a small stride
* Generic Aarch64 subtarget not prefetching either

Reviewers: hfinkel

Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17945

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263771 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-18 00:27:38 +00:00
Adam Nemet
bbb72f5976 [Aarch64] Add pass LoopDataPrefetch for Cyclone
Summary:
This wires up the pass for Cyclone but keeps it off for now because we
need a few more TTIs.

The getPrefetchMinStride value is not very well tuned right now but it
works well with CFP2006/433.milc which motivated this.

Tests will be added as part of the upcoming large-stride prefetching
patch.

Reviewers: t.p.northover

Subscribers: llvm-commits, aemerson, hfinkel, rengolin

Differential Revision: http://reviews.llvm.org/D17943

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263770 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-18 00:27:29 +00:00
Matthew Simpson
449a562ef6 [AArch64] Reduce vector insert/extract cost for Kryo
Differential Revision: http://reviews.llvm.org/D17379

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261237 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-18 18:35:45 +00:00
Chad Rosier
1f88b2d0b7 [AArch64] Add support for Qualcomm Kryo CPU.
Machine model description by Dave Estes <cestes@codeaurora.org>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260686 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-12 15:51:51 +00:00
Ahmed Bougacha
65422bf239 [AArch64][ARM] Don't base interleaved op legality on type alloc size.
Otherwise, we think that most types that look like they'd fit in a
legal vector type are legal (so, basically, *any* vector type with a
size between 33 and 128 bits, I think, since we use pow2 alignment;
e.g., v2i25, v3f32, ...).

DataLayout::getTypeAllocSize rounds up based on alignment.
When checking for target intrinsic legality, that's not what we want:
if rounding makes a difference, the type isn't legal, and the
target intrinsics shouldn't be used, as they are always assumed legal.

One could make the argument that alloc size is ultimately the most
relevant here, since we're dealing with LD/ST intrinsics. That's only
true if we did legalize them though; that's a problem for another day.

Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
Type::getSizeInBits can't be used because that'd gratuitously break
pointer vector support.

Some of these uses are currently fine, because we only hit them when
the type is already known legal (e.g., r114454). Update them for
consistency. It's faster to avoid the rounding anyway!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255089 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-09 01:19:50 +00:00
Philip Reames
3817e67f7f [EarlyCSE] IsSimple vs IsVolatile naming clarification (NFC)
When the notion of target specific memory intrinsics was introduced to EarlyCSE, the commit confused the notions of volatile and simple memory access.  Since I'm about to start working on this area, cleanup the naming so that patches aren't horribly confusing.  Note that the actual implementation was always bailing if the load or store wasn't simple.  

Reminder:
- "volatile" - C++ volatile, can't remove any memory operations, but in principal unordered
- "ordered" - imposes ordering constraints on other nearby memory operations
- "atomic" - can't be split or sheared.  In LLVM terms, all "ordered" operations are also atomic so the predicate "isAtomic" is often used.
- "simple" - a load which is none of the above.  These are normal loads and what most of the optimizer works with.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254805 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-05 00:18:33 +00:00
Matthew Simpson
2e13349774 [Aarch64] Add cost for missing extensions.
This patch adds a cost estimate for some missing sign and zero extensions. The
costs were determined by counting the number of shift instructions generated
without context for each new extension.

Differential Revision: http://reviews.llvm.org/D14730

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253482 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-18 18:03:06 +00:00
Craig Topper
1d1d5f6090 Remove templates from CostTableLookup functions. All instantiations had the same type.
This also lets us remove the versions of the functions that took a statically sized array as we can rely on ArrayRef implicit conversion now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251490 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-28 04:02:12 +00:00
Craig Topper
156f73362e Convert cost table lookup functions to return a pointer to the entry or nullptr instead of the index.
This avoid mentioning the table name an extra time and allows the lookup to be done directly in the ifs by relying on the bool conversion of the pointer.

While there make use of ArrayRef and std::find_if.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251382 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-27 04:14:24 +00:00
Craig Topper
e4745a7931 Use MVT::SimpleValueType instead of MVT in template parameter. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251217 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-25 00:27:14 +00:00
Craig Topper
75af6938b2 Call the version of ConvertCostTableLookup that takes a statically sized array rather than pointer and size. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251196 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-24 18:40:22 +00:00
Silviu Baranga
076967c806 [CostModel][AArch64] Remove amortization factor for some of the vector select instructions
Summary:
We are not scalarizing the wide selects in codegen for i16 and i32 and
therefore we can remove the amortization factor. We still have issues
with i64 vectors in codegen though.

Reviewers: mcrosier

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12724

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247156 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-09 15:35:02 +00:00
Silviu Baranga
1127a30ffe [CostModel][AArch64] Increase cost of vector insert element and add missing cast costs
Summary:
Increase the estimated costs for insert/extract element operations on
AArch64. This is motivated by results from benchmarking interleaved
accesses.

Add missing costs for zext/sext/trunc instructions and some integer to
floating point conversions. These costs were previously calculated
by scalarizing these operation and were affected by the cost increase of
the insert/extract element operations.

Reviewers: rengolin

Subscribers: mcrosier, aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D11939

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245226 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-17 16:05:09 +00:00
Chandler Carruth
da49414c4b [TTI] Make the cost APIs in TargetTransformInfo consistently use 'int'
rather than 'unsigned' for their costs.

For something like costs in particular there is a natural "negative"
value, that of savings or saved cost. As a consequence, there is a lot
of code that subtracts or creates negative values based on cost, all of
which is prone to awkwardness or bugs when dealing with an unsigned
type. Similarly, we *never* want these values to wrap, as that would
cause Very Bad code generation (likely percieved as an infinite loop as
we try to emit over 2^32 instructions or some such insanity).

All around 'int' seems a much better fit for these basic metrics. I've
added asserts to ensure that at least the TTI interface never returns
negative numbers here. If we ever have a use case for negative numbers,
we can remove this, but this way a bug where someone used '-1' to
produce a 'very large' cost will be caught by the assert.

This passes all tests, and is also UBSan clean.

No functional change intended.

Differential Revision: http://reviews.llvm.org/D11741

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244080 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-05 18:08:10 +00:00
Silviu Baranga
541d079947 [ARM/AArch64] Fix cost model for interleaved accesses
Summary:
Fix the cost of interleaved accesses for ARM/AArch64.
We were calling getTypeAllocSize and using it to check
the number of bits, when we should have called
getTypeAllocSizeInBits instead.

This would pottentially cause the vectorizer to
generate loads/stores and shuffles which cannot
be matched with an interleaved access instruction.

No performance changes are expected for now since
matching/generating interleaved accesses is still
disabled by default.

Reviewers: rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D11524

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243270 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-27 14:39:34 +00:00
Mehdi Amini
691b2ff11e Remove getDataLayout() from TargetLowering
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: yaron.keren, rafael, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11042

From: Mehdi Amini <mehdi.amini@apple.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241779 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-09 02:09:52 +00:00
Mehdi Amini
f29cc18dcb Make TargetLowering::getPointerTy() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D11028

From: Mehdi Amini <mehdi.amini@apple.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241775 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-09 02:09:04 +00:00
Hao Liu
380417ac84 [AArch64] Lower interleaved memory accesses to ldN/stN intrinsics. This patch also adds a function to calculate the cost of interleaved memory accesses.
E.g. Lower an interleaved load:
        %wide.vec = load <8 x i32>, <8 x i32>* %ptr
        %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6>
        %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7>
     into:
        %ld2 = { <4 x i32>, <4 x i32> } call llvm.aarch64.neon.ld2(%ptr)
        %vec0 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 0
        %vec1 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 1

E.g. Lower an interleaved store:
        %i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1, <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11>
        store <12 x i32> %i.vec, <12 x i32>* %ptr
     into:
        %sub.v0 = shuffle <8 x i32> %v0, <8 x i32> v1, <0, 1, 2, 3>
        %sub.v1 = shuffle <8 x i32> %v0, <8 x i32> v1, <4, 5, 6, 7>
        %sub.v2 = shuffle <8 x i32> %v0, <8 x i32> v1, <8, 9, 10, 11>
        call void llvm.aarch64.neon.st3(%sub.v0, %sub.v1, %sub.v2, %ptr)

Differential Revision: http://reviews.llvm.org/D10533


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240754 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-26 02:32:07 +00:00
Hao Liu
5ab48a2f69 [AArch64] Revert r239711 again. We need to discuss how to share code between AArch64 and ARM backend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239713 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-15 01:56:40 +00:00
Hao Liu
6024ab3b8f [AArch64] Match interleaved memory accesses into ldN/stN instructions.
Re-commit after adding "-aarch64-neon-syntax=generic" to fix the failure on OS X.
This patch was firstly committed in r239514, then reverted in r239544 because of a syntax incompatible failure on OS X.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239711 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-15 01:35:49 +00:00
Rafael Espindola
688e7b3049 This reverts commit r239529 and r239514.
Revert "[AArch64] Match interleaved memory accesses into ldN/stN instructions."
Revert "Fixing MSVC 2013 build error."

The  test/CodeGen/AArch64/aarch64-interleaved-accesses.ll test was failing on OS X.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239544 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-11 17:30:33 +00:00
Hao Liu
442f620296 [AArch64] Match interleaved memory accesses into ldN/stN instructions.
Add a pass AArch64InterleavedAccess to identify and match interleaved memory accesses. This pass transforms an interleaved load/store into ldN/stN intrinsic. As Loop Vectorizor disables optimization on interleaved accesses by default, this optimization is also disabled by default. To enable it by "-aarch64-interleaved-access-opt=true"

E.g. Transform an interleaved load (Factor = 2):
       %wide.vec = load <8 x i32>, <8 x i32>* %ptr
       %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6>  ; Extract even elements
       %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7>  ; Extract odd elements
     Into:
       %ld2 = { <4 x i32>, <4 x i32> } call aarch64.neon.ld2(%ptr)
       %v0 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 0
       %v1 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 1

E.g. Transform an interleaved store (Factor = 2):
       %i.vec = shuffle %v0, %v1, <0, 4, 1, 5, 2, 6, 3, 7>  ; Interleaved vec
       store <8 x i32> %i.vec, <8 x i32>* %ptr
     Into:
       %v0 = shuffle %i.vec, undef, <0, 1, 2, 3>
       %v1 = shuffle %i.vec, undef, <4, 5, 6, 7>
       call void aarch64.neon.st2(%v0, %v1, %ptr)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239514 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-11 09:05:02 +00:00
Wei Mi
cac51be31f [X86] Disable loop unrolling in loop vectorization pass when VF is 1.
The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture,
by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce
the cost of overflow check, memory boundary check and extra prologue/epilogue code when
regular unroller will unroll the loop another time. Disable it when VF==1 remove the
unnecessary cost on x86. The same can be done for other platforms after verifying
interleaving/memory bound checking to be not perf critical on those platforms.

Differential Revision: http://reviews.llvm.org/D9515


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236613 91177308-0d34-0410-b5e6-96231b3b80d8
2015-05-06 17:12:25 +00:00
Kevin Qin
40e66277f7 [AArch64] Enable partial & runtime unrolling on cortex-a57
For inner one of nested loops, it is more likely to be a hot loop,
and the runtime check can be promoted out from patch 0001, so the
overhead is less, we can try a doubled threshold to unroll more loops.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231632 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-09 06:14:28 +00:00
Benjamin Kramer
30fa873958 Make some non-constant static variables non-static or fully const.
Otherwise we have to emit thread-safe initialization for them. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230894 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-01 18:09:56 +00:00
Chandler Carruth
26bc071088 [multiversion] Remove the function parameter from the unrolling
preferences interface on TTI now that all of TTI is per-function.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227741 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 14:31:23 +00:00
Chandler Carruth
1937233a22 [PM] Switch the TargetMachine interface from accepting a pass manager
base which it adds a single analysis pass to, to instead return the type
erased TargetTransformInfo object constructed for that TargetMachine.

This removes all of the pass variants for TTI. There is now a single TTI
*pass* in the Analysis layer. All of the Analysis <-> Target
communication is through the TTI's type erased interface itself. While
the diff is large here, it is nothing more that code motion to make
types available in a header file for use in a different source file
within each target.

I've tried to keep all the doxygen comments and file boilerplate in line
with this move, but let me know if I missed anything.

With this in place, the next step to making TTI work with the new pass
manager is to introduce a really simple new-style analysis that produces
a TTI object via a callback into this routine on the target machine.
Once we have that, we'll have the building blocks necessary to accept
a function argument as well.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227685 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-31 11:17:59 +00:00
Chandler Carruth
a6a87b595d [PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.

The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.

I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.

There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.

The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.

Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.

The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]

Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:

1) Improving the TargetMachine interface by having it directly return
   a TTI object. Because we have a non-pass object with value semantics
   and an internal type erasure mechanism, we can narrow the interface
   of the TargetMachine to *just* do what we need: build and return
   a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
   This will include splitting off a minimal form of it which is
   sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
   target machine for each function. This may actually be done as part
   of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
   easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
   easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
   just a bit messy and exacerbating the complexity of implementing
   the TTI in each target.

Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.

Differential Revision: http://reviews.llvm.org/D7293

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227669 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-31 03:43:40 +00:00
Chad Rosier
13faabb6c5 Commoning of target specific load/store intrinsics in Early CSE.
Phabricator revision: http://reviews.llvm.org/D7121
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227149 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 22:51:15 +00:00
Kevin Qin
f73400a864 [AArch64] Enable partial & runtime unrolling on cortex-a57.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219401 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-09 10:13:27 +00:00
Chad Rosier
ea64dce261 [AArch64] Improve cost model to handle sdiv by a pow-of-two.
This patch improves the target-specific cost model to better handle signed
division by a power of two. The immediate result is that this enables the SLP
vectorizer to do a better job.

http://reviews.llvm.org/D5469
PR20714

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218607 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-29 13:59:31 +00:00
Gerolf Hoflehner
e7648ae68d [AArch64] Revert r216141 for cyclone
The increase of the interleave factor to 4 has side-effects
like performance losses eg. due to reminder loops being executed
more frequently and may increase code size. It requires more
analysis and careful heuristic tuning. Expect double digit gains
in small benchmarks like lowercase.c and losses in puzzle.c.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217540 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 20:31:57 +00:00
Sanjay Patel
87c977a52b Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable.
"Unroll" is not the appropriate name for this variable. Clang already uses 
the term "interleave" in pragmas and metadata for this.

Differential Revision: http://reviews.llvm.org/D5066



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217528 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 17:58:16 +00:00
Karthik Bhat
e637d65af3 Allow vectorization of division by uniform power of 2.
This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible.
Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216371 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-25 04:56:54 +00:00
James Molloy
824431f680 [LoopVectorize] Up the maximum unroll factor to 4 for AArch64
Only for Cortex-A57 and Cyclone for now, where it has shown wins.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216141 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 00:02:51 +00:00
James Molloy
72035e9a8e Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost.
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214859 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-05 12:30:34 +00:00
Eric Christopher
9f85dccfc6 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214781 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 21:25:23 +00:00
Tim Northover
8bfc50e4a9 AArch64: improve handling & modelling of FP_TO_XINT nodes.
There's probably no acatual change in behaviour here, just updating
the LowerFP_TO_INT function to be more similar to the reverse
implementation and updating costs to current CodeGen.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210985 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-15 09:27:15 +00:00
Tim Northover
94fe5c1fe2 AArch64: improve vector [su]itofp handling.
This somehow got missed in the AArch64 merge, so should fix a
performance regression since 3.4.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210984 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-15 09:27:06 +00:00
Tim Northover
29f94c7201 AArch64/ARM64: move ARM64 into AArch64's place
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.

"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple, those from AArch64 use an aarch64
triple. Both should be equivalent though.

This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209577 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-24 12:50:23 +00:00