100 Commits

Author SHA1 Message Date
David Green
d9061bcf04 [ARM] Mark 255 and 65535 as cheap for Thumb1 "And"
This prevents Constant Hoisting from pulling the constant out of the block,
allowing us to still produce LDRH/UXTH nodes. LDRB/UXTB (255) is already cheap
by the default getIntImmCost, but I've added it for clarity.
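
A minimal sketch of the rule, assuming a getIntImmCost-style hook (simplified; the Thumb1 guard conditions are omitted):

  // Hedged sketch: AND with 255 or 65535 lowers to UXTB/UXTH, so treat
  // the immediate as free to keep Constant Hoisting from pulling it out.
  if (Opcode == Instruction::And && (Imm == 255 || Imm == 65535))
    return 0; // no materialization needed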

Differential Revision: https://reviews.llvm.org/D57671


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353040 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-04 11:58:48 +00:00
Chandler Carruth
6b547686c5 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-19 08:50:56 +00:00
Dorit Nuzman
06bac6c858 [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; targets that enable the
masked-interleave-group feature no longer have to invalidate interleave-groups
of loads with gaps; they can now use masked wide loads and shuffles (if that's
what the cost model selects).
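
For illustration (this loop is not from the patch), the even-elements case mentioned above:

  // Only the even elements are read, so the group {a[2*i], a[2*i+1]}
  // has a gap in its second member. With masked-interleave support the
  // vectorizer can emit one masked wide load covering both lanes (odd
  // lanes masked off) plus a shuffle, avoiding the scalar epilogue that
  // Opt-for-Size forbids.
  for (int i = 0; i < n; ++i)
    sum += a[2 * i];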

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345705 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-31 09:57:56 +00:00
Simon Pilgrim
474181f1d6 [TTI] Add generic SK_Broadcast shuffle costs
I noticed while fixing PR39368 that we don't have generic shuffle costs for broadcast-style shuffles.

This patch adds SK_BROADCAST handling, but it exposes ARM/AArch64's lack of handling for this shuffle kind, which I've fixed at the same time.

Differential Revision: https://reviews.llvm.org/D53570

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345253 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-25 10:52:36 +00:00
Simon Pilgrim
befe74f503 [TTI] Add generic cost handling of SK_Reverse shuffles
These can be treated as a general permute.

This required a fix for missing reverse patterns on ARM.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345015 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-23 09:42:10 +00:00
Dorit Nuzman
7d7250490b recommit 344472 after fixing build failure on ARM and PPC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344475 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-14 08:50:06 +00:00
Dorit Nuzman
473da03560 revert 344472 due to failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344473 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-14 07:21:20 +00:00
Dorit Nuzman
a3ff03e8e2 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.
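
An illustrative case (not taken from the patch): a strided access guarded by a condition:

  // Both a[2*i] and a[2*i+1] are accessed only when c[i] is true, so
  // the interleave group is predicated. With masked-interleave support
  // it becomes one masked wide load plus shuffles (the condition
  // replicated across the group's lanes) instead of gather/scatter or
  // scalarized code.
  for (int i = 0; i < n; ++i)
    if (c[i])
      sum += a[2 * i] + a[2 * i + 1];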

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344472 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-14 07:06:16 +00:00
Zhaoshi Zheng
ab8edbddaa [Thumb1] Any imm8 should have cost of 1
A simple MOVS rd, imm8 can materialize [-128, 127] in signed i8 type or
[0, 255] in unsigned i8 type on Thumb1.

Differential Revision: https://reviews.llvm.org/D52257

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342898 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-24 16:15:23 +00:00
Fangrui Song
af7b1832a0 Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338293 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-30 19:41:25 +00:00
David Green
e101271f21 [UnrollAndJam] New Unroll and Jam pass
This is a simple implementation of the unroll-and-jam classical loop
optimisation.

The basic idea is that we take an outer loop of the form:

  for i..
    ForeBlocks(i)
    for j..
      SubLoopBlocks(i, j)
    AftBlocks(i)

Instead of doing normal inner or outer unrolling, we unroll as follows:

  for i... i+=2
    ForeBlocks(i)
    ForeBlocks(i+1)
    for j..
      SubLoopBlocks(i, j)
      SubLoopBlocks(i+1, j)
    AftBlocks(i)
    AftBlocks(i+1)
  Remainder Loop

So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now jammed loops.

To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).
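
As a concrete (hypothetical) example of a loop that benefits:

  // After unrolling i by 2 and jamming, the load of B[j] appears twice
  // in the same inner iteration and can be shared, roughly halving the
  // loads from B.
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
      C[i] += A[i][j] * B[j];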

Differential Revision: https://reviews.llvm.org/D41953



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336062 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-01 12:47:30 +00:00
Simon Pilgrim
21582f2af6 [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744)
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:

e.g. v4f32: <0,5,2,7> or <4,1,6,3>

This seems far too restrictive, as most SIMD hardware will implement it using a general blend/bit-select instruction. This patch therefore replaces it with SK_Select, permitting elements from either source as long as they stay in their lane:

e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.

This initial patch just updates the name and the cost model's shuffle mask analysis; later patches will update SLP to better utilise this - it still limits itself to SK_Alternate-style patterns.
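
A minimal sketch of the relaxed test (the in-tree mask analysis differs in its details):

  // An SK_Select mask keeps each element in its lane but may take it
  // from either source: element i must be i (source 1) or i + N
  // (source 2). Undef (-1) handling is omitted for brevity.
  bool isSelectMask(ArrayRef<int> Mask) {
    int N = Mask.size();
    for (int i = 0; i < N; ++i)
      if (Mask[i] != i && Mask[i] != i + N)
        return false;
    return true;
  }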

Differential Revision: https://reviews.llvm.org/D47985

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334513 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-12 16:12:29 +00:00
David Green
00d34a85c6 Revert 333358 as it's failing on some builders.
I'm guessing the tests rely on the ARM backend being built.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333359 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-27 12:54:33 +00:00
David Green
3dde46793c [UnrollAndJam] Add a new Unroll and Jam pass
This is a simple implementation of the unroll-and-jam classical loop
optimisation.

The basic idea is that we take an outer loop of the form:

for i..
  ForeBlocks(i)
  for j..
    SubLoopBlocks(i, j)
  AftBlocks(i)

Instead of doing normal inner or outer unrolling, we unroll as follows:

for i... i+=2
  ForeBlocks(i)
  ForeBlocks(i+1)
  for j..
    SubLoopBlocks(i, j)
    SubLoopBlocks(i+1, j)
  AftBlocks(i)
  AftBlocks(i+1)
Remainder

So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now-jammed loops.

To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).

Differential Revision: https://reviews.llvm.org/D41953



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333358 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-27 12:11:21 +00:00
Nicola Zaghen
0818e789cb Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manual change to DOCS, as the regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
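
Usage is unchanged apart from the name, e.g. (pass name and statement here are placeholders):

  #define DEBUG_TYPE "my-pass"
  // Prints only in asserts builds when -debug or -debug-only=my-pass
  // is given; compiled out otherwise.
  LLVM_DEBUG(dbgs() << "Visiting: " << I << "\n"); // formerly DEBUG(...)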

Differential Revision: https://reviews.llvm.org/D43624



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332240 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-14 12:53:11 +00:00
Craig Topper
f137ed238d [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer.
Currently EVT is in the IR layer only because Function.cpp needs a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen, making CodeGen a better place for it.

The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there are only a few primitive types, we can just print them as strings directly.

Differential Revision: https://reviews.llvm.org/D45017

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328806 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-29 17:21:10 +00:00
David Blaikie
b91d9a7128 Fix layering by moving ValueTypes.h from CodeGen to IR
ValueTypes.h is implemented in IR already.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328397 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-23 23:58:31 +00:00
David Blaikie
9d9a46a465 Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328395 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-23 23:58:25 +00:00
David Green
7f3a2c29f3 [ARM] Fix issue with large xor constants.
Fixup to rL325573 for large xor constants.

Thanks to Eli Friedman for the catch.

Differential revision: https://reviews.llvm.org/D43549



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325761 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-22 09:38:57 +00:00
Hiroshi Inoue
1763be4373 [NFC] fix trivial typos in comments
"a a" -> "a"


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325752 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-22 07:48:29 +00:00
David Green
3be7f43a55 [ARM] Mark -1 as cheap in xor's for thumb1
We can always convert xor %a, -1 into MVN, even in Thumb1, where the -1
would not otherwise be considered a cheap constant. This prevents the
-1's from being pulled out into constants and potentially hoisted.
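
A sketch of the cost-model rule (hedged; the exact cost value is simplified):

  // xor %a, -1 always lowers to MVN, so the all-ones immediate never
  // needs materializing, even on Thumb1.
  if (Opcode == Instruction::Xor && Imm.isAllOnesValue())
    return 1; // cheap: the -1 folds into the MVN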

Differential Revision: https://reviews.llvm.org/D43451



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325573 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-20 11:07:35 +00:00
Eli Friedman
fcc597840c [Inliner] Restrict soft-float inlining penalty.
The penalty is currently getting applied in a bunch of places where it
doesn't make sense, like bitcasts (which are free) and calls (which
were getting the call penalty applied twice). Instead, just apply the
penalty to binary operators and floating-point casts.

While I'm here, also fix getFPOpCost() to do the right thing in more
cases, so we don't have to dig into function attributes.

Differential Revision: https://reviews.llvm.org/D41522



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@321332 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-22 02:08:08 +00:00
David Blaikie
e3a9b4ce3a Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318490 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-17 01:07:10 +00:00
Sam Parker
dc1f81fb55 [ARM] Allow unrolling of multi-block loops.
Before, loop unrolling was only enabled for loops with a single
block. This restriction has been removed and replaced by:
- allow a maximum of two exiting blocks, and
- limit the loop to four basic blocks for cores with a branch predictor.

Differential Revision: https://reviews.llvm.org/D38952


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316313 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-23 08:05:14 +00:00
Eugene Zelenko
20d1cb14e8 [ARM] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313823 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-20 21:35:51 +00:00
Sam Parker
f281371190 [ARM] Improve loop unrolling for Cortex-M
- Set the default runtime unroll count to 4 and use the newly added
  UnrollRemainder option.
- Compute the loop cost and force unrolling when the cost is less than 12.
- Disable unrolling on Thumb1-only targets.
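
A hedged sketch of the settings above in getUnrollingPreferences terms (names follow the in-tree UnrollingPreferences struct; the per-instruction cost computation is omitted):

  if (ST->isThumb1Only())
    return;                          // no unrolling on Thumb1-only cores
  UP.Runtime = true;
  UP.UnrollRemainder = true;         // the newly added option
  UP.DefaultUnrollRuntimeCount = 4;
  if (LoopCost < 12)                 // LoopCost: estimated body cost
    UP.Force = true;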

Differential Revision: https://reviews.llvm.org/D36134


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310997 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-16 07:42:44 +00:00
Sam Parker
e46d723b12 [ARM] Enable partial and runtime unrolling
Enable runtime and partial loop unrolling of simple loops without
calls on M-class cores. The thresholds are calculated based on
whether the target is Thumb or Thumb-2.

Differential Revision: https://reviews.llvm.org/D34619


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308956 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-25 08:51:30 +00:00
Florian Hahn
68f374b8d3 [ARM] Inline callee if its target-features are a subset of the caller
Summary:
Similar to X86, it should be safe to inline callees if their
target-features are a subset of the caller's. As some subtarget features
provide different instructions depending on whether they are set or
unset (e.g. ThumbMode and ModeSoftFloat), we use a whitelist of
target-features describing hardware capabilities only.
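
A hedged sketch of the hook (the whitelist masking of non-hardware features is elided, and TM stands for the target machine handle):

  bool areInlineCompatible(const Function *Caller,
                           const Function *Callee) const {
    const FeatureBitset &CallerBits =
        TM.getSubtargetImpl(*Caller)->getFeatureBits();
    const FeatureBitset &CalleeBits =
        TM.getSubtargetImpl(*Callee)->getFeatureBits();
    // After masking both sides to the hardware-capability whitelist
    // (omitted), inlining is safe iff the callee needs nothing extra.
    return (CallerBits & CalleeBits) == CalleeBits;
  }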

Reviewers: kristof.beyls, rengolin, t.p.northover, SjoerdMeijer, peter.smith, silviu.baranga, efriedma

Reviewed By: SjoerdMeijer, efriedma

Subscribers: dschuff, efriedma, aemerson, sdardis, javed.absar, arichardson, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D34697

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307889 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-13 08:26:17 +00:00
Jonas Paulsson
c33bdfa7b1 [SystemZ] TargetTransformInfo cost functions implemented.
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.

Interleaved access vectorization enabled.

BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.

Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300052 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-12 11:49:08 +00:00
Matthew Simpson
42df88e1f8 [ARM/AArch64] Ensure valid vector element types for interleaved accesses
This patch refactors and strengthens the type checks performed for interleaved
accesses. The primary functional change is to ensure that the interleaved
accesses have valid element types. The added test cases previously failed
because the element type is f128.

Differential Revision: https://reviews.llvm.org/D31817

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299864 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-10 18:34:37 +00:00
Matthew Simpson
201896c9fd [ARM/AArch64] Update costs for interleaved accesses with wide types
After r296750, we're able to match interleaved accesses having types wider than
128 bits. This patch updates the associated TTI costs.

Differential Revision: https://reviews.llvm.org/D29675

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296751 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-02 15:15:35 +00:00
Ahmed Bougacha
6ea43f266f [ARM] Make f16 interleaved accesses expensive.
There are no vldN/vstN f16 variants, even with +fullfp16.
We could use the i16 variants, but, in practice, even with +fullfp16,
the f16 sequence leading to the i16 shuffle usually gets scalarized.
We'd need to improve our support for f16 codegen before getting there.

Teach the cost model to consider f16 interleaved operations as
expensive.  Otherwise, we are all but guaranteed to end up with
a large block of scalarized vector code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294819 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-11 01:53:04 +00:00
Mohammed Agabaria
9c6b24cc3a [X86] updating TTI costs for arithmetic instructions on X86\SLM arch.
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.

A special optimization case replaces pmulld with a pmullw/pmulhw/pshuf sequence when the real operand bitwidth is <= 16.

Differential Revision: https://reviews.llvm.org/D28104 



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291657 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-11 08:23:37 +00:00
Mohammed Agabaria
6bf7471dbc Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and need some extra cost for address computation handling.
This code seems to be target dependent which may not be the same for all targets.
Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'.

Specifically, on X86 targets we don't see any significant address computation cost for strided accesses in general.
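
The hook's new shape, as a hedged sketch:

  // The target now sees the pointer SCEV and judges stride complexity
  // itself, instead of receiving a precomputed IsComplex flag.
  int getAddressComputationCost(Type *Ty, ScalarEvolution *SE,
                                const SCEV *Ptr);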

Differential Revision: https://reviews.llvm.org/D27518



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291106 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 14:03:41 +00:00
James Molloy
b2500d2186 [ARM] ADD with a negative offset can become SUB for free
So model that directly in TTI::getIntImmCost().

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281044 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-09 13:35:36 +00:00
James Molloy
f53c395722 [ARM] icmp %x, -C can be lowered to a simple ADDS or CMN
Tell TargetTransformInfo about this so ConstantHoisting is informed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281043 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-09 13:35:28 +00:00
James Molloy
ac84ac058a [Thumb1] AND with a constant operand can be converted into BIC
So model the cost of materializing the constant operand C as the minimum of
the costs of materializing C and ~C.
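
A minimal sketch of that rule inside a getIntImmCost-style hook (simplified):

  // AND rd, #C can also be emitted as BIC rd, #~C, so charge whichever
  // immediate is cheaper to materialize.
  if (Opcode == Instruction::And)
    return std::min(getIntImmCost(Imm, Ty), getIntImmCost(~Imm, Ty));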

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280929 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-08 12:58:12 +00:00
James Molloy
210e77be29 [Thumb1] Fix cost calculation for complemented immediates
Materializing something like "-3" can be done as 2 instructions:
  MOV r0, #3
  MVN r0, r0

This has a cost of 2, not 3. It looks like we were already trying to detect this pattern in TTI::getIntImmCost(), but were taking the complement of the zero-extended value rather than of the sign-extended value; the former is unlikely to ever produce a number < 256.

There were no tests failing after changing this... :/

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280928 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-08 12:58:04 +00:00
Sjoerd Meijer
05488db23f This implements a more optimal algorithm for selecting a base constant in
constant hoisting. It not only takes into account the number of uses and the
cost of expressions in which constants appear, but now also the resulting
integer range of the offsets. Thus, the algorithm maximizes the number of uses
within an integer range that will enable more efficient code generation. On
ARM, for example, this will enable code size optimisations because fewer
negative offsets will be created. Negative offsets/immediates are not supported
by Thumb1, so they prevent the more compact instruction encoding.
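
A worked example with illustrative numbers:

  // Uses of 0x2000, 0x2004 and 0x2008:
  //   base 0x2004 -> offsets {-4, 0, +4}: the -4 is not encodable in
  //                  Thumb1 load/store immediates, so that use needs
  //                  extra code.
  //   base 0x2000 -> offsets {0, +4, +8}: all non-negative, so every
  //                  use folds into the addressing mode.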

Differential Revision: http://reviews.llvm.org/D21183



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275382 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-14 07:44:20 +00:00
Diana Picus
96303e05fa [ARM] Do not test for CPUs, use SubtargetFeatures (Part 3). NFCI
This is a follow-up for r273544 and r273853.

The end goal is to get rid of the isSwift / isCortexXY / isWhatever methods.
This commit also marks them as obsolete.

Differential Revision: http://reviews.llvm.org/D21796

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274616 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-06 09:22:23 +00:00
Weiming Zhao
98f00b718b [ARM] Fix 28282: cost computation for constant hoisting
Summary:
This fixes bug: https://llvm.org/bugs/show_bug.cgi?id=28282

Currently the cost model of constant hoisting checks the bit width of the constant's data type.
However, the actual immediate value may be small enough that it does not need to be hoisted.
This patch checks the actual bit width needed for the constant.
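
A minimal sketch of the idea (names simplified):

  // An i64 constant 5 has a 64-bit type but only needs a few bits, so
  // it is cheap to rematerialize and should not be hoisted: check the
  // bits the value needs, not the width of its type.
  if (Imm.getMinSignedBits() <= 8)
    return TTI::TCC_Basic;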

Reviewers: t.p.northover, rengolin

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D21668

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274073 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-28 22:30:45 +00:00
Tim Northover
1da7db081b ARM: don't try to hoist constant RHS out of a division.
Divisions by a constant can be converted into multiplies, which are usually
cheaper, but this isn't possible if the constant gets separated (particularly
in loops). Fix this by telling ConstantHoisting that the immediate in a DIV is
cheap.
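
For example (illustrative):

  // With the 101 kept next to the udiv, codegen emits a
  // multiply-by-magic-constant sequence; if Constant Hoisting moves
  // the 101 into a register, a real division must be emitted instead.
  unsigned f(unsigned x) { return x / 101; }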

I considered making the check generic, but neither AArch64 (strangely) nor x86
showed any benefit on the tests I had.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266464 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-15 18:17:18 +00:00
Tim Northover
5a4800a05f ARM: override cost function to re-enable ConstantHoisting (& fix it).
At some point, ARM stopped getting any benefit from ConstantHoisting because
the pass called a different variant of getIntImmCost. Reimplementing the
correct variant revealed some problems, however:

  + ConstantHoisting was modifying switch statements. This is simply invalid;
    the cases must remain integer constants no matter the notional cost.
  + ConstantHoisting was mangling alloca instructions in the entry block. These
    should be handled by FrameLowering, so constants actually have a cost of 0.
    Worse, the resulting bitcasts meant they became dynamic allocas.

rdar://25707382

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266260 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-13 23:08:27 +00:00
Ahmed Bougacha
65422bf239 [AArch64][ARM] Don't base interleaved op legality on type alloc size.
Otherwise, we think that most types that look like they'd fit in a
legal vector type are legal (so, basically, *any* vector type with a
size between 33 and 128 bits, I think, since we use pow2 alignment;
e.g., v2i25, v3f32, ...).

DataLayout::getTypeAllocSize rounds up based on alignment.
When checking for target intrinsic legality, that's not what we want:
if rounding makes a difference, the type isn't legal, and the
target intrinsics shouldn't be used, as they are always assumed legal.

One could make the argument that alloc size is ultimately the most
relevant here, since we're dealing with LD/ST intrinsics. That's only
true if we did legalize them though; that's a problem for another day.

Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
Type::getSizeInBits can't be used because that'd gratuitously break
pointer vector support.
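
A concrete instance of the difference:

  // v3f32: getTypeSizeInBits() == 96, but getTypeAllocSizeInBits()
  // == 128 because alloc size rounds up to the pow2 alignment, so an
  // alloc-size check wrongly classifies v3f32 as a legal 128-bit type.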

Some of these uses are currently fine, because we only hit them when
the type is already known legal (e.g., r114454). Update them for
consistency. It's faster to avoid the rounding anyway!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255089 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-09 01:19:50 +00:00
Charlie Turner
7a016a152d [ARM] Don't pessimize i32 vselect.
The underlying issues surrounding codegen for 32-bit vselects have been resolved. The pessimistic costs for 64-bit vselects remain due to the bad
scalarization that is still happening there.

I tested this on A57 in T32, A32 and A64 modes. I saw no regressions, and some improvements.

From my benchmarks, I saw these improvements in A57 (T32)
spec.cpu2000.ref.177_mesa 5.95%
lnt.SingleSource/Benchmarks/Shootout/strcat 12.93%
lnt.MultiSource/Benchmarks/MiBench/telecomm-CRC32/telecomm-CRC32 11.89%

I also measured A57 (A32), A53 (T32) and A9 (T32) and found no performance regressions. I see much bigger wins in third-party benchmarks with this change.

Differential Revision: http://reviews.llvm.org/D14743



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253349 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-17 17:25:15 +00:00
Craig Topper
1d1d5f6090 Remove templates from CostTableLookup functions. All instantiations had the same type.
This also lets us remove the versions of the functions that took a statically sized array as we can rely on ArrayRef implicit conversion now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251490 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-28 04:02:12 +00:00
Craig Topper
156f73362e Convert cost table lookup functions to return a pointer to the entry or nullptr instead of the index.
This avoids mentioning the table name an extra time and allows the lookup to be done directly in the ifs by relying on the bool conversion of the pointer.

While there, make use of ArrayRef and std::find_if.
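
The resulting call-site idiom (sketch, for some cost table Tbl):

  // The pointer's bool conversion drives the if directly, with no
  // index bookkeeping and no second mention of the table name.
  if (const auto *Entry = CostTableLookup(Tbl, ISD, MVT::v4i32))
    return Entry->Cost;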

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251382 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-27 04:14:24 +00:00
Silviu Baranga
170aefe60c [CostModel][ARM] Increase cost of insert/extract operations
Summary:
This change raises the minimum cost of an insert/extract
element operation to 2 in cases where the operation would
result in mixing of NEON and VFP code.

Reviewers: rengolin

Subscribers: mssimpso, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12030

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245225 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-17 15:57:05 +00:00
Chandler Carruth
da49414c4b [TTI] Make the cost APIs in TargetTransformInfo consistently use 'int'
rather than 'unsigned' for their costs.

For something like costs in particular there is a natural "negative"
value, that of savings or saved cost. As a consequence, there is a lot
of code that subtracts or creates negative values based on cost, all of
which is prone to awkwardness or bugs when dealing with an unsigned
type. Similarly, we *never* want these values to wrap, as that would
cause Very Bad code generation (likely perceived as an infinite loop as
we try to emit over 2^32 instructions or some such insanity).

All around 'int' seems a much better fit for these basic metrics. I've
added asserts to ensure that at least the TTI interface never returns
negative numbers here. If we ever have a use case for negative numbers,
we can remove this, but this way a bug where someone used '-1' to
produce a 'very large' cost will be caught by the assert.
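
The guard is of the form (sketch):

  assert(Cost >= 0 && "TTI should not produce negative costs!");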

This passes all tests, and is also UBSan clean.

No functional change intended.

Differential Revision: http://reviews.llvm.org/D11741

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244080 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-05 18:08:10 +00:00
Silviu Baranga
541d079947 [ARM/AArch64] Fix cost model for interleaved accesses
Summary:
Fix the cost of interleaved accesses for ARM/AArch64.
We were calling getTypeAllocSize and using it to check
the number of bits, when we should have called
getTypeAllocSizeInBits instead.

This would potentially cause the vectorizer to
generate loads/stores and shuffles which cannot
be matched with an interleaved access instruction.

No performance changes are expected for now since
matching/generating interleaved accesses is still
disabled by default.

Reviewers: rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D11524

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243270 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-27 14:39:34 +00:00