Commit Graph

221 Commits

Author SHA1 Message Date
Simon Pilgrim
0016b62b09 [CostModel][X86] Fix AVX512BW vector shift costs for vXi16 types
We already have patterns in place to support 128/256-bit shifts without AVX512VL

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292077 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-15 20:44:00 +00:00
Simon Pilgrim
3a60120921 [CostModel][X86] Drop separate AVX512VL checks - they match existing AVX512 costs
Keep the tests though.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292076 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-15 20:19:28 +00:00
Simon Pilgrim
43b72e4d01 [CostModel][X86] Update vector shift tests to correctly check by non-constant uniform values.
Use shuffle( scslar_to_vector, zeroinitializer) pattern instead of shuffle( vec, zeroinitializer)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292075 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-15 20:10:28 +00:00
Simon Pilgrim
75f614f4c2 [CostModel][X86] Updated vXi64 ASHR costs on AVX512 targets now that D28604 has landed
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292023 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-14 19:24:23 +00:00
Simon Pilgrim
b237097a65 [X86][AVX512BW] Vectorize v64i8 vector shifts
Differential Revision: https://reviews.llvm.org/D28447

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291665 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-11 10:36:51 +00:00
Simon Pilgrim
4f063324dc Fix line endings
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291663 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-11 10:25:31 +00:00
Mohammed Agabaria
9c6b24cc3a [X86] updating TTI costs for arithmetic instructions on X86\SLM arch.
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.

special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. 
In case if the real operands bitwidth <= 16.

Differential Revision: https://reviews.llvm.org/D28104 



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291657 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-11 08:23:37 +00:00
Evandro Menezes
eb75eeae13 [AArch64] Consider all vector types for FeatureSlowMisaligned128Store
The original code considered only v2i64 as slow for this feature. This patch
consider all 128-bit long vector types as slow candidates.

In internal tests, extending this feature to all 128-bit vector types
resulted in an overall improvement of 1% on Exynos M1.

Differential revision: https://reviews.llvm.org/D27998

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291616 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-10 23:42:21 +00:00
Simon Pilgrim
16ed161b13 [CostModel][X86] Add AVX512VL vector shift cost tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291585 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-10 19:04:12 +00:00
Simon Pilgrim
35f85cb068 [CostModel][X86] Fixed vXi8 uniform shift costs.
The 'fast' costs should only work for shifts by uniform constants (uniform non-constant are lowered using the slow default implementation).

Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled.

Added missing AVX2/AVX512BW costs as well.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291391 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-08 14:14:36 +00:00
Simon Pilgrim
93f6323c31 [CostModel][X86] Moved legal uniform shift costs earlier.
XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291390 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-08 13:12:03 +00:00
Simon Pilgrim
4f2c5010fd [CostModel][X86] Update SSE41/AVX1 vXi32 SHL costs
SSE41 provides pmulld which allows the simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291372 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-07 22:27:43 +00:00
Simon Pilgrim
ee6faf574a [CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291366 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-07 22:08:09 +00:00
Simon Pilgrim
371d289738 [CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and above
We were matching against general vector shift costs before the uniform splat costs

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291365 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-07 21:47:10 +00:00
Simon Pilgrim
21886bd4a8 [CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291354 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-07 17:54:10 +00:00
Simon Pilgrim
f9fdf76b96 [X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291347 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-07 15:37:50 +00:00
Simon Pilgrim
3beff6a4d1 [CostModel][X86] Add AVX512 and 512-bit vector shift cost tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291269 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-06 19:41:26 +00:00
Chad Rosier
ed5adf8510 [AArch64] Reduce vector insert/extract cost for Falkor.
Differential Revision: https://reviews.llvm.org/D28403

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291254 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-06 18:03:26 +00:00
Simon Pilgrim
e5088f5e84 [CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs.
Set the costs on the lowest target that supports the type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291229 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-06 11:12:53 +00:00
Simon Pilgrim
4c61c4605a [CostModel][X86] Add SDIV/UDIV cost tests for a wider range of targets
Added a test demonstrating bug in AVX512 division costs

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291228 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-06 11:02:40 +00:00
Simon Pilgrim
6c924280fe [CostModel][X86] Include the cost of 256-bit upper subvector extract/insertion in AVX1 v4i64 MUL
Matches other MUL/ADD/SUB 256-bit case on AVX1

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291149 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 18:20:25 +00:00
Chad Rosier
1c1849cdd8 [AArch64][CostModel] Add coverage for bswap intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291140 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 16:55:32 +00:00
Simon Pilgrim
2cf83fd91e [CostModel][X86] Add support for broadcast shuffle costs
Currently only for broadcasts with input and output of the same width.

Differential Revision: https://reviews.llvm.org/D27811

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291122 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 15:56:08 +00:00
Chad Rosier
213f249d8d [AArch64] Remove mcpu option as this test is not target specific. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291117 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 15:05:03 +00:00
Chad Rosier
c8deff8e27 [AArch64] Remove unused arguments from tests. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291112 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 14:48:53 +00:00
Simon Pilgrim
19aab9f9fa [CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs
Actual codegen is much better than the extract+insert patterns that was assumed.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290962 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-04 14:01:33 +00:00
Elena Demikhovsky
6c207672d9 Fixed shuffle-reverse cost on AVX-512.
(This changed was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290812 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-02 11:44:10 +00:00
Elena Demikhovsky
c2b6a16ee9 AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.
X86 target does not provide any target specific cost calculation for interleave patterns.It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost.

In this patch I calculate the cost on AVX-512. It will allow to compare interleave pattern with gather/scatter and choose a better solution (PR31426).

* Shiffle-broadcast cost will be changed in Simon's upcoming patch.

Differential Revision: https://reviews.llvm.org/D28118



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290810 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-02 10:37:52 +00:00
Simon Pilgrim
373eadc326 [X86][SSE] Improve lowering of vXi64 multiplies
As mentioned on PR30845, we were performing our vXi64 multiplication as:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32);

when we could avoid one of the upper shifts with:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi + AhiBlo, 32);

This matches the lowering on gcc/icc.

Differential Revision: https://reviews.llvm.org/D27756

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290267 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-21 20:00:10 +00:00
Matthew Simpson
ad1bdf6350 [AArch64] Guard Misaligned 128-bit store penalty by subtarget feature
This patch checks that the SlowMisaligned128Store subtarget feature is set
when penalizing such stores in getMemoryOpCost.

Differential Revision: https://reviews.llvm.org/D27677

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289845 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-15 18:36:59 +00:00
Simon Pilgrim
255071b56f [CostModel][X86] Updated reverse shuffle costs
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289819 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-15 14:24:07 +00:00
Simon Pilgrim
1e5c5ee6b5 [CostModel] Fix long standing bug with reverse shuffle mask detection
Incorrect 'undef' mask index matching meant that broadcast shuffles could be detected as reverse shuffles

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289811 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-15 12:12:45 +00:00
Simon Pilgrim
768d26663d [CostModel][X86] Add tests for reverse shuffle costs
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289800 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-15 10:45:53 +00:00
Haicheng Wu
d55bd2eef3 [AArch64] Correct the check of signed 9-bit imm in isLegalAddressingMode()
In the addressing mode, signed 9-bit imm is [-256, 255], not [-512, 511].

Differential Revision: https://reviews.llvm.org/D27480

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288876 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-07 01:45:04 +00:00
Haicheng Wu
f4638220d6 [TTI/CostModel] Correct the way getGEPCost() calls isLegalAddressingMode()
Fix a bug when we call isLegalAddressingMode() from getGEPCost().

Differential Revision: https://reviews.llvm.org/D27357

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288569 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-03 01:57:24 +00:00
Guozhi Wei
9d91f9059d [ppc] Correctly compute the cost of loading 32/64 bit memory into VSR
VSX has instructions lxsiwax/lxsdx that can load 32/64 bit value into VSX register cheaply. That patch makes it known to memory cost model, so the vectorization of the test case in pr30990 is beneficial.

Differential Revision: https://reviews.llvm.org/D26713



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288560 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-03 00:41:43 +00:00
Alexey Bataev
171807f96b [SLP] Fixed cost model for horizontal reduction.
Currently when cost of scalar operations is evaluated the vector type is
used for scalar operations. Patch fixes this issue and fixes evaluation
of the vector operations cost.
Several test showed that vector cost model is too optimistic. It
allowed vectorization of 8 or less add/fadd operations, though scalar
code is faster. Actually, only for 16 or more operations vector code
provides better performance.

Differential Revision: https://reviews.llvm.org/D26277

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288398 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-01 18:42:42 +00:00
Alexey Bataev
c35da94c10 [SLP] Additional tests with the cost of vector operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288377 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-01 17:26:54 +00:00
Alexey Bataev
7d6a2cb700 Revert "[SLP] Additional tests with the cost of vector operations."
This reverts commit a61718435fc4118c82f8aa6133fd81f803789c1e.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288371 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-01 16:45:04 +00:00
Alexey Bataev
d9c0875e6d [SLP] Additional tests with the cost of vector operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288369 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-01 16:11:48 +00:00
Simon Pilgrim
5d31f856ab [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287882 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-24 14:46:55 +00:00
Simon Pilgrim
e547b01b86 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287762 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-23 14:01:18 +00:00
Simon Pilgrim
df7181d34e [CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287760 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-23 13:42:09 +00:00
Simon Pilgrim
fca1eeaa1f [CostModel][X86] Add v2f32 -> v2i64 fptosi/fptoui cost tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287756 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-23 11:43:00 +00:00
Simon Pilgrim
d93dfb18c0 [CostModel][X86] Updated sitofp/uitofp scalar/vector cost tests
Better coverage of all legal types + special cases.

Removed old fptoui tests which are all handled in fptoui.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287678 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-22 18:55:49 +00:00
Craig Topper
fdec4f508b [AVX-512] Support FCOPYSIGN for v16f32 and v8f64
Summary:
This extends FCOPYSIGN support to 512-bit vectors.

I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.

Reviewers: delena, zvi, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26791

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287298 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-18 02:25:34 +00:00
Simon Pilgrim
ad27fdae89 [CostModel][X86] Added mul costs for vXi8 vectors
More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286838 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-14 15:54:24 +00:00
Simon Pilgrim
2e6f35ab88 [X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.

This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286832 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-14 14:45:16 +00:00
Simon Pilgrim
169b408a54 [VectorLegalizer] Expansion of CTLZ using CTPOP when possible
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.

This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful.

Differential Revision: https://reviews.llvm.org/D25910

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286233 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-08 14:10:28 +00:00
Alexey Bataev
7af02fcbc9 Improved cost model for FDIV and FSQRT, by Andrew Tischenko
There is a bug describing poor cost model for floating point operations:
Bug 29083 - [X86][SSE] Improve costs for floating point operations. This
patch is the second one in series of patches dealing with cost model.

Differential Revision: https://reviews.llvm.org/D25722

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285564 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 12:10:53 +00:00