184 Commits

Author SHA1 Message Date
Simon Pilgrim
b237097a65 [X86][AVX512BW] Vectorize v64i8 vector shifts
Differential Revision: https://reviews.llvm.org/D28447

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291665 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-11 10:36:51 +00:00
Mohammed Agabaria
9c6b24cc3a [X86] updating TTI costs for arithmetic instructions on X86\SLM arch.
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.

Special optimization case: pmulld is replaced with a pmullw\pmulhw\pshuf sequence
when the real operand bitwidth is <= 16.
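
The special case above can be illustrated with a scalar sketch. It is not taken from the patch: the function names are hypothetical, each helper models a single vector lane, and the shift/or at the end stands in for the shuffle that interleaves the two 16-bit halves.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-in for one lane of pmullw:
// the low 16 bits of the signed 16x16 product.
inline uint16_t mul_lo16(int16_t a, int16_t b) {
    const int32_t full = static_cast<int32_t>(a) * static_cast<int32_t>(b);
    return static_cast<uint16_t>(static_cast<uint32_t>(full) & 0xFFFFu);
}

// Hypothetical scalar stand-in for one lane of pmulhw:
// the high 16 bits of the signed 16x16 product.
inline uint16_t mul_hi16(int16_t a, int16_t b) {
    const int32_t full = static_cast<int32_t>(a) * static_cast<int32_t>(b);
    return static_cast<uint16_t>(static_cast<uint32_t>(full) >> 16);
}

// 32-bit multiply of operands known to fit in 16 signed bits.
// The shift/or models the interleave of the high and low halves.
inline int32_t mul32_via_16bit_halves(int32_t a, int32_t b) {
    assert(a >= INT16_MIN && a <= INT16_MAX);
    assert(b >= INT16_MIN && b <= INT16_MAX);
    const int16_t a16 = static_cast<int16_t>(a);
    const int16_t b16 = static_cast<int16_t>(b);
    const uint32_t lo = mul_lo16(a16, b16);
    const uint32_t hi = mul_hi16(a16, b16);
    // Conversion back to int32_t is modular on mainstream compilers.
    return static_cast<int32_t>((hi << 16) | lo);
}
```

For example, mul32_via_16bit_halves(300, -200) recombines the two halves into the same value as a plain 32-bit multiply.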

Differential Revision: https://reviews.llvm.org/D28104 



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291657 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-11 08:23:37 +00:00
Simon Pilgrim
35f85cb068 [CostModel][X86] Fixed vXi8 uniform shift costs.
The 'fast' costs should only apply to shifts by uniform constants (uniform non-constant shifts are lowered using the slow default implementation).

Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled.

Added missing AVX2/AVX512BW costs as well.
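
Why the logical shift needs a mask can be seen in a scalar sketch of one 16-bit lane holding two bytes. This is an illustration only, with a hypothetical function name, and is not the lowering code itself: the 16-bit shift lets bits from the upper byte leak into the lower byte, so a second operation has to clear them.

```cpp
#include <cassert>
#include <cstdint>

// Logical right shift of two packed bytes using a 16-bit shift plus a mask.
// 'lane' models one 16-bit element as seen by psrlw.
inline uint16_t lshr_two_bytes(uint16_t lane, unsigned n) {
    assert(n < 8);
    // Step 1 (models psrlw): shift the whole 16-bit lane.
    const uint16_t shifted = static_cast<uint16_t>(lane >> n);
    // Step 2 (models the mask): clear the top n bits of each byte, which
    // now hold bits that crossed the byte boundary.
    const uint16_t byte_mask = static_cast<uint16_t>(0xFFu >> n);
    const uint16_t mask = static_cast<uint16_t>((byte_mask << 8) | byte_mask);
    return static_cast<uint16_t>(shifted & mask);
}
```

With lane 0x81FF and n = 1, the unmasked shift would leave 0xFF in the low byte, while the masked result holds the per-byte shifts 0x40 and 0x7F.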

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291391 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-08 14:14:36 +00:00
Simon Pilgrim
93f6323c31 [CostModel][X86] Moved legal uniform shift costs earlier.
XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291390 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-08 13:12:03 +00:00
Simon Pilgrim
4f2c5010fd [CostModel][X86] Update SSE41/AVX1 vXi32 SHL costs
SSE41 provides pmulld which allows the simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq.
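
A scalar sketch of the pslld/paddd/cvttps2dq/pmulld idea, under the assumption that the sequence builds 2^n as a float and multiplies by it. The function name is hypothetical and the model covers a single lane. Shift amounts are limited to below 31 here, because converting 2^31 to a signed integer is undefined in C++ even though the hardware instruction returns a usable bit pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Variable shift-left of one 32-bit lane expressed as a multiply by 2^n.
inline uint32_t shl_via_mul(uint32_t x, uint32_t n) {
    assert(n < 31);
    // Models pslld + paddd: place n in the float exponent field and add
    // the bit pattern of 1.0f, which yields the bit pattern of 2^n.
    const uint32_t bits = (n << 23) + 0x3F800000u;
    float pow2;
    std::memcpy(&pow2, &bits, sizeof pow2);
    // Models cvttps2dq: truncating float-to-int conversion gives 2^n.
    const int32_t scale = static_cast<int32_t>(pow2);
    // Models pmulld: low 32 bits of the product equal x << n.
    return x * static_cast<uint32_t>(scale);
}
```

For example, shl_via_mul(3, 4) multiplies by 16 and gives the same result as 3 << 4.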

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291372 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-07 22:27:43 +00:00
Simon Pilgrim
ee6faf574a [CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291366 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-07 22:08:09 +00:00
Simon Pilgrim
371d289738 [CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and above
We were matching against general vector shift costs before the uniform splat costs

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291365 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-07 21:47:10 +00:00
Simon Pilgrim
e1ddc8e7d2 [CostModel][X86] Generalized cost calculation of SHL by constant -> MUL conversion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291364 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-07 21:33:00 +00:00
Simon Pilgrim
129141ec2f [CostModel][X86] Merge separate AVX1 cost LUTs. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291355 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-07 18:19:25 +00:00
Simon Pilgrim
21886bd4a8 [CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291354 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-07 17:54:10 +00:00
Simon Pilgrim
5988cea66b [CostModel][X86] Added missing AVX2 arithmetic costs.
Allows us to correctly fall through to the lower AVX1 costs if the lookup fails.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291353 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-07 17:27:39 +00:00
Simon Pilgrim
9724d35716 [CostModel][X86] Reordered AVX1 arithmetic cost LUT into descending target order. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291352 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-07 17:03:51 +00:00
Simon Pilgrim
f9fdf76b96 [X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291347 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-07 15:37:50 +00:00
Simon Pilgrim
e5088f5e84 [CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs.
Set the costs on the lowest target that supports the type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291229 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-06 11:12:53 +00:00
Simon Pilgrim
39e9d60ebd [CostModel][X86] Tidyup arithmetic costs code. NFCI.
Remove unnecessary braces, remove single-use variables and keep LUTs to a similar naming convention.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291187 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 22:48:02 +00:00
Simon Pilgrim
38716ac7b5 [CostModel][X86] Move vXi32 MUL costs into existing tables. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291165 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 19:42:43 +00:00
Simon Pilgrim
7c5fe202d7 Remove trailing whitespace. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291163 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 19:24:25 +00:00
Simon Pilgrim
a04ecdd76e [CostModel][X86] Reordered SSE42 arithmetic cost LUT into descending order. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291162 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 19:19:39 +00:00
Simon Pilgrim
688489139f [CostModel][X86] Move vXi64 MUL costs into existing tables. NFCI.
Removes need for yet another LUT.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291158 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 19:01:50 +00:00
Simon Pilgrim
944aa53645 [CostModel][X86] Strip unused 256-bit vector shift costs. NFCI.
Remove SSE2 256-bit entries - AVX targets will have used the SSE42 costs instead.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291152 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 18:36:48 +00:00
Simon Pilgrim
6c924280fe [CostModel][X86] Include the cost of 256-bit upper subvector extract/insertion in AVX1 v4i64 MUL
Matches other MUL/ADD/SUB 256-bit case on AVX1

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291149 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 18:20:25 +00:00
Simon Pilgrim
46a2104916 [CostModel][X86] Merged SK_PermuteSingleSrc/SK_PermuteTwoSrc into common shuffle cost LUTs. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291146 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 17:56:19 +00:00
Simon Pilgrim
2cf83fd91e [CostModel][X86] Add support for broadcast shuffle costs
Currently only for broadcasts with input and output of the same width.

Differential Revision: https://reviews.llvm.org/D27811

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291122 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 15:56:08 +00:00
Simon Pilgrim
8e5a39580d [CostModel][X86] Pulled out common type legalization code
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291109 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 14:33:32 +00:00
Mohammed Agabaria
6bf7471dbc Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and needs some extra cost for address computation handling.
This logic is target dependent and may not hold for all targets.
The decision whether the given stride is complex is now passed to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'.

Specifically, on X86 targets we don't see any significant address computation cost for strided accesses in general.

Differential Revision: https://reviews.llvm.org/D27518



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291106 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 14:03:41 +00:00
Mohammed Agabaria
030c24dcda [Test Commit] fixing some format issue in X86TTI to match clang-format output.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291095 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-05 09:51:02 +00:00
Simon Pilgrim
19aab9f9fa [CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs
Actual codegen is much better than the extract+insert patterns that were assumed.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290962 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-04 14:01:33 +00:00
Simon Pilgrim
7b65390293 [X86] Merged Reverse/Alternate shuffle cost tables. NFCI.
As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290956 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-04 12:08:41 +00:00
Elena Demikhovsky
6c207672d9 Fixed shuffle-reverse cost on AVX-512.
(This change was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290812 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-02 11:44:10 +00:00
Elena Demikhovsky
c2b6a16ee9 AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.
The X86 target does not provide any target-specific cost calculation for interleave patterns. It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost.

In this patch I calculate the cost on AVX-512. This allows comparing the interleave pattern with gather/scatter and choosing the better solution (PR31426).

* Shuffle-broadcast cost will be changed in Simon's upcoming patch.

Differential Revision: https://reviews.llvm.org/D28118



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290810 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-02 10:37:52 +00:00
Simon Pilgrim
373eadc326 [X86][SSE] Improve lowering of vXi64 multiplies
As mentioned on PR30845, we were performing our vXi64 multiplication as:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi, 32) + psllqi(AhiBlo, 32);

when we could avoid one of the upper shifts with:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi + AhiBlo, 32);

This matches the lowering on gcc/icc.
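
The equivalence of the two sequences can be checked with a scalar sketch of one 64-bit lane. The helper names are hypothetical. The helper pmuludq_lane models the instruction's behaviour of multiplying only the low 32 bits of each operand, and plain shifts stand in for psrlqi/psllqi.

```cpp
#include <cassert>
#include <cstdint>

// Models one lane of pmuludq: 32x32 -> 64 multiply of the low halves.
inline uint64_t pmuludq_lane(uint64_t a, uint64_t b) {
    return (a & 0xFFFFFFFFull) * (b & 0xFFFFFFFFull);
}

// Previous lowering: two separate left shifts of the cross products.
inline uint64_t mul64_two_shifts(uint64_t a, uint64_t b) {
    const uint64_t AloBlo = pmuludq_lane(a, b);
    const uint64_t AloBhi = pmuludq_lane(a, b >> 32);
    const uint64_t AhiBlo = pmuludq_lane(a >> 32, b);
    return AloBlo + (AloBhi << 32) + (AhiBlo << 32);
}

// Improved lowering: add the cross products first, then shift once.
inline uint64_t mul64_one_shift(uint64_t a, uint64_t b) {
    const uint64_t AloBlo = pmuludq_lane(a, b);
    const uint64_t AloBhi = pmuludq_lane(a, b >> 32);
    const uint64_t AhiBlo = pmuludq_lane(a >> 32, b);
    return AloBlo + ((AloBhi + AhiBlo) << 32);
}
```

Both functions agree with a plain 64-bit multiply modulo 2^64, because the left shift distributes over addition in wrapping arithmetic and the AhiBhi term is shifted out entirely.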

Differential Revision: https://reviews.llvm.org/D27756

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290267 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-21 20:00:10 +00:00
Simon Pilgrim
255071b56f [CostModel][X86] Updated reverse shuffle costs
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289819 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-15 14:24:07 +00:00
Simon Pilgrim
5d31f856ab [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287882 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-24 14:46:55 +00:00
Simon Pilgrim
e547b01b86 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287762 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-23 14:01:18 +00:00
Simon Pilgrim
df7181d34e [CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287760 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-23 13:42:09 +00:00
Simon Pilgrim
ad27fdae89 [CostModel][X86] Added mul costs for vXi8 vectors
More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result
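
The extend, multiply, truncate sequence can be modelled per element as follows. This is a sketch with a hypothetical function name, not the lowering code. It uses zero extension, although the low 8 bits of the product are the same with sign extension.

```cpp
#include <cassert>
#include <cstdint>

// One i8 element multiplied through a 16-bit intermediate.
inline uint8_t mul8_via_16(uint8_t a, uint8_t b) {
    // Extend both operands to 16 bits.
    const uint16_t wide_a = a;
    const uint16_t wide_b = b;
    // Models pmullw: keep the low 16 bits of the product.
    const uint16_t product = static_cast<uint16_t>(wide_a * wide_b);
    // Truncate back to 8 bits.
    return static_cast<uint8_t>(product);
}
```

For example, mul8_via_16(200, 3) wraps 600 to 88, matching an 8-bit multiply.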

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286838 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-14 15:54:24 +00:00
Simon Pilgrim
2e6f35ab88 [X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.

This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286832 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-14 14:45:16 +00:00
Simon Pilgrim
169b408a54 [VectorLegalizer] Expansion of CTLZ using CTPOP when possible
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.

This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful.
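
A scalar sketch of the CTLZ-from-CTPOP expansion in the style of "Hacker's Delight", assuming 32-bit elements. The function names are hypothetical and the patch itself operates on vector types. The shifts smear the highest set bit into every lower position, so the population count of the complement is the number of leading zeros.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Population count of a 32-bit value (stand-in for CTPOP).
inline unsigned popcount32(uint32_t v) {
    return static_cast<unsigned>(std::bitset<32>(v).count());
}

// Count leading zeros using only shifts, ORs, NOT and CTPOP.
inline unsigned ctlz32_via_ctpop(uint32_t x) {
    // Propagate the highest set bit to all lower bit positions.
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    // The bits still clear are exactly the leading zeros.
    return popcount32(~x);
}
```

The input 0 is handled without a special case: nothing is smeared, the complement is all ones, and the result is 32.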

Differential Revision: https://reviews.llvm.org/D25910

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286233 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-08 14:10:28 +00:00
Alexey Bataev
7af02fcbc9 Improved cost model for FDIV and FSQRT, by Andrew Tischenko
There is a bug describing poor cost model for floating point operations:
Bug 29083 - [X86][SSE] Improve costs for floating point operations. This
patch is the second one in series of patches dealing with cost model.

Differential Revision: https://reviews.llvm.org/D25722

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285564 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-31 12:10:53 +00:00
Simon Pilgrim
6c0e6ef493 [X86][AVX512] Fix MUL v8i64 costs on non-AVX512DQ targets
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285329 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-27 18:32:06 +00:00
Simon Pilgrim
0de3e81c28 [X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64
With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq).

Updated cost table accordingly.

Differential Revision: https://reviews.llvm.org/D26011

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285304 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-27 15:27:00 +00:00
Simon Pilgrim
e8a0965304 [X86][SSE] Add SSE41/AVX1 costs for vector shifts.
We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284939 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-23 16:49:04 +00:00
Michael Kuperstein
fe040323c6 [X86] Enable interleaved memory access by default
This lets the loop vectorizer generate interleaved memory accesses on x86.

Differential Revision: https://reviews.llvm.org/D25350


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284779 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 21:04:31 +00:00
Simon Pilgrim
e1ac64bc87 [CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv uniformconst costs for 256/512 bit integer vectors
We weren't checking for uniform const costs before the general cost, resulting in very high estimates.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284755 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 18:00:35 +00:00
Simon Pilgrim
99edc4fc3c [CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv general costs for 256/512 bit integer vectors
We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults.

We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284744 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-20 16:39:11 +00:00
Simon Pilgrim
e615ec6e15 [X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for fptosi v2f64/2f32 to 2i32
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.

This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.

It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.

Differential Revision: https://reviews.llvm.org/D23808

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284459 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 07:42:15 +00:00
Alexey Bataev
1a62037b43 NFC: The Cost Model specialization, by Andrey Tischenko
The current Cost Model implementation is very inaccurate and has to be
updated, improved, re-implemented to be able to take into account the
concrete CPU models and the concrete targets where this Cost Model is
being used. For example, the Latency Cost Model should differ from the
Code Size Cost Model, etc.
This patch is the first step toward developing and implementing a new
generation of the Cost Model.

Differential Revision: https://reviews.llvm.org/D25186

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284012 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-12 13:24:13 +00:00
Justin Bogner
6673ea81f6 Replace "fallthrough" comments with LLVM_FALLTHROUGH
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.
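
A sketch of what such a change looks like at a single switch. SKETCH_FALLTHROUGH is a hypothetical stand-in defined here so the example compiles without LLVM headers. It is not the definition of LLVM_FALLTHROUGH, and the function is made up for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for LLVM_FALLTHROUGH: use the standard attribute
// when the compiler reports support for it, otherwise expand to a no-op.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(fallthrough)
#    define SKETCH_FALLTHROUGH [[fallthrough]]
#  endif
#endif
#ifndef SKETCH_FALLTHROUGH
#  define SKETCH_FALLTHROUGH ((void)0)
#endif

// Counts how many cases run when entering the switch at 'level'.
inline int steps_from_level(int level) {
    assert(level >= 0);
    int steps = 0;
    switch (level) {
    case 3:
        ++steps;
        SKETCH_FALLTHROUGH; // previously a "// fallthrough" comment
    case 2:
        ++steps;
        SKETCH_FALLTHROUGH;
    case 1:
        ++steps;
        break;
    default:
        break;
    }
    return steps;
}
```

Unlike a comment, the macro form is visible to the compiler, so implicit fallthrough warnings can tell intended cases apart from missing breaks.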

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278902 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-17 05:10:15 +00:00
Charles Davis
d77fdcbf64 Revert "[X86] Support the "ms-hotpatch" attribute."
This reverts commit r278048. Something changed between the last time I
built this--it takes a while on my ridiculously slow and ancient
computer--and now that broke this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278053 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-08 21:20:15 +00:00
Charles Davis
cedd288a90 [X86] Support the "ms-hotpatch" attribute.
Summary:
Based on two patches by Michael Mueller.

This is a target attribute that causes a function marked with it to be
emitted as "hotpatchable". This particular mechanism was originally
devised by Microsoft for patching their binaries (which they are
constantly updating to stay ahead of crackers, script kiddies, and other
ne'er-do-wells on the Internet), but is now commonly abused by Windows
programs to hook API functions.

This mechanism is target-specific. For x86, a two-byte no-op instruction
is emitted at the function's entry point; the entry point must be
immediately preceded by 64 (32-bit) or 128 (64-bit) bytes of padding.
This padding is where the patch code is written. The two byte no-op is
then overwritten with a short jump into this code. The no-op is usually
a `movl %edi, %edi` instruction; this is used as a magic value
indicating that this is a hotpatchable function.

Reviewers: majnemer, sanjoy, rnk

Subscribers: dberris, llvm-commits

Differential Revision: https://reviews.llvm.org/D19908

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278048 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-08 21:01:39 +00:00