340 Commits

Author SHA1 Message Date
Simon Pilgrim
7cb9be5d19 [CostModel][X86] Add CTLZ scalar costs
Add specific scalar costs for CTLZ instructions. We can't discriminate between CTLZ and CTLZ_ZERO_UNDEF, so we have to assume the worst. Given how often BSR is a microcoded nightmare on some older targets, we might still be underestimating it.

For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide overrides that assume 1cy costs.
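
For context, here is a minimal C++ model of the scalar semantics these entries have to price (a sketch of the instruction behaviour, not LLVM's lowering code) -- without LZCNT, BSR only yields the index of the highest set bit and is undefined for zero, so a generic ctlz carries extra fix-up ops:

  #include <cstdint>

  // Models the BSR-based path: BSR gives the index of the highest set bit
  // and is undefined for x == 0, so ctlz needs a zero check plus an invert.
  uint32_t ctlz32(uint32_t x) {
    if (x == 0)
      return 32;                     // zero case: extra branch or CMOV
    uint32_t Idx = 31;
    while ((x & (1u << Idx)) == 0)   // stands in for a single BSR
      --Idx;
    return 31 - Idx;                 // invert the index into a zero count
  }
  // With LZCNT the whole function is one instruction, hence the 1cy override.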

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374786 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-14 16:30:17 +00:00
Simon Pilgrim
13da61c8c4 [CostModel][X86] Add CTPOP scalar costs (PR43656)
Add specific scalar costs for ctpop instructions; these are based on llvm-mca's SLM throughput numbers (the oldest model we have).

For targets supporting POPCNT, we provide overrides that assume 1cy costs.
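
For reference, the kind of scalar fallback these table entries price is the classic bit-parallel popcount (an illustrative sketch, not the exact expansion LLVM emits) -- roughly a dozen ALU ops versus a single POPCNT:

  #include <cstdint>

  // SWAR population count: the multi-op sequence that makes scalar ctpop
  // expensive when POPCNT is unavailable.
  uint32_t popcount32(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 // 2-bit partial sums
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); // 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial sums
    return (x * 0x01010101u) >> 24;                   // add the four bytes
  }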

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374775 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-14 14:07:43 +00:00
Sam Parker
a2dc88278d [NFC][TTI] Add Alignment for isLegalMasked[Load/Store]
Add an extra parameter so the backend can take the alignment into
consideration.

Differential Revision: https://reviews.llvm.org/D68400

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374763 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-14 10:00:21 +00:00
Simon Pilgrim
fb4c141673 [CostModel][X86] Improve sum reduction costs.
I can't see any notable differences in costs between SSE2 and SSE42 arches for FADD/ADD reduction, so I've lowered the target to just SSE2.

I've also added vXi8 sum reduction costs in line with the PSADBW codegen and discussions on PR42674.
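
For reference, the PSADBW trick the new costs model, shown with SSE2 intrinsics (a sketch of the codegen idea, not the cost table change itself):

  #include <emmintrin.h>
  #include <cstdint>

  // vXi8 sum reduction via PSADBW: summing absolute differences against a
  // zero vector leaves the sum of each 8-byte half in the low 16 bits of
  // the corresponding 64-bit lane, so 16 bytes reduce in one instruction.
  uint32_t sumBytes16(__m128i V) {
    __m128i Sad = _mm_sad_epu8(V, _mm_setzero_si128());
    return _mm_cvtsi128_si32(Sad) + _mm_extract_epi16(Sad, 4);
  }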

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374655 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-12 13:21:50 +00:00
Zi Xuan Wu
704914973a recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, the interleave count and vectorization factor depend on the
number of target registers. Currently, register pressure is not estimated
separately for each register class (in particular, for scalar types, float
should not be lumped together with int), so the estimate is inaccurate.
Specifically, this causes excessive interleaving/unrolling, resulting in too
many register spills in the loop body and hurting performance.

So we need to classify register classes at the IR level. Importantly, these
are abstract register classes, not the backend's target register classes
defined in the .td files. They are used to establish the mapping between the
types of IR values and the number of simultaneous live ranges we'd like to
allow for some set of those types.

For example, on the POWER target the register counts are special when VSX is
enabled: there are 32 scalar int registers (GPRs) and 64 scalar float
registers (VSRs), while int and float vector registers both number 64 (VSRs).
So there should be 2 kinds of register class when VSX is enabled, and 3 kinds
when VSX is NOT enabled.
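
As a minimal sketch of the mapping just described for the VSX-enabled case (illustrative names and shapes, not the actual TTI interface this patch adds):

  // Abstract register classes for pressure estimation on a VSX-enabled
  // POWER target: scalar ints use the 32 GPRs, while scalar floats and
  // all vectors share the 64 VSRs.
  enum class PressureClass { GPR, VSX };

  PressureClass classify(bool IsFloat, bool IsVector) {
    return (IsVector || IsFloat) ? PressureClass::VSX : PressureClass::GPR;
  }

  unsigned numRegisters(PressureClass RC) {
    return RC == PressureClass::GPR ? 32u : 64u;
  }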

On the POWER target this gives a big (~+30%) performance improvement in one specific SPEC2017 benchmark (503.bwaves_r), with no other obvious regressions.

Differential revision: https://reviews.llvm.org/D67148


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374634 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-12 02:53:04 +00:00
Jinsong Ji
cf65f7210c Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize"
Also Revert "[LoopVectorize] Fix non-debug builds after rL374017"

This reverts commit 9f41deccc0e648a006c9f38e11919f181b6c7e0a.
This reverts commit 18b6fe07bcf44294f200bd2b526cb737ed275c04.

The patch is breaking the PowerPC internal build; checked with the author,
reverting on his behalf for now due to time zones.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374091 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-08 17:32:56 +00:00
Zi Xuan Wu
ee8d82e802 [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, the interleave count and vectorization factor depend on the
number of target registers. Currently, register pressure is not estimated
separately for each register class (in particular, for scalar types, float
should not be lumped together with int), so the estimate is inaccurate.
Specifically, this causes excessive interleaving/unrolling, resulting in too
many register spills in the loop body and hurting performance.

So we need to classify register classes at the IR level. Importantly, these
are abstract register classes, not the backend's target register classes
defined in the .td files. They are used to establish the mapping between the
types of IR values and the number of simultaneous live ranges we'd like to
allow for some set of those types.

For example, on the POWER target the register counts are special when VSX is
enabled: there are 32 scalar int registers (GPRs) and 64 scalar float
registers (VSRs), while int and float vector registers both number 64 (VSRs).
So there should be 2 kinds of register class when VSX is enabled, and 3 kinds
when VSX is NOT enabled.

On the POWER target this gives a big (~+30%) performance improvement in one specific SPEC2017 benchmark (503.bwaves_r), with no other obvious regressions.

Differential revision: https://reviews.llvm.org/D67148


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374017 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-08 03:28:33 +00:00
David Zarzycki
d80d047ed6 [X86] Enable inline memcmp() to use AVX512
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373706 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-04 07:42:34 +00:00
Craig Topper
0e4565a1bc [X86] Remove -x86-experimental-vector-widening-legalization command line flag
This was added back to allow some performance regressions to be
investigated. The main perf issue was fixed shortly after adding
this back, and no other major issues have been reported. So I
think it's safe to remove this again.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373174 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-29 23:32:37 +00:00
Guillaume Chatelet
71864c0be5 [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373081 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-27 12:54:21 +00:00
Simon Pilgrim
c55d5fc412 [CostModel][X86] Fix SLM <2 x i64> icmp costs
SLM is 2x slower for <2 x i64> comparison ops than for other vector types; we should account for this like we do for the SLM <2 x i64> add/sub/mul costs.

This should remove some of the SLM codegen diffs in D43582

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@372954 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-26 10:14:38 +00:00
Simon Pilgrim
3ff0f3bdeb [Cost][X86] Add more missing vector truncation costs
The AVX512 cases still need some work to correctly recognise the PMOV truncation cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@372514 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-22 16:46:15 +00:00
Simon Pilgrim
0a8c54aac9 [Cost][X86] Add v2i64 truncation costs
We are missing costs for a lot of truncation cases; I'm hoping to address all the 'zero cost' cases in trunc.ll.

I thought this was a vector widening side effect, but even before this we had some interesting LV decisions (notably over indvars) being made due to these zero costs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@372498 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-22 12:04:38 +00:00
Guillaume Chatelet
3ad084e5dd [LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67223

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371063 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-05 13:09:42 +00:00
Craig Topper
2a60161513 [X86] Simplify the setOperationAction handling for fp_to_uint by improving the Custom handler a bit.
This merges the 32-bit and 64-bit mode code to just use Custom
for both i32 and i64. We already had most of the handling in
the custom handler due to AVX512 having a legal fp_to_uint.
We just needed to add the i32->i64 promotion handling. Refactor
the fp_to_uint code in the custom handler to reduce the
number of times we check things.

Tweak cost model tables to match the default handling we were
getting due to Expand before.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370700 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-03 05:57:22 +00:00
Craig Topper
3d187fdcb7 [X86] Lower the cost of v2i32->v2f64 sint_to_fp under vector widening legalization.
I don't really understand the costs we're using for fp_to_sint,
but prior to widening legalization we used 20 as the cost for this
via the v2i64->v2f64 entry. That number seems better than the 40
we got with widening legalization. So now we need either a
v2i32->v2f64 entry or a v4i32->v2f64 entry, depending on whether
AVX is enabled, since we skip the first SSE2 table lookup
under AVX.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369628 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-22 08:18:45 +00:00
Craig Topper
c5e7143807 [X86] Add back the -x86-experimental-vector-widening-legalization command line flag and all associated code, but leave it enabled by default
Google is reporting performance issues with the new default behavior
and have asked for a way to switch back to the old behavior while we
investigate and make fixes.

I've restored all of the code that had since been removed and added
additional checks of the command line flag to code paths that are
not otherwise guarded by a check of getTypeAction.

I've also modified the cost model tables to hopefully get us back
to the previous costs.

Hopefully we won't need to support this for very long since we
have no test coverage of the old behavior so we can very easily
break it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369332 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-20 06:58:00 +00:00
Craig Topper
de920afcf3 [X86] Improve cost model for subvector extraction of less than 128-bit vectors
Now that we're using widening legalization, we need to improve our extract_subvector cost model for these types. This patch begins by modeling these as a subvector extract followed by a permute. I've left FIXMEs in the code for future improvements.

Differential Revision: https://reviews.llvm.org/D65892

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369022 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-15 17:29:42 +00:00
Craig Topper
50d3b49c52 [X86][CostModel] Adjust the costs of ZERO_EXTEND/SIGN_EXTEND with less than 128-bit inputs
Now that we legalize by widening, the element types here won't change. Previously these were modeled as the elements being widened, after which the instruction might become an AND or a SHL/ASHR pair. But now they'll become something like a ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG.

For AVX2, when the destination type is legal it's clear the cost should be 1, since we have extend instructions that can produce 256-bit vectors from sub-128-bit vectors. I'm a little less sure about the AVX1 costs; the ones I changed were definitely too high before, though they may still be too high.

Differential Revision: https://reviews.llvm.org/D66169

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368858 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-14 14:52:39 +00:00
Craig Topper
9ef33c9301 Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."
The assert that caused this to be reverted should be fixed now.

Original commit message:

This patch changes our default legalization behavior for 16-, 32-, and
64-bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the element widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.

This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.

Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368183 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-07 16:24:26 +00:00
Mitch Phillips
fcdae32ca1 Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."
This reverts commit 3de33245d2c992c9e0af60372043540b60f3a810.

This commit broke the MSan buildbots. See
https://reviews.llvm.org/rL367901 for more information.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368107 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-06 23:00:43 +00:00
Craig Topper
bd34ddf016 [X86] Enable -x86-experimental-vector-widening-legalization by default.
This patch changes our default legalization behavior for 16-, 32-, and
64-bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the element widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.

This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.

Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367901 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-05 18:25:36 +00:00
Clement Courbet
ff07b17325 [ExpandMemCmp] Honor prefer-vector-width.
Reviewers: gchatelet, echristo, spatel, atdt

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63769

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364384 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-26 07:06:49 +00:00
Clement Courbet
6ef46a770d [ExpandMemCmp] Move all options to TargetTransformInfo.
Split off from D60318.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364281 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-25 08:04:13 +00:00
Warren Ristow
31868b92df [LV] Suppress vectorization in some nontemporal cases
When considering a loop containing nontemporal stores or loads for
vectorization, suppress the vectorization if the corresponding
vectorized store or load with the alignment of the original scalar
memory op is not supported with the nontemporal hint on the target.

This adds two new functions:
  bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
  bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;

to TTI, leaving the target-independent default implementation returning
true, but with overriding implementations for X86 that
check legality based on the available Subtarget features.
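
As a standalone sketch of the shape of those X86 overrides (the feature requirements below are assumptions drawn from the ISA, not the exact in-tree code):

  // Illustrative model only, not X86TargetTransformInfo itself:
  // nontemporal ops must be naturally aligned; NT vector loads need
  // MOVNTDQA (SSE4.1) or VMOVNTDQA ymm (AVX2); NT vector stores need
  // MOVNTPS/MOVNTDQ (SSE2) or their AVX ymm forms.
  struct SubtargetFeatures { bool HasSSE2, HasSSE41, HasAVX, HasAVX2; };

  bool isLegalNTLoad(unsigned SizeBytes, unsigned Align,
                     const SubtargetFeatures &F) {
    if (Align < SizeBytes)
      return false;                  // natural alignment required
    if (SizeBytes == 16)
      return F.HasSSE41;             // MOVNTDQA
    if (SizeBytes == 32)
      return F.HasAVX2;              // VMOVNTDQA ymm
    return false;
  }

  bool isLegalNTStore(unsigned SizeBytes, unsigned Align,
                      const SubtargetFeatures &F) {
    if (Align < SizeBytes)
      return false;
    if (SizeBytes == 16)
      return F.HasSSE2;              // MOVNTPS / MOVNTDQ
    if (SizeBytes == 32)
      return F.HasAVX;               // VMOVNTPS / VMOVNTDQ ymm
    return false;
  }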

This fixes https://llvm.org/PR40759

Differential Revision: https://reviews.llvm.org/D61764


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363581 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-17 17:20:08 +00:00
Simon Pilgrim
3748537115 [CostModel][X86] Improve masked load/store AVX1/AVX2 costs
A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops; more realistic values indicate that it's closer to 2x. Masked store costs are a lot more diverse, but 8x is roughly in the middle of the range.

e.g. SandyBridge
defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>;
defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;
defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>;

e.g. Btver2
defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>;
defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>;
defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>;
defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>;

Differential Revision: https://reviews.llvm.org/D61257

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362338 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-02 20:37:02 +00:00
Simon Pilgrim
852d2fed62 [TTI][X86] Cleanup getMaskedMemoryOpCost. NFCI.
Prep work before resurrecting D61257.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362335 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-02 18:06:42 +00:00
Simon Pilgrim
ab39cf6afa [CostModel][X86] Add min/max reduction costs for all SSE targets
The original costs stopped at SSE42; I've added conservative estimates for everything down to SSE1/SSE2 and moved some of the SSE42 costs to SSE41 (really, only the addition of PCMPGT makes any difference).

I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for scarily quick results) and 256-bit vector costs for AVX1.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360528 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-11 17:12:52 +00:00
Simon Pilgrim
abf63388b0 [TTI][X86] Make getAddressComputationCost cost value const. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359999 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-05 20:03:51 +00:00
Simon Pilgrim
cd248aacbf [CostModel][X86] Add bool anyof/allof reduction costs
On pre-AVX512 targets we can use MOVMSK to extract reduced boolean results. This is properly optimized; annoyingly, AVX512 isn't, and produces code that is almost as bad as the (unchanged) costs suggest.
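
For illustration, the pre-AVX512 pattern with SSE intrinsics (a sketch of the codegen idea being costed, not the cost change itself):

  #include <xmmintrin.h>

  // Reduce a 4-lane boolean mask (all-ones/all-zeros lanes, as produced
  // by CMPPS) to a scalar with MOVMSKPS: anyof tests for any set sign
  // bit, allof tests for all four.
  bool anyOf(__m128 Mask) { return _mm_movemask_ps(Mask) != 0; }
  bool allOf(__m128 Mask) { return _mm_movemask_ps(Mask) == 0xF; }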

Differential Revision: https://reviews.llvm.org/D60403

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358574 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-17 10:58:19 +00:00
Simon Pilgrim
0e1be17323 [CostModel][X86] Masked load legalization requires a binary shuffle, not a select (PR39812)
Expansion/truncation is better described by SK_PermuteTwoSrc than SK_Select

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357864 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-07 13:26:09 +00:00
Clement Courbet
60639d89eb [X86MacroFusion] Handle branch fusion (AMD CPUs).
Summary:
This adds a BranchFusion feature to replace the usage of MacroFusion
for AMD CPUs.

See D59688 for context.

Reviewers: andreadb, lebedev.ri

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59872

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357171 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-28 14:12:46 +00:00
Craig Topper
c1b79b3c59 [ScalarizeMaskedMemIntrin] Add support for scalarizing expandload and compressstore intrinsics.
This adds support for scalarizing these intrinsics, as well as the X86TargetTransformInfo support to avoid scalarizing them in the cases X86 can handle.

I've omitted handling special cases for constant masks for this first pass. Though CodeGenPrepare can constant fold the branch conditions and remove some of the control flow anyway.

Fixes PR40994 and covers most of PR3666. Might want to implement constant masks to close that out.
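
A minimal model of what the scalarized expandload computes (illustrative C++ mirroring the intrinsic's semantics, not the IR the pass emits):

  // llvm.masked.expandload, scalarized: memory is consumed consecutively,
  // so the pointer advances once per enabled lane; disabled lanes keep
  // the passthru value.
  void expandLoad4(const int *Mem, const bool Mask[4], int Result[4],
                   const int Passthru[4]) {
    for (int i = 0; i < 4; ++i)
      Result[i] = Mask[i] ? *Mem++ : Passthru[i];
  }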

Differential Revision: https://reviews.llvm.org/D59180

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356687 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-21 17:38:52 +00:00
Craig Topper
80bb69d69b [X86] Improve the type checking in isLegalMaskedLoad and isLegalMaskedGather.
We were just checking pointer size and type primitive size. But this caused unintended things like vectors of half being accepted by masked load/store.

For FP we now explicitly check for only double and float.

For pointers we now let any pointer through, trusting that only 32-bit and 64-bit pointers will be used to generate assembly.

We only check bitwidth after checking that the type is an integer.
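
Summarized as a standalone predicate (a sketch; the 32/64 integer widths are an assumption here, not quoted from the patch):

  enum class EltKind { Half, Float, Double, Pointer, Integer };

  // Element-type filter for masked load/store legality as described above.
  bool isLegalMaskedEltType(EltKind K, unsigned IntBits) {
    switch (K) {
    case EltKind::Float:
    case EltKind::Double:    // FP: only f32/f64 pass
    case EltKind::Pointer:   // any pointer width passes
      return true;
    case EltKind::Integer:   // assumed: only 32- and 64-bit integers
      return IntBits == 32 || IntBits == 64;
    case EltKind::Half:      // e.g. vectors of half are now rejected
      return false;
    }
    return false;
  }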

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355667 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-08 07:33:43 +00:00
Eric Christopher
46f8c8ccf1 Temporarily Revert "[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)"
As this has broken the LTO bootstrap build for 3 days and is
showing a significant regression on the Dither_benchmark results (from
the LLVM benchmark suite) -- specifically, on
BENCHMARK_FLOYD_DITHER_128, BENCHMARK_FLOYD_DITHER_256, and
BENCHMARK_FLOYD_DITHER_512, while the others are unchanged. These have
regressed by about 28% on Skylake, 34% on Haswell, and over 40% on
Sandybridge.

This reverts commit r353923.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354434 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-20 04:42:07 +00:00
Craig Topper
82fbd87152 [X86] Don't consider functions ABI compatible for ArgumentPromotion pass if they view 512-bit vectors differently.
The use of the -mprefer-vector-width=256 command line option mixed with functions
using vector intrinsics can create situations where one function thinks 512-bit
vectors are legal, but another function does not.

If a 512-bit vector is passed between them via a pointer, it's possible ArgumentPromotion
might try to pass it by value instead. This will result in type legalization for the two
functions handling the 512-bit vector differently, leading to runtime failures.

Had the 512-bit vector been passed by value from clang codegen, both functions would
have been tagged with a min-legal-vector-width=512 function attribute. That would
ensure they are legalized the same way.

I observed this issue in 32-bit mode, where a union containing a 512-bit vector was
being passed from a function that used intrinsics to one that did not. The caller
ended up passing it in zmm0 and the callee tried to read it from ymm0 and ymm1.

The fix implemented here is just to consider it a mismatch if two functions
would handle 512-bit vectors differently, without looking at the types that are being
passed. This is the easiest and safest fix, but it can be improved in the future.

Differential Revision: https://reviews.llvm.org/D58390

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354376 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-19 20:12:20 +00:00
Craig Topper
7300bbb814 [X86] Filter out tuning feature flags and a few ISA feature flags when checking for function inline compatibility.
Tuning flags don't have any effect on the available instructions, so they aren't a good reason to prevent inlining.

There are also some ISA flags that don't have any intrinsics or ABI requirements, which we can exclude. I've included only the most basic ones, like cmpxchg16b and lahfsahf. These are interesting because they aren't present in all 64-bit CPUs, but we have codegen workarounds for when they aren't present.

Loosening these checks can help with scenarios where a caller has a more specific CPU than a callee. The default tuning flags on our generic 'x86-64' CPU can currently make it inline compatible with other CPUs. I've also added an example test for 'nocona' and 'prescott', where 'nocona' is just a 64-bit-capable version of 'prescott', but in 32-bit mode they should be completely compatible.

I've based the implementation here of the similar code in AMDGPU.

Differential Revision: https://reviews.llvm.org/D58371

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354355 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-19 17:05:11 +00:00
Anton Afanasyev
74c86f7499 [X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)
Try to use 64-bit SLP vectorization. In addition to horizontal instructions,
this change triggers optimizations for partial vector operations (for instance,
using the low halves of 128-bit registers xmm0 and xmm1 to multiply <2 x float> by
<2 x float>).

Fixes llvm.org/PR32433

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353923 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-13 08:26:43 +00:00
Nikita Popov
98d9d8b1f1 [CodeGen][X86] Expand UADDSAT to NOT+UMIN+ADD
Followup to D56636, this time handling the UADDSAT case by expanding
uadd.sat(a, b) to umin(a, ~b) + b.
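
A scalar model of the identity (runnable C++ for one 32-bit lane):

  #include <algorithm>
  #include <cstdint>

  // uadd.sat(a, b) == umin(a, ~b) + b: if a + b would wrap, a is clamped
  // to ~b (== UINT32_MAX - b) and the result is exactly UINT32_MAX;
  // otherwise the add is unchanged.
  uint32_t uaddsat32(uint32_t a, uint32_t b) {
    return std::min(a, ~b) + b;
  }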

Differential Revision: https://reviews.llvm.org/D56869

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352409 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-28 19:19:09 +00:00
Simon Pilgrim
a6db2ced99 [TTI] Add generic SADDO/SSUBO costs
Added x86 scalar sadd_with_overflow/ssub_with_overflow costs.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352045 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-24 13:36:45 +00:00
Simon Pilgrim
37114573a1 [TTI] Add generic UADDO/USUBO costs
Added x86 scalar uadd_with_overflow/usub_with_overflow costs.

Differential Revision: https://reviews.llvm.org/D56907

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352043 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-24 12:10:20 +00:00
Simon Pilgrim
598d00ecd6 [CostModel][X86] Add ICMP Predicate specific costs
As a first step towards PR40376, this patch adds support for getCmpSelInstrCost to use the (optional) Instruction CmpInst predicate to indicate the type of integer comparison being performed and alter the costs accordingly.

Differential Revision: https://reviews.llvm.org/D57013

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351810 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-22 12:29:38 +00:00
Simon Pilgrim
21d100aff8 [CostModel][X86] Add explicit vector select costs
Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)|(Y & ~C)) bit select.
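
For one 32-bit lane the pattern looks like this (C is an all-ones or all-zeros mask, as produced by a vector compare; a scalar sketch of the three-op sequence being costed):

  #include <cstdint>

  // Pre-SSE41 vector select: AND + ANDN + OR instead of a single BLENDV.
  uint32_t bitSelect(uint32_t X, uint32_t Y, uint32_t C) {
    return (X & C) | (Y & ~C);
  }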

This exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason).

The increased pre-SSE41 selection costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests alongside the existing SSE2 tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351685 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-20 13:55:01 +00:00
Simon Pilgrim
38ece52d64 [CostModel][X86] Add explicit fcmp costs for pre-SSE42 targets
Typical throughputs: cmpss/cmpps = 1cy and cmpsd/cmppd = 2cy before the Core2 era

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351684 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-20 13:21:43 +00:00
Simon Pilgrim
c33a4fdc9c [TTI][X86] Reordered getCmpSelInstrCost cost tables in descending ISA order. NFCI.
A minor tidyup to make it clearer what's going on before adding additional costs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351683 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-20 12:28:13 +00:00
Chandler Carruth
6b547686c5 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-19 08:50:56 +00:00
Nikita Popov
b96b37dc8f Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.
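
A scalar model of the expansion (runnable C++ for one 32-bit lane):

  #include <algorithm>
  #include <cstdint>

  // usub.sat(a, b) == umax(a, b) - b: if a < b the max picks b and the
  // result is 0; otherwise it is the ordinary a - b.
  uint32_t usubsat32(uint32_t a, uint32_t b) {
    return std::max(a, b) - b;
  }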

Reapplying with updated SLPVectorizer tests.

Differential Revision: https://reviews.llvm.org/D56636

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351219 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-15 18:43:41 +00:00
Nikita Popov
f0a0953bb6 Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
This reverts commit r351125.

I missed test changes in an SLPVectorizer test due to the cost model
changes. Reverting for now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351129 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-14 22:18:39 +00:00
Nikita Popov
ec124e9d4d [CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.

Differential Revision: https://reviews.llvm.org/D56636

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351125 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-14 21:43:30 +00:00
Sanjay Patel
d5bcb1b022 [x86] explicitly set cost of integer add/sub
There are no test changes here in the existing cost model
regression tests because integer add/sub have a default
legal cost of 1 already. This would break, however, if
we custom lower those ops because the default cost model
assumes that custom-lowered ops are more expensive.

This is similar to the change in rL350403. See discussion
in D56011 for more details. When we enhance that patch to
handle integer ops, we need this cost model change to avoid
unintended diffs here from the custom lowering.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@350496 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-06 16:21:42 +00:00