33 Commits

Author SHA1 Message Date
Chandler Carruth
6b547686c5 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-19 08:50:56 +00:00
Craig Topper
a551fc9484 Recommit r344877 "[X86] Stop promoting integer loads to vXi64"
I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile.

Original commit message:

Theoretically this was done to simplify the number of isel patterns that were needed, but it also meant that a substantial number of our isel patterns had to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, the DAG combiner should be able to change the load type to remove the bitcast.

I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has been reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.

I'm hoping this patch will reduce the number of patterns needed to remove the and/or/xor promotion.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits, RKSimon

Differential Revision: https://reviews.llvm.org/D53306

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344965 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-22 22:14:05 +00:00
Craig Topper
2af624687d Revert r344877 "[X86] Stop promoting integer loads to vXi64"
Sam McCall reported miscompiles in some TensorFlow code. Reverting while I try to figure it out.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344921 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-22 16:59:24 +00:00
Craig Topper
e01c86dd47 [X86] Stop promoting integer loads to vXi64
Summary:
Theoretically this was done to simplify the number of isel patterns that were needed, but it also meant that a substantial number of our isel patterns had to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, the DAG combiner should be able to change the load type to remove the bitcast.

I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has been reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.

I'm hoping this patch will reduce the number of patterns needed to remove the and/or/xor promotion.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits, RKSimon

Differential Revision: https://reviews.llvm.org/D53306

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344877 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-21 21:30:26 +00:00
Craig Topper
de6038d9af Revert r344873 "foo"
Rebase gone wrong left this in my tree.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344875 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-21 21:08:37 +00:00
Craig Topper
75cb0ad4dd foo
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344873 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-21 21:07:25 +00:00
Michael Zolotukhin
2312c1a546 Remove redundant includes from lib/Target/X86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320636 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-13 21:31:19 +00:00
Simon Pilgrim
943d3e07f8 [APInt] Add APInt::insertBits() method to insert an APInt into a larger APInt
We currently have to insert bits via a temporary variable of the same size as the target, using various shift/mask stages that produce further temporary variables, all of which require memory to be allocated for large APInts (MaskSizeInBits > 64).

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target.
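
As a rough sketch (the variable names are placeholders based on the description above, not the actual call sites):

// Before: widen, shift and mask via temporaries (heap-allocated when MaskSizeInBits > 64).
APInt Widened = SubBits.zext(MaskSizeInBits).shl(BitOffset);
Mask &= ~APInt::getBitsSet(MaskSizeInBits, BitOffset, BitOffset + SubSizeInBits);
Mask |= Widened;

// After: write the raw bits of SubBits directly into Mask at BitOffset, no temporaries.
Mask.insertBits(SubBits, BitOffset);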

Differential Revision: https://reviews.llvm.org/D30780

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297458 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-10 13:44:32 +00:00
Simon Pilgrim
29ec93c7b6 [X86][SSE] Speed up constant pool shuffle mask decoding with direct copy (PR32037).
If the constants are already the correct size, we can copy them directly into the shuffle mask.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297381 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-09 14:06:39 +00:00
Craig Topper
72dbe0cc0e [X86] Fix SmallVector sizes in constant pool shuffle decoding to avoid heap allocation
Some of the vectors were too small to avoid heap allocation; in one case the vector was oversized.

Differential Revision: https://reviews.llvm.org/D30387

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296353 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-27 16:15:27 +00:00
Craig Topper
4b7fe0758a [X86] Use APInt instead of SmallBitVector for tracking undef elements in constant pool shuffle decoding
Summary:
SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here have more elements than that and therefore cause a malloc.

APInt, on the other hand, supports up to 64 bits without a malloc. That's the maximum number of bits we need here, so by using APInt we can avoid a malloc in all cases. This will incur a minor increase in stack usage because, unlike SmallBitVector, APInt stores the bit count separately from the data bits, but that should be ok.
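
A minimal sketch of the idea (NumMaskElts and EltIdx are placeholder names, not the actual variables):

APInt UndefElts(NumMaskElts, 0);  // at most 64 bits here, so the storage stays inline and no malloc occurs
UndefElts.setBit(EltIdx);         // mark element EltIdx as undef
bool IsUndef = UndefElts[EltIdx]; // querying is just a bit test, as cheap as SmallBitVector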

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30386

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296352 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-27 16:15:25 +00:00
Simon Pilgrim
95d021ba68 [APInt] Add APInt::extractBits() method to extract APInt subrange (reapplied)
The current pattern for extracting bits in a range is typically:

Mask.lshr(BitOffset).trunc(SubSizeInBits);

This can be particularly slow for large APInts (MaskSizeInBits > 64), as the temporary variable requires a memory allocation.

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.
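
Roughly, using the placeholder names from the pattern above:

// Before: lshr materialises a full-width temporary before the trunc.
APInt Sub = Mask.lshr(BitOffset).trunc(SubSizeInBits);

// After: read SubSizeInBits bits starting at BitOffset directly, no temporary.
APInt Sub = Mask.extractBits(SubSizeInBits, BitOffset);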

Differential Revision: https://reviews.llvm.org/D30336

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296272 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-25 20:01:58 +00:00
Simon Pilgrim
0ab7c7b1f0 Revert: r296141 [APInt] Add APInt::extractBits() method to extract APInt subrange
The current pattern for extracting bits in a range is typically:

Mask.lshr(BitOffset).trunc(SubSizeInBits);

This can be particularly slow for large APInts (MaskSizeInBits > 64), as the temporary variable requires a memory allocation.

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.

Differential Revision: https://reviews.llvm.org/D30336


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296147 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-24 18:31:04 +00:00
Simon Pilgrim
30e6a76a6a [APInt] Add APInt::extractBits() method to extract APInt subrange
The current pattern for extracting bits in a range is typically:

Mask.lshr(BitOffset).trunc(SubSizeInBits);

This can be particularly slow for large APInts (MaskSizeInBits > 64), as the temporary variable requires a memory allocation.

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.

Differential Revision: https://reviews.llvm.org/D30336

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296141 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-24 17:46:18 +00:00
Simon Pilgrim
02e6cb0f2d [APInt] Add APInt::setBits() method to set all bits in range
The current pattern for setting bits in a range is typically:

Mask |= APInt::getBitsSet(MaskSizeInBits, LoPos, HiPos);

This can be particularly slow for large APInts (MaskSizeInBits > 64), as the temporary variable requires a memory allocation.

This is one of the key compile time issues identified in PR32037.

This patch adds the APInt::setBits() helper method, which avoids the temporary memory allocation completely. This first implementation uses setBit() internally, but it already significantly reduces the regression in PR32037 (~10% drop). Additional optimization may be possible.
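
A rough before/after, using the placeholder names from the pattern above:

// Before: getBitsSet builds a whole temporary mask (heap-allocated when MaskSizeInBits > 64).
Mask |= APInt::getBitsSet(MaskSizeInBits, LoPos, HiPos);

// After: set the bits [LoPos, HiPos) in place, no temporary needed.
Mask.setBits(LoPos, HiPos);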

I investigated whether there is a need for APInt::clearBits() and APInt::flipBits() equivalents, but I haven't found these patterns to be particularly common; reusing the code would be trivial if they are needed.

Differential Revision: https://reviews.llvm.org/D30265

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296102 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-24 10:15:29 +00:00
Simon Pilgrim
5eef3502f8 [X86][SSE] Use APInt::getBitsSet() instead of APInt::getLowBitsSet().shl() separately. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295845 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-22 15:04:55 +00:00
Simon Pilgrim
a6fc694e85 Use APInt::isAllOnesValue instead of popcnt. NFCI.
More obvious implementation and faster too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284937 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-23 15:09:44 +00:00
Craig Topper
63ae3007f1 [X86] Fix DecodeVPERMVMask to handle cases where the constant pool entry has a different type than the shuffle itself.
This is especially important for 32-bit targets with 64-bit shuffle elements.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284453 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 04:48:33 +00:00
Craig Topper
a97a64ce70 [AVX-512] Fix DecodeVPERMV3Mask to handle cases where the constant pool entry has a different type than the shuffle itself.
Summary: This is especially important for 32-bit targets with 64-bit shuffle elements. This is similar to how PSHUFB and VPERMIL handle the same problem.

Reviewers: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25666

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284451 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-18 04:00:32 +00:00
Simon Pilgrim
ec8ee2ad55 [X86][SSE] Cleaned up shuffle decode assertion messages
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283050 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-01 20:12:56 +00:00
Douglas Katzman
9ef51728b6 [X86] Avoid "unused" warnings if no asserts
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282732 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-29 17:26:12 +00:00
Simon Pilgrim
8f1cd7aac1 [X86][SSE] Added common helper for shuffle mask constant pool decodes.
The shuffle mask decodes have a large amount of repeated code extracting/splitting mask values from Constant data.

This patch pulls all of this duplicated code into a single helper function to identify undef elements and combine/split constant integer data into the requested shuffle mask elements.

Updated PSHUFB/VPERMIL/VPERMIL2/VPPERM decoders to use it (VPERMV/VPERMV3 could be converted as well in the future).
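
Illustrative only (placeholder names, not the helper's actual interface, and using APInt::extractBits(), which the APInt work listed above added later): the splitting step amounts to slicing a wide constant, already gathered into a single APInt, into the requested mask elements:

for (unsigned i = 0; i != NumMaskElts; ++i) {
  APInt EltBits = ConstantBits.extractBits(MaskEltSizeInBits, i * MaskEltSizeInBits);
  RawMask.push_back(EltBits.getZExtValue()); // each mask element is at most 64 bits wide
}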


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282720 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-29 15:25:48 +00:00
Simon Pilgrim
c0ce5801a7 [X86][AVX] Add support for target shuffle combining to VPERMILPS variable shuffle mask
Added AVX512F VPERMILPS shuffle decoding support

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275270 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-13 15:10:43 +00:00
Simon Pilgrim
319708c881 [X86][AVX512] Fixed decoding of permd/permpd variable mask shuffles + enabled them for target shuffle combining
Corrected the element mask masking to extract the bottom index bits (this now matches the perm2 implementation, but for unary inputs).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274571 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-05 18:31:17 +00:00
Chandler Carruth
bb0eb61dd0 Try a bit harder to remove the signed and unsigned comparison warning.
Hopefully this time it actually works and stays away.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272463 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-11 09:13:00 +00:00
Chandler Carruth
3e01c6e81f Compare to an unsigned literal to avoid a -Wsign-compare warning.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272459 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-11 08:02:01 +00:00
Simon Pilgrim
b8b77a8df5 [X86][XOP] Tidied up DecodeVPERMIL2PMask to more closely match DecodeVPERMILPMask.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271830 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-05 14:33:43 +00:00
Simon Pilgrim
b6dac61a73 [X86][XOP] Added VPERMIL2PD/VPERMIL2PS shuffle mask comment decoding
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271809 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-04 21:44:28 +00:00
Simon Pilgrim
c710179bcb [X86][XOP] Support for VPPERM 2-input shuffle mask decoding
This patch adds support for decoding XOP VPPERM instruction when it represents a basic shuffle.

The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrComments.cpp.

Differential Revision: http://reviews.llvm.org/D18441

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265874 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-09 14:51:26 +00:00
Simon Pilgrim
0c5336e71f [X86][AVX512] Fixed VPERMT2* shuffle mask decoding and enabled target shuffle combining.
Patch to add support for target shuffle combining of X86ISD::VPERMV3 nodes, including support for detecting unary shuffles.

This uncovered several issues with the X86ISD::VPERMV3 shuffle mask decoding of non-64-bit shuffle mask elements: the bit masking wasn't being computed correctly.

Removed the non-constant-pool mask decode path, as we have no way of testing it right now.

Differential Revision: http://reviews.llvm.org/D17916

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262809 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-06 21:54:52 +00:00
Simon Pilgrim
dd18dd4735 Fix spelling. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262078 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-26 21:56:27 +00:00
Simon Pilgrim
764196a8c8 [X86][SSE] Improve PSHUFB shuffle mask decoding.
In cases where the PSHUFB shuffle mask is shared, it might not be bitcast to a vXi8 byte vector. This patch adds support for decoding these wider shuffle masks from the ConstantPool.

The test case in question makes use of this to recognise that the shuffle mask is a unary UNPCKL pattern and simplifies accordingly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261201 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-18 10:17:40 +00:00
Craig Topper
8cb3fe2069 [X86] Move shuffle decoding for constant pool into the X86CodeGen library to remove a layering violation in the Util library.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256680 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-31 22:40:45 +00:00