archived-llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2026-01-31 01:35:20 +01:00

Author	SHA1	Message	Date
Simon Pilgrim	9d9d7803a0	[APInt] Add APInt::insertBits() method to insert an APInt into a larger APInt We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64). This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target. Differential Revision: https://reviews.llvm.org/D30780 llvm-svn: 297458	2017-03-10 13:44:32 +00:00
Simon Pilgrim	aa5013fecf	[X86][SSE] Speed up constant pool shuffle mask decoding with direct copy (PR32037). If the constants are already the correct size, we can copy them directly into the shuffle mask. llvm-svn: 297381	2017-03-09 14:06:39 +00:00
Craig Topper	5eb13883a0	[X86] Fix SmallVector sizes in constant pool shuffle decoding to avoid heap allocation Some of the vectors are too small to avoid heap allocation. In one case the vector was oversized. Differential Revision: https://reviews.llvm.org/D30387 llvm-svn: 296353	2017-02-27 16:15:27 +00:00
Craig Topper	f2749ccb21	[X86] Use APInt instead of SmallBitVector for tracking undef elements in constant pool shuffle decoding Summary: SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than that number of elements and therefore cause a malloc. APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt. This will incur a minor increase in stack usage due to APInt storing the bit count separately from the data bits unlike SmallBitVector, but that should be ok. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30386 llvm-svn: 296352	2017-02-27 16:15:25 +00:00
Simon Pilgrim	879a5b0804	[APInt] Add APInt::extractBits() method to extract APInt subrange (reapplied) The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296272	2017-02-25 20:01:58 +00:00
Simon Pilgrim	643050a88e	Revert: r296141 [APInt] Add APInt::extractBits() method to extract APInt subrange The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296147	2017-02-24 18:31:04 +00:00
Simon Pilgrim	a9d3aa72eb	[APInt] Add APInt::extractBits() method to extract APInt subrange The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296141	2017-02-24 17:46:18 +00:00
Simon Pilgrim	af689eb896	[APInt] Add APInt::setBits() method to set all bits in range The current pattern for setting bits in range is typically: Mask |= APInt::getBitsSet(MaskSizeInBits, LoPos, HiPos); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is one of the key compile time issues identified in PR32037. This patch adds the APInt::setBits() helper method which avoids the temporary memory allocation completely. This first implementation uses setBit() internally instead but already significantly reduces the regression in PR32037 (~10% drop). Additional optimization may be possible. I investigated whether there is need for APInt::clearBits() and APInt::flipBits() equivalents but haven't seen these patterns to be particularly common, but reusing the code would be trivial. Differential Revision: https://reviews.llvm.org/D30265 llvm-svn: 296102	2017-02-24 10:15:29 +00:00
Simon Pilgrim	25f98269b4	[X86][SSE] Use APInt::getBitsSet() instead of APInt::getLowBitsSet().shl() separately. NFCI. llvm-svn: 295845	2017-02-22 15:04:55 +00:00
Simon Pilgrim	7070280c5e	Use APInt::isAllOnesValue instead of popcnt. NFCI. More obvious implementation and faster too. llvm-svn: 284937	2016-10-23 15:09:44 +00:00
Craig Topper	f805b093b4	[X86] Fix DecodeVPERMVMask to handle cases where the constant pool entry has a different type than the shuffle itself. This is especially important for 32-bit targets with 64-bit shuffle elements. llvm-svn: 284453	2016-10-18 04:48:33 +00:00
Craig Topper	a71d800155	[AVX-512] Fix DecodeVPERMV3Mask to handle cases where the constant pool entry has a different type than the shuffle itself. Summary: This is especially important for 32-bit targets with 64-bit shuffle elements. This is similar to how PSHUFB and VPERMIL handle the same problem. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25666 llvm-svn: 284451	2016-10-18 04:00:32 +00:00
Simon Pilgrim	c8c1e1ed53	[X86][SSE] Cleaned up shuffle decode assertion messages llvm-svn: 283050	2016-10-01 20:12:56 +00:00
Douglas Katzman	d5fbe7bbc1	[X86] Avoid "unused" warnings if no asserts llvm-svn: 282732	2016-09-29 17:26:12 +00:00
Simon Pilgrim	e14120722d	[X86][SSE] Added common helper for shuffle mask constant pool decodes. The shuffle mask decodes have a large amount of repeated code extracting/splitting mask values from Constant data. This patch pulls all of this duplicated code into a single helper function to identify undef elements and combine/split constant integer data into the requested shuffle mask elements. Updated PSHUFB/VPERMIL/VPERMIL2/VPPERM decoders to use it (VPERMV/VPERMV3 could be converted as well in the future). llvm-svn: 282720	2016-09-29 15:25:48 +00:00
Simon Pilgrim	4bfe4bab88	[X86][AVX] Add support for target shuffle combining to VPERMILPS variable shuffle mask Added AVX512F VPERMILPS shuffle decoding support llvm-svn: 275270	2016-07-13 15:10:43 +00:00
Simon Pilgrim	aec49bab5b	[X86][AVX512] Fixed decoding of permd/permpd variable mask shuffles + enabled them for target shuffle combining Corrected element mask masking to extract the bottom index bits (now matches the perm2 implementation but for unary inputs). llvm-svn: 274571	2016-07-05 18:31:17 +00:00
Chandler Carruth	1b236505ea	Try a bit harder to remove the signed and unsigned comparison warning. Hopefully this time it actually works and stays away. llvm-svn: 272463	2016-06-11 09:13:00 +00:00
Chandler Carruth	c5dd6b188a	Compare to an unsigned literal to avoid a -Wsign-compare warning. llvm-svn: 272459	2016-06-11 08:02:01 +00:00
Simon Pilgrim	8b04898abe	[X86][XOP] Tidied up DecodeVPERMIL2PMask to more closely match DecodeVPERMILPMask. llvm-svn: 271830	2016-06-05 14:33:43 +00:00
Simon Pilgrim	2edc73fed4	[X86][XOP] Added VPERMIL2PD/VPERMIL2PS shuffle mask comment decoding llvm-svn: 271809	2016-06-04 21:44:28 +00:00
Simon Pilgrim	c8f7b4ec60	[X86][XOP] Support for VPPERM 2-input shuffle mask decoding This patch adds support for decoding XOP VPPERM instruction when it represents a basic shuffle. The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrComments.cpp. Differential Revision: http://reviews.llvm.org/D18441 llvm-svn: 265874	2016-04-09 14:51:26 +00:00
Simon Pilgrim	2df8497807	[X86][AVX512] Fixed VPERMT2* shuffle mask decoding and enabled target shuffle combining. Patch to add support for target shuffle combining of X86ISD::VPERMV3 nodes, including support for detecting unary shuffles. This uncovered several issues with the X86ISD::VPERMV3 shuffle mask decoding of non-64 bit shuffle mask elements - the bit masking wasn't being correctly computed. Removed non-constant pool mask decode path as we have no way of testing it right now. Differential Revision: http://reviews.llvm.org/D17916 llvm-svn: 262809	2016-03-06 21:54:52 +00:00
Simon Pilgrim	b240484d51	Fix spelling. NFCI. llvm-svn: 262078	2016-02-26 21:56:27 +00:00
Simon Pilgrim	82dcce5934	[X86][SSE] Improve PSHUFB shuffle mask decoding. In cases where the PSHUFB shuffle mask is shared it might not be bitcasted to a vXi8 byte vector. This patch adds support for decoding these wider shuffle masks from the ConstantPool. The test case in question makes use of this to recognise the shuffle mask is a unary UNPCKL pattern and simplifies accordingly. llvm-svn: 261201	2016-02-18 10:17:40 +00:00
Craig Topper	cf29b2e15a	[X86] Move shuffle decoding for constant pool into the X86CodeGen library to remove a layering violation in the Util library. llvm-svn: 256680	2015-12-31 22:40:45 +00:00

26 Commits
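
The APInt commits above (setBits, extractBits, insertBits) each replace a shift/mask pattern that builds a temporary with a helper that works on the target bits directly. As a rough illustration of the bit arithmetic only, here is a minimal sketch on a single 64-bit word. The functions below are hypothetical stand-ins written for this example, not LLVM's implementation: they only borrow the names and the half-open [lo, hi) range convention, and they do not model the wide (> 64 bit) case where the real helpers save a heap allocation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit model of the helpers named in the log above.
// Returns a mask with bits [lo, hi) set; requires lo <= hi <= 64.
inline uint64_t bitsSet(unsigned lo, unsigned hi) {
  assert(lo <= hi && hi <= 64);
  if (lo == hi)
    return 0;
  // Shifting a 64-bit value by 64 is undefined, so handle hi == 64 apart.
  uint64_t high = (hi == 64) ? ~0ULL : ((1ULL << hi) - 1);
  uint64_t low = (1ULL << lo) - 1; // lo < 64 here because lo < hi <= 64
  return high & ~low;
}

// Models "Mask |= getBitsSet(Width, LoPos, HiPos)" as one in-place step.
inline void setBits(uint64_t &value, unsigned lo, unsigned hi) {
  value |= bitsSet(lo, hi);
}

// Models "Mask.lshr(BitOffset).trunc(SubSizeInBits)": returns numBits bits
// of value starting at bitPosition, moved down to bit 0.
inline uint64_t extractBits(uint64_t value, unsigned numBits,
                            unsigned bitPosition) {
  assert(numBits > 0 && bitPosition + numBits <= 64);
  return (value >> bitPosition) & bitsSet(0, numBits);
}

// Overwrites subWidth bits of value at bitPosition with the low subWidth
// bits of sub, leaving every other bit of value unchanged.
inline void insertBits(uint64_t &value, uint64_t sub, unsigned subWidth,
                       unsigned bitPosition) {
  assert(subWidth > 0 && bitPosition + subWidth <= 64);
  uint64_t mask = bitsSet(bitPosition, bitPosition + subWidth);
  value = (value & ~mask) | ((sub << bitPosition) & mask);
}
```

For example, under these assumptions extractBits(0xABCD, 8, 4) picks out the byte 0xBC, and inserting that byte back at bit 4 leaves 0xABCD unchanged.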