RPCSX/llvm

mirror of https://github.com/RPCSX/llvm.git synced 2024-11-30 23:20:54 +00:00

Author	SHA1	Message	Date
Simon Pilgrim	7537c45fbd	[CostModel][X86] Tidied up checks git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269770 91177308-0d34-0410-b5e6-96231b3b80d8	2016-05-17 14:43:41 +00:00
Simon Pilgrim	78ba7287ef	[CostModel][X86] Added scalar bitreverse tests git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269594 91177308-0d34-0410-b5e6-96231b3b80d8	2016-05-15 17:40:48 +00:00
Simon Pilgrim	15a59473b3	[X86][SSE] Improve cost model for i64 vector comparisons on pre-SSE42 targets As discussed on PR24888, until SSE42 we don't have access to PCMPGTQ for v2i64 comparisons, but the cost models don't reflect this, resulting in over-optimistic vectorization. This patch adds SSE2 'base level' costs that match what a typical target is capable of and only reduces the v2i64 costs at SSE42. Technically SSE41 provides a PCMPEQQ v2i64 equality test, but as getCmpSelInstrCost doesn't give us a way to discriminate between comparison test types we can't easily make use of this, otherwise we could split the cost of integer equality and greater-than tests to give better costings of each. Differential Revision: http://reviews.llvm.org/D20057 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@268972 91177308-0d34-0410-b5e6-96231b3b80d8	2016-05-09 21:14:38 +00:00
Simon Pilgrim	ba60f16656	[CostModel][X86] Extended comparison instruction cost model tests to include SSE2/SSE3/SSSE3/SSE41/SSE42 targets git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@268877 91177308-0d34-0410-b5e6-96231b3b80d8	2016-05-08 15:24:53 +00:00
Simon Pilgrim	a3ea594334	[CostModel][X86] Split BSWAP/BITREVERSE cost tests from CTPOP/CTLZ/CTTZ 'bit count' cost tests git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@268859 91177308-0d34-0410-b5e6-96231b3b80d8	2016-05-07 16:34:16 +00:00
Simon Pilgrim	c67a48457d	[CostModel][X86] Tweak 'SSE2-only' test CPU as it was only disabling SSE41 not SSE3/SSSE3 etc. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@268763 91177308-0d34-0410-b5e6-96231b3b80d8	2016-05-06 17:50:07 +00:00
Simon Pilgrim	556c69a0f5	[CostModel][X86] Added ctlz/cttz undef-zero costmodel tests git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@268761 91177308-0d34-0410-b5e6-96231b3b80d8	2016-05-06 17:48:35 +00:00
Simon Pilgrim	179257b158	[CostModel][X86] Added costmodel tests for vector ctpop/ctlz/cttz/bitreverse/bswap git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@268738 91177308-0d34-0410-b5e6-96231b3b80d8	2016-05-06 14:38:14 +00:00
Ashutosh Nema	91d7ac06b6	[X86]: Changing cost for “TRUNCATE v16i32 to v16i8” in SSE4.1 mode. Summary: rL256194 transforms truncations between vectors of integers into PACKUS/PACKSS operations during DAG combine. This generates better code for truncate, so cost of truncate needs to be changed but looks like it got changed only in SSE2 table Whereas this change is also applicable for SSE4.1, so the cost of truncate needs to be changed for that as well. Cost of “TRUNCATE v16i32 to v16i8” & “TRUNCATE v16i16 to v16i8” should be same in SSE4.1 & SSE2 table. Removing their cost from SSE4.1, so it will fall back to SSE2. Reviewers: Simon Pilgrim git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267123 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-22 08:34:05 +00:00
Adam Nemet	cf0a711bff	Revert "Support arbitrary addrspace pointers in masked load/store intrinsics" This reverts commit r266086. It breaks the LTO build of gcc in SPEC2000. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266282 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-14 08:47:17 +00:00
Artur Pilipenko	80ce67004b	Support arbitrary addrspace pointers in masked load/store intrinsics This is a resubmission of 263158 change. This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266086 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-12 15:58:04 +00:00
Benjamin Kramer	638cd0356d	[TTI] Let the cost model estimate ctpop costs based on legality PPC has a vector popcount, this lets the vectorizer use the correct cost for it. Tweak X86 test to use an intrinsic that's actually scalarized (we have a somewhat efficient lowering for vector popcount using SSE, the cost model finds that now). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265005 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-31 10:42:40 +00:00
Matt Arsenault	bb366db643	AMDGPU: Cost model for basic integer operations This resolves bug 21148 by preventing promotion to i64 induction variables. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264376 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-25 01:16:40 +00:00
Matt Arsenault	359a7d918e	AMDGPU: Partially implement getArithmeticInstrCost for FP ops git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264374 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-25 01:00:32 +00:00
Matt Arsenault	e4e369ab90	TTI: Report 0 cost for free addrspacecasts git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264369 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-25 00:26:29 +00:00
Matt Arsenault	93e0b28a0e	TTI: Use 0 for cost of fabs if free Ideally this would also happen for fneg, but that isn't a distinct operation in the IR. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264368 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-25 00:26:22 +00:00
Matt Arsenault	42792ff39d	AMDGPU: TTI: Make insertelement free. We don't want to have a cost to scalarizing operations. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264364 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-25 00:14:11 +00:00
Matthias Braun	a31e891389	Revert "Support arbitrary addrspace pointers in masked load/store intrinsics" This commit broke LTO builds. Reverting it to unbreak the bots while the issue is investigated. See also: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160321/341002.html This reverts r263158 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264088 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-22 20:24:34 +00:00
Artur Pilipenko	980df33d17	Support arbitrary addrspace pointers in masked load/store intrinsics This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263158 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-10 20:39:22 +00:00
Matthew Simpson	449a562ef6	[AArch64] Reduce vector insert/extract cost for Kryo Differential Revision: http://reviews.llvm.org/D17379 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261237 91177308-0d34-0410-b5e6-96231b3b80d8	2016-02-18 18:35:45 +00:00
Elena Demikhovsky	84f6badccc	Implemented cost model for masked gather and scatter operations The cost is calculated for all X86 targets. When gather/scatter instruction is not supported we calculate the cost of scalar sequence. Differential revision: http://reviews.llvm.org/D15677 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256519 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-28 20:10:59 +00:00
Cong Hou	0d04e87535	[X86][SSE] Transform truncations between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine. This patch transforms truncation between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in lowering phase because after type legalization, the original truncation will be turned into a BUILD_VECTOR with each element that is extracted from a vector and then truncated, and from them it is difficult to do this optimization. This greatly improves the performance of truncations on some specific types. Cost table is updated accordingly. Differential revision: http://reviews.llvm.org/D14588 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256194 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-21 20:42:43 +00:00
Matt Arsenault	b11dd50509	AMDGPU: Override getCFInstrCost The default cost was 0 with the assumption that it is predictable. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255796 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-16 18:37:19 +00:00
Cong Hou	e265d9a2d4	[X86][SSE] Update the cost table for integer-integer conversions on SSE2/SSE4.1. Previously in the conversion cost table there are no entries for integer-integer conversions on SSE2. This will result in imprecise costs for certain vectorized operations. This patch adds those entries for SSE2 and SSE4.1. The cost numbers are counted from the result of running llc on the new test case in this patch. Differential revision: http://reviews.llvm.org/D15132 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255315 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-11 00:31:39 +00:00
Cong Hou	21aabdad38	Don't punish vectorized arithmetic instruction whose type will be split to multiple registers Currently in LLVM's cost model, a vectorized arithmetic instruction will have high cost if its type is split into multiple registers. However, this punishment is too heavy and unnecessary. The overhead of the split should not be on arithmetic instructions but instructions that implement the split. Note that during vectorization we have calculated the register pressure, and we only choose proper interleaving factor (and also vectorization factor) so that we don't use more registers than the maximum number. Here is a very simple example: if a vadd has the cost 1, and if we double VF so that we need two registers to perform it, then its cost will become 4 with the current implementation, which will prevent us from using a larger VF. Differential revision: http://reviews.llvm.org/D15159 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254671 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-04 00:36:58 +00:00
Elena Demikhovsky	cd9551564b	AVX-512: Updated cost of FP/SINT/UINT conversion operations I checked and updated the cost of AVX-512 conversion operations. Added cost of conversion operations in DQ mode. Conversion of illegal types that requires vector split is not calculated right now (like for other X86 targets). Differential Revision: http://reviews.llvm.org/D15074 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254494 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-02 08:59:47 +00:00
Matt Arsenault	2320de6adb	AMDGPU: Report extractelement as free in cost model The cost for scalarized operations is computed as N * (scalar operation cost + 1 extractelement + 1 insertelement). This partially fixes inflating the cost of scalarized operations since every operation is scalarized and free. I don't think we want any cost associated with scalarization, but for now insertelement is still counted. I'm not sure if we should pretend that insertelement is also free, or add a way to compute a custom scalarization cost. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254438 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-01 19:08:39 +00:00
Elena Demikhovsky	ef5008e6d0	Fixed a failure in cost calculation for vector GEP Cost calculation for vector GEP failed due to an invalid cast to GEP index operand. The bug is fixed, added a test. http://reviews.llvm.org/D14976 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254408 91177308-0d34-0410-b5e6-96231b3b80d8	2015-12-01 12:08:36 +00:00
Charlie Turner	7a016a152d	[ARM] Don't pessimize i32 vselect. The underlying issues surrounding codegen for 32-bit vselects have been resolved. The pessimistic costs for 64-bit vselects remain due to the bad scalarization that is still happening there. I tested this on A57 in T32, A32 and A64 modes. I saw no regressions, and some improvements. From my benchmarks, I saw these improvements in A57 (T32) spec.cpu2000.ref.177_mesa 5.95% lnt.SingleSource/Benchmarks/Shootout/strcat 12.93% lnt.MultiSource/Benchmarks/MiBench/telecomm-CRC32/telecomm-CRC32 11.89% I also measured A57 A32, A53 T32 and A9 T32 and found no performance regressions. I see much bigger wins in third-party benchmarks with this change Differential Revision: http://reviews.llvm.org/D14743 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253349 91177308-0d34-0410-b5e6-96231b3b80d8	2015-11-17 17:25:15 +00:00
Simon Pilgrim	913c649b16	[CostModel] Fixed AVX integer shift costs Targets with AVX but without AVX2 were incorrectly reporting costs of 256-bit integer shifts. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250611 91177308-0d34-0410-b5e6-96231b3b80d8	2015-10-17 13:23:38 +00:00
Simon Pilgrim	d656aeb2d1	[X86] Completed SHL cost model tests As discussed in D8690. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249990 91177308-0d34-0410-b5e6-96231b3b80d8	2015-10-11 18:33:48 +00:00
Simon Pilgrim	6bb2edf33a	[X86] Renamed SHL cost model tests Matches naming conventions for ASHR/LSHR cost tests As discussed in D8690. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249984 91177308-0d34-0410-b5e6-96231b3b80d8	2015-10-11 17:34:32 +00:00
Simon Pilgrim	d9e98836ed	[X86] Added LSHR cost model tests There are several dodgy costings due to AVX1 legalizing 256-bit integer vectors that need fixing. As discussed in D8690. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249983 91177308-0d34-0410-b5e6-96231b3b80d8	2015-10-11 17:29:26 +00:00
Simon Pilgrim	76788647d0	[X86] Added ASHR cost model tests There are several dodgy costings due to AVX1 legalizing 256-bit integer vectors that need fixing. As discussed in D8690. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249981 91177308-0d34-0410-b5e6-96231b3b80d8	2015-10-11 17:08:05 +00:00
Simon Pilgrim	4e042482c8	[X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes. Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases. Differential Revision: http://reviews.llvm.org/D8690 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248878 91177308-0d34-0410-b5e6-96231b3b80d8	2015-09-30 08:17:50 +00:00
Silviu Baranga	076967c806	[CostModel][AArch64] Remove amortization factor for some of the vector select instructions Summary: We are not scalarizing the wide selects in codegen for i16 and i32 and therefore we can remove the amortization factor. We still have issues with i64 vectors in codegen though. Reviewers: mcrosier Subscribers: mcrosier, aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D12724 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247156 91177308-0d34-0410-b5e6-96231b3b80d8	2015-09-09 15:35:02 +00:00
Hal Finkel	ecebcfc3a1	[PowerPC] Include the permutation cost for unaligned vector loads Pre-P8, when we generate code for unaligned vector loads (for Altivec and QPX types), even when accounting for the combining that takes place for multiple consecutive such loads, there is at least one load instruction and one permutation for each load. Make sure the cost reported reflects the cost of the permutes as well. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246807 91177308-0d34-0410-b5e6-96231b3b80d8	2015-09-03 21:23:18 +00:00
Hal Finkel	2551be3865	[PowerPC] Cleanup cost model for unaligned vector loads/stores I'm adding a regression test to better cover code generation for unaligned vector loads and stores, but there's no functional change to the code generation here. There is an improvement to the cost model for unaligned vector loads and stores, mostly for QPX (for which we were not previously accounting for the permutation-based loads), and the cost model implementation is cleaner. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246712 91177308-0d34-0410-b5e6-96231b3b80d8	2015-09-02 21:03:28 +00:00
Silviu Baranga	170aefe60c	[CostModel][ARM] Increase cost of insert/extract operations Summary: This change limits the minimum cost of an insert/extract element operation to 2 in cases where this would result in mixing of NEON and VFP code. Reviewers: rengolin Subscribers: mssimpso, aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D12030 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245225 91177308-0d34-0410-b5e6-96231b3b80d8	2015-08-17 15:57:05 +00:00
Simon Pilgrim	5ff91d8781	[X86][SSE] Vectorize i64 ASHR operations This patch vectorizes the v2i64/v4i64 ASHR shift operations - the last remaining integer vector shifts that are still being transferred to/from the scalar unit to be completed. Differential Revision: http://reviews.llvm.org/D11439 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243569 91177308-0d34-0410-b5e6-96231b3b80d8	2015-07-29 20:31:45 +00:00
Jingyue Wu	580991b5c9	Roll forward r243250 r243250 appeared to break clang/test/Analysis/dead-store.c on one of the build slaves, but I couldn't reproduce this failure locally. Probably a false positive as I saw this test was broken by r243246 or r243247 too but passed later without people fixing anything. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243253 91177308-0d34-0410-b5e6-96231b3b80d8	2015-07-26 19:10:03 +00:00
Jingyue Wu	e48b1257f1	Revert r243250 breaks tests git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243251 91177308-0d34-0410-b5e6-96231b3b80d8	2015-07-26 18:30:13 +00:00
Jingyue Wu	9f141640b5	[TTI/CostModel] improve TTI::getGEPCost and use it in CostModel::getInstructionCost Summary: This patch updates TargetTransformInfoImplCRTPBase::getGEPCost to consider addressing modes. It now returns TCC_Free when the GEP can be completely folded to an addressing mode. I started this patch as I refactored SLSR. Function isGEPFoldable looks common and is indeed used by some WIP of mine. So I extracted that logic to getGEPCost. Furthermore, I noticed getGEPCost wasn't directly tested anywhere. The best testing bed seems CostModel, but its getInstructionCost method invokes getAddressComputationCost for GEPs which provides very coarse estimation. So this patch also makes getInstructionCost call the updated getGEPCost for GEPs. This change inevitably breaks some tests because the cost model changes, but nothing looks seriously wrong -- if we believe the new cost model is the right way to go, these tests should be updated. This patch is not perfect yet -- the comments in some tests need to be updated. I want to know whether this is a right approach before fixing those details. Reviewers: chandlerc, hfinkel Subscribers: aschwaighofer, llvm-commits, aemerson Differential Revision: http://reviews.llvm.org/D9819 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@243250 91177308-0d34-0410-b5e6-96231b3b80d8	2015-07-26 17:28:13 +00:00
Simon Pilgrim	9549a0c0bc	[X86][SSE] Updated SHL/LSHR i64 vectorization costs. This was missed in D8416. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242621 91177308-0d34-0410-b5e6-96231b3b80d8	2015-07-18 20:06:30 +00:00
Simon Pilgrim	f9df477221	[X86][SSE] Vectorized v4i32 non-uniform shifts. While the v4i32 shl operation is already vectorized using a cvttps2dq/pmulld pattern, the lshr/ashr operations are still scalarized. This patch adds vectorization support for non-uniform v4i32 shift operations - it splats constant shift amounts to allow them to use the immediate sse shift instructions, or extracts/zero-extends non-constant shift amounts. The individual results are then blended together. Differential Revision: http://reviews.llvm.org/D11063 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241989 91177308-0d34-0410-b5e6-96231b3b80d8	2015-07-12 11:15:19 +00:00
Simon Pilgrim	6970be03d1	[X86][SSE] Vectorized i64 uniform constant SRA shifts This patch adds vectorization support for uniform constant i64 arithmetic shift right operators. Differential Revision: http://reviews.llvm.org/D9645 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241514 91177308-0d34-0410-b5e6-96231b3b80d8	2015-07-06 22:35:19 +00:00
Simon Pilgrim	1e71d7421a	[X86][SSE][CostModel] Added full set of sitofp/uitofp costings for SSE2/AVX/AVX2/AVX512F. Merged separate (but equivalent) SSE2/AVX512F tests. Removed codegen tests since these are already done better in test/CodeGen/X86. The actual cost values still need to be updated to match recent codegen improvements. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240219 91177308-0d34-0410-b5e6-96231b3b80d8	2015-06-20 14:58:01 +00:00
Simon Pilgrim	4ac9a2e70f	[X86][SSE][CostModel] Fixed uitofp/sitofp cost target tests to specify sse2/avx2/avx512f directly instead of via a cpu model. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240062 91177308-0d34-0410-b5e6-96231b3b80d8	2015-06-18 21:26:01 +00:00
Simon Pilgrim	44226ffc19	[X86][SSE] Vectorized i8 and i16 shift operators This patch ensures that SHL/SRL/SRA shifts for i8 and i16 vectors avoid scalarization. It builds on the existing i8 SHL vectorized implementation of moving the shift bits up to the sign bit position and separating the 4, 2 & 1 bit shifts with several improvements: 1 - SSE41 targets can use (v)pblendvb directly with the sign bit instead of performing a comparison to feed into a VSELECT node. 2 - pre-SSE41 targets were masking + comparing with an 0x80 constant - we avoid this by using the fact that a set sign bit means a negative integer which can be compared against zero to then feed into VSELECT, avoiding the need for a constant mask (zero generation is much cheaper). 3 - SRA i8 needs to be unpacked to the upper byte of a i16 so that the i16 psraw instruction can be correctly used for sign extension - we have to do more work than for SHL/SRL but perf tests indicate that this is still beneficial. The i16 implementation is similar but simpler than for i8 - we have to do 8, 4, 2 & 1 bit shifts but less shift masking is involved. SSE41 use of (v)pblendvb requires that the i16 shift amount is splatted to both bytes however. Tested on SSE2, SSE41 and AVX machines. Differential Revision: http://reviews.llvm.org/D9474 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239509 91177308-0d34-0410-b5e6-96231b3b80d8	2015-06-11 07:46:37 +00:00
Simon Pilgrim	ab18d0e7cb	[X86][SSE] Avoid scalarization of v2i64 vector shifts (REAPPLIED) Fixed broken tests. Differential Revision: http://reviews.llvm.org/D8416 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232682 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 22:18:51 +00:00

131 Commits