archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Francis Visoiu Mistrih	fd11bc0813	[CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register. Work towards the unification of MIR and debug output by refactoring the interfaces. For MachineOperand::print, keep a simple version that can be easily called from `dump()`, and a more complex one which will be called from both the MIRPrinter and MachineInstr::print. Add extra checks inside MachineOperand for detached operands (operands with getParent() == nullptr). https://reviews.llvm.org/D40836 * find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/<def>//g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<def[ ],[ ]dead>/dead \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ],[ ]dead>/implicit-def dead \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name ".s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" -o -name "*.s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g' git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320022 91177308-0d34-0410-b5e6-96231b3b80d8	2017-12-07 10:40:31 +00:00
Francis Visoiu Mistrih	ca0df55065	[CodeGen] Unify MBB reference format in both MIR and debug output As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" \) -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(\1)/g' find . \( -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" \) -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g' * find . \( -name ".txt" -o -name ".s" -o -name ".mir" -o -name ".cpp" -o -name ".h" -o -name ".ll" \) -type f -print0 \| xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319665 91177308-0d34-0410-b5e6-96231b3b80d8	2017-12-04 17:18:51 +00:00
Francis Visoiu Mistrih	a4ec08b6fd	[CodeGen] Print register names in lowercase in both MIR and debug output As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319187 91177308-0d34-0410-b5e6-96231b3b80d8	2017-11-28 17:15:09 +00:00
Craig Topper	406408a5da	[X86] Redefine MOVSS/MOVSD instructions to take VR128 regclass as input instead of FR32/FR64 This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted. This should fix PR33079 Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results. Differential Revision: https://reviews.llvm.org/D38449 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314914 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-04 17:20:12 +00:00
Geoff Berry	c3ef7ae13a	Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" This reverts commit r314729. Another bug has been encountered in an out-of-tree target reported by Quentin. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314814 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-03 16:59:13 +00:00
Geoff Berry	d990d28864	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues addressed since original review: - Avoid bug in regalloc greedy/machine verifier when forwarding to use in an instruction that re-defines the same virtual register. - Fixed bug when forwarding to use in EarlyClobber instruction slot. - Fixed incorrect forwarding to register definitions that showed up in explicit_uses() iterator (e.g. in INLINEASM). - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314729 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-02 22:01:37 +00:00
Craig Topper	4c92030df7	[X86] Add VPERMPD/VPERMQ and VPERMPS/VPERMD to the execution domain fixing table. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313610 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-19 04:39:55 +00:00
Sam McCall	c7c869be7e	Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" This crashes on boringSSL on PPC (will send reduced testcase) This reverts commit r312328. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312490 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-04 15:47:00 +00:00
Geoff Berry	d168a77ec3	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues addressed since original review: - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312328 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-01 14:27:20 +00:00
Hans Wennborg	92b6b153a4	Revert r312154 "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!") > Issues identified by buildbots addressed since original review: > - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. > - The pass no longer forwards COPYs to physical register uses, since > doing so can break code that implicitly relies on the physical > register number of the use. > - The pass no longer forwards COPYs to undef uses, since doing so > can break the machine verifier by creating LiveRanges that don't > end on a use (since the undef operand is not considered a use). > > [MachineCopyPropagation] Extend pass to do COPY source forwarding > > This change extends MachineCopyPropagation to do COPY source forwarding. > > This change also extends the MachineCopyPropagation pass to be able to > be run during register allocation, after physical registers have been > assigned, but before the virtual registers have been re-written, which > allows it to remove virtual register COPY LiveIntervals that become dead > through the forwarding of all of their uses. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312178 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-30 22:11:37 +00:00
Geoff Berry	62c7c252f8	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues identified by buildbots addressed since original review: - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312154 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-30 18:41:07 +00:00
Craig Topper	f9a2a4b7d4	[AVX512] Don't use 32-bit elements version of AND/OR/XOR/ANDN during isel unless we're matching a masked op or broadcast Selecting 32-bit element logical ops without a select or broadcast requires matching a bitconvert on the inputs to the and. But that's a weird thing to rely on. It's entirely possible that one of the inputs doesn't have a bitcast and one does. Since there's no functional difference, just remove the extra patterns and save some isel table size. Differential Revision: https://reviews.llvm.org/D36854 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312138 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-30 16:38:33 +00:00
Geoff Berry	6c9f36933c	Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" round 2 This reverts commit r311135. sanitizer-x86_64-linux-android buildbot is timing out with just this patch applied. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311142 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-18 01:43:11 +00:00
Geoff Berry	d93db263e5	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Two issues identified by buildbots were addressed: - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. Reviewers: qcolombet, javed.absar, MatzeB, jonpa Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny Differential Revision: https://reviews.llvm.org/D30751 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311135 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-17 23:06:55 +00:00
Craig Topper	7feb6fc8e5	[AVX512] Don't switch unmasked subvector insert/extract instructions when AVX512DQI is enabled. There's no reason to switch instructions with and without DQI. It just creates extra isel patterns and test divergences. There is however value in enabling the masked version of the instructions with DQI. This required introducing some new multiclasses to enabling this splitting. Differential Revision: https://reviews.llvm.org/D36661 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311091 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-17 15:40:25 +00:00
Geoff Berry	a6a5be21df	Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" This reverts commit r311038. Several buildbots are breaking, and at least one appears to be due to the forwarding of physical regs enabled by this change. Reverting while I investigate further. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311062 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-17 04:04:11 +00:00
Geoff Berry	31db6f3bd2	[MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. Reviewers: qcolombet, javed.absar, MatzeB, jonpa Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny Differential Revision: https://reviews.llvm.org/D30751 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311038 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-16 20:50:01 +00:00
Sanjay Patel	8f61a6eb1f	[DAGCombiner] use narrow vector ops to eliminate concat/extract (PR32790) In the best case: extract (binop (concat X1, X2), (concat Y1, Y2)), N --> binop XN, YN ...we kill all of the extract/concat and just have narrow binops remaining. If only one of the binop operands is amenable, this transform is still worthwhile because we kill some of the extract/concat. Optional bitcasting makes the code more complicated, but there doesn't seem to be a way to avoid that. The TODO about extending to more than bitwise logic is there because we really will regress several x86 tests including madd, psad, and even a plain integer-multiply-by-2 or shift-left-by-1. I don't think there's anything fundamentally wrong with this patch that would cause those regressions; those folds are just missing or brittle. If we extend to more binops, I found that this patch will fire on at least one non-x86 regression test. There's an ARM NEON test in test/CodeGen/ARM/coalesce-subregs.ll with a pattern like: t5: v2f32 = vector_shuffle<0,3> t2, t4 t6: v1i64 = bitcast t5 t8: v1i64 = BUILD_VECTOR Constant:i64<0> t9: v2i64 = concat_vectors t6, t8 t10: v4f32 = bitcast t9 t12: v4f32 = fmul t11, t10 t13: v2i64 = bitcast t12 t16: v1i64 = extract_subvector t13, Constant:i32<0> There was no functional change in the codegen from this transform from what I could see though. For the x86 test changes: 1. PR32790() is the closest call. We don't reduce the AVX1 instruction count in that case, but we improve throughput. Also, on a core like Jaguar that double-pumps 256-bit ops, there's an unseen win because two 128-bit ops have the same cost as the wider 256-bit op. SSE/AVX2/AXV512 are not affected which is expected because only AVX1 has the extract/concat ops to match the pattern. 2. do_not_use_256bit_op() is the best case. Everyone wins by avoiding the concat/extract. Related bug for IR filed as: https://bugs.llvm.org/show_bug.cgi?id=33026 3. The SSE diffs in vector-trunc-math.ll are just scheduling/RA, so nothing real AFAICT. 4. The AVX1 diffs in vector-tzcnt-256.ll are all the same pattern: we reduced the instruction count by one in each case by eliminating two insert/extract while adding one narrower logic op. https://bugs.llvm.org/show_bug.cgi?id=32790 Differential Revision: https://reviews.llvm.org/D33137 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303997 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-26 15:33:18 +00:00
Ayman Musa	1ebb1b7a38	[X86][SSE2] Fix asm string for movq (Move Quadword) instruction. Replace "mov{d\|q}" with "movq". Differential Revision: https://reviews.llvm.org/D32220 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301386 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-26 07:08:44 +00:00
Amjad Aboud	a90eb62d9e	[X86] Generate VZEROUPPER for Skylake-avx512. VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296859 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-03 09:03:24 +00:00
Craig Topper	ff0f1ca865	[X86] Don't base domain decisions on VEXTRACTF128/VINSERTF128 if only AVX1 is available. Seems the execution dependency pass likes to use FP instructions when most of the consuming code is integer if a vextractf128 instruction produced the register. Without AVX2 we don't have the corresponding integer instruction available. This patch suppresses the domain on these instructions to GenericDomain if AVX2 is not supported so that they are ignored by domain fixing. If AVX2 is supported we'll report the correct domain and allow them to switch between integer and fp. Overall I think this produces better results in the modified test cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294824 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-11 05:32:57 +00:00
Craig Topper	53eca87929	[X86] In LowerTRUNCATE, create an ISD::VECTOR_SHUFFLE instead of explicitly creating a PSHUFB. This will be lowered by regular shuffle lowering to a PSHUFB later. Similar was already done for several other shuffles in this function. The test changes are because the old code used explicity zeroing for elements that could have been undef. While I was here I also changed other shuffle vectors in the same function to use the same input twice instead of creating UNDEF nodes. getVectorShuffle can create the UNDEF for us. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294130 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-05 18:33:14 +00:00
Simon Pilgrim	2d628eed7f	[X86][SSE] Attempt to pre-truncate arithmetic operations that have already been extended As discussed on D28219 - it is profitable to combine trunc(binop (s/zext(x), s/zext(y)) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292493 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-19 16:25:02 +00:00
Simon Pilgrim	b3f7c2ec02	[X86][SSE] Added tests for pre-truncating arithmetic operations that have already been extended As discussed on D28219 - it is profitable to combine trunc(binop (s/zext(x), s/zext(y)) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292487 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-19 15:03:00 +00:00
Simon Pilgrim	a947b2a548	[X86] Attempt to pre-truncate arithmetic operations if useful In some cases its more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types. This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or an one use constant we can fold). Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs. I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need. Differential Revision: https://reviews.llvm.org/D28219 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290947 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-04 08:05:42 +00:00
Simon Pilgrim	0f01171f78	[X86][SSE] Add extra truncated arithmetic tests for D28219 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290902 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-03 19:18:07 +00:00
Simon Pilgrim	0934969dbc	[X86][AVX512DQ] Add truncated math tests for AVX512DQ. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290772 91177308-0d34-0410-b5e6-96231b3b80d8	2016-12-30 22:43:41 +00:00
Simon Pilgrim	50c0e28e08	[X86][SSE] Fix truncated math test names. Inconsistent naming convention and wrong name for some input/output types. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290771 91177308-0d34-0410-b5e6-96231b3b80d8	2016-12-30 22:40:32 +00:00
Simon Pilgrim	373eadc326	[X86][SSE] Improve lowering of vXi64 multiplies As mentioned on PR30845, we were performing our vXi64 multiplication as: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32); when we could avoid one of the upper shifts with: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi + AhiBlo, 32); This matches the lowering on gcc/icc. Differential Revision: https://reviews.llvm.org/D27756 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290267 91177308-0d34-0410-b5e6-96231b3b80d8	2016-12-21 20:00:10 +00:00
Sanjay Patel	b10a96eb1c	[x86] use a single shufps when it can save instructions This is a tiny patch with a big pile of test changes. This partially fixes PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 My motivating case looks like this: - vpshufd {{.#+}} xmm1 = xmm1[0,1,0,2] - vpshufd {{.#+}} xmm0 = xmm0[0,2,2,3] - vpblendw {{.#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7] + vshufps {{.#+}} xmm0 = xmm0[0,2],xmm1[0,2] And this happens several times in the diffs. For chips with domain-crossing penalties, the instruction count and size reduction should usually overcome any potential domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so using shufps is a pure win. So the test case diffs all appear to be improvements except one test in vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate zero elements and one test in combine-sra.ll where multiple uses prevent the expected shuffle combining. Differential Revision: https://reviews.llvm.org/D27692 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289837 91177308-0d34-0410-b5e6-96231b3b80d8	2016-12-15 18:03:38 +00:00
Simon Pilgrim	dcc9618afe	[X86][SSE] Lower suitably sign-extended mul vXi64 using PMULDQ PMULDQ returns the 64-bit result of the signed multiplication of the lower 32-bits of vXi64 vector inputs, we can lower with this if the sign bits stretch that far. Differential Revision: https://reviews.llvm.org/D27657 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289426 91177308-0d34-0410-b5e6-96231b3b80d8	2016-12-12 10:49:15 +00:00
Michael Kuperstein	1ac52953b4	[DAG] Generalize build_vector -> vector_shuffle combine for more than 2 inputs This generalizes the build_vector -> vector_shuffle combine to support any number of inputs. The idea is to create a binary tree of shuffles, where the first layer performs pairwise shuffles of the input vectors placing each input element into the correct lane, and the rest of the tree blends these shuffles together. This doesn't try to be smart and create any sort of "optimal" shuffles. The assumption is that even a "poor" shuffle sequence is better than extracting and inserting the elements one by one. Differential Revision: https://reviews.llvm.org/D24683 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283480 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-06 18:58:24 +00:00
Craig Topper	22cd3ed2a2	[AVX512] Add ExeDomain to vector extend and truncate instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276394 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-22 05:46:44 +00:00
Craig Topper	f876acdcd8	[AVX512] Add initial support for the Execution Domain fixing pass to change some EVEX instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276393 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-22 05:00:52 +00:00
Craig Topper	fefffbf697	[X86] Add VPADD instructions to X86InstrInfo::isAssociativeAndCommutative. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275769 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-18 06:14:54 +00:00
Simon Pilgrim	be745b9c8c	[X86][AVX2] Improve lowerShuffleAsRepeatedMaskAndLanePermute permutation of 64-bit sub-lanes As discussed on PR28136, lowerShuffleAsRepeatedMaskAndLanePermute was attempting to match repeated masks at the 128-bit level and then permute the resultant lanes at the 128-bit (AVX1) or 64-bit (AVX2) sub-lane level. This change allows us to create the repeated masks at the sub-lane level (and then concat them together to create a 128-bit repeated mask) and then select which sub-lane to permute. This has no effect on the AVX1 codegen. Fixes PR28136. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275543 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-15 09:49:12 +00:00
Matthias Braun	79519fecc3	VirtRegMap: Replace some identity copies with KILL instructions. An identity COPY like this: %AL = COPY %AL, %EAX<imp-def> has no semantic effect, but encodes liveness information: Further users of %EAX only depend on this instruction even though it does not define the full register. Replace the COPY with a KILL instruction in those cases to maintain this liveness information. (This reverts a small part of r238588 but this time adds a comment explaining why a KILL instruction is useful). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274952 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-09 00:19:07 +00:00
Simon Pilgrim	75aa6fae3a	[X86][SSE] Added truncated vector arithmetic tests. For cases where we are truncating an integer vector arithmetic result, it may be better to pre-truncate the input operands - no code to support this yet (scalar is done with SimplifyDemandedBits but adding vector support could be a lot of work) but these tests represent the current codegen status. Example bugs: PR14666, PR22703 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263384 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-13 19:08:01 +00:00

38 Commits