RPCSX/llvm - llvm - Gitea: Git with a cup of tea

RPCSX/llvm

mirror of https://github.com/RPCSX/llvm.git synced 2024-12-11 21:57:55 +00:00

Author	SHA1	Message	Date
Chuang-Yu Cheng	9585d8910f	[ppc64] Reenable sibling call optimization on ppc64 since fixed tsan library tail-call issue print-stack-trace.cc test failure of compiler-rt has been fixed by r266869 (http://reviews.llvm.org/D19148), so reenable sibling call optimization on ppc64 Reviewers: nemanjai kbarton git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267527 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-26 07:38:24 +00:00
Junmo Park	e0e3aca12f	Remove MinLatency in SchedMachineModel. NFC. Summary: We don't use MinLatency any more since r184032. Reviewers: atrick, hfinkel, mcrosier Differential Revision: http://reviews.llvm.org/D19474 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267502 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-26 00:37:46 +00:00
Marcin Koscielnicki	c627cc351a	[PowerPC] [PR27387] Disallow r0 for ADD8TLS. ADD8TLS, a variant of add instruction used for initial-exec TLS, currently accepts r0 as a source register. While add itself supports r0 just fine, linker can relax it to a local-exec sequence, converting it to addi - which doesn't support r0. Differential Revision: http://reviews.llvm.org/D19193 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267388 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-25 09:24:34 +00:00
Junmo Park	5aea4d1b7f	Minor code cleanups. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267375 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-25 01:40:54 +00:00
Marcin Koscielnicki	8ac661cafb	[PowerPC] [SSP] Fix stack guard load for 32-bit. r266809 incorrectly used LD to load the stack guard, it should be LWZ. Differential Revision: http://reviews.llvm.org/D19358 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267017 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-21 17:36:05 +00:00
Tim Shen	ac94d4bd34	[PPC, SSP] Support PowerPC Linux stack protection. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266809 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-19 20:14:52 +00:00
Mehdi Amini	f6071e14c5	[NFC] Header cleanup Removed some unused headers, replaced some headers with forward class declarations. Found using simple scripts like this one: clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' \| xargs grep -L 'IndexedMap[<]' \| xargs grep -n --color=auto 'IndexedMap' Patch by Eugene Kosov <claprix@yandex.ru> Differential Revision: http://reviews.llvm.org/D19219 From: Mehdi Amini <mehdi.amini@apple.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266595 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-18 09:17:29 +00:00
Nirav Dave	fb9467aca8	Fix typing on generated LXV2DX/STXV2DX instructions [PPC] Previously when casting generic loads to LXV2DX/ST instructions we would leave the original load return type in place allowing for an assertion failure when we merge two equivalent LXV2DX nodes with different types. This fixes PR27350. Reviewers: nemanjai Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19133 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266438 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-15 15:01:38 +00:00
Nemanja Ivanovic	b935913676	[PowerPC] Basic support for P9 byte comparison and count trailing zero insns This patch corresponds to review: http://reviews.llvm.org/D17850 This patch implements the following instructions: cmprb, cmpeqb, cnttzw, cnttzw., cnttzd, cnttzd. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266228 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-13 18:51:18 +00:00
Chuang-Yu Cheng	18569e85b3	[PPC64] Mark CR0 Live if PPCInstrInfo::optimizeCompareInstr Creates a Use of CR0 Resolve Bug 27046 (https://llvm.org/bugs/show_bug.cgi?id=27046). The PPCInstrInfo::optimizeCompareInstr function could create a new use of CR0, even if CR0 were previously dead. This patch marks CR0 live if a use of CR0 is created. Author: Tom Jablin (tjablin) Reviewers: hfinkel kbarton cycheng http://reviews.llvm.org/D18884 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266040 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-12 03:10:52 +00:00
Chuang-Yu Cheng	3cc0d8fe8d	[PPC64] Use mfocrf in prologue when we only need to save 1 nonvolatile CR field In the ELFv2 ABI, we are not required to save all CR fields. If only one nonvolatile CR field is clobbered, use mfocrf instead of mfcr to selectively save the field, because mfocrf has short latency compares to mfcr. Thanks Nemanja's invaluable hint! Reviewers: nemanjai tjablin hfinkel kbarton http://reviews.llvm.org/D17749 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266038 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-12 03:04:44 +00:00
Chuang-Yu Cheng	eb92f5a745	CXX_FAST_TLS calling convention: performance improvement for PPC64 This is the same change on PPC64 as r255821 on AArch64. I have even borrowed his commit message. The access function has a short entry and a short exit, the initialization block is only run the first time. To improve the performance, we want to have a short frame at the entry and exit. We explicitly handle most of the CSRs via copies. Only the CSRs that are not handled via copies will be in CSR_SaveList. Frame lowering and prologue/epilogue insertion will generate a short frame in the entry and exit according to CSR_SaveList. The majority of the CSRs will be handled by register allcoator. Register allocator will try to spill and reload them in the initialization block. We add CSRsViaCopy, it will be explicitly handled during lowering. 1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target supports it for the given machine function and the function has only return exits). We also call TLI->initializeSplitCSR to perform initialization. 2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to virtual registers at beginning of the entry block and copies from virtual registers to CSRsViaCopy at beginning of the exit blocks. 3> we also need to make sure the explicit copies will not be eliminated. Author: Tom Jablin (tjablin) Reviewers: hfinkel kbarton cycheng http://reviews.llvm.org/D17533 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265781 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-08 12:04:32 +00:00
Ehsan Amiri	6ff814c730	[PPC] Enable transformations in PPCPassConfig::addIRPasses at O2 http://reviews.llvm.org/D18562 A large number of testcases has been modified so they pass after this test. One testcase is deleted, because I realized even after undoing the original change that was committed with this testcase, the testcase still passes. So I removed it. The change to one other testcase (test/CodeGen/PowerPC/pr25802.ll) is an arbitrary change to keep it passing. Given the original intention of the testcase, and the fact that fixing it will require some time to change the testcase, we concluded that this quick change will be enough. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265683 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-07 15:30:55 +00:00
JF Bastien	b36d1a86f1	NFC: make AtomicOrdering an enum class Summary: In the context of http://wg21.link/lwg2445 C++ uses the concept of 'stronger' ordering but doesn't define it properly. This should be fixed in C++17 barring a small question that's still open. The code currently plays fast and loose with the AtomicOrdering enum. Using an enum class is one step towards tightening things. I later also want to tighten related enums, such as clang's AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI' enum). This change touches a few lines of code which can be improved later, I'd like to keep it as NFC for now as it's already quite complex. I have related changes for clang. As a follow-up I'll add: bool operator<(AtomicOrdering, AtomicOrdering) = delete; bool operator>(AtomicOrdering, AtomicOrdering) = delete; bool operator<=(AtomicOrdering, AtomicOrdering) = delete; bool operator>=(AtomicOrdering, AtomicOrdering) = delete; This is separate so that clang and LLVM changes don't need to be in sync. Reviewers: jyknight, reames Subscribers: jyknight, llvm-commits Differential Revision: http://reviews.llvm.org/D18775 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265602 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-06 21:19:33 +00:00
Ehsan Amiri	b2bc21bb1b	[PPC] Use VSX/FP Facility integer load when an integer load's only users are conversion to FP http://reviews.llvm.org/D18405 When the integer value loaded is never used directly as integer we should use VSX or Floating Point Facility integer loads and avoid extra direct move git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265593 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-06 20:12:29 +00:00
Chuang-Yu Cheng	6b74529a35	[ppc64] Temporary disable sibling call optimization on ppc64 due to breaking test case r265506 breaks print-stack-trace.cc test case of compiler-rt in bootstrap test. http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/1708 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265528 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-06 10:48:36 +00:00
Matthias Braun	dc2f859a3f	RegisterScavenger: Take a reference as enterBasicBlock() argument. Make it obvious that the argument cannot be nullptr. Remove an unnecessary nullptr check in initRegState. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265511 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-06 02:47:09 +00:00
Chuang-Yu Cheng	4f461cab38	[ppc64] Enable sibling call optimization on ppc64 ELFv1/ELFv2 abi This patch enable sibling call optimization on ppc64 ELFv1/ELFv2 abi, and add a couple of test cases. This patch also passed llvm/clang bootstrap test, and spec2006 build/run/result validation. Original issue: https://llvm.org/bugs/show_bug.cgi?id=25617 Great thanks to Tom's (tjablin) help, he contributed a lot to this patch. Thanks Hal and Kit's invaluable opinions! Reviewers: hfinkel kbarton http://reviews.llvm.org/D16315 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265506 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-06 02:04:38 +00:00
Chuang-Yu Cheng	ff2190e04a	[Power9] Implement add-pc, multiply-add, modulo, extend-sign-shift, random number, set bool, and dfp test significance This patch implement the following instructions: - addpcis subpcis - maddhd maddhdu maddld - modsw moduw modsd modud - darn - extswsli extswsli. - setb - dtstsfi dtstsfiq Total 15 instructions Reviewers: nemanjai hfinkel tjablin amehsan kbarton http://reviews.llvm.org/D17885 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265505 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-06 01:47:02 +00:00
Chuang-Yu Cheng	c77546e444	[Power9] Implement copy-paste, msgsync, slb, and stop instructions This patch implements the following BookII and Book III instructions: - copy copy_first cp_abort paste paste. paste_last - msgsync - slbieg slbsync - stop Total 10 instructions Reviewers: nemanjai hfinkel tjablin amehsan kbarton git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265504 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-06 01:46:45 +00:00
Derek Schuff	9b3da26fa8	Add MachineFunctionProperty checks for AllVRegsAllocated for target passes Summary: This adds the same checks that were added in r264593 to all target-specific passes that run after register allocation. Reviewers: qcolombet Subscribers: jyknight, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18525 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265313 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-04 17:09:25 +00:00
Chuang-Yu Cheng	ff2f27d909	[PPC64] Bug fix: when enabling sibling-call-opt and shrink-wrapping, the tail call branch instruction might disappear Bug Pattern: # BB#0: # %entry cmpldi 3, 0 beq- 0, .LBB0_2 # BB#1: # %exit lwz 4, 0(3) #TC_RETURNd8 LVComputationKind 0 .LBB0_2: # %cond.false mflr 0 std 0, 16(1) stdu 1, -96(1) .Ltmp0: .cfi_def_cfa_offset 96 .Ltmp1: .cfi_offset lr, 16 bl __assert_fail nop The branch instruction for tail call return is not generated, because the shrink-wrapping pass choosing a new Restore Point: %cond.false, so %exit block is not sent to emitEpilogue, that's why the branch is not generated. Thanks Kit's opinions! Reviewers: nemanjai hfinkel tjablin kbarton http://reviews.llvm.org/D17606 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265112 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-01 06:44:32 +00:00
Hal Finkel	e2d3b8ca15	[PowerPC] Add a late MI-level pass for QPX load/splat simplification Chapter 3 of the QPX manual states that, "Scalar floating-point load instructions, defined in the Power ISA, cause a replication of the source data across all elements of the target register." Thus, if we have a load followed by a QPX splat (from the first lane), the splat is redundant. This adds a late MI-level pass to remove the redundant splats in some of these cases (specifically when both occur in the same basic block). This optimization is scheduled just prior to post-RA scheduling. It can't happen before anything that might replace the load with some already-computed quantity (i.e. store-to-load forwarding). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265047 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-31 20:39:41 +00:00
Hans Wennborg	f864a32c13	Change eliminateCallFramePseudoInstr() to return an iterator This will become necessary in a subsequent change to make this method merge adjacent stack adjustments, i.e. it might erase the previous and/or next instruction. It also greatly simplifies the calls to this function from Prolog- EpilogInserter. Previously, that had a bunch of logic to resume iteration after the call; now it just continues with the returned iterator. Note that this changes the behaviour of PEI a little. Previously, it attempted to re-visit the new instruction created by eliminateCallFramePseudoInstr(). That code was added in r36625, but I can't see any reason for it: the new instructions will obviously not be pseudo instructions, they will not have FrameIndex operands, and we have already accounted for the stack adjustment. Differential Revision: http://reviews.llvm.org/D18627 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265036 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-31 18:33:38 +00:00
Ehsan Amiri	54993bc649	[PPC] basic support for Power 9 direct move instructions http://reviews.llvm.org/D18097 Initial support does not include any patterns to generate this instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265031 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-31 17:47:17 +00:00
Ulrich Weigand	9bb23c069d	[PowerPC] Correctly compute 64-bit offsets in fast isel PPCSimplifyAddress contains this code: IntegerType OffsetTy = ((VT == MVT::i32) ? Type::getInt32Ty(Context) : Type::getInt64Ty(Context)); to determine the type to be used for an index register, if one needs to be created. However, the "VT" here is the type of the data being loaded or stored, not* the type of an address. This means that if a data element of type i32 is accessed using an index that does not not fit into 32 bits, a wrong address is computed here. Note that PPCFastISel is only ever used on 64-bit currently, so the type of an address is actually always MVT::i64. Other parts of the code, even in this same PPCSimplifyAddress routine, already rely on that fact. Thus, this patch changes the code to simply unconditionally use Type::getInt64Ty(*Context) as OffsetTy. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265023 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-31 15:37:06 +00:00
Nemanja Ivanovic	5a73f4bec5	[PowerPC] Basic support for P9 atomic loads and stores This patch corresponds to review: http://reviews.llvm.org/D18032 This patch provides asm implementation for the following instructions: lwat, ldat, stwat, stdat, ldmx, mcrxrx git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265022 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-31 15:26:37 +00:00
Ulrich Weigand	ca50bf5da1	[PowerPC] Remove incorrect use of COPY_TO_REGCLASS in fast isel The fast isel pass currently emits a COPY_TO_REGCLASS node to convert from a F4RC to a F8RC register class during conversion of a floating-point number to integer. There is actually no support in the common code instruction printers to emit COPY_TO_REGCLASS nodes, so the PowerPC back-end has special code there to simply ignore COPY_TO_REGCLASS. This is correct if and only if the source and destination registers of COPY_TO_REGCLASS are the same (except for the different register class). But nothing guarantees this to be the case, and if the register allocator does end up allocating source and destination to different registers after all, the back-end simply generates incorrect code. I've included a test case that shows such incorrect code generation. However, it seems that COPY_TO_REGCLASS is actually not intended to be used at the MI layer at all. It is used during SelectionDAG, but always lowered to a plain COPY before emitting MI. Other back-end's fast isel passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong for the PowerPC back-end to emit it here. This patch changes the PowerPC back-end to directly emit COPY instead of COPY_TO_REGCLASS and removes the special handling in the instruction printers. Differential Revision: http://reviews.llvm.org/D18605 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265020 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-31 14:44:50 +00:00
Hal Finkel	dc6860fc82	[PowerPC] Load two floats directly instead of using one 64-bit integer load When dealing with complex<float>, and similar structures with two single-precision floating-point numbers, especially when such things are being passed around by value, we'll sometimes end up loading both float values by extracting them from one 64-bit integer load. It looks like this: t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64 t16: i64 = srl t13, Constant:i32<32> t17: i32 = truncate t16 t18: f32 = bitcast t17 t19: i32 = truncate t13 t20: f32 = bitcast t19 The problem, especially before the P8 where those bitcasts aren't legal (and get expanded via the stack), is that it would have been better to use two floating-point loads directly. Here we add a target-specific DAGCombine to do just that. In short, we turn: ld 3, 0(5) stw 3, -8(1) rldicl 3, 3, 32, 32 stw 3, -4(1) lfs 3, -4(1) lfs 0, -8(1) into: lfs 3, 4(5) lfs 0, 0(5) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264988 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-31 02:56:05 +00:00
Nirav Dave	e0d3b8510d	Remove HasFnAttribute guards to getFnAttribute calls These checks are redundant and can be removed Reviewers: hans Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D18564 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264872 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-30 15:41:12 +00:00
Adam Nemet	1e5a35493b	[PPC] Remove -ppc-loop-prefetch-distance in favor of -prefetch-distance After the previous change, this can now be overridden centrally in the pass. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264807 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-29 23:45:56 +00:00
Hal Finkel	f0740433d3	[PowerPC] Refactor popcnt[dw] target features Instead of using two feature bits, one to indicate the availability of the popcnt[dw] instructions, and another to indicate whether or not they're fast, use a single enum. This allows more consistent control via target attribute strings, and via Clang's command line. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264690 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-29 01:36:01 +00:00
Hal Finkel	ec11ea3bec	[PowerPC] Clarify a comment in PPCTTI about vector loads This should say that we could do unaligned vector loads on the P7 using VSX instructions, not that we should. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264683 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-28 22:39:35 +00:00
Hal Finkel	8541eae72c	[PowerPC] On the A2, popcnt[dw] are very slow The A2 cores support the popcntw/popcntd instructions, but they're microcoded, and slower than our default software emulation. Specifically, popcnt[dw] take approximately 74 cycles, whereas our software emulation takes only 24-28 cycles. I've added a new target feature to indicate a slow popcnt[dw], instead of just removing the existing target feature from the a2/a2q processor models, because: 1. This allows us to return more accurate information via the TTI interface (I recognize that this currently makes no practical difference) 2. Is hopefully easier to understand (it allows the core's features to match its manual while still having the desired effect). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264600 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-28 17:52:08 +00:00
Chuang-Yu Cheng	005c5a33b6	[Power9] Implement new altivec instructions: bcd* series This patch implements the following altivec instructions: - Decimal Convert From/to National/Zoned/Signed-QWord: bcdcfn. bcdcfz. bcdctn. bcdctz. bcdcfsq. bcdctsq. - Decimal Copy-Sign/Set-Sign: bcdcpsgn. bcdsetsgn. - Decimal Shift/Unsigned-Shift/Shift-and-Round: bcds. bcdus. bcdsr. - Decimal (Unsigned) Truncate: bcdtrunc. bcdutrunc. Total 13 instructions Thanks Amehsan's advice! Thanks Kit's great help! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D17838 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264568 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-28 09:04:23 +00:00
Chuang-Yu Cheng	8187457176	[Power9] Implement new vsx instructions: insert, extract, test data class, min/max, reverse, permute, splat This change implements the following vsx instructions: - Scalar Insert/Extract xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp - Vector Insert/Extract xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp xxextractuw xxinsertw - Scalar/Vector Test Data Class xststdcdp xststdcsp xststdcqp xvtstdcdp xvtstdcsp - Maximum/Minimum xsmaxcdp xsmaxjdp xsmincdp xsminjdp - Vector Byte-Reverse/Permute/Splat xxbrd xxbrh xxbrq xxbrw xxperm xxpermr xxspltib 30 instructions Thanks Nemanja for invaluable discussion! Thanks Kit's great help! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D16842 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264567 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-28 08:34:28 +00:00
Chuang-Yu Cheng	fe3f9efc0a	[Power9] Implement new vsx instructions: quad-precision move, fp-arithmetic This change implements the following vsx instructions: - quad-precision move xscpsgnqp, xsabsqp, xsnegqp, xsnabsqp - quad-precision fp-arithmetic xsaddqp(o) xsdivqp(o) xsmulqp(o) xssqrtqp(o) xssubqp(o) xsmaddqp(o) xsmsubqp(o) xsnmaddqp(o) xsnmsubqp(o) 22 instructions Thanks Nemanja and Kit for careful review and invaluable discussion! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D16110 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264565 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-28 07:38:01 +00:00
Hal Finkel	68e74bdd60	[PowerPC] Map max/minnum intrinsics and fmax/fmin to ISD nodes for CTR-based loop legality Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc function calls (fmax[f], etc.) generally map to function calls after lowering. For some vector types with QPX at least, however, we can legally lower these, and we don't need to prohibit CTR-based loops on their account. It turned out, however, that the logic that checked the opcodes associated with intrinsics was broken (it would set the Opcode variable, but that variable was later checked only if set for some otherwise-external function call. This fixes the latter problem and adds the FMAX/MINNUM mappings. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264532 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-27 05:40:56 +00:00
David Majnemer	4fda227053	[PowerPC] Disable the CTR optimization in the presence of {min,max}num The minnum and maxnum intrinsics get lowered to libcalls which invalidates the CTR optimization. This fixes PR27083. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264508 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-26 09:42:31 +00:00
Chuang-Yu Cheng	4f013a7ba6	[Power9] Implement new altivec instructions: permute, count zero, extend sign, negate, parity, shift/rotate, mul10 This change implements the following vector operations: - vclzlsbb vctzlsbb vctzb vctzd vctzh vctzw - vextsb2w vextsh2w vextsb2d vextsh2d vextsw2d - vnegd vnegw - vprtybd vprtybq vprtybw - vbpermd vpermr - vrlwnm vrlwmi vrldnm vrldmi vslv vsrv - vmul10cuq vmul10uq vmul10ecuq vmul10euq 28 instructions Thanks Nemanja, Kit for invaluable hints and discussion! Reviewers: hal, nemanja, kbarton, tjablin, amehsan Phabricator: http://reviews.llvm.org/D15887 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264504 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-26 05:46:11 +00:00
Eric Christopher	ee2d71dfcd	Finish the incomplete 'd' inline asm constraint support for PPC by making sure we give it a register and mark it as a register constraint. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264340 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-24 21:04:52 +00:00
Nemanja Ivanovic	c4a24ae263	[PowerPC] Disable direct moves for extractelement and bitcast in 32-bit mode This patch corresponds to review: http://reviews.llvm.org/D17711 It disables direct moves on these operations in 32-bit mode since the patterns assume 64-bit registers. The final patch is slightly different from the Phabricator review as the bitcast operations needed to be disabled in 32-bit mode as well. This fixes PR26617. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264282 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-24 13:40:33 +00:00
Kyle Butt	03d7c088c2	Codegen: [PPC] Word Rotates are Zero Extending. Add Word rotates to the list of instructions that are zero extending. This allows them to be used in dot form to compare with zero. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264183 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-23 19:51:22 +00:00
Ehsan Amiri	ffbba125c0	adding another optimization opportunity to readme file git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263775 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-18 04:02:25 +00:00
Tim Shen	0ead5321f0	[PPC, FastISel] Fix ordered/unordered fcmp For fcmp, major concern about the following 6 cases is NaN result. The comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered), only one of which will be set. The result is generated by fcmpu instruction. However, bc instruction only inspects one of the first 3 bits, so when un is set, bc instruction may jump to to an undesired place. More specifically, if we expect an unordered comparison and un is set, we expect to always go to true branch; in such case UEQ, UGT and ULT still give false, which are undesired; but UNE, UGE, ULE happen to give true, since they are tested by inspecting !eq, !lt, !gt, respectively. Similarly, for ordered comparison, when un is set, we always expect the result to be false. In such case OGT, OLT and OEQ is good, since they are actually testing GT, LT, and EQ respectively, which are false. OGE, OLE and ONE are tested through !lt, !gt and !eq, and these are true. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263753 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-17 22:27:58 +00:00
Petar Jovanovic	cd74bb6415	[PowerPC] Disable CTR loops optimization for soft float operations This patch prevents CTR loops optimization when using soft float operations inside loop body. Soft float operations use function calls, but function calls are not allowed inside CTR optimized loops. Patch by Aleksandar Beserminji. Differential Revision: http://reviews.llvm.org/D17600 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263727 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-17 17:11:33 +00:00
James Y Knight	ceb61bf42f	Tweak some atomics functions in preparation for larger changes; NFC. - Rename getATOMIC to getSYNC, as llvm will soon be able to emit both '__sync' libcalls and '__atomic' libcalls, and this function is for the '__sync' ones. - getInsertFencesForAtomic() has been replaced with shouldInsertFencesForAtomic(Instruction), so that the decision can be made per-instruction. This functionality will be used soon. - emitLeadingFence/emitTrailingFence are no longer called if shouldInsertFencesForAtomic returns false, and thus don't need to check the condition themselves. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263665 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-16 22:12:04 +00:00
Sanjay Patel	9d31cd8fef	[DAG] use !isUndef() ; NFCI git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263453 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-14 18:09:43 +00:00
Sanjay Patel	3e87fcf215	[DAG] use isUndef() ; NFCI git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263448 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-14 17:28:46 +00:00
Nemanja Ivanovic	6f20310e9e	Fix for PR 26378 This patch corresponds to review: http://reviews.llvm.org/D17712 We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This caused duplicate definition asserts when the pass is reused on the module (i.e. with -compile-twice or in JIT contexts). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263338 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-12 10:23:07 +00:00

1 2 3 4 5 ...

4712 Commits