RPCS3/llvm

mirror of https://github.com/RPCS3/llvm.git synced 2025-05-21 12:56:10 +00:00

Author	SHA1	Message	Date
Matt Arsenault	d8706fcd74	MIR: Allow targets to serialize MachineFunctionInfo This has been a very painful missing feature that has made producing reduced testcases difficult. In particular the various registers determined for stack access during function lowering were necessary to avoid undefined register errors in a large percentage of cases. Implement a subset of the important fields that need to be preserved for AMDGPU. Most of the changes are to support targets parsing register fields and properly reporting errors. The biggest sort-of bug remaining is that fields that can be initialized from the IR section will be overwritten by a default-initialized machineFunctionInfo section. Another remaining bug is that the machineFunctionInfo section is still printed even if empty. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356215 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-14 22:54:43 +00:00
Matt Arsenault	6e8fb99b69	IR: Add immarg attribute This indicates an intrinsic parameter is required to be a constant, and should not be replaced with a non-constant value. Add the attribute to all AMDGPU and generic intrinsics that comments indicate it should apply to. I scanned other target intrinsics, but I don't see any obvious comments indicating which arguments are intended to be only immediates. This breaks one questionable testcase for the autoupgrade. I'm unclear on whether the autoupgrade is supposed to really handle declarations which were never valid. The verifier fails because the attributes now refer to a parameter past the end of the argument list. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355981 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-12 21:02:54 +00:00
Matt Arsenault	ed12070421	DAG: Don't try to cluster loads with tied inputs This avoids breaking possible value dependencies when sorting loads by offset. AMDGPU has some load instructions that write into the high or low bits of the destination register, and have a tied input for the other input bits. These can easily have the same base pointer, but be a swizzle so the high address load needs to come first. This was inserting glue forcing the opposite ordering, producing a cycle the InstrEmitter would assert on. It may be potentially expensive to look for the dependency between the other loads, so just skip any where this could happen. Fixes bug 40936 by reverting r351379, which added a hacky attempt to fix this by adding chains in this case, which I think was just working around broken glue before the InstrEmitter. The core of the patch is re-implementing the fix for that problem. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355728 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-08 20:46:15 +00:00
Dmitry Preobrazhensky	ea85a46c1e	[AMDGPU][MC][GFX8+] Added syntactic sugar for 'vgpr index' operand of instructions s_set_gpr_idx_on and s_set_gpr_idx_mode See bug 39331: https://bugs.llvm.org/show_bug.cgi?id=39331 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D58288 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354969 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-27 13:12:12 +00:00
Stanislav Mekhanoshin	1a46c551c7	[AMDGPU] Fixed hang during DAG combine SITargetLowering::reassociateScalarOps() does not touch constants so that DAGCombiner::ReassociateOps() does not revert the combine. However a global address is not a ConstantSDNode. Switched to the method used by DAGCombiner::ReassociateOps() itself to detect constants. Differential Revision: https://reviews.llvm.org/D58695 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354926 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-26 20:56:25 +00:00
Matt Arsenault	0410b9ebcc	AMDGPU: Remove debugger related subtarget features As far as I know these aren't needed anymore. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354634 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-21 23:27:46 +00:00
Stanislav Mekhanoshin	37bcd272bb	[AMDGPU] fix commuted case of sub combine Differential Revision: https://reviews.llvm.org/D58481 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354543 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-21 02:58:00 +00:00
Stanislav Mekhanoshin	2f5fd7e3bd	[AMDGPU] Reassociate 'add (add x, y), z' to use SALU Reassociate adds to collect scalar operands in a single instruction when possible. That will result in a scalar add followed by a vector add instead of two vector adds, thus better utilizing SALU. Differential Revision: https://reviews.llvm.org/D58220 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354066 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-14 22:11:25 +00:00
Stanislav Mekhanoshin	3071c1157b	[AMDGPU] Split dot-insts feature Differential Revision: https://reviews.llvm.org/D57971 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353587 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-09 00:34:21 +00:00
Craig Topper	e3696113b6	Implementation of asm-goto support in LLVM This patch accompanies the RFC posted here: http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html This patch adds a new CallBr IR instruction to support asm-goto inline assembly like gcc as used by the linux kernel. This instruction is both a call instruction and a terminator instruction with multiple successors. Only inline assembly usage is supported today. This also adds a new INLINEASM_BR opcode to SelectionDAG and MachineIR to represent an INLINEASM block that is also considered a terminator instruction. There will likely be more bug fixes and optimizations to follow this, but we felt it had reached a point where we would like to switch to an incremental development model. Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii Differential Revision: https://reviews.llvm.org/D53765 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353563 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-08 20:48:56 +00:00
Matt Arsenault	e6576d59f1	AMDGPU/GlobalISel: Legalize addrspacecast Use a placeholder constant for now on targets that need the load from the queue ptr. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353497 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-08 02:40:47 +00:00
Scott Linder	2221866e2e	[AMDGPU] Consider XOR in waterfall loop as a terminator Ensure the XOR in the waterfall loop for indirect addressing is considered a terminator. Differential Revision: https://reviews.llvm.org/D57703 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353207 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-05 19:50:32 +00:00
Scott Linder	6186e8e51b	[AMDGPU] Support emitting GOT relocations for function calls Differential Revision: https://reviews.llvm.org/D57416 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353083 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-04 20:00:07 +00:00
Tim Corringham	3e0069dcb6	[AMDGPU] Fix for vector element insertion Summary: Incorrect code was generated when lowering insertelement operations for vectors with 8 or 16 bit elements. The value being inserted was not adjusted for the position of the element within the 32 bit word and so only the low element within each 32 bit word could receive the intended value. Fixed by simply replicating the value to each element of a congruent vector before the mask and or operation used to update the intended element. A number of affected LIT tests have been updated appropriately. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: llvm-commits, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Tags: #llvm Differential Revision: https://reviews.llvm.org/D57588 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352885 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-01 16:51:09 +00:00
Matt Arsenault	6b89186f0e	AMDGPU: Add DS append/consume intrinsics Since these pass the pointer in m0 unlike other DS instructions, these need to worry about whether the address is uniform or not. This assumes the address is dynamically uniform, and just uses readfirstlane to get a copy into an SGPR. I don't know if these have the same 16-bit add for the addressing mode offset problem on SI or not, but I've just assumed they do. Also includes some misc. changes to avoid test differences between the LDS and GDS versions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352422 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-28 20:14:49 +00:00
Tim Corringham	b77f4df06f	[AMDGPU] Add intrinsics for 16 bit interpolation Summary: Added the intrinsics llvm.amdgcn.interp.p1.f16() and llvm.amdgcn.interp.p2.f16() and related LIT test. The p1 intrinsic generates code appropriate for both 16 and 32 bank LDS. Reviewers: #amdgpu, dstuttard, arsenm, tpr Reviewed By: #amdgpu, arsenm Subscribers: jvesely, mgorny, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46754 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352357 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-28 13:48:59 +00:00
Matt Arsenault	1f5f9eca96	Codegen support for atomicrmw fadd/fsub git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351851 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-22 18:36:06 +00:00
Chandler Carruth	6b547686c5	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-19 08:50:56 +00:00
Matt Arsenault	083f0ffcc0	AMDGPU: Remove llvm.SI.load.const It's taken 3 years, but now all of the old AMDGPU and SI intrinsics are finally gone git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351586 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-18 20:27:02 +00:00
Changpeng Fang	17717539b0	AMDGPU: Adjust the chain for loads writing to the HI part of a register. Summary: For these loads that write to the HI part of a register, we should chain them to the op that writes to the LO part of the register to maintain the appropriate order. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D56454 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351379 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-16 21:32:53 +00:00
Marek Olsak	73f9f91f2a	AMDGPU: Add llvm.amdgcn.ds.ordered.add & swap Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52944 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351351 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-16 15:43:53 +00:00
Marek Olsak	68fa6767b4	AMDGPU: Add a fast path for icmp.i1(src, false, NE) Summary: This allows moving the condition from the intrinsic to the standard ICmp opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic is an identity for retrieving the SGPR mask. And we can also get the mask from and i1, or i1, xor i1. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52060 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351150 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-15 02:13:18 +00:00
David Stuttard	1da08e2514	[AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351054 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-14 11:55:24 +00:00
Stanislav Mekhanoshin	802080d0e0	[AMDGPU] Separate feature dot-insts Differential Revision: https://reviews.llvm.org/D56524 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@350793 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-10 03:25:20 +00:00
Stanislav Mekhanoshin	26652ec519	Remove check for single use in ShrinkDemandedConstant This removes check for single use from general ShrinkDemandedConstant to the BE because of the AArch64 regression after D56289/rL350475. After several hours of experiments I did not come up with a testcase failing on any other targets if check is not performed. Moreover, a direct call to ShrinkDemandedConstant is not really needed and is superseded by SimplifyDemandedBits. Differential Revision: https://reviews.llvm.org/D56406 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@350684 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-09 02:24:22 +00:00
Piotr Sobczak	85979bea18	[AMDGPU] Handle OR as operand of raw load/store Summary: Use isBaseWithConstantOffset() which handles OR as an operand to llvm.amdgcn.raw.buffer.load and llvm.amdgcn.raw.buffer.store. Change-Id: Ifefb9dc5ded8710d333df07ab1900b230e33539a Reviewers: nhaehnle, mareko, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55999 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@350208 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-02 09:47:41 +00:00
Simon Pilgrim	141d07dd41	Fix unused variable warning. NFCI. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348649 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-07 21:44:25 +00:00
Matt Arsenault	eced3599bc	AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.load git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348625 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-07 18:41:39 +00:00
Matt Arsenault	6ee22ea781	AMDGPU: Remove llvm.SI.tbuffer.store git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348619 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-07 18:03:47 +00:00
Matt Arsenault	d687922c19	AMDGPU: Remove llvm.AMDGPU.kill This is the last of the old AMDGPU intrinsics. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348615 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-07 17:46:16 +00:00
Nicolai Haehnle	e3924b1c15	AMDGPU: Divergence-driven selection of scalar buffer load intrinsics Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348050 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-30 22:55:38 +00:00
Nicolai Haehnle	37b386de21	AMDGPU: Fix various issues around the VirtReg2Value mapping Summary: The VirtReg2Value mapping is crucial for getting consistently reliable divergence information into the SelectionDAG. This patch fixes a bunch of issues that lead to incorrect divergence info and introduces tight assertions to ensure we don't regress: 1. VirtReg2Value is generated lazily; there were some cases where a lookup was performed before all relevant virtual registers were created, leading to an out-of-sync mapping. Those cases were: - Complex code to lower formal arguments that generated CopyFromReg nodes from live-in registers (fixed by never querying the mapping for live-in registers). - Code that generates CopyToReg for formal arguments that are used outside the entry basic block (fixed by never querying the mapping for Register nodes, which don't need the divergence info anyway). 2. For complex values that are lowered to a sequence of registers, all registers must be reflected in the VirtReg2Value mapping. I am not adding any new tests, since I'm not actually aware of any bugs that these problems are causing with trunk as-is. However, I recently added a test case (in r346423) which fails when D53283 is applied without this change. Also, the new assertions should provide most of the effective test coverage. There is one test change in sdwa-peephole.ll. The underlying issue is that since the divergence info is now correct, the DAGISel will select V_OR_B32 directly instead of S_OR_B32. This leads to an extra COPY which affects the behavior of MachineLICM in a way that ends up with the S_MOV_B32 with the constant in a different basic block than the V_OR_B32, which is presumably what defeats the peephole. Reviewers: alex-t, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D54340 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348049 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-30 22:55:29 +00:00
David Stuttard	cb049f818d	Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic" Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347911 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-29 20:14:17 +00:00
David Stuttard	e88aff0286	Fix: Add support for TFE/LWE in image intrinsic My change svn-id: 347871 caused a buildbot failure due to an unused variable def (used in an assert). Change-Id: Ia882d18bb6fa79b4d7bbfda422b9ea5d23eab336 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347876 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-29 15:56:36 +00:00
David Stuttard	7fd87699a5	Add support for TFE/LWE in image intrinsics TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347871 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-29 15:21:13 +00:00
Stanislav Mekhanoshin	c8872d113a	[AMDGPU] Disable DAG combine at -O0 Differential Revision: https://reviews.llvm.org/D54358 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347659 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-27 15:13:37 +00:00
Fangrui Song	89a80b6791	[AMDGPU] Fix -Wunused-variable git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347234 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-19 17:54:27 +00:00
Stanislav Mekhanoshin	4402e81711	[AMDGPU] Convert insert_vector_elt into set of selects This avoids scratch use or indirect VGPR addressing for small vectors. Differential Revision: https://reviews.llvm.org/D54606 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347231 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-19 17:39:20 +00:00
Stanislav Mekhanoshin	e9eedd7fa6	[AMDGPU] combine extractelement into several selects An extractelement with non-constant index will be lowered either to scratch or movrel loop in most cases. This patch converts such instruction into a set of selects if vector size is not too big. Differential Revision: https://reviews.llvm.org/D54351 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346800 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-13 21:18:21 +00:00
Nicolai Haehnle	69f971eb18	Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics" This reverts commit r344696 for now (except for some test additions). See https://bugs.freedesktop.org/show_bug.cgi?id=108611. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346364 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-07 21:53:43 +00:00
Craig Topper	40f2fec254	[TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346180 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-05 23:26:13 +00:00
Sylvestre Ledru	062cd21484	Fixed inclusion of M_PI for MinGW-w64 Patch by KOLANICH git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346000 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-02 17:25:40 +00:00
Neil Henning	70c62a14e0	[AMDGPU] UBSan bug fix for r345710 UBSan detected an error in our ISelLowering that is exposed only when you have a dmask == 0x1. Fix this by adding in an explicit check to ensure we don't do the UBSan detected shl << 32. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345962 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-02 10:24:57 +00:00
Reid Kleckner	b7d45e1d88	Fix clang -Wimplicit-fallthrough warnings across llvm, NFC This patch should not introduce any behavior changes. It consists of mostly one of two changes: 1. Replacing fall through comments with the LLVM_FALLTHROUGH macro 2. Inserting 'break' before falling through into a case block consisting of only 'break'. We were already using this warning with GCC, but its warning behaves slightly differently. In this patch, the following differences are relevant: 1. GCC recognizes comments that say "fall through" as annotations, clang doesn't 2. GCC doesn't warn on "case N: foo(); default: break;", clang does 3. GCC doesn't warn when the case contains a switch, but falls through the outer case. I will enable the warning separately in a follow-up patch so that it can be cleanly reverted if necessary. Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu Differential Revision: https://reviews.llvm.org/D53950 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345882 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-01 19:54:45 +00:00
Neil Henning	5ab552691a	[AMDGPU] support image load/store a16 Our a16 support was only enabled for sample/gather and buffer load/store, but not for image load/store operations (which take an i16 as the pixel index rather than a half). Fix our isel lowering and add test cases to prove it out. Differential Revision: https://reviews.llvm.org/D53750 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345710 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-31 10:34:48 +00:00
Matt Arsenault	3031f2125f	AMDGPU: Remove custom BUILD_VECTOR combine This was looping in a testcase and removing it now slightly improves a test. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345560 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-30 01:37:59 +00:00
Matt Arsenault	c0db9a7416	DAG: Change behavior of fminnum/fmaxnum nodes Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344914 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-22 16:27:27 +00:00
Nicolai Haehnle	1db6c09686	AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI Summary: To workaround a hardware issue in the (base + offset) calculation when base is negative. The impact on code quality should be limited since SILoadStoreOptimizer still runs afterwards and is able to combine loads/stores based on known sign information. This fixes visible corruption in Hitman on SI (easily reproducible by running benchmark mode). Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923 Reviewers: arsenm, mareko Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53160 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344698 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-17 15:37:48 +00:00
Nicolai Haehnle	cc436fd266	AMDGPU: Divergence-driven selection of scalar buffer load intrinsics Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344696 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-17 15:37:30 +00:00
Konstantin Zhuravlyov	b57394b3c2	AMDGPU: Rename isAmdCodeObjectV2 -> isAmdHsaOrMesa The isAmdCodeObjectV2 is a misleading name which actually checks whether the os is amdhsa or mesa. Also add a test to make sure we do not generate old kernel header for code object v3. Differential Revision: https://reviews.llvm.org/D52897 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343813 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-04 21:02:16 +00:00
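
Commit 3e0069dcb6 ("Fix for vector element insertion") describes replicating the inserted value into every element slot of a 32-bit word before the mask-and-or step. The following is only a plain-Python model of that idea, not the LLVM lowering code; the function name `insert_subword_element` and its parameters are hypothetical and chosen for illustration.

```python
def insert_subword_element(word, value, index, elt_bits):
    """Model inserting an 8- or 16-bit element into a 32-bit word.

    Hypothetical helper: it mirrors the approach described in the commit
    message (splat the value, then mask and or), not the actual backend code.
    """
    elt_mask = (1 << elt_bits) - 1
    value &= elt_mask

    # Replicate the value into every element slot of the 32-bit word,
    # so the copy at the target position is already correctly shifted.
    splat = 0
    for slot in range(32 // elt_bits):
        splat |= value << (slot * elt_bits)

    # Mask selecting only the element being replaced.
    lane_mask = elt_mask << (index * elt_bits)

    # Clear the target element in the old word, then or in the new bits.
    return ((word & ~lane_mask) | (splat & lane_mask)) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(insert_subword_element(0x11223344, 0xAB, 2, 8)))  # 0x11ab3344
```

Without the replication step, `value & lane_mask` would be zero for any index other than 0, which matches the symptom in the commit message that only the low element of each word could receive the intended value.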

571 Commits
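
Commits e9eedd7fa6 and 4402e81711 describe converting a dynamically indexed extractelement or insert_vector_elt on a small vector into a set of selects. As a rough illustration only, assuming a Python list stands in for the vector and a conditional expression stands in for a select, the extract case could be modelled as below; `extract_element_via_selects` is a hypothetical name, and the real combine operates on SelectionDAG nodes.

```python
def extract_element_via_selects(vec, index):
    """Model 'extractelement vec, index' as a chain of compare + select.

    Each step keeps the previous result unless the index matches the
    current position, so no indexed memory access is needed.
    """
    result = vec[0]
    for i in range(1, len(vec)):
        # select (index == i), vec[i], result
        result = vec[i] if index == i else result
    return result


if __name__ == "__main__":
    print(extract_element_via_selects([10, 20, 30, 40], 2))  # 30
```

In this model an out-of-range index simply yields element 0; that is an artifact of the sketch, not a statement about what the backend produces. The chain grows linearly with the vector length, which is consistent with the commit message limiting the conversion to vectors that are "not too big".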