archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Matt Arsenault	ab2c9f7c79	AMDGPU: Stop adding m0 implicit def to SGPR spills r375293 removed the SGPR spilling with scalar stores path, so this is no longer necessary. This also always had the defect of adding the def even when this path wasn't in use. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375448 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-21 19:42:29 +00:00
Stanislav Mekhanoshin	6e6c5e79c9	[AMDGPU] Select AGPR in PHI operand legalization If a PHI defines AGPR legalize its operands to AGPR. At the moment we can get an AGPR PHI with VGPR operands. I am not aware of any problems as it seems to be handled gracefully in RA, but this is not right anyway. It also slightly decreases VGPR pressure in some cases because we do not have to a copy via VGPR. Differential Revision: https://reviews.llvm.org/D69206 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375446 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-21 19:25:27 +00:00
Matt Arsenault	f154896069	AMDGPU: Split flat offsets that don't fit in DAG We handle it this way for some other address spaces. Since r349196, SILoadStoreOptimizer has been trying to do this. This is after SIFoldOperands runs, which can change the addressing patterns. It's simpler to just split this earlier. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375366 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-20 17:34:44 +00:00
Matt Arsenault	7d97468a23	AMDGPU: Relax 32-bit SGPR register class Mostly use SReg_32 instead of SReg_32_XM0 for arbitrary values. This will allow the register coalescer to do a better job eliminating copies to m0. For GlobalISel, as a terrible hack, use SGPR_32 for things that should use SCC until booleans are solved. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375267 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-18 18:26:37 +00:00
David Stuttard	9cb56c603f	[AMDGPU] Fix-up cases where writelane has 2 SGPR operands Summary: Even though writelane doesn't have the same constraints as other valu instructions it still can't violate the >1 SGPR operand constraint Due to later register propagation (e.g. fixing up vgpr operands via readfirstlane) changing writelane to only have a single SGPR is tricky. This implementation puts a new check after SIFixSGPRCopies that prevents multiple SGPRs being used in any writelane instructions. The algorithm used is to check for trivial copy prop of suitable constants into one of the SGPR operands and perform that if possible. If this isn't possible put an explicit copy of Src1 SGPR into M0 and use that instead (this is allowable for writelane as the constraint is for SGPR read-port and not constant-bus access). Reviewers: rampitec, tpr, arsenm, nhaehnle Reviewed By: rampitec, arsenm, nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, mgorny, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D51932 Change-Id: Ic7553fa57440f208d4dbc4794fc24345d7e0e9ea git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@375004 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-16 14:37:39 +00:00
Austin Kerbow	1f3d0040fc	AMDGPU: Fix infinite searches in SIFixSGPRCopies Summary: Two conditions could lead to infinite loops when processing PHI nodes in SIFixSGPRCopies. The first condition involves a REG_SEQUENCE that uses registers defined by both a PHI and a COPY. The second condition arises when a physical register is copied to a virtual register which is then used in a PHI node. If the same virtual register is copied to the same physical register, the result is an endless loop. %0:sgpr_64 = COPY $sgpr0_sgpr1 %2 = PHI %0, %bb.0, %1, %bb.1 $sgpr0_sgpr1 = COPY %0 Reviewers: alex-t, rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68970 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374944 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-15 19:59:45 +00:00
Stanislav Mekhanoshin	e7c9c5bebd	[AMDGPU] Support mov dpp with 64 bit operands We define mov/update dpp intrinsics as overloaded but do not support i64, which is a practically useful type. Fix the selection and lowering. Differential Revision: https://reviews.llvm.org/D68673 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374910 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-15 16:41:15 +00:00
Alexander Timofeev	775d402bf1	[AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.' Detailed description: After https://reviews.llvm.org/D59990 submit several issues were discovered. Changes in common code were preserved but AMDGPU specific part was reverted to keep the backend working correctly. Discovered issues were addressed in the following commits: https://reviews.llvm.org/D67662 https://reviews.llvm.org/D67101 https://reviews.llvm.org/D63953 https://reviews.llvm.org/D63731 This change brings back AMDGPU specific changes. Reviewed by: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D68635 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374767 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-14 12:01:10 +00:00
Matt Arsenault	45cd9c275c	AMDGPU: Use SGPR_128 instead of SReg_128 for vregs SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@374284 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-10 07:11:33 +00:00
Matt Arsenault	b4bef9ad2f	GlobalISel: Add target pre-isel instructions Allows targets to introduce regbankselectable pseudo-instructions. Currently the closet feature to this is an intrinsic. However this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373937 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-07 18:43:29 +00:00
Stanislav Mekhanoshin	60eeded1a4	[AMDGPU] Fix illegal agpr use by VALU When SIFixSGPRCopies attempts to fix an illegal copy from vector to scalar register it calls moveToVALU(). A copy from an agpr to sgpr becomes a copy from agpr to agpr, which may result in the illegal register class at a use of this copy. Solution is to copy it always into a vgpr. This may result in a subsequent copy into an agpr if that is what really needed, however should not happen too often and likely will be folded later. The opposite situation may not happen because an sgpr is always illegal where agpr is legal, so such user instructions may not exist. Differential Revision: https://reviews.llvm.org/D68358 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373544 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-02 23:23:46 +00:00
Piotr Sobczak	8db4d72fa5	[AMDGPU] Extend buffer intrinsics with swizzling Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373491 91177308-0d34-0410-b5e6-96231b3b80d8	2019-10-02 17:22:36 +00:00
Changpeng Fang	b07705b701	Remove the AliasAnalysis argument in function areMemAccessesTriviallyDisjoint Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D58360 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373024 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-26 22:53:44 +00:00
Simon Pilgrim	e8d798abaa	[TargetInstrInfo] Let findCommutedOpIndices take const MachineInstr& Neither the base implementation of findCommutedOpIndices nor any in-tree target modifies the instruction passed in and there is no reason why they would in the future. Committed on behalf of @hvdijk (Harald van Dijk) Differential Revision: https://reviews.llvm.org/D66138 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@372882 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-25 14:55:57 +00:00
Alexander Timofeev	7ae04842e0	[AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Fixed Defferential Revision: https://reviews.llvm.org/D67101 Reviewers: rampitec, vpykhtin git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@372086 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-17 09:08:58 +00:00
Alexander Timofeev	f884885482	Revert for: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371873 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-13 17:37:30 +00:00
Matt Arsenault	3360813c03	AMDGPU: Inline constant when materalizing FI with add on gfx9 This was relying on the SGPR usable for the carry out clobber to also be used for the input. There was no carry out on gfx9. With no carry out clobber to worry about, so the literal can just be directly used with a VOP2 add. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371791 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-12 23:46:46 +00:00
Michael Liao	263e402d98	[AMDGPU] Fix crash in phi-elimination hook. Summary: - Pre-check in case there's just a single PHI insn. Reviewers: alex-t, rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D67451 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371649 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-11 19:55:20 +00:00
Alexander Timofeev	6235d9d3b6	[AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Reviewers: rampitec, vpykhtin Differential Revision: https://reviews.llvm.org/D67101 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371508 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-10 10:58:57 +00:00
Matt Arsenault	3940fc1fe6	AMDGPU: Allow getMemOperandWithOffset to analyze stack accesses Report soffset as a base register if the scratch resource can be ignored. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371149 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-05 23:54:35 +00:00
Matt Arsenault	1305215be7	AMDGPU: Handle frame index expansion with no free SGPRs pre gfx9 Since an add instruction must produce an unused carry out, this requires additional SGPRs. This can be avoided by keeping the entire offset computation in SGPRs. If one SGPR is still available, this only costs one extra mov. If none are available, the entire computation can be done in place and reversed. This does assume the use is a VGPR operand. This was already assumed, and we currently only select frame indexes to VALU instructions. This should probably be fixed at some point to handle more possible MIR. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370929 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-04 17:12:57 +00:00
Matt Arsenault	0013bc0cab	AMDGPU: Don't use frame virtual registers SGPR spills aren't really handled after SILowerSGPRSpills. In order to directly control what happens if the scavenger needs to spill, the scavenger needs to be used directly. There is an alternative to spilling in these contexts anyway since the frame register can be increment and restored. This does present another possible issue if spilling is needed for the unused carry out if an add is needed. I think this can be avoided by using a scalar add (although that clobbers SCC, which happens anyway). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370281 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-29 01:13:47 +00:00
Stanislav Mekhanoshin	e86bd0e33f	[AMDGPU] w/a for gfx908 mfma SrcC literal HW bug gfx908 ignores an mfma if SrcC is a literal. Differential Revision: https://reviews.llvm.org/D66670 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369818 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-23 22:22:29 +00:00
Alexander Timofeev	7822759c5f	[AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions Differential Revision: https://reviews.llvm.org/D63731 Reviewers: qcolombet, rampitec git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369532 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-21 15:15:04 +00:00
Matt Arsenault	31b0c418f8	Revert "AMDGPU: Fix iterator error when lowering SI_END_CF" This reverts r367500 and r369203. This is causing various test failures. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369417 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-20 17:45:25 +00:00
Daniel Sanders	57a8129407	Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369041 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-15 19:22:08 +00:00
Austin Kerbow	a186599dd5	Re-commit: [AMDGPU] Use S_DENORM_MODE for gfx10 Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367969 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-06 02:16:11 +00:00
Dmitri Gribenko	036f7618c9	Revert "[AMDGPU] Use S_DENORM_MODE for gfx10" This reverts commit r367882. It broke the test MC/Disassembler/AMDGPU/gfx10_dasm_all.txt. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367904 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-05 18:36:43 +00:00
Austin Kerbow	5e1edf01fd	[AMDGPU] Use S_DENORM_MODE for gfx10 Summary: During fdiv32 lowering use S_DENORM_MODE to select denorm mode in gfx10. Reviewers: arsenm, rampitec Reviewed By: arsenm, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65620 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367882 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-05 16:09:49 +00:00
Daniel Sanders	c7a3c5c5d1	Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367633 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-01 23:27:28 +00:00
Matt Arsenault	eb43d3eca1	Reapply "AMDGPU: Split block for si_end_cf" This reverts commit r359363, reapplying r357634 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367500 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-01 01:25:27 +00:00
Jay Foad	b48f7dcd7b	[AMDGPU] Fix typo in error message git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367235 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-29 16:17:13 +00:00
Carl Ritson	64e952bdc5	[AMDGPU] Add llvm.amdgcn.softwqm intrinsic Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm only if there is other WQM computation in the shader. Reviewers: nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64935 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367097 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-26 09:54:12 +00:00
Matt Arsenault	2d4a72809c	AMDGPU: Force s_waitcnt after GWS instructions This is apparently required to be the immediately following instruction, so force it into a bundle with a waitcnt. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366607 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-19 19:47:30 +00:00
Matt Arsenault	b18edea81b	AMDGPU/GlobalISel: Select flat loads Now that the patterns use the new PatFrag address space support, the only blocker to importing most load patterns is the addressing mode complex patterns. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366237 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-16 18:05:29 +00:00
Jay Foad	7f18dddc3d	[AMDGPU] Fix DPP combiner check for exec modification Summary: r363675 changed the exec modification helper function, now called execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks all instructions in the basic block, even beyond the last use. That meant that the DPP combiner no longer worked in any basic block that ended with a control flow instruction, and in particular it didn't work on code sequences generated by the atomic optimizer. Fix it by reinstating the old behaviour but in a new helper function execMayBeModifiedBeforeAnyUse, and limiting the number of instructions scanned. Reviewers: arsenm, vpykhtin Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64393 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365910 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-12 15:59:40 +00:00
Fangrui Song	727b16e096	Delete dead stores git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365903 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-12 14:58:15 +00:00
Stanislav Mekhanoshin	cb57db0336	[AMDGPU] gfx908 agpr spilling Differential Revision: https://reviews.llvm.org/D64594 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365833 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-11 21:54:13 +00:00
Stanislav Mekhanoshin	6644a1885f	[AMDGPU] gfx908 mfma support Differential Revision: https://reviews.llvm.org/D64584 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365824 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-11 21:19:33 +00:00
Matt Arsenault	59762a4b80	AMDGPU: Make s34 the FP register Make the FP register callee saved. This is tricky because now the FP needs to be spilled in the prolog relative to the incoming SP register, rather than the frame register used throughout the rest of the function. I don't like how this bypassess the standard mechanism for CSR spills just to get the correct insert point. I may look for a better solution, since all CSR VGPRs may also need to have all lanes activated. Another option might be to make getFrameIndexReference change the base register if the frame index is a CSR, and then try to figure out the right insertion point in emitProlog. If there is a free VGPR lane available for SGPR spilling, try to use it for the FP. If that would require intrtoducing a new VGPR spill, try to use a free call clobbered SGPR. Only fallback to introducing a new VGPR spill as a last resort. This also doesn't attempt to handle SGPR spilling with scalar stores. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365372 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-08 19:03:38 +00:00
Nicolai Haehnle	c77d9f5b72	AMDGPU/GFX10: fix scratch resource descriptor Summary: The stride should depend on the wave size, not the hardware generation. Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be relevant. Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6 Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63808 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364788 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-01 15:43:00 +00:00
Nicolai Haehnle	90f5eff5ac	AMDGPU: Write LDS objects out as global symbols in code generation Summary: The symbols use the processor-specific SHN_AMDGPU_LDS section index introduced with a previous change. The linker is then expected to resolve relocations, which are also emitted. Initially disabled for HSA and PAL environments until they have caught up in terms of linker and runtime loader. Some notes: - The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered to a constant at compile times, which means some tests can no longer be applied. The current "solution" is a terrible hack, but the intrinsic isn't used by Mesa, so we can keep it for now. - We no longer know the full LDS size per kernel at compile time, which means that we can no longer generate a relevant error message at compile time. It would be possible to add a check for the size of individual variables, but ultimately the linker will have to perform the final check. Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275 Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61494 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364297 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-25 11:52:30 +00:00
Matt Arsenault	d544680fb3	AMDGPU: Don't clobber VCC in MUBUF addr64 emulation Introducing VCC defs during SIFixSGPRCopies is generally problematic. Avoid it by starting with the VOP3 form with the general condition register. This is the easiest to fix instance, but doesn't solve any specific problems I'm looking at. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363904 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-20 00:51:28 +00:00
Matt Arsenault	d5a79b9727	AMDGPU: Consolidate some getGeneration checks This is incomplete, and ideally these would all be removed, but it's better to localize them to the subtarget first with comments about what they're for. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363902 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-19 23:54:58 +00:00
Matt Arsenault	bff29f6333	AMDGPU: Fix folding immediate into readfirstlane through reg_sequence The def instruction for the vreg may not match, because it may be folding through a reg_sequence. The assert was overly conservative and not necessary. It's not actually important if DefMI really defined the register, because the fold that will be done cares about the def of the value that will be folded. For some reason copies aren't making it through the reg_sequence, although they should. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363876 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-19 20:44:15 +00:00
Matt Arsenault	e1eedb6602	Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics" This reapplies r363678, using the correct chain for the CopyToReg for v0. glueCopyToM0 counterintuitively changes the operands of the original node. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363870 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-19 19:55:27 +00:00
Simon Pilgrim	1ad9529ddc	Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics There may or may not be additional work to handle this correctly on SI/CI. ........ Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/ git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363797 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-19 13:00:54 +00:00
Matt Arsenault	6a59b73682	AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics There may or may not be additional work to handle this correctly on SI/CI. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363678 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-18 13:19:57 +00:00
Matt Arsenault	9f1f314a53	AMDGPU: Change API for checking for exec modification Invert the name and return value to better reflect the imprecise nature. Force passing in the DefMI, since it's known in the 2 users and could possibly fail for an arbitrary vreg. Allow specifying a specific user instruction. Scan through use instructions, instead of use operands. Add scan thresholds instead of searching infinitely. Stop using a set to track seen uses. I didn't understand this usage, or why it would not check the last use. I don't think the use list has any particular order. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363675 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-18 12:48:36 +00:00
Sander de Smalen	f4bff34d4d	Describe stack-id as an enum This patch changes MIR stack-id from an integer to an enum, and adds printing/parsing support for this in MIR files. The default stack-id '0' is now renamed to 'default'. This should make MIR tests that have stack objects with different stack-ids more descriptive. It also clarifies code operating on StackID. Reviewers: arsenm, thegameg, qcolombet Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D60137 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363533 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-17 09:13:29 +00:00

1 2 3 4 5 ...

401 Commits