archived-llvm

mirror of https://github.com/RPCSX/llvm.git synced 2026-01-31 01:05:23 +01:00

Author	SHA1	Message	Date
Stanislav Mekhanoshin	64620b1c31	[AMDGPU] Fix multiple vreg definitions in si-lower-control-flow Differential Revision: https://reviews.llvm.org/D26939 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287608 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-22 01:42:34 +00:00
Matt Arsenault	8b1782955d	DAG: Ignore call site attributes when emitting target intrinsic A target intrinsic may be defined as possibly reading memory, but the call site may have additional knowledge that it doesn't read memory. The intrinsic lowering will expect the pessimistic assumption of the intrinsic definition, so the chain should still be used. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287593 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-21 22:56:42 +00:00
Konstantin Zhuravlyov	5527d64b74	[AMDGPU] Change frexp.exp intrinsic to return i16 for f16 input Differential Revision: https://reviews.llvm.org/D26862 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287389 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-18 22:31:08 +00:00
Tom Stellard	a006842d47	AMDGPU/SI: Remove zero_extend patterns for i16 ops selected to 32-bit insts Summary: The 32-bit instructions don't zero the high 16-bits like the 16-bit instructions do. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26828 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287342 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-18 13:53:34 +00:00
Nicolai Haehnle	1e27a618c6	AMDGPU: Fix legalization of MUBUF instructions in shaders Summary: The addr64-based legalization is incorrect for MUBUF instructions with idxen set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions. This affects e.g. shaders that access buffer textures. Since we never actually need the addr64-legalization in shaders, this patch takes the easy route and keys off the calling convention. If this ever affects (non-OpenGL) compute, the type of legalization needs to be chosen based on some TSFlag. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664 Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26747 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287339 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-18 11:55:52 +00:00
Matt Arsenault	cbafc5829d	AMDGPU: Fix crash on illegal type for inlineasm There are still crashes on non-MVT types in other places. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287310 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-18 04:42:57 +00:00
Konstantin Zhuravlyov	1d609512ed	Revert "AMDGPU: Enable ConstrainCopy DAG mutation" This reverts commit r287146. This breaks few conformance tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287233 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-17 16:41:49 +00:00
Konstantin Zhuravlyov	44a5962ac5	[AMDGPU] Add missing test for rL287203 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287204 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-17 04:33:20 +00:00
Konstantin Zhuravlyov	8fd8772fcd	[AMDGPU] Promote f16/i16 conversions to f32/i32 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287201 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-17 04:00:46 +00:00
Konstantin Zhuravlyov	54556b5c81	[AMDGPU] Expand `br_cc` for f16 Differential Revision: https://reviews.llvm.org/D26732 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287199 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-17 03:49:01 +00:00
Matt Arsenault	4fbd908949	AMDGPU: Enable ConstrainCopy DAG mutation This fixes a probably unintended divergence from the default scheduler behavior. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287146 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-16 20:35:23 +00:00
Tom Stellard	ae5ecee3ec	AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass Summary: 1. Don't try to copy values to and from the same register class. 2. Replace copies with of registers with immediate values with v_mov/s_mov instructions. The main purpose of this change is to make MachineSink do a better job of determining when it is beneficial to split a critical edge, since the pass assumes that copies will become move instructions. This prevents a regression in uniform-cfg.ll if we enable critical edge splitting for AMDGPU. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23408 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287131 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-16 18:42:17 +00:00
Konstantin Zhuravlyov	7f4d4fdf3f	[AMDGPU] Handle f16 select{_cc} - Select `select` to `v_cndmask_b32` - Expand `select_cc` - Refactor patterns Differential Revision: https://reviews.llvm.org/D26714 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287074 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-16 03:16:26 +00:00
Jan Vesely	ea73d16588	AMDGPU/GCN: Exit early in hazard recognizer if there is no vreg argument wbinvl.* are vector instruction that do not sue vector registers. v2: check only M?BUF instructions Differential Revision: https://reviews.llvm.org/D26633 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287056 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-15 23:55:15 +00:00
Tom Stellard	1d353ca419	AMDGPU/SI: Fix pattern for i16 = sign_extend i1 Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26670 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287035 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-15 21:25:56 +00:00
Matt Arsenault	7036e8dad1	AMDGPU: Enable store clustering Also respect the TII hook for these like the generic code does in case we want a flag later to disable this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287021 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-15 20:22:55 +00:00
Matt Arsenault	339cf19d8c	AMDGPU: Analyze mubuf with immediate soffset Fixes giving up on clustering common addr64 accesses with constant 0 soffset. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287018 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-15 20:14:27 +00:00
Stanislav Mekhanoshin	79f84ef39d	[AMDGPU] Add wave barrier builtin The wave barrier represents the discardable barrier. Its main purpose is to carry convergent attribute, thus preventing illegal CFG optimizations. All lanes in a wave come to convergence point simultaneously with SIMT, thus no special instruction is needed in the ISA. The barrier is discarded during code generation. Differential Revision: https://reviews.llvm.org/D26585 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287007 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-15 19:00:15 +00:00
Matt Arsenault	9de96caccf	AMDGPU: Fix f16 fabs/fneg git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286931 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-15 02:25:28 +00:00
Matt Arsenault	856f36957c	AMDGPU: Fix formatting of 1/2pi immediate git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286912 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-15 00:04:33 +00:00
Changpeng Fang	af76145600	AMDGPU/SI: Support data types other than V4f32 in image intrinsics Summary: Extend image intrinsics to support data types of V1F32 and V2F32. TODO: we should define a mapping table to change the opcode for data type of V2F32 but just one channel is active, even though such case should be very rare. Reviewers: tstellarAMD Differential Revision: http://reviews.llvm.org/D26472 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286860 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-14 18:33:18 +00:00
Matt Arsenault	4404d0d6e3	AMDGPU: Implement SGPR spilling with scalar stores nThis avoids the nasty problems caused by using memory instructions that read the exec mask while spilling / restoring registers used for control flow masking, but only for VI when these were added. This always uses the scalar stores when enabled currently, but it may be better to still try to spill to a VGPR and use this on the fallback memory path. The cache also needs to be flushed before wave termination if a scalar store is used. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286766 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-13 18:20:54 +00:00
Konstantin Zhuravlyov	9027123253	[AMDGPU] Add f16 support (VI+) Differential Revision: https://reviews.llvm.org/D25975 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286753 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-13 07:01:11 +00:00
Tom Stellard	da5a5c79fa	AMDGPU/SI: Promote i16 = fp_[us]int f32 for VI Summary: This fixes a regression caused by r286464. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26570 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286687 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-12 00:19:11 +00:00
Tom Stellard	6687aabcf5	AMDGPU/SI: Fix visit order assumption in SIFixSGPRCopies Summary: This pass was assuming that when a PHI instruction defined a register used by another PHI instruction that the defining insstruction would be legalized before the using instruction. This assumption was causing the pass to not legalize some PHI nodes within divergent flow-control. This fixes a bug that was uncovered by r285762. Reviewers: nhaehnle, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26303 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286676 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-11 23:35:42 +00:00
Matthias Braun	ee5205bfae	ScheduleDAGInstrs: Add condjump deps to addSchedBarrierDeps() addSchedBarrierDeps() is supposed to add use operands to the ExitSU node. The current implementation adds uses for calls/barrier instruction and the MBB live-outs in all other cases. The use operands of conditional jump instructions were missed. Also added code to macrofusion to set the latencies between nodes to zero to avoid problems with the fusing nodes lingering around in the pending list now. Differential Revision: https://reviews.llvm.org/D25140 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286544 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-11 01:34:21 +00:00
Stanislav Mekhanoshin	a0c045c407	Revert "[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies" This reverts commit r286171, it breaks piglit test fs-discard-exit-2 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286530 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-11 00:22:34 +00:00
Yaxun Liu	a2ee7d2991	AMDGPU: Emit runtime metadata as a note element in .note section Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata. However there is a standard way to convey vendor specific information about how to run an ELF binary, which is called vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html). This patch lets AMDGPU backend emits runtime metadata as a note element in .note section. Differential Revision: https://reviews.llvm.org/D25781 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286502 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-10 21:18:49 +00:00
Tom Stellard	0deee390af	AMDGPU: Add VI i16 support Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286464 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-10 16:02:37 +00:00
Stanislav Mekhanoshin	c312996e7a	[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies Codegen prepare sinks comparisons close to a user is we have only one register for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions. Changed BE to report we have many condition registers. That way IR LICM pass would hoist an invariant comparison out of a loop and codegen prepare will not sink it. With that done a condition is calculated in one block and used in another. Current behavior is to store workitem's condition in a VGPR using v_cndmask and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue a forward propagation of a v_cmp 64 bit result to an user is implemented. Additional side effect of this is that we may consume less VGPRs in a cost of more SGPRs in case if holding of multiple conditions is needed, and that is a clear win in most cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286171 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-07 23:04:50 +00:00
Matt Arsenault	f577de357a	AMDGPU: Remove unnecessary and on conditional branch The comment explaining why this was necessary is incorrect in its description of v_cmp's behavior for inactive workitems. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286134 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-07 19:09:33 +00:00
Tom Stellard	ae152bde49	Revert "AMDGPU: Add VI i16 support" This reverts commit r285939 and r285948. These broke some conformance tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285995 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-04 13:06:34 +00:00
Tom Stellard	7c173dd5fa	AMDGPU: Add VI i16 support Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285939 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-03 17:13:50 +00:00
Alexander Timofeev	d767830f39	[AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads. hange explores the fact that LDS reads may be reordered even if access the same location. Prior the change, algorithm immediately stops as soon as any memory access encountered between loads that are expected to be merged together. Although, Read-After-Read conflict cannot affect execution correctness. Improves hcBLAS CGEMM manually loop-unrolled kernels performance by 44%. Also improvement expected on any massive sequences of reads from LDS. Differential Revision: https://reviews.llvm.org/D25944 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285919 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-03 14:37:13 +00:00
Matt Arsenault	0786e7941b	AMDGPU: Cleanup some xfailed tests Some of these are already fixed or tested somewhere else. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285840 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-02 17:24:54 +00:00
Matt Arsenault	0a892bb4c2	BranchRelaxation: Fix computing indirect branch block size git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285828 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-02 16:18:29 +00:00
Matt Arsenault	f8e2c0ab06	AMDGPU: Use brev for materializing SGPR constants This is already done with VGPR immediates and saves 4 bytes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285765 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-01 23:14:20 +00:00
Matt Arsenault	e9a23c4c57	AMDGPU: Default to using scalar mov to materialize immediate This is the conservatively correct way because it's easy to move or replace a scalar immediate. This was incorrect in the case when the register class wasn't known from the static instruction definition, but still needed to be an SGPR. The main example of this is inlineasm has an SGPR constraint. Also start verifying the register classes of inlineasm operands. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285762 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-01 22:55:07 +00:00
Konstantin Zhuravlyov	a2791e7e20	[AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32 This will prevent following regression when enabling i16 support (D18049): test/CodeGen/AMDGPU/ctlz.ll test/CodeGen/AMDGPU/ctlz_zero_undef.ll Differential Revision: https://reviews.llvm.org/D25802 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285716 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-01 17:49:33 +00:00
Tom Stellard	2b70009a94	AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64 I wanted to implement this as a target independent expansion, however when targets say they want to expand FP_TO_FP16 what they actually want is the unsafe math expansion when possible and expansion to a libcall in all other cases. The only way to make this work as a target independent would be to add logic to target's TargetLowering construction to mark theses nodes as Expand when LegalizeDAG can use the unsafe expansion and mark them as LibCall when it cannot. I think this would be possible, but I think it would be too fragile and complex as it would require targets to keep their expansion logic up to date with the code in LegalizeDAG. Reviewers: bogner, ab, t.p.northover, arsenm Subscribers: wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D25999 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285704 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-01 16:31:48 +00:00
Valery Pykhtin	a66e032afa	[AMDGPU] Expand vector mulhu/mulhs Differential revision: https://reviews.llvm.org/D26077 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285684 91177308-0d34-0410-b5e6-96231b3b80d8	2016-11-01 10:26:48 +00:00
Matt Arsenault	ac5efca3f0	AMDGPU: Use 1/2pi inline imm on VI I'm guessing at how it is supposed to be printed git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285490 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-29 04:05:06 +00:00
Matt Arsenault	d6028cdcc7	AMDGPU: Add definitions for scalar store instructions Also add glc bit to the scalar loads since they exist on VI and change the caching behavior. This currently has an assembler bug where the glc bit is incorrectly accepted on SI/CI which do not have it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285463 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-28 21:55:15 +00:00
Matt Arsenault	0e18bbf16a	AMDGPU: Change check prefix in test git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285449 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-28 20:33:01 +00:00
Matt Arsenault	6cabc8f486	AMDGPU: Diagnose using too many SGPRs This is possible when using inline asm. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285447 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-28 20:31:47 +00:00
Matt Arsenault	2d7bc6b1e1	AMDGPU: Fix using incorrect private resource with no allocation It's possible to have a use of the private resource descriptor or scratch wave offset registers even though there are no allocated stack objects. This would result in continuing to use the maximum number reserved registers. This could go over the number of SGPRs available on VI, or violate the SGPR limit requested by the function attributes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285435 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-28 19:43:31 +00:00
Nicolai Haehnle	ce0a5230f5	AMDGPU: Fix SILoadStoreOptimizer when writes cannot be merged due register dependencies Summary: When finding a match for a merge and collecting the instructions that must be moved, keep in mind that the instruction we merge might actually use one of the defs that are being moved. Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect]. The fact that the ds_read in the test case is not eliminated suggests that there might be another problem related to alias analysis, but that's a separate problem: this pass should still work correctly even when earlier optimization passes missed something or were disabled. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25829 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285273 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-27 08:15:07 +00:00
Yaxun Liu	c93e472923	AMDGPU: Refactor processor definition to use ISA version features Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend. Refactor processor definition to use ISA version features. Fixed ISA version for stoney. Based on Laurent Morichetti's patch. Differential Revision: https://reviews.llvm.org/D25919 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285210 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-26 16:37:56 +00:00
Matt Arsenault	f63894ba9e	Reapply "AMDGPU: Don't use offen if it is 0" This reverts r283003 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285203 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-26 15:08:16 +00:00
Tom Stellard	1a633d108a	AMDGPU/SI: Don't emit multi-dword flat memory ops when they might access scratch Summary: A single flat memory operations that might access the scratch buffer can only access MaxPrivateElementSize bytes. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25788 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285198 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-26 14:38:47 +00:00

1 2 3 4 5 ...

676 Commits