archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Matt Arsenault	fadb61df65	AMDGPU: Recompute scc liveness The various scalar bit operations set SCC, so one is erased or moved it needs to be recomputed. Not sure why the existing tests don't fail on this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312819 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-08 18:51:26 +00:00
Matt Arsenault	0bb6355f63	AMDGPU: Start selecting v_mad_mix_f32 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312732 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 18:05:07 +00:00
Konstantin Zhuravlyov	3964b8bfc8	AMDGPU: Handle non-temporal loads and stores Differential Revision: https://reviews.llvm.org/D36862 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312729 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 17:14:54 +00:00
Matt Arsenault	ca22b05483	AMDGPU: Don't legalize i16 extloads to i32 with legal i16 Keeping non-i16 extloads makes it easier to match some new gfx9 load instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312699 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-07 05:37:34 +00:00
Stanislav Mekhanoshin	6148c30603	[AMDGPU] Use v_pk_max_f16 for fcanonicalize Differential Revision: https://reviews.llvm.org/D37325 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312676 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 22:27:29 +00:00
Stanislav Mekhanoshin	953b70393a	[AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize Differential Revision: https://reviews.llvm.org/D37522 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312660 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 18:29:51 +00:00
Stanislav Mekhanoshin	651c4efd77	[AMDGPU] Fix shouldClusterMemOps to process flat loads Flat loads do not have vdata operand but have vdst instead. Differential Revision: https://reviews.llvm.org/D37502 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312640 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 15:31:30 +00:00
Nicolai Haehnle	adf1cb63f2	AMDGPU: Make worst-case assumption about the wait states in inline assembly Summary: Mesa still uses a hack where empty inline assembly is used as a kind of optimization barrier. This exposed a problem where not enough wait states were inserted, because the hazard recognizer implicitly assumed that each inline assembly "instruction" has at least one wait state. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37205 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312635 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 13:50:13 +00:00
Yaxun Liu	1e1d0b01c1	[AMDGPU] Transform __read_pipe_* and __write_pipe_* When packet size equals packet align and is power of 2, transform __read_pipe* and __write_pipe* to specialized library function. Differential Revision: https://reviews.llvm.org/D36831 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312598 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-06 00:30:27 +00:00
Matt Arsenault	4e0c4fb9c1	AMDGPU: Fix not accounting for tail call resource usage If the only call in a function is a tail call, the function isn't considered to have a call since it's a type of return. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312561 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-05 18:36:36 +00:00
Simon Pilgrim	b474446ca8	[AMDGPU] Added extra test checks to make D19325 diff clearer git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312537 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-05 14:32:06 +00:00
Sam McCall	c7c869be7e	Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" This crashes on boringSSL on PPC (will send reduced testcase) This reverts commit r312328. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312490 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-04 15:47:00 +00:00
Stanislav Mekhanoshin	99106502b7	[AMDGPU] Testcase for computeKnownBits recursion. NFC. Testcase for rL312364: [AMDGPU] Prevent infinite recursion in DAG.computeKnownBits() git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312388 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-01 22:25:22 +00:00
Nicolai Haehnle	96b6414540	AMDGPU: IMPLICIT_DEFs and DBG_VALUEs do not contribute to wait states Summary: This fixes a bug that was exposed on gfx9 in various GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests, e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D36193 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312337 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-01 16:56:32 +00:00
Geoff Berry	d168a77ec3	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues addressed since original review: - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312328 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-01 14:27:20 +00:00
Matt Arsenault	fcd77e8a04	AMDGPU: Fold clamp modifier for packed instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312297 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-31 23:53:50 +00:00
Hans Wennborg	92b6b153a4	Revert r312154 "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!") > Issues identified by buildbots addressed since original review: > - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. > - The pass no longer forwards COPYs to physical register uses, since > doing so can break code that implicitly relies on the physical > register number of the use. > - The pass no longer forwards COPYs to undef uses, since doing so > can break the machine verifier by creating LiveRanges that don't > end on a use (since the undef operand is not considered a use). > > [MachineCopyPropagation] Extend pass to do COPY source forwarding > > This change extends MachineCopyPropagation to do COPY source forwarding. > > This change also extends the MachineCopyPropagation pass to be able to > be run during register allocation, after physical registers have been > assigned, but before the virtual registers have been re-written, which > allows it to remove virtual register COPY LiveIntervals that become dead > through the forwarding of all of their uses. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312178 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-30 22:11:37 +00:00
Geoff Berry	62c7c252f8	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues identified by buildbots addressed since original review: - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312154 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-30 18:41:07 +00:00
Balaram Makam	b3e25cea9c	Re-land MachineInstr: Reason locally about some memory objects before going to AA. Summary: Reverts r311008 to reinstate r310825 with a fix. Refine alias checking for pseudo vs value to be conservative. This fixes the original failure in builtbot unittest SingleSource/UnitTests/2003-07-09-SignedArgs. Reviewers: hfinkel, nemanjai, efriedma Reviewed By: efriedma Subscribers: bjope, mcrosier, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D36900 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312126 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-30 14:57:12 +00:00
Stanislav Mekhanoshin	0a4c4f2a1d	[AMDGPU] Use v_max_f* for fcanonicalize If denorms are not flushed we can use max instead of multiplication by 1. For double that is simply faster, while for float and half it is shorter, because mul uses constant bus and VOP3. Differential Revision: https://reviews.llvm.org/D36856 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312095 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-30 03:03:38 +00:00
Matt Arsenault	90ea18a51f	AMDGPU: Select clamp pattern with v2f16 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312087 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-30 01:20:17 +00:00
Stanislav Mekhanoshin	9324a77aa4	[AMDGPU] Fix regression in AMDGPULibCalls allowing native for doubles Under -cl-fast-relaxed-math we could use native_sqrt, but f64 was allowed to produce HSAIL's nsqrt instruction. HSAIL is not here and we stick with non-existing native_sqrt(double) as a result. Add check for f64 to not return native functions and also remove handling of f64 case for fold_sqrt. Differential Revision: https://reviews.llvm.org/D37223 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311900 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-28 18:00:08 +00:00
Stanislav Mekhanoshin	f4dd1bdd9a	[AMDGPU] computeKnownBitsForTargetNode for 24 bit mul Differential Revision: https://reviews.llvm.org/D37168 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311896 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-28 16:35:37 +00:00
Matt Arsenault	1a6aed20fa	IPRA: Don't assume called function is first call operand Fixes not finding the called global for AMDGPU call pseudoinstructions, which prevented IPRA from doing much. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311637 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-24 07:55:15 +00:00
Geoff Berry	6c9f36933c	Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" round 2 This reverts commit r311135. sanitizer-x86_64-linux-android buildbot is timing out with just this patch applied. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311142 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-18 01:43:11 +00:00
Geoff Berry	d93db263e5	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Two issues identified by buildbots were addressed: - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. Reviewers: qcolombet, javed.absar, MatzeB, jonpa Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny Differential Revision: https://reviews.llvm.org/D30751 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311135 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-17 23:06:55 +00:00
Geoff Berry	a6a5be21df	Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" This reverts commit r311038. Several buildbots are breaking, and at least one appears to be due to the forwarding of physical regs enabled by this change. Reverting while I investigate further. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311062 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-17 04:04:11 +00:00
Geoff Berry	31db6f3bd2	[MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. Reviewers: qcolombet, javed.absar, MatzeB, jonpa Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny Differential Revision: https://reviews.llvm.org/D30751 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311038 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-16 20:50:01 +00:00
Balaram Makam	edd00a7e54	Revert "MachineInstr: Reason locally about some memory objects before going to AA." r310825 caused the clang-ppc64le-linux-lnt bot to go red (http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/5712) because of a test-suite failure of SingleSource/UnitTests/2003-07-09-SignedArgs This reverts commit `0028f6a872`. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311008 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-16 14:17:43 +00:00
Stanislav Mekhanoshin	2df3fafbea	[AMDGPU] Eliminate no effect instructions before s_endpgm Differential Revision: https://reviews.llvm.org/D36585 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310987 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-16 04:43:49 +00:00
Jakub Kuderski	c0f00a9516	[Dominators] Include infinite loops in PostDominatorTree Summary: This patch teaches PostDominatorTree about infinite loops. It is built on top of D29705 by @dberlin which includes a very detailed motivation for this change. What's new is that the patch also teaches the incremental updater how to deal with reverse-unreachable regions and how to properly maintain and verify tree roots. Before that, the incremental algorithm sometimes ended up preserving reverse-unreachable regions after updates that wouldn't appear in the tree if it was constructed from scratch on the same CFG. This patch makes the following assumptions: - A sequence of updates should produce the same tree as a recalculating it. - Any sequence of the same updates should lead to the same tree. - Siblings and roots are unordered. The last two properties are essential to efficiently perform batch updates in the future. When it comes to the first one, we can decide later that the consistency between freshly built tree and an updated one doesn't matter match, as there are many correct ways to pick roots in infinite loops, and to relax this assumption. That should enable us to recalculate postdominators less frequently. This patch is pretty conservative when it comes to incremental updates on reverse-unreachable regions and ends up recalculating the whole tree in many cases. It should be possible to improve the performance in many cases, if we decide that it's important enough. That being said, my experiments showed that reverse-unreachable are very rare in the IR emitted by clang when bootstrapping clang. Here are the statistics I collected by analyzing IR between passes and after each removePredecessor call: ``` # functions: 52283 # samples: 337609 # reverse unreachable BBs: 216022 # BBs: 247840796 Percent reverse-unreachable: 0.08716159869015269 % Max(PercRevUnreachable) in a function: 87.58620689655172 % # > 25 % samples: 471 ( 0.1395104988314885 % samples ) ... in 145 ( 0.27733680163724345 % functions ) ``` Most of the reverse-unreachable regions come from invalid IR where it wouldn't be possible to construct a PostDomTree anyway. I would like to commit this patch in the next week in order to be able to complete the work that depends on it before the end of my internship, so please don't wait long to voice your concerns :). Reviewers: dberlin, sanjoy, grosser, brzycki, davide, chandlerc, hfinkel Reviewed By: dberlin Subscribers: nhaehnle, javed.absar, kparzysz, uabelho, jlebar, hiraditya, llvm-commits, dberlin, david2050 Differential Revision: https://reviews.llvm.org/D35851 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310940 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-15 18:14:57 +00:00
Balaram Makam	0028f6a872	MachineInstr: Reason locally about some memory objects before going to AA. This addresses a FIXME in MachineInstr::mayAlias. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310825 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-14 09:41:40 +00:00
Matt Arsenault	45424dbebb	AMDGPU: Start adding tail call support Handle the sibling call cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310753 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-11 20:42:08 +00:00
Stanislav Mekhanoshin	9643e6bd78	[AMDGPU] Ported and adopted AMDLibCalls pass The pass does simplifications of well known AMD library calls. If given -amdgpu-prelink option it works in a pre-link mode which allows to reference new library functions which will be linked in later. In addition it also used to process traditional AMD option -fuse-native which allows to replace some of the functions with their fast native implementations from the library. The necessary glue to pass the prelink option and translate -fuse-native is to be added to the driver. Differential Revision: https://reviews.llvm.org/D36436 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310731 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-11 16:42:09 +00:00
Matt Arsenault	e695a23276	AMDGPU: Fix assert on n inline asm constraint git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310515 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-09 20:09:35 +00:00
Connor Abbott	8e92b2817e	[AMDGPU] Add llvm.amdgpu.update.dpp intrinsic Summary: Now that we've made all the necessary backend changes, we can add a new intrinsic which exposes the new capabilities to IR producers. Since llvm.amdgpu.update.dpp is a strict superset of llvm.amdgpu.mov.dpp, we should deprecate the former. We also add tests for all the functionality that was added in previous changes, now that we can access it via an IR construct. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34718 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310399 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-08 18:52:22 +00:00
Connor Abbott	4026c85e90	[AMDGPU] Add pseudo "old" source to all DPP instructions Summary: All instructions with the DPP modifier may not write to certain lanes of the output if bound_ctrl=1 is set or any bits in bank_mask or row_mask aren't set, so the destination register may be both defined and modified. The right way to handle this is to add a constraint that the destination register is the same as one of the inputs. We could tie the destination to the first source, but that would be too restrictive for some use-cases where we want the destination to be some other value before the instruction executes. Instead, add a fake "old" source and tie it to the destination. Effectively, the "old" source defines what value unwritten lanes will get. We'll expose this functionality to users with a new intrinsic later. Also, we want to use DPP instructions for computing derivatives, which means we need to set WQM for them. We also need to enable the entire wavefront when using DPP intrinsics to implement nonuniform subgroup reductions, since otherwise we'll get incorrect results in some cases. To accomodate this, add a new operand to all DPP instructions which will be interpreted by the SI WQM pass. This will be exposed with a new intrinsic later. We'll also add support for Whole Wavefront Mode later. I also fixed llvm.amdgcn.mov.dpp to overwrite the source and fixed up the test. However, I could also keep the old behavior (where lanes that aren't written are undefined) if people want it. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34716 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310283 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-07 19:10:56 +00:00
Matt Arsenault	e525b474ef	AMDGPU: Remove -mcpu=SI Leftover from before amdgcn/r600 split. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310277 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-07 18:30:35 +00:00
Matt Arsenault	2e48864110	AMDGPU: Cleanup subtarget features Try to avoid mutually exclusive features. Don't use a real default GPU, and use a fake "generic". The goal is to make it easier to see which set of features are incompatible between feature strings. Most of the test changes are due to random scheduling changes from not having a default fullspeed model. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310258 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-07 14:58:04 +00:00
Matt Arsenault	e1f2d6cc65	IPRA: Don't crash on null getCallPreservedMask Kernels aren't callable, so they don't have a call preserved mask. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310172 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-05 07:50:18 +00:00
Connor Abbott	c300b1a6d3	[AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic Summary: This intrinsic lets us set inactive lanes to an identity value when implementing wavefront reductions. In combination with Whole Wavefront Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan. Lowering the intrinsic needs to happen post-RA so that RA knows that the destination isn't completely overwritten due to the EXEC shenanigans, so we need another pseudo-instruction to represent the un-lowered intrinsic. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34719 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310088 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-04 18:36:54 +00:00
Connor Abbott	ecf573917a	[AMDGPU] Add support for Whole Wavefront Mode Summary: Whole Wavefront Wode (WWM) is similar to WQM, except that all of the lanes are always enabled, regardless of control flow. This is required for implementing wavefront reductions in non-uniform control flow, where we need to use the inactive lanes to propagate intermediate results, so they need to be enabled. We need to propagate WWM to uses (unless they're explicitly marked as exact) so that they also propagate intermediate results correctly. We do the analysis and exec mask munging during the WQM pass, since there are interactions with WQM for things that require both WQM and WWM. For simplicity, WWM is entirely block-local -- blocks are never WWM on entry or exit of a block, and WWM is not propagated to the block level. This means that computations involving WWM cannot involve control flow, but we only ever plan to use WWM for a few limited purposes (none of which involve control flow) anyways. Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There isn't yet a way to turn WWM off -- that will be added in a future change. Finally, it turns out that turning on inactive lanes causes a number of problems with register allocation. While the best long-term solution seems like teaching LLVM's register allocator about predication, for now we need to add some hacks to prevent ourselves from getting into trouble due to constraints that aren't currently expressed in LLVM. For the gory details, see the comments at the top of SIFixWWMLiveness.cpp. Reviewers: arsenm, nhaehnle, tpr Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D35524 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310087 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-04 18:36:52 +00:00
Connor Abbott	7af89e579d	[AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM Summary: Previously, we assumed that certain types of instructions needed WQM in pixel shaders, particularly DS instructions and image sampling instructions. This was ok because with OpenGL, the assumption was correct. But we want to start using DPP instructions for derivatives as well as other things, so the assumption that we can infer whether to use WQM based on the instruction won't continue to hold. This intrinsic lets frontends like Mesa indicate what things need WQM based on their knowledge of the API, rather than second-guessing them in the backend. We need to keep around the old method of enabling WQM, but eventually we should remove it once Mesa catches up. For now, this will let us use DPP instructions for computing derivatives correctly. Reviewers: arsenm, tpr, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35167 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310085 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-04 18:36:49 +00:00
Stanislav Mekhanoshin	6069fa8531	[AMDGPU] Preserve inverted bit in SI_IF in presence of SI_KILL In case if SI_KILL is in between of the SI_IF and SI_END_CF we need to preserve the bits actually flipped by if rather then restoring the original mask. Differential Revision: https://reviews.llvm.org/D36299 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310031 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-04 06:58:42 +00:00
Connor Abbott	1a05d247fa	[AMDGPU] Add missing hazard for DPP-after-EXEC-write Summary: Following the docs, we need at least 5 wait states between an EXEC write and an instruction that uses DPP. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34849 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310013 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-04 01:09:43 +00:00
Matt Arsenault	0856e7acd5	AMDGPU: Don't use report_fatal_error for unsupported call types git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310004 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-03 23:32:41 +00:00
Matt Arsenault	dfb2cfedfc	AMDGPU: Remove error on calls for amdgcn Repurpose the -amdgpu-function-calls flag. Rather than require it to emit a call, only use it to run the always inline path or not. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310003 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-03 23:24:05 +00:00
Matt Arsenault	688929ea0f	AMDGPU: Fix implicitarg.ptr handling special inputs git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310002 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-03 23:12:44 +00:00
Matt Arsenault	c60159767d	AMDGPU: Pass special input registers to functions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309998 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-03 23:00:29 +00:00
Connor Abbott	6615aeaf28	test commit git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309981 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-03 20:22:30 +00:00

1 2 3 4 5 ...

1198 Commits