archived-llvm

mirror of https://github.com/RPCSX/llvm.git synced 2026-01-31 01:05:23 +01:00

Author	SHA1	Message	Date
Nirav Dave	3bbf394145	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting with compiler time improvements Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. 
This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297695 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-14 00:34:14 +00:00
Matt Arsenault	32cb946c46	AMDGPU: Treat 0 as private null pointer in addrspacecast lowering git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297658 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-13 19:47:31 +00:00
Matt Arsenault	a8ffe4b37c	AMDGPU: Remove packf16 intrinsic git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297557 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-11 05:51:16 +00:00
Matt Arsenault	dbe625a311	AMDGPU: Keep track of modifiers when converting v_mac to v_mad Since v_max_f32_e64/v_max_f16_e64 can be folded if the target instruction supports the clamp bit, we also need to maintain modifiers when converting v_mac to v_mad. This fixes a rendering issue with Dirt Rally because a v_mac instruction with the clamp bit set was converted to a v_mad but that bit was lost during the conversion. Fixes: `e184e01dd7` ("AMDGPU: Fold FP clamp as modifier bit") Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297556 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-11 05:40:40 +00:00
Stanislav Mekhanoshin	3081264dbe	[AMDGPU] Remove getBidirectionalReasonRank This method inverts the Reason field of a scheduling candidate. It does right comparison between RegCritical and RegExcess, but everything else is broken. In fact it can prefer less strong reason such as Weak over RegCritical because Weak > -RegCritical. The CandReason enum is properly sorted, so just remove artificial ranking. Differential Revision: https://reviews.llvm.org/D30557 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297536 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-11 00:29:27 +00:00
Matt Arsenault	6d62c71357	DAG: Check no signed zeros instead of unsafe math attribute git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297354 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-09 01:36:39 +00:00
Matt Arsenault	f06f68a796	AMDGPU: Don't wait at end of block with a trivial successor If there is only one successor, and that successor only has one predecessor the wait can obviously be delayed until uses or the end of the next block. This avoids code quality regressions when there are trivial fallthrough blocks inserted for structurization. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297251 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-08 01:06:58 +00:00
Matt Arsenault	fc8387b8d1	AMDGPU: Constant fold rcp node When doing arcp optimization with a constant denominator, this was leaving behind rcps with constant inputs. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297248 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-08 00:48:46 +00:00
Changpeng Fang	2e729706f1	AMDGPU/SI: Do not insert EndCf in an unreachable block Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D22025 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297243 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-07 23:29:36 +00:00
Konstantin Zhuravlyov	58580c59ae	Revert "AMDGPU: Set MCAsmInfo::PointerSize" It breaks line tables because the patch is not complete, working on a complete one at the moment This reverts commit r294031. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297118 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-07 04:44:33 +00:00
Jan Vesely	ec8e013baa	AMDGPU/R600: Fix ALU clause markers use detection also exit early on kill instead of redefinition. Differential Revision: https://reviews.llvm.org/D30230 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297060 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-06 20:10:05 +00:00
Chandler Carruth	f970832c3b	[SDAG] Revert r296476 (and r296486, r296668, r296690). This patch causes compile times for some patterns to explode. I have a (large, unreduced) test case that slows down by more than 20x and several test cases slow down by 2x. I'm sending some of the test cases directly to Nirav and following up with more details in the review log, but this should unblock anyone else hitting this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296862 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-03 10:02:25 +00:00
Tobias Grosser	2f35f8a7a2	Revert "AMDGPU: Re-do update for branch-relaxation test" This commit also relied on r296812, which I just reverted. We should probably apply it again, after the r296812 has been discussed and been reapplied in some variant. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296820 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-02 21:47:51 +00:00
Matthias Braun	0e90d42fce	LiveRegMatrix: Fix some subreg interference checks Surprisingly, one of the three interference checks in LiveRegMatrix was using the main live range instead of the appropriate subregister range, resulting in unnecessarily conservative results. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296722 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-02 00:35:08 +00:00
Artur Pilipenko	a4e52312e8	[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336). Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296651 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-01 18:12:29 +00:00
Matt Arsenault	eaeac1f1aa	AMDGPU: Re-do update for branch-relaxation test Modify the test so that it is still testing something closer to what it was intended to originally. I think the original intent was to test the situation where there was a branch on execz and then unconditional branch required relaxing. With the change in r296539, there was no longer an execz branch. Change the test so that there is now an execz branch inserted. There is no longer an unconditional branch after the execz branch, so this might need to be tricked in some other way to keep that there. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296574 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-01 03:36:04 +00:00
Daniel Berlin	f5d4310169	Update AMDGPU test branch-relaxation.ll for changes after post-dom fixes git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296539 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-28 23:35:24 +00:00
Nirav Dave	bfdb3f2a5a	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. 
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296476 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-28 14:24:15 +00:00
Matt Arsenault	dd2186aaab	AMDGPU: Use v_med3_{f16|i16|u16} git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296401 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-27 22:40:39 +00:00
Matt Arsenault	27f4f2f4bc	AMDGPU: Support v2i16/v2f16 packed operations git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296396 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-27 22:15:25 +00:00
Matt Arsenault	a4e4156e12	AMDGPU: Support inlineasm for packed instructions Add packed types as legal so they may be used with inlineasm. Keep all operations expanded for now. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296379 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-27 20:52:10 +00:00
Matt Arsenault	132ab30572	AMDGPU: Don't fold immediate if clamp/omod are set Doesn't fix any practical problems because clamp/omod are currently folded after peephole optimizer. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296375 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-27 20:21:31 +00:00
Matt Arsenault	dd23defd5c	AMDGPU: Fold omod into instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296372 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-27 19:35:42 +00:00
Matt Arsenault	29df731fe5	AMDGPU: Add f16 to shader calling conventions Mostly useful for writing tests for f16 features. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296370 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-27 19:24:47 +00:00
Nirav Dave	b89cc7e5e3	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r296252 until 256-bit operations are more efficiently generated in X86. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296279 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-26 01:27:32 +00:00
Nirav Dave	32147cef64	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. 
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296252 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-25 11:43:58 +00:00
Wei Ding	5d1e915557	AMDGPU : Replace FMAD with FMA when denormals are enabled. Differential Revision: http://reviews.llvm.org/D29958 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296186 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-24 23:00:29 +00:00
Stanislav Mekhanoshin	fef0dbe59c	Revert "Correct register pressure calculation in presence of subregs" This reverts commit r296009. It broke one out of tree target and also does not account for all partial lines added or removed when calculating PressureDiff. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296182 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-24 21:56:16 +00:00
Sanjay Patel	9a9478ccb0	[DAGCombiner] add missing folds for scalar select of {-1,0,1} The motivation for filling out these select-of-constants cases goes back to D24480, where we discussed removing an IR fold from add(zext) --> select. And that goes back to: https://reviews.llvm.org/rL75531 https://reviews.llvm.org/rL159230 The idea is that we should always canonicalize patterns like this to a select-of-constants in IR because that's the smallest IR and the best for value tracking. Note that we currently do the opposite in some cases (like the cases in this patch). Ie, the proposed folds in this patch already exist in InstCombine today: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151 As this patch shows, most targets generate better machine code for simple ext/add/not ops rather than a select of constants. So the follow-up steps to make this less of a patchwork of special-case folds and missing IR canonicalization: 1. Have DAGCombiner convert any select of constants into ext/add/not ops. 2 Have InstCombine canonicalize in the other direction (create more selects). Differential Revision: https://reviews.llvm.org/D30180 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296137 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-24 17:17:33 +00:00
Stanislav Mekhanoshin	0bf4d71d50	Correct register pressure calculation in presence of subregs If a subreg is used in an instruction it counts as a whole superreg for the purpose of register pressure calculation. This patch corrects improper register pressure calculation by examining operand's lane mask. Differential Revision: https://reviews.llvm.org/D29835 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296009 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-23 20:19:44 +00:00
Jan Vesely	dae323db22	AMDGPU/SI: Fix trunc i16 pattern Hit on ASICs that support 16bit instructions. Differential Revision: https://reviews.llvm.org/D30281 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295990 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-23 16:12:21 +00:00
Matt Arsenault	32a81bbff2	AMDGPU: Add another BFE pattern This is the pattern that falls out of the instruction's definition if offset == 0. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295912 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-23 00:23:43 +00:00
Matt Arsenault	cd39b42cab	AMDGPU: Use clamp with f64 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295908 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-22 23:53:37 +00:00
Matt Arsenault	e184e01dd7	AMDGPU: Fold FP clamp as modifier bit The manual is unclear on the details of this. It's not clear to me if denormals are not allowed with clamp, or if that is only omod. Not allowing denorms for fp16 or fp64 isn't useful so I also question if that is really a restriction. Same with whether this is valid without IEEE mode enabled. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295905 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-22 23:27:53 +00:00
Wei Ding	1cfed01e02	AMDGPU : Update TrapCode based on Trap Handler ABI. Differential Revision: http://reviews.llvm.org/D30232 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295904 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-22 23:22:19 +00:00
Matt Arsenault	c2d34b5027	AMDGPU: Add replacement bfe intrinsics git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295899 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-22 23:04:58 +00:00
Matt Arsenault	206dfa3c0d	AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPR This should avoid reporting any stack needs to be allocated in the case where no stack is truly used. An unused stack slot is still left around in other cases where there are real stack objects but no spilling occurs. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295891 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-22 22:23:32 +00:00
Matt Arsenault	c1d17d5f71	AMDGPU: Don't look at chain users when adjusting writemask Fixes not adjusting using new intrinsics with chains. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295878 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-22 21:16:41 +00:00
Matt Arsenault	138d429065	AMDGPU: Always allocate emergency stack slot at offset 0 This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295877 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-22 21:05:25 +00:00
Matt Arsenault	1b020b3be5	AMDGPU: Change exp with compr bit printing git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295873 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-22 20:37:12 +00:00
Wei Ding	9b1c9472f5	Revert "AMDGPU : Update TrapCode based on Trap Handler ABI." This reverts commit r295867. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295871 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-22 20:29:22 +00:00
Wei Ding	d70493f450	AMDGPU : Update TrapCode based on Trap Handler ABI. Differential Revision: http://reviews.llvm.org/D30232 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295867 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-22 20:05:06 +00:00
Bill Seurer	6ef315bddb	[DAGCombiner] revert r295336 r295336 causes a bootstrapped clang to fail for many compilations on powerpc BE. See http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315 for example. Reverting as per the developer's request. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295849 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-22 16:27:33 +00:00
Matt Arsenault	7d65faa5cc	AMDGPU: Add cvt.pkrtz intrinsic Convert llvm.SI.packf16 test uses git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295797 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-22 00:27:34 +00:00
Matt Arsenault	7d9379397a	AMDGPU: Remove some uses of llvm.SI.export in tests Merge some of the old, smaller tests into more complete versions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295792 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-22 00:02:21 +00:00
Matt Arsenault	6de2a82753	AMDGPU: Remove llvm.AMDGPU.clamp intrinsic git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295789 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-21 23:46:04 +00:00
Matt Arsenault	aac82e218f	AMDGPU: Redefine clamp node as clamp 0.0-1.0 Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295788 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-21 23:35:48 +00:00
Matt Arsenault	8a7ccd7129	AMDGPU: Remove dead declarations in tests git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295757 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-21 19:31:33 +00:00
Matt Arsenault	f2616d2fd3	AMDGPU: Remove llvm.AMDGPU.flbit intrinsic git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295754 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-21 19:27:33 +00:00
Matt Arsenault	bcb6a77aca	AMDGPU: Don't use stack space for SGPR->VGPR spills Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295753 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-21 19:12:08 +00:00
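
The [DAGCombiner] entry 9a9478ccb0 above describes replacing a scalar select of the constants {-1,0,1} with ext/add/not ops. A minimal sketch of the arithmetic identities behind folds of that kind follows. It is plain C++ rather than SelectionDAG code, the helper names `zext`, `sext`, `sel` and `foldsHold` are hypothetical, and the exact set of folds in the patch is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not LLVM APIs): an i1 condition is modeled as bool,
// and the result type as a 32-bit signed integer.
constexpr int32_t zext(bool c) { return c ? 1 : 0; }   // zero-extend i1 -> i32
constexpr int32_t sext(bool c) { return c ? -1 : 0; }  // sign-extend i1 -> i32

// Stand-in for a scalar select: yields t when c is true, otherwise f.
constexpr int32_t sel(bool c, int32_t t, int32_t f) { return c ? t : f; }

// Returns true if each select-of-constants equals its ext/not/or form
// for the given condition value.
constexpr bool foldsHold(bool c) {
  return sel(c, 1, 0) == zext(c) &&          // select c, 1, 0   == zext c
         sel(c, -1, 0) == sext(c) &&         // select c, -1, 0  == sext c
         sel(c, 0, 1) == zext(!c) &&         // select c, 0, 1   == zext (not c)
         sel(c, 0, -1) == sext(!c) &&        // select c, 0, -1  == sext (not c)
         sel(c, 1, -1) == (sext(!c) | 1) &&  // select c, 1, -1  == or (sext (not c)), 1
         sel(c, -1, 1) == (sext(c) | 1);     // select c, -1, 1  == or (sext c), 1
}

// The condition has only two values, so checking both is exhaustive.
static_assert(foldsHold(true), "identities hold for a true condition");
static_assert(foldsHold(false), "identities hold for a false condition");
```

Because the condition is a single bit, the two `static_assert` lines cover every case at compile time. Whether the right-hand forms are actually cheaper is target-dependent, which is the point the commit message makes about generated machine code.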


902 Commits