archived-llvm

mirror of https://github.com/RPCSX/llvm.git synced 2026-01-31 01:05:23 +01:00

Author	SHA1	Message	Date
Alexander Timofeev	23db8abf86	Revert "[AMDGPU] Fix for SIMachineScheduler crash. SI Scheduler should track" This reverts commit `ce06d9cb99`. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295054 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-14 14:29:05 +00:00
Wei Ding	c75c94d0eb	AMDGPU : Add trap handler support. Differential Revision: http://reviews.llvm.org/D26010 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294692 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-10 02:15:29 +00:00
Stanislav Mekhanoshin	bcd8c96d2e	[AMDGPU] Override PSet for M0 This change returns empty PSet list for M0 register. Otherwise its PSet as defined by tablegen is SReg_32. This results in incorrect register pressure calculation every time an instruction uses M0. Such uses count as SReg_32 PSet and inadequately increase pressure on SGPRs. Differential Revision: https://reviews.llvm.org/D29798 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294691 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-10 02:07:58 +00:00
Matt Arsenault	13384c6c8a	AMDGPU: Add pass to expand memcpy/memmove/memset git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294635 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-09 22:00:42 +00:00
Konstantin Zhuravlyov	6270090e5c	[AMDGPU] Calculate number of min/max SGPRs/VGPRs for WavesPerEU instead of using switch statement Differential Revision: https://reviews.llvm.org/D29741 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294627 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-09 21:33:23 +00:00
Konstantin Zhuravlyov	017228cd76	[AMDGPU] Add target information that is required by tools to metadata Differential Revision: https://reviews.llvm.org/D28760#fb670e28 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294449 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-08 14:05:23 +00:00
Matt Arsenault	6930aa1312	AMDGPU: Enable InferAddressSpaces git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294408 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-08 06:16:04 +00:00
Alexander Timofeev	ce06d9cb99	[AMDGPU] Fix for SIMachineScheduler crash. SI Scheduler should track lane masks. Differential revision: https://reviews.llvm.org/D29442 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294324 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-07 17:57:48 +00:00
Yaxun Liu	d14a49a05f	[AMDGPU] Lower null pointers in static variable initializer For amdgcn target Clang generates addrspacecast to represent null pointers in private and local address spaces. In LLVM codegen, the static variable initializer is lowered by virtual function AsmPrinter::lowerConstant which is target generic. Since addrspacecast is target specific, AsmPrinter::lowerConst This patch overrides AsmPrinter::lowerConstant with AMDGPUAsmPrinter::lowerConstant, which is able to lower the target-specific addrspacecast in the null pointer representation so that -1 is co Differential Revision: https://reviews.llvm.org/D29284 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294265 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-07 00:43:21 +00:00
Brendon Cahoon	f7dc192ed6	[RegisterCoalescer] Do not call getInstructionIndex with DBG_VALUE An assert occurs when calling SlotIndexes::getInstructionIndex with a DBG_VALUE instruction because the function expects an instruction with a slot index. However, there is no slot index for a DBG_VALUE instruction. Differential Revision: https://reviews.llvm.org/D29048 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294070 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-04 00:10:22 +00:00
Matt Arsenault	ee8a0f044d	AMDGPU: Cleanup scalar_to_vector test git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294038 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-03 20:49:48 +00:00
Matt Arsenault	6c28c24b6e	AMDGPU: Set MCAsmInfo::PointerSize git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294031 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-03 20:02:23 +00:00
Matt Arsenault	52b8adef73	AMDGPU: Fold fneg into fmin/fmax_legacy git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293972 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-03 00:51:50 +00:00
Matt Arsenault	1c3956ed62	AMDGPU: Fold fneg into fminnum/fmaxnum git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293968 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-03 00:23:15 +00:00
Konstantin Zhuravlyov	ce0fa7d2ba	llvm-readobj: fix next note entry calculation and print unknown note types Differential Revision: https://reviews.llvm.org/D29131 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293964 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-02 23:44:49 +00:00
Matt Arsenault	aa5760346f	AMDGPU: Check if users of fneg can fold mods In multi-use cases this can save a few instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293962 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-02 23:21:23 +00:00
Nirav Dave	529986a15d	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r293893 which is miscompiling lua on ARM and bootstrapping for x86-windows. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293915 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-02 18:24:55 +00:00
Nirav Dave	99b0642f83	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommitting after fixing X86 inc/dec chain bug. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear but require more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. 
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293893 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-02 14:39:42 +00:00
Matt Arsenault	9f7e91552b	AMDGPU: Use source modifiers with f16->f32 conversions The operand types were defined to fit the fp16_to_fp node, which has the half as an integer type. v_cvt_f32_f16 does support source modifiers, so change this to have an FP type and modifiers. For targets without legal f16, this requires recognizing the bit operations and trying to produce them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293857 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-02 02:27:04 +00:00
Stanislav Mekhanoshin	a1d4ee75a4	[AMDGPU] Account workgroup size in LDS occupancy limits Functions matching LDS use to occupancy return results for a workgroup of 64 workitems. The numbers have to be adjusted for bigger workgroups. For example a workgroup of size 256 already occupies 4 waves just by itself. Given that all numbers of LDS use in the compiler are per workgroup, occupancy shall be multiplied by 4 in this case. Each 64 workitems are still limited by the same number, but 4 subgroups of 64 workitems each can afford 4 times more LDS to get the same occupancy. In addition, this change initializes LDS size in the subtarget to a real value for SI+ targets. This is required since LDS size is a variable in these calculations. Differential Revision: https://reviews.llvm.org/D29423 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293837 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-01 22:59:50 +00:00
Matt Arsenault	982ff7f443	AMDGPU: Improve nsw/nuw/exact when promoting uniform i16 ops These were simply preserving the flags of the original operation, which was too conservative in most cases and incorrect for mul. nsw/nuw may be needed for some combines to cleanup messes when intermediate sext_inregs are introduced later. Tested valid combinations with alive. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293776 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-01 16:25:23 +00:00
Kyle Butt	5818a513ae	CodeGen: Allow small copyable blocks to "break" the CFG. When choosing the best successor for a block, ordinarily we would have preferred a block that preserves the CFG unless there is a strong probability the other direction. For small blocks that can be duplicated we now skip that requirement as well, subject to some simple frequency calculations. Differential Revision: https://reviews.llvm.org/D28583 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293716 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-31 23:48:32 +00:00
Matt Arsenault	90657acceb	AMDGPU: Use source mods with fcanonicalize git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293654 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-31 17:28:40 +00:00
Tom Stellard	d925fa77b2	AMDGPU/SI: Fix inst-select-load-smrd.mir on some builds Summary: For some reason instructions are being inserted in the wrong order with some builds. I'm not sure why this is happening. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D29325 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293639 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-31 15:24:11 +00:00
Nicolai Haehnle	ab43652716	[DAGCombine] require UnsafeFPMath for re-association of addition Summary: The affected transforms all implicitly use associativity of addition, for which we usually require unsafe math to be enabled. The "Aggressive" flag is only meant to convey information about the performance of the fused ops relative to a fmul+fadd sequence. Fixes Bug 31626. Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD Subscribers: jholewinski, nemanjai, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D28675 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293635 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-31 14:35:37 +00:00
Matt Arsenault	3b595d2304	AMDGPU: Generalize matching of v_med3_f32 I think this is safe as long as no inputs are known to ever be nans. Also add an intrinsic for fmed3 to be able to handle all safe math cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293598 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-31 03:07:46 +00:00
Tom Stellard	f7f8a35213	Re-commit AMDGPU/GlobalISel: Add support for simple shaders Fix build when global-isel is disabled and fix a warning. Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293551 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-30 21:56:46 +00:00
Stanislav Mekhanoshin	aef8c41869	[AMDGPU] Internalize non-kernel symbols Since we have no call support and late linking we can produce code only for used symbols. This saves compilation time, size of the final executable, and size of any intermediate dumps. Run Internalize pass early in the opt pipeline followed by global DCE pass. To enable it RT can pass -amdgpu-internalize-symbols option. Differential Revision: https://reviews.llvm.org/D29214 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293549 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-30 21:05:18 +00:00
Matt Arsenault	1c86edab55	AMDGPU: Undo sub x, c -> add x, -c canonicalization This is worse if the original constant is an inline immediate. This should also be done for 64-bit adds, but requires fixing operand folding bugs first. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293540 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-30 19:30:24 +00:00
Matt Arsenault	f39022545d	AMDGPU: Make i32 uaddo/usubo legal git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293514 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-30 18:11:38 +00:00
Matt Arsenault	70c07bb572	DAG: Fold fneg into compare with constant into the constant fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred) InstCombine already does this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293512 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-30 17:57:28 +00:00
Tom Stellard	78e51c03b5	Revert "AMDGPU/GlobalISel: Add support for simple shaders" This reverts commit r293503. Revert while I investigate some of the buildbot failures. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293509 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-30 17:42:41 +00:00
Tom Stellard	945c85d877	AMDGPU/GlobalISel: Add support for simple shaders Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293503 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-30 17:09:15 +00:00
Matt Arsenault	430953ebc8	DAG: Constant fold fp16_to_fp/fp_to_fp16 This fixes emitting conversions of constants on targets without legal f16 that need to use these for legalization. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293499 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-30 16:57:41 +00:00
Matt Arsenault	1aa15b49aa	AMDGPU: Enable FeatureFlatForGlobal on Volcanic Islands Accomplishes what r292982 was supposed to, which ended up only really making the necessary test changes. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293310 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-27 17:42:26 +00:00
Stanislav Mekhanoshin	1f3b497b08	[AMDGPU] Turn AMDGPUUnifyMetadata back into module pass With the adjustPassManager interface it is now possible to use custom early module passes. Differential Revision: https://reviews.llvm.org/D29189 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293300 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-27 16:38:10 +00:00
Nirav Dave	d9031ef908	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r293184 which is failing in LTO builds git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293188 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-26 16:46:13 +00:00
Nirav Dave	dbb7a65598	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear but require more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. 
Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293184 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-26 16:02:24 +00:00
Valery Pykhtin	9016ce4442	[AMDGPU] Fix typo in GCNSchedStrategy Differential revision: https://reviews.llvm.org/D28980 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293171 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-26 10:51:47 +00:00
Matt Arsenault	baa08a9804	AMDGPU: Fold fneg into round instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293127 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-26 01:25:36 +00:00
Matt Arsenault	d1c6dad551	AMDGPU: Set call_convention bit in kernel_code_t According to the documentation this is supposed to be -1 if indirect calls are not supported. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293081 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-25 20:21:57 +00:00
Matt Arsenault	b4bb247cca	AMDGPU: Check nsz instead of unsafe math git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293028 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-25 06:27:02 +00:00
Matt Arsenault	9291d3c697	DAG: Recognize no-signed-zeros-fp-math attribute clang already emits this with -cl-no-signed-zeros, but codegen doesn't do anything with it. Treat it like the other fast math attributes, and change one place to use it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293024 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-25 06:08:42 +00:00
Matt Arsenault	f337952e56	DAGCombiner: Allow negating ConstantFP after legalize git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293019 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-25 04:54:34 +00:00
Matt Arsenault	4dc43963ef	AMDGPU: Implement early ifcvt target hooks. Leave early ifcvt disabled for now since there are some shader-db regressions. This causes some immediate improvements, but could be better. The cost checking that the pass does is based on critical path length for out of order CPUs which we do not want so it skips out on many cases we want. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293016 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-25 04:25:02 +00:00
Matt Arsenault	e8e3365d52	AMDGPU: Remove spurious out branches after a kill The sequence like this: v_cmpx_le_f32_e32 vcc, 0, v0 s_branch BB0_30 s_cbranch_execnz BB0_30 ; BB#29: exp null off, off, off, off done vm s_endpgm BB0_30: ; %endif110 is likely wrong. The s_branch instruction will unconditionally jump to BB0_30 and the skip block (exp done + endpgm) inserted for performing the kill instruction will never be executed. This results in a GPU hang with Star Ruler 2. The s_branch instruction is added during the "Control Flow Optimizer" pass which seems to re-organize the basic blocks, and we assume that SI_KILL_TERMINATOR is always the last instruction inside a basic block. Thus, after inserting a skip block we just go to the next BB without looking at the subsequent instructions after the kill, and the s_branch op is never removed. Instead, we should remove the unconditional out branches and let skip the two instructions if the exec mask is non-zero. This patch fixes the GPU hang and doesn't introduce any regressions with "make check". Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019 Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292985 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-24 22:18:39 +00:00
Matt Arsenault	d019e8638a	Enable FeatureFlatForGlobal on Volcanic Islands This switches to the workaround that HSA defaults to for the mesa path. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292982 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-24 22:02:15 +00:00
Changpeng Fang	6562a033a3	AMDGPU/SI: Give up in promote alloca when a pointer may be captured. Differential Revision: http://reviews.llvm.org/D28970 Reviewer: Matt git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292966 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-24 19:06:28 +00:00
Stanislav Mekhanoshin	f4866eec22	[AMDGPU] Add VGPR copies post regalloc fix pass Regalloc creates COPY instructions which do not formally use VALU. That results in v_mov instructions displaced after exec mask modification. One pass which does this is SIOptimizeExecMasking, but potentially it can be done by other passes too. This patch adds a pass immediately after regalloc to add an implicit exec use operand to all VGPR copy instructions. Differential Revision: https://reviews.llvm.org/D28874 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292956 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-24 17:46:17 +00:00
Wei Ding	2e3d9f4dbc	AMDGPU : Add trap handler support. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292893 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-24 06:41:21 +00:00
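
Note on commit 70c07bb572 in the log above: its message states the fold fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred). The identity behind that fold can be sanity-checked outside LLVM. The following is only a sketch in Python, not code from the commit; it assumes Python float comparisons stand in for the IEEE comparisons that fcmp performs, and the names SWAP, SAMPLES, fold_holds and check_all are hypothetical helpers introduced here for illustration.

```python
import itertools
import math
import operator

# Swapped predicate for each comparison: (-x) pred c  <=>  x swap(pred) (-c).
# Equality and inequality are symmetric, so they map to themselves.
SWAP = {
    operator.lt: operator.gt,
    operator.le: operator.ge,
    operator.gt: operator.lt,
    operator.ge: operator.le,
    operator.eq: operator.eq,
    operator.ne: operator.ne,
}

# Sample values, including signed zeros, infinities and NaN (assumed test set).
SAMPLES = [0.0, -0.0, 1.0, -1.0, 2.5, -2.5, math.inf, -math.inf, math.nan]


def fold_holds(x, c, pred):
    """Compare the unfolded form pred(-x, c) with the folded form swap(pred)(x, -c)."""
    return pred(-x, c) == SWAP[pred](x, -c)


def check_all():
    """Check the identity for every sample pair and every predicate in SWAP."""
    return all(
        fold_holds(x, c, pred)
        for x, c in itertools.product(SAMPLES, SAMPLES)
        for pred in SWAP
    )


if __name__ == "__main__":
    print("fold holds on all samples:", check_all())
```

The check covers only the six predicates Python exposes directly; the ordered/unordered variants of fcmp are not modelled separately here.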

838 Commits
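
Note on commit a1d4ee75a4 in the log above: its message describes scaling an LDS occupancy figure, computed for a 64-workitem workgroup, by the number of waves a larger workgroup occupies (4 for a workgroup of 256). The arithmetic can be sketched as below. This is an illustration only, not the code from the commit: the function names are hypothetical, rounding up for workgroup sizes that are not a multiple of 64 is an assumption, and any hardware cap on occupancy is deliberately left out.

```python
WAVEFRONT_SIZE = 64  # workitems per wave, as stated in the commit message


def waves_per_workgroup(workgroup_size):
    """Number of waves a workgroup occupies (rounding up is an assumption)."""
    return -(-workgroup_size // WAVEFRONT_SIZE)  # ceiling division


def scaled_occupancy(base_occupancy, workgroup_size):
    """Scale an occupancy computed for a 64-workitem workgroup.

    base_occupancy is a hypothetical input: the limit a per-workgroup LDS
    figure would give if the workgroup held only 64 workitems.
    """
    return base_occupancy * waves_per_workgroup(workgroup_size)


if __name__ == "__main__":
    # A workgroup of 256 workitems occupies 4 waves, so the figure is scaled by 4.
    print(waves_per_workgroup(256), scaled_occupancy(2, 256))
```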