archived-llvm

mirror of https://github.com/RPCS3/llvm.git synced 2026-01-31 01:25:19 +01:00

Author	SHA1	Message	Date
Benjamin Kramer	63e6891819	[AMDGPU] Clean up symbols in the global namespace. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317051 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-31 23:21:30 +00:00
Yaxun Liu	091c043b90	[AMDGPU] Lower enqueued blocks and generate runtime metadata This patch adds a post-linking pass which replaces the function pointer of enqueued block kernel with a global variable (runtime handle) and adds runtime-handle attribute to the enqueued block kernel. In LLVM CodeGen the runtime-handle metadata will be translated to RuntimeHandle metadata in code object. Runtime allocates a global buffer for each kernel with RuntimeHandel metadata and saves the kernel address required for the AQL packet into the buffer. __enqueue_kernel function in device library knows that the invoke function pointer in the block literal is actually runtime handle and loads the kernel address from it and puts it into AQL packet for dispatching. This cannot be done in FE since FE cannot create a unique global variable with external linkage across LLVM modules. The global variable with internal linkage does not work since optimization passes will try to replace loads of the global variable with its initialization value. Differential Revision: https://reviews.llvm.org/D38610 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315352 91177308-0d34-0410-b5e6-96231b3b80d8	2017-10-10 19:39:48 +00:00
Stanislav Mekhanoshin	8b431a9469	[AMDGPU] Set fast-math flags on functions given the options We have a single library build without relaxation options. When inlined library functions remove fast math attributes from the functions they are integrated into. This patch sets relaxation attributes on the functions after linking provided corresponding relaxation options are given. Math instructions inside the inlined functions remain to have no fast flags, but inlining does not prevent fast math transformations of a surrounding caller code anymore. Differential Revision: https://reviews.llvm.org/D38325 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314568 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-29 23:40:19 +00:00
Stanislav Mekhanoshin	fbf0e1603c	[AMDGPU] Port of HSAIL inliner Differential Revision: https://reviews.llvm.org/D36849 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@313714 91177308-0d34-0410-b5e6-96231b3b80d8	2017-09-20 04:25:58 +00:00
Stanislav Mekhanoshin	9643e6bd78	[AMDGPU] Ported and adopted AMDLibCalls pass The pass does simplifications of well known AMD library calls. If given -amdgpu-prelink option it works in a pre-link mode which allows to reference new library functions which will be linked in later. In addition it also used to process traditional AMD option -fuse-native which allows to replace some of the functions with their fast native implementations from the library. The necessary glue to pass the prelink option and translate -fuse-native is to be added to the driver. Differential Revision: https://reviews.llvm.org/D36436 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310731 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-11 16:42:09 +00:00
Tom Stellard	39aad0ab08	AMDGPU: Move R600 parts of AMDGPUISelDAGToDAG into their own class Summary: This refactoring is required in order to split the R600 and GCN tablegen files. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D36286 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310336 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-08 04:57:55 +00:00
Matt Arsenault	4f81bb6abb	AMDGPU: Remove FixControlFlowLiveIntervals pass This hasn't done anything in a long time. This was running after the the control flow pseudos were expanded, so this would never find them. The control flow pseudo expansion was moved to solve the problem this pass was supposed to solve in the first place, except handling it earlier also fixes it for fast regalloc which doesn't use LiveIntervals. Noticed by checking LCOV reports. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310274 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-07 18:12:47 +00:00
Connor Abbott	ecf573917a	[AMDGPU] Add support for Whole Wavefront Mode Summary: Whole Wavefront Wode (WWM) is similar to WQM, except that all of the lanes are always enabled, regardless of control flow. This is required for implementing wavefront reductions in non-uniform control flow, where we need to use the inactive lanes to propagate intermediate results, so they need to be enabled. We need to propagate WWM to uses (unless they're explicitly marked as exact) so that they also propagate intermediate results correctly. We do the analysis and exec mask munging during the WQM pass, since there are interactions with WQM for things that require both WQM and WWM. For simplicity, WWM is entirely block-local -- blocks are never WWM on entry or exit of a block, and WWM is not propagated to the block level. This means that computations involving WWM cannot involve control flow, but we only ever plan to use WWM for a few limited purposes (none of which involve control flow) anyways. Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There isn't yet a way to turn WWM off -- that will be added in a future change. Finally, it turns out that turning on inactive lanes causes a number of problems with register allocation. While the best long-term solution seems like teaching LLVM's register allocator about predication, for now we need to add some hacks to prevent ourselves from getting into trouble due to constraints that aren't currently expressed in LLVM. For the gory details, see the comments at the top of SIFixWWMLiveness.cpp. Reviewers: arsenm, nhaehnle, tpr Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D35524 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310087 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-04 18:36:52 +00:00
Matt Arsenault	c8c75789a0	AMDGPU: Add analysis pass for function argument info This will allow only adding necessary inputs to callee functions that need special inputs forwarded from the kernel. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309996 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-03 22:30:46 +00:00
Tom Stellard	cd14d227ff	AMDGPU/R600: Initialize more passes Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D36128 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309893 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-02 22:19:45 +00:00
Stanislav Mekhanoshin	5b53ac928d	[AMDGPU] Collapse adjacent SI_END_CF Add a pass to remove redundant S_OR_B64 instructions enabling lanes in the exec. If two SI_END_CF (lowered as S_OR_B64) come together without any vector instructions between them we can only keep outer SI_END_CF, given that CFG is structured and exec bits of the outer end statement are always not less than exec bit of the inner one. This needs to be done before the RA to eliminate saved exec bits registers but after register coalescer to have no vector registers copies in between of different end cf statements. Differential Revision: https://reviews.llvm.org/D35967 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309762 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-01 23:14:32 +00:00
Matt Arsenault	acac0ef9e7	AMDGPU: Add pass to replace out arguments It is better to return arguments directly in registers if we are making a call rather than introducing expensive stack usage. In one of sample compile from one of Blender's many kernel variants, this fires on about ~20 different functions. Future improvements may be to recognize simple cases where the pointer is indexing a small array. This also fails when the store to the out argument is in a separate block from the return, which happens in a few of the Blender functions. This should also probably be using MemorySSA which might help with that. I'm not sure this is correct as a FunctionPass, but MemoryDependenceAnalysis seems to not work with a ModulePass. I'm also not sure where it should run.I think it should run before DeadArgumentElimination, so maybe either EP_CGSCCOptimizerLate or EP_ScalarOptimizerLate. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309416 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-28 18:40:05 +00:00
Konstantin Zhuravlyov	4c49579c51	AMDGPU: Implement memory model git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308781 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-21 21:19:23 +00:00
Matt Arsenault	a20c1d0cec	AMDGPU: Annotate call graph with used features Previously this wouldn't detect used features indirectly used in callee functions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307967 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-13 21:43:42 +00:00
Matt Arsenault	c278dccfd0	AMDGPU: Remove SITypeRewriter This was an old workaround for using v16i8 in some old intrinsics for resource descriptors. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306603 91177308-0d34-0410-b5e6-96231b3b80d8	2017-06-28 21:38:50 +00:00
Matt Arsenault	7796b916f8	AMDGPU: Register AMDGPUAlwaysInline git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304574 91177308-0d34-0410-b5e6-96231b3b80d8	2017-06-02 18:02:42 +00:00
Francis Visoiu Mistrih	ae1c853358	[LegacyPassManager] Remove TargetMachine constructors This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency. The patterns replaced here are: * Passes handling a null TargetMachine call `getAnalysisIfAvailable<TargetPassConfig>`. * Passes not handling a null TargetMachine `addRequired<TargetPassConfig>` and call `getAnalysis<TargetPassConfig>`. * MachineFunctionPasses now use MF.getTarget(). * Remove all the TargetMachine constructors. * Remove INITIALIZE_TM_PASS. This fixes a crash when running `llc -start-before prologepilog`. PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults. Related to PR30324. Differential Revision: https://reviews.llvm.org/D33222 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303360 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-18 17:21:13 +00:00
Jan Sjodin	4f10728b0c	Re-submit AMDGPUMachineCFGStructurizer. Differential Revision: https://reviews.llvm.org/D23209 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303111 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-15 20:18:37 +00:00
Jan Sjodin	dd98c46159	Revert 303091. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303098 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-15 18:39:47 +00:00
Jan Sjodin	9ee4b4d97c	Add AMDGPUMachineCFGStructurizer. Differential Revision: https://reviews.llvm.org/D23209 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303091 91177308-0d34-0410-b5e6-96231b3b80d8	2017-05-15 18:13:56 +00:00
Stanislav Mekhanoshin	bb9002fbb2	[AMDGPU] Generate range metadata for workitem id If workgroup size is known inform llvm about range returned by local id and local size queries. Differential Revision: https://reviews.llvm.org/D31804 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300102 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-12 20:48:56 +00:00
Kannan Narayanan	d3302ddc52	[AMDGPU] Add a new pass to insert waitcnts. Leave under an option for testing. Based on comments in https://reviews.llvm.org/D31161. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300023 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-12 03:25:12 +00:00
Matt Arsenault	40cf5b3d29	AMDGPU: Fix crash when disassembling VOP3 mac The unused dummy src2_modifiers is missing, so it crashes when trying to print it. I tried to fully remove src2_modifiers, but there are some irritations in the places where it is converted to mad since it starts to require modifying use lists while iterating over them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299861 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-10 17:58:06 +00:00
Yaxun Liu	e4f931e960	[AMDGPU] Temporarily change constant address space from 4 to 2 Our final address space mapping is to let constant address space to be 4 to match nvptx. However for now we will make it 2 to avoid unnecessary work in FE/BE/devlib about intrinsics returning constant pointers. Differential Revision: https://reviews.llvm.org/D31770 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299690 91177308-0d34-0410-b5e6-96231b3b80d8	2017-04-06 19:17:32 +00:00
Stanislav Mekhanoshin	38e381b5a3	[AMDGPU] Add GlobalOpt parameter to Always Inliner pass If set to false it does not remove global aliases. With this parameter set to false it should be safe to run the pass before link. Differential Revision: https://reviews.llvm.org/D31489 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299108 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-30 20:16:02 +00:00
Yaxun Liu	ab3be33d40	[AMDGPU] Get address space mapping by target triple environment As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298846 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-27 14:04:01 +00:00
Matt Arsenault	876bc45420	AMDGPU: Unify divergent function exits. StructurizeCFG can't handle cases with multiple returns creating regions with multiple exits. Create a copy of UnifyFunctionExitNodes that only unifies exit nodes that skips exit nodes with uniform branch sources. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298729 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-24 19:52:05 +00:00
Sam Kolton	f885500f47	[ADMGPU] SDWA peephole optimization pass. Summary: First iteration of SDWA peephole. This pass tries to combine several instruction into one SDWA instruction. E.g. it converts: ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1 V_ADD_I32_e32 %vreg2, %vreg0, %vreg3 V_LSHLREV_B32_e32 %vreg4, 16, %vreg2 ''' Into: ''' V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD ''' Pass structure: 1. Iterate over machine instruction in basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match machine instruction into either source or destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''. 2. Iterate over found SDWA operands and find instruction that could be potentially coverted into SDWA. E.g. for source SDWA operand potential instruction are all instruction in this basic block that uses '''%vreg0''' 3. Iterate over all potential instructions and check if they can be converted into SDWA. 4. Convert instructions to SDWA. This review contains basic implementation of SDWA peephole pass. This pass requires additional testing fot both correctness and performance (no performance testing done). There are several ways this pass can be improved: 1. Make this pass work on whole function not only basic block. As I can see this can be done right now without changes to pass. 2. Introduce more SDWA patterns 3. Introduce mnemonics to limit when SDWA patterns should apply Reviewers: vpykhtin, alex-t, arsenm, rampitec Subscribers: wdng, nhaehnle, mgorny Differential Revision: https://reviews.llvm.org/D30038 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298365 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-21 12:51:34 +00:00
Stanislav Mekhanoshin	ee8b410dd7	[AMDGPU] Add address space based alias analysis pass This is direct port of HSAILAliasAnalysis pass, just cleaned for style and renamed. Differential Revision: https://reviews.llvm.org/D31103 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298172 91177308-0d34-0410-b5e6-96231b3b80d8	2017-03-17 23:56:58 +00:00
Matt Arsenault	83c857cd3a	AMDGPU: Merge initial gfx9 support git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295554 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-18 18:29:53 +00:00
Matt Arsenault	13384c6c8a	AMDGPU: Add pass to expand memcpy/memmove/memset git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294635 91177308-0d34-0410-b5e6-96231b3b80d8	2017-02-09 22:00:42 +00:00
Stanislav Mekhanoshin	1f3b497b08	[AMDGPU] Turn AMDGPUUnifyMetadata back into module pass With the adjustPassManager interface that is now possible to use custom early module passes. Differential Revision: https://reviews.llvm.org/D29189 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293300 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-27 16:38:10 +00:00
Stanislav Mekhanoshin	f4866eec22	[AMDGPU] Add VGPR copies post regalloc fix pass Regalloc creates COPY instructions which do not formally use VALU. That results in v_mov instructions displaced after exec mask modification. One pass which do it is SIOptimizeExecMasking, but potentially it can be done by other passes too. This patch adds a pass immediately after regalloc to add implicit exec use operand to all VGPR copy instructions. Differential Revision: https://reviews.llvm.org/D28874 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292956 91177308-0d34-0410-b5e6-96231b3b80d8	2017-01-24 17:46:17 +00:00
Stanislav Mekhanoshin	fcb33c493a	[AMDGPU] Add amdgpu-unify-metadata pass Multiple metadata values for records such as opencl.ocl.version, llvm.ident and similar are created after linking several modules. For some of them, notably opencl.ocl.version, this creates semantic problem because we cannot tell which version of OpenCL the composite module conforms. Moreover, such repetitions of identical values often create a huge list of unneeded metadata, which grows bitcode size both in memory and stored on disk. It can go up to several Mb when linked against our OpenCL library. Lastly, such long lists obscure reading of dumped IR. The pass unifies metadata after linking. Differential Revision: https://reviews.llvm.org/D25381 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289092 91177308-0d34-0410-b5e6-96231b3b80d8	2016-12-08 19:46:04 +00:00
Mehdi Amini	ae5f5d3d3c	Move the global variables representing each Target behind accessor function This avoids "static initialization order fiasco" Differential Revision: https://reviews.llvm.org/D25412 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283702 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-09 23:00:34 +00:00
Konstantin Zhuravlyov	49e7805871	[AMDGPU] Pass optimization level to SelectionDAGISel git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283133 91177308-0d34-0410-b5e6-96231b3b80d8	2016-10-03 18:47:26 +00:00
Matt Arsenault	0461ece2ce	AMDGPU: Partially fix control flow at -O0 Fixes to allow spilling all registers at the end of the block work with exec modifications. Don't emit s_and_saveexec_b64 for if lowering, and instead emit copies. Mark control flow mask instructions as terminators to get correct spill code placement with fast regalloc, and then have a separate optimization pass form the saveexec. This should work if SGPRs are spilled to VGPRs, but will likely fail in the case that an SGPR spills to memory and no workitem takes a divergent branch. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282667 91177308-0d34-0410-b5e6-96231b3b80d8	2016-09-29 01:44:16 +00:00
Matt Arsenault	7517ed227a	AMDGPU: Split SILowerControlFlow into two pieces Do most of the lowering in a pre-RA pass. Keep the skip jump insertion late, plus a few other things that require more work to move out. One concern I have is now there may be COPY instructions which do not have the necessary implicit exec uses if they will be lowered to v_mov_b32. This has a positive effect on SGPR usage in shader-db. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279464 91177308-0d34-0410-b5e6-96231b3b80d8	2016-08-22 19:33:16 +00:00
Matt Arsenault	44cd439f9a	AMDGPU: Prune includes git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278391 91177308-0d34-0410-b5e6-96231b3b80d8	2016-08-11 19:18:50 +00:00
Matt Arsenault	63be72069d	AMDGPU: Change fdiv lowering based on !fpmath metadata If 2.5 ulp is acceptable, denormals are not required, and isn't a reciprocal which will already be handled, replace with a faster fdiv. Simplify the lowering tests by using per function subtarget features. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276051 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-19 23:16:53 +00:00
Matt Arsenault	4d120e9b24	AMDGPU/R600: Delete/rename intrinsics no longer used by mesa Use the replacement pass to update the tests, and delete old names. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275375 91177308-0d34-0410-b5e6-96231b3b80d8	2016-07-14 05:47:17 +00:00
Matt Arsenault	11c2d4bf28	AMDGPU: Add stub custom CodeGenPrepare pass This will do various things including ones CodeGenPrepare does, but with knowledge of uniform values. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273657 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-24 07:07:55 +00:00
Matt Arsenault	dad6f6f388	AMDGPU: Properly initialize SIShrinkInstructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272336 91177308-0d34-0410-b5e6-96231b3b80d8	2016-06-09 23:18:47 +00:00
Matt Arsenault	9d102db2c1	AMDGPU: Remove unused address space Also return a single StringRef instead of building a string. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271296 91177308-0d34-0410-b5e6-96231b3b80d8	2016-05-31 16:57:45 +00:00
Jan Vesely	a8d19bb081	AMDGPU/EG,CM: Add instruction to read from constant AS (VTX2) Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19785 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269473 91177308-0d34-0410-b5e6-96231b3b80d8	2016-05-13 20:39:16 +00:00
Konstantin Zhuravlyov	2147c01e5a	[AMDGPU][NFC] Rename SIInsertNops -> SIDebuggerInsertNops Differential Revision: http://reviews.llvm.org/D20117 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269098 91177308-0d34-0410-b5e6-96231b3b80d8	2016-05-10 18:33:41 +00:00
Nicolai Haehnle	d07340545e	AMDGPU: Remove SIFixSGPRLiveRanges pass Summary: This pass is unnecessary and overly conservative. It was motivated by situations like def %vreg0:SGPR_32 ... if-block: .. def %vreg1:SGPR_32 ... else-block: ... use %vreg0:SGPR_32 ... and similar situations with uses after the non-uniform control flow, where we are not allowed to assign %vreg0 and %vreg1 to the same physical register, even though in the original, thread/workitem-based CFG, it looks like the live ranges of these registers do not overlap. However, by the time register allocation runs, we have moved to a wave-based CFG that accurately represents the fact that the wave may run through both the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already overlap even without the SIFixSGPRLiveRanges pass. In addition to proving this change correct, I have tested it with Piglit and a small number of other tests. Reviewers: arsenm, tstellarAMD Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19041 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266345 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-14 17:42:29 +00:00
Nicolai Haehnle	ea7a0c0467	AMDGPU: Add a shader calling convention This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265589 91177308-0d34-0410-b5e6-96231b3b80d8	2016-04-06 19:40:20 +00:00
Nicolai Haehnle	f0b7f107b9	AMDGPU: Add SIWholeQuadMode pass Summary: Whole quad mode is already enabled for pixel shaders that compute derivatives, but it must be suspended for instructions that cause a shader to have side effects (i.e. stores and atomics). This pass addresses the issue by storing the real (initial) live mask in a register, masking EXEC before instructions that require exact execution and (re-)enabling WQM where required. This pass is run before register coalescing so that we can use machine SSA for analysis. The changes in this patch expose a problem with the second machine scheduling pass: target independent instructions like COPY implicitly use EXEC when they operate on VGPRs, but this fact is not encoded in the MIR. This can lead to miscompilation because instructions are moved past changes to EXEC. This patch fixes the problem by adding use-implicit operands to target independent instructions. Some general codegen passes are relaxed to work with such implicit use operands. Reviewers: arsenm, tstellarAMD, mareko Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18162 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263982 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-21 20:28:33 +00:00
Matt Arsenault	7137d0dce6	AMDGPU: R600 code splitting cleanup Move a few functions only used by R600 to R600 specific code, fix header macros to stop using R600, mark classes as final. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263204 91177308-0d34-0410-b5e6-96231b3b80d8	2016-03-11 08:00:27 +00:00

1 2

67 Commits