RPCS3/llvm

mirror of https://github.com/RPCS3/llvm.git synced 2025-05-16 18:35:53 +00:00

Author	SHA1	Message	Date
Matt Arsenault	44ee20512e	AMDGPU: Add additional MIR tests for exec mask optimizations Also includes one example of how this transform is unsound. This isn't verifying the copies are used in the control flow intrinsic patterns. Also add option to disable exec mask opt pass. Since this pass is unsound, it may be useful to turn it off until it is fixed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357091 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-27 16:58:30 +00:00
Matt Arsenault	951c9d9b26	CodeGen: Refactor regallocator command line and target selection This will allow targets more flexibility to replace the register allocator core passes. In a future commit, AMDGPU will run the core register assignment passes twice, and will also want to disallow using the standard -regalloc option. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356506 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-19 19:33:12 +00:00
Neil Henning	89fc4394cb	[AMDGPU] Add an experimental buffer fat pointer address space. Add an experimental buffer fat pointer address space that is currently unhandled in the backend. This commit reserves address space 7 as a non-integral pointer representing the 160-bit fat pointer (128-bit buffer descriptor + 32-bit offset) that is heavily used in graphics workloads using the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D58957 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356373 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-18 14:44:28 +00:00
Matt Arsenault	2066fb5fc9	AMDGPU: Partially fix default device for HSA There are a few different issues, mostly stemming from using generation based checks for anything instead of subtarget features. Stop adding flat-address-space as a feature for HSA, as it should only be a device property. This was incorrectly allowing flat instructions to select for SI. Increase the default generation for HSA to avoid the encoding error when emitting objects. This has some other side effects from various checks which probably should be separate subtarget features (in the cost model and for dealing with the DS offset folding issue). Partial fix for bug 41070. It should probably be an error to try using amdhsa without flat support. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356347 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-17 21:31:35 +00:00
Matt Arsenault	d8706fcd74	MIR: Allow targets to serialize MachineFunctionInfo This has been a very painful missing feature that has made producing reduced testcases difficult. In particular the various registers determined for stack access during function lowering were necessary to avoid undefined register errors in a large percentage of cases. Implement a subset of the important fields that need to be preserved for AMDGPU. Most of the changes are to support targets parsing register fields and properly reporting errors. The biggest sort-of bug remaining is that fields that can be initialized from the IR section will be overwritten by a default initialized machineFunctionInfo section. Another remaining bug is that the machineFunctionInfo section is still printed even if empty. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356215 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-14 22:54:43 +00:00
Aakanksha Patil	67bb6a3b71	AMDGPU: Handle "uniform-work-group-size" attribute (fix for RADV) A previous patch for "uniform-work-group-size" attribute was found to break some RADV and possibly radeon SI tests and had to be retracted. This patch fixes that. Differential Revision: http://reviews.llvm.org/D58993 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355574 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-07 00:54:04 +00:00
Matt Arsenault	42e25a8fd0	AMDGPU: Fix typo git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355056 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-28 00:52:33 +00:00
Matt Arsenault	df3568d8a9	AMDGPU: Enable function calls by default Fixes some crashes on illegal call situations which are unfortunately still valid IR. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355051 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-28 00:40:32 +00:00
Matt Arsenault	0410b9ebcc	AMDGPU: Remove debugger related subtarget features As far as I know these aren't needed anymore. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354634 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-21 23:27:46 +00:00
Valery Pykhtin	a0ecdf4bba	[AMDGPU] Enable DPP combiner pass by default. Related revisions: https://reviews.llvm.org/D55444, https://reviews.llvm.org/D55314 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353691 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-11 11:15:03 +00:00
Chandler Carruth	6b547686c5	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-19 08:50:56 +00:00
David Stuttard	1da08e2514	[AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. 
The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351054 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-14 11:55:24 +00:00
Aakanksha Patil	cc31a27f1e	Revert r348971: [AMDGPU] Support for "uniform-work-group-size" attribute This patch breaks RADV (and probably RadeonSI as well) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349084 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-13 21:23:12 +00:00
Aakanksha Patil	b20ee3547f	[AMDGPU] Support for "uniform-work-group-size" attribute Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate whether or not they have uniform work group size depending on the value of the attribute. Differential Revision: https://reviews.llvm.org/D50200 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348971 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-12 20:49:17 +00:00
Tim Corringham	c9d081f818	[AMDGPU] Add new Mode Register pass A new pass to manage the Mode register. Currently this just manages the floating point double precision rounding requirements, but is intended to be easily extended to encompass all Mode register settings. The immediate motivation comes from the requirement to use the round-to-zero rounding mode for the 16 bit interpolation instructions, where the rounding mode setting is shared between 16 and 64 bit operations. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348754 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-10 12:06:10 +00:00
David Green	7135d8b482	[Targets] Add errors for tiny and kernel codemodel on targets that don't support them Adds fatal errors for any target that does not support the Tiny or Kernel codemodels by rejigging the getEffectiveCodeModel calls. Differential Revision: https://reviews.llvm.org/D50141 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348585 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-07 12:10:23 +00:00
Valery Pykhtin	92a20faea1	[AMDGPU] Partial revert of rL348371: Turn on the DPP combiner by default Turn the combiner back off as there're failures until the issue is fixed. Differential revision: https://reviews.llvm.org/D55314 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348487 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-06 14:20:02 +00:00
Valery Pykhtin	62a4268c77	[AMDGPU]: Turn on the DPP combiner by default Differential revision: https://reviews.llvm.org/D55314 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348371 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-05 15:21:17 +00:00
Valery Pykhtin	d339265d52	[AMDGPU] Combine DPP mov with use instructions (VOP1/2/3) Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses. Differential revision: https://reviews.llvm.org/D53762 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347993 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-30 14:21:56 +00:00
David Stuttard	cb049f818d	Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic" Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347911 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-29 20:14:17 +00:00
David Stuttard	7fd87699a5	Add support for TFE/LWE in image intrinsics TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. 
The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347871 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-29 15:21:13 +00:00
Matt Arsenault	44412401b3	AMDGPU: Don't optimize exec masks at -O0 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347573 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-26 17:02:02 +00:00
Ron Lieberman	322a8075ef	[AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST Add a pass to fixup various vector ISel issues. Currently we handle converting GLOBAL_{LOAD\|STORE}_* and GLOBAL_Atomic_* instructions into their _SADDR variants. This involves feeding the sreg into the saddr field of the new instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347008 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-16 01:13:34 +00:00
Matt Arsenault	0cb12ca8f6	Allow subclassing ExternalAA This allows testing AMDGPU alias analysis like any other alias analysis pass. This fixes the existing test pointlessly running opt -O3 when it really just wants to run the one analysis. Before there was no way to test this using -aa-eval with opt, since the default constructed pass is run. The wrapper subclass allows the default constructor to pass the necessary callback. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346353 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-07 20:26:42 +00:00
Nicolai Haehnle	776a459079	AMDGPU: Rewrite SILowerI1Copies to always stay on SALU Summary: Instead of writing boolean values temporarily into 32-bit VGPRs if they are involved in PHIs or are observed from outside a loop, we use bitwise masking operations to combine lane masks in a way that is consistent with wave control flow. Move SIFixSGPRCopies to before this pass, since that pass incorrectly attempts to move SGPR phis to VGPRs. This should recover most of the code quality that was lost with the bug fix in "AMDGPU: Remove PHI loop condition optimization". There are still some relevant cases where code quality could be improved, in particular: - We often introduce redundant masks with EXEC. Ideally, we'd have a generic computeKnownBits-like analysis to determine whether masks are already masked by EXEC, so we can avoid this masking both here and when lowering uniform control flow. - The criterion we use to determine whether a def is observed from outside a loop is conservative: it doesn't check whether (loop) branch conditions are uniform. Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D53496 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345719 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-31 13:27:08 +00:00
Scott Linder	92794ee479	[AMDGPU] Add a pass to promote bitcast calls AMDGPU currently only supports direct calls, but at lower optimisation levels it fails to lower statically direct calls which appear indirect due to a bitcast. Add a pass to visit all CallSites and use CallPromotionUtils to "devirtualize" calls. Differential Revision: https://reviews.llvm.org/D52741 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345382 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-26 13:18:36 +00:00
Neil Henning	b461f4de29	[AMDGPU] Add an AMDGPU specific atomic optimizer. This commit adds a new IR level pass to the AMDGPU backend to perform atomic optimizations. It works by: - Running through a function and finding atomicrmw add/sub or uses of the atomic buffer intrinsics for add/sub. - If all arguments except the value to be added/subtracted are uniform, record the value to be optimized. - Run through the atomic operations we can optimize and, depending on whether the value is uniform/divergent use wavefront wide operations (DPP in the divergent case) to calculate the total amount to be atomically added/subtracted. - Then let only a single lane of each wavefront perform the atomic operation, reducing the total number of atomic operations in flight. - Lastly we recombine the result from the single lane to each lane of the wavefront, and calculate our individual lanes offset into the final result. Differential Revision: https://reviews.llvm.org/D51969 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343973 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-08 15:49:19 +00:00
Matt Arsenault	47e2c38609	AMDGPU: Always run AMDGPUAlwaysInline Even if calls are enabled, it still needs to be run for forcing inline of functions that use LDS. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343657 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-03 02:47:25 +00:00
Matt Arsenault	b3a02e1925	AMDGPU: Expand atomicrmw nand in IR git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343559 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-02 03:50:56 +00:00
Sameer Sahasrabuddhe	116128c1c0	[AMDGPU] restore r342722 which was reverted with r342743 [AMDGPU] lower-switch in preISel as a workaround for legacy DA Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342956 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-25 09:39:21 +00:00
Sameer Sahasrabuddhe	0ccb4cd734	revert changes from r342722 "[AMDGPU] lower-switch in preISel as a workaround for legacy DA" This broke regression tests. The first breakage was noticed here: http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/23549 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342743 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-21 16:31:51 +00:00
Sameer Sahasrabuddhe	5b5e532790	[AMDGPU] lower-switch in preISel as a workaround for legacy DA Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, simoll Differential Revision: https://reviews.llvm.org/D52221 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342722 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-21 11:26:55 +00:00
Matt Arsenault	c8c005cb52	AMDGPU: Stop forcing internalize at -O0 This doesn't really matter if clang is always emitting the visibility as hidden by default. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341168 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-31 06:02:36 +00:00
Matt Arsenault	7e212e4168	AMDGPU: Remove remnants of old address space mapping git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341165 91177308-0d34-0410-b5e6-96231b3b80d8	2018-08-31 05:49:54 +00:00
Mark Searles	249b255c78	run post-RA hazard recognizer pass late Memory legalizer, waitcnt, and shrink passes can perturb the instructions, which means that the post-RA hazard recognizer pass should run after them. Otherwise, one of those passes may invalidate the work done by the hazard recognizer. Note that this has adverse side-effect that any consecutive S_NOP 0's, emitted by the hazard recognizer, will not be shrunk into a single S_NOP <N>. This should be addressed in a follow-on patch. Differential Revision: https://reviews.llvm.org/D49288 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337154 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-16 10:02:41 +00:00
Tom Stellard	1d6fd076a3	AMDGPU: Refactor Subtarget classes Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336851 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-11 20:59:01 +00:00
Matt Arsenault	e07c9538b5	Reapply "AMDGPU: Force inlining if LDS global address is used" This reverts commit r336623 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336675 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-10 14:03:41 +00:00
Vlad Tsyrklevich	3bda4ed0ec	Revert "AMDGPU: Force inlining if LDS global address is used" This reverts commit r336587, it was causing test failures on the sanitizer bots. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336623 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-10 00:46:07 +00:00
Matt Arsenault	12d30e1e27	AMDGPU: Force inlining if LDS global address is used These won't work for the foreseeable future. These aren't allowed from OpenCL, but IPO optimizations can make them appear. Also directly set the attributes on functions, regardless of the linkage rather than cloning functions like before. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336587 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-09 19:22:22 +00:00
Stanislav Mekhanoshin	0e1a98e255	[AMDGPU] Enable LICM in the BE pipeline This allows hoisting the code that computes the reciprocal of a loop-invariant denominator in integer division after codegen prepare expansion. Differential Revision: https://reviews.llvm.org/D48604 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335988 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-29 16:26:53 +00:00
Matt Arsenault	a2ba13d731	AMDGPU: Add pass to lower kernel arguments to loads This replaces most argument uses with loads, but for now not all. The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register legal types, which results in low quality code for arbitrary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem altogether. It's possible to hack around the problem in the initial lowering, but the real problem is the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block. Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them. I'm not sure what the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lowers the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively a dummy array could be inserted as a single argument just to reserve space. This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them as the equivalent !range metadata is not valid on pointer typed loads. This is mostly bad for SI which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done. More importantly, this skips noalias arguments since this pass does not yet convert this to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although this logically is the same as inlining into a function which doesn't exist. 
Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load. I'm also not entirely sure this is preserving the current clover ABI, although I would greatly prefer if it would stop widening arguments and match the HSA ABI. As-is I think it is extending < 4-byte arguments to 4-bytes but doesn't align them to 4-bytes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335650 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-26 19:10:00 +00:00
Stanislav Mekhanoshin	e5240df77a	[AMDGPU] Construct memory clauses before RA Memory clauses are formed into bundles in the presence of xnack. Their source operands are marked as early-clobber. This allows allocating distinct source and destination registers within a clause and prevents breaking the clause with s_nop in the hazard recognizer. Clauses are undone before post-RA scheduler to allow some rescheduling, which will not break the clause since artificial edges are created in the dag to keep memory operations together. Yet this allows better ILP in some cases. Differential Revision: https://reviews.llvm.org/D47511 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333691 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-31 20:13:51 +00:00
Tom Stellard	e0c801c31a	AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTI Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47359 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333605 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-30 22:55:35 +00:00
Matt Arsenault	c40f49e881	AMDGPU: Fix typo in option description git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333457 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-29 19:35:46 +00:00
Mark Searles	d900d78107	[AMDGPU][Waitcnt] Remove obsolete waitcnt option With the removal of the old waitcnt pass, the '-enable-si-insert-waitcnts' option is obsolete. Remove it. Differential Revision: https://reviews.llvm.org/D47378 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333303 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-25 20:24:08 +00:00
Matt Arsenault	cdbb0ae52b	AMDGPU: Add pass to optimize reqd_work_group_size Eliminate loads from the dispatch packet when they will have a known value. Also pattern match the code used by the library to handle partial workgroup dispatches, which isn't necessary if reqd_work_group_size is used. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332771 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-18 21:35:00 +00:00
Matt Arsenault	0d44e5b362	AMDGPU: Rename OpenCL lowering pass to be R600 specific. This pass is a) broken. b) r600 specific. Fixing (a) is a bit more non-trivial, but fixing (b) is easy. Move this pass to being R600 only for now. This pass does pass all the unit tests, however clang no longer generates code that looks like the unit test input, so fixing the pass requires fixing the tests and the pass as one, and checking it works with clang still. Patch by Dave Airlie git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332196 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-13 10:04:48 +00:00
Mark Searles	ac95aa5361	[AMDGPU][Waitcnt] Remove the old waitcnt pass Remove the old waitcnt pass ( si-insert-waits ), which is no longer maintained and getting crufty Differential Revision: https://reviews.llvm.org/D46448 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331641 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-07 14:43:28 +00:00
Adrian Prantl	26b584c691	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331272 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-01 15:54:18 +00:00
Tom Stellard	70b7270658	AMDGPU: Initialize GlobalISel passes Summary: This fixes AMDGPU GlobalISel test failures when enabling the AMDGPU target without any other targets that use GlobalISel. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D45353 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329588 91177308-0d34-0410-b5e6-96231b3b80d8	2018-04-09 16:09:13 +00:00
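
The atomic optimizer commit in the log above (b461f4de29) describes combining per-lane values across a wavefront, letting a single lane issue the atomic, and then giving each lane its own offset into the result. The following is only a rough Python simulation of that idea for an atomic add with a uniform address, not the LLVM pass itself; the function name `wavefront_atomic_add`, the dict-based `memory`, and the list-based lanes are all hypothetical stand-ins chosen for illustration.

```python
def wavefront_atomic_add(memory, addr, lane_values, active):
    """Simulate one wavefront doing an atomic add to a single uniform address.

    memory      -- dict mapping address -> integer (hypothetical memory model)
    lane_values -- per-lane value each lane wants to add
    active      -- per-lane bool, True if the lane takes part
    Returns the per-lane "old value" each active lane would have observed
    had the lanes performed their atomics one after another in lane order.
    """
    # Exclusive prefix sum over the active lanes. The commit message says
    # the real pass uses wavefront-wide operations (DPP) for divergent values.
    prefix = []
    total = 0
    for value, is_active in zip(lane_values, active):
        prefix.append(total)
        if is_active:
            total += value

    # Only one atomic operation is issued for the whole wavefront.
    old = memory[addr]
    memory[addr] = old + total

    # Each active lane reconstructs its own result from the single old value
    # plus the sum contributed by the lanes ahead of it.
    return [old + p if is_active else None
            for p, is_active in zip(prefix, active)]


memory = {0: 10}
results = wavefront_atomic_add(memory, 0, [1, 2, 3, 4],
                               [True, True, False, True])
print(memory[0], results)
```

In this sketch the three active lanes would otherwise have issued three separate atomics; here they share one, which is the reduction in atomic traffic the commit message is after.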

231 Commits