RPCS3/llvm - llvm - Gitea: Git with a cup of tea

RPCS3/llvm

mirror of https://github.com/RPCS3/llvm.git synced 2025-05-20 20:36:11 +00:00

Author	SHA1	Message	Date
Michael Liao	e67155b002	[AMDGPU] Add the adjusted FP as a livein register. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64145 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366223 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-16 15:57:12 +00:00
Matt Arsenault	5cf57efaf1	AMDGPU: Serialize mode from MachineFunctionInfo git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365653 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-10 16:09:26 +00:00
Matt Arsenault	d3a003ba19	AMDGPU: Make AMDGPUPerfHintAnalysis an SCC pass Add a string attribute instead of directly setting MachineFunctionInfo. This avoids trying to get the analysis in the MachineFunctionInfo in a way that doesn't work with the new pass manager. This will also avoid re-visiting the call graph for every single function. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365241 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-05 20:26:13 +00:00
Matt Arsenault	bec095b916	AMDGPU: Add pass to lower SGPR spills This is split out from my patches to split register allocation into a separate SGPR and VGPR phase, and has some parts that aren't yet used (like maintaining LiveIntervals). This simplifies making the frame pointer register callee saved. As it is now, the code to determine callee saves needs to predict all the possible SGPR spills and how many callee saved VGPRs are needed. By handling this before PrologEpilogInserter, it's possible to just check the spill objects that already exist. Change-Id: I29e6df4034afcf949e06f8ef44206acb94696f04 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365095 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-03 23:32:29 +00:00
Michael Liao	992f5f0eea	[AMDGPU] Enable serializing of argument info. Summary: - Support serialization of all arguments in machine function info. This enables fabricating MIR tests depending on argument info. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64096 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364995 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-03 02:00:21 +00:00
Alexander Timofeev	cefe9c682f	[AMDGPU] LCSSA pass added in preISel. Uniform values defined in the divergent loop and used outside Differential Revision: https://reviews.llvm.org/D63953 Reviewers: rampitec, nhaehnle, arsenm git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364950 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-02 17:59:44 +00:00
Matt Arsenault	5b56cc85b0	Rename ExpandISelPseudo->FinalizeISel, delay register reservation This allows targets to make more decisions about reserved registers after isel. For example, now it should be certain there are calls or stack objects in the frame or not, which could have been introduced by legalization. Patch by Matthias Braun git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363757 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-19 00:25:39 +00:00
Stanislav Mekhanoshin	dc6cf0fe9d	[AMDGPU] Propagate function attributes thru bitcasts AMDGPUPropagateAttributes will not work on function bitcatsts, so move AMDGPUFixFunctionBitcasts before it. Differential Revision: https://reviews.llvm.org/D63455 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363614 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-17 20:42:48 +00:00
Stanislav Mekhanoshin	95dd0d84d2	[AMDGPU] gfx1010 wavefrontsize intrinsic folding Differential Revision: https://reviews.llvm.org/D63206 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363588 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-17 17:57:50 +00:00
Stanislav Mekhanoshin	f1a5ef5a39	[AMDGPU] Pass to propagate ABI attributes from kernels to the functions The pass works in two modes: Mode 1: Just set attributes starting from kernels. This can work at the very beginning of opt and llc pipeline, but cannot clone functions because it must be a function pass. Mode 2: Actually clone functions for new attributes. This can only work after all function passes in the opt pipeline because it has to be a module pass. Differential Revision: https://reviews.llvm.org/D63208 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363586 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-17 17:47:28 +00:00
Tom Stellard	9f6eb4ee41	Revert CMake: Make most target symbols hidden by default This reverts r362990 (git commit 374571301dc8e9bc9fdd1d70f86015de198673bd) This was causing linker warnings on Darwin: ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)' from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol 'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&), std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363028 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-11 03:21:13 +00:00
Tom Stellard	2cda3f1ac0	CMake: Make most target symbols hidden by default Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 36221 nm after/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362990 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-10 22:12:56 +00:00
Richard Trieu	c77f794940	[AMDGPU] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360713 91177308-0d34-0410-b5e6-96231b3b80d8	2019-05-14 21:54:37 +00:00
Stanislav Mekhanoshin	791e0e1dab	[AMDGPU] gfx1010 GCNRegBankReassign pass Reassign registers to reduce register bank conflicts. Differential Revision: https://reviews.llvm.org/D61344 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359704 91177308-0d34-0410-b5e6-96231b3b80d8	2019-05-01 16:49:31 +00:00
Stanislav Mekhanoshin	1c29f9f7f5	[AMDGPU] gfx1010 GCNNSAReassign pass Convert NSA into non-NSA images. Differential Revision: https://reviews.llvm.org/D61341 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359700 91177308-0d34-0410-b5e6-96231b3b80d8	2019-05-01 16:40:49 +00:00
Amara Emerson	3d4f91d8d0	[GlobalISel] Introduce a CSEConfigBase class to allow targets to define their own CSE configs. Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE configs that can be passed between TargetPassConfig and the targets' custom pass configs. This CSEConfigBase allows targets to create custom CSE configs which is then used by the GISel passes for the CSEMIRBuilder. This support will be used in a follow up commit to allow constant-only CSE for -O0 compiles in D60580. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358368 91177308-0d34-0410-b5e6-96231b3b80d8	2019-04-15 04:53:46 +00:00
Nikita Popov	cdee3b7f23	Reapply [ValueTracking] Support min/max selects in computeConstantRange() Add support for min/max flavor selects in computeConstantRange(), which allows us to fold comparisons of a min/max against a constant in InstSimplify. This fixes an infinite InstCombine loop, with the test case taken from D59378. Relative to the previous iteration, this contains some adjustments for AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen, which ends up constant folding some existing med3 tests after this change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option is added, which allows disabling scalar IR passes (that use InstSimplify) for testing purposes. Differential Revision: https://reviews.llvm.org/D59506 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357870 91177308-0d34-0410-b5e6-96231b3b80d8	2019-04-07 17:22:16 +00:00
Stanislav Mekhanoshin	5d206b6327	[AMDGPU] Add MachineDCE pass after RenameIndependentSubregs Detect dead lanes can create some dead defs. Then RenameIndependentSubregs will break a REG_SEQUENCE which may use these dead defs. At this point a dead instruction can be removed but we do not run a DCE anymore. MachineDCE was only running before live variable analysis. The patch adds a mean to preserve LiveIntervals and SlotIndexes in case it works past this. Differential Revision: https://reviews.llvm.org/D59626 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357805 91177308-0d34-0410-b5e6-96231b3b80d8	2019-04-05 20:11:32 +00:00
Neil Henning	fcc236c268	[AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure. This change incorporates an effort by Connor Abbot to change how we deal with WWM operations potentially trashing valid values in inactive lanes. Previously, the SIFixWWMLiveness pass would work out which registers were being trashed within WWM regions, and ensure that the register allocator did not have any values it was depending on resident in those registers if the WWM section would trash them. This worked perfectly well, but would cause sometimes severe register pressure when the WWM section resided before divergent control flow (or at least that is where I mostly observed it). This fix instead runs through the WWM sections and pre allocates some registers for WWM. It then reserves these registers so that the register allocator cannot use them. This results in a significant register saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just this change!). Differential Revision: https://reviews.llvm.org/D59295 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357400 91177308-0d34-0410-b5e6-96231b3b80d8	2019-04-01 15:19:52 +00:00
Matt Arsenault	44ee20512e	AMDGPU: Add additional MIR tests for exec mask optimizations Also includes one example of how this transform is unsound. This isn't verifying the copies are used in the control flow intrinisic patterns. Also add option to disable exec mask opt pass. Since this pass is unsound, it may be useful to turn it off until it is fixed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357091 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-27 16:58:30 +00:00
Matt Arsenault	951c9d9b26	CodeGen: Refactor regallocator command line and target selection This will allow targets more flexibility to replace the register allocator core passes. In a future commit, AMDGPU will run the core register assignment passes twice, and will also want to disallow using the standard -regalloc option. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356506 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-19 19:33:12 +00:00
Neil Henning	89fc4394cb	[AMDGPU] Add an experimental buffer fat pointer address space. Add an experimental buffer fat pointer address space that is currently unhandled in the backend. This commit reserves address space 7 as a non-integral pointer repsenting the 160-bit fat pointer (128-bit buffer descriptor + 32-bit offset) that is heavily used in graphics workloads using the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D58957 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356373 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-18 14:44:28 +00:00
Matt Arsenault	2066fb5fc9	AMDGPU: Partially fix default device for HSA There are a few different issues, mostly stemming from using generation based checks for anything instead of subtarget features. Stop adding flat-address-space as a feature for HSA, as it should only be a device property. This was incorrectly allowing flat instructions to select for SI. Increase the default generation for HSA to avoid the encoding error when emitting objects. This has some other side effects from various checks which probably should be separate subtarget features (in the cost model and for dealing with the DS offset folding issue). Partial fix for bug 41070. It should probably be an error to try using amdhsa without flat support. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356347 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-17 21:31:35 +00:00
Matt Arsenault	d8706fcd74	MIR: Allow targets to serialize MachineFunctionInfo This has been a very painful missing feature that has made producing reduced testcases difficult. In particular the various registers determined for stack access during function lowering were necessary to avoid undefined register errors in a large percentage of cases. Implement a subset of the important fields that need to be preserved for AMDGPU. Most of the changes are to support targets parsing register fields and properly reporting errors. The biggest sort-of bug remaining is for fields that can be initialized from the IR section will be overwritten by a default initialized machineFunctionInfo section. Another remaining bug is the machineFunctionInfo section is still printed even if empty. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356215 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-14 22:54:43 +00:00
Aakanksha Patil	67bb6a3b71	AMDGPU: Handle "uniform-work-group-size" attribute (fix for RADV) A previous patch for "uniform-work-group-size" attribute was found to break some RADV and possibly radeon SI tests and had to be retracted. This patch fixes that. Differential Revision: http://reviews.llvm.org/D58993 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355574 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-07 00:54:04 +00:00
Matt Arsenault	42e25a8fd0	AMDGPU: Fix typo git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355056 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-28 00:52:33 +00:00
Matt Arsenault	df3568d8a9	AMDGPU: Enable function calls by default Fixes some crashes on illegal call situations which are unfortunately still valid IR. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355051 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-28 00:40:32 +00:00
Matt Arsenault	0410b9ebcc	AMDGPU: Remove debugger related subtarget features As far as I know these aren't needed anymore. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354634 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-21 23:27:46 +00:00
Valery Pykhtin	a0ecdf4bba	[AMDGPU] Enable DPP combiner pass by default. Related revisions: https://reviews.llvm.org/D55444, https://reviews.llvm.org/D55314 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353691 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-11 11:15:03 +00:00
Chandler Carruth	6b547686c5	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-19 08:50:56 +00:00
David Stuttard	1da08e2514	[AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351054 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-14 11:55:24 +00:00
Aakanksha Patil	cc31a27f1e	Revert r348971: [AMDGPU] Support for "uniform-work-group-size" attribute This patch breaks RADV (and probably RadeonSI as well) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349084 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-13 21:23:12 +00:00
Aakanksha Patil	b20ee3547f	[AMDGPU] Support for "uniform-work-group-size" attribute Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate weather or not they have uniform work group size depending on the value of the attribute. Differential Revision: https://reviews.llvm.org/D50200 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348971 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-12 20:49:17 +00:00
Tim Corringham	c9d081f818	[AMDGPU] Add new Mode Register pass A new pass to manage the Mode register. Currently this just manages the floating point double precision rounding requirements, but is intended to be easily extended to encompass all Mode register settings. The immediate motivation comes from the requirement to use the round-to-zero rounding mode for the 16 bit interpolation instructions, where the rounding mode setting is shared between 16 and 64 bit operations. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348754 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-10 12:06:10 +00:00
David Green	7135d8b482	[Targets] Add errors for tiny and kernel codemodel on targets that don't support them Adds fatal errors for any target that does not support the Tiny or Kernel codemodels by rejigging the getEffectiveCodeModel calls. Differential Revision: https://reviews.llvm.org/D50141 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348585 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-07 12:10:23 +00:00
Valery Pykhtin	92a20faea1	[AMDGPU] Partial revert of rL348371: Turn on the DPP combiner by default Turn the combiner back off as there're failures until the issue is fixed. Differential revision: https://reviews.llvm.org/D55314 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348487 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-06 14:20:02 +00:00
Valery Pykhtin	62a4268c77	[AMDGPU]: Turn on the DPP combiner by default Differential revision: https://reviews.llvm.org/D55314 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348371 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-05 15:21:17 +00:00
Valery Pykhtin	d339265d52	[AMDGPU] Combine DPP mov with use instructions (VOP1/2/3) Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses. Differential revision: https://reviews.llvm.org/D53762 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347993 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-30 14:21:56 +00:00
David Stuttard	cb049f818d	Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic" Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347911 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-29 20:14:17 +00:00
David Stuttard	7fd87699a5	Add support for TFE/LWE in image intrinsics TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347871 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-29 15:21:13 +00:00
Matt Arsenault	44412401b3	AMDGPU: Don't optimize exec masks at -O0 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347573 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-26 17:02:02 +00:00
Ron Lieberman	322a8075ef	[AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST Add a pass to fixup various vector ISel issues. Currently we handle converting GLOBAL_{LOAD\|STORE}_* and GLOBAL_Atomic_* instructions into their _SADDR variants. This involves feeding the sreg into the saddr field of the new instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347008 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-16 01:13:34 +00:00
Matt Arsenault	0cb12ca8f6	Allow subclassing ExternalAA This allows testing AMDGPU alias analysis like any other alias analysis pass. This fixes the existing test pointlessly running opt -O3 when it really just wants to run the one analysis. Before there was no way to test this using -aa-eval with opt, since the default constructed pass is run. The wrapper subclass allows the default constructor to pass the necessary callback. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346353 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-07 20:26:42 +00:00
Nicolai Haehnle	776a459079	AMDGPU: Rewrite SILowerI1Copies to always stay on SALU Summary: Instead of writing boolean values temporarily into 32-bit VGPRs if they are involved in PHIs or are observed from outside a loop, we use bitwise masking operations to combine lane masks in a way that is consistent with wave control flow. Move SIFixSGPRCopies to before this pass, since that pass incorrectly attempts to move SGPR phis to VGPRs. This should recover most of the code quality that was lost with the bug fix in "AMDGPU: Remove PHI loop condition optimization". There are still some relevant cases where code quality could be improved, in particular: - We often introduce redundant masks with EXEC. Ideally, we'd have a generic computeKnownBits-like analysis to determine whether masks are already masked by EXEC, so we can avoid this masking both here and when lowering uniform control flow. - The criterion we use to determine whether a def is observed from outside a loop is conservative: it doesn't check whether (loop) branch conditions are uniform. Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D53496 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345719 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-31 13:27:08 +00:00
Scott Linder	92794ee479	[AMDGPU] Add a pass to promote bitcast calls AMDGPU currently only supports direct calls, but at lower optimisation levels it fails to lower statically direct calls which appear indirect due to a bitcast. Add a pass to visit all CallSites and use CallPromotionUtils to "devirtualize" calls. Differential Revision: https://reviews.llvm.org/D52741 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345382 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-26 13:18:36 +00:00
Neil Henning	b461f4de29	[AMDGPU] Add an AMDGPU specific atomic optimizer. This commit adds a new IR level pass to the AMDGPU backend to perform atomic optimizations. It works by: - Running through a function and finding atomicrmw add/sub or uses of the atomic buffer intrinsics for add/sub. - If all arguments except the value to be added/subtracted are uniform, record the value to be optimized. - Run through the atomic operations we can optimize and, depending on whether the value is uniform/divergent use wavefront wide operations (DPP in the divergent case) to calculate the total amount to be atomically added/subtracted. - Then let only a single lane of each wavefront perform the atomic operation, reducing the total number of atomic operations in flight. - Lastly we recombine the result from the single lane to each lane of the wavefront, and calculate our individual lanes offset into the final result. Differential Revision: https://reviews.llvm.org/D51969 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343973 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-08 15:49:19 +00:00
Matt Arsenault	47e2c38609	AMDGPU: Always run AMDGPUAlwaysInline Even if calls are enabled, it still needs to be run for forcing inline of functions that use LDS. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343657 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-03 02:47:25 +00:00
Matt Arsenault	b3a02e1925	AMDGPU: Expand atomicrmw nand in IR git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343559 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-02 03:50:56 +00:00
Sameer Sahasrabuddhe	116128c1c0	[AMDGPU] restore r342722 which was reverted with r342743 [AMDGPU] lower-switch in preISel as a workaround for legacy DA Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342956 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-25 09:39:21 +00:00
Sameer Sahasrabuddhe	0ccb4cd734	revert changes from r342722 "[AMDGPU] lower-switch in preISel as a workaround for legacy DA" This broke regression tests. The first breakage was noticed here: http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/23549 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342743 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-21 16:31:51 +00:00

1 2 3 4 5

250 Commits