archived-llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2026-01-31 01:35:20 +01:00

Author	SHA1	Message	Date
Matt Arsenault	0a3745b2a6	AMDGPU: Stop modifying SP in call sequences Because the stack growth direction and addressing is done in the same direction, modifying SP at the beginning of the call sequence was incorrect. If we had a stack passed argument, we would end up skipping that number of bytes before pushing arguments, leaving unused/inconsistent space. The callee creates fixed stack objects in its frame, so the space necessary for these is already logically allocated in the callee, so we just let the callee increment SP if it really requires it. llvm-svn: 313279	2017-09-14 17:37:40 +00:00
Matt Arsenault	332360a091	AMDGPU: Make frame register caller preserved Using SplitCSR for the frame register was very broken. Often the copies in the prolog and epilog were optimized out, in addition to them being inserted after the true prolog where the FP was clobbered. I have a hacky solution which works that continues to use split CSR, but for now this is simpler and will get to working programs. llvm-svn: 313274	2017-09-14 17:14:57 +00:00
Matt Arsenault	40dcb3af13	AMDGPU: Don't legalize i16 extloads to i32 with legal i16 Keeping non-i16 extloads makes it easier to match some new gfx9 load instructions. llvm-svn: 312699	2017-09-07 05:37:34 +00:00
Matt Arsenault	70c0f608ef	AMDGPU: Select clamp pattern with v2f16 llvm-svn: 312087	2017-08-30 01:20:17 +00:00
Matt Arsenault	09562e957d	AMDGPU: Start adding tail call support Handle the sibling call cases. llvm-svn: 310753	2017-08-11 20:42:08 +00:00
Connor Abbott	c83a4aedcc	[AMDGPU] Add support for Whole Wavefront Mode Summary: Whole Wavefront Wode (WWM) is similar to WQM, except that all of the lanes are always enabled, regardless of control flow. This is required for implementing wavefront reductions in non-uniform control flow, where we need to use the inactive lanes to propagate intermediate results, so they need to be enabled. We need to propagate WWM to uses (unless they're explicitly marked as exact) so that they also propagate intermediate results correctly. We do the analysis and exec mask munging during the WQM pass, since there are interactions with WQM for things that require both WQM and WWM. For simplicity, WWM is entirely block-local -- blocks are never WWM on entry or exit of a block, and WWM is not propagated to the block level. This means that computations involving WWM cannot involve control flow, but we only ever plan to use WWM for a few limited purposes (none of which involve control flow) anyways. Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There isn't yet a way to turn WWM off -- that will be added in a future change. Finally, it turns out that turning on inactive lanes causes a number of problems with register allocation. While the best long-term solution seems like teaching LLVM's register allocator about predication, for now we need to add some hacks to prevent ourselves from getting into trouble due to constraints that aren't currently expressed in LLVM. For the gory details, see the comments at the top of SIFixWWMLiveness.cpp. Reviewers: arsenm, nhaehnle, tpr Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D35524 llvm-svn: 310087	2017-08-04 18:36:52 +00:00
Connor Abbott	547b308884	[AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM Summary: Previously, we assumed that certain types of instructions needed WQM in pixel shaders, particularly DS instructions and image sampling instructions. This was ok because with OpenGL, the assumption was correct. But we want to start using DPP instructions for derivatives as well as other things, so the assumption that we can infer whether to use WQM based on the instruction won't continue to hold. This intrinsic lets frontends like Mesa indicate what things need WQM based on their knowledge of the API, rather than second-guessing them in the backend. We need to keep around the old method of enabling WQM, but eventually we should remove it once Mesa catches up. For now, this will let us use DPP instructions for computing derivatives correctly. Reviewers: arsenm, tpr, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35167 llvm-svn: 310085	2017-08-04 18:36:49 +00:00
Matt Arsenault	7e30f0bba6	AMDGPU: Remove pointless asserts llvm-svn: 310007	2017-08-04 00:00:13 +00:00
Matt Arsenault	4c3807ce80	AMDGPU: Don't use report_fatal_error for unsupported call types llvm-svn: 310004	2017-08-03 23:32:41 +00:00
Matt Arsenault	0dd1f06c83	AMDGPU: Remove error on calls for amdgcn Repurpose the -amdgpu-function-calls flag. Rather than require it to emit a call, only use it to run the always inline path or not. llvm-svn: 310003	2017-08-03 23:24:05 +00:00
Matt Arsenault	fad81f3bb7	AMDGPU: Fix implicitarg.ptr handling special inputs llvm-svn: 310002	2017-08-03 23:12:44 +00:00
Matt Arsenault	50852a087b	AMDGPU: Pass special input registers to functions llvm-svn: 309998	2017-08-03 23:00:29 +00:00
Matt Arsenault	b1b7bf72ff	AMDGPU: Analyze callee resource usage in AsmPrinter llvm-svn: 309781	2017-08-02 01:31:28 +00:00
Matt Arsenault	a2c6c90d9f	AMDGPU: Don't place arguments in emergency stack slot When finding the fixed offsets for function arguments, this needs to skip over the 4 bytes reserved for the emergency stack slot. llvm-svn: 309776	2017-08-02 00:59:51 +00:00
Matt Arsenault	d374291e12	AMDGPU: Fix handling of div_scale with undef inputs The src0 register must match src1 or src2, but if these were undefined they could end up using different implicit_defed virtual registers. Force these to use one undef vreg or pick the defined other register. Also fixes producing invalid nodes without the right number of inputs when src2 is undef. llvm-svn: 309743	2017-08-01 20:49:41 +00:00
Matt Arsenault	508f6988b6	AMDGPU: Initial implementation of calls Includes a hack to fix the type selected for the GlobalAddress of the function, which will be fixed by changing the default datalayout to use generic pointers for 0. llvm-svn: 309732	2017-08-01 19:54:18 +00:00
Matt Arsenault	00e223b7a3	AMDGPU: Teach isLegalAddressingMode about global_* instructions Also refine the flat check to respect flat-for-global feature, and constant fallback should check global handling, not specifically MUBUF. llvm-svn: 309471	2017-07-29 01:12:31 +00:00
Matt Arsenault	2442068eec	AMDGPU: Annotate implicitarg.ptr usage We need to pass something to functions for this to work. It isn't derivable just from the kernarg segment pointer because the implicit arguments are placed after the kernel arguments. Also fixes missing test for the intrinsic. llvm-svn: 309398	2017-07-28 15:52:08 +00:00
Zvi Rackover	422bf896b2	TargetLowering: Change isShuffleMaskLegal's mask argument type to ArrayRef<int>. NFCI. Changing mask argument type from const SmallVectorImpl<int>& to ArrayRef<int>. This came up in D35700 where a mask is received as an ArrayRef<int> and we want to pass it to TargetLowering::isShuffleMaskLegal(). Also saves a few lines of code. llvm-svn: 309085	2017-07-26 08:06:58 +00:00
Jonas Paulsson	c38a4eb7d4	[SystemZ, LoopStrengthReduce] This patch makes LSR generate better code for SystemZ in the cases of memory intrinsics, Load->Store pairs or comparison of immediate with memory. In order to achieve this, the following common code changes were made: * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if LSR should do instruction-based addressing evaluations by calling isLegalAddressingMode() with the Instruction pointers. * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address, not just loads or stores. SystemZ changes: * isLSRCostLess() implemented with Insns first, and without ImmCost. * New function supportedAddressingMode() that is a helper for TTI methods looking at Instructions passed via pointers. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D35262 https://reviews.llvm.org/D35049 llvm-svn: 308729	2017-07-21 11:59:37 +00:00
Matt Arsenault	b8eb99e870	AMDGPU: Figure out private memory regs after lowering Introduce pseudo-registers for registers needed for stack access, which are replaced during finalizeLowering. Note these pseudo-registers are currently only used for the used register location, and not for determining their input argument register. This is better because it avoids the need to try to predict whether a call will be emitted from the IR, and also detects stack objects introduced by legalization. Test changes are from the HasStackObjects check being more accurate since stack objects introduced during legalization are now known. llvm-svn: 308325	2017-07-18 16:44:56 +00:00
Matt Arsenault	bd0ea56e67	AMDGPU: Return correct type during argument lowering The type needs to be casted back to the original argument type. Fixes an assert that for some reason is only run when using -debug. Includes an additional combine to avoid test regressions from having conversions mixed with multiple Assert[SZ]ext nodes. On subtargets where i16 is legal, this was producing an i32 register with an i16 AssertZExt, truncated to i16 with another i8 AssertZExt. t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0 t3: i16 = truncate t2 t5: i16 = AssertZext t3, ValueType:ch:i8 t6: i8 = truncate t5 t7: i32 = zero_extend t6 llvm-svn: 308082	2017-07-15 05:52:59 +00:00
Stanislav Mekhanoshin	9046f7bb15	[AMDGPU] fcaninicalize optimization for GFX9+ Since GFX9 supports denorm modes for v_min_f32/v_max_f32 that is possible to further optimize fcanonicalize and remove it if applied to min/max given their operands are known not to be an sNaN or that sNaNs are not supported. Additionally we can remove fcanonicalize if denorms are supported for the VT and we know that its argument is never a NaN. Differential Revision: https://reviews.llvm.org/D35335 llvm-svn: 307976	2017-07-13 23:59:15 +00:00
Stanislav Mekhanoshin	efe34aaf12	[AMDGPU] fcanonicalize elimination optimization We are using multiplication by 1.0 to flush denormals and quiet sNaNs. That is possible to omit this multiplication if source of the fcanonicalize instruction is known to be flushed/quieted, i.e. if it comes from another instruction known to do the normalization and we are using IEEE mode to quiet sNaNs. Differential Revision: https://reviews.llvm.org/D35218 llvm-svn: 307848	2017-07-12 21:20:28 +00:00
Nirav Dave	c55f8bfffc	Add DAG argument to canMergeStoresTo NFC. llvm-svn: 307583	2017-07-10 20:25:54 +00:00
Simon Pilgrim	7f479d6ac7	[AMDGPU] Fix -Wimplicit-fallthrough warning. NFCI. llvm-svn: 307485	2017-07-08 19:50:03 +00:00
Stanislav Mekhanoshin	c39b2b753b	[AMDGPU] Always use rcp + mul with fast math Regardless of relaxation options such as -cl-fast-relaxed-math we are producing rather long code for fdiv via amdgcn_fdiv_fast intrinsic. This intrinsic is used to replace fdiv with 2.5ulp metadata and does not handle denormals, thus believed to be fast. An fdiv instruction can also have fast math flag either by itself or together with fpmath metadata. Clang used with a relaxation flag always produces both metadata and fast flag: %div = fdiv fast float %v, %0, !fpmath !12 !12 = !{float 2.500000e+00} Current implementation ignores fast flag and favors metadata. An instruction with just fast flag would be lowered to a fastest rcp + mul, but that never happen on practice because of described mutual clang and BE behavior. This change allows an "fdiv fast" to be always lowered as rcp + mul. Differential Revision: https://reviews.llvm.org/D34844 llvm-svn: 307308	2017-07-06 20:34:21 +00:00
Craig Topper	d8ebaac997	[Constants] If we already have a ConstantInt*, prefer to use isZero/isOne/isMinusOne instead of isNullValue/isOneValue/isAllOnesValue inherited from Constant. NFCI Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne. llvm-svn: 307292	2017-07-06 18:39:47 +00:00
Stanislav Mekhanoshin	9d3604967f	[AMDGPU] Simplify setcc (sext from i1 b), -1\|0, cc Depending on the compare code that can be either an argument of sext or negate of it. This helps to avoid v_cndmask_b64 instruction for sext. A reversed value can be further simplified and folded into its parent comparison if possible. Differential Revision: https://reviews.llvm.org/D34545 llvm-svn: 306446	2017-06-27 18:53:03 +00:00
Stanislav Mekhanoshin	f85be265f1	[AMDGPU] Combine and x, (sext cc from i1) => select cc, x, 0 Also factored out function to check if a boolean is an already deserialized value which does not require v_cndmask_b32 to be loaded. Added binary logical operators to its check. Differential Revision: https://reviews.llvm.org/D34500 llvm-svn: 306439	2017-06-27 18:25:26 +00:00
Matt Arsenault	06d9b51c75	AMDGPU: Whitespace fixes llvm-svn: 306265	2017-06-26 03:01:36 +00:00
Matt Arsenault	9b6d8d16dd	AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling This should not be treated as a different version of private_segment_buffer. These are distinct things with different uses and register classes, and requires the function argument info to have more context about the function's type and environment. Also add missing test coverage for the intrinsic, and emit an error for HSA. This also encovers that the intrinsic is broken unless there happen to be stack objects. llvm-svn: 306264	2017-06-26 03:01:31 +00:00
David Stuttard	acb1b94192	[AMDGPU] Add intrinsics for tbuffer load and store - build error fix Variable was unused in non-debug build (used in assert) causing compile time warning and eventual build failure llvm-svn: 306034	2017-06-22 17:15:49 +00:00
David Stuttard	a19aa0bc03	[AMDGPU] Add intrinsics for tbuffer load and store Intrinsic already existed for llvm.SI.tbuffer.store Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.* Added CodeGen tests for the 2 new variants added. Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr Differential Revision: https://reviews.llvm.org/D30687 llvm-svn: 306031	2017-06-22 16:29:22 +00:00
Stanislav Mekhanoshin	9d9d25f071	[AMDGPU] Add FP_CLASS to the add/setcc combine This is one of the nodes which also compile as v_cmp_*. Differential Revision: https://reviews.llvm.org/D34485 llvm-svn: 305970	2017-06-21 23:46:22 +00:00
Stanislav Mekhanoshin	10582312cd	[AMDGPU] Combine add and adde, sub and sube If one of the arguments of adde/sube is zero we can fold another add/sub into it. Differential Revision: https://reviews.llvm.org/D34374 llvm-svn: 305964	2017-06-21 22:30:01 +00:00
Stanislav Mekhanoshin	ca766419df	[AMDGPU] simplify add x, *ext (setcc) => addc\|subb x, 0, setcc This simplification allows to avoid generating v_cndmask_b32 to serialize condition code between compare and use. Differential Revision: https://reviews.llvm.org/D34300 llvm-svn: 305962	2017-06-21 22:05:06 +00:00
Matt Arsenault	1b9121f71c	AMDGPU: Cleanup CreateLiveInRegister llvm-svn: 305748	2017-06-19 21:52:45 +00:00
Matt Arsenault	cf4059efeb	AMDGPU: Teach isLegalAddressingMode about flat offsets Also fix reporting r+r as a valid addressing mode without offsets. llvm-svn: 305203	2017-06-12 17:06:35 +00:00
Chandler Carruth	eb66b33867	Sort the remaining #include lines in include/... and lib/.... I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is entirely mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787	2017-06-06 11:49:48 +00:00
Mandeep Singh Grang	efd068d7d5	[llvm] Remove double semicolons Reviewers: craig.topper, arsenm, mehdi_amini Reviewed By: mehdi_amini Subscribers: mehdi_amini, wdng, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33924 llvm-svn: 304767	2017-06-06 05:08:36 +00:00
Alexander Timofeev	b673edd025	AMDGPUAnnotateUniformValue should always treat volatile loads as divergent llvm-svn: 304554	2017-06-02 15:25:52 +00:00
Nirav Dave	0855b10b8c	[AMDGPU] Prevent too large store merges in AMDGPU Subtargets. NFCI. Various address spaces on the SI and R600 subtargets have stricter limits on memory access size that other address spaces. Use canMergeStoresTo predicate to prevent the DAGCombiner from creating these stores as they will be split up during legalization. llvm-svn: 303767	2017-05-24 15:59:09 +00:00
Stanislav Mekhanoshin	6d04b87725	[AMDGPU] Combine and (srl) into shl (bfe) Perform DAG combine: and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb Where nb is a number of trailing zeroes in mask. It replaces two instructions with two and BFE is generally a more expensive one. However this is only done if we are selecting a byte or word at an aligned boundary which results in a proper SDWA operand pattern. It is only done if SDWA is supported. TODO: improve SDWA pass to actually convert this pattern. It is not done now because we have an immediate in the instruction, which has be moved into a VGPR. Differential Revision: https://reviews.llvm.org/D33455 llvm-svn: 303681	2017-05-23 19:54:48 +00:00
Matt Arsenault	ab4fb8ba2f	AMDGPU: Start defining a calling convention Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values most as well. llvm-svn: 303308	2017-05-17 21:56:25 +00:00
Matt Arsenault	13c6f6f3f1	AMDGPU: Make better use of op_sel with high components Handle more general swizzles. llvm-svn: 303296	2017-05-17 20:30:58 +00:00
Matt Arsenault	b41d61b11a	AMDGPU: Fix min3/max3 combines for f16/i16 Fix missing instruction definitions for min3/max3. llvm-svn: 303284	2017-05-17 19:25:06 +00:00
Davide Italiano	2059f769c1	[AMDGPU] Placate unused variable warning in release builds. llvm-svn: 302821	2017-05-11 19:58:52 +00:00
Matt Arsenault	fd9981d4ab	AMDGPU: Pull fneg out of extract_vector_elt This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers. llvm-svn: 302813	2017-05-11 17:26:25 +00:00
Marek Olsak	372ac26def	AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5 Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D32645 llvm-svn: 302200	2017-05-04 22:25:20 +00:00

1 2 3 4 5 ...

353 Commits